silveradept | Jan. 14th, 2025

As requested by azurelunatic, who has people interested in how I can use my voice commands to make sure that if I need to go do something else, I don't miss out on anything that's happened on the TV or the movie in the interim.

This is a thing that requires four components to work in sequence - the voice assistant, the smart home brain, a programmable IR blaster, and an IR receiver attached to the device that's driving the video stream. So our workflow goes: Rhasspy → Home Assistant → Broadlink IR blaster → FLIRC receiver.

Rhasspy has featured from the very beginning of this series, as have the intent scripts involved in getting voice input paired up with actions. The key insight into making this work was getting Rhasspy to understand numeric input and pass those parameters to Home Assistant. With that in place, I can write a sentence to train Rhasspy on that collects the correct parameters, plus an intent script and supplemental script on the Home Assistant side. Those scripts format the parameters the way Home Assistant expects for running a timer, start the appropriately-named timer with them, and then repeat back what function it heard me invoke and any parameters that were passed to it. The time-out script accepts parameters or, if none are provided, uses the default value that I've set for it (two minutes, which is the approximate length of a commercial break in the United States). The time-out timer runs in the same way as any other timer helper or arbitrary timer, and an automation in Home Assistant listens for that timer running out and runs the appropriate actions when it expires.
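As a rough illustration of the formatting step, here is a minimal Python sketch. It is not the actual Home Assistant script (which would live in Home Assistant's own configuration); the function names and the wording of the spoken confirmation are made up for the example. It assumes the voice command supplies optional minutes and seconds, and that the timer wants its duration as an `HH:MM:SS` string.

```python
from typing import Optional

DEFAULT_SECONDS = 120  # the two-minute default described above


def timer_duration(minutes: Optional[int] = None,
                   seconds: Optional[int] = None) -> str:
    """Turn optional spoken numbers into an HH:MM:SS duration string."""
    if minutes is None and seconds is None:
        # Nothing was spoken, so fall back to the default.
        total = DEFAULT_SECONDS
    else:
        total = (minutes or 0) * 60 + (seconds or 0)
    if total <= 0:
        raise ValueError("a time-out needs a positive duration")
    hours, remainder = divmod(total, 3600)
    mins, secs = divmod(remainder, 60)
    return f"{hours:02d}:{mins:02d}:{secs:02d}"


def spoken_confirmation(minutes: Optional[int] = None,
                        seconds: Optional[int] = None) -> str:
    """Hypothetical wording for the read-back of what was heard."""
    return f"Time out timer set for {timer_duration(minutes, seconds)}"


print(timer_duration())                      # 00:02:00
print(timer_duration(minutes=1, seconds=30)) # 00:01:30
```

Treating "no parameters" separately from "zero" keeps the default from silently overriding a number that was actually spoken.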

What happens when that timer runs out is, essentially, that the IR blaster fires a specific code that the FLIRC receiver interprets as having had the "pause" key pressed on a remote. Which sounds like it's easy, but it's not, because that IR blaster has no database of codes to look up to know what to send, and instead has to be taught what various IR codes are by having them beamed at it from a convenient remote. The FLIRC is just an IR receiver and interpreter. It can't transmit anything to the IR blaster to teach it anything. And while we have wireless keyboards for the convenience of the ten-foot interface, those keyboards don't transmit IR signals to a receiver. (Nor do we want them to.)

What I do have, and have had, are programmable remotes, like a Logitech Harmony (no longer manufactured) or the Skip1s remote from the FLIRC folks. They do have databases of IR codes that can be downloaded into the remote, and therefore, that gives our IR blaster something to learn from. (Digression: The FLIRC can be used as a receiver to learn codes from an OEM remote and then teach those codes to the Skip1s, if the device isn't in the Skip database, but that's very much advanced fooling-about with both of those devices, and if you have the OEM remote, you can just teach the Broadlink device directly, rather than going through a Skip or other universal remote. I use programmable universal remotes because I want one remote to control all the things, rather than having to deal with multiple remotes to adjust things. And because while using the voice assistant is great, I don't want to have to use it (or a smartphone app) to do all the sound and picture-related remote control stuff.)

Getting IR codes into the IR blaster is an adventure in building another callable script that sets the remote into learning mode, and then goes through a sequence of "waiting for command" inputs, where in the script, I tell it what device it's learning how to control, and what the name of the command it's learning is. Thankfully, the process on how to get a remote to learn, and then how to get it to send commands, is very well-documented in Home Assistant, far better than the underlying Python library that the integration is based on. It's a very manual process, but when set up right, I only have to do it once, and the IR blaster remembers what it's been taught and can then turn around and transmit the same thing. That's remote → IR blaster → FLIRC receiver, and then I can use either the remote or the IR blaster as needed.
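For the curious, the learn-then-send pairing looks roughly like the following when expressed as calls to Home Assistant's `remote.learn_command` and `remote.send_command` services through its REST API. This is a sketch that only builds the requests and sends nothing; the entity ID `remote.living_room_blaster`, the device name `flirc`, and the base URL are all placeholders I've made up for the example, and a real call would also need an access token.

```python
import json
from typing import Dict, Tuple


def service_request(base_url: str, domain: str, service: str,
                    data: Dict[str, str]) -> Tuple[str, str]:
    """Build the URL and JSON body for a Home Assistant service call."""
    url = f"{base_url.rstrip('/')}/api/services/{domain}/{service}"
    return url, json.dumps(data, sort_keys=True)


BASE_URL = "http://homeassistant.local:8123"  # placeholder address
BLASTER = "remote.living_room_blaster"        # hypothetical entity ID

# The device and command names are labels that I choose; learning stores
# the code under them, and sending looks it up by the same pair.
target = {"entity_id": BLASTER, "device": "flirc", "command": "pause"}

learn_url, learn_body = service_request(BASE_URL, "remote", "learn_command", target)
send_url, send_body = service_request(BASE_URL, "remote", "send_command", target)

print(learn_url)
print(learn_body)
print(send_url)
```

The point of the sketch is that learning and sending take the same device and command labels, which is why being consistent about the names during the manual learning pass matters.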

Since Home Assistant can use the IR blaster to transmit to the receiver, the actual automation listening for the end of the time-out timer has one command associated with it - use the IR blaster to transmit a press of the "pause" key to the receiver. So long as the receiver is within range to receive, it will receive the keypress and do the associated action. Thus, the whole chain completes, voice command to script to timer to automation to IR transmission to keypress. It doesn't feel like a complex operation, because when it works, it works, and it doesn't invite contemplation of all the parts that have to work with each other to achieve the equivalent of pressing a key on a remote at the right time.
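To make the last link of the chain concrete, here is a small Python sketch of the automation's logic: watch for the `timer.finished` event from the time-out timer and answer it with a single `remote.send_command` call. The real thing is a Home Assistant automation, not Python, and the timer name `timer.time_out`, the blaster entity ID, and the device name are hypothetical stand-ins.

```python
from typing import Any, Dict, Optional

TIMEOUT_TIMER = "timer.time_out"        # hypothetical timer entity ID
BLASTER = "remote.living_room_blaster"  # hypothetical blaster entity ID


def on_event(event: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the service call to make for this event, or None to ignore it."""
    if event.get("event_type") != "timer.finished":
        return None
    if event.get("data", {}).get("entity_id") != TIMEOUT_TIMER:
        # Some other timer ran out; the time-out automation stays quiet.
        return None
    return {
        "service": "remote.send_command",
        "data": {"entity_id": BLASTER, "device": "flirc", "command": "pause"},
    }


finished = {"event_type": "timer.finished", "data": {"entity_id": TIMEOUT_TIMER}}
other = {"event_type": "timer.finished", "data": {"entity_id": "timer.laundry"}}

print(on_event(finished))
print(on_event(other))  # None
```

Filtering on the timer's entity ID is what lets this timer coexist with any other timers running at the same time.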

There are drawbacks to this setup. The biggest and most obvious one is that the pause key will only work on whatever item currently has focus on the receiving computer. If you tend to use picture-in-picture to watch multiple streams and/or listen to multiple sounds, the time-out keypress will only hit the one that the underlying operating system believes is currently in focus. As far as I know, there's no "pause all/resume all" that can be transmitted and interpreted from the things I have available. (After all, even though computers are very good at doing multiple tasks in sequence, humans are thought to be the kind of beings that only want to concentrate on one thing at a time, so why would you need something that pauses and/or resumes everything at once?) If that does actually exist, then I'll do my best to figure out how to incorporate it into scripting and see if I can teach the IR blaster and receiver that this is what I want.

Second, what the operating system believes is in focus and what the human believes is in focus are not always the same, so sometimes when I'm trying to get one thing to pause, it turns out the operating system has focus on something else. That requires manual intervention to get the focus where it should be, and at that point, you're already on the machine that needs to be stopped, so you may as well just click the right pause button yourself.

And third, not all programs, sites, streams, and the like respond correctly to a pause key, so there are occasions where I could push all the pause keys I wanted to, in whatever way I wanted to, and nothing would happen, because the site or program doesn't recognize it or has locked out that particular input from going through until some other thing is cleared, acknowledged, or otherwise managed. Thankfully, the number of situations where this has happened to me is pretty small, and there are sometimes efficient and effective workarounds to this problem, like the aforementioned picture-in-picture pop-out, which is usually pretty properly responsive when it's the thing that has focus, regardless of what is going on with the underlying website and its playback controls.

This idea is also very scalable and configurable. So long as you can get the remote to learn the appropriate command from another remote, or you have base64 encodings that work for the remote and can drop them directly into the learned codes file, and the receiver on the other end knows what to do with the codes that it receives, you can basically scale this idea to anything that you might want to do with a remote control. I'd suggest using it only for things where you can either have a gap for execution in between commands, or where you have a workflow that can immediately begin recording a new input from the voice assistant after the last command has finished executing. Your own set-up will likely depend on what applications you want to control and their potential quirks, but it is rather nice to have that option in place where you can either let the commercials play and then pause before the action resumes, or give yourself until a breaking point and then have the media pause automatically so that you can task switch without FOMO or so that you can get your brain in gear to do the actual switching instead of just continuing to binge whatever it is that you're doing. (Kickstarting executive function with computers can be really helpful, if for no other reason than that they will do exactly what you tell them to do, on time.)
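If you go the route of dropping base64 codes in by hand, it's worth checking that a string actually decodes before adding it. The sketch below assumes the learned codes can be treated as a mapping of device name to command name to base64 string; check that against your own codes file before relying on it. The function name and the sample bytes are made up, and the placeholder is not a real IR code.

```python
import base64
import binascii
from typing import Dict

Codes = Dict[str, Dict[str, str]]  # assumed layout: device -> command -> code


def add_code(codes: Codes, device: str, command: str, b64_code: str) -> Codes:
    """Return a copy of codes with one validated base64 code added."""
    try:
        base64.b64decode(b64_code, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError(f"{command!r} is not valid base64") from exc
    # Copy each device's commands so the caller's mapping is left untouched.
    updated = {name: dict(commands) for name, commands in codes.items()}
    updated.setdefault(device, {})[command] = b64_code
    return updated


# Placeholder bytes standing in for a real learned code.
placeholder = base64.b64encode(b"\x00\x01\x02\x03").decode("ascii")

codes = add_code({}, "flirc", "pause", placeholder)
print(codes)
```

This only catches mangled copy-and-paste, not a code that is well-formed but wrong for the receiver, so a test transmission is still the real check.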