So, after seperis talked about Home Assistant's dedicated, installed device and about how good Home Assistant had been as a smart home hub, I was convinced by others in my household to buy one and see what I could do with it. I like the open source part, I really love the local control option, and there's good compatibility with devices, at least so far. (I would like the community Sengled integration to Just Work with the hub without requiring the username and password for the app, but that doesn't seem to be an option; if I really want to go that route, I can purchase a Zigbee dongle and use a different community integration to make the dongle the hub instead.)
Anyway, Home Assistant is a suite of software that turns a device, like a Raspberry Pi computer, into the brain of a smart home. Using a modular system of integrations and scripts, Home Assistant can be used to control all sorts of devices. Integrations generally create devices or entities that can then be referenced by Home Assistant's scripting, either as automations (always-on scripts listening for triggers and then performing actions) or as scripts (sequences of actions performed when specifically invoked). With those functions alone, Home Assistant can do quite a lot: motion-sensitive lights, screens that turn themselves on or off, or a system where streaming audio (and possibly video) is piped from room to room as you move through the house, based on the presence of one of your devices, so long as your devices are all connected to something Home Assistant can recognize and control. (Being open source and extensible, one Home Assistant can potentially do the work of many dedicated smart hubs and devices, as the community figures out how to properly communicate with those devices, and it allows devices from different and competing companies to talk to each other.)
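As a rough sketch of what one of those always-on automations looks like in Home Assistant's YAML configuration (the entity IDs here are invented for illustration, not from any real setup):

```yaml
# Hypothetical motion-sensitive light automation.
# binary_sensor.hallway_motion and light.hallway are illustrative entity IDs
# that an integration would have created for your actual hardware.
automation:
  - alias: "Hallway motion light"
    trigger:
      - platform: state
        entity_id: binary_sensor.hallway_motion
        to: "on"
    action:
      - service: light.turn_on
        entity_id: light.hallway
```

The trigger listens for the motion sensor's entity changing state, and the action calls a service on the light's entity; both halves can reference entities from completely different manufacturers' integrations.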
One of the many integrations Home Assistant has is the ability to handle intents: messages that ask other components to do things, carrying data about the parameters of that ask (who to do it to, who it's from, what state to change things to, that kind of stuff). The intent "spraypaint my coworker John red" has an action (spraypaint) and data about the action (the color of the paint should be red, the target of the painting should be John, and John is part of the group "my coworker(s)"). On the other side of intents are functions and programs that catch intents and then perform the requested actions using the data provided. If there isn't a function somewhere that understands what "spraypaint" means, the intent does nothing and nothing happens. (Errors might get thrown, but nobody gets spraypaint on them.) If the right data isn't attached, spraypaint might still fire, but it might substitute default values, reuse whatever was last passed to it, or exhibit whatever other behavior the function defines. Fun times. Intents help make things even more modular, because any program that professes to know how to handle an intent can be slotted into doing so without changing anything about the intent to match the new program.
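To make the shape of an intent concrete, here's a hypothetical rendering of the spraypaint example as data: an intent is basically a name identifying the action, plus slots carrying the parameters (all of the names here are illustrative):

```yaml
# A hypothetical intent, shown as data. Whatever handler registers
# itself for "SpraypaintPerson" decides what to do with each slot.
intent: SpraypaintPerson
slots:
  color: red
  target: John
  group: coworkers
```

A different handler could be swapped in to catch the same intent without the sender changing anything, which is where the modularity comes from.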
Intents are commonly the end product of voice or text assistants: programs with components that take in natural language from humans and process that language into something computers can understand. Humans understand "spraypaint my coworker John red", but computers have to be taught how to transform that natural language into "call this function with these parameters." Voice usually has a first step where the spoken language is transformed into text; the text is then run through a natural language processor, and if the processed language matches something the assistant has been trained to recognize ("spraypaint my coworker John red"), the assistant packages the intent with the requested data and sends it off.
Assistants can be trained, in a number of clever ways, to recognize variations in speech that should send the same intents. Which leads us to the actual point of the post: "How do I get Home Assistant to recognize it's being sworn at?" Because, not long after managing to get the voice assistant online and Home Assistant recognizing intents, I was informed that all computers that accept natural language input should conform to synecdochic's requirements for understanding swears.
For my voice assistant, I'm using an add-on providing Rhasspy, a fully local voice assistant (once its components have been downloaded and activated) that can post its intents directly to an endpoint controlled by Home Assistant, once that endpoint has been activated. So, dutifully, I placed the single line needed into my Home Assistant's configuration.yaml file to turn on the intent endpoint, then created an intent script in Home Assistant to catch the intent that I put into Rhasspy's sentences.ini file, allowing Rhasspy to understand several common foul utterances directed at it or at Home Assistant.
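Concretely, the two pieces look something like this; the intent name and sentences below are stand-ins, not my actual list of foul utterances. In configuration.yaml, the bare intent: line enables the endpoint Rhasspy posts to:

```yaml
# configuration.yaml: enabling the bare "intent:" integration exposes
# the endpoint that Rhasspy can post its recognized intents to.
intent:
```

And in Rhasspy's sentences.ini, each section heading becomes the name of the intent sent when one of that section's sentence patterns is recognized, with (a | b) marking interchangeable alternatives:

```ini
[Cursing]
(darn | dang) you
you (useless | infernal) machine
```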
Rhasspy and other voice assistants must be triggered into listening for commands through the use of a wake word; until they hear it, all they are doing is listening for that word. At least, in theory that's all they're doing, but there's a reasonable suspicion that a lot of commercial assistants aren't just passively listening: they're recording, and using that information for whatever purposes you had to agree to in the end user license agreement (EULA) to use the product. So, yes, you have to wake the assistant up to swear at it, but with a responsive wake word, that won't be difficult. Also, Rhasspy being entirely local and restrictable to only what it has been trained on means that someone armed with xkcd 1807 will be unable to order two tons of creamed corn and have it delivered to the house. (This, like being able to swear at your assistant, is a benefit, we assure you.)
The way intent scripts work with Home Assistant, according to the documentation, is that the response sent back to the text-to-speech system is whatever is in the "speech:" line. The documentation never explicitly states this (or does in the introduction somewhere, or it's something you're supposed to just intuit from reading the docs), but that line can contain a Jinja2 template (the Python-based templating language Home Assistant uses), so it is possible to create a random response by providing a list of possibilities and piping it through the "random" filter. Which will look like
'{{ ["Option One", "Option Two", "Option Three"] | random }}'
with as many options as you can think of. (And, if you want to get fancy, you can concatenate two or more different lists together with the + operator, so one part of the response could be drawn from "I'm sorry" and its variants and a different part from "I'll do better" and its variants: one time they combine into "I'm sorry" and "I'll do better", and a different time into "My apologies" and "I'll do better".)

So that's how you get the system to apologize back to you when you swear at it. And that can be extended to responses for when you want to praise the system, or to otherwise produce results that don't run scripts per se but do provide spoken responses (or textual ones, as you can also set the action of an intent script to send text to a notifier or to a particular component that will flash it up as a message).
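Put together, an intent script with a randomized apology might look something like this; the intent name and the responses are illustrative, not my actual configuration:

```yaml
# configuration.yaml: an intent_script whose spoken response is picked
# at random from the list each time the "Cursing" intent arrives.
intent_script:
  Cursing:
    speech:
      text: >-
        {{ ["I'm sorry.", "My apologies.", "I'll do better."] | random }}
```

The block scalar (>-) is just a convenient way to hold a template full of quotes without fighting YAML's own quoting rules.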
After a little digging, and seeing an example somewhere else on the Internet, I've also figured out how to use the input number component to keep a record of the running balance between how often the system is praised and how often it is cursed at. After adding the input number in the Helpers part of Home Assistant, defining a minimum and maximum range and the step to increase or decrease by, I added a script, run when the appropriate intent is called, that uses the input_number.increment or input_number.decrement service to adjust the number upward or downward by the step that I defined earlier. I did have to manually set the value to zero to start with, because unless you use the code interface to define an initial value, all input numbers start at their minimum values. But after that, the scripts work quite nicely, and I trained Rhasspy on some sentences that would check the value of the score and report back.

Adventures in home automation continue! I have already done a lot to automate things and use voice to call them forth. There's more in my future, likely learning how to build integrations of my own, because eventually I would like to be able to catch the data from intents and manipulate it to my liking, and I haven't figured out how to do that with intent scripts, if it is even possible to do so that way.
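For anyone who wants to replicate the score-keeping in YAML rather than through the Helpers UI, here's a sketch; the helper and script names are made up:

```yaml
# A hypothetical praise/curse counter and the scripts that nudge it.
input_number:
  politeness_score:
    min: -100
    max: 100
    step: 1
    initial: 0  # without this, the helper starts at its minimum value

script:
  register_praise:
    sequence:
      - service: input_number.increment
        entity_id: input_number.politeness_score
  register_curse:
    sequence:
      - service: input_number.decrement
        entity_id: input_number.politeness_score
```

Each intent script's action can then call script.register_praise or script.register_curse as appropriate, and a separate intent can read the helper's state back out for the "what's my score" sentences.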