silveradept | Adventures in Computer Maintenance: Out With the Old, In With the Error Messages (Reply)

The Home Assistant is humming along smoothly at this point, probably because I haven't actually tried to do a whole lot with it in terms of extending functionality, increasing it, or writing new functions for it to achieve. It's behaving stably, for the moment, even as we keep everything updated and fix issues when they come up, like when the robot vacuum changed its IP address by one from the DHCP assignment and it took me weeks to figure out what the problem was. So this edition of Adventures in Computer Stuff is a lot about fixing and errors and using librarian skills to find solutions to those errors.

It starts with one of the single-board computers that I have hooked up to a microphone as an input for Rhasspy, which takes voice commands and turns them into things for the Home Assistant to do. The machine is running a console-only version of Linux, since it doesn't need a GUI to do anything and most of the configuration for the singular service it runs is through a web interface. I followed the install instructions for it exactly, but lately, as I've been updating it, the update process has been failing when it comes to trying to generate the backup initial image to boot, and every time I wanted to update, I'd have to delete the incomplete backup image file so as to give enough space for the primary image to complete, and then we'd have the same problem again. Since it was giving me a helpful error message (no space left on device), I knew the problem was that I'd made the partition too small for the increased size those images now needed. So that was a relatively easy fix, in the sense of re-partitioning the drive. The thing refused entirely to grow itself into the new size, so I eventually had to copy all the data off the old partition to somewhere else, delete the old partition and then create a new one. The good thing is, doing it was pretty easy (and will be for someone who is confident in the use of partition managers) and now the update process doesn't error out with a lack of space on the device.
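
For anyone who would rather catch the "no space left on device" problem before an update fails, here is a minimal sketch of a free-space check. It uses only Python's standard library. The path and the size threshold in the usage part are placeholders I made up for illustration, not values from my actual setup.

```python
import shutil


def has_room(path, needed_bytes):
    """Return True if the filesystem holding `path` has at least `needed_bytes` free."""
    usage = shutil.disk_usage(path)
    return usage.free >= needed_bytes


def describe(path):
    """Return a one-line, human-readable summary of how full the filesystem is."""
    usage = shutil.disk_usage(path)
    # Guard against pseudo-filesystems that report a total size of zero.
    percent = (usage.used / usage.total * 100) if usage.total else 0.0
    free_mib = usage.free / (1024 ** 2)
    return f"{path}: {percent:.1f}% used, {free_mib:.0f} MiB free"


if __name__ == "__main__":
    # Hypothetical values: check the current directory for 200 MiB of headroom.
    target = "."
    needed = 200 * 1024 ** 2
    print(describe(target))
    if not has_room(target, needed):
        print("Not enough space; an image rebuild would likely fail here.")
```

Pointing `target` at wherever the boot images live (which depends on the distribution) and running this before an update would flag the shortage before the update process hits it.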

From there to slightly more complex problems involving backing up data and trying to get it into a place where it can be redownloaded. You see, after finding a visually appealing version of Linux that actually worked on my main machine and that will likely get me away from Windows in time for the EOL of Win10, I decided I liked it enough, and it had low enough system requirements, that I would gracefully retire my current Linux setup, which has served me well for about seven years of updates and having to solve problems when they developed, and install freshly on the machine this new visually appealing distribution. I tried to do this with a backup drive that I thought was fine, given that it had backed things up before for me. Getting the files off the drive seemed to work all right, except for the part where it started to freeze up regularly when it wasn't in Windows. Oh, well, put it in Windows, let it run files off. Then, using the live image for the new Linux, copy off all the current files and get them on the drive. That went off without a hitch, actually. Install new Linux, that goes fine.

Copy the files back off the drive? Definitely not fine. It would get in about so far and then the machine would lock up. But it also did it to a different drive of much smaller capacity, too. That's no fun. And it also did it on a different version of the same Linux, with a different Desktop Environment and tools available, so it wasn't that this particular version of it was somehow busted. But the new install also didn't seem to have any trouble downloading things from the network, or running the update script so that everything was up to date after the install. It was just copying from drives that was the issue. But I still have a Windows install, so I let the drive copy the data to the NAS through Windows. This took several iterations to complete, because even on Windows, there would be the occasional failure as Windows choked on seemingly random dotfiles or directories. Thankfully, none of those things were mission-critical and could be deleted from the backup without losing any critical data. And once I had the data on the NAS, the new install could download everything into place just fine. So, backup successful, data re-download successful, nothing major other than that I should probably trust that drive a lot less than I did. Or perhaps reformat it to something friendlier and see what happens there. You don't really have backups until you need to use them, I guess. But this has also conveniently put data on the NAS that the other computer could probably use and would want some of itself, so I've been slowly trying to get that copied down and organized properly as well. Eventually I'll get the hang of getting all my data copied and organized into good ways, right?
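
Since the copy kept dying on individual dotfiles and directories, here is a rough sketch of a copy that keeps going and reports what it could not copy, instead of stopping at the first failure. This is not the tool I actually used (that was plain Windows file copying); the function name `copy_tree_tolerant` is made up for illustration, and it would not help with a machine that locks up entirely.

```python
import os
import shutil


def copy_tree_tolerant(src, dst):
    """Copy everything under `src` into `dst`, skipping anything that fails.

    Returns a list of (path, error message) pairs for the items that
    could not be copied, so they can be reviewed or retried afterwards.
    """
    failures = []
    for root, dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = dst if rel == "." else os.path.join(dst, rel)
        try:
            os.makedirs(target_dir, exist_ok=True)
        except OSError as exc:
            failures.append((root, str(exc)))
            dirs[:] = []  # do not descend into a directory we could not create
            continue
        for name in files:
            source_file = os.path.join(root, name)
            try:
                # copy2 also tries to preserve timestamps and permissions
                shutil.copy2(source_file, os.path.join(target_dir, name))
            except OSError as exc:
                failures.append((source_file, str(exc)))
    return failures
```

Calling `copy_tree_tolerant("/path/to/backup", "/path/to/nas")` (both paths are placeholders) and printing the returned list gives a to-do list of the files that need a second look, which is roughly what I ended up doing by hand.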

So, now that I'm in stable situations for both machines, and enjoying their Linux goodness, each of the machines has its own funny thing happening, but not at the same time, thankfully. So, once upon the once upon, the TV monitor that we have driving one of the screens did not particularly like something about being connected to the soundbar and to the multiple-inputs, multiple-outputs box, such that every time in Windows that we tried to play something that had HDCP (copy protection, essentially, for video feeds), the screen would blank itself completely and would need to be reset, at which point the HDCP video would play just fine. This was sufficient annoyance that the suggested solution to the problem was to buy an inexpensive HDMI splitter that would absorb the HDCP signal and discard it, passing on the HDMI signal to the split objects. Theoretically, this was good for recording things that were being sent to the screen, but I was using it so that the screen didn't blank.

However, the new Linux installation basically rejected the splitter and did not want to work with it at all. Plus, the splitter had the less helpful situation of making it so I couldn't get to the BIOS/UEFI settings on the machine attached to the splitter without disconnecting the splitter and reconnecting the output from the matrix box directly to a screen. Since the splitter was causing issues, I pulled it out and wired everything in properly so that the cabling goes from matrix box to soundbar to TV. Lo, everything immediately started working properly, and as a bonus, I could now see the boot-up sequences again! So now I don't have to have additional cabling, I've removed an impediment, and now things are working better than they were before, all hail Linux.

The other issue on that end is the program that's supposedly for Linux to run the remote control and its button configurations just didn't work for me, and I couldn't find a way of making it work. Probably because it was built for something different than what I have, even though it's supposed to be built in such a way that it's distribution-agnostic, or close to it. So we grumble, but everything is fine, because we could make everything work properly in the Windows app version and made a backup of the configuration to load in when the Linux app actually manages to run the way it's supposed to, or someone develops a version of it that works for this particular flavor of Linux and we can use that. I tried to troubleshoot the problem with this, but the app itself doesn't run properly and doesn't give any useful error messages, and the forum post where the error message supposedly appears only gets an "Oh, shoot, we made a mistake, we'll fix it" reply, rather than anything more substantive or how to fix it if the error reappears. Which is to say that there's not a lot of information present in places where it should be, whether that's the documentation or the forums (and this is after the point in time where I had to teach this remote the commands to use to control that particular device by using a different IR receiver to capture and analyze the signals, because they didn't have the specific model in their database at the time). I do love this little remote. It's not a perfect replacement for the Harmony that lost the ability to use a button (and whose line has now been discontinued), but it does work for what I want it to, and that's enough.

So, those are the ones that were plaguing the beefier, more powerful, more ept machine. Not the worst things in the world. The less-capable machine (which has 8GB of RAM and a much less powerful video card in it) had been working fine, but with the new Linux distribution, it seemed to be having significant trouble launching any apps at all. This was across different Desktop Environments and styles of the system, so it wasn't the kind of thing where one bit would work perfectly and another would not work at all. The error message on display for this situation was related to services that were apparently trying to start. (And, now that I have perspective on what was going wrong, probably were also related to the troubles I was having trying to transfer Firefox profiles into their respective directories.) It seemed capricious as to which things would start and which ones wouldn't, but after a certain point, nothing would work if launched from the menus or the launchers. Many things would launch if they were started directly from the console, and new console windows could be spawned from one that already existed, but there were an awful lot of things not working and throwing error messages. Which meant it was time for the librarian skills, which did turn up, on the forums, people asking about similar kinds of messages that were happening for them, and mostly getting rudeness and dickery in response from people who were supposedly associated with the project at high levels. And this was past the "you get no help unless you provide the output of these specific commands" kinds of rudeness that can show up in general in tech forums and support requests.
Eventually, I managed to luck into a thread that had actually useful information for someone who was having a similar issue, where someone asked for a useful command, and then, most crucially, explained what was going on, interpreting the output of the command and showing what was the thing that made sense to them as the likely cause of the errors. The most likely culprit didn't have to do with any part of the systems that launched programs or otherwise brought apps into existence, but instead that the temporary file system, where things are written to RAM while they're in use, was filling up and not being emptied out to start new things. Which tracks a little bit better with why things would start failing after enough programs were open, and also why Firefox could accelerate that process significantly, since it has a lot of memory asks. The solution for the problem of the temporary filesystem filling up was to define an explicit value of how much RAM that temporary filesystem could take up if it needed to, since the default value of 10% of installed RAM was too little. I increased the amount of possible RAM to a bigger value in the file indicated, rebooted the computer, and all of the problems I had been having regarding launching programs went away, and the temporary filesystem is happy about having enough space to work with. (The problem didn't replicate with the other machine because 10% of 32GB is a fair chunk bigger than 10% of 8GB.) Which means that my librarian skills helped me find out that the error in question was not at all related to the things that would make the most sense for it to be erroring on, to get an explanation of what the real issue was, and to get the solution to the problem and apply it. I beat unhelpful error messages and unhelpful forum people to find something that had an explanation.
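
To make the arithmetic concrete, here is a small sketch that works out what a 10% cap means on each machine and reports how full a given mount point is. The 10% figure comes from the thread I found; everything else is an assumption for illustration. In particular, the mount point in the usage part is a placeholder, since which temporary filesystem is affected (and which file sets its size) depends on the distribution.

```python
import shutil

GIB = 1024 ** 3


def tmpfs_cap(ram_bytes, percent=10):
    """Size in bytes of a RAM-backed filesystem capped at `percent` of RAM."""
    return ram_bytes * percent // 100


def usage_fraction(mount_point):
    """Fraction (0.0 to 1.0) of the filesystem at `mount_point` that is in use."""
    usage = shutil.disk_usage(mount_point)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total


if __name__ == "__main__":
    # Compare the two machines from the post, treating "GB" as GiB.
    for ram_gib in (8, 32):
        cap = tmpfs_cap(ram_gib * GIB)
        print(f"{ram_gib} GiB of RAM -> cap of about {cap / GIB:.1f} GiB")

    # Hypothetical mount point; substitute whichever one is filling up.
    mount = "/tmp"
    print(f"{mount} is {usage_fraction(mount) * 100:.0f}% full")
```

The same cap that leaves the 32GB machine a few gigabytes to play with leaves the 8GB machine well under one, which is why only the smaller machine ran out of room once enough programs were open.
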
And if I weren't a trained information professional and someone who has been doing a Linux for a few years, I probably would have given up long before finding this solution and done something else that actually would have Just Worked. So, once again, useful error messages, please, and if someone is asking questions, yes, this might be the fiftieth forum thread with that question, but you really have to point people at the helpful thing that will assist them or at least give them the information on what to do to have you give them better information. And not be a dick about it, even if the user's assumptions or conclusions are wrong. After all, if you want more people to use a Linux, and especially to use your Linux, then you want to help people and invite them in to using your Linux. (Unfortunately, there are a fair number of Linuxes out there where it gets put out and the assumption is that people who are sufficiently advanced or interested will find their Linux and use it because of Inherent Superiority rather than for reasons that would make it attractive to someone who is new or new to their type of Linux. This one does have a nice set of wizards to help with software installation on specific themes, proprietary video drivers, and other such things that are normal and common things that people want to do, and I'm familiar with how the underlying distribution works, but both it and the thing it bases itself on are "good documentation, not necessarily interested in helping users to understand what has happened that has them coming to the forums to ask for help.")

So, once again, we have managed to make computers work and do what we want them to, mostly through the skills of search and persistence and finding a workaround when the direct method doesn't work. Not because of superior technical prowess or any of the skills where I would be able to directly understand what went wrong and know what commands to run immediately to confirm the issue or fix it. It's why having machines that you can play around with is vital to your learning, because then you have the freedom to try solutions and restore from backups if those solutions make a bigger mess than there was before. And you learn a little bit more about how the systems work every time that you succeed, and sometimes more every time that you fail. Have fun, everyone!