silveradept | December Days 2020 #06: When We Say Private, We Mean It, Damn It.

[O hai. It's December Days time, and this year, I'm taking requests, since it's been a while, I have new people on the list, and it's 2020, the year where everyone is both closer to and more distant from their friends and family. So if you have a thought you'd like me to talk about on one of these days, let me know and I'll work it into the schedule. That includes follow-up questions about anything in a previous December Days tag.]

alexseanchai asked a follow-up to yesterday's entry:

Define "works" and "behaves sensibly" in this context [an application, product, or resource used in a library environment.] What things would this software need to do, and on the wishlist including rainbow unicorns, what would you want it to do?

There are two major targets on the unicorn wishlist, differentiated by whether something is a resource the library uses as its own or something the library rents or licenses access to from another entity: things the library (nominally) owns and things it does not.

For the things the library owns, the big things to look at are the public computer systems and the integrated library system, or ILS. The public computers are almost assuredly going to be running some form of Windows with a Microsoft or Google browser on them, because Microsoft makes it easy to throw money at the problem of keeping the software up to date, and computer manufacturers like Dell and HP will offer significant volume discounts on machine purchases, with Windows licenses rolled in and compatibility with remote management and updating tools. Those machines will also be compatible with most of the filtering tools that libraries choose as the easiest way of showing Children's Internet Protection Act (CIPA) compliance. CIPA compliance gives access to further discounts and funding for broadband Internet access and equipment through a program funded by Universal Service Fee assessments, more commonly known as "e-rate". Particularly well-funded and well-liked libraries can choose not to be bound by CIPA, but many schools and libraries can only make ends meet by accepting filtering and demonstrating that they use it (and, in the case of schools, by monitoring what students do online). Libraries are nominally about privacy in a person's dealings, but children are often conceived of as entities that need to be protected from things, and many of the filter maintainers (and library boards) want to block a far more expansive subset of the Internet and the Web than what is spelled out in the relevant statutes. Some of those blocks make sense (we don't want users going to sites that will phish them or install malware) and some don't (filter categories almost always include "LGBT", for example, and ultimately, other than the specific things the law mentions, what's considered inappropriate for minors is left to the local authority administering the access to determine).
A library could build a CIPA-compliant system using open source operating systems and software, with remote management and software management available to administrators, but it would likely also need someone on staff with the relevant expertise to make sure things are set up and maintained properly, and very few library systems have enough money to run a proper IT department, much less attract good talent and pay them well. So instead we outsource it and hope that our vendors know what they're doing and aren't making unacceptable decisions without telling us. It is easier, if not necessarily cheaper, to outsource things to experts offering them as a service on a platform that most companies develop for. But in my unicorns-and-rainbows world, a public library's public computers would not run Windows at all, except where it was absolutely necessary (if a document had to be created and saved in one specific program's native format to work), and libraries would be able to do things like host fediverse instances, run TOR exit nodes, and otherwise be a crucial part of a community infrastructure free from corporations looking to sell product, mine their users' data, and undermine the privacy and security of people and their actions on the Web and the Internet more broadly. (There's some FUD about the use of TOR: certain entities want you to believe that only criminals and those with things to hide use onion routing and anonymizing services, when it should be a regular thing for any entity that believes in user privacy. They also suggest that an exit node's operator can be prosecuted if someone on the network uses that exit node for conduct that is illegal where it's happening, but the TOR project disputes this and offers suggested language for responding to any law enforcement requests for data that came through an exit node.)

The Integrated Library System is the other part of this internal element. Open source ILSes exist (Evergreen and Koha are the two best known), but they come with the same problem: they need people on staff who can manage, troubleshoot, and code for them, or a knowledgeable hosting provider that is private, secure, and still lets the ILS be used effectively if the connection to the host is down for whatever reason. ILSes work on the idea that one program should do everything a library needs to do with its materials, rather than having a few smaller programs that all work together to keep the library up and running. My first request would be for an ILS that is not a single monolith, but a set of (ideally open source) programs with different functions that can all interact with the database(s), with a focus on importing and exporting data easily and according to the standards that are already in place, so that people can use the programs that work best for them and everything still works. That is basically the antithesis of what a vendor wants, which is to get you using their product and never leaving, because importing and exporting (and correcting the bad data from both) is too painful to attempt. Adversarial interoperability should be mandatory, and if we could ever get to the point where vendors had to compete at being the best thing, rather than the only thing, we would all prosper a lot more. Looking at you, regulatory entities that are supposed to make this happen.

Also, I want an ILS that works on the principle of "less is more" about users and "more is more" about things. The books, discs, and other materials should have a significant amount of descriptive metadata and the possibility of tagging or other folksonomies (although those would have to be monitored and wrangled to be both useful and actively anti-troll, because libraries have all sorts of people who use them). Otherwise, the materials and their content should be described within an inch of their lives, and quite possibly someone should be able to request a snapshot of the library's holdings records for their own use or personal projects, on the condition that the holdings database only ever contains data about the things themselves and nothing at all about the people using them. (There's some fiddling that would have to be done for holdings that contained scanned pages and the like, but in my unicorns fantasy, the full-text and other authorization-required resources would live in a separate database from the catalog, with only a link to the right spot present in the catalog.)
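
To make the "things, not people" split a little more concrete, here's a minimal Python sketch of a catalog record and a public snapshot export. The `HoldingRecord` class, its field names, and the example.org link are all hypothetical, not any real ILS's schema; the point is only that the record has no field that could hold patron data, and the full text lives elsewhere behind a link.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


# Hypothetical catalog record: it describes the item and nothing else.
# There is deliberately no field that could hold a borrower or card number.
@dataclass
class HoldingRecord:
    item_id: str
    title: str
    creators: List[str]
    subjects: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)  # moderated folksonomy tags
    # Pointer into a separate, access-controlled full-text store (hypothetical URL).
    full_text_link: Optional[str] = None


def snapshot(holdings: List[HoldingRecord]) -> str:
    """Serialize the catalog to JSON so it can be handed to anyone who asks."""
    return json.dumps([asdict(h) for h in holdings], indent=2, sort_keys=True)


catalog = [
    HoldingRecord(
        "b1",
        "Example Title",
        ["A. Author"],
        subjects=["Libraries"],
        tags=["cozy"],
        full_text_link="https://fulltext.example.org/b1",
    ),
]
print(snapshot(catalog))
```

Because the snapshot is built only from item fields, handing it out can't leak who read what.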

The people, on the other hand, should have as little data collected about them as is necessary for library operations, and this is behavior I want from vendors of library databases as well. About the only things needed for a library card are a name, an address in the service area, a means of contact, and probably an age marker, because CIPA says that the under-17s don't get a choice about the restrictions on their Internet access. None of these things require a government-issued identification, nor a "legal name" field to differentiate between what's on an ID and a person's name. We don't even really need something as official as postal mail delivered to an address. It's really just a check to see whether someone is in the local area and therefore paying some amount of the taxes that support the library system. Odds are, even people without a fixed home or address are paying local taxes on things they obtain and consume, so they're supporting the library system too. We don't need a gender marker in the data, although a pronoun field will be handy for those who wish to tell us. We'll need the card number. And that's pretty much it.
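
Here's that minimal patron record as a hypothetical Python sketch. `Patron`, `register`, and the field names are made up for illustration; the sign-up step simply refuses to store anything outside the short list above, so a gender marker or "legal name" field can't sneak in.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical whitelist: the only things a card record is allowed to hold.
ALLOWED_FIELDS = {"card_number", "name", "address", "contact", "is_minor", "pronouns"}


@dataclass(frozen=True)
class Patron:
    card_number: str
    name: str                       # whatever name the person goes by; no "legal name"
    address: str                    # used only to confirm the service area
    contact: str                    # email, phone, or similar
    is_minor: bool                  # under 17, since CIPA filtering rules apply
    pronouns: Optional[str] = None  # only if the patron chooses to share


def register(form: dict) -> Patron:
    """Build a Patron from a sign-up form, refusing any field outside the minimal set."""
    extra = set(form) - ALLOWED_FIELDS
    if extra:
        raise ValueError(f"refusing to store unneeded fields: {sorted(extra)}")
    return Patron(**form)


sam = register({
    "card_number": "21000123456789",
    "name": "Sam",
    "address": "12 Main St",
    "contact": "sam@example.org",
    "is_minor": False,
})
```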

In my perfect world, only card numbers take materials out, so overdue notices would say "Hey, the card number ending in these digits took out these materials and they haven't come back yet. Could you bring them back?" without mentioning names and addresses the way my library system's notices do. The card number would still need its contact information associated with it, but essentially this system runs on the principle that anything that isn't strictly necessary to know isn't associated. Which would mean that, yes, we wouldn't keep reading lists of things previously checked out (my employer does make that explicitly opt-in, with a notice saying that such records could be requested by law enforcement, which is nice), because there are other tools for keeping track of things, and people should be able to use whatever they want for that. And the authentication server for vendor resource access should be configured to essentially say "Yep" or "Nope" to the question "Is this a valid library card with y'all?" That should be the sum total of the questions asked between vendor and library system. (Although, in practice, authentication usually happens on the library's end, and the library's proxy server relays requests through to the vendor and back, so the vendor only asks "Is this an authorized IP to access this? Yep, cool. On we go.")
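
A rough sketch of the two card-number-only interactions, again in Python with hypothetical function names: an overdue notice that shows only the last four digits of the card (four is my assumption), and a vendor-facing check that can only ever answer "Yep" or "Nope". A real setup would sit behind whatever authentication protocol the ILS actually speaks; this only shows the shape of the answer.

```python
from typing import Iterable, Set


def overdue_notice(card_number: str, titles: Iterable[str]) -> str:
    """Compose an overdue notice that names the card, never the person."""
    last_digits = card_number[-4:]  # hypothetical choice: reveal only the last four digits
    items = ", ".join(titles)
    return (f"Hey, the card number ending in {last_digits} took out {items} "
            "and they haven't come back yet. Could you bring them back?")


def card_check(card_number: str, valid_cards: Set[str]) -> str:
    """Answer a vendor's one permitted question with a bare "Yep" or "Nope"."""
    return "Yep" if card_number in valid_cards else "Nope"


print(overdue_notice("21000123456789", ["Book A", "Book B"]))
print(card_check("21000123456789", {"21000123456789"}))
```

Nothing about the patron beyond the card's existence crosses the boundary in either function.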

When it comes to vendor behavior, though, there's one rule I desperately wish every library would enforce, even though they won't be able to, because that vendor is probably the only one willing to license the content to libraries, or has some sort of exclusive that no other vendor has. The rule: if the library is paying for the subscription service, the vendor gets to collect zero personal data about the use of the service, and no data that could be disaggregated to the point where someone could say "oh, hai, this card number looked at these resources / checked out these materials." That goes both for things with a login by proxy and for things that are more individualized, like Overdrive. Vendors can collect things like "this book got checked out a metric forkton by your users, you might want to get some more checkouts on that" or "your users really seem to be hitting the romance hard right now, perhaps you would like to add these titles to your collection?" That's stuff that can help shape collections and subscriptions and otherwise make the library's dollar more effective at serving the community, but nothing that would have people turning over their data to the company on top of the money the library is paying for access. Now, there are tool integrations that are going to happen for things like citations and possibly "bookmark this for later," but it should be possible for someone to access the full text and download it to their device, or copy and paste a citation into their citation software, without having to sign up for a personal account on top of what their institution's access is paying for, so that, again, the only data available is aggregate data that can't be used to identify any one particular user among everyone else.
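
And here's what "aggregate only" could look like on the reporting side. This hypothetical Python sketch throws away card numbers as soon as checkout events come in, counts by title and by genre, and withholds any count under an assumed threshold of five. That's a simple small-cell suppression approach, not a formal privacy guarantee.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical minimum count; smaller groups are withheld from the report.
MIN_COUNT = 5


def aggregate_report(checkouts: Iterable[Tuple[str, str, str]]) -> Dict[str, Dict[str, int]]:
    """Summarize (card_number, title, genre) events into per-title and per-genre counts.

    Card numbers are discarded on the way in and never reach the output.
    Counts below MIN_COUNT are suppressed so a rarely borrowed title
    can't point back to one reader.
    """
    by_title: Counter = Counter()
    by_genre: Counter = Counter()
    for _card, title, genre in checkouts:
        by_title[title] += 1
        by_genre[genre] += 1

    def suppress(counts: Counter) -> Dict[str, int]:
        return {key: n for key, n in counts.items() if n >= MIN_COUNT}

    return {"titles": suppress(by_title), "genres": suppress(by_genre)}


events = [("c1", "Romance A", "romance")] * 6 + [("c2", "Rare Book", "poetry")]
print(aggregate_report(events))
```

The report still supports "your users are hitting the romance hard," while the one-off poetry checkout never shows up at all.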
Because the library's already paying for it, that should mean the people are no longer the product and should not be treated as, or encouraged to become, the product as well. And that's why a stunt like Kanopy's should have resulted in hellfire and brimstone raining down upon them, mass cancellations, and a polite but firm inquiry from the appropriate enforcement body about misuse of private data, even though they'll all point at their Terms of Service and say "we did say this was something we were going to do." ToSes in lawyer-speak and the whole "you only licensed this, you don't own it" have been the cause of so, so many terrible actions by companies that the entire thing needs to be reformed, if not destroyed and rebuilt from the ground up with a completely different paradigm of how privacy and user data should be treated. Because, oh yeah, the less data you carry, the less there is for someone to request and the less appealing a target you look to people who want to compromise your systems and steal all of that data. And, as the nominally privacy-centric entity that the library claims to be, less data is better.

So, yeah, in a perfect, unicorns-and-rainbows world, content would be owned as a default, rather than leased or licensed, paying for content would mean that the vendor would be restricted only to the collection of aggregate, non-identifiable data, the ILS would not be a single monolithic object, but a set of programs that could be swapped and tailored and that all spoke the common standards so that they could all work together in harmony, and the public computers and equipment would run as much open source software as possible, with closed-source computers available on specific request. Plus, the library would have a robust and knowledgeable IT staff to run and update all of these things, as well as being a point for community infrastructure and private World Wide Web activity that kept as little data as they could manage and regularly deleted logs and other elements that contained personally-identifiable information that might be requested by law enforcement or seen as valuable to attackers. And we wouldn't have to filter kids' access as a condition of getting money, either. We'd still do diligence and try to prevent access to malicious sites and harmful places, as security measures, and there's a good chance that policies would be written that say, as a general rule, we wouldn't permit access to specific categories of things, so that we could, by policy, ban things like child porn or QAnon conspiracies or other such things that make sense to not allow in the library space because they're harmful to the community. Which would, yes, rely on the public trusting that the library is going to make good decisions about what's harmful, but they already do that, and we could use a kick in the backside to think better about what communities are being harmed by our current policies and their enforcement.

But the big thing, when I was saying that, was the vendors being locked out of collecting personal data and the library not keeping any more data than strictly necessary, to the point of even being able to remove the possibility of the field existing in a record if it's not being used. And to otherwise overthrow the shackles of our monolithic oppressors and do things in a more community way. Hopefully that was as clear as mud for all of you.


So building such a thing would probably be a pretty big open-source project, and there are a number of problems it couldn't solve anyway, since the project wouldn't hold the licenses that are exclusive to other people.

Yeah. There are things where it makes sense to lease access and otherwise not build it yourself, but those situations should still have to conform to the supposedly high ethical standards of the library (which means the library should not be doing things like using Google Analytics on its own site, because that gives data to Google). Social media is another one of those: "we can interact with the walled gardens and such as outreach to our users, but really that should be minimized and done in such a way that gives the social media companies basically zero usable data from us."

Edited 2020-12-07 05:52 (UTC)
