silveradept | Dec. 19th, 2017

[This year's December Days are categorized! Specifically: "Things I should have learned in library school, had (I/they) been paying attention." But I can make that out of just about anything you'd like to know about library school or the library profession, so if you have suggestions, I'll happily take them.]

redisxwing asked about "The worst thing they taught me / the thing I never use", and I have to tailor this one extremely narrowly, because even when I don't use a specific thing on a regular basis, the general principles behind it get used far more often. This is one of those cases - even though I don't use the specific thing all that often, the stuff that I learned so that I could do the specific thing is something that I use all the time in person-computer interactions.

So, one of the required courses to graduate out of library school, at least for me, is Search. The course numbering suggests that it's a good class to take in your second year, but regardless, everyone who wants out of the school will have to pass Search. Search builds on the foundation courses about the nature of information, how it can be organized in electronic and non-electronic systems, and how people exhibit information-seeking behavior. Your favorite search engine has people with degrees like mine who are looking at the way people use the engine, what they search for, what they click on as possible answers, whether they then come back to the results page and keep clicking, what metadata the results pages have to describe themselves, and so forth. All of this information gets fed into algorithms that take into account authority, popularity, inbound links, outbound links, and whatever secret sauce the company believes will make their product the best at what it does, on top of any tracking data the company has collected on you through web bugs, cookies, and other "personalizing" things.

(As an aside, I've switched to using DuckDuckGo as my engine of choice for their stated promise to not track you through personalizing materials, but also because they offer a syntax that allows you to specify which other search unit you would like to use for your query -- things like Wolfram and Wikipedia are in there, along with more standard offerings like Amazon and Google Images.)

Much of the information that's collected on user search behavior goes back to the people programming the algorithms, so they can provide results that are more in line with user expectations. Some of that information goes back to the people in charge of the natural language interpreters so they can fine-tune their guesses and processing, so that when someone inputs a question instead of a sequence of search terms, the engine can put back something that indicates it has parsed things correctly. A thing I like about Wolfram is that it explicitly shows you the assumptions that it is making on the results page so you can track whether or not it has interpreted you correctly. This is fantastic on maths or science-related queries where language can cause wrong assumptions. There are likely people there who look at every time an assumption gets corrected and use that information to adjust their models and their processing so as not to need correction (as much) in the future. These things are all done in service of making the search engine easier to use and more likely to produce relevant results, which means you come back to this particular search engine and allow yourself to be served advertising, either in the sidebar, as a "Sponsored Link", or otherwise.

And for most people, this is enough to get them through their queries. Oftentimes, that's enough even for the information professionals to answer the questions put forth to them. Where Search class comes in, however, is when that's not enough, or when you need more than just precision, you need [Spaceballs]Ludicrous Precision![/Spaceballs] Because most, if not all, of your search engines have a syntax to them that allows you to exploit the fact that they're machines parsing query strings to the fullest. Programmers that use and are familiar with regular expressions will not be surprised by this, although most search engines won't actually accept a regular expression as a query.

There are many, many pages that detail some of the special operators one can use with Google, in addition to things like logic operators (AND, OR, NOT) and wildcard operators (like * and ?), all of which can help you really get a good search construction going. In the earlier days of search engines, the logic operators were a particularly normal part of queries, but they've dropped off some with the ability to search more naturally, and with a tweak gleaned from the aggregate data - most search engines silently add an OR between your terms if the natural language processing doesn't do something else to them. If that's not what you want them to do, it can be immensely frustrating to keep typing things in, only to keep getting bad results back. That silent OR can be overridden with some explicit ANDs or NOTs, but you have to know to do it before the search engine will sit up and take notice.
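For the programmers in the audience, here is a minimal sketch of the difference between a silent OR and an explicit AND or NOT. It is a toy, not how any real engine works: the three-document corpus and the function names (`matching`, `search_or`, `search_and`) are made up for illustration, and real engines also rank results rather than just filtering them.

```python
# Toy corpus: document id -> text. Entirely invented for illustration.
DOCS = {
    1: "library school search class",
    2: "search engine advertising",
    3: "library catalog records",
}


def matching(term):
    """Return the ids of documents that contain the term as a whole word."""
    return {doc_id for doc_id, text in DOCS.items() if term in text.split()}


def search_or(*terms):
    """Silent-OR behaviour: a document matches if it contains ANY term."""
    result = set()
    for term in terms:
        result |= matching(term)
    return result


def search_and(*terms, exclude=()):
    """Explicit AND across terms, then NOT for each excluded term."""
    result = set(DOCS)
    for term in terms:
        result &= matching(term)
    for term in exclude:
        result -= matching(term)
    return result


print(sorted(search_or("library", "search")))             # [1, 2, 3]
print(sorted(search_and("library", "search")))            # [1]
print(sorted(search_and("search", exclude=["library"])))  # [2]
```

The same two words return every document under OR and only one under AND, which is the kind of gap that makes a silently inserted operator so frustrating when it isn't the one you meant.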

Same thing with wildcard operations - there's some silent pluralizing and de-pluralizing that goes on in search engine queries, so that if you search for "eye", you'll get results with "eyes". If what you wanted, though, was results with a phrase that begins with "eye", but you're not sure how many characters follow after that, stick an * afterward so that it will return "eye", "eyes", "eyeshadow", "eyes on me", and so forth. And if you're looking for both "woman" and "women" at the same time, well, most engines will do that for you, but if you have to go diving in various places for research, you might have to add on the idea of "womyn", which may have TERF-y baggage attached or not, depending on the results. Most engines won't silently search all three. But since they're all only a letter apart, and the letter that's different is in the same spot, you can search "wom?n" and snag all three at once. (And also "womin" or "wom3n" if such a thing is used somewhere, because the ? operator says "there's a character here, I just don't know what it is.") The ? is great if you're searching for something and you're not entirely sure how it's spelled, or if it has regional or historical spellings - was that conflagration a "fire" or a "fyre"? If you're looking for documents from a time when spelling was very much funetik, you're going to need all the wildcards you can get.
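A quick way to try the * and ? behaviour described above is Python's standard `fnmatch` module, which uses the same two symbols with the same meanings for shell-style matching. This is only an illustration of the idea: the word lists and the `wildcard_filter` helper are made up, and any given search engine may define its wildcards differently.

```python
from fnmatch import fnmatchcase


def wildcard_filter(pattern, candidates):
    """Keep the candidates that match a shell-style wildcard pattern.

    '*' stands for any run of characters (including none);
    '?' stands for exactly one character.
    """
    return [c for c in candidates if fnmatchcase(c, pattern)]


words = ["woman", "women", "womyn", "wom3n", "womn", "woomen"]
print(wildcard_filter("wom?n", words))
# ['woman', 'women', 'womyn', 'wom3n']

phrases = ["eye", "eyes", "eyeshadow", "eyes on me", "keyed"]
print(wildcard_filter("eye*", phrases))
# ['eye', 'eyes', 'eyeshadow', 'eyes on me']

print(wildcard_filter("f?re", ["fire", "fyre", "fare", "flare"]))
# ['fire', 'fyre', 'fare']
```

The last line shows the cost of a wildcard: "f?re" catches "fire" and "fyre", but it also drags in "fare", so the pattern usually needs other terms around it to keep the results on topic.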

Most engines implement a version of special operators, logic operators, and wildcards, and the good ones will document the things they will accept and won't accept. Librarians in the time when online access to journal articles was just getting started will have a special...feeling...for one of the early subscription databases and its interface. DIALOG is very good at retrieving things. However, it also charges by the unit, rather than offering an unlimited-access subscription. Units accumulated while you were connected to the service, for running searches against it, and for any retrieval and printing of information from the service. Kind of like how AOL used to charge by the minute/hour for World Wide Web access. Therefore, the budget-conscious librarian would spend a significant amount of time building their search string before even getting near the service, and then connect, run the searches in a flurry, retrieve their best results, download/print them, and then disconnect. Efficiency of search meant saving money to use for further searches in the future.

In case you were wondering, DIALOG offers operations on boolean logic, wildcards and truncation, proximity of words to each other, and the ability to search by just about any field that's been indexed, whether by word or by phrase, in addition to the ability to combine result sets and apply new terms to the combined sets. Search strings for DIALOG can, and have, resembled some of the more complex regular expression strings that you might see nowadays.
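To give a feel for two of those features, proximity searching and combining numbered result sets, here is a small Python sketch. It does not reproduce DIALOG's actual command language: the `Session` class, the `within` helper, and the three sample titles are all hypothetical, chosen only to show the general idea of building a search up one set at a time.

```python
import re

# Invented sample records: id -> title.
DOCS = {
    1: "Information retrieval in public libraries",
    2: "Retrieval of information from online databases",
    3: "Public library budgets and online information retrieval costs",
}


def tokens(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def within(first, second, max_gap, text):
    """True if `second` follows `first` with at most `max_gap` words between."""
    words = tokens(text)
    starts = [i for i, w in enumerate(words) if w == first]
    ends = [i for i, w in enumerate(words) if w == second]
    return any(0 < j - i <= max_gap + 1 for i in starts for j in ends)


class Session:
    """Keeps numbered result sets so later steps can reuse earlier ones."""

    def __init__(self, docs):
        self.docs = docs
        self.sets = []

    def select(self, predicate):
        """Run a search, store the result, and return its 1-based set number."""
        hits = {doc_id for doc_id, text in self.docs.items() if predicate(text)}
        self.sets.append(hits)
        return len(self.sets)

    def combine(self, a, b, op):
        """Combine two stored sets with 'and', 'or', or 'not' (a minus b)."""
        left, right = self.sets[a - 1], self.sets[b - 1]
        hits = {"and": left & right, "or": left | right, "not": left - right}[op]
        self.sets.append(hits)
        return len(self.sets)


session = Session(DOCS)
s1 = session.select(lambda t: within("information", "retrieval", 0, t))
s2 = session.select(lambda t: "online" in tokens(t))
s3 = session.combine(s1, s2, "and")
s4 = session.combine(s1, s2, "not")

print(sorted(session.sets[s1 - 1]))  # [1, 3]
print(sorted(session.sets[s2 - 1]))  # [2, 3]
print(sorted(session.sets[s3 - 1]))  # [3]
print(sorted(session.sets[s4 - 1]))  # [1]
```

Record 2 contains both "information" and "retrieval" but in the wrong order, so the proximity search skips it. That is the sort of precision a plain keyword match can't give you, and it is why a carefully built string was worth the time before the meter started running.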

My instructor for Search wanted to be sure that all the students had a really good grasp of how searching worked, how engines were organized, and all of the things that modern search engines pull on people silently, often in the name of ease of use and accessibility to their content. Which means I learned how to search on DIALOG. Thankfully, there was a training and education subset of the service available for us to use that would still count our usage, but it wouldn't actually charge us for the use. Terminal access was the best way to get in and be sure you got what you wanted from there, so you can also imagine having to hope that your search query string could be held in the buffer of your favorite shell client, too. It seems like a throwback to an earlier era, but my instructor made the point perfectly - by forcing us to account for every character of the string, we made sure we got exactly what we wanted out of it. And we didn't have to fight the "helpful" features of the engine to get there, because there were no helpful features that we didn't have to explicitly invoke for ourselves.

I would never recommend throwing DIALOG at a first-time searcher without explaining the terminology, letting them research how the things they're going to be searching are indexed, and letting them construct a few practice strings to sand out the rough spots. It's an incredibly valuable training tool, though, for deeper understanding of how machines interpret search, what they're doing in the shadows, and how to test them out and see if they can understand the kind of precision language and syntax that you might need when you have to filter a wrong idea out completely from your query. I don't specifically use the knowledge of how to make DIALOG do my bidding in my day-to-day job, or, for that matter, outside of the specific class where I learned how, but the things that it taught me? I use those all the time in winnowing results and getting other, less persnickety search engines, including our catalog, to cough up what I want with a minimum of fuss or bad results.

I can make your search engine dance to my tune, given enough documentation or enough tries that I can figure out the way it works. Most of the time, I don't have to. Most.