I have often said that librarians were needed in previous decades to help researchers find information, but are needed today to help researchers skillfully navigate the glut of information available. We do this through a variety of means. Librarians are the janitorial engineers of the information world. We make sense of it all. We organize the information into nice neat little piles called subject headings, wayfinders, and databases. We sort laundry from the information hamper: deciding which information should go where and with what other information, then folding it nicely and placing it on a shelf (or in a database…) for you to find easily.
Sorry for that analogy. Something within me would not let me pass it up.
Chris Anderson of Wired magazine has an interesting article about Google’s accomplishments and whether the new age of search will render our neat piles of information less relevant. He writes:
The Petabyte Age is different because more is different. Kilobytes were stored on floppy disks. Megabytes were stored on hard disks. Terabytes were stored in disk arrays. Petabytes are stored in the cloud. As we moved along that progression, we went from the folder analogy to the file cabinet analogy to the library analogy to — well, at petabytes we ran out of organizational analogies.
At the petabyte scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. It calls for an entirely different approach, one that requires us to lose the tether of data as something that can be visualized in its totality. It forces us to view data mathematically first and establish a context for it later.
This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.
There is now a better way. Petabytes allow us to say: “Correlation is enough.” We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.
The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.
There’s no reason to cling to our old ways. It’s time to ask: What can science learn from Google?
The question remains, though: what happens after Google? Libraries (though not all of them) will indeed weather the storm, but what they will look like on the other side is yet to be determined.