I’m taking a course on Web archiving for the second half of this winter term at U of M, and from the very beginning our major project has got my brain going on theoretical issues and implications of technology and our offline assumptions as they impact our approach to the Web.
Here’s the thing about the Web. (And let’s distinguish it from “Internet.” I am only talking about the Web.) Perhaps the most wonderful, inspiring, and revolutionary aspect of hypertext and hyperlinks is their difference from print, and from scanned book images or e-books treated as paper books. I am talking about text that means something to the computer (in a sense, in that it’s manipulable), rather than the image of words on a page, which is also how I’d describe print media.
How are hypermedia different? Two words: linked, and linear.
Technology dictates to an extent how we can approach, interact with, interpret, and give meaning to a text or other communication. Books are quite hierarchical: they have tables of contents, they are organized into large units that have smaller units inside (a series, a book, a chapter, a subsection, a paragraph). With a paper book or e-book, we can flip through it as our whims or interest dictate, so I can’t argue for extreme linearity. But with many books, the assumption – and thus the way the communication is crafted – is that the reader will go in order of hierarchy, reading deeper into chapters before looking at the next. Aside from Choose Your Own Adventure, I can’t think of many (or any) examples of books that encourage or even force the reader into a linear path rather than a hierarchy.
Going with the example of the Choose Your Own Adventure books, let’s think about the idea of a lack of emphasized hierarchy. I’m not saying that Web sites don’t have hierarchy; in fact, the majority do. The vast majority of things on the Web are crafted, organized, and communicated as though we are using computers as screens for traditional print media. Sure, hyperlinks are becoming much more common, for example within some news sites (although sadly they mostly manifest as randomly linked “keywords” that go to ads). But the hierarchical impulse seems to be even more easily implemented on the Web – with all of its potential for flatness and interlinked-ness – than in print!
What I want to encourage is an approach to thinking about the Web that doesn’t take hierarchical, mostly non-interlinked sites as the representation of what the nature of the Web is, and what can be done with hypertext and hyperlinks. Rather, let’s think about the paths of users of all kinds: starting here, going there, then going back to here, then going somewhere else, all by following links that interconnect these sites of communication and interaction. Let’s set those pages all on a plane together, make the links manifest, and think about this non-hierarchical plane of communication and meaning and the workings of users’ brains: making connections. Going back and forth without hierarchy. Linearity in an extreme sense: not linear in terms of going from beginning to end with no flipping, but linear in the sense of lacking hierarchy. Going back and forth along lines, going by whim and by instinct.
And here in these paths we find serendipitous meaning.
Now what is my issue with Web archiving?
Basically, it’s that it treats sites as isolated, hierarchical entities. The software that we use for our class, Archive-It, takes domains, subdomains, directories, pages, and RSS feeds as seeds. (And others, but I will work with these here.) It crawls to the boundaries of a site: unless specified otherwise, if we start with a seed of example.com, it won’t crawl pages on example2.com that are linked from a page on example.com. We have to supply new seeds for extra domains. The very technology itself forces a representation of the Web through the lens of site paths and organization, directory-style, hierarchical in the extreme. It disallows representation of the revolutionary nature of the interlinked Web and the organic paths of users through this landscape.
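To make that boundary concrete, here is a minimal sketch of the kind of scoping rule I am describing. It is not Archive-It's actual code: the function name `in_scope` and the exact matching rule (same host, path under the seed's path) are my own simplifying assumptions, and real crawlers have more nuanced rules for subdomains, redirects, and embedded resources.

```python
from urllib.parse import urlparse


def _as_dir(path):
    """Normalize a URL path so prefix checks respect directory boundaries."""
    return path if path.endswith("/") else path + "/"


def in_scope(url, seeds):
    """Return True if url sits on a seed's host and under that seed's path.

    Hypothetical, simplified rule for illustration only.
    """
    target = urlparse(url)
    for seed in seeds:
        s = urlparse(seed)
        # A link to a different host is outside the site boundary.
        if target.hostname != s.hostname:
            continue
        # Within the host, only paths under the seed's directory count.
        if _as_dir(target.path).startswith(_as_dir(s.path)):
            return True
    return False


seeds = ["http://example.com"]
print(in_scope("http://example.com/about", seeds))   # same site: followed
print(in_scope("http://example2.com/page", seeds))   # linked elsewhere: dropped
```

Under a rule like this, the link from example.com to example2.com is simply thrown away, which is exactly the moment where a reader's path would have left the site.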
I wonder if what limits our imagination here is technology: have we not developed technology that interacts with the Web in a non-hierarchical way, or is it more difficult to implement? Or is it a way of thinking that precludes capturing dynamic, unpredictable paths across, rather than up and down, a landscape of communication and reference?
What I would find really interesting, and much more experimentally informative, would be to crawl along the paths of potential users, perhaps with semi-random paths weighted by probability, to keep the crawl more or less on a concept or topic or question. Where does it go? We won’t know until we do it. The results may surprise us. The results create new meaning that we were not aware of. The results are an illustration of the serendipity of exploration and discovery when our expectations of outcome are kept at a minimum. And that has value: it does something really new with a technology whose potential we are barely beginning to tap with our creativity and imaginations.
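As a thought experiment, a crawl like this could be sketched as a weighted random walk over links. Everything here is an assumption on my part: the tiny in-memory `GRAPH` and `TEXTS` stand in for real fetched pages, and `relevance` is a deliberately crude, hypothetical score (counting topic words) that a real experiment would replace with something better.

```python
import random

# Toy link graph standing in for fetched pages (hypothetical data).
GRAPH = {
    "example.com/a": ["example.com/b", "example2.com/x"],
    "example.com/b": ["example.com/a", "example3.com/y"],
    "example2.com/x": ["example3.com/y", "example.com/a"],
    "example3.com/y": [],
}
TEXTS = {
    "example.com/a": "notes on web archive practice",
    "example.com/b": "site map and contact",
    "example2.com/x": "archive crawl experiments",
    "example3.com/y": "an unrelated recipe",
}


def relevance(text, topic_terms):
    """Crude topical weight: 1 plus the number of topic terms in the text."""
    words = set(text.lower().split())
    return 1 + sum(1 for term in topic_terms if term in words)


def weighted_walk(graph, texts, start, topic_terms, steps, rng=None):
    """Follow links at random, favoring pages that look on-topic."""
    rng = rng or random.Random()
    path = [start]
    current = start
    for _ in range(steps):
        links = graph.get(current, [])
        if not links:
            break  # dead end: the path stops here
        weights = [relevance(texts.get(link, ""), topic_terms) for link in links]
        current = rng.choices(links, weights=weights, k=1)[0]
        path.append(current)
    return path


print(weighted_walk(GRAPH, TEXTS, "example.com/a", ["archive", "crawl"], steps=10))
```

Note that the walk pays no attention to which site a page belongs to. It crosses domains, doubles back, and stops only at a dead end or after the given number of steps, so each run records one possible path rather than one site.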