Category Archives: digital humanities

#dayofDH Meiroku zasshi 明六雑誌 project

It’s come to my attention that Fukuzawa Yukichi’s (and others’) early Meiji (1868-1912) journal, Meiroku zasshi 明六雑誌, is available online not just as PDF (which I knew about) but also as a fully tagged XML corpus from NINJAL (and oh my god, it has lemmas). All right!


I recently met up with Mark Ravina at the Association for Asian Studies conference, who brought this to my attention, and we are doing a lot of brainstorming about what we can do with this as a proof-of-concept project before moving on to other early Meiji documents. We have big ideas, like training OCR to recognize the difference between the katakana ニ and the kanji 二, for example; Meiji documents generally break OCR for reasons like this, because they’re so different from contemporary Japanese. It’s like asking Acrobat to handle a medieval manuscript, in some ways.

But to start, we want to run the contents of Meiroku zasshi through tools like MALLET and Voyant, just to see how they handle non-Western languages (I don’t expect any problems, but we’ll see) and what we get out of it. I’d also be interested in going back to the Stanford CoreNLP API and seeing what kind of linguistic analysis we can do there. (First, I have to think of a methodology. :O)

In order to do this, we need whitespace-delimited text: words separated by spaces. I’ve written about this elsewhere, but to sum up, Japanese is not separated by spaces, so tools intended for Western languages think it’s all one big word. There are no easy ways I can find to do this splitting; I’m currently working on an application that both strips ruby from Aozora bunko texts AND splits words with a space, but it’s coming along slowly. How can I get this for Meiroku zasshi in a quick and dirty way that lets us just play with the data?

So today after work, I’m going to use Python’s ElementTree library for XML to take the contents of the word tags from the corpus and just spit them into a text file delimited by spaces. Quick and dirty! I’ve been meaning to do this for weeks, but since it’s a “day of DH,” I thought I’d use the opportunity to motivate myself. Then, we can play.
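A minimal sketch of that quick-and-dirty extraction might look like the following. Note that the tag name “w” is my placeholder, not NINJAL’s actual markup; the real corpus will use its own element names, so the word_tag argument would need adjusting.

```python
# Quick-and-dirty sketch: pull word tokens out of a tagged XML file
# and write them to a plain text file, separated by spaces.
# NOTE: the tag name "w" is a placeholder, not NINJAL's actual markup.
import xml.etree.ElementTree as ET

def words_to_spaced_text(xml_path, out_path, word_tag="w"):
    tree = ET.parse(xml_path)
    # iter() walks the whole tree, so it doesn't matter how deeply
    # the word elements are nested
    words = [el.text for el in tree.getroot().iter(word_tag) if el.text]
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(" ".join(words))
```

The resulting space-delimited file can then go straight into MALLET or Voyant.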

Exciting stuff, this corpus. Unfortunately most of NINJAL’s other amazing corpora are available only on CD-ROMs that work on old versions of Windows. Sigh. But I’ll work with what I’ve got.

So that’s your update from the world of Japanese text analysis.

#dayofDH Japanese apps workshop for new Penn students

Today, we’re having a day in the library for prospective and new Penn students who will (hopefully) join our community in the fall. As part of the library presentations, I’ve been asked to talk about Japanese mobile apps, especially for language learning.

While I don’t consider this a necessarily DH thing, some people do, and it’s a way that I integrate technology into my job – through workshops and research guides on various digital resources. (More on that later.)

I did this workshop for librarians at the North American Coordinating Council on Japanese Library Resources (NCC)’s workshop before the Council on East Asian Libraries conference a few weeks ago in March 2014. My focus was perhaps too basic for a savvy crowd that uses foreign languages frequently in their work: I covered the procedure for setting up international keyboards on Android and iOS devices, dictionaries, news apps, language learning assistance, and Aozora bunko readers. However, I did manage to impart some lesser-known information: how to enable, for free, the Japanese and other foreign-language dictionaries that are built into iOS devices. I got some thanks on that one. Also noted was the Aozora 2 Kindle PDF-maker.

Today, I’ll focus more on language learning and the basics of setting up international keyboards. I’ve been surprised at the number of people who don’t know how to do this, but not everyone uses foreign languages on their devices regularly, and on top of that, not everyone loves to poke around deep in the settings of their computer or device. And keyboard switching on Android can be especially tricky, with apps like Simeji. So perhaps covering the basics is a good idea after all.

I don’t have a huge amount of contact with undergrads compared to the reference librarians here, and my workshops tend to be focused on graduate students and faculty with Japanese language skills. So I look forward to working with a new community of pre-undergrads and seeing what their needs and desires are from the library.

#DayofDH Good morning and self introduction

Cross-posted from Day of DH Wasting Gold Paper

I’m up early on this Day of DH 2014. So much to do!

I thought I’d introduce myself to you all, so you have an idea of my background. I’m not your typical DH practitioner – I’m not in the academy (in a traditional way) and I’m also not working with Western-language materials. My concerns don’t always apply to English-language text or European medieval manuscripts. So, if you looked in Asia I’d be less remarkable, but here in the English-language DH world I don’t run across many people like myself.

Anyway, good morning; I’m Molly, the Japanese Studies Librarian at the University of Pennsylvania, also managing the Korean collection. That means that I take care of everything – from collection development to reference and instruction – that has to do with Japan/Korea, or is in Japanese/Korean, at the library and beyond.


Let’s start off with my background. I went to college at the University of Pittsburgh for Computer Science and History (Asian history, of course) and studied Japanese there for four years. I fully intended at the outset to become a software developer, but somewhere along the line, I decided to apply my skills outside that traditional path: librarianship. And so off I went (with a two-year hiatus in between) to graduate school for a PhD in Asian studies (Japanese literature and book history) and an MSI in Library Science at the University of Michigan. Along the way, I interned at the University of Nebraska-Lincoln’s Center for Digital Research in the Humanities (CDRH), redesigning the website for, and rewriting part of the XSLT code of, a text analysis app for the Cather Archive.

After Michigan, I spent a year as a postdoc at Harvard’s Reischauer Institute, working half-time on my humanities research and half-time on a digital archive (the Digital Archive of Japan’s 2011 Disasters, or JDArchive). Then, in July 2013, I made my first big step into librarianship here at Penn, and I have been happily practicing my chosen profession since then. I’m still new, and there is a lot to learn, but I’m loving every minute.

I admit, finding ways to integrate my CS and humanities backgrounds has been a huge challenge. I was most of the way through graduate school when someone recommended going into DH (which didn’t exactly happen – there aren’t a lot of non-postdoc or non-teaching jobs out there now). My dissertation project, a very close-reading-based analysis of five case studies of single books as objects and in terms of their publishing and reception, did not lend itself to a digital methodology, other than using digital archives to get ahold of prefaces and keyword-searchable newspaper databases to find advertisements and reviews. I used a citation index that goes back to the Meiji period (1868-1912) to find sources. In fact, most of my research involved browsing physical issues of early 20th-century magazines in the basement of a library in Japan, and looking at the books themselves in addition to the discourse surrounding them. I simply couldn’t think of anything to do that would be “digital.”

So my research in that area – plus what I’m working on now – has continued to be non-DH, although if you’re the kind of person who includes anything “new media” in the definition of DH, it may be a little. (I am not that person.) Why do I still call myself a DH practitioner, and why do I bother participating in the community even now?

Well, despite working full time, I’m still committed to figuring out how to apply my skills to new, more DH-style projects, even as I don’t want my other traditional humanities research to die out either. It’s a balancing act. How to find the time and energy to learn new skills and just plain old carve out space to practice ones I already have?

I have a couple of opportunities. One is my copious non-work free time. (Ha. Ha.) The second is my involvement in the open and focused lab sessions of Vitale II, the digital lab (okay, it’s a room with a whiteboard and a camera) at the Kislak Center for special collections in Van Pelt Library. I have a top-secret brainstorming session with a buddy today about how we can make even more social, mental, and temporal space for DH work in the library on a topically focused basis. I’m jealous of the Literary Lab; that should speak for itself. In any case, I also ran into a fellow Japanese studies DH aspirant at the Association for Asian Studies conference a few weeks ago, and he and I are plotting with each other as well.

So there are time and social connections to be made, and collaboration that can take place despite all odds. But it’s still a huge challenge. I can do my DH work at 5:30 am, in the evening (when I have no brainpower left), or early on the weekends. I have many other things competing for my time, not least two other research articles I’m working on. I could also be doing my real work at any of those times without the need to explain.

Yet I do it. It’s because I love making things, because I love bringing my interests together and working on something that involves a different part of my brain from reading and writing. I’m excited about the strange and wonderful things that can come from experimental analysis that, even if they aren’t usable, can make me think more broadly and weirdly.

More to follow. よろしくお願いします!

Japanese tokenization – tools and trials

I’ve been looking (okay, not looking, wishing) for a Japanese tokenizer for a while now, and today I decided to sit down and do some research into what’s out there. It didn’t take long – things have improved recently.

I found two tools quickly: kuromoji Japanese morphological analyzer and the U-Tokenizer CJK Tokenizer API.

First off – so what is tokenization? Basically, it’s separating sentences by words, or documents by sentences, or any text by some unit, in order to chunk that text into parts and analyze them (or do other things with them). When you tokenize a document like a web page by word, you enable searching: this is how Google finds individual words in documents. You can also extract keywords from a document this way, by writing an algorithm to choose the most meaningful nouns, for example. It’s also the first step in more involved linguistic analysis like part-of-speech tagging (that is, marking individual words as nouns, verbs, and so on) and lemmatizing (paring words down to their stems, such as removing plural markers and un-conjugating verbs).

This gives you a taste of why tokenization is so fundamental and important for text analysis. It’s what lets you break up an otherwise unintelligible (to the computer) string of characters into units that the computer can attempt to analyze. It can index them, search them, categorize them, group them, visualize them, and so on. Without this, you’re stuck with “words” that are entire sentences or documents, that the computer thinks are individual units based on the fact that they’re one long string of characters.

Usually, the way you tokenize is to break up “words” based on spaces (or sentences based on punctuation rules, etc., although that doesn’t always work). (I put “words” in quotes because you can really make any kind of unit you want, the computer doesn’t understand what words are, and in the end it doesn’t matter. I’m using “words” as an example here.) However, for languages like Japanese and Chinese (and to a lesser extent Korean) that don’t use spaces to delimit all words (for example, in Korean particles are attached to nouns with no space in between, like saying “athome” instead of “at home”), you run into problems quickly. How to break up texts into words when there’s no easy way to distinguish between them?
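The problem is easy to see with a toy example: a naive whitespace tokenizer handles an English sentence fine, but treats an entire Japanese sentence as a single token.

```python
# Naive whitespace tokenization: fine for English, useless for Japanese.
def tokenize(text):
    return text.split()

print(tokenize("the cat sat on the mat"))
# → ['the', 'cat', 'sat', 'on', 'the', 'mat']

print(tokenize("猫がマットの上に座った"))
# → ['猫がマットの上に座った']  (one giant "word": no spaces to split on)
```

Every space-assuming tool, from MALLET to Voyant, runs into exactly this wall.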

The question of tokenizing Japanese may be a linguistic debate. I don’t know enough about linguistics to begin to participate in it, if it is. But I’ll quickly say that you can break up Japanese based on linguistic rules and dictionary rules – understanding which character compounds are nouns, which verb conjugations go with which verb stems (as opposed to being particles in between words), then breaking up common particles into their own units. This appears to be how these tools are doing it. For my own purposes, I’m not as interested in linguistic patterns as I am in noun and verb usage (the meaning rather than the kind) so linguistic nitpicking won’t be my area anyway.

Moving on to the tools. I put them through the wringer: the first two lines of Higuchi Ichiyō’s Ame no yoru, from Aozora bunko.

One, kuromoji, is the tokenizer behind Solr and Lucene. It does a fairly good job, although with Ichiyō’s uncommon word usage and conjugation it faltered, and couldn’t figure out that 高やか is one word; rather, it divided it into 高 や か. It gives the base form, reading, and pronunciation, but nothing else. However, the version that ships with Solr/Lucene lemmatizes. Would that ever make me happy. (That’s, again, reducing a word to its base form, making it easy to count all instances of both “people” and “person,” for example, if you’re just after meaning.) I would kill for this feature to be integrated with the tool below.

The other, U-Tokenizer, did significantly better, but its major drawback is that it’s done in the form of an HTTP request, meaning that you can’t put in entire documents (well, maybe you could? how much can you pass in an HTTP request?). If it were downloadable code with an API, I would be very happy (kuromoji is downloadable and has a command line interface). U-Tokenizer figured out that 高やか is one word, and also provides a list of “keywords,” which as far as I can tell is a bunch of salient nouns. I used it for a very short piece of text, so I can’t comment on how many keywords it would come up with for an entire document. The documentation on this is sparse, and it’s not open source, so it’s impossible to know what it’s doing. Still, it’s a fantastic tool, and also seems to work decently for Chinese and Korean.

Each of these tools has its strengths, and both are quite usable for modern and contemporary Japanese. (I really was cruel to feed them Ichiyō.) However, there is a major trial involved in using them with freely-available corpora like Aozora bunko. Guess what? Preprocessing ruby.

Aozora texts contain ruby marked up within the documents. I have my issues with stripping ruby out of documents that use it heavily (Meiji writers, for example), because it adds so much meaning to the text, but let’s say for argument’s sake that we’re not interested in the ruby. Now, it’s time to cut it all out. If I were a regular expressions wizard (or even had basic competency with them) I could probably strip this out easily, but it’s still time consuming. Download text, strip out ruby and other metadata, save as plain text. (Aozora texts are XHTML, NOT “plain text” as they’re often touted to be.) Repeat. For topic modeling with a tool like MALLET, you’re going to want hundreds of documents at the end of it. For example, you might download all Meiji novels from Aozora and divide them into chunks or chapters. Even the complete works of Natsume Sōseki aren’t enough without cutting them down into chapters or even paragraphs to make enough documents to use a topic modeling tool effectively. Possibly, you’d then run all these through a part-of-speech tagger like KH Coder. This is going to take a significant amount of time.
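For what it’s worth, a first pass at the ruby-stripping step might look like the sketch below. It assumes Aozora’s usual conventions (XHTML ruby markup, or the plain-text 《…》 and ｜ notation) and real files will certainly have edge cases and other metadata it doesn’t handle.

```python
# Hedged sketch: strip Aozora-style ruby annotations, keeping only the
# base text. Real Aozora files have more markup than this covers.
import re

def strip_ruby(text):
    # XHTML ruby: drop the readings (<rt>) and fallback parens (<rp>),
    # then drop the <ruby>/<rb> wrapper tags, keeping the base text
    text = re.sub(r'<rt>.*?</rt>', '', text)
    text = re.sub(r'<rp>.*?</rp>', '', text)
    text = re.sub(r'</?(?:ruby|rb)>', '', text)
    # plain-text convention: 漢字《かんじ》, optionally preceded by ｜
    text = re.sub(r'《.*?》', '', text)
    text = text.replace('｜', '')
    return text
```

Something like this, run file by file, would replace the download-strip-save-repeat cycle with a loop.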

Then again, preprocessing is an essential and extremely time-consuming part of almost any text analysis project. I went through a moderate amount of work just removing Project Gutenberg metadata and dividing into chapters a set of travel narratives that I downloaded in plain text – thankfully not in HTML or XML, which made for easy processing. With something that’s not already real plain text, with a lot of metadata and a lot of ruby, it’s going to take much more time and effort, which is more typical of a project like this. The digital humanities are a lot of manual labor, despite the glamorous image and the idea that computers can do a lot of manual labor for us. Computers are a little finicky about what they’ll accept. (Granted, I’ll be using a computer script to strip out the XHTML and ruby tags, but it’s going to take work for me to write it in the first place.)
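The chunking step itself is mercifully simple; a sketch might look like this (the 1000-word chunk size and the file names are arbitrary placeholders, not recommendations):

```python
# Split a long work into fixed-size pseudo-documents so a topic modeling
# tool like MALLET has enough "documents" to work with.
def chunk_words(words, size=1000):
    return [words[i:i + size] for i in range((0), len(words), size)]

# Hypothetical usage, assuming an already whitespace-delimited text file:
# with open("soseki_full.txt", encoding="utf-8") as f:
#     words = f.read().split()
# for n, chunk in enumerate(chunk_words(words)):
#     with open(f"chunks/doc_{n:04}.txt", "w", encoding="utf-8") as out:
#         out.write(" ".join(chunk))
```

Chapter or paragraph boundaries would make more meaningful documents than fixed sizes, but that requires parsing structure out of each text first.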

In conclusion? Text analysis, despite exciting available tools, is still hard and time consuming. There is a lot of potential here, but I also see myself going through some trials to get to the fun part, the experimentation. Still, stay tuned, especially for some follow-up posts on these tools and KH Coder as I become more familiar with them. And, I promise to stop being difficult and giving them Ichiyō’s Meiji-style bungo.

don’t learn to code

There is a lot of speculating going on, on the Internet, at conferences, everywhere, about the ways in which we might want to integrate IT skills – for lack of a better word – with humanities education. Undergrads, graduate students, faculty. They all need some marketable tech skills at the basis of their education in order to participate in the intellectual world and economy of the 21st century.

I hear a lot, “learn to code.” In fact, my alma mater has a required first-semester course for all information science students, from information retrieval specialists to preservationists, to do just that, in Python. Others recommend Ruby. They rightly stay away from the language of my own training, C++, or god forbid, Java. Coding seems to mean scripting, which is fine with me for the purposes of humanities education. We’re not raising software engineers here. We tend to hire those separately.*

I recently read a blog post that advocated for students to “learn a programming language” as part of a language requirement for an English major. (Sorry, the link has been buried in more recent tweets by now.) You’d think I would be all about this. I’m constantly urging that humanities majors acquire enough tech skills to at least know what others are talking about when they might collaborate with them on projects in the future. It also allows one to experiment without the need for hiring a programmer at the outset of a project.

But how much experimentation does it actually allow? What can you actually get done? My contention is: not very much.

If you’re an English major who’s taken CS101 and “learned a programming language,” you have much less knowledge than you think you do. This may sound harsh, but it’s not until the second-semester, first-year CS courses that you even get into data structures and algorithms, the building blocks of programming. Even at that point, you’re just barely starting to get an idea of what you’re doing. There’s a lot more to programming than learning syntax.

In fact, I’d say that learning syntax is not the point. The point is to learn a new way of thinking, the way(s) of thinking that are required for creating programs that do something interesting and productive, that solve real problems. “Learning a programming language,” unless done very well (for example in a book like SICP), is not going to teach you this.

I may sound disdainful or bitter here, but I feel this must be said. It’s frankly insulting as someone who has gone through a CS curriculum to hear “learn a programming language” as if that’s going to allow one to “program” or “code.” Coding isn’t syntax, and it’s not learning how to print to the screen. Those are your tools, but not everything. You need theory and design, the big ideas and patterns that allow you to do real problem-solving, and you’re not going to get that from a one-semester Python course.

That’s not to say there’s no point in trying to learn a programming language if you don’t currently know how to program. But I wish the strategies generally being recommended were more holistic. Learning a programming language is a waste of time if you don’t have concepts to express with it.


* I’m cursed by an interdisciplinary education, in a way. I have a CS degree but no industry experience. I program both for fun and for work, and I know a range of languages. I’m qualified in that way for many DH programming jobs, but they all require several years of experience that I passed up while busy writing a Japanese literature dissertation. I’ve got a bit too much humanities for some DH jobs, and too little (specifically teaching experience) for others.

fans, collectors, and archives

In the course of my research, I’ve been studying the connection between the first “complete works” anthology of the writer Ihara Saikaku, his canonization, and the collectors and fans who created the anthology – a very archival anthology. (I say this because it contains information about the contemporary provenance of the texts that make it up, among other things. It names, on every title page, the collector who contributed the text to the project!)

It’s struck me throughout this project that the role of fans – which these people were – and their connection with collectors, as well as their overlap, is of crucial importance in preserving, in creating archives and maintaining them, in creating resources that make study or access possible in the first place. They do the hard work of searching, finding, discovering, buying, arranging, preserving, and if we’re lucky, disseminating – through reprinting or, now, through making digital resources.

As I’ve become more acquainted with digital humanities and the range of projects out there, I can’t help but notice the role of collectors and fans here too. It’s not so much in the realm of academic projects as in the number of Web sites out there that provide images or other surrogates for documents and objects that would otherwise be inaccessible. These are people who have built up personal collections over years, and who have created what would otherwise, without qualification, be called authoritative guides and resources – but who are not academics. They occupy a gray area: real expertise combined with a lack of academic affiliation or degree. Yet they are the ones who have provided massive amounts of information and documentation – including digital copies of otherwise-inaccessible primary sources.

I think we can see this in action with fandoms surrounding contemporary media, in particular – just look at how much information is available on Wikipedia about current video games and TV shows. Check out the Unofficial Elder Scrolls Pages and other similar wikis. (Note that UESP began as a Web site, not a wiki; it’s a little time capsule that reflects how fan pages have moved from individual labors of love to collective ones, with the spread of wikis for fan sites. A history of the site itself – “much of the early details and dates are vague as there are no records available anymore” – can be found here.)

I’m not a researcher of contemporary media or fan culture, but I can’t help but notice this and how little it’s talked about in the context of digital humanities, creating digital resources, and looking at the preservation of information over time.

Without collectors like Awashima Kangetsu and fans like Ozaki Kōyō and Ōhashi Otowa, we might not have Ihara Saikaku here today – and yet he is now among the most famous Japanese authors, read in survey courses as the representative Edo-period (1600-1867) author. He was unknown at the time, an underground obsession of a handful of literary youths. It was their collecting work and their dedication (and connections to a major publisher) that produced The Complete Works of Saikaku in 1894, a two-volume set reprinted from those fans’ combined collections of used books. Who will we be saying this about in a hundred years?

For my readers out there who have their feet more in fandom and fan culture than I do, what do you think?

creativity, goals, and the dissertation

I’ve been consulting some books on art-making lately that you could broadly say are about that nebulous idea of “creativity” itself. (Art and Fear is the best known of them, and I can’t recommend it enough. It’s the best tiny book you’ll ever own.) As I’ve read more, I have realized that they apply not only to my artistic life – my life outside the “work” of research and writing – but also to my current writing project. In other words, writing a dissertation, essentially a non-fiction book, is a creative undertaking of great magnitude and can be approached with the same principles in mind as a painting or a composition or a mathematical theory. (Fill in your creative path here.)

This was a revelation for me, despite the fact that I engage in drawing, painting, and creative writing as a part of my life: why would non-fiction writing for my “real job” not be creative work as well, and best approached with the same attitudes? Why not?

So one thing that comes out of this is the issue of the goal. Art and Fear talks about this one, and I’d honestly never considered it before. The goal often sounds like this: have a solo show, or get a piece in MoMA, or get a book published, or whatever. The problem is that when the artist succeeds and meets that goal, art-making can often cease completely, forever, because the goal has been met and there is no direction anymore, and nothing to aim for.

This book in particular recommends that goals should be more along the lines of “find a group of like-minded artists and share work with them.” Things that won’t be attained in a single moment, but that continue for the rest of your life.

It made me realize that yes, as a scholar, I have an end goal right now, and that is finishing my dissertation. After that, it’s a few articles, a monograph. But then what? I don’t have a good answer for that. Thus, I am at high risk of becoming the same as the writer who quits after her first bestselling novel, adrift without an ongoing goal.

I wonder how scholars deal with this (I may just go and ask a few of them), but I think for myself, I’ve found a seed of it in a digital humanities project I’m dreaming up but haven’t had time to start implementing yet. It’s one that is less about content and more about opening up possibilities for exploring questions in ways that didn’t exist before, and to experiment with new methodologies that wouldn’t have traditionally come from my discipline. Sure, it’s building a database. But then it’s what to do with that database that’s the real project.

At the same time, I think a huge issue both in the arts and the academic humanities is that of solitude. I am not saying anything new here. Right now, a colleague and I are planning on co-authoring an article and attempting to get it published (please cross your fingers for us). I think it may be in my best interests, more than anything else, to keep in close touch with this person who works on things that are similar to my own work, and to keep picking up those business cards I like to collect from people I meet at conferences who are interested in my research for some reason, and routinely emailing them. My database project is something I want to leave open source and twist others’ arms to take part in. So I’m thinking now, as I’m nearing the end of my PhD course, where to start with the idea of forming a like-minded group to continue to share and collaborate with. To keep the end goal always moving and yet always fulfilled, because it is within myself and other people, and not just about me and something outside of me.

is it ephemeral?

I work largely with sources that you would call “ephemeral” in my research these days. By that, I simply mean “in danger of disappearing easily, or having already done so.” Things prone to disappearing range from theater playbills and concert programs to magazines and newspapers, to gum wrappers, signs, and internet forum posts, not to mention non-archived Web sites and anything that can be lost in a hard drive crash with no backup.* I’m being somewhat narrow-minded by considering “non-ephemeral” sources to be, basically, books, but books are made for persistence through time, and they are often so redundant that they are de facto preserved through this.

In any case, I’ve been thinking as I write my dissertation, especially the current chapter, about what happens to ephemera when one decides to preserve it in a non-ephemeral form. Here, I’ll use the example of reprinting something in a book or putting it on microfilm. Not all magazines and newspapers are thrown out completely, although they do tend to be tossed out en masse every week throughout the world. Newspaper companies keep archives, and libraries bind periodicals for preservation, access, and redundancy. Things get microfilmed. Sometimes they are reproduced in a traditional bound form at some point, as though they were books to begin with.

I’m working with two authors in particular who, about 120 years ago, published almost solely in magazines that are now extremely hard to get ahold of. I’m studying the act of reprinting those stories in book form, here in anthologies of the “complete works” of those authors.** I talk a lot about the crucial role that reprinting in the form of an anthology plays in access and preservation: without reprints, these stories, published in sources that are very easily lost to us, might never have been accessible at all beyond a few decades after their original publication. The paper of these types of publications is rarely very durable, and as time goes on, the surviving owners of the publications tend to throw them out, or the executors of their estates do it for them.

In fact, one magazine in particular is an extreme example of ephemerality. It was a handwritten magazine – really, a zine from the 1880s – that was passed around between members of a literary club, who annotated it as they went along, writing in the margins and then passing it on to the next member, sometimes making their own handwritten copies as well. In this way, the publication and distribution was profoundly decentralized and depended entirely on the efforts of the members of that club. Yet, they were all quite committed to literature and to each other, and so it was relatively successful – if you can call a magazine with only a few hand-written, hand-circulated copies successful.

The problem with the issues of this magazine (before it later was printed and sold commercially) is that they are literally no longer available. Garakuta bunko from the late 1880s is simply inaccessible to us as literary scholars and historians. There are no accessible copies, and possibly no surviving copies at all. This was the case even in the early 20th century, when the extant copies dwindled to a single set held in a private collection; only the tables of contents were published, reprinted in a book on the literary club. Now, that private collection is even inaccessible, and all we have left are those reprinted tables of contents.

Why is this important? It is now impossible for me to investigate, for example, early uses of pseudonyms by some of the authors that I study, and impossible to read their earliest works to evaluate their first efforts in literature. As this group became extremely influential from the late 1880s through the early 1900s, this is a big problem for studying its development over time, its roots, its connections with the literature of the late Edo period (1600-1867), and its early influence on others. In short, this work has been rendered impossible and these questions unanswerable.

Even as early as the 1920s, there were reprints of the publicly distributed, later issues of this magazine. It was a set of only 500 copies, and its preface is extremely telling. Edited by former members of the club, the reprint states its reason for existing unequivocally: the number of surviving copies is very few, they are limited to the collections of private individuals, and the early works of club members are nearly impossible to get ahold of. The magazine was reprinted for posterity and for access at the time of the reprinting. There were those who wanted to read the works, and the reprints were made and distributed so that doing so became possible again.

This is a noble undertaking, and one that is extremely important to our access now. It is reasonable to wonder whether, had it not been for this early reprint set, even more of Garakuta bunko would have been lost to the ether over time. We have more reprints now, in book form, and thanks to this they are likely to persist. But what if those later reprints had had nothing to reprint?

Finally, I come to the sticking point of all of this. It’s prompted by a question from a month or so ago: if ephemeral materials are preserved in such a way, through a digital archive, through photographs, through reprints, does that fundamentally change their nature as ephemera? I don’t have a concrete, definitive answer to this, but I do think there are two issues at the heart of this. One is a practical issue – the major difference between ephemera and other sources when attempting to create a digital archive is that there is even more impetus for careful preservation, because the danger of loss is so high. If a magazine could almost entirely disappear less than 50 years after its initial publication, what does that say about even more volatile materials? We lose a major part of the historical record and in most cases we will be unable to ever retrieve it. This means that there are historical, cultural, and literary questions that we simply cannot ask – or rather, can never answer. It reduces our understanding of the past and even of the present, given that ephemera can disappear in the blink of an eye, historically speaking.

The other issue is thornier. My answer on reprints and digital reproductions is this: they do not change the status of the source as ephemeral. Rather, I think reproduction in some way both attempts to obscure the source’s ephemeral nature and yet also makes it more evident. What is the need for a reprint, after all, if there is no danger of disappearance? If a work is already persisting through redundancy, is there a need for preservation? And there is the issue of the reprint fundamentally altering the context, and thus the meaning, of that ephemeral source. This highlights its ephemeral nature all the more: in trying to recuperate its pre-reprint, pre-preservation context, we cannot help but focus on the fact that we are reprinting ephemera, preserving ephemera.

In other words, we can perhaps think of reprints or digitally archived versions as objects entirely separate from the ephemera they preserve, and this stresses even more the ephemeral nature of what has been preserved. Of course, a work reprinted in book form is less likely to be ephemeral. But what has been reprinted – a serial in a newspaper or in a magazine – is tremendously so, and this very gap between the two media is emphasized in the process. These are ephemera, preserved. Preservation does not change the fact that these sources are always, and will always be, in imminent danger of permanent loss.***


* In fact, I have lost some of these things that I had never considered ephemeral until they were gone. How fragile is an older hard drive full of personal data and artwork? Very. How about things you burn to a CD-ROM for safekeeping? Even worse. A personal web site that you had a few years ago? If the Internet Archive didn’t grab it, it might as well never have existed. We talk quite a bit these days about the danger of things never being erased once you put them out in public, on the Internet, but they’re more endangered than we give them credit for.

** Take that with a grain of salt; “complete” is more aspirational than literal, and it has quite a lot to do with “completely” being able to know or possess the author as an author, rather than a complete set of works in themselves. I digress.

*** The fact that Garakuta bunko was reprinted in the 1920s, after all, does not change the fact that the original copies of the magazine are in grave danger of being completely lost to us. A reprint is not the same as the source it reprints. The reprint, even if it is an ephemeral source in itself (the short print run of the Garakuta bunko reprint suggests it may qualify as such), is not the ephemera. But what it reprints will never stop being ephemeral.

Video Podcast: London Seminar in Digital Text and Scholarship

The School of Advanced Study at the University of London has just started a video (and audio) podcast series of the full talks from each session of the London Seminar in Digital Text and Scholarship.

Find the podcasts online here, or subscribe via iTunes (there is a link on the page to do so).

The first talk is Jan Rybicki with ‘The Translator’s Other Invisibility: Stylometry in Translation.’ Just another day when I wish I lived in London, with all of the great digital humanities seminars and talks going on there. I read Rybicki’s paper on the same subject in Literary & Linguistic Computing not too long ago, and it was, in a word, awesome.