Category Archives: digital libraries

Using Collections – Virtually

I heard a remark the other day that struck a chord – or, rather, churned my butter a bit.

The gist of it was, “we should make digital facsimiles of our library materials (especially rare materials) and put them online, so that people become aware of them through the digitized versions and then visit the library to see them in person – sparking library use.”

Now, at Penn, we have digitized a couple of Japanese collections: the Japanese Juvenile Fiction Collection (aka the Tatsukawa Bunko series), the Japanese Naval Collection (in process, focused on Renshū Kantai training-fleet materials), and a miscellaneous collection of Japanese rare books in general.* These materials have been used both in person (thanks to publicizing them, pre- and post-digitization, on library news sites, blogs, and social media, as well as by word of mouth) and digitally by researchers who cannot travel to Penn. In fact, a graduate student in Australia used our juvenile fiction collection for part of his dissertation; another student, in Wisconsin, plans to use facsimiles of our naval materials once they’re complete; and faculty at the University of Montana have used our digital facsimile of the Meiji-period journal Hōbunkai-sui (or Hōbunkai-shi).

These researchers, due to distance and budget, will likely never be able to visit Penn in person to use the collections. On top of that, some items – like the juvenile fiction and lengthy government documents related to the Imperial Navy – don’t lend themselves to use in a reading room. These aren’t artifacts to look over one page at a time, but research materials that will be read extensively (rather than “intensively,” a distinction we book history folks make). Digital access is thus the only use these researchers can make of our materials.

The digitization of Japanese collections at Penn has invited use – a kind of library visit – by virtue of being available to researchers worldwide, not just those at Penn (who could easily view the materials in person and don’t “need” a digital facsimile), or those who can visit the library to “smell” the books (as the person I paraphrased put it). I think it’s more important to be able to read, research, and use these documents than to smell or witness the material artifact. Of course, there are cases in which one would want to do that, but by and large, our researchers care more about the content and visual aspects of the materials – things that can be captured and conveyed in digital images – than about touching or handling them.

Isn’t this use, just as visiting the library in person is use? Shouldn’t we be tracking visits to our digital collections, downloads, and qualitative stories about their use in research, just as we do a gate count and track circulation? I think so. As we think about the present and future of libraries, and as people comment that libraries aren’t needed because they’re on our smartphones (like libraries of fake news, right?), we must make the argument for providing content both physically and virtually. Who do people think is providing the content for their digital libraries? Physical libraries, of course! Those collections exist in the real world and come from somewhere, with significant investments of money, time, and labor involved – and moreover, it is the skilled and knowledgeable labor of professionals that is required.

On top of all of this, I feel it is most important to own up to what we can and cannot “control” online: our collections, by virtue of being releasable at all, are largely in the public domain. Let’s not put CC licenses on them – except CC0, which explicitly marks materials as public domain – pretending we can control the images when we have no legal right to do so (even if users largely don’t know that). Let’s allow for free remixing and use without citing the digital library or archive the material came from, without getting upset about posts on Tumblr. When you release public domain materials on the web (or through other services online), you give up any exclusive right to control the circumstances under which people use them – and as a cultural heritage institution, it is your role to perform this service for the world.

But not only should we provide this service, we should take credit for it: take credit for use, visits, and for people getting to do whatever they want with our collections. That is really meaningful and impactful use.

* Many thanks to Michael Williams for his great blog posts about our collections!

#dayofDH Japanese digital resource research guides

Another “digital” thing I’ve been doing that relates to the “humanities” (but is it even remotely DH? I don’t know) is the creation of research guides for digital resources in Japanese studies of all kinds, with a focus on free Japanese-language websites and databases, and open-access publications.

So far, I’ve been working hard on two guides: one for electronic Japanese studies resources, and one for mobile apps easily accessible in the US, for both Android and iOS, that relate to Japanese research or language study. The digital resources guide covers everything from general digital archives and citation indexes to literature, art, history, pop culture, and kuzushiji resources (for reading handwritten pre- and early modern documents). These range from text and image databases to dictionaries, and even YouTube videos and online courseware for learning classical Japanese and how to read manuscripts.

This has been a real challenge, as you can imagine. Creating lists of stuff is one thing (and something I’ve already done for Japanese text analysis resources), but actually curating them and creating the equivalent of annotated bibliographies is quite another. It’s been a huge amount of research and writing – both in discovering sources, and in investigating and evaluating them, then describing them in plain terms to my community. I spent hours on end surfing the App and Play Stores and downloading and trying countless awful free apps – so you don’t have to!

It’s especially hard to find digital resources in ways other than word of mouth. I often end up linking to other librarians’ LibGuides (i.e., research guides) because they’ve already done such a fantastic job curating their own lists. I wonder sometimes if we’re all just duplicating each other’s efforts! The NCC has a database of research guides, yes, but would it be better if we all collaboratively edited just one? Would it get overwhelming? Would there be serious disagreements about how to organize it, whether to include paid resources (and which ones), and where to file things?

The answer to all these questions is probably yes, which creates problems. Logistically, we can’t have every Japanese librarian in the English-speaking world editing the same guide anyway. So it’s hard to say what the solution is – keep working in our silos? Specialize, and tell our students and faculty to Google “LibGuide Japanese” plus a topic? (Which is what I’ve done in the past with art and art history.) Search the master NCC database? Some combination is probably the right path.

Until then, I will keep working on accumulating as many kuzushiji resources as I can for Penn’s reading group, and updating my mobile app guide if I ever find a decent まとめ (roundup)!

Japanese tokenization – tools and trials

I’ve been looking (okay, not looking, wishing) for a Japanese tokenizer for a while now, and today I decided to sit down and do some research into what’s out there. It didn’t take long – things have improved recently.

I quickly found two tools: the kuromoji Japanese morphological analyzer and the U-Tokenizer CJK Tokenizer API.

First off – what is tokenization? Basically, it’s separating text into units: sentences into words, documents into sentences, or any text into some other unit, so that you can chunk the text into parts and analyze them (or do other things with them). When you tokenize a document like a web page by word, you enable searching: this is how Google finds individual words in documents. You can also extract keywords from a document this way, by writing an algorithm that chooses the most meaningful nouns, for example. Tokenization is also the first step in more involved linguistic analysis, like part-of-speech tagging (that is, marking individual words as nouns, verbs, and so on) and lemmatizing (paring words down to their stems, such as removing plural markers and un-conjugating verbs).
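To make those three steps concrete, here is a minimal sketch in Python using the NLTK library – my choice of library, not one of the tools discussed below – and it assumes NLTK’s data packages (punkt, the tagger, WordNet) have been fetched first:

```python
# A minimal sketch of tokenization, part-of-speech tagging, and
# lemmatization with NLTK. Assumes nltk is installed and its data
# packages (punkt, averaged_perceptron_tagger, wordnet) have been
# downloaded via nltk.download().
import nltk
from nltk.stem import WordNetLemmatizer

text = "The cats were sleeping on the mats."

tokens = nltk.word_tokenize(text)   # ['The', 'cats', 'were', 'sleeping', ...]
tagged = nltk.pos_tag(tokens)       # [('The', 'DT'), ('cats', 'NNS'), ...]

lemmatizer = WordNetLemmatizer()
print([lemmatizer.lemmatize(t.lower(), pos="n") for t in tokens])
# 'cats' -> 'cat'; with pos="v", 'were' would reduce to 'be'
```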

This gives you a taste of why tokenization is so fundamental and important for text analysis. It’s what lets you break up an otherwise unintelligible (to the computer) string of characters into units that the computer can attempt to analyze: it can index them, search them, categorize them, group them, visualize them, and so on. Without this, you’re stuck with “words” that are entire sentences or documents, which the computer treats as single units simply because they’re one long string of characters.

Usually, the way you tokenize is to break up “words” based on spaces (or sentences based on punctuation rules, etc., although that doesn’t always work). (I put “words” in quotes because you can really make any kind of unit you want; the computer doesn’t understand what words are, and in the end it doesn’t matter – I’m just using “words” as an example here.) However, for languages like Japanese and Chinese (and to a lesser extent Korean) that don’t use spaces to delimit all words (in Korean, for example, particles are attached to nouns with no space in between, like saying “athome” instead of “at home”), you run into problems quickly. How do you break up a text into words when there’s no easy way to distinguish between them?
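In code, the contrast is stark (plain Python, no libraries needed):

```python
# Whitespace tokenization is trivial for English...
print("I read books at home".split())
# ['I', 'read', 'books', 'at', 'home']

# ...but Japanese has no spaces to split on, so the entire
# sentence comes back as one "word":
print("私は家で本を読む".split())
# ['私は家で本を読む']
```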

The question of how to tokenize Japanese may well be a linguistic debate; I don’t know enough about linguistics to begin to participate in it, if it is. But I’ll quickly say that you can break up Japanese based on linguistic and dictionary rules – understanding which character compounds are nouns and which verb conjugations go with which verb stems (as opposed to being particles in between words), then breaking common particles into their own units. This appears to be how these tools do it. For my own purposes, I’m not as interested in linguistic patterns as I am in noun and verb usage (the meaning rather than the kind), so linguistic nitpicking won’t be my area anyway.
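To give a sense of what this dictionary-based morphological analysis produces, here is a sketch in Python – not kuromoji itself (which is Java), but janome, a pure-Python analyzer built on the same style of IPADIC dictionary, which I’m using here as a stand-in:

```python
# A minimal sketch of dictionary-based Japanese morphological analysis
# using janome (pip install janome), a pure-Python analogue of kuromoji.
# Each token carries its surface form, part of speech, base (dictionary)
# form, and reading.
from janome.tokenizer import Tokenizer

tokenizer = Tokenizer()
for token in tokenizer.tokenize("私は家で本を読んだ"):
    print(token.surface, token.part_of_speech, token.base_form, token.reading)
# 読んだ is split into 読ん (verb, base form 読む) and だ (auxiliary):
# the analyzer knows which conjugations belong to which verb stems.
```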

Moving on to the actual tools. I put them through the wringer with the first two lines of Higuchi Ichiyō’s Ame no yoru, from Aozora bunko.

One, kuromoji, is the tokenizer behind Solr and Lucene. It does a fairly good job, although with Ichiyō’s uncommon word usage and conjugation it faltered: it couldn’t figure out that 高やか is one word, and divided it into 高 や か instead. It gives the base form, reading, and pronunciation, but nothing else. However, in the version that ships with Solr/Lucene, it lemmatizes. Would that ever make me happy. (That’s, again, reducing a word to its base form, making it easy to count all instances of both “people” and “person,” for example, if you’re just after meaning.) I would kill for this feature to be integrated with the tool below.

The other, U-Tokenizer, did significantly better, but its major drawback is that it operates as an HTTP request, meaning that you can’t put in entire documents (well, maybe you could? how much can you pass in an HTTP request?). If it were downloadable code with an API, I would be very happy (kuromoji is downloadable and has a command-line interface). U-Tokenizer figured out that 高やか is one word, and it also provides a list of “keywords,” which as far as I can tell is a bunch of salient nouns. I used it on a very short piece of text, so I can’t comment on how many keywords it would come up with for an entire document. The documentation is sparse, and it’s not open source, so it’s impossible to know exactly what it’s doing. Still, it’s a fantastic tool, and it also seems to work decently for Chinese and Korean.
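For what it’s worth, a request-based workflow would look something like the sketch below. Everything here is hypothetical – the endpoint URL and parameter names are placeholders, not U-Tokenizer’s documented interface – but it illustrates the pattern, and also why whole documents are a problem: servers commonly cap URL query strings at a few kilobytes, so a long text would have to be sent in chunks (or in a POST body).

```python
# Illustrative only: calling a tokenizer exposed as an HTTP API.
# The URL and parameter names are HYPOTHETICAL placeholders, not
# U-Tokenizer's actual documented interface.
import requests

resp = requests.get(
    "https://example.com/api/tokenize",            # placeholder endpoint
    params={"text": "高やかに歌う", "lang": "ja"},  # placeholder params
)
resp.raise_for_status()
print(resp.json())  # assuming a JSON response of tokens and keywords
```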

Each of these tools has its strengths, and both are quite usable for modern and contemporary Japanese. (I really was cruel to feed them Ichiyō.) However, there is a major trial involved in using them with freely available corpora like Aozora bunko. Guess what? Preprocessing ruby.

Aozora texts contain ruby marked up within the documents. I have my issues with stripping out ruby from documents that use it heavily (Meiji writers, for example), because it adds so much meaning to the text, but let’s say for argument’s sake that we’re not interested in the ruby. Now it’s time to cut it all out. If I were a regular-expressions wizard (or even had basic competency with them) I could probably strip it out easily, but it’s still time-consuming. Download a text, strip out the ruby and other metadata, save as plain text. (Aozora texts are XHTML, NOT “plain text” as they’re often touted to be.) Repeat. For topic modeling with a tool like MALLET, you’re going to want hundreds of documents at the end of it. For example, you might download all the Meiji novels on Aozora and divide them into chunks or chapters. Even the complete works of Natsume Sōseki aren’t enough without cutting them down into chapters or even paragraphs to make enough documents to use a topic modeling tool effectively. Then, possibly, run all of these through a part-of-speech tagger like KH Coder. This is going to take a significant amount of time.
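As a starting point, the ruby-stripping step might look like the following sketch, which assumes Aozora’s standard XHTML ruby markup (<ruby><rb>base text</rb><rp>（</rp><rt>reading</rt><rp>）</rp></ruby>) and nothing fancier:

```python
# A minimal sketch of stripping ruby from an Aozora XHTML file:
# delete the readings (<rt>) and fallback parentheses (<rp>),
# then unwrap the <ruby> and <rb> tags, keeping the base text.
import re

def strip_ruby(xhtml: str) -> str:
    text = re.sub(r"<rt>.*?</rt>", "", xhtml)
    text = re.sub(r"<rp>.*?</rp>", "", text)
    return re.sub(r"</?(?:ruby|rb)>", "", text)

sample = "<ruby><rb>高</rb><rp>（</rp><rt>たか</rt><rp>）</rp></ruby>やか"
print(strip_ruby(sample))  # 高やか
```

A real script would also need to cut out the rest of the page’s XHTML markup and metadata, but the principle is the same.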

Then again, preprocessing is an essential and extremely time-consuming part of almost any text analysis project. I went through a moderate amount of work just removing Project Gutenberg metadata from a set of travel narratives and dividing them into chapters – and those I had downloaded in plain text, thankfully not HTML or XML, which made for easy processing. With something that’s not already real plain text, with a lot of metadata and a lot of ruby, it’s going to take much more time and effort – which is more typical of a project like this. The digital humanities involve a lot of manual labor, despite the glamorous image and the idea that computers can do the manual labor for us. Computers are a little finicky about what they’ll accept. (Granted, I’ll be using a script to strip out the XHTML and ruby tags, but it’s going to take work for me to write it in the first place.)
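And once the texts are clean, there’s still the chunking step mentioned above. A minimal sketch (the input filename is hypothetical, and the 1,000-character chunk size is an arbitrary choice for illustration):

```python
# Split one long plain-text file into fixed-size chunks, one file per
# chunk, since topic modeling tools like MALLET treat each file in an
# input directory as a separate "document".
from pathlib import Path

def chunk_file(path: str, out_dir: str, size: int = 1000) -> None:
    text = Path(path).read_text(encoding="utf-8")
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for i in range(0, len(text), size):
        (out / f"{Path(path).stem}_{i // size:04d}.txt").write_text(
            text[i:i + size], encoding="utf-8"
        )

chunk_file("soseki_kokoro.txt", "chunks")  # hypothetical input file
```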

In conclusion? Text analysis, despite the exciting tools available, is still hard and time-consuming. There is a lot of potential here, but I also see myself going through some trials to get to the fun part: the experimentation. Still, stay tuned, especially for follow-up posts on these tools and KH Coder as I become more familiar with them. And I promise to stop being difficult and feeding them Ichiyō’s Meiji-style bungo.

New issue of D-Lib magazine

D-Lib Magazine has just published its most recent issue, available at http://www.dlib.org.

This looks to be a great issue, with a number of fascinating articles on dissertations and theses in institutional repositories, using Wikipedia to increase awareness of digital collections, MOOCs, and automatic ordering of items based on reading lists.

Please check it out! All articles are available in full text on the site.

NDL makes public the Historical Recordings Collection digital archive

On March 15, 2013, the National Diet Library made public its new digital archive of historical recordings. In partnership with a number of groups, including NHK, it has digitized and made available recordings from SP records dating from 1900 to the 1950s, in order to preserve them and prevent their being lost.

Over time, they plan to grow the archive to approximately 50,000 recordings. Although many recordings can be accessed via the Internet, some can only be listened to at the NDL itself due to copyright restrictions.

You can also access an NDL article on the digitization of the recordings, entitled 音の歴史を残す (“Preserving the History of Sound”; PDF link).

The archive is the Historical Recordings Collection, accessible at http://rekion.dl.ndl.go.jp/

digital surrogates and utility

As someone who studies the history of the book, often as an object in itself, I generally need to go look at books in person for my research. Still, I use the Kindai Digital Library quite regularly as a way to survey what exists (though I fully realize how incomplete Kindai is) – and indeed, I would never have found my research topic without being able to preview books through this digital library.

The point is, I previewed the books using Kindai, and then got on a plane to Japan to actually study them for my research. I had to locate a physical copy and literally get my hands on it in order to understand how it was made, what impression it would make on readers, and who its intended audience was. (For example, how well made is it? Does it have color illustrations or text? What’s the quality of the paper like? Does it feel or look cheap? How is the binding? None of these questions can be answered from the black-and-white copy in Kindai.)

The history of the Kindai Digital Library is interesting: it’s a digitization project undertaken by the National Diet Library and based on the same collection as the Maruzen Meiji Microfilm – books microfilmed and owned by the NDL. Neither covers the entire collection of Meiji books that the NDL owns; it’s not clear (to me, anyway) whether Kindai and Maruzen are coextensive; and the NDL’s collection does not contain every book published in the Meiji period. So, yes, it has limitations – it’s not every book from the Meiji period, and it’s scanned microfilm in black and white, not grayscale.

But the Kindai Digital Library, unlike the Maruzen microfilm collection, is being added to continuously, and out-of-copyright books from the Taisho and Showa periods (1912–1989) are also being scanned and included in the collection. These newer books are being digitized directly, rather than through microfilm as an intermediate step. Check out the difference between these two books by Wakamatsu Shizuko, published in 1897 (color) and 1894 (black and white):

Sure, there is a big impressionistic difference between seeing a full-color cover illustration and a black-and-white scan of what used to be a color cover. But you can also see from these images that it’s very difficult to tell the quality and condition of the book from the monochrome image, whereas the higher-quality color image captures things like discoloration of the paper and the quality of the cloth binding (not pictured here).

This makes all the difference for someone doing my kind of research: if I had scanned copies of the anthologies I study that were as good as the color book above, it’s likely that I could still do decent – if incomplete – research without going to Japan to look at the books in person. With the higher-quality color image, the digital surrogate has become a usable surrogate for me, a reasonable facsimile if you will. It provides me with enough information to draw conclusions about more than just the content of the book.

This matters for more than book historians, however. One reason the Kindai Digital Library is so great is that it provides digital surrogates of the full text of books, not just their covers. Every available page is scanned, either from microfilm or from the book itself, and provided for viewing online – and, if you have the patience, as a PDF download a few pages at a time. Yet compare these images, again from the 1897 and 1894 books introduced above. Click to view the full size so you can see the quality of the text in each. Both are at 25% zoom in Kindai’s page viewer.


Here, you can appreciate the difficulty of reading the monochrome text – and this is an exceptionally clear example. The books I have read excerpts from on Kindai (with difficulty) are typically much lower quality, and many characters are hard to make out. Zooming in doesn’t help, because the quality of the image itself is relatively low.

On the other hand, you have the newer additions with higher-quality surrogates, such as this color book. Of course, it’s not necessary to have color pages to read a text that was originally printed in black and white, but the inclusion of values other than straight black or white allows for a higher-quality image and increases readability. It also allows for clearer text when zooming out – viewing at, say, 33%, a level at which the monochrome text looks terrible.

As you can see, the point is that the newer Kindai texts are more usable than the older ones, not just prettier. They demonstrate that there is a point at which a digital surrogate becomes a usable surrogate – where it becomes “good enough” to live up to its name. Of course, “usable” depends on the purpose, but I think we can agree that if “reading” is the purpose, these new scans are far closer to the goal than the old ones.

Kindai should be commended for this commitment to higher quality in new additions to the library; I only wish there were the resources to re-digitize everything in the library at this standard.

Why is that important? Not just because it would make an even more convenient, even more usable resource for me and my colleagues. It’s because of the very real danger of losing some of these books. There are few, if any, copies of many of them left outside the NDL’s collection, and many can no longer be viewed at the NDL in any format other than microfilm. It’s not clear to me whether the originals are being protected from the public, or whether the NDL only ever owned the microfilm, the original having been lost at some point. Regardless, for many books, the Kindai scan (or the NDL microfilm it comes from) is the only copy available. If it’s not even fully readable – the most basic level of utility beyond knowing from search results that it exists – then we have failed in our task of preservation, and in our task of creating a digital surrogate in the first place. A surrogate can’t take the place of the original if it can’t mimic it in the most basic ways. Given the fragility of Meiji and Taisho (and early Showa) sources, it’s crucial that we make available the highest-quality digital surrogates we can, as soon as possible, before we no longer can.

* The first few editions of The Complete Works of Higuchi Ichiyō, which feature prominently in my dissertation, are a case in point. I never found a physical copy of the very first edition, actually, even outside the NDL.

more room for annotations

Poking around on the Kindai Digital Library, as I am wont to do, I came across yet another book that leaves ample room for reader annotations without providing any of its own (where they would usually appear). This is a page from 華胥国物語 : 履軒中井先生遺稿:

For comparison, here is a page from Murasaki Shikibu nikki (1892) that does have annotations in that spot:

As you can see, too, there’s quite a difference between working with the first edition of a mid-Meiji book (my photo, immediately above), a microfilm version (not pictured), and a scanned and PDFed version of the microfilm (the first image in this post). Thankful as I am for the Kindai Digital Library, its source material could be a lot better. (Post forthcoming on their new digitization efforts and what a difference they make. I’d like to point out that my photo was taken with Instagram on my iPhone, not some kind of high-quality camera, and it is still higher quality and more readable than most of what is on KDL.)

digital resource: JAIRO

Today I’d like to introduce a digital resource that I’ve found phenomenally helpful in the past: Japan Institutional Repositories Online, or JAIRO.

This is exactly what it sounds like: a federated search for Japanese institutional repositories (IRs), with (of course) downloadable full-text PDFs of the works in the database.* What’s amazing (to me) about JAIRO is that, unlike my stereotype of an IR, it contains not only academic papers but also theses and dissertations (which are likewise included in the University of Michigan’s Deep Blue and many other American IRs), entire books, software, datasets, presentations, conference papers, and various types of bulletin and technical papers. Check it out:

The number of institutions involved in JAIRO is similarly mind-blowing. There’s no total listed on the page, but it’s well over a hundred, including universities ranging from Okinawa Christian Junior College to Waseda University. JAIRO also provides a separate full list of all IRs in Japan – 200 of them – with links to each.

The content tends toward the scientific, but I’ve certainly found a large number of humanities resources. It’s great to have so many “departmental bulletin papers,” as they’re called, because their length and content are comparable to a “normal” journal article, and they’re both current research and much, much easier to get in digital form. I’ve used several in my research already and have found them to be, hands down, the most valuable sources on the topics they cover.**

JAIRO has both a simple and an advanced search, and it’s quite easy to use and browse. Because it’s run by the National Institute of Informatics (NII), it also offers some analysis of data about its own contents; that analysis is used, in turn, to provide links to popular and new materials on the front page.

Compared to the IRs I’ve used in the past, JAIRO’s interface is a miracle of both utility and usability (again, leave it to NII to create something this good): it’s powerful, easy to use, and quickly delivers the content you want. And it adds significant value with even small touches like a list of frequently downloaded material, or its (admittedly small) list of papers related to the Nobel Prize in Chemistry.

JAIRO is a project that falls under the umbrella of the NII Institutional Repositories Program, which also includes the fascinating NII Institutional Repositories Database Contents Analysis, with detailed statistics, graphs, and downloadable TSV files of data on IRs in Japan. JAIRO is also a search target of PORTA, the National Diet Library (NDL)’s digital archive search portal, which I’ve written about previously.

So my question to my readers is this: Is there anything like this resource for American or other English-language IRs? Anything like the PORTA digital archive federated search and portal? These are amazing resources and I only wish that I could search American universities’ IRs in the same powerful way.

* A caveat: I have no idea whether JAIRO searches these multiple databases in real time or has indexed and cached everything for search. (Reader question: does it still count as federated search if it’s not real-time?) Regardless, it retrieves results that would otherwise have to be accessed from over a hundred separate databases on their own individual sites.

** Two that come to mind are on the Meiji revival of Ihara Saikaku, and the posthumous reception of Kitamura Tōkoku.

is it ephemeral?

These days I work largely with sources you would call “ephemeral” in my research. By that, I simply mean “in danger of disappearing easily, or having already done so.” Such sources range from theater playbills and concert programs, to magazines and newspapers, to gum wrappers, signs, and internet forum posts, not to mention non-archived websites and anything that can be lost in a hard drive crash with no backup.* I’m being somewhat narrow-minded by considering “non-ephemeral” sources to be, basically, books – but books are made to persist through time, and they often exist in so many redundant copies that they are de facto preserved by that redundancy.

In any case, I’ve been thinking as I write my dissertation – especially the current chapter – about what happens to ephemera when one decides to preserve it in a non-ephemeral form. Here, I’ll use the example of reprinting something in a book or putting it on microfilm. Not all magazines and newspapers are thrown out completely, although they do tend to be tossed en masse every week throughout the world. Newspaper companies keep archives, and libraries bind periodicals for preservation and – through that – for access and redundancy. Things get microfilmed. Sometimes they are reproduced in traditional bound form, as though they were books to begin with.

I’m working with two authors in particular who published, about 120 years ago, almost solely in magazines that are now extremely hard to get ahold of. I’m studying the act of reprinting those stories in book form – here, in anthologies of the “complete works” of those authors.** I talk a lot about the crucial role that reprinting in anthology form plays in access and preservation: without reprints, these stories, published in sources so easily lost to us, might not have been accessible at all within a few decades of their original publication. The paper of these kinds of publications is rarely very durable, and as time goes on, the surviving owners tend to throw them out – or the executors of their estates do it for them.

In fact, one magazine in particular is an extreme example of ephemerality. It was a handwritten magazine – really, a zine from the 1880s – passed around among the members of a literary club, who annotated it as they went, writing in the margins and then passing it on to the next member, sometimes making their own handwritten copies as well. Its publication and distribution were thus profoundly decentralized, depending entirely on the efforts of the club’s members. Yet they were all quite committed to literature and to each other, and so it was relatively successful – if you can call a magazine with only a few handwritten, hand-circulated copies successful.

The problem with the issues of this magazine (from before it was later printed and sold commercially) is that they are literally no longer available. Garakuta bunko from the late 1880s is simply inaccessible to us as literary scholars and historians. There are no accessible copies, and possibly no surviving copies at all. This was already the case in the early 20th century, when the extant copies had dwindled to a single set held in a private collection; only the tables of contents were published, reprinted in a book on the literary club. Now even that private collection is inaccessible, and all we have left are those reprinted tables of contents.

Why is this important? It is now impossible for me to investigate, for example, early uses of pseudonyms by some of the authors I study, and impossible to read their earliest works to evaluate their first efforts in literature. Because this group became extremely influential from the late 1880s through the early 1900s, this is a big problem for studying its development over time, its roots, its connections with the literature of the late Edo period (1600–1867), and its early influence on others. In short, this work has been rendered impossible and these questions unanswerable.

Even so, as early as the 1920s there were reprints of the later, publicly distributed issues of this magazine. It was a set of only 500 copies, and its preface is extremely telling. Edited by former members of the club, the reprint states its reason unequivocally: the surviving copies are very few, they are limited to the collections of private individuals, and the early works of club members are nearly impossible to get ahold of. The magazine was reprinted for posterity, and for access at the time: there were those who wanted to read the works, and the reprints were made and distributed so that doing so would become possible again.

This is a noble undertaking, and one that is extremely important to our access now. It is reasonable to wonder whether, if not for this early reprint set, even more of Garakuta bunko would have been lost to the ether over time. We have more reprints now, in book form, and thanks to this they are likely to persist. But what if those reprints had had nothing to reprint?

Finally, I come to the sticking point of all of this, prompted by a question from a month or so ago: if ephemeral materials are preserved in such a way – through a digital archive, through photographs, through reprints – does that fundamentally change their nature as ephemera? I don’t have a concrete, definitive answer, but I do think there are two issues at the heart of it. One is practical: the major difference between ephemera and other sources, when attempting to create a digital archive, is that there is even more impetus for careful preservation, because the danger of loss is so high. If a magazine could almost entirely disappear less than 50 years after its initial publication, what does that say about even more volatile materials? We lose a major part of the historical record, and in most cases we will never be able to retrieve it. This means there are historical, cultural, and literary questions that we simply cannot ask – or rather, can never answer. It reduces our understanding of the past and even of the present, given that ephemera can disappear in the blink of an eye, historically speaking.

The other issue is thornier. My answer on reprints and digital reproductions is this: they do not change the status of the source as ephemeral. Rather, I think preservation in some way both attempts to obscure the source’s ephemeral nature and makes it even more evident. What is the need for a reprint, after all, if there is no danger of disappearance? If a work is already persisting through redundancy, is there a need for preservation? And then there is the issue of the reprint fundamentally altering the context, and thus the meaning, of the ephemeral source. That highlights its ephemerality all the more: in trying to recuperate the source’s pre-reprint, pre-preservation context, we cannot help but focus on the fact that we are reprinting ephemera, preserving ephemera.

In other words, we can perhaps think of reprints or digitally archived versions as objects entirely separate from the ephemera they preserve, and this stresses even more the ephemeral nature of what has been preserved. Of course, a work reprinted in book form is less likely to be ephemeral. But what has been reprinted – a serial in a newspaper or magazine – is tremendously so, and this very gap in the nature of the medium is emphasized in the process. These are ephemera, preserved. Preservation does not change the fact that these sources are always, and will always be, in imminent danger of permanent loss.***

Thoughts?

* In fact, I have lost some of these things myself – things I had never considered ephemeral until they were gone. How fragile is an older hard drive full of personal data and artwork? Very. How about things you burn to a CD-ROM for safekeeping? Even worse. A personal website you had a few years ago? If the Internet Archive didn’t grab it, it might as well have never existed. We talk quite a bit these days about the danger of things never being erased once you put them out in public on the Internet, but they’re more endangered than we give them credit for.

** Take that with a grain of salt; “complete” is more aspirational than literal, and it has quite a lot to do with “completely” being able to know or possess the author as an author, rather than with a complete set of the works themselves. I digress.

*** The fact that Garakuta bunko was reprinted in the 1920s, after all, does not change the fact that the original copies of the magazine are in grave danger of being completely lost to us. A reprint is not the same as the source it reprints. The reprint itself, while it may qualify as an ephemeral source (the short print run of the Garakuta bunko reprint suggests as much), is not ephemera. But what it reprints will never stop being ephemeral.

finally: vertical text and aozora on the kindle!

Trying to figure out how to a) display vertical Japanese text on almost anything, and b) get Aozora texts onto my Kindle in a way that makes for pleasant reading has been driving me mad for some time now.

One reason I bought a Kindle, in fact, was to have a convenient way to read books in Japanese. My options otherwise are to order paperbacks from Japan at exorbitant shipping costs, or (especially if the books aren’t available in paperback anyway) to carry around thick photocopies or bad PDF scans of works from large reference anthologies. Neither is a pleasant way to read a book. I love my 文庫本 (bunkobon, small-format paperbacks) just as much as the next person, but I think they’re the major factor in my continually worsening eyesight. If I keep reading them, I’m sure I’ll be blind within five years or so at this rate.

I was going to write a whole post here about how I wish I could get vertical text going (because this is much more comfortable for me to read), and how I was trying to devise some system for automatically converting books to Kindle-sized PDFs or even .mobi format.

Well, someone has – thank god – beaten me to it! I give you the simplest free, web-based system for converting any Aozora book to a Kindle-sized PDF: paste a link from Aozora into a box and download the PDF. It preserves ruby (furigana) and lets you choose a text size. (I recommend 大 (large), because even 中 (medium) was giving me eye strain. Trust me, you don’t need the 文庫本 aesthetic on a Kindle screen.)

And with no further delay, here is the post from the friendly blogger at JapanNewbie who explains it all:

How I Use My Kindle

Please give him a big thanks when you visit!

Here’s a direct link to the PDF conversion site too:

http://a2k.aill.org/