Using Collections – Virtually

I heard a remark the other day that struck a chord – or, churned my butter, a bit.

The gist of it was, “we should make digital facsimiles of our library materials (especially rare materials) and put them online, so they spark library use when people visit to see them in person, after becoming aware of them thanks to the digitized versions.”

Now, at Penn, we have digitized several Japanese collections: the Japanese Juvenile Fiction Collection (aka the Tatsukawa Bunko series), the Japanese Naval Collection (in process, focused on Renshū Kantai training fleet materials), and a miscellaneous collection of Japanese rare books in general.* These materials have been used both in person (thanks to publicizing them, pre- and post-digitization, on library news sites, blogs, and social media, as well as by word of mouth) and digitally by researchers who cannot travel to Penn. In fact, a graduate student in Australia used our juvenile fiction collection for part of his dissertation; another student in Wisconsin plans to use facsimiles of our naval materials once they’re complete; and faculty at the University of Montana have used our digital facsimile of the Meiji-period journal Hōbunkai-sui (or Hōbunkai-shi).

These researchers, due to distance and budget, will likely never be able to visit Penn in person to use the collections. On top of that, some items – like the juvenile fiction and lengthy government documents related to the Imperial Navy – don’t lend themselves to use in a reading room. These aren’t artifacts to look over one page at a time, but research materials that will be read extensively (rather than “intensively,” a distinction we book history folks make). For these researchers, then, digital access is the only way they can use our materials.

The digitization of Japanese collections at Penn has invited use, and a kind of library visit, by making them available to researchers worldwide: not just those at Penn (who could easily view them in person and don’t “need” a digital facsimile), or those who can visit the library to “smell” the books (as the person I paraphrased put it). I think it’s more important to be able to read, research, and use these documents than to smell or witness the material artifact. Of course, there are cases in which one would want to do that, but by and large, our researchers care more about the content and visual aspects of the materials (things that can be captured and conveyed in digital images) than about touching or handling them.

Isn’t this use, just as visiting the library in person is use? Shouldn’t we be tracking visits to our digital collections, downloads, and qualitative stories about their use in research, just as we do a gate count and track circulation? I think so. As we think about the present and future of libraries, and as people comment that libraries are no longer needed because libraries are on our smartphones (like libraries of fake news, right?), we must make the argument for providing content both physically and virtually. Who do people think is providing the content for their digital libraries? Physical libraries, of course! Those collections exist in the real world and come from somewhere, with significant investments of money, time, and labor involved – and moreover, they require the skilled and knowledgeable labor of professionals.

On top of all of this, I feel it is most important to own up to what we can and cannot “control” online: our collections, by virtue of being able to be released at all, are largely in the public domain. Let’s not put CC licenses on them, except for CC0 (which explicitly marks materials as public domain), pretending we can control the images when we have no legal right to do so (though users largely don’t know that). Let’s allow free remixing and use without requiring citation of the digital library or archive the images came from, and without getting upset about posts on Tumblr. When you release public domain materials on the web (or through other services online), you are giving up your exclusive right to control the circumstances under which people use them – and as a cultural heritage institution, it is your role to perform this service for the world.

But not only should we provide this service, we should take credit for it: take credit for use, visits, and for people getting to do whatever they want with our collections. That is really meaningful and impactful use.

* Many thanks to Michael Williams for his great blog posts about our collections!

arsenal of research: organizing citations, PDFs, notes, brainstorming, and drafts

Post title courtesy of the tyrannical Brian Vivier.

Although I post about the content of my research quite a bit (when I do post), I thought I’d take a step back and talk about the research process today. I’m going to write about a very specific aspect: the ways in which the computer helps me organize and engage in my research.

Obviously, there are things like databases and library catalogs, which are a topic for another day. Many people I talk to don’t know the first thing about WorldCat, so it needs to be addressed! But let’s pretend I already have my sources. Now what do I do?

When I read, I’m very traditional. I take notes with pen and paper when I have a book or a photocopied source. In fact, I used to print out PDFs too, and highlight and write in the margins. Well, that turned out to be a terrible idea. Your highlights and margin notes are not very accessible when you’re coming back to the document later to brainstorm, outline, or write.

My lesson learned – learned after many difficult situations – was to take notes as if I’ll never see the source again. My advisor recommended I do this with primary sources, but it works for any source: if you take long notes made up mostly of direct quotes, there’s no need to buy the book or even check it out again, and no need to keep binders and binders of printed-out PDFs. So that’s the kind of note-taking I do with pen and paper, first.

The next step is to get the notes into the computer, because I want them to be 1) stored somewhere safe (I do daily backups to an external hard drive, plus sync; more on that later), 2) searchable, and 3) easy to copy and paste. But where to keep them? How to organize them?

I have gone through several pieces of software trying to figure this out, and I’ve settled on Mendeley. I first used Scrivener even for note-taking; it’s a great program, but bad for citation management. I then tried Zotero, but that turned out to be bad for PDF management. What I really wanted was a good database that would save my citations and any PDFs I happened to have (I’m currently digitizing all of my dissertation sources so they don’t get lost or damaged, and so I can free up my filing cabinet for other things), and ideally let me take notes and even annotate or highlight the PDFs.

Well, despite Mendeley being owned by the devil (Elsevier), it’s free and it actually does everything I need with only a few minor nitpicks, and does it in a way that makes me supremely happy. (My nitpicks are no nested bulleted lists in the notes, and no shortcut keys for bold/italics in the notes.) If you have a PDF attached to your citation and it has OCR, Mendeley’s search function will search not only your citations, notes, and annotations, but also inside the PDFs. It can be overkill at times, but it’s pretty amazing.

So step two of my research organization process is the painstaking, mindless, thankless task of typing my pen-and-paper notes into Mendeley under the appropriate citation. It’s boring but worth it. As I mentioned above, it searches all my notes, and I can copy and paste them into Scrivener, which I will address next. As I type my notes, at the very least I copy and paste them into brainstorming documents as appropriate (usually full quotes), and if I’m up to it, I do some free-writing to brainstorm how the source informs my topic and what I could write about related to it. This usually brings up new ideas I didn’t know I had.

What happens after I get all the notes typed in, and the PDFs organized and annotated if I have them? I move over to Scrivener. I’ve been using it for over five years, for both research and creative writing, and can’t sing its praises enough. It’s a word processor that creates a database for your project, where you can store your reference materials, brainstorming ideas, notes, and drafts, plus any other categories of notes you can think of. Unlike older versions of Scrivener (from when I first started using it), you can now add footnotes and comments that carry straight over to MS Word when you compile your document for it, making the transition to a final draft in Word very easy. (Sadly, publishers seem to prefer things that are not Scrivener databases when reviewing.) The typical things I store are the draft itself (of course); a research diary of brainstorming that I update periodically; brainstorming specifically about sources and particular concepts or points; and, under the “Notes” section, the comments, suggestions, and draft corrections I receive from others. So I keep my full writing process, except for mind mapping/concept mapping (another post), all in one place. It’s amazing.

I’m extremely happy with these two pieces of software; my only complaint is that neither of them does everything I want, so I have to use the two in tandem. Still, the situation is significantly better than it was several years ago, when I used Mendeley Alpha and it deleted my entire library of citations multiple times. Yikes. Now its syncing works perfectly and I haven’t had a library failure yet. (Fingers crossed.)

Next posts will include mind mapping software, how I take notes, how to effectively find and import source citations, and how I deal with multiple languages in my citations.

digital surrogates and utility

As someone who studies the history of the book, often as an object in itself, my research tends to require that I go look at books in person. However, I use the Kindai Digital Library quite regularly as a way to survey what exists (although I fully realize how incomplete Kindai is), and indeed, I would never have found my research topic without being able to preview books using this digital library.

The point is, I previewed the books using Kindai, and then got on a plane to Japan to actually study the books for my research. I had to locate a physical copy and literally get my hands on it in order to understand how it was made, what impression it would make on readers, and who its intended audience was. (For example, how well-made is it? Does it have color illustrations or text? What’s the quality of the paper like? Does it feel or look cheap? How is the binding? None of these questions can be answered from the black-and-white copy in Kindai.)

The history of the Kindai Digital Library is interesting: it’s a digitization project undertaken by the National Diet Library (NDL) and based on the same collection as the Maruzen Meiji Microfilm, namely books microfilmed and owned by the NDL. Neither covers the entire collection of Meiji books that the NDL owns, it’s not clear (to me, anyway) whether Kindai and Maruzen are coextensive, and the NDL’s collection does not contain every book published in the Meiji period. So, yes, it has limitations: it’s not every book from the Meiji period, and it’s scanned microfilm in black-and-white, not grayscale.

But the Kindai Digital Library, unlike the Maruzen microfilm collection, is being added to continuously, and out-of-copyright books from the Taisho and Showa periods (1912-1989) are also being scanned and included in the collection. These newer additions are being digitized directly from the books themselves, rather than with microfilm as an intermediate step. Check out the difference between these two books by Wakamatsu Shizuko, published in 1897 (color) and 1894 (black and white):

Sure, there is a big impressionistic difference in seeing a full-color cover illustration versus a black-and-white scan of what used to be a color cover. But you can see from these images that it’s very difficult to judge the book’s quality and condition from the monochrome image, whereas the higher-quality color image captures things like discolorations on the paper and the quality of the cloth binding (not pictured here).

This makes all the difference for someone doing my kind of research: if I had scanned copies of the anthologies I study that are as good as the color book above, it’s likely that I could still do decent research – if incomplete – without going to Japan to look at these books in person. With the higher-quality color image, the digital surrogate has become a usable surrogate for me, a reasonable facsimile if you will. It provides me with enough information to be able to draw conclusions about more than just the content of the book.

This matters for more than book historians, however. One reason that Kindai Digital Library is so great is that it provides digital surrogates of the full text of books, not just their covers. Every page that is available is scanned, either from microfilm or from the book itself, and provided for viewing online – and, if you have the patience, as a PDF download a few pages at a time. Yet compare these images, again from the 1897 and 1894 books introduced above. Click to view the full size so you can see the quality of the text in each. They are both at 25% zoom in Kindai’s page viewer.

Here, you can appreciate the difficulty of reading the monochrome text, and this is an exceptionally clear example. The books I have read excerpts from on Kindai (with difficulty) are typically of much lower quality, and many characters are difficult to make out. Zooming in doesn’t help, because the quality of the image itself is relatively low.

On the other hand, there are the newer additions with higher-quality surrogates, such as this color book. Of course, it’s not necessary to have color pages to read a text that was originally printed in black and white, but including values other than pure black or white increases readability by allowing for a higher-quality image. It also keeps text clearer when zoomed out, at, say, 33% (a zoom level at which the monochrome text would look terrible).

As you can see, the point is that the newer Kindai texts are more usable than the older ones, not just prettier. They demonstrate that there is a point where a digital surrogate becomes a usable surrogate, where it becomes “good enough” to live up to its name. Of course, “usable” depends on the purpose, but I think we can agree that if reading is the purpose, these new scans come far closer to the goal than the old ones.

Kindai should be commended for this commitment to higher quality in new additions to the library; I only wish there were the resources to re-digitize everything in the library at this standard.

Why is this important? It’s not just because it would make Kindai an even more convenient, and more usable, resource for my colleagues and me. It’s because of the very real danger of losing some of these books. There are few, if any, copies of many of them left outside of the NDL’s collection, and many of them can no longer be viewed at the NDL in any format other than microfilm. It’s not clear to me whether the originals are being protected from the public, or if NDL actually only owns the microfilm, with the original lost to time at some point. Regardless, for many books, the Kindai scan (or NDL microfilm, its source) is the only copy of the book available.* If it’s not even fully readable (the most basic level of utility beyond knowing from search results that it exists), then we have failed in our task of preservation, and in our task of creating a digital surrogate in the first place. A surrogate can’t take the place of the original if it can’t mimic it in the most basic ways. Given the fragility of Meiji and Taisho (and early Showa) sources, it’s crucial that we make available the highest-quality digital surrogates we can, and as soon as possible, before we no longer can.

*The first few editions of The Complete Works of Higuchi Ichiyo, which feature prominently in my dissertation, are a case of this. I never found a physical copy of the very first edition, actually, even outside of NDL.