Category Archives: japanese literature

Rethinking Adaptation in Meiji Japan

I recently read J. Scott Miller’s Adaptations of Western Literature in Meiji Japan (New York: Palgrave, 2001) and am full of Thoughts on Meiji writers, literature, zeitgeist, continuity, and adaptation. Let me express some of them here.


Pre-processing Japanese literature for text analysis

I recently wrote a small script that performs a few pre-processing steps on Aozora Bunko texts (files of public-domain, modern Japanese literature and non-fiction) so they can be used with Western-oriented text analysis tools, such as Voyant, other TAPoR tools, and MALLET. Whereas Japanese text analysis software focuses largely on linguistics (tagging parts of speech, lemmatizing, etc.), Western tools open up possibilities for visualization, concordances, topic modeling, and various other modes of analysis.

Why do these Aozora texts need to be processed? Well, there are a few issues.

  1. They contain ruby: short glosses attached to Chinese characters that give their pronunciation. These can be straightforward pronunciation help, or entirely different words that add meaning and context. While I have my issues with removing ruby, it’s impossible to do straightforward tool-based analysis without removing it, and many people who want to do this kind of analysis want it removed.
  2. The Aozora files are not exactly plain text: they’re HTML. The HTML tags and Aozora metadata (telling where the text came from, for example) need to be removed before analysis can be performed.
  3. There are no spaces between words in Japanese, but Western text analysis tools identify words by looking at where there are spaces. Without inserting spaces, it looks like each line is one big word. So I needed to insert spaces between the Japanese words.

How did I do it? My approach, given my background and expertise, was to write a Python script using a couple of helpful libraries: BeautifulSoup for ruby removal based on HTML tags, and TinySegmenter for inserting spaces between words. The script requires you to have these packages installed, but that’s not a big deal to do. You then run the script from the command line. It looks for all .html files in a directory, loads each one and runs the pre-processing, then outputs each processed file under the same filename with a .txt extension, as a plain-text UTF-8 encoded file.

The first step in the script is to remove the ruby. Helpfully, the ruby is contained within a few dedicated HTML tags. I had BeautifulSoup traverse the file and remove all elements contained within those tags; it removes both the tags and their content.
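
For illustration, here’s a minimal sketch of that step – the tag names (ruby, rt, rp) are the standard HTML ruby markup, though the exact set my script targets may differ slightly:

```python
from bs4 import BeautifulSoup

def strip_ruby(html):
    """Remove ruby glosses from Aozora HTML, keeping the base text."""
    soup = BeautifulSoup(html, "html.parser")
    # The gloss itself lives in <rt> (and the <rp> fallback parentheses);
    # decompose() removes each tag and everything inside it.
    for tag in soup.find_all(["rt", "rp"]):
        tag.decompose()
    # Unwrap <ruby> so the glossed base characters remain as plain text.
    for tag in soup.find_all("ruby"):
        tag.unwrap()
    return str(soup)
```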

Next, I used a very simple regular expression to remove everything in angle brackets – that is, the remaining HTML tags. This is quick and dirty, and won’t work on every file in the universe, but in Aozora texts everything inside angle brackets is an HTML tag, so it’s not a problem here.
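
In code, that quick-and-dirty pass is a single substitution (a hypothetical helper, not necessarily the exact expression I used):

```python
import re

def strip_tags(text):
    """Crudely remove anything in angle brackets, i.e. the remaining HTML tags."""
    return re.sub(r"<[^>]*>", "", text)
```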

Finally, I used TinySegmenter on the resulting HTML-free text to split the text into words. Luckily for me, it returns an array of words – basically, each word is a separate element in a list like [‘word1’, ‘word2’, … ‘wordn’] for n words. This makes my life easy for two reasons. First, I simply joined the array with a space between each word, creating one long string (the outputted text) with spaces between each element in the array (words). Second, it made it easy to just remove the part of the array that contains Aozora metadata before creating that string. Again, this is quick and dirty, but from examining the files I noted that the metadata always comes at the end of the file and begins with the word 底本 (‘source text’). Remove that word and everything after it, and then you have a metadata-free file.
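
As a rough sketch of that step – assuming the Python port of TinySegmenter, whose tokenize() returns a list of words, and assuming 底本 comes back as a single token (if it doesn’t, you would cut the metadata from the raw string before segmenting):

```python
import tinysegmenter

segmenter = tinysegmenter.TinySegmenter()

def segment_and_trim(text):
    """Split Japanese text into words and drop the trailing Aozora metadata."""
    words = segmenter.tokenize(text)
    # The bibliographic metadata always begins with 底本 ('source text');
    # keep only what comes before it.
    if "底本" in words:
        words = words[:words.index("底本")]
    # Join with spaces so Western tools can see word boundaries.
    return " ".join(words)
```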

Write this resulting text into a plain text file, and you have a non-ruby, non-HTML, metadata-free, whitespace-delimited Aozora text! Although you still have to download all the Aozora files individually and then do what you will with the resulting individual text files, this is an easy way to pre-process the text and get it ready for tool-based (and also your-own-program-based) text analysis.
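
Putting the pieces together, the driver is just a loop over a directory. Here’s a sketch using the hypothetical helpers above; it assumes the Aozora HTML files are Shift_JIS encoded (typical, but worth verifying per file):

```python
import glob
import os

for path in glob.glob("*.html"):
    # Aozora HTML files are usually Shift_JIS; adjust the encoding if yours differ.
    with open(path, encoding="shift_jis", errors="ignore") as f:
        html = f.read()
    processed = segment_and_trim(strip_tags(strip_ruby(html)))
    # Same filename, .txt extension, UTF-8 plain text.
    with open(os.path.splitext(path)[0] + ".txt", "w", encoding="utf-8") as f:
        f.write(processed)
```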

I plan to put the script on GitHub for your perusal and use (and of course modification) but for now, check it out on my Japanese Text Analysis research guide at Penn.

#dayofDH Meiroku zasshi 明六雑誌 project

It’s come to my attention that Fukuzawa Yukichi’s (and others’) early Meiji (1868-1912) journal, Meiroku zasshi 明六雑誌, is available online not just as PDF (which I knew about) but also as a fully tagged XML corpus from NINJAL (and oh my god, it has lemmas). All right!


I recently met up with Mark Ravina at the Association for Asian Studies conference, who brought this to my attention, and we are doing a lot of brainstorming about what we can do with this as a proof-of-concept project before moving on to other early Meiji documents. We have big ideas, like training OCR to recognize the difference between the katakana ニ and the kanji 二, for example; Meiji documents generally break OCR for reasons like this, because they’re so different from contemporary Japanese. It’s like asking Acrobat to handle a medieval manuscript, in some ways.

But to start, we want to run the contents of Meiroku zasshi through tools like MALLET and Voyant, just to see how they do with non-Western languages (I don’t expect any problems, but we’ll see) and what we get out of it. I’d also be interested in going back to the Stanford CoreNLP API and seeing what kind of linguistic analysis we can do there. (First, I have to think of a methodology. :O)

In order to do this, we need whitespace-delimited text, with words separated by spaces. I’ve written about this elsewhere, but to sum up: Japanese is not written with spaces between words, so tools intended for Western languages treat whole passages as one big word. There are currently no easy ways I can find to do this splitting; I’m currently working on an application that both strips ruby from Aozora bunko texts AND splits words with a space, but it’s coming along slowly. How do we get this with Meiroku zasshi in a quick and dirty way that lets us just play with the data?

So today after work, I’m going to use Python’s ElementTree library (xml.etree.ElementTree) to take the contents of the word tags from the corpus and just spit them into a text file delimited by spaces. Quick and dirty! I’ve been meaning to do this for weeks, but since it’s a “day of DH,” I thought I’d use the opportunity to motivate myself. Then, we can play.
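
For what it’s worth, a minimal sketch of that extraction – the word-element tag name below is a placeholder, since I haven’t yet confirmed what the NINJAL markup actually calls it (and namespaces may complicate matters):

```python
import glob
import xml.etree.ElementTree as ET

WORD_TAG = "w"  # placeholder: substitute whatever element the NINJAL corpus uses for a word

with open("meiroku_words.txt", "w", encoding="utf-8") as out:
    for path in sorted(glob.glob("*.xml")):
        tree = ET.parse(path)
        # iter() walks the whole tree in document order and yields every matching element.
        words = [el.text for el in tree.getroot().iter(WORD_TAG) if el.text]
        out.write(" ".join(words) + "\n")
```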

Exciting stuff, this corpus. Unfortunately most of NINJAL’s other amazing corpora are available only on CD-ROMs that work on old versions of Windows. Sigh. But I’ll work with what I’ve got.

So that’s your update from the world of Japanese text analysis.

#dayofDH Japanese apps workshop for new Penn students

Today, we’re having a day in the library for prospective and new Penn students who will (hopefully) join our community in the fall. As part of the library presentations, I’ve been asked to talk about Japanese mobile apps, especially for language learning.

While I don’t necessarily consider this a DH thing, some people do, and it’s one way I integrate technology into my job – through workshops and research guides on various digital resources. (More on that later.)

I gave this presentation for librarians at the National Coordinating Council on Japanese Library Resources (NCC) workshop held before the Council on East Asian Libraries conference a few weeks ago, in March 2014. My focus was perhaps too basic for a savvy crowd that uses foreign languages frequently in their work: I covered the procedure for setting up international keyboards on Android and iOS devices, dictionaries, news apps, language learning assistance, and Aozora bunko readers. However, I did manage to impart some lesser-known information: how to enable the Japanese and other foreign-language dictionaries that are built into iOS devices for free. I got some thanks for that one. Also noted was the Aozora 2 Kindle PDF-maker.

Today, I’ll focus more on language learning and the basics of setting up international keyboards. I’ve been surprised at the number of people who don’t know how to do this, but not everyone uses foreign languages on their devices regularly, and on top of that, not everyone loves to poke around deep in the settings of their computer or device. And keyboard switching on Android can be especially tricky, with apps like Simeji. So perhaps covering the basics is a good idea after all.

I don’t have a huge amount of contact with undergrads compared to the reference librarians here, and my workshops tend to be focused on graduate students and faculty with Japanese language skills. So I look forward to working with a new community of pre-undergrads and seeing what their needs and desires are from the library.

politics and anthologizing

In this past year, I’ve spent a lot of time thinking about how the form of the anthologies I study (individual-author literary anthologies in Japan at the turn of the 20th century) shapes the possibilities of reading and interpretation. I’ve also commented at a couple of conferences that the narratives of who these authors “belong” to have been shaped and guided in these anthologies, and have written that taking works out of their original contexts fundamentally erases a part of their meaning (in terms of the ways readers encounter them) and simultaneously alters the works’ received meaning.

After doing some reading this morning, I realized that one thing links these various threads in anthologies, and it’s a word I wasn’t using: politics.

I want to talk specifically about the example of Higuchi Ichiyō. For much of her career, she wrote for the magazine Bungakukai (among others), which was a driver of the first Romantic movement in Japan. In her anthologies, of course, her serial works from that magazine are included as whole pieces, as though they had been wholes from the outset, which has its own implications for reading. But the other piece of this is that just as the editors were writing the Bungakukai coterie’s social and ideological connections out of her career in their prefaces, they simultaneously erased this connection – this fundamental supplier of meaning – from her works by taking them out of their original Romantic context.

The first readers of Ichiyō’s works would have seen them embedded in theory and poetry heavily influenced by western Romanticism, including translations of English works and illustrations of faded ruins and statuary. The readers of her individual anthology, as well as reprints in wider circulation magazines such as Bungei kurabu before her death, would have encountered a very different context: in the magazines, other “modern” mainstream Japanese literature (presented as unaffiliated with any coterie or group other than the influential publishers of the magazines), and in the anthology, Ichiyō’s own works as a cohesive and self-contained whole. No longer would her work be infused, by virtue of proximity, with the politics of literature at the time she wrote in the early-to-mid 1890s. She becomes depoliticized, ironically despite the heavily social and what I would call political themes of her work: that is, the plight of the lower class and the inequity of Japanese society at the turn of the 20th century.

Especially in her second anthology, published in 1912, Ichiyō becomes a timeless woman writer, an elegant author of prose and poetry whose works are infused with tragedy – just as her poverty-stricken life was, to paraphrase the editors of the two volumes. Yet it is not a structural tragedy that pervades society, as it is in her work, but a personal, elegant, and heart-wrenching individual tragedy, one that makes her work even more poignant without necessarily having political implications. I can’t speak to the Romantic movement’s attitude toward this kind of theme found in Bungakukai, not being as familiar with its politics as I should be, but I can say that Kitamura Tōkoku – the founder of Bungakukai – basically started his career with the publication of Soshū no shi, a piece of “new-form” poetry about a prisoner, written at the height of his political involvement in the late 1880s.

So there is an association, simply by virtue of publishing in the same venues, between Ichiyō’s politics and those of Tōkoku, and the literary politics of the Romantic movement vis-à-vis the multitude of other ideologies of writing that existed at the time. Yet in her anthologies, this politics disappears and her context is lost entirely, in favor of a new context of Ichiyō alone, her works standing as something self-contained, without interference from the outside world. It is a profound depoliticization, and something to think about in considering other anthologies as well – early ones in Japan, current ones, and those found elsewhere in the world.

Japanese tokenization – tools and trials

I’ve been looking (okay, not looking, wishing) for a Japanese tokenizer for a while now, and today I decided to sit down and do some research into what’s out there. It didn’t take long – things have improved recently.

I found two tools quickly: kuromoji Japanese morphological analyzer and the U-Tokenizer CJK Tokenizer API.

First off – so what is tokenization? Basically, it’s separating sentences by words, or documents by sentences, or any text by some unit, to be able to chunk that text into parts and analyze them (or do other things with them). When you tokenize a document by word, like a web page, you enable searching: this is how Google finds individual words in documents. You can also find keywords in a document this way, by writing an algorithm to choose the most meaningful nouns, for example. It’s also the first step in more involved linguistic analysis like part-of-speech tagging (that is, marking individual words as nouns, verbs, and so on) and lemmatizing (paring words down to their stems, such as removing plural markers and un-conjugating verbs).

This gives you a taste of why tokenization is so fundamental and important for text analysis. It’s what lets you break up an otherwise unintelligible (to the computer) string of characters into units that the computer can attempt to analyze. It can index them, search them, categorize them, group them, visualize them, and so on. Without this, you’re stuck with “words” that are entire sentences or documents, that the computer thinks are individual units based on the fact that they’re one long string of characters.

Usually, the way you tokenize is to break up “words” based on spaces (or sentences based on punctuation rules, etc., although that doesn’t always work). (I put “words” in quotes because you can really make any kind of unit you want, the computer doesn’t understand what words are, and in the end it doesn’t matter. I’m using “words” as an example here.) However, for languages like Japanese and Chinese (and to a lesser extent Korean) that don’t use spaces to delimit all words (for example, in Korean particles are attached to nouns with no space in between, like saying “athome” instead of “at home”), you run into problems quickly. How to break up texts into words when there’s no easy way to distinguish between them?
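
To make the problem concrete, here is what naive whitespace splitting does to an English sentence versus a Japanese one (a toy example):

```python
english = "the cat sat on the mat"
japanese = "猫がマットの上に座った"  # "the cat sat on the mat"

print(english.split())   # ['the', 'cat', 'sat', 'on', 'the', 'mat']
print(japanese.split())  # ['猫がマットの上に座った'] -- one giant "word"
```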

The question of tokenizing Japanese may be a linguistic debate. I don’t know enough about linguistics to begin to participate in it, if it is. But I’ll quickly say that you can break up Japanese based on linguistic rules and dictionary rules – understanding which character compounds are nouns, which verb conjugations go with which verb stems (as opposed to being particles in between words), then breaking up common particles into their own units. This appears to be how these tools are doing it. For my own purposes, I’m not as interested in linguistic patterns as I am in noun and verb usage (the meaning rather than the kind) so linguistic nitpicking won’t be my area anyway.

Moving on to the tools. I put them through the wringer: Higuchi Ichiyō’s Ame no yoru, the first two lines, from Aozora bunko.

One, kuromoji, is the tokenizer behind Solr and Lucene. It does a fairly good job, although with Ichiyō’s uncommon word usage and conjugation, it faltered and couldn’t figure out that 高やか is one word; rather it divided it into 高 や か.  It gives the base form, reading, and pronunciation, but nothing else. However, in the version that ships with Solr/Lucene, it lemmatizes. Would that ever make me happy. (That’s, again, reducing a word to its base form, making it easy to count all instances of both “people” and “person” for example, if you’re just after meaning.) I would kill for this feature to be integrated with the below tool.

The other, U-Tokenizer, did significantly better, but its major drawback is that it’s done in the form of an HTTP request, meaning that you can’t put in entire documents (well, maybe you could? how much can you pass in an HTTP request?). If it were downloadable code with an API, I would be very happy (kuromoji is downloadable and has a command line interface). U-Tokenizer figured out that 高やか is one word, and also provides a list of “keywords,” which as far as I can tell is a bunch of salient nouns. I used it for a very short piece of text, so I can’t comment on how many keywords it would come up with for an entire document. The documentation on this is sparse, and it’s not open source, so it’s impossible to know what it’s doing. Still, it’s a fantastic tool, and also seems to work decently for Chinese and Korean.

Each of these tools has its strengths, and both are quite usable for modern and contemporary Japanese. (I really was cruel to feed them Ichiyō.) However, there is a major trial involved in using them with freely-available corpora like Aozora bunko. Guess what? Preprocessing ruby.

Aozora texts contain ruby marked up within the documents. I have my issues with stripping ruby out of documents that use it heavily (Meiji writers, for example) because it adds so much meaning to the text, but let’s say for argument’s sake that we’re not interested in the ruby. Now it’s time to cut it all out. If I were a regular expressions wizard (or even had basic competency with them) I could probably strip it out easily, but it’s still time consuming. Download text, strip out ruby and other metadata, save as plain text. (Aozora texts are XHTML, NOT “plain text” as they’re often touted to be.) Repeat. For topic modeling using a tool like MALLET, you’re going to want hundreds of documents by the end of it. For example, you might be downloading all Meiji novels from Aozora and dividing them into chunks or chapters. Even the complete works of Natsume Sōseki aren’t enough without cutting them down into chapters or even paragraphs to make enough documents to use a topic modeling tool effectively. Possibly, run all of these through a part-of-speech tagger like KH Coder. This is going to take a significant amount of time.
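
The chunking step, at least, is mechanical. Here’s a sketch that splits an already ruby- and markup-free text into pseudo-documents of N paragraphs each for a topic modeling tool – the chunk size is an arbitrary knob, not anything principled:

```python
def chunk_by_paragraphs(text, paras_per_chunk=20):
    """Split a cleaned text into pseudo-documents of N paragraphs each."""
    paragraphs = [p for p in text.split("\n") if p.strip()]
    return ["\n".join(paragraphs[i:i + paras_per_chunk])
            for i in range(0, len(paragraphs), paras_per_chunk)]
```

Each chunk would then be written out as its own file before being fed to something like MALLET.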

Then again, preprocessing is an essential and extremely time-consuming part of almost any text analysis project. I went through a moderate amount of work just removing Project Gutenberg metadata from, and dividing into chapters, a set of travel narratives that I downloaded in plain text – thankfully not in HTML or XML, which made for easy processing. With something that’s not already real plain text, with a lot of metadata, and with a lot of ruby, it’s going to take much more time and effort, which is more typical of a project like this. The digital humanities are a lot of manual labor, despite the glamorous image and the idea that computers can do much of that labor for us. Computers are a little finicky about what they’ll accept. (Granted, I’ll be using a script to strip out the XHTML and ruby tags, but it’s going to take work for me to write it in the first place.)

In conclusion? Text analysis, despite exciting available tools, is still hard and time consuming. There is a lot of potential here, but I also see myself going through some trials to get to the fun part, the experimentation. Still, stay tuned, especially for some follow-up posts on these tools and KH Coder as I become more familiar with them. And, I promise to stop being difficult and giving them Ichiyō’s Meiji-style bungo.

Introducing Waseda bungaku #2 早稲田文学第二次

Waseda bungaku, the literary magazine of Waseda University (Tokyo Senmon Gakkō until 1902), was originally founded in the early 1890s by the famed writer, theater critic, and professor Tsubouchi Shōyō, and ceased publication before the end of that decade. It was started up again by his successors, explicitly in his honor and in that of the original magazine, in 1906, and ran until 1927. This, as opposed to the first run (dai ichi-ji), is now known as the second series or run, dai ni-ji. The magazine has since gone through a number of changes in ji and is on dai-jūji (#10) in its current form – it’s still a running literary magazine today.

I’m particularly interested in this second run of the magazine because of its content, as well as its clear intent to do honor to the original, influential mid-Meiji (1868-1912) periodical. As I’ve touched on in previous posts, it’s highly nostalgic, with articles not only on current novels but on earlier Meiji works, and memories of the writers regarding their literary and social groups from their youths in the 1880s and early 1890s. There were some special Meiji literature issues (特別号) that came out in expanded form and cost significantly more than the typical issue, but even the other issues are full of memories, not just current concerns.

The publisher of the magazine, Tōkyōdō, is also of interest to me, and I’m currently starting to look into the relationship between this commercial publisher and the academic interest group behind Waseda bungaku. Surprisingly to me, there is quite a lot published (in a relative sense, and relative to my expectations) on both Waseda University and Tōkyōdō itself. (Including great titles like A Stroll Through 100 Years of Tōkyōdō History.) I’m fast checking these books out, and they’re becoming a growing mountain on my office bookshelves, with a significant amount of space taken up by four volumes of the 9-volume set 100 Years of Waseda University History.

Why am I so interested in this publishing history? Well, I recently received the 1929 Meiji bungaku kenkyū, which is ostensibly (according to catalog records, anyway) a reprint edition of the special Meiji literature issues of Waseda bungaku. However, when I examined the two-volume set itself, it’s a set of rebound issues – original covers and advertisements and all, bound up in hardcovers. Even the preface refers to new binding (新装) specifically, rather than a new printing or a collection. It’s extremely explicit that it’s a literal collection of old magazine issues.

The fact that Tōkyōdō seems to have rebound its overstock in 1929, two years after the journal ceased, and sold it at relatively low prices (5 yen for the set) is interesting enough, but what is even better is the fact that the advertisements are not from 1925, when the first issues included were originally published, but from 1927. Even more interesting, they’re Meiji-focused, largely for the series Meiji bungaku meicho zenshū, a collection of “famous writers” of Meiji literature (which I’ve posted on previously). These are obviously reprinted issues of the magazine from 1927, two years after their original publication date, and have had current advertisements related to the content of the issues (remember, “special Meiji literature” issues) inserted into them instead of the original 1925 ads for things like books written by the journal editors on Western philosophers. (By “original” I’m referring actually to a reproduction I have of these same issues with 1925 ads, but am not actually sure if these are from “originals” as in first printings, or if these are also later printings that have been reproduced.)

So this indicates that not only are these overstock that Tōkyōdō wanted to try to sell off in a repackaged format (“as a resource for future Meiji scholars” rather than “old issues of a literary magazine from four years ago”), but they were later printings than the 1925 original first printings. This means that there was enough interest in and demand for the Meiji special issues, whether at the time or after the fact, for them to be reissued by a commercial publisher whose goal is to make money off of them. There must have been such demand that the publisher saw profit in it.

This brings me back to previous posts about interest in Meiji, Meiji nostalgia, and Meiji and Meiji literature themselves as “things” to be studied, as fields, newly invented post-Meiji and specifically in the late 1920s. (Even if this isn’t the first appearance of the phrase “Meiji literature,” I’d still argue that as a “thing,” it really came into being at this time in terms of being popular, published, studied, and talked about.) There is obviously a market and demand for things Meiji at this time, testified to by both the reissued magazines and their rebinding, packaging, and marketing to “scholars.” I’m still on the fence about what the interest in Meiji actually meant – was it really scholarly work as these collections advertise themselves, or was it something about grasping onto recently lived past and lost youth? Or perhaps both?

digital surrogates and utility

As someone who studies the history of the book, often as an object in itself, my research tends to require that I go look at books in person. However, I use the Kindai Digital Library quite regularly as a way to survey what exists (although I fully realize how incomplete Kindai is), and indeed, I would never have found my research topic without being able to preview books using this digital library.

The point is, I previewed the books using Kindai, and then got on a plane to Japan to actually study the books for my research. I had to locate a physical copy and literally get my hands on it, in order to understand how it was made, what impression it would make on readers, and its intended audience. (For example, how well-made is it? Does it have color illustrations or text? What’s the quality of the paper like? Does it feel or look cheap? How is the binding? None of these questions can be answered from the black-and-white copy in Kindai.)

The history of the Kindai Digital Library is interesting: it’s a digitization project undertaken by the National Diet Library and based in the same collection as the Maruzen Meiji Microfilm: books microfilmed and owned by the NDL. Neither covers the entire collection of Meiji books that the NDL owns, it’s not clear (to me anyway) whether Kindai and Maruzen are coextensive, and the NDL’s collection does not contain every book published in the Meiji period. So, yes, it has limitations – it’s not every book from the Meiji period, and it’s scanned microfilm in black-and-white, not grayscale.

But the Kindai Digital Library, unlike the Maruzen microfilm collection, is being added to continuously, and out-of-copyright books from the Taisho and Showa periods (1912-1989) are also being scanned and included in the collection. For the newer additions, the books themselves are being digitized, rather than passing through microfilm as an intermediate step. Check out the difference between these two books by Wakamatsu Shizuko, published in 1897 (color) and 1894 (black and white):

Sure, there is a big impressionistic difference in seeing a full-color cover illustration versus a black-and-white scan of what used to be a color cover. But you can see from these images that it’s very difficult to tell the quality and condition of the monochrome image, versus the higher-quality color image that captures things like discolorations on paper and the quality of the cloth binding (not pictured here).

This makes all the difference for someone doing my kind of research: if I had scanned copies of the anthologies I study that are as good as the color book above, it’s likely that I could still do decent research – if incomplete – without going to Japan to look at these books in person. With the higher-quality color image, the digital surrogate has become a usable surrogate for me, a reasonable facsimile if you will. It provides me with enough information to be able to draw conclusions about more than just the content of the book.

This matters for more than book historians, however. One reason that Kindai Digital Library is so great is that it provides digital surrogates of the full text of books, not just their covers. Every page that is available is scanned, either from microfilm or from the book itself, and provided for viewing online – and, if you have the patience, as a PDF download a few pages at a time. Yet compare these images, again from the 1897 and 1894 books introduced above. Click to view the full size so you can see the quality of the text in each. They are both at 25% zoom in Kindai’s page viewer.

 

Here, you can appreciate the difficulty of reading the monochrome text – and this is an exceptionally clear one. The books I have read (with difficulty) excerpts from on Kindai are typically much lower quality and many characters are difficult to make out. Zooming in doesn’t help, because the quality of the image itself is relatively low.

On the other hand, you have the newer additions with higher-quality surrogates such as this color book. Of course, it’s not necessary to have color pages to read a text that was originally printed in black and white, but the inclusion of values other than straight black or white increases readability by allowing for a higher quality image. It also allows for clearer text when zooming out, viewing at say, 33% (a percentage where the monochrome text would look terrible).

As you can see, the point is that the newer Kindai texts are more usable than the older ones, not just prettier. They express the idea that there is a point where a digital surrogate becomes a usable surrogate, where it becomes “good enough” to live up to its name. Of course, “usable” depends on the purpose, but I think we can agree that if “reading” is the purpose, these new scans are far closer to the goal than the old ones.

Kindai should be commended for this commitment to higher quality in new additions to the library; I only wish there were the resources to re-digitize everything in the library at this standard.

Why is it important to do so? It’s not just because it would be an even more convenient, even more usable resource for myself and my colleagues. It’s because of the very real danger of losing some of these books. There are few, if any, copies of many of them left outside of the NDL’s collection, and many of them can no longer be viewed at the NDL in any format other than microfilm. It’s not clear to me whether the originals are being protected from the public, or whether the NDL actually only owns the microfilm, the originals having been lost to time at some point. Regardless, for many books, the Kindai scan (or the NDL microfilm it comes from) is the only copy of the book available. If it’s not even fully readable – the most basic level of utility beyond knowing from search results that it exists – then we have failed in our task of preservation, and in our task of creating a digital surrogate in the first place. A surrogate can’t take the place of the original if it can’t mimic it in the most basic ways. Given the fragility of Meiji and Taisho (and early Showa) sources, it’s crucial that we make available the highest-quality digital surrogates we can, and as soon as possible, before we no longer can.

*The first few editions of The Complete Works of Higuchi Ichiyo, which feature prominently in my dissertation, are a case of this. I never found a physical copy of the very first edition, actually, even outside of NDL.

more room for annotations

Poking around on the Kindai Digital Library, as I am wont to do, I came across yet another book that leaves ample room for reader annotations without providing any of its own (where they would usually appear). This is a page from 華胥国物語 : 履軒中井先生遺稿:

For comparison, here is a page from Murasaki Shikibu nikki (1892) that does have annotations in that spot:

As you can see, too, there’s quite a difference between working with the first edition of a mid-Meiji book (my photo, immediately above), a microfilm version (not pictured), and a scanned and PDFed version of the microfilm version (the first image in this post). Thankful as I am for the Kindai Digital Library, its source material could be a lot better. (Post forthcoming on their new efforts to digitize and what a difference it makes. I’d like to point out that the photo was taken with Instagram on my iPhone, not some kind of high-quality camera, and yet is still higher quality and more readable than most of what is on KDL.)