Taiyō project: first steps with data

As I begin working on my project involving Taiyō magazine, I thought I’d document what I’m doing so others can see the process of cleaning the data I’ve gotten, and then experimenting with it. This is the first part in that series: first steps with data, cleaning it, and getting it ready for analysis. If I have the Taiyō data in “plain text,” what’s there to clean? Oh, you have no idea.


#dayofDH Meiroku zasshi 明六雑誌 project

It’s come to my attention that Fukuzawa Yukichi’s (and others’) early Meiji (1868-1912) journal, Meiroku zasshi 明六雑誌, is available online not just as PDF (which I knew about) but also as a fully tagged XML corpus from NINJAL (and oh my god, it has lemmas). All right!


I recently met up with Mark Ravina at the Association for Asian Studies, who brought this to my attention, and we are doing a lot of brainstorming about what we can do with this as a proof-of-concept project, and then move on to other early Meiji documents. We have big ideas like training OCR to recognize the difference between the katakana and kanji 二, for example; Meiji documents generally break OCR for various reasons like this, because they’re so different from contemporary Japanese. It’s like asking Acrobat to handle a medieval manuscript, in some ways.

But to start, we want to run the contents of Meiroku zasshi through tools like MALLET and Voyant, just to see how they do with non-Western languages (I don’t expect any problems, but we’ll see) and what we get out of it. I’d also be interested in going back to the Stanford CoreNLP API and seeing what kind of linguistic analysis we can do there. (First, I have to think of a methodology. :O)

In order to do this, we need text in which the words are separated by spaces. I’ve written about this elsewhere, but to sum up, Japanese is written without spaces between words, so tools intended for Western languages think it’s all one big word. I can’t currently find any easy way to do this splitting; I’m working on an application that both strips ruby from Aozora bunko texts AND splits words with a space, but it’s coming slowly. How can we get this for Meiroku zasshi in a quick and dirty way that lets us just play with the data?
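
To make the problem concrete, here is a minimal Python sketch of what a whitespace-based tool sees. The example sentence and its hand-segmented version are purely illustrative (they are not from the corpus), and whitespace_tokens is a made-up helper name.

```python
def whitespace_tokens(text):
    """Split text the way a naive whitespace-based tool would."""
    # str.split() with no arguments splits on any run of whitespace
    return text.split()


# Unsegmented Japanese: no spaces, so the whole sentence is one "word"
unsegmented = "吾輩は猫である"
print(len(whitespace_tokens(unsegmented)))

# The same sentence segmented by hand (illustrative only)
segmented = "吾輩 は 猫 で ある"
print(whitespace_tokens(segmented))
```

The first call returns a single token holding the entire sentence, while the second returns one token per word. That second form is the kind of input the rest of this post is trying to produce.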

So today after work, I’m going to use Python’s ElementTree library for XML (xml.etree.ElementTree) to take the contents of the word tags from the corpus and just spit them into a text file delimited by spaces. Quick and dirty! I’ve been meaning to do this for weeks, but since it’s a “day of DH,” I thought I’d use the opportunity to motivate myself. Then, we can play.
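
A minimal sketch of what that extraction might look like, assuming the words sit in elements named w. That tag name is a placeholder: the corpus’s actual element names, and any XML namespace it uses, would need to be checked against the NINJAL files. The sample string is made up for illustration, and words_from_xml is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def words_from_xml(xml_text, word_tag="w"):
    """Return the text of every <word_tag> element, joined by single spaces.

    word_tag is an assumed placeholder, not the corpus's documented tag name.
    """
    root = ET.fromstring(xml_text)
    words = []
    for elem in root.iter(word_tag):
        # itertext() also collects text held in any child elements
        text = "".join(elem.itertext()).strip()
        if text:
            words.append(text)
    return " ".join(words)


# Made-up sample standing in for one tagged sentence
sample = "<doc><s><w>明六</w><w>雑誌</w><w>を</w><w>読む</w></s></doc>"
print(words_from_xml(sample))
```

For a real file, the root element could come from ET.parse(path).getroot() instead of ET.fromstring, and the returned string could be written out with an explicit UTF-8 encoding so that MALLET or Voyant can read it.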

Exciting stuff, this corpus. Unfortunately most of NINJAL’s other amazing corpora are available only on CD-ROMs that work on old versions of Windows. Sigh. But I’ll work with what I’ve got.

So that’s your update from the world of Japanese text analysis.

Introducing Waseda bungaku #2 早稲田文学第二次

Waseda bungaku, the literary magazine of Waseda University (Tokyo Senmon Gakkō until 1902), was originally published in the 1880s by famed writer and theater critic (and professor) Tsubouchi Shōyō, and ceased publication in the 1890s. It was started up again by his successors, explicitly in his honor and in that of the original magazine, in 1906, and went until 1927. This, as opposed to the first run (dai ichi-ji) is known now as the second series or run, dai ni-ji. It’s since gone through a number of changes in ji and is on dai-jūji (#10) in its current form – it’s still a running literary magazine today.

I’m particularly interested in this second run of the magazine because of its content, as well as its clear intent to do honor to the original, influential mid-Meiji (1868-1912) periodical. As I’ve touched on in previous posts, it’s highly nostalgic, with articles not only on current novels but on earlier Meiji works, and memories of the writers regarding their literary and social groups from their youths in the 1880s and early 1890s. There were some special Meiji literature issues (特別号) that came out in expanded form and cost significantly more than the typical issue, but even the other issues are full of memories, not just current concerns.

The publisher of the magazine, Tōkyōdō, is also of interest to me, and I’m currently starting to look into the relationship between this commercial publisher and the academic interest group behind Waseda bungaku. Surprisingly to me, there is quite a lot published (in a relative sense, and relative to my expectations) on both Waseda University and Tōkyōdō itself. (Including great titles like A Stroll Through 100 Years of Tōkyōdō History.) I’m fast checking these books out and they’re becoming a growing mountain on my office bookshelves, with a significant amount of space taken up by four volumes of the 9-volume set 100 Years of Waseda University History.

Why am I so interested in this publishing history? Well, I recently received the 1929 Meiji bungaku kenkyū, which is ostensibly (according to catalog records, anyway) a reprint edition of the special Meiji literature issues of Waseda bungaku. However, when I examined the two-volume set itself, it turned out to be a set of rebound issues – original covers and advertisements and all, bound up in hardcovers. Even the preface refers to new binding (新装) specifically, rather than a new printing or a collection. It’s extremely explicit that it’s a literal collection of old magazine issues.

The fact that Tōkyōdō seems to have rebound its overstock in 1929, two years after the journal ceased, and sold it at relatively low prices (5 yen for the set) is interesting enough, but what is even better is the fact that the advertisements are not from 1925, when the first issues included were originally published, but from 1927. Even more interesting, they’re Meiji-focused, largely for the series Meiji bungaku meicho zenshū, a collection of “famous writers” of Meiji literature (which I’ve posted on previously). These are obviously reprinted issues of the magazine from 1927, two years after their original publication date, and have had current advertisements related to the content of the issues (remember, “special Meiji literature” issues) inserted into them instead of the original 1925 ads for things like books written by the journal editors on Western philosophers. (By “original” I’m referring actually to a reproduction I have of these same issues with 1925 ads, but am not actually sure if these are from “originals” as in first printings, or if these are also later printings that have been reproduced.)

So this indicates that not only are these overstock that Tōkyōdō wanted to try to sell off in a repackaged format (“as a resource for future Meiji scholars” rather than “old issues of a literary magazine from four years ago”), but they were later printings than the 1925 original first printings. This means that there was enough interest in and demand for the Meiji special issues, whether at the time or after the fact, for them to be reissued by a commercial publisher whose goal is to make money off of them. There must have been such demand that the publisher saw profit in it.

This brings me back to previous posts about interest in Meiji, Meiji nostalgia, and Meiji and Meiji literature themselves as “things” to be studied, as fields, newly invented post-Meiji and specifically in the late 1920s. (Even if this isn’t the first appearance of the phrase “Meiji literature,” I’d still argue that as a “thing,” it really came into being at this time in terms of being popular, published, studied, and talked about.) There is obviously a market and demand for things Meiji at this time, testified to by both the reissued magazines and their rebinding, packaging, and marketing to “scholars.” I’m still on the fence about what the interest in Meiji actually meant – was it really scholarly work as these collections advertise themselves, or was it something about grasping onto recently lived past and lost youth? Or perhaps both?

Meiji nostalgia: the 1910s-1920s

I’m always struck by the nostalgia for the Meiji period (1868-1912) that I find even before the end of Meiji, but especially in what ramps up from the 1910s to the late 1920s, in particular with the reprinting of literary coterie Ken’yūsha’s Garakuta bunko (late 1880s) in 1927, the re-publication of Waseda bungaku’s special Meiji articles and issues in the form of Meiji bungaku kenkyū in 1929, and the publication of Meiji bungaku meicho zenshū (The Complete Collection of Famous Meiji Literary Writers) from 1926. It’s something about this late-20s flurry of Meiji activity, plus what precedes it in the literary journal Waseda bungaku, that fascinates the part of me that is interested in archives and social memory.*

Why social memory? Well, Waseda bungaku, the literary journal of Waseda University (started by Tsubouchi Shōyō in the 1880s-1890s, then on hiatus until 1906, restarting in that year – late Meiji), contains a huge number of articles written by surviving members of Meiji literary groups about their memories and their friends, long or recently dead, and their reminiscences of the early days of those groups and associated publications. Shimazaki Tōson writes of the founding and early period of literary magazine Bungakkai and its coterie in the early 1890s, Kōda Rohan writes of the death and life of Awashima Kangetsu, and Emi Suiin writes volumes about Ken’yūsha and its early and late history.

In fact, Suiin not only wrote these lengthy articles, he also penned the book Meiji bundanshi – jiko chūshin (A History of the Meiji Literary World – Focused on Myself) in 1927, and another, Ken’yūsha to Kōyō (Ken’yūsha and [Ozaki] Kōyō) in the same year. These are focused entirely on his memories of his life in the Meiji literary world, including big shot Ozaki Kōyō, Ken’yūsha’s founder and one of the most popular and influential writers of the mid-Meiji period (d. 1902). His books, coincidentally – or perhaps not – came out in the very same year as a reproduction of Ken’yūsha’s first literary magazine, Garakuta bunko, reprinted by an individual (Kaneyama Fumio) with the express purpose of providing more material to Meiji literary scholars interested in that coterie’s activities, for whom the archives were dwindling if they existed at all. Likewise, in 1927 an article appeared in Waseda bungaku on Ken’yūsha’s somewhat later Edo murasaki magazine, testifying to renewed (if perhaps not sustained) interest in that coterie’s publications and, importantly, that specific time period of the early Meiji 20s (late 1880s-early 1890s).

Just two years later, in 1929, a publication came out that commemorated the 27th anniversary of Ozaki Kōyō’s death with a special society pamphlet, for lack of a better word (kaishi 会誌). Why it’s the 27th anniversary is anyone’s guess (or, if I’m missing something culturally significant, please fill me in!).

I recently received a fascinating set of books for my library that collects the “Meiji issues” (Meiji bungaku gō) of Waseda bungaku from 1925-1927, and was published in 1929. It appears to be bound volumes of individual, original Waseda bungaku issues, although there is a discrepancy between those and the reproduction of the “originals” that also arrived – the ads are different, and the ones in the “1925” issues all date from 1927 or later. Leaving this fascinating publishing story aside for the time being, let’s take a look at the preface. Just as with the Garakuta bunko reprints, the editor (Honma Hisao) of Waseda bungaku and these volumes claims that there is a dearth of material for those studying “Meiji literature” and in order to help future scholars, it is a mission of “a magazine with a tradition stretching back into the Meiji period” (i.e., Waseda bungaku) to collect its issues in a gappon 合本 and re-release them to the public.

As Michael Williams pointed out to me, these aren’t even primary sources on Meiji literature – the volumes contain Taisho and Showa writing on Meiji. But I think there’s a particular draw, an almost-primary-source quality, because the articles are by and large written by other Meiji big shots (if not the deceased Kōyō himself) such as Rohan and Tōson and Suiin, and they’re about those Meiji memories and Meiji experiences. They’re social memories of Meiji, giving the reader a direct connection to events and literature of the past through the firsthand experiences of the writers.

So is it really about a lack of Meiji sources? Possibly, but unlikely. Meiji literature was being reprinted and recirculated both in single-volume form as well as in zenshū, or “complete” literary collections, of various kinds. I think it’s more a mixture of nostalgia and fear of the experiences and memories of the period disappearing, perhaps along with the fires that accompanied the 1923 Great Kanto Earthquake, and along with those who were dying, like Awashima Kangetsu had only a few years before. It was a time when the original Ken’yūsha members were old and dying off, when major Meiji figures were disappearing and no longer accessible – and no longer surrounded by others who could also remember the time of their youth.

I have one other tidbit to add to the Meiji nostalgia boom of the late 20s. The series I referenced above, Meiji bungaku meicho zenshū, was published in 12 volumes from 1926-1927 and there are publisher advertising leaflets for it stuffed into the books that make up Meiji bungaku kenkyū (the Meiji re-issues of Waseda bungaku discussed above). One is nearly poster-sized. The books that make up the series, save for Kōyō’s Irozange and Rohan’s Fūryūbutsu, are largely forgotten now, and it even includes one translation by Morita Shiken. Yet it’s a “scholarly resource” including explications, criticism, photographs, and illustrations – not exactly nostalgic. But I’d argue that it’s the context in which I find those leaflets that makes them intimate parts of the fabric of Meiji social memory: they’re reprints of the very books that the writers of the nostalgic essays would have read in their youths, and supply the means to remember Meiji through direct experience in 1927, 15 years after the end of the period in 1912.

All of this Meiji-related publishing activity, I see as a flurry of nostalgia for and fear of the loss of Meiji memories, of Meiji experiences, and ultimately of the memories of the writers’ and publishers’ very youth itself. These actions bind up inextricably the institutions of archives (personal and official), publication (private and commercial), remembering (individually and socially), and commemorating – creating the very idea of “Meiji” and “Meiji literature,” an idea that can never be severed, at least in the late 1920s, from the memory and social fabric of those Meiji survivors still living.


* Actually, I came to my dissertation research topic – literary anthologies of the recently deceased – through a course entitled “Archives and Institutions of Social Memory.”

Meiroku zasshi (明六雑誌) now available online

The Meiji periodical founded and written by Fukuzawa Yukichi and others, Meiroku zasshi 明六雑誌, has now been put online in full text – or rather, page images. They’re available in both JPG and PDF format. This is a great resource for Meiji researchers, as it’s not exactly easy to get ahold of this 1874-1875 periodical otherwise. And let me tell you, these are high quality color images, highly readable, and you can even get a sense of the texture of the page. It’s a beautiful digitization and a valuable project.

You can access it at the 明六雑誌画像 website.

New issue of D-Lib magazine

D-Lib magazine has just published their most recent issue, available at http://www.dlib.org

This looks to be a great issue, with a number of fascinating articles on dissertations and theses in institutional repositories, using Wikipedia to increase awareness of digital collections, MOOCs, and automatic ordering of items based on reading lists.

Please check it out! All articles are available in full-text on the site.

new magazine: yū

I came across a new magazine online recently that, as always, makes me wish I were still in Japan so I could grab a copy myself. It’s called Yū 幽, or spirit in my translation – and by spirit I mean the supernatural.

In case you can’t guess, it’s all about the supernatural and ghostly, and is your typical “literary” magazine in Japan – some fiction (short enough for a single issue, usually), plus essays and other relevant short non-fiction. When in Japan (and now, through my sizeable collection of back issues) I consumed these kinds of magazines regularly. I would say voraciously, but it makes for somewhat slow reading given that it’s literary fiction not in my native language. Still, I love magazines, and I love this type in particular. (Some of my favorites in Japan are Bungakkai and Yom Yom.)

Best of all, Yū has a fantastic web site: Web Yoo. It has a number of blogs, including by authors that write for the magazine, about related books, and ones that have news about current and upcoming issues. They even have their own supernatural fiction prize, 幽怪談文学賞. (Never quite sure how to translate that one; I like to use “weird” as in “weird tales” of the early 20th century here in the US.)

Please check it out, especially if you’re in Japan and can get ahold of it. At the very least, you’ll be treated to great content and some seriously fantastic images and typography on the web site.

is it ephemeral?

I work largely with sources that you would call “ephemeral” in my research these days. By that, I simply mean things that are in danger of disappearing easily, or that have already done so. Things prone to disappearing range from theater playbills and concert programs to magazines and newspapers, to gum wrappers and signs and internet forum posts, not to mention non-archived Web sites and things that can be lost easily in a hard drive crash with no backup.* I’m being somewhat narrow-minded by considering “non-ephemeral” sources to basically be books, but they are made for persistence through time, and they are often so redundant that they are de facto preserved through this.

In any case, I’ve been thinking as I write my dissertation, especially the current chapter that I’m working on, about what happens to ephemera when one decides to preserve it in a non-ephemeral form. Here, I’ll use the example of reprinting something in a book or putting it on microfilm. Not all magazines and newspapers are thrown out completely, although they do tend to be tossed out en masse every week throughout the world. Newspaper companies keep archives and libraries bind periodicals for preservation and (through) access and redundancy. Things get microfilmed. Sometimes they are reproduced in a traditional bound form at some point, as though they were books to begin with.

I’m working with two authors in particular who published, about 120 years ago, almost solely in magazines that are now extremely hard to get ahold of. I’m studying the act of reprinting those stories in book form, here in anthologies of the “complete works” of those authors.** I talk a lot about the crucial role that reprinting in the form of an anthology plays in access and preservation: without reprints, these stories, published in sources that are very easily lost to us, might never have been accessible at all a few decades after their original publication. The paper of these types of publications is rarely very durable and as time goes on, the surviving owners of the publications tend to throw them out, or the executors of their estates do it for them.

In fact, one magazine in particular is an extreme example of ephemerality. It was a handwritten magazine – really, a zine from the 1880s – that was passed around between members of a literary club, who annotated it as they went along, writing in the margins and then passing it on to the next member, sometimes making their own handwritten copies as well. In this way, the publication and distribution was profoundly decentralized and depended entirely on the efforts of the members of that club. Yet, they were all quite committed to literature and to each other, and so it was relatively successful – if you can call a magazine with only a few hand-written, hand-circulated copies successful.

The problem with the issues of this magazine (before it later was printed and sold commercially) is that they are literally no longer available. Garakuta bunko from the late 1880s is simply inaccessible to us as literary scholars and historians. There are no accessible copies, and possibly no surviving copies at all. This was the case even in the early 20th century, when the extant copies dwindled to a single set held in a private collection; only the tables of contents were published, reprinted in a book on the literary club. Now, even that private collection is inaccessible, and all we have left are those reprinted tables of contents.

Why is this important? It is now impossible for me to investigate, for example, early uses of pseudonyms by some of the authors that I study, and impossible to read their earliest works to evaluate their first efforts in literature. As this group became extremely influential from the late 1880s through the early 1900s, this is a big problem for studying its development over time, its roots, its connections with the literature of the late Edo period (1600-1867), and its early influence on others. In short, this work has been rendered impossible and these questions unanswerable.

Even as early as the 1920s, there were reprints of the publicly distributed, later issues of this magazine. It was a set of only 500 copies and its preface is extremely telling. Edited by former members of the club, the reason for the reprint is stated unequivocally: the number of surviving copies is very few, they are limited to the collections of private individuals, and the early works of club members are nearly impossible to get ahold of. It has been reprinted for posterity and for access at the time of the reprints. There are those who would like to read the works, and the reprints are made and distributed so it becomes possible again to do this.

This is a noble undertaking, and one that is extremely important to our access now. It is reasonable to wonder whether, if not for this early reprint set, even more of Garakuta bunko would be lost to the ether over time. We have more reprints now, in book form, and they are likely to persist through time thanks to this. But what if those reprints had nothing to reprint?

Finally, I come to the sticking point of all of this. It’s prompted by a question from a month or so ago: if ephemeral materials are preserved in such a way, through a digital archive, through photographs, through reprints, does that fundamentally change their nature as ephemera? I don’t have a concrete, definitive answer to this, but I do think there are two issues at the heart of this. One is a practical issue – the major difference between ephemera and other sources when attempting to create a digital archive is that there is even more impetus for careful preservation, because the danger of loss is so high. If a magazine could almost entirely disappear less than 50 years after its initial publication, what does that say about even more volatile materials? We lose a major part of the historical record and in most cases we will be unable to ever retrieve it. This means that there are historical, cultural, and literary questions that we simply cannot ask – or rather, can never answer. It reduces our understanding of the past and even of the present, given that ephemera can disappear in the blink of an eye, historically speaking.

The other issue is thornier. My answer on reprints or digital reproductions is this: it does not change the status of the source as ephemeral. Rather, I think that in some way it both attempts to obscure its ephemeral nature, and yet also makes it even more evident. What is the need for a reprint, after all, if there is no danger of disappearance? If a work is already persisting through redundancy, is there a need for preservation? And there is the issue of the reprint fundamentally altering the context, and thus the meaning, of that ephemeral source. That highlights even more its ephemeral nature, because by recuperating its pre-reprint context, its pre-preservation context, we cannot help but focus on its ephemeral nature, because we are reprinting ephemera, preserving ephemera.

In other words, we can perhaps think of reprints or digitally archived versions as separate objects entirely from the ephemera that they preserve, and this stresses even more the ephemeral nature of what has been preserved. Of course, a work reprinted in book form is less likely to be ephemeral. But what has been reprinted, a serial in a newspaper or in a magazine, is tremendously so, and this very gap in the nature of the medium is emphasized in the process. These are ephemera, preserved. Preservation does not change the fact that these sources are always, will always be, in imminent danger of permanent loss.***


* In fact, I have lost some of these things that I had never considered ephemeral until they were gone. How fragile is an older hard drive full of personal data and artwork? Very. How about things you burn to a CD-ROM for safekeeping? Even worse. A personal web site that you had a few years ago? If the Internet Archive didn’t grab it, it might as well never have existed. We talk quite a bit these days about the danger of things never being erased if you put them out in public, on the Internet, but they’re more endangered than we give them credit for.

** Take that with a grain of salt; “complete” is more aspirational than literal, and it has quite a lot to do with “completely” being able to know or possess the author as an author, rather than a complete set of works in themselves. I digress.

*** The fact that Garakuta bunko was reprinted in the 1920s, after all, does not change the fact that the original copies of the magazine are in grave danger of being completely lost to us. A reprint is not the same as the source that it reprints. The reprint, if not an ephemeral source in itself (this short print run of the Garakuta bunko reprint suggests that it can qualify as such), is not ephemera. But what it reprints will never stop being ephemeral.

literature in fashion

Now here’s something you don’t see every day – a photoshoot dedicated to literary-inspired fashion in Vogue.

‘Summer Reading Inspired by the Fall Collections’

Oh wait, of course, I have that the other way around. Some creative Vogue employee(s) actually dreamed up novels FOR these outfits.

Now that is impressive. Seriously!

Being that I’m in Nebraska for the summer, I did a cheer for Willa Cather and My Antonia. The first one at that!

quick note: digital reading coverage in Eureka 8/10

Eureka, a monthly poetry and criticism publication in Japanese, has a theme of “reading digital materials” for the August 2010 issue. If you’re in a position to do so, I recommend picking it up. There are a lot of interesting perspectives in here. Not least is the fact that it specifies “reading materials,” not “books,” and that kind of take on digital reading vs. print reading isn’t something I see enough of in English-language coverage.

Not to mention that Japan is living proof that the magazine industry is not only not dead, but will never die – at least not here. I had to wade through literally hundreds of different magazines in a corner bookstore in Ueno station to find my copy of this one.

The info in Japanese is ユリイカ2010年8月号・特集「電子書籍を読む!」 (“let’s read digital stuff!”) If anyone has a more eloquent translation for 書籍 please leave it in the comments. I am coming up empty at the moment.