Convert docs with OS X terminal

I’m teaching a workshop on Japanese text mining this week and am getting all kinds of interesting practical questions that I don’t know the answer to. Today, I was asked if it’s possible to batch convert .docx files to .txt in Windows.

I don’t know Windows, but I do know Mac OS, so I discovered that one can use textutil in the terminal to do this. Just run this line to convert .docx -> .txt:

textutil -convert txt /path/to/DOCX/files/*.docx

You can convert to a bunch of different formats, including txt, html, rtf, rtfd, doc, docx, wordml, odt, or webarchive. It puts the files in the same directory as the source files. That’s it: enjoy!

* Note: This worked fine with UTF-8 files using Japanese, so I assume it just works with UTF-8 in general. YMMV.
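
If you want a count of what was converted, or want to stop at the first failure, the one-liner can be wrapped in a small loop. This is only a minimal sketch, assuming macOS with textutil available. The names convert_docx_dir and TEXTUTIL are placeholders made up for this example, not part of textutil. The -encoding option is there to make the UTF-8 output explicit; as far as I can tell from the man page, UTF-8 is already the default, so treat that flag as optional.

```shell
#!/bin/sh
# Sketch: convert every .docx in one folder to plain text with textutil (macOS).
# TEXTUTIL and convert_docx_dir are hypothetical names chosen for this example.
TEXTUTIL="${TEXTUTIL:-textutil}"

convert_docx_dir() {
    dir="$1"
    count=0
    for f in "$dir"/*.docx; do
        # When nothing matches, the glob pattern stays literal, so skip it.
        [ -e "$f" ] || continue
        # The .txt file is written next to the source file.
        "$TEXTUTIL" -convert txt -encoding UTF-8 "$f" || return 1
        count=$((count + 1))
    done
    echo "converted $count file(s)"
}
```

To use it, paste the function into a terminal session (or save it in a file and source it), then run: convert_docx_dir /path/to/DOCX/files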

Writing Process: NaNoWriMo and Me

I’ve been meaning to write about my writing process for quite a while now and am surprised, looking back through my blog archives, that I have not yet addressed it.

This post could alternately be titled “How NaNoWriMo Enabled Me to Write My Dissertation in Three and a Half Months” or “The Importance of NaNoWriMo for Academic Writing.” Or just “Do NaNoWriMo at Least Once, People.”

NaNoWriMo stands for “National Novel Writing Month” and has been going since the turn of the twenty-first century. I’ve done it myself since 2002, most years. No, I don’t have a published novel, and in fact I only finished two of them in that time. (And the first one didn’t even “win” – the only criterion for winning is having a file containing 50,000 words – because it came in at about 40,000 words when it was done. Oh well. It’s my best and first finished work, so I’m cool with it. In fact, I’m still working on revising that work and trying to cut a version of it into a 10,000-word short story.) But man, what I got out of it.

NaNoWriMo taught me how to write. I don’t mean how to write well, or grammar or mechanics or plot or anything like that. It taught me how to put words on the page. And, after all, that is the first step to writing something. You have to just start making words.

Taiyō project: first steps with data

As I begin working on my project involving Taiyō magazine, I thought I’d document what I’m doing so others can see the process of cleaning the data I’ve gotten, and then experimenting with it. This is the first part in that series: first steps with data, cleaning it, and getting it ready for analysis. If I have the Taiyō data in “plain text,” what’s there to clean? Oh, you have no idea.


WORD LAB: a room with a whiteboard

Several years ago, I attended Digital Humanities 2011 at Stanford and had the opportunity to meet with Franco Moretti. When Franco asked what I was interested in, I admitted that I badly wanted to see the Literary Lab I’d heard so much about, and seen so much interesting research come out of. He laughed and said he’d show it to me, but that I shouldn’t get too excited.

Why? Because Literary Lab is a windowless conference room in the middle of the English department at Stanford. Literary Lab is a room with a whiteboard.

I couldn’t have been more excited, to Franco’s amusement.

A room with a whiteboard. A room dedicated to talking about projects, to collaborating, to bringing a laptop and getting research done, and to sharing and brainstorming via drawing and notes up on a wall, not on a piece of paper or a shared document. It was an important moment for me.

When I was in graduate school, I’d tossed around a number of projects with colleagues, and gotten excited about a lot of them. But they always petered out, lost momentum, and disappeared. This is surely due to busy schedules and competing projects – not least the dissertation – but I think it’s also partly due to logistics.

Much as our work has gone online, and despite these being digital projects – just like Literary Lab’s research – a physical space is still hugely important. A space to talk, a space to brainstorm and draw and write, a space to work together: a space to keep things going.

I had been turning this over in my head ever since I met with Franco, but never had the opportunity to put my idea into action. Then I came to Penn, and met a like-minded colleague who got just as excited about the idea of dedicated space and collective work on projects as I was.

Our boss thought the idea of a room with a whiteboard was funny, just as Franco had thought my low standards were kind of silly. But you know what? You don’t need a budget to create ideas and momentum. You don’t need a budget to stimulate discussion and cross-disciplinary cooperation. You just need space and time, and willing participants who can make use of it. We made a proposal, got the go-ahead, and took advantage of a new room in our Kislak Center at Penn that was free for an hour and a half a week. It was enough: the Vitale II lab is a room with a whiteboard. It even has giant TVs to hook up a laptop.

Thus, WORD LAB was born: a text-analysis interest group that just needed space to meet, and people to populate it. We recruited hard, mailing every department and discipline list we could think of, and got a mind-boggling 15+ people at the first meeting, plus the organizers and some interested library staff, from across the university. The room was full.

That was the beginning of September 2014. WORD LAB is still going strong, with more formal presentations every other week, interspersed with journal club/coding tutorials/etc. in OPEN LAB on the other weeks. We get a regular attendance of at least 7-10 people a week, and the faces keep changing. It’s a group of Asianists, an Islamic law scholar, Annenberg School for Communication researchers, political scientists, psychologists, and librarians, some belonging to more than one group. We’ve had presentations from Penn staff and from researchers at other universities in the region, and we have Skype presentations coming up from Chicago and Northeastern.

A room with a whiteboard has turned into a budding cross-disciplinary, cross-professional text analysis interest community at Penn.

academic death squad

Are you interested in joining a supportive academic community online? A place to share ideas, brainstorming, motivation and inspiration, and if you’re comfortable, your drafts and freewriting and blogging for critique? If so, Academic Death Squad may be for you.

This is a Google group that I believe can be accessed publicly (although I’ve had some issues with signing up with non-Gmail addresses), though you appear to have to be logged in to Google to view the group’s page. Just put in a request to join and I’ll approve you. Or, if that doesn’t work, email me at mdesjardin (at)

Link: [Academic Death Squad]

I’m trying to get as many disciplines and geographic/chronological areas involved as possible, so all are welcome. And I especially would love to have diversity in careers, mixing in tenure-track faculty, adjuncts, grad students, staff broadly interpreted, librarians, museum curators, and independent scholars – and any other career path you can think of. Many of us not in grad student or faculty land have very little institutional support for academic research, so let’s support each other virtually.

In fact, one member has already posted a publication-ready article draft for last-minute comments, so we even have a little activity already!

Best regards and best wishes for this group. Please email me or comment on this post if you have questions, concerns, or suggestions.


*footnote: The name originally came from a group I ran called “Creative Death Squad,” but the real origin is an amazing t-shirt I used to own in Pittsburgh that read “412 Vegan Death Squad” and had a picture of a skull with a carrot driven through it. I hope the name connotes badass-ness, serious commitment to our research, and some casual levity. Take it as you will.

arsenal of research: organizing citations, PDFs, notes, brainstorming, and drafts

Post title courtesy of the tyrannical Brian Vivier.

Although I post about the content of my research quite a bit (when I do post), I thought I’d take a step back and talk about the research process today. I’m going to write about a very specific aspect: the ways in which the computer helps me organize and engage in my research.

Obviously, there are things like databases and library catalogs, which are a topic for another day. Many people I talk to don’t know the first thing about WorldCat, so it needs to be addressed! But let’s pretend I already have my sources. Now what do I do?

When I read, I’m very traditional. I take notes with pen and paper when I have a book or a photocopied source. In fact, I used to print out PDFs too, and highlight and write in the margins. Well, that turned out to be a terrible idea. Your highlights and margin notes are not very accessible when you’re coming back to the document later to brainstorm, outline, or write.

My lesson learned – learned after many difficult situations – was to take notes like I’m never going to see the source again. My advisor recommended I do this with primary sources, but if you take long notes that involve mostly direct quotes from the sources, there’s no need to buy the book or really even check it out again. There’s no need to keep binders and binders of printed-out PDFs. So that’s the kind of note-taking I do with pen and paper, first.

The next step is to get them into the computer, because I want them to be 1) stored somewhere safe (I do daily external HD backups, plus sync, more later on that), and 2) searchable, and also 3) copy and paste-able. But where to keep them? How to organize?

I have gone through several pieces of software trying to figure this out, and I’ve settled on Mendeley. I first used Scrivener even for note-taking, which is a great program, but bad for citation management. I then tried Zotero, but that turned out to be bad for PDF management. What I really wanted was a good database that would save my citations, any PDFs I happened to have (I’m currently digitizing all of my sources from my dissertation so they don’t get lost or damaged, and so I can free up my filing cabinet for other things), and ideally let me take notes and even annotate or highlight the PDFs.

Well, despite Mendeley being owned by the devil (Elsevier), it’s free and it actually does everything I need with only a few minor nitpicks, and does it in a way that makes me supremely happy. (My nitpicks are no nested bulleted lists in the notes, and no shortcut keys for bold/italics in the notes.) If you have a PDF attached to your citation and it has OCR, Mendeley’s search function will search not only your citations, notes, and annotations, but also inside the PDFs. It can be overkill at times, but it’s pretty amazing.

So step two of my research organization process is the painstaking, mindless, thankless task of typing my pen-and-paper notes into Mendeley under the appropriate citation. It’s boring but worth it. As I mentioned above, it searches all my notes, and I can copy and paste them into Scrivener, which I will address next. As I type my notes, at the very least I copy and paste them into brainstorming documents as appropriate (usually full quotes), and if I’m up to it, I do some free-writing to brainstorm how the source informs my topic and what I could write about related to it. This usually brings up new ideas I didn’t know I had.

What happens after I get all the notes typed in, PDFs organized and annotated if I have them? I next move over to Scrivener. I’ve been using it for over five years, for both research and creative writing, and can’t sing its praises enough. It’s a word processor that creates a database for your project, where you can store your reference materials, brainstorming ideas, notes, and draft. And more, if you can think of other areas you need to record notes in. Unlike old Scrivener (when I first started using it), you can now add footnotes and comments that port straight to MS Word when you compile your document for it, making the transition to final draft in Word very easy. (Sadly, publishers seem to prefer things that are not Scrivener databases when reviewing.) The typical things I store are the draft itself (of course), a research diary of brainstorming that I update periodically, brainstorming specifically about sources and particular concepts or points, and also under the “Notes” section the comments and suggestions and draft corrections I receive from others. So I keep my full writing process, except for mind mapping/concept mapping (another post), all in one place. It’s amazing.

I’m extremely happy with these two pieces of software; my only complaint is that neither of them does all of what I want, so I have to use the two together. Well, the situation is still significantly better than several years ago, when I used Mendeley Alpha and it deleted my entire library of citations multiple times. Yikes. Now its syncing works perfectly and I haven’t had a library failure yet. (Fingers crossed.)

Next posts will include mind mapping software, how I take notes, how to effectively find and import source citations, and how I deal with multiple languages in my citations.

#dayofDH Japanese apps workshop for new Penn students

Today, we’re having a day in the library for prospective and new Penn students who will (hopefully) join our community in the fall. As part of the library presentations, I’ve been asked to talk about Japanese mobile apps, especially for language learning.

While I don’t consider this necessarily a DH thing, some people do, and it’s a way that I integrate technology into my job – through workshops and research guides on various digital resources. (More on that later.)

I gave this workshop to librarians at the North American Coordinating Council on Japanese Library Resources (NCC) workshop, held before the Council on East Asian Libraries conference a few weeks ago in March 2014. My focus was perhaps too basic for a savvy crowd that uses foreign languages frequently in their work: I covered the procedure for setting up international keyboards on Android and iOS devices, dictionaries, news apps, language learning assistance, and Aozora bunko readers. However, I did manage to impart some lesser-known information: how to set up the Japanese and other language dictionaries that are built into iOS devices for free. I got some thanks on that one. Also noted was the Aozora 2 Kindle PDF-maker.

Today, I’ll focus more on language learning and the basics of setting up international keyboards. I’ve been surprised at the number of people who don’t know how to do this, but not everyone uses foreign languages on their devices regularly, and on top of that, not everyone loves to poke around deep in the settings of their computer or device. And keyboard switching on Android can be especially tricky, with apps like Simeji. So perhaps covering the basics is a good idea after all.

I don’t have a huge amount of contact with undergrads compared to the reference librarians here, and my workshops tend to be focused on graduate students and faculty with Japanese language skills. So I look forward to working with a new community of pre-undergrads and seeing what their needs and desires are from the library.

#DayofDH Good morning and self introduction

Cross-posted from Day of DH Wasting Gold Paper

I’m up early on this Day of DH 2014. So much to do!

I thought I’d introduce myself to you all, so you have an idea of my background. I’m not your typical DH practitioner – I’m not in the academy (in a traditional way) and I’m also not working with Western-language materials. My concerns don’t always apply to English-language text or European medieval manuscripts. So, if you looked in Asia I’d be less remarkable, but here in the English-language DH world I don’t run across many people like myself.

Anyway, good morning; I’m Molly, the Japanese Studies Librarian at the University of Pennsylvania, also managing the Korean collection. That means that I take care of everything – from collection development to reference and instruction – that has to do with Japan/Korea, or is in Japanese/Korean, at the library and beyond.


Let’s start off with my background. I went to college at University of Pittsburgh for Computer Science and History (Asian history of course) and studied Japanese there for 4 years. I fully intended at the outset to become a software developer, but somewhere along the line, I decided to apply my skills somewhere outside that traditional path: librarianship. And so off I went (with a two-year hiatus in between) to graduate school for a PhD in Asian studies (Japanese literature and book history) and an MSI in Library Science at University of Michigan. Along the way, I interned at the University of Nebraska-Lincoln’s Center for Digital Research in the Humanities (CDRH), redesigning the website for, and rewriting part of the code of, a text analysis app using XSLT for the Cather archive.

After Michigan, I spent a year as a postdoc at Harvard’s Reischauer Institute, working half-time on my humanities research and half-time on a digital archive (The Digital Archive of Japan’s 2011 Disasters, or JDArchive). Then, in July 2013, I made my first big step into librarianship here at Penn, and have been happily practicing in my chosen profession since then. I’m still new, and there is a lot to learn, but I’m loving every minute.

I admit, finding ways to integrate my CS and humanities background has been a huge challenge. I was most of the way through graduate school when someone recommended going into DH (which didn’t exactly happen – there aren’t a lot of non-postdoc or non-teaching jobs out there now). My dissertation project, a very close-reading-based analysis of five case studies of single books as objects and in terms of their publishing and reception, did not lend itself at all to a digital methodology other than using digital archives to get ahold of their prefaces and keyworded newspaper databases to find their advertisements and reviews. I used a citation index that goes back to the Meiji (1868-1912) period to find sources. Well, most of my research in fact involved browsing physical issues of early 20th-century magazines in the basement of a library in Japan, and looking at the books themselves in addition to the discourse surrounding them. I simply couldn’t think of anything to do that would be “digital.”

So my research in that area – plus what I’m working on now – has continued to be non-DH, although if you’re the kind of person who includes anything “new media” in the DH definition, it may be a little. (I am not that person.) Why do I still call myself a DH practitioner, and why do I bother participating in the community even now?

Well, despite working full time, I’m still committed to figuring out how to apply my skills to new, more DH-style projects, even as I don’t want my other traditional humanities research to die out either. It’s a balancing act. How to find the time and energy to learn new skills and just plain old carve out space to practice ones I already have?

I have a couple of opportunities. One is my copious non-work free time. (Ha. Ha.) Second is my involvement in the open and focused lab sessions of Vitale II, the digital lab (okay, it’s a room with a whiteboard and a camera) at the Kislak Center for special collections in Van Pelt Library. I have a top-secret brainstorming session with a buddy today about how we can make even more social, mental, and temporal space for DH work in the library on a topically focused basis. I’m jealous of the Literary Lab; that should speak for itself. In any case, I also ran into a fellow Japanese studies DH aspirant at the Association for Asian Studies conference a few weeks ago, and he and I are plotting together as well.

So there are time and social connections to be made, and collaboration that can take place despite all odds. But it’s still a huge challenge. I can do my DH work at 5:30 am, in the evening (when I have no brainpower left), or early on the weekends. I have many other things competing for my time, not least two other research articles I’m working on. I could also be doing my real work at any of those times without the need to explain.

Yet I do it. It’s because I love making things, because I love bringing my interests together and working on something that involves a different part of my brain from reading and writing. I’m excited about the strange and wonderful things that can come from experimental analysis that, even if they aren’t usable, can make me think more broadly and weirdly.

More to follow. よろしくお願いします!

Free Information Literacy Book

This is belated news, but the School of Information class SI641 (University of Michigan) has published a book, Everything You Always Wanted to Know About Information Literacy But Were Afraid to Google, ed. Kristin Fontichiaro, online. The book can be found at Smashwords (

It ranges from K-12 to higher education and specialized settings (including archives and special academic libraries), and thoughts on creating content and methodologies.

Briefly, from the book regarding SI641’s content and objectives (this is a core course in the LIS curriculum, and I took it too!):

This course introduces theories and best practices for integrating library-user instruction with faculty partnerships. Instructional roles are presented within the wider context of meeting institutional learning goals. Students acquire explicit knowledge, skills, and competencies needed to design, develop, integrate, and assess curriculum and instruction in a variety of information settings, including educational and public organizations. The integral relationship between technology and information literacy is examined. Students are given opportunities to partner with professional mentors in schools, academic libraries, museums, and in other educational institutions.

Please check out the book!

why print?

I recently uploaded a new (and my first) resource to my site, a guide to print reference resources for Japanese humanities held by the University of Michigan. This guide was originally made for a reference class in 2008, so it’s about time that it saw the light of day. It certainly wasn’t doing much good sitting on my hard drive.

You might ask, though, on viewing this: Why would Molly make a resource guide for only print books? Aren’t they a little, well, archaic and outdated? Isn’t it more convenient to check out digital resources from the comfort of my own laptop, perhaps in bed? After all, there are fantastic reference resources – available through institutional subscription – such as the JapanKnowledge database that suit many needs, and bring together information from a wide variety of (originally) print sources and other databases. With something like JapanKnowledge, going to the Asia Library Reference Room and thumbing through dictionaries seems a little slow and pointless.

Let me tell you something. In the process of looking at the various humanities reference resources, for literature in particular, I found a large number of unique sources that aren’t available online. These range from the legendary Morohashi Dai kanwa jiten Chinese character dictionary to synopses and reception histories, guides to folk literature, a multi-lingual proverb dictionary (it has translations and annotations in Japanese, English, French, and German), and a guide to Buddhist terms found in Japanese literature that includes the original Sanskrit and phrases from the classic literary works containing the terms.

Among the books that are entirely unique – an equivalent resource doesn’t exist in any other format (or, sometimes, language) – are a biographical dictionary of foreigners in Japan from the 1500s-1924, an annotated bibliography of translations into European languages dating from 1593-1912, an annotated bibliography of Japanese secondary sources on literary history published between 1955-1982, poetry indexes, and a dictionary of popular literature (taishū bungaku).

The process of making this bibliography was the pure joy of a scavenger hunt, and did I ever come up with a list of treasures. Leafing through a book of English-language synopses of untranslated Japanese work from the 19th-20th centuries may not sound exciting, but the fact that it exists as a quick reference resource for those looking to read some Meiji or Taishō literature is pretty amazing. I had a good time in the Reference Room finding these resources, and I’ve put some of them to very good use over the years.

Yes, I use digital resources; in fact, I couldn’t have come up with my dissertation topic without them. (As always, many thanks to the National Diet Library for the existence of the Kindai Digital Library.) But Japan is still a world of print – it’s nigh impossible to get a journal article in electronic form at this point – and, more importantly, print reference sources like these don’t go out of style. A guide to poetic allusions from the 1950s, or a popular literature dictionary from 1967, does not become outdated or irrelevant; we may wish for an update to the latter, but the information it provides is still valuable. Being able to use print reference works opens up a world of information to us by supplying that which has not been converted to database form.

Finally, why this guide? Is a guide coming for electronic resources? The short answer is, save for one-off blog posts, no. There are already so many excellent guides to electronic resources out there on the Web that my own meager contribution wouldn’t make much of a difference. The reason for this guide is that I haven’t found a good annotated bibliography of print reference books for Japanese literature specifically, and humanities more generally, that live at what used to be my own institution. I wanted to both know for myself, and share with others, what treasures were hiding on those rarely-used shelves (and, worse, in the off-site book storage) – what treasures were at my fingertips.

I hope you find it useful, and if you’re at the University of Michigan – or hey, anywhere else, for I can always check the catalog – and you have your own preferred humanities reference works, please send them along or leave the info in the comments. This is an evolving work and I’d like to include everything I possibly can!