Convert docs with OS X terminal

I’m teaching a workshop on Japanese text mining this week and am getting all kinds of interesting practical questions that I don’t know the answers to. Today, I was asked whether it’s possible to batch convert .docx files to .txt in Windows.

I don’t know Windows, but I do know Mac OS, where the built-in textutil command can do this from the terminal. Just run this line to convert every .docx file in a folder to .txt:

textutil -convert txt /path/to/DOCX/files/*.docx

You can convert to a bunch of different formats by changing the value after -convert: txt, html, rtf, rtfd, doc, docx, wordml, odt, or webarchive. textutil writes each converted file into the same directory as its source file, with the new extension. That’s it: enjoy!

* Note: This worked fine with UTF-8 files using Japanese, so I assume it just works with UTF-8 in general. YMMV.
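
If you would rather keep the converted files apart from the originals, here is a rough sketch of a loop that writes them to a separate folder. The function name convert_docx_dir and both folder arguments are my own placeholders, not part of textutil; the -output and -encoding options are textutil's own. I have only sketched this, so test it on a copy of your files first.

```shell
# Hypothetical helper: convert every .docx in one folder to UTF-8 .txt in another.
# Usage: convert_docx_dir /path/to/DOCX/files /path/to/TXT/files
convert_docx_dir() {
    local src="$1" out="$2" f name
    mkdir -p "$out" || return 1
    for f in "$src"/*.docx; do
        # Skip the literal pattern when the folder holds no .docx files
        [ -e "$f" ] || continue
        name=$(basename "$f" .docx)
        # -output names the file to write; -encoding sets the text encoding explicitly
        textutil -convert txt -encoding UTF-8 -output "$out/$name.txt" "$f"
    done
}
```

Quoting "$f" keeps file names with spaces intact, which the one-liner above handles through the shell glob but a hand-written loop can easily get wrong.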