Category Archives: systems

thinking about ‘sentiment analysis’

I just got off the phone with a researcher this morning who is interested in looking at sentiment analysis on a corpus of fiction, specifically by having some native speakers of Japanese (I think) tag adjectives as positive or negative, and then looking at the overall shape of the corpus with those tags in mind.

A while back, I wrote a paper about geoparsing and sentiment analysis for a class, describing a project I worked on. Talking to this researcher made me think back to this project – which I’m actually currently trying to rewrite in Python and then make work on some Japanese, rather than Victorian English, texts – and my own definition of sentiment analysis for humanistic inquiry.*

How is my definition of sentiment analysis different? How about I start with the methodology? What I did was look for salient adjectives, which I searched for by looking at most “salient” nouns (not necessarily the most frequent, but I need to refine my heuristics) and then the adjectives that appeared next to them. I also used Wordnet to look for words related to these adjectives and nouns to expand my search beyond just those specific words to ones with similar meaning that I might have missed (in particular, I looked at hypernyms (broader terms) and synonyms of nouns, and synonyms of adjectives).
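To make the adjacency step concrete, here is a minimal sketch in Python. It is an illustration only, not the project's actual code. It assumes the text has already been part-of-speech tagged into (word, tag) pairs, it takes the list of salient nouns as given (the salience heuristic is still unsettled), and it uses a small hand-made `RELATED` dictionary as a hypothetical stand-in for the WordNet synonym and hypernym lookups.

```python
from collections import Counter

# Hypothetical stand-in for WordNet lookups: noun -> related terms
# (synonyms and broader terms). A real run would query WordNet instead.
RELATED = {
    "train": {"railway", "locomotive"},
    "landscape": {"plain", "countryside"},
}


def expand_terms(nouns, related):
    """Return the nouns plus any related terms listed for them."""
    expanded = set(nouns)
    for noun in nouns:
        expanded |= related.get(noun, set())
    return expanded


def adjectives_next_to(tagged_tokens, target_nouns):
    """Count (adjective, noun) pairs where the adjective is directly
    before or directly after one of the target nouns."""
    counts = Counter()
    for i, (word, tag) in enumerate(tagged_tokens):
        if tag != "NOUN" or word.lower() not in target_nouns:
            continue
        for j in (i - 1, i + 1):  # immediate neighbours only
            if 0 <= j < len(tagged_tokens) and tagged_tokens[j][1] == "ADJ":
                counts[(tagged_tokens[j][0].lower(), word.lower())] += 1
    return counts


# Toy, hand-tagged input (the tag names are an assumption of this sketch).
tagged = [
    ("the", "DET"), ("slow", "ADJ"), ("train", "NOUN"), ("crossed", "VERB"),
    ("a", "DET"), ("green", "ADJ"), ("plain", "NOUN"),
    ("the", "DET"), ("crowded", "ADJ"), ("railway", "NOUN"),
]

targets = expand_terms({"train", "landscape"}, RELATED)
pairs = adjectives_next_to(tagged, targets)
```

With the related terms added, "green plain" and "crowded railway" are picked up even though only "train" and "landscape" were on the original noun list, which is the point of the expansion step.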

My method of sentiment analysis ends up looking more like automatic summarization than the positive-negative sentiment analysis we more frequently encounter, even in humanistic work such as Matt Jockers’s recent research. I argue, of course, that my method is somewhat more meaningful. I consider all adjectives to be sentiment words, because they carry subjective judgment (even something that’s kind of green might be described by someone else as also kind of blue). And I’m more interested in the character of subjective judgment than in whether it can be considered ‘objectively’ positive or negative (something I don’t think is really possible in humanistic inquiry, or even in business applications). In other words, if we have to pick out the most representative feelings of people about what they’re experiencing, what are they feeling about that experience?

After all, can you really say that weather is good or bad, that there being a lot of farm fields is good or bad? I looked at 19th-century British women’s travel narratives of “exotic” places, and I found that their sentiment was often just observations about trains and the landscape and the people. They didn’t talk about whether they were feeling positively or negatively about those things; rather, they gave us their subjective judgment of what those things were like.

My take on sentiment analysis, then, is clearly that we need to introduce human judgment to the end of the process, perhaps gathering these representative phrases and adjectives (I lean toward phrases or even whole sentences) and then deciding what we can about them. I don’t even think a human interlocutor could put down a verdict of positive or negative on these observations and judgments – sentiments – that the women had about their experiences and environments. If not even a human could do it, and humans write and train the algorithms, how can the computer do it?

Is there even a point? Does it matter if it’s possible or not? We should be looking for something else entirely.

(I really need to get cracking on this project. Stay tuned for the revised methodology and heuristics, because I hope to write more and share code here as I go along.)

* I’m also trying to write a more extensive and revised paper on this, meant for the new incarnation of LLC.

Showa 40s vs 1970

I was listening to a podcast interview with a favorite author just now (Kakuta Mitsuyo if you’re wondering) and I came to a realization about Japanese and Western calendars. From Meiji onward, I have come to crave dates in Japanese reign years when using Japanese, crazy as it sounds – I want to ditch Western years altogether!

Why is this? Frankly, Western years take a mouthful to say and are much harder to pick up when you’re listening, especially if the speaker is talking quickly. 2009 becomes “the year two thousand nine.” Try 1999: “the year one thousand nine hundred ninety-nine!” Do you see the problem?

Well, many learners of Japanese hate the confusion of having a separate, less often used Japanese reckoning of years according to emperors’ reigns. The infamous Hirohito is known as the Showa emperor in Japan and the Showa period starts with his accession in 1926. I was born in Showa 56 – 1981. Incidentally I know this because the high school where I taught my first year in Japan functioned on Japanese years and one needs to know one’s birthday! For most official forms, you are still expected to write your birthday in Japanese years. If you want to impress a functionary, learn this and write it proudly. They will be unnecessarily astounded.

In any case, the author was talking about her childhood and said “In the Showa 40s…” I actually sighed with relief! When the host breezed over that day’s date in Western years at the beginning of the podcast I had simply stopped listening, but Showa 40s – it just clicked. 1965-1974. It just makes sense to me somehow.

As you may know, I study the Meiji period (1868-1912). Specifically, it’s the Meiji 20s, or 1887-1896. I find myself writing dates as Meiji 20-something all the time. Why? I can’t explain it. It’s certainly in part because publication dates in the books I read are all in Meiji years – it was still the norm then to use Japanese years. But I can convert easily now from studying it for so long. So, why?
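For anyone who can't yet convert in their head, the arithmetic is simple enough to sketch in a few lines of Python. This is a rough illustration, not a full calendar library. The function name and the `ERA_START` table are made up for this example, and it ignores the fact that eras change partway through a year, so the first year of one era overlaps the last year of the previous one.

```python
# First Western year of each modern era (year 1 of the era).
ERA_START = {
    "Meiji": 1868,
    "Taisho": 1912,
    "Showa": 1926,
    "Heisei": 1989,
}


def to_western(era, year):
    """Convert an era year (e.g. Showa 56) to a Western year.

    Year 1 of an era is the era's starting year, so the offset is
    start + year - 1.
    """
    if year < 1:
        raise ValueError("era years start at 1")
    return ERA_START[era] + year - 1


showa_56 = to_western("Showa", 56)    # my birth year
heisei_23 = to_western("Heisei", 23)  # the year of this post
meiji_20 = to_western("Meiji", 20)
showa_40 = to_western("Showa", 40)
```

The shortcut most people end up memorizing is just the offset: add 1925 to a Showa year, 1867 to a Meiji year, and 1988 to a Heisei year.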

Really, it’s not me becoming accustomed to Japan. No normal Japanese person born after the Taisho period (1912-1926) would do such a thing. I think it’s much simpler: I am a nerd who studies literary history. It’s the sad truth. Well, a happy Heisei 23 to you all then!

(By the way, why no pre-Meiji reign years for me? They’re too short, numerous, and confusing. They sound the same. I just can’t take it. But if I studied early modern? I bet I’d be using Bunka-Bunsei like it’s 1821!)

post office as information central?

The future of the post office – and of snail mail generally – is a frequent topic these days. (Well, it has been for a while.) I listened to an excellent show from On Point the other week that had on several people, including someone with the post office. It was excellent in that the guests made several really strange points that were extremely thought-provoking, and I’d never heard them before. I think they deserve to be discussed widely: they broaden the conversation from just “post office or not?” to the actual role of this institution in serving its constituents. What is the point of the post office, anyway?

The post office delivers information, reliably (mostly) and often securely. It provides a way to get delivery confirmation and insurance on your stuff, rents out mailboxes (especially important for people in neighborhoods where mail delivery is unreliable, often due to the lack of safety and lack of access to mailboxes – and lack of maintenance by landlords). It lets you get stuff where it’s going, fast. I know that UPS and FedEx and DHL do these kinds of things too, but for general purpose information delivery, the post office is here to serve all of us, no matter where we are, no matter what. This is its mission.

As time goes on, the demand for information and the form it takes change, obviously. We’ve already had new technologies and new regimes of categorization that have been developed to accommodate changing needs. I have only to look at pre-ZIP code letters to be reminded of this. Honestly, for someone who has grown up with ZIP codes, it’s shocking. Within my lifetime, moreover, the place of ZIP codes on the envelope has changed (no longer a need for a new line; in fact writing it along with the city and state is encouraged). We have even more, better technology for reading the messiest handwriting, for distinguishing that ZIP code (and now, a 4-digit code afterwards that means it’s your house) from the text written next to it. We’re getting pretty advanced, here, if you think about it.

So now there is the fairly dramatic change of declining mail volume, which has not been accompanied by a high enough increase in stamp prices to keep up with the times (really, every other country in which I’ve mailed a letter has been close to $1 for even domestic mail). We have a lot of people conducting their information needs online, even those bits of information that must be kept secure: banking, shopping, student financial aid and loan applications and processing, university business (I’m thinking of my own stuff here). We need secure document delivery, and we need it to be a lot better than it is now. Recent break-ins to companies that are holding customer and credit card information (ahem, SONY) are making this abundantly clear.

In light of this, do we need some kind of central, trusted authority that we can go to for secure document delivery?

I argue yes, and I argue that this is exactly a natural place for the post office to step in. I’m not talking about printing out PDFs and making sure they get securely to their destinations. I’m talking about a secure information infrastructure provided as a public service for all of us. No, it will not replace our banking or our insecure game network accounts. But don’t you think that this would be a great service, one that we can’t quite imagine now what it would look like… and one that exactly fits the mission and history of the post office?

Through any kind of calamity, no matter what, we will get your stuff securely and reliably to where it needs to be. We will make it available to you, no matter what.

This sounds a lot like the current mission that surrounds the delivery of paper mail and packages. I am not arguing that this should replace what they’re doing. Don’t close all the post offices and argue Internet for everything. There are still a lot of things that need to be delivered securely by post: you wouldn’t believe how few forms will take my secure Adobe digital signature on the PDF as the equivalent of a pen signature. Imagine being able to develop that pen signature (so easy to forge) into something more secure, in digital form. Would that not be awesome?

With the way things are going, I hardly think that anyone in government would consider this kind of natural evolution as worthy of supporting, as worthy of seed money for infrastructure. We are not so good at thinking outside the current narrow box of the status quo; we have blinders on that we can’t seem to remove. But the post office itself sees itself as needing a transformation for proceeding from here on out. If only innovation and creativity could win out, but I’m not holding my breath.

Incidentally, this whole post office closure thing? Most articles I read are about people complaining they would have to drive 6 miles to the nearest post office. Guess what. I have had to drive miles to the nearest post office my whole life, because I have been unlucky enough to grow up in the suburbs, then live in a city that thinks it’s a great idea to build their fancy new post office (and library!) miles away from our small but active downtown, and make it miles away from any public transit: you can’t walk either, because you’d need to get across several very dangerous freeway on and off ramps. Seriously.

where are the japanese exchange students?

I was recently reading Jake Adelstein’s review of Reimagining Japan and he noted the need for openness as a topic explored in the book – and described the problem as a reluctance both of young Japanese to go abroad and of companies to reach outside of their own borders. I don’t have any profound insights into this issue (or even on whether it is the issue that people make it out to be), but it reminded me of a touching conversation I had last year.

I organized a flower-viewing party (for cherry blossom season) in the last April that I lived in Tokyo. For those of you who haven’t had the experience of cherry blossom season in Japan, I’ll give you a representative image: cold, cloudy, miserable day, often with no cherry blossoms in sight, a park completely covered in blue tarps (“leisure sheets!”), populated by shivering drunk people trying their damnedest to get even more drunk while snacking on party foods like octopus dumplings. Doesn’t it sound like that romantic image of an elegant branch of cherry blossoms against the clear blue sky, perhaps with Fuji-san lurking nearby, that has become the representative of Japan? Well, don’t believe it. The version I’ve given you is cold, hard reality.

Still, you do hanami (“flower viewing”) in late March and early April, flowers or not, and regardless of whether you need a winter coat. Once you’re plastered, you won’t notice anyway! Well, let’s move on now that I’ve set the scene.

I was chatting with my friend Naoko who had brought her shy but nice cousin along with her, who was interested in learning and practicing English. We got to talking about how I’d come to Japan in the first place and how my experiences were.

As we talked, she surprised me with her reaction: She revealed that she’d love to live abroad for a year or two after college, and she was so jealous of people like me and my friends who had been able to do that in Japan. It really threw me. After all, she was shy and hesitant about speaking, but her English was passable enough that after a few weeks in a place like the US or UK, she’d be doing fine. Her interest in foreign countries and languages was obvious. So why the resignation to not having a chance? What was stopping her?

Her answer to this really blew me away. “Well,” she said, “companies want to hire their new employees when they graduate from college, on time.” (By this she means that graduation is in late February, and the start of the new school/employment year is in April. Because of this, if you happen to not pass your entrance exams or not get a job offer, you have to wait until that time in the next year to try again.) “So if I were to go abroad, I’d come back and I wouldn’t be a new graduate, and I wouldn’t be with the class I graduated with. So it would be really difficult to get a job because I wouldn’t be in the category of people that companies want to hire.”

It really shouldn’t have surprised me so much. After all, it’s true. There is a deeply ingrained system of when and how, from college exams to job interviews. Of course, part-time jobs, and going into business for yourself, are different. But overall, despite a lot of shifting preferences and more varied ways of living compared to a decade or two ago, that system is still there. And if you don’t fit into the path that leads you to a position as a regular company employee (as opposed to contract or part-time), you are going to be stuck in what’s still considered by many to be an underclass.

The irony here is that Japanese firms would benefit immensely from young employees who have at least traveled, if not lived, abroad – anywhere. In fact, I worked very informally at a large company in Tokyo and my contact there confided in me more than once that he’d like the old guard to open their minds to hiring foreigners as in-house workers, instead of using contractors to do tasks like translation. His argument was that it would be more cost-effective and flexible (at one point they needed a rush translation and had to pay through the nose to get someone else to do it), and even more than that, that it would change the culture of the workplace in a positive way. But at the same time, he sighed when he said this and said, “There’s no way that could happen now. I’m hoping in two or three years, it might be possible, if I keep working on them.”

I see how this system is set up and I understand the logic of it, because everyone knows how it works and it’s self-perpetuating because of it. But looking at it from the outside, it doesn’t make a whole lot of sense. On the one hand I’m reassured and even inspired by the people I’ve talked to in Japan who have either made a path that weaves in and out of the system as they like, or who have become successful within powerful companies and used that position to shape their work into something international. There are plenty of people I know who lived abroad for a while (from three months to like 20 years) who were then perfectly successful when they returned. But by and large, if this attitude is there – well, the fear that success can only be had along one specific, limited path is self-fulfilling.

I can only hope that more people like my friend’s cousin who have a desire to go abroad and experience life in other countries do get the courage to do it regardless of the rigid system that they live in, and make something successful out of it. But in the meantime, I am a lot more understanding of why the Asian exchange students at my university come to us from every nation but Japan.

my poor laptop, cont’d.

I’m being dragged kicking and screaming into obsolescence, despite having perfectly good hardware and a brand new battery.

This time, it’s not being able to upgrade to Java 1.6 without installing Yellow Dog Linux, following instructions for putting IBM’s PowerPC release of 1.6 on it, and hoping for the best. Ordinarily, I would do just that, but I didn’t know I needed Java 6 for anything until, well, yesterday.

It’s downright embarrassing. I have to borrow a laptop from a kind workshop organizer on Saturday at DH2011 because one of the visualization tools we’re running is a Java app that needs, yes, 1.6.

I’m being pressured toward a newer laptop more and more, apropos of my two recent posts, which were mostly me complaining about things that wouldn’t necessarily force me to upgrade to something less than 5 years old. How frustrating!

(And I never thought I’d regret not having brought my Linux netbook along with me this summer, thinking there’s no way I could need a desktop and two laptops, which is ridiculous – but there is probably a JDK 1.6 sitting on that Ubuntu install. But there are 12 hours between me and the netbook until August. Too bad!)

A random positive note to end this series of posts about my ridiculous computing situation. When I was doing research to find Java 6 for PowerPC, I came across a cottage industry of people helping others install it (and Linux) on their – get this – 64-bit PPC Playstation3! It warms the heart to know that there’s still a phenomenal console out there (and really, it is the best of the three) that uses PPC architecture. Hooray for Sony (and for IBM, which is using 64-bit PPC architecture in their workstations and releasing the JDK for the rest of us).

mac woe update: adobe drops flash for PPC

Sigh.

This article talks about much of my last post, with the focus not on Google Apps but on Adobe Flash: “Adobe Flash Has Left PowerPC Macs Behind”

The reason I’m linking to this piece is that it makes an excellent point about “obsolete” PowerPC Macs (and even Intel Macs) not being so obsolete relative to their PC counterparts, but made so by Apple’s hardware decisions. Given that I haven’t owned a PC for at least 9 years, I had nothing to compare to, but this author points out that Apple dropping support for its older hardware sends perfectly good Macs to an early grave despite their having the same or even better performance than still-supported older PCs.

On the Power Mac G5 and PowerBook G4:

“While these highly capable PowerPC machines meet or exceed the Windows-based minimum hardware specifications required for the latest release of Flash Player, it matters not. Progress in the world of Mac OS X tends to make Apple hardware obsolete much faster than comparable Windows computers released in the same time frame.”

“I’m simply dumbfounded that fully capable PowerPC Macs continue to lose support and functionality with so many things that similarly aged (and often far older) Intel machines still receive” – as am I! Because I did not know that older Intel machines were supported for so long. Then again, this author makes the excellent point that support is being dropped for OS 10.4 but still retained for Windows XP.

It also makes me remember my general policy of “if it’s old and getting too slow, put Linux on it” because a Linux install will usually make most of the problems of an older Windows box magically go away. Indeed, Linux on older hardware is a good thing: but where is the support for older PowerPC platforms? In comparison, it isn’t really there.

It’s really too bad to see the end of this era. First the Dreamcast (an excellent RISC console that also runs Linux), now the Mac PPC line. It’s not that Intel/AMD architecture is superior: it’s just so common that it’s simpler to drop support for anything else. Unless there is another explanation?

google dropping app support; molly has PPC angst

A decision I made over five years ago has ended up making me quite unlucky these days.


I intentionally bought a PowerPC Mac, the iBook G4, when my iBook G3 succumbed to the infamous logic board defect a year or so after Apple stopped fixing it for free.* My first winter semester at Michigan had just started, so I was stuck: I needed the data from my G3’s hard drive even more than I needed a computer, and I knew that Apple would soon drop PowerPC in favor of Intel. Like the idealist I can be, I went for the PowerPC instead of waiting a while for the new hardware, because after taking some computer architecture courses and having done a little assembly programming, I had come to the conclusion that RISC architecture is superior to CISC – meaning that I favored PowerPC over Intel.

Little did I know how ghettoized the PowerPC is out there in the real world. Naive, I had no idea that most operating systems and software are not ported to PowerPC – not even Linux.** In the first few years this wasn’t a problem and wasn’t anything I noticed beyond having a matte screen instead of a shiny one. I still love my G4, with its plucky reliability and long battery life.

Starting about last year, however, more and more software makers dropped PowerPC completely. OS X on PowerPC tops out at 10.5.x (10.6 is Intel-only), and my laptop is still on 10.4.x while many programs now require something newer. Even the software that is still released for 10.4 stopped supporting my laptop, including OpenOffice.***

I resigned myself to having a laptop that is circa 2009 in terms of what it runs. I am okay with running a Japanese version of OpenOffice 3 that will open .docx files for me, and running Adobe CS3 and Word 2004. Honestly, I don’t need the newer versions of these programs for a base model iBook that only has 40GB of hard drive space. What I need is the reliability, toughness, and 5-hour battery life (with the ability to buy new batteries) that my 5-year-old friend provides. I have a desktop for everything else!

I have a sinking feeling about it now, though. We have a problem. Google is going to gradually drop support for older browsers, which includes pretty much every browser that I can download for my PPC Mac. While I applaud their strict use of HTML5 (I use it too!) and refusal to cater to legacy browsers that don’t understand it, I realize that I am basically screwed. I also realize how much I rely on Google, frankly.

Here are things I would like to use a laptop for: Web browsing, Gmail, Google Docs, a little word processing, PDF reading and editing, writing, and possibly a little Photoshop. And some Twitter. If I suddenly can’t access or use Gmail or Google Docs, that is a huge blow to using my laptop to be productive – it’s the point of carrying something around that will let me access my files remotely to begin with!

“Get a MacBook,” a voice pleads in my head. They are so shiny, fast, small, and nice. They’re still only 13″ but have a wide screen that makes it seem so much bigger than the 12″ iBook. They have long battery life. I’m kind of in love with them despite myself. Admittedly, I resent the non-removable battery that will allegedly last for the average life of a laptop. But if I wasn’t suddenly losing all software support for my peculiar architecture, I wouldn’t even consider a new laptop.

I just bought the laptop a new battery. It has 5 hours of battery life, does everything I need it to, and is very hardy. It’s relatively small, light, and convenient. It has some very expensive software on it. Most importantly, it simply still works fine and has nothing wrong with it. I abhor wasting things. I am fond of this laptop. If it weren’t for the uncertain nature of old hard drives and impossibility of replacing that without breaking the case, I’d argue that it probably has many years of good life left in it. It’s the Volvo of laptops.

So even if I bought a new laptop (which I can’t exactly afford now), I’d want to keep using the iBook for as long as I can. Why waste it? But why have two laptops, one running Linux?^ (Seriously, I already have a netbook running Linux.) They’re the same size. It makes no sense to keep the iBook around for anything other than preserving my installation of many pieces of CS3. And because I heart the damn thing.

I’m at a crossroads: my PPC laptop is soon going to be not just dated, but unsupported. I don’t want to waste a perfectly wonderful laptop that has seen me through an entire PhD program. I have good software on it. Why buy a laptop the exact same size and type? Because it will save me when Google no longer supports my laptop, and when Web browsers that actually implement new W3C standards no longer run on it.

Lesson learned: Even though I want superior architecture and don’t jump at trends (like oh, x86?) that I think are not worth it, I have to just go with the crowd, because sooner or later it will leave me behind. I am still not getting an iPad though. How long do you think I can scorn touch screens before I become officially old?

* (Yes, that is how old the G3 was. About three and a half years. Not bad for a laptop with a manufacturing defect that I was very hard on.)
** There are a number of PPC Linux distributions, but specific software may or may not be ported. Usually not.
*** Weirdly, there are a few local language versions of OpenOffice that do still support PowerPC architecture. Since one of those is the Japanese-language version, I now happily use a Japanese word processor and try to keep my language skills current, at least in terms of menu choices.
^ If I could get it to run the newest AmigaOS I would run to it without hesitation, but I have only gotten reports of it running on a Mac Mini. Don’t think I haven’t considered getting a Mac Mini solely for this purpose. The lack of a monitor is mostly what’s stopping me.

new manga: a bride’s story

I was clued in to a newly translated manga by Mori Kaoru via Feministe: A Bride’s Story, the tale of an arranged marriage set in 19th-century central Asia.

A Bride's Story, vol. 1 (cover image)

To summarize briefly, it is the story of a woman sent to a neighboring village in an arranged marriage – naturally, without meeting her new husband first. It turns out that she is 20 and he is 12, making the situation even more awkward than usual. I haven’t ordered a copy yet (the first volume came out May 31, 2011), but between the detailed, grand artwork and the fascinating premise, I’m looking forward to reading it myself.

Beyond having a relatively unique setting and focus (I hear that much time is spent on women’s lives and communities within the villages), I have to say I’m in love with the role reversal. An arranged marriage of a young woman and older man is too familiar, and the surprise of the same age would be too boring, too ideal. To reverse the aspect of arranged marriage that can be most scandalous to Western (European and North American) sensibilities – the age difference – is the most intriguing part of the story.

What typically happens in a tale of older man, younger woman (even girl)? Not all situations are painted in a positive light, but I can think of few cases in which the younger party is not tremendously sexualized, far beyond what is often considered appropriate. (Then again, given the sexualization of even young teenagers in contemporary America – let alone historically – maybe this is not so surprising.) Sex is assumed no matter how young the bride. And rare are the stories – fiction, I’m talking about here – where the relationship doesn’t take on a weirdly romantic cast, or even an explicit gradual romance.

I’m looking at you here, Shining Genji. I was once in a graduate-level course and the professor threw out the question: what was it that happened to Murasaki? Unbelievably a woman in the room threw out “grooming” as the answer. Grooming for ideal sexuality. The professor cut her right off with “statutory rape at best.” Thank you. But if this could be the automatic answer for such a sick situation, one that is portrayed romantically even by a woman writer in 1000 AD – well, doesn’t that say something about conditioning?

In any case, I’m giving this background to highlight the unique situation of a much older bride and a groom that is still a child. I would argue that although this happens, we don’t have such an automatic social narrative for their relationship. If someone talked about “grooming” with regard to the boy, we would cut them off with “no, it’s sick. THAT is sick to even imagine.” Right? It’s creepy. I think that we’re more ready to imagine a developing romantic relationship between a much younger girl and an older man – I’ll dig up our favorite Shining Genji raising Murasaki to be his future wife as my example again. Can we imagine this bride in A Bride’s Story raising her child husband to be the perfect sexual object in the same way? I would say no. No way.

So I’m very interested to get my hands on this manga to see how this is treated. From the Feministe post, I gather that there is a warm relationship with a fondness developing on the part of the bride. But I would like to see for myself: there is no way she can’t participate in raising the boy, in some way, as an older woman who has come into his family’s home. But what is her role, and what is the intention? Does she raise him as a loving family member would raise any boy to be a proper man, or does she have something else invested in it, a la Genji and his child bride?

We’ll see. If any of you have read this in Japanese, let me know your thoughts. (Incidentally, I would kill to be able to go to Book Off right now and just buy this series used!)