Category Archives: computers

programming practice problems

One of the hardest things for me about learning a new programming language is not getting an understanding of the syntax or overarching concepts (like object-oriented programming or recursion), but rather a lack of opportunity for practice. It’s one thing to read a few books about Python, and quite another to look at others’ nontrivial code, or write nontrivial code yourself.

However, I’m often at a loss for ideas when I try to come up with programming projects for myself. Call me uninspired, but I just don’t have many needs for writing programs in my daily life, especially complex ones. And I don’t have any big creative ideas, either. I don’t even have uncreative ideas. So what to do?

It turns out there are a few good resources online for practice programming problems. They’re language-agnostic, presenting a problem and asking you for its solution. Unfortunately, I could only find a few, but I thought I’d share them.

The first is the Association for Computing Machinery’s International Collegiate Programming Contest. This provides the contest problems from 1974 to the present! Talk about a treasure trove of programming challenges.

Second, UVa Online Judge. This site contains hundreds of programming problems, some simple and some complex. They have volume upon volume of problem sets. You could spend the rest of your life doing the problems on this site.

Does anyone have additional resources to add?

free software day in Cambridge 9/15

Hi everyone,

It’s Free Software Day in Cambridge, MA, this Saturday (9/15) and there is a day-long event happening in celebration, and to bring the community together. If you’re interested in attending, it’s located at Cambridge College (1000 Mass Ave) and starts at 10 AM.

http://www.fsf.org/blogs/community/celebrate-software-freedom-day-2012-in-cambridge-massachusetts

pseudonymity and the bibliography

My research is on authorship, and specifically on varied practices of writing and ways that authorship is performed.

For my study – that is, late 19th-century Japan – the practice of using pseudonyms, multiple and various, is extremely common. It’s an issue that I consider quite a bit, and a practice that I personally find simultaneously playful and liberating. It’s the ultimate in creativity: creating not just a work but one’s authorship, and one’s authorial name, every time.

This does raise a practical issue, however, that leads me to think even more about the meaning and implications of using a pseudonym.

How does one create a bibliography of works written under pen names?

The easy version of the problem is this: when making my dissertation’s master bibliography, I can cite works in a number of ways. I can cite them with that instance’s pen name, then the most commonly known pen or given name in brackets afterward. I can do the reverse. Then again, I could adopt the practice that is the current default – born of now attributing works solely to the most commonly known name rather than to the name originally on the work – that is, to not bother with the original pen name, obscuring the original publication context entirely. I can pretend, for example, that Maihime was written by Mori Ogai, and not Mori Rintaro. Or I can be troublesome and cite only the pen name. This last option flies in the face of convention but is the only way that I can cite the work and remain consistent with the overarching argument that I make in my dissertation: that use of and attribution to specific, variable pen names matters, both for understanding context and for understanding the work itself. It goes without saying that this is crucial for understanding authorship itself.

But there is another issue, and it goes hand-in-hand with citing works by writers whose names do not follow the Western convention of given name first, last name second. Or of having two names at all. The issue comes in the form of citation managers.

I’ve been giving Zotero a go lately and quite enjoying it. But I find myself making constant workarounds because most of my sources are by Japanese writers, and the writers of my primary sources are not only Japanese but also use pen names. My workaround is to treat the entire name as one single last name, so I can write it in the proper order and not have it wrangled back into “last name”, “first name” – both of those being not quite true here. For citing a Japanese writer, I’d want to retain the last name then given name order; for someone using a pen name, the issue is that no part of the name is a last or given name. It’s what I’d like to call an acquired name.

Mori Ogai is now the most commonly used name of the writer Mori Rintaro (Mori being the last name, Rintaro being his given name). Ogai is a shortened version of his early pen name Ogai Gyoshi. Ogai Gyoshi isn’t a false last plus given name. It’s always in the order Ogai Gyoshi, neither of them is a “real” name, and it is a phrase, not a name. It’s as though he’s using a word that happens to have a space in it.

So when I put some of Mori Rintaro’s writing into Zotero, I put in “Mori Rintaro” as the last name. Sometimes I just put in “Ogai” as the name, when he signs a piece that way. Occasionally it’s “Ogai Mori Rintaro” (this is, in fact, the author of Maihime – I made a shortcut above in my example). And then there are some pieces in which the last name in Zotero is “Ogai Gyoshi.”
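As a rough illustration of the "whole name as one field" idea, here is a minimal Python sketch. It is not Zotero's data model; the `Attribution` class, its field names, and the `cite` helper are all hypothetical, made up only to show a signed name kept as one unit with the commonly known name added in brackets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attribution:
    """How a work was signed, plus the name readers now know best (hypothetical model)."""
    signed_as: str                           # name printed on the work, kept as one unit
    commonly_known_as: Optional[str] = None  # later, conventional attribution


def cite(attribution: Attribution, title: str) -> str:
    """Cite under the signed name, adding the common name in brackets if it differs."""
    name = attribution.signed_as
    known = attribution.commonly_known_as
    if known and known != name:
        name = f"{name} [{known}]"
    return f"{name}. {title}."


maihime = Attribution(signed_as="Ogai Mori Rintaro", commonly_known_as="Mori Ogai")
print(cite(maihime, "Maihime"))  # Ogai Mori Rintaro [Mori Ogai]. Maihime.
```

Because the signed name is never split into "last" and "first" parts, nothing downstream can reorder it. That is the same effect as typing the whole name into a single field.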

I don’t know how to go about this any other way, but it’s less about me having to be a little hacky to get it to do what I want, and much more of a constant reminder of our current (Western) assumptions about names, authorship, and naming conventions. It’s a reminder of how different the time and place that I study is, and how much more dynamic and, frankly, fun it was to write in the late 19th century in Japan than it is now, either here as an American or even in Japan. Names are taken a bit more seriously now, I’d argue, and more literally. It’s a little harder to play with one’s name, to make one up entirely for a one-off use, at this point – and I think it’s for the worse.

(Obviously, there are exceptions: musicians come immediately to mind. And it’s not as though writers do not adopt pen names now. But it’s not in the same way. And this, incidentally, is something I love about the early Internet – I’m referring to the nineties in particular. Fun with handles, fun with names, all pseudonymous, and all about fluid, multiple identity.)

android slashdot reader: 和英コメントで言語学び!

Now that I have an Android phone and have found some pretty great things on the Android Market for getting ahold of Japanese content, I would like to start sharing with you all what I’ve been using and whether it’s worth downloading.

First up is my favorite new find: Slashdot Reader. Yeah. It’s an RSS feed reader for Slashdot. Why so great?

Well, can you imagine my reaction when I read its description and saw images of posts from slashdot.org and slashdot.jp showing up all mixed in together? Then I read the description: “just a feed reader, nothing more” – for both Japanese and English Slashdot.

It’s like I found an app made by my doppelganger. Really.

If you don’t want one or the other of the languages, it allows you to toggle both, Japanese only, and English only.

Because it’s a feed reader, you only get the headlines and leads from Slashdot, but can easily click through to the full story, and therein lies the amazing language learning tool that somehow never really occurred to me.

All of these years, I could have been learning Japanese through Slashdot comments! That’s right. Of course it’s not textbook Japanese. I already know how trolls (荒らし) talk after just a minute or two of reading. How nerds talk. (They always use が and never けど, although they do use ね sparingly for emphasis. A certain language teacher from several years ago who forbade us from using けど in class for an entire semester would be proud.) And how random users talk.

I also know how they’re basically saying exactly the same things that commenters do on Slashdot in English, only they’re saying it in Japanese. (open source != free as in beer, anyone? I seriously just read this. 無償 is free as in beer, and note that it’s not the same as the widely-used word for “free” 無料 – so I just learned something new about software licensing.) So if you’re a Slashdot reader, this is going to help you immensely. It’s all about context.

Yes, so there are people out there who would disparage the idea of learning language from internet comments. But I counter that with: it’s real language! And this is a specific forum where you know what is coming: some nerdspeak, some posturing, some trolls, some reasonable people, talking about a rather limited set of topics. So you are going to learn voices, not just “Japanese.” You are going to learn what people say in a certain situation, and also what not to say. I can’t think of anything more helpful than that!

And here you go: Slashdot Reader for Android (this takes you to Android Market).

Video Podcast: London Seminar in Digital Text and Scholarship

The School of Advanced Study at the University of London has just started a video (and audio) podcast series of the full talks from each session of the London Seminar in Digital Text and Scholarship.

Find the podcasts online here, or subscribe via iTunes (there is a link on the page to do so).

The first talk is Jan Rybicki with ‘The Translator’s Other Invisibility: Stylometry in Translation.’ Just another day I wish I lived in London, with all of the great digital humanities related seminars and talks going on. I read this scholar’s paper on the same subject in Literary & Linguistic Computing not too long ago and it was, in a word, awesome.

phone destroys blog

When I have a hiatus (as I periodically do from online life, and especially something as intensive as a reflective blog such as this one), it can be due to all kinds of things. Real life nuisances take many forms: moving (sometimes transcontinental moves); frequent travel, back to back is even worse; getting bored of the Internet; someone visiting. Well, for the most part, it involves being overly mobile: I’m just not at the computer engaging with the world via Web browser, and that ends up killing my blog, Facebook activity (as though there’s a lot of that anyway), my nascent G+ activity, sometimes Twitter.

So what has destroyed my Internet life these days, outside of email and intermittent Twitter usage? It’s my phone! Being mobile kills again.

Here’s the work I do 90% of the time: teaching (which involves reading, writing, and talking), and reading/writing for my dissertation. This stuff doesn’t even use a computer.

More than half the time I don’t bring a laptop with me when I travel to and from school, or on little coffeeshop trips to work. Why? It’s because I have used my smartphone as an Internet substitute for so long that a laptop has become overkill for everything that isn’t computer-demanding work. Everything else gets done on my home desktop, and I don’t bother to turn it on unless I need to Do Stuff.

Thus, my Blackberry has killed my blog. You may ask, how is it that you write pages-long email on that thing and can’t just write a blog post here and there? It’s much less to do with the Blackberry Web browser (which we all know sucks) and much more to do with the format itself.

Here’s the problem: a phone is great for doing one thing at a time; at best I bounce between 4 separate things. (Typically, Twitter, email, Web browsing, and weather – or substitute weather with “talking on the phone” more rarely, because I have Sprint and I can do all that stuff at the same time.)

When I’m writing things for the Internet? I have tabs open like they’re going out of style. I have different articles sitting there waiting for reference; I may be using a text editor or looking at dissertation notes; I am linking my photos from Flickr; I am posting the links to my new posts via Twitter, Facebook (which doesn’t work well on my phone), and G+ (which doesn’t work at all – it has no usable mobile site). I work in a flat and non-linear way. I wouldn’t call it multi-tasking; I would call it working. Rarely do any of us simply have one window open, doing one activity. That’s like having a blindfold on while you listen to music, and also carefully not allowing yourself food or drink, or mobility. That’s not how we live.

I’m not really specifically blaming my phone, or saying that if I had a bigger-screen, touch screen (ugh), or Android/iOS based phone that things would be different just because they are prettier and can render the Web more effectively. Actually, I wouldn’t get nearly as much writing done if I weren’t using the Blackberry – its ergonomics and keyboard are second to none. I would have even less of a Web presence if I didn’t have it with me.

But as long as I’m using a phone (or hey, if I were using a tablet down the road), the lack of true multi-tasking ability is going to prevent me from doing real work outside of constantly emailing. You might argue that with a big enough tablet, I’m basically working on my iBook. You’re right that the screen is similar, and that tablets try to be more than giant smartphones, but as long as they’re trending toward one-thing-at-a-time style usage, they’re never going to be more useful for me than a cell phone. In other words, useful for some daily communication (and so much so that I use it exclusively as my regular device for communicating), but totally inadequate for getting real work done.

Now that I remembered to charge both my laptops’ batteries and am getting back to doing lots of daily notes for work, that backlog of posts will start clearing out – but when real life interferes and I’m back on the phone, my online life will go silent again.

the tradeoff: elegance vs. performance

Oh snap – I just fixed this by turning on caching in the Cocoon sitemap. Thanks Brian Pytlik Zillig for pointing out that this is where that functionality is useful! And note to self (and all of us): asking questions when you’re torn between solutions can lead to a third solution that does much better than either of the ones you came up with.

With programming or web design, “clean and elegant” is a satisfaction for me second only to “it’s working by god it’s finally doing what it’s supposed to.” So what am I to do when I’ve got a perfectly clean and elegant solution – one that involves zero data entry and only takes up a handful of lines in my XSLT stylesheets – that crunches browser speed so hard that it takes nearly a minute to load the homepage of my application?

I’ve got a choice here: Two XML files (one for each problem area) that list all of the data that I’d otherwise dynamically be grabbing out of all files sitting in a certain directory. This is time-consuming and not very elegant (although it certainly could be worse). The worst part is that it requires explicit maintenance on the part of the user. Wouldn’t it be nice to be able to give my application to any person who has a directory of XML files without any need for them to hand-customize it, even just a small part?

On the other hand, I can’t expect Web users to sit there and wait at least 30 seconds for TokenX to dynamically generate its list of texts, an action that would take a split second if it were only loading the data out of an XML file. I already have all the site menu data stored in XML for retrieval, meaning that modifications need only take place once and that nested menus can be easily entered without having to worry about the algorithm I’m using to make them appear nested on the screen in the final product.
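The tension between scanning a directory on every request and reading a stored list can be sketched in Python. This is only an analogy: the fix in the update at the top of this post was Cocoon's own sitemap caching, not anything like this code. The `list_texts` function, the `_cache` dictionary, and the `scan_count` counter are hypothetical names for illustration.

```python
import tempfile
from pathlib import Path

_cache = {}      # directory -> list of XML file names found there
scan_count = 0   # how many times a directory was actually read


def list_texts(directory, refresh=False):
    """Return the XML file names in `directory`, scanning only on first use or on request."""
    global scan_count
    if refresh or directory not in _cache:
        scan_count += 1
        _cache[directory] = sorted(p.name for p in Path(directory).glob("*.xml"))
    return _cache[directory]


with tempfile.TemporaryDirectory() as d:
    for name in ("b.xml", "a.xml", "notes.txt"):
        Path(d, name).write_text("<doc/>")
    first = list_texts(d)    # slow path: reads the directory
    second = list_texts(d)   # fast path: served from the cache
    print(first, scan_count)  # ['a.xml', 'b.xml'] 1
```

The list is still generated from whatever files are in the directory, so nobody has to maintain it by hand, but the expensive scan happens once rather than on every page load. The cost is that a newly added file only shows up after a refresh.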

You can tell from reading my thought process here what the solution is going to be. It’s too bad, because aiming for elegance often ends up leading you to better performance at the same time. Practicality vs. idealism: the eternal question to which we already know the answer.

trusting the computer, and getting there with XSLT

If you are working in a functional, stateless language, but can still get away with for loops in a more conventional way thanks to for-each functions – should you still favor recursion over explicit for loops? Discuss.

Now that I am, as the title implies, “getting there,” I want to reflect a little on the learning process that has been XSLT. In my last post I glossed over what makes it (and functional programming languages generally) distinctive and, for people who are used to procedural languages, unintuitive and hard to grasp at first. This will be a post with several simple points, but that’s quite in keeping with the theme.

The major shift in thinking that needs to happen when working with XSLT, in my opinion, is one of trusting the computer more than we are accustomed to. It all stems from letting go of telling the computer how exactly to figure out when to execute sections of code, and letting it make the decisions for us.

I made a comment recently: “I know I’m getting more comfortable with XSLT because suddenly I’m trying to use recursion everywhere I can, and avoiding the for-loop like a crutch.” As others I talked to put it, this is idiomatic XSLT.* In other words, it’s one of the mental leaps that you (and I) have to make in order to start writing elegant and functional code (no pun intended) using this language.

What is recursion? In this case, to oversimplify, it’s how XSLT loops.** In a procedural language – C++, Java, most languages other than Lisp dialects to be honest – recursion is clunky and wasteful; telling the computer to specifically “do this for the number of times I tell you, or until this thing reaches this state” is how you get things done. This means that the languages have state, too – you can change the value of variables. This is important for having counters that are the backbone of those loops. If there were no variable to increment or change in another way, the loop would either never execute (such as a while), only execute once, or loop endlessly. None of these things are very helpful.

So how do you get away with a counter-based loop, at least of the “for each thing in this set” variety, in a stateless language (all variables are permanent, aka constants) that discourages use of for-each loops in the first place?

There are two parts to the answer. The first is much simpler: xsl:apply-templates or xsl:call-template. This involves the trust that I introduced above. With a procedural language it’s hard to trust the computer to take care of things without your telling it exactly how to do it (keep doing this thing until a condition is met) because you’ve had to become so used to spelling everything out. It might have been hard to get used to having to explain the proverbial peanut butter sandwich recipe in excruciating detail for the sandwich to get made. Now, XSLT is forcing you to go back to the higher level of trust, where you can tell the computer “do this for all X” without telling it how it’s going to do that.

xsl:apply-templates simply means, “for all X, do Y.” (The Y is in the template.) It’s unsettling and worrying, at least for me at first, to just leave this up to the computer. There’s no guarantee that templates will ever be executed, or that they will be executed in order. How can I trust that this is going to turn out okay? Yet, with judicious application of xsl:apply-templates (like, where you want the results to be), it will happen.
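For readers more at home in a procedural language, the "for all X, do Y" idea can be loosely imitated in Python. This is an analogy and not how an XSLT processor works: the `TEMPLATES` registry, the `template` decorator, and the `apply_templates` function are hypothetical. Unlike real XSLT, which has built-in default rules for unmatched nodes, this sketch simply skips anything without a matching template.

```python
import xml.etree.ElementTree as ET

TEMPLATES = {}  # tag name -> function that renders one element


def template(tag):
    """Register a rendering function for elements with the given tag."""
    def register(func):
        TEMPLATES[tag] = func
        return func
    return register


def apply_templates(element):
    """For every child, run whichever template matches it; skip unmatched children."""
    return "".join(
        TEMPLATES[child.tag](child) for child in element if child.tag in TEMPLATES
    )


@template("title")
def render_title(el):
    return f"<h1>{el.text}</h1>"


@template("para")
def render_para(el):
    return f"<p>{el.text}</p>"


doc = ET.fromstring("<doc><title>Maihime</title><para>One</para><para>Two</para></doc>")
html = apply_templates(doc)
print(html)  # <h1>Maihime</h1><p>One</p><p>Two</p>
```

Neither rendering function says when or how often it will run. Each one only declares what to do with one kind of element, and the single `apply_templates` call decides the rest.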

Second, the recursive aspect. Keep calling the template until there are no more things left – whether that’s a counter, or a set of stuff. But how to get a counter without being able to change the variable? With each xsl:apply-templates (or call-template), do so with xsl:with-param, and adjust the parameter as needed. Call it with the rest of the set but not the thing that is being modified in the current template. When it runs out of stuff, that is when results are returned. Again, it takes the explicit instruction – xsl:for-each is very heavy-handed – and turns it into “if there’s anything left, keep on doing this.” It may seem from my description that there’s no real difference between these two, and in their end result, there isn’t. But this is a big leap, and moving from instinctively reaching for xsl:for-each to xsl:apply-templates is conceptually profound. It is getting XSLT.
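The counter-as-parameter trick can be shown in a few lines of Python, assuming a simple task (numbering the items in a list) chosen purely for illustration. The function names are hypothetical; `position` plays the role of a parameter passed with xsl:with-param, and the second function is the procedural version with a mutable counter for comparison.

```python
def number_items(items, position=1):
    """Label each item with its position without ever reassigning a variable."""
    if not items:  # nothing left: the recursion stops here
        return []
    first, rest = items[0], items[1:]
    # "Call the template again" with the rest of the set and an adjusted parameter.
    return [f"{position}. {first}"] + number_items(rest, position + 1)


def number_items_loop(items):
    """The procedural equivalent, which depends on a mutable counter."""
    result = []
    position = 1
    for item in items:
        result.append(f"{position}. {item}")
        position += 1
    return result


print(number_items(["Texts", "Search", "About"]))
# ['1. Texts', '2. Search', '3. About']
```

Both functions give the same result. The difference is that the recursive one never changes `position`: each call simply receives a new value.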

Finally, a note on the brevity and simplicity of XSLT. I’ve noticed that once I’ve found a good, relatively elegant solution to what I’m trying to do (they can’t always be!), suddenly my code becomes very short and very simple. It’s not hard to write and I don’t type for a long time. It’s the thinking and planning that takes up the time. Obviously this is true for programming just about anything, but I find myself doing a whole lot less typing this summer than usual (compared to languages I’ve used such as C, C++, Java, Python).

It’s both satisfying and disappointing at the same time: getting a template that recursively creates arbitrary nested menus makes me want to jump up and high five myself; the fact that it’s only about four lines and incredibly simple makes me wonder if any of it was that hard to begin with. But this isn’t limited to XSLT or even programming: the 90-page thesis seems like more work than the 40-page thesis, but if the shorter one is talking about more profound ideas and/or is simply more well-written, the length and time comparison falls apart. The time spent typing and the length of the output don’t tell us as much as we’re used to assuming.
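To give a sense of how short a recursive nested-menu routine can be, here is a Python sketch of the general idea. It is not the actual TokenX template: the `menu` and `item` element names and the `label` attribute are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET


def render_menu(node):
    """Render the <item> children of `node` as a nested HTML list."""
    items = node.findall("item")
    if not items:  # a leaf: no submenu to render
        return ""
    rows = "".join(
        f"<li>{item.get('label')}{render_menu(item)}</li>" for item in items
    )
    return f"<ul>{rows}</ul>"


menu_xml = """
<menu>
  <item label="Texts">
    <item label="Novels"/>
    <item label="Letters"/>
  </item>
  <item label="About"/>
</menu>
"""
menu_html = render_menu(ET.fromstring(menu_xml))
print(menu_html)
# <ul><li>Texts<ul><li>Novels</li><li>Letters</li></ul></li><li>About</li></ul>
```

The function never needs to know how deep the menu goes. Each level renders its own items and hands any children back to the same function.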

That’s what I have to say about what I’ve been doing this summer, as far as learning XSLT goes. I still can’t say I like it. The syntax is maddening. I haven’t been in this long enough to judge whether it’s the best choice for getting something done within a lot of constraints. But at the very least I’ve finally had that brain shift again, the one I had with Lisp so long ago, to a different approach to problem-solving entirely. And that feeling is profoundly gratifying.

Speaking of a good feeling, I’ve been able to have extended chats with multiple people about XSLT on the U of M School of Information mailing list this summer after someone posted asking for help with it. It’s a good thing I replied despite thinking “I’m not an expert, so I probably don’t have much to offer.” Talking with the questioner and the others who replied-all on our emails was really enlightening: getting feedback, hearing others’ questions about how the language works (questions that I hadn’t articulated very well), and also giving my own feedback. There’s nothing like teaching to help you learn. I would not have been able to write this post before talking to my fellow students and figuring it out together. (Or, you would have read a very unclear and aimless post.)

(Very last, I’d like to recommend the O’Reilly book XSLT Cookbook for using this language regularly after getting acquainted with it. If I were continuing on with an XSLT project after this internship, or working on adding more to this one, I’d be using this book for suggestions.)

* Thank you all for reminding me that this word exists.

** XSLT includes not only the xsl:for-each instruction but also, as of version 2.0, the for expression that comes with XPath 2.0. These do have their appropriate uses and I do use them quite a lot, because my application doesn’t give me a huge number of chances for recursion. I’m being dramatic to make a point.

Cross-posted from the iSchools & Digital Humanities intern blog

in which I acquire typographical empathy

Guys. I humbly apologize for forcing you to read monospace font of your browser’s (or OS’s) choice for the past year. I’ve learned the error of my ways.

From now on it’s serif all the way.

And I now supply Linux font defaults just in case, which I didn’t before out of ignorance. At least there’s something to try to fall back on before font-family: serif now.

What I’m doing this summer at CDRH: overview

I’ve been here at CDRH (The Center for Digital Research in the Humanities) at the University of Nebraska-Lincoln since early May, and the time went by so quickly that I’m only writing about what I’m doing a few weeks before my internship ends! But I’m in the thick of things now, in terms of my main work, so this may be the perfect time to introduce it.

My job this summer is (mostly) to update TokenX for the Willa Cather Archive (you can find it from the “Text Analysis” link at http://cather.unl.edu). I’m updating it in three senses:

  1. Redesigning the basic interface. This means starting from scratch with a list of functions that TokenX performs, organizing them for user access, figuring out which categories will form the interface (and what to put under each), and then making a visual mockup of where all this will go.
  2. Taking this interface redesign to a new Web site for TokenX at the Cather Archive.* The site redesign mostly involves adapting the new interface for the Web. Concretely, I’m doing everything from the ground up with HTML5 and styles separated into CSS (and aiming for modularity, I’m using multiple stylesheets that style at different levels of functionality – for example, the color scheme, etc., is separated from the rest of the site to be modified or switched out easily). The goal is to avoid JavaScript completely, and I think we’re going to make it happen. We’re also aiming for text rather than images (for example, in menus) and keeping the site as basic and functional as possible. After all, this is an academic site and too much flashy will make people suspicious. 😀
  3. The exciting part: Implementing as much of TokenX with the new interface as I can in the time I’m here. Why is it exciting?
    • TokenX is written in XSLT, which tends to be mentioned in a cursory way as “stylesheets for XML” as though it’s like CSS. It’s not. It’s a functional programming language with syntax devised by a sadist. XSLT has a steep learning curve and I have had 9 weeks this summer to try my best to get over it before I leave CDRH. I’m doing my best and it’s going better than I ever imagined.
    • I’m also getting to learn how XSLT is often used to generate Web sites (which is what I’m doing): using Apache Cocoon. Another technology that I had no idea existed before this summer, and which is coming to feel surprisingly comfortable at this point.
    • I have never said so many times in a span of only two months, “I’m glad I had that four years of computer science curriculum in college. It’s paying off!” Given that I never went into software development after graduating, and haven’t done any non-trivial programming in quite a long time, I had mostly dismissed the idea that my education could end up being so relevant to my work now. And honestly, I miss using it. This is awesome.

I’m getting toward the end of implementing all of the functionality of TokenX in XSLT for the new Web site, hooking that up with the XSLT that then generates the HTML that delivers it to the user. (To be more concrete for those interested in Cocoon, I’m writing a sitemap that first processes the requested XML file with the appropriate stylesheet for transformation results, then passes those results on to a series of formatting stylesheets that eventually produce the final HTML product.) And I’m about midway through the process of going from Web prototype to basic site styling to more polished end result. I’ve got 2.5 weeks left now, and I’m on track to having it up and running pretty soon! I’ll keep you updated with comments about the process – both XSLT, and crafting the site with HTML5 and CSS – and maybe some screenshots.
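The pipeline shape described in the parenthetical (one analysis step, then a chain of formatting steps) can be sketched in Python. This is an analogy only, not Cocoon's sitemap syntax or API. The stage functions and the word-count task are hypothetical stand-ins for the real stylesheets.

```python
from functools import reduce


def run_pipeline(source, stages):
    """Feed `source` through each stage in order; each stage's output is the next one's input."""
    return reduce(lambda data, stage: stage(data), stages, source)


def count_words(text):
    """Analysis stage: turn raw text into word counts."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def to_rows(counts):
    """First formatting stage: one table row per word, in alphabetical order."""
    return [f"<tr><td>{w}</td><td>{n}</td></tr>" for w, n in sorted(counts.items())]


def to_page(rows):
    """Final formatting stage: wrap the rows in a page."""
    return "<html><body><table>" + "".join(rows) + "</table></body></html>"


page = run_pipeline("O pioneers o", [count_words, to_rows, to_page])
print(page)
```

Keeping the analysis stage separate from the formatting stages means the look of the page can change without touching the part that does the counting.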

* TokenX can be, and is, used for more than this collection at CDRH. Notably it’s used at the Walt Whitman Archive in basically the same way as Cather. But we have to start somewhere for now, and expand as we can later.