Tim Finds His Groove

I’m happy to say I spent last month writing quality pages about Freedman’s study of the Lyrical Novel. My stacks of writing (covering, let’s recall, Jackson, Nietzsche, Moretti, and now Freedman) grew and were lovingly edited by Brad, Paul, and me. I wove those stacks together, with an intro and some methodology to boot. My first-half-draft, as I’m calling what I have up to this point, is about 24 pages. Solid!

The semester closed the way all semesters do. Cough-drop wrappers rustle around the depths of your backpack, piles of half-memorized flashcards litter your bedroom floor, and last-minute slide decks you forgot you needed to contribute to flood your Google Drive. You close strong, though, because you have to.

And then break starts. And what do you do? A lot of my friends are spending this month picking up hours at an ancient outlet-mall job or playing all the Xbox they can in one sitting. For me, though, this is a working break. Not much Xbox for me … At this point, I have most of the theoretical framework laid for my thesis. Now, I will probably fluff up my sections with other lyric scholars (like Culler) and digital denizens (like Jockers), but the rough structure for my experiments is all there. The pipeline is in place. And that’s great, as far as writing goes. This project, though, requires me to square the ideas of the critics with the realities of my primary texts through both traditional and digital hermeneutics. That translates to a break spent coding.

Coding how, though? What have I been up to?

My primary focus for the first part of break is to properly “clean” my lyrical novels. I am breaking my digital texts down into my semantic units of interest: their paragraphs, sentences, and words.
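To make that concrete, here is a minimal sketch of the kind of segmentation I have in mind, assuming plain-text files and NLTK’s tokenizers (the file name at the end is just a placeholder, not one of my actual texts):

```python
# A rough sketch of the cleaning step: split a plain-text novel into
# paragraphs, then sentences, then words. Assumes NLTK's 'punkt' models
# are downloaded and the novel lives in a plain .txt file.
import re
from nltk.tokenize import sent_tokenize, word_tokenize

def segment_novel(path):
    with open(path, encoding="utf-8") as f:
        text = f.read()

    # Paragraphs: split on one or more blank lines
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    # Sentences and words, keeping the paragraph structure intact
    structured = []
    for para in paragraphs:
        sentences = sent_tokenize(para)
        structured.append([word_tokenize(s) for s in sentences])
    return structured

# e.g. novel = segment_novel("novel.txt")  # hypothetical file name
```

Keeping the paragraph and sentence boundaries around (rather than flattening everything into one long word list) is the whole point of doing this by hand.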

Why am I doing this?

A grounding assumption in literary studies is that there is an artfulness and delicacy attached to each sentence in the works of great authors such as Doctorow, Dickens, and Zola. Their works are remembered because of their command of words (recall Flaubert’s “mot juste,” and the particular words and actions in Proust that elicit striking memories and emotions from readers) in addition to their complex plots and expertly crafted characters. I’m concerned with syntactic relationships, frequencies, and other features that manifest themselves through the words used in a given text. Great authors “have a way with words” that comes to fruition in memorable sentences and paragraphs. So I don’t think it would be right to analyze only a novel’s words, isolated from their original context. Indeed, it’s pretty easy to just fire up the ‘tm’ package, make a corpus, run classifiers X, Y, and Z, and optimize at will. That won’t get me the results I’m really after, though. This is where my rough-and-tumble text scrubbing comes into play.
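For contrast, here is roughly what that context-free baseline looks like. I’ve sketched it in Python with scikit-learn rather than R’s ‘tm’ (since I may end up in Python anyway); the texts and labels are placeholders, not my actual corpus:

```python
# The "easy" baseline I'm *not* settling for: treat each novel as a bag of
# words, ignore sentence and paragraph structure, and hand the counts to a
# classifier. Texts and labels below are hypothetical stand-ins.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["...full text of novel one...", "...full text of novel two..."]
labels = ["lyrical", "non-lyrical"]  # hypothetical labels

counts = CountVectorizer(stop_words="english").fit_transform(texts)
model = MultinomialNB().fit(counts, labels)
```

It runs, and it optimizes, but it throws away exactly the sentence- and paragraph-level texture I care about.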

What are you going to do next?

After I finish cleaning my corpus, I am going to carry out two types of classification. The first will be traditional supervised classification using a Support Vector Machine. The second will be “cutting-edge” (to quote my Machine Learning professor, Dr. Qi): a Recurrent Neural Network for Unsupervised Classification. I think these methods will pair nicely with one another. I am not sure whether this will be done in R or Python; I was having trouble this past semester loading Keras into RStudio, so I may just opt for Python. We’ll have to wait and see, though.
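To give a rough sense of the supervised half, here is a sketch assuming TF-IDF features over my cleaned text units and scikit-learn’s linear SVM. The units and labels below are hypothetical stand-ins, and the recurrent-network half would swap in a Keras model once I settle on R versus Python:

```python
# A sketch of the supervised step: TF-IDF features over the cleaned text
# units, fed to a linear Support Vector Machine. Feature choices, units,
# and labels here are placeholders, not the final experimental design.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

units = ["paragraph one ...", "paragraph two ...",
         "paragraph three ...", "paragraph four ..."]
labels = ["lyrical", "narrative", "lyrical", "narrative"]  # hypothetical

svm = make_pipeline(TfidfVectorizer(), LinearSVC())
scores = cross_val_score(svm, units, labels, cv=2)
print(scores.mean())
```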

That’s all for me. All the best, and happy holidays!

-Tim.