Shakespearean Mythbusting: Vocab

24 October 2011

Holger Syme has a nice piece on Shakespeare’s vocabulary and why his supposed inventiveness with the English language is not exceptional.

Also, the list of other Elizabethan playwrights with their education (or lack thereof) and humble family backgrounds pretty much puts paid to the myth that Oxford wrote Shakespeare’s plays.

I would add two things to Syme’s article. The first is that the OED is used as a source for many such word studies, but the OED is heavily biased in favor of Shakespeare. His works were picked apart for the first edition of the dictionary with a care that no other writer received. The editors give precedence to Shakespeare citations over those of others: if they have two quotations and room for only one, the Shakespeare goes in the dictionary. (This is a sound editorial policy. After all, Shakespeare is much more likely to be read than other writers, so the need is greater, but it can skew the results when using the OED as a corpus.)

The second is that Shakespeare does not, as many claim, have the credit for the greatest number of first citations of words in the OED. That honor goes to Chaucer. As of today, the OED has 1,726 words with first citations from Shakespeare, but 2,220 from Chaucer, about 29% more. And Chaucer’s surviving corpus, at about 385,000 words, is only about half as big as Shakespeare’s. This doesn’t mean that Chaucer was of singular talent either. The reason he gets so many first citations is chronology; he is the first major poet writing after the massive influx of Norman words into the language. (And there is a similar bias among OED editors in favor of Chaucer over his contemporaries.)
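
For anyone who wants to check the arithmetic, here is a rough sketch in Python. The citation counts and the Chaucer word count are the figures quoted above. The Shakespeare word count is an assumption: no figure is given here, so the sketch simply reads “about half as big” as exactly half. The function names are made up for illustration.

```python
# Figures quoted in the post (OED first citations as of October 2011).
SHAKESPEARE_FIRST_CITATIONS = 1_726
CHAUCER_FIRST_CITATIONS = 2_220
CHAUCER_CORPUS_WORDS = 385_000

# ASSUMPTION: the post gives no word count for Shakespeare's corpus.
# "About half as big" is read here as exactly half, which is only a rough guess.
SHAKESPEARE_CORPUS_WORDS = 2 * CHAUCER_CORPUS_WORDS


def percent_more(larger: float, smaller: float) -> float:
    """Return how much bigger `larger` is than `smaller`, as a percentage."""
    return (larger - smaller) / smaller * 100


def per_thousand_words(citations: int, corpus_words: int) -> float:
    """Return first citations per 1,000 words of surviving text."""
    return citations / corpus_words * 1000


if __name__ == "__main__":
    extra = percent_more(CHAUCER_FIRST_CITATIONS, SHAKESPEARE_FIRST_CITATIONS)
    chaucer_rate = per_thousand_words(CHAUCER_FIRST_CITATIONS, CHAUCER_CORPUS_WORDS)
    shakespeare_rate = per_thousand_words(
        SHAKESPEARE_FIRST_CITATIONS, SHAKESPEARE_CORPUS_WORDS
    )
    print(f"Chaucer has {extra:.1f}% more first citations than Shakespeare")
    print(f"Chaucer: {chaucer_rate:.2f} first citations per 1,000 words")
    print(f"Shakespeare (assumed corpus size): {shakespeare_rate:.2f} per 1,000 words")
```

Under that assumption, Chaucer’s rate of first citations per word comes out at well over twice Shakespeare’s, which is the chronological and editorial bias showing up in the numbers, not a measure of inventiveness.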

Word Clouds: The Mullets of the Internet

14 October 2011

Jacob Harris has a piece on the non-utility of word clouds over at the Nieman Journalism Lab.

Word clouds are basically a lazy way to inject a graphic and superficial analysis into a story. There are times when they can be useful, but as Harris points out, there is almost always a better way to visualize textual data.

Be sure to click on the links in Harris’s story for the examples of good and bad data visualization.

The Last Print Dictionary

2 October 2011

Dennis Baron, channeling American Heritage’s marketing department, asks whether the fifth edition of the American Heritage Dictionary will be the last print dictionary ever published.

He makes two related, but actually quite different, points. The second, and more trenchant, point is that edited dictionaries, the ones we know and love that are carefully crafted by lexicographers, may not survive in an age of Wikipedia and fast, but false, facts at our fingertips. Baron points to urbandictionary.com as what must be avoided. (Urbandictionary.com is a fun site, but beyond achieving a rough level of confidence for whether or not a slang term actually exists, it is utterly useless as a tool for gaining knowledge about words.) Baron fears that sites that aggregate old dictionaries may kill the market for new ones. While crowdsourcing and other techniques made possible by digitization and the internet, if properly supervised by skilled lexicographers, may offer new methods for creating dictionaries, Baron has a valid point about the market. Can we afford to create new works and advance knowledge when old knowledge is free?

Baron’s first point is a nod in the direction of an elegy for print dictionaries. Baron is careful not to stray too far down this path. He readily admits that online dictionaries are better reference tools. But many others do revere the printed book over its electronic incarnation, and it is clearly this sentiment that American Heritage’s ad campaign is playing into. This reverence is really no more than a fetishization of the object. As Walter Benjamin put it, “technological reproducibility emancipates the work of art from its parasitic subservience to ritual” (1056–57). The celebration of the print dictionary is merely a celebration of its “aura,” not the work itself. What is important about dictionaries is not their physical form, the fact that we can touch them, their weight, their smell. What is important is the information contained within. It does not matter whether the information is presented in ink on a page or as pixels on a screen. I for one will not miss printed dictionaries. Even the smallest are heavy, bulky, and can’t be carried with you. Online dictionaries are freed from arbitrary restrictions on size and indexing. They are simply far superior as reference tools.


Works cited:

Benjamin, Walter. “The Work of Art in the Age of Technological Reproducibility.” 1939 (1955). The Norton Anthology of Theory and Criticism, Second Edition. Vincent B. Leitch, ed. W. W. Norton & Company. 2010. 1051–71. Print.

Zimmer on the Zuckerverb

30 September 2011

Ben Zimmer talks about the “new language of Facebook” in The Atlantic today. You can also hear Zimmer talk about it on WCBS radio.

Zimmer gives some trenchant analysis of what Facebook is doing, while avoiding the doom-and-gloom “the English language is going to hell” commentary that is so often heard:

This is what happens when language is optimized for social data-mining rather than natural communication. “Mark read a book.” “Mark listened to a song.” “Mark hiked a trail.” “Mark reviewed a movie.” The sentences flashed on the big screen behind Zuckerberg as he laid out his verb-y vision. Though these sentences are technically in the active voice, they present us with an oddly cramped kind of “activeness,” in which we the users engage with a world of commodified objects through verbs of consumption. And to see one’s “life story” reduced to a series of such prefab activities in a personal timeline? Some might call that the apotheosis of consumer culture.

It’s well worth a read.