Words of the Year

21 December 2010

The American Dialect Society is taking nominations. A number have already been nominated. I actually find the lists of nominations more interesting than the actual vote. They often include references to events from earlier in the year that have slipped from memory, but once recalled remind us how they dominated the news.

Ben Zimmer’s list includes: cablegate, mama grizzly, vuvuzela, backscatter, and freedom pat.

Grant Barrett’s list includes: junk shot, porno scanner, and belieber.

Wayne Glowka’s list includes: refudiate, wiki-leaker, and spillion.

Nancy Friedman’s list includes: cannabiz, hashtag, and man up.

And Joe Clark has contributed a number of Canadian nominations, including: G20 (a big news item here this year), busty hookers, and vampire squid.

Google Ngrams

19 December 2010

As you probably know because a zillion people have posted about this, Google has released a new feature that allows you to examine the relative frequency of words and phrases in the Google Books corpus. Here’s a description of the Ngram feature by one of the engineers.

But is the feature useful for linguistic research? Well, there is at least one serious study that uses Google’s corpus. A team led by Jean-Baptiste Michel and Erez Lieberman-Aiden has published “Quantitative Analysis of Culture Using Millions of Digitized Books” in Science (subscription required). Linguist Geoffrey Nunberg has a more accessible article about this research here. But this work does not use the publicly available Ngram tool. Can the Ngram tool, as opposed to the underlying corpus, be used for real work, or is it just an amusing diversion?

My impression is that, with appropriate caution, it can be useful as a quick tool to verify or refute an impression or statement, but, like the counts of Google hits on the web, it is too crude to be definitive. There are simply too many uncertainties and uncontrolled factors underlying the numbers it provides. Any results it returns have to be verified by a much better controlled search of a corpus than this publicly available tool provides.

Christmas Cliches

19 December 2010

There’s an amicable little spat going on between language writers John McIntyre and Jan Freeman over the permissibility of holiday cliches like “tis the season” and “Christmas came early...” in journalism. (No one is suggesting that individuals can’t use these if they please; the argument is over whether or not they are overworn and threadbare in newspaper articles.) McIntyre’s original salvo is here.

Freeman’s response.

McIntyre’s response to the response.

While I’m not sympathetic to “bans,” I probably side more with McIntyre on this one. Most of these are true chestnuts, devoid of any originality or interest. Professional writers and publications should strive to avoid them, and we certainly don’t need the stories about the cost of the gifts in the “Twelve Days of Christmas.” But Freeman does have a point when she says that editors, because they see these more often than the average reader, can become hypersensitive to them. And her advice to confine certain cliches to certain situations is good, as when she suggests allowing weather broadcasters to use “white stuff.”