Google Ngrams

19 December 2010

As you probably know because a zillion people have posted about this, Google has released a new feature that allows you to examine the relative frequency of words and phrases in the Google Books corpus. Here’s a description of the Ngram feature by one of the engineers.

But is the feature useful for linguistic research? Well, there is at least one serious study that uses Google’s corpus. A team led by Jean-Baptiste Michel and Erez Lieberman-Aiden has published “Quantitative Analysis of Culture Using Millions of Digitized Books” in Science (subscription required). Linguist Geoffrey Nunberg has a more accessible article about this research here. But this work does not use the publicly available Ngram tool. Can the Ngram tool, as opposed to the underlying corpus, be used for real work, or is it just an amusing diversion?

My impression is that with appropriate caution it can be useful as a quick tool to verify or refute an impression or statement, but, like the counts of Google hits on the web, it is too crude to be definitive. There are simply too many uncertainties and uncontrolled factors underlying the numbers it provides. Any results it returns have to be verified by a much better controlled search of a corpus than this publicly available tool provides.

Christmas Cliches

19 December 2010

There’s an amicable little spat going on between language writers John McIntyre and Jan Freeman over the permissibility of holiday cliches like “tis the season” and “Christmas came early...” in journalism. (No one is suggesting that individuals can’t use these if they please; the argument is over whether or not they are overworn and threadbare in newspaper articles.) McIntyre’s original salvo is here.

Freeman’s response.

McIntyre’s response to the response.

While I’m not sympathetic to “bans,” I probably side more with McIntyre on this one. Most of these are true chestnuts, devoid of any originality or interest. Professional writers and publications should strive to avoid them, and we certainly don’t need the stories about the cost of the gifts in the “Twelve Days of Christmas.” But Freeman does have a point when she says that editors, because they see these more often than the average reader, can become hypersensitive to them. And her advice to confine certain cliches to certain situations is good, as when she suggests allowing weather broadcasters to use “white stuff.”

OUP Should Know Better

18 December 2010

Aack! Another vector for “the English language is going to hell.” And this time from a publisher that should know better.

I’m not familiar with Duane Roller. His credentials make him out to be a respected historian of the classical period. And while I would hesitate to question him on the subject of Cleopatra, he does not appear to have any particular expertise in linguistics or modern language studies, and his blog post pretty much confirms he has no clue about the subject.

I’m not disputing Roller’s praise of Fowler’s Dictionary of Modern English Usage, which was once a superb reference. But like all such references, the 1926 edition has outlived its utility and should be allowed to pass away with dignity. The more recent third edition, while a serviceable style manual, isn’t the best on the market.

But Roller proceeds from the assumption that there is one, single “proper” way to write in English, and his claims that the English language is going to the dogs are claptrap and completely unsubstantiated by any evidence (and how do you even measure something like this?). I’m sorry, a cite of the opinion of a Washington Post columnist doesn’t count as evidence. (Would Dr. Roller accept Mr. Weingarten’s opinion as fact if the subject were Antony and Cleopatra? I think not.) Jonathan Swift was making the same argument about the language going to the dogs three hundred years ago. People bemoaned the fact that the printing press made everyone an author, and there were probably those who despaired because poets like Chaucer were writing in the vulgar tongue of English instead of proper French.

It may be true that newspapers today have more spelling and grammatical errors than a decade or two ago. (That’s probably something you could quantify and track, although I don’t know anyone who has done so systematically.) But if so, it is more likely the result of having fired all their copy editors rather than because reporters are less skilled at writing than in days past. And never mind the fact that what most people consider “bad grammar” is actually nothing of the kind.

Dr. Roller has many more years of grading student papers than I, but in the few months that I’ve been doing it I have been surprised at how few grammatical and spelling mistakes the students make (even on in-class assignments without the benefit of electronic spell checkers). With few exceptions, mostly from English-as-a-second-language students, the papers consist of grammatically correct sentences. The students’ problems seem to be 1) difficulty structuring arguments, 2) insufficient academic vocabulary, and 3) inexperience in writing in a formal, academic register. A style manual like Fowler’s will not help with any of these. The solution is to give the students more practice in academic reading and writing. I do have many years of experience as a professional copy editor, and my experience has shown that those in the business world who are not professional writers are grammatically skilled as well. Their sins are usually verbosity and the use of jargon in pieces intended for general audiences.

And of course, what diatribe about bad English would be complete without laying the blame on the “internet”? No more need be said on this. Dr. Roller has the standard write-a-blog-post-bitching-about-English-usage playbook and he’s running through it.

The line that most intrigues me is, “Astonishingly, one of our most distinguished literary magazines questioned something that I said because it could not be verified on the internet.” Now, I don’t know what the back story is here. I suspect that his reference was not available online and the editors were having trouble finding it, but it comes across as if Dr. Roller is in a huff because someone dared fact-check his work. Perhaps Dr. Roller should be less concerned with the grammar mistakes of others and more concerned with what he is actually writing himself. In the words of the immortal Inigo Montoya, “I do not think it means what you think it means.”

Economist on Business Cliches

11 December 2010

The Economist’s “Johnson” blog had a nice posting on some current business clichés a couple of days ago, which I’m only getting around to linking to now. (Studying for end-of-term Latin test, sorry.) What I really like about this piece is the explanation of the subtext of each of the phrases. Instead of bemoaning how language is going to the dogs, which is what most articles of this type do, G. L. explains how each phrase is actually used. You are free to dislike the aesthetics of the phrases, but they do serve a purpose and communicate a message that is not encapsulated solely in the semantics of the words within them.