Are You Using the Wrong Dictionary?

6 June 2014

James Somers wrote a blog post on dictionaries a week or so ago, in which he extols older, more poetic dictionary definitions, criticizing modern dictionaries for being flat and uninspiring. He uncovers a very useful technique for punching up writing, one favored by famed non-fiction writer John McPhee, but in the process Somers falls into a trap that many non-linguists do, subscribing to the myth that when it comes to language authority, older is better. To put it bluntly, Somers is probably using the wrong dictionary.

Overall, Somers’s piece is a nice paean to older dictionaries, which are indeed fun to peruse. It’s true that the style of older references is more engaging than the crisp, dry style favored today. And in particular, Somers’s explication of McPhee’s method for using the dictionary is quite useful. McPhee’s technique is a good one for punching up writing in that final stage of editing, enabling a writer to turn a banal word into an evocative phrase. The blog post is well worth reading for these nuggets.

But as advice on which dictionary to use as a general reference, Somers’s piece falls into the same category as those writings that elevate Strunk and White to a linguistic godhead. When it was first published in 1959, Strunk and White’s The Elements of Style was a quirky and idiosyncratic style guide filled with errors about how English is actually used. Nowadays, it remains quirky and inaccurate, but it is also woefully outdated, over sixty years old in its current incarnation and based on the original work by Strunk, which is over ninety-five years old. The Elements of Style, while perhaps fun to read in places, is a lousy guide for writers. The fact that it is old makes it worse, not better.

The same is true of Somers’s extolling of Merriam-Webster’s 1913 edition of its dictionary. Recently edited dictionaries are generally better for two main reasons: they reflect English as it is used today, and modern lexicographers simply know more and are better at their job than those in days past. The meanings of words change over time; they acquire new senses and connotations, and old senses fall out of use. Relying on a hundred-year-old dictionary is a sure way to insert anachronisms into your writing and to miss connotations and subtleties that have developed over the past century.

And lexicographers are simply better nowadays; they have more information and better tools to analyze the corpus of English. Somers evokes the image of Noah Webster slaving away alone by candlelight, churning out his dictionary, comparing that image favorably with the modern teams of lexicographers using computerized tools and digital databases in an antiseptic office environment. But the simple fact is that one person working alone is bound to get a lot wrong, and a single person in 1828, no matter how brilliant or skilled, simply cannot match the power and scope of today’s digital corpora of English. Noah Webster’s definitions may be poetic in places, but they are also idiosyncratic, reflecting a single person’s views on the language.

Somers uses sport and one of the definitions from the 1913 dictionary as an example. The definition is “diversion of the field.” Somers cites McPhee’s use of this definition, and indeed, McPhee makes brilliant use of it. But as a dictionary definition, does it work? What the heck is a “diversion of the field”? One has to sit and cogitate on that a moment before figuring it out. The definition is imprecise and open to interpretation. The dictionary does add examples, “as fowling, hunting, fishing, racing, games, and the like, esp. when money is staked,” which help. But the entry still omits the most common use of the word today, which the OED (2008) defines as, “an activity involving physical exertion and skill, esp. (particularly in modern use) one regulated by set rules or customs in which an individual or team competes against another or others. Freq. in pl.” If you attempt to use sport in accordance with the definitions of Webster’s 1913, you run a strong risk of anachronistic writing. The main use of the word today was rare a century ago, and the most common use then is relatively rare today. McPhee got away with it because canoeing is the type of activity envisioned by the older definition, but to apply “diversion of the field” to the NFL in 2014 would simply court confusion and head-scratching.

Compare the 1913 definition to the current ones offered by Merriam-Webster Online: “a contest or game in which people do certain physical activities according to a specific set of rules and compete against each other : sports in general : a physical activity (such as hunting, fishing, running, swimming, etc.) that is done for enjoyment.” Sure, these definitions have nothing as evocative as “diversion of the field,” but they are precise, useful, and current.

By all means, read the 1913 Webster’s for enjoyment. Use it as McPhee does to find inspiration. A good and careful writer can avoid the pitfalls of an anachronistic or quirky resource. But as a general reference, you need a crisp, antiseptic, and up-to-date dictionary.

Beowulf Filmfest

4 June 2014

On 2 June, Medievalists.net, an excellent blog and website on all things medieval, posted links to eight different YouTube videos about the Old English poem Beowulf. The videos vary widely in quality, so here I re-present them, in a different order, with some commentary. Overall, this presentation shows that there is quite a bit of material about Beowulf out there, if of uneven quality. The videos that make the most of the audio-visual medium tend to be low quality in terms of scholarship and accuracy, while the most insightful commentary is usually framed in achingly dull presentations of a person standing at a lectern and reading from a script.

First up is the best of the summary videos, which is surprisingly by Thug Notes. Usually such presentations sacrifice accuracy and analysis for humor. But not only is this video amusing and engaging, the writers clearly know the poem well, and their analysis of the poem is dead on. I note only one factual problem: the etymology of the name Beowulf as “man-wolf.” They pulled that one out of thin air. The only other problem is that there is so much more to say about the poem, but that comes with the territory when you try to condense an epic into four and a half minutes. This one is well worth watching.

Next is Beowulf by 60second Recap. It provides a very brief summary of the poem’s main plot points, a summary that is accurate but, constrained by time, not at all detailed. That would be okay, except the brief analysis that follows is stunningly superficial and wildly off the mark. Beowulf is relevant to today’s audience not because it outlines the model hero: brave, strong, wise, compassionate, etc. Rather, it presents the model hero and then continually undercuts him. Beowulf, despite his prowess in killing monsters, is basically a failure; all his victories come with a cost in additional lives and suffering, and he dies leaving his kingdom without an heir and at the mercy of invading armies. So, while sixty seconds isn’t much time, this video is proof that even a small amount of time can be wasted.

Educational Portal provides a fourteen-minute summary by Elspeth Green, who, I gather, is a graduate student at Princeton. Green provides a reasonable summary of the poem and its alliterative form. But it’s evident that Green is not an Anglo-Saxonist and doesn’t know the poem all that well. While her summary is decent, she gets minor details wrong, and the video falls apart when she starts trying to explain why people study Beowulf today. The main reason, which she omits entirely, is that it is truly a great work of literature, well-crafted, intricate, emotionally powerful, and hauntingly beautiful as it recalls a nostalgia for a pagan past that has slipped away with the coming of Christian “modernity.”

Almost not worth mentioning is the summary of the poem by R. J. Stephenson. (Who that is, I have no idea.) It’s wildly inaccurate and not very funny. Plus, at eight and a half minutes, it’s too long. It might work if it were either accurate or funny, but Thug Notes does a far, far better job in half the time.

In the category of non-summary videos, Francis Leneghan, a professor of Old English at Oxford, gives a twelve-and-a-half-minute lecture on the question of authorship, and what an analysis of Beowulf has to say about Anglo-Saxon ideas about what the role of a poet and author should be. It’s an excellent lecture. The only problem is that it is a lecture. You can do so much more with video than simply record a person reading a script. Still, the content is excellent.

Brian Patrick McGuire, a history professor at Roskilde University in Denmark, gives a four-minute summary of the poem in Danish with English subtitles. This one is chiefly interesting because McGuire connects the poem to historical characters and events, and it’s filmed at Lejre, a Viking-age site that might have contained Hrothgar’s mead hall. Be forewarned though, the historical connections to the poem are tenuous. Beowulf is a work of fiction containing a smattering of allusions to real people and events. Still, a neat video that evokes the landscapes in the poem.

Jonathan Broussard of Louisiana State University gives a conference paper that contends Beowulf was intended to be a “mirror for princes,” a medieval genre that sought to instruct future nobility on how to act properly. I have a lot to say about the pointlessness of the format of academic conferences in the humanities, but I’ll leave that for another day. Broussard’s paper is a good summary of the topic, which has been raised before. But the video is a low-quality one of a man reading from a script. If you’re interested in what Beowulf has to say about kingship, it may be worth sitting through this eighteen-and-a-half-minute reading. Otherwise, give it a miss.

The final video is actually a series of videos that cover another conference presentation, one by Craig Jordan-Baker on adapting Beowulf to dramatic performance. All told, the videos total about an hour. This presentation does better than the last one, in that Jordan-Baker is talking to the subject, using notes, rather than reading from a script, and the video is edited to incorporate Jordan-Baker’s slides—a minimal use of the capabilities of video. (Ironically, this presentation of adapting Beowulf to another format is another poor example of adapting a conference presentation to another format.)

Jordan-Baker’s content, however, is quite good. Of all the videos listed here, this one, in part two, gives the most comprehensive coverage of the themes of Beowulf, what the poem is about beyond the narrative elements.

  • Part 1: a short introduction

  • Part 2: an overview of some of the themes of Beowulf

  • Part 3: Jordan-Baker discusses his own adaptation

  • Part 4: a four-minute discussion of interdisciplinarity in the academy and the integration of creative and critical enterprises

  • Part 5: Q&A

Part two of the presentation is here:

Of all these videos, only two are probably worth the time of people who have a general interest in the poem, the Thug Notes recap and part two of Jordan-Baker’s presentation. Venture into the others if the specific topics seem to be of interest.

How Languages Evolve

31 May 2014

TedEd has a video by Alex Gendler on how languages evolve. Nothing surprising or out of the ordinary for those with any kind of background in historical linguistics, but it’s a good overview for the neophyte.

(I’m going to take a moment to bitch about TedEd’s presentation. First, they don’t allow the embedding of the video on other sites. That’s their prerogative, but I don’t see any logic to it. It’s not like they have ads or are otherwise losing money when others embed the video. The second, and much more important complaint, is the lack of a date. When was this video made? I’ve noticed more and more websites and blogs not noting when content is posted. This is a serious deficiency for a site that provides educational materials. Instructors need to know how current the materials are.)

Where Do English Words Come From?

30 May 2014

A recent thread on this site’s discussion forum got me thinking about the borrowing of foreign words into English, a language with a reputation for indiscriminately appropriating words from other languages. Etymology books and websites, including this one, often highlight the diversity of languages that English draws its vocabulary from, but how much of this reputation is deserved? Does English really borrow that many words? And does English really filch a lot of words from many different languages? The answers may be a bit surprising, but when you look at the data in light of history, they make a lot of sense.

To answer these questions, I collated all the data from the OED etymologies into a spreadsheet. The data is not perfect and must be taken with a grain of salt, but is generally representative of where English draws its words from. First, the data reflects the number of times a language is mentioned in the etymology; it does not necessarily represent the source languages that English draws its words from. Cognates—words with a common root—are frequently mentioned in the etymologies, and these are represented in the numbers as well. (For example, there is a single appearance of Sanskrit in the etymology of an Old English word. This does not mean that the Anglo-Saxons had some contact with India and borrowed the word. Rather the OED editors simply pointed out that the Old English word has a cognate in Sanskrit, i.e., they share a common Indo-European root.) Second, the OED predominately draws from literary sources. Drawing from a different corpus would result in somewhat different numbers. For example, if one were to poll technical writing, the number of words with Latin and Greek roots would rise. But these additional words would also tend to be rather arcane and uncommon, so the OED is a pretty good representation of the breadth of English as the language is spoken by most people.

WO_stacked.jpg

The chart here represents the sources of English words as a percentage of the total new entries the OED has for that particular century. By using percentages, I correct for the bias the OED has for drawing upon sources from certain centuries. For example, since the OED was primarily compiled in the nineteenth century, the dictionary has far more new words from that period than any other—almost as many as the seventeenth and eighteenth centuries combined, and more than twice as many as the twentieth. Using percentages allows comparisons across the centuries.
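For anyone who wants to try the same percentage trick on their own tallies, here is a minimal sketch in Python. It is not the spreadsheet I used; the function name and the counts are made up purely for illustration and are not OED figures. It shows only the idea that each source's count is divided by that century's total, so a heavily sampled century and a lightly sampled one become comparable.

```python
def percent_by_source(counts):
    """Convert one century's raw counts into percentage shares.

    `counts` maps a source name to the number of new entries
    attributed to it in that century.
    """
    total = sum(counts.values())
    if total == 0:
        # No entries at all: report zero shares rather than dividing by zero.
        return {source: 0.0 for source in counts}
    return {source: 100.0 * n / total for source, n in counts.items()}


# Placeholder counts, invented for illustration only; NOT OED data.
# century_a has ten times as many entries as century_b, but the mix
# of sources is identical.
toy_counts = {
    "century_a": {"English": 600, "Latin": 150, "French": 250},
    "century_b": {"English": 60, "Latin": 15, "French": 25},
}

shares = {
    century: percent_by_source(counts)
    for century, counts in toy_counts.items()
}

for century, by_source in shares.items():
    print(century, by_source)
```

The two toy centuries differ tenfold in raw size, yet they come out with the same shares, which is exactly why percentages allow comparison across centuries that the OED samples unevenly.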

The most striking thing upon viewing the chart is the realization that by far, most English words are formed from already existing English roots. Four times as many words come from English roots as come from Latin, the closest competitor. For all its reputation for borrowing words, most English words are home grown. And most of the non-English words come from only a handful of sources: Latin, French, Old Norse, Greek, and non-language sources, such as acronyms, echoic and imitative words, personal and place names, etc. And the influence of these sources changes over time. The high point for English-rooted words was the Anglo-Saxon era, with over 80% of Old English words coming from native, Germanic roots. The low point was the fourteenth century, when the effects of the Norman Conquest were fully felt. Since then, the amount of borrowing has decreased, and by the twentieth century over 70% of new words were formed with English roots.

By far the biggest impact on the language has been the Norman Conquest of 1066. Prior to the arrival of the Normans, the only major influence on English that shows in the data was Latin, and the OED data probably overrepresents that influence: most extant manuscripts from the Old English period were copied by monks, so ecclesiastical and other Latinate terms are probably more common in the surviving writing than in the everyday speech of the period. We can also see that following the Conquest, Latin all but disappears as a source of new words for several centuries. This is undoubtedly due to the fact that Anglo-Norman French, not English, was the language of the ruling class. Scholarly, literary, and theological writing, which would use Latinate terms, was far less likely to be written in English than it had been in the Anglo-Saxon era. But in the fourteenth century, the time of Chaucer, we start to see the re-emergence of Latinate terms, as English starts to re-establish itself as the literary language of Britain. Latin would reach a peak in the seventeenth century, the Enlightenment, when over a quarter of new words were formed from Latin roots. Since then, Latin’s influence has declined. Greek, also associated with technical and scientific vocabulary, doesn’t begin to emerge as an influence until the seventeenth century.

Also as a result of the Conquest, French displaced Latin as the primary source of borrowing for a time. The thirteenth century, some two hundred years after the arrival of the Normans, was the peak of borrowing from French, with over a third of new words coming from French roots. That peak also corresponds to the low point of Latin’s influence. So it took about two centuries for French to make itself fully felt. But as new generations of the aristocracy increasingly spoke English, not Anglo-Norman, as their native tongue, the borrowing from French declined. By the sixteenth century and the start of the Early Modern Era, French influence on the English language had sharply decreased.

An oddity in this data is the dating of the Old Norse influence. Overall, Old Norse has not had a large influence on English, with only about 1% of English words coming from Old Norse roots, but in the twelfth century some 10% of new words had Old Norse roots. The odd thing is that this should be much too late. The peak of Old Norse influence should date to the Anglo-Saxon period, when Danes, a.k.a. the Vikings, settled much of northern England. I don’t know why the data shows a delay of more than two centuries. The total number of new words from the eleventh and twelfth centuries is small compared to other centuries, a result of far fewer things being written in English following the Conquest. It may be that the Norman influence was less strong in the north of England, where the Old Norse influence was strongest. Perhaps a higher percentage of the relatively few English-language manuscripts were produced in the north during this period, resulting in more Old Norse words making it into the OED corpus in this later period. But that’s just a guess. A closer examination of what is going on here is needed before a definitive conclusion can be drawn.

Other European languages remain steady, contributing 3–4% of new words throughout the centuries. The exception is German, which starting in the eighteenth century begins to increase its contribution to English vocabulary, reaching 3% of new words all by itself, and nosing ahead of French by the twentieth century.

Non-European languages remain a negligible source of English words until the seventeenth century. Then, with voyages of discovery and colonialism and imperialism creating contact with languages from outside of Europe, words from non-European languages began to creep into the language. But by the twentieth century, all these languages combined were contributing less than 3% of new English words, and no single language makes a significant contribution.

The final contributor is non-language sources, things like personal and place names, acronyms, echoic words, and the like, including the infamous “origin unknown.” For most of English’s history, these sources contributed some 3–4% of words, but in the eighteenth century this rose to 5% and then to 10% in the twentieth. This is of note because it represents a significant shift in how new words are created.

I’m including two other charts that visualize the same data in different ways. Instead of using a percentage, I’ve normalized the raw numbers to eliminate the sampling bias in particular centuries. The Y-axis doesn’t represent a count, but is simply a magnitude. The absolute number, from 0 to 201, associated with a particular source in a given century has no meaning except relative to other numbers on this chart.

WO_linechart.jpg
WO_barchart.jpg

[Data source: Oxford English Dictionary Online, accessed May 2014]