Internet/Web vs. internet/web

5 April 2016

I like fivethirtyeight.com. Nate Silver and his crew have pioneered a new form of journalism, one based on data rather than punditry, but like anyone else, they get into trouble when they stray outside their wheelhouse. Most recently, a blog post on their website took on the announcement that the Associated Press (AP) has changed its stylebook to use lowercase letters when writing internet and web. Formerly, the news organization had advocated Internet and Web. In covering the change, the fivethirtyeight writers not only demonstrated a misunderstanding of how language works, but also screwed up their analysis of the data.

This past week the AP announced that it will no longer capitalize internet or web. The change is significant because many journalistic organizations in the US follow AP style.

But the AP is hardly on the leading edge of this trend. I discussed the capitalization of internet back in 2004, when Wired magazine made the switch. The fivethirtyeight.com blog post calls the AP a “cultural bellwether for writer types.” Yet, as on most stylistic points, AP style is inherently conservative; it does not ride the edge of linguistic change. Nor do most “writer types” give a crap what the AP says. (In fact, AP style is something of a joke among most writer types, or rather among writers who are aware that it exists at all.) Fivethirtyeight is viewing the situation from the perspective of a journalist and assuming that everyone else is a journalist too. They’ve moved from data analysis to punditry.

And what’s worse, the blog post misrepresents the data. Like any good fivethirtyeight.com story, it presents data, this time from Google Ngrams, but the words in the story don’t match what the data tells us. First, the data is from the wrong source for their purposes. It’s from Google Books. If you want to track journalistic style—this is an article about the AP stylebook after all—then Google Books is the wrong corpus to use. You want a corpus of news stories for that, not one that tracks usage in books.

[Figure: Google Books Ngram data for Internet and internet]

The article does say that “we passed peak ‘Internet’ and ‘Web’ sometime around 2002.” That is correct, as far as it goes, but it ignores the fact that the capitalized Internet and Web are still far more common in Google Books than their lowercase counterparts. Internet and Web began their rise in the Google Books data around 1989 and 1991, respectively. Both rose at nearly exponential rates until 2002, when their usage declined. Even so, as of 2008 (the latest year available in the data set), Web was over twice as common as web, and Internet some six times as common as internet. The lowercase forms grew over the same period, but much more slowly; in 2004 web also decreased in usage, and internet’s steady growth slowed. What the data shows, therefore, is not a switch from the uppercase to the lowercase forms, but rather that people were writing less about the internet and the web. What happened was the dot-com crash, and the internet’s shift from a trending topic of discussion to simply part of the background noise of our lives. (Again, this is Google Books. Journalistic usage may show something entirely different.)

Twelve years ago I said, “the significance of the Wired News style change should not be underestimated. The practice of capitalizing [internet] is clearly on the way out.” While that statement was not wrong, it was overly optimistic about the pace of that change. The capitalized forms are still very much the preferred ones. But it was Wired that was the “cultural bellwether,” not the AP, which is in the middle of the flock.

Etruscan Tablet Found

31 March 2016

Seventy letters and punctuation marks make this find one of the longest surviving examples of the Etruscan language. Little is known about the language, and every single find has the potential to vastly increase our knowledge of it, of the Etruscan people, and of their culture. This particular find is not a gravestone, which makes it especially valuable: most of the other examples of the language that we have come from graves, so this one could provide evidence about things other than the dead.

Etruscan is not related to any living language, or so most historical linguists believe.

Iffy Gifs

23 March 2016

One of the hot trends in academia is digital humanities. Like many trending topics, it is ill-defined, but it generally encompasses the study of “texts” in our digital age, as well as the use of digital tools to interrogate more traditional print and manuscript texts. Mark Liberman over at Language Log highlights one such text, a press release from the US House of Representatives Judiciary Committee, which uses animated gifs to express its points. A quote from the release:

Animated gif of Kristen Wiig saying “No Way”

“House Republicans have introduced a bill that prevents the President from unilaterally shutting down the enforcement of our immigration laws. It does so by allowing state and local governments to enforce federal immigration law.”

Using animated gifs to augment prose text is hardly new, but their use in a press release, especially one from a government body, is highly unusual. I suppose such use of new media is inevitable, but one must question its wisdom. The gifs certainly convey the conservative outrage at the alleged acts of the Obama administration, but they also transform the press release from a pronouncement by an august body worthy of journalistic coverage into something more like a Facebook post by your crazy cousin Walter. It’s not just what you say, but how you say it that is important.

Liberman’s discussion of the gifs in the release also uses a word that is new to me when he writes, “seriously, this press release was no doubt put together by a couple of sub-millennial staffers.” Sub-millennial refers to those born after the year 2000. While I get his point, I seriously doubt that the committee actually employs teenagers on its staff. It’s more likely the staffers were acting like sub-millennials. Or maybe Liberman is using the word to mean those born before the millennial generation, in other words, older people trying desperately to be hip and to rap with the youth today.

Anyhoo, I’ll leave you with this philosophical meditation on the new medium:

Non-animated image of René Magritte’s painting “The Treachery of Images” overlaid with the “gif” symbol with the caption “Ceci n’est pas un gif.”




What The Digamma!

21 March 2016

Jay Dillon found what may be the earliest known use of what the fuck, which he has posted to Facebook. Dillon discovered the following poem, which was written by Joseph Dunn Lester and appeared in the 1881 Prize Translations, Poems, and Parodies. Jesse Sheidlower’s The F Word, the go-to resource for all fucking things, has the earliest use of what the fuck in Henry Miller’s 1942 Roofs of Paris. Who the fuck and where the fuck appear in the 1930s. And there is a 1903 citation of what the puck. The abbreviation WTF is recorded in 1985. So this poem, if it is indeed a use of the phrase, would be a significant antedating.

But the phrase is not clearly spelled out. Like many early uses of fuck, it’s encoded. The poem reads:

Διος Ομηρος (The God Homer)

Polyphloisboisteros Homer of old
Threw all his augments into the sea,
Though he’d been firmly but courteously told,
Perfect imperfects begin with an E.

“What the digamma, does any one care!”
The Poet replied with a haughty stare,
And he sat him down by the wine-dark sea,
To write a fresh book of the Odyssey.

Digamma is an archaic letter of the Greek alphabet that resembles the modern Latin letter F. (It had a sound value of /w/.) So the relevant line could be read as “What the F, does any one care!” It seems likely the poem is a bit of an inside joke: Lester slipped a vulgar expression past the editors, knowing that only readers who knew Homeric Greek would get the joke. But this would mean that the phrase what the fuck was in use in 1881. That stretches credulity a bit, but not to the breaking point. Fuck was such a taboo word that there are very few printed uses of it in the nineteenth century, so its absence from the printed record is not evidence of its absence from spoken language.

If it is indeed an instance of what the fuck, it was hiding in plain sight. The poem was once well known, at least in certain rarefied circles. The word polyphloisboisteros even has an entry in the OED, with its first citation from this poem. (It means “noisy, boisterous.”)

Ben Zimmer has a longer explanation, including a juicy tidbit about the reaction of Charles Dodgson (a.k.a. Lewis Carroll) to the poem, on the Strong Language blog.

English Time Machine

16 March 2016

This video isn’t terrible, but it’s really too simplistic to be useful:

It’s got some problems. It conflates poetic language and everyday speech. It focuses on the London dialect, as if all English speakers spoke the same way. Some of the “unknown” words are hardly unknown today (abbess, anyone?). The discussion of the Great Vowel Shift gets the timeline wrong; the shift was pretty much over well before Shakespeare’s day. They don’t attempt to replicate pronunciation until they hit Chaucer; I’m pretty sure that Defoe didn’t sound like the actor who is reading from Robinson Crusoe.

And, while this really isn’t a mistake, you don’t have to go back in time to find hard-to-understand English. There’s a wealth of diversity in English spoken around the world today.

It’s a shame, really. The video is very well produced, so why didn’t they take the time to get the subtleties right? It wouldn’t have been difficult.