Ur-etyma Count — Wordorigins.org

7 July 2014

Linguist Victor Mair has a post over at Language Log on how many root concepts appear in Proto-Indo-European and other language families. Mair concludes that each of the language families has only some 1,200–1,500 root concepts from which its other words are formed.

There are a lot of problems with the data, which Mair acknowledges. The main one is that the number can vary wildly depending on how one defines a root concept, and any detailed work on proto-languages is highly speculative in any case. He is making a “back of the envelope” estimate, and we must be careful not to put too much faith in the data. Still, it’s an interesting result.

The 1,200–1,500 figure is a plausible one. There is an upper limit to the number of roots and words that can exist in a language at any given time, because our human brains are limited. And that number of roots seems about right to support the standard vocabulary counts we find in languages.

I’d like to see someone do a similar, albeit more rigorous, study using modern languages.