How Many Words In The English Language?

This question is only tangentially related to word and phrase origins, but enough people ask it that I thought I’d provide a permanent answer.

This is an indeterminate question. First, there is the problem of what exactly counts as a word. Are mouse, mice, mousy, and mouselike separate words or just forms of one root word? Is a computer mouse the same word as the rodent? (As an illustration of how difficult counting words is, scholars over the centuries have attempted to count how many different words Shakespeare used in his corpus of work. The counts run anywhere from 16,000 to 30,000.)
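The dependence of the count on the definition of "word" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample sentence and the ROOTS table are invented for the example, and the table is a hand-written stand-in, not a real lemmatizer or dictionary.

```python
import re

# Hypothetical sample text, invented only to illustrate the counting problem.
SAMPLE = "The mouse saw two mice. A mousy child clicked the mouse."

# Hand-written, illustrative mapping of forms to a root word (an assumption,
# not a real lemmatizer).
ROOTS = {"mice": "mouse", "mousy": "mouse", "mouselike": "mouse"}

def tokens(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def count_forms(text):
    """Count every distinct spelling as its own word."""
    return len(set(tokens(text)))

def count_roots(text):
    """Count forms that share a root (per ROOTS) as a single word."""
    return len({ROOTS.get(t, t) for t in tokens(text)})

print(count_forms(SAMPLE), count_roots(SAMPLE))
```

The same sentence yields two different totals depending on whether mice and mousy are counted separately from mouse. Neither count can tell the computer mouse from the rodent, which would require yet another definition of "word."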

Second, unlike French, English has no official body to determine what is proper and what is not. English dictionaries are (usually) descriptive in nature, not prescriptive. That is, they describe how the language exists and is used; they do not prescribe its use. Just because a word “is not in the dictionary” doesn’t mean that it is not a legitimate word. It simply means the dictionary editors omitted it for one reason or another.

The Oxford English Dictionary, the largest English-language dictionary, contains some 290,000 entries with some 616,500 word forms in its second edition. Of course, there are lots of slang and regional words that are not included, and the big dictionary omits many proper names, scientific and technical terms, and jargon as a matter of editorial policy (e.g., there are some 1.4 million named species of insect alone). All told, estimates of the total vocabulary of English start at around three million words and go up from there.

Of these, about 200,000 words are in common use today. An educated person has a vocabulary of about 20,000 words and uses about 2,000 in a week’s conversation. (These estimates vary widely depending on who is doing the counting, so don’t take them as absolute.)

A (Very) Brief History of the English Language

Indo-European and Germanic Influences

English is a member of the Indo-European family of languages. This broad family includes most of the European languages spoken today. The Indo-European family includes several major branches:

  • Latin and the modern Romance languages;
  • The Germanic languages;
  • The Indo-Iranian languages, including Hindi and Sanskrit;
  • The Slavic languages;
  • The Baltic languages of Latvian and Lithuanian (but not Estonian);
  • The Celtic languages; and
  • Greek.

The influence of the original Indo-European language, designated proto-Indo-European, can be seen today, even though no written record of it exists. The word for father, for example, is vater in German, pater in Latin, and pitr in Sanskrit. These words are all cognates, similar words in different languages that share the same root.


Common Errors in Etymology

The fact is, man is an etymologizing animal. He abhors the vacuum of an unmeaning word. If it seems lifeless, he reads a new soul into it, and often, like an unskillful necromancer, spirits the wrong soul into the wrong body.

—The Reverend A. Smythe Palmer, Folk-Etymology, 1890

Popular wisdom often adopts tales about the origins of words. Sometimes these tales are correct, but more often they are not. These popular etymologies suffer from several recurring errors.

Superficial Resemblances

“Because two words look the same, they must come from the same source.” This is a very common error. To be fair, many, if not most, similar words are etymologically related, but this is not always the case. In fact, there are so many exceptions to this rule that it is a very poor guide.


Etymological Tools

What are the various tools and methods for finding and validating the origins of words and phrases? Where can one go to find information and common mechanisms of word and phrase formation?

Standards of Evidence

By far the most common methodological error I see amateur word sleuths make concerns standards of evidence. Someone comes up with a hypothesis that sounds plausible, but fails to back it up with any evidence. Explanations for words or phrases like the whole nine yards are interesting, but without actual evidence to support them they cannot be considered correct.

The gold standard for etymological evidence is a citation of usage. You must find a written instance of the word being used that is both earlier than any other known usage and that supports the hypothesis. If, for instance, you want to prove that the whole nine yards is a reference to the length of a formal Scottish kilt, you need to find someone using the phrase prior to the mid-1960s in reference to kilts, or, since all the early known uses are American, at least from a Scottish source.


Methods of Word Formation

The formation of new words is not a random process. There are several distinct patterns and paths by which new words come into the language. These fall into four major categories, which in turn can be subdivided.

New Root Formation

Root formation is the creation of new words. Roots come from one of three sources: they are inherited from Old English, they are adopted from other languages, or they are coined by an individual to express a particular idea.

Inherited roots are those native to the language, dating back to antiquity. For English, this means roots that are found in Old English. Of all the words in the English language, relatively few (one or two percent) have Old English roots. But this number is deceptive. About half of the most commonly used words are Old English in origin. So the bulk of everyday English speech and writing is rooted in Old English.

Copyright 1997-2019, by David Wilton