Classifying Human Knowledge, Part 2

8 September 2006

Last week we looked at the Dewey Decimal and the Cutter Expansive Classification systems for organizing books. The other major system in use by English-language libraries is the Library of Congress Classification, or LCC, system. The LCC is used and maintained, obviously, by the Library of Congress in Washington, DC, and is also used by most of the larger libraries in the United States, including most university and research libraries.

Originally designed by Herbert Putnam in 1897, the LCC replaced Thomas Jefferson’s classification system in the Library of Congress. The top-level hierarchies are based on Cutter’s classification system, but the rest of the LCC system differs markedly from Cutter’s.

The great advantage of the LCC is that usually the same number, the one assigned by the Library of Congress, is used by all other libraries, so finding a book across libraries is easier. Its classification of technical and scientific subjects is also well regarded.

But it does have some distinct disadvantages. Unlike Dewey’s, its subcategories are not consistent. Each major category is subdivided on its own without consideration of what designations are used in other categories. It is also designed with the needs of the U.S. Congress in mind, so categories are often broken out geographically, even when this doesn’t make much sense for many researchers.

The LCC divides all human knowledge into 21 major classes, each designated by a letter (E and F are both given over to American history):

A – General works
B – Philosophy, psychology, religion
C – Auxiliary sciences of history (e.g., archeology, heraldry, genealogy, biography, diplomatic history)
D – History, general and Old World
E & F – American history
G – Geography, anthropology, recreation
H – Social sciences
J – Political science
K – Law
L – Education
M – Music
N – Fine arts
P – Language and literature
Q – Science
R – Medicine
S – Agriculture
T – Technology
U – Military science
V – Naval science
Z – Bibliography, information resources

You can see the government influence in the breakout of law, agriculture, military science, and naval science as top-level domains.

Second-tier domains are designated with a second letter. The breakout for category P, language and literature, is as follows:

P – Philology, linguistics
PA – Greek & Latin
PB – Modern & Celtic
PC – Romanic
PD – East & North Germanic
PE – English
PF – Other West Germanic
PG – Slavic
PH – Uralic
PJ – Semitic
PK – Indo-Iranian
PL – East Asian, African, Polynesian
PM – Native American & artificial
PN – General literature
PQ – Romance literature
PR – English literature
PS – American literature
PT – Other Germanic literature
PZ – Fiction and children’s literature

These second-tier categories are further subdivided into categories designated with numbers. The PE subclass is broken out as follows:

PE101-458 – Old English
PE501-693 – Middle English
PE814-896 – Early Modern English
PE1001-1693 – Modern English
PE1700-3602 – Dialects
PE3701-3729 – Slang

These numbers are followed by a Cutter number for the author’s last name and the year of publication. The LCC number for Word Myths is PE1584 .W55 2004.
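For the programmers in the audience, here is a minimal sketch of how a call number like that one breaks apart. The regular expression and the names CallNumber, parse_call_number, and pe_section are my own inventions for illustration, and the pattern only handles the simple shape shown above (letters, a whole number, one Cutter number, an optional year). Real LCC numbers can carry decimals and additional Cutters, which this ignores. The range table is copied from the PE breakout above.

```python
import re
from typing import NamedTuple, Optional


class CallNumber(NamedTuple):
    subclass: str          # one to three letters, e.g. "PE"
    number: int            # whole-number part of the class number, e.g. 1584
    cutter: str            # Cutter number for the author, e.g. "W55"
    year: Optional[int]    # year of publication, if present


# Hypothetical pattern: covers only the simple form "PE1584 .W55 2004".
CALL_NUMBER = re.compile(r"^([A-Z]{1,3})(\d+)\s*\.([A-Z]\d+)(?:\s+(\d{4}))?$")

# Ranges copied from the PE breakout listed above.
PE_RANGES = [
    (101, 458, "Old English"),
    (501, 693, "Middle English"),
    (814, 896, "Early Modern English"),
    (1001, 1693, "Modern English"),
    (1700, 3602, "Dialects"),
    (3701, 3729, "Slang"),
]


def parse_call_number(text: str) -> CallNumber:
    """Split a simple LCC call number into its parts."""
    match = CALL_NUMBER.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized call number: {text!r}")
    letters, number, cutter, year = match.groups()
    return CallNumber(letters, int(number), cutter, int(year) if year else None)


def pe_section(number: int) -> Optional[str]:
    """Return the PE subdivision containing this number, or None if it falls in a gap."""
    for low, high, label in PE_RANGES:
        if low <= number <= high:
            return label
    return None


call = parse_call_number("PE1584 .W55 2004")
print(call)
print(pe_section(call.number))
```

Run against the Word Myths number, this places the book in the Modern English range, which is where you would expect a book on English word origins to land.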

A high-level outline of the LCC is available here. (I must go off on a rant here. The Library of Congress charges many hundreds of dollars for access to the complete classification system. While OCLC (Dewey) and the UDC consortiums also charge, these are non-profits and the fees fund the continued maintenance of the systems. The LCC, however, is paid for by the U.S. taxpayer and should be available for free to all who ask. This is unconscionable.)

So far the three systems we’ve examined have represented a single ontological methodology. They are all designed to organize books on the shelves of a library. They are all designed to group like things in a category that is assumed to be useful to researchers. The aim is to place books on the same topic in proximity to one another so that researchers can scan the shelves of the appropriate section for relevant books.

The great advantage of this system is that the collection can grow without reorganizing the catalog. As shelf space in a particular section is used up, the books in that section can be moved without all the other categories changing or moving–all that needs to be changed is the map showing where each section is. It’s a great methodology if your intent is to catalog physical items that can be located in only one place–like books.

There are some obvious drawbacks. One is that most books can be classified into multiple categories. Is T.S. Eliot an American or British poet? Does Cassell’s Dictionary of Slang belong with the dictionaries or with the books on slang? Why is the book Marley & Me, about an unruly Labrador retriever, filed under animal husbandry? If you think about it, dogs are domestic animals, but how many people are actually going to think to look under this category for a book about a house pet?

Another disadvantage is that the categories that are useful change over time. Why does the Library of Congress classify Bantu and Mandarin in the same category (PL)? Because when the categories were created, there weren’t many books in the Library’s collection in those languages. Also, the Library of Congress has a category on East Germany. This was once very useful. But now, aside from books about the period from 1945-89, it’s not terribly relevant. Do we continue to file books about the region in this category?

But most of all, in a digital age such a system isn’t that important. If your "books" are electronic, who cares where they’re "shelved"? You can create "virtual shelves" of material as users demand.

One method that is increasingly being used to categorize electronic resources is tagging. It’s not a formal system and information is not classified in advance. Instead, people assign tags to a resource as they go. An electronic source about baseball slang, for example, could be tagged with terms like baseball, slang, language, sports, American sports, etc. There are no restrictions on what tags can be used or on how many can be assigned to each resource.

The idea is that if enough tags are assigned by a number of different people, then patterns will begin to emerge. The baseball slang resource will appear in searches for both baseball and language. The advantage is that since the information can be classified in an unlimited number of categories the problem of looking in the wrong place goes away. With enough tags, you will find the book no matter what search path you take.
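To make that concrete, here is a rough sketch of a tag index in Python. The TagIndex class and the sample resource titles are made up for illustration and are not how any particular site does it. The point is only that one resource sits under as many tags as people care to give it, so it turns up on every search path.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical tag store: maps each tag to the set of resources carrying it."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def tag(self, resource, *tags):
        # No controlled vocabulary: any tag is accepted, and any number of them.
        for t in tags:
            self._by_tag[t.lower()].add(resource)

    def search(self, tag):
        # Return matching resources in alphabetical order for stable output.
        return sorted(self._by_tag.get(tag.lower(), set()))


index = TagIndex()
index.tag("Baseball slang glossary", "baseball", "slang", "language", "sports")
index.tag("Dictionary of slang", "slang", "language", "dictionaries")
index.tag("History of the World Series", "baseball", "sports")

print(index.search("baseball"))
print(index.search("language"))
```

The baseball slang glossary comes back from both searches, which is exactly what a single shelf location cannot do.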

What would seem to be an obvious criticism of tagging, that there will be little consistency between how different people apply tags, actually turns out to be a great advantage. One person may tag works as being about cinema, while another uses movies. Wouldn’t it be better to use a single tag instead of both of these? Or would it? Aren’t cinema and movies really two similar, but distinctly different categories? In one you have The Bicycle Thief, in the other you have Titanic. There will be some overlap of course, with some using the same tag for both. But by ordering search results by "relevance," the person searching under cinema will be presented with The Bicycle Thief first, with Titanic being way down on the list. Someone searching on movies will get the opposite result.
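Here is one way that ordering might work, sketched in Python. I am assuming "relevance" simply means how many different people applied a given tag to a given work; real sites may well weigh things differently. The users and their tagging choices below are invented for the example.

```python
from collections import Counter, defaultdict

# Invented sample data: (user, work, tag)
tag_events = [
    ("ann", "The Bicycle Thief", "cinema"),
    ("bob", "The Bicycle Thief", "cinema"),
    ("cat", "The Bicycle Thief", "cinema"),
    ("dan", "The Bicycle Thief", "movies"),
    ("ann", "Titanic", "movies"),
    ("bob", "Titanic", "movies"),
    ("cat", "Titanic", "movies"),
    ("dan", "Titanic", "cinema"),
]


def build_counts(events):
    """For each tag, count how many times it was applied to each work."""
    counts = defaultdict(Counter)
    for _user, work, tag in events:
        counts[tag][work] += 1
    return counts


def search(counts, tag):
    """Return works carrying the tag, most frequently tagged first."""
    return [work for work, _n in counts.get(tag, Counter()).most_common()]


counts = build_counts(tag_events)
print(search(counts, "cinema"))
print(search(counts, "movies"))
```

Both films appear under both tags, but the order flips depending on which word the searcher chose. The inconsistency between taggers is what supplies the ranking.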

And tagging makes virtual shelves possible. In the older systems, classifying 20th century American authors would be alphabetical. So Robert Heinlein would be interposed between Zane Grey and Larry McMurtry. Wouldn’t it be better to have Heinlein classified with Isaac Asimov in Sci Fi, and group Grey and McMurtry with the other Westerns? Tags let you organize 20th century American authors alphabetically or by genre, whatever your needs are at any particular moment.
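A small sketch of the idea, again in Python and again with names of my own choosing (alphabetical_shelf, genre_shelves, and the sort_key field are not from any real catalog). The same four records are arranged two ways without anything being moved.

```python
from collections import defaultdict

# The four authors mentioned above, with hypothetical genre tags.
authors = [
    {"name": "Zane Grey", "sort_key": "Grey", "tags": {"western"}},
    {"name": "Robert Heinlein", "sort_key": "Heinlein", "tags": {"science fiction"}},
    {"name": "Larry McMurtry", "sort_key": "McMurtry", "tags": {"western"}},
    {"name": "Isaac Asimov", "sort_key": "Asimov", "tags": {"science fiction"}},
]


def alphabetical_shelf(items):
    """One shelf, ordered by last name, as a traditional scheme would do."""
    return [a["name"] for a in sorted(items, key=lambda a: a["sort_key"])]


def genre_shelves(items):
    """One virtual shelf per tag, each ordered by last name."""
    shelves = defaultdict(list)
    for a in sorted(items, key=lambda a: a["sort_key"]):
        for tag in sorted(a["tags"]):
            shelves[tag].append(a["name"])
    return dict(shelves)


print(alphabetical_shelf(authors))
print(genre_shelves(authors))
```

An author tagged with two genres would simply show up on two shelves, something a physical book cannot manage.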

Tagging is not restricted to online resources. One can also use it to classify books and other physical objects. In such a case, each book must be assigned a unique identifier (e.g., ISBN). This ISBN can then be used to locate the physical object when needed. This does not shelve these books in the same place as other books on the same topic, but that is not a disadvantage in a closed-stack library where books are brought to the researchers.

An example of a catalog that uses tagging is www.librarything.com, which brings us back full circle, for it was that web site that got me thinking about cataloguing. Another example is del.icio.us, a site that catalogs web bookmarks.

To read an excellent discussion of ontological methodologies and a more complete description of tagging, check out Clay Shirky’s Ontology is Overrated.