Controlled vocabulary/terminology concepts
What is a controlled vocabulary/terminology?
In DLESE, controlled vocabularies are words or phrases that are acceptable values for completing certain metadata fields; that is, they are terms with definitions. This is a simple approach to controlled vocabularies. In general, controlled vocabularies can be more complex, both in how they are organized and in the terms they contain. The following are all controlled vocabularies (listed in order of complexity):
Source: Original figure (Morville and Rosenfeld, 2002) modified by Anita Coleman, Univ. of AZ
This document defines these different controlled vocabularies and addresses the following:
Flat term lists
A flat term list controlled vocabulary is a list of terms with no ordering implied. Good vocabularies include definitions and attribution if necessary. Each term is easily differentiated from every other term (no overlap). Terms are on the same scale; that is, one term won't encompass other terms. For example:
Hierarchical term lists
A hierarchical term list controlled vocabulary is a list of terms that are grouped to imply a certain order or method of organization. There are parent and child relationships between terms. Terms within each group are differentiated from each other, and all child terms across groups are generally on the same scale. Again, good vocabularies include definitions and attribution if necessary. For example:
FISH - aquatic animals
MAMMALS - any of a class (Mammalia) of warm-blooded higher vertebrates (as placentals, marsupials, or monotremes) that nourish their young with milk
This type of controlled vocabulary works best in domains that are well defined and need to be mapped with explicit relationships (e.g. classifying animals by kingdom, phylum, class, etc.). It can also be used in domains that are less well defined if explicit relationships exist or are needed.
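The parent and child relationships described above can be sketched in Python as a simple mapping from parent terms to child terms. This is only an illustration: the child terms (SALMON, TROUT, WHALE, BAT) and the function names are assumptions made for the example and are not taken from any DLESE vocabulary.

```python
# Hypothetical hierarchical term list: parent term -> child terms.
# The child terms are illustrative assumptions, not DLESE vocabulary values.
HIERARCHY = {
    "FISH": ["SALMON", "TROUT"],
    "MAMMALS": ["WHALE", "BAT"],
}


def parent_of(term):
    """Return the parent of a child term, or None for top-level or unknown terms."""
    for parent, children in HIERARCHY.items():
        if term in children:
            return parent
    return None


def children_of(term):
    """Return a copy of the child terms of a parent term (empty list if none)."""
    return list(HIERARCHY.get(term, []))
```

A flat term list is the degenerate case of this structure: a single list of terms with no parents at all.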
A synonym ring extends controlled vocabulary term lists by providing additional terms that are equivalent to a term in the list. For example:
This means searches that include meteorology will expand to include the words weather and atmospheric science as well. The result will be based on all three words. Search systems can be configured to use or not to use synonym rings. If the synonym ring is in effect, then it's probably used whenever there are synonyms for a term.
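The query expansion described above can be sketched as follows. This is a minimal sketch, assuming a ring is stored as a set of equivalent terms; the names SYNONYM_RINGS, expand_query and use_rings are hypothetical and do not describe how the DLESE search system is actually implemented.

```python
# One synonym ring using the three terms from the text.
# Every term in a ring is treated as equivalent to the others.
SYNONYM_RINGS = [
    {"meteorology", "weather", "atmospheric science"},
]


def expand_query(term, use_rings=True):
    """Return the set of terms to search for.

    With use_rings=True, a term that belongs to a ring is expanded to
    every member of that ring. Otherwise only the term itself is returned.
    """
    term = term.lower()
    if use_rings:
        for ring in SYNONYM_RINGS:
            if term in ring:
                return set(ring)
    return {term}
```

The use_rings flag mirrors the point that a search system can be set to use or ignore synonym rings.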
The challenge in creating synonym rings is deciding what constitutes a synonym. From the article, Synonym Rings and Authority Files, synonyms can be
From the article, Synonym Rings and Authority Files: "An authority file is similar to the synonym ring, with the addition of one type of term relationship. Instead of all of the terms being equal, one term is identified as the preferred term and the others are considered variant terms."
In a controlled vocabulary, it looks like this:
If one is cataloging using authority files, the metadata record stores the preferred term. If one is searching, authority files kick in like the synonym rings above.
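The two uses of an authority file, cataloging and searching, can be sketched as below. The choice of meteorology as the preferred term is an assumption made for illustration (the source does not say which term is preferred), and the names AUTHORITY_FILE, catalog_term and search_terms are hypothetical.

```python
# Hypothetical authority file: preferred term -> variant terms.
# Which term is "preferred" here is an assumption for illustration only.
AUTHORITY_FILE = {
    "meteorology": ["weather", "atmospheric science"],
}

# Reverse lookup so that any variant, or the preferred term itself,
# resolves to the preferred term.
_TO_PREFERRED = {}
for _preferred, _variants in AUTHORITY_FILE.items():
    _TO_PREFERRED[_preferred] = _preferred
    for _variant in _variants:
        _TO_PREFERRED[_variant] = _preferred


def catalog_term(term):
    """Return the preferred term to store in a metadata record, or None if unknown."""
    return _TO_PREFERRED.get(term.lower())


def search_terms(term):
    """Return the preferred term plus its variants, as a synonym ring would."""
    preferred = catalog_term(term)
    if preferred is None:
        return [term.lower()]
    return [preferred] + AUTHORITY_FILE[preferred]
```

The only difference from a synonym ring is the direction of the mapping: cataloging always collapses to one term, while searching still expands to all of them.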
Classification systems are codes (letters and/or numbers) that represent controlled vocabulary terms.
In the Dewey Decimal System, the number 822 stands for English Drama. Often these classification systems are hierarchical in nature or have a lot of words and phrases associated with each code. Thus, a code is easier to use in metadata records than the words themselves. A code list that says what each code means in words or phrases is required.
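A code list of this kind can be sketched as a lookup table. Only the 822 entry comes from the text; the record structure and the names CODE_LIST and label_for are assumptions made for the example.

```python
# Code list: classification code -> meaning in words.
# Only the 822 entry is taken from the text above.
CODE_LIST = {
    "822": "English Drama",
}


def label_for(code):
    """Return the words a code stands for, or None if the code is not listed."""
    return CODE_LIST.get(str(code))


# A hypothetical metadata record stores the short code, not the phrase.
record = {"title": "Hypothetical play", "classification": "822"}
```

A display layer would call label_for(record["classification"]) to show the phrase to a reader while the record itself keeps only the code.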
Faceted controlled vocabularies
Faceted controlled vocabularies are generally mutually exclusive concept bins that capture an essential characteristic about a resource. For example, facets for restaurants might be US state, restaurant type and cost:
US states (Iowa, Illinois, Indiana, etc.)
The terms allowed in each facet may come from flat term lists, hierarchies, synonym rings, authority files, classification schemes or some other well-defined domain. The idea behind facets is to provide different avenues into understanding a resource. Resource providers only need to associate a resource with relevant facets, and searchers only need to select from facets of interest. There are many possible implementation methods for facets. Some include XFML, RDF and OWL.
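The restaurant example can be sketched in plain Python rather than XFML, RDF or OWL. The three state names come from the text; the restaurant types, cost values, restaurant records and function names are hypothetical and exist only to make the sketch runnable.

```python
# Each facet has its own list of allowed terms.
FACETS = {
    "state": {"Iowa", "Illinois", "Indiana"},
    "restaurant_type": {"pizza", "diner"},  # hypothetical values
    "cost": {"$", "$$", "$$$"},             # hypothetical values
}

# Hypothetical resources, each described independently on every facet.
RESTAURANTS = [
    {"name": "Example Diner", "state": "Iowa", "restaurant_type": "diner", "cost": "$"},
    {"name": "Example Pizza", "state": "Illinois", "restaurant_type": "pizza", "cost": "$$"},
    {"name": "Example Grill", "state": "Iowa", "restaurant_type": "diner", "cost": "$$"},
]


def validate(resource):
    """Return True if every facet value present on the resource is an allowed term."""
    return all(
        value in FACETS[facet]
        for facet, value in resource.items()
        if facet in FACETS
    )


def search(**selected):
    """Return names of resources that match every selected facet value."""
    return [
        r["name"]
        for r in RESTAURANTS
        if all(r.get(facet) == value for facet, value in selected.items())
    ]
```

For example, search(state="Iowa") narrows on one facet, and search(state="Iowa", cost="$") narrows on two; a searcher never has to specify the facets they do not care about.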
Thesauri are controlled vocabularies networked together by relationships between terms and often indicate preferred and variant terms. Relationships are defined as follows:
An example thesaurus entry might be:
Thesauri require more work to create and may include facets.
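A thesaurus entry can be sketched as a term with labeled relationships. The relationship codes used here (BT for broader term, NT for narrower term, RT for related term, UF for "used for", which marks variant terms) are the conventional thesaurus abbreviations and are an assumption about what this document's relationship list contained. All terms and function names are hypothetical.

```python
# Hypothetical thesaurus: term -> {relationship code: [terms]}.
# BT = broader term, NT = narrower term, RT = related term,
# UF = "used for" (variant terms for which this term is preferred).
THESAURUS = {
    "ANIMALS": {"NT": ["MAMMALS", "FISH"]},
    "MAMMALS": {"BT": ["ANIMALS"], "RT": ["MILK"], "UF": ["Mammalia"]},
    "FISH": {"BT": ["ANIMALS"]},
}


def related(term, relation):
    """Return the terms linked to `term` by `relation` (empty list if none)."""
    return list(THESAURUS.get(term, {}).get(relation, []))


def preferred_for(variant):
    """Return the preferred term for a variant, or None if it is not known."""
    for term, relations in THESAURUS.items():
        if variant == term or variant in relations.get("UF", []):
            return term
    return None
```

The UF lists play the role of an authority file, and the BT/NT pairs play the role of a hierarchy, which is why a thesaurus takes more work to build than either alone.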
Ontologies describe concepts and relationships in programmatic ways and enable arbitrary relationships. Because ontologies describe concepts and relationships, they are somewhat like thesauri, but they differ in that there are often no preferred terms and the concepts and relationships are described in machine-readable ways. This supports semantic interoperability. Thus, ontologies generally adhere to one of the standard XML-based languages accepted by the W3C (RDF and OWL).
Ontologies enable child terms to inherit all properties of their parents. This enables knowledge reuse and scalable knowledge construction, as it is not necessary to redefine concepts already defined at higher levels. It is easy to declare how child concepts differ from their parents. Finally, ontologies support multiple inheritance, so that compound concepts can be created.
Ontologies support "faceted" classification, where a resource is classified independently on multiple characteristics. Also, resources can be classified at any level of abstraction, so that a future division of a class into subclasses does not mean the parent class disappears.
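The inheritance ideas above can be illustrated by analogy with Python classes. This is not RDF or OWL and is not how an ontology would actually be written; it only shows property inheritance, overriding and multiple inheritance. The class names and the AquaticAnimal, Fish and Whale concepts are hypothetical; the mammal properties follow the MAMMALS definition given earlier.

```python
class Animal:
    """Top-level concept; its properties are inherited by every child."""
    is_living_thing = True
    warm_blooded = False  # default that children may override


class Mammal(Animal):
    """Declares only how mammals differ from animals in general."""
    warm_blooded = True
    nourishes_young_with_milk = True


class AquaticAnimal(Animal):
    """A second, independent way of classifying animals."""
    lives_in_water = True


class Fish(AquaticAnimal):
    """Inherits everything from AquaticAnimal without redefining it."""


class Whale(Mammal, AquaticAnimal):
    """Compound concept built by multiple inheritance."""
```

Whale redefines nothing, yet it is warm-blooded (from Mammal), lives in water (from AquaticAnimal) and is a living thing (from Animal). Adding Whale as a subclass leaves Mammal itself intact, so a resource could still be classified at the more general Mammal level.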
Controlled vocabulary definitions
Definitions provide the contexts for understanding controlled vocabularies. Terms in controlled vocabularies should be defined in order to promote consistent use. The Open Forum on Metadata Registries provides tips on writing vocabulary definitions.
What can controlled vocabularies do for DLESE?
Controlled vocabularies can help with the recall versus precision dilemma (i.e. if a lot of relevant information is missed, it's poor recall; if you get flooded with a lot of irrelevant information, then it's poor precision). The whole idea is to provide some mechanism for querying multiple resources simultaneously and to provide some commonality of description across the resources being made available for searching.
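Recall and precision have standard numeric definitions: precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that were retrieved. A minimal sketch, with a hypothetical function name and made-up resource identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search result.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    Either value is reported as 0.0 when its denominator is empty.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall
```

For instance, if a search returns four resources of which two are relevant, and three relevant resources exist in total, precision is 2/4 and recall is 2/3.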
Controlled vocabularies can also assist in cataloging resources. For example, if you were cataloging resources that required a user to have the plug-in NIH Image, it would be annoying for a cataloger to type this information into every metadata record being created. It would be much easier to choose this piece of information from a list. For this example, NIH Image is a term in the DLESE controlled vocabulary for technical requirements.
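Choosing from a list rather than typing free text can be sketched as a simple validation step. NIH Image is the term named in the text; the second term, the record layout and the function name are hypothetical and are not the actual DLESE vocabulary or cataloging tool.

```python
# Controlled vocabulary for the technical requirements field.
# "NIH Image" comes from the text; "Example Plug-in" is a made-up entry.
TECHNICAL_REQUIREMENTS = ("NIH Image", "Example Plug-in")


def add_technical_requirement(record, term):
    """Add a term to the record only if it is in the controlled vocabulary."""
    if term not in TECHNICAL_REQUIREMENTS:
        raise ValueError(f"{term!r} is not in the controlled vocabulary")
    record.setdefault("technical_requirements", []).append(term)
    return record
```

Rejecting anything outside the list is what keeps spelling variants such as "NIH image" or "NIH-Image" out of the metadata records.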
Controlled vocabularies can also help with navigation. While such lists may or may not be part of any metadata record per se, they help organize a website when the same terms are used everywhere, including in drop-down menus. In the search area, DLESE's drop-down menus of resource type, grade level, collections and standards are controlled vocabularies that are part of the metadata. The browse resources by subject function also uses a DLESE controlled vocabulary.
While vocabularies can help in searching, browsing, cataloging and navigating, they are only one means of helping to make these library services function. Sometimes a controlled vocabulary is a good solution; other times it is not. More powerful is the combination of a well-constructed controlled vocabulary with DLESE's indexing of all metadata fields and resource content.
All About Facets & Controlled Vocabularies, Karl Fast, Fred Leise and Mike Steckel, Boxes and Arrows, Dec 9, 2002
What Is A Controlled Vocabulary?, Karl Fast, Fred Leise and Mike Steckel, Boxes and Arrows, Dec 16, 2002
Creating A Controlled Vocabulary, Karl Fast, Fred Leise and Mike Steckel, Boxes and Arrows, Apr 7, 2003
Controlled Vocabularies: A Glosso-Thesaurus, Karl Fast, Fred Leise and Mike Steckel, Boxes and Arrows, Aug 26, 2003
Using controlled vocabularies to improve findability, Christina Wodtke, Digital Web Magazine, Aug 13, 2002
What are the differences between a vocabulary, a taxonomy, a thesaurus, an ontology, and a meta-model?, Johannes Ernst, metamodel.com
I say what I mean, but do I mean what I say?, Paul Miller, Ariadne, Issue 23, Mar 22, 2000, http://www.ariadne.ac.uk/issue23/metadata/intro.html
Understanding Metadata, National Information Standards Organization (NISO), 2004
Synonym Rings and Authority Files, Karl Fast, Fred Leise and Mike Steckel, Boxes and Arrows
Information Architecture for the World Wide Web, Rosenfeld and Morville, 2nd Edition, 2002
Rob Raskin, NASA JPL: personal communication Sep. 2004
Last updated: 09-17-04