Metadata Collections & QA
Digital Library for Earth System Education

Controlled vocabulary/terminology concepts

What is a controlled vocabulary/terminology?

In DLESE, controlled vocabularies are words or phrases that are acceptable values for completing certain metadata fields; that is, they are terms with definitions. This is a simple approach to controlled vocabularies. In general, controlled vocabularies can be more complex, both in how they are organized and in the terms they contain. The following are all controlled vocabularies (listed in order of increasing complexity):

Type of controlled vocabulary/terminology:

  • Flat term lists
  • Hierarchical term lists
  • Synonym rings
  • Authority files
  • Alphanumeric classification schemes
  • Faceted controlled vocabularies
  • Thesauri
  • Ontologies

Complexity: Simple to Complex

Relationships: Equivalence to Hierarchical to Associative to (Roles)

Source: Original figure (Rosenfeld and Morville, 2002), modified by Anita Coleman, Univ. of AZ

This document defines these different controlled vocabularies and addresses the following:

  • Controlled vocabulary definitions
  • What can controlled vocabularies do for DLESE?

Flat term lists

A flat term list controlled vocabulary is a list of terms with no implied ordering. Good vocabularies include definitions and, if necessary, attribution. Each term is easily differentiated from every other term (no overlap). Terms are on the same scale; that is, no term encompasses another. For example:

  • goldfish - a small usually golden yellow or orange cyprinid fish (Carassius auratus) often kept as an aquarium and pond fish
  • catfish - any of an order (Siluriformes) of chiefly freshwater stout-bodied scaleless bony fishes having long tactile barbels
  • cow - a domestic bovine animal regardless of sex or age
  • horse - a large solid-hoofed herbivorous mammal (Equus caballus, family Equidae, the horse family)
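
As a minimal sketch of how a flat term list might back a metadata field, the Python below stores each term with a shortened definition and checks a candidate value against the list. The names `ANIMAL_TERMS` and `is_valid_term` are hypothetical and are not part of any DLESE tool.

```python
# Hypothetical flat term list: each accepted term maps to a (shortened) definition.
ANIMAL_TERMS = {
    "goldfish": "a small usually golden yellow or orange cyprinid fish",
    "catfish": "a chiefly freshwater stout-bodied scaleless bony fish",
    "cow": "a domestic bovine animal regardless of sex or age",
    "horse": "a large solid-hoofed herbivorous mammal",
}


def is_valid_term(value, vocabulary=ANIMAL_TERMS):
    """Return True when value is an accepted term (ignoring case and outer spaces)."""
    return value.strip().lower() in vocabulary


print(is_valid_term(" Goldfish "))  # accepted term
print(is_valid_term("dog"))         # not in the vocabulary
```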

Hierarchical term lists

A hierarchical term list controlled vocabulary is a list of terms that are grouped to imply a certain order or method of organization. There are parent and child relationships between terms. Terms within each group are differentiated from each other, and all child terms across groups are generally on the same scale. Again, good vocabularies include definitions and, if necessary, attribution. For example:

FISH - aquatic animals

  • goldfish
  • catfish

MAMMALS - any of a class (Mammalia) of warm-blooded higher vertebrates (as placentals, marsupials, or monotremes) that nourish their young with milk

  • cow
  • horse

This type of controlled vocabulary works best in domains that are well defined and need to be mapped with explicit relationships (e.g., classifying animals by kingdom, phylum, class, etc.). It can also be used in less well-defined domains if explicit relationships exist or are needed.
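
The parent and child relationships in the example above could be represented as a simple mapping from each parent term to its children. This is only an illustrative sketch; `HIERARCHY` and `parent_of` are hypothetical names.

```python
# Hypothetical hierarchical term list: parent term -> list of child terms.
HIERARCHY = {
    "FISH": ["goldfish", "catfish"],
    "MAMMALS": ["cow", "horse"],
}


def parent_of(term, hierarchy=HIERARCHY):
    """Return the parent group of a child term, or None if the term is unknown."""
    for parent, children in hierarchy.items():
        if term in children:
            return parent
    return None


print(parent_of("cow"))
```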

Synonym rings

A synonym ring extends controlled vocabulary term lists by providing additional terms that are equivalent to a term in the list. For example:

  • Meteorology, Weather, Atmospheric science
  • Ultra-violet radiation, UV, ultraviolet radiation

This means a search that includes meteorology expands to include weather and atmospheric science as well, and the results are based on all three terms. Search systems can be configured to use or ignore synonym rings. If synonym rings are in effect, they are probably applied whenever a term has synonyms.
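
Query expansion with a synonym ring might look like the sketch below. It assumes each ring is stored as a set of lowercase terms and that a flag turns expansion on or off; `SYNONYM_RINGS` and `expand_query` are hypothetical names, not a DLESE search API.

```python
# Hypothetical synonym rings: every term in a ring is treated as equivalent.
SYNONYM_RINGS = [
    {"meteorology", "weather", "atmospheric science"},
    {"ultra-violet radiation", "uv", "ultraviolet radiation"},
]


def expand_query(term, rings=SYNONYM_RINGS, use_rings=True):
    """Return the set of terms to search for, expanded by a ring if one applies."""
    key = term.strip().lower()
    if use_rings:
        for ring in rings:
            if key in ring:
                return set(ring)  # copy so callers cannot modify the ring
    return {key}


print(sorted(expand_query("Weather")))
print(sorted(expand_query("Weather", use_rings=False)))
```

Because no term in a ring is preferred, looking up any member returns the same set.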

The challenge in creating synonym rings is deciding what constitutes a synonym. According to the article Synonym Rings and Authority Files, synonyms can be:

  • Two words with the same or very similar meanings
  • Acronyms: BBC, British Broadcasting Corporation; MPG, miles per gallon
  • Variant spellings: cancelled, canceled; honor, honour
  • Scientific terms versus popular use terms: acetylsalicylic acid, aspirin; Lilioceris, lily beetle

Authority files

From the article, Synonym Rings and Authority Files: "An authority file is similar to the synonym ring, with the addition of one type of term relationship. Instead of all of the terms being equal, one term is identified as the preferred term and the others are considered variant terms."

Preferred Term           Other Terms
Meteorology              Weather, Atmospheric science
Ultraviolet radiation    UV, Ultra-violet radiation

In a controlled vocabulary, it looks like this:

Meteorology
USE FOR: Weather, Atmospheric science

Weather
USE Meteorology

Atmospheric science
USE Meteorology

If one is cataloging using authority files, the metadata record stores the preferred term. If one is searching, authority files kick in like the synonym rings above.
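
The USE / USE FOR relationships above can be sketched as a lookup from any variant to its preferred term, so that a cataloging tool stores only the preferred form. The structure and the names `AUTHORITY_FILE`, `build_use_index` and `preferred_term` are assumptions made for illustration.

```python
# Hypothetical authority file: preferred term -> variant terms (the USE FOR list).
AUTHORITY_FILE = {
    "Meteorology": ["Weather", "Atmospheric science"],
    "Ultraviolet radiation": ["UV", "Ultra-violet radiation"],
}


def build_use_index(authority_file):
    """Map every preferred and variant term (lowercased) to its preferred term."""
    index = {}
    for preferred, variants in authority_file.items():
        index[preferred.lower()] = preferred
        for variant in variants:
            index[variant.lower()] = preferred  # the USE reference
    return index


USE_INDEX = build_use_index(AUTHORITY_FILE)


def preferred_term(term):
    """Return the preferred term to store in a record, or None if unknown."""
    return USE_INDEX.get(term.strip().lower())


print(preferred_term("weather"))
```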

Classification schemes

Classification systems are codes (letters and/or numbers) that represent controlled vocabulary terms.

Examples include:

  • Dewey Decimal System
  • Library of Congress System
  • AAAS Benchmarks

In the Dewey Decimal System, the number 822 stands for English Drama. Often these classification systems are hierarchical in nature or have a lot of words and phrases associated with them. Thus, a code is easier to use in metadata records than the words it stands for. A code list that states what each code means in words or phrases is required.
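
A code list can be as simple as a table from code to label. In the sketch below, the 822 entry comes from the text above; the 550 entry is added as an illustrative assumption, and `CODE_LIST` and `label_for` are hypothetical names.

```python
# Hypothetical code list: classification code -> words the code stands for.
CODE_LIST = {
    "822": "English drama",   # from the Dewey example above
    "550": "Earth sciences",  # illustrative assumption, not from the source page
}


def label_for(code, code_list=CODE_LIST):
    """Translate a stored code into its label, or report that it is unknown."""
    return code_list.get(code, "unknown code")


print(label_for("822"))
```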

Faceted controlled vocabularies

Faceted controlled vocabularies are generally mutually exclusive concept bins that capture an essential characteristic about a resource. For example, facets for restaurants might be US state, restaurant type and cost:

US states (Iowa, Illinois, Indiana, etc.)
Restaurant type (Steakhouse, Italian, Mexican, etc.)
Cost (high, low, medium)

The terms allowed in each facet may come from flat term lists, hierarchies, synonym rings, authority files, classification schemes or some other well-defined domain. The idea behind facets is to provide different avenues into understanding a resource. Resource providers only need to associate a resource with relevant facets, and searchers only need to select from facets of interest. There are many possible implementation methods for facets, including XFML, RDF and OWL.
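
To illustrate how searchers select from facets of interest, the sketch below filters a small list of records by any combination of facet values. The restaurant names and the function `filter_by_facets` are invented for the example; this is plain Python, not XFML, RDF or OWL.

```python
# Hypothetical resources, each described by three independent facets.
RESTAURANTS = [
    {"name": "Prime Cut", "state": "Iowa", "type": "Steakhouse", "cost": "high"},
    {"name": "Luigi's", "state": "Illinois", "type": "Italian", "cost": "medium"},
    {"name": "Casa Sol", "state": "Iowa", "type": "Mexican", "cost": "low"},
]


def filter_by_facets(resources, **selected):
    """Keep resources whose facet values match every selected facet."""
    return [
        resource
        for resource in resources
        if all(resource.get(facet) == value for facet, value in selected.items())
    ]


# Searchers may pick one facet, several, or none at all.
print([r["name"] for r in filter_by_facets(RESTAURANTS, state="Iowa")])
print([r["name"] for r in filter_by_facets(RESTAURANTS, state="Iowa", cost="low")])
```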

Thesauri

Thesauri are controlled vocabularies whose terms are networked together by relationships; they often indicate preferred and variant terms. Relationships are defined as follows:

  • Equivalence: synonymy between terms and the ability to suggest which term is the preferred term
  • Hierarchy: reflects the hierarchy (spatial, conceptual or terminological) of terms by showing how terms are linked to other terms. These links are often shown by placing the item in a broad class (CL) and defining broader and narrower terms for the item (abbreviated as BT and NT).
  • Associative: a method to indicate relationships across hierarchies. This is expressed as related terms (abbreviated as RT).
  • Scope Notes: define a term, or the breadth of a term, and its usage (abbreviated as SN).

An example thesaurus entry might be:
Subtropical High Pressure Belt:
SN: includes all ocean basins and the northern and southern hemispheres
CL: Global Circulation
NT: Subtropical High
RT: Bermuda High, Pacific High

Thesauri require more work to create and may include facets.
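
The example entry above could be held in a small record type, one field per relationship. The class `ThesaurusEntry` and the helper `broader_terms` are hypothetical; the helper shows one way the broader-term (BT) link can be derived from stored NT links rather than entered twice.

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    """One thesaurus term with its scope note and relationships."""
    term: str
    scope_note: str = ""                           # SN
    broad_class: str = ""                          # CL
    narrower: list = field(default_factory=list)   # NT
    related: list = field(default_factory=list)    # RT


entry = ThesaurusEntry(
    term="Subtropical High Pressure Belt",
    scope_note="includes all ocean basins and the northern and southern hemispheres",
    broad_class="Global Circulation",
    narrower=["Subtropical High"],
    related=["Bermuda High", "Pacific High"],
)


def broader_terms(term, entries):
    """Return the terms that list `term` as narrower, i.e. its broader terms (BT)."""
    return [e.term for e in entries if term in e.narrower]


print(broader_terms("Subtropical High", [entry]))
```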

Ontologies

Ontologies describe concepts and relationships in programmatic ways and enable arbitrary relationships. Because ontologies describe concepts and relationships, they are somewhat like thesauri, but they differ in that there are often no preferred terms and the concepts and relationships are described in machine-readable ways. This supports semantic interoperability. Thus, ontologies generally adhere to one of the standard XML-based languages accepted by the W3C (RDF and OWL).

Ontologies enable child terms to inherit all properties of their parents. This enables knowledge reuse and scalable knowledge construction, as it is not necessary to redefine concepts already defined at higher levels. It is easy to declare how child concepts differ from their parents. Finally, ontologies support multiple inheritance, so that compound concepts can be created.

Ontologies support "faceted" classification, where a resource is classified independently on multiple characteristics. Also, resources can be classified at any level of abstraction, so that a future division of a class into subclasses does not mean the parent class disappears.
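
Property inheritance with multiple parents can be illustrated in a few lines of Python. This is a toy sketch, not RDF or OWL; the concept names, their properties and the function `all_properties` are all invented for the example.

```python
# Hypothetical concepts: each lists its parents and only the properties it adds.
CONCEPTS = {
    "AtmosphericPhenomenon": {"parents": [], "properties": {"medium": "atmosphere"}},
    "NaturalHazard": {"parents": [], "properties": {"hazardous": True}},
    "Hurricane": {
        "parents": ["AtmosphericPhenomenon", "NaturalHazard"],  # multiple inheritance
        "properties": {"rotation": "cyclonic"},
    },
}


def all_properties(name, concepts=CONCEPTS):
    """Collect properties inherited from every ancestor, then apply the concept's own."""
    merged = {}
    for parent in concepts[name]["parents"]:
        merged.update(all_properties(parent, concepts))
    merged.update(concepts[name]["properties"])  # a child may override its parents
    return merged


print(all_properties("Hurricane"))
```

The compound concept only declares what is new about it; everything else is reused from the parent definitions.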

Controlled vocabulary definitions

Definitions provide the contexts for understanding controlled vocabularies. Terms in controlled vocabularies should be defined in order to promote consistent use. The Open Forum on Metadata Registries provides tips on writing vocabulary definitions.

What can controlled vocabularies do for DLESE?

Controlled vocabularies can help with the recall versus precision dilemma (i.e., if a lot of relevant information is missed, recall is poor; if you get flooded with a lot of irrelevant information, precision is poor). The whole idea is to provide some mechanism for querying multiple resources simultaneously and to provide some commonality of description across the resources being made available for searching.
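
Using the standard information-retrieval definitions, precision is the share of retrieved items that are relevant and recall is the share of relevant items that were retrieved. The sketch below computes both for made-up sets of record identifiers; the function name and the data are assumptions for illustration.

```python
def precision_and_recall(retrieved, relevant):
    """Return (precision, recall) for two sets of record identifiers."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: four records returned, three records actually relevant.
retrieved = {"r1", "r2", "r3", "r4"}
relevant = {"r2", "r3", "r5"}
precision, recall = precision_and_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four retrieved records are relevant and one relevant record is missed, so neither measure is perfect.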

Controlled vocabularies can also assist in cataloging resources. For example, if you were cataloging resources that required a user to have the plug-in NIH Image, it would be annoying for a cataloger to type this information into every metadata record being created. It would be much easier to choose this piece of information from a list. For this example, NIH Image is a term in the DLESE controlled vocabulary for technical requirements.

Controlled vocabularies can also help with navigation. While such lists may or may not be part of any metadata record per se, they help organize a website when the same terms are used everywhere, including in drop-down menus. In the search area, DLESE's drop-down menus for resource type, grade level, collections and standards are controlled vocabularies that are part of the metadata. The browse resources by subject function also uses a DLESE controlled vocabulary.

While vocabularies can help in searching, browsing, cataloging and navigating, they are only one means of making these library services function. Sometimes a controlled vocabulary is a good solution; other times it is not. More powerful is a well-constructed controlled vocabulary combined with DLESE's indexing of all metadata fields and resource content.

More information

All About Facets & Controlled Vocabularies, Karl Fast, Fred Leise and Mike Steckel, Boxes and Arrows, Dec 9, 2002

What Is A Controlled Vocabulary?, Karl Fast, Fred Leise and Mike Steckel, Boxes and Arrows, Dec 16, 2002

Creating A Controlled Vocabulary, Karl Fast, Fred Leise and Mike Steckel, Boxes and Arrows, Apr 7, 2003

Controlled Vocabularies: A Glosso-Thesaurus, Karl Fast, Fred Leise and Mike Steckel, Boxes and Arrows, Aug 26, 2003

Using controlled vocabularies to improve findability, Christina Wodtke, Digital Web Magazine, Aug 13, 2002

What are the differences between a vocabulary, a taxonomy, a thesaurus, an ontology, and a meta-model?, Johannes Ernst, metamodel.com

I say what I mean, but do I mean what I say?, Paul Miller, Ariadne, Issue 23, Mar 22, 2000, http://www.ariadne.ac.uk/issue23/metadata/intro.html

Understanding Metadata, National Information Standards Organization (NISO), 2004

Synonym Rings and Authority Files, Karl Fast, Fred Leise and Mike Steckel, Boxes and Arrows

Information Architecture for the World Wide Web, Rosenfeld and Morville, 2nd edition, 2002

Rob Raskin, NASA JPL: personal communication Sep. 2004

Last updated: 09-17-04
Maintained by: Katy Ginger (support@dlese.org), DLESE Metadata