Metadata Collections & QA

Vocabulary management process


The goals of the vocabulary management process are:

  • Provide a method for vocabulary development within DLESE metadata frameworks
  • Track vocabulary terms over time
  • Track most recent vocabulary definitions (definitions are not tracked over time)
  • Make controlled vocabularies available to other DLESE systems
  • Make vocabulary definitions, cataloging best practices and technical documentation available to DLESE systems (development in progress)


To accomplish these goals, the DPC (DLESE Program Center) is responsible for four primary management functions. This does not mean the DPC does all the work associated with each function; rather, input from various DLESE Core Services and the community is sought throughout the process.

  1. Need determination - decide if a new vocabulary is needed for a particular metadata framework
  2. Development and buy-in - decide upon vocabulary terms, write definitions, define parameters of use of the terms and, if needed, select user interface labels and maintain attribution
  3. Metadata framework implementation - integrate the vocabulary into the appropriate metadata framework; issue a new version of the metadata framework; this involves XML schema work and impacts all metadata records in the library
  4. Vocabulary/Metadata Manager/GUI implementation - enter vocabulary terms, definitions, cataloging best practices, technical documentation and GUI labels in the Vocabulary Manager; the Vocabulary Manager makes this information (terms only at the moment) available to other DLESE systems

Each of these management functions is described in detail below.


1. Need determination

DLESE metadata frameworks generally adopt controlled vocabularies when

  • Consistent data is required for library browsing or searching
  • New frameworks or new framework fields are developed and may warrant a vocabulary
  • A significant number (50% or greater) of metadata records across DLESE collections would benefit from a proposed specialized vocabulary
  • A vocabulary no longer meets its original intent and therefore needs modification (e.g. the current technical requirement vocabulary needs adjustment because it is 5 years old and out of date)

Demonstrating a need may take anywhere from hours to months.
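
The 50% criterion in the list above reduces to a simple ratio check. The following is a minimal sketch in Python, assuming the number of records that would benefit and the total number of records are already known. The function name and the sample counts are hypothetical and are not part of any DLESE tool.

```python
def meets_record_threshold(benefiting_records, total_records, threshold=0.5):
    """Return True when the share of records that would benefit
    from a proposed vocabulary reaches the threshold (50% by default)."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    return benefiting_records / total_records >= threshold


# Hypothetical counts, for illustration only.
print(meets_record_threshold(1200, 2000))  # True: 60% of records would benefit
print(meets_record_threshold(400, 2000))   # False: only 20% would benefit
```

A ratio like this covers only one of the criteria listed; the others remain judgment calls.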

2. Development

In developing vocabulary terms, definitions, cataloging best practices, technical documentation and user interface labels, any of the following procedures, or a combination thereof, may be used:

  • Adopt an existing vocabulary and its definition (e.g. National Science Education Standards (NSES))
  • Adopt some parts of an existing vocabulary and merge them with new terms deemed missing (e.g. the teaching method vocabulary uses GEM terms and DLESE-developed terms)
  • Develop vocabularies from scratch (e.g. the subject terms resulted mainly from the AGU Dec. 2000 meeting)

Vocabularies should be as simple as possible while meeting user needs, promoting naturalness of language and being informed by research on user behaviors and digital library development.

An ongoing DLESE vocabulary development issue has been implementing a well-defined Earth system vocabulary. Eventually, the library will include an Earth system vocabulary and the operational framework that explicates the library scope and balance. These library capabilities should ensure high quality and consistency in accessioning, cataloging, metadata, interoperability and the user experience.


The following is an example entry of the information that is needed for each term in a vocabulary:

Table 1: Vocabulary Information to Collect

Information to Complete | Example
Term | Teaching tip
Term definition | Information for using a resource within a certain teaching or learning environment but generally not challenging or non-typical environments
Cataloging best practice - what it applies to and doesn't apply to | Do not use this term when describing challenging teaching or learning situations; instead use the term 'Information on challenging teaching and learning situations'
User interface label (definitions and labels) | Teaching tip (same label for all DLESE interfaces)
Attribution | Katy Ginger, Metadata Architect, DLESE Program Center; (July 2003)
Related terms |
Higher order term |
Broader term |
Narrower term |
Metadata framework to which it applies | Annotation framework
Metadata path and field to which it applies | item.type
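
The rows of Table 1 can be read as the fields of a single record per term. The sketch below, in Python, is one hypothetical way to hold that record and to check that the required text fields are filled in. The class name, attribute names and the choice of which fields count as required are assumptions made for illustration; they do not come from a DLESE system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VocabularyTerm:
    # One attribute per row of Table 1 (attribute names are hypothetical).
    term: str
    definition: str
    cataloging_best_practice: str
    ui_label: str
    attribution: str
    framework: str
    metadata_path: str
    related_terms: Optional[str] = None
    higher_order_term: Optional[str] = None
    broader_term: Optional[str] = None
    narrower_term: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Return the names of required text fields that are still empty."""
        required = ["term", "definition", "cataloging_best_practice",
                    "ui_label", "attribution", "framework", "metadata_path"]
        return [name for name in required if not getattr(self, name).strip()]


# The example entry from Table 1.
teaching_tip = VocabularyTerm(
    term="Teaching tip",
    definition=("Information for using a resource within a certain teaching "
                "or learning environment but generally not challenging or "
                "non-typical environments"),
    cataloging_best_practice=("Do not use this term when describing "
                              "challenging teaching or learning situations"),
    ui_label="Teaching tip",
    attribution="Katy Ginger, Metadata Architect, DLESE Program Center; (July 2003)",
    framework="Annotation framework",
    metadata_path="item.type",
)

print(teaching_tip.missing_fields())  # [] because every required field is filled
```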

As part of the development process, terms and definitions usually go through many iterations before a final vocabulary is decided upon. This part of the process may take anywhere from 6-24 months to complete.


3. Metadata framework implementation

This part of the vocabulary management process involves implementing the decided-upon vocabulary (terms only, not definitions) in the appropriate DLESE metadata framework using XML schemas. Once a vocabulary is made into an XML schema, it becomes valid content for metadata records when a new version of the metadata framework is issued. The actual schema creation may take anywhere from a couple of hours to a few days, but its implications for other DLESE systems are large: it may take months for metadata records and DLESE systems to be in sync with the new schema.
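
To make the idea of a vocabulary as an XML schema concrete, the sketch below assumes the terms are expressed as an enumeration restriction on a simple type, which is a common XML Schema pattern. The actual DLESE schemas may be structured differently. The schema fragment, the type name itemTypeType and the function name are hypothetical. The Python code uses only the standard library to list the terms such a schema would allow.

```python
import xml.etree.ElementTree as ET

# Namespace prefix used by ElementTree for XML Schema elements.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment: a vocabulary as an enumerated simple type.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="itemTypeType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Teaching tip"/>
      <xs:enumeration value="Information on challenging teaching and learning situations"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""


def enumerated_terms(schema_text, type_name):
    """Return the enumeration values of the named simple type, in schema order."""
    root = ET.fromstring(schema_text)
    for simple_type in root.iter(XS + "simpleType"):
        if simple_type.get("name") == type_name:
            return [e.get("value") for e in simple_type.iter(XS + "enumeration")]
    raise KeyError(type_name)


for term in enumerated_terms(SCHEMA, "itemTypeType"):
    print(term)
```

Under this assumption, a metadata record is valid only if its value for the field matches one of the enumerated terms, which is why adding a term requires issuing a new framework version.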


4. Metadata/Vocabulary Manager

Once a vocabulary is in an XML schema, it is ready to be put in the Metadata/Vocabulary Manager so that it becomes available to other DLESE systems (e.g. discovery). This is a labor-intensive, manual process and is currently undergoing re-development. The existing system is called the Vocabulary-UI Manager and the developing system is called the Metadata-UI Manager. Before describing the process of getting information into these systems, the two manager systems are compared below.

Table 2: Comparison of Managers

Manager Tracking Functions | Vocabulary-UI Manager | Metadata-UI Manager
Vocab terms through time
Current vocab definitions  
Vocab definitions through time    
Definition attribution  
UI labels for vocab terms
UI labels for any metadata field (regardless of whether there is a vocab or not)  
UI layout for vocab terms
UI layout for any metadata field (regardless of whether there is a vocab or not)  
Technical documentation for any metadata field  
Near terms, broader terms, related terms    
Scope notes    
Cataloging best practices for any metadata field through time  
Metadata framework to which the vocab applies and the tag path  
Definition of the metadata field being completed  
Information for vocabs that are hierarchies  

To enter information into the manager systems, the DPC metadata working group follows the steps below.

  1. For the particular metadata field, create a management XML schema. This management XML schema calls the appropriate metadata framework XML schemas and, if need be, the controlled vocabulary XML schema made in section 3 above. After these schema calls, the management XML schema creates the two XML documents used by the manager systems.
  2. The first XML document is called the metadata fields/terms XML file. It holds
    • Definition of the particular metadata field (developing manager system only)
    • Framework and tag path of the particular metadata field (developing manager system only)
    • Vocabulary terms and definitions
    • Definition attribution (developing manager system only)
    • Information for vocabs in hierarchies (developing manager system only)
    • Cataloging best practices (developing manager system only)
    • Technical documentation (developing manager system only)
  3. The second XML document is called the ui/groups XML file. It holds
    • UI labels for vocab terms
    • UI labels for any metadata field (developing manager system only)
    • UI layout for vocab terms
    • UI layout for any metadata field (developing manager system only)

  4. Table 3 below shows the two files in action. Column 1 contains the metadata vocabulary terms (from the fields/terms XML files) and columns 2 and 3 show how these terms may look in the cataloging and discovery interfaces, respectively (from the ui/groups XML files).

    Table 3: Vocab Terms Versus User Interface Labels
    Metadata Vocabulary Terms | Cataloging System: Default User Interface | Discovery System: Default User Interface
    Primary elementary | Primary elementary (K-2) | Primary (K-2)
    Learning material: Virtual field trip | Learning material: Virtual field trip | For the classroom: Field trip - virtual

  5. Once the fields/terms and ui/groups XML files are made, they are ingested into the manager system. During the ingest process, the manager system verifies consistency between the two sets of files (vocab matching). However, there is a critical missing link that neither the current nor the developing manager system addresses yet: there is no comparison to verify that all terms in the metadata framework vocabulary schema (from section 3 above) are represented in the fields/terms and the ui/groups XML files. This is problematic because metadata records may use a vocabulary term that the metadata framework says is valid but that DLESE systems (using the manager system) do not recognize because the term never made it into the manager system through the fields/terms and ui/groups XML files.
  6. Repeat ingest into the manager system until no errors exist and labels are as desired.
  7. Check all files into CVS and propagate to live DLESE systems.
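
The relationship between the two files, and the vocab matching done at ingest, can be sketched as follows. This is an illustration in Python under stated assumptions, not the manager system's implementation. The real files are XML; here their content is modeled as plain Python structures, and the names FIELDS_TERMS, UI_GROUPS, ui_label and vocab_mismatches are hypothetical. The terms and labels are taken from Table 3.

```python
# Terms as they would appear in the fields/terms file (modeled as a set).
FIELDS_TERMS = {
    "Primary elementary",
    "Learning material: Virtual field trip",
}

# Labels as they would appear in the ui/groups file, one label per interface.
UI_GROUPS = {
    "Primary elementary": {
        "cataloging": "Primary elementary (K-2)",
        "discovery": "Primary (K-2)",
    },
    "Learning material: Virtual field trip": {
        "cataloging": "Learning material: Virtual field trip",
        "discovery": "For the classroom: Field trip - virtual",
    },
}


def ui_label(term, interface, ui_groups):
    """Look up the label shown for a metadata term in a given interface."""
    return ui_groups[term][interface]


def vocab_mismatches(fields_terms, ui_groups):
    """Report terms present in one file but absent from the other."""
    ui_terms = set(ui_groups)
    return {
        "missing_from_ui_groups": sorted(fields_terms - ui_terms),
        "missing_from_fields_terms": sorted(ui_terms - fields_terms),
    }


print(ui_label("Primary elementary", "discovery", UI_GROUPS))
print(vocab_mismatches(FIELDS_TERMS, UI_GROUPS))
```

Keeping the metadata term separate from its labels is what lets the cataloging and discovery interfaces show different wording for the same stored value.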


The workflow surrounding vocabulary development is labor-intensive and time-consuming. But if resources are to be discoverable by a certain vocabulary, metadata records with the vocabulary must exist. This means the vocabulary must be allowed in the appropriate metadata framework and, if possible, supported in cataloging tools. To illustrate, the timeline and process for putting the National Science Education Standards (NSES) into the DLESE ADN metadata framework and seeing resources with them in search results is described below:

  1. Determine the need: Portal to the Future Workshop, August 1999
  2. Develop the vocabulary, its definitions and parameters of use: Ordinarily this process can take anywhere from 6-24 months or longer depending on the vocabulary. This step was not required for the NSES because the vocabulary was adopted from the National Academy of Sciences.
  3. Implement the vocabulary in the ADN metadata framework: August 2001
  4. Implement the vocabulary in a cataloging tool: March 2002
  5. Implement the vocabulary into the Metadata/Vocabulary Manager: August 2003
  6. Implement the vocabulary in the DLESE Discovery System: August 2003

Technologies used

  • XML as the container for vocabulary terms and definitions
  • XML schema for validating against the metadata frameworks
  • Database (MySQL) for running the Metadata/Vocabulary Manager
  • CVS to manage the XML files

DLESE tools used


Outstanding issues

The following actions need automation or further work for the process to scale more effectively:

  • Compare metadata framework vocabulary schema terms to the content of the fields/terms and ui/groups XML files. Be sure all terms are represented in the fields/terms and ui/groups files.
  • Continue development of the new manager system to better separate the GUI from metadata structures and to track vocabulary definitions and cataloging best practices for any field
  • Possibly introduce the concept of binning to DLESE metadata frameworks and DLESE systems
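
The first item above, comparing the framework schema terms against the fields/terms and ui/groups files, amounts to two set differences. A minimal sketch in Python follows, assuming the terms from each source have already been extracted into plain lists. The function name and the sample data are hypothetical, and the sample deliberately leaves one term out of both files to show what the check reports.

```python
def unrepresented_terms(schema_terms, fields_terms, ui_groups_terms):
    """Return schema terms that are missing from each manager file."""
    schema = set(schema_terms)
    return {
        "fields/terms": sorted(schema - set(fields_terms)),
        "ui/groups": sorted(schema - set(ui_groups_terms)),
    }


# Hypothetical data: the schema allows three terms, but only two were
# carried into the manager files.
schema_terms = ["Primary elementary",
                "Learning material: Virtual field trip",
                "Teaching tip"]
fields_terms = ["Primary elementary", "Learning material: Virtual field trip"]
ui_groups_terms = ["Primary elementary", "Learning material: Virtual field trip"]

report = unrepresented_terms(schema_terms, fields_terms, ui_groups_terms)
for file_name, missing in report.items():
    print(file_name, "is missing:", missing)
```

An empty list for both files would indicate that every term the framework considers valid is also known to the manager system.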

Future development

The manager tool of this process is undergoing re-development in order to streamline how vocabularies are made available to DLESE systems and services.


Last updated: 03-09-05