Quality WG3: Metadata Structures Whitepaper
Summary from Quality Report:
Recommend a framework for new vocabulary development [and strategies for integrating it with the Discovery system]
The metadata structures group concentrated on developing strategies for the following topic areas:
The discussion of the topics above started with understanding background information about metadata and the DLESE metadata frameworks. This background information is provided here to help readers understand the starting points for the topic discussions.
Why use metadata: Per a discussion by quality working group 5, the following suggestions emerged:
What are controlled vocabularies/terminologies? What types are there? Why use controlled vocabularies/terminologies: Controlled terminologies are useful metadata structures. For an explanation of what controlled vocabularies/terminologies are, see the controlled vocabulary/terminology concepts page.
What are the DLESE metadata frameworks: DLESE supports six metadata frameworks: ADN, annotation, news & opps., SMS, objects and collection. The frameworks were developed similarly; therefore, any proposed strategies or recommendations should be carried out or considered across all the frameworks to avoid maintaining two systems. Metadata is held as individual XML documents.
What fields in DLESE metadata frameworks have vocabularies: The vocabularies used in DLESE frameworks are accessible from the following pages for the various frameworks, ADN, annotation, news & opps., SMS, objects, collection.
Does DLESE have a vocabulary development/management process now: Yes. Please see the DLESE vocabulary management process page. Additions to this process are suggested later in this paper.
How do controlled vocabularies/terminologies relate to quality in DLESE: The use of controlled vocabularies/terminologies adds value and quality to the library user experience. For example, DLESE resources often do not contain explicit information about appropriate grade level. Since grade range is a piece of DLESE required metadata, the assignment of grade level information to a resource using the DLESE grade range controlled vocabulary can be thought of as a pedagogical service. If the controlled vocabulary is assigned consistently, then there is also a measure of quality across resources because DLESE metadata would be attempting to indicate that this certain group of resources is appropriate for such and such a grade range. This provides a better quality experience to the library user in terms of finding appropriate materials.
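As a rough illustration of what consistent assignment means in practice, the sketch below checks the grade range values in a record against a fixed term list. The term list and the function name are hypothetical and are not taken from the actual DLESE grade range vocabulary.

```python
# Hypothetical grade range terms; the actual DLESE vocabulary may differ.
GRADE_RANGE_VOCAB = {
    "Primary elementary",
    "Intermediate elementary",
    "Middle school",
    "High school",
}


def invalid_grade_ranges(record_values, vocab=GRADE_RANGE_VOCAB):
    """Return the values in a record that are not in the controlled vocabulary."""
    return [value for value in record_values if value not in vocab]


# A free-text value such as "grades 6-8" is flagged; the controlled term passes.
flagged = invalid_grade_ranges(["Middle school", "grades 6-8"])
print(flagged)
```

A check of this kind only confirms that a term is drawn from the list; whether the term fits the resource is still a cataloging judgment.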
The next sections discuss the topics considered.
Topic 1: Development, management and maintenance of controlled vocabularies/terminologies in DLESE metadata frameworks
This discussion revolved around answering the following questions.
When should controlled vocabularies/terminologies be used? The strategy is to identify needs or conditions that may trigger the use of controlled vocabularies/terminologies within DLESE metadata frameworks. The current DLESE vocabulary management process identifies the following conditions as possibilities:
The following considerations are needed in the decision making process as well:
Not all of these conditions need to be met for adoption of a controlled vocabulary/terminology. Rather, these are conditions to weigh in deciding whether it is prudent to use a controlled vocabulary/terminology. The decision makers should involve the DLESE Program Center (DPC) metadata group and other knowledgeable and appropriate metadata/content experts as necessary. Additionally, the decision process needs to account for the evolution of the digital library landscape in terms of information technology retrieval (ITR).
What controlled vocabulary/terminology type works best for each of the appropriate metadata fields across the DLESE metadata frameworks? This table (analysis instrument) is meant as a starting point for determining the complexity of a metadata field that uses, or may use, a controlled vocabulary/terminology. It lists the metadata field for a particular DLESE metadata framework, its complexity (high, medium or low) and the recommended type of controlled vocabulary/terminology. The table below is meant only as a sample that diagrams some fields in the ADN metadata framework. The choices of high, medium and low for element complexity refer to the task of applying a controlled vocabulary/terminology to the content of the resource, not to the structure of a controlled vocabulary/terminology.
Table 1: Controlled Vocabulary (CV) Information table for ADN metadata
Which type of controlled vocabulary/terminology should be used? This refers to when a synonym ring, an authority file or an ontology should be used. The answer depends heavily on the inherent complexity of the metadata field, on what controlled vocabularies/terminologies may already exist to support the field, and on whether and how other significant metadata frameworks support it. Again, the analysis table in the preceding section helps answer this question.
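To make the distinction between two of these types concrete, here is a minimal sketch of a synonym ring and an authority file as data structures. The terms are illustrative only and are not drawn from any DLESE vocabulary. A synonym ring treats its members as equivalent for retrieval, while an authority file maps each variant to one preferred term for cataloging. An ontology, which adds typed relationships between concepts, is not shown.

```python
# Synonym ring: terms treated as equivalent for retrieval; no term is preferred.
SYNONYM_RINGS = [
    {"hurricane", "typhoon", "tropical cyclone"},
]

# Authority file: each variant maps to the one preferred term used in cataloging.
AUTHORITY_FILE = {
    "hurricane": "tropical cyclone",
    "typhoon": "tropical cyclone",
    "tropical cyclone": "tropical cyclone",
}


def expand_query(term, rings=SYNONYM_RINGS):
    """Expand a search term to every term in its synonym ring."""
    for ring in rings:
        if term in ring:
            return sorted(ring)
    return [term]


def preferred_term(term, authority=AUTHORITY_FILE):
    """Return the preferred cataloging term, or None if the term is unknown."""
    return authority.get(term)


print(expand_query("typhoon"))
print(preferred_term("typhoon"))
```

The synonym ring helps at search time, whereas the authority file constrains what catalogers enter, which is one reason the choice depends on the complexity of the field.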
What controlled vocabularies/terminologies should be used? This question goes hand-in-hand with what type of controlled vocabulary/terminology should be used. Answering it helps complete the table above, with the main goal of determining whether existing vocabularies can be used or whether development work is needed and, if so, how. The following questions should be considered:
How are controlled vocabularies/terminologies incorporated into DLESE systems? A vocabulary manager is the mechanism by which DLESE systems and services (including web services) know about controlled vocabularies/terminologies. Data is entered into the manager through a series of XML files. Any recommendations from this paper that use different structures than the current manager model will require software development to incorporate. For example, software development would be required to incorporate an existing OWL or RDF ontology.
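The idea of feeding a vocabulary manager from XML files can be sketched as follows. This is only an illustration: the element and attribute names (`vocabulary`, `term`, `field`, `id`) and the function `load_vocabulary` are assumptions and do not reflect the actual DLESE vocabulary manager file format.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary file; element and attribute names are illustrative only.
SAMPLE_XML = """
<vocabulary field="gradeRange">
  <term id="gr01">Primary elementary</term>
  <term id="gr02">Middle school</term>
</vocabulary>
"""


def load_vocabulary(xml_text):
    """Parse one vocabulary document into (field name, {term id: label})."""
    root = ET.fromstring(xml_text.strip())
    field = root.get("field")
    terms = {
        term.get("id"): (term.text or "").strip()
        for term in root.findall("term")
    }
    return field, terms


field, terms = load_vocabulary(SAMPLE_XML)
print(field, terms)
```

A flat term list like this is easy to load. An OWL or RDF ontology carries relationships that such a loader does not model, which is consistent with the point that incorporating one would require additional software development.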
Topic 2: Integration of collection specific (local) controlled vocabularies/terminologies that collection builders want to catalog to, search by and browse through
Many DLESE collection builders ask to have their own subject-specific controlled vocabularies/terminologies be searchable and browseable. The challenge is that collection builder controlled vocabularies/terminologies generally benefit only a single collection and do not have wide application across many DLESE collections. This topic was discussed at the DLESE Metadata Workshop in March 2004, and those discussions and recommendations are summarized next.
The first part of the discussion centered on when it is appropriate to use a collection-specific controlled vocabulary/terminology. It is appropriate when collection builders have terms and phrases that describe the collection better than the existing DLESE metadata structures do: free-text descriptions for general, educational, technical, geospatial and temporal characteristics; annotation metadata records; and existing DLESE controlled vocabularies/terminologies. If this is the case, the controlled vocabulary/terminology is expected to contain a small number of terms (15-30). The terms can be used in optional metadata fields and should avoid terms already in use in existing DLESE controlled vocabularies/terminologies. The workshop participants then discussed three methods to incorporate terms.
Method 1: Keyword method: The collection builder develops a list of terms or phrases and uses them consistently in the keyword field of the metadata record. These terms are also entered in the collection-level metadata record to give library users hints about terms to use while searching the collection. The collection builder has complete control and no definitions are needed. No browse is provided but free text searching yields good results. There is no integration into DLESE systems, services or metadata frameworks and no extra work is required by the DLESE Program Center (DPC). Currently, the Global Change Master Directory (GCMD) takes a similar approach by allowing data providers to suggest terms for their data sets.
Method 2: Collection builder controlled vocabulary method: The collection builder develops a list of terms or phrases and provides a code or URL to identify the terms. The collection builder catalogs to these terms (with consistent spelling and use) and indicates the URL or code in the metadata records (aside: the ADN metadata field subjectOther was built for this purpose). Because the term list and code are known, any metadata records using them are searchable and available to DLESE web services.
An example of this approach is DLESE resources for meteorology (this link may be intermittently unavailable). On that page, the listed terms (remote sensing, simulations, modeling, etc.) are the collection builder terms. Resources that use this specific vocabulary are then searchable. These resources are available through DLESE web services but not as browseable histograms. This method requires moderate work from both the collection builder and the DPC for functionality.
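A minimal sketch of how Method 2 makes records searchable: each record pairs the collection builder's vocabulary identifier with a term, and a search matches on both. The record layout, the example URL and the function name are assumptions made for illustration; they do not describe the actual structure of the ADN subjectOther field or any DLESE web service.

```python
# Hypothetical identifier a collection builder might publish for its term list.
VOCAB_ID = "http://example.org/vocab/meteorology"

# Hypothetical records: each holds (vocabulary identifier, term) pairs.
RECORDS = [
    {"id": "rec-1", "subjectOther": [(VOCAB_ID, "remote sensing")]},
    {"id": "rec-2", "subjectOther": [(VOCAB_ID, "modeling")]},
    {"id": "rec-3", "subjectOther": []},
]


def search_by_local_term(records, vocab_id, term):
    """Return ids of records cataloged with the term from the given vocabulary."""
    wanted = (vocab_id, term.strip().lower())
    return [
        record["id"]
        for record in records
        if any(
            (vid, value.strip().lower()) == wanted
            for vid, value in record["subjectOther"]
        )
    ]


print(search_by_local_term(RECORDS, VOCAB_ID, "Remote Sensing"))
```

Matching on the identifier as well as the term keeps one collection's "modeling" distinct from the same word used under another collection's term list.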
Method 3: XML schema method: This approach incorporates the collection builder terms or phrases completely into DLESE systems, services and metadata frameworks. It requires substantial work from the collection builder and the DPC. This method impacts every existing metadata record in the library.
Additionally, the collection builder needs to provide definitions, attribution of the definition and best practices for using their controlled vocabulary/terminology. The collection builder must also create XML schema and instance documents. This method allows the collection builder to have precise search results and browseable histograms as long as their terms do not overlap with existing DLESE controlled vocabularies/terminologies.
The overriding question in using collection-specific vocabularies is whether it will be beneficial. Is the potential metadata development, cataloging, software development and interface design worth it? How will DLESE systems and services keep up if many collection builders wish to use methods 2 or 3? Either way, the collection builder should consider the questions that were raised in Topic 1 above when developing their list of terms or phrases.
The Metadata Workshop concluded that browse structures were of benefit to the user and that DLESE should allow collection builders some flexibility in employing controlled vocabularies/terminologies. DLESE should investigate implementation of collection-specific vocabularies in a way that permits the collection builder to choose between method 1 and method 2 above. This quality group on metadata structures supports these recommendations.
Topic 3: Methods and processes for changing terms (keeping up with changes in pedagogic and scientific concepts)
Keeping DLESE controlled vocabularies/terminologies in line with the accepted scientific, pedagogic and technical terms is absolutely necessary to maintain library quality. However, once a controlled vocabulary/terminology is developed and in use, it can be a daunting task to update terms or phrases without disruption to end users, services and developers. Therefore, this group looked to the International Organization for Standardization (ISO) to see how often it reviews its standards. Per its procedures, standards are reviewed at least once every five years. A majority of reviewers decides whether the standard should be confirmed, revised or withdrawn.
A recommendation to DLESE is to review controlled vocabularies/terminologies at least once every five years as well. Some questions to ask during the review are:
If possible, the following groups and items should be included in the review:
Topic 4: Impacts of controlled vocabularies/terminologies on library stakeholders
The list of library stakeholders includes library end users, collection builders, catalogers, library developers, service providers and library evaluators. Because this is a broad list, the group narrowed this discussion primarily to controlled vocabulary/terminology impacts on library end users. The group acknowledged the following points in the use of controlled vocabularies/terminologies:
The National Science Digital Library (NSDL) evaluation group may be developing the idea of a digital learning testbed. One focus of this testbed could be to examine the effects of controlled vocabularies/terminologies on end users. If this plan goes forward, it is recommended that DLESE participate in this research in order to get the most benefit from it.
Last updated: 9-24-04