Metadata Collections & QA

DLESE Metadata

Defines the facets of DLESE metadata and collection building.


Metadata definition and purpose

Semantically, DLESE metadata is information (e.g. title, description, audience, geospatial coverage, keywords) about resources (e.g. classroom activities, curriculum, virtual field trips, websites, services, annotations, collections, images, news and opportunities) that is used to promote searching, browsing and use of such resources by the Earth system science community. Technically, DLESE metadata is kept in the form of structured digital records in the eXtensible Markup Language (XML). It is these metadata records that DLESE holds, not the resources themselves.

As such, DLESE metadata encompasses the following facets:

  1. Framework Development: Develop the XML structures to support different DLESE metadata frameworks in capturing desired resource information.
  2. Metadata Maintenance: Make global structure or semantic changes to metadata records.
  3. Framework Documentation: Develop cataloging best practices, vocabularies, vocabulary definitions, semantic interpretations of all fields for all frameworks.
  4. Collection Building and Management: Work with collections builders to get their metadata records into DLESE and possibly onto the NSDL (National Science Digital Library).
  5. User Support: Provide catalog training, support the collections accessioning process, provide installation instructions for collection tools.

1: Framework development

As mentioned above, DLESE provides access to a broad spectrum of resources. These range from learning resources, like classroom activities and curriculum, to news and opportunities, annotations, and information about different collections and services. To fully describe these different resources, several metadata frameworks (ADN, DLESE-Collection, annotation, and news and opportunities) have evolved to support the unique characteristics of the resources being described. These frameworks are summarized below.

  • ADN (ADEPT/DLESE/NASA) - The current framework used in the DLESE Discovery System. It describes resources typically used in learning environments (e.g. classroom activities, curriculum, virtual field trips, etc.). The framework is XML schema-based with strong data typing and many controlled vocabularies to support efficient and effective browsing, search and discovery.
  • News and Opportunities - The current framework used to describe events or time-sensitive resources that have specific start and end dates and are of interest to the DLESE community as a whole (e.g. grants, workshops, scholarships, conferences). This framework is XML schema-based with no data typing.
  • Annotation - Describes additional information about resources or information not directly found in a resource. This information can include, but is not limited to, comments, educational standards, teaching tips, ideas for use, contextual explanations and other summary information. The framework is XML schema-based with strong data typing and controlled vocabulary support and was developed from the proposed NSDL annotation metadata framework.
  • DLESE-Collection - Describes a group of metadata records as a whole entity. The framework is XML schema-based with moderate data typing and controlled vocabulary support.
  • NSDL-DC - The current framework used by the National Science Digital Library (NSDL) to describe all resources within the library. The framework is XML schema-based and the DPC provides a crosswalk from the ADN metadata framework to this one.
  • DLESE-IMS - The current framework used in the DLESE Catalog System, not the DLESE Discovery System. It describes resources typically used in learning environments (e.g. classroom activities, curriculum, virtual field trips, etc.). The framework is XML-DTD based with no data typing. The DPC provides a crosswalk from this framework to the current ADN framework.

2: Metadata maintenance

Periodically, DLESE metadata frameworks undergo changes because of policy, vocabulary or cataloging best practice changes. These updates are made with XSLT and/or a scripting language like Python. Metadata maintenance happens on a fairly regular basis, about every 6-8 weeks. The challenge is to update all affected records (whether created at DLESE or by other collections), catalog systems and discovery mechanisms appropriately. Because metadata records generally all need updating, metadata maintenance interrupts services like cataloging and discovery.
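
As an illustration of a global change of this kind, the following is a minimal Python sketch and not the DPC's actual maintenance script. It assumes a simplified record without XML namespaces, a hypothetical `subject` field, and a hypothetical vocabulary rename; real records would need namespace-aware paths.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary change: old term -> new term (illustrative only).
TERM_CHANGES = {"Geology": "Geological Sciences"}


def update_record(xml_text, field="subject", changes=TERM_CHANGES):
    """Return (new_xml_text, number_of_values_changed) for one record."""
    root = ET.fromstring(xml_text)
    changed = 0
    # Visit every element with the given tag name, at any depth.
    for element in root.iter(field):
        value = (element.text or "").strip()
        if value in changes:
            element.text = changes[value]
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Assumed, simplified record layout (not a real DLESE record).
record = "<record><title>Rock Cycle</title><subject>Geology</subject></record>"
new_record, count = update_record(record)
print(count, new_record)
```

Returning the count of changed values makes it easy to report how many records a maintenance run actually touched.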

3: Framework documentation

DLESE metadata frameworks are really only usable if documentation has been provided or the essence of the framework is wrapped in a tool (e.g. a cataloging tool). Framework development therefore requires continual updates to framework documentation, which includes:

3a: Cataloging best practices

Best practices indicate how different metadata fields should be completed and with what data. For example, a cataloging best practice for completing the title field is: "Use the title displayed on screen to the user, not the title in a browser's title bar."

3b: Controlled vocabularies

Controlled vocabularies are word or phrase descriptors that are used as the only acceptable data to complete certain metadata fields. Controlled vocabularies aid resource discovery by providing coherent and consistent search mechanisms (words).
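
A cataloging tool can enforce this by rejecting any value that is not an exact vocabulary term. The sketch below shows the idea in Python; the vocabulary terms and the function name are hypothetical and are not taken from an actual DLESE vocabulary.

```python
# Hypothetical controlled vocabulary (illustrative terms only).
GRADE_RANGE_TERMS = {"Primary elementary", "Middle school", "High school"}


def invalid_terms(values, vocabulary):
    """Return the values that are not exact members of the controlled vocabulary."""
    return [value for value in values if value not in vocabulary]


entered = ["High school", "high school", "Secondary"]
print(invalid_terms(entered, GRADE_RANGE_TERMS))
```

The match is deliberately exact, including letter case, since consistent terms are what make the vocabulary useful for search.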

If resources are to be discoverable by a certain vocabulary, there must first be metadata records that use the vocabulary, which means the vocabulary must be in a cataloging tool, which in turn means the vocabulary must be in the appropriate DLESE metadata framework. All this takes time. For example, putting the geography and science content standards into the item-level metadata framework took about 2 years, following the timeline and process described below:

  1. Develop the vocabulary, its definitions and parameters of use: Ordinarily this process can take anywhere from 6-12 months or longer depending on the vocabulary. This process was not required for science standards because the vocabulary was adopted from the National Academy of Sciences.
  2. Implement the vocabulary in the metadata framework: Aug. 2001
  3. Implement the vocabulary in a cataloging tool: March 2002
  4. Implement the vocabulary in the discovery system: Yet to be determined but hopefully by June 2003.

3c: Framework information

In order to make the DLESE metadata frameworks usable, the following documentation is provided for each field (e.g. title or description) in a framework:

  • Definition:
  • Interpretation:
  • Technical Implementation:
  • Cataloging Best Practice:
  • XML Tag Set:
  • Obligation:
  • Other Occurrences:
  • Maximum Occurrences:
  • Data Type: (e.g. integer versus text versus boolean)
  • Domain: (controlled vocabulary or free text)
  • Example Entry:
  • Default Value:
  • Controlled Vocabulary:
  • Vocabulary Source:
  • Vocabulary Explanation:
  • Vocabulary Terms:

Unfortunately, there is some documentation for the news and opportunities metadata framework, but none for the ADN item-level, collection or annotation metadata frameworks.

4: Collection building and management

As mentioned above, collections management means working with collections developers to get their metadata records into DLESE and other digital libraries. As such, collections management functions can be divided into two spheres: managing incoming collections to the DPC and managing outgoing collections from the DPC.

4a: Managing incoming collections to the DPC

To support incoming collections, assistance is provided to collection builders in getting their item-level metadata records into a metadata format usable by the DPC. Each incoming collection must meet the requirements of the DLESE Interim Collections Accession Policy. Besides a usable metadata record format, this entails creating a collection-level metadata record, using a metadata harvesting protocol such as OAI-PMH (Open Archives Initiative-Protocol for Metadata Harvesting), and providing a collection scope statement detailing the collection description, item selection, quality control, contacts and persistence.
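
For readers unfamiliar with OAI-PMH, a harvest is driven by plain HTTP requests that carry a `verb` and its arguments. The Python sketch below only builds such a request URL; the base URL and set name are placeholders rather than real endpoints, and `oai_dc` is used because it is the one metadata format every OAI-PMH repository must support.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional: restrict the harvest to one set (e.g. one collection).
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Placeholder base URL and set name, not a real DLESE or collection endpoint.
print(list_records_url("http://example.org/oai", set_spec="mycollection"))
```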

While the above paragraph describes managing a single collection, managing several collections at once involves a few more activities like:

  • aggregating metadata records from different collections that point to the same resource
  • URL link checking across all collections
  • the approval of collections into DLESE
  • connecting annotation information to appropriate metadata record(s)
  • assuring a collection has a one-to-one correspondence between identification numbers and the cataloged URLs
  • communicating with collection builders if a record does not work for some reason (e.g. not appropriate content or has something wrong technically)
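
Two of the checks above, the one-to-one correspondence between identification numbers and URLs and the detection of records that point to the same resource, can be sketched together in Python. This is an illustration under assumed inputs: the `(record_id, url)` pairs, the identifiers and the function name are all hypothetical.

```python
from collections import defaultdict


def check_collection(records):
    """records: iterable of (record_id, url) pairs.

    Returns (ids_with_many_urls, urls_with_many_ids).
    """
    urls_by_id = defaultdict(set)
    ids_by_url = defaultdict(set)
    for record_id, url in records:
        urls_by_id[record_id].add(url)
        ids_by_url[url].add(record_id)
    # An ID mapped to several URLs, or a URL shared by several IDs,
    # breaks the one-to-one correspondence.
    bad_ids = {i: sorted(u) for i, u in urls_by_id.items() if len(u) > 1}
    shared_urls = {u: sorted(i) for u, i in ids_by_url.items() if len(i) > 1}
    return bad_ids, shared_urls


# Hypothetical identifiers and URLs.
records = [
    ("COLL-001", "http://example.org/a"),
    ("COLL-002", "http://example.org/b"),
    ("COLL-003", "http://example.org/a"),
]
bad_ids, shared_urls = check_collection(records)
print(bad_ids, shared_urls)
```

Run over the records of several collections at once, the second result also lists the records from different collections that describe the same resource and are candidates for aggregation.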

All items in incoming collections will immediately be part of the DLESE Broad collection. Collection builders may choose to have certain items or their entire collection become part of the DLESE Reviewed Collection, if the appropriate requirements are met.

4b: Managing outgoing collections from the DPC

To support outgoing collections, item-level and collection-level metadata records are transformed into appropriate formats for other digital libraries. This first means creating a semantic mapping, which requires intellectual knowledge of both the source and target frameworks. Then an XSLT and/or programmatic solution can be written to change metadata formats. Unless the collection builder indicates otherwise, all incoming collections are expected to be eligible to become outgoing collections.
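
A programmatic crosswalk can be as simple as a table that records the semantic mapping plus a loop that applies it. The Python sketch below assumes simplified, namespace-free records; the element paths in `FIELD_MAP` are hypothetical and do not reproduce the actual ADN or NSDL-DC structures.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantic mapping: source element path -> target element name.
FIELD_MAP = {
    "general/title": "title",
    "general/description": "description",
    "technical/url": "identifier",
}


def crosswalk(source_xml, field_map=FIELD_MAP, target_root="record"):
    """Copy mapped fields from a source record into a new target record."""
    source = ET.fromstring(source_xml)
    target = ET.Element(target_root)
    for source_path, target_name in field_map.items():
        for element in source.findall(source_path):
            # Skip empty source fields rather than emitting empty targets.
            if element.text and element.text.strip():
                ET.SubElement(target, target_name).text = element.text.strip()
    return ET.tostring(target, encoding="unicode")


# Assumed, simplified source record.
source = (
    "<item><general><title>Rock Cycle</title>"
    "<description>An animation.</description></general>"
    "<technical><url>http://example.org/rock</url></technical></item>"
)
print(crosswalk(source))
```

Keeping the mapping in a table separate from the code means the intellectual work of the semantic mapping can be reviewed and revised without touching the transformation logic.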

5: User support

The DLESE metadata group is involved in several user support activities, from cataloging to collection support. A brief summary of these activities is as follows:

  • Catalog Training/Support: The DLESE Program Center metadata group provides training and support for DLESE metadata frameworks. The metadata group will also provide training at the request, and with the financial support, of collection builders.
  • Installation Instructions: Writing Mac, Windows and Linux installation instructions for the cataloging and metadata harvesting (sharing) tools.
  • Other communications: The metadata team gets many other questions not related to cataloging or collections development (e.g. where can I download the DLESE vocabularies?).
  • Collection Support: While collection management above deals with getting metadata records into the correct format, collection support here deals with helping collection builders understand DLESE policy documents (Scope Statement, Interim Collections Accession Policy, IP etc.) and helping the builders create such documents themselves.

Last updated: 7-11-03