Configuring Search Fields for XML Frameworks
This page describes how to configure standard and custom search fields for any XML framework that is made available through the Search Service API. This information is provided for system administrators who are installing or managing a DDS repository system, which includes the Digital Discovery System (DDS) and the NSDL Catalog System (NCS). A framework does not need to be configured to be used effectively in the repository, but doing so adds search functionality that may be useful.
This document assumes familiarity with Apache Tomcat, Lucene, servlet configurations, and XML.
How search fields are generated
At index creation time, each record is inserted in the repository in its native XML format. The indexer extracts standard, custom and XPath search fields from the contents of the XML, generates a single entry containing each of the fields, and inserts that entry into the index. All records are guaranteed to contain certain fields such as the
For detailed information about search fields and the content within them, see the Search Service documentation (Search fields section).
How to configure search fields
The standard and custom search fields for a given XML framework can be defined using an XML configuration file, which is described below. Standard search fields include title, description, ID, URL and geospatial bounding box coordinates. Custom search fields can be defined for any content extracted from the XML document.
To configure search fields for a specific XML framework, follow these steps:
1. Add XML frameworks to the configuration index file
Add the given XML framework to the search fields configuration index file, which lists the individual configuration files for each XML framework. Entries in the index may contain relative or absolute URIs to the individual framework configuration files, which may be located on the local file system (file://) or anywhere on the Web (http://).
The index file is named
Example index file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<XMLIndexerFieldsConfigIndex>
  <!-- List the location of each framework-specific configuration file -->
  <configurationFiles>
    <configurationFile>xmlIndexerFieldsConfigs/oai_dc_search_fields.xml</configurationFile>
    <configurationFile>xmlIndexerFieldsConfigs/my_framework_search_fields.xml</configurationFile>
  </configurationFiles>
</XMLIndexerFieldsConfigIndex>
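Because entries may also be absolute URIs, an index file can mix local and remote locations. The following is a minimal sketch under that assumption; the file:// path and the http://example.org host are hypothetical placeholders, not locations that ship with DDS.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<XMLIndexerFieldsConfigIndex>
  <configurationFiles>
    <!-- Relative URI, as in the example index file above -->
    <configurationFile>xmlIndexerFieldsConfigs/oai_dc_search_fields.xml</configurationFile>
    <!-- Hypothetical absolute URI on the local file system -->
    <configurationFile>file:///opt/dds/config/my_framework_search_fields.xml</configurationFile>
    <!-- Hypothetical absolute URI on the Web -->
    <configurationFile>http://example.org/dds/config/another_framework_search_fields.xml</configurationFile>
  </configurationFiles>
</XMLIndexerFieldsConfigIndex>
```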
2. Define the search fields for each XML framework
Each configuration file describes the standard and/or custom search fields for an XML framework and where the content for those fields resides in the XML instance documents. For the following discussion, see the example configuration file below.
The indexer processes standard fields uniformly, so clients can search them consistently across frameworks.
The standard fields are the following:
To configure a standard field for a framework, add a
Custom fields can be defined for any content extracted from the XML document.
To define a custom field, add a
Note that the Lucene Analyzer that is defined for a given field is automatically applied both in the indexer and the searcher.
As the indexer processes the XML records, it first removes namespaces from the documents. This simplifies the XPath notation necessary to select the desired elements and attributes within. Do not include namespaces in your XPath notation.
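As an illustration of namespace removal, here is a sketch of a namespaced oai_dc record as it might appear in the repository. The element content is invented for illustration; the namespace URIs are the standard OAI and Dublin Core ones. The comments show the namespace-free XPath that would select each element once the indexer has removed the prefixes.

```xml
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- Selected with /dc/title, not /oai_dc:dc/dc:title -->
  <dc:title>Illustrative record title</dc:title>
  <!-- Selected with /dc/identifier -->
  <dc:identifier>http://example.org/resource/1</dc:identifier>
  <!-- Selected with /dc/description -->
  <dc:description>Illustrative description text.</dc:description>
</oai_dc:dc>
```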
For example, these XPaths select given elements in an
Example search configuration for the
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- XMLIndexerFieldsConfig attributes: [xmlFormat OR schema] -->
<XMLIndexerFieldsConfig xmlFormat="oai_dc">
  <standardFields>
    <!-- standardField attributes: name=[id|url|title|description|geoBBNorth|geoBBSouth|geoBBWest|geoBBEast] -->
    <standardField name="url">
      <xpaths>
        <xpath>/dc/identifier</xpath>
      </xpaths>
    </standardField>
    <standardField name="title">
      <xpaths>
        <xpath>/dc/title</xpath>
      </xpaths>
    </standardField>
    <standardField name="description">
      <xpaths>
        <xpath>/dc/description</xpath>
      </xpaths>
    </standardField>
  </standardFields>
  <customFields>
    <!-- customField attributes: name, store, [type OR analyzer] -->
    <customField name="dcIdentifier" store="yes" type="key">
      <xpaths>
        <xpath>/dc/identifier</xpath>
      </xpaths>
    </customField>
    <customField name="dcType" store="yes" type="text">
      <xpaths>
        <xpath>/dc/type</xpath>
      </xpaths>
    </customField>
    <customField name="dcPublisher" store="yes" type="text">
      <xpaths>
        <xpath>/dc/publisher</xpath>
      </xpaths>
    </customField>
  </customFields>
</XMLIndexerFieldsConfig>
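The same structure applies to the standard fields not used in the oai_dc example, such as id and the geospatial bounding box coordinates. The sketch below assumes a hypothetical framework named my_framework whose records have a /record root element with an ID and bounding-box child elements. The xmlFormat value and every XPath here are illustrative assumptions and would need to be replaced with ones that match the actual framework.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Hypothetical framework: element names below are assumptions -->
<XMLIndexerFieldsConfig xmlFormat="my_framework">
  <standardFields>
    <standardField name="id">
      <xpaths>
        <xpath>/record/recordID</xpath>
      </xpaths>
    </standardField>
    <standardField name="geoBBNorth">
      <xpaths>
        <xpath>/record/boundingBox/north</xpath>
      </xpaths>
    </standardField>
    <standardField name="geoBBSouth">
      <xpaths>
        <xpath>/record/boundingBox/south</xpath>
      </xpaths>
    </standardField>
    <standardField name="geoBBWest">
      <xpaths>
        <xpath>/record/boundingBox/west</xpath>
      </xpaths>
    </standardField>
    <standardField name="geoBBEast">
      <xpaths>
        <xpath>/record/boundingBox/east</xpath>
      </xpaths>
    </standardField>
  </standardFields>
</XMLIndexerFieldsConfig>
```

As with the oai_dc configuration, the XPaths are written without namespace prefixes.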
How to verify it's working
Follow these steps to verify that the desired content is being indexed for search as expected:
Last revised: $Date: 2010/02/19 01:02:54 $