Search Service API Documentation
Service version: DDSWS v1.1

Overview
The Digital Discovery System Search Service (DDSWS) is a search and retrieval service API for items that reside in a digital repository, and is available from the Digital Discovery System (DDS) and the NSDL Collection System (NCS). Service requests are expressed as HTTP argument/value pairs and responses may be returned as XML or JSON.

The primary service request is Search, which provides a wide range of Information Retrieval features implemented using the Lucene search engine. It supports textual searching over repository metadata and content, searching within specific fields, date ranges, geospatial bounding box search, and other functionality. Metadata are returned from the service for the objects that reside in the repository and may be disseminated in a number of XML formats, as indicated by the ListXmlFormats request. Web service requests and responses are described in detail below, and examples are provided for reference by developers.

Definitions and concepts
DDSWS is a Representational State Transfer (REST) style Web service API. Service requests are expressed as HTTP argument/value pairs and must be sent using either GET or POST. Responses are returned in XML format by default; their structure and content vary depending on the request, as shown in the examples in this document. Responses can also be returned as JSON (JavaScript Object Notation) as an alternate output format.
HTTP request format
The format of the request consists of the base URL followed by the ? character followed by one or more argument=value pairs, which are separated by the & character. Each request must contain one verb=request pair, where verb is the literal string 'verb' and request is one of the DDSWS request strings defined below. All arguments must be encoded using the syntax rules for URIs. This is the same encoding scheme that is described by the OAI-PMH.

Service requests
This section defines the available requests, or verbs.

The HTTP request format has the following structure: [base URL]?verb=request[&additional arguments]. For example:
http://www.dlese.org/dds/services/ddsws1-1?verb=GetRecord&id=DLESE-000-000-000-001

Summary of available requests:
Search - Allows a client to search across resources in the repository using standard Lucene queries, which support term, field and phrase searches, term and term/field boosting, term stemming, wildcard and fuzzy searches, term proximity searches, and other functionality. The Search request has access to a wide range of search fields, and through the use of query clauses, can be used to apply custom search rank algorithms (see example search queries). The request also supports searching by XML format, date ranges, geospatial bounding box search, and other functionality.
UserSearch - Is nearly identical to the Search request except that it operates over educational resources in the ADN metadata format only, and it applies a default searcher that automatically performs word stemming and relevancy rank boosting for items that match higher relevancy search indicators, such as when a matching term appears in the title field as opposed to elsewhere. These search algorithms are the same as those that are applied to users' searches in the DLESE library. This request is meant to be used by clients that work with ADN resources and wish to leverage the automatic word stemming and search rank algorithms that are applied.
ListFields - Accesses the fields in the index.
ListTerms - Accesses the terms in a given field or fields.
GetRecord - Accesses the metadata for a single record.
ListCollections - Accesses the list of available metadata collections in the repository.
ListGradeRanges - Accesses the list of DLESE-specific controlled vocabularies and search keys for grade ranges.
ListSubjects - Accesses the list of DLESE-specific controlled vocabularies and search keys for subjects.
ListResourceTypes - Accesses the list of DLESE-specific controlled vocabularies and search keys for resource types.
ListContentStandards - Accesses the list of DLESE-specific controlled vocabularies and search keys for content standards.
ListXmlFormats - Accesses the list of the available XML formats from this service.
UrlCheck - Allows a client to check whether a given URL is cataloged in the repository.
ServiceInfo - Accesses information about this Web service.

Search
Sample request
The following request performs a search for the term "ocean" and returns 10 search results, starting at position 0:
http://www.dlese.org/dds/services/ddsws1-1?verb=Search&q=ocean&s=0&n=10&client=ddsws-documentation

Summary and usage
The Search request allows a client to search across resources in the repository using standard Lucene queries, which support term, field and phrase searches, term and term/field boosting, term stemming, wildcard and fuzzy searches, term proximity searches, and other functionality. The Search request has access to a wide range of search fields, and through the use of query clauses, can be used to apply custom search rank algorithms (see example search queries). The request also supports searching by XML format, date ranges, geospatial bounding box search, and other functionality.

The Search and UserSearch response consists of an ordered set of metadata records, sorted by relevancy. The Search request searches over all XML formats that are available in the repository, unless otherwise specified in the 'xmlFormat' argument as described below. The UserSearch request searches over the records available in the ADN format only. Flow control is managed by the client, which may 'page through' a set of results using the 's' and 'n' arguments as described below.

The Search and UserSearch requests accept queries supplied in the standard Lucene Query Syntax (LQS). LQS supports advanced Information Retrieval query clauses such as term and field boosting, wildcard and fuzzy searches, etc. Queries are supplied in the q argument of the request.
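As a minimal sketch of the request structure described above (base URL, a verb pair, then URL-encoded argument=value pairs), the following Python builds the sample Search request. The helper name build_request is hypothetical and not part of the service; only the base URL and the argument names come from this document.

```python
from urllib.parse import urlencode

# Base URL taken from the sample requests in this document.
BASE_URL = "http://www.dlese.org/dds/services/ddsws1-1"


def build_request(verb, **arguments):
    """Return a request URL: base URL, '?', then encoded argument=value pairs.

    build_request is a hypothetical helper used for illustration only.
    """
    pairs = [("verb", verb)]
    pairs.extend(arguments.items())
    # urlencode applies URI escaping to each argument name and value.
    return BASE_URL + "?" + urlencode(pairs)


# The sample Search request: the term "ocean", 10 results from position 0.
url = build_request("Search", q="ocean", s=0, n=10, client="ddsws-documentation")
print(url)

# A fielded phrase query is escaped automatically by urlencode.
phrase_url = build_request("Search", q='title:"ocean currents"', s=0, n=10)
print(phrase_url)
```

Building the URL through an encoder rather than by string concatenation keeps Lucene query characters such as colons, quotes and spaces from corrupting the request.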
Arguments Textual and fielded searches: The following argument is used to conduct textual and fielded searches and may be performed independently or in combination with other search criteria described below.
Controlled vocabulary searches: The following arguments perform a search by controlled vocabulary and may be used independently or in combination with other search criteria. The searchKey values used with these arguments must be discovered using the vocabulary list requests, for example gr=07. If supplied, the controlled vocabulary portion of the search criteria must match a given record in order for it to be included in the results. Note that searching by grade range (gr), resource type (re), subject (su) or content standard (cs) is useful only for clients that wish to search over ADN records using these DLESE-specific vocabularies.
Date range searches: The following arguments instruct the service to search in a given index date field and may be used independently or in combination with other search criteria. The values provided in the fromDate or toDate arguments must be either a date string of the form yyyy-MM-dd or an ISO 8601 UTC datestamp of the form yyyy-MM-ddTHH:mm:ssZ. Example dates include 2004-07-08 and 2004-07-26T21:58:25Z. The fields that are available for searching by date are listed below. If supplied, the date range portion of the search criteria must match a given record in order for it to be included in the results. These arguments are not supported in the UserSearch request.
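The two accepted date forms can be produced from Python date objects as sketched below. The helper names are hypothetical; the argument names fromDate, toDate and dateField and the field wndate are the ones used in this document. The sketch assumes that a datetime without time zone information is already in UTC.

```python
from datetime import date, datetime, timezone


def format_dds_date(value):
    """Format a date as yyyy-MM-dd, or a datetime as yyyy-MM-ddTHH:mm:ssZ."""
    # datetime is a subclass of date, so it must be tested first.
    if isinstance(value, datetime):
        if value.tzinfo is not None:
            value = value.astimezone(timezone.utc)
        # Assumption: naive datetimes are treated as already being in UTC.
        return value.strftime("%Y-%m-%dT%H:%M:%SZ")
    return value.strftime("%Y-%m-%d")


def date_range_arguments(date_field, from_date=None, to_date=None):
    """Build the date range portion of a Search request as a dict."""
    arguments = {"dateField": date_field}
    if from_date is not None:
        arguments["fromDate"] = format_dds_date(from_date)
    if to_date is not None:
        arguments["toDate"] = format_dds_date(to_date)
    return arguments


print(format_dds_date(date(2004, 7, 8)))
print(format_dds_date(datetime(2004, 7, 26, 21, 58, 25, tzinfo=timezone.utc)))
print(date_range_arguments("wndate", from_date=date(2004, 7, 8)))
```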
Geospatial searches: Geospatial searches operate over each record that has an associated geographic footprint (a geographic region representing the record's area of relevance) in the form of a box (defined below). A geospatial query takes a query region (also in the form of a box) and a spatial predicate (one of "within", "contains" or "overlaps") and returns all documents that 1) have a geographic footprint that 2) has the predicate relationship to the query region. Formally, a box is a geographic region defined by north and south bounding coordinates (latitudes expressed in degrees north of the equator and in the range [-90,90]) and east and west bounding coordinates (longitudes expressed in degrees east of the Greenwich meridian and in the range [-180,180]). The north bounding coordinate must be greater than or equal to the south. The west bounding coordinate may be less than, equal to, or greater than the east; when it is greater, the box crosses the ±180° meridian. As a special case, the set of all longitudes is described by a west bounding coordinate of -180 and an east bounding coordinate of 180. The following arguments instruct the service to conduct a geospatial query over the subset of records that contain a geospatial footprint. Geospatial queries may be performed independently or in combination with other search criteria. The five required geospatial arguments are conditionally required: to perform a geospatial query all five must be included, otherwise none may be included. If an error in the request arguments is encountered, the service will return an appropriate error response and message. The optional geospatial argument may be included if desired.
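The formal definition of a box above can be checked on the client before a request is sent. The following Python is a sketch only: the class Box and its members are hypothetical names, no service argument names are assumed, and the contains_point method is added purely to illustrate how a box that crosses the ±180° meridian is interpreted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A geographic box as defined above (hypothetical client-side type)."""
    north: float
    south: float
    east: float
    west: float

    def __post_init__(self):
        for latitude in (self.north, self.south):
            if not -90 <= latitude <= 90:
                raise ValueError("latitude must be in the range [-90, 90]")
        for longitude in (self.east, self.west):
            if not -180 <= longitude <= 180:
                raise ValueError("longitude must be in the range [-180, 180]")
        if self.north < self.south:
            raise ValueError("north must be greater than or equal to south")

    @property
    def crosses_antimeridian(self):
        # West greater than east describes a box crossing the ±180° meridian.
        return self.west > self.east

    def contains_point(self, latitude, longitude):
        """Illustrative only: test whether a point falls inside the box."""
        if not self.south <= latitude <= self.north:
            return False
        if self.crosses_antimeridian:
            return longitude >= self.west or longitude <= self.east
        return self.west <= longitude <= self.east


pacific = Box(north=10, south=-10, east=-170, west=170)
print(pacific.crosses_antimeridian)
print(pacific.contains_point(0, 175), pacific.contains_point(0, 0))
```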
Flow control: A search client can control the flow of paging through a set of search results, and the size of each page, using the s (starting offset) and n (number returned) arguments. As an example, when a search is initially performed, the client might construct a request that supplies the arguments s=0 and n=10 to return up to the first 10 matching results. The client would then page through the set of results by issuing subsequent requests indicating s=10 and n=10 for the next ten results, s=20 and n=10 for results 20 through 29, and so forth up to totalNumResults. To retrieve each successive segment of search results the client must supply identical search criteria in all search-related arguments (q, xmlFormat, gr, su, cs, re, so, etc.), including the sorting and date-restrictive arguments. DDS search is deterministic, and the set and order of search results are guaranteed to be identical for any two identical searches (assuming the repository has not changed in the interim). Thus the s and n arguments can be thought of as defining the 'window' onto the ordered set of search results that the client wants to see.
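The paging scheme described above can be sketched as a generator that keeps the search criteria fixed and only advances the s argument. The function name paged_requests is hypothetical; total_num_results stands for the totalNumResults value reported by the service for the first page.

```python
def paged_requests(criteria, total_num_results, page_size=10):
    """Yield one argument dict per page, with identical search criteria.

    Only s changes between pages; n stays at the requested page size, and
    the service returns fewer records on the final page if fewer remain.
    """
    for start in range(0, total_num_results, page_size):
        yield {**criteria, "s": start, "n": page_size}


pages = list(paged_requests({"q": "ocean", "gr": "02"}, total_num_results=25))
for page in pages:
    print(page)
```

Because the service guarantees the same ordering for identical searches, each yielded dict addresses a distinct, non-overlapping window of the same result set.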
Additional arguments: The following arguments may also be supplied in the request.
Sorting the response: The following two arguments instruct the service to sort the response by a given index field. The service sorts the entire result set lexically prior to returning the requested portion of the results. Only one of these two arguments may be supplied in the request. Values must be a sortable field in the index, as listed below. These arguments are not supported in the UserSearch request.
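The sketch below illustrates two points from the paragraph above: that only one sort argument may be sent, and that lexical sorting orders values as strings rather than as numbers. The argument name sortDescendingBy appears in the examples in this document; sortAscendingBy is assumed here to be its counterpart, and the function names are hypothetical.

```python
# sortDescendingBy is used in this document's examples; sortAscendingBy is
# an assumed name for the second sort argument.
SORT_ARGUMENTS = ("sortAscendingBy", "sortDescendingBy")


def sort_argument(field, descending=False):
    """Return exactly one sort argument for the given index field."""
    name = "sortDescendingBy" if descending else "sortAscendingBy"
    return {name: field}


def check_sort_arguments(arguments):
    """Raise ValueError if both sort arguments are present in a request."""
    supplied = [name for name in SORT_ARGUMENTS if name in arguments]
    if len(supplied) > 1:
        raise ValueError("only one sort argument may be supplied")
    return arguments


request = check_sort_arguments({"q": "ocean", **sort_argument("wndate", descending=True)})
print(request)

# Lexical ordering compares values character by character, as strings.
print(sorted(["9", "10", "2"]))  # ['10', '2', '9']
```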
Errors and exceptions
See error and exception conditions.

Examples
Request
Search for the word ocean.
http://www.dlese.org/dds/services/ddsws1-1?verb=Search&q=ocean&s=0&n=10
Response

Request
Search for the word ocean and limit the search to grade range High (9-12).
http://www.dlese.org/dds/services/ddsws1-1?verb=Search&q=ocean&gr=02&s=0&n=10
Response

Request
Search for all ADN records new to the repository since July 7th, 2004 and sort descending by the wndate field.
http://www.dlese.org/dds/services/ddsws1-1?verb=Search&s=0&n=10&fromDate=2004-07-08&dateField=wndate&sortDescendingBy=wndate&xmlFormat=adn-localized
Response
UserSearch
Sample request

GetRecord
Sample request
The following request displays the metadata for record ID DLESE-000-000-000-001 in its native XML format:
http://www.dlese.org/dds/services/ddsws1-1?verb=GetRecord&id=DLESE-000-000-000-001

Summary and usage
The GetRecord request is used to retrieve the metadata for a single item in the repository. Clients should use this request to display the metadata from a single record, for example if the user has requested "more information" about a resource. The data is returned in ADN format and other formats including dlese_collect, dlese_anno, oai_dc, nsdl_dc and briefmeta. Sample ADN records are available here.

Arguments

Errors and exceptions
See error and exception conditions.

Examples
Request
Request the record id DLESE-000-000-000-337 and get the response in ADN format. Shown without the required encoding, for clarity.
http://www.dlese.org/dds/services/ddsws1-1?verb=GetRecord&id=DLESE-000-000-000-337
Response
ListFields
Sample request
The following request lists all fields in the index:
http://www.dlese.org/dds/services/ddsws1-1?verb=ListFields

Summary and usage
The ListFields request is used to get all search fields that reside in the index. It is not necessary for the Lucene fields to be stored.

Arguments
None

ListTerms
Sample request
The following request lists all terms in the index for field 'title':
http://www.dlese.org/dds/services/ddsws1-1?field=title&verb=ListTerms

Summary and usage
The ListTerms request is used to get all search terms that exist in the index for a given field or fields. It is not necessary for the Lucene fields to be stored. For each term the response indicates the number of times it appears in the index (termCount) as well as the number of documents (records) it appears in (docCount).

Arguments
ListCollections
Sample request
The following request lists the metadata collections that are available in the repository:
http://www.dlese.org/dds/services/ddsws1-1?verb=ListCollections

Summary and usage
The ListCollections request is used to discover the available metadata collections in the repository and to retrieve the search field/key values used to perform searches across collections. Clients should use this request to generate user interface widgets for selecting collections to search from, or to display collection information such as the number of records in a collection. This request belongs to the vocabulary list class of requests. The response from ListCollections conforms to the vocabulary list response format but includes two additional elements: <recordId> and <additionalMetadata>

Examples
Refer to the documentation for the vocabulary list class of requests.

ListGradeRanges
Sample request
The following request lists the DLESE-specific grade range vocabularies and corresponding search keys:
http://www.dlese.org/dds/services/ddsws1-1?verb=ListGradeRanges

Summary and usage
The ListGradeRanges request is used to discover the DLESE controlled vocabularies and search field/keys for grade ranges used in the adn and dlese_collect metadata frameworks. Clients that work with these DLESE frameworks may use this request to generate user interface widgets for selecting grade ranges to search from. This request belongs to the vocabulary list class of requests.

Examples
Refer to the documentation for the vocabulary list class of requests.

ListSubjects
Sample request
The following request lists the DLESE-specific subject vocabularies and corresponding search keys:
http://www.dlese.org/dds/services/ddsws1-1?verb=ListSubjects

Summary and usage
The ListSubjects request is used to discover the DLESE controlled vocabularies and search field/keys for subjects used in the adn and dlese_collect metadata frameworks. Clients that work with these DLESE frameworks may use this request to generate user interface widgets for selecting the subjects to search from. This request belongs to the vocabulary list class of requests.

Examples
Refer to the documentation for the vocabulary list class of requests.

ListResourceTypes
Sample request
The following request lists the DLESE-specific resource type vocabularies and corresponding search keys:
http://www.dlese.org/dds/services/ddsws1-1?verb=ListResourceTypes

Summary and usage
The ListResourceTypes request is used to discover the DLESE controlled vocabularies and search field/keys for resource types used in the adn and dlese_collect metadata frameworks. Clients that work with these DLESE frameworks may use this request to generate user interface widgets for selecting the resource types to search from. This request belongs to the vocabulary list class of requests.

Examples
Refer to the documentation for the vocabulary list class of requests.

ListContentStandards
Sample request
The following request lists the DLESE-specific content standard vocabularies and corresponding search keys:
http://www.dlese.org/dds/services/ddsws1-1?verb=ListContentStandards

Summary and usage
The ListContentStandards request is used to discover the DLESE controlled vocabularies and search field/keys for content standards used in the adn and dlese_collect metadata frameworks. Clients that work with these DLESE frameworks may use this request to generate user interface widgets for selecting the content standards to search from. This request belongs to the vocabulary list class of requests.

Examples
Refer to the documentation for the vocabulary list class of requests.

Vocabulary list requests
Summary and usage
Vocabulary list requests include ListGradeRanges, ListSubjects, ListResourceTypes, ListContentStandards, and ListCollections*. Each of the vocabulary list requests uses the same request and response format.
Vocabulary list requests are used to determine the search values supplied in the gr, su, re, cs and ky arguments of the Search and UserSearch requests, and should be used to construct user interface menus with which users can limit their search by grade range, subject, etc. More specifically, vocabulary list requests represent the class of requests that expose controlled vocabularies in the repository (grade ranges, subjects, resource types, content standards and collections). Vocabulary list requests may be used to discover the vocabulary entries ('Primary elementary', etc.), the search field/key pair used to perform and limit searches across the given vocabulary in the Search and UserSearch requests ('gr=07', etc.), and a set of rendering guidelines used to determine things such as whether to display the vocabulary listing to the user and the label that should be displayed, for example 'Primary (K-2)'.

Implementation tip: Library vocabularies change very infrequently (on the order of years or months). Clients should retrieve the vocabulary values once and cache them, for example at application start up, rather than retrieving them each time a user accesses the client.

*Note: ListCollections conforms to the vocabulary list response but includes two additional elements: <recordId> and <additionalMetadata>

Arguments
None.

Errors and exceptions
See error and exception conditions.

Examples
Request
Request the grade ranges that are available. Note the verb argument may contain any of the vocabulary list requests (ListGradeRanges, ListSubjects, ListResourceTypes, ListContentStandards, or ListCollections) corresponding to the vocabulary you are interested in.
http://www.dlese.org/dds/services/ddsws1-1?verb=ListGradeRanges
Response
*Note: ListCollections conforms to the vocabulary list response format shown above but includes two additional elements: <recordId> and <additionalMetadata>

ListXmlFormats
Sample request
The following request lists the XML formats that may be disseminated from this service and their corresponding search keys:
http://www.dlese.org/dds/services/ddsws1-1?verb=ListXmlFormats

Summary and usage
The ListXmlFormats request is used to discover the XML formats available from the repository as a whole or for a single record in the repository. Clients should use this request to discover the available XML formats and the keys that may be supplied in the 'xmlFormat' argument of the Search or GetRecord requests.

DDSWS is able to disseminate a number of XML formats including ADN (adn), News&Opps (news_opps), DLESE annotation (dlese_anno), DLESE collection (dlese_collect), OAI Dublin Core (oai_dc), NSDL Dublin Core (nsdl_dc), and others. Certain records are available in multiple formats. For example, records that were originally cataloged in the ADN format may be returned in the adn, adn-localized, briefmeta, oai_dc and nsdl_dc formats. When a record is requested in a non-native format, its XML is transformed to the requested format using XSLT or another transformation prior to being returned by the service.

Many XML formats are available in namespace-specific form or a localized form that contains no namespace or schema declaration. Localized XML is indicated by adding -localized to the end of the XML format specifier, for example adn-localized. When localized XML is returned, the XML is generally easier to read and XPath notation is greatly simplified. By default, all requests in the service return localized versions of the metadata unless a non-localized specifier is indicated.

Arguments
Errors and exceptions
See error and exception conditions.

Examples
Request
Show all XML formats available for ID DLESE-000-000-000-001.
http://www.dlese.org/dds/services/ddsws1-1?verb=ListXmlFormats&id=DLESE-000-000-000-001
Response
UrlCheck
Sample request
The following request searches for all records in the repository that have a URL ending in '.pdf':
http://www.dlese.org/dds/services/ddsws1-1?url=http://*.pdf&verb=UrlCheck

Summary and usage
The UrlCheck request is used to check whether a given URL is in the DDS repository. This request supports the use of the * wildcard construct. The * character, or wildcard construct, indicates that any character combination is a valid match. For example, a search for http://www.dlese.org/myResource* will match the two URLs http://www.dlese.org/myResource1.html and http://www.dlese.org/myResource2.html. The wildcard construct may appear at any position in the URL argument except the first position.

Arguments
Errors and exceptions
See error and exception conditions.

Examples
Request
Determine whether the URL 'http://epod.usra.edu/' is in the repository. Shown without the required encoding, for clarity.
http://www.dlese.org/dds/services/ddsws1-1?verb=UrlCheck&url=http://epod.usra.edu/
Response

Request
Determine whether the URL 'http://epod.usra.edu/' or 'http://www.marsquestonline.org/index.html' is in the repository.
http://www.dlese.org/dds/services/ddsws1-1?verb=UrlCheck&url=http://epod.usra.edu/&url=http://www.marsquestonline.org/index.html
Response

Request
Determine whether a URL that begins with 'http://www.dlese.org' is in the repository. The * character acts as a wildcard, which may appear at any position in the URL argument except the first position.
http://www.dlese.org/dds/services/ddsws1-1?verb=UrlCheck&url=http://www.dlese.org*
Response

Request
Determine whether the URL 'http://epod.usra.edu/zzzz' is in the repository. In this case no matching records are found.
http://www.dlese.org/dds/services/ddsws1-1?verb=UrlCheck&url=http://epod.usra.edu/zzzz
Response
ServiceInfo
Sample request
The following request displays information about this Web service:
http://www.dlese.org/dds/services/ddsws1-1?verb=ServiceInfo

Summary and usage
The ServiceInfo request is used to retrieve general information about this Web service including name, description, the URL used to access the service (base URL), service version, the maximum number of search results allowed by the Search and UserSearch requests, and an administrator e-mail.

Arguments
None

Errors and exceptions
See error and exception conditions.

Examples
Request
Display information about the Web service.
http://www.dlese.org/dds/services/ddsws1-1?verb=ServiceInfo
Response
Service responses
Service responses are returned in XML or JSON format and vary in structure and content depending on the request made. This section describes common response structures that are returned by the service. The content and structure of each individual response are described above with the corresponding request, not here.

Common response elements
Several requests in the protocol share common XML elements in their responses. These include the <head> and <additionalMetadata> elements, which are described below.

The head element
The head element appears in the UserSearch, Search, GetRecord and UrlCheck responses. The head element is used to return information about a single record. This includes the ID of the record, the collection of which the record is a member, the XML format of the record, the date the record was last modified, the whatsNewDate and an additionalMetadata element.

Head element example:
The additionalMetadata element
The additionalMetadata element appears in UserSearch, Search, GetRecord, UrlCheck and the vocabulary list class of responses. The additionalMetadata element is used to return additional information related to the record's format type, referred to as realms. The information realms include adn and dlese_collect, and each contains slightly different information related to the underlying format type.

additionalMetadata element example:
Error and exception conditions
If an error or exception occurs, the service returns an <error> element with the type of error indicated by a code attribute. Clients are advised to test the value of these codes and respond with an appropriate message to users. For example, if a user conducts a search that has no matches, the code noRecordsMatch will be returned from the server and a message indicating that the search had no results can be displayed. The error codes are similar to those defined by OAI-PMH.
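A client might test for the error element as sketched below in Python. Only the <error> element, its code attribute and the noRecordsMatch code are taken from this document; the function name, the enclosing DDSWebService root element and the message text in the sample are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def find_error(response_xml):
    """Return (code, message) if the response holds an <error>, else None."""
    root = ET.fromstring(response_xml)
    for element in root.iter():
        # Drop any '{namespace}' prefix so both localized and namespaced
        # responses are handled the same way.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "error":
            return element.get("code"), (element.text or "").strip()
    return None


# Hypothetical response body: the root element name and message text are
# assumptions made for this sketch.
sample = (
    "<DDSWebService>"
    '<error code="noRecordsMatch">Your search had no matches</error>'
    "</DDSWebService>"
)

error = find_error(sample)
if error is not None and error[0] == "noRecordsMatch":
    print("The search had no results.")
```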
Example error response
Request
Request a record id that does not exist in the repository using GetRecord.
http://www.dlese.org/dds/services/ddsws1-1?verb=GetRecord&id=BAD-ID-123
Response
Requesting JSON output
Each of the service responses can be returned as JSON (JavaScript Object Notation) as an alternate output format to XML. JSON is a simple data format based on the object notation of the JavaScript language and is commonly used in Ajax-style programming to bring content into Web pages asynchronously. For more information about JSON and how it is used, see Douglas Crockford's site www.json.org and the Yahoo! JSON developers page. A DDS client that illustrates its use is shown in these examples.

By default, all responses are output in XML format. To get JSON output, include the argument output=json in the request. Additionally, a callback argument callback=function may be included to wrap the JSON output in parentheses and a function name of your choosing. The JSON output by the service is a direct translation of the XML structure into JSON.

Arguments
Removing namespaces from output
Namespaces can be removed from the XML and JSON output from the service, which can simplify working with and processing the output.

By default, all responses are returned with the namespaces that appear in the requested format disseminated from the repository. To remove namespaces, include the argument transform=localize in the request.

Arguments

Search fields
This section describes the search fields that are available in the DDSWS Search and UserSearch requests.
The repository contains fields that are extracted from each of the XML records within, and a given repository may contain records in many different native XML formats. Searches within a given field operate over the set of records that contain that field. For example, a search in the Fields may contain plain text, controlled vocabularies or encoded field values.

How search fields are generated
At index creation time, each record is inserted in the repository in its native XML format. The indexer extracts standard, XPath and custom search fields from the content of the native XML, and additional fields associated with the item may also be extracted from other sources, such as text derived from a crawl of the resource described by the metadata record. The indexer then generates a single entry containing each of the fields and inserts it into the repository. All records are guaranteed to contain certain fields such as the

Searching across and within specific XML formats
The Search request operates over and disseminates records in any available XML format. By default, searches operate over the available fields for all records in the repository regardless of format, and results may contain records of mixed XML formats. For example, a search for default:ocean searches for the term ocean in the default field across all records in the repository and may return records in

Requesting search results in a specific XML format: Certain XML formats can be disseminated from the service in multiple formats, for example records that reside natively as

Limiting search to specific XML formats: Each record contains the special field The xml format keys that may be used in the

Text versus stemmed text
When searching in a text field, exact terms are matched. For example a search for ocean will return all records that contain the exact term ocean in the given field.
Where indicated, certain textual fields have stemming applied to them using the Porter stemmer algorithm (snowball variation). When searching in a field that has been stemmed, all records containing morphologically similar terms in the given field are matched. For example a search for stems:ocean will return all records that contain the terms ocean, oceans or oceanic in the stems field. Note that when searching in a stemmed field, the client should not apply stemming to the terms it supplies for search. Stemming will be applied automatically by the search engine for these fields and no pre-processing is necessary by the client.
Standard Search Fields
Standard search fields are available across all XML formats that support them, which includes oai_dc, nsdl_dc, ncs_collect, adn, dlese_collect, dlese_anno, news_opps, concepts and all other formats that have them configured in a given DDS repository.
XPath Search Fields
XPath search fields provide separate searchable fields for the contents of every element and attribute found in the native XML of the records. For each element and attribute there are three forms of search fields: text, stemmed text and untokenized keywords. These provide a powerful, flexible way to search for specific text or data within and across the records in the repository. The XPath fields consist of a prefix followed by an XPath that addresses a specific XML element or attribute in the XML record. Prefixes are one of /text/, /stems/ or /key/, as used in the example queries below. The three types of search fields are processed in the following manner:
The XPaths used for the search fields are the simplest form of XPath expression, containing no namespaces or position specifiers. For more information about XPath see XPath Language 1.0. The ZVON XPath Tutorial is also useful. Note that this is not an implementation of XQuery but rather a mapping of simple XPaths to searchable Lucene fields. For example, consider this simple XML instance document:
<book>
<author birthDate="1955-01-25">
<firstName>John</firstName>
<lastName>Doe</lastName>
</author>
<identifier>http://books.org/catalog_123</identifier>
</book>
The index will contain the following search fields for this record:
As another example, consider the following Dublin Core
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
http://www.openarchives.org/OAI/2.0/oai_dc.xsd"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Ocean Science Leadership Awards</dc:title>
<dc:description xmlns:dc="http://purl.org/dc/elements/1.1/">This is a description of the
Ocean Science Leadership Awards... </dc:description>
<dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">Earth system science</dc:subject>
<dc:subject xmlns:dc="http://purl.org/dc/elements/1.1/">Education</dc:subject>
<dc:format xmlns:dc="http://purl.org/dc/elements/1.1/">text/html</dc:format>
<dc:type xmlns:dc="http://purl.org/dc/elements/1.1/">Text</dc:type>
<dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">
http://www.usc.edu/org/cosee-west/quikscience/OceanLeadershipAwards.html
</dc:identifier>
</oai_dc:dc>
The following Lucene queries are examples that match specific text and data in this record. As with all fielded Lucene queries, these queries consist of a field name followed by a colon ":" and then the term(s) to search for. Note that the XPaths do not contain namespaces or position specifiers:
/stems//dc/title:oceans - Matches the stemmed form of the term ocean found in the title element of the XML record.
/text//dc/subject:education - Matches the term education found in one of the subject elements of the XML record.
/key//dc/format:"text/html" - Matches the untokenized keyword term text/html found in the format element of the XML record.
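A client that assembles such clauses programmatically might use a small helper like the one below. It is a minimal sketch: the function name is hypothetical, and the quoting rule (quote any term that is not purely alphanumeric) is a conservative assumption rather than something the service prescribes.

```python
def xpath_field_query(kind, xpath, term):
    """Build a fielded clause such as /key//dc/format:"text/html".

    kind  - one of 'text', 'stems' or 'key'
    xpath - a simple XPath without namespaces or positions, e.g. /dc/title
    term  - the text to search for
    """
    if kind not in ("text", "stems", "key"):
        raise ValueError("kind must be 'text', 'stems' or 'key'")
    if not xpath.startswith("/"):
        raise ValueError("xpath must start with '/'")
    # Assumption: quoting is applied whenever the term contains anything
    # other than letters and digits. For a multi-word term this produces
    # a phrase query.
    if not term.isalnum():
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        term = '"' + escaped + '"'
    return "/" + kind + "/" + xpath + ":" + term


if __name__ == "__main__":
    print(xpath_field_query("stems", "/dc/title", "oceans"))
    print(xpath_field_query("text", "/dc/subject", "education"))
    print(xpath_field_query("key", "/dc/format", "text/html"))
```

The three calls reproduce the example queries given for the Dublin Core record.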
Determining which XPaths have been indexed
In addition to the XPath fields, a special field named indexedXpaths can be used to find records according to the XPaths that have been indexed for them. For example, the following query:
indexedXpaths:"/dc/subject" - Matches all records that have any value in the /dc/subject field.
Conversely, the following query:
allrecords:true !indexedXpaths:"/dc/subject" - Matches all records that have no value in the /dc/subject field.
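The two forms of the indexedXpaths query can be generated from an XPath as in this short sketch. The helper names are hypothetical; the query strings follow the two examples above exactly.

```python
def has_value(xpath):
    """Clause matching records that have any value at the given XPath."""
    return 'indexedXpaths:"' + xpath + '"'


def lacks_value(xpath):
    """Query matching records that have no value at the given XPath.

    The negated clause is applied to the set of all records, which is
    selected with the special clause allrecords:true.
    """
    return "allrecords:true !" + has_value(xpath)


if __name__ == "__main__":
    print(has_value("/dc/subject"))
    print(lacks_value("/dc/subject"))
```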
Custom Search Fields
Custom search fields are available for specific XML formats as indicated below. Additional custom search fields that are not described here may also be available for a given DDS repository configuration.
Textual content - These fields contain the text of the content of the resources themselves, extracted by crawling the first page of the resource. These are available for all ADN resources in the repository whose primary content is in HTML or PDF.
Textual vocabulary fields - These fields contain DLESE controlled vocabularies that have been indexed as plain text.
Encoded vocabulary fields - These fields contain DLESE-specific controlled vocabularies used in the adn and dlese_collect metadata frameworks that have been encoded into keys. Corresponding textual vocabulary fields are listed above, i.e. the same information is indexed both as text and as keys for these fields: gr - gradeRanges; re - resourceTypes; su - subjects; cs - contentStandards.
Defined key fields - These fields contain finite sets of key values that may be used to limit searches to a sub-set of records.
Fields available for searching by value or range of value - These fields may be searched by exact value or by range of value:
Fields available for searching by date - These fields may be supplied in the 'dateField' argument of the Search request:
Example search queries
This section shows some examples of performing searches using the Search or UserSearch request. To perform these searches, supply the values shown below in the 'q' argument, using the Lucene Query Syntax (LQS). Additional arguments may be supplied to the Search or UserSearch request to further limit the search, such as xmlFormat, dateField and the vocabulary fields gr, su, re and cs.

Search for the term 'ocean' in the default field:
ocean

Search for the term 'ocean' in the stems field. This will return documents containing morphologically similar terms including ocean, oceans and oceanic:
stems:ocean

Search for the terms 'currents in the oceans' in the stems field. Notice that the client should supply the plain English version of the terms without pre-stemming them. In this example the resulting search matches documents that contain one of currents, current or currently AND one of oceans, ocean or oceanic (the terms 'in' and 'the' are stop words that are dropped for the purpose of search):
stems:(currents in the oceans)

Search for resources that have an average star rating of 3.5 to 5.0:
itemannoaveragerating:[3.500 TO 5.000]

Search for resources that contain 'noaa.gov' in their URL:
url:http*noaa.gov*

Search for the term 'ocean' within resources from 'noaa.gov':
url:http*noaa.gov* AND ocean

Search for the term 'estuary' in the stems field, and limit the search to subject biological oceanography (subject key 02):
stems:estuary AND su:02

Search for the term 'ocean' in the default field, and boost the ranking of results that contain 'ocean' in their title (stemmed). This uses the special clause allrecords:true to select the set of all records. The query returns the same number of results as a search for the word 'ocean' in the default field alone, but records that contain the term 'ocean' in their title (stemmed) receive additional boosting, which raises their search rank. This example illustrates the kind of search rank augmentation that is applied automatically in the UserSearch request:
ocean AND (allrecords:true OR titlestems:ocean^2)

Show all records with subject biological oceanography, and boost results that contain florida in the title (stemmed), description or placeNames fields (uses the clause allrecords:true to select the set of all records):
su:02 AND (allrecords:true OR titlestems:florida*^20 OR description:florida*^20 OR placeNames:florida^20)
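To send one of these queries to the service, the 'q' value has to be URI-encoded and combined with the verb argument. The sketch below shows one way to do that in Python. The base URL is the one used in this document's example. The helper names are hypothetical, and any further arguments a request may require are simply passed through as keyword arguments, since this section does not list them.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Base URL taken from the example in this document; substitute the
# base URL of the repository being queried.
BASE_URL = "http://www.dlese.org/dds/services/ddsws1-1"


def search_url(q, verb="Search", **extra):
    """Build a request URL with the Lucene query in the 'q' argument.

    extra carries any additional argument=value pairs (for example
    xmlFormat or dateField); they are encoded but not validated.
    """
    params = {"verb": verb, "q": q}
    params.update(extra)
    # quote (rather than the default quote_plus) percent-encodes spaces
    # as %20, following the syntax rules for URIs.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


def rating_range(low, high):
    """Range clause over the average star rating.

    Assumption: values are written with three decimal places, as in
    the example query itemannoaveragerating:[3.500 TO 5.000].
    """
    return "itemannoaveragerating:[%.3f TO %.3f]" % (low, high)


if __name__ == "__main__":
    url = search_url("stems:estuary AND su:02")
    print(url)
    # Decoding the URL again recovers the original query string.
    print(parse_qs(urlsplit(url).query)["q"][0])
    print(search_url(rating_range(3.5, 5)))
```

Leaving the encoding to urlencode avoids hand-escaping characters such as ':', '*', '^' and the square brackets, all of which occur in the example queries.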
Glossary
whatsNewDate - A date that describes when an item was new to the repository. Generally this corresponds to the item's accession date or the date on which the item first became accessible in the repository.
Configure search fields
The following document provides information for system administrators who are installing and managing a DDS repository system, which includes the Digital Discovery System (DDS) and the NSDL Collection System (NCS).
John Weatherley
Last revised: $Date: 2010/04/27 23:14:53 $