ISO Identifiers

From NOAA Environmental Data Management Wiki

The need for identifiers in metadata records was first recognized in the DIF Standard and the FGDC Remote Sensing Extensions, which introduced identifiers for the metadata records themselves. In ISO 19115 this role is filled by the fileIdentifier, a character string included in the MD_Metadata or MI_Metadata object. In the upcoming revision of 19115, this character string will be replaced by an MD_Identifier.

Including fileIdentifiers in ISO metadata records gives metadata creators a mechanism for uniquely identifying those records. This is becoming more important as metadata records evolve from single files into collections of related objects that can be harvested into repositories like geo.data.gov along multiple paths. Without a unique identifier in the record itself, there is no reliable way to detect duplicate records.
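A minimal sketch of why the identifier matters during harvesting, assuming (hypothetically) that each harvested record has already been parsed into a Python dict with a "fileIdentifier" key. Copies that arrive along different paths collapse to a single entry only because they carry the same identifier:

```python
def deduplicate(records):
    """Keep the first harvested copy of each record, keyed by its fileIdentifier."""
    seen = set()
    unique = []
    for record in records:
        file_id = record.get("fileIdentifier")
        if file_id is None:
            # No identifier: nothing reliable to compare, so the record is kept.
            unique.append(record)
        elif file_id not in seen:
            seen.add(file_id)
            unique.append(record)
    return unique


# Hypothetical harvest: the same record reached the repository by two paths.
harvested = [
    {"fileIdentifier": "gov.noaa.class:AERO100", "path": "direct"},
    {"fileIdentifier": "gov.noaa.class:AERO100", "path": "via aggregator"},
    {"path": "unidentified source"},
]
print(len(deduplicate(harvested)))  # 2
```

The record without an identifier is always kept, even if it duplicates another record, which is exactly the gap described above.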

Identifier Object
ISO 19115 also includes an MD_Identifier object for associating identifiers with various objects in the metadata record. This object is being improved significantly in the revision of 19115: along with the identifier itself, it will include a codeSpace that gives the namespace for the identifier, a version for the identifier, and a description. The identifier is guaranteed to be unique within that namespace.
RS_Identifier Object
The RS_Identifier object includes several important additions, made because of the increased importance of namespaces in XML and Web environments and because of ambiguity in how the namespace is actually defined in the CI_Citation associated with an MD_Identifier.

The introduction of the RS_Identifier and the need for the additional attributes that it includes causes some confusion in ISO 19115 implementations. This will be resolved in the upcoming revision of 19115 by adding these attributes to MD_Identifier.

Digital Object Identifiers (DOIs) and Other Dataset Identifiers

Digital Object Identifiers (DOIs) are most commonly used to identify and cite published datasets. In the ISO standard these identifiers should be included as an MD_Identifier in the CI_Citation for the dataset, which describes how the dataset described by the metadata is to be cited. If the metadata record itself also has a DOI, that DOI belongs in the fileIdentifier.

As DOIs become ubiquitous, the doi: prefix is becoming widely recognized on the Internet. This means that browsers and other tools increasingly recognize that the string doi:10.5067/MEASURES/DMSP-F8/SSMI/DATA302 means the same thing as the URL http://dx.doi.org/10.5067/MEASURES/DMSP-F8/SSMI/DATA302. As this recognition becomes more common, it addresses the problem of identifiers that lack a straightforward mechanism for resolution.
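The equivalence between the two forms can be sketched as a simple string rewrite. This is a minimal illustration rather than a DOI client: the function name is hypothetical, and the resolver base is a parameter that defaults to the dx.doi.org address used in the example above.

```python
from urllib.parse import quote

DOI_PREFIX = "doi:"


def doi_to_url(identifier, resolver="http://dx.doi.org/"):
    """Rewrite a 'doi:' identifier as a resolver URL."""
    if not identifier.lower().startswith(DOI_PREFIX):
        raise ValueError(f"not a doi: identifier: {identifier!r}")
    suffix = identifier[len(DOI_PREFIX):]
    # Keep '/' (DOIs use it between prefix and suffix); percent-encode anything unsafe.
    return resolver + quote(suffix, safe="/")


print(doi_to_url("doi:10.5067/MEASURES/DMSP-F8/SSMI/DATA302"))
# http://dx.doi.org/10.5067/MEASURES/DMSP-F8/SSMI/DATA302
```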


How to Mint and Publish a DOI at NOAA

Structure

In the original 19115, identifiers include two elements: a code and an authority. The code is an alphanumeric value identifying an object in a namespace maintained by the authority, which is given as a CI_Citation. There is no agreed-upon approach for describing the namespace in that CI_Citation. The revision of 19115 makes significant additions to the MD_Identifier class, including three new CharacterStrings:

  • codeSpace: unambiguously defines the namespace for the identifier
  • version: a version for the identifier
  • description: a brief description of the meaning of the code

Two of these new elements are already included in RS_Identifier (the reference system identifier):

  • codeSpace: name or identifier of the person or organization responsible for the namespace
  • version: version identifier for the namespace
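A minimal sketch of what codeSpace adds: uniqueness is checked per (codeSpace, code) pair rather than per code alone. The class and field names are hypothetical Python stand-ins, not the ISO XML encoding, and the second namespace (org.example) is invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Identifier:
    """Hypothetical model of an identifier carrying the attributes listed above."""
    code: str
    code_space: str                    # namespace in which `code` must be unique
    version: Optional[str] = None
    description: Optional[str] = None


class IdentifierRegistry:
    """Rejects a second identifier with the same code in the same codeSpace."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], Identifier] = {}

    def register(self, ident: Identifier) -> None:
        key = (ident.code_space, ident.code)
        if key in self._entries:
            raise ValueError(f"code {ident.code!r} already used in {ident.code_space!r}")
        self._entries[key] = ident

    def __len__(self) -> int:
        return len(self._entries)


registry = IdentifierRegistry()
registry.register(Identifier("AERO100", "gov.noaa.class", version="1",
                             description="example product code"))
# The same code in a different namespace is a different identifier.
registry.register(Identifier("AERO100", "org.example"))
print(len(registry))  # 2
```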

Usage

Identifiers occur in many places in the ISO standard. The identifiers in Citations are particularly important because Citations also occur in many locations throughout the standard.

Usage Description and Xpath

Quality Measure Identifier

[Figure: where CI_Citations occur in DQ_Element]

/gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:report/gmd:DQ_Element/gmd:measureIdentification

Objective Identifier

[Figure: where identifiers occur in MI_Objective]

/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:objective/gmi:MI_Objective/gmi:citation

(CI_Citation + MD_Identifier)++

There are many cases in the ISO Standard where CI_Citations and MD_Identifiers are used together to reference and identify external resources. We term these (CI_Citation+MD_Identifier)++:

Usage Description and Xpath

Aggregate Citation and Identifier

[Figure: where CI_Citations and identifiers occur in MD_AggregateInformation]

(CI_Citation + MD_Identifier) + associationType + initiativeType = MD_AggregateInformation

/gmi:MI_Metadata/gmd:identificationInfo/(gmd:MD_DataIdentification | srv:SV_ServiceIdentification)/gmd:aggregationInfo/gmd:MD_AggregateInformation/gmd:aggregateDataSetName
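A sketch of reading these citation and identifier pairs with Python's standard xml.etree.ElementTree, assuming the ISO 19139 namespace URIs shown and a trimmed, hypothetical sample record (a schema-valid CI_Citation would also need a date). ElementTree supports only a subset of XPath, so the `*` step stands in for the choice between MD_DataIdentification and SV_ServiceIdentification.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmi": "http://www.isotc211.org/2005/gmi",
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

SAMPLE = """<gmi:MI_Metadata xmlns:gmi="http://www.isotc211.org/2005/gmi"
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:aggregationInfo>
        <gmd:MD_AggregateInformation>
          <gmd:aggregateDataSetName>
            <gmd:CI_Citation>
              <gmd:title><gco:CharacterString>Example parent project</gco:CharacterString></gmd:title>
            </gmd:CI_Citation>
          </gmd:aggregateDataSetName>
          <gmd:aggregateDataSetIdentifier>
            <gmd:MD_Identifier>
              <gmd:code><gco:CharacterString>EXAMPLE-001</gco:CharacterString></gmd:code>
            </gmd:MD_Identifier>
          </gmd:aggregateDataSetIdentifier>
        </gmd:MD_AggregateInformation>
      </gmd:aggregationInfo>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmi:MI_Metadata>"""


def aggregate_pairs(xml_text):
    """Return (citation title, identifier code) for each MD_AggregateInformation."""
    root = ET.fromstring(xml_text)
    pairs = []
    # "*" matches gmd:MD_DataIdentification or srv:SV_ServiceIdentification alike.
    path = "gmd:identificationInfo/*/gmd:aggregationInfo/gmd:MD_AggregateInformation"
    for info in root.findall(path, NS):
        title = info.findtext(
            "gmd:aggregateDataSetName/gmd:CI_Citation/gmd:title/gco:CharacterString",
            namespaces=NS)
        code = info.findtext(
            "gmd:aggregateDataSetIdentifier/gmd:MD_Identifier/gmd:code/gco:CharacterString",
            namespaces=NS)
        pairs.append((title, code))
    return pairs


print(aggregate_pairs(SAMPLE))  # [('Example parent project', 'EXAMPLE-001')]
```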

Software Citation and Identifier

[Figure: where CI_Citations occur in LE_Processing]

(CI_Citation + MD_Identifier) + description + scaleDenominator + sourceReferenceSystem + sourceExtent + processedLevel + resolution + sourceStep = LE_Source

/gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:processStep/gmi:LE_ProcessStep/gmi:processingInformation/gmi:LE_Processing/gmi:softwareReference

Operation Citation and Identifier

[Figure: where CI_Citations and identifiers occur in MI_Operation]

(CI_Citation + MD_Identifier) + description + status + type = MI_Operation

/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:operation/gmi:MI_Operation/gmi:citation

Instrument Citation and Identifier

[Figure: where CI_Citations and identifiers occur in MI_Instrument]

(CI_Citation + MD_Identifier) + description + type = MI_Instrument

/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:instrument/gmi:MI_Instrument/gmi:citation

Platform Citation and Identifier

[Figure: where CI_Citations and identifiers occur in MI_Platform]

(CI_Citation + MD_Identifier) + description + sponsor = MI_Platform

/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:platform/gmi:MI_Platform/gmi:citation

Requirement Citation and Identifier

[Figure: where CI_Citations and identifiers occur in MI_Requirement]

(CI_Citation + MD_Identifier) + requestor + recipient + priority = MI_Requirement

/gmi:MI_Metadata/gmi:acquisitionInformation/gmi:MI_AcquisitionInformation/gmi:requirement/gmi:MI_Requirement/gmi:citation
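The Operation, Instrument and Platform rows above share one pattern, so a single loop can collect their citation titles. This sketch uses xml.etree.ElementTree and a small, hypothetical record whose instrument and platform titles are invented; it writes acquisitionInformation in the gmi namespace, which is how ISO 19115-2 XML encodes that element.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmi": "http://www.isotc211.org/2005/gmi",
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# (property, class) pairs taken from the Operation, Instrument and Platform rows.
ACQUISITION_ITEMS = [
    ("gmi:operation", "gmi:MI_Operation"),
    ("gmi:instrument", "gmi:MI_Instrument"),
    ("gmi:platform", "gmi:MI_Platform"),
]

SAMPLE = """<gmi:MI_Metadata xmlns:gmi="http://www.isotc211.org/2005/gmi"
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmi:acquisitionInformation>
    <gmi:MI_AcquisitionInformation>
      <gmi:instrument>
        <gmi:MI_Instrument>
          <gmi:citation><gmd:CI_Citation>
            <gmd:title><gco:CharacterString>Example radiometer</gco:CharacterString></gmd:title>
          </gmd:CI_Citation></gmi:citation>
        </gmi:MI_Instrument>
      </gmi:instrument>
      <gmi:platform>
        <gmi:MI_Platform>
          <gmi:citation><gmd:CI_Citation>
            <gmd:title><gco:CharacterString>Example satellite</gco:CharacterString></gmd:title>
          </gmd:CI_Citation></gmi:citation>
        </gmi:MI_Platform>
      </gmi:platform>
    </gmi:MI_AcquisitionInformation>
  </gmi:acquisitionInformation>
</gmi:MI_Metadata>"""


def acquisition_citations(xml_text):
    """Map each MI_* class name to the titles of its CI_Citations."""
    root = ET.fromstring(xml_text)
    base = "gmi:acquisitionInformation/gmi:MI_AcquisitionInformation"
    titles = {}
    for prop, cls in ACQUISITION_ITEMS:
        path = f"{base}/{prop}/{cls}/gmi:citation/gmd:CI_Citation/gmd:title/gco:CharacterString"
        titles[cls] = [el.text for el in root.findall(path, NS)]
    return titles


print(acquisition_citations(SAMPLE)["gmi:MI_Instrument"])  # ['Example radiometer']
```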

fileIdentifiers and parentIdentifiers

One of the ironic aspects of ISO 19115 is that the identifiers for metadata records (/gmi:MI_Metadata/gmd:fileIdentifier/gco:CharacterString and /gmi:MI_Metadata/gmd:parentIdentifier/gco:CharacterString) are CharacterStrings rather than MD_Identifiers. To help ensure uniqueness, these strings should include a namespace and a code guaranteed to be unique in that namespace. For example:

<gmd:fileIdentifier>
  <gco:CharacterString>gov.noaa.class:AERO100</gco:CharacterString>
</gmd:fileIdentifier>

Here, gov.noaa.class is the namespace and AERO100 is a code guaranteed to be unique in that namespace. In this case the code is also meaningful to the data provider. Creating meaningful identifiers that are unique across a large collection is often difficult, so it may make sense to use UUIDs for file names and identifiers, although this takes some getting used to.
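A minimal sketch of this convention, assuming the colon separates namespace and code as in the example (the standard itself does not mandate a separator). The function names are hypothetical; uuid.uuid4() supplies the random-UUID option, and ElementTree reads the fileIdentifier from a record like the one shown.

```python
import uuid
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def make_file_identifier(namespace, code=None):
    """Join a namespace and a code; fall back to a random UUID when no code is given."""
    return f"{namespace}:{code if code is not None else uuid.uuid4()}"


def split_file_identifier(value):
    """Split 'namespace:code' at the first colon."""
    namespace, sep, code = value.strip().partition(":")
    if not (sep and namespace and code):
        raise ValueError(f"expected 'namespace:code', got {value!r}")
    return namespace, code


def file_identifier_from_xml(xml_text):
    """Read /gmi:MI_Metadata/gmd:fileIdentifier/gco:CharacterString."""
    root = ET.fromstring(xml_text)
    value = root.findtext("gmd:fileIdentifier/gco:CharacterString", namespaces=NS)
    if value is None:
        raise ValueError("record has no fileIdentifier")
    return value.strip()


RECORD = """<gmi:MI_Metadata xmlns:gmi="http://www.isotc211.org/2005/gmi"
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:fileIdentifier>
    <gco:CharacterString>gov.noaa.class:AERO100</gco:CharacterString>
  </gmd:fileIdentifier>
</gmi:MI_Metadata>"""

print(split_file_identifier(file_identifier_from_xml(RECORD)))
# ('gov.noaa.class', 'AERO100')
print(make_file_identifier("gov.noaa.class"))  # namespace followed by a random UUID
```

A UUID gives up meaning in exchange for uniqueness that needs no central coordination within the namespace.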

The upcoming revision of ISO 19115 changes the type of fileIdentifier and parentIdentifier to MD_Identifier (see the Structure section above).

Should the fileIdentifier match the file name? There is no rule in ISO that specifies a relationship between the fileIdentifier and the file name. It is, however, very convenient to have the file name available from within the record, particularly for supporting access to the file name when transforming the XML into HTML.

Managing ISO Metadata in a Database

Including identifiers in ISO metadata records gives metadata creators, for the first time, a mechanism for uniquely identifying metadata records and pieces of those records. The importance of unique identifiers is well known to people who use relational database management systems: they are the primary keys that identify items and make relationships possible. This is also becoming more important as metadata records are harvested into repositories like Geospatial One-Stop along multiple paths. Without an identifier in the record itself, there is no reliable way to identify duplicate records.
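A minimal sketch of the primary-key role using Python's built-in sqlite3 module. The table layout is a hypothetical illustration (not a NOAA schema), and gov.noaa.class:AERO is an invented parent record. fileIdentifier is the primary key, so a record harvested twice replaces its earlier row, and parentIdentifier links a record to its parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# fileIdentifier is the primary key; parentIdentifier points at another record's key.
conn.execute(
    """CREATE TABLE metadata_record (
           file_identifier   TEXT PRIMARY KEY,
           parent_identifier TEXT REFERENCES metadata_record (file_identifier),
           xml               TEXT NOT NULL
       )"""
)


def store(conn, file_id, parent_id, xml):
    """Insert a record; a re-harvested copy replaces the row with the same key."""
    conn.execute(
        "INSERT OR REPLACE INTO metadata_record VALUES (?, ?, ?)",
        (file_id, parent_id, xml),
    )


store(conn, "gov.noaa.class:AERO", None, "<collection/>")
store(conn, "gov.noaa.class:AERO100", "gov.noaa.class:AERO", "<granule/>")
store(conn, "gov.noaa.class:AERO100", "gov.noaa.class:AERO", "<granule/>")  # harvested twice

print(conn.execute("SELECT COUNT(*) FROM metadata_record").fetchone()[0])  # 2
children = conn.execute(
    "SELECT file_identifier FROM metadata_record WHERE parent_identifier = ?",
    ("gov.noaa.class:AERO",),
).fetchall()
print(children)  # [('gov.noaa.class:AERO100',)]
```

SQLite does not enforce the REFERENCES constraint unless foreign keys are switched on for the connection; the sketch relies only on the primary key.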

Serving Metadata With a REST web service

Number of Identifiers

The REST approach to web services is becoming increasingly common. The first principle of REST is “Give Everything an ID”. Given this principle, one measure of how “RESTful” a metadata standard might be is the number of IDs it includes. Comparing the number of IDs in four metadata standards: the original FGDC standard has none, the Directory Interchange Format (DIF) and the FGDC Remote Sensing Extensions (RSE) have one each, and ISO 19115(-2) has 36.

These IDs make it possible to reference many individual elements in an ISO metadata record directly, so the ISO standard is well suited to a RESTful approach.
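A sketch of how identifiers could become resource URLs in such a service. The base URL and the "platforms" route are hypothetical, and NOAA-19 is used only as an example platform code. Each identifier is percent-encoded so that characters such as ':' or the '/' inside a DOI stay within a single path segment.

```python
from urllib.parse import quote

BASE_URL = "https://metadata.example.org/records"  # hypothetical service root


def resource_url(file_identifier, *path):
    """Build a URL for a record, or for an identified element inside it."""
    # safe="" percent-encodes '/' and ':' so each identifier stays one path segment.
    segments = [quote(file_identifier, safe="")] + [quote(p, safe="") for p in path]
    return "/".join([BASE_URL] + segments)


print(resource_url("gov.noaa.class:AERO100"))
# https://metadata.example.org/records/gov.noaa.class%3AAERO100
print(resource_url("gov.noaa.class:AERO100", "platforms", "NOAA-19"))
# https://metadata.example.org/records/gov.noaa.class%3AAERO100/platforms/NOAA-19
```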