NetCDF Attribute Convention for Dataset Discovery Conformance Test

From NOAA Environmental Data Management Wiki
NcML Rubric
Being able to test conformance with a convention or set of recommendations is important for data providers and users. This page describes a Conformance Test for the NetCDF Attribute Convention for Dataset Discovery. A rubric is a helpful tool for presenting the results of conformance tests (see example at right). This rubric divides the recommended attributes into eight groups (or spirals), each shown as a row in the rubric. The columns indicate how complete each spiral is in the record being tested. The example at right indicates that the Identification, Extent Search, Contributor, and Other Attributes spirals are complete; the Text Search, Other Extent Information, and Creator spirals are almost complete; and the Publisher spiral is empty. The data provider can use this information to identify areas where new content may be needed.

This rubric is implemented using an XML transformation (an XSLT stylesheet) that counts the recommended fields in a record and reports those counts in HTML. The stylesheet can be used on the desktop to help create complete documentation as part of the netCDF creation process.
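The counting approach described above can also be sketched outside XSLT. The following Python fragment is only an illustration under stated assumptions, not the stylesheet itself: the `SPIRALS` mapping covers just two of the groups, and the function names and sample attribute values are hypothetical.

```python
# Hypothetical subset of the rubric: spiral name -> recommended attribute names.
SPIRALS = {
    "Identification": ["id", "naming_authority", "Metadata_Convention", "Metadata_Link"],
    "Publisher": ["publisher_name", "publisher_url", "publisher_email"],
}


def count_spiral(global_attrs, names):
    """Return (found, total): recommended attributes that are present and non-empty."""
    found = sum(1 for name in names if str(global_attrs.get(name, "")).strip())
    return found, len(names)


def rubric_report(global_attrs):
    """Count every spiral and return {spiral: (found, total)}."""
    return {spiral: count_spiral(global_attrs, names) for spiral, names in SPIRALS.items()}


# Sample global attributes (made-up values, for illustration only).
attrs = {
    "id": "example_dataset_001",
    "naming_authority": "org.example",
    "Metadata_Convention": "Unidata Dataset Discovery v1.0",
    "Metadata_Link": "https://example.org/metadata/example_dataset_001.xml",
}

report = rubric_report(attrs)
for spiral, (found, total) in report.items():
    print(f"{spiral}: {found} of {total}")
```

In this sample the Identification spiral would be reported as complete and the Publisher spiral as empty, which corresponds to two of the rubric columns described above.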

The rubric sections are described below.

Identification and Metadata Link

As metadata are shared between national and international repositories, it is becoming increasingly important to be able to unambiguously identify and refer to specific records. This is facilitated by including an identifier in the metadata. Some mechanism must exist for ensuring that these identifiers are unique. This is accomplished by specifying the naming authority or namespace for the identifier. It is the responsibility of the manager of the namespace to ensure that the identifiers in that namespace are unique. Identifying the metadata convention being used in the file is also very important.

The netCDF metadata model is focused on providing "use metadata" for the data included in the file (or granule). Other metadata dialects (e.g. ISO 19115) can provide information about collections and more details about the dataset. To make users aware of that additional metadata, we recommend adding a global attribute named "Metadata_Link" to the netCDF file. The value of this attribute is a URL that gives the location of the more complete metadata. This element is not included in the current version of the NetCDF Attribute Convention for Dataset Discovery.

Total count: 4 Unidata Categories: Highly Recommended:0, Recommended:2, Suggested:0

| Attribute | Description | THREDDS | ISO 19115-2 |
| --- | --- | --- | --- |
| id | The combination of the "naming authority" and the "id" should be a globally unique identifier for the dataset. | dataset@id | /gmi:MI_Metadata/gmd:fileIdentifier/gco:CharacterString |
| naming_authority | | | /gmi:MI_Metadata/gmd:fileIdentifier/gco:CharacterString |
| Metadata_Convention | This attribute should be set to "Unidata Dataset Discovery v1.0" for NetCDF files that follow this convention. | | |
| Metadata_Link | This attribute provides a link to a complete metadata record for this dataset or the collection that contains this dataset. This attribute is not included in Version 1 of the Unidata Attribute Convention for Data Discovery. It is recommended here because a complete metadata collection for a dataset will likely contain more information than can be included in granule formats. This attribute contains a link to that information. | | |
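As a rough illustration of how "naming_authority" and "id" work together, the sketch below joins the two values and checks a set of records for collisions. The ":" separator, the function names, and the sample records are assumptions made for this example; the convention does not prescribe how the two values are combined.

```python
def qualified_id(global_attrs):
    """Join naming_authority and id; return None if either is missing."""
    authority = global_attrs.get("naming_authority", "").strip()
    local_id = global_attrs.get("id", "").strip()
    if not authority or not local_id:
        return None
    # The ":" separator is an assumption, not part of the convention.
    return f"{authority}:{local_id}"


def find_duplicates(records):
    """Return the qualified identifiers that occur more than once, sorted."""
    seen, duplicates = set(), set()
    for attrs in records:
        key = qualified_id(attrs)
        if key is None:
            continue  # records without both attributes cannot be checked
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return sorted(duplicates)


# Made-up records: the same id under two namespaces is fine, a repeat is not.
records = [
    {"naming_authority": "org.example.a", "id": "sst_2010"},
    {"naming_authority": "org.example.b", "id": "sst_2010"},
    {"naming_authority": "org.example.a", "id": "sst_2010"},
    {"id": "orphan"},
]

print(find_duplicates(records))
```

The same "id" under two different naming authorities is not flagged, which is the point of delegating uniqueness to the manager of each namespace.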

Text Search

Text searches are a very important mechanism for data discovery. This group includes attributes that contain descriptive text that could be the target of these searches. Some of these attributes, for example title and summary, might also be displayed in the results of text searches.

Total count: 7 Unidata Categories: Highly Recommended:3, Recommended:4, Suggested:0

| Attribute | Description | THREDDS | ISO 19115-2 | Rubric Category |
| --- | --- | --- | --- | --- |
| title | A short description of the dataset. | dataset@name | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString (M) | Text Search |
| summary | A paragraph describing the dataset. | metadata/documentation[@type="summary"] | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:abstract/gco:CharacterString (M) | |
| keywords | A comma separated list of key words and phrases. | metadata/keyword | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:descriptiveKeywords/gmd:MD_Keywords/gmd:keyword/gco:CharacterString | |
| keywords_vocabulary | If you are following a guideline for the words/phrases in your "keywords" attribute, put the name of that guideline here. | metadata/keyword@vocabulary | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:descriptiveKeywords/gmd:MD_Keywords/gmd:thesaurusName/gmd:CI_Citation/gmd:title/gco:CharacterString | |
| standard_name_vocabulary | The name of the controlled vocabulary from which variable standard names are taken. | metadata/variables@vocabulary | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:descriptiveKeywords/gmd:MD_Keywords/gmd:thesaurusName/gmd:CI_Citation/gmd:title/gco:CharacterString | |
| history | Provides an audit trail for modifications to the original data. | metadata/documentation[@type="history"] | /gmi:MI_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:statement/gco:CharacterString (O) | |
| comment | Miscellaneous information about the data. | metadata/documentation | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:supplementalInformation | |
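Because "keywords" is stored as a single comma separated string, a search tool has to split it before matching. The following is a minimal sketch of that step together with a naive case-insensitive search over the title, summary, and keywords; the function names and sample values are hypothetical, and real catalog search engines do considerably more.

```python
def split_keywords(value):
    """Split a comma separated keywords attribute into trimmed, non-empty phrases."""
    return [phrase.strip() for phrase in value.split(",") if phrase.strip()]


def matches(global_attrs, term):
    """Case-insensitive substring search over title, summary, and keywords."""
    term = term.lower()
    text_fields = [global_attrs.get("title", ""), global_attrs.get("summary", "")]
    text_fields.extend(split_keywords(global_attrs.get("keywords", "")))
    return any(term in field.lower() for field in text_fields)


# Made-up attribute values, for illustration only.
attrs = {
    "title": "Example ocean temperature analysis",
    "summary": "A daily gridded analysis used to illustrate text search.",
    "keywords": "sea surface temperature, SST , ,ocean",
}

print(split_keywords(attrs["keywords"]))
print(matches(attrs, "sst"), matches(attrs, "salinity"))
```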

Extent Search

This basic extent information supports spatial and temporal searches, which are increasingly important as the number of map-based search interfaces grows. Many of the attributes included in this spiral can be calculated from the data if the file is compliant with the NetCDF Climate and Forecast (CF) Metadata Convention.

Total count: 8 Unidata Categories: Highly Recommended:0, Recommended:8, Suggested:0

| Attribute | Description | THREDDS | ISO 19115-2 |
| --- | --- | --- | --- |
| geospatial_lat_min | Describes a simple latitude, longitude bounding box. For a more detailed geospatial coverage, see the suggested geospatial attributes. | metadata/geospatialCoverage/northsouth/start | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:geographicElement/gmd:EX_GeographicBoundingBox/gmd:southBoundLatitude/gco:Decimal |
| geospatial_lat_max | | metadata/geospatialCoverage/northsouth/size | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:geographicElement/gmd:EX_GeographicBoundingBox/gmd:northBoundLatitude/gco:Decimal |
| geospatial_lon_min | | metadata/geospatialCoverage/eastwest/start | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:geographicElement/gmd:EX_GeographicBoundingBox/gmd:westBoundLongitude/gco:Decimal |
| geospatial_lon_max | | metadata/geospatialCoverage/eastwest/size | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:geographicElement/gmd:EX_GeographicBoundingBox/gmd:eastBoundLongitude/gco:Decimal |
| time_coverage_start | Describes the temporal coverage of the data as a time range. | metadata/timeCoverage/start | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:temporalElement/gmd:EX_TemporalExtent/gmd:extent/gml:TimePeriod/gml:beginPosition |
| time_coverage_end | | metadata/timeCoverage/end | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:temporalElement/gmd:EX_TemporalExtent/gmd:extent/gml:TimePeriod/gml:endPosition |
| geospatial_vertical_min | Describes the vertical coverage of the data. | metadata/geospatialCoverage/updown/start | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:verticalElement/gmd:EX_VerticalExtent/gmd:minimumValue/gco:Real |
| geospatial_vertical_max | | metadata/geospatialCoverage/updown/size | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:verticalElement/gmd:EX_VerticalExtent/gmd:maximumValue/gco:Real |
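As a rough illustration of how these extent attributes might be calculated from the data, the sketch below derives the bounding box and time range from coordinate values that have already been read from a file. The function name, the use of plain Python lists, and the ISO 8601 output format are assumptions made for the example; reading the coordinate variables from a CF-compliant file is left to whatever netCDF library is in use.

```python
from datetime import datetime, timezone


def extent_attributes(lats, lons, times):
    """Derive bounding-box and time-range attributes from coordinate values.

    Assumes latitudes and longitudes in degrees and timezone-aware UTC datetimes.
    The simple min/max on longitude does not handle boxes that cross the
    antimeridian.
    """
    time_format = "%Y-%m-%dT%H:%M:%SZ"  # ISO 8601 in UTC (an assumed choice)
    return {
        "geospatial_lat_min": min(lats),
        "geospatial_lat_max": max(lats),
        "geospatial_lon_min": min(lons),
        "geospatial_lon_max": max(lons),
        "time_coverage_start": min(times).strftime(time_format),
        "time_coverage_end": max(times).strftime(time_format),
    }


# Made-up coordinate values, for illustration only.
lats = [-10.0, 0.0, 10.0]
lons = [100.0, 110.0, 120.0]
times = [
    datetime(2010, 1, 1, tzinfo=timezone.utc),
    datetime(2010, 1, 31, 12, tzinfo=timezone.utc),
]

extent = extent_attributes(lats, lons, times)
for name, value in extent.items():
    print(f"{name} = {value}")
```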

Other Extent Information

This information provides more details on the extent attributes than the basic information included in the Extent Search spiral. Many of the attributes included in this spiral can be calculated from the data if the file is compliant with the NetCDF Climate and Forecast (CF) Metadata Convention.

Total count: 7 Unidata Categories: Highly Recommended:0, Recommended:0, Suggested:7

| Attribute | Description | THREDDS | ISO 19115-2 |
| --- | --- | --- | --- |
| geospatial_lat_units | Further refinement of the geospatial bounding box can be provided by using these units and resolution attributes. | metadata/geospatialCoverage/northsouth/units | /gmi:MI_Metadata/gmd:spatialRepresentationInfo/gmd:MD_Georectified/gmd:axisDimensionProperties/gmd:MD_Dimension/gmd:resolution/gco:Measure/@uom |
| geospatial_lat_resolution | | metadata/geospatialCoverage/northsouth/resolution | /gmi:MI_Metadata/gmd:spatialRepresentationInfo/gmd:MD_Georectified/gmd:axisDimensionProperties/gmd:MD_Dimension/gmd:resolution/gco:Measure |
| geospatial_lon_units | | metadata/geospatialCoverage/eastwest/units | /gmi:MI_Metadata/gmd:spatialRepresentationInfo/gmd:MD_Georectified/gmd:axisDimensionProperties/gmd:MD_Dimension/gmd:resolution/gco:Measure/@uom |
| geospatial_lon_resolution | | metadata/geospatialCoverage/eastwest/resolution | /gmi:MI_Metadata/gmd:spatialRepresentationInfo/gmd:MD_Georectified/gmd:axisDimensionProperties/gmd:MD_Dimension/gmd:resolution/gco:Measure |
| geospatial_vertical_units | | metadata/geospatialCoverage/updown/units | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:extent/gmd:EX_Extent/gmd:verticalElement/gmd:EX_VerticalExtent/gmd:verticalCRS |
| geospatial_vertical_resolution | | metadata/geospatialCoverage/updown/resolution | |
| geospatial_vertical_positive | | metadata/geospatialCoverage@zpositive | |
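A resolution attribute can be derived from a coordinate axis when the axis is evenly spaced. The sketch below is one possible approach, not a prescribed method: the function name, the tolerance value, and the decision to return None for uneven axes are all assumptions made for the example.

```python
def axis_resolution(values, tolerance=1e-6):
    """Return the common spacing of an evenly spaced axis.

    Returns None when the axis has fewer than two points or when the spacing
    between neighbours varies by more than `tolerance`.
    """
    if len(values) < 2:
        return None
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    first = steps[0]
    if any(abs(step - first) > tolerance for step in steps):
        return None
    return first


# Made-up coordinate values, for illustration only.
lats = [-10.0, -7.5, -5.0, -2.5]
lons = [100.0, 100.5, 101.0]
uneven = [0.0, 1.0, 3.0]

resolution_attrs = {
    "geospatial_lat_resolution": axis_resolution(lats),
    "geospatial_lon_resolution": axis_resolution(lons),
}
print(resolution_attrs)
print(axis_resolution(uneven))
```

For an unevenly spaced axis it is probably safer to leave the resolution attribute out than to report an average spacing.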

Creator Search

This group includes attributes that could support searches for the people, institutions, and projects that are responsible for datasets. This information is also critical for the correct attribution of the people and institutions that produce datasets.

Total Count: 9

| Attribute | Description | THREDDS | ISO 19115-2 |
| --- | --- | --- | --- |
| creator_name | The data creator's name, URL, and email. The "institution" attribute will be used if the "creator_name" attribute does not exist. | metadata/creator/name | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation/gmd:citedResponsibleParty/gmd:CI_ResponsibleParty/gmd:individualName/gco:CharacterString with CI_RoleCode="originator" |
| creator_url | | metadata/creator/contact@url | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation/gmd:citedResponsibleParty/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:onlineResource/gmd:CI_OnlineResource/gmd:linkage/gmd:URL |
| creator_email | | metadata/creator/contact@email | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation/gmd:citedResponsibleParty/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:electronicMailAddress/gco:CharacterString |
| institution | | metadata/creator/name | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation/gmd:citedResponsibleParty/gmd:CI_ResponsibleParty/gmd:organisationName/gco:CharacterString |
| date_created | The date on which the data was created. | metadata/date[@type="created"] | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation/gmd:date/gmd:CI_Date/gmd:date/gco:Date with /gmd:dateType/gmd:CI_DateTypeCode="creation" |
| date_modified | The date on which this data was last modified. | metadata/date[@type="modified"] | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation/gmd:date/gmd:CI_Date/gmd:date/gco:Date with /gmd:dateType/gmd:CI_DateTypeCode="revision" |
| date_issued | The date on which this data was formally issued. | metadata/date[@type="issued"] | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation/gmd:date/gmd:CI_Date/gmd:date/gco:Date with /gmd:dateType/gmd:CI_DateTypeCode="publication" |
| project | The scientific project that produced the data. | metadata/project | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:aggregationInfo/gmd:MD_AggregateInformation/gmd:aggregateDataSetName/gmd:CI_Citation/gmd:title/gco:CharacterString with DS_AssociationTypeCode="largerWorkCitation" and DS_InitiativeTypeCode="project", and/or /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:descriptiveKeywords/gmd:MD_Keywords/gmd:keyword/gco:CharacterString with gmd:MD_KeywordTypeCode="project" |
| acknowledgment | A place to acknowledge various types of support for the project that produced this data. | metadata/documentation[@type="funding"] | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:credit/gco:CharacterString |

Publisher Searches

This section allows a data provider to include contact information for the publisher of a data product in the metadata for the product.

Total count: 3 Unidata Categories: Highly Recommended:0, Recommended:0, Suggested:3

| Attribute | Description | THREDDS | ISO 19115-2 |
| --- | --- | --- | --- |
| publisher_name | The data publisher's name, URL, and email. The publisher may be an individual or an institution. | metadata/publisher/name | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation/gmd:citedResponsibleParty/gmd:CI_ResponsibleParty/gmd:individualName/gco:CharacterString with CI_RoleCode="publisher" |
| publisher_url | | metadata/publisher/contact@url | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation/gmd:citedResponsibleParty/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:onlineResource/gmd:CI_OnlineResource/gmd:linkage/gmd:URL with CI_RoleCode="publisher" |
| publisher_email | | metadata/publisher/contact@email | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation/gmd:citedResponsibleParty/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:electronicMailAddress/gco:CharacterString with CI_RoleCode="publisher" |

Other Attributes

This group includes attributes that don't seem to fit in the other categories.

Total count: 3 Unidata Categories: Highly Recommended:0, Recommended:3, Suggested:0

| Attribute | Description | THREDDS | ISO 19115-2 |
| --- | --- | --- | --- |
| license | Describe the restrictions to data access and distribution. | metadata/documentation[@type="rights"] | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:resourceConstraints/gmd:MD_LegalConstraints/gmd:useLimitation/gco:CharacterString |
| processing_level | A textual description of the processing (or quality control) level of the data. | metadata/documentation[@type="processing_level"] | |
| cdm_data_type | The THREDDS data type appropriate for this dataset. | metadata/dataType | /gmi:MI_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:spatialRepresentationType/gmd:MD_SpatialRepresentationTypeCode. May need some extensions to this codelist. Current values: vector, grid, textTable, tin, stereoModel, video. |

Scores / Weights

The scores computed using this test are simple unweighted counts. The maximum possible total is 43. Our understanding of the scores will increase as the test is applied to more real-world examples. At this point, the scores are probably more useful for monitoring improvements in single records or related sets of records than for making comparisons across record sets. Of course, the scores also reflect the data provider's data discovery goals. It could be that those goals are achieved with minimal discovery information (e.g. a geospatial/temporal bounding box). These reports are divided into sections that correspond to the types of searches that are to be supported. Unidata has classified the attributes into three categories: Highly Recommended, Recommended, and Suggested. The number of fields in each category is shown for each group in the report.
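Monitoring the improvement of a single record between two revisions can be sketched as follows. This is an illustration only: the `RECOMMENDED` list is a small hypothetical subset of the 43 attributes, and the function names and sample values are not part of the test itself.

```python
# Hypothetical subset of the recommended attribute names, for illustration.
RECOMMENDED = ["title", "summary", "keywords", "id", "naming_authority", "license"]


def is_filled(global_attrs, name):
    """True when the attribute is present and not blank."""
    return bool(str(global_attrs.get(name, "")).strip())


def score(global_attrs, recommended=RECOMMENDED):
    """Unweighted score: one point per recommended attribute that is filled."""
    return sum(1 for name in recommended if is_filled(global_attrs, name))


def improvements(before, after, recommended=RECOMMENDED):
    """Recommended attributes filled in `after` that were not filled in `before`."""
    return [
        name
        for name in recommended
        if is_filled(after, name) and not is_filled(before, name)
    ]


# Two made-up revisions of the same record.
before = {"title": "Example SST analysis", "summary": ""}
after = {
    "title": "Example SST analysis",
    "summary": "A daily gridded analysis used to illustrate scoring.",
    "keywords": "sea surface temperature",
    "license": "Freely available",
}

print(score(before), "->", score(after), "of", len(RECOMMENDED))
print("added:", improvements(before, after))
```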