Coverages and ISO Metadata

From NOAA Environmental Data Management Wiki

When is a Feature a Coverage?

Historically, geospatial tools have been divided into two groups: those that deal well with gridded datasets, termed "raster GIS", and those that deal with vectors and points, termed "vector GIS". This distinction developed over time because of differences between the technologies used to process and display these data types. It does not exist in the conceptual definition of coverages (ISO 19123), which includes both continuous coverages (grids) and discrete coverages (points and vectors). As tools for managing both data types have become more integrated, the distinction is becoming less important.

The distinction between these two fundamental data types appears in ISO 19115 as two separate instantiations of the MD_ContentInformation object: MD_CoverageDescription for describing several types of grids and MD_FeatureCatalogueDescription for describing feature types (vector and point data). Given the evolution of concepts and tools described above, the division between grids and features will probably become less important with time. As this convergence proceeds, it seems likely that the content of datasets with MD_SpatialRepresentationTypeCode = vector or tin can also be described in the context of a coverage.

The discussion here focuses on the grid side, but many of the ideas are equally applicable to features, which give the values of a set of attributes at a set of locations. Such a dataset could easily be stored in a database or a spreadsheet.

Multi-Dimensional Gridded Datasets

Multi-Dimensional Dataset
Many satellite data sets are made up of multiple gridded parameters packaged together in a single file. A classic example is the NESDIS 50km Sea Surface Temperature product described in the NESDIS Polar Data Users Guide. As described in the guide and shown schematically in Figure Multi-Dimensional Dataset, the product includes physical measurements, auxiliary data, and quality information presented in a series of grids. The files also include a significant amount of scalar and vector information in "header" variables.

Understanding how to describe data like these is critical to adoption of the ISO standard in the satellite community. The approach proposed here is a first step towards that goal. It requires some changes in the standard that are clearly described here.

ISO ContentInformation and Coverages

Content Information
Information about the content of a resource is held in the MD_ContentInformation part of the ISO Standard shown in Figure Content Information. This section of the standard includes descriptions of three types of content. Feature Types, i.e. points, lines, polygons, are described in the MD_FeatureCatalogueDescription; images are described using the MD_ImageDescription; and coverages are described using the MD_CoverageDescription. The first two and the related codeList, shown in red, are not relevant to this discussion, so the actual UML we need to consider is greatly simplified (see Figure Coverage Information).

Coverage Information


Coverage Information-2
The ISO 19115 Standard was extended (ISO 19115-2) in order to facilitate the description of imagery and gridded data. This extension added the MI_CoverageDescription object, which extends MD_CoverageDescription in order to include the MI_RangeElementDescription. Figure Coverage Information-2 includes this extension and shows all of the objects available for describing coverages.


MI_CoverageDescriptions
The UML simplifies further because the MD_ContentInformation object is abstract and the MI_CoverageDescription substitutes for the MD_CoverageDescription. This results in the MI_CoverageDescription "moving to the top" of the UML as suggested by the red arrow on the right side of this Figure and shown in the MI_CoverageDescriptions Figure.

ISO ContentInformation and Coverages

The MI_CoverageDescriptions Figure shows the simplified UML of the three objects available for describing multi-layered datasets like that illustrated in the first Figure. How are these objects used?

MI_CoverageDescription/attributeDescription The attributeDescription provides a description of the attributes in the dataset and potentially the format. It is typically implemented as an xlink to an external resource that is, ideally, machine readable. It might be some time before this goal is reached; in the meantime, the link could point to a human-readable resource. For example, the SST dataset could include a link to the NESDIS Polar Data Users Guide:

<gmd:attributeDescription
xlink:href="http://www.ncdc.noaa.gov/oa/pod-guide/ncdc/docs/klm/html/c9/sec91-1.htm"/>

MI_CoverageDescription/MD_CoverageContentTypeCode is a codeList that gives the type of the coverage. The Standard includes three options: image, thematicClassification, and physicalMeasurement. It seems likely that other values will need to be added to this codeList to describe the variety of types included in satellite products. Additions suggested for the revision of the Standard include: referenceInformation, qualityInformation, auxilliaryData, and modelResult.
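
As a rough illustration of how one of the existing codeList values might appear in a record, the fragment below sketches a contentType element in the usual ISO 19139 codeList style. The gmd prefix is assumed to be bound as in the other examples on this page, and the codeList URL is an assumption (a commonly seen location for the ISO codelist catalogue), not something this page specifies.

```xml
<gmd:contentType>
    <!-- codeList URL is an assumed catalogue location; substitute the one your profile uses -->
    <gmd:MD_CoverageContentTypeCode
        codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_CoverageContentTypeCode"
        codeListValue="physicalMeasurement">physicalMeasurement</gmd:MD_CoverageContentTypeCode>
</gmd:contentType>
```

A proposed value such as qualityInformation would be written the same way, but it would not validate against the current codeList until the revision adopts it.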

MI_CoverageDescription/dimension An MD_CoverageDescription can contain any number of dimensions. This role name seems to cause quite a bit of confusion. The object that it refers to is an MD_RangeDimension or an MD_Band, and the elements in the MD_Band object (minValue, maxValue, units, scaleFactor, and offset) suggest that this could be one of the layers in a multi-dimensional dataset. The definitions of these elements, however, clearly indicate that this is, in fact, a band in the electromagnetic spectrum. Changes to this structure have been proposed as part of the revision to 19115. That proposal is described below.

MD_RangeDimension/sequenceIdentifier/MemberName This field uniquely identifies the band and relates it to the attributeDescription. It includes a name that does not get translated into other languages, and a type.

MD_RangeDimension/sequenceIdentifier/descriptor This field provides a short description of the band.
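
The two fields above might be combined as in the following sketch of a single dimension. The variable name (sea_surface_temperature), its type (float) and the descriptor text are illustrative assumptions chosen to match the SST example, not values taken from the NESDIS product; the gmd and gco prefixes are assumed to be bound as in the other examples on this page.

```xml
<gmd:dimension>
    <gmd:MD_Band>
        <gmd:sequenceIdentifier>
            <gco:MemberName>
                <!-- aName: the untranslated name of the layer (illustrative) -->
                <gco:aName>
                    <gco:CharacterString>sea_surface_temperature</gco:CharacterString>
                </gco:aName>
                <!-- attributeType: the type of the layer (illustrative) -->
                <gco:attributeType>
                    <gco:TypeName>
                        <gco:aName>
                            <gco:CharacterString>float</gco:CharacterString>
                        </gco:aName>
                    </gco:TypeName>
                </gco:attributeType>
            </gco:MemberName>
        </gmd:sequenceIdentifier>
        <gmd:descriptor>
            <gco:CharacterString>Sea surface temperature grid (illustrative description)</gco:CharacterString>
        </gmd:descriptor>
    </gmd:MD_Band>
</gmd:dimension>
```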

MD_Band Definitions

Element/Role Name    Current Definition
MD_Band              range of wavelengths in the electromagnetic spectrum
maxValue             longest wavelength that the sensor is capable of collecting within a designated band
minValue             shortest wavelength that the sensor is capable of collecting within a designated band
units                units in which sensor wavelengths are expressed
scaleFactor          scale factor which has been applied to the cell value
offset               the physical value corresponding to a cell value of zero
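
Read together, the scaleFactor and offset definitions suggest a linear relation between stored cell values and physical values. The table does not spell this relation out, so the form below is an assumption based on the common packing convention; the symbols and the numbers in the worked line are illustrative only.

```latex
v_{\text{physical}} = \text{scaleFactor} \times v_{\text{cell}} + \text{offset}

% Consistent with the offset definition: v_cell = 0 gives v_physical = offset.
% Illustrative values: scaleFactor = 0.01, offset = 273.15, v_cell = 1500
v_{\text{physical}} = 0.01 \times 1500 + 273.15 = 288.15
```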

Revisions to the ISO Content Information

The MD_Band object in 19115 and 19115-2 is specifically designed to describe a range of wavelengths in the electromagnetic spectrum that are detected by a sensor (see definition in Table 1). This information is useful for describing unprocessed instrument data commonly referred to as Level 1 data, but it does not work well for the more general case we are addressing here. The roles included in the MD_Band object could also be useful for describing higher-level products and model results, but the current definitions do not allow that. Unfortunately, this limits the application of 19115 to this important type of data.

There are two possible approaches to addressing these limitations: broadening the definitions in the current MD_Band object or adding a new object with similar roles and appropriate definitions. These options were discussed during a 19115 Revision Working Group meeting and the second approach was selected in order to continue existing support for Level 1 datasets and remain backward compatible with 19115 and 19115-2. The proposed revisions are described below.

Coverage Revisions
The proposed revision is the addition of an MD_SampleDimension class to hold the attributes associated with the coverage layer (dimension in this model). This class inherits two existing elements, sequenceIdentifier and descriptor, from MD_RangeDimension and adds a new element, name, which is an MD_Identifier (code and authority). It shares the scaleFactor and offset roles with MD_Band, and the definitions of those do not change. It also shares the minValue and maxValue roles, which are defined as general reals in MD_SampleDimension and remain restricted to their current wavelength definitions in MD_Band. It includes two general descriptive statistics (meanValue and standardDeviation); numberOfValues, which gives the number of values in a thematicClassification coverage; and the units of the layer, which use the more general UnitOfMeasure type and remain restricted to UomLength in the MD_Band object. The MD_SampleDimension also includes a RecordType (otherAttributeDescription) and a Record (otherAttribute) that can be used for coverage attributes that are not covered by the standard roles in the class.

The usage of the RecordType and Record is described in ISO 19139. A RecordType element refers to the definition of a type and is implemented in XML as an xlink to that definition. A Record is an instance of that type. In the example below, the RecordType is an xlink to the definition of a netCDF variable type from the NcML schema. The Record is an xlink to an instance of the variable provided by a web service (ncmlService).

<gmd:dimension>
    <gmd:MD_SampleDimension>
        <gmd:otherAttributeType>
            <!-- RecordType: xlink to the definition of the netCDF variable type in the NcML schema -->
            <gco:RecordType xlink:href="http://www.unidata.ucar.edu/schemas/netcdf/ncml-2.2.xsd#xpointer(//element[@name='variable'])">netCDF Variable Type</gco:RecordType>
        </gmd:otherAttributeType>
        <gmd:otherAttributeValue>
            <!-- Record: xlink to one instance; granuleIdentifier and MemberName are placeholders -->
            <gco:Record xlink:href="http://www.ngdc.noaa.gov/ncmlService/granuleIdentifier#xpointer(/netcdf/variable[@name=MemberName])">Attributes for variable = MemberName in granule = granuleIdentifier</gco:Record>
        </gmd:otherAttributeValue>
    </gmd:MD_SampleDimension>
</gmd:dimension>

The final element of the MD_SampleDimension is an MI_RangeElementDescription, defined in ISO 19115-2. This class defines content values for a thematicClassification coverage or flag values for other coverage types. The value of the flag is given in the rangeElement, which has a Record type. Typically this will be a character string, but it can have any type, which is declared using the xsi:type attribute:

<gmi:rangeElement>
   <gmi:name>
      <gco:CharacterString>Missing Flag</gco:CharacterString>
   </gmi:name>
   <gco:Record xsi:type="gco:Integer_PropertyType">
      <gco:Integer>999</gco:Integer>
   </gco:Record>
</gmi:rangeElement>

The MI_RangeElementDescription is included in the MD_SampleDimension class so that each layer in the coverage can have associated flags without requiring a separate MD_CoverageDescription.

The MD_Band object in this model is an MD_SampleDimension restricted to the particular case of wavelength measurements (we are not changing the MD_Band definition). It inherits all of the elements from MD_RangeDimension and MD_SampleDimension, with the minValue, maxValue, and units elements restricted to UomLength and the cardinality of otherAttribute and otherAttributeType restricted to [0..0].

We also propose either 1) repeating the processingLevelCode role in the MD_RangeDimension object or 2) moving the processingLevelCode role into the MD_CoverageDescription object so that it is no longer limited to MD_ImageDescription.

MD_Band Definitions

Element/Role Name             Proposed Definition
processingLevelCode           image distributor’s code that identifies the level of radiometric and geometric processing that has been applied.
                              Note: This role is currently included in the MD_ImageDescription object. We propose moving it to the MD_CoverageDescription so that it can be used in MD_ImageDescription and MD_RangeDescription.
name [0..*]: MD_Identifier    identifiers for each dimension included in the coverage. These identifiers can be used to provide names for the coverage attribute from a standard set of names (e.g. the CF Convention Standard Names)
MD_SampleDimension            the characteristics of each dimension (layer) included in the coverage
maxValue                      maximum value of each dimension included in the coverage. Restricted to UomLength in the MD_Band class.
minValue                      minimum value of each dimension included in the coverage. Restricted to UomLength in the MD_Band class.
meanValue                     mean value of each dimension included in the coverage
standardDeviation             standard deviation of each dimension included in the coverage
units                         units of each dimension included in the coverage. Note that the type of this is UnitOfMeasure and that it is restricted to UomLength in the MD_Band class.
numberOfValues                the number of values used in a thematicClassification coverage (e.g. the number of classes in a Land Cover Type coverage) or the number of cells with data in other types
scaleFactor                   scale factor which has been applied to each dimension included in the coverage
offset                        the physical value corresponding to a cell value of zero for each dimension included in the coverage
otherAttributeType            type of other attribute description (e.g. netcdf/variable in ncml.xsd)
otherAttribute                instance of otherAttributeType that defines attributes not explicitly included in MD_CoverageType
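
To make the proposed roles listed above more concrete, the fragment below sketches how a single layer might be encoded if the proposal were adopted. This is a hypothetical encoding: MD_SampleDimension and its name, meanValue and standardDeviation children are not part of the published gmd schema, the element order is a guess, the numeric values are invented for illustration, and the units link uses a placeholder example.org URL.

```xml
<gmd:dimension>
    <!-- Hypothetical element: assumes the proposed class is encoded in the gmd namespace -->
    <gmd:MD_SampleDimension>
        <!-- name: identifier from a standard vocabulary (a CF standard name here) -->
        <gmd:name>
            <gmd:MD_Identifier>
                <gmd:code>
                    <gco:CharacterString>sea_surface_temperature</gco:CharacterString>
                </gmd:code>
            </gmd:MD_Identifier>
        </gmd:name>
        <!-- Illustrative statistics for the layer, in the units given below -->
        <gmd:maxValue><gco:Real>308.15</gco:Real></gmd:maxValue>
        <gmd:minValue><gco:Real>271.15</gco:Real></gmd:minValue>
        <gmd:meanValue><gco:Real>288.15</gco:Real></gmd:meanValue>
        <gmd:standardDeviation><gco:Real>5.0</gco:Real></gmd:standardDeviation>
        <!-- units: general UnitOfMeasure rather than UomLength; placeholder link -->
        <gmd:units xlink:href="http://example.org/units#kelvin"/>
    </gmd:MD_SampleDimension>
</gmd:dimension>
```

Under the proposal, the same layer described with MD_Band would not be valid, because its minValue, maxValue and units would be restricted to wavelengths.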