Deep-ocean Assessment and Reporting of Tsunamis

From NOAA Environmental Data Management Wiki

The NOAA Deep-ocean Assessment and Reporting of Tsunamis (DART) Program is a good example of an end-to-end observing program that operates entirely within NOAA. Its components include observing platforms operated by the National Data Buoy Center (NDBC), scientific input and data processing at the Pacific Marine Environmental Laboratory (PMEL), and long-term archiving at the National Geophysical Data Center (NGDC). Documenting these data completely presents several challenges. For example, over time one station may be "occupied" by several distinct physical instruments (deployments) that measure the same or similar geophysical parameters. In addition, different organizations (e.g., NDBC, PMEL, NGDC) may be responsible for different steps in the data collection/processing/distribution/archive sequence, so each organization documents its own role independently. This distributed responsibility introduces heterogeneity that must be minimized or eliminated in order to produce a clear end-to-end description of the DART data.

The following sections outline an ISO 19115 implementation of a procedure designed to document the data recorded by one DART station. A comparable example, provided by the JCOMM OceanSITES program, shows an OGC SensorML implementation of a similar site-specific documentation process. A description of the OceanSITES station documentation can be found here.

DART Metadata

A DART observing system consists of a surface buoy and a bottom pressure recorder (BPR). The BPR records pressure at the sea floor and stores the data on an internal flash card for retrieval every 1-3 years. Lower-resolution data are also sent to the operational and research centers in near real time. The high-resolution (15-second) flash card data for each deployment come to the archive and are documented with an FGDC metadata record. The metadata records are identified as station_deploymentYear. For example, station D165 has had three deployments, described in the metadata files D165_1999, D165_2000, and D165_2001.

DART BPR Data Flow

ISO 19115 Implementation

The DART situation of multiple deployments at each site seems well suited to description using ISO DS_Series. The framework of a metadata record for the D165 station might look like:

<gmd:DS_Series>
   <gmd:composedOf>
      <gmd:DS_DataSet>
         <gmd:has>
            <gmi:MI_Metadata>
               <!-- characteristics of the 1999 deployment -->
            </gmi:MI_Metadata>
         </gmd:has>
         <gmd:has>
            <gmi:MI_Metadata>
               <!-- characteristics of the 2000 deployment -->
            </gmi:MI_Metadata>
         </gmd:has>
         <gmd:has>
            <gmi:MI_Metadata>
               <!-- characteristics of the 2001 deployment -->
            </gmi:MI_Metadata>
         </gmd:has>
      </gmd:DS_DataSet>
   </gmd:composedOf>
   <gmd:seriesMetadata>
      <gmi:MI_Metadata>
         <!-- description of the collection -->
      </gmi:MI_Metadata>
   </gmd:seriesMetadata>
</gmd:DS_Series>

In this framework only the specific information required to describe each deployment (or granule) is included in the MI_Metadata objects in the first section (DS_DataSet). Information that is common to all of the deployments is included in the seriesMetadata. Note that this arrangement is similar to the familiar concepts of granule metadata, where a deployment is a granule, and collection metadata, where the series is a collection.
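
The fragments on this page use several namespace prefixes (gmd, gmi, gco, xlink) without declaring them. As a minimal sketch, the root element of the framework could declare them as follows. The namespace URIs are the ones commonly used with the ISO 19139 and ISO 19115-2 XML encodings and are an assumption here; the gml URI in particular depends on the schema version in use.

```xml
<!-- Sketch only: namespace URIs are assumed, check them against the schemas you validate with -->
<gmd:DS_Series
   xmlns:gmd="http://www.isotc211.org/2005/gmd"
   xmlns:gmi="http://www.isotc211.org/2005/gmi"
   xmlns:gco="http://www.isotc211.org/2005/gco"
   xmlns:gml="http://www.opengis.net/gml/3.2"
   xmlns:xlink="http://www.w3.org/1999/xlink">
   <!-- gmd:composedOf and gmd:seriesMetadata go here, as in the framework above -->
</gmd:DS_Series>
```

Declaring the prefixes once on the root keeps the nested MI_Metadata objects free of repeated declarations.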


Deployment Metadata

In the DART case a good starting point for the deployment metadata might include:

  • A fileIdentifier that identifies the deployment:
<gmd:fileIdentifier>
   <gco:CharacterString>gov.noaa.ngdc.dart:D165_1999</gco:CharacterString>
</gmd:fileIdentifier>
  • A parentIdentifier that identifies the series as the parent record:
<gmd:parentIdentifier>
   <gco:CharacterString>gov.noaa.ngdc.dart:D165</gco:CharacterString>
</gmd:parentIdentifier>
  • A hierarchyLevel that describes the scope of the metadata using a standard codelist. In this case, the value "collectionSession" fits the deployment very well (see ISO Scope Codes for other possibilities):
<gmd:hierarchyLevel>
   <gmd:MD_ScopeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_ScopeCode" codeListValue="collectionSession">collectionSession</gmd:MD_ScopeCode>
</gmd:hierarchyLevel>
  • A contact for the metadata. In this case this is a reference to the metadata contact that is fully described in the seriesMetadata section (this is required):
<gmd:contact xlink:href="#seriesMetadataContact"/>
  • A dateStamp for the metadata (this is required):
<gmd:dateStamp>
   <gco:Date>2007-07-23</gco:Date>
</gmd:dateStamp>
  • An identificationInfo object that identifies the deployment and includes minimal discovery information (title, abstract, spatial/temporal extent):
<gmd:identificationInfo>
   <gmd:MD_DataIdentification>
      <gmd:citation/>
      <gmd:abstract/>
      <gmd:language/>
      <gmd:extent/>
   </gmd:MD_DataIdentification>
</gmd:identificationInfo>
  • An acquisitionInformation object that describes the instrument used during the deployment:
<gmi:acquisitionInformation>
    <gmi:MI_AcquisitionInformation>
        <gmi:instrument>
            <gmi:MI_Instrument>
                <gmi:identifier>
                    <gmd:MD_Identifier>
                        <gmd:code>
                            <gco:CharacterString>21417_2007</gco:CharacterString>
                        </gmd:code>
                    </gmd:MD_Identifier>
                </gmi:identifier>
                <gmi:type/>
                <gmi:description>
                    <gco:CharacterString>Parameter,Definition/Units,Value | STATION_ID,"None",21417_2007 | DEPTH,"meters",5483 | PAROS_DUCER,"Serial Number of the Paros
                        transducer",105446 | BOARD,"Board Number. If = 999, then board number not applicable, never existed or not recorded for this time.",9005659 |
                        OSC,"N/A or NULL indicates early models. There was time bases on the early models.", | LATITUDE,"decimal degrees",43.1871 | LONGITUDE,"decimal
                        degrees",157.1397 | DEPLOYMENT_JD,"Julian day - if value = 999, exact deploy date is unknown.", | DEPLOYMENT_DATE,"YYYY-MM-DD - if value = 99999,
                        exact deploy date is unknown.",20070723 | RECOVERED,"Y/N",Y | RECOVERY_JD,"Julian Day - if value = 999, exact recovery date is unknown.", |
                        RECOVERY_DATE,"YYYY-MM-DD - if value = 99999, exact recovery date is unknown.",20080423 | SAMPLE_RATE,"Seconds",15 |
                        DATA_START_DATE,"YYYY-MM-DD",2007-07-23 | DATA_END_DATE,"YYYY-MM-DD",2008-04-30 | BASE,"If base = 99999, then it is not applicable.", |
                        TEMP_CORR,"If value = 999, MTR was not used.", | CLOCK_DRIFT,"if value = 999, technician did not record value.", | TR_PRE,"None", |
                        TR_POST,"None", | LOCATION,"N/A - if location information is not known.",Northwest Pacific near Kuril Islands | DATA_QUAL,"None",Good. | COMMENTS,
                        | DEPLOY_PLATFORM,"Ship name and cruise number.",M/V Bluefin | RECOVERY_PLATFORM,"Ship name and cruise number, if NULL - unit not recovered.",M/V
                        Bluefin | MIN_PRESSURE,"mm", | MAX_PRESSURE,"mm", | MIN_TEMP,"Degrees Celsius", | MAX_TEMP,"Degrees Celsius", |
                        DATE_RCVD,"None",2008-10-30T12:00:00 | BUOY_LATITUDE,"decimal degrees", | BUOY_LONGITUDE,"decimal degrees", | THRESHOLD,"mm",30 | DATA_URL,"None",
                        | BUOY_NO,"None", | D_TYPE,"None",2 | TIME_BASE,"None", | AC_PROP_NO,"None", | RT_START,"None", | RT_END,"None", | RT_COMMENTS,"None", |
                        DATA_FLAG,"None",0 | AC_REL_ENABLE,"None",422503 | AC_REL_DISABLE,"None",422520 | AC_REL_RELEASE,"None",434474 | AC_SERIAL_NO,"None",31158 |
                        REF_OSC_PERIOD,"microseconds",.476837015 | BPR_SOFTWARE_VERSION,"None",2.5.2 | STATION_OCCUPIED,"None",2007-07-23 00:00:00.0 |
                        PREVIOUS_NAME,"None", | BUOY_DEPLOY,"None",2007-07-23 00:00:00.0 | BUOY_RECOVER,"None",2008-04-23 00:00:00.0 | DATUM,"None",WGS-84 |
                        PAROS_CAL_COEFF,"None",</gco:CharacterString>
                </gmi:description>
            </gmi:MI_Instrument>
        </gmi:instrument>
    </gmi:MI_AcquisitionInformation>
</gmi:acquisitionInformation>

This section contains a block of delimited text: records are separated by vertical bars, and each record lists a parameter name, its definition or units, and its value for this deployment, separated by commas. The presentation is not elegant, but it gets the information into the record.
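
Some of the values in the instrument description can also populate the extent in the deployment's identificationInfo outline. The following is a minimal sketch, assuming that the LATITUDE, LONGITUDE, DATA_START_DATE, and DATA_END_DATE values shown above describe the deployment, that longitude is positive east, and that the gml prefix is bound to a GML namespace. The gml:id value is hypothetical and only needs to be unique within the record.

```xml
<gmd:extent>
   <gmd:EX_Extent>
      <gmd:geographicElement>
         <gmd:EX_GeographicBoundingBox>
            <!-- a point location, so west = east and south = north -->
            <gmd:westBoundLongitude>
               <gco:Decimal>157.1397</gco:Decimal>
            </gmd:westBoundLongitude>
            <gmd:eastBoundLongitude>
               <gco:Decimal>157.1397</gco:Decimal>
            </gmd:eastBoundLongitude>
            <gmd:southBoundLatitude>
               <gco:Decimal>43.1871</gco:Decimal>
            </gmd:southBoundLatitude>
            <gmd:northBoundLatitude>
               <gco:Decimal>43.1871</gco:Decimal>
            </gmd:northBoundLatitude>
         </gmd:EX_GeographicBoundingBox>
      </gmd:geographicElement>
      <gmd:temporalElement>
         <gmd:EX_TemporalExtent>
            <gmd:extent>
               <!-- hypothetical id; dates taken from DATA_START_DATE and DATA_END_DATE -->
               <gml:TimePeriod gml:id="deploymentTemporalExtent">
                  <gml:beginPosition>2007-07-23</gml:beginPosition>
                  <gml:endPosition>2008-04-30</gml:endPosition>
               </gml:TimePeriod>
            </gmd:extent>
         </gmd:EX_TemporalExtent>
      </gmd:temporalElement>
   </gmd:EX_Extent>
</gmd:extent>
```

Carrying the location and time range in the extent makes them available for discovery even though the remaining parameters stay in the free-text description.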

Series Metadata

The seriesMetadata includes the information that is applicable to the whole collection:

  • A fileIdentifier that identifies the collection:
<gmd:fileIdentifier>
   <gco:CharacterString>gov.noaa.ngdc.dart:D165</gco:CharacterString>
</gmd:fileIdentifier>
  • A hierarchyLevel that describes the scope of the metadata using a standard codelist. In this case, the value "series" fits the collection (see ISO Scope Codes for other possibilities):
<gmd:hierarchyLevel>
   <gmd:MD_ScopeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_ScopeCode" codeListValue="series">series</gmd:MD_ScopeCode>
</gmd:hierarchyLevel>
  • A contact for the metadata with complete information (note the id for this contact, which is referenced from the deployment metadata):
<gmd:contact>
   <gmd:CI_ResponsibleParty id="seriesMetadataContact">
      <gmd:individualName/>
      <gmd:organisationName/>
      <gmd:positionName/>
      <gmd:contactInfo/>
      <gmd:role>
         <gmd:CI_RoleCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#CI_RoleCode" codeListValue="custodian"/>
      </gmd:role>
   </gmd:CI_ResponsibleParty>
</gmd:contact>

The role of this person comes from a code list. See ISO People for more possibilities.

  • A dateStamp for the metadata (this is required):
<gmd:dateStamp>
   <gco:Date>2007-07-23</gco:Date>
</gmd:dateStamp>
  • The metadata standard name and version:
<gmd:metadataStandardName>
   <gco:CharacterString>Geographic information — Metadata — Part 2: Extensions for imagery and gridded data</gco:CharacterString>
</gmd:metadataStandardName>
<gmd:metadataStandardVersion>
   <gco:CharacterString>ISO 19115-2:2009-02-15</gco:CharacterString>
</gmd:metadataStandardVersion>
  • Any number of spatialRepresentationInfo objects that describe what kinds of spatial objects are in the dataset or describe the axes for a gridded dataset:
<gmd:spatialRepresentationInfo>
   <gmd:MD_VectorSpatialRepresentation>
      <gmd:geometricObjects>
         <gmd:MD_GeometricObjects>
            <gmd:geometricObjectType>
               <gmd:MD_GeometricObjectTypeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_GeometricObjectTypeCode" codeListValue="point">point</gmd:MD_GeometricObjectTypeCode>
            </gmd:geometricObjectType>
         </gmd:MD_GeometricObjects>
      </gmd:geometricObjects>
   </gmd:MD_VectorSpatialRepresentation>
</gmd:spatialRepresentationInfo>
  • A referenceSystemInfo object that describes the coordinate reference system:
This is usually expressed using an identifier from the European Petroleum Survey Group (http://www.epsg-registry.org), typically urn:ogc:def:crs:EPSG:4326.
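
A minimal sketch of such a referenceSystemInfo object follows. It assumes the identifier is carried in the code of an RS_Identifier and uses the URN exactly as written above; the required form of the URN may differ depending on the registry or validator in use.

```xml
<gmd:referenceSystemInfo>
   <gmd:MD_ReferenceSystem>
      <gmd:referenceSystemIdentifier>
         <gmd:RS_Identifier>
            <gmd:code>
               <!-- identifier copied from the text above -->
               <gco:CharacterString>urn:ogc:def:crs:EPSG:4326</gco:CharacterString>
            </gmd:code>
         </gmd:RS_Identifier>
      </gmd:referenceSystemIdentifier>
   </gmd:MD_ReferenceSystem>
</gmd:referenceSystemInfo>
```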
  • Any number of identificationInfo objects that identify the collection and include full discovery information and the bounding spatial/temporal extent for the collection:
<gmd:identificationInfo>
   <gmd:MD_DataIdentification>
      <gmd:citation/>
      <gmd:abstract/>
      <gmd:purpose/>
      <gmd:credit/>
      <gmd:status/>
      <gmd:pointOfContact/>
      <gmd:resourceMaintenance/>
      <gmd:resourceFormat/>
      <gmd:descriptiveKeywords/> <!-- ISO 19115 Topic Category -->
      <gmd:descriptiveKeywords/> <!-- NASA/GCMD Earth Science Keywords -->
      <gmd:resourceConstraints/>
      <gmd:language/>
      <gmd:topicCategory/>
      <gmd:extent/> <!-- includes boundingExtent, boundingGeographicBoundingBox and boundingTemporalExtent -->
   </gmd:MD_DataIdentification>
</gmd:identificationInfo>
  • Two contentInfo objects that describe the content of the dataset. The first points to the DART home page as a non-compliant feature catalog. The second, outlined here, is a coverage description of the parameters in the data files (time, temperature, and pressure), even though these are really point data:
<gmd:contentInfo>
   <gmi:MI_CoverageDescription>
      <gmd:attributeDescription/>
      <gmd:contentType/>
      <gmd:dimension/>  <!-- Time -->
      <gmd:dimension/>  <!-- Temperature -->
      <gmd:dimension/>  <!-- Pressure -->
   </gmi:MI_CoverageDescription>
</gmd:contentInfo>
  • A distributionInfo object that describes where and how the data can be obtained.
  • Three dataQualityInfo objects that describe:
    1. how the quality of the collection has been measured (general reports from the FGDC record),
    2. the lineage of the dataset, and
    3. the history of the project (mission history from the FGDC-RSE).

The second dataQualityInfo object should include (at least) a source object for each deployment and a process step that describes the receipt of that object by the archive. The sourceExtent should match the deployment extent given in the granule metadata.
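
The lineage object for station D165 might be sketched as follows, with one process step and one source per deployment. The description wording is illustrative only, and the xlink:href target is a hypothetical id: it assumes the extent in the deployment metadata carries a matching id, so that the sourceExtent points at the deployment extent rather than repeating it.

```xml
<gmd:dataQualityInfo>
   <gmd:DQ_DataQuality>
      <gmd:scope>
         <gmd:DQ_Scope>
            <gmd:level>
               <gmd:MD_ScopeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_ScopeCode" codeListValue="series">series</gmd:MD_ScopeCode>
            </gmd:level>
         </gmd:DQ_Scope>
      </gmd:scope>
      <gmd:lineage>
         <gmd:LI_Lineage>
            <!-- receipt of the 1999 deployment by the archive (illustrative wording) -->
            <gmd:processStep>
               <gmd:LI_ProcessStep>
                  <gmd:description>
                     <gco:CharacterString>Flash card data for deployment gov.noaa.ngdc.dart:D165_1999 received by the archive.</gco:CharacterString>
                  </gmd:description>
               </gmd:LI_ProcessStep>
            </gmd:processStep>
            <!-- further processStep elements for D165_2000 and D165_2001 go here -->
            <gmd:source>
               <gmd:LI_Source>
                  <gmd:description>
                     <gco:CharacterString>gov.noaa.ngdc.dart:D165_1999</gco:CharacterString>
                  </gmd:description>
                  <!-- hypothetical id: refers to the extent in the D165_1999 deployment metadata -->
                  <gmd:sourceExtent xlink:href="#D165_1999_extent"/>
               </gmd:LI_Source>
            </gmd:source>
            <!-- further source elements for D165_2000 and D165_2001 go here -->
         </gmd:LI_Lineage>
      </gmd:lineage>
   </gmd:DQ_DataQuality>
</gmd:dataQualityInfo>
```

Referencing the deployment extent, instead of copying it, is one way to keep the sourceExtent and the granule extent from drifting apart; copying the EX_Extent inline would work equally well if the two are maintained together.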

  • Any number of metadataConstraints objects that describe constraints on the use of the metadata.
  • A metadataMaintenance object that describes how and when the metadata are maintained. For translated records this includes a note describing the source and translation process.
  • An acquisitionInformation object that describes the instruments/platforms used to collect the observations.