Structural Data Types

From NOAA Environmental Data Management Wiki

The GEO-IDE effort acknowledges that NOAA's data systems are insufficiently integrated. This situation reflects past technology, management, and decision-making strategies that have tended to fragment data management rather than unify it. Lines of funding have traditionally been matched to observing system elements -- satellites, ships, profilers, etc. -- and data life cycle points -- measurement, real-time applications, climate analysis, archive, etc. Data management has been considered to be "owned" by the observing system element or the function. Each observing system element has therefore developed individualized approaches to data management, often involving the development of unique (and non-interoperable) data formats and protocols. Real-time data management strategies were devised with little thought to analysis or archive, and so on. Predictably, these traditions have hindered the development of integrated data management.

Communities of interest within data management are most naturally organized by structural type of data. The lines between these communities are drawn from the answers to key data management questions such as, What techniques are appropriate for searching for these data? For transporting (interchanging) these data? For visualizing or analyzing these data? For storing or archiving these data?

Communities of interest defined by structural data types provide a natural way to organize data management efforts and specify standards required for interoperability. For example, the kinds of standards, best practices, metadata, and access interfaces required for time-series data collections are similar for atmospheric, oceanic, hydrological, biological, or climate data.

Traditional communities of interest defined by pattern of usage will, of course, continue to thrive, driven by scientific and societal goals. These communities will provide the requirements to an increasingly integrated data management community. For example, weather forecasters will continue to require synoptic access to observations; climate modelers will continue to view the same observations as time series. The role of the data management community will be to find unified solutions that address both of these usage patterns.
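The two usage patterns named above can be illustrated with a minimal sketch. It assumes a hypothetical list of observation records (the station identifiers, times, and values are invented for illustration) and shows that the synoptic view and the time-series view are two groupings of the same data.

```python
from collections import defaultdict

# Hypothetical observations: (station_id, ISO time, air temperature in deg C).
observations = [
    ("STN_A", "2024-01-01T00:00Z", 4.1),
    ("STN_B", "2024-01-01T00:00Z", 6.3),
    ("STN_A", "2024-01-01T06:00Z", 5.0),
    ("STN_B", "2024-01-01T06:00Z", 7.2),
]


def synoptic_view(obs):
    """Group by time: every station at each instant (forecaster's view)."""
    by_time = defaultdict(dict)
    for station, time, value in obs:
        by_time[time][station] = value
    return dict(by_time)


def time_series_view(obs):
    """Group by station: values in time order (climate modeler's view)."""
    by_station = defaultdict(list)
    # Identically formatted ISO timestamps sort correctly as strings.
    for station, time, value in sorted(obs, key=lambda r: r[1]):
        by_station[station].append((time, value))
    return dict(by_station)
```

Neither function alters the underlying records; a unified data management solution would, under this reading, store the observations once and serve both groupings on request.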

Table 4.1 proposes an initial list of communities of interest based upon structural data types. In most cases the structural data types are the natural consequences of the manner in which the data are collected. For any given data stream there may be ambiguities regarding the appropriate structural data type under which it should be handled. As a general rule, the best way to resolve this ambiguity is to choose the most highly ordered data type that could describe the data. Table 4.1 is presented roughly in order from most highly structured data types at the top to least structured types at the bottom.
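The disambiguation rule above can be sketched as a simple lookup. This is an illustration only: the type names are shortened forms of the Table 4.1 labels, the function name is hypothetical, and because the table is described as only roughly ordered, the strict ranking used here is an assumption.

```python
# Structural data types in the order of Table 4.1, most structured first.
# Treating this as a strict ranking is an assumption; the table is only
# "roughly" ordered.
STRUCTURE_ORDER = [
    "grid",
    "moving-sensor multidimensional field",
    "time series",
    "profile",
    "trajectory",
    "geospatial framework data",
    "point data",
]


def most_structured(candidates):
    """Return the candidate that appears earliest in STRUCTURE_ORDER."""
    if not candidates:
        raise ValueError("at least one candidate type is required")
    unknown = [c for c in candidates if c not in STRUCTURE_ORDER]
    if unknown:
        raise ValueError(f"unknown structural type(s): {unknown}")
    return min(candidates, key=STRUCTURE_ORDER.index)
```

For a data stream that could be handled either as point data or as a time series, `most_structured(["point data", "time series"])` selects `"time series"`.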

Table 4.1. Structural data classes, each followed by its descriptions and subclasses and then by examples and further explanation
Grids (and collections of grids)
  • finite difference model outputs (structured)
  • finite element model outputs (unstructured)
  • gridded (binned) data products (structured)
  • level 4 (gridded) satellite fields (structured)
  • spherical harmonic spectral coefficients (structured or unstructured)(1)
Moving-sensor multidimensional fields (and collections of same)
  • swaths
  • radials
  • satellite passes
  • HF radar
  • side-scan sonar
  • weather radar
Time series (and collections of time series (2))
  • time-ordered sequence of records (2) associated with a point in space or a more complex spatial feature.
  • ocean moored measurements (3)
  • fish landings at a port
  • stream flow records
  • sunspot activity
  • climate data (surface atmospheric stations)
  • paleo-records from cores, corals, tree rings, …
  • computed climate indices such as SOI
Profiles (and collections of profiles)
  • height- or depth-ordered sequence of records (2) at a fixed (or approximately fixed) point in time and position in latitude/longitude
  • atmospheric soundings
  • ocean casts
  • profiling floats
  • acoustic Doppler instruments (structural overlap with time series)
Trajectories (and collections of trajectories)
  • time-ordered sequence of records (2) along a path through space
  • underway ship measurements
  • aircraft track data
  • ocean surface drifters
  • ocean AUV measurements
Geospatial Framework Data (4)
  • lines
  • polygonal regions
  • map annotations
  • shorelines
  • fault lines
  • marine boundaries
  • continuously operating reference stations (CORS)
Point data (5)
  • scattered points
  • tsunami or seismic occurrences
  • species sightings
  • geodetic control
  • geospatial data
Metadata
  • “data about data”: context information needed for the interpretation of data
  • Like other data types, metadata has distinct requirements for storage, access, archival, and transport.
  • Metadata content is a major focus of discussions within all of the data types. Metadata as a “data type” refers specifically to its unique requirements and properties with respect to archival, access, and transport.

  1. In some cases grids represent coordinate systems that are mathematically transformed from simple latitude-longitude-depth-time positions. Spherical harmonic spectral coefficients are one such example.
  2. A “record” refers to one or more associated parameter values and associated metadata.
  3. Standards for time series need to consider small, time-dependent excursions in latitude, longitude, and depth, such as those of cabled ocean moorings.
  4. The “GIS perspective” must be a major focus in the discussion of all of the data classes listed in this table.
  5. As an organizing principle for data, “Point Data” is the lowest common denominator. Most structural data types are reducible to collections of points, though with a loss of essential semantics in most cases. For example, a grid may be represented as a collection of ordered tuples. Some types of measurements, for example tsunami occurrences or species sightings, naturally possess limited structure. For these measurements the Point Data structure is the natural classification.
Note that real-time delivery of data will generally remove time structure, so that, for example, a collection of Time Series may reduce to Point Data when accessed in real time.
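
The reduction of a grid to a collection of ordered tuples, mentioned in note 5, can be made concrete with a short sketch. The grid values, axis coordinates, and function name below are hypothetical and chosen only for illustration.

```python
# A hypothetical 2 x 3 grid of values on regular latitude/longitude axes.
lats = [10.0, 20.0]
lons = [100.0, 110.0, 120.0]
grid = [
    [1.0, 2.0, 3.0],  # values along lats[0]
    [4.0, 5.0, 6.0],  # values along lats[1]
]


def grid_to_points(lats, lons, grid):
    """Flatten a grid into a list of (lat, lon, value) tuples."""
    return [
        (lat, lon, grid[i][j])
        for i, lat in enumerate(lats)
        for j, lon in enumerate(lons)
    ]


points = grid_to_points(lats, lons, grid)
```

Every value and position survives the conversion, but the tuples no longer state that the axes are regular or which points are neighbors. A consumer of `points` would have to rediscover that structure, which is one way to read the "loss of essential semantics" described above.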

Global Earth Observation Integrated Data Environment CONOPS