GCOOS Data Management and Communications Committee

Terms of Reference

The DMAC Committee will oversee development of the data management and communications component of GCOOS and ensure its alignment with the IOOS DMAC Plan. The DMAC will make recommendations to the Board on research or pilot projects that are needed to sustain and enhance the coastal ocean observing system and associated data use. The DMAC also is responsible for recommending to the Board annual updates to the business plan for data management and communications activities. The DMAC Committee will be appointed by the GCOOS Board of Directors and the chair will be elected by the members of the Committee.

Data Management and Communications Committee

Member                 Affiliation                              Sector Represented*
Steve Anderson         Horizon Marine                           P
Brenda Babin           LUMCON                                   A
Steve Beaudet          SAIC                                     P
Julie Bosch            NOAA                                     G
Bill Burnett           NDBC                                     G
Jennifer Colee         Mobile District, USACE                   G
James Davis            Texas A&M University - Corpus Christi    A
Matthew Howard         TAMU                                     A
Edward Kearns (Chair)  South Florida Natural Resources Center   G
Jay Ratcliff           New Orleans District, USACE              G
Robert Raye            Shell                                    P
Vembu Subramanian      USF                                      A

* Sectors: A = Academic; G = Government; P = Private

Meetings & Reports

The first meeting of the Data Management & Communications Committee was 26-27 April 2006 in Biloxi, MS. The second meeting was 27-29 November 2007 in New Orleans, LA. The meeting report is available.

Action Plan for December 2007 - June 2008

The GCOOS DMAC action plan addresses three general areas:

  1. Protocol selections leading to interoperable data systems,
  2. Survey, assessment and entrainment of data providers, and
  3. Promoting communications and IT exchanges between data providers.

I. Protocol selections leading to interoperable data systems.

The core elements of the IOOS DMAC plan include: data discovery, catalog, online browse, data access & transport, metadata, and archive. GCOOS DMAC adds quality assurance and quality control (QA/QC) to this list. GCOOS DMAC will work on several of these core elements during the coming year. The tasks, responsible individuals, and completion dates are as follows.

1) Adopt Data Dictionaries. Dictionaries are collections of words, spellings, and definitions. In this context they are the names of measured oceanographic parameters, spellings, and definitions including units. Each data provider uses a set of words (salinity, temperature) to offer their data to the world and to store their data in their databases. In order for automated machine-to-machine exchanges (e.g. catalog searches) to be successful, both machines must know the other's dictionary. If the dictionaries are not identical then a mapping between terms must be made. Mapping between terms can be exact (salinity = salinity), equivalent (salinity = S), related (salinity // conductivity), etc. and can be quite complex. If all data providers use the same dictionary, the need to map between terms is eliminated. Our premise is that dictionaries used by the various GCOOS regional data providers for their near real-time data streams are probably quite similar. If so, each provider could adopt a common dictionary and would only need to make a few changes to their own local data systems. A candidate common dictionary is the SEACOOS CDL V3 currently used by SEACOOS and by USF's COMPS.
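As an illustration only, the kinds of mapping described above (exact, equivalent, related) might be recorded in a simple crosswalk table. The terms and relations in this Python sketch are assumptions made for the example; they are not taken from the SEACOOS CDL V3 or from any provider's dictionary.

```python
# Hypothetical crosswalk from one provider's local terms to a common dictionary.
# Each entry maps: local term -> (common term, relation), where relation is
# "exact", "equivalent", or "related". All entries are illustrative.
CROSSWALK = {
    "salinity": ("salinity", "exact"),
    "S": ("salinity", "equivalent"),
    "conductivity": ("salinity", "related"),
    "water_temp": ("temperature", "equivalent"),
}


def translate(local_term):
    """Return the common term only when the mapping can be applied automatically."""
    common, relation = CROSSWALK.get(local_term, (None, "unmapped"))
    if relation in ("exact", "equivalent"):
        return common
    return None  # "related" or unmapped terms need human review


def unmapped_terms(local_terms):
    """List the provider terms that cannot be translated automatically."""
    return [term for term in local_terms if translate(term) is None]
```

Running unmapped_terms over a provider's full term list gives a rough measure of how many changes that provider would face in adopting the common dictionary.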

Task 1a. We will acquire the current dictionaries used by the non-federal GCOOS-RA data providers and assess and document the changes required to adopt the SEACOOS CDL V3. (Subramanian – March 2008)

Task 1b. We will pursue acquisition of a suitable dictionary for ecological/biological terms. Potential sources of information include, but are not limited to, BODC, the Global Change Master Directory (GCMD) and OBIS. (Kearns – March 2008)

Task 1c. If the GCOOS-RA data providers cannot all agree upon a single common dictionary, then a cross-walk activity (mapping terms between dictionaries) must take place. The Marine Metadata Interoperability (MMI) program has conducted cross-walk training workshops in the past and plans to offer another workshop in 2008. This workshop could be held or co-hosted in the GCOOS region. Howard is on the MMI project and will coordinate. (Howard – June 2008)

Task 1d. Agree on an approach for data dictionaries in the GCOOS-RA. (Kearns – April 2008)

2) Adopt Data Access and Transport Method(s). This core element of the IOOS DMAC plan deals with how data and metadata are selected and moved from source to destination. Most of us are familiar with ftp, in which the entire file associated with a given name is selected and transported. However, ftp makes no provision for bringing over part of the named file or for knowing what is in the file before retrieving it. The IOOS DMAC Best Practices guide recommends OPeNDAP as a candidate transport protocol. An initial OPeNDAP call can be made to retrieve information about a file's contents (spatio-temporal coverage, parameters, etc.) and a second call can retrieve a selected data subset rather than the whole file. The data arrive in an organized, ready-to-use form. Most GCOOS data providers are offering their data via OPeNDAP, and GCOOS-RA formally endorses its use. Similarly, the Open Geospatial Consortium (OGC) has worked on developing standard ways to move data with geospatial content across the network using Web Services (i.e., XML-based exchanges). This content can be images or numerical values with sufficient geospatial metadata to enable precise overlay with other geographic data using Geographic Information System (GIS) software, either proprietary (ESRI) or open source. IOOS DMAC has approved the Web Services approach, but the details remain to be established. This is an evolving topic, with work going on in the grass-roots community (OOSTethys and the OGC Ocean Interoperability Experiment (OIE)) and at NOAA/CSC's Data Transport Lab (DTL). LUMCON and COMPS participated with the DTL in prototype delivery systems involving GML/Web Feature Services.
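The two-step OPeNDAP exchange described above can be sketched by constructing the request URLs. The Python sketch below assumes a DAP2-style server, where appending .dds or .das to the dataset URL returns its structure or attributes and a constraint expression selects a subset. The dataset URL and variable name are hypothetical, and no network call is made.

```python
# Hypothetical dataset URL; replace with a real OPeNDAP endpoint.
DATASET_URL = "http://example.org/opendap/buoy42/met.nc"


def describe_urls(dataset_url):
    """URLs for the first call: dataset structure (.dds) and attributes (.das)."""
    return {"dds": dataset_url + ".dds", "das": dataset_url + ".das"}


def subset_url(dataset_url, variable, start, stop, stride=1):
    """URL for the second call: an ASCII subset of one 1-D variable.

    DAP2 hyperslab indices are zero-based and inclusive: [start:stride:stop].
    """
    if start < 0 or stop < start or stride < 1:
        raise ValueError("invalid index range")
    constraint = "%s[%d:%d:%d]" % (variable, start, stride, stop)
    return dataset_url + ".ascii?" + constraint
```

For example, subset_url(DATASET_URL, "sea_water_temperature", 0, 23) requests only the first 24 values of that variable, in contrast to ftp, which would transfer the entire file.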

Task 2a. Babin and Subramanian will summarize the outcomes of their collaboration with the NOAA/DTL and convey their recommendations to GCOOS-RA DMAC. (March 2008).

3) Metadata Standards. Standards, in this context, have two parts. The first is content standards. Content is the auxiliary information recorded about data; it is used for sorting and selecting records. A content standard is a community agreement on what information needs to be recorded. The second part is metadata format standards. Expressing metadata content records in standard ways makes it easy to aggregate records from multiple sources into community catalogs. The Federal Geographic Data Committee (FGDC) has established a format standard that is easily ingested by many catalogs, including NASA's Global Change Master Directory (GCMD). Federal funding usually comes with requirements that FGDC-compliant records be created and filed in an appropriate catalog. The GCOOS-RA Data Portal will want a machine-readable catalog for several reasons; automated data discovery is one. We will ask our regional data providers to provide continually updated metadata records expressed in a standard way, most likely in an FGDC-compliant format. GCOOS guidance for selecting a metadata content standard comes from the IOOS DMAC Metadata Standards Expert Team and from the Quality Assurance of Real-Time Ocean Data (QARTOD) workshops. Both groups have been attempting to define a content standard for real-time IOOS data. The US (through FGDC) and Canada are working on a new standard called the North American Profile, which is tied to evolving international metadata standards. This work will have an impact on the community standards and deserves our attention. Julie Bosch is a member of both the IOOS DMAC Metadata Expert Team and the GCOOS DMAC Committee, and we will rely on her expertise in this area.
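The distinction between a content standard and a format standard can be shown with a short Python sketch. The required fields and the XML element names below are assumptions chosen for illustration; they are not the FGDC standard or a recommendation of either expert group.

```python
import xml.etree.ElementTree as ET

# Hypothetical content standard: the fields every record must carry.
REQUIRED_FIELDS = ("title", "originator", "parameter", "units",
                   "latitude", "longitude", "start_time")


def missing_fields(record):
    """Content check: which required fields are absent or empty?"""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def to_xml(record):
    """Format step: express a complete record as XML for catalog ingestion."""
    gaps = missing_fields(record)
    if gaps:
        raise ValueError("record is missing: " + ", ".join(gaps))
    root = ET.Element("record")
    for field in REQUIRED_FIELDS:
        ET.SubElement(root, field).text = str(record[field])
    return ET.tostring(root, encoding="unicode")
```

The content check answers whether the agreed information was recorded; the format step governs only how a complete record is expressed so that catalogs can aggregate it.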

Task 3a. Bosch will recommend a prototype content standard for use by the GCOOS-RA data portal. (Bosch – March 2008)

4) Adopt data QA/QC and data handling methods. Although QA/QC is not formally part of the IOOS DMAC Plan, the GCOOS-RA DMAC Committee feels that it is a required element of our overall data handling strategy. The Quality Assurance of Real-Time Ocean Data (QARTOD) group has been gathering community input for the past several years on QA/QC approaches to commonly collected oceanographic data. NDBC has been working on QA/QC algorithms for real-time data. Responsibility for applying QA/QC to the GCOOS-RA data streams will be shared between the data providers and the data portal. We will extract the QARTOD recommendations, transmit them to the GCOOS-RA data providers, and encourage the providers to incorporate them into their standard QA/QC processing by September 2008. The Local Data Nodes project requires that the previous year's worth of data be available online by the third year. We will take steps to ensure that these delayed-mode data are subjected to additional delayed-mode QA/QC as appropriate. Final (delayed-mode) data will be transmitted to the appropriate national archives. We believe WHOI is working on a SensorML/QARTOD synthesis, and we will monitor this program's activity.
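To make the idea of automated real-time QA/QC concrete, the following Python sketch applies two simple checks of the kind commonly discussed for oceanographic time series: a valid-range test and a spike test. The flag values, thresholds, and test definitions are assumptions for illustration only; they are not the QARTOD recommendations or NDBC's algorithms.

```python
GOOD, SUSPECT, BAD, MISSING = 1, 3, 4, 9  # illustrative flag values


def range_flags(values, valid_min, valid_max):
    """Range test: flag each value as GOOD, BAD (out of range), or MISSING."""
    flags = []
    for v in values:
        if v is None:
            flags.append(MISSING)
        elif valid_min <= v <= valid_max:
            flags.append(GOOD)
        else:
            flags.append(BAD)
    return flags


def spike_flags(values, max_jump):
    """Spike test: mark a point SUSPECT when it differs from BOTH of its
    neighbours by more than max_jump. End points are not tested."""
    flags = [MISSING if v is None else GOOD for v in values]
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        if prev is None or cur is None or nxt is None:
            continue
        if abs(cur - prev) > max_jump and abs(cur - nxt) > max_jump:
            flags[i] = SUSPECT
    return flags
```

Checks like these could run at the data provider, at the data portal, or at both, consistent with the shared responsibility described above.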

Task 4a. Obtain QARTOD recommendations. (Burnett – June 2008)
Task 4b. Work with local data nodes to apply standard QA/QC. (Howard – September 2008)

5) Establish/promote data discovery. Data discovery will be supported through the use of catalogs. GCOOS-RA will interact with several types of catalogs. The first is the NOAA/CSC IOOS Regional Observation Registry. This is a "live" catalog of the non-federal observing assets of the 11 Regional Associations. Individual data providers publish a list of their assets in a publicly available computer directory. These lists are harvested automatically for selection and display. It is important for these lists to be accurate at all times. GCOOS-RA DMAC will encourage the local data nodes to automate the production of these lists to keep them accurate. The second type comprises the national clearinghouse catalogs, such as NASA's Global Change Master Directory (GCMD) and Geospatial One-Stop. GCOOS-RA will submit the region's metadata in FGDC-compliant formats to these clearinghouses through the Local Data Nodes project's activities. NCDDC's Metadata Enterprise Resource Management Aid (MERMAid) tool for creating, validating, managing and publishing FGDC-compliant metadata will be of use in this effort. The third type is a machine-accessible catalog to support the Data Portal work. One approach for this catalog might be to build upon the IOOS Regional Registry Project work. Another possibility is the standards-based approach under consideration and development by OGC, called OpenGIS Catalogue Services–ebRIM profile of CSW, which establishes a framework for implementing catalog services.
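One way a local data node might automate production of its asset list is to regenerate the list on a schedule from its own station records, so that stations that have stopped reporting drop out. The Python sketch below assumes a simple CSV layout and a 24-hour reporting window; both are hypothetical and do not reflect the Regional Observation Registry's actual format.

```python
import csv
import io
from datetime import datetime, timedelta


def build_asset_list(stations, now, max_age_hours=24):
    """Return CSV text listing only the stations that reported recently.

    `stations` is a list of dicts with the keys used below (hypothetical schema).
    """
    cutoff = now - timedelta(hours=max_age_hours)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["station_id", "latitude", "longitude",
                     "parameters", "last_report"])
    for s in sorted(stations, key=lambda s: s["station_id"]):
        if s["last_report"] >= cutoff:
            writer.writerow([s["station_id"], s["latitude"], s["longitude"],
                             ";".join(s["parameters"]),
                             s["last_report"].isoformat()])
    return out.getvalue()
```

The resulting text would then be written to the publicly available directory from which the registry harvests.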

Task 5a. Acquire and practice with MERMAid. Seek support in the local data nodes for automated production of metadata for IOOS Regional Registry catalog use. Examine ebRIM for suitability for Data Portal use. (Howard – June 2008)

II. Survey, assessment and entrainment of data providers.

6) Entrain potential sources of Gulf Coastal Data. The 10 principal GCOOS-RA non-federal data nodes will provide their physical oceanographic data to the Data Portal as specified in the Data Portal and Local Data Nodes Project proposals. GCOOS DMAC wishes to add biophysical data and federal data to this list. To start making progress in this area, we need an inventory of these additional assets. The federal agencies, and the person assigned to survey each agency's assets, are listed below. We are particularly interested in easily entrained and machine-searchable data sets.

Task 6a. Survey Federal data providers for ingestible data streams. Potential sources and responsible individuals are listed below (September 2008).

  • NOAA/NDBC – Bill Burnett
  • NOAA/NOS – Charlton Galvarino
  • USACE – Jennifer Colee
  • USGS – Charlton Galvarino
  • NOAA/NMFS – Julie Bosch
  • NWS – Charlton Galvarino
  • NPS – Ed Kearns
  • CDMO/NERRS – Matt Howard
  • PODAAC – Brenda Babin
  • RV data – Brenda Babin
  • NGI OPeNDAP – Julie Bosch
  • Shell – Robert Raye

7) Self-assessment of the 10 funded data node partners. Before the Data Portal can be implemented, we need a complete self-assessment of the GCOOS-RA observing system data streams. For example, to establish the Data Portal database records we need to know the full spectrum of what is being measured. We need to understand the data delivery systems of each data provider in order to design and establish interfaces to them. Much of this information is also needed to write the GCOOS-RA Data Management Plan (RA-DMP). In addition, we need to inventory our human resources and their skill sets so we know whom to call or bring to bear on particular IT issues. Some of the required information has already been collected. For example, the IOOS Regional Observation Registry is a list of near real-time data streams, and all nodes are participating in this program. The Local Data Nodes Project involves the IT staff at each of the local data nodes, so we have established the contact list. We need to complete the assessment of IT skills and system architectures at each node.

Task 7a. Gather the information needed to complete the self-assessment. (Bosch, Howard, Kearns – begin in January 2008).

III. Promoting communications and IT exchanges between data providers.

8) Establish connections to all relevant data management IT activities. There are a number of external groups working on IT issues related to IOOS DMAC. These groups may have activities, development efforts or results that are of interest or use to GCOOS-RA DMAC efforts. We will keep apprised of these national and grass-roots activities by attending, participating in and reporting on them. We will work to transfer technology between local data provider nodes by sending experts on site visits. We are calling these IT transfers by experts "Geek Squad Visits". Provision has been made in the Local Data Nodes contracts for representatives from each of the 10 nodes to attend two IT workshops or conferences per year to keep involved in and abreast of activities related to interoperability. We expect these individuals to relay what they have learned to the other data nodes and to GCOOS-RA DMAC. Some national and grass-roots activities known to us at the present time are:

IOOS FY2007 Grantees workshop
NOAA/CSC Community information repository (CIR)
NOAA/CSC Data Transport Lab (DTL)
OGC Ocean Interoperability Experiment (OIE)
OOSTethys
Marine Metadata Interoperability Program (MMI)
NOAA IOOS Data Integration Framework (DIF)

Task 8a. Report to GCOOS-RA and the local data nodes on the outcome of the FY2007 Grantees workshop. (Howard – February 2008)
Task 8b. Establish a communications mechanism (forums, wiki, etc.) for GCOOS-RA. (Howard – February 2008)
Task 8c. Establish a DMAC Geek Squad. (Kearns, coordinator – start May/June 2008)

Other Priorities

  1. GCOOS DMAC will participate in the Data Portal scoping activities in early 2008.
  2. GCOOS DMAC seeks closer connections with the Products and Services Committee to learn what products are needed so that required data connections can be made.
  3. GCOOS DMAC seeks closer communications with the Observations Committee to identify new observing systems, especially from the private sector.

Suggested Projects

  1. GCOOS IT "Geek Squad" (2 FTEs: 1 full-time regional tech support person and 1 full-time management but technically savvy person, to serve all data provider partners).
  2. Seek NOAA/CSC assistance to install OOSTethys software.
  3. Regional Ops Center: explore NDBC's role in QA/QC and 24/7 operations.

Previous Action Plans