JOSS Data Management Plan for support of the
Lake-ICE/SNOWBAND Project















FINAL DRAFT









December 1997



















Prepared by:

University Corporation for Atmospheric Research (UCAR)

Joint Office for Science Support (JOSS)

P.O. Box 3000

Boulder, Colorado 80307 USA













TABLE OF CONTENTS

1.0 Introduction/Background

2.0 Lake-ICE/SNOWBAND Data Management Policy

2.1 Data Processing/Quality Control

2.2 Data Availability

2.3 Community Access to Data

2.4 Distributed Archive

3.0 Data Management Functional Description and Strategy

3.1 JOSS CODIAC System

3.2 Investigator Requirements

3.2.1 Data Format Convention

3.2.2 Dataset Documentation

3.2.3 Dataset Submission Procedures

3.3 Lake-ICE/SNOWBAND Data Collection During the Field Phase

3.3.1 On-line Catalog

3.4 Data Processing after the Field Season

3.5 Data Archival and Long-term Access

4.0 Lake-ICE/SNOWBAND Datasets

4.1 Operational Data Collection and Processing

4.2 Research Data Collection



1.0 Introduction/Background

The Lake-Induced Convection Experiment (Lake-ICE) seeks to determine how the atmosphere is modified by heat and moisture input from the Great Lakes by focusing on three important scientific goals: 1) determining mechanisms which control the structure and evolution of mesoscale convective circulations (such as boundary layer rolls and shore-parallel bands) in boundary layers strongly heated from below, 2) determining interrelationships between these mesoscale circulations, fluxes throughout the depth of the boundary layer, and cloud and precipitation development, and 3) identifying the processes by which heat and moisture fluxes from the Great Lakes augment large-scale atmospheric circulations. The intensive field phase will occur from 1 December 1997 to 30 January 1998 with a base of operations in Ann Arbor, Michigan. Details on field operations and data collection are contained in the Lake-ICE Scientific Overview and Implementation Plan (October 1997).

The development and maintenance of a comprehensive and accurate data archive is a critical step in meeting the scientific goals of Lake-ICE. A distributed data archive will be established to contain the entire Lake-ICE dataset. This distributed data base will allow users access to the variety of measured and derived fields obtained during Lake-ICE. Oversight of the Lake-ICE data management task will come from the Lake-ICE Science Steering Committee and will be coordinated with investigators and other participating groups. This Lake-ICE Data Management Plan describes the guiding data management policies (section 2), the strategy and functional description of the data management system (section 3), and the implementation details of the data management system (section 4).

The Lake-ICE/SNOWBAND project data management activities include support from the UCAR Joint Office for Science Support (JOSS) located in Boulder, Colorado. The JOSS activities fall into three major areas: (1) development and implementation of a real-time data catalog to provide field support for the PIs, (2) collection of a comprehensive suite of operational datasets to support project data analyses, and (3) establishment of a final archive and provision of data distribution/support for the PIs as well as the larger community. The Lake-ICE Science Steering Committee gives general guidance to JOSS (see Section 2). JOSS has primary responsibility for the collection and compilation of, and access to, all supporting operational data for Lake-ICE (see Section 4.1). JOSS will quality control and reformat selected operational data (soundings and surface data) before the community has access to them (see Section 3). JOSS will also be responsible for compiling some research data that were collected, processed and quality controlled by the investigators (see Section 4.2), and for generating combined datasets or providing access to separate datasets, as necessary (see Section 4.1). JOSS will house a portion of the distributed Lake-ICE Data Center Archive (see Section 3.5) and complement the other groups that make up the archive.



2.0 Lake-ICE/SNOWBAND Data Management Policy

The following data protocols were discussed and agreed to by the Lake-ICE Science Steering Committee and are delineated in the Lake-ICE Scientific Overview and Implementation Plan (November 1997). These data protocols form the basis of the data management strategies discussed in subsequent sections of this document.



2.1 Data Processing/Quality Control

All data releases in the field will be considered preliminary data to be used for planning and operational purposes only. No further distribution of the data will be permitted without the consent of the Principal Investigator (PI). At the end of the field phase of Lake-ICE, no preliminary data will be archived unless agreed to by the PI. PIs will be responsible for the final processing and quality control of their own datasets.

All investigators participating in Lake-ICE must agree to promptly submit their processed, quality controlled data to the Lake-ICE Data Archive to facilitate intercomparison of results, quality control checks and inter-calibrations, and an integrated interpretation of the combined dataset.



2.2 Data Availability

All data shall be promptly provided to other Lake-ICE PIs upon request. Lake-ICE PIs are defined as those included in the Scientific Overview and Implementation Plan and/or those directly participating in the field experiment. Distribution can be done either directly by the PI or through the Lake-ICE Data Archive with the permission of the PI.



2.3 Community Access to Data

It is the intent of the Lake-ICE Science Steering Committee that all data will be considered public domain after the end of the field experiment (February 1998) and that any use of the data will include either acknowledgment or co-authorship of the PI who collected the data. General community access to the data will be available through the JOSS Data Management System (CODIAC) [see section 3.1].



2.4 Distributed Archive

JOSS, project PIs and facilities will all have responsibility for long term access to the data. This document describes the datasets for which JOSS will have responsibility. JOSS will establish links via CODIAC to other archive locations to make access to the different archive centers as efficient as possible.



3.0 Data Management Functional Description and Strategy

It is important that the Lake-ICE data management strategy be responsive to the needs of the investigators, assuring that data are accurate and disseminated in a timely fashion. It is also important that the investigators know what is expected of them in this process. A time line of critical dates in the sequence of Lake-ICE data management tasks is included in Fig. 1. After a description of the JOSS data management system (CODIAC), each step in the Lake-ICE data management process is discussed.



3.1 JOSS CODIAC System

The CODIAC data management system is central to JOSS data-related services. CODIAC offers scientists access to research and operational data. It provides the means to identify datasets of interest, facilities to view data and associated metadata, and the ability to obtain data automatically via Internet file transfer or magnetic media. Users may browse data to preview selected datasets prior to retrieval. Data displays include time series plots for surface parameters, skew-T/log-P diagrams for soundings, and GIF images for model analysis and satellite imagery. Users can download data via the Internet directly to their workstation or personal computer, or request delivery of data on magnetic media. Data may be selected by time or location and can be converted to one of several formats before delivery. CODIAC automatically includes associated documentation concerning the data itself, processing steps, and quality control procedures.

Users may access CODIAC using a World Wide Web (WWW) browser such as Mosaic, Netscape, or Lynx.

WWW address (JOSS Home Page): http://www.joss.ucar.edu/

For questions on CODIAC contact JOSS at (303)497-8987 or e-mail to: codiac@ncar.ucar.edu



3.2 Investigator Requirements

The first step in organizing the data management support is to understand what data are anticipated from the various components of the program. JOSS has surveyed the participants and documented these items from the individual PIs. Section 4 provides results from the questionnaire with information on both research and operational datasets collected for the Lake-ICE/SNOWBAND project. The Lake-ICE Science Steering Committee has agreed that tasks associated with Lake-ICE/SNOWBAND data acquisition (e.g. in-field record keeping, backing up field data, data documentation [for catalog purposes], provision of data to data processing locations, and processing of raw data into geophysical parameters) will be performed by the participating PIs. The PIs will be requested to document datasets in accordance with JOSS documentation guidelines so that the data (and associated metadata) can be included in the Lake-ICE/SNOWBAND on-line catalog and in CODIAC.



3.2.1 Data Format Convention

It is recognized that initial field datasets produced by investigators' instrumentation may vary in format and completeness (World Meteorological Organization (WMO) level I and IIA data). Processed data should end up in a common format whenever possible or practical, accessible by all Lake-ICE/SNOWBAND investigators and eventually the larger scientific community. A standard format is particularly important in the Lake-ICE/SNOWBAND projects, where different instruments must be compared in the same space/time environment. For the same reason, all observations should be documented in UTC time (YYMMDDHHmmss.ss, where YY = year, MM = month, DD = calendar day, HH = UTC hour, mm = minute and ss.ss = seconds and hundredths, as appropriate) rather than in diverse time references.

The following format standards are being proposed by JOSS for use by Lake-ICE and SNOWBAND Principal Investigators. These standards would apply to all datasets. This format will maximize flexibility in the use of a variety of analysis and display tools (spreadsheets, graphics engines, etc.).

The Lake-ICE Science Steering Committee has yet to endorse particular format standards for data delivered to the Lake-ICE archive (WMO Level II or higher). Several formats (NetCDF, HDF, ASCII, etc.) will need to be used and available because of established facility procedures and investigator requirements. The following format conventions are recommended whenever possible.

Lake-ICE/SNOWBAND Dataset Format Structure

HEADER

    First row contains column names (date, time, temp, wnd_spd)
    Second row contains column units (UTC, HH:MM, C, m/s)

DATA

    Date/time block: YYMMDDHHmmss.ss

        YY = year            MM = month (01-12)     DD = day (01-31)
        HH = hour (00-23)    mm = minute (00-59)    ss.ss = seconds and hundredths (as applicable)

    Columns may be tab, space or comma delimited

    Record layout, either:

        Date/time block, parameter1, parameter2, etc.
        Date/time start, date/time stop, parameter1, parameter2, etc.
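
As an illustration of the format structure above, the following Python sketch reads a small file with a names row, a units row, and data rows that begin with a single YYMMDDHHmmss.ss date/time block. It is only a sketch under stated assumptions: the sample values are invented, the function names (split_row, parse_timestamp, parse_dataset) are hypothetical, the first column is assumed to hold the combined date/time block, and all other columns are assumed to be numeric. It is not a JOSS tool.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical sample; the values are invented for illustration only.
SAMPLE = """\
time temp wnd_spd
UTC C m/s
971201123000.00 -5.2 7.1
971201123500.50 -5.4 6.8
"""


def split_row(line):
    """Split a row on tabs, spaces or commas (empty fields are not supported)."""
    return [field for field in re.split(r"[,\s]+", line.strip()) if field]


def parse_timestamp(stamp):
    """Convert a YYMMDDHHmmss[.ss] block into a timezone-aware UTC datetime."""
    whole, _, frac = stamp.partition(".")
    # Python's %y maps two-digit years 69-99 to 1969-1999, so "97" is 1997.
    moment = datetime.strptime(whole, "%y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    if frac:
        moment += timedelta(seconds=float("0." + frac))
    return moment


def parse_dataset(text):
    """Return (column names, column units, list of record dicts)."""
    lines = [line for line in text.splitlines() if line.strip()]
    names = split_row(lines[0])   # first row: column names
    units = split_row(lines[1])   # second row: column units
    records = []
    for line in lines[2:]:
        fields = split_row(line)
        # Assumption: the first column is the combined date/time block.
        record = {names[0]: parse_timestamp(fields[0])}
        for name, value in zip(names[1:], fields[1:]):
            record[name] = float(value)
        records.append(record)
    return names, units, records


names, units, records = parse_dataset(SAMPLE)
for record in records:
    print(record["time"].isoformat(), record["temp"], record["wnd_spd"])
```

Because the splitter treats any run of tabs, spaces or commas as one delimiter, the same routine accepts all three delimiter styles allowed above, at the cost of not supporting empty fields. A start/stop layout would need a second call to parse_timestamp for the second column.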

JOSS will work with the investigators to implement these format standards for data submitted to the archive and specify data formats for data delivered by the archive. A critical initial understanding of the diversity of Lake-ICE/SNOWBAND formats will be possible as investigators complete the data management questionnaire. It is important to understand any format conventions prior to data collection. There may be certain situations where conversion to alternate formats must occur after the data are received at the archive location and prior to dissemination.



3.2.2 Dataset Documentation

The importance of providing complete and separate documentation (a "read me" file) with every Lake-ICE/SNOWBAND dataset, regardless of format, cannot be overemphasized. Such documentation is critical for the long-term viability of the comprehensive data base, and it is the easiest way to convey to every future user of a dataset the important details that might otherwise be forgotten in years to come. A complete documentation file, with several important components, should accompany each dataset.

3.2.3 Dataset Submission Procedures

JOSS has encouraged and will continue to encourage all participating investigators to provide sample datasets at any time prior to receipt of full preliminary datasets. Receiving these samples allows JOSS to continue customizing the data management system for Lake-ICE/SNOWBAND. We hope that all data can be provided in the formats discussed above but will appreciate having samples even if they are in a different format. Data can be provided by mail on disk or via FTP transfer. FTP transfers should be limited to less than 40 Mb. Please contact JOSS to make arrangements for larger transfers.

The mailing address is:
UCAR/JOSS
Attn: Jim Moore
P.O. Box 3000
Boulder, Colorado 80307
For FTP transfers:
ftp ftp.eol.ucar.edu
User Name: anonymous
Password: your e-mail address
cd pub/incoming/lakeice
bin (switch to binary)
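
The manual FTP steps above can also be scripted. The following Python sketch uses the standard ftplib module with the host, login and directory given above. The function names (connect_anonymous, upload_dataset), the example e-mail address and file name are hypothetical, and reading "40 Mb" as 40 megabytes is an assumption.

```python
import os
from ftplib import FTP

HOST = "ftp.eol.ucar.edu"             # host given in the instructions above
REMOTE_DIR = "pub/incoming/lakeice"   # directory given in the instructions above
MAX_BYTES = 40 * 1024 * 1024          # assumption: "40 Mb" read as 40 megabytes


def connect_anonymous(email, host=HOST):
    """Open an anonymous FTP session, using the e-mail address as password."""
    ftp = FTP(host)
    ftp.login(user="anonymous", passwd=email)
    return ftp


def upload_dataset(ftp, local_path, remote_dir=REMOTE_DIR, max_bytes=MAX_BYTES):
    """Upload one file in binary mode; refuse files at or above the size limit."""
    size = os.path.getsize(local_path)
    if size >= max_bytes:
        raise ValueError(
            f"{local_path} is {size} bytes; contact JOSS to arrange larger transfers"
        )
    ftp.cwd(remote_dir)
    with open(local_path, "rb") as handle:
        # storbinary transfers in binary (image) mode, the equivalent of "bin".
        ftp.storbinary("STOR " + os.path.basename(local_path), handle)
    return size


# Example use (not executed here; the address and file name are hypothetical):
# ftp = connect_anonymous("pi@example.edu")
# upload_dataset(ftp, "sample_sounding.dat")
# ftp.quit()
```

Passing the session object into upload_dataset keeps the size check and the transfer logic separate from the network connection, so several sample files can be sent over one login.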


3.3 Lake-ICE/SNOWBAND Data Collection During the Field Phase

Data collection will begin on 20 November 1997 and conclude on approximately 25 January 1998. JOSS will have primary responsibility for coordinating receipt of research-quality operational datasets during this period. Other data and metadata for complete documentation of field season activities (i.e. status reports and mission summaries) will be collected from the PIs by JOSS as operations dictate. All this information, along with selected research and operational datasets, will be entered into the on-line catalog on a near real-time basis. Further details of the make-up of these datasets are available in Section 4.0.



3.3.1 On-line Catalog

JOSS will develop and maintain an on-line data catalog that will be functional during the Lake-ICE/SNOWBAND field phase. The catalog will be implemented using a WWW interface and will be accessible by all participants during and after the field phase. Data collection information about both operational and research datasets (including metadata and overview documentation) will be entered on the system in near real-time beginning 27 November 1997. The catalog will permit data entry (data collection details, field summary notes, certain operational data etc.), data browsing (listings, plots) and limited catalog information distribution. Daily summaries will be prepared and contain information regarding operations (aircraft flight times, major instrument systems sampling times, etc.). These summaries will be entered into the on-line catalog either electronically or manually. It is important and desirable for the PIs to contribute graphics (e.g. plots in GIF or Postscript format) and/or data for retention on the catalog whenever possible. Updates of the status of data collection and instrumentation (on a daily basis or more often depending on the platforms) will be available. Input requirements for the on-line catalog used during the field season for status updating are discussed in Section 4.2.



3.4 Data Processing after the Field Season

It is important that all Lake-ICE PIs concentrate on post field season data processing activities to assure timely availability of datasets to all investigators. The PIs will have complete responsibility for the processing and delivery of their data to the various archive locations at the conclusion of the field phase. All operational data will be staged to CODIAC and freely accessible by the community as soon as possible after the field season.

The impact of timely receipt of the data on further steps in the data processing scheme is summarized with the time line in Fig. 1. The "preliminary" data will be in "native" resolution and format (though ASCII would be preferred), that is, in the format and resolution the PI produces in their initial data processing. It is hoped that most preliminary research and all operational datasets will become available within 9-10 months of the end of the field program. Between the field phase and the time the PI submits data to the archive, each PI will be individually responsible for the distribution and support of their datasets.



3.5 Data Archival and Long-term Access

The Lake-ICE operational data sets will be archived and distributed through CODIAC. This archive will contain all operational data that will eventually be accessible by the general community. JOSS has the responsibility for getting operational datasets into a long term archive. Again, the CODIAC system (section 3.1) will be used for access, browse, and distribution of the data. As directed by the Lake-ICE Science Team, research datasets will be available, on a restricted basis, as PIs provide processed data. As shown in Fig. 1, JOSS intends to have all data for which it has responsibility on the CODIAC system and accessible to all Lake-ICE PIs within 6 to 12 months of the completion of the field phase of the project.



4.0 Lake-ICE/SNOWBAND Datasets



4.1 Operational Data Collection and Processing

The following high resolution datasets will be collected during the field project (Table 1 lists data resolution and sources for these datasets). Some of this information will be made available in real time and all will be accessed via CODIAC after the end of the field season.

1. 5-minute ASOS and AWOS data from commissioned and non-commissioned sites, and 20-minute AWOS data from Federal and non-Federal sites, within the domain specified by the PIs. An estimated 320 ASOS and 220 AWOS stations will be included in the dataset. The area of coverage is 37-55 degrees north and 74-100 degrees west. Data will be collected only during IOP periods pre-selected by the PIs.

2. Canadian surface station data available via the GTS in the region north and east of the Great Lakes. There are estimated to be 30-40 stations in this domain.

3. National 6-second vertical resolution upper air dataset. These data will be retrieved from NCDC after the fact. There are 70 stations taking observations twice per day. This results in a total of 9800 launches plus another 500-600 project research soundings.

4. A regional high resolution satellite sector that covers the Lake-ICE/SNOWBAND domain. Original digital satellite files (McIDAS AREA format) will be archived by SSEC during the project. These data will then be transferred to JOSS after completion of the project for archival. The domain is similar to that of the surface data.

5. National Profiler Network data will be retrieved for the region of interest through established links at NCDC. Data are available in 6-minute and hourly time resolution.

6. ACARS data will be collected by JOSS from the NOAA Forecast Systems Laboratory and available to the project for the analysis phase. It will include all aircraft that produce ACARS data at any altitude within the region of interest.

7. WSR-88D archive level II data will be made available through NCDC. JOSS is working with NCDC to set up a procedure for requesting a single copy of project related data from the PIs and then redistributing the tapes to all interested project investigators.

8. NIDS products from WSR-88D radars in the western Great Lakes region will be collected at JOSS and at the University of Illinois.
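
The surface data domain in item 1 (37-55 degrees north, 74-100 degrees west) amounts to a latitude/longitude box. The following Python sketch shows one way a station list could be screened against that box. The station identifiers and coordinates are invented, the names Station and in_domain are hypothetical, and treating the box edges as inclusive is an assumption.

```python
from typing import NamedTuple


class Station(NamedTuple):
    station_id: str
    lat: float  # degrees north
    lon: float  # degrees east (west longitudes are negative)


# Domain limits from item 1 above, with west longitudes written as negatives.
LAT_MIN, LAT_MAX = 37.0, 55.0
LON_MIN, LON_MAX = -100.0, -74.0


def in_domain(station):
    """True if the station lies inside the box (edges assumed inclusive)."""
    return (LAT_MIN <= station.lat <= LAT_MAX
            and LON_MIN <= station.lon <= LON_MAX)


# Hypothetical stations; identifiers and coordinates are illustrative only.
stations = [
    Station("SITE_A", 42.2, -83.7),    # inside the box
    Station("SITE_B", 39.9, -104.7),   # too far west
    Station("SITE_C", 25.8, -80.3),    # too far south
]

selected = [s for s in stations if in_domain(s)]
print([s.station_id for s in selected])
```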



There are two major post-processing efforts that have been requested for Lake-ICE. These efforts will result in composite datasets that reduce duplication of effort by the investigators during the analysis phase. The first composite involves the merging and quality control of surface station data from the US and Canada into a surface composite dataset. It will consist of approximately 1000 stations and include 5 min, 20 min and hourly data from ASOS, AWOS and Canadian stations. It will also include a number of stations currently being retrieved for the GCIP project. The dataset will cover all selected IOP periods encompassing the Lake-ICE field phase. Three separate datasets will result at 5, 20 and 60 minute time resolution. Documentation will be provided, including station lists and quality control flag definitions.

The second composite will combine data from the project sounding sites (9) and the national upper air network (about 70 stations in the contiguous U.S.) for the 70-day period encompassing the project. Data will be reformatted into the JOSS QCF ASCII format and automatic quality control checks will be run. The national upper air data will require format conversion as well as the generation of wind information. Documentation will be provided, including a site information listing and a map.
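
The sounding counts quoted in item 3 of the list above follow directly from the network size, the launch frequency and the 70-day period. The short Python sketch below simply reproduces that arithmetic; the variable names are illustrative.

```python
# Figures taken from the text: 70 network stations, twice-daily launches,
# a 70-day period, and 500-600 additional project research soundings.
NETWORK_STATIONS = 70
LAUNCHES_PER_DAY = 2
PROJECT_DAYS = 70
RESEARCH_SOUNDINGS = (500, 600)

network_launches = NETWORK_STATIONS * LAUNCHES_PER_DAY * PROJECT_DAYS
total_low = network_launches + RESEARCH_SOUNDINGS[0]
total_high = network_launches + RESEARCH_SOUNDINGS[1]

print(f"network launches: {network_launches}")
print(f"composite size: {total_low}-{total_high} soundings")
```

This gives 9800 network launches, matching the figure in item 3, and a composite of roughly 10300-10400 soundings once the research soundings are added.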

All data collected by JOSS, the composites described above, the NIDS data collected by the University of Illinois, and the satellite data collected by SSEC will be staged on CODIAC and made available via the JOSS data management system. Data will be accessible by date, station or location to meet PI requirements.





Table 1: High Resolution Operational Datasets collected by JOSS during the Lake-ICE/SNOWBAND Project

| Dataset | Data Resolution | Project | Data Source |
|---|---|---|---|
| Composite Rawinsonde Data - Entire USA plus field operations sites | All soundings launched between 12/1/97 - 1/26/98; Vertical resolution - 6 sec or better | Lake-ICE, SNOWBAND | JOSS - ingest, NCAR/SSSF, PSU |
| Network Profiler Data | Winds, Control, Surface, and Moments Data; Time: 6 min and 1 hr | Lake-ICE, SNOWBAND | NCDC |
| ACARS | All Data between 12/1/97 - 1/26/98 | Lake-ICE, SNOWBAND | JOSS - ingest |
| ASOS/AWOS | Spatial: WI, MI, IL, IN, IA, MN, MO, OH, PA, NY; Time: 5 min/20 min | Lake-ICE, SNOWBAND | JOSS - ingest |
| NEXRAD level II Archive | Radars: MKX, GRB, LOT, GRR, MQT, APX, ILX, DVN, ARX, DTX, IWX, DMX, MPX, DLH, CLE; Time: All data during IOP events | Lake-ICE, SNOWBAND | NCDC |
| NIDS products | Products: Regional composites, lowest 4 elevations dz, ve for all radars mentioned above plus individual VADs, echo tops | Lake-ICE, SNOWBAND | JOSS - ingest, U. of Illinois |
| GOES8 Regional VIS | Spatial: 1 km; Time: 15 min | Lake-ICE, SNOWBAND | SSEC |
| GOES8 Regional IR | Spatial: 4 km | Lake-ICE, SNOWBAND | SSEC |
| GOES8 Regional WV | Spatial: 8 km | Lake-ICE, SNOWBAND | SSEC |
| Supplemental midwest data from GCIP project | Variable; TBD based on PI needs | Lake-ICE, SNOWBAND | JOSS and others? |
| Canadian RAOB and Sfc data | 6 sec RAOBS; Hourly - Sfc | Lake-ICE, SNOWBAND | JOSS |




4.2 Research Data Collection

Table 2 describes research datasets that are being collected for the Lake-ICE/SNOWBAND project. This table was constructed from PI responses to a questionnaire circulated by JOSS along with other information regarding NCAR operations during the project.



Table 2: High Resolution Research Datasets being collected (PI/facility responsible) during the Lake-ICE/SNOWBAND Project

| Instrument | Parameters measured | Field Format | Archive Format | Real-Time products | PI Responsible |
|---|---|---|---|---|---|
| NCAR Electra in-situ sensor suite | Hundreds, see RAF Electra Bulletin | netCDF? | NetCDF? | Selected Time-Series | RAF, George Young |
| ELDORA | reflectivity, velocity, ? | ? | ? | ? | RAF |
| UW King Air | ? | ? | ? | Selected Time-series? | UWyo |
| NCAR ISS | Temperature, dewpoint, wind profiles | CLASS sounding format, netCDF | CLASS sounding format, netCDF | skewT, time-height x-sections of wind, temperature | ATD |
| NCAR CLASS systems in Canada | Temperature, dewpoint, wind profiles | CLASS sounding format | CLASS sounding format | skewT | ATD |
| RADARSAT SAR Imager | Radar backscatter from Lake Michigan | ASCII | ASCII | SAR Images of Lake Michigan | Pierre Mourad |
| TSI 3068 Aerosol Electrometer | Current | binary | ASCII | --- | Cindy Twohy? |
| CSU Continuous Flow Diffusion Chamber System | Ice nucleating aerosols and total aerosol (CN) | ? | ? | --- | Dave Rogers |
| PSU Cloud Observing System | Reflectivity, cloud-base, temperature, dewpoint, wind profiles | ? | ? | skewT, reflectivity profiles | Hans Verlinde |
| Pump | water vapor sampling | --- | --- | --- | Elen Cutrim |