ARM Archive User Interface

 

Overview

The ARM Archive contains more than 2,200,000 user-accessible data files formatted in more than 2,000 types of data streams. (The total data volume of the Archive is more than 6,500,000 files and 38 terabytes.) The user interface for the Archive is designed to help users identify the specific ARM data files needed for a request without going through numerous, very long lists of obscure filenames. The magnitude of the ARM data collection requires that data be stored in a Mass Storage System (MSS: a collection of computers and automated tape libraries containing thousands of tape cartridges). Because the data files are not 'on-line', the user interface processes 'directory' information from an on-line database to identify the availability of data files. A schematic of the Archive can be seen here. Secondary processing by the Archive computers copies requested data files from the MSS to an accessible FTP site. Users are notified by e-mail when all requested files are available at the FTP site. Access to the data files is complete when the user has copied the files via FTP to their own system. Processing of requests greater than 25 GB (~10,000 files) is suspended until the Archive staff confirm the availability of online storage.

The following sections provide additional information on:

  1. The computing capabilities needed to access the Archive and use ARM data
  2. A logical overview of the Archive User Interface
  3. User Interface Choices

 

Presumed Computing Capabilities By Archive Interface Users

Users of the ARM Archive Interface and retrieved data files are presumed to have the following computing capabilities:

 

Logical Flow of The User Interface

The logic of the user interface includes the following steps:

  1. Log in to the interface
    • This step enables the interface to track your request specifications and notify you when your files are retrieved.
    • Specify your username or email address, if you have previously registered
    • If you are a new user, register a username
      • We need to know an e-mail address for notification of successful file retrievals.
      • Your name, address, and phone number provide important information for contacting you and characterizing the ARM data user community.
  2. Review request status or specify new request
  3. Select Interface type
    • Data Browser Interface
      • Specify files to be requested with exact specifications for site, date range, instrument or measurement type, and facility.
    • Catalog Interface
      • Browse tables of data availability summarized by location, year, instrument type, etc. and select data in monthly increments.
    • Statistical Browser
      • Browse a series of drill-down statistical graphs for showcase datasets with the option to extract more statistical information or order ARM data files.
    • IOP Data Browser
      • Review Intensive Operational Period (IOP) data stored in an online, documented directory tree and download files individually or build collections of files as a TAR file.
  4. Select ARM Data
    • Enter query specifications in data browser interface
    • Select entries from the catalog interface
    • Download or "check" items in IOP data browser
  5. Review data selection results and submit retrieval request
    • Each interface displays an estimate of the number of files and bytes contained in the request
  6. Specify additional requests or log off the interface
    • This is the end of an interactive session with the user interface
    • Users are notified by e-mail when the requested files are accessible from online storage.
  7. A secondary computer program supervises the copying of the requested files from the Mass Storage System to the user accessible FTP storage.
    • Requests greater than 25 GB (sum of file sizes) or 10,000 files are suspended until the availability of FTP storage is confirmed by Archive staff.
    • The time required to complete the retrieval of files from the MSS depends on:
      • The number of files requested (e.g., >5000 files may require a few hours to complete)
      • The number of other requests pending in the retrieval 'queue'.
  8. Review data notifications
    • Description of data quality report system
    • Request for credit and publications
  9. Use FTP to download data files (follow link in notification message)
    • Connect to ftp.archive.arm.gov
    • Enter username: armguest
    • Enter email address as FTP password
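
The FTP step above can also be scripted. The following is a minimal sketch using Python's standard `ftplib`, with the host name and `armguest` username taken from the steps above. The function names, the `remote_dir` argument (the path given in your notification message), and the local directory name are illustrative assumptions, not part of any Archive-provided tooling. The sketch also assumes the remote directory contains only plain files.

```python
from ftplib import FTP
from pathlib import Path

ARCHIVE_HOST = "ftp.archive.arm.gov"  # host named in the steps above
ARCHIVE_USER = "armguest"             # username named in the steps above


def download_all(ftp, remote_dir, local_dir):
    """Copy every file in remote_dir to local_dir.

    `ftp` is an already logged-in ftplib.FTP object (or any object
    offering the same cwd/nlst/retrbinary methods).
    """
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    ftp.cwd(remote_dir)
    saved = []
    for name in ftp.nlst():
        target = local_dir / name
        with open(target, "wb") as handle:
            # Binary mode avoids corrupting netCDF and other binary files.
            ftp.retrbinary(f"RETR {name}", handle.write)
        saved.append(target)
    return saved


def fetch_request(email, remote_dir, local_dir="arm_data"):
    """Log in with the e-mail address as password and download a request."""
    with FTP(ARCHIVE_HOST) as ftp:
        ftp.login(user=ARCHIVE_USER, passwd=email)
        return download_all(ftp, remote_dir, local_dir)
```

Separating `download_all` from the login step keeps the transfer logic usable with any connection object, which also makes it easy to try out without contacting the server.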

 

User Interface Choices

The Archive provides five online user interfaces for specifying the files that a data user needs to access. The user interfaces accomplish the same function (facilitating user access to the data files) but offer complementary ways of finding the files that you want among the 5,000,000+ files stored in the Archive. Summary descriptions of the user interfaces are:

More information about these interfaces is provided in the sections below. Requests for assistance with data orders can also be sent to Archive User Services (email: armarchive@ornl.gov or phone 1-888-ARM-DATA or 1-865-241-4851).

 

Data Discovery

The identification of the requested data files is determined from a query to an online database representing the 'directory' of available files. Requested files are typically identified from queries related to site, time, instrument or measurement or data stream, and facility. Besides ordering files, users can view data quality information (such as Data Quality Report, Data Quality Color Calendar, Quick Looks) for the selected data streams and date ranges. The queries for user-defined selections of files are based on the following three logical pathways.

  1. Novice Interface (Show Figure)
    • Site:
      • Data must be selected from one geographic site per request
    • Date Range:
      • Starting and ending dates for the query must be specified
      • These are explicitly specified by the user by way of month, day, and year selections for each date criterion.
    • Search Path:
      • Instruments or Measurements
    • Instruments or Measurements Category:
      • One or more categories can be selected
    • Instruments or Measurements:
      • One or more Instruments or Measurements can be selected within the selected category
    • Facilities:
      • A list of facilities is displayed based on the previous selection criteria (specific to site, date range, category, and instruments or measurements)
      • One or more facilities can be selected from all the available facilities
    • Files to Order:
      • A list of files is displayed based on the selected search criteria
  2. Datastream Interface (Show Figure)

    The Datastream Interface is equivalent to the data stream options found in the previous Power User application.

    • Site:
      • Data must be selected from one geographic site per request
    • Date Range:
      • Starting and ending dates for the query must be specified
      • These are explicitly specified by the user by way of month, day, and year selections for each date criterion.
    • Data Streams:
      • A list of available data streams based on the selected site and date range is displayed
    • Files to Order:
      • A list of files is displayed based on the selected search criteria
Additional information about these query options is provided in the table below. For a step-by-step tutorial on using the Data Browser, click here.
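
The query logic of the two pathways above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `FileRecord` fields and both function names are hypothetical stand-ins for the online 'directory' database, whose actual schema is not described here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FileRecord:
    """One row of a hypothetical file 'directory' table."""
    name: str
    site: str
    datastream: str
    facility: str
    day: date


def find_files(records, site, start, end, datastreams=None, facilities=None):
    """Return records for one site and an inclusive date range,
    optionally narrowed to the given data streams and facilities."""
    matches = []
    for rec in records:
        if rec.site != site or not (start <= rec.day <= end):
            continue
        if datastreams and rec.datastream not in datastreams:
            continue
        if facilities and rec.facility not in facilities:
            continue
        matches.append(rec)
    return matches


def available_datastreams(records, site, start, end):
    """Data streams with at least one file for the site and date range."""
    return sorted({r.datastream for r in find_files(records, site, start, end)})
```

Here `available_datastreams` mirrors the "Data Streams" list, which is built from the selected site and date range, while `find_files` with all criteria supplied corresponds to the final "Files to Order" list.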

 

Storage Location: Delivery Directories

Data files requested by Standing Orders begin accumulating in temporary storage very shortly (less than one day) after the order is placed. You will receive a notification (one e-mail message) at the end of each delivery period of each order. This notification will describe the delivery period, the number of files included in the delivery ("null" if no files are available), and a list of file names.

The notification will describe the location of the data files on ftp.archive.arm.gov. (This system is an FTP server that can be accessed in anonymous mode with username armguest and your e-mail as the password.) The notification will describe the data files as stored under a path of:

Storage Location: Holding Directories (For The Impatient)

The data files for the entire collection of standing orders are actually stored in a central data structure on the FTP server. If you need access to data files that have not yet been delivered (e.g., part way through a week), you may access the temporary copy with the following steps:

  1. FTP to ftp.archive.arm.gov as username: anonymous, password: your e-mail address
  2. cd to the standing_orders directory
  3. cd to the sub-directory by data stream name (e.g., sgp1smosE13.a1)
  4. Transfer files (ftp get command) as needed
  5. NOTE: This area contains file names with version numbers and all versions of the data files. The delivery directories only contain the highest version of the data files and are named without version numbers.

 

Novice Interface

Query Option: Instrument
    • Type of Logic: Indirect; Background filtering of the potential data stream list from user selected criteria for site, date range, instrument categories, instruments, facilities and highest data level
    • User Efficiency:
      • High: when searching for data from specific instruments
      • Low: when selecting data for a diversity of instruments
    • User Actions: Selects site, date range, instrument categories, instruments, and facilities
    • Limitations: Lengthy list of instrument names; Presumes knowledge of the instrument's measurement capabilities

Query Option: Measurement
    • Type of Logic: Secondary, Indirect; Background filtering of the potential data stream list from user selected criteria for site, date range, measurement categories, measurements, facilities and highest data level
    • User Efficiency:
      • High: when searching for many possible variations of a measurement type
      • Low: when searching for diverse set of unrelated measurements
    • User Actions: Selects site, date range, measurement categories, measurements, and facilities
    • Limitations: Lengthy list of measurement names; Availability of measurements is confounded by site, date, facility, and data level criteria

Datastream Interface

Query Option: Data Streams
    • Type of Logic: Indirect; Background filtering of the potential data stream list from user selected criteria for site, date range
    • User Efficiency:
      • High: when searching for a few specific data stream types
      • Low: when selecting a diversity of data stream types
    • User Actions: Selects site, date range, data stream names
    • Limitations: Presumes a working knowledge of ARM data stream name codes; Requires scrolling a VERY long list of data stream names
 

Catalog Interface

The catalog-based user interface presents, in an interactive sequence of tables, a hierarchical summary of available data files (see Figure 1) organized in a way that will be useful to the inexperienced as well as the expert Archive user. In addition to leading the user to specify a subset of data, the catalog is intended to display the availability of the data. The availability of data is irregular in time and space because of incremental changes in the installation and operation of the field sites (points of data generation). Each table cell shows the quantity of available data (number of files) for the criteria that the cell represents. Criteria combinations for which data are available contain cell values greater than zero and are linked to the next subset levels. Combinations containing no data display zero and are not linked.

The navigation catalog metadata is combined with a "Data Cart" concept for collecting file sets of particular interest. At any level, the user may view the contents of the data cart, remove file sets from the data cart, or submit the list for retrieval from the Archive.

Description of The Interface

The ARM Archive catalog interface consists of two major components: 1) a catalog of available data files organized in a four level hierarchy, and 2) a data cart collection scheme that allows the user to store, edit and display a list of selected file sets. The interface programs display a sequence of linked HTML tables that allow the user to move through the various catalog levels, converging to desired sets of files. The hierarchy includes links to tables for increasingly narrow subsets of the data collection (see Figure 1). Selecting a value in each table leads to a table showing more detail in the next step. After the fourth step, data may be selected for addition to the Data Cart. The section below describes the user interface at each of these levels.
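
The cell logic of the first catalog level (file counts by site and year, with links only on nonzero cells) can be illustrated with a short sketch. The function names, the `(site, year)` input format, and the link strings are assumptions made for illustration; they do not describe the Archive's actual programs.

```python
from collections import Counter


def site_year_table(files, sites, years):
    """Build a top-level catalog table of file counts by site and year.

    `files` is an iterable of (site, year) pairs, one pair per file.
    Every site/year combination gets a cell; combinations with no
    files show a count of 0.
    """
    counts = Counter(files)
    return {site: {year: counts[(site, year)] for year in years} for site in sites}


def render_cell(count, link):
    """Cells with data link to the next level; zero cells are plain text."""
    if count > 0:
        return f'<a href="{link}">{count}</a>'
    return "0"
```

The same counting step could be repeated at each deeper level (instrument category and facility type, instrument and data level, facility and month) on the subset of files selected so far.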

Instructions

For a step-by-step tutorial on using the Catalog Browser Interface, click here. The steps are:

  1. Selecting The Site and Year

    Following a login screen, the top level of the interface presents the number of files available in the Archive grouped by site and year (Figure 2). The user selects a site and year by clicking on the corresponding number of files in the table, assuming the number is nonzero.

  2. Selecting The Instrument Category and Facility Type

    This selection takes the user to the second level (Figure 3), which displays all instrument categories and types of facilities from which ARM data were collected for the site and year chosen on the previous page. From this level an instrument category and facility type are chosen by clicking on the number of files in the appropriate cell of the table. Alternatively, the user may return to level 1 (to change the previous selection) by clicking on "Year" or "Site" at the top of the page.

  3. Selecting The Instrument and Data Level

    The third level (Figure 4) lists the number of available files by instrument code and data level, for the previously selected combination of site, year, instrument category and facility type. The data level reflects the amount of processing done on raw data. Instrument and data level codes are briefly described below the table. Again, options are available to return to levels 1 or 2 via links at the top of the page.

  4. Selecting The Facility and Month

    The final level (Figure 5) in the hierarchy of metadata attributes allows the user to select file sets by facility and month, or return to one of the previous three levels.

  5. Adding Files to The Data Cart

    After the selection of facility and month (by clicking on a nonzero number of files in the table), the user is presented with a summary of the selections (Figure 6) together with the number and total size of the data files. At this point the user may elect to add these files to the data cart, return to any of the previous interface levels to edit selections, or continue browsing. Adding the set of files to the data cart returns the user to the original catalog interface, with the selected data added to "Current Selections" (Figure 7). The user may then continue browsing and adding data selections to the Data Cart. Each time, the chosen datastream will be added under "Current Selections." To remove a data selection, highlight the selection and click "Remove Selected Streams."

  6. Ordering Selected Data Files

    When the user is satisfied with a collection of file sets, clicking "Proceed to Order" will bring up the selected data. The user may then choose "Select All", choose only certain files for ordering, or extract measurements (Figure 8). Clicking "Order Files" will submit the user's request for the selected files. An "Order Confirmation" will then be displayed (Figure 9).

Summary and Discussion

The catalog interface enables the ARM researcher to efficiently identify files of interest, determine the existence of data, and collect sets of data prior to submitting a retrieval request. Important aspects of the system described here include the assignment of descriptive instrument categories and the dynamic explanation of instrument codes. Collection of data sets is currently done at the facility/month level. The collection (data cart) may be listed and edited from any level.

 
 

Statistical Browser

 

Background and Description

The Statistical Browser (also referred to as "statistical views") currently consists of pre-computed products for nested time ranges (whole period of record, annual, seasonal, and monthly, as appropriate). For each time range and measurement, a variety of simple statistics are computed. Graphs of the statistical distribution of measurements (e.g., histograms) are also linked to the actual statistics displayed in the graphs. The graphs are available through a web-based interface. Users select a location and measurement and then drill down through time scales ranging from the full period of record to individual months. In addition to viewing graphs displayed by the user interface, users are able to extract the data behind the statistical graphs, obtain the measurements that were used in calculating the statistics, and order the ARM data files from which the measurements were obtained. This interface currently contains statistical views for showcase datasets.
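
The idea of pre-computing simple statistics for nested time ranges can be sketched as follows. This is an illustration only: the function names, the grouping keys, and the particular statistics chosen are assumptions, the seasonal level is omitted for brevity, and nothing here describes how the Archive's products are actually generated.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median
from typing import Iterable, Tuple


def summarize(values):
    """Simple statistics for one group of measurements."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }


def nested_statistics(observations: Iterable[Tuple[date, float]]):
    """Group (date, value) pairs by nested time ranges and summarize each.

    Keys of the result: ("all",) for the whole period of record,
    ("year", y) for each year, and ("month", y, m) for each month.
    """
    groups = defaultdict(list)
    for day, value in observations:
        groups[("all",)].append(value)
        groups[("year", day.year)].append(value)
        groups[("month", day.year, day.month)].append(value)
    return {key: summarize(vals) for key, vals in groups.items()}
```

Because every observation is filed under each enclosing range, drilling down from the full record to a year and then to a month is just a lookup with a more specific key.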

Instructions

Steps include:

  1. Begin by selecting an ARM site for which to view available statistical plots and summaries. Current ARM sites available in the Statistical Browser are: SGP, NSA, TWP, and HFE.
  2. Next, select a dataset. Current showcase datasets available are:
    • ARM Surface Radiation Data (qcrad1long)
    • Climate Modeling Best Estimate Data (CMBE)
    • Long-Term Continuous Forcing Data from Variational Analysis (CONSTRVARANA)
  3. Select a facility from the list of those available.
  4. Select a measurement from the list of those available. This will display the available plot types for the selected measurement.
  5. Select a plot type from the list of those available. Plot types will vary based on dataset and measurement, ranging from daily to monthly to seasonal plots. Users can mouse-over the image in parentheses to see a description of each plot type. Clicking on the image will bring up a sample of that particular plot type.
  6. Select a date range by entering start month/year and end month/year, and click on "Get Plots" to view the plots.
  7. The plots will be displayed below the interface in thumbnail form. Users may click on any thumbnail image to view the detailed data plot and utilize additional features for accessing the data. These features are:
    • Get Statistics: Available in Text, Excel, and XML formats. After choosing a format, the statistics will be displayed.
    • Get Data for the Selected Range: Available in Text, Compressed Text (gz), Excel, and NetCDF formats. Once the format is chosen, a "Download Data" screen will appear while the request is being processed. This may take a few seconds. When processing is complete, follow the URL given to download the extracted measurement data.
    • Get ARM Data Files: Select "Add to Cart" or "List Files" to order the ARM Data files chosen by the user.
Users must log in with their email address or Archive User ID to access these features. Some features are still under development. For a comprehensive list of known issues and future developments, click here.

 

IOP Data Browser

 

Background

Intensive Operational Periods (IOPs) generate data that are "non-routine" because they originate from extra or guest data sources. The data may also be "non-routine" because the instruments are operated with temporary, experimental (non-production) protocols. All of these exceptions from normal operations cause significant "clutter" in the metadata and logic used in the query and catalog interfaces. Constraining the structure of the IOP data to follow the simple logic required to successfully manage the 5,000,000+ ARM data files challenged the creativity of the ARM data managers and frustrated the IOP data generators (who are often guest collaborators with ARM and are not, and should not be, fully indoctrinated in ARM-specific data management practices). The IOP Data Browser is also used for storage and access of reference data sets (e.g., geographic overlays of states, rivers, etc. for satellite images) and special data (e.g., preliminary versions of VAP output).

The IOP Data Browser was designed to provide the following features:

 

Description

The IOP Data Browser contains a documented, online directory tree of IOP data. The IOP data are organized in a hierarchy of year / site / IOP / instrument - PI subdirectories. Additional subdirectories may be used within an IOP. Each subdirectory has a "readme" file to guide the user through that level's information. Data from IOPs may be downloaded as individual files by clicking on each file link. If the user needs to download large portions of IOP data (multiple files or subdirectories), a "check box system" (described in the outline below) can be used to select files and directories to be built into a single TAR file for download. The TAR file is created after the end of an IOP browsing session, and the user is notified by email when the TAR file is ready to download.
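
Building one TAR file from a set of checked files and directories can be sketched with Python's standard `tarfile` module. The function name and its arguments are hypothetical and only illustrate the concept; the Archive's own TAR-building process is not described here.

```python
import tarfile
from pathlib import Path


def build_collection(checked_paths, tar_path, root):
    """Bundle the checked files and directories into one TAR file.

    Member names are stored relative to `root`, so the archive keeps
    the layout of the directory tree. Checked directories are added
    together with everything beneath them.
    """
    root = Path(root)
    with tarfile.open(tar_path, "w") as tar:
        for path in checked_paths:
            path = Path(path)
            arcname = path.relative_to(root).as_posix()
            tar.add(str(path), arcname=arcname)  # recursive for directories
    return tar_path
```

Storing names relative to the tree root means the unpacked files land in the same year / site / IOP layout that the browser shows.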

The IOP Data Browser presents a 3 section display:

 

Access and Login to The IOP Data Browser

The IOP Data Browser can be accessed after a login to the Archive User Interface, or it can be accessed directly at http://iop.archive.arm.gov/arm-iop/. (The IOP Data Browser can also be accessed from links located throughout ARM IOP documentation; see the web page located under http://www.arm.gov/campaigns.) All attempts to access the IOP Data Browser will prompt for a web login requiring a username and password. The user should enter their Archive account name for BOTH the username and password. Although this login appears to be redundant, it enables the Archive to record user access to each file. The records of access are important for distributing notifications about future updates to IOP data and reporting statistics on the usage of IOP data.