Overview of ARM Archive User Interface
(2/5/98; revised 11/22/2004 Giri Palanisamy and Raymond A. McCord)
The ARM Archive contains more than 2,200,000 user-accessible data files formatted in more than 2000 types of data streams. (The total data volume of the Archive is more than 6,500,000 files and 38 terabytes.) The user interface for the Archive is designed to help identify the specific ARM data files that should be retrieved for a data user's request, without going through numerous, very long lists of obscure filenames. The magnitude of the ARM data collection requires that data be stored in a Mass Storage System (MSS: a collection of computers and automated tape libraries containing thousands of tape cartridges). Because the data files are not 'on-line', the user interface processes 'directory' information from an on-line database to identify the availability of data files. A schematic of the Archive can be seen at this link. Secondary processing by the Archive computers copies requested data files from the MSS to an accessible FTP site. Users are notified by e-mail when all requested files are available at the FTP site. Access to the data files is complete when the user has copied the files via FTP to their own system. Processing of requests greater than 600 MB is suspended until the Archive staff confirm the availability of online storage.
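The 600 MB rule above amounts to a simple check on the summed file sizes of a request. The following is a minimal sketch, not the Archive's actual code: the names `REVIEW_THRESHOLD_BYTES`, `RequestEstimate`, and `estimate_request` are hypothetical, and reading "600 MB" as 600 * 10**6 bytes is an assumption.

```python
from dataclasses import dataclass

# Assumption: "600 MB" is read as 600 * 10**6 bytes. The document does not
# say whether the Archive uses decimal or binary megabytes.
REVIEW_THRESHOLD_BYTES = 600 * 10**6


@dataclass
class RequestEstimate:
    """Hypothetical summary of a request: file count, size, review flag."""
    file_count: int
    total_bytes: int
    needs_staff_review: bool


def estimate_request(file_sizes):
    """Summarize a request from an iterable of file sizes in bytes."""
    sizes = list(file_sizes)
    total = sum(sizes)
    return RequestEstimate(
        file_count=len(sizes),
        total_bytes=total,
        # Only requests strictly greater than the threshold are held.
        needs_staff_review=total > REVIEW_THRESHOLD_BYTES,
    )
```

For example, `estimate_request([100_000_000] * 7)` describes a 700,000,000-byte request that would be held for staff confirmation, while five such files would not.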
The following sections provide additional information on:
- The computing capabilities needed to access the Archive and use ARM data
- A logical overview of the Archive User Interface
- User Interface choices
- Data Browser Interface
- Catalog Interface
- IOP Data Browser
Presumed computing capabilities by Archive interface users
Users of the ARM Archive interface and retrieved data files are presumed to have the following computing capabilities:
- A WWW browser (the interface is designed and tested for Netscape 4.0 or higher; other web browsers appear to be okay as well)
- required to view the user interface
- helpful for accessing ARM documentation (http://www.arm.gov/)
- An e-mail address (required for retrieval notifications)
- tools for Internet file transfer by way of FTP
- very large requests can be transmitted by tape (contact armarchive@ornl.gov for assistance)
- system acceptance of long filenames (ARM filenames range from ~20-64 characters)
- netCDF or HDF tools
- compilers [C or Fortran] for incorporating public domain subroutines into user written software or
- commercial applications for analyzing netCDF (e.g., IDL, MATLAB)
Logical Flow of the User Interface
The logic of the user interface includes the following steps:
- Login to interface
- This step enables the interface to track your request specifications and notify you when your files are retrieved.
- specify your username, if you have previously registered
- register a username, if you are a new user
- we need to know an e-mail address for notification of successful file retrievals.
- name, address, and phone number also provide important information for contacting you and characterizing the ARM data user community.
- Review request status or specify new request
- Select Interface type
- Data Browser Interface
- Specify files to be requested with exact specifications for site, date range, instrument or measurement type, and facility.
- Catalog Interface
- Browse tables of data availability summarized by location, year, instrument type, etc. and select data in monthly increments
- IOP Data Browser
- Review Intensive Operational Period (IOP) data stored in an online, documented directory tree and download files individually or build collections of files as a TAR file.
- Select ARM data
- enter query specifications in data browser interface
- select entries from the catalog interface
- download or "check" items in IOP data browser
- Review data selection results and submit retrieval request
- Each interface displays an estimate of the number of files and bytes contained in the request
- Review
- Specify additional requests or logoff the interface
- This is the end of an interactive session with the user interface
- Users are notified by e-mail when the requested files are accessible from online storage.
- A secondary computer program supervises the copying of the requested files from the Mass Storage System to the user accessible FTP storage.
- Requests greater than 600 MB (sum of file sizes) are suspended until the availability of FTP storage is confirmed by Archive staff.
- The time required to complete the retrieval of files from the MSS depends on:
- The number of files requested (e.g., 100-1000 files may require a few hours to complete; 1000-6000 files may require a few days)
- The number of other requests pending in the retrieval 'queue'.
- Review data notifications
- description of data quality report system
- request for credit and publications
- Use FTP to download data files (follow link in notification message)
- connect to ftp.archive.arm.gov
- enter username: armguest
- enter email address as FTP password
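The final FTP step can also be scripted. The sketch below uses Python's standard `ftplib` with the host and username listed above. The function name `download_request`, the `remote_dir` argument, and the `ftp_factory` parameter are hypothetical; the actual directory to use comes from the notification message. It assumes the directory contains only plain files.

```python
from ftplib import FTP
from pathlib import Path

ARCHIVE_FTP_HOST = "ftp.archive.arm.gov"  # host named in the steps above
ARCHIVE_FTP_USER = "armguest"             # username named in the steps above


def download_request(email, remote_dir, local_dir, ftp_factory=FTP):
    """Copy every file in remote_dir to local_dir and return the local paths.

    remote_dir is a placeholder: use the directory given in the notification
    message. ftp_factory exists only so the function can be exercised
    without a network connection.
    """
    target = Path(local_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    ftp = ftp_factory(ARCHIVE_FTP_HOST)
    try:
        # The e-mail address serves as the FTP password.
        ftp.login(user=ARCHIVE_FTP_USER, passwd=email)
        ftp.cwd(remote_dir)
        for name in ftp.nlst():
            destination = target / name
            with open(destination, "wb") as handle:
                ftp.retrbinary(f"RETR {name}", handle.write)
            saved.append(destination)
    finally:
        ftp.quit()
    return saved
```

Binary transfer is used for every file, which is the safe choice for netCDF and HDF data.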
User Interface choices
The Archive provides three online user interfaces for specifying the files that a data user needs to access. The interfaces accomplish the same function (facilitating user access to the data files) but support complementary approaches to finding the files that you want among the 1,000,000+ files stored in the Archive. Summary descriptions of the user interfaces are:
- Data Browser Interface
- Identifies available data files from exact specifications of site, date range, instrument or measurement type, facility, etc.
- The Data Browser Interface provides an overview of ARM data quality. It displays daily quality color (green, yellow, red) for user specified subsets of sites, facilities, measurements and date ranges.
- The Data Browser can also provide detailed information about Data Quality Reports and quick looks for user specified search criteria
- Catalog Interface
- Supports browsing of summary tables (by combinations of year, site, data source, etc.) about file availability and the specification of data requests in one month increments
- IOP Data Browser
- provides access to IOP data stored in an online, documented directory tree.
More information about these interfaces is provided in the sections below. Requests for assistance with data can also be submitted to the Archive User Services (email: armarchive@ornl.gov or phone 1-888-276-3282 or 1-865-241-4851).
Data Browser Interface
The requested data files are identified by a query to an online database representing the 'directory' of available files. Requested files are typically identified from queries related to site, time, instrument or measurement or data stream, and facility. Besides ordering files, users can view data quality information (such as Data Quality Report, Data Quality Color Calendar, Quick Looks) for the selected data streams and date ranges. The queries for user-defined selections of files are based on the following three logical pathways:
1) Novice User Interface (Show Figure):
- Site:
- data must be selected from one geographic site per request
- Date range:
- starting and ending dates for the query must be specified
- This is explicitly specified by the user by way of month, day, and year selections for each date criterion.
- Search Path:
- Instruments or Measurements
- Instruments or Measurements Category:
- One or more categories can be selected
- Instruments or Measurements:
- one or more Instruments or Measurements can be selected within the selected category
- Facilities:
- A list of facilities is displayed based on the previous selection criteria (specific to site, date range, category, and instruments or facilities)
- One or more facilities can be selected from all the available facilities
- Files to order:
- A list of files is displayed based on the selected search criteria
2) Power User Interface (Show Figure):
(power user interface is equivalent to the data streams options found in the previous Query Interface application)
- Site:
- data must be selected from one geographic site per request
- Date range:
- starting and ending dates for the query must be specified
- This is explicitly specified by the user by way of month, day, and year selections for each date criterion.
- Data Streams:
- A list of available data streams, based on the selected site and date range, is displayed
- Files to order:
- A list of files is displayed based on the selected search criteria
3) Measurement Interface (Show Figure):
- Site:
- data must be selected from one geographic site per request
- Date range:
- starting and ending dates for the query must be specified
- This is explicitly specified by the user by way of month, day, and year selections for each date criterion.
- Recommended Measurements:
- One or more measurements can be selected from the list of recommended measurements
- Data Streams:
- One or more data streams can be selected from the list of available data streams (data level wild cards)
- Facilities:
- One or more facilities can be selected from all the available facilities
- Files to order:
- A list of files is displayed based on the selected search criteria
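All three pathways narrow the same 'directory' of files by one site, a date range, and optional data stream and facility selections. The sketch below illustrates that filtering on an in-memory list. It is an illustration only: `FileRecord`, `find_files`, and all field names and sample values are hypothetical and do not reflect the Archive's actual database schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FileRecord:
    """One row of a hypothetical 'directory' table of available files."""
    filename: str
    site: str
    facility: str
    data_stream: str
    day: date


def find_files(records, site, start, end, data_streams=None, facilities=None):
    """Return records for one site within [start, end], optionally narrowed."""
    if start > end:
        raise ValueError("start date must not be after end date")
    matches = []
    for rec in records:
        # Exactly one site per request, and both dates are required.
        if rec.site != site or not (start <= rec.day <= end):
            continue
        if data_streams and rec.data_stream not in data_streams:
            continue
        if facilities and rec.facility not in facilities:
            continue
        matches.append(rec)
    return matches


# Placeholder data for illustration; not real ARM filenames or codes.
SAMPLE_DIRECTORY = [
    FileRecord("a1.cdf", "siteA", "F1", "streamX", date(2004, 1, 5)),
    FileRecord("a2.cdf", "siteA", "F2", "streamX", date(2004, 1, 6)),
    FileRecord("a3.cdf", "siteA", "F1", "streamY", date(2004, 2, 1)),
    FileRecord("b1.cdf", "siteB", "F1", "streamX", date(2004, 1, 5)),
]
```

Calling `find_files(SAMPLE_DIRECTORY, "siteA", date(2004, 1, 1), date(2004, 1, 31))` returns the two January records for `siteA`. Adding `facilities={"F1"}` narrows the result to the first record only.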
Additional information about these query options is provided in the table below.
| Query option | Type of logic | User efficiency | User actions | Limitations |
| Novice Interface: | ||||
| Instrument | indirect: background filtering of the potential data stream list from user selected criteria for site, date range, instrument categories, instruments, facilities and highest data level | high: when searching for data from specific instruments; low: when selecting data for a diversity of instruments | selects site, date range, instrument categories, instruments, and facilities | lengthy list of instrument names; presumes knowledge of the instrument's measurement capabilities |
| Measurement | secondary, indirect: background filtering of the potential data stream list from user selected criteria for site, date range, measurement categories, measurements, facilities and highest data level | high: when searching for many possible variations of a measurement type; low: when searching for a diverse set of unrelated measurements | selects site, date range, measurement categories, measurements, and facilities | lengthy list of measurement names; availability of measurements is confounded by site, date, facility, and data level criteria |
| Power Interface: | ||||
| Data streams | indirect: background filtering of the potential data stream list from user selected criteria for site, date range | high: when searching for a few specific data stream types; low: when selecting a diversity of data stream types | selects site, date range, data stream names | presumes a working knowledge of ARM data stream name codes; requires scrolling a VERY long list of data stream names |
| Measurement User Interface: | ||||
| Recommended Measurement | indirect: background filtering of the potential data stream list from user selected criteria for site, date range, recommended measurements, data streams, and facilities | high: when searching for many possible variations of a measurement type; low: when searching for a diverse set of unrelated measurements | selects site, date range, recommended measurements, data streams, and facilities | This work is in progress and for demonstration only. The recommendations are judgments made by ARM experts and focus primarily on common measurements with multiple sources. The list of recommendations is incomplete. |
Catalog Interface
The catalog-based user interface presents, in an interactive sequence of tables, a hierarchical summary of available data files (see Figure 1) organized in a way that will be useful to the inexperienced, as well as the expert, Archive user. In addition to leading the user to specify a subset of data, the catalog is also intended to display the availability of the data. The availability of data is irregular in time and space because of incremental changes in the installation and operation of the field sites (points of data generation). Each table cell value indicates the quantity of available data (number of files) within the criteria represented by that cell. Criteria combinations for which data are available contain cell values greater than 0 and are linked to the next subset levels. Combinations containing no data display '0' and are not linked.
Navigation of the catalog metadata is combined with a "shopping cart" concept for collecting file sets of particular interest. At any level, the user may view the contents of the shopping cart, remove file sets from the shopping cart, or submit the list for retrieval from the Archive.
Description of the Interface
The ARM Archive catalog interface consists of two major components: 1) a catalog of available data files organized in a four level hierarchy, and 2) a shopping cart collection scheme that allows the user to store, edit and display a list of selected file sets. The interface programs display a sequence of linked HTML tables that allow the user to move through the various catalog levels, converging to desired sets of files. The hierarchy includes links to tables for increasingly narrow subsets of the data collection (see Figure 1). An example sequence of linked interface tables is shown in Figure 2. Selecting a value in each table leads to a table showing more detail in the next step. In the fourth step, a small subset of data may be selected for addition to the Shopping Cart. Each interface screen contains information displaying the previous selection criteria and links to re-visit the earlier screens or the Shopping Cart (see Figure 3). This section describes the user interface at each of these levels.
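The catalog tables can be thought of as file counts grouped by two attributes, with each nonzero cell linking to a narrower table. The sketch below shows that idea under stated assumptions: the functions `count_by`, `render_cell`, and `drill_down` and the record field names are hypothetical, not the interface programs themselves.

```python
from collections import Counter


def count_by(records, row_key, col_key):
    """Count files per (row, column) cell, for example site by year."""
    return Counter((rec[row_key], rec[col_key]) for rec in records)


def render_cell(counts, row, col):
    """Return (text, is_link): only nonzero cells link to the next level."""
    n = counts.get((row, col), 0)
    return str(n), n > 0


def drill_down(records, **selected):
    """Keep only the records matching the criteria chosen so far."""
    return [r for r in records if all(r[k] == v for k, v in selected.items())]
```

A top-level table would use `count_by(records, "site", "year")`. After the user picks a cell, `drill_down(records, site=..., year=...)` yields the subset that feeds the next table in the hierarchy.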
Selecting the Site and Year
Following a login screen, the top level of the interface presents the number of files available in the Archive grouped by site and year (Figure 4). The user selects a site and year by clicking on the corresponding number of files in the table, assuming the number is nonzero.
Selecting the Instrument Category and Facility Type
This selection takes the user to the second level (Figure 5), which displays all instrument categories and types of facilities from which ARM data were collected for the site and year chosen on the previous page. From this level an instrument category and facility type are chosen by clicking on the number of files in the appropriate cell of the table. Alternatively, the user may return to level 1 (to change the previous selection) by clicking on "Year" or "Site" at the top of the page.
Selecting the Instrument and Data Level
The third level (Figure 6) lists the number of available files by instrument code and data level, for the previously selected combination of site, year, instrument category and facility type. The data level reflects the amount of processing done on raw data. Instrument and data level codes are briefly described below the table. Again, options are available to return to levels 1 or 2 via links at the top of the page.
Selecting the Facility and Month
The final level (Figure 7) in the hierarchy of metadata attributes allows the user to select file sets by facility and month, or return to one of the previous three levels.
Saving Files in the Shopping Cart
If a facility and month are chosen (by clicking on a nonzero number of files in the table), the user is transferred to a screen (Figure 8) displaying a summary of the attributes chosen from levels 1-4, together with the number and total size of the data files. At this point the user may elect to add these files to the shopping cart or simply return to any of the previous interface levels. Adding the set of files to the shopping cart returns the user to level 4 (number of files by facility and month).
Viewing the Contents of the Shopping Cart
From any of the four interface levels an option (Figure 9) to "View Shopping Cart" is available. This option presents a table summarizing the codes for all of the file sets in the shopping cart and the total number and size of the files. The shopping cart may be modified from this page by clicking "Remove" for a particular set of files. When the user is satisfied with a collection of file sets, clicking "Submit Request to Archive" will submit the request and exit the interface. Clicking "Return" will return the user to the screen from which "View Shopping Cart" was chosen so that additional selections can be made or other portions of the catalog can be viewed.
Summary and Discussion
The catalog interface enables the ARM researcher to efficiently identify files of interest, determine the existence of data, and collect sets of data prior to submitting a retrieval request. Important aspects of the system described here include the assignment of descriptive instrument categories and the dynamic explanation of instrument codes. Collection of data sets is currently done at the facility/month level. The collection (shopping cart) may be listed and edited from any level.
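The shopping cart operations described in this section (add a file set, remove it, view the totals, submit) map onto a small data structure. The class below is a minimal sketch of that concept. The name `ShoppingCart`, its methods, and the shape of the file-set key are all hypothetical and are not taken from the Archive's implementation.

```python
class ShoppingCart:
    """Collects file sets, each identified by a hypothetical key tuple."""

    def __init__(self):
        self._sets = {}  # key -> (file_count, total_bytes)

    def add(self, key, file_count, total_bytes):
        """Add one facility/month file set; re-adding a key replaces it."""
        self._sets[key] = (file_count, total_bytes)

    def remove(self, key):
        """Remove a file set; removing an absent key is a no-op."""
        self._sets.pop(key, None)

    def summary(self):
        """Return (number of file sets, total files, total bytes)."""
        files = sum(n for n, _ in self._sets.values())
        size = sum(b for _, b in self._sets.values())
        return len(self._sets), files, size

    def submit(self):
        """Return the selected keys and empty the cart."""
        keys = list(self._sets)
        self._sets.clear()
        return keys
```

Keying the cart on the selection criteria means that adding the same facility and month twice does not double-count its files.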
IOP Data Browser
Background
Intensive Operational Periods (IOPs) generate data that are "non-routine" because they originate from extra or guest data sources. The data may also be "non-routine" because the instruments are operated with temporary, experimental (non-production) protocols. All of these exceptions from normal operations cause significant "clutter" in the metadata and logic used in the query and catalog interfaces. Constraining the structure of the IOP data to follow the simple logic required to successfully manage the 1,000,000+ ARM data files challenged the creativity of the ARM data managers and frustrated the IOP data generators (who are often guest collaborators with ARM and are not, or should not be, fully indoctrinated with ARM-specific data management practices). The IOP Data Browser is also used for storage and access of reference data sets (e.g., geographic overlays of states, rivers, etc. for satellite images) and special data (e.g., preliminary versions of VAP output).
The IOP Data Browser was designed to provide the following features:
- It presents enough structure so that potential data users can follow an understandable path to identify and access the IOP data sets.
- It allows for considerable flexibility in how the data are structured within an IOP.
- Minimal rules about names for sub-directories and files within each IOP
- Every subdirectory has a "readme" explaining its contents. The specifications for the readme are minimal, but links to more extensive web-based documentation are allowed.
- Minimal expectations for IOP data to follow similar naming or documentation
- It enables users to select and download a few individual files or a few individual sub-directories
- It enables the Archive to track "who accessed which data when" for reporting and update notification purposes.
Description
The IOP Data Browser contains a documented, online directory tree of IOP data. The IOP data are organized in a hierarchy of year / site / IOP / instrument - PI subdirectories. Additional subdirectories may be used within an IOP. Each subdirectory has a "readme" file to guide the user through that level's information. Data from IOPs may be downloaded as individual files by clicking on each file link. If the user needs to download large portions of IOP data (multiple files or subdirectories), a "check box system" (described in the outline below) can be used to select files and directories to be built into a single TAR file for download. The TAR file is created after the end of an IOP browsing session, and the user is notified by email when the TAR file is ready to download.
The IOP Data Browser presents a three-section display:
- The top section displays the contents of the readme for the current subdirectory.
- This readme may link to additional information at other web sites generated or referenced by the IOP participants.
- The primary ARM documentation about IOPs is contained in a series of web pages located at: http://www.arm.gov/docs/iops.html
- The ARM documentation has a directory structure that is similar to the one used for the IOP data
- Other web sites may be visited without losing your place in the IOP data structure.
- The middle section shows a traditional browser-based directory and file list that can be used to navigate the data collection.
- The top of this section shows the current directory path for "where am I".
- The main portion of this section lists directories and files within the current directory.
- Users may click on directory links to navigate to lower levels.
- Users may click on file links to open or download individual files.
- For some formats (e.g., netCDF), other information about the data files may be displayed.
- Very large data files (e.g., cloud radar, WSI, etc.) may be stored in the Mass Storage System of the Archive.
- The readme information for these files will include information on how to find these IOP data in the Archive.
- Each directory or file link displayed has a "check box" on the left side to select data to be added to a TAR file.
- Clicking the check box for a file will add the file to a TAR file.
- Clicking the check box for a directory will add the entire contents of the directory (including the contents of lower subdirectories and files) to the TAR file.
- After sub-trees of the directory have been "checked", lower level files and sub-directories may be unchecked as needed to specify the exact collection of IOP data to be included in the TAR file.
- The bottom section shows information and options about the TAR file being specified for downloading multiple files and directories
- Lists of included directories are displayed (and can be removed as needed)
- Lists of excluded directories are displayed (and can be removed as needed)
- Option for "zipping" the TAR file can be selected
- Control buttons for submitting the request for TAR construction are located in this section.
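The check box logic above (checking a directory includes everything below it, and unchecking lower items excludes them again) can be sketched with Python's standard `pathlib` and `tarfile` modules. This is an illustration under assumptions: the function names `resolve_selection` and `build_tar` are hypothetical, and treating the "zipping" option as gzip compression is a guess, since the document does not name the compression method.

```python
import tarfile
from pathlib import Path


def resolve_selection(root, included, excluded=()):
    """Return the sorted files under root chosen by the check boxes.

    included and excluded are paths relative to root. Checking a directory
    selects everything below it; an excluded path removes itself and
    anything below it from the selection.
    """
    root = Path(root)
    excluded_paths = [root / e for e in excluded]

    def is_excluded(path):
        return any(path == e or e in path.parents for e in excluded_paths)

    chosen = set()
    for item in included:
        start = root / item
        candidates = [start] if start.is_file() else start.rglob("*")
        for path in candidates:
            if path.is_file() and not is_excluded(path):
                chosen.add(path)
    return sorted(chosen)


def build_tar(root, files, tar_path, compress=False):
    """Write files into a TAR archive, stored under paths relative to root."""
    # Assumption: the "zipping" option is modeled here as gzip compression.
    mode = "w:gz" if compress else "w"
    with tarfile.open(tar_path, mode) as archive:
        for path in files:
            arcname = Path(path).relative_to(root).as_posix()
            archive.add(path, arcname=arcname)
    return tar_path
```

Storing members under paths relative to the root preserves the directory hierarchy of the IOP tree when the TAR file is unpacked.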
Access and login to the IOP Data Browser
The IOP Data Browser can be accessed after a login to the Archive User Interface, or it can be accessed directly at http://iop.archive.arm.gov/arm-iop/. (The IOP Data Browser can also be accessed from links located throughout ARM IOP documentation; see the web pages located under http://www.arm.gov/docs/iops.html). All attempts to access the IOP Data Browser will request a web login requiring the entry of a username and password. The user should enter their Archive account name for BOTH the username and password. Although this login appears to be redundant, it enables the Archive to record the user access of each file. The records of access are important for distributing notifications about future updates to IOP data and reporting statistics on the usage of IOP data.


