Automatically Downloading Data from the ARM Archive Using Mirror
Updated: 15 September 2000
WARNING: The ftp.pl_wupatch patch (available at http://sunsite.org.uk/packages/mirror/ftp.pl_wupatch)
must be applied to your version of mirror in order to communicate with
the ARM Archive FTP site.
Data ordered from the ARM Archive, particularly Standing Order data, can be automatically downloaded from the ARM Archive FTP site using mirror, a Perl script available at http://sunsite.org.uk/packages/mirror/. Mirror is available for both Unix-like and Microsoft® Windows™ platforms. Provided below are sample package files which can be modified slightly to meet the needs of ARM data users. Data users must either order data with one of the WWW-based interfaces at the ARM Archive or have established Standing Orders which will make requested data available on a weekly or monthly basis. The data are available on the ARM Archive FTP site only after e-mail notification has been sent to the data user.
Mirror is typically used to keep FTP sites synchronized with each other. It operates by contacting a remote FTP site, downloading new or changed files (i.e., new files or files with changed dates or sizes), creating legitimate symbolic links, and deleting files no longer contained on the remote FTP site. Such complete synchronization is usually not wanted when retrieving ARM data, so data users need to turn off the delete feature of mirror explicitly. Mirror typically operates recursively; therefore, it keeps entire directory trees synchronized.
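The selection rule described above can be illustrated with a short Python sketch. It is not mirror's own code: the names FileInfo and plan_sync are hypothetical, and the listings are plain dictionaries rather than real FTP directory listings. It only shows the idea that a file is fetched when it is missing locally or when its size or timestamp differs, and that local files are removed only when deletes are enabled.

```python
from typing import Dict, List, NamedTuple, Tuple


class FileInfo(NamedTuple):
    """Hypothetical record for one file in a directory listing."""
    size: int   # size in bytes
    mtime: int  # modification time, seconds since the epoch


def plan_sync(
    remote: Dict[str, FileInfo],
    local: Dict[str, FileInfo],
    do_deletes: bool = False,
) -> Tuple[List[str], List[str]]:
    """Return (paths to download, local paths to delete).

    A path is downloaded when it is absent locally or when its size or
    timestamp differs from the remote copy. Local files that are no
    longer on the remote site are listed for deletion only when
    do_deletes is true.
    """
    to_get = sorted(
        path
        for path, info in remote.items()
        if path not in local or local[path] != info
    )
    to_delete = sorted(p for p in local if p not in remote) if do_deletes else []
    return to_get, to_delete
```

With deletes disabled, as recommended here for ARM data, the second list is always empty, so files that have expired from the FTP site remain on the local machine.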
Below is a minimal package file which will cause mirror to
download any files not already contained in or below the local
directory (local_dir) and any files which have sizes or
timestamps which differ from files with the same name contained in or
below the local directory (local_dir). Upon completion,
mirror sends a list of files transferred to the e-mail address provided
in mail_to.
package=arm-data
# This can be anything or left out altogether.
comment=A mirror of my ARM data
# This is the ARM Archive FTP site. DO NOT CHANGE THIS!
site=ftp.archive.arm.gov
# Providing one's real e-mail address as the password can be helpful
# for debugging problems in communicating with the ARM Archive.
# Change this to your real e-mail address.
remote_password=forrest@esd.ornl.gov
# Location from whence to pull files.
# This should be /armguest/archive_user_name
remote_dir=/armguest/forrest
# Location in which to put the files on my machine.
# Change to desired local directory name.
local_dir=/home/forrest/arm-data
#
# If you are under Wind*ws then use a line like this instead:
# local_dir=c:\tmp\mirror
#
# Don't delete anything locally. DO NOT CHANGE THIS!
do_deletes=false
# Notify me when mirror has done its thing (may only work on Un*x).
# Change this to your real e-mail address.
mail_to=forrest@esd.ornl.gov
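Before running mirror with an edited package file, it may help to confirm that the settings marked DO NOT CHANGE are still intact. The Python sketch below is one possible way to do that. It uses a deliberately simplified reading of the file (one keyword=value per line, lines starting with # ignored), which is an assumption and not mirror's actual parser; the function names parse_package and check_arm_package are hypothetical.

```python
from typing import Dict, List


def parse_package(text: str) -> Dict[str, str]:
    """Read keyword=value lines, skipping blank lines and # comments.

    Simplified on purpose: this is an illustration, not mirror's parser.
    """
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_arm_package(settings: Dict[str, str]) -> List[str]:
    """Return a list of problems with the settings this page says to keep."""
    problems: List[str] = []
    if settings.get("site") != "ftp.archive.arm.gov":
        problems.append("site should be ftp.archive.arm.gov")
    if settings.get("do_deletes") != "false":
        problems.append("do_deletes should be false")
    if not settings.get("remote_dir", "").startswith("/armguest/"):
        problems.append("remote_dir should be /armguest/archive_user_name")
    return problems
```

A call such as check_arm_package(parse_package(open("arm-data.package").read())) returns an empty list when those three settings match the sample above; the file name is only an example.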
Very often a data user will move datasets to other directories for
analysis. If mirror is run again after data have been moved on the
local system and while these data are still available on the ARM
Archive FTP site, mirror will download these files again. In order to
avoid this behavior, mirror could be run on a scheduled basis, possibly
by cron on a Unix-like system, with the max_days parameter
set to limit the downloads to files which have dates more recent than
max_days prior to the present. For instance, adding the
following line to the package file above will cause mirror to download
only those files which have timestamps within the past seven days.
# Download files added in the last week.
max_days=7
This feature is particularly useful for data users who have
established Standing Orders. Data users who have requested
weekly retrieval Standing Orders could set max_days to a
value of 7.
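The effect of an age limit such as max_days can be pictured with the following Python sketch. The function recent_files is hypothetical and the exact boundary handling (whether a file exactly max_days old is included) is an assumption; mirror's own comparison may differ in detail.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def recent_files(
    mtimes: Dict[str, datetime],
    max_days: int,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return the paths whose timestamp is within max_days of now.

    mtimes maps a path to its remote modification time. Files older
    than the cutoff are skipped, so data already fetched and moved
    elsewhere are not downloaded again once they age past the limit.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=max_days)
    return sorted(path for path, t in mtimes.items() if t >= cutoff)
```

With a weekly Standing Order and a weekly run, a limit of seven days would select roughly the files posted since the previous run, provided the run is not delayed.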
On Unix-like systems, cron can be used to automatically schedule the retrieval of data from the ARM Archive FTP site. The following line, when placed in a user's crontab file, will cause cron to execute mirror with the specified package file every day at 12:00 noon.
0 12 * * * /usr/local/bin/mirror $HOME/arm-data.package
Data users who have established weekly retrieval Standing Orders could add the following line to their crontab files so that mirror will be executed every Wednesday at 12:00 noon.
0 12 * * 3 /usr/local/bin/mirror $HOME/arm-data.package
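Likewise, users who have established monthly retrieval Standing Orders could add the following line to their crontab files so that mirror will be executed on the first Wednesday of every month at 12:00 noon. When both the day-of-month and day-of-week fields are restricted, cron runs the job if either one matches, so the line restricts only the day of month (the 1st through the 7th) and tests for Wednesday in the command itself. The percent sign must be escaped with a backslash inside a crontab entry.
0 12 1-7 * * [ "$(date +\%u)" = 3 ] && /usr/local/bin/mirror $HOME/arm-data.package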
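A monthly schedule can be checked offline before it is installed. The Python sketch below uses hypothetical helper names that are not part of cron or mirror. It lists the days of a month matched when restricted day-of-month and day-of-week fields are treated as alternatives, which is the rule in the POSIX crontab specification, and it finds the first occurrence of a weekday in a month for comparison. Weekdays follow Python's numbering (Monday is 0, Wednesday is 2), not cron's.

```python
import calendar
from datetime import date
from typing import Iterable, List


def cron_either_match(
    year: int, month: int, doms: Iterable[int], weekday: int
) -> List[int]:
    """Days of the month matched when cron ORs the two day fields.

    doms is the set of allowed days of the month; weekday uses
    Python's numbering (Monday=0 ... Sunday=6).
    """
    allowed = set(doms)
    last_day = calendar.monthrange(year, month)[1]
    return [
        d
        for d in range(1, last_day + 1)
        if d in allowed or date(year, month, d).weekday() == weekday
    ]


def first_weekday(year: int, month: int, weekday: int) -> int:
    """Day of the month of the first given weekday (Monday=0)."""
    for d in range(1, 8):
        if date(year, month, d).weekday() == weekday:
            return d
    raise ValueError("weekday must be in the range 0..6")
```

For September 2000, first_weekday(2000, 9, 2) gives the 6th, while cron_either_match(2000, 9, range(1, 8), 2) gives the 1st through the 7th plus the 13th, 20th, and 27th, which is why a day-of-week test inside the command is the safer way to select a single Wednesday.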
For additional information about using mirror in conjunction with the ARM Archive, contact Forrest Hoffman (forrest@esd.ornl.gov).


