The long-term repository for LIGO frame data is the HPSS (High Performance Storage System) archive at CACR, the Caltech Center for Advanced Computing Research. (Strictly speaking, "HPSS" refers only to the software, not to the tape robot and associated hardware, but that distinction is commonly glossed over.) In the past, the only way to access data stored in HPSS was to retrieve the raw tar files by ftp. This was inefficient, since the full data stream had to be transferred over the network even if one wanted to look at only one or a few channels.
I have developed a new server program (called "LARS", for "LIGO Archive Retrieval Server"), running on a machine at CACR, to retrieve data from HPSS, strip out the desired channel(s), and return the output to the user. At present there is one client program, called "getFrames", which runs on the user's computer and communicates with LARS through the Data Flow Manager (which runs in the background on the user's computer). getFrames and the Data Flow Manager (dfm) are distributed as part of the LIGOtools package called "dataflow". Other client programs may be developed in the future.
A user request consists of a dataset name, a time interval, and a list of channels. For example:
getFrames -d //ligo/raw/lho/e2 -t 658000000-658000100 -c "H2:LSC-AS*" -o
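To make the three parts of a request concrete, here is a minimal sketch of how they might be represented and how the "-t START-END" argument splits into GPS start and end times. The FrameRequest class and parse_interval function are hypothetical and are not part of getFrames or the dataflow package. The sketch also assumes the interval is half-open (start included, end excluded), which is consistent with the example above returning 100 one-second frames.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FrameRequest:
    """Hypothetical container for the three parts of a getFrames request."""
    dataset: str          # Universal Dataset Name, e.g. "//ligo/raw/lho/e2"
    gps_start: int        # start of the interval in GPS seconds (inclusive, assumed)
    gps_end: int          # end of the interval in GPS seconds (exclusive, assumed)
    channels: List[str]   # channel names or wildcard patterns

    @property
    def duration(self) -> int:
        """Length of the requested interval in seconds."""
        return self.gps_end - self.gps_start


def parse_interval(text: str) -> Tuple[int, int]:
    """Split a 'START-END' string such as '658000000-658000100' into two ints."""
    start_s, end_s = text.split("-", 1)
    start, end = int(start_s), int(end_s)
    if end <= start:
        raise ValueError(f"empty or reversed interval: {text!r}")
    return start, end


start, end = parse_interval("658000000-658000100")
req = FrameRequest("//ligo/raw/lho/e2", start, end, ["H2:LSC-AS*"])
print(req.duration)  # 100
```

Rejecting an empty or reversed interval at parse time catches a malformed request on the user's machine, before anything is sent across the network.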
This request causes getFrames to start the Data Flow Manager as a background process, then send the request parameters to it to be forwarded to LARS. LARS retrieves the appropriate file(s) from HPSS onto scratch disk, then runs a program to strip out the data for all channels whose names begin with "H2:LSC-AS". When this is finished, the Data Flow Manager retrieves the output file (which contains 100 frames in this case) over the network and writes it onto disk on the user's computer. It then sends the output filename back to getFrames, which prints it out and exits.
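The channel-stripping step selects every channel whose name matches the requested pattern. The sketch below shows that selection using shell-style wildcards from Python's standard fnmatch module. It also counts how many frames cover an interval. It assumes fixed-length frames whose boundaries line up with the interval, with a default of one second to match the 100-frame example. The channel list is illustrative only. This is not the code LARS actually runs.

```python
from fnmatch import fnmatchcase


def select_channels(all_channels, patterns):
    """Return the channels matching any shell-style pattern, keeping input order."""
    return [ch for ch in all_channels
            if any(fnmatchcase(ch, pat) for pat in patterns)]


def frame_count(gps_start, gps_end, frame_length=1):
    """Frames in [gps_start, gps_end), assuming fixed-length frames aligned to the interval."""
    return (gps_end - gps_start) // frame_length


# Illustrative channel list; a real frame file carries many more channels.
channels = ["H2:LSC-AS_Q", "H2:LSC-AS_I", "H2:LSC-REFL_Q", "H1:LSC-AS_Q"]
print(select_channels(channels, ["H2:LSC-AS*"]))  # ['H2:LSC-AS_Q', 'H2:LSC-AS_I']
print(frame_count(658000000, 658000100))          # 100
```

fnmatchcase is used rather than fnmatch so that matching is always case-sensitive, regardless of the operating system.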
getFrames is able to retrieve data from the various engineering runs, as well as "recent" data (approximately the last 24 hours) from Hanford and Livingston, using the same request syntax but a different dataset name. In the latter case, the Data Flow Manager directs the request to the appropriate LDAS manager rather than to LARS. getFrames can also retrieve trend data (from either LHO or LLO) from the HPSS archive. The "Universal Dataset Names" (UDNs) currently handled by the Data Flow Manager, and thus accessible using getFrames, are listed at http://www.ldas-sw.ligo.caltech.edu/ligotools/dataflow/UDN_List.html.
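The key point is that the request syntax stays the same and the dataset name alone decides which server handles the request. A dispatch table is a simple way to picture this. In the sketch below, only "//ligo/raw/lho/e2" appears in this note. The other dataset names, the server labels, and the route function are hypothetical placeholders. The authoritative list of UDNs is the one at the URL above.

```python
# Hypothetical routing table. Only "//ligo/raw/lho/e2" comes from this note;
# the other dataset names and all server labels are placeholders.
ROUTES = {
    "//ligo/raw/lho/e2": "LARS",                 # archived engineering-run data in HPSS
    "//ligo/recent/lho": "LDAS manager (LHO)",   # hypothetical name for recent Hanford data
    "//ligo/recent/llo": "LDAS manager (LLO)",   # hypothetical name for recent Livingston data
}


def route(dataset: str) -> str:
    """Return the server label that should handle a request for this dataset."""
    try:
        return ROUTES[dataset]
    except KeyError:
        raise ValueError(f"dataset not handled: {dataset!r}") from None


print(route("//ligo/raw/lho/e2"))  # LARS
```

With this design, replacing one backend with another means changing a table entry, and the user-facing request syntax is unaffected.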
As other datasets are entered into the archive, the plan is to make them available via the same user request syntax, though the underlying server software may change. For instance, eventually the data in HPSS may be served by the LDAS diskCacheAPI rather than by LARS. This change will hopefully be transparent to the user.
It is important to note that LARS is currently not very efficient. It typically delivers raw data at about 1/3 to 1/2 of real-time speed, regardless of how many channels are actually retrieved; for example, retrieving 20 minutes of data may take up to an hour. In addition, there is typically a 5-10 minute latency to retrieve even a small amount of data. (Second-trend and minute-trend data can be retrieved at up to about 7 and 400 times real-time speed, respectively, but the latency is about the same.)
This is due partly to the time required to get data out of the tape robot, and partly to the slowness of the program that strips out the desired channels. The DAQ currently produces frame files without the table-of-contents structure, so each frame must be read in full to locate the requested channels, and we wrap the files in tar archives. Improving the stripping program could probably boost the total throughput by a factor of 2 or 3, but the way we currently store the data in the tape robot (as full frame files) will remain a fundamental limitation. Thus, wholesale copying of very long stretches of data is discouraged. Eventually, we will need to store the data differently if there is demand for faster access to long stretches of data for an arbitrary set of channels.
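The speed figures above can be turned into rough wall-clock estimates before a large request is submitted. The following is a back-of-the-envelope sketch. The function name is hypothetical. It assumes that the 5-10 minute start-up latency simply adds to a transfer time proportional to the amount of data requested. That additive model is an assumption and does not describe how LARS actually schedules its work.

```python
def estimated_minutes(data_seconds, speed_factor, latency_minutes=(5, 10)):
    """Rough (low, high) wall-clock minutes to retrieve data_seconds of data.

    speed_factor is the retrieval rate as a multiple of real time: about 1/3
    to 1/2 for raw data, and up to roughly 7 (second trends) or 400 (minute
    trends). The latency range is assumed to add to the transfer time.
    """
    transfer = data_seconds / speed_factor / 60.0   # minutes spent moving data
    low, high = latency_minutes
    return transfer + low, transfer + high


# 20 minutes of raw data at 1/3 real-time speed
print(tuple(round(x, 1) for x in estimated_minutes(1200, 1 / 3)))   # (65.0, 70.0)
# One day of minute-trend data at the best-case 400x
print(tuple(round(x, 1) for x in estimated_minutes(86400, 400)))    # (8.6, 13.6)
```

The raw-data case reproduces the "up to an hour for 20 minutes of data" figure, with the latency added on top. For trend data, the latency can dominate the total.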
Some other usage notes:
Please report problems or suggestions to Peter Shawhan