ldasJob { -name {} -password {} -email {} } { userCmd -opt1 {} ... }
This is in the format of a Tcl command, ldasJob, with two required arguments:
Acquires input data from the frame API, metadata API, disk files
and URL's and performs mathematical transformations upon it using
the datacond API. The results of these transforms can be output
to disk or to URL's, or sent to the metadata API for ingestion into
the database.
Data extracted from the frame API is concatenated where possible by
default. This default behaviour can be modified by using the
-concatenate option with an argument of 0.
Data products can be sent to other API's by using the
output() action of the
-algorithms option.
Data objects received by other API's can often be
inspected by setting the ::DEBUG level in the appropriate
API('s) to 219 (0xdb); the data will then be placed in the
job result area of the LDAS installation.
Calling convention (all on a single line):
ldasJob { -name "username" -password "********" -email "user@foobar.edu" } { conditionData -inputprotocol {Tcl list} -returnprotocol URL -outputformat data_format -datacondtarget {API name} -framequery {Tcl list} -responsefunction {Tcl list} -responsefiles {Tcl list} -tarball {http or ftp URL} -aliases {Tcl list} -algorithms {Tcl list} -setsingledc {Boolean} }
Option Descriptions:
-inputprotocol:
http ftp file port
NOTE: Embedded spaces in the argument to the -inputprotocol
option will cause the request to fail.
-returnprotocol:
http ftp gridftp mailto file port
Default Value: http://results.ilwd
Note that in most cases this option should NOT be explicitly
set by the user, since it can have unexpected interactions
with the output() actions in the datacond API, for example.
The argument to -returnprotocol resembles the usual
browser conventions for URL's, and is used to control the
naming and disposition of output data from a user request.
The http://, ftp://, gridftp:/,
and file:/ URI types are currently supported.
The specific format of the argument, and possibly the values
of other arguments will affect the way that the
-returnprotocol argument is applied by the LDAS system:
Will cause the system to return the location of output
data as an ftp or http URL (the system default, in the
absence of a user supplied -returnprotocol option, is
to use http).
Will cause the system to return the location of output
data as an ftp or http URL (the system default, in the
absence of a user supplied -returnprotocol option, is
to use http). And any output object which is not
in frame format will be named according to the
suggested pattern. When possible, indexes will be
appended to the filename to differentiate between multiple
output files, but sometimes this can not be managed and in
that case as many levels of .ba* files will be created as
required to avoid overwriting data.
When this form is used and dirname corresponds to an
existing directory in the anonymous ftp area of the LDAS
system, all output files will be copied into that directory.
In some cases a filename can be specified and will be used;
in this case .ba* files may be created to avoid overwriting
of data. This option of -returnprotocol is currently not supported. The output,
e.g. frame or xml, is placed in the job output directory.
The user has to retrieve the output via http or ftp to
his/her local system.
When this form is used the LDAS system will attempt to copy
the output data via anonymous ftp to the remote site. This option of -returnprotocol is currently not supported. The output,
e.g. frame or xml, is placed in the job output directory.
The user has to retrieve the output via http or ftp to
his/her local system.
This is a special protocol used by the dataStandAlone user
command for interaction with a GLOBUS/GRID system.
This returnprotocol is only interpreted by the datacond API
for use on output data formatted for ingestion by a stand alone
wrapper API.
The file will be written to the local gridftp home directory
or a subdirectory of it which is writable by the user which
LDAS runs as.
Several modified forms of the gridftp URL are available:
as a useful shorthand notation for producing a directory
hierarchy of dataStandAlone results which maps directly to
the job directories normally produced by LDAS, allowing
simple correlation of output data with other job results or
messages from the LDAS system.
Then the directory hierarchy will be created as required
and unique job-specific default filenames will be assigned
to the datacond API's output for the wrapper, i.e.:
NOTES:
-outputformat: frame ilwd LIGO_LW
The argument of the -outputformat option is the data type to
use in formatting the result of the user request.
-framequery: {R H {} 666666666-666666669 Adc(1,2,5-15)}
The -framequery syntax now supports a complex
associative format which allows the simultaneous retrieval
of unrelated data elements from multiple sources.
The five fields of the complex framequery are:
The framequery option supports the slicing syntax Adc(channel!start!range[type]!)
or Proc(channel!start!range[type]!):
Note that the extent of the x-axis of a channel is not related
to the GPS time of the frame. The first value on the x-axis is given by
the startX field of the channel, and the extent is
given by the number of samples in the channel times the dx field
of the channel. While startX may have any value, it is usually 0,
although there are cases where it could be negative, such as for a
2-sided power spectrum.
The sliced data will contain all samples whose
x-coordinate satisfy start <= x < start + range.
An error will be generated if the start of the slice is less than
startX, or if the slice extends past the end of the channel.
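The selection rule above can be sketched in Python. This is only an illustration of the stated inequality, not LDAS code: the function name slice_channel and its arguments are hypothetical, and it assumes that a slice ending exactly at the end of the channel is allowed because the interval is half-open.

```python
import math

def slice_channel(samples, start_x, dx, start, length):
    """Return the samples whose x-coordinate satisfies start <= x < start + length.

    samples : sequence of channel values, where sample k sits at x_k = start_x + k * dx
    start_x : first x value of the channel (the startX field)
    dx      : spacing between samples (the dx field)
    """
    end_x = start_x + len(samples) * dx  # extent of the channel along the x-axis
    if start < start_x:
        raise ValueError("slice starts before startX")
    if start + length > end_x:
        raise ValueError("slice extends past the end of the channel")
    # Smallest index k with x_k >= start.
    first = math.ceil((start - start_x) / dx)
    # Smallest index k with x_k >= start + length (exclusive upper bound).
    last = math.ceil((start + length - start_x) / dx)
    return samples[first:last]

# 16 samples at 0.25 spacing cover [0, 4); slice [0.5, 1.5) keeps x = 0.5 .. 1.25.
print(slice_channel(list(range(16)), 0.0, 0.25, 0.5, 1.0))
```

The example uses values that are exactly representable in binary floating point; with arbitrary decimal inputs the index arithmetic may need a rounding tolerance.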
Examples of slicing:
Suppose we are accessing a channel from frame H-R-600000000.gwf:
Adc(H2:LSC-AS_Q!0.3!0.4TIME!)
The ! syntax determines a start-time and range within the
channel (since startX is zero for time-series data,
the start-time may be interpreted as an offset from the GPS start-time
of the framequery).
The syntax requires that the channel name be followed
by a !, followed immediately by a valid time offset into
the data array for the channel, followed by another !,
followed immediately by a range, followed by a final !.
This example will return a 0.4 second long slice of the channel data
from GPS time 600000000.300000000 through 600000000.700000000.
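As an illustration of the syntax just described, the following Python sketch splits a slicing atom into its parts. The regular expression and the name parse_slice are assumptions made for this example and are not the parser used by LDAS. The default of TIME when no type is given follows the rule stated elsewhere in this document.

```python
import re

# kind(channel!start!range[type]!) where type is optionally TIME, FREQ or TIMEFREQ.
_SLICE = re.compile(
    r"^(?P<kind>Adc|Proc)\("
    r"(?P<channel>[^!()]+)!"
    r"(?P<start>[-+]?\d+(?:\.\d*)?)!"
    r"(?P<range>\d+(?:\.\d*)?)"
    r"(?P<type>TIMEFREQ|TIME|FREQ)?!\)$"
)

def parse_slice(atom):
    """Split a slicing atom into kind, channel, start, range and type."""
    m = _SLICE.match(atom)
    if m is None:
        raise ValueError("not a slicing atom: %r" % atom)
    return {
        "kind": m.group("kind"),
        "channel": m.group("channel"),
        "start": float(m.group("start")),
        "range": float(m.group("range")),
        # No type specifier means the slice is interpreted as time.
        "type": m.group("type") or "TIME",
    }

print(parse_slice("Proc(0!1024!1024FREQ!)"))
```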
Suppose we have a Proc frame with a Proc structure containing
frequency spectrum data with Hertz as the x-axis units.
We can request a specific frequency band by specifying a
start-frequency and a frequency range:
Proc(0!1024!1024FREQ!)
This will return a Proc structure containing the data at all frequency
bins greater than or equal to 1024 Hz and less than 2048 Hz, that is,
all frequency data in the semi-open interval [1024, 2048) Hz.
Any metadata associated
with the original object will be passed through unmodified.
NOTE on frequency slicing:
The framequery option supports the downsampling syntax
Adc(channel!resample!q!) or Proc(channel!resample!q!):
Adc(H2:LSC-AS_Q!resample!8!)
In the case of resampling, the first field after the channel name contains the
literal string "resample", and the second field contains the downsampling
factor q (see the
resampling algorithm
documentation).
It is intended that resampling be applied after data has been time
sliced based on the time range part of the framequery option.
Note that the only supported resampling factors are 2, 4, 8,
and 16.
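A client-side check of a resampling atom before submitting a job might look like the sketch below. The helper name parse_resample and the regular expression are hypothetical; only the literal "resample" field and the list of supported factors come from the text above.

```python
import re

SUPPORTED_FACTORS = (2, 4, 8, 16)  # the only factors listed as supported

# kind(channel!resample!q!)
_RESAMPLE = re.compile(r"^(Adc|Proc)\(([^!()]+)!resample!(\d+)!\)$")

def parse_resample(atom):
    """Return (channel, q) for a resampling atom, rejecting unsupported factors."""
    m = _RESAMPLE.match(atom)
    if m is None:
        raise ValueError("not a resampling atom: %r" % atom)
    q = int(m.group(3))
    if q not in SUPPORTED_FACTORS:
        raise ValueError("unsupported resampling factor: %d" % q)
    return m.group(2), q

print(parse_resample("Adc(H2:LSC-AS_Q!resample!8!)"))
```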
The -framequery option by-the-numbers:
The first element of the complex -framequery option is a
list of the type of the frames which should be retrieved.
The type refers to the frame attribute referenced by the second
field in the proposed frame spec. This field defaults internally to
R, or "Raw".
The second element of the complex -framequery option is a
list of the interferometers which produced the data that is wanted.
A separate output container will be created for each interferometer
specified.
The third element of the complex -framequery option is a Tcl
list of frame file names or URL's.
The fourth element of the -framequery option, times,
is a Tcl list of GPS timestamps and ranges. The times argument
must have the form a-b where a and
b are valid GPS times in whole numbers of seconds.
Due to historical precedent, this range should be interpreted as a
request for data including the second starting at a
until the end of the second starting at b,
thus the actual interval of time being requested is [a, b+1).
For example, -times 666666666-666666681 represents the equivalent of
specifying the names of 16 1-second frame files, and represents the 16
seconds of data from time 666666666 up to but not including 666666682.
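The a-b convention can be checked with a few lines of Python. This is a sketch with hypothetical helper names; it also tolerates the :allow_gaps suffix that appears in the examples in this document, without interpreting it.

```python
def gps_interval(times):
    """Convert an 'a-b' times element into the half-open interval [a, b + 1)."""
    span = times.split(":")[0]  # drop an optional flag such as ':allow_gaps'
    a_text, b_text = span.split("-")
    a, b = int(a_text), int(b_text)
    if b < a:
        raise ValueError("end time precedes start time")
    return a, b + 1

def duration(times):
    """Number of whole seconds requested by an 'a-b' times element."""
    start, end = gps_interval(times)
    return end - start

# 666666682 - 666666666 = 16 seconds of data.
print(gps_interval("666666666-666666681"), duration("666666666-666666681"))
```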
Example:
-framequery { {} H {} {666666666-666666669:allow_gaps 666667660-666667665} full(0)}
Will, with the -allowgaps
option specified, ultimately result in a single frame of 1000
seconds duration.
Note that the times element of the -framequery
option returns the data spanning the range of time from the
top of the starting second to the bottom of the ending second.
The fifth element of the -framequery option supports a
shorthand notation for frame structure accessor methods exposed
to the Tcl layer. The most commonly used methods are the Adc()
and Proc() accessors, which retrieve Adc (time serialised data)
or Proc (time, freq, or time-freq domain) channel structures from frames.
The argument to this accessor method can be an integer (referring to
the ith channel) OR the specific name of a channel:
-aliases: { <alias0> = <regex0>;
<alias1> = <regex1> ... }
This option allows aliases to be specified for variables
ingested by the dataconditioning API. The argument to -aliases
consists of a semicolon-delimited list whose elements are of the form
<alias> = <regex>, where <alias>
is an alphanumeric string beginning with an alphabetic character,
and <regex> is a regular expression (under Unix, type
'man 7 regex' for a description of regular expressions).
Each regular expression must match the name
of exactly one variable ingested by the dataconditioning API,
otherwise an error is reported and the job will not proceed.
Note that a regular expression is deemed to match a variable name if
it matches any substring of it. For example, FOO matches
FOO, 123-FOO and FOOBAR.
Aliases may be used in place of variable names in the body of the
-algorithms option. Before the commands
in the body of -algorithms are executed, each
occurrence of an alias is replaced by the unique variable name which
matched the regular expression on the right-hand side of the assignment.
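The matching rule for aliases can be sketched as follows. The function resolve_aliases is hypothetical, and Python's re module stands in for the regular expression library used by LDAS, so patterns that rely on syntax specific to either flavour may behave differently. The sketch also assumes that patterns contain no semicolons.

```python
import re

def resolve_aliases(aliases_text, variable_names):
    """Map each alias to the single variable name its regex matches.

    A regex matches a name if it matches any substring of it (re.search).
    Anything other than exactly one match is an error.
    """
    resolved = {}
    for item in aliases_text.split(";"):
        item = item.strip()
        if not item:
            continue
        alias, _, pattern = item.partition("=")
        alias, pattern = alias.strip(), pattern.strip()
        if not re.fullmatch(r"[A-Za-z][A-Za-z0-9]*", alias):
            raise ValueError("invalid alias name: %r" % alias)
        matches = [name for name in variable_names if re.search(pattern, name)]
        if len(matches) != 1:
            raise ValueError("%r matched %d variables" % (pattern, len(matches)))
        resolved[alias] = matches[0]
    return resolved

names = ["FOO", "123-FOO", "FOOBAR"]
print(resolve_aliases("x = ^FOO$;", names))  # anchoring avoids substring matches
```

With the unanchored pattern FOO, all three names in the list would match and the sketch raises an error, which mirrors the rule that the job will not proceed.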
Implicit aliases
In addition to user-defined aliases, the implicit aliases _ch0,
_ch1, ..., _chN are assigned to variables in the order
in which they are ingested by the dataconditioning API.
Making use of the implicit aliases requires knowledge
of the order of ingestion of the objects, which in most cases is
undefined and may change even on subsequent runs of identical jobs.
However, implicit aliases can be used in the
-algorithms option in the same way as
user-defined aliases. See the sample user commands
and the example below.
Example:
ADC channel data read from frame files is ingested into the
data conditioning API with a variable name based on the channel name
and its start-time, e.g. data from channel H2:LSC-AS_Q
starting at GPS time 693960000s 0ns will be ingested with the
variable name "H2\:LSC-AS_Q::AdcData:693960000:0:Frame"
(note that embedded colons must be escaped with a backslash).
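The naming pattern in this example can be reproduced with a short Python sketch. Both helper names are hypothetical, and the layout "channel::AdcData:seconds:nanoseconds:Frame" is inferred from the single example given here, so other data types may be named differently.

```python
def datacond_variable_name(channel, gps_sec, gps_nsec=0):
    """Build the datacond variable name for ADC data read from a frame.

    Colons embedded in the channel name are escaped with a backslash.
    """
    escaped = channel.replace(":", "\\:")
    return "%s::AdcData:%d:%d:Frame" % (escaped, gps_sec, gps_nsec)

def frame_channel_name(variable_name):
    """Recover the plain frame channel name (no backslashes) from a variable name."""
    return variable_name.split("::")[0].replace("\\:", ":")

name = datacond_variable_name("H2:LSC-AS_Q", 693960000, 0)
print(name)                      # H2\:LSC-AS_Q::AdcData:693960000:0:Frame
print(frame_channel_name(name))  # H2:LSC-AS_Q
```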
In the following user command, all occurrences of "gw" in
-algorithms will be replaced by
"H2\:LSC-AS_Q::AdcData:693960000:0:Frame". If data
from channel L1:LSC-AS_Q was also ingested,
an error would occur
because the regular expression on the right-hand side of "gw = LSC-AS_Q"
would match more than one variable name:
L1:LSC-AS_Q is the name of a frame channel.
L1:LSC-AS_Q::AdcData:693960000:0:Frame is the name
of a datacond variable.
-algorithms: { action syntax }
-responsefunction: { Full Path to File }
-tarball:
http or ftp URL to .tar.gz or .tar.bz2 file
This option is intended to be made use of in addition
to any and all former options, and does not obsolete or obviate the
requirement for any other option.
The -tarball option accepts a single argument of
a URL referencing a .tar.gz or .tar.bz2
file containing all of the files which are otherwise
referenced via http and ftp url's in the user command.
The location and directory hierarchy within the tarball should be
consistent with those of the http and ftp references.
For example, if there is an option of this form:
NOTE:
Internally (in the LDAS manager API), the use of the
-tarball option turns off retrieval of
ALL other http and ftp URL's, so you MUST
put all of the files otherwise referenced by URL's in the user
command into the tarball!
-datacondtarget: {API name}
The default API for datacond results.
When the underscore _ is used as the protocol argument in an
output()
action, the value of -datacondtarget will be used as the
protocol for the output.
In a conditionData command, results are normally written to disk
in plain ilwd format, or returned to the frame API for output as
frames. If output actions use the _ default protocol
specifier, then their output is written to disk in plain ilwd
format or XML, whichever is specified via the -outputformat
option, or as specified by the output() action format
specifier (see the documentation for the output() action).
NOTE:
-setsingledc: {Boolean}
Frame File Naming Convention
When the frame cache consists of a mixed collection of frames in
an inconsistent hierarchy of subdirectories the appropriate frame
file(s) for fulfilling a given request are determined by parsing
the frame file names according to an installation-specific naming
convention.
The LDAS system provides a default naming convention which is described
in detail in the document
Naming Convention for Frame Files Which Are to be Processed by LDAS.
where
Examples:
The argument to the -inputprotocol option conforms
to the usual browser conventions for URI's for determining
the location of the results of the user request.
When the URI is of type http or ftp
the LDAS system will attempt to retrieve the data referred
to by the URI description, and will make a local copy of
it in the result directory assigned to the user command.
On completion or failure of the user command, the local copy
of the input data will be removed.
When the URI is of type port, the system will attempt
to read ilwd binary data from the referenced port.
When the URI is of type file, the data is locally
available, and will be read from the local file system if it
exists and is readable.
The possible formats of the argument are:
Optional
Specifying "file" will return the local path to the file
as it is seen by all LDAS API's.
Specifying ftp or http will return a URL
relative to the gateway machine of the local system (which
is the same machine to which user requests are made).
The names used for the output files will be determined
by the system and are generally descriptive of the
content of the files.
The "file" form is used to get the local name of the file
relative to the LDAS installation for use as input data
by subsequent jobs.
This form is not advised when the job is one that is expected
to produce many output files!
In some special cases the suggested filename and extension
may be ignored, as when a frame file is being produced, or
an exact ilwd representation of a frame file.
Note that the trailing slash is absolutely required in order
for the URL to be interpreted as a local directory by the
LDAS system.
The URL returned to the user will be of the type specified.
The user will need to run tests to make certain that this
form works with the specified site.
The LDAS writable subdirectory should be defined in the
LDASapi.rsc resource file as the
::GRID_FTP_WRITABLE_SUBDIRECTORY, which will be joined
to the grid home directory when calculating output filenames.
-returnprotocol gridftp
-returnprotocol gridftp://here/there/mydir/mysubdir/
LDAS-TEST1234567_wrapperdata.ilwd
Default: {ilwd binary}
The possible output formats which the system can produce include:
The -framequery option is REQUIRED for getFrameData,
getFrameElements, and concatFrameData user commands, but is OPTIONAL
for conditionData, dataStandAlone, and dataPipeline user commands.
The -framequery argument is a complex list of frame
API query atoms.
The query atoms consist of unique identifying strings (which are
not case sensitive) with indices or channel names grouped
in parentheses. The query atom strings consist of the unique
parts of the accessor function names from the frame API c++ code.
(See: frameAPI.so)
In the simplest case, a single frame repository containing
frames from more than one instrument can be queried to retrieve
a common time range from each specified interferometer:
This query will return the data for Adc 0 from both Hanford and
Livingston over the gps time period 600000000 to 600000007.
This query will return the data from both Hanford and Livingston
over the gps time range 600000000 to 600000007 for the H2 and L1
so called gravitational wave channels.
The "slicing" syntax allows a subset of an Adc or Proc data channel
to be obtained. Slicing is performed by appending two floating-point
numeric arguments, the start and range, to the channel
specification (the fifth field of the framequery),
delimited by !'s. The start is the absolute
starting position of the slice along the x-axis of the channel,
and the range is the extent of the slice.
Valid [type] specifications are TIME, FREQ, and
TIMEFREQ. When no type specifier is given, time is assumed,
and this may generate errors if frequency series data is found.
The units of both numbers are taken from the unitX field
for the specified channel in the frame file when the data is operated
on by another API.
This is usually seconds for time-series data and Hertz for
frequency-series data.
The ILWD object containing this data slice will have the correct
calculated start-time and time-span for this slice of data, as well as
the correct number of data points in the data array.
The directive Adc(CAL-CAV_GAIN!0!2048.0001FREQ!)
is interpreted by LDAS as
all frequencies >= 0 and < 2048.0001.
Note the strict inequality for the upper limit. The frequency
of a bin is determined by:
f_k = startX + k*df, k = 0, 1, ..., N-1
so you just need to make sure your request is consistent with this
scheme. The reason some users are adding 0.0001 is because they have
f-sequences that have a bin that they want at EXACTLY 2048 Hz (say),
so they need to specify an upper limit above this, i.e. the upper
limit must be strictly greater than 2048 and less than or equal to
2048 + df Hz. If adding 0.0001 works for you, fine, but that's obviously
not the only choice.
This might seem a strange convention but the reason it's chosen is:
[ ... ) [ ... ) [ ... )
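One reading of the diagram above is that adjacent half-open slices tile the frequency axis without sharing or dropping a bin. The Python sketch below illustrates that reading and the effect of the strict upper limit; the function bins_in_slice and the choice of df = 1 Hz are assumptions made for the example.

```python
def bins_in_slice(start_x, df, n, start, length):
    """Indices k of bins with start <= f_k < start + length, f_k = start_x + k * df."""
    return [k for k in range(n) if start <= start_x + k * df < start + length]

N = 4097  # bins at 0, 1, ..., 4096 Hz when df = 1 Hz

# The upper limit is exclusive, so a range of exactly 2048 omits the 2048 Hz bin.
print(max(bins_in_slice(0.0, 1.0, N, 0.0, 2048.0)))     # 2047
# Any limit in (2048, 2048 + df] includes it.
print(max(bins_in_slice(0.0, 1.0, N, 0.0, 2048.0001)))  # 2048

# Adjacent slices [0, 1024) and [1024, 2048) cover bins 0..2047 exactly once.
low = bins_in_slice(0.0, 1.0, N, 0.0, 1024.0)
high = bins_in_slice(0.0, 1.0, N, 1024.0, 1024.0)
print(low + high == list(range(2048)))                  # True
```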
Suppose we are asking for gravitational wave channel data over an
hour, but want to downsample the data by a factor of 8
before it is sent to the dataconditioning API:
As of the 0.2.0 release of LDAS, this has been imperfectly implemented,
and it is important that users wishing to make effective use of the
resampling syntax provide feedback to the developers via the problem
tracking system:
Another example of a frame type would be mT, or "Minute Trend".
A separate output container will be created for each interferometer
specified.
This option is generally only used when a single specific input frame
file (or a single file from, say each of two interferometers) is
needed, and is provided primarily for the application of data
conditioning or pipelining of test data in the form of frame files.
This option will become the mechanism for the reading of calibration
or other process data from frame files at a later time.
The syntax supports a gap filling flag:
666666666-666666681:allow_gaps which the LDAS system recognises
as an indication that it should allow missing frame files, and to account
for them in the data conditioning stage of the user command by filling
in missing data points as specified by the
fillgaps action in the
algorithms option.
Note that the -allowgaps option to the
dataPipeline user command overrides all the syntactical subtleties
implied here and will cause a single data segment spanning all
specified time ranges to be created.
Adc(12)
Adc(H2:LSC-AS_Q)
An argument of "full()" as an item in the
framequery option list will result in either a copy of the
frame if the return format is specified as "frame",
a full text dump of the frame if the return format is
specified as "ilwd", and a full XML dump if the
format is specified as LIGO_LW.
Provides assignment of aliases to variables ingested by the
dataconditioning API.
conditionData
-framequery { R H {} 693960000-693960001 Adc(H2:LSC-AS_Q) }
-aliases { gw = LSC-AS_Q; }
-algorithms {
gw2 = resample(gw, 1, 8);
output(gw2,, gw2.ilwd, gw2, Downsampled gw channel);
}
-datacondtarget datacond
Note that when using output() to create
frame formatted results, the name
field should not contain the backslashes used in the datacond
API variable names. Datacond API variable names are derived from
frame channel names, but they are not identical! Frame channel
names do not have backslashes protecting the colon after the
interferometer i.d.
Example:
Required
The algorithms option is constructed from a series of mathematical
"actions" which are defined within the data conditioning API.
These "actions" are entered in the form of a semi-colon delimited
list.
The "action" calls are used to develop complex algorithms by
"chaining" multiple actions which are evaluated from left to right.
The results of actions can be assigned to variables or "printed"
out using a special action: output(), provided for that
purpose.
Optional: Deprecated in favor of -responsefiles, q.v.
-responsefiles: { Full Path to File(s) }
See -responsefiles option below.
Optional: When an external ilwd responsefile(s) is/are required,
the full path to the file is specified by this option. The syntax
allows for some degree of flexibility in the disposition of the
files. See below.
This option is used to specify the location and disposition of
files containing data and/or coefficients which are not provided
by the data as received from the frame API or which cannot be
calculated from the frame derived data by the data-conditioning
API or extracted from the database.
The files referred to by this option can be injected into the data
stream for the job either within the data-conditioning API by the
use of the "push" option, or can be attached to the output of the
data-conditioning API for transmission to the metadata or wrapper
API's by use of the "pass" option. Use of the "push" option
requires that an additional argument in the form of an "alias" for
the data-conditioning API be provided.
Examples:
-responsefiles {
file:/MayMDC/al.ilwd,push,al
file:/MayMDC/bl.ilwd,push,bl
file:/MayMDC/am.ilwd,push,am
file:/MayMDC/bm.ilwd,push,bm
file:/MayMDC/ah.ilwd,push,ah
file:/MayMDC/bh.ilwd,push,bh
file:/MayMDC/resp.bin,pass
}
Here the "push" elements are ilwd data files which are used to
populate the values al, bl, am, bm, ah, and bh. The contents of
the files can then be referenced in the call chain algorithm by
referring to the appropriate variable.
The "pass" element is passed on to the API pointed to by the
-datacondtarget option and is not used in calculations performed by
the data conditioning API.
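The entries in the example above follow the pattern url,push,alias or url,pass. A small Python sketch of how such a list could be checked before submission is given below; parse_responsefiles is a hypothetical helper, and whitespace-separated entries are assumed from the layout of the example.

```python
def parse_responsefiles(text):
    """Parse whitespace-separated 'url,push,alias' and 'url,pass' entries."""
    entries = []
    for token in text.split():
        parts = token.split(",")
        if len(parts) < 2:
            raise ValueError("missing disposition in %r" % token)
        url, mode = parts[0], parts[1]
        if mode == "push":
            # "push" requires an alias for use inside the data-conditioning API.
            if len(parts) != 3 or not parts[2]:
                raise ValueError("push requires an alias: %r" % token)
            entries.append((url, "push", parts[2]))
        elif mode == "pass":
            if len(parts) != 2:
                raise ValueError("pass takes no alias: %r" % token)
            entries.append((url, "pass", None))
        else:
            raise ValueError("unknown disposition %r" % mode)
    return entries

spec = """
file:/MayMDC/al.ilwd,push,al
file:/MayMDC/resp.bin,pass
"""
print(parse_responsefiles(spec))
```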
The purpose of this option is to limit the number of individual
Curl calls
made by the manager API to avoid making the manager very busy managing
hundreds (or thousands) of remote file retrievals via http or ftp.
-responsefiles \
http://www.foo.org/bar/baz/data1.ilwd
http://www.foo.org/bar/baz/data2.ilwd
http://www.foo.org/bar/baz/data3.ilwd
http://www.foo.org/bar/baz/data4.ilwd
then the tarball option could be any of:
-tarball http://www.foo.org/tarball.tar.gz
-tarball http://www.foo.org/bar/tarball.tar.gz
-tarball http://www.foo.org/bar/baz/tarball.tar.gz
And as long as the internal structure of the tarball is such that
it would unpack and overwrite the files individually referenced
otherwise, the user command will succeed!
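One way to read the requirement above is that each referenced file must appear inside the tarball at its path relative to the directory that holds the tarball. The Python sketch below computes those member paths under that assumption; tarball_member_paths is a hypothetical helper and this reading has not been checked against the manager API.

```python
import posixpath
from urllib.parse import urlsplit

def tarball_member_paths(tarball_url, file_urls):
    """Return the path each file would need inside the tarball.

    Assumes the tarball unpacks in the directory of its own URL, so each
    member path is the file's URL path relative to that directory.
    """
    tar = urlsplit(tarball_url)
    base = posixpath.dirname(tar.path)  # directory holding the tarball
    if not base.endswith("/"):
        base += "/"
    members = []
    for url in file_urls:
        parts = urlsplit(url)
        if parts.netloc != tar.netloc or not parts.path.startswith(base):
            raise ValueError("%s is not below %s" % (url, base))
        members.append(parts.path[len(base):])
    return members

files = ["http://www.foo.org/bar/baz/data1.ilwd",
         "http://www.foo.org/bar/baz/data2.ilwd"]
print(tarball_member_paths("http://www.foo.org/bar/tarball.tar.gz", files))
```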
Default: datacond
A straightforward method of getting wrapper formatted ilwd
written to disk is to use the
dataStandAlone
user command.
If this flag is 0 (false), then each frame formatted output from the
data conditioning API will be returned as a separate frame object.
If this flag is 1 (true), then frame formatted output from the
data conditioning API is combined into a single frame object.
In summary, the required format of frame-file names is
16 seconds of raw data from Hanford.
1 minute of second-trend data from Livingston.
16 seconds of Level 1 reduced data from Hanford.