ldasJob { -name {} -password {} -email {} } { userCmd -opt1 {} ... }
This has the format of a Tcl command, ldasJob, with two required arguments: a list of user identification options, and a list holding the user command and its options.
Calling convention (all on a single line):
ldasJob { -name "username" -password "********" -email "user@foobar.edu" } { dataStandAlone -inputprotocol {Tcl list} -returnprotocol URL -outputformat {Tcl list} -framequery {Tcl list} -aliases {Tcl list} -algorithms {Tcl list} -subject "freeform string" -responsefiles {Tcl list} -tarball {http or ftp URL} }
Option Descriptions:
-inputprotocol:
http ftp file port
NOTE: Embedded spaces in the argument to the -inputprotocol
option will cause the request to fail.
-outputformat: frame ilwd LIGO_LW
The argument of the -outputformat option is the data type to
use in formatting the result of the user request.
-framequery: {R H {} 666666666-666666669 Adc(1,2,5-15)}
The -framequery syntax now supports a complex
associative format which allows the simultaneous retrieval
of unrelated data elements from multiple sources.
The five fields of the complex framequery are:
The framequery option supports the slicing syntax Adc(channel!start!range[type]!)
or Proc(channel!start!range[type]!):
Note that the extent of the x-axis of a channel is not related
to the GPS time of the frame. The first value on the x-axis is given by
the startX field of the channel, and the extent is
given by the number of samples in the channel times the dx field
of the channel. While startX may have any value, it is usually 0,
although there are cases where it could be negative, such as for a
2-sided power spectrum.
The sliced data will contain all samples whose
x-coordinates satisfy start <= x < start + range.
An error will be generated if the start of the slice is less than
startX, or if the slice extends past the end of the channel.
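A minimal sketch of the slice selection rule, written in Python for illustration only; it is not LDAS code. The function name slice_indices and its parameters are hypothetical, and the end-of-channel check assumes the channel covers the x-axis from startX up to startX plus the number of samples times dx.

```python
import math


def slice_indices(start_x, dx, n_samples, start, length):
    """Return (first, stop) sample indices for the slice [start, start + length).

    Sample k sits at x_k = start_x + k * dx. The returned stop index is
    exclusive, so the slice holds samples first, first + 1, ..., stop - 1.
    """
    end_x = start_x + n_samples * dx  # assumed end of the channel's x-axis
    if start < start_x:
        raise ValueError("slice starts before startX")
    if start + length > end_x:
        raise ValueError("slice extends past the end of the channel")
    # Smallest k with x_k >= start.
    first = math.ceil((start - start_x) / dx)
    # Smallest k with x_k >= start + length; that sample is excluded.
    stop = math.ceil((start + length - start_x) / dx)
    return first, stop


# A time series with startX = 0, dx = 0.25 s and 16 samples (4 s of data).
print(slice_indices(0.0, 0.25, 16, 0.5, 1.0))
# A two-sided spectrum with a negative startX.
print(slice_indices(-2.0, 0.5, 8, -1.0, 2.0))
```

The values in the usage lines are chosen to be exactly representable in binary floating point; a production implementation would need a policy for rounding at bin edges.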
Examples of slicing:
Suppose we are accessing a channel from frame H-R-600000000.gwf:
Adc(H2:LSC-AS_Q!0.3!0.4TIME!)
The ! syntax determines a start-time and range within the
channel (since startX is zero for time-series data,
the start-time may be interpreted as an offset from the GPS start-time
of the framequery).
The syntax requires that the channel name be followed
by a !, followed immediately by a valid time offset into
the data array for the channel, followed by another !,
followed immediately by a range, followed by a final !.
This example will return a 0.4 second long slice of the channel data
from GPS time 600000000.300000000 through 600000000.700000000.
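The ! syntax described above can be sketched as a small parser. This is an illustrative Python sketch, not the parser LDAS uses: the regular expression, the names parse_slice and gps_interval, and the use of Decimal for exact decimal arithmetic are all assumptions made for the example.

```python
import re
from decimal import Decimal

# Hypothetical pattern for Adc(channel!start!range[type]!) and the Proc form.
SLICE_ATOM = re.compile(
    r"^(?P<kind>Adc|Proc)\("
    r"(?P<channel>[^!()]+)!"
    r"(?P<start>[-+]?\d+(?:\.\d+)?)!"
    r"(?P<range>\d+(?:\.\d+)?)"
    r"(?P<type>TIMEFREQ|TIME|FREQ)?!\)$"
)


def parse_slice(atom):
    """Split a slicing atom into its kind, channel, start, range and type."""
    match = SLICE_ATOM.match(atom)
    if match is None:
        raise ValueError(f"not a slicing atom: {atom!r}")
    return {
        "kind": match["kind"],
        "channel": match["channel"],
        "start": Decimal(match["start"]),
        "range": Decimal(match["range"]),
        # Time is assumed when no type specifier is given.
        "type": match["type"] or "TIME",
    }


def gps_interval(frame_gps_start, parsed):
    """Return the semi-open GPS interval [begin, end) of a time slice,
    treating the start as an offset from the frame's GPS start time."""
    begin = Decimal(frame_gps_start) + parsed["start"]
    return begin, begin + parsed["range"]


parsed = parse_slice("Adc(H2:LSC-AS_Q!0.3!0.4TIME!)")
print(gps_interval(600000000, parsed))
```

Decimal is used here so that offsets such as 0.3 are added to nine-digit GPS times without binary rounding.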
Suppose we have a Proc frame with a Proc structure containing
frequency spectrum data with Hertz as the x-axis units.
We can request a specific frequency band by specifying a
start-frequency and a frequency range:
Proc(0!1024!1024FREQ!)
This will return a Proc structure containing the data at all frequency
bins greater than or equal to 1024 Hz and less than 2048 Hz, that is,
all frequency data in the semi-open interval [1024, 2048) Hz.
Any metadata associated
with the original object will be passed through unmodified.
NOTE on frequency slicing:
The framequery option supports the downsampling syntax
Adc(channel!resample!q!) or Proc(channel!resample!q!):
Adc(H2:LSC-AS_Q!resample!8!)
In the case of resampling, the first field after the channel name contains the
literal string "resample", and the second field contains the downsampling
factor q (see the resampling algorithm documentation).
It is intended that resampling be applied after data has been time
sliced based on the time range part of the framequery option.
Note that the only supported resampling factors are 2, 4, 8,
and 16.
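The resample form of the atom and its restriction to factors of 2, 4, 8 and 16 can be sketched as follows. This is a hedged Python illustration rather than LDAS code; the names parse_resample and resampled_rate are hypothetical, and the 16384 Hz sample rate in the usage line is an assumed value chosen only for the arithmetic.

```python
import re

SUPPORTED_FACTORS = {2, 4, 8, 16}

# Hypothetical pattern for Adc(channel!resample!q!) and the Proc form.
RESAMPLE_ATOM = re.compile(r"^(Adc|Proc)\(([^!()]+)!resample!(\d+)!\)$")


def parse_resample(atom):
    """Return (kind, channel, q) for a resample atom, rejecting bad factors."""
    match = RESAMPLE_ATOM.match(atom)
    if match is None:
        raise ValueError(f"not a resample atom: {atom!r}")
    kind, channel, q = match.group(1), match.group(2), int(match.group(3))
    if q not in SUPPORTED_FACTORS:
        raise ValueError(f"unsupported downsampling factor: {q}")
    return kind, channel, q


def resampled_rate(sample_rate_hz, q):
    """Sample rate after downsampling by the integer factor q."""
    return sample_rate_hz / q


kind, channel, q = parse_resample("Adc(H2:LSC-AS_Q!resample!8!)")
print(channel, q, resampled_rate(16384, q))  # 16384 Hz is an assumed rate
```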
The -framequery option by-the-numbers:
The first element of the complex -framequery option is a
list of the type of the frames which should be retrieved.
The type refers to the frame attribute referenced by the second
field in the proposed frame spec. This field defaults internally to
R, or "Raw".
The second element of the complex -framequery option is a
list of the interferometers which produced the data that is wanted.
A separate output container will be created for each interferometer
specified.
The third element of the complex -framequery option is a Tcl
list of frame file names or URL's.
The fourth element of the -framequery option, times,
is a Tcl list of GPS timestamps and ranges. The times argument
must have the form a-b where a and
b are valid GPS times in whole numbers of seconds.
Due to historical precedent, this range should be interpreted as a
request for data including the second starting at a
until the end of the second starting at b,
thus the actual interval of time being requested is [a, b+1).
For example, -times 666666666-666666681 represents the equivalent of
specifying the names of 16 1-second frame files, and represents the 16
seconds of data from time 666666666 up to but not including 666666682.
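The inclusive a-b convention can be checked with a short calculation. The following Python sketch is illustrative only; the function name times_to_interval is hypothetical, and rejecting b < a is an assumption rather than documented LDAS behaviour.

```python
def times_to_interval(times):
    """Convert an 'a-b' times element into the semi-open interval [a, b + 1)."""
    start_text, sep, end_text = times.partition("-")
    if sep != "-" or not (start_text.isdigit() and end_text.isdigit()):
        raise ValueError(f"expected whole GPS seconds in the form a-b: {times!r}")
    a, b = int(start_text), int(end_text)
    if b < a:
        raise ValueError("end second precedes start second")  # assumed rule
    # The second starting at b is included, so the interval ends at b + 1.
    return a, b + 1


start, stop = times_to_interval("666666666-666666681")
print(start, stop, stop - start)  # duration in whole seconds
```

For the range 666666666-666666681 the interval is [666666666, 666666682), which is 16 seconds long.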
Example:
-framequery { {} H {} {666666666-666666669:allow_gaps 666667660-666667665} full(0)}
With the -allowgaps option specified, this query
will ultimately result in a single frame of 1000
seconds duration.
Note that the times element of the -framequery
option returns the data spanning the range of time from the
top of the starting second to the bottom of the ending second.
The fifth element of the -framequery option supports a
shorthand notation for frame structure accessor methods exposed
to the Tcl layer. The most commonly used methods are the Adc()
and Proc() accessors, which retrieve Adc (time-series data)
or Proc (time, freq, or time-freq domain) channel structures from frames.
The argument to these accessor methods can be an integer (referring to
the ith channel) OR the specific name of a channel:
-returnprotocol:
http ftp gridftp mailto file port
Default Value: ${jobid}_wrapperdata.ilwd (local filename)
The argument to -returnprotocol resembles the usual
browser conventions for URL's, and is used to control the
naming and disposition of output data from a user request.
However, the range of supported protocols is severely limited
for the dataStandAlone user command.
This is a special protocol used by the dataStandAlone user
command for interaction with a GLOBUS/GRID system.
This returnprotocol is only interpreted by the datacond API
for use on output data formatted for ingestion by a stand alone
wrapper API.
The file will be written to the local gridftp home directory
or a subdirectory of it which is writable by the user which
LDAS runs as.
Several modified forms of the gridftp URL are available:
as a useful shorthand notation for producing a directory
hierarchy of dataStandAlone results which maps directly to
the job directories normally produced by LDAS, allowing
simple correlation of output data with other job results or
messages from the LDAS system.
Then the directory hierarchy will be created as required
and unique job-specific default filenames will be assigned
to the datacond API's output for the wrapper, i.e.:
NOTES:
-aliases: { <alias0> = <regex0>;
<alias1> = <regex1> ... }
This option allows aliases to be specified for variables
ingested by the dataconditioning API. The argument to -aliases
consists of a semicolon-delimited list whose elements are of the form
<alias> = <regex>, where <alias>
is an alphanumeric string beginning with an alphabetic character,
and <regex> is a regular expression (under Unix, type
'man 7 regex' for a description of regular expressions).
Each regular expression must match the name
of exactly one variable ingested by the dataconditioning API,
otherwise an error is reported and the job will not proceed.
Note that a regular expression is deemed to match a variable name if
it matches any substring of it. For example, FOO matches
FOO, 123-FOO and FOOBAR.
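The two matching rules above (a pattern matches if it matches any substring, and it must match exactly one variable) can be sketched in Python. This is an illustration under assumptions: Python's re module stands in for the regex(7) syntax the dataconditioning API actually uses, and the name resolve_alias is hypothetical.

```python
import re


def resolve_alias(pattern, variable_names):
    """Return the single variable name matched by pattern.

    re.search is used because a pattern is deemed to match a name if it
    matches any substring of it.
    """
    matches = [name for name in variable_names if re.search(pattern, name)]
    if len(matches) != 1:
        raise ValueError(
            f"pattern {pattern!r} matched {len(matches)} variables, expected 1"
        )
    return matches[0]


names = ["FOO", "123-FOO", "FOOBAR"]
# The bare pattern FOO matches all three names, so it is ambiguous here.
print([name for name in names if re.search("FOO", name)])
# Anchoring the pattern selects a single variable.
print(resolve_alias("^FOO$", names))
```

Anchoring with ^ and $ is one way to keep a short pattern from matching longer names that contain it.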
Aliases may be used in place of variable names in the body of the
-algorithms option. Before the commands
in the body of -algorithms are executed, each
occurrence of an alias is replaced by the unique variable name which
matched the regular expression on the right-hand side of the assignment.
Implicit aliases
In addition to user-defined aliases, the implicit aliases _ch0,
_ch1, ..., _chN are assigned to variables in the order
in which they are ingested by the dataconditioning API.
Making use of the implicit aliases requires knowledge
of the order of ingestion of the objects, which in most cases is
undefined and may change even on subsequent runs of identical jobs.
However, implicit aliases can be used in the
-algorithms option in the same way as
user-defined aliases. See the sample user commands
and the example below.
Example:
ADC channel data read from frame files is ingested into the
data conditioning API with a variable name based on the channel name
and its start-time, e.g., data from channel H2:LSC-AS_Q
starting at GPS time 693960000s 0ns will be ingested with the
variable name "H2\:LSC-AS_Q::AdcData:693960000:0:Frame"
(note that embedded colons must be escaped with a backslash).
In the following user command, all occurrences of "gw" in
-algorithms will be replaced by
"H2\:LSC-AS_Q::AdcData:693960000:0:Frame". If data
from channel L1:LSC-AS_Q was also ingested,
an error would occur
because the regular expression on the right-hand side of "gw = LSC-AS_Q"
would match more than one variable name:
L1:LSC-AS_Q is the name of a frame channel.
L1\:LSC-AS_Q::AdcData:693960000:0:Frame is the name
of a datacond variable.
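The relationship between a frame channel name and its datacond variable name can be sketched as below. The naming pattern is inferred from the single H2 example given in this section and may not cover every case; the function name datacond_variable_name is hypothetical, Python's re module stands in for the regular expression engine LDAS uses, and the pattern ^H2 is an illustrative choice, not taken from the LDAS documentation.

```python
import re


def datacond_variable_name(channel, gps_sec, gps_nsec=0):
    """Build a datacond-style variable name for ADC data (inferred pattern).

    Colons embedded in the channel name are escaped with a backslash.
    """
    escaped = channel.replace(":", "\\:")
    return f"{escaped}::AdcData:{gps_sec}:{gps_nsec}:Frame"


variables = [
    datacond_variable_name("H2:LSC-AS_Q", 693960000, 0),
    datacond_variable_name("L1:LSC-AS_Q", 693960000, 0),
]

# "LSC-AS_Q" is a substring of both names, so the alias would be ambiguous.
ambiguous = [v for v in variables if re.search("LSC-AS_Q", v)]
# A more specific pattern selects exactly one variable.
unique = [v for v in variables if re.search("^H2", v)]
print(len(ambiguous), unique)
```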
-algorithms: { action syntax }
-subject: "freeform string"
The value of the -subject option is used as the subject field
in the email returned to the user after successful completion
of an LDAS job.
The argument to the -inputprotocol option conforms
to the usual browser conventions for URI's for determining
the location of the results of the user request.
When the URI is of type http or ftp
the LDAS system will attempt to retrieve the data referred
to by the URI description, and will make a local copy of
it in the result directory assigned to the user command.
On completion or failure of the user command, the local copy
of the input data will be removed.
When the URI is of type port, the system will attempt
to read ilwd binary data from the referenced port.
When the URI is of type file, the data is locally
available, and will be read from the local file system if it
exists and is readable.
The possible formats of the argument are:
Default: {ilwd binary}
The possible output formats which the system can produce include:
The -framequery option is REQUIRED for getFrameData,
getFrameElements, and concatFrameData user commands, but is OPTIONAL
for conditionData, dataStandAlone, and dataPipeline user commands.
The -framequery argument is a complex list of frame
API query atoms.
The query atoms consist of unique identifying strings (which are
not case sensitive) with indices or channel names grouped
in parentheses. The query atom strings consist of the unique
parts of the accessor function names from the frame API c++ code.
(See: frameAPI.so)
In the simplest case, a single frame repository containing
frames from more than one instrument can be queried to retrieve
a common time range from each specified interferometer:
This query will return the data for Adc 0 from both Hanford and
Livingston over the gps time period 600000000 to 600000007.
This query will return the data from both Hanford and Livingston
over the gps time range 600000000 to 600000007 for the H2 and L1
so called gravitational wave channels.
The "slicing" syntax allows a subset of an Adc or Proc data channel
to be obtained. Slicing is performed by appending two floating-point
numeric arguments, the start and range, to the channel
specification (the fifth field of the framequery),
delimited by !'s. The start is the absolute
starting position of the slice along the x-axis of the channel,
and the range is the extent of the slice.
Valid [type] specifications are TIME, FREQ, and
TIMEFREQ. When no type specifier is given, time is assumed,
and this may generate errors if frequency series data is found.
The units of both numbers are taken from the unitX field
for the specified channel in the frame file when the data is operated
on by another API.
This is usually seconds for time-series data and Hertz for
frequency-series data.
The ILWD object containing this data slice will have the correct
calculated start-time and time-span for this slice of data, as well as
the correct number of data points in the data array.
The directive Adc(CAL-CAV_GAIN!0!2048.0001FREQ!)
is interpreted by LDAS as
all frequencies >= 0 and < 2048.0001.
Note the strict inequality for the upper limit. The frequency
of a bin is determined by:
f_k = startX + k*df, k = 0, 1, ..., N-1
so you just need to make sure your request is consistent with this
scheme. The reason some users are adding 0.0001 is because they have
f-sequences that have a bin that they want at EXACTLY 2048 Hz (say),
so they need to specify an upper limit above this, i.e., the upper
limit must be strictly greater than 2048 and less than or equal to
2048 + df Hz. If adding 0.0001 works for you, fine, but that's obviously
not the only choice.
This might seem a strange convention but the reason it's chosen is:
[ ... ) [ ... ) [ ... )
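One reading of the bracket diagram is that adjacent semi-open slices tile the axis without sharing a bin. The Python sketch below illustrates that reading together with the strict upper limit; it is not LDAS code, the name bins_in_slice is hypothetical, and the spectrum (startX = 0, df = 0.5 Hz, 4097 bins, so the last bin is at exactly 2048 Hz) is an assumed example.

```python
def bins_in_slice(start_x, df, n_bins, start, length):
    """Indices k whose frequency f_k = start_x + k * df lies in [start, start + length)."""
    return [k for k in range(n_bins) if start <= start_x + k * df < start + length]


START_X, DF, N_BINS = 0.0, 0.5, 4097  # assumed spectrum, last bin at 2048 Hz

# A range of exactly 2048 excludes the bin at 2048 Hz (strict upper limit).
without_top = bins_in_slice(START_X, DF, N_BINS, 0.0, 2048.0)
# Any upper limit in (2048, 2048 + df] includes it, for example 2048.0001.
with_top = bins_in_slice(START_X, DF, N_BINS, 0.0, 2048.0001)
print(len(without_top), len(with_top))

# Adjacent slices share no bin and together cover the full band.
low = bins_in_slice(START_X, DF, N_BINS, 0.0, 1024.0)
high = bins_in_slice(START_X, DF, N_BINS, 1024.0, 1024.0)
print(low + high == without_top)
```

With a closed upper limit the bin at 1024 Hz would appear in both the low and high slices; the semi-open convention avoids that duplication.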
Suppose we are asking for gravitational wave channel data over an
hour, but want to downsample the data by a factor of 8
before it is sent to the dataconditioning API:
As of the 0.2.0 release of LDAS, this has been imperfectly implemented,
and it is important that users wishing to make effective use of the
resampling syntax provide feedback to the developers via the problem
tracking system:
Another example of a frame type would be mT, or "Minute Trend".
This option is generally only used when a single specific input frame
file (or a single file from, say each of two interferometers) is
needed, and is provided primarily for the application of data
conditioning or pipelining of test data in the form of frame files.
This option will become the mechanism for the reading of calibration
or other process data from frame files at a later time.
The syntax supports a gap filling flag, as in
666666666-666666681:allow_gaps, which the LDAS system recognises
as an indication that it should allow missing frame files and account
for them in the data conditioning stage of the user command by filling
in missing data points as specified by the
fillgaps action in the
algorithms option.
Note that the -allowgaps option to the
dataPipeline user command overrides all the syntactical subtleties
implied here and will cause a single data segment spanning all
specified time ranges to be created.
Adc(12)
Adc(H2:LSC-AS_Q)
An argument of "full()" as an item in the
framequery option list will result in either a copy of the
frame if the return format is specified as "frame",
a full text dump of the frame if the return format is
specified as "ilwd", and a full XML dump if the
format is specified as LIGO_LW.
Optional
Currently any attempt to use the -returnprotocol option
to cause data to be sent by LDAS via anonymous ftp will be
ignored, and only a local directory and filename will be returned.
The same result will be seen for http type URL's.
For the time being the onus of determining the http URL which
can be used to retrieve the results of a dataStandAlone command
that is NOT interacting with gridftp is with the user.
Support for remote anonymous ftp for remote push, and the
return of a fully qualified http URL for remote pull will be
available in the next release of LDAS.
The LDAS writable subdirectory should be defined in the
LDASapi.rsc resource file as the
::GRID_FTP_WRITABLE_SUBDIRECTORY, which will be joined
to the grid home directory when calculating output filenames.
-returnprotocol gridftp
-returnprotocol gridftp://here/there/mydir/mysubdir/
LDAS-TEST1234567_wrapperdata.ilwd
Provides assignment of aliases to variables ingested by the
dataconditioning API.
conditionData
-framequery { R H {} 693960000-693960001 Adc(H2:LSC-AS_Q) }
-aliases { gw = LSC-AS_Q; }
-algorithms {
gw2 = resample(gw, 1, 8);
output(gw2,, gw2.ilwd, gw2, Downsampled gw channel);
}
-datacondtarget datacond
Note that when using output() to create
frame formatted results, the name
field should not contain the backslashes used in the datacond
API variable names. Datacond API variable names are derived from
frame channel names, but they are not identical! Frame channel
names do not have backslashes protecting the colon after the
interferometer i.d.
Example:
Required
The algorithms option is constructed from a series of mathematical
"actions" which are defined within the data conditioning API.
These "actions" are entered in the form of a semi-colon delimited
list.
The "action" calls are used to develop complex algorithms by
"chaining" multiple actions which are evaluated from left to right.
The results of actions can be assigned to variables or "printed"
out using a special action: output(), provided for that
purpose.
Optional: Defaults to
results when the
-returnprotocol spec is not gridftp, and
grid_output when the
-returnprotocol option is gridftp.
The user specified subject will be overridden and grid_output
will be used under all circumstances when gridftp is the
-returnprotocol to conform with the LDAS/GRID interface.
-tarball:
http or ftp URL to .tar.gz or .tar.bz2 file
This option is intended to be used in addition
to all of the other options; it does not replace any other option
or remove the requirement for it.
The -tarball option accepts a single argument of
a URL referencing a .tar.gz or .tar.bz2
file containing all of the files which are otherwise
referenced via http and ftp url's in the user command.
The location and directory hierarchy within the tarball should be
consistent with those of the http and ftp references.
For example, if there is an option of this form:
NOTE:
Internally (in the LDAS manager API), the use of the
-tarball option turns off retrieval of
ALL other http and ftp URL's, so you MUST
put all of the files otherwise referenced by URL's in the user
command into the tarball!
The result of this job is an ilwd file which can be used by a standalone
wrapper with the tfcluster dso. The ilwd file will normally be found in
the job directory, which can be determined by the returned email message.
The purpose of this option is to limit the number of individual
Curl calls made by the manager API, to avoid making the manager very busy managing
hundreds (or thousands) of remote file retrievals via http or ftp.
-responsefiles \
http://www.foo.org/bar/baz/data1.ilwd
http://www.foo.org/bar/baz/data2.ilwd
http://www.foo.org/bar/baz/data3.ilwd
http://www.foo.org/bar/baz/data4.ilwd
then the tarball option could be any of:
-tarball http://www.foo.org/tarball.tar.gz
-tarball http://www.foo.org/bar/tarball.tar.gz
-tarball http://www.foo.org/bar/baz/tarball.tar.gz
And as long as the internal structure of the tarball is such that
it would unpack and overwrite the files individually referenced
otherwise, the user command will succeed!
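One way to read this requirement is that each referenced file must appear inside the tarball at its path relative to the directory the tarball URL lives in. The Python sketch below works out those member paths under that assumption; it is an interpretation for illustration, not a description of how the LDAS manager API unpacks the tarball, and the function name expected_members is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def expected_members(tarball_url, file_urls):
    """Paths the files would need inside the tarball (assumed layout rule)."""
    tar = urlsplit(tarball_url)
    base_dir = posixpath.dirname(tar.path) or "/"
    members = []
    for url in file_urls:
        parts = urlsplit(url)
        if parts.netloc != tar.netloc:
            raise ValueError(f"{url} is on a different host than the tarball")
        relative = posixpath.relpath(parts.path, base_dir)
        if relative.startswith(".."):
            raise ValueError(f"{url} is not below the tarball's directory")
        members.append(relative)
    return members


files = [f"http://www.foo.org/bar/baz/data{i}.ilwd" for i in range(1, 5)]
for tarball in (
    "http://www.foo.org/tarball.tar.gz",
    "http://www.foo.org/bar/tarball.tar.gz",
    "http://www.foo.org/bar/baz/tarball.tar.gz",
):
    print(tarball, expected_members(tarball, files)[0])
```

Under this reading, a tarball at the site root would hold bar/baz/data1.ilwd, while one placed in bar/baz/ would hold data1.ilwd directly.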