ldasJob { -name {} -password {} -email {} } { userCmd -opt1 {} ... }
This has the format of a Tcl command, ldasJob, with two required arguments:
ldasJob {-name mlei -password md5protocol -email 131.215.115.126:52124} {dataPipeline ... }
Then the manager returns a string consisting of the word md5salt followed by the salt value that the client should use when creating the hash:
md5salt [integer]
(** Right, it's not really a salt in the normal sense, it's a
session key. The name is an artefact of an earlier implementation.
Sorry for any confusion.)
The client then appends the salt value to the user's password and calculates an md5 hash of the combined password/salt string and returns this value to the manager like this:
md5digest af2058b1b115a2aec77e76e07b85d031
And the manager validates the request.
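The client side of the exchange above can be sketched in Python. This is a minimal sketch, not the LDAS client code: the function name md5_digest_reply is hypothetical, and it assumes that the salt is handled as a decimal string and that the combined password/salt string is UTF-8 encoded before hashing. The text above only states that the salt is appended to the password.

```python
import hashlib


def md5_digest_reply(password: str, salt: str) -> str:
    """Build the 'md5digest <hash>' reply for a password and a salt."""
    # The salt (really a session key) is appended to the password.
    combined = password + salt
    # Assumption: the combined string is UTF-8 encoded before hashing.
    digest = hashlib.md5(combined.encode("utf-8")).hexdigest()
    return "md5digest " + digest


if __name__ == "__main__":
    # Hypothetical password and salt, for illustration only.
    print(md5_digest_reply("secret", "123456"))
```

The reply always consists of the word md5digest followed by 32 hexadecimal characters, matching the form shown above.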
Acquires input data from the frame API, metadata API, disk files
and URL's and performs mathematical transformations upon it using
the datacond API. The primary output sink for the datacond API
is the wrapper API, where mpi processing occurs. Secondary datacond
API results can be output to disk or to URL's in LIGO lightweight
data format, ilwd format, or frame format, or can be sent to the
metadata API for ingestion into the database.
Results of mpi processing are sent to the eventmon API, where they
are formatted for ingestion into the database via the metadata API.
Data extracted from the frame API is concatenated where possible by
default. This default behaviour can be changed by using the
-concatenate option with an argument of 0.
Data products can be sent to other API's by using the
output() action of the
-algorithms option.
Data objects received by other API's can often be
inspected by setting the ::DEBUG level in the appropriate
API('s) to 219 (0xdb); the data will then be placed in the
job result area of the LDAS installation.
Calling convention (all on a single line):
ldasJob { -name "username" -password "********" -email "user@foobar.edu"} {dataPipeline -inputfile URL -inputprotocol {Tcl list} -returnprotocol URL -outputformat data_format -usertype string -outputdir output_directory -usejobdirs use per-job subdirectories -subject {freeform string} -ingestdata {Tcl list} -multidimdatatarget {API name} -mpiapi {HOST,PORT} -state {Tcl list} -multidimoutput {Tcl list} -responsefunction {Tcl list} -responsefiles {Tcl list} -tarball {http or ftp URL} -autoexpand boolean -aliases {Tcl list} -algorithms {Tcl list} -framequery {String} -datacondtarget {API name} -metadatatarget {API name} -dataapi {HOST,PORT} -dbquery {Tcl list} -dbntuple {Tcl list} -dbspectrum {Tcl list} -dbqualitychannel {Tcl list} -doloadbalance {Tcl list} -concatenate {Boolean} -allowgaps {Boolean} -np {Integer} -dynlib {Full path to .so} -realtimeratio {Float} -memoryusagelimit {Float} -filterparams {Freeform String} -usertag {Freeform String} -datadistributor {WRAPPER | SEARCHMASTER} -communicateoutput {ONCE | ALWAYS} -metadataapi {API name} -resultapi {HOST,PORT} -setsingledc {Boolean} -setsingleem {Boolean} -database {String} -ligolwformat data_format }
Options Description:
-returnprotocol:
http ftp gridftp mailto file port
Default Value: http://results.ilwd
Note that in most cases this option should NOT be explicitly
set by the user, since it can have unexpected interactions
with the output() actions in the datacond API, for example.
The argument to -returnprotocol resembles the usual
browser conventions for URL's, and is used to control the
naming and disposition of output data from a user request.
The http://, ftp://, gridftp:/,
and file:/ URI types are currently supported.
The specific format of the argument, and possibly the values
of other arguments, will affect the way that the
-returnprotocol argument is applied by the LDAS system:
Will cause the system to return the location of output
data as an ftp or http URL (the system default, in the
absence of a user supplied -returnprotocol option, is
to use http).
Will cause the system to return the location of output
data as an ftp or http URL (the system default, in the
absence of a user supplied -returnprotocol option, is
to use http). In addition, any output object which is not
in frame format will be named according to the
suggested pattern. When possible, indexes will be
appended to the filename to differentiate between multiple
output files, but sometimes this cannot be managed, and in
that case as many levels of .ba* files will be created as
required to avoid overwriting data.
When this form is used and dirname corresponds to an
existing directory in the anonymous ftp area of the LDAS
system, all output files will be copied into that directory.
In some cases a filename can be specified and will be used;
in this case .ba* files may be created to avoid overwriting
data. This form of -returnprotocol is currently not supported. The output,
e.g. frame or xml, is placed in the job output directory.
The user has to retrieve the output via http or ftp to
his/her local system.
When this form is used the LDAS system will attempt to copy
the output data via anonymous ftp to the remote site. This form of -returnprotocol is currently not supported. The output,
e.g. frame or xml, is placed in the job output directory.
The user has to retrieve the output via http or ftp to
his/her local system.
This is a special protocol used by the dataStandAlone user
command for interaction with a GLOBUS/GRID system.
This returnprotocol is only interpreted by the datacond API
for use on output data formatted for ingestion by a stand alone
wrapper API.
The file will be written to the local gridftp home directory
or a subdirectory of it which is writable by the user which
LDAS runs as.
Several modified forms of the gridftp URL are available:
as a useful shorthand notation for producing a directory
hierarchy of dataStandAlone results which maps directly to
the job directories normally produced by LDAS, allowing
simple correlation of output data with other job results or
messages from the LDAS system.
Then the directory hierarchy will be created as required
and unique job-specific default filenames will be assigned
to the datacond API's output for the wrapper, i.e.:
NOTES:
-outputformat: frame ilwd LIGO_LW
The argument of the -outputformat option is the data type to
use in formatting the result of the user request.
-usertype: string
Allows the user to specify a type to be used in constructing
the output frame names. The specified type will be supplemented with
an underscore and integer value when more than one frame is output
by a single user command, consistent with the default naming
convention.
For example, specifying -usertype foobar will result in an
output type of foobar when only one frame file is written.
However, specifying -usertype foobar when more than one
frame is to be written will result in type specifications of the
form foobar_1, foobar_2 and so forth for each
succeeding frame file. The naming convention will be consistent
with the convention used when the option is not specified.
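The naming rule described above can be sketched as follows. This is only an illustration: the function name frame_types is hypothetical, and the sketch assumes that numbering starts at 1 for the first frame, as the foobar_1, foobar_2 example suggests.

```python
def frame_types(usertype: str, n_frames: int) -> list[str]:
    """Return the type strings used for n_frames output frames."""
    if n_frames < 1:
        raise ValueError("at least one frame must be written")
    if n_frames == 1:
        # A single frame keeps the bare user type.
        return [usertype]
    # Assumption: the integer suffix starts at 1 for the first frame.
    return [f"{usertype}_{i}" for i in range(1, n_frames + 1)]


if __name__ == "__main__":
    print(frame_types("foobar", 1))
    print(frame_types("foobar", 3))
```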
A request that would result in the overwriting of existing data will
raise an exception and no output will result.
Default: NONE, use default RDS output type.
Note: Specifying a unique type will help ensure that LDAS finds
the intended frames in a -framequery option of an LDAS job.
This option is very useful when creating merged RDS frames (i.e. frames
containing multiple IFO data). It will help LDAS find the correct
multi-IFO RDS frames and not other single-IFO RDS frames which may
otherwise be named similarly.
-outputdir: output_directory
This option applies when the user desires
result frames to be written to a special directory provided
for the purpose of collecting reduced data sets.
If the directory does not exist, the job fails because the
frameAPI reports an error about the non-existent directory.
Default: job directory.
-usejobdirs: 0 or 1
When this flag is set to 1, there will be a new
subdirectory created under the outputdir for each new job
when the -outputdir option is also specified. When this flag
is set to 0, the files will be output directly into the
directory specified by -outputdir.
Default: 1
-tarball:
http or ftp URL to .tar.gz or .tar.bz2 file
This option is intended to be used in addition
to any and all other options, and does not obsolete or obviate the
requirement for any other option.
The -tarball option accepts a single argument of
a URL referencing a .tar.gz or .tar.bz2
file containing all of the files which are otherwise
referenced via http and ftp url's in the user command.
The location and directory hierarchy within the tarball should be
consistent with those of the http and ftp references.
For example, if there is an option of this form:
NOTE:
Internally (in the LDAS manager API), the use of the
-tarball option turns off retrieval of
ALL other http and ftp URL's, so you MUST
put all of the files otherwise referenced by URL's in the user
command into the tarball!
-np: { Integer }
-dynlib: { Filename }
Developers of dso's take note:
When pushing copies of development shared objects into the NFS
mounted directory hierarchy visible to the wrapper API, be sure
to name subsequent versions of the shared object with new, unique
names (for example, use a patchlevel number when naming the file),
otherwise there is a chance that a memory cached copy of an
older version of your dso will be used.
-mpiapi: {HOST,PORT}
-dataapi: {HOST,PORT}
-resultapi: {HOST,PORT}
-setsingledc: {Boolean}
-setsingleem: {Boolean}
-database: {String}
-ligolwformat: LIGO_LW | LIGO_LW base64
-filterparams: {Freeform cslist}
-usertag: { Freeform string }
-realtimeratio: { Float }
-memoryusagelimit: { Float }
-doloadbalance: { TRUE | FALSE }
-datadistributor: { WRAPPER, SEARCHMASTER }
This option is currently disabled.
-communicateoutput: { ONCE | ALWAYS }
-frametarget: datacond
-inputprotocol: { filenames and/or URLs }
-subject: {subject for the return e-mail}
-autoexpand: Boolean [01]
-aliases: { <alias0> = <regex0>;
<alias1> = <regex1> ... }
This option allows aliases to be specified for variables
ingested by the dataconditioning API. The argument to -aliases
consists of a semicolon-delimited list whose elements are of the form
<alias> = <regex>, where <alias>
is an alphanumeric string beginning with an alphabetic character,
and <regex> is a regular expression (under Unix, type
'man 7 regex' for a description of regular expressions).
Each regular expression must match the name
of exactly one variable ingested by the dataconditioning API,
otherwise an error is reported and the job will not proceed.
Note that a regular expression is deemed to match a variable name if
it matches any substring of it. For example, FOO matches
FOO, 123-FOO and FOOBAR.
Aliases may be used in place of variable names in the body of the
-algorithms option. Before the commands
in the body of -algorithms are executed, each
occurrence of an alias is replaced by the unique variable name which
matched the regular expression on the right-hand side of the assignment.
Implicit aliases
In addition to user-defined aliases, the implicit aliases _ch0,
_ch1, ..., _chN are assigned to variables in the order
in which they are ingested by the dataconditioning API.
Making use of the implicit aliases requires knowledge
of the order of ingestion of the objects, which in most cases is
undefined and may change even on subsequent runs of identical jobs.
However, implicit aliases can be used in the
-algorithms option in the same way as
user-defined aliases. See the sample user commands
and the example below.
Example:
ADC channel data read from frame files is ingested into the
data conditioning API with a variable name based on the channel name
and its start-time, e.g. data from channel H2:LSC-AS_Q
starting at GPS time 693960000s 0ns will be ingested with the
variable name "H2\:LSC-AS_Q::AdcData:693960000:0:Frame"
(note that embedded colons must be escaped with a backslash).
In the following user command, all occurrences of "gw" in
-algorithms will be replaced by
"H2\:LSC-AS_Q::AdcData:693960000:0:Frame". If data
from channel L1:LSC-AS_Q was also ingested,
an error would occur
because the regular expression on the right-hand side of "gw = LSC-AS_Q"
would match more than one variable name:
L1:LSC-AS_Q is the name of a frame channel.
L1:LSC-AS_Q::AdcData:693960000:0:Frame is the name
of a datacond variable.
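The matching rule for aliases (substring match, exactly one variable per regular expression) can be sketched in Python. This is a hedged illustration, not the datacond implementation: the function name resolve_aliases is hypothetical, Python's re module stands in for the regex(7) syntax referred to above (the two differ in places), and a regular expression containing a semicolon is not handled.

```python
import re


def resolve_aliases(alias_spec: str, variable_names: list[str]) -> dict[str, str]:
    """Map each alias to the single variable name its regex matches."""
    resolved = {}
    for item in alias_spec.split(";"):
        item = item.strip()
        if not item:
            continue
        # Split on the first '=' only, so the regex may itself contain '='.
        alias, _, pattern = item.partition("=")
        alias, pattern = alias.strip(), pattern.strip()
        if not re.fullmatch(r"[A-Za-z][A-Za-z0-9]*", alias):
            raise ValueError(f"invalid alias name: {alias!r}")
        # A regex matches a name if it matches any substring of it.
        matches = [name for name in variable_names if re.search(pattern, name)]
        if len(matches) != 1:
            raise ValueError(
                f"{pattern!r} matched {len(matches)} variables, expected exactly 1"
            )
        resolved[alias] = matches[0]
    return resolved


if __name__ == "__main__":
    h2 = r"H2\:LSC-AS_Q::AdcData:693960000:0:Frame"
    print(resolve_aliases("gw = LSC-AS_Q", [h2]))
```

With both the H2 and L1 variables ingested, the same alias specification raises an error in this sketch, mirroring the behaviour described in the example.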
-algorithms: { action syntax }
-datacondtarget: {API name}
The default API for datacond results.
When the underscore _ is used as the protocol argument in an
output()
action, the value of -datacondtarget will be used as the
protocol for the output.
In a dataPipeline, results are normally sent to the wrapper
API for mpi processing. If output actions use the _
default protocol specifier, and the user specifies
-datacondtarget datacond, then the output will go to
disk. The format of the output will be plain however, not
wrapper format. To dump wrapper formatted output to disk via
a dataPipeline command requires that the debugging flag
::DEBUG_DUMP_OBJECTS be set using the cmonClient.
A much more straightforward method of getting wrapper formatted
ilwd written to disk is to use the
dataStandAlone
user command.
-metadatatarget: metadata
-dbquery: {{{sql} push alias}...}
Example of an arbitrary query
-dbntuple: {{{sql} {} {}} ...}
Each result ilwd will be sent to the target specified by
-metadatatarget option, e.g. datacondAPI.
The second and third elements of the individual -dbntuple arguments
are currently required, and should be represented as pairs of empty curly
braces: {}.
A compound -dbntuple argument should look like:
-dbspectrum: {{param=value param=value ...}...}
Example of a spectrum query: (with embedded blanks in fields)
To enter more than 1 query, enclose each one as a Tcl list,
e.g. {{spectrumquery1} {spectrumquery2} ...}
-dbqualitychannel: {{param=value param=value}...}
The sql command must provide the four columns
StartSec (GPS time in seconds of when the interferometer achieved lock),
StartNanoSec (GPS time in nanoseconds of when the interferometer achieved lock),
StopSec (GPS time in seconds of when the interferometer lost lock),
and StopNanoSec (GPS time in nanoseconds of when the interferometer lost lock).
A fifth column may be supplied to provide data about which interferometer is
to be associated with the start/stop time pair. If this column is provided, it
must be labeled IFO, and the IFO will be appended to the name of the
resulting object.
These columns can be (and most likely will be) aliases of the actual database
columns. Also, the names are case insensitive: StartSec, startSec, and startsec are all equivalent.
SQL Example (no IFO column):
SQL Example (with IFO column):
Code Examples:
To enter more than 1 query, enclose each one as a Tcl list,
e.g. {{qualitychannel1} {qualitychannel2} ...}
Each result ilwd will be sent to the target specified by -metadatatarget option
e.g. datacondAPI.
-concatenate: Boolean
-allowgaps: Boolean
For example:
The information representing the gaps is retained in a history record
named gap_info.
This may be useful information for developers of filter code.
The history record contains two pieces of information.
The first is the fill method that has been applied to the gaps.
This is an enumerated type as described in the
fillgap action.
Within the history record, it is named fill_method.
The second piece of information represents the range of the gap by
a GPS start time and GPS end time pair.
Each GPS time is represented by two unsigned integers.
The first number is the seconds component of the GPS time.
The second number is the nanoseconds component of the GPS time.
These pieces are named time_range.
There may be multiple instances of time_range, each representing
a single gap.
The gap range starts with the GPS start time and extends up to, but does not
include, the GPS end time.
This allows for loops:
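As a hedged illustration of the half-open convention (not the original example from this document), the sketch below treats each GPS time as a (seconds, nanoseconds) pair and each time_range as a (start, end) pair of such times. The function names in_gap and total_gap_ns are hypothetical.

```python
NS_PER_SECOND = 1_000_000_000


def in_gap(t, time_ranges):
    """True if GPS time t = (sec, nsec) lies inside any gap [start, end)."""
    # Tuples compare lexicographically, which orders (sec, nsec) correctly
    # as long as nsec is below one second.
    return any(start <= t < end for start, end in time_ranges)


def total_gap_ns(time_ranges):
    """Total duration of all gaps, in nanoseconds."""
    return sum(
        (end[0] - start[0]) * NS_PER_SECOND + (end[1] - start[1])
        for start, end in time_ranges
    )


if __name__ == "__main__":
    gaps = [((100, 0), (102, 500_000_000))]
    print(in_gap((100, 0), gaps), in_gap((102, 500_000_000), gaps))
    print(total_gap_ns(gaps))
```

Because the end time is excluded, adjacent gaps never overlap and their durations can be summed directly.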
An example of the history record in ilwd form is:
-framequery: {R H {} 666666666-666666669 Adc(1,2,5-15)}
The -framequery syntax now supports a complex
associative format which allows the simultaneous retrieval
of unrelated data elements from multiple sources.
The five fields of the complex framequery are:
The framequery option supports the slicing syntax Adc(channel!start!range[type]!)
or Proc(channel!start!range[type]!):
Note that the extent of the x-axis of a channel is not related
to the GPS time of the frame. The first value on the x-axis is given by
the startX field of the channel, and the extent is
given by the number of samples in the channel times the dx field
of the channel. While startX may have any value, it is usually 0,
although there are cases where it could be negative, such as for a
2-sided power spectrum.
The sliced data will contain all samples whose
x-coordinates satisfy start <= x < start + range.
An error will be generated if the start of the slice is less than
startX, or if the slice extends past the end of the channel.
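The slicing rule can be sketched as a selection over sample indices. This is a minimal sketch under stated assumptions: the function name slice_indices is hypothetical, the parameter called range in the text is named length here to avoid shadowing the Python builtin, and plain floating-point comparison is used, so boundary samples may be affected by rounding for values of dx that are not exactly representable.

```python
def slice_indices(start_x, dx, n_samples, start, length):
    """Indices of samples whose x satisfies start <= x < start + length."""
    end_x = start_x + n_samples * dx  # extent of the channel's x-axis
    if start < start_x:
        raise ValueError("slice starts before startX")
    if start + length > end_x:
        raise ValueError("slice extends past the end of the channel")
    return [
        i for i in range(n_samples)
        if start <= start_x + i * dx < start + length
    ]


if __name__ == "__main__":
    # Hypothetical channel: 8 samples spaced 0.25 apart, starting at x = 0.
    print(slice_indices(0.0, 0.25, 8, 0.5, 1.0))
```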
Examples of slicing:
Suppose we are accessing a channel from frame H-R-600000000.gwf:
Adc(H2:LSC_AS-Q!0.3!0.4TIME!)
The ! syntax determines a start-time and range within the
channel (since startX is zero for time-series data,
the start-time may be interpreted as an offset from the GPS start-time
of the framequery).
The syntax requires that the channel name be followed
by a !, followed immediately by a valid time offset into
the data array for the channel, followed by another !,
followed immediately by a range, followed by a final !.
This example will return a 0.4 second long slice of the channel data
from GPS time 600000000.300000000 through 600000000.700000000.
Suppose we have a Proc frame with a Proc structure containing
frequency spectrum data with Hertz as the x-axis units.
We can request a specific frequency band by specifying a
start-frequency and a frequency range:
Proc(0!1024!1024FREQ!)
This will return a Proc structure containing the data at all frequency
bins greater than or equal to 1024 Hz and less than 2048 Hz, that is,
all frequency data in the semi-open interval [1024, 2048) Hz.
Any metadata associated
with the original object will be passed through unmodified.
NOTE on frequency slicing:
The framequery option also supports the downsampling syntax
Adc(channel!resample!q!) or Proc(channel!resample!q!):
Adc(H2:LSC_AS-Q!resample!8!)
In the case of resampling, the first field after the channel name contains the
literal string "resample", and the second field contains the downsampling
factor q. (see the
resampling algorithm
documentation.)
It is intended that resampling be applied after data has been time
sliced based on the time range part of the framequery option.
Note that the only supported resampling factors are 2, 4, 8,
and 16.
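A client could check a resampling accessor before submitting a job. The sketch below is illustrative only: parse_resample and the regular expression are hypothetical helpers, not part of LDAS, and they cover only the Adc(channel!resample!q!) and Proc(channel!resample!q!) forms described above.

```python
import re

SUPPORTED_FACTORS = {2, 4, 8, 16}
_RESAMPLE = re.compile(r"^(Adc|Proc)\((.+)!resample!(\d+)!\)$")


def parse_resample(accessor: str):
    """Split a resampling accessor into (kind, channel, q) and validate q."""
    match = _RESAMPLE.match(accessor)
    if match is None:
        raise ValueError(f"not a resampling accessor: {accessor!r}")
    kind, channel, q = match.group(1), match.group(2), int(match.group(3))
    if q not in SUPPORTED_FACTORS:
        raise ValueError(f"unsupported resampling factor: {q}")
    return kind, channel, q


if __name__ == "__main__":
    print(parse_resample("Adc(H2:LSC_AS-Q!resample!8!)"))
```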
The -framequery option by-the-numbers:
The first element of the complex -framequery option is a
list of the types of the frames which should be retrieved.
The type refers to the frame attribute referenced by the second
field in the proposed frame spec. This field defaults internally to
R, or "Raw".
The second element of the complex -framequery option is a
list of the interferometers which produced the data that is wanted.
A separate output container will be created for each interferometer
specified.
The third element of the complex -framequery option is a Tcl
list of frame file names or URL's.
The fourth element of the -framequery option, times,
is a Tcl list of GPS timestamps and ranges. The times argument
must have the form a-b where a and
b are valid GPS times in whole numbers of seconds.
Due to historical precedent, this range should be interpreted as a
request for data including the second starting at a
until the end of the second starting at b,
thus the actual interval of time being requested is [a, b+1).
For example, -times 666666666-666666681 represents the equivalent of
specifying the names of 16 1-second frame files, and represents the 16
seconds of data from time 666666666 up to but not including 666666682.
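The inclusive-second convention can be checked with a few lines of Python. This is a sketch only: gps_interval is a hypothetical helper, and it handles just the plain a-b form, not qualifiers attached to a range.

```python
def gps_interval(times: str) -> tuple[int, int]:
    """Convert an 'a-b' times argument into the half-open interval [a, b+1)."""
    a_text, sep, b_text = times.partition("-")
    if not sep:
        raise ValueError(f"expected the form a-b, got {times!r}")
    a, b = int(a_text), int(b_text)
    if b < a:
        raise ValueError("end time precedes start time")
    # The second starting at b is included, so the interval ends at b + 1.
    return a, b + 1


if __name__ == "__main__":
    start, end = gps_interval("666666666-666666681")
    print(start, end, end - start)
```

The duration is end minus start, so a range a-b always covers b - a + 1 seconds.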
Example:
-framequery { {} H {} {666666666-666666669:allow_gaps 666667660-666667665} full(0)}
Will, with the -allowgaps
option specified, ultimately result in a single frame of 1000
seconds duration.
Note that the times element of the -framequery
option returns the data spanning the range of time from the
top of the starting second to the bottom of the ending second.
The fifth element of the -framequery option supports a
shorthand notation for frame structure accessor methods exposed
to the Tcl layer. The most commonly used methods are the Adc()
and Proc() accessors, which retrieve Adc (time serialised data)
or Proc (time, freq, or time-freq domain) channel structures from frames.
The argument to this accessor method can be an integer (referring to
the ith channel) OR the specific name of a channel:
-metadataapi: { API name | tee }
Default: metadata
The metadata results will be inserted into the database by default.
If 'ligolw' is specified, e.g. -metadataapi ligolw,
and the -returnprotocol specifies a url, the metadata results will be output in the form of xml
rather than being inserted into the database, e.g. when
running at a site that does not have a database.
If 'tee' is specified, e.g. -metadataapi tee, and the -returnprotocol specifies a url,
the metadata results will
be output as xml and also be inserted
into the database.
Examples of dataPipeline commands:
Simple but complete example which concatenates the gravitational wave
channel data from one hour of frames, sends the concatenated object
to the data-conditioning API, which prepares a table of statistics
of the time series data, and puts this data into the appropriate
database.
This example shows a typical argument list for a dataPipeline user
command:
One or more products are created by the dataPipeline command and appear
in the user's job output directory. Diagnostic ilwds may be generated
if a debugging level is set for some APIs, e.g. setting the DEBUG level to 1
in the eventmonAPI causes the multi-dimension ilwd and metadata ilwd, if any,
to be written to the job output directory.
Table created by dc API in ilwd ascii format:
More examples of products:
Optional
Specifying "file" will return the local path to the file
as it is seen by all LDAS API's.
Specifying ftp or http will return a URL
relative to the gateway machine of the local system (which
is the same machine to which user requests are made).
The names used for the output files will be determined
by the system and are generally descriptive of the
content of the files.
The "file" form is used to get the local name of the file
relative to the LDAS installation for use as input data
by subsequent jobs.
This form is not advised when the job is one that is expected
to produce many output files!
In some special cases the suggested filename and extension
may be ignored, as when a frame file is being produced, or
an exact ilwd representation of a frame file.
Note that the trailing slash is absolutely required in order
for the URL to be interpreted as a local directory by the
LDAS system.
The URL returned to the user will be of the type specified.
The user will need to run tests to make certain that this
form works with the specified site.
The LDAS writable subdirectory should be defined in the
LDASapi.rsc resource file as the
::GRID_FTP_WRITABLE_SUBDIRECTORY, which will be joined
to the grid home directory when calculating output filenames.
-returnprotocol gridftp
-returnprotocol gridftp://here/there/mydir/mysubdir/
LDAS-TEST1234567_wrapperdata.ilwd
Default: {ilwd binary}
The possible output formats which the system can produce include:
-inputfile: URL
Optional: this option applies when the user desires to have the
wrapperAPI get its input from a file rather than via the
datacondAPI.
The inputfile should be a valid URL, but currently a full file path is
allowed, e.g.
-inputfile /ldas_outgoing/jobs/ldasmdc/mpi/test/06inspiral/input/c_1.40_1.40_11.00.ilwd
-ingestdata: { port:_null }
Optional: this option applies when there is metadata to be inserted
from the output of eventmonAPI processing.
If this option is included, the default value, -ingestdata port:_null,
should always be used; leaving the option out also selects the default.
-multidimdatatarget: { API name }
Optional: this option applies when there is multi-dimension ilwd data
from eventmonAPI processing to be converted by the mddapi
to some end-user output, e.g. to LIGO_LW document by the ligolwAPI
or to frames by the frameAPI.
Valid API names are ligolw and frame.
-state: { jobid }
Optional: this option applies when there is state information
from the previous wrapperAPI job run to pass on to the next
wrapperAPI job run.
jobid identifies the previous MPI job that has the state information for the next run.
Currently the state is kept on file in the output directory for the job.
-multidimoutput: { format URL }
This option defines the format and URL of the multi-dimension data,
e.g. LIGO_LW for XML output or frame output, and the url for placing the output file.
-responsefunction: { Full Path to File }
Optional: Deprecated in favor of -responsefiles, q.v.
See -responsefiles option below.
-responsefiles: { Full Path to File(s) }
Optional: When an external ilwd responsefile(s) is/are required,
the full path to the file is specified by this option. The syntax
allows for some degree of flexibility in the disposition of the
files. See below.
This option is used to specify the location and disposition of
files containing data and/or coefficients which are not provided
by the data as received from the frame API or which cannot be
calculated from the frame derived data by the data-conditioning
API or extracted from the database.
The files referred to by this option can be injected into the data
stream for the job either within the data-conditioning API by the
use of the "push" option, or can be attached to the output of the
data-conditioning API for transmission to the metadata or wrapper
API's by use of the "pass" option. Use of the "push" option
requires that an additional argument in the form of an "alias" for
the data-conditioning API be provided.
Examples:
-responsefiles {
file:/MayMDC/al.ilwd,push,al
file:/MayMDC/bl.ilwd,push,bl
file:/MayMDC/am.ilwd,push,am
file:/MayMDC/bm.ilwd,push,bm
file:/MayMDC/ah.ilwd,push,ah
file:/MayMDC/bh.ilwd,push,bh
file:/MayMDC/resp.bin,pass
}
Here the "push" elements are ilwd data files which are used to
populate the values al, bl, am, bm, ah, and bh. The contents of
the files can then be referenced in the call chain algorithm by
referring to the appropriate variable.
The "pass" element is passed on to the api pointed to by the
-datacondtarget option and is not used in calculations performed by
the data conditioning API.
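The structure of a -responsefiles argument can be sketched with a small parser. This is an illustration, not LDAS code: ResponseFile and parse_responsefiles are hypothetical names, entries are assumed to be separated by whitespace, and an entry given as a bare URL is accepted with no disposition recorded, since this document does not state what the default disposition is.

```python
from typing import NamedTuple, Optional


class ResponseFile(NamedTuple):
    url: str
    mode: Optional[str]   # "push", "pass", or None if not given
    alias: Optional[str]  # datacond alias, required for "push"


def parse_responsefiles(body: str) -> list[ResponseFile]:
    """Parse whitespace-separated 'url[,mode[,alias]]' entries."""
    entries = []
    for item in body.split():
        fields = item.split(",")
        if len(fields) == 3 and fields[1] == "push" and fields[2]:
            entries.append(ResponseFile(fields[0], "push", fields[2]))
        elif len(fields) == 2 and fields[1] == "pass":
            entries.append(ResponseFile(fields[0], "pass", None))
        elif len(fields) == 1:
            # Assumption: a bare URL is allowed; its disposition is unknown.
            entries.append(ResponseFile(fields[0], None, None))
        else:
            raise ValueError(f"malformed -responsefiles entry: {item!r}")
    return entries


if __name__ == "__main__":
    for entry in parse_responsefiles(
        "file:/MayMDC/al.ilwd,push,al file:/MayMDC/resp.bin,pass"
    ):
        print(entry)
```

A "push" entry without an alias is rejected, reflecting the requirement that "push" be accompanied by an alias for the data-conditioning API.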
The purpose of this option is to limit the number of individual
Curl calls
made by the manager API, to avoid making the manager very busy managing
hundreds (or thousands) of remote file retrievals via http or ftp.
-responsefiles \
http://www.foo.org/bar/baz/data1.ilwd
http://www.foo.org/bar/baz/data2.ilwd
http://www.foo.org/bar/baz/data3.ilwd
http://www.foo.org/bar/baz/data4.ilwd
then the tarball option could be any of:
-tarball http://www.foo.org/tarball.tar.gz
-tarball http://www.foo.org/bar/tarball.tar.gz
-tarball http://www.foo.org/bar/baz/tarball.tar.gz
And as long as the internal structure of the tarball is such that
it would unpack and overwrite the files individually referenced
otherwise, the user command will succeed!
Optional: Number of nodes requested for MPI calculations,
defaults to 3, and installation specific limits may be enforced.
Fewer nodes or more nodes than the number requested may be assigned
according to system availability or local rules regarding the
allocation of nodes. By default, only one user may have job
processing occurring on a single node; this can be disabled by
setting the global mpi API variable ::MPI_MULTIPLE_NODES in the mpiAPI resource
file LDASmpi.rsc to "0".
Setting that value to zero is probably not what is wanted in most
cases.
The relative or absolute path to the dynamically loaded shared
object library containing the wrapperAPI search algorithms which
will be used to analyze the data.
When the path is relative (a bare filename IS a relative path),
the specified location is assumed to be relative to the directory
defined in the LDASmpi.rsc file as the variable
::DYNLIB_DIRECTORY.
A path beginning with a "/" will ALWAYS be taken as an
absolute path to the dynamic library and no attempt will be made
to match a file in the ::DYNLIB_DIRECTORY when a
leading "/" is provided.
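The path rule for -dynlib can be sketched as follows. This is a minimal sketch, not the mpiAPI implementation: resolve_dynlib is a hypothetical name, and the directory /ldas/lib used in the usage lines is a made-up stand-in for whatever ::DYNLIB_DIRECTORY is set to at a given site.

```python
import posixpath


def resolve_dynlib(path: str, dynlib_directory: str) -> str:
    """Resolve a -dynlib argument against the configured library directory."""
    if path.startswith("/"):
        # A leading "/" always means an absolute path; no lookup is done.
        return path
    # A bare filename or other relative path is taken relative to the
    # directory configured as ::DYNLIB_DIRECTORY.
    return posixpath.join(dynlib_directory, path)


if __name__ == "__main__":
    print(resolve_dynlib("libinspiral.so", "/ldas/lib"))
    print(resolve_dynlib("/home/duncan/lib/lalwrapper/libinspiral.so", "/ldas/lib"))
```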
The hostname and port number of the computer that the mpiAPI
runs on; the wrapperAPI will connect to the port on this host
to communicate with the mpiAPI.
Normally specified in the resource file for the mpi API as the
global variables ::MPIHOST and ::MPIPORT.
It may be specified in the argument list of the user command for
development or debugging purposes only.
The hostname and port number of the platform on which the LDAS API providing
data to the wrapperAPI runs, e.g. the data conditioning API host and port; the wrapperAPI
will connect to this port on this host to receive the data.
Normally specified in the resource file for the mpi API as the
global variables ::DATAHOST and ::DATAPORT.
It may be specified in the argument list of the user command for
development or debugging purposes only.
The hostname and port number of the platform on which the LDAS API
receiving data from the wrapperAPI resides, e.g. the eventmonAPI's host and port number;
the wrapper will connect to the port on this host to send its output objects.
Depending upon the -returnprotocol option,
which may be set to the name of a known API, this value
can be dynamically calculated by the mpi API. It may be
specified in the argument list of the user command for
development or debugging purposes only.
If this flag is 0 (false), then each frame formatted output from the
data conditioning API will be returned as a separate frame object.
If this flag is 1 (true), then frame formatted output from the
data conditioning API is combined into a single frame object.
If this flag is 0 (false), then each frame formatted output from the
event monitor API will be returned as a separate frame object.
If this flag is 1 (true), then frame formatted output from the
event monitor API is combined into a single frame object.
This option allows the metadata results produced to be inserted
into the database specified instead of the default database (the one
connected to when the metadataAPI starts up). For the list of valid
databases, please refer to the LDAS database link at your site.
e.g.
The list of databases for site ldas-dev.ligo.caltech.edu shows that
the default database is cit_test. Using a -database cit_1 option in the
pipeline command will direct the insertion to the database cit_1
instead of cit_test.
Default: LIGO_LW
This option allows specification of base64 encoding for
Vector data in LIGO_LW documents, e.g. generated from frame ilwds. It only applies if
such data is present, otherwise base64 has no effect even if
it is specified.
List of options to pass to the shared object code specified
by the -dynlib option. The option list in this
case is not a Tcl list, but a cslist, or "comma
separated list" of values.
The syntax of the -filterparams argument is specific to the
search code being exercised. This example is from the original
ldasinspiral.so:
Argument          Description                                   type
numCoarseExch     Number of coarse templates to exchange        int
numPoints         Number of data points in a segment            int
numSegments       Number of overlapping data segments           int
numChisqBins      Number of frequency bands for chisq veto      int
deltaT            Sampling interval                             float
ovrlap            Overlap between segments (# of points)        int
invSpecTrunc      Duration of inverse spectrum in time domain   int
fLow              Low frequency cut-off in inverse spectrum     float
rhosqThreshold    threshold for SNR                             float
chisqThreshold    threshold for chisqr                          float
numTmplts         number of templates                           int
(m1,m2;...)       list of templates                             see below
The arguments are a comma delimited list that the wrapper parses and passes to
the shared object as strings. The first 12 arguments are simple floats or ints
that the shared object calls atof() or atoi() on.
The last argument is the list of templates. The wrapperAPI baseline
requirements state that an argument that is enclosed in parentheses is passed
to the shared object as a single string. Thus the list of masses is passed as
a string to the shared object and parsed there.
Each template is a pair of masses m1,m2 and each template is separated by a
semicolon.
Thus a valid command line to wrapperAPI is (formatted with line-breaks for readability):
wrapperAPI
-mpiAPI="(marfik.ligo.caltech.edu,10000)"
-nodelist="(1-2)"
-dynlib="/home/duncan/lib/lalwrapper/libinspiral.so"
-dataAPI="(datahost,5678)"
-resultAPI="(reshost,9101)"
-filterparams="(3,1048576,1,8,0.00012207,0,0,5.0,100.0,2.1,69.0,6,(1.0,1.0;1.4,1.4;2.0,2.0;2.2,2.2;2.4,2.4;5.5,5.5))"
-nodeDutyCycle=2
-realTimeRatio=0.9
-doLoadBalance=FALSE
-dataDistributor=W
-jobID=8
-inputFile="event.ilwd"
This list of templates parsed by the shared object would be
tmplt number m1 m2
0 1.0 1.0
1 1.4 1.4
2 2.0 2.0
3 2.2 2.2
4 2.4 2.4
5 5.5 5.5
The shared object checks that the number of templates in the string equals
the numTmplts argument and aborts if it does not.
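The parsing rules described above can be sketched in Python. This is only an illustration of the comma/parenthesis/semicolon convention, not the wrapperAPI or shared object code; the function names split_top_level and parse_filterparams are hypothetical, and the sketch assumes the whole value is wrapped in one outer pair of parentheses, as in the example command line.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_filterparams(value):
    """Return (scalar argument strings, list of (m1, m2) templates)."""
    # Assumption: one outer pair of parentheses wraps the whole value.
    inner = value.strip()[1:-1]
    args = split_top_level(inner)
    scalars, template_arg = args[:-1], args[-1]
    # The parenthesised last argument holds "m1,m2" pairs separated by ";".
    templates = []
    for pair in template_arg.strip("()").split(";"):
        m1, m2 = pair.split(",")
        templates.append((float(m1), float(m2)))
    # Mirror the documented check: numTmplts is the last scalar argument.
    if int(scalars[-1]) != len(templates):
        raise ValueError("numTmplts does not match the template list")
    return scalars, templates
```

Applied to the -filterparams value from the example command line, this yields 12 scalar strings and 6 mass pairs.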
This option sets a user-defined tag used to group related jobs
when running database queries.
The desired ratio of the time required to process the data to the
time contained within the data segment, e.g. a value of 0.90 would
request 54 seconds be used to analyse 60 seconds worth of data.
Defaults to 1.0.
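The arithmetic behind the ratio can be written out as a one-line Python sketch. The function name processing_budget is hypothetical and is not part of LDAS.

```python
def processing_budget(segment_seconds, realtime_ratio=1.0):
    """Seconds of processing time requested for a data segment."""
    # The ratio is processing time divided by the time span of the data.
    return segment_seconds * realtime_ratio
```

With a 60 second segment and a ratio of 0.90 this gives the 54 seconds quoted above.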
This option sets the desired memory usage for each Beowulf node
assigned to the job. The value is the ratio of the maximum amount of
memory used by the process to the total available physical memory.
Defaults to 1.0.
Determines whether the job can be dynamically load-balanced by
adding or subtracting nodes while it is running.
Defaults to FALSE
The method for distributing input data from master to slave:
Defaults to SEARCHMASTER
The method of specifying the output data object structure used by the filter algorithm:
Defaults to ALWAYS
Optional: defaults to datacond
The argument of the -frametarget option is the name of a known LDAS
API. The result output from the frame API will be registered
with the named API to be used as input data for the code with
the same job i.d. that the data is tagged with.
Optional
The -inputprotocol option is used to load data into the conditioning
API from files or URL's instead of, or in addition to data from the
frame API.
Default: "Data inserted ok."
String to be used as the subject for the e-mail returned by the
system on completion of the job.
Default: 0
When the -autoexpand flag is set, the -aliases and -algorithms
options will be used as patterns that will be expanded once for
each object received from the frame API.
The -aliases options used with -autoexpand will be applied in
combination with the -algorithms option to all ingested channel
data.
Example of -aliases and -algorithms options that may be auto-expanded:
-aliases sftN
-algorithms {output(sftN,,,sftN,sftN data)}
The string sftN will be replaced with the name of
each ingested frame channel, once for each channel ingested.
Implications and side-effects of auto-expansion have not been
explored in detail. There is a possibility that some combinations
of data and -aliases and -algorithms options will produce unexpected
results. Simple applications of auto-expansion will present less
opportunity for unexpected behaviour.
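The substitution described for auto-expansion can be pictured with the following Python sketch. It assumes plain textual replacement of the pattern string, which is a guess at the behaviour and not taken from the LDAS source; the function name autoexpand and the channel names chanA and chanB are hypothetical.

```python
def autoexpand(pattern, aliases, algorithms, channels):
    """Expand the -aliases and -algorithms patterns once per channel."""
    expanded = []
    for channel in channels:
        # Assumption: every occurrence of the pattern is replaced textually.
        expanded.append((aliases.replace(pattern, channel),
                         algorithms.replace(pattern, channel)))
    return expanded
```

For the pattern sftN and two ingested channels, this produces two (aliases, algorithms) pairs, one per channel.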
Provides assignment of aliases to variables ingested by the
dataconditioning API.
conditionData
-framequery { R H {} 693960000-693960001 Adc(H2:LSC-AS_Q) }
-aliases { gw = LSC-AS_Q; }
-algorithms {
gw2 = resample(gw, 1, 8);
output(gw2,, gw2.ilwd, gw2, Downsampled gw channel);
}
-datacondtarget datacond
Note that when using output() to create
frame formatted results, the name
field should not contain the backslashes used in datacond
API variable names. Datacond API variable names are derived from
frame channel names, but they are not identical! Frame channel
names do not have backslashes protecting the colon after the
interferometer i.d.
Example:
Required
The algorithms option is constructed from a series of mathematical
"actions" which are defined within the data conditioning API.
These "actions" are entered in the form of a semi-colon delimited
list.
The "action" calls are used to develop complex algorithms by
"chaining" multiple actions which are evaluated from left to right.
The results of actions can be assigned to variables or "printed"
out using a special action: output(), provided for that
purpose.
Default: wrapper
Default: datacond
The API which the metadata API should send its results to.
The default option is datacond but mpi should
be used if the results are to be processed by the wrapperAPI.
If this is set to metadata the output will be dumped
to file and the data pipeline will end here.
Default: {}
This option is used to
inject metadata into the data pipeline. The first element
in the -dbquery option list must be a valid SQL statement
which results in the return of exactly one database object.
The second element must be either push or pass.
When the second element is push, the element will be made available
in the call chain of the data conditioning API using the third element of
the -dbquery option, alias, as its variable name.
When the second element is pass, the element will be passed through
the data conditioning API and will not be available for use in the call
chain.
To enter more than 1 query, enclose each one as a Tcl list,
e.g. {{{sql1} push1 alias1} {{sql2} push2 alias2} ...}
Each result ilwd will be sent to the target specified by the -metadatatarget
option, e.g. datacondAPI.
Example of 2 arbitrary queries
-dbquery {{select * from sngl_inspiral fetch first 2 rows only} pass dbquery}
-dbquery {{{select * from sngl_inspiral fetch first 2 rows only} pass dbquery_1}
{{select * from sngl_burst fetch first 2 rows only} pass dbquery_2}}
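The nesting rule for one query versus several can be sketched as a small Python helper that assembles the option text. The function name format_dbquery is hypothetical, and the sketch assumes the SQL text contains no unbalanced braces.

```python
def format_dbquery(queries):
    """Build a -dbquery option from (sql, mode, alias) tuples."""
    items = []
    for sql, mode, alias in queries:
        # The second element must be either push or pass.
        if mode not in ("push", "pass"):
            raise ValueError("second element must be push or pass")
        items.append("{{%s} %s %s}" % (sql, mode, alias))
    if len(items) == 1:
        # A single query is given directly as one three-element list.
        return "-dbquery " + items[0]
    # Several queries are each enclosed as a Tcl list inside an outer list.
    return "-dbquery {" + " ".join(items) + "}"
```

Calling it with the two queries from the example above reproduces the second form, on a single line.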
Default: {}
This option is used to inject metadata into the data pipeline
for the datacondAPI to pass to the wrapperAPI only. It is
identical to the -dbquery option except the data selected
should only be types valid for the wrapperAPI, e.g. currently
wrapperAPI does not support database blobs (binary large objects)
and clobs (character large objects) so these types would be invalid
for this option.
The first element
in the -dbquery option list must be a valid SQL statement
which results in the return of exactly one data object.
The second element must be pass or {}
for the element to be passed through
from the data conditioning API to the wrapperAPI.
To enter more than 1 query, enclose each one as a Tcl list,
e.g. {{{sql1} {} {}} {{sql2} {} {}} ...}
-dbntuple { {{sql} {} {}} {{sql} {} {}} }
For example.
Default: {}
This option is used to extract spectra data into the pipeline.
The option is provided in
the form of a Tcl list of these nine elements:
Example of a spectrum query: (without embedded blanks in fields)
-dbspectrum {channel={optimally filtered cross-correlation spectrum}
spectrum_type={IFO IFO differential mode cross correlation spectrum for stochastic background search}
start_time=681932000
start_time_ns=93750000
start_frequency=0
delta_frequency=32
spectrum_length=9
pushpass=push
alias=spectrum}
-dbspectrum {channel=H2:LSC-AS_Q
spectrum_type=Welch
start_time=680938500
start_time_ns=0
start_frequency=-512
delta_frequency=.0625
spectrum_length=16384
pushpass=push
alias=spectrum}
Each result ilwd will be sent to the target specified by the -metadatatarget
option, e.g. datacondAPI.
-dbspectrum { { channel={optimally filtered cross-correlation spectrum}
spectrum_type={IFO IFO differential mode cross correlation spectrum for stochastic background search}
start_time=681932000
start_time_ns=93750000
start_frequency=0
delta_frequency=32
spectrum_length=9
pushpass=push
alias=spectrum1}
{ channel=H2:LSC-AS_Q
spectrum_type=Welch
start_time=680938500
start_time_ns=0
start_frequency=-512
delta_frequency=.0625
spectrum_length=16384
pushpass=push
alias=spectrum2}}
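The key=value layout used in the examples above can be generated with a short Python sketch. The nine key names and their order are copied from the examples; the function names are hypothetical, and bracing a value only when it contains white space is an assumption based on how the examples are written.

```python
SPECTRUM_KEYS = ("channel", "spectrum_type", "start_time", "start_time_ns",
                 "start_frequency", "delta_frequency", "spectrum_length",
                 "pushpass", "alias")


def format_spectrum_group(fields):
    """Format one spectrum query as space separated key=value elements."""
    missing = [key for key in SPECTRUM_KEYS if key not in fields]
    if missing:
        raise KeyError("missing spectrum fields: " + ", ".join(missing))
    parts = []
    for key in SPECTRUM_KEYS:
        value = str(fields[key])
        # Assumption: values with embedded blanks are protected by braces.
        if any(ch.isspace() for ch in value):
            value = "{" + value + "}"
        parts.append(key + "=" + value)
    return " ".join(parts)


def format_dbspectrum(groups):
    """Build a -dbspectrum option from one or more field dictionaries."""
    formatted = [format_spectrum_group(group) for group in groups]
    if len(formatted) == 1:
        return "-dbspectrum {" + formatted[0] + "}"
    return "-dbspectrum {" + " ".join("{" + f + "}" for f in formatted) + "}"
```

Passing the field values as strings keeps forms such as .0625 exactly as typed.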
Default: {}
This option is provided for the generation of a quality channel
based on data retrieved from the database.
Multiple quality channels may
be generated by listifying successive argument groups as
demonstrated in the example. The option is provided in
the form of a Tcl list of these 8 elements:
SELECT start_time as StartSec, start_time_ns as StartNanoSec, end_time as
StopSec, end_time_ns as StopNanoSec
FROM SEGMENT
WHERE ( ( Start_Time >= 680895416 and Start_Time <= 680896086 ) OR ( end_time <= 680896086 AND end_time >= 680895416 ) ) AND ( segment_group LIKE 'H2:%' )
SELECT start_time as StartSec, start_time_ns as StartNanoSec, end_time as
StopSec, end_time_ns as StopNanoSec, substr(segment_group, 1, 2) AS IFO
FROM SEGMENT
WHERE ( ( Start_Time >= 680895416 and Start_Time <= 680896086 ) OR ( end_time <= 680896086 AND end_time >= 680895416 ) ) AND ( segment_group LIKE 'H2:%' )
-dbqualitychannel {{ 1024 680896086 0 680896086 0
{
SELECT start_time as StartSec, start_time_ns as StartNanoSec, end_time as
StopSec, end_time_ns as StopNanoSec
FROM SEGMENT
WHERE ( ( Start_Time >= 680895416 and Start_Time <= 680896086 ) OR ( end_time <= 680896086 AND end_time >= 680895416 ) ) AND ( segment_group LIKE 'H2:%' ) }
push qc1}
{ 1024 680896086 0 680896086 0 {
SELECT start_time as StartSec, start_time_ns as StartNanoSec, end_time as
StopSec, end_time_ns as StopNanoSec, substr(segment_group, 1, 2) AS IFO
FROM SEGMENT
WHERE ( ( Start_Time >= 680895416 and Start_Time <= 680896086 ) OR ( end_time <= 680896086 AND end_time >= 680895416 ) ) AND ( segment_group LIKE 'H2:%' ) }
pass qc2}
}
Default: 1
Setting this option to '1', or not declaring it explicitly (using the
default value) causes data to be serialised across frame boundaries
so that, for example, a 160 second long run of data will be packed
into a single channel object if ten 16 second frames satisfy the time
range specified in the -framequery option.
Setting the -concatenate option to '0' will cause one output object
for each channel requested from each frame satisfying the time range
specified. This is not very useful when the data required is time
domain data, but could be very useful if frequency domain data is expected
to be returned based on a time-domain request.
The frame API will normally be required to concatenate time serialised
channel data for conditioning, consequently this option will not
normally need to be explicitly declared.
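The difference between the two settings can be illustrated with a toy Python sketch in which each frame is a list of samples for one channel. This only models the packing of objects, not the frame API itself; the function name pack_channel is hypothetical.

```python
def pack_channel(frames, concatenate=True):
    """Return the output objects for one channel.

    frames is a list of per-frame sample lists in time order.
    """
    if concatenate:
        # One object holding all samples serialised across frame boundaries.
        return [[sample for frame in frames for sample in frame]]
    # One object per frame satisfying the time range.
    return [list(frame) for frame in frames]
```

With ten frames, concatenation yields one object and -concatenate 0 yields ten.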
Default: 0
This option applies gap-filling logic GLOBALLY over all
-framequery times specified by the user. It causes all elements
of a time range specification to be combined into a single long time range
which includes gaps due to missing data files and gaps
intentionally introduced by manipulating the format of the 'times'
spec.
-framequery { {} H {} { 600000000-600000004 600000008 } Adc(0) }
Would cause an induced gap to occur between 600000004 and 600000008
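How a set of time ranges collapses into one long range with gaps can be sketched in Python. The sketch is an assumption about the bookkeeping, not LDAS code: the function name find_gaps is hypothetical, and a single time such as 600000008 is represented here as the pair (600000008, 600000008).

```python
def find_gaps(ranges):
    """Return (overall span, list of gaps) for (start, stop) GPS ranges."""
    ordered = sorted(ranges)
    span = (ordered[0][0], max(stop for _, stop in ordered))
    gaps = []
    covered_to = ordered[0][1]
    for start, stop in ordered[1:]:
        # A gap exists when the next range starts after the covered time.
        if start > covered_to:
            gaps.append((covered_to, start))
        covered_to = max(covered_to, stop)
    return span, gaps
```

For the -framequery example above, the single span runs from 600000000 to 600000008 with one gap from 600000004 to 600000008.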
for( GPSTime t = GPSStart; t < GPSEnd; t += delta_t )
{
// Do something interesting
}
<ilwd name='LDAS_History'>
<ilwd name='gap_info' size='5'>
<int_4u name='fill_method'>0</int_4u>
<int_4u dims='4' name='time_range'>688011799 999511718 688011801 0</int_4u>
<int_4u dims='4' name='time_range'>688011802 0 688011803 0</int_4u>
<int_4u dims='4' name='time_range'>688011804 0 688011805 0</int_4u>
<int_4u dims='4' name='time_range'>688011808 0 688011809 999999999</int_4u>
</ilwd>
</ilwd>
The -framequery option is REQUIRED for getFrameData,
getFrameElements, and concatFrameData user commands, but is OPTIONAL
for conditionData, dataStandAlone, and dataPipeline user commands.
The -framequery argument is a complex list of frame
API query atoms.
The query atoms consist of unique identifying strings (which are
not case sensitive) with indices or channel names grouped
in parentheses. The query atom strings consist of the unique
parts of the accessor function names from the frame API C++ code.
(See: frameAPI.so)
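The shape of a query atom, an identifying string followed by a parenthesised group of indices or channel names, can be sketched with a small Python parser. This illustrates the syntax only and is not the frame API parser; the function name parse_query_atom is hypothetical, and any slicing suffix is left attached to its channel name.

```python
import re

# An identifying string followed by a parenthesised, comma separated group.
_ATOM = re.compile(r"^\s*([A-Za-z_]+)\((.*)\)\s*$")


def parse_query_atom(atom):
    """Return (lower-cased atom name, list of indices or channel names)."""
    match = _ATOM.match(atom)
    if match is None:
        raise ValueError("not a query atom: %r" % atom)
    # Atom strings are not case sensitive, so normalise the name.
    name = match.group(1).lower()
    body = match.group(2).strip()
    items = [item.strip() for item in body.split(",")] if body else []
    return name, items
```

Both Adc(12) and ADC(12) parse to the same name, and full() parses to an empty group.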
In the simplest case, a single frame repository containing
frames from more than one instrument can be queried to retrieve
a common time range from each specified interferometer:
This query will return the data for Adc 0 from both Hanford and
Livingston over the gps time period 600000000 to 600000007.
This query will return the data from both Hanford and Livingston
over the gps time range 600000000 to 600000007 for the H2 and L1
so-called gravitational wave channels.
The "slicing" syntax allows a subset of an Adc or Proc data channel
to be obtained. Slicing is performed by appending two floating-point
numeric arguments, the start and range, to the channel
specification (the fifth field of the framequery),
delimited by !'s. The start is the absolute
starting position of the slice along the x-axis of the channel,
and the range is the extent of the slice.
Valid [type] specifications are TIME, FREQ, and
TIMEFREQ. When no type specifier is given, time is assumed,
and this may generate errors if frequency series data is found.
The units of both numbers are taken from the unitX field
for the specified channel in the frame file when the data is operated
on by another API.
This is usually seconds for time-series data and Hertz for
frequency-series data.
The ILWD object containing this data slice will have the correct
calculated start-time and time-span for this slice of data, as well as
the correct number of data points in the data array.
The directive Adc(CAL-CAV_GAIN!0!2048.0001FREQ!)
is interpreted by LDAS as
all frequencies >= 0 and < 2048.0001.
Note the strict inequality for the upper limit. The frequency
of a bin is determined by:
f_k = startX + k*df, k = 0, 1, ..., N-1
so you just need to make sure your request is consistent with this
scheme. The reason some users are adding 0.0001 is because they have
f-sequences that have a bin that they want at EXACTLY 2048 Hz (say),
so they need to specify an upper limit above this, i.e. the upper
limit must be strictly greater than 2048 and less than or equal to
2048 + df Hz. If adding 0.0001 works for you, fine, but that is obviously
not the only choice.
This might seem a strange convention but the reason it's chosen is:
[ ... ) [ ... ) [ ... )
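The half-open selection rule can be checked with a short Python sketch built on the bin formula f_k = startX + k*df. The function name slice_bins and its parameter names are hypothetical. One property of half-open intervals is that adjacent slices tile a series with no bin shared and none dropped.

```python
def slice_bins(start_x, df, n_bins, start, extent):
    """Return the bin indices k with start <= f_k < start + extent."""
    lo, hi = start, start + extent
    # f_k = start_x + k * df; the upper limit is a strict inequality.
    return [k for k in range(n_bins) if lo <= start_x + k * df < hi]
```

With df = 1 Hz, a request of 0 to 2048.0001 includes the bin at exactly 2048 Hz, while a request of 0 to 2048 stops at the bin below it.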
Suppose we are asking for gravitational wave channel data over an
hour, but want to downsample the data by a factor of 8
before it is sent to the dataconditioning API:
As of the 0.2.0 release of LDAS, this has been imperfectly implemented,
and it is important that users wishing to make effective use of the
resampling syntax provide feedback to the developers via the problem
tracking system:
Another example of a frame type would be mT, or "Minute Trend".
A separate output container will be created for each interferometer
specified.
This option is generally only used when a single specific input frame
file (or a single file from, say each of two interferometers) is
needed, and is provided primarily for the application of data
conditioning or pipelining of test data in the form of frame files.
This option will become the mechanism for the reading of calibration
or other process data from frame files at a later time.
The syntax supports a gap filling flag:
666666666-666666681:allow_gaps which the LDAS system recognises
as an indication that it should allow missing frame files and account
for them in the data conditioning stage of the user command by filling
in missing data points as specified by the
fillgaps action in the
algorithms option.
Note that the -allowgaps option to the
dataPipeline user command overrides all the syntactical subtleties
implied here and will cause a single data segment spanning all
specified time ranges to be created.
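The time specification with its optional gap flag can be taken apart as in the following Python sketch. It covers only the start-stop and :allow_gaps forms shown above; the function name parse_time_spec is hypothetical, and treating a single time as a range with equal start and stop is an assumption.

```python
def parse_time_spec(spec):
    """Return (start, stop, allow_gaps) for one time specification."""
    body, _, flag = spec.partition(":")
    if flag not in ("", "allow_gaps"):
        raise ValueError("unknown flag: %r" % flag)
    start_text, dash, stop_text = body.partition("-")
    start = int(start_text)
    # Assumption: a single time is treated as start == stop.
    stop = int(stop_text) if dash else start
    return start, stop, flag == "allow_gaps"
```

For 666666666-666666681:allow_gaps this returns the two GPS times and a true gap flag.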
Adc(12)
Adc(H2:LSC-AS_Q)
An argument of "full()" as an item in the
framequery option list will result in a copy of the
frame if the return format is specified as "frame",
a full text dump of the frame if the return format is
specified as "ilwd", or a full XML dump if the
format is specified as LIGO_LW.
dataPipeline
-dynlib /ldcg/lib/lalwrapper/libldasinspiral.so
# the filterparams are specific coefficients relative
# to the specified shared object.
-filterparams (1048576,1,0,40.0,0,69.0,8,(64.0,0.0),(3.0,0.0),1,1,1,2,(1.400000,1.400000/1.800000,1.800000))
-aliases {gw=_ch0;darm=_ch1}
-algorithms {
rdarm = resample(darm, 1, 8);
ldarm = linfilt(bl, al, rdarm);
output(ldarm,_,mpi,darm,low bandpass);
rx = resample(gw, 1, 8);
output(rx,_,mpi,rx,resampled gw timeseries);
p = psd(rx,1048576);
output(p,_,_,psd,psd of resampled timeseries);
}
-responsefiles {
file:/MayMDC/al.ilwd,push,al
file:/MayMDC/bl.ilwd,push,bl
file:/MayMDC/am.ilwd,push,am
file:/MayMDC/bm.ilwd,push,bm
file:/MayMDC/ah.ilwd,push,ah
file:/MayMDC/bh.ilwd,push,bh
file:/MayMDC/resp.bin,pass
}
# 1024 second data segment
# the gravitation channel and an arm control channel
-framequery { {} H {} 658021000-658022023 Adc(H2:LSC-AS_Q,H2:LSC-DARM_CTRL) } }"
<ilwd size='3'>
<ilwd name='processgroup:process:table' size='10'>
<lstring name='processgroup:process:program' size='11'>datacondAPI</lstring>
<ilwd name='processgroup:process:process_id'>
<char dims='20'>process:process_id:0</char>
</ilwd>
<lstring name='processgroup:process:version' size='11'>ldas-0.0.15</lstring>
<lstring name='processgroup:process:cvs_repository' size='27'>ldas/api/datacondAPI/so/src</lstring>
<int_4s name='processgroup:process:cvs_entry_time'>0</int_4s>
<int_4s name='processgroup:process:start_time'>666000000</int_4s>
<int_4s name='processgroup:process:is_online'>0</int_4s>
<lstring name='processgroup:process:node' size='7'>datacon</lstring>
<lstring name='processgroup:process:username' size='4'>ldas</lstring>
<int_4s name='processgroup:process:unix_procid'>2378</int_4s>
</ilwd>
<ilwd name='process_paramsgroup:process_params:table' size='5'>
<lstring name='process_paramsgroup:process_params:program' size='11'>datacondAPI</lstring>
<ilwd name='process_paramsgroup:process_params:process_id'>
<char dims='20'>process:process_id:0</char>
</ilwd>
<lstring name='process_paramsgroup:process_params:param' size='6'>step-0</lstring>
<lstring name='process_paramsgroup:process_params:type' size='6'>action</lstring>
<lstring name='process_paramsgroup:process_params:value' size='8'>all(raw)</lstring>
</ilwd>
<ilwd name='summ_statisticsgroup:summ_statistics:table' size='15'>
<lstring name='summ_statisticsgroup:summ_statistics:program' size='11'>datacondAPI</lstring>
<ilwd name='processgroup:process:process_id'>
<char dims='20'>process:process_id:0</char>
</ilwd>
<int_4u name='summ_statisticsgroup:summ_statistics:start_time'>666000000</int_4u>
<int_4u name='summ_statisticsgroup:summ_statistics:start_time_ns'>0</int_4u>
<int_4u name='summ_statisticsgroup:summ_statistics:end_time'>666003600</int_4u>
<int_4u name='summ_statisticsgroup:summ_statistics:end_time_ns'>0</int_4u>
<int_4u name='summ_statisticsgroup:summ_statistics:samples'>58982400</int_4u>
<lstring name='summ_statisticsgroup:summ_statistics:channel' size='39'>H2\:LSC-AS_Q::AdcData:666000000:0:Frame</lstring>
<real_8 name='summ_statisticsgroup:summ_statistics:min_value'>-1.5082244873046875e+03</real_8>
<real_8 name='summ_statisticsgroup:summ_statistics:max_value'>5.8331585693359375e+02</real_8>
<real_8 name='summ_statisticsgroup:summ_statistics:mean'>1.1476757515331777e+00</real_8>
<real_8 name='summ_statisticsgroup:summ_statistics:rms'>1.7742118277333952e+02</real_8>
<real_8 name='summ_statisticsgroup:summ_statistics:variance'>3.1477599350458338e+04</real_8>
<real_8 name='summ_statisticsgroup:summ_statistics:skewness'>-1.9985680110792783e+00</real_8>
<real_8 name='summ_statisticsgroup:summ_statistics:kurtosis'>1.3436855850203969e+01</real_8>
</ilwd>
</ilwd>
Frame File Naming Convention
When the frame cache consists of a mixed collection of frames in
an inconsistent hierarchy of subdirectories the appropriate frame
file(s) for fulfilling a given request are determined by parsing
the frame file names according to an installation-specific naming
convention.
The LDAS system provides a default naming convention which is described
in detail in the document
Naming Convention for Frame Files Which Are to be Processed by LDAS.
where
Examples:
Examples of the -metadataapi option:
In summary, the required format of frame-file names is
16 seconds of raw data from Hanford.
1 minute of second-trend data from Livingston.
16 seconds of Level 1 reduced data from Hanford.
Subject: BOX-III98872 tfclu online running on H2:LSC-AS_Q Inserted
1 rows into ldas_tst database table process Inserted 14 rows into ldas_tst database
table process_params Inserted 1330 rows into ldas_tst database table sngl_burst
Inserted 1 rows into ldas_tst database table search_summary
Subject: BOX-III98856 tfclu online running on H2:LSC-AS_Q Your results: tfclusters_result.xml
can be found at: http://131.215.114.23/ldas_outgoing/jobs/BOX-III_9/BOX-III98856
Subject: BOX-III98809 tfclu online running on H2:LSC-AS_Q Inserted 1 rows into ldas_tst database
table process Inserted 14 rows into ldas_tst database table process_params
Inserted 1330 rows into ldas_tst database table sngl_burst Inserted 1 rows into ldas_tst database
table search_summary Your results: tfclusters_result.xml can be found
at: http://131.215.114.23/ldas_outgoing/jobs/BOX-III_9/BOX-III98809