How to use the ldasjob package, version 2.6

Peter Shawhan
January 18, 2005

Contents

Introduction
Overview of the ldasjob Tcl library
Notes on writing scripts using the ldasjob library
Detailed information about LJrun
  Ways to specify the LDAS manager, username
  Other options: -log, -email, -nowait
Handling errors
Getting information about a job / Table of job info array elements
Retrieving output files (LJread, LJcopy, LJreaddir)
Providing an input file to an LDAS job
Executing an external program
Running a sequence of LDAS jobs
Running LDAS jobs asynchronously
Saving and restoring job information
Tcl language resources / on-line documentation for Tcl commands
Other useful packages
Under the hood: the LDASJobH "helper process"



Introduction

The LIGO Data Analysis System (LDAS) is based on a client-server model, in which clients anywhere on the Internet send "user commands" over the network to the LDAS managerAPI, which serves as the single point of contact for all LDAS operations at a given site. Submitting an LDAS job is (for now, at least) simply a matter of opening a socket connection, sending the text of the "user command", and receiving a response with the job ID assigned to the job, after which the socket is closed. When the job finishes, an email message is sent to the user at the address he/she specifies in the user command. All of this can be accomplished with a reasonably short script in a high-level language such as Tcl or perl. For example:
#!/usr/bin/env tclsh

# Set the host and port of the LDAS managerAPI
set hostName "ldas.ligo-wa.caltech.edu"
set port 10001

# Set the user command
set ldasCmd "ldasJob\
{-name myname -password blah -email me@ligo.caltech.edu}\
{getMetaData -returnprotocol http:out.xml -outputformat LIGO_LW\
-sqlquery {select tabname from syscat.tables} }"

#---- Code below here is the same for all jobs ----

# Open a socket to the LDAS managerAPI
if { [catch {socket $hostName $port} sockId] } {
    puts "Unable to connect to LDAS manager"
    exit
}

# Send the LDAS command
puts $sockId $ldasCmd
flush $sockId

# Get the response from the managerAPI and print it out
set jobInfo [read $sockId]
puts $jobInfo

# Close the socket connection and exit
close $sockId
The script above starts the LDAS job and then exits, without waiting for the LDAS job to finish. It is admirably straightforward, but there are some drawbacks: The first two drawbacks can be overcome rather easily (e.g. by reading the managerAPI address and/or LDAS username and password from a file, as has been done by a number of people in a few different ways). However, the last two are more fundamental.

To address these issues and more, the ldasjob LIGOtools package has been designed to provide a more robust, flexible, and user-friendly interface for running LDAS jobs from within scripts. Features include:



Overview of the ldasjob Tcl library

The ldasjob package provides a library of high-level Tcl procedures to execute LDAS jobs, and a simple mechanism to retrieve information about jobs that have executed. Even if you have no prior experience with Tcl, the amount of Tcl you need to know to run LDAS jobs is rather small, so that the examples in this documentation should be enough to get you going. The user interface is designed to be as simple as possible, although there is some rather sophisticated stuff going on beneath the surface. The package provides certain advanced capabilities, such as automatic transmission of input files to LDAS and the ability to run LDAS jobs in parallel, with a minimum of user code.

Before describing the ldasjob library routines in detail, it is worth considering a few examples.

A simple example

Using the ldasjob library, the example script from the Introduction can be rewritten as follows:
#!/usr/bin/env tclshexe
package require ldasjob

LJrun job1 -manager lho {
    getMetaData -returnprotocol http:out.xml -outputformat LIGO_LW
    -sqlquery {select tabname from syscat.tables}
}

puts $job1(jobInfo)
The first line of the script, #!/usr/bin/env tclshexe, has the effect of starting up the LIGOtools version of the Tcl shell interpreter to execute the rest of the script. The second line of the script, package require ldasjob, loads library code to define several new Tcl procedures. One of these is LJrun, which takes arguments to indicate the LDAS user command to be executed and the LDAS manager to which it should be sent (lho in this case, which is acceptable shorthand for the LDAS system at LIGO Hanford Observatory), and associates the job with a "tag" (job1). Note that only the "core" LDAS user command is specified; the library reads the user's LDAS username and password from the file ~/.ldaspw, where they were previously stored using the ldaspw utility. There also is no email address in the script; the LJrun command simply "blocks" while the LDAS job runs, and returns when the LDAS job finishes, without any email being sent to the user. Finally, this example script executes puts $job1(jobInfo) to print out the message from LDAS containing information about the job which was submitted.

A more realistic example

The following script takes advantage of a few more of the features of the ldasjob library:
#!/usr/bin/env tclshexe
package require ldasjob

set table "syscat.tables"

LJrun job1 {
    getMetaData
    -returnprotocol http:out.xml
    -outputformat LIGO_LW      # This is "LIGO lightweight" format,
                               # which is based on XML
   # -sqlquery {select tabname,tabschema from $table}
    -sqlquery {select tabname from $table}
}
if $LJerror {
    puts "LDAS job failed!  Error message is:"
    puts $job1(error)
    exit 1
}

puts "LDAS job $job1(jobid) succeeded.  Reply message from LDAS:"
puts $job1(jobReply)
The most significant new feature in this example script is the test to see whether the LDAS job succeeded or failed, by checking the value of the LJerror variable after executing the job. If the job succeeded, then the "reply" message from LDAS (i.e. the message normally sent by email, stating that the job has finished and giving the location of the output, if any) is printed.

In this example, the LJrun command does not specify the LDAS manager to which the job should be sent; therefore, this will be determined from the environment variable LDASMANAGER, or by using the '-manager' command-line option when executing the script. This allows you to submit your job to one LDAS installation or another without modifying your script.

This example also shows that the LDAS user command, enclosed in curly braces, can contain extra spaces and newlines, as well as comments beginning with the "#" character. These comments are removed, and all newlines are converted to ordinary spaces, before the user command is actually sent to LDAS.

Finally, note that Tcl variable substitution (in this case, for '$table'), as well as command substitution (inside square brackets), is performed on the LDAS user command before it is submitted.
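The clean-up steps described above can be imitated in a few lines of plain Tcl. The following is only a sketch under stated assumptions, not the library's actual code: the procedure name normalizeCommand is hypothetical, and the order of the steps (comments first, then substitution, then whitespace) is an assumption.

```tcl
# Hypothetical helper, NOT part of ldasjob: imitates the clean-up that the
# documentation describes for an LDAS user command.
proc normalizeCommand {cmd} {
    # 1. Drop everything from a "#" to the end of its line.
    regsub -all -line {#.*$} $cmd {} cmd
    # 2. Substitute $variables and [commands] in the caller's scope,
    #    leaving backslashes untouched.
    set cmd [uplevel 1 [list subst -nobackslashes $cmd]]
    # 3. Turn newlines and runs of blanks into single spaces.
    regsub -all {\s+} $cmd { } cmd
    return [string trim $cmd]
}

set table "syscat.tables"
set result [normalizeCommand {
    getMetaData
    -outputformat LIGO_LW      # "LIGO lightweight" format
    -sqlquery {select tabname from $table}
}]
puts $result
```

Note that subst ignores curly braces, which is why $table is replaced even though it sits inside the braces of the SQL query.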



Notes on writing scripts using the ldasjob library

A script which uses the ldasjob library should normally begin with the line:
#!/usr/bin/env tclshexe
This takes advantage of the standard unix env utility to start the tclshexe shell without having to specify its location, as long as it is located somewhere in the user's PATH. Thus, the script may be copied from one unix cluster to another and run without modification, even if tclshexe is installed in a different place. (Of course, this trick requires the env utility to be in /usr/bin, which is the case for all unix systems that I am familiar with.)

You should definitely use the tclshexe shell, which is part of LIGOtools, rather than the ordinary tclsh shell which might be installed on your computer. tclshexe is a fully functional version of the Tcl shell which has been specially compiled to automatically check for Tcl libraries (such as the ldasjob library) in $LIGOTOOLS/lib. In the future, it might also contain LIGO-specific extensions to the core Tcl language, for instance to implement encrypted communications for submitting LDAS jobs.

The ldasjob library may be loaded in either of two ways, differing only in capitalization:
package require ldasjob
or
package require LDASJob

When the ldasjob package is loaded, it does a few things with command-line arguments (if any) and environment variables for the convenience of the script author:



Detailed information about LJrun

Syntax summary

LJrun <job_tag> [<options>] <LDAS user command>
Note that the job tag must be the first argument to LJrun, while the LDAS user command must be the last argument. The available options are:
  -manager <LDAS manager>
  -user <LDAS username>
  -log
  -log <command>
  -email <email address>
  -email
  -email <host>:<port>
  -nowait
If the LDAS job finishes successfully (or if LJrun was called with the -nowait option and the job was submitted successfully), then LJrun sets LJerror=0 and returns the LDAS job ID. If the LDAS job fails, then LJrun sets LJerror=1 and returns the error message. If a software error occurs, then LJrun sets LJerror=1, returns the error message, and generates a Tcl error condition which causes the user script to terminate (unless caught and handled).

The job tag

LJrun associates the LDAS job with a "tag" that you specify. A global variable of the same name is created, which you use to access information about the job, as described elsewhere in this documentation. (If you call LJrun from within a procedure, the job tag global variable is brought into scope in that procedure and all of its parent procedures.) The tag can be almost any string, as long as it is not the name of another global variable. Each job that you submit using LJrun must have a distinct tag, although it is possible to "delete" a job (using LJdelete <job_tag>) and then re-use its tag.

The LDAS user command

Usage information for LDAS user commands is available on the web by following a link on the LDAS home page. As mentioned earlier, only the "core" part of the user command should be passed to LJrun (not the initial "ldasJob" or the username/password/email information that appears in the LDAS documentation). The user command should be enclosed in curly braces. It may contain extra spaces and newlines as desired for readability. In addition, any text following a # character, up to the end of the line, is removed before the command is sent to LDAS; this may be used to annotate the user command or to comment out parts of it.

Tcl variable substitutions (indicated by dollar signs) and command substitutions (indicated by square brackets) are performed before the user command is sent to LDAS. These substitutions are performed in the scope of the routine which calls LJrun. For example, the following script retrieves 20 seconds of frame data for one Hanford channel:

#!/usr/bin/env tclshexe
package require ldasjob

set channel H2:LSC-AS_Q
set time1 693960025
set length 20

# Calculate the end time for the query.  Note that the LDAS convention
# is to INCLUDE a full second of data beginning at the end time
# specified by the user.  Thus, to get exactly $length seconds of
# data, we must add ($length-1) to the start time.
set time2 [expr {$time1 + ($length - 1)}]

LJrun datajob -manager lho {
    getFrameData
    -returnprotocol http:out.gwf
    -outputformat frame
    -framequery { R [string index $channel 0] {} $time1-$time2 Adc($channel) }
        # "[string index $channel 0]" returns the first letter of the
        # channel name, i.e. the detector site code
}
if $LJerror {
    puts "LDAS job failed!  Error message is:"
    puts $datajob(error)
    exit 1
}

puts "LDAS job $datajob(jobid) succeeded.  Output files are:"
puts $datajob(outputs)
Note that backslashes appearing in the user command are not treated as special characters. For instance, if your user command contains the string H2\:LSC-AS_Q::AdcData:693960025:0:Frame, the backslash will be retained when the user command is sent to LDAS. (A corollary is that you cannot suppress variable substitution by preceding the dollar sign with a backslash, nor can you suppress command substitution by preceding the square bracket with a backslash. I think this is OK, since I don't know of any situation in which the user command sent to LDAS should contain a dollar sign or square bracket; let me know if you encounter such a situation.)
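The standard Tcl subst command with its -nobackslashes switch behaves in the same way, so it can be used to preview what a string would look like after this kind of substitution. This is an illustration in plain Tcl, assuming the library's behavior matches the description above; the variable gps is introduced only for the example.

```tcl
set gps 693960025

# Braces keep the backslash and the "$" literal until subst runs.
set raw {H2\:LSC-AS_Q::AdcData:$gps:0:Frame}
set sent [subst -nobackslashes $raw]
puts $sent

# In this mode a backslash in front of "$" does not suppress substitution;
# the backslash is simply kept and the variable is still replaced.
set escaped [subst -nobackslashes {\$gps}]
puts $escaped
```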

If you prefer, you may store the LDAS user command in a Tcl variable and then just pass this variable as an argument to LJrun. Variable and command substitutions are still performed (at the time that LJrun is called) in this case. For example:

...
set ldascmd {
    getFrameData
    -returnprotocol http:out.gwf
    -outputformat frame
    -framequery { R [string index $channel 0] {} $time1-$time2 Adc($channel) }
        # "[string index $channel 0]" returns the first letter of the
        # channel name, i.e. the detector site code
}
LJrun datajob -manager lho $ldascmd
...


Ways to specify the LDAS manager

The string that you use to specify the LDAS manager takes one of three forms:

There are four mechanisms for communicating this string to LJrun. From highest to lowest precedence:

If none of these mechanisms is used, an error occurs. Note that if you call LJrun with the -manager flag, but the value after the flag is blank (e.g. an empty string), then the -manager flag will be ignored and one of the other mechanisms will be used instead.

Caveat: due to the fact that the library creates an independent "helper process" to take care of communicating with LDAS, the thing that matters is the value of the LDASMANAGER environment variable at the time your script makes its first call to LJrun. Changing LDASMANAGER after this point will have no effect on subsequent jobs. If your script needs to submit jobs to different LDAS systems, it should call LJrun with the -manager option to specify where each job should be sent.


Ways to specify the LDAS username

Normally, LJrun submits the LDAS job using the default username listed in your ~/.ldaspw file. There are two ways to specify a different username to use: you can include -user <username> in the list of arguments to LJrun, or you can set the LDASUSER environment variable to the username you want to use. In any case, the username you specify must be listed in your ~/.ldaspw file, since that is where the ldasjob library reads your (encrypted) password from. To store one or more LDAS username/password pairs in your ~/.ldaspw file, use the ldaspw utility.


The -log option

You may want to know the LDAS job ID for a job as soon as it is submitted, e.g. so that you can check on it in the LDAS log files. In version 1.0 of the ldasjob package, this was a little tricky to do, involving the use of the -nowait option to LJrun. This has been made simpler in version 2.0 and later: if you include -log in the list of arguments to LJrun, then a message with the job tag and LDAS job ID is printed to standard output as soon as (and only if) the job is successfully submitted to LDAS.

Alternatively, you can provide your own command after the -log which will be executed instead of printing the default log message. This command is executed in the scope of the routine which calls LJrun, and in principle can be anything, not just a logging command. As a convenience, the command can use "this" to refer to the job info array for the job just submitted, rather than having to use the actual job tag. For example:

  ... -log {puts "Job ID is $this(jobid)"} ...
prints the job ID of the LDAS job which has just been started, while
  ... -log "puts $fid \"Job ID is \$this(jobid)\"; flush $fid" ...
prints the same message to a file opened with file descriptor $fid. (The quoting in the latter example causes $fid to be substituted in the scope of the routine which calls LJrun, before the command is passed to LJrun.)
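The effect of the two quoting styles can be checked in plain Tcl without submitting a job. In the sketch below, stdout stands in for a file channel, the array named this is filled by hand with a made-up job ID, and eval stands in for whatever the library does when it later runs the command; all three are assumptions made for illustration.

```tcl
set fid stdout   ;# stand-in for a channel returned by "open"

# Double quotes: $fid is replaced now, while \$this is left for later.
set logCmd "puts $fid \"Job ID is \$this(jobid)\"; flush $fid"
puts "Command as passed: $logCmd"

# Stand-in for what happens later, once a job ID exists.
array set this {jobid NORMAL1234}
eval $logCmd
```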


Autonomous job execution: the -email option

Calling LJrun with the -email option causes LDAS to send an email message to the specified address when the job is finished, instead of notifying the ldasjob package. Consequently, LJrun returns as soon as the job has been submitted (since it will not know when the job finishes). Information related to job submission (e.g. the LDAS job ID under which the job is running) may be accessed via the job info array.

If you specify an email address after the -email flag, that address will be used. If you do not specify an email address after the -email flag, then the email address is taken from the LDASEMAIL environment variable; an error occurs if it is not set.

To support certain special applications, you may specify a server socket address (in the form <host>:<port>) instead of an ordinary email address. In this case, when the job is finished, LDAS will connect to this socket and transmit the message rather than sending it by email.

Asynchronous job execution: the -nowait option

Like the -email option, the -nowait option causes LJrun to return control to your script as soon as the LDAS job has been submitted, rather than waiting until it has finished. However, unlike the -email option, the ldasjob library continues to keep track of this job, and you can determine when the job has finished and get the job results (e.g. output files) within the script using the job info array. The use of this option will be described in more detail elsewhere in this documentation.


A note about subroutines and variable scope

You can define and use Tcl procedures in your user script, if you wish. The LJerror variable and job info arrays are global variables which you can access from parent routines, as in the following example:
...
proc RunIt {timerange channel} {
    #-- Delete the job tag if it already exists (if not, LJdelete just returns)
    LJdelete datajob

    LJrun datajob -manager lho {
        getFrameData
        -returnprotocol http:out.gwf
        -outputformat frame
        -framequery { R [string index $channel 0] {} $timerange Adc($channel) }
        # "[string index $channel 0]" returns the first letter of the
        # channel name, i.e. the detector site code
    }
}

RunIt 693960000-693960064 H1:LSC-AS_Q
if $LJerror {
    puts "H1 job failed!  Error message is: $datajob(error)"
    exit 1
}
set h1file $datajob(outputs)

RunIt 693960000-693960064 H2:LSC-AS_Q
if $LJerror {
    puts "H2 job failed!  Error message is: $datajob(error)"
    exit 1
}
set h2file $datajob(outputs)
...
However, there is one subtlety: LJerror and job info arrays are automatically brought into scope in parent routines, but not in other subroutines. To access them in other subroutines, you must explicitly bring them into scope with the Tcl "global" command. (It is safe to do this even if they have already been brought into scope.) This is done in the "CheckIt" routine in the following example:
...
proc RunIt {timerange channel} {
    #-- Delete the job tag if it already exists (if not, LJdelete just returns)
    LJdelete datajob

    LJrun datajob -manager lho {
        getFrameData
        -returnprotocol http:out.gwf
        -outputformat frame
        -framequery { R [string index $channel 0] {} $timerange Adc($channel) }
        # "[string index $channel 0]" returns the first letter of the
        # channel name, i.e. the detector site code
    }
}

proc CheckIt {jobname} {
    global LJerror datajob   ;#-- Needed to bring these into scope
    if $LJerror {
        puts "$jobname job failed!  Error message is: $datajob(error)"
        exit 1
    }
}

RunIt 693960000-693960064 H1:LSC-AS_Q
CheckIt H1
set h1file $datajob(outputs)

RunIt 693960000-693960064 H2:LSC-AS_Q
CheckIt H2
set h2file $datajob(outputs)
...
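The underlying Tcl scoping rule can be seen without the library. In this plain-Tcl sketch, the array demoJob and the three procedures are hypothetical stand-ins: one procedure creates a global array, and two others try to see it, with and without the "global" command.

```tcl
# Creates a global array, much as a job info array is a global.
proc makeJob {} {
    global demoJob
    array set demoJob {jobid NORMAL0000}
}

# No "global" command: demoJob is not visible inside this procedure.
proc seenWithoutGlobal {} {
    return [info exists demoJob]
}

# With "global": the same array is now in scope.
proc seenWithGlobal {} {
    global demoJob
    return [info exists demoJob]
}

makeJob
set without [seenWithoutGlobal]
set with [seenWithGlobal]
puts "Visible without global: $without; with global: $with"
```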



Handling errors

Even a flawlessly constructed script will have to deal with the possibility that the LDAS job(s) it submits will fail for some reason. The global variable LJerror is used to indicate LDAS job failures or other errors (e.g. network disruptions) which are not the fault of the software. LJerror is set after a call to LJrun, after a read operation on a job info array (except a read of the jobtag element), and after the LJwait and LJfill commands. A value of 0 indicates that the job succeeded (or at least has not failed so far, in the case of a job submitted using the -email or -nowait option), while a value of 1 indicates that the job failed. An example near the beginning of this documentation shows how LJerror can be checked to trigger an error message or other error handling.

Errors intrinsic to the software (e.g. syntax or logic errors in the user script, or internal software errors in the library code) will also cause LJerror to be set equal to 1, but more importantly, they will cause a Tcl error condition. Normally, this will cause the user script to terminate with an informative error message and stack trace. It is possible to use Tcl's catch command to ignore such an error or to handle it "gracefully", although it is generally preferable to write the code so as to avoid generating the error in the first place. For example, consider the following code from a user script:

...
LJrun job1 {
    getMetaData -returnprotocol http:out.xml -outputformat LIGO_LW
    -sqlquery {select tabname from syscat.tables}
}
puts "Output http directory is $job1(outputDir)"
...
If the LDAS job fails, then the outputDir element of the array will not be set, and a Tcl error will be generated when the script tries to access it. It would be better to modify the script to make sure that the job succeeded before trying to read that element of the array, e.g.:
...
LJrun job1 {
    getMetaData -returnprotocol http:out.xml -outputformat LIGO_LW
    -sqlquery {select tabname from syscat.tables}
}
if $LJerror {
    puts "Job failed!"
} else {
    puts "Output http directory is $job1(outputDir)"
}
...
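The Tcl error itself is ordinary array behavior and can be reproduced without LDAS. The sketch below uses a hand-filled array with made-up values as a stand-in for the job info array of a failed job; how a real job info array responds to "info exists" is not covered here and is an assumption.

```tcl
# Stand-in for a job info array after a failed job (hypothetical values).
array set job1 {jobid NORMAL1234 status error error "sample failure text"}

# Guard the read with "info exists" so that no Tcl error is raised.
if {[info exists job1(outputDir)]} {
    set msg "Output http directory is $job1(outputDir)"
} else {
    set msg "No output directory recorded for job $job1(jobid)"
}
puts $msg

# Alternatively, let the read fail and trap the Tcl error with "catch".
set failed [catch {set dir $job1(outputDir)} errText]
puts "catch returned $failed: $errText"
```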



Getting information about a job

When you use the LJrun command to submit an LDAS job, a global array is created with the same name as the job tag you specify as the first argument to LJrun. You can get information about the job by reading elements of this "job info array". The examples in this documentation have demonstrated the basic syntax for reading an element of an array: if the array name is job1 and you want to read the jobid element, then your code should reference $job1(jobid).

The complete list of array elements is shown below. Element names are case-sensitive. Note that not all elements will be set if a job fails (or if the job was submitted with the -email option), and attempting to read an array element which has not been set will cause a Tcl error condition.

Array element Description
jobtag The user-assigned job tag associated with this job
cwd The current working directory at the time the job was submitted
unixHost The hostname of the computer which submitted the job
unixUser The unix username which submitted the job
startTime A date/time string indicating when the job was submitted
startTimeS The unix system clock value (seconds since 1970) when the job was submitted
command The LDAS user command sent to the manager (after comments have been removed and substitutions have been performed)
user The LDAS username used to submit the job to LDAS
email If LJrun was called with the -email option, then this contains the actual email address sent to LDAS. Otherwise it is not set.
manager The address of the LDAS managerAPI to which the job was sent, in the form "<host>:<port>", e.g. "ldas.ligo-wa.caltech.edu:10001"
managerIP Just the Internet address of the LDAS managerAPI to which the job was sent, e.g. "ldas.ligo-wa.caltech.edu"
managerPort The LDAS managerAPI port number to which the job was sent
ljproxy If an LDAS job proxy server was used as an intermediary when submitting the job, then this array element will contain the address of that proxy server in the form "<host>:<port>". If a proxy server was not used, then this array element will not be set.
inputs A list of input files transmitted to LDAS as part of job execution. If there were no inputs, then this will be an empty list.
jobInfo The full text of the message from LDAS stating that the job is running and giving the job ID
jobid The LDAS job ID, e.g. "NORMAL1234"
jobnum The numeric part of the job ID, e.g. "1234"
LDASVersion The version number of the LDAS software running on the LDAS system to which the job was submitted
status The status of the job, which can be "submitted", "running", "done", or "error".
done Equal to 1 if the job has finished (either successfully or with an error), 0 otherwise. However, if LJrun was called with the -email option, then this element will be set to 1 as soon as the job is submitted and LJrun returns.
error An error message if the job failed, or an empty string if the job succeeded
jobReply The full text of the message sent by LDAS when the job finished
outputs A list of URLs from which the outputs from the job can be retrieved. If the job produced no outputs, then this will be an empty list.
outputDir The http directory in which the outputs are located. If the job produced no outputs, then this array element will not be set.
jobTime The total execution time of the job in seconds, as reported by LDAS
endTime A date/time string indicating when the job finished
endTimeS The unix system clock value (seconds since 1970) when the job finished
metadataTime, etc. The amount of time the job spent in the metadataAPI, in seconds. Similar array elements may include frameTime, ligolwTime, datacondTime, mpiTime. Times will be reported only for those APIs which were involved in the execution of the job.
errorAPI If the job ended with an error, this indicates which LDAS API flagged the error, e.g. "metadata" for the metadata API. If the job did not end with an error, then this array element will not be set.

Reading any info array element (except the jobtag element) causes the global LJerror variable to be set to 0 if the job succeeded (or at least has not failed so far), or to 1 if the job has failed.
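The startTimeS and endTimeS elements are plain integers, so ordinary Tcl arithmetic and the clock command can be applied to them. The sketch below uses a hand-filled array with made-up clock values in place of a real job info array.

```tcl
# Stand-in for a job info array; the two clock values are made up.
array set job1 {startTimeS 1106000000 endTimeS 1106000042}

# Wall-clock time between submission and completion, in seconds.
set elapsed [expr {$job1(endTimeS) - $job1(startTimeS)}]
puts "Job took $elapsed seconds from submission to completion"

# Convert the submission time to a readable UTC date/time string.
set started [clock format $job1(startTimeS) -format "%Y-%m-%d %H:%M:%S" -gmt 1]
puts "Job was submitted at $started UTC"
```

This elapsed time is measured by the submitting computer's clock and need not equal the jobTime element, which is the execution time reported by LDAS.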

An alternative way to check the status of a job is to use the LJstatus command, which takes a job tag as its argument. [LJstatus job1] is equivalent to $job1(status).

Technical note: in version 1.0 of the ldasjob package, array elements were copied from the helper process only when they were read by the client script, so that executing "array get ..." on a job info array would return only those elements which had already been read. This behavior was changed in version 2.0 of the ldasjob package; now, the job info array is filled as completely as possible after each call to LJrun, LJwait or LJfill. The LJfill command should not generally be needed now, although it could be useful in certain cases. (For example, passing a job tag to LJfill causes LJerror to be set to indicate whether the job succeeded or failed; this could be handy in a script which has to keep track of multiple jobs at the same time.)

LDAS always reports the location of job outputs as URLs with Internet IP addresses. If the client program actually connected to LDAS via a private network, then it may not be able to connect to the web server at the Internet IP address. In this situation, the ldasjob code modifies the URLs in the outputs and outputDir array elements, replacing the Internet IP address with the private-network address of the manager. This relies on the assumption that the web server is running on the same machine as the manager API.



Retrieving output files

Output files from LDAS jobs are written onto a disk on the LDAS system which the user may access via the LDAS web server. The ldasjob package provides a few Tcl functions for convenient access to such files. (Actually, these functions may be used for any files which are available via http, not just the outputs from LDAS jobs.)


LJread

This function retrieves the contents of a remote file into a Tcl variable, which allows your script to examine the output from a job that it has just run. It takes one argument, the URL of the remote file. The following script demonstrates its usage:
...
# This assumes that the job is known to have produced exactly one output file,
# so that the "outputs" list contains just one item
set url $job1(outputs)
set contents [LJread $url]
puts "Output file size is [string length $contents] bytes"
...


LJcopy

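This function copies the contents of a URL to a local disk file. It takes two arguments: the URL of the remote file and the local destination, which in the examples in this documentation is either a directory or a file name. LJcopy returns the name of the local file which was created. The following code demonstrates how it may be used to retrieve any number of output files from a job: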
...
puts "Job produced [llength $job1(outputs)] output file(s)"

#-- Copy all of the output files to local disk
set destDir "/home/pshawhan/outputs"
foreach url $job1(outputs) {
    set locFile [LJcopy $url $destDir]
    puts "Created file $locFile"
}
...
The example above will terminate the script if an error occurs while retrieving a file. The code inside the foreach loop may be modified as follows to handle errors more gracefully:
...
puts "Job produced [llength $job1(outputs)] output files"

#-- Copy all of the output files to local disk
set destDir "/home/pshawhan/outputs"
foreach url $job1(outputs) {
    if [catch {LJcopy $url $destDir} locFile] {
        #-- An error occurred, so "locFile" now contains the error message
        set errorMessage $locFile
        puts "Error while copying $url: $errorMessage"
    } else {
        puts "Created file $locFile"
    }
}
...


LJreaddir

The LJreaddir command reads the contents of an LDAS job's output directory (or of any other remote directory on an Apache web server such as LDAS uses). It returns a Tcl list of items in the directory. If the URL passed to it is not a directory (i.e. it is a regular file), then LJreaddir returns an empty list. The following code shows an example of how it may be used:
...
puts "Job's output directory is $job1(outputDir)"

foreach item [LJreaddir $job1(outputDir)] {
    puts "Found item $item"
    puts "Complete URL for item is $job1(outputDir)/$item"
}
Same thing, but with error handling:
...
puts "Job's output directory is $job1(outputDir)"

if [catch {LJreaddir $job1(outputDir)} itemlist] {
    puts "Output directory $job1(outputDir) does not actually exist!"
    exit 1
}

foreach item $itemlist {
    puts "Found item $item"
    puts "Complete URL for item is $job1(outputDir)/$item"
}



Providing an input file to an LDAS job

Perhaps the most "magical" feature of the ldasjob package is the ability to transparently use a file on your local disk as input to an LDAS job. At any point in an LDAS user command where LDAS will accept an input filename or URL, you may put "%FILE(<local_filename>)", and the ldasjob package takes care of the rest. For example, the following script (called "frame2ilwd") uses LDAS to extract one channel from a frame file on the user's local disk, and copies the output ilwd file back to local disk:
#!/usr/bin/env tclshexe
package require ldasjob

#-- Check whether user specified all needed command-line arguments
if { ${#argv} != 3 } {
    puts "Usage:    frame2ilwd <frame_file> <channel> <output_file>"
    puts "Example:  frame2ilwd H-657968401.F H0:PEM-LVEA_SEISX myout.ilwd"
    puts "Note: you must either set LDASMANAGER or use the '-manager' option"
    exit 1
}

#-- Run the LDAS job
LJrun job1 {
    concatFrameData
    -returnprotocol http://daq -outputformat {ilwd ascii}
    -framequery { {} {} %FILE($1) {} Adc($2) }
}
if $LJerror { puts "LDAS job error:\n$job1(error)"; exit 3 }

#-- Retrieve the output
set url [lindex $job1(outputs) 0]
set gotfile [LJcopy $url $3]
puts "Retrieved $gotfile"
This feature is provided by the "helper process" that is started when you call LJrun. It replaces the %FILE(...) in the user command with an obscure URL, then acts as a (highly restricted) web server to deliver the file to LDAS when LDAS requests that URL. You can use %FILE(...) as many times as you want in any given LDAS job.



Executing an external program

The Tcl exec command allows your script to execute an external shell command or program, e.g. to operate on a file that you downloaded after running an LDAS job. exec returns the standard output from whatever it executes (unless an error occurs). For example, to count the number of lines in a file, you could do:
set info [exec wc $file]
scan $info %d%d%d lines words chars
puts "File $file contains $lines lines"

exec can execute a pipeline, so that another way to count the number of lines in a file would be:

set lines [exec wc $file | cut -c1-8 ]
puts "File $file contains $lines lines"

In general, it is good practice to use "catch" to handle any error which might occur while executing the external program(s). Here is a more careful version of the example just above:

if [catch {exec wc $file | cut -c1-8 } lines] {
    #-- If an error occurs, exec returns whatever was written to stderr
    set errmsg $lines
    puts "Error occurred while counting lines: $errmsg"
} else {
    puts "File $file contains $lines lines"
}
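
If you need to tell apart "the program ran but returned a nonzero exit status" from "the program could not be run at all", the error code that Tcl records alongside the error message is helpful. The following is a sketch rather than part of the ldasjob package: the procedure name runExternal is hypothetical, and it assumes Tcl 8.5 or later (for the options dictionary returned by catch and the {*} syntax) and a Unix-like system with an echo program.

```tcl
# Hypothetical helper, not part of ldasjob.
# Runs an external command and returns a two-element list {status output}:
#   status = 0    command succeeded
#   status = N    command ran and exited with nonzero status N
#   status = -1   any other failure (could not start, killed by a signal,
#                 or wrote to stderr while exiting with status 0)
proc runExternal {args} {
    if {![catch {exec {*}$args} output options]} {
        return [list 0 $output]
    }
    set code [dict get $options -errorcode]
    if {[lindex $code 0] eq "CHILDSTATUS"} {
        # Format is: CHILDSTATUS <pid> <exit status>
        return [list [lindex $code 2] $output]
    }
    return [list -1 $output]
}

lassign [runExternal echo hello] status text
puts "status=$status output=$text"
```

Note that exec treats anything written to stderr as an error even when the exit status is zero, which is why that case ends up with status -1 here.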



Running a sequence of LDAS jobs

The ldasjob package allows you to run any number of LDAS jobs in a single script. For instance, you might run one job, then use some information about that job (e.g. the LDAS job ID) to run a second job, as in the following example:
#!/usr/bin/env tclshexe
package require ldasjob

#-- Run a gravitational-wave burst search
LJrun search {
    dataPipeline
    ...
}
if $LJerror {puts "LDAS error from dataPipeline job:\n$search(error)"; exit 3}
#-- Copy LDAS job ID into a scalar variable for convenience
set searchjob $search(jobid)

#-- Do a database query to retrieve all the event candidates from the
#-- search job to a local file.  I figured out what SQL query to use by
#-- building a query like this with guild, then basically cutting and pasting.
LJrun getmeta {
    getMetaData -returnprotocol http://out.xml -outputformat LIGO_LW
    -sqlquery {
        SELECT * FROM SNGL_BURST
        WHERE ((process_id,creator_db) in
               (select distinct process_id,creator_db from process
                where (jobid = $searchjob)))
        ORDER BY start_time, start_time_ns
    }
}
if $LJerror {puts "LDAS error from getMetaData job:\n$getmeta(error)"; exit 3}

#-- Retrieve the output from the getMetaData job to a local file
set file [LJcopy $getmeta(outputs) ${searchjob}_events.xml]

#-- Count the number of events in the file using the 'lwtscan' utility
set report [exec lwtscan $file]
#-- The number of rows appears at the end of the report from lwtscan
regexp {(\d+) rows$} $report match nrows

#-- Print out the number of events found
puts "Job $searchjob found $nrows event candidates"
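
One thing to watch for in the script above: if the regexp does not match, the variable nrows is never set and the final puts fails with a Tcl error. A slightly more defensive version checks the return value of regexp. This is a sketch; the procedure name rowCount is hypothetical, and the sample report text is invented for illustration (the real lwtscan output may look different).

```tcl
# Hypothetical helper: return the number N from a report that ends in
# "N rows", or -1 if the report does not end that way.
proc rowCount {report} {
    if {[regexp {(\d+) rows\s*$} $report -> nrows]} {
        return $nrows
    }
    return -1
}

# Invented sample text, only to exercise the procedure
puts [rowCount "Table sngl_burst: 42 rows"]
puts [rowCount "no table found"]
```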
Another type of script involves a loop, with basically the same job being executed each time through the loop, but with slightly different parameters. In this case it is probably best to "delete" the job (using the LJdelete function) at the end of the loop, after which the job tag can be re-used. (The alternative approach of constructing a distinct job tag each time through the loop would also work, but the script would gradually consume more and more memory.) This kind of script might have the following structure:
#!/usr/bin/env tclshexe
package require ldasjob

#-- Initialize parameters, etc.
set start 693960000
set length 60       ;#-- Length of time to be analyzed by a single job
file delete loop.end   ;#-- Delete this file if it exists

#-- Loop until a file called "loop.end" appears in the current directory
#-- (a kludgy but effective way for the user to cause a graceful exit)
while { ! [file exists loop.end] } {
    #-- Construct the time range for this loop iteration
    set trange "$start-[expr {$start + $length - 1}]"

    #-- Run the LDAS job to analyze this time range
    LJrun loopjob -log {
        ...
    }

    #-- Now do some post-analysis of the job (or whatever)
    ...

    #-- Clean up at the end of the loop
    LJdelete loopjob      ;#-- Forget about this job
    incr start $length    ;#-- Increment the time range
}
puts "Exited loop because a file called loop.end appeared"
You can safely use LJdelete to "delete" a job at any time, even if you submitted the job with the -email or -nowait option and it might still be running within LDAS. LJdelete does not instruct LDAS to cancel a job that is running or queued; it simply causes the ldasjob library software to forget about that job.
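
The time-range arithmetic used in the loop can be checked on its own, without running any LDAS jobs. The sketch below generates the "start-end" strings that successive loop iterations would construct; the procedure name timeRanges is hypothetical and not part of the ldasjob package.

```tcl
# Hypothetical helper: list the "start-end" range strings for 'count'
# consecutive intervals of 'length' seconds beginning at 'start'.
# The end time is inclusive, hence the "- 1".
proc timeRanges {start length count} {
    set ranges {}
    for {set i 0} {$i < $count} {incr i} {
        set t0 [expr {$start + $i * $length}]
        lappend ranges "$t0-[expr {$t0 + $length - 1}]"
    }
    return $ranges
}

puts [timeRanges 693960000 60 3]
# prints: 693960000-693960059 693960060-693960119 693960120-693960179
```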



Running LDAS jobs asynchronously

In some cases, you may want to be able to run more than one LDAS job at the same time. For instance, you may want to query the databases at both observatory sites and then compare the results of your queries. The default behavior of LJrun is to wait until the job finishes before returning, so your script would run the getMetaData jobs serially, which is not necessary in this case. The -nowait option changes this behavior, causing LJrun to return control to your script as soon as the job is submitted, while internally (in the helper process) awaiting notification by LDAS when the job finishes.

There are two ways to get the results (success/failure status, list of output files, etc.) from a job that was submitted using the -nowait option.

First, you can call LJwait <job_tag>, which returns when the job finishes either successfully or unsuccessfully. (If the job has already finished, LJwait returns immediately.) If the LDAS job finished successfully, then LJwait sets LJerror=0 and returns the LDAS job ID. If the LDAS job failed, then LJwait sets LJerror=1 and returns the error message. If a software error occurs, then LJwait sets LJerror=1, returns the error message, and generates a Tcl error condition which causes the user script to terminate (unless caught and handled). Here is part of a script which runs jobs simultaneously at LHO and LLO, then uses LJwait to wait for them to finish:

#!/usr/bin/env tclshexe
package require ldasjob

set query {
    select * from sngl_inspiral
    where end_time between 693960000 and 693965000
    order by end_time, end_time_ns
}

#-- Submit jobs to run simultaneously, using the "-nowait" option
LJrun lhojob -manager lho -log -nowait {
    getMetaData -returnprotocol http://out.xml -outputformat LIGO_LW
    -sqlquery $query
}
LJrun llojob -manager llo -log -nowait {
    getMetaData -returnprotocol http://out.xml -outputformat LIGO_LW
    -sqlquery $query
}

#-- Wait for both jobs to finish
LJwait lhojob
if $LJerror { puts "LHO job failed: $lhojob(error)"; exit 3 }
LJwait llojob
if $LJerror { puts "LLO job failed: $llojob(error)"; exit 3 }

#-- Now retrieve the output files from each job and compare them
...
Note that the -log option still operates normally (printing the default log message, in this case, as soon as the job is successfully submitted) when -nowait is used.

The second way to get the results from a job that was submitted using the -nowait option is rather magical: simply reference the desired element of the job info array, and the software will automatically wait until that element has been assigned a value. The calls to LJwait in the example above can be removed to take advantage of this feature:

...
#-- Submit jobs to run simultaneously, using the "-nowait" option
LJrun lhojob -manager lho -log -nowait {
    getMetaData -returnprotocol http://out.xml -outputformat LIGO_LW
    -sqlquery $query
}
LJrun llojob -manager llo -log -nowait {
    getMetaData -returnprotocol http://out.xml -outputformat LIGO_LW
    -sqlquery $query
}

#-- Now retrieve the output files from each job and compare them
#-- (These calls will automatically wait until the outputs are known)
set lhofile [LJcopy $lhojob(outputs) lho_out.xml]
set llofile [LJcopy $llojob(outputs) llo_out.xml]
...
Note that reading any job info array element (except the jobtag element) causes the global LJerror variable to be set to 0 or to 1, depending on whether that job succeeded or failed. It is also possible to encounter a Tcl error condition, if you attempt to read an array element that ends up never being set because the job fails. For this reason, it is probably best to not use the magical wait-until-filled feature, but instead to use LJwait to explicitly wait for jobs to finish, and then to check the value of LJerror before attempting to do anything with the outputs from the jobs.

A further note: at present, there is no way to wait for any one out of a set of jobs to finish. This feature could be added if there is a need for it.



Saving and restoring job information

The complete information about a job can be saved in a file using one of the two forms of the LJsave command: LJsave <job_tag> or LJsave <job_tag> <file> .

From a file previously stored using LJsave, a job info array can be recreated (e.g. in a different script, or during a later invocation of the same script) using LJrestore <job_tag> or LJrestore <job_tag> <file> . As with LJsave, the filename is assumed to be <job_tag>.lji if not explicitly specified, and a default extension of .lji is added if a filename with no extension is specified.
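
The filename rule just described can be written out as a few lines of Tcl. This is a sketch of the rule as stated in the text, not the code used inside LJsave or LJrestore; the procedure name jobInfoFilename is hypothetical.

```tcl
# Hypothetical helper illustrating the default-filename rule:
#   - no filename given       -> <job_tag>.lji
#   - filename without ext.   -> <file>.lji
#   - filename with extension -> used unchanged
proc jobInfoFilename {jobTag {file ""}} {
    if {$file eq ""} {
        return "$jobTag.lji"
    }
    if {[file extension $file] eq ""} {
        return "$file.lji"
    }
    return $file
}

puts [jobInfoFilename job1]              ;# job1.lji
puts [jobInfoFilename job1 saved]        ;# saved.lji
puts [jobInfoFilename job1 saved.txt]    ;# saved.txt
```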

Caveat: the LJsave and LJrestore commands have not really been tested.



Tcl language resources

The main Tcl web site is located at www.tcl.tk . There are links to many resources, including on-line documentation for Tcl commands. There are also a number of books about Tcl; I have a well-worn copy of "Practical Programming in Tcl and Tk" by Brent Welch, which I think is excellent.

(By the way, ".tk" is the country extension for Tokelau, a small island territory in the Pacific. Check out the remarkably slick www.dot.tk web site.)



Other useful packages

A user script may contain any number of "package require ..." statements.

The tconvert package allows you to convert (both ways) between GPS seconds and UTC (or local) date/time strings within a Tcl script. It is part of the dataflow LIGOtools package. For more information, see the FAQ entitled "How can I convert between GPS time and UTC or local time?".



Under the hood: the LDASJobH "helper process"

Submitting a job to LDAS involves a handful of sequential operations, as shown in the example in the Introduction of this documentation. However, to receive notification from LDAS when the job has finished, a client must "listen" on a port for LDAS to initiate a socket connection and send the message, which requires a break from procedural programming. In order to support multiple simultaneous jobs, and to serve as a restricted web server for the %FILE() mechanism, the ldasjob package must use the Tcl event loop and fully event-driven code. On the other hand, user scripts should be as straightforward and sequential as possible.

To satisfy these two distinct requirements, the ldasjob package is divided into two parts: the ldasjob.tcl library, which contains ordinary procedural code, and a separate event-driven program called LDASJobH which serves as a "helper process", taking care of the asynchronous communication with LDAS. An LDASJobH child process is automatically launched when LJrun is called for the first time in a user script, and this process handles the communication for all calls to LJrun and other ldasjob library commands, no matter how many jobs are submitted by the user script. When the user script terminates for any reason, its associated LDASJobH process terminates too. Any number of user scripts, each with its own LDASJobH process, can run simultaneously on a machine without interfering with each other.
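
As a rough illustration of the event-driven style described above, the following self-contained script listens on a port, receives a one-line message through the Tcl event loop, and then continues. It is a sketch only and is not taken from the LDASJobH source: the procedure names, the use of the loopback address, and the message text are all assumptions, and the script plays the role of LDAS itself by connecting back to its own listening port.

```tcl
# Hypothetical sketch of "listening" for a notification; NOT LDASJobH code.

# Called by Tcl when a client connects to the listening socket
proc acceptNotice {chan addr port} {
    fconfigure $chan -blocking 0 -buffering line
    fileevent $chan readable [list readNotice $chan]
}

# Called whenever data arrives on an accepted connection
proc readNotice {chan} {
    if {[gets $chan line] >= 0} {
        set ::notice $line      ;# setting this variable ends the vwait below
        close $chan
    } elseif {[eof $chan]} {
        close $chan
    }
}

# Listen on a port chosen by the operating system (port 0 means "any free port")
set server [socket -server acceptNotice -myaddr 127.0.0.1 0]
set port [lindex [fconfigure $server -sockname] 2]

# Stand-in for LDAS: connect to the listening port and send one line
set client [socket 127.0.0.1 $port]
puts $client "job finished"
close $client

# Enter the event loop until a message has been received
vwait ::notice
close $server
puts "Received: $notice"
```

The key point is that nothing is read by calling a function and waiting for it to return; instead the script registers callbacks and hands control to the event loop with vwait. Keeping that style inside a separate helper process is what lets user scripts remain sequential.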

LDASJobH has a special feature to handle the constraint imposed by LDAS on the minimum time between job submissions: when it detects this error, it automatically resubmits the job after an appropriate time interval has elapsed. This is completely transparent from the user's point of view.

Log files

To facilitate troubleshooting, each LDASJobH process writes a log file in the directory ~/.LDASJobH/. While the process is running, the file has a name of the form LDASJobH_<start_time>_<process_id>_running.log, where <start_time> is the value of the unix system clock at the time the process was started. When the LDASJobH process shuts down, the file is renamed to the form LDASJobH_<start_time>_<process_id>_done_<end_time>.log. Note that if an LDASJobH process ends abnormally for some reason (e.g. if you kill the user script with Ctrl-C), then the log file will not be renamed, i.e. its name will continue to suggest that the process is still running. However, log files of both types ("running" and "done") are automatically cleaned up by later invocations of LDASJobH.

A consequence of the cleanup algorithm is that if you execute many user scripts within a one-hour time interval, there can be a fairly large number of log files in your ~/.LDASJobH/ directory. However, the number of log files will not grow indefinitely, so you are advised to let LDASJobH take care of cleaning them up.
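
If you want to inspect these log files from a script, the two name forms are easy to pick apart with regular expressions. The sketch below is not part of the ldasjob package: the procedure name parseLogName is hypothetical, and it assumes that the start time, process ID, and end time all appear as plain decimal integers in the file name. It uses dict, so it needs Tcl 8.5 or later.

```tcl
# Hypothetical helper: decode an LDASJobH log file name.
# Returns a dict with keys state, start, pid (and end for "done" logs),
# or an empty string if the name has neither of the two documented forms.
proc parseLogName {name} {
    if {[regexp {^LDASJobH_(\d+)_(\d+)_running\.log$} $name -> start pid]} {
        return [dict create state running start $start pid $pid]
    }
    if {[regexp {^LDASJobH_(\d+)_(\d+)_done_(\d+)\.log$} $name -> start pid end]} {
        return [dict create state done start $start pid $pid end $end]
    }
    return ""
}

# Invented file names, only to exercise the procedure
set info [parseLogName LDASJobH_1106000000_4321_done_1106000090.log]
puts "state=[dict get $info state] pid=[dict get $info pid]"
puts "state=[dict get [parseLogName LDASJobH_1106000000_4321_running.log] state]"
```

To apply this to real files, you could loop over the result of glob -nocomplain -directory <log_dir> LDASJobH_*.log and pass [file tail $path] to the procedure. Remember that a "running" name does not guarantee that the process is still alive, for the reason given above.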

For completeness, the ldasjob library includes an LJend function which instructs the LDASJobH process to immediately shut down gracefully. However, normally this is not needed, since the LDASJobH process shuts down gracefully anyway when the user script exits.