History of Changes for LIGOtools Package: ldasjob
Version 1.0
First released version.
Version 2.0
(Peter Shawhan, May 24, 2002)
This was a major overhaul, with too many changes to list individually.
The basic operation is pretty much the same as in version 1.0. One of
the most significant changes is that the documentation has been completed! It
describes all the new features, though generally it does not indicate
what is new for version 2.0.
Major changes include:
- Fixed a number of bugs, including one which caused scripts to
hang (when the log file of the associated LDASJobH process was
deleted), one which sometimes left the LDASJobH process running after
the user script exited, and one which eventually caused Tcl to run out
of file descriptors (because files sent by %FILE(...) were never
closed).
- Made the usage of the LJerror variable more
consistent. A value of 1 means that the LDAS job failed, for whatever
reason. In general, only logic errors or intrinsic software errors
should cause Tcl error conditions now.
- Added several new elements to job info arrays.
- It is now recommended to use "package require ldasjob" instead of
"package require LDASJob", for
consistency with the LIGOtools package name, which is lowercase.
The old form still works, however.
- The behavior of the -nowait flag has changed.
Previously, it caused LJrun to return immediately, before the
job had even been submitted to LDAS. Now it waits until the job has
been submitted, and the LJerror flag reliably indicates
whether or not job submission was successful. But there may, in fact,
be fewer cases in which it is necessary to use the -nowait
flag, because...
- LJrun now has a new "-log" option which
causes a log message, including the LDAS job ID, to be printed as soon
as the job is submitted successfully. I know of at least two people
who were using the -nowait flag just to be able to print out
job information without having to wait for the job to finish; they can
now use the -log option instead. You can also replace the
default message with a message of your own (or with any Tcl command,
in fact).
- LJrun also has a new "-email" option which
causes LDAS to notify the user by email when the job is done,
instead of the notification coming back to the ldasjob
code and being accessible to the user script. In other words, this
makes LJrun behave a lot like the boilerplate job-submission
code in plain Tcl scripts, such as have been used for various MDCs.
This is relevant because...
- The protocol for clients to submit user commands to LDAS is changing
to encrypt the password for transmission over the network. When the
next version of LDAS is released, password encryption will probably
become mandatory for some or all of the LDAS systems. The
ldasjob package and other LIGOtools utilities (e.g.
guild) will handle this change transparently, but plain-Tcl
scripts will cease to work.
Version 2.1
(Peter Shawhan, November 22, 2002)
Most of the changes were incrementally deployed via web-patching
prior to the actual release of version 2.1.
- Several modifications to improve the robustness of communication
with LDAS, e.g. to automatically retry if a job is rejected due to a
transient error condition (such as the queue being full). Also fixed
a bug when using the LDAS job proxy.
- Bug fix in ldasjob.tcl: the argc variable is
now properly updated to exclude "-manager ", if that was
specified on the command line.
- Added job info array elements to indicate the number of events
inserted into the database, and the database(s) they were inserted into.
- In LDASJobH, added a check to make sure the file to
be sent to LDAS is not empty.
- A few other minor bug fixes.
Version 2.2
(Peter Shawhan, February 6, 2003)
Most of the changes were incrementally deployed via web-patching
prior to the actual release of version 2.2.
- Added code to LDASJobH to automatically use the LDAS job proxy server
when LDASJobH is running on an LLO machine and connecting to a machine
not at LLO.
- Modified the code which parses the reply message from LDAS, to more
robustly pick out the job ID and determine whether the job succeeded or
failed. (LDAS message syntax is not very standardized.)
- Updated default list of web-patch hosts to include ldas-cit.
- Added code to check the WEBPATCH_HOST environment variable.
This lets a user name a single web-patch host to use instead
of the default list (e.g. because it is local), or skip
web-patching altogether by setting the variable to "none".
- Increased EchoMyIP time from 5 sec per web-patch host to a
variable number, depending on how many hosts there are in the list.
- Increased the timeout interval for downloading the web-patch file.
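
The WEBPATCH_HOST check added in version 2.2 could be sketched in Tcl roughly as follows. This is an illustration only, not the code in LDASJobH: the proc name ChooseWebPatchHosts is hypothetical, and the sketch assumes that "none" is matched exactly (case-sensitively) and that an unset variable means "use the default list".

```tcl
# Hypothetical helper (not part of ldasjob): decide which web-patch hosts
# to try. defaultHosts is the built-in list; the return value is the list
# of hosts to use. An empty list means web-patching is skipped.
proc ChooseWebPatchHosts {defaultHosts} {
    global env
    # Variable not set: fall back to the default list
    if {![info exists env(WEBPATCH_HOST)]} {
        return $defaultHosts
    }
    # "none" disables web-patching altogether (exact match is an assumption)
    if {[string equal $env(WEBPATCH_HOST) "none"]} {
        return [list]
    }
    # Otherwise try only the host named by the user
    return [list $env(WEBPATCH_HOST)]
}
```

A caller would pass in whatever default host list the package uses and skip the web-patch step when the returned list is empty.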
Version 2.3
(Peter Shawhan, February 6, 2003)
- Always skip web-patching if running on one of the
ldas-jobs computers.
- The conversion of a shorthand manager name (e.g. "lho" or
"cit") is now hard-coded. Previously, it was derived from
the servers.txt file that was downloaded from the LDAS web
server.
Version 2.3.1
(Peter Shawhan, February 19, 2003)
- Add "gateway" to the end of the list of web-patch hosts for LDASJobH.
- Modify LDASJobH to NOT use a proxy server if the manager is on a private
network, unless explicitly told to do so via the LJPROXY environment variable.
Also, modify the URLs of job outputs if necessary to be sure to access them via
the network (private vs. public) over which we connected to the manager.
Version 2.3.2
(Peter Shawhan, April 2, 2003)
- Add psu and psudev as acceptable shorthand forms for LDAS manager.
Version 2.4
(Peter Shawhan, June 28, 2004, though many of these changes were
available earlier due to web-patching)
- Philip Charlton pointed out that '<' and '>' symbols in the
job-reply message were causing text to be lost, due to code in
LDASJobH which tried to strip out HTML tags. So this code has now
been commented out.
- Fixed a bug when parsing a very long list of LDAS result files.
- Andri Gretarsson has an LDAS username which contains a period; modified
LDASJobH to handle this when reading from the .ldaspw file.
- Modified LDASJobH to avoid prepending a slash to gridftp URLs.
- In LDASJobH, generalized the code which checks whether it is
running on an ldas-jobs machine, by making the comparison
case-insensitive and ignoring any domain which might be included in
the output from [info hostname].
- Modified LDASJobH so that if it is unable to create a log file in
~/.LDASJobH/, it prints a warning message to stderr (if it
can) and then simply continues, discarding any log messages which are
generated.
- Modified LDASJobH to use mirfak as default ljproxy server,
instead of sheratan.
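
The generalized ldas-jobs host check from version 2.4 (case-insensitive, ignoring any domain in the output of [info hostname]) might look something like the sketch below. The proc name IsLdasJobsHost and the "ldas-jobs*" pattern are assumptions made for illustration; the actual test in LDASJobH may differ.

```tcl
# Hypothetical helper (not part of ldasjob): return 1 if hostName looks
# like one of the ldas-jobs machines, 0 otherwise.
proc IsLdasJobsHost {hostName} {
    # Drop any domain part, keeping only the text before the first period
    set short [lindex [split $hostName .] 0]
    # Case-insensitive glob match on the short name (pattern is an assumption)
    return [string match -nocase "ldas-jobs*" $short]
}

# Check the machine this script is running on
puts [IsLdasJobsHost [info hostname]]
```

Stripping the domain before matching keeps a host whose domain merely contains "ldas-jobs" from being mistaken for one of those machines.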
Version 2.5
(Peter Shawhan, March 30, 2005)
- Updated LDASJobH to use persistent socket connections, unless the LJPERSISTENT environment variable is set to 'never'. Also cleaned up a few internal things in the code.
Version 2.6
(Peter Shawhan, October 28, 2005)
- In LDASJobH, rewrote the 'jobInfo' branch of the SeqLDAS function to handle
job information in a uniform way for all connection methods (transient, proxied,
persistent). Also made some MonMsg changes for debugging.
- Moved FrameCacheDump, FrameCacheQuery, and
FrameCacheRanges from the 'dataflow' package into this
package.
Version 3.0
(Peter Shawhan, June 9, 2009)
- Major update with code from Mary Lei to use Globus authentication.
- New utility getFrameData (the "prototype" version written by Peter in
November 2006).