Proxy Services for Client Programs in LIGOtools

Peter Shawhan
September 23, 2002
Slightly revised June 28, 2004 to reflect new ljproxy host

Executive Summary

Discussion

Communication Barriers

LIGOtools includes several "client" programs (guild, the ldasjob Tcl library, DTT, getMeta, getFrames, getFrameData, etc.) that rely on being able to communicate with LDAS and other data services. Various networking topologies and security measures, used by the LIGO Laboratory and by various institutions and individuals, place barriers in the way of straightforward communication:

Dealing with a Firewall

The normal communication model for running an LDAS job is for the client to connect to LDAS, send the command, get back the LDAS job ID, and then close the connection. When the job finishes, LDAS connects back to the client and sends a message with information about the job output. However, the networking topologies listed above prevent LDAS from connecting directly back to the client. To handle this situation, a special proxy server called ljproxy acts as an intermediary: it handles the asynchronous communication with LDAS while keeping open the original communication channel established by the client. (The LARS data server, which handles certain getFrames data requests, uses a similar communication model and is also serviced by ljproxy.)

The main copy of ljproxy runs on a workstation at Caltech and handles all clients that connect to the Internet through a NAT router or other firewall. To use this proxy server, set the environment variable LJPROXY to "default":
  In csh/tcsh: "setenv LJPROXY default"
  In sh/bash/ksh: "LJPROXY=default; export LJPROXY"
You may wish to set this in your .login, .cshrc, or .profile file. Alternatively, within an ldasjob Tcl script, you can put "set env(LJPROXY) default" at the beginning of the script.

guild checks the LJPROXY environment variable, but you can also tell it to use the proxy server by going to the "Connect" menu and setting "Operate through firewall?" to "Yes".

Dealing with a Martian Network

Machines on martian networks cannot connect to the workstation running ljproxy at Caltech. Therefore, there are additional copies of ljproxy running on the gateway machines of the CDS martian networks at the observatories, and the 'ops' accounts have the LJPROXY environment variable set appropriately (to "portal:9802" at LHO, and to "gateway:9802" at LLO), so that users in the control rooms can run LDAS jobs anywhere.

Normally, the output from an LDAS job is retrieved from the LDAS web server using HTTP. In the case of a martian network, an HTTP proxy server must be set up on the gateway machine in order for computers on the inside to be able to access outside web servers. This is the case for the CDS networks at the LIGO observatories. LIGOtools software checks the HTTPPROXY variable, and if it is set, directs HTTP requests to that proxy server. For example, HTTPPROXY should be set to "gateway:80" on the CDS cluster at LLO. This is set in the .cshrc file of the 'ops' account, for instance.

Technical Details

If the LJPROXY environment variable contains the string "default", the software will use the main ljproxy server running on mirfak.ligo.caltech.edu at port 9802. If it has the form "<host>:<port>" (e.g. "mypc.caltech.edu:12345"), then the software will attempt to connect to an ljproxy process running at the specified host and port number.
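The rule above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the actual LIGOtools code (which is written in Tcl). The function name resolve_ljproxy is hypothetical, and returning None for an unset or empty value (meaning "connect directly") is an assumption.

```python
# Illustrative sketch only; the function name and the None return value
# for an unset variable are assumptions, not part of LIGOtools.
DEFAULT_LJPROXY = ("mirfak.ligo.caltech.edu", 9802)


def resolve_ljproxy(value):
    """Map an LJPROXY value to a (host, port) pair, or None if it is unset."""
    if not value:
        return None  # assumption: no proxy requested, connect directly
    if "default" in value:
        # The text says the variable "contains the string" default.
        return DEFAULT_LJPROXY
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("expected 'default' or '<host>:<port>', got %r" % value)
    return host, int(port)
```

For example, the value "portal:9802" used at LHO would resolve to the host "portal" and port 9802.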

An hourly cron job runs on mirfak so that, if the machine is rebooted, the ljproxy process should be restarted within an hour.

The HTTPPROXY environment variable is specific to LIGOtools. The code to check it is built into the tclshexe and wishexe shells (i.e. LIGOtools' customized versions of the standard tclsh and wish shells), rather than in individual LIGOtools utilities, so all LIGOtools programs which use the Tcl library for HTTP transfers can take advantage of proxy servers. NOTE, however, that this code is only in version 8.3.4b and later of the 'tclexe' package, whereas most people currently have version 8.3.4. (The HTTP proxy code is the only major difference between the two, so it did not seem worthwhile to make people upgrade to 8.3.4b.) I can provide the 8.3.4b version upon request.

The HTTPPROXY environment variable can have the form "<host>:<port>" or "<host>"; in the latter case, port 8080 is assumed.
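As a minimal sketch of this parsing rule, the following Python function splits an HTTPPROXY value into host and port, falling back to port 8080. The function name parse_httpproxy is hypothetical, and the error handling is an assumption; the real check is built into the tclshexe and wishexe shells.

```python
# Illustrative sketch only; parse_httpproxy is a hypothetical name and the
# error handling is an assumption.
def parse_httpproxy(value, default_port=8080):
    """Split an HTTPPROXY value of the form '<host>:<port>' or '<host>'."""
    host, sep, port = value.partition(":")
    if not host:
        raise ValueError("missing proxy host in %r" % value)
    if not sep:
        return host, default_port  # "<host>" form: port 8080 is assumed
    if not port.isdigit():
        raise ValueError("invalid proxy port in %r" % value)
    return host, int(port)
```

With the LLO setting "gateway:80" this yields the host "gateway" and port 80, while a bare "gateway" would yield port 8080.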

The HTTPPROXYBYPASS environment variable may be used to specify a set of web servers which can be reached directly, expressed as a comma-separated list of items. Each item can have one of two forms:

If HTTPPROXYBYPASS is not set, then some default rules apply: the proxy server is bypassed if the host address is in numerical form and corresponds to a private network, or if the host address consists of a single word. (This is equivalent to a value of "10.*.*.*,192.168.*.*,169.254.*.*,*", except that the additional private-network addresses 172.16.*.* through 172.31.*.* are also included.) These default rules do not apply if HTTPPROXYBYPASS is set to anything.
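The default rules can be sketched as follows, using Python's standard ipaddress module. This is an illustration under stated assumptions, not the LIGOtools implementation: the function name bypass_by_default is hypothetical, "single word" is interpreted as "contains no dot", and the function models only the case where HTTPPROXYBYPASS is not set.

```python
import ipaddress

# Private-network ranges named in the text, written in CIDR notation.
# 172.16.0.0/12 covers 172.16.*.* through 172.31.*.*.
DEFAULT_PRIVATE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "169.254.0.0/16")
]


def bypass_by_default(host):
    """Return True if the proxy would be bypassed under the default rules.

    Hypothetical helper; applies only when HTTPPROXYBYPASS is not set.
    """
    try:
        addr = ipaddress.IPv4Address(host)
    except ValueError:
        # Not a numeric address: bypass only single-word host names
        # (assumption: "single word" means the name contains no dot).
        return "." not in host
    return any(addr in net for net in DEFAULT_PRIVATE_NETWORKS)
```

Under these assumptions, "gateway" and "192.168.1.5" would be reached directly, while "www.ligo.caltech.edu" and "172.32.0.1" would go through the proxy.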