Data Flow Manager Communication Protocol
Peter Shawhan
August 21, 2001


Overview
--------

Communication with the data flow manager (dfm) is based on a simple,
line-oriented syntax. For input, there are two kinds of lines:

  1. One or more parameter/value pairs, separated by spaces
     (e.g. "param1 value1 param2 value2"). If a value contains embedded
     spaces, it should be enclosed in curly braces. (Any value can be
     enclosed in braces, which are not part of the value itself.)
     Parameter names are case-sensitive, and are lowercase by convention.
     Once a parameter is set, it remains set for the duration of the
     "session" (client connection) or until it is set to a new value.

  2. A directive, appearing by itself on a line.

A job request consists of several parameter/value pairs followed by the
"go" directive. The parameters retain their values even after the "go",
so another job can be submitted by modifying only those parameters which
need to be modified, and then sending another "go".

The meaning of output lines depends on the value of the "output"
parameter specified by the user (see below).


Parameters required for all job requests
----------------------------------------

  udn -- The "Universal Dataset Name". This is the logical name of an
      information service from which some information is being requested.
      The list of UDNs is kept in a web file,
      http://www.ldas-sw.ligo.caltech.edu/ligotools/dataflow/dfm/servers.txt
      (also mirrored to other LDAS web servers), which is read when the
      dfm starts up. (In principle, there can be multiple servers for a
      given UDN; the current behavior of the dfm is to ask all matching
      servers to validate the request, and from among those which respond
      that the request is valid, it chooses the one listed earliest in the
      servers.txt file.) The dfm knows how to handle several server types,
      and each one expects a different group of parameters to be set to
      make a meaningful information request. These parameters are listed
      below.

      As an alternative to the UDN look-up mechanism, you can specify the
      server type and server address explicitly using the form
      "<type>:<host>:<port>". At present, the dfm knows how to handle the
      (case-sensitive) server types LDASJob, LDASMeta, LDASFrame, and LARS.
      Examples: "LDASJob:myhost:10001", "LDASJob:myhost.caltech.edu:10001".

  output -- Output protocol. This determines how the results of the job
      are to be returned to the client. The permitted values vary depending
      on the UDN; the following are typical:

        file:<filename> -- Instructs the dfm to write the job output to the
            specified file. <filename> MUST be an absolute path, not a
            relative one (since the dfm does not know the client's current
            working directory). If <filename> is actually the name of a
            directory, the dfm chooses a filename (possibly, but not
            necessarily, a descriptive one) in that directory, making sure
            not to overwrite an existing file. In either case, when the job
            is finished, the dfm sends the filename back across the socket
            to the client. If a job produces more than one output file, the
            dfm sends a space-separated list (all on one line) back to the
            client.

        stdout -- Instructs the dfm to send the job output back over the
            socket itself, then close the socket.

        url -- Instructs the dfm to send a URL, giving the location of the
            job output, back over the socket connection. In this case, it is
            up to the client program to actually retrieve the data.


Informational Parameters
------------------------

These are not necessarily required for a request to be processed, but
they facilitate bookkeeping and debugging, so they should always be
specified.

  hostname -- Node name of computer from which request is submitted.
  logname -- Unix username of requestor.
  program -- Name of client program.
  procid -- Unix process ID of client program.


Directives
----------

  go -- Start a job based on the parameters which have been set. Any
      further input will be buffered by the dfm, then handled when the
      first job finishes.
      Thus, you can send a series of job requests to the dfm all at once,
      and they will be executed sequentially.

  exit -- Causes the dfm to close the socket connection to the client
      (after all previous lines of input have been processed).

  clear -- Clears the input buffer. Unlike the other directives, which are
      buffered if necessary and processed sequentially, "clear" takes
      effect immediately.


Error reporting
---------------

If an error occurs, the dfm sends a message beginning with "Error:" back
to the client over the socket connection.


Query available UDNs
--------------------

If the "udn" parameter contains any wildcard (asterisk, question mark, or
square brackets), then it is interpreted as a globbing pattern, and the
dfm returns the list of matching UDNs.

Required Parameters

  udn -- The UDN pattern to be matched. The usual globbing rules apply:
      "*" matches zero or more characters, "?" matches exactly one
      character, and "[rqx]" matches any one of the characters in the
      brackets (r, q, and x in this case). ("*" and "?" can match a slash,
      unlike the rules for filename completion performed by the shell.)

  output -- Can be "stdout", "url", or "file:<filename>".

Optional Parameter

  udntype -- Restricts the query to UDNs of one or more types. The type of
      each UDN is listed in the servers.txt file, and is not used by the
      dfm for any purpose other than querying available UDNs (at least for
      now). Currently, the following types are in use: "sqldb" (SQL
      database), "frame", and "fmeta" (metadata about frame data, e.g. time
      intervals and channel lists). udntype can be a globbing pattern; e.g.
      "f*" matches UDNs of both the "frame" and "fmeta" types.

Data returned

  The output is a plain-text list; each line consists of a UDN followed by
  its type, separated by a single space.
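
The wildcard rules above closely resemble those of Python's
fnmatch.fnmatchcase, which is case-sensitive and also lets "*" and "?"
match a slash (it additionally accepts "[!...]" negation, which these
notes do not mention). The following is a minimal client-side sketch of
the matching and of the "<UDN> <type>" reply format, not the dfm's own
implementation. CATALOGUE is a hypothetical stand-in for servers.txt,
whose file layout is not described here.

```python
from fnmatch import fnmatchcase

# Hypothetical stand-in for the servers.txt catalogue: (UDN, type) pairs.
# The real file format and contents are not described in these notes.
CATALOGUE = [
    ("//ligo/min/lho/e1", "frame"),
    ("//ligo/min/lho/e1/channels", "fmeta"),
    ("//ligo/min/lho/e1/times", "fmeta"),
    ("//ligo/raw/lho/e1", "frame"),
    ("//ligo/sec/lho/e1", "frame"),
    ("//ligo/meta/dev", "sqldb"),
]


def has_wildcard(udn: str) -> bool:
    """True if a udn value would be treated as a globbing pattern."""
    return any(ch in udn for ch in "*?[]")


def query_udns(pattern: str, udntype: str = "*") -> list[str]:
    """Return reply lines '<UDN> <type>' for catalogue entries that match.

    fnmatchcase matches the whole string, is case-sensitive, and lets
    '*' and '?' match '/', in line with the rules described above.
    """
    return [
        f"{udn} {kind}"
        for udn, kind in CATALOGUE
        if fnmatchcase(udn, pattern) and fnmatchcase(kind, udntype)
    ]


if __name__ == "__main__":
    for line in query_udns("//ligo/m*/lho/e1*"):
        print(line)
```

With this catalogue, the pattern "//ligo/m*/lho/e1*" yields the same
three lines as the second example below, because "*" is free to consume
the "/channels" and "/times" suffixes.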

Examples

  Input:
    udn //ligo/*/lho/e1
    output stdout
    go

  Output:
    //ligo/min/lho/e1 frame
    //ligo/raw/lho/e1 frame
    //ligo/sec/lho/e1 frame

  Input:
    udn //ligo/m*/lho/e1*
    output stdout
    go

  Output:
    //ligo/min/lho/e1 frame
    //ligo/min/lho/e1/channels fmeta
    //ligo/min/lho/e1/times fmeta


Query an SQL Database
---------------------

Required Parameters

  udn -- Universal Dataset Name
  output -- Can be "stdout", "url", or "file:<filename>"
  logname -- Unix username of client process
  name -- LDAS username
  password -- LDAS password (either encrypted as in ~/.ldaspw, or
      unencrypted)
  query -- SQL query, enclosed in curly braces

Optional Parameters

  database -- The database instance name, e.g. "llo_1" or "llo_test". If
      this is not specified, the default database instance for the
      particular site is used.

Data returned

  The output from an LDAS database is a file in LIGO_LW format.

Example

  Input:
    hostname sheratan
    logname pshawhan
    program /ligoapps/ligotools/dev/bin/dfmpipe
    procid 22077
    name peter
    password blah
    udn //ligo/meta/dev
    output stdout
    query {select tabname from syscat.tables where definer='LDASDB'}
    go

  Output:
    SQL=select tabname from syscat.tables where definer='LDASDB' for read only
    result_table:table "PROCESS", "PROCESS_PARAMS", ... "DBMDCTEST"
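
A request like the one above is just a sequence of parameter/value lines
followed by "go". The following is a minimal sketch of composing that
text on the client, assuming that values containing whitespace are
wrapped in curly braces as the Overview describes. The function names are
hypothetical, bracing an empty value is an assumption the notes do not
cover, and values containing unbalanced braces are not handled.

```python
def quote_value(value: str) -> str:
    """Return a value as it should appear on a dfm input line.

    Values containing whitespace are wrapped in curly braces. Empty values
    are also braced (an assumption; the protocol notes do not cover them).
    Values containing unbalanced braces are not handled by this sketch.
    """
    if value == "" or any(ch.isspace() for ch in value):
        return "{" + value + "}"
    return value


def format_request(params: dict[str, str]) -> str:
    """Compose one job request: a 'name value' line per parameter, then 'go'."""
    lines = [f"{name} {quote_value(value)}" for name, value in params.items()]
    lines.append("go")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_request({
        "udn": "//ligo/meta/dev",
        "output": "stdout",
        "query": "select tabname from syscat.tables where definer='LDASDB'",
    }), end="")
```

Because parameters persist for the whole session, a second query on the
same connection could be sent as format_request({"query": "..."}) alone.
The resulting text would then be written to the socket connected to the
dfm.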


Insert Data into an SQL Database
--------------------------------

Required Parameters

  udn -- Universal Dataset Name
  output -- Required, but not really used, except that if it is "stdout",
      then the dfm closes the socket connection when the job finishes.
  logname -- Unix username of client process
  name -- LDAS username
  password -- LDAS password (either encrypted as in ~/.ldaspw, or
      unencrypted)
  infile -- Input data file (LIGO_LW format) to be inserted into the
      database. The filename MUST include the full absolute path.

Optional Parameters

  database -- The database instance name, e.g. "llo_1" or "llo_test". If
      this is not specified, the default database instance for the
      particular site is used.

Data returned

  If file ingestion succeeds, the dfm sends a blank line back to the
  client across the socket connection. If there is an error, then the dfm
  sends a message beginning with "Error:".


Frame data request
------------------

Required Parameters

  udn -- Universal Dataset Name
  output -- Can be "stdout", "url", or "file:<filename>"
  hostname -- Name of computer on which client is running
  logname -- Unix username of client process
  name -- LDAS username
  password -- LDAS password (either encrypted as in ~/.ldaspw, or
      unencrypted)
  times -- Time interval (in GPS seconds). Normally this consists of two
      integers separated by a dash. As a special case, a single integer
      denotes an interval one second long beginning at that time.

Optional Parameters

  channels -- A channel name, or a list of channel names (separated by
      spaces and enclosed in curly braces), to be returned. If omitted, all
      channels will be returned. If the request will be handled by LARS,
      then you can use one or more asterisks as wildcards; all matching
      channels will be included.
  compression -- The compression code to be used by FrCopy to build the
      output file. Compression code 0 means no compression. (This
      parameter is used only if the request is handled by LARS.)
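
The two accepted forms of the "times" value can be checked on the client
before a request is sent. The following is a minimal sketch that
normalizes both forms to a (start, end) pair. Treating the interval as
half-open, so that a single integer t becomes (t, t + 1), is an
assumption, since these notes do not say whether the end second of a
"start-end" interval is included. The function names and GPS values are
illustrative only.

```python
def parse_times(times: str) -> tuple[int, int]:
    """Return (start, end) in GPS seconds for a frame-request "times" value.

    "start-end" is taken literally; a single integer denotes the one-second
    interval beginning at that time, returned here as (t, t + 1). Treating
    the interval as half-open [start, end) is an assumption.
    """
    text = times.strip()
    if "-" in text:
        start_text, end_text = text.split("-", 1)
        start, end = int(start_text), int(end_text)  # ValueError if malformed
    else:
        start = int(text)
        end = start + 1
    if end < start:
        raise ValueError(f"interval ends before it starts: {times!r}")
    return start, end


def format_times(start: int, end: int) -> str:
    """Inverse of parse_times: use the single-integer form for one second."""
    return str(start) if end == start + 1 else f"{start}-{end}"


if __name__ == "__main__":
    print(parse_times("700000000-700000016"))
    print(parse_times("700000000"))
```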

Data returned

  If the "output" parameter is "url", and there is more than one output
  file, then they are listed on separate lines.


Execute an Arbitrary LDAS User Command
--------------------------------------

Required Parameters

  udn -- Universal Dataset Name
  output -- Required, but not really used, except that if it is "stdout",
      then the dfm closes the socket connection when the job finishes.
  logname -- Unix username of client process
  name -- LDAS username
  password -- LDAS password (either encrypted as in ~/.ldaspw, or
      unencrypted)
  command -- The core part of the LDAS user command

Data returned

  The output message from the LDAS user command, or an error message if
  an error occurs.

Example

  Input:
    hostname sheratan
    logname pshawhan
    program /ligoapps/ligotools/dev/bin/dfmpipe
    procid 22077
    name peter
    password blah
    udn //ligo/ldas/dev
    output stdout
    command "getMetaData -returnprotocol http://guildquery006 -returnformat LIGO_LW -sqlquery {SELECT * FROM PROCESS ORDER BY start_time FETCH FIRST 100 ROWS ONLY FOR READ ONLY}"
    go

  Output:
    Subject: NORMAL18513 xml created
    Your results: guildquery006.xml
    can be found at: http://131.215.115.248/ldas_outgoing/jobs/NORMAL18513
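
The reply conventions described in the preceding sections can be
combined into one small client-side helper: an "Error:" prefix signals
failure, "file:" output returns all filenames on one space-separated
line, and "url" output returns one URL per line. The following is a
minimal sketch, assuming the whole reply has already been read from the
socket into a string. The function name is hypothetical, and splitting on
whitespace assumes that filenames contain no spaces.

```python
def parse_reply(output_mode: str, reply: str) -> list[str]:
    """Interpret a dfm reply for the "file:..." and "url" output modes.

    Raises RuntimeError if the reply is an error message.
    """
    text = reply.strip()
    if text.startswith("Error:"):
        raise RuntimeError(text)
    if output_mode.startswith("file:"):
        # Several output files arrive as one space-separated line.
        return text.split()
    if output_mode == "url":
        # Several URLs arrive one per line.
        return [line.strip() for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported output mode for this sketch: {output_mode!r}")


if __name__ == "__main__":
    print(parse_reply("file:/tmp/out", "/tmp/out/a.xml /tmp/out/b.xml\n"))
```

A blank reply, which is what a successful data insertion returns, comes
back as an empty list in "file:" mode.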