How to read database entries into Matlab

Peter Shawhan
Revised February 11, 2003

The LIGOtools dataflow package includes a Matlab function called "readMeta" (implemented as a MEX-file). It reads a LIGO_LW file containing a table of database entries and places the contents of each table column into a Matlab array. If you set up your environment in the usual way (by invoking eval `<dir>/use_ligotools` in your .cshrc or .login file), your MATLABPATH environment variable will include $LIGOTOOLS/matlab, so readMeta is available with no further setup. The syntax is simple, as you can see by typing "help readMeta" within Matlab:

  readMeta --- Read LIGO metadata table into a MATLAB structure
  By Peter Shawhan (shawhan_p@ligo.caltech.edu), April 2001
  Revised by Peter Shawhan, Dec 2002, to automatically unpack spectra from BLOBs
  Revised by Peter Shawhan, Feb 2003, to be able to read a specific table
  Uses "metaio" parsing library by Philip Charlton
  
  This returns a MATLAB structure with fields which correspond to
  columns in the table.  Each numeric column is converted to a vector
  of double-precision real values, while each string or binary column
  is converted to a cell array containing individual char arrays.

  Usage:  a = readMeta( file [,table] [,row [,col]] )

   file  is a "LIGO lightweight" XML file containing a Table object.
   table (optional) specifies the name of the table to be read; this allows a
           specific table to be read from a file containing multiple tables.
           The table name is case-insensitive.
   row   is a row number (counting from 1), or a vector of row numbers
           (in ascending order), e.g. [2:5,10] gives rows 2 through 5 and
           row 10. If omitted, or equal to 0, then all rows are included.
   col   is a string containing one or more column names, separated by
           commas and/or spaces.  If omitted, all columns are included.

  Spectrum BLOBs (from the summ_spectrum table) are automatically unpacked into
  numeric arrays, assuming the mimetype is one of the standard LIGO ones.
  Note: to access the contents of a Matlab cell array, use curly braces around
  the index.  For example, "a.spectrum{1}" gives the first spectrum array.

  If readMeta(file) is called without assigning the output to a variable,
  then it simply prints the column names and types and the number of rows.
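If "help readMeta" finds nothing, the LIGOtools matlab directory is probably missing from your search path. The short check below is a minimal sketch that uses only the standard Matlab functions which and getenv. It reports where Matlab finds readMeta, or prints the current MATLABPATH so you can see what is missing.

```matlab
% Ask Matlab where readMeta lives; which() returns an empty string
% when the name is not found anywhere on the search path.
p = which('readMeta');
if isempty(p)
    % Show the environment variable that use_ligotools is expected to set.
    disp(['readMeta not found; MATLABPATH = ' getenv('MATLABPATH')]);
else
    disp(['readMeta found at ' p]);
end
```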

As noted in the documentation above, an optional argument lets you read a specific table from a file containing multiple tables. (This feature was added in version 4.10 of the dataflow package, as was the code that unpacks BLOBs into numeric arrays when they have a known mimetype.)
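The optional row and column arguments are handy when you need only part of a large file. The sketch below follows the usage line in the help text and uses the file from the sample session that follows (inspiral_events.xml). It assumes that the table argument can be omitted when the file holds a single table, as the brackets in the usage line indicate.

```matlab
% Read only the snr and search columns for rows 2-5 and 10.
% Row numbers must be ascending; column names may be separated
% by commas and/or spaces.
sub = readMeta('inspiral_events.xml', [2:5,10], 'snr,search');

% snr is a numeric column, so it arrives as a vector of doubles,
% one element per selected row (5 here: rows 2, 3, 4, 5 and 10).
nRows = numel(sub.snr);

% search is a string column, so it arrives as a cell array; curly
% braces return the char array for one row (here, file row 2).
firstSearch = sub.search{1};
```

For a file containing several tables, the table name goes before the row argument, e.g. readMeta(file,'sngl_inspiral',0,'snr'), where 0 selects all rows. The name 'sngl_inspiral' is only an assumption here; use whatever name the file's Table element actually carries (matching is case-insensitive).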

A sample Matlab session

This example uses the file inspiral_events.xml, which was created with the following command on the Unix command line:
  getMeta -d lho -q "select search,end_time,end_time_ns,mass1,mass2,amplitude,snr,chisq,event_id from sngl_inspiral where end_time > `tconvert jan 3` order by end_time,end_time_ns fetch first 1000 rows only" -o

                              < M A T L A B >
                  Copyright 1984-2000 The MathWorks, Inc.
                        Version 6.0.0.88 Release 12
                                Sep 21 2000

 
  To get started, select "MATLAB Help" from the Help menu.

>> %-- First, just scan the file to see what it contains
>> readMeta('inspiral_events.xml')
  search               LSTRING
  end_time             INT_4S
  end_time_ns          INT_4S
  mass1                REAL_4
  mass2                REAL_4
  amplitude            REAL_4
  snr                  REAL_4
  chisq                REAL_4
  event_id             ILWD:CHAR_U
Read 1000 rows

>> %-- Now read in the contents of the file
>> ins = readMeta('inspiral_events.xml')
Read 1000 rows

ins = 

         search: {1000x1 cell}
       end_time: [1000x1 double]
    end_time_ns: [1000x1 double]
          mass1: [1000x1 double]
          mass2: [1000x1 double]
      amplitude: [1000x1 double]
            snr: [1000x1 double]
          chisq: [1000x1 double]
       event_id: {1000x1 cell}
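Each event time is split across two columns, end_time (integer GPS seconds) and end_time_ns (nanoseconds). Both arrive as doubles, which hold 32-bit integers exactly, but their sum does not keep full precision. The sketch below shows one way to combine them. It uses a hypothetical stand-in structure "ev" with made-up values so that it runs on its own; with real data, use the fields of "ins" instead.

```matlab
% Hypothetical stand-in with the same two fields as 'ins' (values made up).
ev.end_time    = [725000000; 725000001];
ev.end_time_ns = [250000000; 500000000];

% Absolute GPS time in seconds.  Doubles near 7e8 are spaced about
% 1.2e-7 apart, so nanosecond detail is lost in this sum.
t = ev.end_time + 1e-9 * ev.end_time_ns;

% Subtracting a reference second first (an exact integer operation)
% keeps the result small, so nanosecond resolution survives.
t0 = ev.end_time(1);
dt = (ev.end_time - t0) + 1e-9 * ev.end_time_ns;
```

Working with times relative to t0 is the safer choice for coincidence tests or time differences between events.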

>> %-- Display a histogram of the signal-to-noise ratio
>> hist(ins.snr)


>> %-- Make a scatter plot of the two masses for each event candidate
>> plot(ins.mass1,ins.mass2,'.')


>> %-- Display a histogram of snr only for events with both masses > 6
>> hist(ins.snr(find(ins.mass1>6 & ins.mass2>6)))
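The last command selects rows with find, but Matlab also accepts the logical mask directly as an index. Keeping the mask in a variable lets you apply the same selection to every column, including cell-array columns such as search or event_id. The sketch below uses a hypothetical stand-in structure "ev" with made-up values so that it runs on its own.

```matlab
% Hypothetical stand-in with the same numeric fields as 'ins' (values made up).
ev.mass1 = [5; 7; 8; 10];
ev.mass2 = [7; 6.5; 3; 9];
ev.snr   = [8.1; 9.4; 7.7; 12.0];

% Logical mask: true where both masses exceed 6 (rows 2 and 4 here).
sel = ev.mass1 > 6 & ev.mass2 > 6;

% Index with the mask directly; no call to find() is needed.
snrSel = ev.snr(sel);
fprintf('%d of %d events have both masses > 6\n', sum(sel), numel(sel));
```

With the real data, hist(ins.snr(ins.mass1>6 & ins.mass2>6)) draws the same histogram as the command above.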