Loader Instruction Files
General Layout
An instruction JSON file contains a series of instructions: declarations of what should go into a specific Intermediate File (see IFLs, the consumers of these files, for more details).
The instruction format comes with a list of Tables. Each Table is composed of columns, where a column contains an arbitrary set of data pulled from the input file or generated by rules, based on the COLUMN_TYPE.
Above is an example of the structure of an instruction file. Here the file specifies three tables, each referencing a part of the overall data structure. As you can see, each Table has a series of Columns, where the column objects specify how to interpret the input files to generate them.
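As a rough illustration of this layout (not the original example), the sketch below builds a single-table instruction in Python and serializes it to JSON. The field names come from the sections that follow. The nesting, the "columns" key, the use of a top-level list, and every path and value shown are assumptions made for illustration, not the confirmed GOBii schema.

```python
import json


def make_column(name, column_type, r_coord=None, c_coord=None, constant_value=None):
    """Build one column object using the documented column field names."""
    column = {"name": name, "columnType": column_type}
    if r_coord is not None:
        column["rCoord"] = r_coord
    if c_coord is not None:
        column["cCoord"] = c_coord
    if constant_value is not None:
        column["constantValue"] = constant_value
    return column


def make_table(table, source, destination, file_type, delimiter, columns):
    """Build one table instruction. The flat nesting here is an assumption."""
    return {
        "source": source,
        "destination": destination,
        "delimiter": delimiter,
        "fileType": file_type,
        "table": table,
        "columns": columns,  # hypothetical key name for the column list
    }


# Hypothetical paths and column names, for illustration only.
instructions = [
    make_table(
        table="marker",
        source="/data/in/markers.csv",
        destination="/data/out",
        file_type="Generic",
        delimiter=",",
        columns=[
            make_column("marker_name", "COLUMN", r_coord=1, c_coord=0),
            make_column("platform_id", "CONSTANT", constant_value="1"),
        ],
    )
]

instruction_json = json.dumps(instructions, indent=2)
print(instruction_json)
```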
File Header
source
Source is the source file (or folder) for the table.
destination
Destination is the output destination for the intermediate file.
delimiter
Delimiter used to separate entries in the file (mostly used by the CSV loader). Examples: | , :
fileType
The input file type. Generic, HapMap and VCF are the currently supported file types. Each file type is handled by its own behind-the-scenes code and supports a different subset of column types.
Table
Each 'table' contains several elements, as well as several columns specifying which columns of data are to be pulled.
table
Name of the Intermediate File to be generated, which is also the name of the table into which the data will be loaded. It is used to determine where the data is stored.
VCF Options
Options for filtering the VCF before loading.
Dataset Id
The ID of the dataset if there is a dataset to be loaded in this request.
GOBii Crop Type
The crop that the data is being loaded into.
Contact ID
ID of the contact requesting the data load
Contact Email
Email address to send the load status to
Columns
"columnType" :
Several types are available. For Generic files these are ROW, COLUMN, CONSTANT and BOTH:
COLUMN: reads horizontally in the input file
ROW: reads vertically in the input file
CONSTANT: always a constant value
BOTH: every entry contains the rest of the row of input (helpful for the variant call part)
"rCoord" :
Row coordinate of starting point
"cCoord" :
Column coordinate of starting point
"name" :
Name of the column, ignored in subcolumns
"filterFrom" :
String to filter 'from': the value starts at the first occurrence of this string
"filterTo" :
String to filter 'to': the value ends at the first occurrence of this string
"constantValue" :
A constant value. If this column is of type CONSTANT, this is the value that appears
"subcolumn" :
True if this is a subcolumn
"subcolumnDelimiter" :
The character used to join the subcolumn to the column before it
"metaDataId" :
Currently not used
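The interaction of filterFrom, filterTo and the subcolumn fields can be sketched as below. This is a minimal Python illustration, not the loader's actual code. It assumes that the filter strings themselves are dropped from the result, that filterFrom is applied before filterTo, that a filter string which does not occur leaves the value unchanged, and that a subcolumn's value is appended to the preceding column's value. The function names apply_filters and merge_subcolumns are hypothetical.

```python
def apply_filters(value, filter_from=None, filter_to=None):
    """Trim a raw value using filterFrom / filterTo (assumed semantics)."""
    if filter_from:
        start = value.find(filter_from)
        if start != -1:
            # Assumption: keep the text after the first occurrence of filterFrom.
            value = value[start + len(filter_from):]
    if filter_to:
        end = value.find(filter_to)
        if end != -1:
            # Assumption: keep the text before the first occurrence of filterTo.
            value = value[:end]
    return value


def merge_subcolumns(columns, values):
    """Join each subcolumn's value onto the column before it.

    columns: list of column dicts; values: one extracted value per column.
    Returns a list of (name, value) pairs; subcolumn names are ignored.
    """
    merged = []
    for column, value in zip(columns, values):
        if column.get("subcolumn") and merged:
            name, previous = merged[-1]
            delimiter = column.get("subcolumnDelimiter", "")
            merged[-1] = (name, previous + delimiter + value)
        else:
            merged.append((column["name"], value))
    return merged


# Hypothetical input: a file name trimmed to its sample code.
print(apply_filters("sample_A12.dat", filter_from="_", filter_to="."))

columns = [
    {"name": "sample", "subcolumn": False},
    {"name": "unused", "subcolumn": True, "subcolumnDelimiter": "_"},
]
print(merge_subcolumns(columns, ["A", "12"]))
```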
Lifecycle
An instruction file is created either by passing a serialized POJO to the web services, or through calls to the web services directly. Because instructions are represented as objects in the GOBii-model project, they can be thought of both as .json files and as Java objects. The names and constraints are the same in either form.
The Digester looks for files in a specified directory inside the GOBii directory to do its work. The web-services have this directory encoded in their configurations. By default, it is $ROOT/crops/$crop/digest/instructions.
This folder is checked every few minutes by the cron process, which moves these jobs to an 'inprogress' folder before calling the Java process on them. This ensures that each file is processed exactly once, and makes it possible to glean some information about the status of the request.
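The move-before-processing step can be sketched as follows. This is a Python illustration of the general pattern, not the actual cron script. It assumes that instruction files end in .json and that both folders are on the same filesystem, where a rename is atomic on POSIX systems. The function name claim_instruction_files is hypothetical.

```python
import os
from pathlib import Path


def claim_instruction_files(instructions_dir, inprogress_dir):
    """Move waiting instruction files into the in-progress folder.

    Returns the new paths of the files this call managed to claim.
    """
    instructions_dir = Path(instructions_dir)
    inprogress_dir = Path(inprogress_dir)
    inprogress_dir.mkdir(parents=True, exist_ok=True)

    claimed = []
    for path in sorted(instructions_dir.glob("*.json")):
        target = inprogress_dir / path.name
        try:
            # The rename either succeeds for this caller or fails because
            # another run has already moved the file away.
            os.rename(path, target)
        except FileNotFoundError:
            continue
        claimed.append(target)
    return claimed
```

Because a file leaves the instructions folder the moment it is claimed, a second run that starts while the first is still working will not see it again.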
As each file is named from the combination of the user name and the timestamp, the file name can be treated as a unique identifier for the job.
This chart explains what is happening if you find the file at each location. (Note: /inprogress appears in more than one row of the chart.)
Location | Status | Next Step |
---|---|---|
/instructions | Waiting to be picked up | Automatic |
/inprogress | Processing | User will be informed when complete |
/inprogress | Complete | Can be removed (Careful that processing is complete) |
Not Found | Was not successfully created | Attempt process again/Seek support |
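The chart can be turned into a small lookup, sketched here in Python. The instructions path follows the default given above. The location of the 'inprogress' folder as a sibling of 'instructions' is an assumption, and the function name instruction_status is hypothetical. Since a file's location alone cannot distinguish "Processing" from "Complete", the sketch reports both possibilities together.

```python
from pathlib import Path


def instruction_status(root, crop, file_name):
    """Report a job's status based on where its instruction file is found."""
    digest = Path(root) / "crops" / crop / "digest"
    if (digest / "instructions" / file_name).exists():
        return "Waiting to be picked up"
    # Assumption: 'inprogress' sits next to 'instructions' under digest/.
    if (digest / "inprogress" / file_name).exists():
        return "Processing or complete"
    return "Not found"
```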