
General Layout

An Instruction JSON file contains a series of instructions: declarations of what should go into a specific Intermediate File (see IFLs, the consumers of the files, for more details).

The instruction format is organized as a list of Tables. Each Table is composed of Columns, where each Column contains a set of data either pulled from the input file or generated by rules, depending on its column type (columnType).

Above is an example of the structure of an instruction file. Here the file specifies three tables, each referencing a part of the overall data structure. As you can see, each Table has a series of Columns, where the column objects specify how to interpret the input files to generate them.

File Header

source

Source is the source file (or folder) for the table.

destination

Destination is the output destination for the intermediate file.

delimiter

Delimiter used to separate entries in the file (mostly used by the CSV loader). Examples: | , :

fileType

The input file type. Generic, HapMap, and VCF are the currently supported file types. Each file type is handled by its own behind-the-scenes code and supports a different subset of column types.

 

Table

Each 'table' contains several elements, as well as a set of columns specifying which columns of data are to be pulled.

table

Name of the Intermediate File to be generated, also the name of the table the data will be loaded into. Used to determine where data is stored.

VCF Options 

Options for filtering the VCF before loading.

Dataset Id

The ID of the dataset if there is a dataset to be loaded in this request.

Gobii Crop Type

The crop that the data is being loaded into

Contact ID

ID of the contact requesting the data load

Contact Email

Email address to send the load status to
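To make the layout concrete, here is a minimal sketch of one table entry built as a Python dictionary and serialized to JSON. The keys source, destination, delimiter, fileType, table, and the column keys come from this page. The keys datasetId, gobiiCropType, contactId, contactEmail, and columns, the flat nesting, and every value are assumptions made for illustration, not the confirmed schema.

```python
import json


def make_table(name, columns):
    """Build one table entry. Nesting and several key names are assumed."""
    return {
        # File header fields described on this page (values are made up).
        "source": "/data/input/markers.txt",
        "destination": "/data/intermediate",
        "delimiter": "\t",
        "fileType": "Generic",
        # Table-level fields; the keys below 'table' are hypothetical spellings.
        "table": name,
        "datasetId": 12,
        "gobiiCropType": "rice",
        "contactId": 3,
        "contactEmail": "user@example.org",
        "columns": columns,
    }


# Two illustrative column objects using the column keys from this page.
marker_name = {"columnType": "ROW", "rCoord": 1, "cCoord": 0, "name": "name"}
platform = {"columnType": "CONSTANT", "constantValue": "4", "name": "platform_id"}

# An instruction file is sketched here as a list of table entries.
instructions = [make_table("marker", [marker_name, platform])]
instruction_json = json.dumps(instructions, indent=2)
```

Printing instruction_json shows the nested shape: one entry per table, each carrying its own list of column objects.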

Columns

"columnType" :

There are many column types. For Generic files the types are ROW, COLUMN, CONSTANT, and BOTH:

COLUMN: reads horizontally in the input file

ROW: reads vertically in the input file

CONSTANT: always a constant value

BOTH: every entry contains the rest of the row of input (helpful for the variant call part)

"rCoord" :

Row coordinate of starting point

"cCoord" :

Column coordinate of starting point
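The following sketch shows one way to read the column types together with rCoord and cCoord, following the descriptions on this page literally. It assumes zero-based coordinates, an input already split into a grid of cells, and that CONSTANT yields one value per input row from rCoord down. None of these details are confirmed by the page, and read_column is a hypothetical helper, not part of the loader.

```python
def read_column(grid, column):
    """Return the values a column object selects from a grid of cells."""
    kind = column["columnType"]
    r = column.get("rCoord", 0)  # assumed zero-based
    c = column.get("cCoord", 0)  # assumed zero-based
    if kind == "COLUMN":
        # Reads horizontally: the rest of row r, starting at column c.
        return grid[r][c:]
    if kind == "ROW":
        # Reads vertically: cell c of every row from r down.
        return [row[c] for row in grid[r:]]
    if kind == "CONSTANT":
        # Assumption: one copy of the constant per row from r down.
        return [column["constantValue"] for _ in grid[r:]]
    if kind == "BOTH":
        # Every entry holds the rest of its row, starting at column c.
        return [row[c:] for row in grid[r:]]
    raise ValueError(f"unsupported columnType: {kind}")


grid = [
    ["marker", "s1", "s2"],
    ["m1", "A", "C"],
    ["m2", "G", "T"],
]
marker_names = read_column(grid, {"columnType": "ROW", "rCoord": 1, "cCoord": 0})
sample_names = read_column(grid, {"columnType": "COLUMN", "rCoord": 0, "cCoord": 1})
calls = read_column(grid, {"columnType": "BOTH", "rCoord": 1, "cCoord": 1})
```

With this grid, marker_names holds the first cell of each data row, sample_names holds the header cells after the first, and calls holds the remaining cells of each data row.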

"name" :

Name of the column, ignored in subcolumns

"filterFrom" :

String to filter 'from'. Starts the string at the first occurrence of filter.

"filterTo" :

String to filter 'to'. Ends the string at the first occurrence of the filter.
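As a rough illustration of filterFrom and filterTo, the sketch below trims a value in two steps. It assumes the filter strings themselves are excluded from the result, that filterFrom is applied before filterTo, and that a filter with no match leaves the value unchanged. The page does not settle any of these points, and apply_filters is a hypothetical helper.

```python
def apply_filters(value, filter_from=None, filter_to=None):
    """Trim value using the filterFrom/filterTo rules as assumed above."""
    if filter_from:
        start = value.find(filter_from)
        if start != -1:
            # Keep only what follows the first occurrence of filter_from.
            value = value[start + len(filter_from):]
    if filter_to:
        end = value.find(filter_to)
        if end != -1:
            # Keep only what precedes the first occurrence of filter_to.
            value = value[:end]
    return value


trimmed = apply_filters("sample_A12.bam", filter_from="sample_", filter_to=".")
```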

"constantValue" :

A constant value. If this column is of type CONSTANT, this is the value that appears.

"subcolumn" :

True if this is a subcolumn

"subcolumnDelimiter" :

The character that joins the subcolumn to the column before it
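One plausible reading of subcolumn and subcolumnDelimiter is that a subcolumn's values are appended, entry by entry, to the column that precedes it. The sketch below shows that reading only. The pairing of column objects with already-read values and the merge_subcolumns helper are assumptions for illustration.

```python
def merge_subcolumns(columns):
    """Fold each subcolumn into the column before it.

    columns is a list of (column_object, values) pairs in file order.
    """
    merged = []
    for spec, values in columns:
        if spec.get("subcolumn") and merged:
            delimiter = spec.get("subcolumnDelimiter", "")
            name, previous = merged[-1]
            # Join entry by entry; the subcolumn's own name is ignored.
            joined = [p + delimiter + v for p, v in zip(previous, values)]
            merged[-1] = (name, joined)
        else:
            merged.append((spec["name"], list(values)))
    return merged


columns = [
    ({"name": "marker"}, ["m1", "m2"]),
    ({"subcolumn": True, "subcolumnDelimiter": "_"}, ["a", "b"]),
]
merged = merge_subcolumns(columns)
```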

"metaDataId" :

This is currently not used

 

Lifecycle

An instruction file is created either by passing a serialized POJO to the web services, or through calls to the web services directly. Because instructions are represented as objects in the gobii-model project, they can be thought of both as .json files and as Java objects; the names and constraints are the same in either form.

 

The Digester looks for files in a specified directory inside the gobii directory to do its work. The web services have this directory encoded in their configuration. By default, it is $ROOT/crops/$crop/digest/instructions.

This folder is checked every few minutes by the cron process, which moves these jobs to an 'inprogress' folder before calling the Java process on them. This ensures each file is processed exactly once, and allows some information on the status of the request to be gleaned.
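The pick-up step can be sketched as a move from one folder to the other before any processing starts. This is not the actual cron script. It assumes that 'inprogress' is a sibling of the 'instructions' folder, that instruction files end in .json, and that both folders are on the same filesystem so the move is atomic. claim_instructions is a hypothetical name.

```python
import os
from pathlib import Path


def claim_instructions(digest_dir):
    """Move waiting instruction files to 'inprogress' and return their new paths."""
    instructions = Path(digest_dir) / "instructions"
    inprogress = Path(digest_dir) / "inprogress"
    inprogress.mkdir(parents=True, exist_ok=True)
    claimed = []
    for path in sorted(instructions.glob("*.json")):
        target = inprogress / path.name
        try:
            # A rename within one filesystem is atomic, so a file can only
            # be claimed by one run.
            os.replace(path, target)
        except FileNotFoundError:
            # Another run already moved this file.
            continue
        claimed.append(target)
    return claimed
```

Each path returned would then be handed to the Java process. Because a file leaves 'instructions' before it is processed, a second run of the checker does not see it again.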

 

Because each file is named from the combination of the user name and the timestamp, the file name can be considered a unique identifier for the job.
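A naming scheme of this kind might look like the sketch below. The exact format used by the web services is not given on this page, so the separator, the timestamp layout, and the .json extension are all assumptions, and instruction_file_name is a hypothetical helper.

```python
from datetime import datetime


def instruction_file_name(user_name, created_at):
    """Combine a user name and a timestamp into a file name (format assumed)."""
    return f"{user_name}_{created_at:%Y_%m_%d_%H_%M_%S}.json"


example_name = instruction_file_name("jdoe", datetime(2016, 5, 4, 13, 7, 9))
```

Names built this way are unique as long as one user does not submit two jobs within the same second.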

 

This handy chart explains what's happening if you find the file at each location. (Note: /inprogress has several lines in this chart)

Location        Status                          Next Step
/instructions   Waiting to be picked up         Automatic
/inprogress     Processing                      User will be informed when complete
/inprogress     Complete                        Can be removed (Careful that processing is complete)
Not Found       Was not successfully created    Attempt process again/Seek support
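The chart can also be read as a simple lookup by file location. The sketch below assumes the 'instructions' and 'inprogress' folders sit side by side under the digest directory, and locate_job is a hypothetical helper. Location alone cannot tell a job that is still processing from one that is complete, so both are reported together.

```python
from pathlib import Path


def locate_job(digest_dir, file_name):
    """Return (location, status) for an instruction file, following the chart."""
    base = Path(digest_dir)
    if (base / "instructions" / file_name).exists():
        return ("/instructions", "Waiting to be picked up")
    if (base / "inprogress" / file_name).exists():
        # The folder does not distinguish these two states.
        return ("/inprogress", "Processing or Complete")
    return ("Not Found", "Was not successfully created")
```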

