MDE User Guide

This guide walks you through using the PostgreSQL MDE, the input errors you may run into, and how to avoid them.

 

Prerequisites

  • Dataset ID

    • The database ID of the dataset you want to get metadata from. Note that the MDE is meant to be used by the digesters, which should take the dataset_id from the instruction file generated by the UI.

    • The dataset ID is an integer from a serial column, so a non-numeric value produces an error. A numeric value is then checked against the database.

  • Output Directories where the MDE will write output files. The marker, sample, and project metadata files are written to the paths provided to their corresponding flags (i.e. -m, -s, -p). These directories should be writable by the postgres user (or whichever system account the database is set to use).

  • Database Connection String, which specifies the database server, port, user, and database name as an RFC 3986 URI. It is passed as a parameter to the MDE command call.
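
The prerequisites above can be checked before the MDE is called. The following is a minimal sketch, not part of the MDE itself: the function names are hypothetical, the "positive integer" rule assumes the serial column starts at 1, and the check requires a database name even though the URI format treats it as optional. The directory check tests the account running this code, which may differ from the postgres user.

```python
import os
from urllib.parse import urlparse


def parse_dataset_id(raw):
    """Return the dataset ID as a positive int, or raise ValueError."""
    value = int(raw)  # raises ValueError for non-numeric input such as "abc"
    if value < 1:
        # Assumption: IDs from a serial column start at 1.
        raise ValueError(f"dataset ID must be a positive integer, got {value}")
    return value


def check_connection_string(conn):
    """Check the URI scheme and that a database name is present."""
    parts = urlparse(conn)
    if parts.scheme != "postgresql":
        raise ValueError(f"expected a postgresql:// URI, got scheme {parts.scheme!r}")
    dbname = parts.path.lstrip("/")
    if not dbname:
        raise ValueError("connection string does not name a database")
    return parts.hostname, parts.port, parts.username, dbname


def is_writable_dir(path):
    """True if path is an existing directory the current user can write to."""
    return os.path.isdir(path) and os.access(path, os.W_OK)
```

Running these checks first turns the most common input mistakes (a non-numeric dataset ID, a malformed URI, an unwritable directory) into immediate, readable errors.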

Metadata Extraction Guide (Minimal)

 

  1. The main entry point for the MDE is gobii_mde.py (located in $MDE_ROOT/gobii_mde/gobii_mde.py). Run "gobii_mde.py -h" to display the usage options.

    gobii_mde.py -c <connectionString> -m <markerOutputFile> -s <sampleOutputFile> -p <projectOutputFile> -d <dataset_id> -a -v
        -h = Usage help
        -c or --connectionString = Database connection string (RFC 3986 URI). Format: postgresql://[user[:password]@][netloc][:port][/dbname][?param1=value1&...]
        -m or --markerOutputFile = The marker metadata output file. This should be an absolute path.
        -s or --sampleOutputFile = The sample metadata output file. This should be an absolute path.
        -p or --projectOutputFile = The project metadata output file. This should be an absolute path.
        -d or --datasetId = The dataset ID of which marker metadata will be extracted from. This should be a valid integer ID.
        -a or --all = Get all metadata information available, regardless if they are relevant to HMP, Flapjack, etc. formats.
        -v or --verbose = Print the status of the MDE in more detail



  2. To extract the minimal metadata for marker, sample, and project, run: $> python gobii_mde.py -c <connectionString> -m <mpath> -s <spath> -p <ppath> -d <dataset_id>

For example:

python gobii_mde.py -c postgresql://appuser:appuser@cbsugobii06.tc.cornell.edu:5432/gobii_rice -m /tmp/d3_markermeta.txt -s /tmp/d3_samplemeta.txt -p /tmp/d3_projectmeta.txt -d 3

If extraction is successful, a "File created successfully" message is printed for each file written.

On the other hand, if there's an error, the error message will be printed to stderr.

 

Note that the minimal extraction is the default. Following the requirements, only the marker metadata is actually different between minimal and full extraction. Both the sample metadata and the project metadata are always extracted in full regardless of the -a or --all flag.

Metadata Extraction Guide (Full)

 

  1. The main entry point for the MDE is gobii_mde.py (located in $IFL_ROOT/gobii_mde/gobii_mde.py). Run "gobii_mde.py -h" to display the usage options.

    gobii_mde.py -c <connectionString> -m <markerOutputFile> -s <sampleOutputFile> -p <projectOutputFile> -d <dataset_id> -a -v
        -h = Usage help
        -c or --connectionString = Database connection string (RFC 3986 URI). Format: postgresql://[user[:password]@][netloc][:port][/dbname][?param1=value1&...]
        -m or --markerOutputFile = The marker metadata output file. This should be an absolute path.
        -s or --sampleOutputFile = The sample metadata output file. This should be an absolute path.
        -p or --projectOutputFile = The project metadata output file. This should be an absolute path.
        -d or --datasetId = The dataset ID of which marker metadata will be extracted from. This should be a valid integer ID.
        -a or --all = Get all metadata information available, regardless if they are relevant to HMP, Flapjack, etc. formats.
        -v or --verbose = Print the status of the MDE in more detail



  2. To do full extraction, run: $> python gobii_mde.py -c <connectionString> -m <mpath> -s <spath> -p <ppath> -d <dataset_id> -a

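For example, the minimal-extraction command for dataset 3 with -a added:

python gobii_mde.py -c postgresql://appuser:appuser@cbsugobii06.tc.cornell.edu:5432/gobii_rice -m /tmp/d3_markermeta.txt -s /tmp/d3_samplemeta.txt -p /tmp/d3_projectmeta.txt -d 3 -a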

Similar to minimal metadata extraction, you get a "File created successfully" message per file written, or an error message "Failed to extract <name_of_file>. Error=<error_message>" printed to stderr if extraction was not successful.

 

The marker metadata file doubles in size when extracted in full, as it includes all the available information for the markers in the given dataset.

Individual Metadata Extraction

The MDE can be used to extract each of the metadata files individually. Simply omit the flags for the others. For example, passing -m without -s or -p extracts only the marker metadata, not the sample and project metadata.
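
The flag combinations described so far (minimal, full, and individual extraction) can be assembled programmatically. The following is a minimal sketch: build_mde_command and its parameter names are hypothetical helpers, not part of the MDE, and the localhost connection string is a placeholder. Only the flags documented in the usage text are used.

```python
def build_mde_command(conn, dataset_id, marker_out=None, sample_out=None,
                      project_out=None, full=False, verbose=False,
                      script="gobii_mde.py"):
    """Assemble the argument list for a gobii_mde.py call."""
    cmd = ["python", script, "-c", conn]
    # Only the requested outputs get a flag; omitted ones are not extracted.
    for flag, path in (("-m", marker_out), ("-s", sample_out), ("-p", project_out)):
        if path is not None:
            cmd += [flag, path]
    cmd += ["-d", str(dataset_id)]
    if full:
        cmd.append("-a")  # full extraction; minimal is the default
    if verbose:
        cmd.append("-v")
    return cmd


# Marker metadata only, minimal extraction (placeholder connection string).
print(" ".join(build_mde_command(
    "postgresql://appuser:appuser@localhost:5432/gobii_rice", 3,
    marker_out="/tmp/d3_markermeta.txt")))
```

Returning a list rather than a single string means the result can be passed directly to subprocess.run without shell quoting concerns.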

StdErr

All error messages and stack traces are printed to stderr, so they can be redirected or captured separately from the normal status output.
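
As one way to keep the two streams apart from a calling script, the sketch below captures stdout and stderr separately with Python's subprocess module. The helper name run_and_split_streams is hypothetical, and the child process is a stand-in that writes one line to each stream so the sketch runs without a database. The MDE's exit codes are not documented here, so the sketch inspects stderr rather than relying on the return code.

```python
import subprocess
import sys


def run_and_split_streams(cmd):
    """Run cmd and return (exit_code, stdout_text, stderr_text)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


# Stand-in child process; replace it with the real gobii_mde.py argument list.
demo_cmd = [sys.executable, "-c",
            "import sys; print('status'); print('problem', file=sys.stderr)"]
code, out, err = run_and_split_streams(demo_cmd)
if err:
    print("captured on stderr:", err.strip())
```

From a POSIX shell, appending a redirection such as 2> mde_errors.log to the command (the file name is only an illustration) sends the same stream to a file instead.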

Debugging and Testing

The MDE is composed of three Python scripts: extract_marker_metadata, extract_sample_metadata, and extract_project_metadata. You can run them directly without using gobii_mde.py if you ever need to. They do exactly what their names indicate. A sample call is printed if you don't call one of them correctly.



Unlike the gobii_ifl scripts, you are unlikely to need to call these scripts individually, since you can already invoke them one at a time through their corresponding flags in gobii_mde.py.