Layered System Architecture

This architecture stack is for batch operations, where metadata and genotype data can easily grow too large for conventional data loading to handle. It differs from the "general" architecture mainly in the business layer and the data access layer.

The digester serves as the business layer. It converts the input files (raw files such as hmp and csv, plus instruction files from the presentation layer) into a format that the data access layer's intermediate file loader (IFL) understands for loading. It is also responsible for passing instructions on what to extract to the metadata extractor (MDE).

The data access layer here is split into two parts by function: the IFL batch-loads data into the different data stores, while the MDE extracts data in batches and writes it to files. You can also think of the IFLs and MDEs as including the functions that load and extract the genotype matrix to and from HDF5/MonetDB. All communication between the digesters and the data access layer is driven by cron jobs (indicated by the gear icons below).
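To make the digester-to-IFL handoff more concrete, here is a minimal sketch of one polling pass that a cron job might trigger. It is an illustration only: the JSON instruction layout, the directory arguments, the tab-delimited intermediate format, and the function names `digest` and `process_pending` are all assumptions made for this example, not the actual formats or code used by the digester or the IFL.

```python
import csv
import json
from pathlib import Path


def digest(instruction: dict, out_dir: Path) -> Path:
    """Convert one raw CSV into a tab-delimited intermediate file.

    Hypothetical instruction layout (an assumption for this sketch):
      {"input": "<raw csv path>", "table": "<target table>",
       "columns": {"<raw column>": "<target column>", ...}}
    """
    columns = instruction["columns"]
    out_path = out_dir / f"{instruction['table']}.tsv"
    with open(instruction["input"], newline="") as src, \
            open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        # Header row uses the target column names, in instruction order.
        writer.writerow(columns.values())
        # Stream row by row so large inputs are never held in memory.
        for row in reader:
            writer.writerow(row[raw] for raw in columns)
    return out_path


def process_pending(instruction_dir: Path, out_dir: Path, done_dir: Path) -> list:
    """Run one polling pass over pending instruction files."""
    out_dir.mkdir(parents=True, exist_ok=True)
    done_dir.mkdir(parents=True, exist_ok=True)
    produced = []
    for path in sorted(instruction_dir.glob("*.json")):
        instruction = json.loads(path.read_text())
        produced.append(digest(instruction, out_dir))
        # Move the instruction aside so the next pass does not redo it.
        path.rename(done_dir / path.name)
    return produced
```

In this sketch the scheduler only needs to call `process_pending` at a fixed interval. A loader playing the IFL role would then pick up the files in the output directory, and an extractor playing the MDE role could follow the same pattern in the opposite direction, reading instructions and writing extracted rows to files.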

Legend/Acronyms:

PG - PostgreSQL
IFL - Intermediate File Loader
MDE - Metadata Extractor