
The meeting produced a great deal of useful feedback on GOBii and the GDM platform. This document focuses on the suggestions for improvement raised at that meeting.

CIMMYT

Overall

  • Benefits

    • Centralized, searchable institutional record of genotype data

    • Not dependent on external services

  • Challenges

    • Can’t complete genotyping analysis workflows smoothly

    • “run and maintain” mode at CIMMYT pending GOBii / EBS integration

Loader

  • Benefits

    • Flexible

    • Templates

  • Challenges

    • Loader validations are stricter than the database requirements

      • Can’t load 2 dnaruns for the same sample (GSD-255)

    • Diagnosing errors - overreliance on help desk

    • Lacking useful transformations during loading

      • Update marker names that are based on sequence positions

    • Convoluted - requires substantial training

    • No tool for SNP recalling

      • (for KASP markers, but unclear where/how this could happen)

Extractor

  • Benefits

    • Fast

    • Email notifications

    • Can aggregate some data across datasets

  • Challenges

    • Query system inflexible

      • Can’t select multiple dataset types for download

      • Can’t extract by multiple factors, e.g. the intersection of markers and samples

    • File delivery system convoluted (GSD-247)

    • “QC” stats from KDC not provided to users during download
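
To make the multi-factor extraction request concrete, here is a minimal sketch of extracting the intersection of a marker list and a sample list. It is not GOBii's extractor. It assumes, purely for illustration, a genotype matrix held in memory as a dict keyed by (sample, marker). The name `extract_intersection` and all sample and marker names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical in-memory genotype matrix: (sample, marker) -> call.
Genotypes = Dict[Tuple[str, str], str]


def extract_intersection(
    genotypes: Genotypes,
    samples: Iterable[str],
    markers: Iterable[str],
) -> Genotypes:
    """Keep only calls whose sample AND marker are both in the requested lists."""
    wanted_samples = set(samples)
    wanted_markers = set(markers)
    return {
        (s, m): call
        for (s, m), call in genotypes.items()
        if s in wanted_samples and m in wanted_markers
    }


# Made-up example data, for illustration only.
genotypes = {
    ("S1", "M1"): "A/A",
    ("S1", "M2"): "A/G",
    ("S2", "M1"): "G/G",
    ("S2", "M2"): "N/N",
}
subset = extract_intersection(genotypes, samples=["S1"], markers=["M2"])
```

Here `subset` holds only the call for sample S1 at marker M2, which is the "markers and samples" intersection described in the feedback.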

CAST

  • Benefits

    • Consensus call functionality

    • Splitting on multiple criteria

    • Some combining of data across datasets

  • Challenges

    • Cannot combine data from same samples in different datasets into one row

    • Does not facilitate the selection of marker groups to use in analyses
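
As an illustration of the first challenge, the sketch below merges rows for the same sample from different datasets into a single row. It is not CAST's behaviour. The row layout, the function name `merge_rows_by_sample`, the missing-value code `N/N`, and the conflict rule (a later dataset overwrites an earlier call unless the later call is missing) are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical record layout: each row is one sample's calls within one dataset.
Row = Dict[str, str]

MISSING = "N/N"  # assumed missing-call code, for illustration only


def merge_rows_by_sample(datasets: List[List[Row]]) -> Dict[str, Row]:
    """Merge rows sharing a sample name into one row of marker -> call.

    Assumed conflict rule: a later dataset overwrites an earlier call for the
    same marker, unless the later call is the missing value.
    """
    merged: Dict[str, Row] = defaultdict(dict)
    for rows in datasets:
        for row in rows:
            sample = row["sample"]
            for marker, call in row.items():
                if marker == "sample":
                    continue
                if call == MISSING and marker in merged[sample]:
                    continue  # keep the call already recorded for this marker
                merged[sample][marker] = call
    return dict(merged)


# Made-up example data: the same sample genotyped in two datasets.
dataset_a = [{"sample": "S1", "M1": "A/A", "M2": "A/G"}]
dataset_b = [{"sample": "S1", "M2": "N/N", "M3": "C/C"}]
combined = merge_rows_by_sample([dataset_a, dataset_b])
```

A real implementation would need an agreed rule for conflicting non-missing calls, which is where the consensus call functionality listed under Benefits would come in.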

Timescope

  • Benefits

    • Provides important functionalities to enable the “safe” deleting of datasets, markers, and samples

  • Challenges

    • Requires deployment of another tool instead of combining all CRUD functionalities in one tool

    • Separate authentication system, not linked to institutional authentication system

Data

  • Benefits

    • Large data storage for common data types

    • Flexible properties facilitate metadata storage

    • Marker groups allow the storage of different haplotypes associated with the same group of markers

  • Challenges

    • Some data dependencies have led to unplanned processes, e.g. sample linkage to a project and UUID implementation have caused CIMMYT to use one project for all datasets

    • Variants have not been implemented to facilitate analyses for the “same” marker used in different platforms, potentially with different names, over time

    • Marker groups are based on markers instead of variants and are not linked to traits, phenotypes, etc.

    • Current data structure seems to prevent the storage of data linked to each genotypic call or data point

    • VCF metadata is not preserved

    • QC values can’t be associated with data points

    • Allele frequency data cannot be stored or retrieved easily

    • Across ST and GOBii, there is no clear model for how to store “consensus” calls, or a “reference” genotype or fingerprint constructed from different samples over time

    • In an integrated system, many fields of information may be duplicated and sometimes have different “IDs”, e.g. an ID for germplasm in CB and a new ID for the same germplasm in GOBii

IRRI

Loader

  • Challenges

    • Manual mapping is tedious

      • different fields from different service providers

    • Errors with certain characters in sample files generated by B4R

    • Indels not supported unless encoded as +/-

    • Data often requires cleaning prior to upload

    • Requirement to associate data with PI
