These meetings provided a ton of useful feedback on GOBii and the GDM platform. This document focuses on the suggestions for improvement provided at the meetings on 7/28/21 and 8/4/21.

...

Category

...

Feedback

...

Benefits

  • centralized, searchable institutional record of genotype data

  • not dependent on external services

Reporter

Notes

GSD

General

Can’t complete genotyping analysis workflows smoothly

CIMMYT

“run and maintain” mode at CIMMYT pending GOBii / EBS integration

...

Loader

...

Benefits

  • Flexible

  • Templates

...

CIMMYT

Want automatic project creation from BMS → GOBii

ICRISAT

Direct upload of standard formats and automatic metadata harvesting

ICRISAT

intersection data file extractor (samples + markers)

ICRISAT

Loader

Loader validations are stricter than database requirements

Can’t load 2 dnaruns for the same sample

...

CIMMYT

CIMMYT uses same experiment / project for all datasets

https://gobiiproject.atlassian.net/browse/GSD-255

Diagnosing errors - over-reliance on help desk

CIMMYT

Lacking useful transformations during loading

Update marker names that are based on sequence positions

...

CIMMYT

Steep learning curve

CIMMYT

No tool for SNP recalling

(for KASP markers – but unclear where/how this could happen)

...

Extractor

...

Benefits

  • Fast

  • Email notifications

  • Can aggregate some data across datasets

Challenges:

...

CIMMYT

Manual mapping is tedious

different fields from different service providers

IRRI

Alleviated with web-loader, templates

Errors with certain characters when importing sample files generated by B4R

IRRI

Indels not supported unless encoded as +/-

IRRI

https://gobiiproject.atlassian.net/browse/GSD-154

Data often requires cleaning prior to upload

IRRI

Requirement to associate data with PI

IRRI

Extractor

Inflexible query system

Can’t select multiple dataset types for download

Can’t extract by multiple factors - e.g. intersect of markers and samples

CIMMYT

File delivery system convoluted

...

...

“QC” stats from KDC not provided to users during download

CIMMYT

CAST

...

Benefits

...

Challenges

...

Splitting on multiple criteria

...

Some combining of data across datasets

Cannot combine data from same samples in different datasets into one row

CIMMYT

Does not facilitate the selection of marker groups to use in analyses

CIMMYT

Timescope

...

  • Provides important functionalities to enable the “safe” deleting of datasets, markers, and samples

Challenges:

...

Benefits

  • Large data storage for common data types

  • Flexible properties facilitate metadata storage

  • Marker groups allow the storage of different haplotypes associated with the same group of markers

...

Requires deployment of another tool instead of combining all CRUD functionalities in one tool

CIMMYT

Separate authentication system, not linked to institutional authentication system

Data

CIMMYT

Data

Some data dependencies have led to unplanned processes

...

sample linkage to project and UUID implementation have caused CIMMYT to use one project for all datasets

CIMMYT

Variants have not been implemented to facilitate analyses for the “same” marker used in different platforms, potentially with different names, over time

CIMMYT

Marker groups are based on markers instead of variants

...

not linked to traits, phenotypes, etc.

CIMMYT

Current data structure seems to prevent the storage of data linked to each genotypic call or data point

...

VCF metadata is not preserved

...

QC values can’t be associated with data points

VCF data apart from GT isn’t preserved

CIMMYT

Allele frequency data cannot be stored or retrieved easily

CIMMYT

Across ST and GOBii, there is no clear model for how to store “consensus” calls or a “reference” genotype or fingerprint constructed from different samples over time

CIMMYT

In an integrated system, many fields of information may be duplicated and sometimes have different “IDs”, e.g. one ID for germplasm in CB and a new ID for germplasm in GOBii

IRRI

Loader

Challenges

...

Manual mapping is tedious

  • different fields from different service providers

...

Errors with certain characters in sample files generated by B4R

...

Indels not supported unless encoded as +/-

...

Data often requires cleaning prior to upload

CIMMYT

Attachment: 20210728_BRiN_CIMMYT_genotypic_data_overview.pdf

...