These meetings provided a great deal of useful feedback on GOBii and the GDM platform. This document focuses on the suggestions for improvement provided at the meetings on 7/28/21 and 8/4/21.
...
Category |
---|
...
Feedback |
---|
...
Benefits
centralized, searchable institutional record genotype data
not dependent on external services
Reporter | Notes | GSD | ||
---|---|---|---|---|
General | Can’t complete genotyping analysis workflows smoothly | CIMMYT | ||
“run and maintain” mode at CIMMYT pending GOBii / EBS integration |
...
Loader
...
Benefits
Flexible
Templates
...
CIMMYT | |||
Want automatic project creation from BMS → GOBii | ICRISAT | ||
Direct upload of standard formats and automatic metadata harvesting | ICRISAT | ||
intersection data file extractor (samples + markers) | ICRISAT | ||
Loader | Loader validations stricter than db req’s; can’t load 2 dnaruns for the same sample | CIMMYT | CIMMYT uses same experiment / project for all datasets | |
Diagnosing errors - over-reliance on help desk | CIMMYT | ||
Lacking useful transformations during loading: update marker names that are based on sequence positions | CIMMYT | |||
Steep learning curve | CIMMYT | ||
No tool for SNP recalling (for KASP markers – but unclear where/how this could happen) |
...
Extractor
...
Benefits
Fast
Email notifications
Can aggregate some data across datasets
Challenges:
...
CIMMYT | ||||
Manual mapping is tedious: different fields from different service providers | IRRI | Alleviated with web-loader, templates | ||
Errors with certain characters when importing sample files generated by B4R | IRRI | |||
Indels not supported unless encoded as +/- | IRRI | |||
Data often requires cleaning prior to upload | IRRI | |||
Requirement to associate data with PI | IRRI | |||
Extractor | Inflexible query system; can’t select multiple dataset types for download; can’t extract by multiple factors - e.g. intersect of markers and samples | CIMMYT | ||
File delivery system convoluted |
...
...
“QC” stats from KDC not provided to users during download | CIMMYT | ||
CAST |
...
Benefits
...
Challenges
...
Splitting on multiple criteria
...
Some combining of data across datasets
Cannot combine data from same samples in different datasets into one row | CIMMYT | ||
Does not facilitate the selection of marker groups to use in analyses | CIMMYT | ||
Timescope |
...
Provides important functionalities to enable the “safe” deleting of datasets, markers, and samples
Challenges:
...
Benefits
Large data storage for common data types
Flexible properties facilitate metadata storage
Marker groups allow the storage of different haplotypes associated with the same group of markers
...
Requires deployment of another tool instead of combining all CRUD functionalities in one tool | CIMMYT | ||
Separate authentication system, not linked to institutional authentication system |
Data
CIMMYT | ||
Data | Some data dependencies have led to unplanned processes: sample linkage to project and UUID implementation have caused CIMMYT to use one project for all datasets | CIMMYT | ||
Variants have not been implemented to facilitate analyses for the “same” marker used in different platforms, potentially with different names, over time | CIMMYT | ||
Marker groups are based on markers instead of variants; not linked to traits, phenotypes, etc. | CIMMYT | ||
Current data structure seems to prevent the storage of data linked to each genotypic call or data point: VCF metadata is not preserved; QC values can’t be associated with data points; VCF data apart from GT isn’t preserved | CIMMYT | ||
Allele frequency data cannot be stored or retrieved easily | CIMMYT | ||
Across ST and GOBii no clear model for how to store “consensus” calls or “reference” genotype or fingerprint constructed from different samples over time | CIMMYT | ||
In an integrated system, many fields of information may be duplicated and sometimes have different “IDs” e.g. ID for germplasm in CB and new ID for germplasm in GOBii |
IRRI
Loader
Challenges
...
Manual mapping is tedious
different fields from different service providers
...
Errors with certain characters in sample files generated by B4R
...
Indels not supported unless encoded as +/-
...
Data often requires cleaning prior to upload
CIMMYT |
...