Genotyping data management systems
System | Group | Contact | VM Hostname | Phase |
---|---|---|---|---|
GOBii | | | | |
MontyDB | Cornell McCouch Lab | | | |
Breedbase | BTI | Ask Tetima | | |
Gigwa | CIRAD | Guilhem | | |
PHG | Cornell Buckler Lab | Ask Ed | | |
BCF | Broad Institute | | | |
GDR-BIMS (https://github.com/laceysanderson) | University of Washington | Dori? Patrick | | |
| JHI | | | |
| Cornell | Joel | | |
| Breeding Insight | | | |
VM allocations
VM Hostname | Status | Server Pool | Assignment | Username |
---|---|---|---|---|
| | cbsugobii09 | Breedbase | breedbase |
| | cbsugobii09 | GDM | gadm |
| | cbsugobii10 | Gigwa | gigwa |
| | cbsugobii10 | PHG | phg |
| | cbsugobii11 | Germinate | jhi |
| | cbsugobii11 | MontyDB | montydb |
Each VM has the following resources:
8 CPUs
64 GB RAM
2 TB SSD
/storage mounted volume
/shared_data mounted volume
Users
...
Username
...
User
...
gadm
...
system
...
yaw
...
dave
...
francisco
Datasets
Dataset | Format | Location / notes |
---|---|---|
Maize NAM | CSV | /shared_data/test_data/NAM_HM32/csv |
Simulated datasets | | |
Polyploid data | VCF | Moira to share a dataset; invite her to the next meeting |
Indel data | | |
Rice High Density Array | VCF | 700K SNPs x ~1500 samples, SNPs only. Francisco has already loaded it into his own Gigwa instance without problems. Archive: http://rs-bt-mccouch4.biotech.cornell.edu/staged_data/CSHL_EVA_Release_HDRA.tar HapMap: cbsugobiizvm19:/shared_data/test_data/genomics-systems-comparison/rice/Dataset.hmp.txt |
African rice | VCF | https://gigwa.ird.fr/gigwa/?module=AfricanRice (metadata availability?) |
3,000 rice genomes | | 29M SNPs; too large? |
Lettuce (Wageningen public dataset) | VCF | 12M markers x 500 accessions; 3 VCFs (SNPs, indels, structural variants); 40 GB; /pub/CNSA/data2/CNP0000335/Other/variation |
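To record dataset sizes (markers x samples) in a consistent way, a small helper along the following lines might be useful. This is only a sketch and not part of any of the systems listed here: the function name `vcf_dimensions` is made up for illustration, and it assumes a standard tab-delimited VCF (plain or gzip-compressed) with a `#CHROM` header line.

```python
import gzip
from typing import Tuple


def vcf_dimensions(path: str) -> Tuple[int, int]:
    """Return (n_variants, n_samples) for a plain or gzip-compressed VCF.

    Hypothetical helper for illustration; assumes a tab-delimited VCF.
    """
    opener = gzip.open if path.endswith(".gz") else open
    n_variants = 0
    n_samples = 0
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines
            if line.startswith("#CHROM"):
                # Header: 8 fixed columns, then FORMAT, then one column per sample.
                columns = line.rstrip("\n").split("\t")
                n_samples = max(len(columns) - 9, 0)
                continue
            if line.strip():
                n_variants += 1  # every remaining non-empty line is one variant record
    return n_variants, n_samples
```

The file is streamed line by line, so memory use stays small, but very large files will still take a while to scan.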
Actions:
- presentations on polyploid data
- user accounts for participants
- identify benchmarking criteria
Benchmarking suggestions
Start with a SNP dataset (VCF?) - check with Sebastian and Breedbase (Titima)
Gigwa - tens of millions of markers x thousands of samples
Loading times?
Extract times - with increasing marker and sample numbers?
Start with an overview of features so we can better understand what to benchmark
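One possible way to measure extract times over increasing marker and sample numbers is sketched below. It is a rough illustration only: `benchmark_extracts`, `time_call` and `dummy_extract` are made-up names, and `dummy_extract` is a placeholder that would have to be replaced by a real call into each system (Gigwa, GDM, Breedbase, etc.), none of which is shown here.

```python
import time
from itertools import product
from typing import Callable, Dict, List, Tuple


def time_call(fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def benchmark_extracts(
    extract: Callable[[int, int], object],
    marker_counts: List[int],
    sample_counts: List[int],
    repeats: int = 3,
) -> Dict[Tuple[int, int], float]:
    """Time extract(n_markers, n_samples) for every combination of sizes."""
    results: Dict[Tuple[int, int], float] = {}
    for n_markers, n_samples in product(marker_counts, sample_counts):
        results[(n_markers, n_samples)] = time_call(
            lambda: extract(n_markers, n_samples), repeats
        )
    return results


def dummy_extract(n_markers: int, n_samples: int) -> List[List[int]]:
    # Placeholder standing in for a real extract request (hypothetical).
    return [[0] * n_samples for _ in range(n_markers)]


if __name__ == "__main__":
    timings = benchmark_extracts(dummy_extract, [1_000, 10_000], [10, 100])
    for (n_markers, n_samples), seconds in sorted(timings.items()):
        print(f"{n_markers} markers x {n_samples} samples: {seconds:.4f} s")
```

Taking the best of several repeats reduces noise from other activity on a shared VM; loading times could be measured with the same `time_call` helper.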
Action items April 29th
All - check that you can access the site and load your database. Gigwa is still to be loaded onto the VM. Guilhem can access the site but needs a user name
Yaw - update Confluence so all participants can edit
Yaw - set up a Slack channel
Yaw - have user accounts been set up? Set up and distribute. User names are needed for the people setting up databases
Yaw - request that the VMs not be left open, for security reasons
Dave - set up training with Liz to learn how to use GDM
Yaw - make sure to invite Moira to the next meeting to discuss polyploid data
Invite Breedbase to the next meeting
Liz - put together an overview table of features that we can all align against
Schedule a demo of each system's features for a future meeting
Features of Gigwa
Basic filtering functionalities
By chromosome / sequence
By position
By variant type
Advanced filtering functionalities
By functional annotations
By genotype patterns
Using multiple groups of samples
By metadata
Visualization
Allows consulting genotypes
Graphical representation
File formats
Multiple import formats
Multiple export formats
Data peculiarities
Support for INDELs
Support for polyploids
Support for phasing information
Support for metadata
Interoperability
Standard API support (as a data-provider)
Standard API support (as a data-consumer)
Supports feeding genotypes through API
Link to external software (e.g. JBrowse)
Software availability
Distributable
Open-source
Available as Docker container
Embeddable
...
Lettuce | HapMap, Flapjack |
Potato (polyploid) | VCF |