Genotyping data management systems
System | Group/Institution | Contact |
---|---|---|
Germinate | JHI | Sebastian |
MontyDB | Cornell | Francisco |
GDM | Cornell | Joel |
Breedbase | BTI | Ask Titima |
Gigwa | CIRAD | Guilhem |
PHG | Cornell | Ask Ed |
BCF | Broad Institute | |
GDR-BIMS (https://github.com/laceysanderson) | Un. of Washington | Dori? Patrick |
VM allocations
Status | Server Pool | Assignment | Username | VM Hostname |
---|---|---|---|---|
OFF | cbsugobii09 | Breedbase | breedbase | |
ON | cbsugobii09 | GDM | gadm | |
ON | cbsugobii10 | Gigwa | gigwa | |
OFF | cbsugobii10 | PHG | phg | |
ON | cbsugobii11 | Germinate | jhi | |
OFF | cbsugobii11 | MontyDB | montydb | |
Each VM has the following resources:
8 CPUs
64 GB RAM
2 TB SSD
/storage mounted volume
/shared_data mounted volume
Users
Username | User |
---|---|
gadm | system |
yaw | |
dave | |
francisco | |
Datasets
Dataset | Format | Location |
---|---|---|
Maize NAM | CSV | /shared_data/test_data/NAM_HM32/csv |
Simulated datasets | | |
Polyploid data | VCF | Moira to share a dataset; invite her to the next meeting |
Indel data | | |
Rice High Density Array | VCF | 700K SNPs x ~1500 samples, SNPs only. Francisco has already loaded it into his own Gigwa instance without problems. http://rs-bt-mccouch4.biotech.cornell.edu/staged_data/CSHL_EVA_Release_HDRA.tar |
African rice | VCF | https://gigwa.ird.fr/gigwa/?module=AfricanRice ; metadata availability? |
3,000 rice genomes | | 29M SNPs; too large? |
Lettuce (Wageningen public dataset) | VCF | 12M markers x 500 accessions; 3 VCFs (one SNPs, one indels, one structural variants); 40 GB. https://www.nature.com/articles/s41588-021-00831-0 ; /pub/CNSA/data2/CNP0000335/Other/variation |
Lettuce | HapMap, Flapjack | cbsugobiizvm19: /shared_data/test_data/genomics-systems-comparison/lettuce/ ; chr1/: Lactuca__project1__2021-06-24__1152198variants__FLAPJACK.fjzip, Lactuca__project1__2021-06-24__1152198variants__HAPMAP.zip, markerlists.zip ; full/: Lactuca__project1__2021-06-28__12983735variants__FLAPJACK.fjzip, Lactuca__project1__2021-06-28__12983735variants__HAPMAP.zip |
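To help judge which datasets are practical to benchmark, the marker and sample counts quoted in the table can be turned into rough genotype-call totals. The sketch below is illustrative only: the dictionary keys are hypothetical labels, the counts are the rounded figures from the notes, and the one-byte-per-call encoding is an assumption rather than how any of the listed systems actually store data.

```python
# Rough size estimate for a dense marker x sample genotype matrix.
# Counts are the rounded figures quoted in the Datasets table.
DATASETS = {
    "rice_hdra": (700_000, 1_500),      # 700K SNPs x ~1500 samples
    "lettuce_full": (12_000_000, 500),  # 12M markers x 500 accessions
}


def genotype_calls(markers: int, samples: int) -> int:
    """Number of cells in a dense marker x sample matrix."""
    return markers * samples


def dense_size_gb(markers: int, samples: int, bytes_per_call: int = 1) -> float:
    """Uncompressed size in GB, assuming a fixed number of bytes per call.

    bytes_per_call=1 is an assumption made for illustration only.
    """
    return genotype_calls(markers, samples) * bytes_per_call / 1e9


for name, (markers, samples) in DATASETS.items():
    calls = genotype_calls(markers, samples)
    size = dense_size_gb(markers, samples)
    print(f"{name}: {calls:,} calls, about {size:.2f} GB at 1 byte per call")
```

These figures ignore compression and per-call metadata, so they are only an order-of-magnitude guide.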
Actions:
- presentations on polyploid data
- user accounts for participants
- identify benchmarking criteria
Benchmarking suggestions
Start with a SNP dataset, possibly in VCF; check with Sebastian and Breedbase (Titima)
Gigwa - tens of millions of markers x thousands of samples
Loading times?
Extract times - with increasing marker and sample numbers?
Start with an overview of features so we can better understand what to benchmark
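One way the load and extract timings suggested above could be collected is with a small harness that times a call over a grid of marker and sample counts. This is a minimal sketch, not an agreed benchmarking protocol: `time_call`, `benchmark_extracts`, and `dummy_extract` are hypothetical names, and `dummy_extract` is a placeholder that would need to be replaced by each system's real export call.

```python
import time
from itertools import product


def time_call(fn, *args, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def benchmark_extracts(extract_fn, marker_counts, sample_counts, repeats=3):
    """Time extract_fn(n_markers, n_samples) for every size combination."""
    results = []
    for n_markers, n_samples in product(marker_counts, sample_counts):
        seconds = time_call(extract_fn, n_markers, n_samples, repeats=repeats)
        results.append(
            {"markers": n_markers, "samples": n_samples, "seconds": seconds}
        )
    return results


def dummy_extract(n_markers, n_samples):
    # Placeholder standing in for a real system's extract call (hypothetical).
    return [[0] * n_samples for _ in range(n_markers)]


# Small sizes so the sketch runs quickly; real runs would scale these up.
for row in benchmark_extracts(dummy_extract, [1_000, 10_000], [10, 100]):
    print(f"{row['markers']:>7} markers x {row['samples']:>4} samples: "
          f"{row['seconds']:.4f} s")
```

Taking the best of several repeats reduces noise from other activity on a shared VM; the same `time_call` helper could wrap a load step if each system exposes one as a callable.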
Action items April 29th
All - check that you can access the site and load the database. Gigwa is still to be loaded to the VM. Guilhem can access the site but needs a username.
Yaw - update Confluence so all participants can edit
Yaw - set up Slack channel
Yaw - check whether user accounts have been set up; if not, set them up and distribute. Usernames are needed for people setting up databases.
Yaw - request VMs are not open for security
Dave - set up training with Liz to learn how to use GDM
Yaw - make sure to invite Moira to the next meeting to discuss polyploid data
Invite Breedbase to next meeting
Liz - put together a table overview of features that we can all align against
Schedule a demo of each system features for a future meeting