...

VM Hostname                       | Status | Server Pool | Assignment
cbsugobiizvm20.biohpc.cornell.edu | off    | cbsugobii09 | Germinate
cbsugobiizvm23.biohpc.cornell.edu | on     | cbsugobii09 | GDM
cbsugobiizvm19.biohpc.cornell.edu | on     | cbsugobii10 | Gigwa
cbsugobiizvm22.biohpc.cornell.edu | off    | cbsugobii10 | MontyDB / PHG
cbsugobiizvm21.biohpc.cornell.edu | on     | cbsugobii11 | Breedbase / MontyDB
                                  | off    | cbsugobii11 | PHG / Breedbase

Each VM has the following resources:

...

Datasets (dataset, format, location)

Maize NAM
  • Format: CSV
  • Location: /shared_data/test_data/NAM_HM32/csv

Simulated datasets

Polyploid data in VCF
  • Moira to share a dataset; invite her to the next meeting

Indel data

Rice High Density Array
  • Format: VCF
  • 700K SNPs x ~1,500 samples
  • SNPs only
  • VCF too
  • Francisco has already loaded it into Gigwa (own instance) with no problems
  • Location: http://rs-bt-mccouch4.biotech.cornell.edu/staged_data/CSHL_EVA_Release_HDRA.tar

African rice
  • https://gigwa.ird.fr/gigwa/?module=AfricanRice
  • Available as VCF
  • Metadata availability?

3,000 rice genomes
  • Too large? 29M SNPs

Lettuce (Wageningen)
  • Public dataset
  • Format: VCF
  • 12M markers x 500 accessions
  • 3 VCFs: one for SNPs, one for indels, one for structural variants
  • 40 GB
  • https://www.nature.com/articles/s41588-021-00831-0
  • Location: ftp.cngb.org/pub/CNSA/data2/CNP0000335/Other/variation
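To compare the candidate datasets on a common scale, it may help to turn the quoted marker and sample counts into a number of genotype calls (markers x samples). The sketch below uses only the two datasets whose dimensions are both given in the notes; the rice sample count is approximate (~1,500), and the function and variable names are illustrative.

```python
# Dimensions as quoted in the notes; the rice sample count is approximate.
datasets = {
    "Rice High Density Array": (700_000, 1_500),
    "Lettuce (Wageningen)": (12_000_000, 500),
}


def genotype_calls(markers: int, samples: int) -> int:
    """Number of cells in the marker x sample genotype matrix."""
    return markers * samples


for name, (markers, samples) in datasets.items():
    print(f"{name}: {genotype_calls(markers, samples):,} genotype calls")
```

Under these figures the lettuce dataset is roughly six times larger than the rice array in genotype calls, even though it has fewer samples.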

Actions:

  •  presentations on polyploid data
  •  user accounts for participants
  •  identify benchmarking criteria

Benchmarking suggestions

Start with a SNP dataset (VCF?); check with Sebastian and Breedbase (Titima)

Gigwa: tens of millions of markers x thousands of samples

Loading times?

Extract times - increasing marker and sample numbers?

Start with an overview of features so we can better understand what to benchmark
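One possible way to measure loading and extract times as marker and sample numbers increase is a small timing grid. This is only a sketch: `placeholder_extract` is a hypothetical stand-in and would need to be replaced by a real load or extract call against whichever system is being benchmarked, and the grid sizes are arbitrary small values chosen so the example runs quickly.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def time_call(operation: Callable[[int, int], object], markers: int, samples: int) -> float:
    """Return the wall-clock seconds taken by operation(markers, samples)."""
    start = time.perf_counter()
    operation(markers, samples)
    return time.perf_counter() - start


def benchmark_grid(
    operation: Callable[[int, int], object],
    marker_counts: Iterable[int],
    sample_counts: Iterable[int],
) -> Dict[Tuple[int, int], float]:
    """Time the operation for every (markers, samples) combination."""
    sample_counts = list(sample_counts)  # allow re-iteration for each marker count
    return {
        (m, s): time_call(operation, m, s)
        for m in marker_counts
        for s in sample_counts
    }


def placeholder_extract(markers: int, samples: int) -> int:
    """Hypothetical stand-in: replace with a real load or extract call."""
    return sum(1 for _ in range(markers * samples // 100))


timings = benchmark_grid(placeholder_extract, [1_000, 10_000], [10, 100])
for (markers, samples), seconds in sorted(timings.items()):
    print(f"{markers:>6} markers x {samples:>3} samples: {seconds:.4f} s")
```

Keeping the operation as a plain callable means the same grid could be reused for each system, so results stay comparable across tools.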

Action items April 21st

All - check that you can access the site and load the database. Gigwa is still to be loaded to the VM. Guilhem can access the site but needs a user name.

Add team to Atlassian site

Yaw - update confluence so all participants can edit

Yaw - set up slack channel

Yaw - Have user accounts been set up? Set up and distribute. Need user names for people setting up databases.

Yaw - request that the VMs are not left open, for security

Dave to set up training with Liz to learn how to use GDM

Yaw - make sure to invite Moira to the next meeting to discuss polyploid data

Invite Breedbase to next meeting

Liz - put together a table overview of features that we can all align against

Schedule a demo of each system features for a future meeting

Features of Gigwa

Basic filtering functionalities

  • By chromosome / sequence

  • By position

  • By variant type

Advanced filtering functionalities

  • By functional annotations

  • By genotype patterns

  • Using multiple groups of samples

  • By metadata

Visualization

  • Allows consulting genotypes

  • Graphical representation

File formats

  • Multiple import formats

  • Multiple export formats

Data peculiarities

  • Support for INDELs

  • Support for polyploids

  • Support for phasing information

  • Support for metadata

Interoperability

  • Standard API support (as a data-provider)

  • Standard API support (as a data-consumer)

  • Supports feeding genotypes through API

  • Link to external software (e.g. JBrowse)

Software availability

  • Distributable

  • Open-source

  • Available as Docker container

  • Embeddable

Data compactness (percentage of disk space occupied by the data once in the system, compared to that occupied by the original VCF file)
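The data compactness measure can be written as 100 x (bytes used in the system) / (bytes of the original VCF file). A minimal sketch follows; the function name is illustrative, and the sizes in the example are hypothetical values, not measurements from any of the systems discussed.

```python
def data_compactness(stored_bytes: int, original_vcf_bytes: int) -> float:
    """Disk space used in the system as a percentage of the original VCF size."""
    if original_vcf_bytes <= 0:
        raise ValueError("original VCF size must be positive")
    return 100.0 * stored_bytes / original_vcf_bytes


# Hypothetical example: a 10 GiB VCF that occupies 4 GiB once loaded.
GIB = 1024 ** 3
print(f"{data_compactness(4 * GIB, 10 * GIB):.1f}%")  # 40.0%
```

Values below 100% mean the system stores the data more compactly than the source file; values above 100% mean it takes more space. Whether the original VCF is measured compressed or uncompressed should be agreed beforehand, since it changes the result considerably.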