2021-09-16 Meeting notes

Date

Sep 16, 2021

Participants

  • @Evan Rees

  • @Yaw Nti-Addae

  • @Dave Matthews

  • @guilhem.sempere

  • @Elizabeth Jones

  • @Moira Sheehan

  • @Francisco Agosto

  • @Pierre Larmande

Discussion topics

Item

Presenter

Notes

 

@Evan Rees

  • Contiguous slices are incorrect - @guilhem.sempere will provide contiguous slices



@Elizabeth Jones

  • Is it worth comparing vcftools?

  • bcftools offers compression / indexing - is this comparable to loading?

 

@Yaw Nti-Addae

  • Want to include MontyDB and Germinate

  • Only 3 VMs are provisioned

    • Once GOBii and Gigwa are done, will open to MontyDB and Germinate

  • Should we include PLINK?

    • Contingent on inclusion of bcftools

 

@Elizabeth Jones

  • Can we set up Germinate for JHI?

  • Need instructions from @Sebastian Raubach

 

@guilhem.sempere

  • Larger tetraploid and indel datasets?

  • Right now potato dataset represents both and is quite small

  • Not possible to export ENTIRE potato dataset to Hapmap as marker types are mixed

    • Initially noted: Hapmap only supports biallelic markers. Correction: sorry, I said that in the first place, but it is not true, at least for Gigwa's implementation (we found no official specs for Hapmap, unfortunately). I seem to remember that the problem is that no allele delimiter is specified, so exporting multiallelic data is feasible as long as all markers are SNPs. I have placed a Hapmap file containing only SNPs here: /shared_data/test_data/genomics-systems-comparison/potato/potato__142479variants__38individuals.hapmap (but I think we should stick to VCF for that dataset)

 

@Dave Matthews

  • Disk space should reflect HDF5 file size

Action items

@Evan Rees create README for shared data folder - clean up table & stats
@Evan Rees run bcftools + PLINK benchmarks for slicing (coordinate w/ Germinate / @Sebastian Raubach) - wait until benchmarks for other systems are finalized
@Dave Matthews rerun lettuce slicing benchmarks w/ new markerlists
@Evan Rees set up meeting with @Sebastian Raubach
@Evan Rees confirm with each platform what import / export formats are supported
@Sebastian Raubach has Hapmap been implemented?
@Sebastian Raubach polyploid import?
@guilhem.sempere export potato dataset to hapmap → /shared_data/test_data/genomics-systems-comparison/potato/potato__142479variants__38individuals.hapmap (SNPs only)
EVERYONE: define the criteria we want to use to assess how efficiently each system supports polyploid datasets

Decisions

  1. Flat files should be stored in shared_data folder