2021-08-10 Meeting notes

Date

Aug 10, 2021

Participants

  • @Evan Rees

  • @Dave Matthews

  • @Former user (Deleted)

  • @Pierre Larmande

  • @Sebastian Raubach

Goals

  • Progress updates

  • Blockers

Discussion topics

Time

Item

Presenter

Notes

 

Access to CUVPN

@Sebastian Raubach

  • Unable to access CUVPN due to Cisco AnyConnect configuration



polyploid dataset

@Evan Rees

@Yaw Nti-Addae

Awaiting decision on polyploid dataset

 

lettuce full dataset

@Former user (Deleted)

Is lettuce full genome too large to work with?

~200M markers / ~6B datapoints (markers * accessions).

Full dataset is tens of GB, probably not feasible to manipulate

Loading hapmap format will impact performance - genotype data are highly condensed

Difference between hapmap and VCF is large - not a fair comparison

Try ‘plain’ VCF with only GT fields

Need to account for conversion time from VCF to hapmap

Does Gigwa have a limit on the number of markers that can be queried?
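
Sketch of what a 'plain' VCF with only GT fields could look like in practice. This is a minimal illustration, not an agreed procedure: the function name strip_to_gt is hypothetical, and it assumes tab-separated VCF lines with a GT key in the FORMAT column. It blanks INFO and keeps only the GT subfield of every sample; header lines pass through untouched, so any ##INFO / ##FORMAT declarations would still need cleaning separately.

```python
def strip_to_gt(vcf_line: str) -> str:
    """Reduce one VCF line to GT-only sample fields with INFO blanked.

    Hypothetical helper for illustration; assumes tab-separated columns.
    """
    if vcf_line.startswith("#"):
        return vcf_line  # header and meta lines pass through unchanged

    cols = vcf_line.rstrip("\n").split("\t")
    if len(cols) < 10:
        return vcf_line  # no FORMAT/sample columns, nothing to strip

    # Locate GT in the FORMAT column (raises ValueError if GT is absent).
    gt_index = cols[8].split(":").index("GT")

    cols[7] = "."      # drop all INFO annotations
    cols[8] = "GT"     # FORMAT now declares GT only
    cols[9:] = [sample.split(":")[gt_index] for sample in cols[9:]]
    return "\t".join(cols) + "\n"
```

For large files the same reduction is probably better done with bcftools (its annotate command has a -x option for removing INFO and FORMAT tags), and that step would count towards the conversion time mentioned above.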

 

data formats

@Former user (Deleted)

bcftools timings for slicing / processing / conversion

 

data export

@Pierre Larmande

need to specify when exporting VCF if annotations are present - will impact performance

 

parity / equivalence

@Pierre Larmande

when comparing datasets, need to account for annotations / info content in VCF

 

lettuce scaling

@Pierre Larmande

see if import times are linear across chromosomes
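
One possible way to check linearity: fit a straight line to import time against marker count per chromosome and look at R². A rough sketch only; linear_fit is a hypothetical helper and the numbers below are placeholders, not measurements from any platform.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r_squared). Needs at least two distinct
    x values and some variation in y.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Placeholder values for illustration only (markers, import seconds).
markers_per_chromosome = [1_000_000, 2_000_000, 4_000_000, 8_000_000]
import_seconds = [12.0, 25.0, 49.0, 101.0]

slope, intercept, r2 = linear_fit(markers_per_chromosome, import_seconds)
print(f"seconds per marker: {slope:.3e}, R^2: {r2:.4f}")
```

An R² close to 1 would suggest roughly linear scaling; a clear drop, or residuals that grow with chromosome size, would suggest it is worth a closer look before extrapolating to the full dataset.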

 

Measuring resource usage

@Pierre Larmande

Do we want to compare memory footprint?

Compare efficiency across platforms

CPU usage / multithreading
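
A possible starting point for command-line steps, using only the Python standard library on Linux/macOS. This is a sketch under assumptions: measure_command is a hypothetical name, and it only sees child processes that are launched and waited for by the script itself. Long-running services (database or web backends) would still need sampling by PID, as per the action item below.

```python
import resource
import subprocess
import sys
import time


def measure_command(argv):
    """Run argv to completion and return wall time, CPU time and peak RSS."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "wall_s": wall,
        "cpu_user_s": after.ru_utime - before.ru_utime,
        "cpu_sys_s": after.ru_stime - before.ru_stime,
        # Peak RSS of the largest waited-for child so far in this script:
        # kilobytes on Linux, bytes on macOS.
        "max_rss": after.ru_maxrss,
    }


if __name__ == "__main__":
    # Trivial stand-in command; replace with the real import/export step.
    print(measure_command([sys.executable, "-c", "sum(range(10**6))"]))
```

If (cpu_user_s + cpu_sys_s) is clearly larger than wall_s, the step is probably using more than one core. Because max_rss is a running maximum across all children, it is safest to measure one command per script run.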

 

measuring time

@Evan Rees

Are we timing things consistently? (See also: Measuring resource usage.)
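
One way to make timings comparable across platforms is to agree on the clock, the number of repeats, and the statistic reported. A minimal sketch of that idea; time_repeated and the choice of three repeats with the median are assumptions for illustration, not something decided in the meeting.

```python
import statistics
import time


def time_repeated(func, repeats=3):
    """Call func() `repeats` times and summarise wall-clock durations."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        func()
        durations.append(time.perf_counter() - start)
    return {
        "runs_s": durations,
        "median_s": statistics.median(durations),
        "min_s": min(durations),
    }


if __name__ == "__main__":
    # Stand-in workload; a real run would wrap an import or export call.
    print(time_repeated(lambda: sum(range(10**6)), repeats=5))
```

Keeping the individual runs alongside the median makes it easier to spot cold-cache effects on the first run.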

Action items

@Dave Matthews Export rice data to hapmap and flapjack
@Evan Rees schedule time with Dave to troubleshoot GDM
@Evan Rees rename markers in /full/Lactuca__project1__2021-06-28__12983735variants__HAPMAP.zip
@Evan Rees reformat table on benchmarking comparisons page
@Former user (Deleted) will add data to shared
Reps from each platform - figure out how to measure and record RAM usage, CPU usage, timings
@Former user (Deleted) look into tools for measuring resource usage by PID (procpath?)

Decisions

  1. Record timings for import from VCF, plain VCF, flapjack, and hapmap as each platform is able