2021-08-10 Meeting notes

Date

Aug 10, 2021

Item	Presenter	Notes
Access to CUVPN	@Sebastian Raubach	Unable to access CUVPN due to Cisco AnyConnect configuration
polyploid dataset	@Evan Rees @Yaw Nti-Addae	Awaiting decision on polyploid dataset
lettuce full dataset	@Former user (Deleted)	Is the full lettuce genome too large to work with? ~200M markers / ~6B data points (markers * accessions). The full dataset is tens of GB, so probably not feasible to manipulate. Loading hapmap format will impact performance (genotype data are highly condensed). The difference between hapmap and VCF is large, so it is not a fair comparison; try ‘plain’ VCF with only GT fields. Need to account for conversion time from VCF to hapmap. Did Gigwa have a limit on the number of markers to query?
data formats	@Former user (Deleted)	Timings with bcftools for slicing / processing / conversion
data export	@Pierre Larmande	Need to specify when exporting VCF whether annotations are present; this will impact performance
parity / equivalence	@Pierre Larmande	When comparing datasets, need to account for annotations / info content in VCF
lettuce scaling	@Pierre Larmande	See if import times are linear across chromosomes
Measuring resource usage	@Pierre Larmande	Do we want to compare memory footprint? Compare efficiency across platforms: CPU usage / multithreading
measuring time	@Evan Rees	Are we timing things consistently? Measuring resource usage
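
For the lettuce scaling item, one possible way to check whether import times are linear across chromosomes is a least-squares fit of import time against marker count per chromosome. This is only a sketch: the function name `linear_fit` and the choice of marker count as the x-axis are assumptions, not something agreed in the meeting.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    xs: marker counts per chromosome (assumed measure of input size)
    ys: measured import times in seconds for the same chromosomes
    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 compares the residual error with the total variance in y.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared
```

An R² close to 1 with a small intercept would suggest roughly linear scaling; a systematic curve in the residuals would suggest otherwise.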

@Dave Matthews Export rice data to hapmap and flapjack

@Evan Rees schedule time with Dave to troubleshoot GDM

@Evan Rees rename markers in /full/Lactuca__project1__2021-06-28__12983735variants__HAPMAP.zip

@Evan Rees reformat table on benchmarking comparisons page

@Former user (Deleted) will add data to shared

Reps from each platform - figure out how to measure and record RAM usage, CPU usage, timings

@Former user (Deleted) look into tools for measuring resource usage by PID (procpath?)
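
As a starting point while procpath is being evaluated, resource usage by PID can also be sampled directly from the Linux `/proc` filesystem. The sketch below is Linux-only and uses only the standard library; the function names and the output keys (`rss_kb`, `peak_rss_kb`, `threads`, `cpu_s`) are made up for illustration.

```python
import os


def parse_status(text):
    """Extract memory and thread fields from the text of /proc/<pid>/status."""
    wanted = {"VmRSS": "rss_kb", "VmHWM": "peak_rss_kb", "Threads": "threads"}
    out = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in wanted:
            # Memory values look like "  1024 kB"; keep only the number.
            out[wanted[key]] = int(value.split()[0])
    return out


def parse_cpu_ticks(stat_text):
    """Return (utime, stime) in clock ticks from the text of /proc/<pid>/stat."""
    # The command name (field 2) may contain spaces or parentheses,
    # so split on the last ")" before counting fields.
    rest = stat_text.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11] and rest[12].
    return int(rest[11]), int(rest[12])


def sample(pid):
    """Take one snapshot of memory, thread count, and cumulative CPU seconds."""
    with open(f"/proc/{pid}/status") as fh:
        record = parse_status(fh.read())
    with open(f"/proc/{pid}/stat") as fh:
        utime, stime = parse_cpu_ticks(fh.read())
    record["cpu_s"] = (utime + stime) / os.sysconf("SC_CLK_TCK")
    return record
```

Calling `sample(pid)` at a fixed interval during an import would give RSS over time, and `peak_rss_kb` gives the high-water mark since the process started. Child processes are not included, so multi-process platforms would need each PID sampled.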

Record timings for import from VCF, plain VCF, flapjack, and hapmap as each platform is able
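
To keep timings comparable across platforms, each rep could wrap their import or conversion command in the same small harness. The sketch below records wall-clock time over repeated runs; the function names, the repeat count, and the file names in the bcftools example are placeholders, and the bcftools command (stripping a VCF down to GT-only 'plain' VCF) assumes bcftools is installed.

```python
import statistics
import subprocess
import time


def time_command(argv, repeats=3):
    """Run argv `repeats` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True makes a failed run raise instead of being timed silently.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce a list of timings to the figures worth recording."""
    return {
        "runs": len(timings),
        "min_s": min(timings),
        "median_s": statistics.median(timings),
        "max_s": max(timings),
    }


# Example command (hypothetical file names): drop all INFO tags and every
# FORMAT field except GT, writing a compressed 'plain' VCF.
STRIP_TO_GT = [
    "bcftools", "annotate", "-x", "INFO,^FORMAT/GT",
    "-Oz", "-o", "plain.vcf.gz", "input.vcf.gz",
]

# Usage: print(summarize(time_command(STRIP_TO_GT, repeats=3)))
```

Reporting the median alongside min and max makes it easier to spot runs affected by disk caching or other load on the machine.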