Benchmarking comparisons

Logistics

Source files

All files are stored in a shared folder mounted on each VM at HOST:/shared_data/test_data/genomics-systems-comparison.

File paths in Table 1 and throughout this document are relative to the shared folder.

Output files

Each platform has its own subdirectory within the outputs folder, which may be organized as each team sees fit. Retain at least one output file for every benchmark and store it there. Replicates of the same benchmark do not need to be stored.

Output files should be renamed to match the template {platform}-{name}-{format}.{extension}.

| field | description | examples |
| --- | --- | --- |
| {platform} | name of platform / system | gobii, germinate, gigwa, breedbase |
| {name} | name of benchmark, taken from the first column of the corresponding table | lettuce_chr1, markers.lettuce_chr1_1M, samples.maize_5pct |
| {format} | long name of export format | flapjack, vcf, hapmap |
| {extension} | file extension (can be anything) | genotype, hmp.txt, vcf |

Examples

gigwa-markers.lettuce_chr1_50pct-flapjack.genotype

  • platform: gigwa

  • name: markers.lettuce_chr1_50pct

  • format: flapjack

  • extension: genotype

gobii-maize_chr1-hapmap.hmp.txt

  • platform: gobii

  • name: maize_chr1

  • format: hapmap

  • extension: hmp.txt
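The naming template can be checked mechanically. The following is a minimal sketch, not part of any platform's tooling: the function names are hypothetical, and it assumes that platform, name and format never contain a hyphen (true of every example in this document).

```python
import re

# Template from this document: {platform}-{name}-{format}.{extension}
# Assumption: platform, name and format contain no hyphens; the extension
# may contain dots (e.g. "hmp.txt").
_PATTERN = re.compile(
    r"^(?P<platform>[^-]+)-(?P<name>[^-]+)-(?P<format>[^-.]+)\.(?P<extension>.+)$"
)


def build_output_name(platform, name, fmt, extension):
    """Assemble an output file name from its four fields."""
    for label, value in (("platform", platform), ("name", name), ("format", fmt)):
        if "-" in value:
            raise ValueError(f"{label} must not contain a hyphen: {value!r}")
    return f"{platform}-{name}-{fmt}.{extension}"


def parse_output_name(filename):
    """Split an output file name back into its four fields."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"file name does not match the template: {filename!r}")
    return match.groupdict()


if __name__ == "__main__":
    print(build_output_name("gobii", "maize_chr1", "hapmap", "hmp.txt"))
    print(parse_output_name("gigwa-markers.lettuce_chr1_50pct-flapjack.genotype"))
```

Running the parser over a platform's output subdirectory is one way to catch misnamed files before results are compared.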

Table 1: Datasets

Each maize chromosome file should be loaded as a separate dataset. These will be used to benchmark querying samples from multiple datasets as in Table 4b. Only maize_chr1 needs to be benchmarked in Table 3.

| name | polyploid | indels | format | file_size (MB) | num_markers | num_samples | num_missing | missingness | type | file |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| lettuce_chr1 | NO | NO | hapmap | 1,582 | 1,152,198 | 440 | 20,220,080 | 0.04 | 2letter | lettuce/chr1/Lactuca__1152198variants__440individuals.renamed.hapmap |
| lettuce_full | NO | NO | hapmap | 17,828 | 12,983,735 | 440 | 231,704,035 | 0.04 | 2letter | lettuce/full/Lactuca__project1__2021-06-28__12983735variants__HAPMAP.renamed/Lactuca__12983735variants__440individuals.renamed.hapmap |
| rice | NO | NO | hapmap | 3,312 | 700,000 | 1,568 | 85,835,610 | 0.08 | 2letter | rice/Dataset.hmp.txt |
| potato | YES | YES | hapmap | 37 | 142,479 | 38 | 0 | 0 | 4letter | potato/potato__142479variants__38individuals.hapmap |
| maize_chr1 | NO | NO | hapmap | 1,449 | 148,752 | 4,845 | 359,630,401 | 0.50 | gbs | maize/All_SeeD_2.7_chr1_no_filter.unimputed.hmp.txt |
| maize_chr2 | NO | NO | hapmap | 1,122 | 115,173 | 4,845 | 280,942,230 | 0.50 | gbs | maize/All_SeeD_2.7_chr2_no_filter.unimputed.hmp.txt |
| maize_chr3 | NO | NO | hapmap | 1,055 | 108,224 | 4,845 | 262,919,216 | 0.50 | gbs | maize/All_SeeD_2.7_chr3_no_filter.unimputed.hmp.txt |
| maize_chr4 | NO | NO | hapmap | 923 | 94,726 | 4,845 | 235,124,013 | 0.51 | gbs | maize/All_SeeD_2.7_chr4_no_filter.unimputed.hmp.txt |
| maize_chr5 | NO | NO | hapmap | 1,075 | 110,328 | 4,845 | 265,417,151 | 0.50 | gbs | maize/All_SeeD_2.7_chr5_no_filter.unimputed.hmp.txt |
| maize_chr6 | NO | NO | hapmap | 745 | 76,475 | 4,845 | 187,639,259 | 0.51 | gbs | maize/All_SeeD_2.7_chr6_no_filter.unimputed.hmp.txt |
| maize_chr7 | NO | NO | hapmap | 785 | 80,517 | 4,845 | 195,381,118 | 0.50 | gbs | maize/All_SeeD_2.7_chr7_no_filter.unimputed.hmp.txt |
| maize_chr8 | NO | NO | hapmap | 794 | 81,431 | 4,845 | 198,298,688 | 0.50 | gbs | maize/All_SeeD_2.7_chr8_no_filter.unimputed.hmp.txt |
| maize_chr9 | NO | NO | hapmap | 705 | 72,368 | 4,845 | 176,850,405 | 0.50 | gbs | maize/All_SeeD_2.7_chr9_no_filter.unimputed.hmp.txt |
| maize_chr10 | NO | NO | hapmap | 654 | 67,126 | 4,845 | 166,253,346 | 0.51 | gbs | maize/All_SeeD_2.7_chr10_no_filter.unimputed.hmp.txt |
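The missingness column appears to be the share of genotype calls that are missing. This document does not state the formula, so the sketch below rests on an assumption: missingness = num_missing / (num_markers × num_samples), which reproduces the rounded values listed for lettuce_chr1 and maize_chr4. The function name is hypothetical.

```python
def missingness(num_missing, num_markers, num_samples):
    """Fraction of genotype calls that are missing.

    Assumed formula (not stated in the source document):
    num_missing / (num_markers * num_samples).
    """
    total_calls = num_markers * num_samples
    if total_calls == 0:
        raise ValueError("dataset has no genotype calls")
    return num_missing / total_calls


if __name__ == "__main__":
    # Values copied from Table 1.
    print(round(missingness(20_220_080, 1_152_198, 440), 2))   # lettuce_chr1
    print(round(missingness(235_124_013, 94_726, 4_845), 2))   # maize_chr4
```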

Table 2a: Marker lists

Lists of markers used in the analysis. The contiguous column indicates whether the markers are consecutive along the genome. For contiguous lists, the data file includes the IDs of all markers between range_start and range_end.

| name | contiguous | count | range_start | range_end | file |
| --- | --- | --- | --- | --- | --- |
| markers.lettuce_chr1_10pct | yes | 114,877 | 0 | 16,000,000 | lettuce/chr1/markerlists/markers.lettuce_chr1_10pct.txt |
| markers.lettuce_chr1_20pct | yes | 228,493 | 0 | 32,000,000 | lettuce/chr1/markerlists/markers.lettuce_chr1_20pct.txt |
| markers.lettuce_chr1_30pct | yes | 346,711 | 0 | 47,000,000 | lettuce/chr1/markerlists/markers.lettuce_chr1_30pct.txt |
| markers.lettuce_chr1_40pct | yes | 458,369 | 0 | 63,000,000 | lettuce/chr1/markerlists/markers.lettuce_chr1_40pct.txt |
| markers.lettuce_chr1_50pct | yes | 578,777 | 0 | 79,000,000 | lettuce/chr1/markerlists/markers.lettuce_chr1_50pct.txt |
| markers.lettuce_chr1_1K | no | 1,000 | NA | NA | lettuce/chr1/markerlists/markers.lettuce_chr1_1K.txt |
| markers.lettuce_chr1_10K | no | 10,000 | NA | NA | lettuce/chr1/markerlists/markers.lettuce_chr1_10K.txt |
| markers.lettuce_chr1_100K | no | 100,000 | NA | NA | lettuce/chr1/markerlists/markers.lettuce_chr1_100K.txt |
| markers.lettuce_chr1_1M | no | 1,000,000 | NA | NA | lettuce/chr1/markerlists/markers.lettuce_chr1_1M.txt |

Table 2b: Sample lists

Lists of samples used for extraction benchmarks. All samples are in adjacent columns in the input data file. Each sample file contains the names of the first count samples in the data file.

| name | count | file |
| --- | --- | --- |
| samples.lettuce_chr1_5pct | 22 | lettuce/chr1/samplelists/samples.lettuce_chr1_5pct.txt |
| samples.lettuce_chr1_10pct | 44 | lettuce/chr1/samplelists/samples.lettuce_chr1_10pct.txt |
| samples.lettuce_chr1_25pct | 110 | lettuce/chr1/samplelists/samples.lettuce_chr1_25pct.txt |
| samples.lettuce_chr1_50pct | 220 | lettuce/chr1/samplelists/samples.lettuce_chr1_50pct.txt |
| samples.maize_5pct | 242 | maize/samplelists/samples.maize_5pct.txt |
| samples.maize_10pct | 484 | maize/samplelists/samples.maize_10pct.txt |
| samples.maize_25pct | 1,211 | maize/samplelists/samples.maize_25pct.txt |
| samples.maize_50pct | 2,422 | maize/samplelists/samples.maize_50pct.txt |
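The sample lists can be regenerated from a dataset's header line. The sketch below is illustrative only and makes two assumptions: the input files use the standard HapMap layout, in which 11 metadata columns precede the sample columns, and each count is the percentage of the dataset's samples rounded down (consistent with the counts listed here for 440 lettuce and 4,845 maize samples). The function names are hypothetical.

```python
# Standard HapMap files carry 11 metadata columns (rs#, alleles, chrom, pos,
# strand, assembly#, center, protLSID, assayLSID, panelLSID, QCcode) before
# the first sample column. Assumption: the benchmark files follow this layout.
HAPMAP_METADATA_COLUMNS = 11


def sample_count(total_samples, percent):
    """Number of samples in a list, rounding the percentage down."""
    return total_samples * percent // 100


def first_samples(header_line, count):
    """Return the names of the first `count` samples from a HapMap header."""
    columns = header_line.rstrip("\r\n").split("\t")
    start = HAPMAP_METADATA_COLUMNS
    return columns[start:start + count]


if __name__ == "__main__":
    for percent in (5, 10, 25, 50):
        print(percent, sample_count(440, percent), sample_count(4845, percent))
```

Integer arithmetic is used for the count so that the result does not depend on floating-point rounding.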

Table 3: Dataset import / export

Time (in seconds) spent importing and exporting data to/from Flapjack, Hapmap, and VCF format, and disk space (in megabytes) occupied after upload.

See comments for discussion on VCF format: vcf.gz

  • Replicate each benchmark 3 times

    • Enter all 3 timings delimited by a semicolon

  • Enter U (unsupported) if your platform does not support a given benchmark

  • Enter F (failed) if a job fails due to system constraints

  • Report space_occupied in SI units (1 MB = 1,000,000 bytes)
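The entry conventions above can be read back programmatically when summarizing results. This is a minimal sketch under stated assumptions: the function names and the returned tuple shape are hypothetical, cells may use thousands separators (as in "84,770"), and a trailing note such as "U (ploidy)" is ignored.

```python
import re
from statistics import mean

_NUMBER = re.compile(r"[\d,]+(?:\.\d+)?")


def parse_cell(cell):
    """Classify a results cell.

    Returns (status, timings): status is "ok", "U" (unsupported),
    "F" (failed) or "empty"; timings is a list of seconds as floats.
    """
    text = cell.strip()
    if not text:
        return ("empty", [])
    if text.split()[0] in ("U", "F"):
        return (text.split()[0], [])
    timings = []
    for part in text.split(";"):
        match = _NUMBER.match(part.strip())
        if match:
            # Drop thousands separators before converting.
            timings.append(float(match.group().replace(",", "")))
    return ("ok", timings)


def bytes_to_si_mb(num_bytes):
    """Convert bytes to SI megabytes (1 MB = 1,000,000 bytes)."""
    return num_bytes / 1_000_000


if __name__ == "__main__":
    status, timings = parse_cell("326; 323; 327")
    print(status, timings, round(mean(timings), 1))
    print(parse_cell("U (ploidy)"))
    print(bytes_to_si_mb(438_000_000))
```

Keeping U, F and empty cells distinct from numeric timings avoids averaging over benchmarks that never ran.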

 

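Column groups: import_time (s) covers marker-oriented (hapmap, vcf) and sample-oriented (flapjack, plink) formats; export_time (s) covers marker-oriented (vcf, hapmap) and sample-oriented (flapjack) formats.

| dataset | platform | import hapmap (s) | import vcf (s) | import flapjack (s) | import plink (s) | space_occupied (MB) | export vcf (s) | export hapmap (s) | export flapjack (s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| lettuce_chr1 | gigwa | 326; 323; 327 | 294; 294; 294 | 387; 367; 361 | 374; 374; 390 | 438 | 335; 332; 334 | 268; 268; 267 | 487; 498; 491 |
| lettuce_chr1 | gobii | 320; 250; 273 | 84,770; 85,962; 85,041 | 1,793; 1,984; 1,503 | U | 2,028 | U | 935; 985; 979 | 896; 933; 926 |
| lettuce_chr1 | germinate | 118; 120; 118 | U | 160; 160; 159 | | 1,020 | U | 106; 104; 107 | 201; 202; 204 |
| lettuce_chr1 | montydb | | | | | | | | |
| lettuce_chr1 | breedbase | | | | | | | | |
| lettuce_full | gigwa | 3,892; 3,841; 3,888 | 3,401; 3,379; 3,391 | 4,787; 4,903; 4,889 | 6,431; 7,552; 6,237 | 4,750 | 3,783; 3,726; 3,712 | 3,085; 3,052; 2,928 | 5,710; 5,721; 5,719 |
| lettuce_full | gobii | 5,819; 6,173; 6,482 | | U | U | 22,851 | U | 11,442; 11,322; 11,228 | 10,811; 10,715; 10,839 |
| lettuce_full | germinate | 1,716; 1,696; 1,732 | U | 4,214; 4,146; 4,185 | | 17,233 | U | 1,022; 1,033; 1,078 | 2,366; 2,352; 2,344 |
| lettuce_full | montydb | | | | | | | | |
| lettuce_full | breedbase | | | | | | | | |
| rice | gigwa | 663; 669; 677 | 624; 627; 616 | 728; 722; 741 | 760; 749; 762 | 1,827 | 755; 778; 765 | 532; 550; 551 | 889; 890; 913 |
| rice | gobii | 400; 423; 425 | 32,473; 32,496; 32,377 | 1,576; | U | 4,390 | U | 706; 721; 712 | 593; 599; 595 |
| rice | germinate | 226; 222; 223 | U | 337; 336; 339 | | 2,202 | U | 192; 189; 190 | 512; 515; 514 |
| rice | montydb | | | | | | | | |
| rice | breedbase | | | | | | | | |
| potato | gigwa | 6; 6; 7 | 8; 6; 6 | 13; 13; 13 | U (ploidy) | 12 | 9; 9; 6 | 5; 5; 4 | 10; 10; 10 |
| potato | gobii | | | | | | U | | U |
| potato | germinate | 9; 9; 9 | | | | 12 | U | 5; 5; 5 | 3; 3; 3 |
| potato | montydb | | | | | | | | U |
| potato | breedbase | | | | | | | | U |
| maize_chr1 | gigwa | 219; 221; 222 | 274; 282; 270 | 301; 288; 295 | 327; 321; 334 (!truncated since multi-allelic) | 870.8 | 442; 439; 438 | 241; 238; 236 | 568; 574; 569 |
| maize_chr1 | gobii | 250; 454; 341; 301 | | 1,035; 912; 895 | | 2,883 | U | 159; 156; 157 | 148; 144; 144 |
| maize_chr1 | germinate | 130; 130; 129 | | 107; 106; 110 | | 722 | U | 168; 175; 170 | 205; 216; 208 |
| maize_chr1 | montydb | | | | | | | | |
| maize_chr1 | breedbase | | | | | | | | |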

Table 4a: Marker extraction

Time in seconds spent extracting contiguous and non-contiguous subsets of markers. Gigwa supports subsetting by genomic interval rather than an uploaded list of marker names, so timings are from the equivalent interval.

| markers | gigwa hapmap | gigwa flapjack | gobii hapmap | gobii flapjack | germinate hapmap | germinate flapjack | montydb hapmap | montydb flapjack | breedbase hapmap | breedbase flapjack |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| markers.lettuce_chr1_10pct | 28; 27; 27 | 68; 58; 57 | 102; 102; 101 | 97; 97; 96 | 12; 12; 12 | 23; 22; 23 | | | | |
| markers.lettuce_chr1_20pct | 54; 52; 52 | 110; 111; 112 | 205; 202; 203 | 192; 193; 191 | 22; 22; 21 | 43; 43; 43 | | | | |
| markers.lettuce_chr1_30pct | 80; 78; 80 | 165; 165; 167 | 298; 313; 309 | 291; 291; 293 | 30; 31; 31 | 63; 63; 64 | | | | |
| markers.lettuce_chr1_40pct | 107; 103; 103 | 227; 220; 217 | 411; 406; 407 | 385; 383; 385 | 39; 39; 39 | 81; 83; 82 | | | | |
| markers.lettuce_chr1_50pct | 132; 140; 128 | 270; 271; 273 | 511; 514; 521 | 486; 487; 484 | 48; 47; 48 | 97; 96; 95 | | | | |
| markers.lettuce_chr1_1K | 1; 1; 1 | 1; 1; 1 | 2; 2; 2 | 5; 2; 2 | 3; 3; 3 | 2; 2; 2 | | | | |
| markers.lettuce_chr1_10K | 5; 5; 5 | 7; 7; 7 | 11; 11; 10 | 10; 10; 10 | 4; 4; 4 | 3; 3; 4 | | | | |
| markers.lettuce_chr1_100K | 48; 48; 46 | 73; 71; 74 | 93; 93; 95 | 88; 87; 87 | 11; 11; 11 | 19; 19; 19 | | | | |
| markers.lettuce_chr1_1M | 469; 478; 482 | 718; 717; 728 | 891; 893; 900 | 839; 840; 843 | 81; 83; 81 | 165; 163; 165 | | | | |

Table 4b: Sample extraction

Maize sample lists should be extracted across all 10 chromosome datasets.

| samples | gigwa hapmap | gigwa flapjack | gobii hapmap | gobii flapjack | germinate hapmap | germinate flapjack | montydb hapmap | montydb flapjack | breedbase hapmap | breedbase flapjack |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| samples.lettuce_chr1_5pct | 42; 43; 41 | 68; 68; 72 | 945; 955; 943 | 921; 923; 924 | 30; 31; 29 | 12; 12; 12 | | | | |
| samples.lettuce_chr1_10pct | 71; 72; 70 | 112; 109; 112 | 959; 967; 969 | 922; 917; 926 | 33; 33; 33 | 21; 22; 21 | | | | |
| samples.lettuce_chr1_25pct | 172; 172; 173 | 240; 238; 244 | 962; 978; 955 | 932; 934; 932 | 42; 41; 41 | 48; 49; 49 | | | | |
| samples.lettuce_chr1_50pct | 297; 300; 304 | 455; 451; 454 | 991; 994; 989 | 934; 928; 934 | 58; 58; 58 | 94; 93; 94 | | | | |
| samples.maize_5pct | 76; 76; 79 | 113; 113; 113 | | | 32; 34; 33 | 50; 49; 51 | | | | |
| samples.maize_10pct | 126; 125; 126 | 192; 191; 190 | | | 54; 55; 55 | 74; 74; 73 | | | | |
| samples.maize_25pct | 270; 268; 259 | 434; 407; 407 | | | 110; 111; 113 | 162; 160; 159 | | | | |
| samples.maize_50pct | 482; 490; 488 | 782; 785; 817 | | | 201; 199; 202 | 298; 298; 297 | | | | |