TODO
TODO
|
Follow the file path in the QC email notification to access the output files described below:
This file contains the genotype data extracted from GDM during the QC process. |
Dataset Summary
|
F1 Pedigree Test
Where the germplasm_type of the genotyped samples has been identified as F1, and the germplasm_par1 and germplasm_par2 fields contain germplasm_names that match samples in the same dataset, the F1 allele match to the identified parents can be calculated. To calculate the F1 match, an expected F1 is first derived and then compared to the F1 progeny. The expected F1 can only be derived at markers where neither parent has missing or heterozygous data. In the case below, only 7 values can be derived for the expected F1.
The ‘par_1 contained’ calculation looks at how many alleles from Parent1 are contained in the SampledF1, i.e. how many of the marker alleles in the SampledF1 can be explained by the Parent1 contribution. In this case 9/10 of the SampledF1 marker alleles could have been derived from Parent1, so the result is 90% P1_contained. For the ‘par_2 contained’ calculation, 7/9 of the SampledF1 marker alleles could have been derived from Parent2, so the result is 78% P2_contained. The calculation of Percent_F1_match is based on the number of marker genotype calls that exactly match between the SampledF1 and the derived F1, as a percent of the total number of markers that are non-missing and non-heterozygous in both parents. An exact match requires both alleles to match, so AA and AA are a 100% match, but AA and AT are a zero match.
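The Percent_F1_match rule described above can be sketched in Python as follows. This is only an illustration, not the QC pipeline's code: the function names, the two-letter call encoding and the use of `N` for missing data are assumptions made for the example. F1 calls are compared without regard to allele order (AT equals TA), which is also an assumption.

```python
MISSING = {"N"}  # assumed missing-allele symbol (hypothetical encoding)


def derive_expected_f1(par1, par2):
    """Return the expected F1 call, or None when it cannot be derived.

    The expected F1 is only derivable when both parents are homozygous
    and non-missing at the marker.
    """
    for call in (par1, par2):
        if len(call) != 2 or call[0] != call[1] or call[0] in MISSING:
            return None
    # One allele from each homozygous parent, sorted so AT == TA.
    return "".join(sorted(par1[0] + par2[0]))


def percent_f1_match(par1_calls, par2_calls, f1_calls):
    """Percent of derivable markers where the sampled F1 exactly matches."""
    derivable = 0
    matches = 0
    for p1, p2, f1 in zip(par1_calls, par2_calls, f1_calls):
        expected = derive_expected_f1(p1, p2)
        if expected is None:
            continue  # marker excluded from the denominator
        derivable += 1
        # Exact match: both alleles must agree (AA vs AT is a zero match).
        if "".join(sorted(f1)) == expected:
            matches += 1
    if derivable == 0:
        return None
    return 100.0 * matches / derivable


# Made-up calls for five markers; only markers 1, 2 and 5 are derivable.
parent1 = ["AA", "TT", "NN", "AT", "CC"]
parent2 = ["TT", "TT", "GG", "AA", "CC"]
sampled_f1 = ["AT", "TT", "GG", "AA", "CG"]
result = percent_f1_match(parent1, parent2, sampled_f1)
```

In this made-up example two of the three derivable markers match exactly, so `result` is about 66.7. A missing F1 call at a derivable marker is counted here as a non-match, following the denominator as worded above; the real tool may treat that case differently.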
|
Reproducibility
Pair-wise comparison between all samples with exact (case-sensitive) matches for the metadata field names. For example, samples A, B, and C having the same germplasm_external_code=10001 will have 3 (AB, AC, BC) reproducibility comparisons.
Reproducibility calculations depend on the dataset_type (see Data QC). For all dataset_types, if there is any missing data (NN or N) in either sample, the marker is ignored in the calculation.
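How the comparison pairs arise can be sketched as below. The sketch groups samples by one metadata field using exact, case-sensitive string equality and lists every pair within each group. The function name and the input layout (a dict of sample name to metadata dict) are hypothetical, and skipping samples with an empty field value is an assumption. The per-pair reproducibility score is not shown, since it depends on the dataset_type.

```python
from collections import defaultdict
from itertools import combinations


def reproducibility_pairs(samples, field):
    """List sample pairs that share an identical value for `field`."""
    groups = defaultdict(list)
    for name, meta in samples.items():
        value = meta.get(field)
        if value:  # assumption: samples without a value are not compared
            groups[value].append(name)  # dict keys compare case-sensitively
    pairs = []
    for members in groups.values():
        pairs.extend(combinations(sorted(members), 2))
    return sorted(pairs)


samples = {
    "A": {"germplasm_external_code": "10001"},
    "B": {"germplasm_external_code": "10001"},
    "C": {"germplasm_external_code": "10001"},
    "D": {"germplasm_external_code": "10002"},
    "E": {"germplasm_external_code": "abc"},
    "F": {"germplasm_external_code": "ABC"},  # differs from "abc" by case
}
pairs = reproducibility_pairs(samples, "germplasm_external_code")
```

Here `pairs` contains the three comparisons AB, AC and BC. Samples E and F are not paired because their codes differ in case.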
|
Similarity Matrix
Pair-wise calculation of genotypic similarity among all samples, with sample metadata provided above and to the left of the matrix. The calculation is displayed as a symmetric matrix (the diagonal entries compare a sample with itself and should always equal 1) with column names identical to row names. For example [Table 1]:
In the above table, samples 1 and 2 are very similar, whereas sample 3 is less so; the genetic similarity between samples 1 and 3 is 0.4. Genetic similarity ranges from 0 (no similarity) to 1 (identical) and is calculated as the average of the comparison scores across all markers using the following scoring methodology for markers with valid allele calls:
|
Similarity Matrix Column-wise
Alternative representation of the Similarity Matrix. Each pair-wise comparison result is output in its own row, with the metadata of the compared samples written in the following structure: < Sample 1 meta fields > , Similarity score , < Sample 2 meta fields >
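The relationship between the two representations can be illustrated with a short Python sketch. It assumes each unordered pair is written once and that self-comparisons on the diagonal are omitted; the actual output may include both. The function name is hypothetical, and all matrix values are made up for illustration except the 0.4 between samples 1 and 3 quoted above.

```python
def matrix_to_rows(sample_meta, matrix):
    """Flatten a symmetric similarity matrix into one row per sample pair.

    Each row follows the structure:
    (sample 1 meta fields, similarity score, sample 2 meta fields)
    """
    rows = []
    for i in range(len(sample_meta)):
        # Upper triangle only: the matrix is symmetric, so (i, j) == (j, i).
        for j in range(i + 1, len(sample_meta)):
            rows.append((sample_meta[i], matrix[i][j], sample_meta[j]))
    return rows


sample_meta = ["sample1", "sample2", "sample3"]
matrix = [
    [1.0, 0.9, 0.4],
    [0.9, 1.0, 0.5],
    [0.4, 0.5, 1.0],
]
rows = matrix_to_rows(sample_meta, matrix)
```

With three samples this gives three rows, one for each of the pairs 1-2, 1-3 and 2-3.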
|
Summary by Markers
|
Summary by Samples
|
Summary Samples Averages
Provides averaged statistics of samples, grouped by the following sample metadata fields:
|
Summary Samples Chisq
A chisq test for samples identified as having the germplasm_type listed below. Deviations from the expected allele ratios below are calculated. If the dnasample_group or dnasample_group_cycle[3] fields are provided, the chisq tests are carried out by these fields.
H0 (Null Hypothesis): Samples across markers support the expected segregation ratio of the specified germplasm population.
H1 (Alternative Hypothesis): Reject H0.
Calculation: Zygotes of samples grouped by the meta field criteria, with missing values excluded, are counted as nz. The total number of zygotes is ntotal. The following formulation is applied to all zygotes to calculate Chisq, using the lookup tables in the Germplasm Population Distributions section below:
Germplasm Population Distributions
The following table is used for Chisq and TwoLetter Nucleotide and SSR. For SSR, the most frequent allele pair is labelled Homozygote 1, the second most frequent allele pair is labelled Homozygote 2, and the heterozygote between the two homozygotes is labelled Heterozygote.
Dominant Germplasm Type Distributions
Dominant Nucleotide only. Two sets of statistics are calculated: Major-Pairing and Minor-Pairing.
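For the Summary Samples Chisq calculation, the sketch below shows a standard Pearson chi-square statistic computed from zygote counts and an expected segregation ratio. It is an assumption that the tool uses this textbook form. The 1:2:1 ratio in the example is the classical Mendelian F2 expectation, used here only for illustration and not taken from the lookup tables, and the function and class names are hypothetical.

```python
def chi_square(observed, expected_ratio):
    """Pearson chi-square: sum over classes of (observed - expected)^2 / expected.

    observed:       zygote counts per class, missing values already excluded
    expected_ratio: expected segregation ratio per class (all values > 0)
    """
    n_total = sum(observed.values())
    ratio_sum = sum(expected_ratio.values())
    stat = 0.0
    for zygote_class, ratio in expected_ratio.items():
        expected = n_total * ratio / ratio_sum
        stat += (observed.get(zygote_class, 0) - expected) ** 2 / expected
    return stat


# Illustrative counts for 100 zygotes against a 1:2:1 expectation.
observed = {"homozygote_1": 30, "heterozygote": 50, "homozygote_2": 20}
f2_ratio = {"homozygote_1": 1, "heterozygote": 2, "homozygote_2": 1}
stat = chi_square(observed, f2_ratio)
```

The expected counts here are 25, 50 and 25, so the statistic is (30-25)²/25 + 0 + (20-25)²/25 = 2.0. Whether H0 is rejected then depends on the degrees of freedom and the significance level, which this section does not specify.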
|