Consensus calling update

Target release

2.2.4

Epic

 

Document status

In progress

Document owner

@Yaw Nti-Addae

Designer

@Elizabeth Jones

Tech lead

@Vishnu Govindaraj

Technical writers

 

QA

@Elizabeth Jones

Current statement:

The term ‘favoring homozygotes’ means that if homozygous and heterozygous genotype calls occur at equal frequencies, the homozygous genotype is called. For example, if the replicate samples are AA, AA, AT and AT, the consensus call is AA. However, if two different homozygous calls occur at equal frequencies, e.g. replicate samples AA, AA, TT and TT, the consensus call is NN (missing), because there is a tie between two different homozygous calls. A tie between homozygous calls takes precedence, so if the replicate samples are AA, AT and TT, the consensus call is NN.

Issues:

NN is being treated as a call, so equal frequencies of NN and GT return NN when the call should be GT.

Frequencies are being assessed at the level of homozygous versus heterozygous state rather than at the genotype level. As a result, when two different homozygotes and a heterozygote occur at equal frequencies, the first homozygote seen is called, e.g. CC, CT and TT returns CC when it should return NN.

When there is only a single replicate sample, the same call is not always returned, e.g. GG is called as NN. This is a true bug.

Rules:

Frequencies should be based on non-missing values, i.e. if there is one CC and five NNs, the consensus call is CC (100% of non-missing calls are CC).

Consensus calls should return the highest-frequency genotype. A genotype match is an exact match at both alleles, and allele order is not considered: A/T is the same genotype as A/T or T/A; A/T is not a match to A/A; A/T is not a match to A/C. So TT AT AT TT AC AC TT should return a consensus call of TT (there are more hets than homozygotes in this case, but the hets are different genotypes).
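The two rules above (ignore missing calls, and match genotypes regardless of allele order) could be sketched roughly as follows. This is an illustration only, not the production implementation: the function names `normalize` and `genotype_frequencies` are hypothetical, and the sketch assumes single-character alleles, an optional "/" separator, and NN as the only encoding of a missing call.

```python
from collections import Counter

MISSING = "NN"  # assumption: the only encoding of a missing call


def normalize(call):
    """Return an order-independent form of a call, e.g. 'T/A' -> 'AT'."""
    alleles = call.replace("/", "").upper()
    return "".join(sorted(alleles))


def genotype_frequencies(calls):
    """Map each genotype to its share of the non-missing calls."""
    counts = Counter(normalize(c) for c in calls)
    counts.pop(MISSING, None)  # missing calls never enter the denominator
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()} if total else {}


print(genotype_frequencies(["CC", "NN", "NN", "NN", "NN", "NN"]))
# {'CC': 1.0}
print(genotype_frequencies(["T/A", "AT", "AA", "AC"]))
# {'AT': 0.5, 'AA': 0.25, 'AC': 0.25}
```

Counting on the normalized genotype rather than on a hom/het flag is what keeps A/T and A/C from being pooled into a single "het" bucket.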

Where two different homozygotes occur at equal frequencies and that frequency is greater than or equal to the frequency of any heterozygote (i.e. there is a hom-alt hom tie, or a hom-alt hom-het tie), NN is returned, e.g. AA AT TT is NN; AA AT AC TT is NN; AA AA AT TT TT is NN.

When two genotypes occur at equal frequencies, one homozygous and the other a het, and that frequency is greater than the frequency of the alternate homozygous call (i.e. a hom-het tie), the homozygous call is returned, e.g. AA AA AT AT TT is AA; AA AA AT AC TT is AA.
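Taken together, the rules can be expressed as a single function. The sketch below is one possible reading, not the actual implementation: `consensus_call` is a hypothetical name, alleles are assumed to be single characters, and NN is assumed to be the only missing encoding. The rules do not say what to return when the top frequency is shared only by different heterozygotes (e.g. AT AT AC AC); returning NN in that case is an assumption made here.

```python
from collections import Counter

MISSING = "NN"  # assumption: the only encoding of a missing call


def consensus_call(calls):
    """Return the consensus genotype for a list of replicate calls."""
    # Allele order is ignored: 'T/A', 'TA' and 'AT' all become 'AT'.
    normalized = ("".join(sorted(c.replace("/", "").upper())) for c in calls)
    # Frequencies are based on non-missing calls only.
    counts = Counter(g for g in normalized if g != MISSING)
    if not counts:
        return MISSING

    top = max(counts.values())
    tied = [g for g, n in counts.items() if n == top]
    if len(tied) == 1:
        return tied[0]  # a single highest-frequency genotype

    homozygotes = [g for g in tied if len(set(g)) == 1]
    if len(homozygotes) == 1:
        return homozygotes[0]  # hom-het tie: favor the homozygote
    # Two or more homozygotes tied at the top: NN.
    # No homozygote at the top (het-het tie): NN by assumption.
    return MISSING


for replicates in (["AA", "AT", "TT"], ["AA", "AA", "AT", "AT", "TT"], ["GG"]):
    print(replicates, "->", consensus_call(replicates))
```

The examples given in the rules and issues above (CC with five NNs, TT AT AT TT AC AC TT, CC CT TT, a single GG) make natural regression tests for whichever implementation is chosen.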