Nucleotide Four Letter Cleaning Transformation

The tetraploid data transformation relies on a standard interpretation of how tetraploid data is entered. Each entered value is assumed to be in one of three normal forms:

XXXX
XYXYXYX
Z

Where X is a valid allele data element {ACGT+-}

Y is a valid separator element { / | , }

and Z is a ‘missing’ indicator.

In the first two cases, the output is the four alleles with any separators removed (for example, AAAA). In the last case, the transformation supplies a set of four unknowns (“NNNN”).

Note: table reflects live data in Bitbucket in NucleotideSeparatorSplitter.

Allele	Conversion
A	A
T	T
C	C
G	G
N	N
+	+
-	-

Separator
|
/
,

Note: table reflects live data in Bitbucket in missingIndicators.txt

Missing Entity - from missingIndicators.txt
?
Uncallable
NTC
Unknown
Unreadable
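
The rules and tables above can be sketched in Python as follows. This is a minimal illustration, not the code in NucleotideSeparatorSplitter. The function name `clean_tetraploid` and the constants are hypothetical. The sketch assumes that the separated form holds four alleles joined by three separators, that missing indicators are matched exactly (case-sensitive), that separators may be mixed within one value, and that any other input is rejected with an error. The source does not specify these behaviours.

```python
# Hypothetical sketch; names and error handling are assumptions,
# not taken from NucleotideSeparatorSplitter.
VALID_ALLELES = set("ATCGN+-")          # from the Allele/Conversion table
SEPARATORS = set("|/,")                 # from the Separator table
MISSING_INDICATORS = {"?", "Uncallable", "NTC", "Unknown", "Unreadable"}
PLOIDY = 4                              # tetraploid: four alleles per call


def clean_tetraploid(raw: str) -> str:
    """Return a four-character allele string, or 'NNNN' for a missing call."""
    value = raw.strip()

    # Form Z: a missing indicator becomes four unknowns.
    # Checked first because "Unknown" has the same length as the XYXYXYX form.
    if value in MISSING_INDICATORS:
        return "N" * PLOIDY

    if len(value) == PLOIDY:
        # Form XXXX: four alleles, no separators.
        alleles = value
    elif len(value) == 2 * PLOIDY - 1:
        # Form XYXYXYX: alleles at even positions, separators at odd positions.
        alleles = value[0::2]
        separators = value[1::2]
        if not all(sep in SEPARATORS for sep in separators):
            raise ValueError(f"invalid separator in {raw!r}")
    else:
        raise ValueError(f"unrecognised tetraploid value {raw!r}")

    if not all(allele in VALID_ALLELES for allele in alleles):
        raise ValueError(f"invalid allele in {raw!r}")

    # Each allele converts to itself in the table, so no further mapping is needed.
    return alleles
```

Under these assumptions, `clean_tetraploid("A/C/G/T")` and `clean_tetraploid("ACGT")` both return "ACGT", and `clean_tetraploid("NTC")` returns "NNNN".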
