Nucleotide Four Letter Cleaning Transformation

The tetraploid data transformation relies on a standard interpretation of how tetraploid data is entered. Entered data is assumed to be in one of three normal forms:

  • XXXX

  • XYXYXYX

  • Z

where X is a valid allele data element {ACGT+-}, Y is a valid separator element { / | , }, and Z is a ‘missing’ indicator.

In the first two cases, the output is a four-letter allele string with any separators removed (for example, AAAA). In the last case, the transformation supplies a set of four unknowns (“NNNN”).
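The three normal forms can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in NucleotideSeparatorSplitter: the names `clean_tetraploid`, `VALID_ALLELES`, `SEPARATORS`, and `MISSING_INDICATORS` are hypothetical, the values are copied from the tables on this page (including N, which the conversion table lists), and exact-match comparison of missing indicators and raising `ValueError` on unrecognised input are assumptions.

```python
import re

# Values copied from the tables on this page; all names here are hypothetical.
VALID_ALLELES = set("ACGTN+-")
SEPARATORS = "/|,"
MISSING_INDICATORS = {"?", "Uncallable", "NTC", "Unknown", "Unreadable"}
PLOIDY = 4  # tetraploid: four alleles per value


def clean_tetraploid(raw):
    """Return a four-letter allele string, or 'NNNN' for a missing indicator."""
    value = raw.strip()
    if value in MISSING_INDICATORS:
        # Normal form Z: the whole value is a missing indicator.
        return "N" * PLOIDY
    if any(sep in value for sep in SEPARATORS):
        # Separated normal form: split on any of the separators.
        alleles = re.split("[/|,]", value)
    else:
        # Normal form XXXX: each character is one allele.
        alleles = list(value)
    if len(alleles) != PLOIDY or any(a not in VALID_ALLELES for a in alleles):
        # Assumed behaviour: reject anything outside the three normal forms.
        raise ValueError(f"not a recognised tetraploid value: {raw!r}")
    return "".join(alleles)
```

For example, `clean_tetraploid("A/C/G/T")` and `clean_tetraploid("ACGT")` both return `"ACGT"`, while `clean_tetraploid("NTC")` returns `"NNNN"`. The sketch accepts mixed separators within one value; whether the real transformation does is not stated on this page.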

 

Note: the table below reflects the live data in NucleotideSeparatorSplitter in Bitbucket.

Allele    Conversion
A         A
T         T
C         C
G         G
N         N
+         +
-         -

 

Separator
|
/
,

 

Note: the table below reflects the live data in missingIndicators.txt in Bitbucket.

Missing Entity - from missingIndicators.txt
?
Uncallable
NTC
Unknown
Unreadable
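Because the missing indicators live in a file, a transformation would typically load them at start-up instead of hard-coding them. The sketch below assumes that missingIndicators.txt holds one indicator per line and that matching is exact and case-sensitive; neither assumption is confirmed on this page, and the function names `load_missing_indicators` and `is_missing` are hypothetical.

```python
from pathlib import Path


def load_missing_indicators(path):
    """Read one indicator per line, skipping blank lines (assumed file layout)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def is_missing(value, indicators):
    """True when the whole entered value equals a missing indicator exactly."""
    return value.strip() in indicators
```

With the entries listed above, `is_missing("NTC", indicators)` would be true and `is_missing("ACGT", indicators)` would be false.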