Nucleotide Four Letter Cleaning Transformation
The tetraploid data transformation relies on a standard interpretation of how tetraploid data is entered. Input data is assumed to be in one of three normal forms:
XXXX
XYXYXYX
Z
Where X is a valid allele data element {ACGT+-}
Y is a valid separator element { / | , }
and Z is a ‘missing’ indicator.
In the first two cases, the output is the four-letter form (AAAA); in the last case, the transformation supplies a set of four unknowns (“NNNN”).
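The rule above can be illustrated with a minimal Python sketch. This is not the code in NucleotideSeparatorSplitter: the names `clean_tetraploid`, `VALID_ALLELES`, `SEPARATORS`, `MISSING_INDICATORS` and `PLOIDY` are hypothetical, and the values are copied from the tables in this section. The sketch assumes that the separated form carries four alleles and three separators, that N is accepted as an allele (as in the conversion table), that missing indicators match exactly (case-sensitive), and that malformed input raises an error. The source does not confirm any of these.

```python
# Hypothetical constants; values copied from the tables in this section.
VALID_ALLELES = set("ACGTN+-")  # assumption: N is accepted, as in the conversion table
SEPARATORS = set("/|,")
MISSING_INDICATORS = {"?", "Uncallable", "NTC", "Unknown", "Unreadable"}
PLOIDY = 4


def clean_tetraploid(raw: str) -> str:
    """Return the four-letter form of a tetraploid call, or 'NNNN' if missing."""
    value = raw.strip()

    # Form Z: a 'missing' indicator becomes four unknowns.
    if value in MISSING_INDICATORS:
        return "N" * PLOIDY

    if len(value) == PLOIDY:
        # Form XXXX: four alleles, no separators.
        alleles = list(value)
    elif len(value) == 2 * PLOIDY - 1:
        # Form XYXYXYX: alleles at even positions, separators at odd positions.
        alleles = list(value[0::2])
        if any(sep not in SEPARATORS for sep in value[1::2]):
            raise ValueError(f"invalid separator in {raw!r}")
    else:
        # Assumption: anything else is rejected rather than passed through.
        raise ValueError(f"unrecognised tetraploid form: {raw!r}")

    if any(allele not in VALID_ALLELES for allele in alleles):
        raise ValueError(f"invalid allele in {raw!r}")
    return "".join(alleles)
```

For example, `clean_tetraploid("A/C/G/T")` and `clean_tetraploid("ACGT")` both return `"ACGT"`, while `clean_tetraploid("NTC")` returns `"NNNN"`.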
Note: the table reflects the live data in NucleotideSeparatorSplitter in Bitbucket.
Allele | Conversion |
---|---|
A | A |
T | T |
C | C |
G | G |
N | N |
+ | + |
- | - |
Separator |
---|
\| |
/ |
, |
Note: the table reflects the live data in missingIndicators.txt in Bitbucket.
Missing Entity - from missingIndicators.txt |
---|
? |
Uncallable |
NTC |
Unknown |
Unreadable |