IUPAC to Biallelic Transposition
IUPAC input data is transformed when it is loaded into the HDF5 backend. A higher-level overview is available at Supported Dataset Types.
These conversions aim to follow the IUPAC standard (1) and the TASSEL implementation (2), with two assumptions: three- and four-base IUPAC codes (B, D, H, V and N) are mapped to an unknown (NN) bi-allelic state, and gaps (e.g. '.') are treated as deletions.
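The rules above can be expressed as a small derivation: a code standing for one base becomes a homozygous call, a code standing for two bases becomes the corresponding heterozygous pair, and anything ambiguous over three or four bases collapses to NN. The following is a minimal sketch under that reading, not the IUPACmatrixToBI source; the names `IUPAC_BASES` and `to_biallelic` are hypothetical, and the two-base orderings are copied from the table below. The TASSEL indel codes (0, +, -) are left out because they are direct lookups rather than derived from base sets.

```python
# Hypothetical sketch of the conversion rules; not the IUPACmatrixToBI source.
# Each IUPAC code maps to the bases it stands for, ordered as in the table.
IUPAC_BASES = {
    "A": "A", "T": "T", "C": "C", "G": "G",
    "W": "AT", "R": "AG", "M": "AC", "K": "TG", "Y": "TC", "S": "GC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def to_biallelic(code: str) -> str:
    """Derive the two-character bi-allelic call for a single IUPAC code."""
    if code == ".":
        return "--"  # a gap is treated as a deletion
    bases = IUPAC_BASES[code]  # raises KeyError for codes not covered here
    if len(bases) == 1:
        return bases * 2  # single base: homozygous, e.g. A -> AA
    if len(bases) == 2:
        return bases  # two-base code: heterozygous, e.g. R -> AG
    return "NN"  # three- or four-base code: unknown
```

For example, `to_biallelic("R")` gives `"AG"` and `to_biallelic("V")` gives `"NN"`.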
Note: this table reflects the live data in IUPACmatrixToBI on Bitbucket.
IUPAC code | Bi-allelic conversion | Source |
---|---|---|
A | AA | 1 |
T | TT | 1 |
C | CC | 1 |
G | GG | 1 |
W | AT | 1 |
R | AG | 1 |
M | AC | 1 |
K | TG | 1 |
Y | TC | 1 |
S | GC | 1 |
B | NN | 1 (multi-base to unknown) |
D | NN | 1 (multi-base to unknown) |
H | NN | 1 (multi-base to unknown) |
V | NN | 1 (multi-base to unknown) |
N | NN | 1 (multi-base to unknown) |
0 | +- | 2 |
+ | ++ | 2 |
- | -- | 2 |
. | -- | 1 (gap to deletion) |
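Because every row is a fixed one-to-one mapping, the table can be applied as a plain lookup. The sketch below is illustrative only: the names `IUPAC_TO_BI` and `convert_row`, the choice to reject unknown codes with a `ValueError`, and the split of each call into two parallel allele lists are assumptions, not a description of how the HDF5 backend stores the result.

```python
# Hypothetical sketch: the table above as a lookup, applied to one row of calls.
# Names, error handling, and the two-list output are illustrative assumptions.
IUPAC_TO_BI = {
    "A": "AA", "T": "TT", "C": "CC", "G": "GG",
    "W": "AT", "R": "AG", "M": "AC", "K": "TG", "Y": "TC", "S": "GC",
    "B": "NN", "D": "NN", "H": "NN", "V": "NN", "N": "NN",
    "0": "+-", "+": "++", "-": "--", ".": "--",
}


def convert_row(calls):
    """Convert a sequence of IUPAC calls into two parallel allele lists."""
    first, second = [], []
    for call in calls:
        try:
            pair = IUPAC_TO_BI[call]
        except KeyError:
            raise ValueError(f"unsupported IUPAC code: {call!r}") from None
        first.append(pair[0])   # first allele of the bi-allelic call
        second.append(pair[1])  # second allele of the bi-allelic call
    return first, second
```

For example, `convert_row("AR0.")` returns `(['A', 'A', '+', '-'], ['A', 'G', '-', '-'])`.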
Sources:
1 - http://www.bioinformatics.org/sms/iupac.html
2 - https://bitbucket.org/tasseladmin/tassel-5-source/wiki/UserManual/Appendix/NucleotideCodes