IUPAC to Biallelic Transposition

IUPAC input data is transformed when it is loaded into the HDF5 backend. For a higher-level overview, see Supported Dataset Types.

These conversions follow the IUPAC standard (1) and the TASSEL implementation (2) as closely as possible, with two assumptions: IUPAC codes that stand for three or four bases (B, D, H, V, N) are mapped to the unknown bi-allelic state (NN), and gaps (e.g. '.') are treated as deletions.
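
The rule above can be sketched by counting how many bases each IUPAC code can stand for. This is only an illustration, not the code in IUPACmatrixToBI: the names `IUPAC_BASES`, `GAP_CODES`, and `classify` are hypothetical, and the TASSEL-specific codes `0` and `+` are left out of this sketch.

```python
# Standard IUPAC nucleotide ambiguity sets: the bases each code can represent.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

# IUPAC gap symbols; both are treated as deletions in this sketch.
GAP_CODES = {".", "-"}


def classify(code: str) -> str:
    """Classify a single IUPAC code by the rule described above (hypothetical helper)."""
    if code in GAP_CODES:
        return "deletion"
    bases = IUPAC_BASES[code]  # raises KeyError for codes outside the standard
    if len(bases) == 1:
        return "homozygous"    # e.g. A -> AA
    if len(bases) == 2:
        return "heterozygous"  # e.g. W -> AT
    return "unknown"           # three or four bases -> NN
```

For example, `classify("W")` gives `"heterozygous"`, while `classify("B")` and `classify("N")` both give `"unknown"`.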

Note: the table reflects the live data in Bitbucket, in IUPACmatrixToBI.

IUPAC entity	Conversion	Source Material
A	AA	1
T	TT	1
C	CC	1
G	GG	1
W	AT	1
R	AG	1
M	AC	1
K	TG	1
Y	TC	1
S	GC	1
B	NN	1 (multi-base to unknown)
D	NN	1 (multi-base to unknown)
H	NN	1 (multi-base to unknown)
V	NN	1 (multi-base to unknown)
N	NN	1 (multi-base to unknown)
0	+-	2
+	++	2
-	--	2
.	--	1 (gap to deletion)
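
As a minimal sketch, the table can be expressed as a lookup and applied to a genotype matrix. The names `IUPAC_TO_BIALLELIC`, `to_biallelic`, and `convert_matrix` are hypothetical, and raising an error on unrecognized codes is an assumption. The actual behavior of IUPACmatrixToBI may differ.

```python
# Lookup mirroring the conversion table above, row for row.
IUPAC_TO_BIALLELIC = {
    "A": "AA", "T": "TT", "C": "CC", "G": "GG",
    "W": "AT", "R": "AG", "M": "AC", "K": "TG", "Y": "TC", "S": "GC",
    "B": "NN", "D": "NN", "H": "NN", "V": "NN", "N": "NN",
    "0": "+-", "+": "++", "-": "--",
    ".": "--",
}


def to_biallelic(code: str) -> str:
    """Convert one IUPAC code to its bi-allelic form (hypothetical helper)."""
    try:
        return IUPAC_TO_BIALLELIC[code]
    except KeyError:
        # Assumption: codes not in the table are rejected rather than guessed.
        raise ValueError(f"unrecognized IUPAC code: {code!r}") from None


def convert_matrix(rows):
    """Convert a matrix (list of rows) of single-character IUPAC codes."""
    return [[to_biallelic(code) for code in row] for row in rows]


# Usage: each cell is replaced by its two-character bi-allelic call.
converted = convert_matrix([["A", "W"], ["N", "."]])
```

Keeping the mapping in a single dictionary makes it easy to check against the table when either one changes.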

Sources:
1 - http://www.bioinformatics.org/sms/iupac.html
2 - https://bitbucket.org/tasseladmin/tassel-5-source/wiki/UserManual/Appendix/NucleotideCodes
