IUPAC to Biallelic Transposition

IUPAC input data is transformed into bi-allelic calls when it is loaded into the HDF5 backend. A higher-level overview is available at https://gobiiproject.atlassian.net/wiki/spaces/GD/pages/32702625.

These conversions follow the IUPAC standard (1) and the TASSEL implementation (2) as closely as possible, with two assumptions: IUPAC codes that stand for three or four bases (B, D, H, V, N) are mapped to the unknown bi-allelic state NN, and gaps (e.g. '.') are treated as deletions.

Note: the table reflects the live data in Bitbucket in IUPACmatrixToBI.

IUPAC entity   Conversion   Source Material
------------   ----------   ---------------------------
A              AA           1
T              TT           1
C              CC           1
G              GG           1
W              AT           1
R              AG           1
M              AC           1
K              TG           1
Y              TC           1
S              GC           1
B              NN           1 (multi-base to unknown)
D              NN           1 (multi-base to unknown)
H              NN           1 (multi-base to unknown)
V              NN           1 (multi-base to unknown)
N              NN           1 (multi-base to unknown)
0              +-           2
+              ++           2
-              --           2
.              --           1 (gap to deletion)
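The conversion table above can be read as a simple lookup. The following is a minimal Python sketch of that lookup, not the actual IUPACmatrixToBI code: the names `IUPAC_TO_BIALLELIC`, `iupac_to_biallelic`, and `convert_row` are hypothetical, and raising an error on codes that are not in the table is an assumption, since this page does not say how unlisted input is handled.

```python
# Hypothetical lookup transcribed from the conversion table on this page.
# It is an illustration only, not the IUPACmatrixToBI source.
IUPAC_TO_BIALLELIC = {
    # Single bases become homozygous calls.
    "A": "AA", "T": "TT", "C": "CC", "G": "GG",
    # Two-base ambiguity codes become heterozygous calls.
    "W": "AT", "R": "AG", "M": "AC", "K": "TG", "Y": "TC", "S": "GC",
    # Three- and four-base codes map to the unknown state.
    "B": "NN", "D": "NN", "H": "NN", "V": "NN", "N": "NN",
    # Insertion/deletion codes (TASSEL), plus '.' treated as a deletion.
    "0": "+-", "+": "++", "-": "--", ".": "--",
}


def iupac_to_biallelic(code: str) -> str:
    """Return the two-character bi-allelic call for a single IUPAC code.

    Assumption: codes missing from the table raise ValueError. The lookup
    is case-sensitive because the table lists upper-case codes only.
    """
    try:
        return IUPAC_TO_BIALLELIC[code]
    except KeyError:
        raise ValueError(f"no conversion defined for IUPAC code {code!r}") from None


def convert_row(row):
    """Convert one row of single-character IUPAC codes to bi-allelic calls."""
    return [iupac_to_biallelic(code) for code in row]


if __name__ == "__main__":
    print(convert_row(["A", "W", "B", ".", "0"]))
```

Keeping the mapping in one dictionary keeps the sketch easy to compare against the table row by row if the live data changes.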

 

Sources:
1 - http://www.bioinformatics.org/sms/iupac.html
2 - https://bitbucket.org/tasseladmin/tassel-5-source/wiki/UserManual/Appendix/NucleotideCodes