Tools for polyploids


Purpose | Tool | Functions | Input files | Output files | Publication/Contact | Notes for use

Genotype calling

UNEAK  


FASTQ or BAM

HapMap

Lu et al. 2013

Pipeline is designed for diploidized polyploids only

Genotype calling

HaploTag  


Intermediate files from UNEAK

Custom text file

Tinker et al. 2016

Is specialized for self-fertilizing polyploids

Genotype calling

FreeBayes 


BAM

VCF

Garrison and Marth 2012

Can output polyploid genotypes but requires a reference genome

Genotype calling

GATK


FASTQ

VCF

McKenna et al. 2010

Can output polyploid genotypes but requires a reference genome

Genotype calling

 EBG 


depth matrix

tab-delimited text

Blischak et al. 2018

Imports read depth from other pipelines to estimate auto- or allopolyploid genotypes. For allopolyploids, it requires allele frequency estimates from the parent species.

Genotype calling

Updog


depth matrix

custom R object

Gerard et al. 2018

The R package updog estimates polyploid genotypes from read depth, modeling preferential pairing and accounting for several technical issues that can arise with sequencing data. It can output posterior mean genotypes that reflect genotype uncertainty, but it requires a large amount of computational time to run.

Genotype calling

SuperMASSA


Delimited text or VCF

Custom text or VCF

Serang et al. 2012

SuperMASSA and fitPoly (Voorrips et al. 2011) were originally designed for calling polyploid genotypes from fluorescence-based SNP assays and have been adapted for sequencing data, but fail to call genotypes when low read depth results in high variance of read depth ratios. 

Can be used when ploidy is unknown (developed for sugarcane) - useful for yam?

Genotype calling

polyRAD


VCF, UNEAK, Stacks, TASSEL-GBS

GAPIT, rrBLUP, GWASpoly, polymapR, MAPpoly, custom matrix

Lindsay V. Clark, Alexander E. Lipka and Erik J. Sacks

The software polyRAD is designed on the principle originally proposed by Li (2011) that it is not necessary to call genotypes with complete certainty in order to make useful inferences from sequencing data. Initially, SNP discovery is performed by other software such as TASSEL (Glaubitz et al. 2014) or Stacks (Catchen et al. 2013), with or without a reference genome, then allelic read depth is imported into polyRAD from those pipelines or the read counting software TagDigger (Clark and Sacks 2016). In polyRAD, one or several ploidies can be specified, including any level of auto- and/or allopolyploidy, allowing inheritance modes to vary across the genome. Genotype probabilities are estimated by polyRAD under a Bayesian framework, where priors are based on mapping population design, Hardy-Weinberg equilibrium (HWE), or population structure, with or without linkage disequilibrium (LD) and/or self-fertilization.
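The idea of estimating genotype probabilities rather than hard calls can be sketched in a few lines. The example below is a minimal illustration, not polyRAD's implementation or API (polyRAD is an R package with a richer model). It assumes a single biallelic marker in an autopolyploid, a Hardy-Weinberg prior on allele dosage, binomial sampling of reads, and a hypothetical symmetric sequencing error rate. All function and parameter names are made up for this sketch.

```python
from math import comb


def hwe_prior(ploidy, alt_freq):
    """Prior P(dosage = k) under HWE in an autopolyploid: Binomial(ploidy, alt_freq)."""
    return [comb(ploidy, k) * alt_freq**k * (1 - alt_freq)**(ploidy - k)
            for k in range(ploidy + 1)]


def read_likelihood(ref_reads, alt_reads, ploidy, error_rate=0.01):
    """P(observed reads | dosage = k) for every k, assuming binomial read sampling."""
    n = ref_reads + alt_reads
    likelihoods = []
    for k in range(ploidy + 1):
        p_alt = k / ploidy
        # Assumed error model: each read is miscalled with probability error_rate.
        p_obs = p_alt * (1 - error_rate) + (1 - p_alt) * error_rate
        likelihoods.append(comb(n, alt_reads) * p_obs**alt_reads * (1 - p_obs)**ref_reads)
    return likelihoods


def genotype_posterior(ref_reads, alt_reads, ploidy, alt_freq, error_rate=0.01):
    """Posterior P(dosage = k | reads), proportional to prior times likelihood."""
    prior = hwe_prior(ploidy, alt_freq)
    lik = read_likelihood(ref_reads, alt_reads, ploidy, error_rate)
    unnormalized = [p * l for p, l in zip(prior, lik)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def posterior_mean_dosage(posterior):
    """Expected allele dosage, a continuous genotype that carries the uncertainty."""
    return sum(k * p for k, p in enumerate(posterior))


# Tetraploid individual with 3 reference and 2 alternative reads at one marker.
posterior = genotype_posterior(ref_reads=3, alt_reads=2, ploidy=4, alt_freq=0.3)
print([round(p, 3) for p in posterior])
print(round(posterior_mean_dosage(posterior), 3))
```

With no reads at all, the posterior falls back to the prior, which is why low-depth markers can still contribute information under this kind of framework.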

Genotype calling

fitPoly

depth matrix

custom R object

Voorrips et al. 2011

Genotyping assays for bi-allelic markers (e.g. SNPs) produce signal intensities for the two alleles. fitPoly assigns genotypes (allele dosages) to a collection of polyploid samples based on these signal intensities.

Mapping

MAPpoly




Mollinari and Garcia 2018

MAPpoly is an R package to construct genetic maps in autopolyploid bi-parental populations with even ploidy levels. In its current version, it can handle ploidy levels up to 8 when using hidden Markov models (HMM), and up to 12 when using the two-point simplification. Also, for all individuals in the F1 offspring, it computes the probability distribution of multiallelic genotypes in the whole genome given the estimated genetic map. This information can be easily used to perform QTL analysis using the software QTLpoly.

Mapping

TetraploidSNPMap




Hackett et al. 2016

TetraploidSNPMap makes full use of the dosage data, and has new facilities for displaying the clustering of single nucleotide polymorphisms, rapid ordering of large numbers of single nucleotide polymorphisms using a multidimensional scaling analysis, and phase calling. It also has new routines for quantitative trait locus mapping based on a hidden Markov model, which use the dosage data to model the effects of alleles from both parents simultaneously. A Windows-based interface facilitates data entry and exploration.

Mapping

polymapR




Bourke et al. 2018

polymapR is an R package for genetic linkage analysis and integrated genetic map construction from bi-parental populations of outcrossing autopolyploids. It can currently analyse triploid, tetraploid and hexaploid marker datasets. Currently, the map construction is based on pairwise (or two-point) marker analysis.

Mapping

PolyGembler

Zhou et al., unpublished

PolyGembler proposes a novel approach to the creation of linkage maps in outcrossing polyploids, and is also suitable for diploid mapping. It first uses a haplotyping algorithm [derived from the polyHap algorithm (Su et al., 2008)] to generate phased multi-marker scaffolds or haplotypes. These are then used to calculate recombination frequencies by counting recombination events both within and between these scaffolds, leading to an extremely simple estimate of r which has no corresponding LOD score. Scaffolds are clustered using a graph partitioning algorithm, and thereafter the computationally efficient CONCORDE traveling-salesman solver is employed to order markers [as is done for example in TSPmap (Monroe et al., 2017)].
Mapping

MDSMap

Preedy and Hackett, 2016

MDSMap is a novel approach for determining a map order using multi-dimensional scaling. Certain combinations of markers provide very unambiguous information about co-inheritance, whereas others do not. Therefore, weights are required to prevent imprecise combinations from exerting a large influence on the map order. JoinMap can also do this but is very slow for higher numbers of markers and is therefore of limited use with current high-density marker datasets. The MDSMap approach can achieve similar results in a fraction of the time, and takes as its input the same information as JoinMap does, the pairwise recombination frequency estimates and logarithm of odds (LOD) scores, making this tool suitable for linkage map construction at any ploidy level, provided pairwise linkage analysis can be performed.
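Ordering markers with multi-dimensional scaling can be illustrated with a small sketch. This is not the MDSMap code: it swaps in plain, unweighted classical MDS in one dimension (MDSMap itself weights marker pairs, for example by LOD score), and it treats pairwise recombination frequencies directly as distances, which is only a fair approximation for closely linked markers. The function names and the toy data are hypothetical.

```python
def double_center(squared):
    """Classical MDS step: B = -1/2 * J * D^2 * J for a squared-distance matrix."""
    n = len(squared)
    row_mean = [sum(row) / n for row in squared]
    col_mean = [sum(squared[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (squared[i][j] - row_mean[i] - col_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def leading_eigenvector(matrix, iterations=500):
    """Power iteration for the dominant eigenvector of a symmetric matrix."""
    n = len(matrix)
    vec = [float(i + 1) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    return vec


def order_markers(names, rec_freq):
    """Order markers along the first MDS axis; the map orientation is arbitrary."""
    squared = [[d * d for d in row] for row in rec_freq]
    coords = leading_eigenvector(double_center(squared))
    return [name for _, name in sorted(zip(coords, names))]


# Toy pairwise recombination frequencies for four markers given in scrambled order.
markers = ["m3", "m1", "m4", "m2"]
rf = [
    [0.00, 0.25, 0.05, 0.15],
    [0.25, 0.00, 0.30, 0.10],
    [0.05, 0.30, 0.00, 0.20],
    [0.15, 0.10, 0.20, 0.00],
]
print(order_markers(markers, rf))
```

A weighted variant would down-weight marker pairs whose recombination estimates are imprecise, which is the role the weights play in the description above.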
Mapping

LPmerge package in R

Endelman and Plomion, 2014

LPmerge uses linear programming to remove the minimum number of constraints in marker order in order to create a conflict-free consensus map. It was originally developed to create integrated genetic maps from multiple (diploid) populations. That said, polyploids contain multiple copies of each chromosome and therefore also present a similar challenge if we consider each homolog map as originating from a different population, with non-simplex markers as bridging markers (mapped in more than one population). Homolog-specific maps are still regularly generated in polyploid mapping studies [e.g., in potato (Bourke et al., 2015, 2016), rose (Vukosavljev et al., 2016) or sweet potato (Shirasawa et al., 2017)].
Haplotyping

polyHap

Su et al., 2008

Haplotyping

SATlotyper

Neigenfind et al., 2008

Haplotyping

HapCompass

Aguiar and Istrail, 2013

HapCompass performed best at higher ploidies (6× and higher) (Motazedi et al., 2017).

Haplotyping

HapTree

Berger et al., 2014

HapTree produced more accurate haplotypes for triploid and tetraploid data.

Haplotyping

SDhaP

Das and Vikalo, 2015

Haplotyping

SHEsisPlus

Shen et al., 2016

Haplotyping

TriPoly

Motazedi et al., unpublished







Genetic studies - QTL mapping

QTLpoly




Pereira et al., 2019

Multiple QTL mapping in autopolyploids: a random-effect model approach with application in a hexaploid sweetpotato full-sib population

Genetic studies - QTL mapping

TSNPM

Hackett et al., 2017

TetraploidSNPMap (TSNPM) uses SNP dosage data to either construct a linkage map (as already described) or perform QTL interval mapping. In contrast to its predecessor, TSNPM can analyze all marker segregation types, and allows the user to explore different QTL models at detected peaks. At its core is an algorithm to determine identity-by-descent (IBD) probabilities for the offspring of the population, which are then used in a weighted regression performed across the genome.

Genetic studies - GWAS

GWASpoly




Rosyara UR, De Jong WS, Douches DS, Endelman JB. Plant Genome. 2016 Jul;9(2). doi: 10.3835/plantgenome2015.08.0073.

Software for Genome-Wide Association Studies in Autopolyploids and Its Application to Potato.

Handles the kinship matrix K well. Three different forms of K were tested in the development of the package, with the canonical relationship matrix (VanRaden, 2008) [termed the realized relationship matrix by the authors (Rosyara et al., 2016)] found to best control against inflation of significance values. This is also the default K provided in the GWASpoly package.
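For readers unfamiliar with the realized relationship matrix, the sketch below shows one way to compute a VanRaden-style K from allele dosages. It is an illustration only and does not call GWASpoly (an R package). VanRaden (2008) defined the matrix for diploids with a scaling of 2 Σ p(1 - p); replacing the 2 with the ploidy is an assumption made here for polyploid dosages, and GWASpoly's own scaling may differ. The function name and the toy dosage data are hypothetical.

```python
def realized_relationship(dosages, ploidy):
    """VanRaden-style relationship matrix from an individuals x markers dosage table.

    dosages[i][m] is the allele dosage (0..ploidy) of individual i at marker m.
    """
    n_ind = len(dosages)
    n_mrk = len(dosages[0])
    # Allele frequency per marker, estimated from the sample itself.
    freqs = [sum(ind[m] for ind in dosages) / (ploidy * n_ind) for m in range(n_mrk)]
    # Center each dosage on its expected value, ploidy * p.
    centered = [[ind[m] - ploidy * freqs[m] for m in range(n_mrk)] for ind in dosages]
    # Assumed polyploid scaling: ploidy * sum of p(1 - p) over markers.
    scale = ploidy * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic; K is undefined")
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / scale
             for j in range(n_ind)] for i in range(n_ind)]


# Three tetraploid individuals scored at three markers.
toy_dosages = [
    [0, 2, 4],
    [2, 2, 1],
    [4, 1, 1],
]
for row in realized_relationship(toy_dosages, ploidy=4):
    print([round(x, 3) for x in row])
```

Because the dosages are centered on sample allele frequencies, each column of K sums to zero, and the matrix is symmetric by construction.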

Genetic studies - GWAS

SHEsisPlus

Shen et al., 2016

Does not look critically at the kinship matrix