Tools for polyploids
Purpose | Tool | Functions | Input files | Output files | Publication/Contact | Notes for use |
---|---|---|---|---|---|---|
Genotype calling | UNEAK | | FASTQ or BAM | HapMap | | Pipeline is designed for diploidized polyploids only. |
Genotype calling | HaploTag | | Intermediate files from UNEAK | Custom text file | | Specialized for self-fertilizing polyploids. |
Genotype calling | FreeBayes | | BAM | VCF | | Can output polyploid genotypes but requires a reference genome. |
Genotype calling | GATK | | FASTQ | VCF | | Can output polyploid genotypes but requires a reference genome. |
Genotype calling | EBG | | Depth matrix | Tab-delimited text | | Imports read depth from other pipelines to estimate auto- or allopolyploid genotypes. For allopolyploids, it requires allele frequency estimates from the parent species. |
Genotype calling | updog | | Depth matrix | Custom R object | | The R package updog estimates polyploid genotypes from read depth, modeling preferential pairing and accounting for several technical issues that can arise with sequencing data. It can output posterior mean genotypes that reflect genotype uncertainty, but it requires long run times. |
Genotype calling | SuperMASSA | | Delimited text or VCF | Custom text or VCF | | SuperMASSA and fitPoly (Voorrips et al. 2011) were originally designed for calling polyploid genotypes from fluorescence-based SNP assays and have been adapted for sequencing data, but they fail to call genotypes when low read depth results in high variance of read depth ratios. Can be used when ploidy is unknown (developed for sugarcane); possibly useful for yam? |
Genotype calling | polyRAD | | VCF, UNEAK, Stacks, TASSEL-GBS | GAPIT, rrBLUP, GWASpoly, polymapR, MAPpoly, custom matrix | | polyRAD is designed on the principle originally proposed by Li (2011) that it is not necessary to call genotypes with complete certainty in order to make useful inferences from sequencing data. SNP discovery is first performed by other software such as TASSEL (Glaubitz et al. 2014) or Stacks (Catchen et al. 2013), with or without a reference genome. Allelic read depth is then imported into polyRAD from those pipelines or from the read counting software TagDigger (Clark and Sacks 2016). One or several ploidies can be specified, including any level of auto- and/or allopolyploidy, allowing inheritance modes to vary across the genome. Genotype probabilities are estimated under a Bayesian framework, where priors are based on mapping population design, Hardy-Weinberg equilibrium (HWE), or population structure, with or without linkage disequilibrium (LD) and/or self-fertilization. |
Genotype calling | fitPoly | | Depth matrix | Custom R object | Voorrips et al. 2011 | Genotyping assays for bi-allelic markers (e.g. SNPs) produce signal intensities for the two alleles. fitPoly assigns genotypes (allele dosages) to a collection of polyploid samples based on these signal intensities. |
Mapping | MAPpoly | MAPpoly is an R package for constructing genetic maps in autopolyploid bi-parental populations with even ploidy levels. In its current version, it can handle ploidy levels up to 8 when using hidden Markov models (HMM), and up to 12 when using the two-point simplification. For every individual in the F1 offspring, it also computes the probability distribution of multiallelic genotypes across the whole genome given the estimated genetic map. This information can be used directly for QTL analysis with the software QTLpoly. | | | | |
Mapping | TetraploidSNPMap | TetraploidSNPMap makes full use of the dosage data and has new facilities for displaying the clustering of single nucleotide polymorphisms (SNPs), rapid ordering of large numbers of SNPs using multidimensional scaling analysis, and phase calling. It also has new routines for quantitative trait locus (QTL) mapping based on a hidden Markov model, which use the dosage data to model the effects of alleles from both parents simultaneously. A Windows-based interface facilitates data entry and exploration. | | | | |
Mapping | polymapR | polymapR is an R package for genetic linkage analysis and integrated genetic map construction from bi-parental populations of outcrossing autopolyploids. It can currently analyse triploid, tetraploid and hexaploid marker datasets. Map construction is based on pairwise (two-point) marker analysis. | | | | |
Mapping | PolyGembler | | | | Zhou et al., unpublished | Proposes a novel approach to the creation of linkage maps in outcrossing polyploids, and is also suitable for diploid mapping. It combines a haplotyping algorithm [derived from the polyHap algorithm (Su et al., 2008)] to first generate phased multi-marker scaffolds or haplotypes. These are then used to calculate recombination frequencies by counting recombination events both within and between these scaffolds, leading to an extremely simple estimate of r which has no corresponding LOD score. Scaffolds are clustered using a graph partitioning algorithm, and thereafter the computationally efficient CONCORDE traveling-salesman solver is employed to order markers [as is done, for example, in TSPmap (Monroe et al., 2017)]. |
Mapping | MDSMap | | | | Preedy and Hackett, 2016 | A novel approach for determining map order using multidimensional scaling. Certain combinations of markers provide very unambiguous information about co-inheritance, whereas others do not. Therefore, weights are required to prevent imprecise combinations from exerting a large influence on the map order. JoinMap can also do this but is very slow for large numbers of markers and is therefore of limited use with current high-density marker datasets. The MDSMap approach can achieve similar results in a fraction of the time. It takes the same input as JoinMap, namely the pairwise recombination frequency estimates and logarithm of odds (LOD) scores, making this tool suitable for linkage map construction at any ploidy level, provided pairwise linkage analysis can be performed. |
Mapping | LPmerge package in R | | | | Endelman and Plomion, 2014 | LPmerge uses linear programming to remove the minimum number of constraints in marker order in order to create a conflict-free consensus map. It was originally developed to create integrated genetic maps from multiple (diploid) populations. That said, polyploids contain multiple copies of each chromosome and therefore present a similar challenge if we consider each homolog map as originating from a different population, with non-simplex markers as bridging markers (mapped in more than one population). Homolog-specific maps are still regularly generated in polyploid mapping studies [e.g., in potato (Bourke et al., 2015, 2016), rose (Vukosavljev et al., 2016) or sweet potato (Shirasawa et al., 2017)]. |
Haplotyping | polyHap | | | | Su et al., 2008 | |
Haplotyping | SATlotyper | | | | Neigenfind et al., 2008 | |
Haplotyping | HapCompass | | | | Aguiar and Istrail, 2013 | HapCompass performed best at higher ploidies (6× and higher) (Motazedi et al., 2017). |
Haplotyping | HapTree | | | | Berger et al., 2014 | Produced more accurate haplotypes for triploid and tetraploid data. |
Haplotyping | SDhaP | | | | Das and Vikalo, 2015 | |
Haplotyping | SHEsisPlus | | | | Shen et al., 2016 | |
Haplotyping | TriPoly | | | | Motazedi et al., unpublished | |
Genetic studies - QTL mapping | QTLpoly | | | | Multiple QTL mapping in autopolyploids: a random-effect model approach with application in a hexaploid sweetpotato full-sib population | |
Genetic studies - QTL mapping | TSNPM | | | | Hackett et al., 2017 | TetraploidSNPMap (TSNPM) uses SNP dosage data to either construct a linkage map (as already described) or perform QTL interval mapping. In contrast to its predecessor, TSNPM can analyze all marker segregation types, and allows the user to explore different QTL models at detected peaks. At its core is an algorithm to determine identity-by-descent (IBD) probabilities for the offspring of the population, which are then used in a weighted regression performed across the genome. |
Genetic studies - GWAS | GWASpoly | | | | Rosyara UR, De Jong WS, Douches DS, Endelman JB. Software for Genome-Wide Association Studies in Autopolyploids and Its Application to Potato. Plant Genome. 2016 Jul;9(2). doi: 10.3835/plantgenome2015.08.0073. | Handles the kinship matrix K well. Three different forms of K were tested in the development of the package. The canonical relationship matrix (VanRaden, 2008), termed the realized relationship matrix by the authors (Rosyara et al., 2016), was found to best control against inflation of significance values. This is also the default K provided in the GWASpoly package. |
Genetic studies - GWAS | SHEsisPlus | | | | Shen et al., 2016 | Does not critically evaluate the kinship matrix. |
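
Several of the genotype callers listed above (EBG, updog, polyRAD) estimate genotype probabilities from allelic read depth instead of making hard calls. The Python sketch below illustrates that general idea for a single bi-allelic SNP in an autopolyploid, using a binomial read-sampling likelihood and a Hardy-Weinberg prior. It is not the algorithm of any of those packages: the function names, the symmetric `error_rate`, and the absence of overdispersion, allelic bias, or preferential pairing are all simplifying assumptions made for illustration.

```python
from math import comb


def genotype_posteriors(ref_reads, alt_reads, ploidy=4, alt_freq=0.5, error_rate=0.01):
    """Posterior probability of each alternative-allele dosage (0..ploidy).

    Hypothetical helper for illustration only; assumes random mating (HWE)
    and independent reads with a symmetric sequencing error rate.
    """
    total_reads = ref_reads + alt_reads
    unnormalized = []
    for dosage in range(ploidy + 1):
        # HWE prior for an autopolyploid: dosage ~ Binomial(ploidy, alt_freq)
        prior = comb(ploidy, dosage) * alt_freq**dosage * (1 - alt_freq) ** (ploidy - dosage)
        # Probability that one read shows the alternative allele at this dosage
        frac = dosage / ploidy
        p_alt = frac * (1 - error_rate) + (1 - frac) * error_rate
        # Binomial likelihood of the observed read counts
        likelihood = comb(total_reads, alt_reads) * p_alt**alt_reads * (1 - p_alt) ** ref_reads
        unnormalized.append(prior * likelihood)
    norm = sum(unnormalized)
    return [value / norm for value in unnormalized]


def posterior_mean_dosage(posteriors):
    """Posterior mean genotype: a continuous dosage that reflects uncertainty."""
    return sum(dosage * prob for dosage, prob in enumerate(posteriors))


if __name__ == "__main__":
    # Low read depth: the posterior is spread over several dosages
    low_depth = genotype_posteriors(ref_reads=3, alt_reads=1)
    print([round(p, 3) for p in low_depth], round(posterior_mean_dosage(low_depth), 2))
    # Higher read depth with the same allele ratio: the posterior sharpens
    high_depth = genotype_posteriors(ref_reads=30, alt_reads=10)
    print([round(p, 3) for p in high_depth], round(posterior_mean_dosage(high_depth), 2))
```

Comparing the two calls shows why low read depth is a problem for callers that require a confident discrete genotype, and why a posterior mean dosage can still carry useful information in that setting.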