Imputation

Created by Star Yanxin Gao, last modified by Umesh Rosyara on Feb 27, 2019

Imputation is the process of replacing missing data with substituted values. Imputation is required for many computations (for example, those involving matrix operations) that do not allow missing values.

Naive Imputation 

This tool simply fills each missing value with the mean or mode of the SNP values for that marker. This method does not use information on linkage or linkage disequilibrium among markers.

Naive imputation using population mean or mode (Galaxy Version 0.1.0)

 Genotype data, Encoded to numeric form

(required) The input file must be tab-delimited. This tool expects that the allele data have been encoded (transformed) to numeric form. See the Encode tool for details.

 

Select imputation method to use

Select the imputation method (mean or mode) from the drop-down menu.

 

Column starts

Specify the column where the genotype data start (that is, the first column after the metadata columns).
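As an illustration of what the "Column starts" value means, the following minimal Python sketch splits one tab-delimited row into metadata fields and genotype fields. It is not the tool's R code. The function name split_row, the example row, the "NA" missing code, and the assumption that the column number is 1-based are all hypothetical.

```python
def split_row(line, genotype_start):
    """Split one tab-delimited row into metadata fields and genotype fields.

    genotype_start is assumed to be a 1-based column number (an assumption,
    not confirmed by the tool documentation).
    """
    fields = line.rstrip("\n").split("\t")
    return fields[: genotype_start - 1], fields[genotype_start - 1 :]


# Hypothetical row: four metadata columns followed by four encoded genotypes.
row = "snp1\tA/G\t1\t1050\t0\t2\tNA\t1\n"
meta, genotypes = split_row(row, genotype_start=5)
```

With genotype_start set to 5, the first four fields are treated as metadata and left untouched, and only the remaining fields would be passed to imputation.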

 

Additional Notes

 

What it does

This tool does simple imputation of missing genotype data using the mean or mode. The idea is simply to fill in missing values quickly with the mean or mode, so that the matrix is free of missing values. For more sophisticated imputation, use other imputation tools (Beagle, NPUTE, Impute2).

 

Input File:

This tool expects that the genotype data file has been encoded (transformed) from allele to numeric form. See the Encode tool.

The encoded genotype data need to be in GOBii hapmap format. The input is the tabular output of the Encode tool.

Select imputation method to use:

Two imputation methods are available: mean and mode. The mean method calculates the mean of the encoded numeric values for each marker (after removing missing values) and replaces the missing values with it. The mean may not be an integer. The mode method replaces missing values with the mode (again after removing missing values), so the replaced value will be an integer (a whole number without decimals).
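The difference between the two methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's R script: the function name impute_marker, the "NA" missing-value code, and the tie-breaking rule for the mode are all hypothetical.

```python
from collections import Counter

MISSING = "NA"  # assumed missing-value code; the tool's actual code may differ


def impute_marker(values, method="mean"):
    """Fill missing entries of one marker with the mean or mode of its observed values."""
    observed = [float(v) for v in values if v != MISSING]
    if not observed:
        return list(values)  # nothing observed, so nothing to impute from
    if method == "mean":
        fill = sum(observed) / len(observed)
    elif method == "mode":
        # Ties go to the value seen first; the real tool may break ties differently.
        fill = Counter(observed).most_common(1)[0][0]
    else:
        raise ValueError("method must be 'mean' or 'mode'")
    return [fill if v == MISSING else float(v) for v in values]


# Hypothetical encoded genotypes for one marker across six samples.
marker = ["0", "2", "NA", "2", "1", "NA"]
mean_filled = impute_marker(marker, "mean")  # missing entries become 1.25
mode_filled = impute_marker(marker, "mode")  # missing entries become 2.0
```

In this example the observed values are 0, 2, 2 and 1, so the mean method fills in the non-integer 1.25 while the mode method fills in 2, matching the distinction described above.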

Lines of metadata (starts with #)

Output File:

This tool outputs a hapmap file.

Depending on the selected method, the missing values in the hapmap are imputed with the mean or the mode of the observed values for each marker.

 

Backend Source Code

Custom R script

 

Contributors:

The methodology and R script contributor is Umesh Rosyara (CIMMYT) and the Galaxy integration contributor is Venice Margarette Juanillas (IRRI).

 

References:

Schmitt P, Mandel J, Guedj M (2015) A Comparison of Six Methods for Missing Data Imputation. J Biom Biostat 6:224. doi: 10.4172/2155-6180.1000224

 

Corresponding contact:

 For any questions about this tool, please send an e-mail to u.rosyara@cgiar.org