Cross Validation

Created by Star Yanxin Gao, last modified on Aug 31, 2018

Cross-validation is a statistical method for estimating how accurately a predictive model will perform in practice. It is a resampling procedure used to evaluate predictive models on a limited data sample. In genomic selection (GS), it is commonly used to compare and select models for a given predictive modeling problem: a limited sample is used to estimate how the model is expected to perform in general when making predictions on data that were not used to train it. The technique is easy to understand and easy to implement, and it yields accuracy estimates that generally have lower bias than those from other methods.
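The resampling idea can be sketched in a few lines of Python. This is a minimal illustration using the common k-fold scheme, not the GOBii pipeline's implementation. The function names (k_fold_indices, cross_validate, fit_slope, predict_slope), the toy one-parameter model, and the synthetic data are all assumptions made for the example.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]


def cross_validate(x, y, k, fit, predict, seed=0):
    """Return the mean squared error measured on each held-out fold."""
    folds = k_fold_indices(len(x), k, seed)
    errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        # Train only on samples outside the held-out fold.
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(model, x[i]) for i in test_idx]
        errors.append(mean((p - y[i]) ** 2 for p, i in zip(preds, test_idx)))
    return errors


# Toy model (an assumption for illustration): least-squares line through the origin.
def fit_slope(xs, ys):
    return sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)


def predict_slope(slope, x_value):
    return slope * x_value


x = [float(i) for i in range(1, 21)]
y = [2.0 * v for v in x]  # noiseless synthetic data, y = 2x
fold_errors = cross_validate(x, y, k=5, fit=fit_slope, predict=predict_slope)
print(fold_errors)
```

Every sample is used for testing exactly once and for training k-1 times, and averaging the per-fold errors gives the overall estimate. In GS work the per-fold score is often a correlation between predicted and observed values rather than a squared error; that substitution would only change the line that computes the score.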

Variations on Cross-Validation

In k-fold cross-validation, the sample is split into k groups (folds); each fold is held out in turn for testing while the model is trained on the remaining folds. There are a number of variations on this procedure. In stratified cross-validation, the splitting of data into folds is governed by criteria such as ensuring that each fold has the same proportion of observations with a given categorical value, for example the class outcome value. GOBii’s GS Galaxy pipeline currently supports regular k-fold cross-validation and a stratified, or Within or Cross Group, Cross-Validation analysis.
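A stratified split can be sketched as follows. This is one simple way to keep class proportions equal across folds, written for illustration only; it is not taken from the GOBii pipeline, and the function name stratified_k_fold_indices and the example labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_k_fold_indices(labels, k, seed=0):
    """Assign sample indices to k folds, balancing each label across folds."""
    rng = random.Random(seed)

    # Group sample indices by their categorical label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    folds = [[] for _ in range(k)]
    # Sorting the labels keeps the result reproducible; it assumes the
    # labels can be compared with each other (for example, all strings).
    for label in sorted(by_label):
        members = by_label[label]
        rng.shuffle(members)
        # Deal this label's members round-robin so each fold gets an even share.
        for position, index in enumerate(members):
            folds[position % k].append(index)
    return folds


labels = ["A"] * 12 + ["B"] * 8  # hypothetical group labels
folds = stratified_k_fold_indices(labels, k=4)
fold_counts = [Counter(labels[i] for i in fold) for fold in folds]
print(fold_counts)
```

With 12 samples labeled A and 8 labeled B split into 4 folds, each fold receives 3 A and 2 B samples, matching the 60/40 proportion of the full sample. When a group size is not divisible by k, this sketch places the leftover samples in the lowest-numbered folds, so proportions are then only approximately equal.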