46-12 Empirical Evaluation for Models, Cross-Validation Schemes, and Population Structure in Genomic Selection.
See more from this Division: C01 Crop Breeding & Genetics
See more from this Session: Crop Breeding and Genetics: I
Monday, November 16, 2015: 11:05 AM
Minneapolis Convention Center, 101 FG
Abstract:
Recent advances in next-generation sequencing technologies and statistical algorithms make genomic selection feasible for plant breeders. When applying genomic selection to plant breeding programs, several critical questions need to be addressed: 1) Which prediction model should be used? 2) How should cross-validation be performed to assess the predictability of the model? 3) Does population structure affect the prediction? Here, we investigated these questions with two crop populations: a set of 277 diverse maize inbred lines and a set of 1,000 globally collected biomass sorghum accessions. First, we compared five common genomic selection models (rrBLUP, exponential kernel, Gaussian kernel, LASSO, and BayesCπ) for prediction accuracy. Second, we evaluated three cross-validation schemes: k-fold, repeated random sub-sampling, and leave-one-out. Third, we assessed the impact of population structure on genomic selection. Our analysis indicates that: 1) prediction accuracies are similar among the different models; 2) k-fold cross-validation provides “stable” estimates of model predictability; 3) repeated random sub-sampling cross-validation can better assess the effect of sample size; and 4) leave-one-out cross-validation can be used to identify “outlier” observations in the training population. With a good understanding of these questions, we can focus on optimizing the training set, validation set, and testing site to tackle questions such as prediction in the context of broad germplasm and genotype-by-environment interaction.
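The three cross-validation schemes named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' pipeline: the marker matrix and phenotypes are simulated, ridge regression with a fixed shrinkage parameter `lam` stands in for rrBLUP (which would normally estimate variance components), accuracy is taken as the Pearson correlation between predicted and observed phenotypes, and the outlier cutoff of 3 standard deviations is arbitrary.

```python
import numpy as np

def ridge_predict(X_train, y_train, X_test, lam):
    """Ridge regression on markers; a stand-in for rrBLUP with fixed shrinkage."""
    mu = y_train.mean()
    x_mean = X_train.mean(axis=0)
    Xc = X_train - x_mean
    # Dual form: solve an n x n system, which is cheaper when markers >> lines.
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(len(y_train)), y_train - mu)
    return mu + (X_test - x_mean) @ (Xc.T @ alpha)

def k_fold_splits(n, k, rng):
    """Partition shuffled indices into k folds; each fold is the test set once."""
    idx = rng.permutation(n)
    for fold in np.array_split(idx, k):
        yield np.setdiff1d(idx, fold), fold

def random_subsample_splits(n, train_frac, repeats, rng):
    """Draw an independent random train/test split on every repeat."""
    n_train = int(round(train_frac * n))
    for _ in range(repeats):
        idx = rng.permutation(n)
        yield idx[:n_train], idx[n_train:]

def leave_one_out_splits(n):
    """Hold out each observation in turn."""
    for i in range(n):
        yield np.delete(np.arange(n), i), np.array([i])

def pearson(a, b):
    return float(np.corrcoef(a, b)[0, 1])

def pooled_predictions(X, y, splits, lam):
    """Fill in one out-of-sample prediction per line (for non-overlapping test sets)."""
    pred = np.full(len(y), np.nan)
    for train, test in splits:
        pred[test] = ridge_predict(X[train], y[train], X[test], lam)
    return pred

# Simulated data (hypothetical sizes and effect variances).
rng = np.random.default_rng(0)
n_lines, n_markers, lam = 100, 500, 100.0
X = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)  # 0/1/2 genotype codes
beta = rng.normal(0.0, 0.1, size=n_markers)
y = X @ beta + rng.normal(0.0, 1.0, size=n_lines)

# 1) k-fold: a single pooled accuracy estimate.
pred_kfold = pooled_predictions(X, y, k_fold_splits(n_lines, 5, rng), lam)
acc_kfold = pearson(pred_kfold, y)

# 2) Repeated random sub-sampling: vary the training fraction to probe sample size.
acc_by_frac = {}
for frac in (0.5, 0.7, 0.9):
    accs = [
        pearson(ridge_predict(X[tr], y[tr], X[te], lam), y[te])
        for tr, te in random_subsample_splits(n_lines, frac, 20, rng)
    ]
    acc_by_frac[frac] = float(np.mean(accs))

# 3) Leave-one-out: large standardized residuals flag candidate outliers.
pred_loo = pooled_predictions(X, y, leave_one_out_splits(n_lines), lam)
resid = y - pred_loo
z = (resid - resid.mean()) / resid.std()
outlier_idx = np.flatnonzero(np.abs(z) > 3.0)

print("k-fold accuracy:", round(acc_kfold, 3))
print("sub-sampling accuracy by training fraction:", acc_by_frac)
print("candidate outliers (line indices):", outlier_idx.tolist())
```

Pooling predictions works for k-fold and leave-one-out because every line is tested exactly once; in repeated sub-sampling the test sets overlap across repeats, so the sketch averages per-repeat correlations instead.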