L2-Boosting algorithm applied to high-dimensional problems in genomic selection.

Abstract: The L2-Boosting algorithm is one of the most promising machine-learning techniques to have appeared in recent decades. It can be applied to high-dimensional problems such as whole-genome studies, and it is relatively simple from a computational point of view. In this study, we used this algorithm in a genomic selection context to predict yet-to-be-observed outcomes. Two data sets were used: (1) productive lifetime predicted transmitting abilities from 4702 Holstein sires genotyped for 32 611 single nucleotide polymorphisms (SNPs) derived from the Illumina BovineSNP50 BeadChip, and (2) progeny averages of food conversion rate, pre-corrected for environmental and mate effects, in 394 broilers genotyped for 3481 SNPs. Each data set was split into a training set and a testing set, the latter comprising dairy or broiler sires whose ancestors were in the training set. Two weak learners, ordinary least squares (OLS) and non-parametric (NP) regression, were used with the L2-Boosting algorithm to provide a stringent evaluation of the procedure. The algorithm was compared with the Bayesian least absolute shrinkage and selection operator (Bayesian LASSO, BL) and with BayesA regression. Learning tasks were carried out in the training set, whereas the models were validated in the testing set. Pearson correlations between predicted and observed responses in the dairy cattle (broiler) data set were 0.65 (0.33), 0.53 (0.37), 0.66 (0.26) and 0.63 (0.27) for OLS-Boosting, NP-Boosting, BL and BayesA, respectively. The smallest bias and mean-squared errors (MSEs) were obtained with OLS-Boosting in both the dairy cattle (0.08 and 1.08, respectively) and broiler (−0.011 and 0.006, respectively) data sets. In the dairy cattle data set, BL was more accurate (bias = 0.10 and MSE = 1.10) than BayesA (bias = 1.26 and MSE = 2.81), whereas no differences between these two methods were found in the broiler data set.
L2-Boosting with a suitable learner was found to be a competitive alternative for genomic selection applications, providing high accuracy and low bias in genomic-assisted evaluations with a relatively short computational time.
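The abstract names the OLS weak learner but gives no algorithmic detail. The sketch below illustrates componentwise L2-Boosting under the assumption that the OLS weak learner is a univariate regression of the current residuals on one SNP per iteration, with the best-fitting SNP selected and its fit shrunk by a learning rate. It is an illustration only, not the authors' implementation. The function name `l2_boost_ols`, the values of `n_steps` and `nu`, and the simulated genotypes are all hypothetical. The random train/test split also differs from the pedigree-based split used in the study. Because the abstract does not define "bias", the code assumes it is the mean of (observed − predicted).

```python
import numpy as np

def l2_boost_ols(X, y, n_steps=300, nu=0.1):
    """Componentwise L2-Boosting with a univariate OLS weak learner (sketch).

    X is assumed to be column-centred (SNPs in columns); y is the training
    response. Returns the intercept and the accumulated SNP effects.
    """
    intercept = y.mean()                 # valid because X is centred
    resid = y - intercept
    beta = np.zeros(X.shape[1])
    col_ss = (X ** 2).sum(axis=0)        # x_j' x_j for every SNP
    col_ss[col_ss == 0] = np.inf         # monomorphic SNPs are never selected
    for _ in range(n_steps):
        xr = X.T @ resid                 # x_j' r for every SNP
        gain = xr ** 2 / col_ss          # RSS reduction of an OLS fit on x_j
        j = int(np.argmax(gain))         # best-fitting SNP at this step
        b_j = xr[j] / col_ss[j]          # OLS slope of residuals on x_j
        beta[j] += nu * b_j              # shrunken update (learning rate nu)
        resid -= nu * b_j * X[:, j]      # residuals for the next iteration
    return intercept, beta

# Hypothetical simulated data: SNPs coded 0/1/2, 10 causal SNPs.
rng = np.random.default_rng(2010)
n_train, n_test, p = 300, 100, 500
G = rng.binomial(2, 0.3, size=(n_train + n_test, p)).astype(float)
true_eff = np.zeros(p)
true_eff[:10] = np.tile([1.0, -1.0], 5)
y = G @ true_eff + rng.normal(0.0, 1.0, size=n_train + n_test)

G_tr, G_te = G[:n_train], G[n_train:]
y_tr, y_te = y[:n_train], y[n_train:]
means = G_tr.mean(axis=0)                # centre both sets with training means
mu, beta = l2_boost_ols(G_tr - means, y_tr)
pred = mu + (G_te - means) @ beta

r = np.corrcoef(pred, y_te)[0, 1]        # Pearson correlation
mse = np.mean((y_te - pred) ** 2)        # mean-squared error
bias = np.mean(y_te - pred)              # ASSUMED definition of bias
print(f"r={r:.2f}  MSE={mse:.2f}  bias={bias:.3f}")
```

In practice, the number of boosting steps and the shrinkage `nu` control over-fitting and would normally be tuned within the training set, for example by cross-validation, before the testing set is used.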


https://hal-riip.archives-ouvertes.fr/pasteur-00606554
Contributor: Mariella Botta
Submitted on: Wednesday, July 6, 2011 - 8:36:50 PM
Last modification on: Thursday, April 4, 2019 - 11:36:01 AM
Long-term archiving on: Sunday, December 4, 2016 - 6:10:14 AM

File

GENRES92_2010.pdf
Publisher files allowed on an open archive

Citation

Oscar González-Recio, Kent A Weigel, Daniel Gianola, Hugo Naya, Guilherme J M Rosa. L2-Boosting algorithm applied to high-dimensional problems in genomic selection. Genetical Research, 2010, 92 (3), pp. 227–237. ⟨10.1017/S0016672310000261⟩. ⟨pasteur-00606554⟩