A surrogate model based on feature selection techniques and regression learners to improve soybean yield prediction in southern France

Bibliographic Details
Published in	Computers and electronics in agriculture Vol. 192; p. 106578
Main Authors	Corrales, David Camilo, Schoving, Céline, Raynal, Hélène, Debaeke, Philippe, Journet, Etienne-Pascal, Constantin, Julie
Format	Journal Article
Language	English
Published	Amsterdam: Elsevier B.V., 01.01.2022
Subjects	Agricultural sciences; Agronomy; Artificial neural networks; Back propagation networks; Computer Science; Crop yield; Decision trees; Embedded; Filter; France; Grain; Life Sciences; Modeling and Simulation; Neural networks; Regression learners; Regression models; Simulation; Soybeans; STICS; Support vector machines; Wrapper
Summary:	
• Grain yield of soybean cultivars in southern France was poorly represented by STICS.
• A surrogate model was built based on feature selection methods and regression learners.
• This model mostly involves STICS-simulated variables related to plant physiology.
• The surrogate model strongly improved grain yield predictions from STICS simulations.
Empirical and process-based models are currently used to predict crop yield at field and regional levels. A mechanistic model named STICS (Multidisciplinary Simulator for Standard Crops) has been used to simulate soybean grain yield in several environments, including southern France. STICS simulates, at a daily time step, the effects of climate, soil and management practices on plant growth, development and production. Although STICS predicted total aboveground biomass well, it gave poor results for final grain yield. To improve yield prediction, a surrogate model was developed from STICS dynamic simulations, feature selection techniques and regression learners. STICS was used to simulate functional variables at given growth stages and over selected phenological phases. The most representative variables were selected through feature selection techniques (filter, wrapper and embedded), and a subset of variables was used to train the following regression learners: linear regression (LR), support vector regression (SVR), back propagation neural network (BPNN), random forest (RF), Least Absolute Shrinkage and Selection Operator (LASSO) and M5 decision tree. The subsets of variables selected by the wrapper method, combined with the regression models SVR (R² = 0.7102; subset of 6 variables) and LR (R² = 0.6912; subset of 14 variables), provided the best results. The SVR and LR models significantly improved soybean yield predictions in southern France compared with STICS simulations (R² = 0.040).
ISSN:	0168-1699 1872-7107
DOI:	10.1016/j.compag.2021.106578
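
The summary describes wrapper feature selection combined with a regression learner, scored by R². The sketch below illustrates that general idea only and is not the authors' implementation: it assumes greedy forward selection as the wrapper search (the record does not say which search strategy was used), ordinary least squares as a stand-in for LR, and synthetic data in place of STICS-simulated variables. All function names are hypothetical.

```python
import random

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in X]

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def columns(X, idx):
    return [[row[j] for j in idx] for row in X]

def forward_wrapper(X_tr, y_tr, X_va, y_va, max_features):
    """Greedy forward selection: add the feature that most improves validation R2."""
    selected, best = [], float("-inf")
    while len(selected) < max_features:
        scores = {}
        for j in range(len(X_tr[0])):
            if j in selected:
                continue
            idx = selected + [j]
            coef = fit_ols(columns(X_tr, idx), y_tr)
            scores[j] = r2_score(y_va, predict(coef, columns(X_va, idx)))
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best:  # stop when no candidate improves the score
            break
        selected.append(j_best)
        best = scores[j_best]
    return selected, best

# Synthetic data (an assumption, not from the paper): 8 candidate variables,
# of which only columns 0 and 3 actually drive the target.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(8)] for _ in range(200)]
y = [3.0 * r[0] - 2.0 * r[3] + random.gauss(0, 0.5) for r in X]

selected, score = forward_wrapper(X[:150], y[:150], X[150:], y[150:], max_features=4)
print("selected columns:", selected, "validation R2:", round(score, 3))
```

A wrapper differs from a filter in that each candidate subset is scored by actually training the learner, which is why the selected subset can differ between learners, as with the 6-variable SVR subset and the 14-variable LR subset reported in the summary.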