Developing machine learning models for wheat yield prediction using ground-based data, satellite-based actual evapotranspiration and vegetation indices
Timely and accurate crop yield estimation is important for adjusting agronomic management and ensuring agricultural sustainability. Machine learning (ML) algorithms provide new opportunities to integrate agronomic information with ground-based and satellite data and develop flexible yield predictiv...
Published in | European journal of agronomy Vol. 146; p. 126820 |
---|---|
Main Authors | , , , , , , |
Format | Journal Article |
Language | English |
Published | Elsevier B.V. 01.05.2023 |
Summary: | Timely and accurate crop yield estimation is important for adjusting agronomic management and ensuring agricultural sustainability. Machine learning (ML) algorithms provide new opportunities to integrate agronomic information with ground-based and satellite data and develop flexible yield predictive models. In particular, satellite-based vegetation indices and evapotranspiration provide robust proxies for crop yield estimations in the absence of measurements; nevertheless, most prior model development efforts have focused on using only vegetation indices due to the simplicity of the process. Additionally, the contribution of input categories (i.e., field, meteorological, and satellite data) and the use of appropriate proxies, aligned with the crop growth stages, in developing yield predictive models have not been adequately investigated. To address these challenges, we employed two ML techniques, Random Forest (RF) and extreme gradient boosting algorithm (XGB), to estimate wheat yield using meteorological variables, satellite-driven actual evapotranspiration (ETa), and vegetation indices (VIs). ETa was separately computed using the surface energy balance concept and the METRIC model. The models were first trained and tested in the study area using three input combinations: i) meteorological variables, ii) satellite data, and iii) an ensemble of meteorological and satellite data. Then, the best-performing model was further evaluated using two independent datasets. We found ETa to be particularly important in improving the accuracy of the model predictions. Among the vegetation indices, EVI, EVI2, and NDVI during May, and among the meteorological data, growing degree days during the grain filling stage plus minimum temperature in the stem elongation stage had the highest contributions to yield predictions.
Both ML algorithms generated relatively accurate results, where XGB was marginally more accurate than RF, considering an average mean absolute error of 0.39 t ha⁻¹ for XGB and 0.50 t ha⁻¹ for RF. Normalized root-mean-square errors of the ensemble, satellite-derived and meteorological-derived models in XGB were 0.05, 0.07, and 0.10, respectively. Nevertheless, both algorithms’ performances deteriorated in predicting the yield values beyond the range of the training set, though XGB could handle the extrapolation process more efficiently than RF.
• Two ML models were developed for wheat yield prediction using meteorological variables, satellite-driven ETa, and VIs.
• The contribution of meteorological and satellite input data and the importance of each variable were evaluated.
• The extreme gradient boosting algorithm provided slightly more accurate yield predictions than the Random Forest algorithm.
• Training ML algorithms using ensemble data improves their performance.
• Satellite-based ETa (calculated using the METRIC algorithm) provided critical information for accurate yield prediction. |
---|---|
ISSN: | 1161-0301 1873-7331 |
DOI: | 10.1016/j.eja.2023.126820 |
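
The summary reports model accuracy as mean absolute error (MAE, in t ha⁻¹) and normalized root-mean-square error (NRMSE). As an illustrative sketch only, the two metrics could be computed as below. The record does not state how the RMSE was normalized, so dividing by the mean observed yield is an assumption here, and the function names and sample yields are hypothetical rather than taken from the study.

```python
import math


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted yields."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def normalized_rmse(observed, predicted):
    """RMSE divided by the mean observed yield.

    Assumption: normalization by the observed mean. Other conventions
    (for example, dividing by the observed range) are also in common use.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    mean_observed = sum(observed) / len(observed)
    return math.sqrt(mse) / mean_observed


# Hypothetical wheat yields in t/ha, invented purely for illustration.
observed = [4.0, 5.0, 6.0, 5.0]
predicted = [4.5, 5.0, 5.5, 5.0]

mae = mean_absolute_error(observed, predicted)
nrmse = normalized_rmse(observed, predicted)
print(f"MAE = {mae:.2f} t/ha, NRMSE = {nrmse:.3f}")
```

MAE keeps the units of yield, which makes it easy to compare directly with field measurements, while NRMSE is unitless and so allows comparison across datasets with different yield levels.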