Improving pedotransfer functions for predicting soil mineral associated organic carbon by ensemble machine learning

Bibliographic Details
Published in Geoderma Vol. 428; p. 116208
Main Authors Xiao, Yi, Xue, Jie, Zhang, Xianglin, Wang, Nan, Hong, Yongsheng, Jiang, Yefeng, Zhou, Yin, Teng, Hongfen, Hu, Bifeng, Lugato, Emanuele, Richer-de-Forges, Anne C., Arrouays, Dominique, Shi, Zhou, Chen, Songchao
Format Journal Article
Language English
Published Elsevier B.V 15.12.2022
Elsevier
Summary:
• We evaluated the potential of pedotransfer functions for MAOC prediction.
• Forward recursive feature selection performed well in model parsimony and performance.
• Cubist had better model performance than Random Forest and Gradient Boosted Machine.
• Model ensembles improved model performance and robustness.

Soil organic carbon (SOC) sequestration is a promising natural climate solution for capturing atmospheric CO₂, and it also provides crucial co-benefits by improving soil functions and services. Because SOC is not a single, uniform entity, a deep understanding of how SOC responds to environmental change requires information on SOC fractions with distinct characteristics, such as particulate organic carbon (POC) and mineral associated organic carbon (MAOC). Despite their importance, POC and MAOC data are still scarce in soil databases, particularly at broad scales. Pedotransfer functions (PTFs) are a good strategy for estimating missing soil properties, but their application to SOC fractions has been poorly explored. Based on 352 representative mineral topsoil samples (0–20 cm) across Europe, we evaluated the potential of MAOC prediction using machine learning based PTFs (random forest (RF), Cubist, and gradient boosted machine (GBM)) together with predictor selection methods (recursive feature elimination (RFE) and forward recursive feature selection (FRFS)). Repeated validation (100 times) showed that MAOC could be well predicted by machine learning based PTFs (R² of 0.877–0.9, RMSE of 2.994–3.269 g kg⁻¹). RFE effectively reduced the number of predictors from 21 to 12, with performance comparable to that of the models using all predictors. The proposed FRFS algorithm had the best model parsimony, with only 6 predictors (SOC, silt + clay, nitrogen, nitrogen deposition, soil erosion and sand), and performed similarly to or even better than RFE. In combination with FRFS, Cubist performed best among the three machine learning models (R² of 0.9, RMSE of 2.994 g kg⁻¹). Our results also showed that five model ensemble methods had similar performance and improved model accuracy and robustness compared with a single machine learning model. This study provides a valuable reference for coupling PTFs and legacy soil databases to increase the spatial coverage and the performance of machine learning based SOC fraction predictions.
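The summary reports R² and RMSE, a forward predictor selection step, and model ensembling, but does not spell out how any of them is computed. The following is a minimal illustrative sketch in plain Python, not the authors' implementation: `forward_selection` is a generic greedy forward search (the FRFS algorithm itself is not described here), `average_ensemble` is a simple unweighted mean (the five ensemble methods are not named here), and `toy_score`, the predictor names used as keys, and all numbers are hypothetical placeholders rather than data from the study.

```python
from math import sqrt
from statistics import mean


def rmse(observed, predicted):
    """Root mean squared error, in the units of the target (e.g. g kg^-1)."""
    return sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def average_ensemble(prediction_sets):
    """Unweighted mean of several models' predictions, sample by sample."""
    return [mean(values) for values in zip(*prediction_sets)]


def forward_selection(candidates, score, min_gain=1e-6):
    """Greedy forward search: repeatedly add the predictor that lowers the
    error returned by score(subset) the most; stop when the best addition
    improves the error by no more than min_gain."""
    selected = []
    remaining = list(candidates)
    best_error = float("inf")
    while remaining:
        trial_errors = {name: score(selected + [name]) for name in remaining}
        best_name = min(trial_errors, key=trial_errors.get)
        if best_error - trial_errors[best_name] <= min_gain:
            break
        selected.append(best_name)
        remaining.remove(best_name)
        best_error = trial_errors[best_name]
    return selected, best_error


# Hypothetical toy data: observed values and two models' predictions.
observed = [10.0, 20.0, 30.0, 40.0]
model_a = [12.0, 18.0, 33.0, 39.0]
model_b = [8.0, 22.0, 29.0, 43.0]
ensemble = average_ensemble([model_a, model_b])

# Hypothetical score: a made-up error that drops by a fixed amount per
# predictor. In practice this would be a cross-validated RMSE of a model.
toy_gains = {"SOC": 5.0, "silt_clay": 2.0, "noise": 0.0}


def toy_score(subset):
    return 10.0 - sum(toy_gains[name] for name in subset)


selected, error = forward_selection(toy_gains, toy_score)
print(selected, error)
print(rmse(observed, ensemble), r_squared(observed, ensemble))
```

In this toy case the averaged predictions have a lower RMSE than either input model because the two models' errors partly cancel; that is an artifact of the made-up numbers and says nothing about the magnitude of the improvement reported in the study.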
ISSN:0016-7061
1872-6259
DOI:10.1016/j.geoderma.2022.116208