Digital mapping of soil carbon fractions with machine learning

Bibliographic Details
Published in: Geoderma, Vol. 339, pp. 40–58
Main Authors: Keskin, Hamza; Grunwald, Sabine; Harris, Willie G.
Format: Journal Article
Language: English
Publisher: Elsevier B.V., 01.04.2019

Summary: Our understanding of the spatial distribution of soil carbon (C) pools across diverse land uses, soils, and climatic gradients at regional scale is still limited. Research in digital soil mapping and modeling that investigates the interplay between (i) soil C pools and environmental factors (“deterministic trend model”) and (ii) stochastic, spatially dependent variations in soil C fractions (“stochastic model”) is just emerging. This motivated us to investigate soil C pools across the State of Florida, which covers about 150,000 km². Our specific objectives were to (i) compare different soil C pool models that quantify stochastic and/or deterministic components, (ii) assess the prediction performance of the soil C models, and (iii) identify the environmental factors that exert the most control on labile and recalcitrant pools and on soil total C (TC). We used soil data (0–20 cm) collected at 1014 georeferenced sites, including measured bulk density, recalcitrant carbon (RC), labile (hot-water extractable) carbon (HC), and TC. A comprehensive set of 327 geospatial soil-environmental variables was acquired. The Boruta method was employed to identify “all-relevant” soil-environmental predictors. We employed eight methods to predict soil C fractions and TC: Classification and Regression Tree (CART), Bagged Regression Tree (BaRT), Boosted Regression Tree (BoRT), Random Forest (RF), Support Vector Machine (SVM), Partial Least Squares Regression (PLSR), Regression Kriging (RK), and Ordinary Kriging (OK). Overall, 36, 20, and 25 predictors stood out as “all-relevant” for estimating TC, RC, and HC, respectively. With the best model, we predicted a mean of 5.29 ± 3.58 kg TC m⁻² in the top 20 cm. The prediction performance for TC stocks, assessed by the Ratio of Prediction Error to Inter-quartile Range, ranked as follows: RF > SVM > BoRT > BaRT > PLSR > RK > CART > OK. The best models explained 71.6%, 71.7%, and 30.5% of the total variation in TC, RC, and HC, respectively. Biotic and hydro-pedological factors explained most of the variation in soil C pools and TC; lithologic and climatic factors showed some relationship to soil C pools and TC, whereas topographic factors dropped out of the soil C models.

Highlights:
• Biotic and hydro-pedological factors mainly control soil C fractions in Florida.
• Inclusion of the all-relevant variables ensured that all variation was captured.
• The best methods explained 71.6% of the overall variation in recalcitrant carbon.
• The Random Forest model yielded the most satisfactory prediction accuracy.
• No residual spatial autocorrelation remained in any of the evaluated machine learning models.
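The summary ranks the models by the Ratio of Prediction Error to Inter-quartile Range and reports that no residual spatial autocorrelation remained. The record gives neither formula, so the Python sketch below assumes the widely used definition RPIQ = (Q3 - Q1) / RMSE of the observed values (higher is better) and checks residuals with global Moran's I under inverse-distance weights plus a permutation test. The function names, the inverse-distance weighting scheme, and the quartile method are illustrative assumptions, not the authors' implementation.

```python
import math
import random
import statistics


def rpiq(observed, predicted):
    """Ratio of performance to inter-quartile distance: (Q3 - Q1) / RMSE.

    Assumed definition; higher values indicate better predictions.
    Quartiles use statistics.quantiles' default 'exclusive' method,
    which is also an assumption.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    q1, _, q3 = statistics.quantiles(observed, n=4)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    if mse == 0:
        raise ValueError("RMSE is zero; RPIQ is undefined")
    return (q3 - q1) / math.sqrt(mse)


def morans_i(values, coords):
    """Global Moran's I with hypothetical inverse-distance weights w_ij = 1/d_ij, w_ii = 0."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0    # sum of w_ij * dev_i * dev_j over i != j
    w_total = 0.0  # sum of all weights W
    for i in range(n):
        for j in range(n):
            if i != j:
                w = 1.0 / math.dist(coords[i], coords[j])  # assumes distinct sites
                cross += w * dev[i] * dev[j]
                w_total += w
    variance_sum = sum(x * x for x in dev)
    if variance_sum == 0:
        raise ValueError("all values are identical; Moran's I is undefined")
    return (n / w_total) * (cross / variance_sum)


def morans_i_permutation_p(values, coords, n_perm=999, seed=0):
    """Observed I and a one-sided permutation p-value for positive autocorrelation."""
    rng = random.Random(seed)
    observed = morans_i(values, coords)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any spatial pattern, keep the value distribution
        if morans_i(shuffled, coords) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy numbers for illustration only; not data from the study.
    obs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    pred = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5, 7.5, 7.5]
    print(rpiq(obs, pred))  # 9.0

    residuals = [o - p for o, p in zip(obs, pred)]
    sites = [(float(k), 0.0) for k in range(len(obs))]
    i_obs, p_value = morans_i_permutation_p(residuals, sites, n_perm=199)
    print(i_obs, p_value)
```

Each Moran's I evaluation is O(n²) in pure Python, so for on the order of 1000 sites a vectorized implementation would be more practical. The weighting scheme (and any distance cutoff) strongly affects the statistic and would need to match whatever the actual analysis used.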
ISSN: 0016-7061, 1872-6259
DOI: 10.1016/j.geoderma.2018.12.037