Tuning support vector machines regression models improves prediction accuracy of soil properties in MIR spectroscopy
Published in | Geoderma Vol. 365; p. 114227 |
---|---|
Format | Journal Article |
Language | English |
Published | Elsevier B.V., 15.04.2020 |
Summary: | • Multivariate regression modeling is a prerequisite for mid-DRIFTS predictions. • Linear and non-linear models were compared along with non-linear parameterization. • Non-linear models outperformed linear models in predicting soil properties. • Tuning hyperparameters in non-linear models is a soil property- and dataset-specific process. • Tuned models often improve prediction accuracy in mid-DRIFTS of soils.
Estimating soil properties with diffuse reflectance infrared Fourier transform spectroscopy in the mid-infrared region (mid-DRIFTS) relies on statistical modeling (chemometrics) to predict soil properties from spectra. The choice of modeling approach can have a major impact on prediction accuracy. However, the impact of selecting the best parameters for an algorithm (tuning) to optimize non-linear models for predicting soil properties is relatively unexplored in soil science. This study aimed to evaluate the predictive performance of linear (partial least squares, PLS) and non-linear (support vector machines, SVM) multivariate regression models in estimating soil physical, chemical, and biological properties with mid-DRIFTS. We evaluated the impact of optimizing two hyperparameters (epsilon and cost) based on the noise tolerance in the ε-insensitive loss function of SVM models using two contrasting and diverse sets of soils, one from northern Tanzania (n = 533) and another from the US Midwest (n = 400). Regression models were trained on calibration sets (75%) and tested on independent validation sets (25%) separately for each dataset. SVM outperformed PLS models for all tested soil properties (clay, sand, pH, total organic carbon, and permanganate oxidizable carbon) in both datasets. Tuning the hyperparameters epsilon and cost maintained or improved the prediction accuracy of SVM models, based on root mean squared errors of the independent validation sets. The tuned SVM hyperparameters differed among soil properties, and also for the same soil property between datasets, suggesting the need to parameterize non-linear models for specific soil properties and datasets. Optimizing SVM regression models in mid-DRIFTS improves the prediction accuracy of soil properties and will therefore likely yield more robust predictions, even in datasets with diverse land uses, parent materials, and/or soil orders.
We recommend that tuning should be included as a routine step when using SVM for estimating soil properties. |
---|---|
ISSN: | 0016-7061, 1872-6259 |
DOI: | 10.1016/j.geoderma.2020.114227 |
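
The summary refers to two SVM hyperparameters, epsilon and cost, and to root mean squared error (RMSE) as the accuracy measure. The following is a minimal Python sketch of how these quantities are commonly defined for a linear ε-insensitive SVM regression, not the authors' implementation. The function names and the example values are hypothetical and chosen only for illustration.

```python
import math


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Per-sample loss: zero inside the epsilon tube, linear outside it."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]


def svr_objective(weights, y_true, y_pred, epsilon, cost):
    """Standard linear SVR objective: 0.5 * ||w||^2 + cost * sum of tube violations."""
    penalty = sum(epsilon_insensitive_loss(y_true, y_pred, epsilon))
    return 0.5 * sum(w * w for w in weights) + cost * penalty


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical observed and predicted values (not data from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 2.5, 2.0, 4.0]

# A wider tube (larger epsilon) ignores more of the small residuals.
narrow_tube = sum(epsilon_insensitive_loss(observed, predicted, epsilon=0.05))
wide_tube = sum(epsilon_insensitive_loss(observed, predicted, epsilon=0.2))

# A larger cost penalizes the remaining violations more heavily.
low_cost = svr_objective([0.5], observed, predicted, epsilon=0.2, cost=1.0)
high_cost = svr_objective([0.5], observed, predicted, epsilon=0.2, cost=10.0)
```

In this sketch, epsilon sets how much residual noise is tolerated without penalty and cost sets how strongly larger residuals are penalized. This is one way to see why the best values could differ between soil properties and datasets, as the summary reports.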