Convolutional neural network for simultaneous prediction of several soil properties using visible/near-infrared, mid-infrared, and their combined spectra

Bibliographic Details
Published in Geoderma Vol. 352; pp. 251-267
Main Authors Ng, Wartini; Minasny, Budiman; Montazerolghaem, Maryam; Padarian, Jose; Ferguson, Richard; Bailey, Scarlett; McBratney, Alex B.
Format Journal Article
Language English
Published Elsevier B.V. 15.10.2019
Summary: No single instrument can characterize all soil properties because soil is a complex material. As technology has advanced, laboratories have become equipped with a variety of spectrometers. Fusing the output of different spectrometers is expected to give better predictions than using any single spectrometer alone. In this study, the performance of models built on a single spectrometer (visible-near-infrared spectroscopy, vis-NIR, or mid-infrared spectroscopy, MIR) was compared with that of models built on the two combined (vis-NIR and MIR). We selected a total of 14,594 samples from the Kellogg Soil Survey Laboratory (KSSL) database that had both vis-NIR and MIR spectra along with measurements of sand, clay, total C (TC) content, organic C (OC) content, cation exchange capacity (CEC), and pH. The dataset was randomly split into a training set of 75% (n = 10,946) and a test set of the remaining 25% (n = 3,648). Prediction models were constructed with partial least squares regression (PLSR) and the Cubist regression tree model. Additionally, we explored the use of a deep learning model, the convolutional neural network (CNN). We investigated two ways of feeding spectral data to the CNN: as one-dimensional (1D) data (a spectrum) or as two-dimensional (2D) data (a spectrogram). Compared with the PLSR model, the CNN model improved predictions by an average of 33–42% with vis-NIR and 30–43% with MIR spectral input. The relative accuracy improvement of the CNN over the Cubist regression tree model ranged from 22–36% with vis-NIR and 16–27% with MIR spectral input. Various methods of fusing the vis-NIR and MIR spectral data were explored. We compared the performance of spectral concatenation (for the PLSR and Cubist models) with the two-channel input method and the outer product analysis (OPA) method (for the CNN model).
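The three fusion strategies named above can be illustrated as simple array operations. The following is a minimal NumPy sketch, not the authors' implementation: the function names are hypothetical, the data are random placeholders, and the two-channel form assumes that both spectra have already been resampled to a common number of points, which the abstract does not specify.

```python
import numpy as np


def concatenate_spectra(vis_nir, mir):
    """Join spectra end to end: (n, p) and (n, q) -> (n, p + q)."""
    return np.concatenate([vis_nir, mir], axis=1)


def two_channel(vis_nir, mir):
    """Stack spectra as channels: two (n, p) arrays -> (n, p, 2).

    Assumption: both inputs share the same shape (common resampling).
    """
    if vis_nir.shape != mir.shape:
        raise ValueError("two-channel stacking needs spectra of equal shape")
    return np.stack([vis_nir, mir], axis=-1)


def outer_product(vis_nir, mir):
    """Per-sample outer product: (n, p) and (n, q) -> (n, p, q)."""
    return np.einsum("np,nq->npq", vis_nir, mir)


# Random placeholder data: 4 samples, 6 points per spectrum.
rng = np.random.default_rng(0)
vis_nir = rng.random((4, 6))
mir = rng.random((4, 6))

concat = concatenate_spectra(vis_nir, mir)  # input for PLSR or Cubist
stacked = two_channel(vis_nir, mir)         # input for a two-channel 1D CNN
opa = outer_product(vis_nir, mir)           # image-like input for a 2D CNN
```

Concatenation keeps the data one-dimensional, whereas the outer product produces one matrix per sample containing every pairwise product of a vis-NIR point with an MIR point.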
We found that the two-channel 1D CNN model performed best (R² = 0.95–0.98), followed closely by OPA with CNN (R² = 0.93–0.98), the Cubist model with spectral concatenation (R² = 0.91–0.97), the two-channel 2D CNN model (R² = 0.90–0.95), and PLSR with spectral concatenation (R² = 0.87–0.95). Chemometric analysis of spectroscopic data relies on spectral pre-processing, such as spectral trimming, baseline correction, smoothing, and normalization, before the data are fed into the model. The CNN achieved higher performance than the PLSR and Cubist models without using pre-processed spectral data. We also found that the CNN predictions retained correlations among the soil properties similar to those of the actual values, in comparison to other models. Through sensitivity analysis, we identified the important spectral wavelengths used by the CNN model to predict the various soil properties. CNN is an effective model for modelling soil properties from a large spectral library.
• CNN can predict soil properties from vis-NIR and MIR spectra with high accuracy.
• The use of mid-infrared spectra and the CNN model yields highly accurate predictions with R² > 0.95.
• CNN can take multiple input spectra efficiently and does not require spectral pre-processing.
• Unraveling the CNN model enables the assessment of important wavelengths for predicting soil properties.
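One common way to carry out the kind of sensitivity analysis mentioned in the summary is to perturb one band of the spectrum at a time and record how much the prediction changes. The sketch below illustrates that idea only; the abstract does not describe the authors' exact procedure. The function `band_sensitivity`, its `window` and `delta` parameters, and the toy linear model standing in for a trained CNN are all assumptions made for illustration.

```python
import numpy as np


def band_sensitivity(predict, spectra, window=5, delta=0.01):
    """Mean absolute change in prediction per perturbed band.

    predict: callable mapping (n, p) spectra to (n,) predictions.
    window:  number of adjacent points shifted together (hypothetical).
    delta:   size of the additive shift applied to each band (hypothetical).
    """
    baseline = predict(spectra)
    starts = range(0, spectra.shape[1], window)
    scores = np.zeros(len(starts))
    for i, start in enumerate(starts):
        perturbed = spectra.copy()
        perturbed[:, start:start + window] += delta
        scores[i] = np.mean(np.abs(predict(perturbed) - baseline))
    return scores


# Toy stand-in for a trained model: only points 10-14 affect the output.
weights = np.zeros(20)
weights[10:15] = 1.0


def toy_predict(x):
    return x @ weights


rng = np.random.default_rng(0)
spectra = rng.random((8, 20))  # random placeholder spectra
scores = band_sensitivity(toy_predict, spectra)
most_important_band = int(np.argmax(scores))
```

With the toy model, the highest score falls on the band covering points 10 to 14, which is the only region the model uses. With a real model, high-scoring bands would indicate the wavelengths that most influence the predictions.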
ISSN: 0016-7061, 1872-6259
DOI: 10.1016/j.geoderma.2019.06.016