Remote estimates of suspended particulate matter in global lakes using machine learning models

Bibliographic Details
Published in: International Soil and Water Conservation Research, Vol. 12, no. 1, pp. 200-216
Main Authors: Wen, Zhidan; Wang, Qiang; Ma, Yue; Jacinthe, Pierre Andre; Liu, Ge; Li, Sijia; Shang, Yingxin; Tao, Hui; Fang, Chong; Lyu, Lili; Zhang, Baohua; Song, Kaishan
Format: Journal Article
Language: English
Published: Elsevier B.V.; KeAi Communications Co., Ltd, 01.03.2024
Affiliations:
University of Chinese Academy of Sciences, Beijing 100100, China
College of Geography and Urban Planning, Liaocheng University, Liaocheng, China
Northeast Institute of Geography and Agroecology, CAS, Changchun, Jilin, 130102, China
Department of Geology, Indiana University-Purdue University, Indianapolis, IN, 44602, USA

Summary: Suspended particulate matter (SPM) in lakes strongly affects light propagation and aquatic ecosystem productivity, and it co-varies with nutrients, heavy metals, and micro-pollutants in the water. In lakes, SPM absorbs and backscatters light strongly, which ultimately alters the water-leaving signals detected by satellite sensors. Simple regression models based on specific bands or band ratios have been widely used to estimate SPM in the past, with moderate accuracy. There is still room to improve model accuracy, and machine learning models may capture the non-linear relationships between spectral variables and SPM. We assembled more than 16,400 in situ SPM measurements from lakes on six continents (all except Antarctica), of which 9640 samples were matched with Landsat overpasses within ±7 days. Seven machine learning algorithms and two simple regression methods (linear and partial least squares models) were used to estimate lake SPM, and their performance was compared. To overcome the problem of imbalanced datasets in regression, the Synthetic Minority Over-Sampling Technique for Regression with Gaussian Noise (SMOGN) was adopted. The comparison showed that the gradient boosting decision tree (GBDT), random forest (RF), and extreme gradient boosting (XGBoost) models demonstrated good spatiotemporal transferability with the SMOGN-processed dataset and have the potential to map SPM in different years from good-quality Landsat land surface reflectance images. Among all the tested modeling approaches, the GBDT model achieved accurate calibration (n = 6428, R² = 0.95, MAPE = 29.8%) on SPM collected from 2235 lakes worldwide, and its validation (n = 3214, R² = 0.84, MAPE = 38.8%) also showed stable performance. The RF model also performed well on both the calibration (R² = 0.93) and validation (R² = 0.86, MAPE = 24.2%) datasets.
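To make the reported accuracy figures concrete, the sketch below computes R² and MAPE for paired observed and predicted SPM values. It is a minimal illustration, not the authors' code: it assumes R² is the coefficient of determination (1 - SS_res/SS_tot), whereas some studies instead report the squared Pearson correlation between predicted and measured values, and the helper names `r_squared` and `mape` and the SPM values are hypothetical. The reported sample sizes also correspond to a 2:1 calibration/validation split (6428 = 2 × 3214).

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed R² definition)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observations are equal")
    return 1.0 - ss_res / ss_tot


def mape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent, relative to the observed values."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    if any(o <= 0 for o in observed):
        raise ValueError("MAPE is undefined for non-positive observations")
    total = sum(abs(p - o) / o for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# Hypothetical SPM values in mg/L, for illustration only (not data from the study).
obs = [10.0, 20.0, 40.0, 80.0]
pred = [12.0, 18.0, 44.0, 72.0]
print(f"R2 = {r_squared(obs, pred):.3f}, MAPE = {mape(obs, pred):.1f}%")  # R2 = 0.969, MAPE = 12.5%
```

Because MAPE divides by each observation, errors on low-SPM (clear-water) samples weigh more heavily than equal absolute errors on turbid samples, which is one reason R² and MAPE can rank models differently.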
We applied the GBDT and RF models to map SPM in typical lakes and obtained satisfactory results. In addition, the GBDT model was evaluated against historical SPM measurements coincident with different Landsat sensors (L5-TM, L7-ETM+, and L8-OLI). The model therefore has the potential to map lake SPM, monitor its temporal variations, and track lake-water SPM dynamics over approximately the past four decades (1984–2021), since the launch of Landsat-5/TM in 1984.
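The ±7-day matchup between in situ samples and Landsat overpasses, and the pairing of historical measurements with the L5-TM, L7-ETM+, and L8-OLI sensors, can be illustrated with a small date-window search. This is a hypothetical sketch rather than the authors' procedure: the `Overpass` record, the `match_overpass` helper, the rule of keeping only the overpass nearest in time, and all dates are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Overpass:
    """Hypothetical record of one Landsat acquisition over a lake."""
    acquired_on: date
    sensor: str  # e.g. "L5-TM", "L7-ETM+", "L8-OLI"


def match_overpass(sampled_on: date, overpasses: List[Overpass],
                   window_days: int = 7) -> Optional[Overpass]:
    """Return the overpass closest in time to sampled_on within +/- window_days, or None."""
    window = timedelta(days=window_days)
    in_window = [o for o in overpasses if abs(o.acquired_on - sampled_on) <= window]
    if not in_window:
        return None
    # Keep the smallest time gap; on a tie, min() returns the earlier list entry.
    return min(in_window, key=lambda o: abs(o.acquired_on - sampled_on))


# Hypothetical overpass list for one lake; dates are illustrative only.
overpasses = [
    Overpass(date(2010, 6, 1), "L5-TM"),
    Overpass(date(2010, 6, 9), "L7-ETM+"),
    Overpass(date(2010, 6, 17), "L5-TM"),
]
print(match_overpass(date(2010, 6, 7), overpasses))   # matches the 2010-06-09 L7-ETM+ scene
print(match_overpass(date(2010, 7, 15), overpasses))  # None: no overpass within 7 days
```

Keeping only the nearest overpass is one simple choice; keeping every scene inside the window and then filtering by image quality would be another.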
ISSN: 2095-6339
DOI: 10.1016/j.iswcr.2023.07.002