Spatiotemporal Seamless Estimation of Global Surface Soil Moisture Using Triple Collocation, Machine Learning, and Data Assimilation

Bibliographic Details
Published in IEEE Transactions on Geoscience and Remote Sensing, Vol. 63, pp. 1-16
Main Authors Xu, Lei; Ye, Zhenni; Dai, Jin; Li, Qi; Hong, Youting; Tao, Yun; Yu, Hongchu; Zhang, Chong; Chen, Zeqiang; Chen, Nengcheng
Format Journal Article
Language English
Published New York IEEE 2025
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: Accurate and spatiotemporally seamless soil moisture (SM) products are important for hydrological drought monitoring and agricultural water management. Currently, physically based process models with data assimilation (DA) are widely used to generate global seamless SM products, such as SM Active Passive Level 4 (SMAP L4), the land component of the fifth generation of European Reanalysis (ERA5-land), and Global Land DA System Noah (GLDAS-Noah). These datasets are usually produced on high-performance computing platforms and are subject to uncertainties from model structure and parameters, which limits how flexibly they can be applied at local or global scales. Here, we propose a data-driven artificial intelligence (AI)-based method to generate spatiotemporally seamless daily SM data using triple collocation (TC), machine learning (ML), and DA. Specifically, the TC correlation coefficients (TC-Rs) method is first employed to combine different SM datasets into high-accuracy label data for model training. A light gradient boosting machine (LightGBM) ML model is then constructed to simulate global daily SM at 0.25° in an autoregressive way, using ERA5 meteorological forcings and MSWEP precipitation data as inputs. In addition, the satellite-based SMAP Level 3 (SMAP L3) SM product is assimilated into the developed ML model using the simple Newtonian nudging technique to update the simulated SM states. Incorporating DA into ML mimics the design of physical models and leaves considerable room for adaptable SM simulations. The developed data-driven model is examined over global land areas from March 31, 2015 to May 31, 2023 with a ten-fold cross-validation scheme and evaluated using 1094 in situ SM stations from the International Soil Moisture Network (ISMN).
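The autoregressive simulation with Newtonian nudging described above can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not give the nudging equation or gain, so the common form analysis = forecast + gain × (observation − forecast) is assumed here. The names `nudged_rollout`, `toy_step`, and the gain of 0.5 are hypothetical, and `toy_step` is a simple stand-in for a trained LightGBM model.

```python
import numpy as np

def nudged_rollout(step, forcings, obs, sm0, gain=0.5):
    """Autoregressive SM simulation with Newtonian nudging (illustrative).

    step     : callable (prev_sm, forcing_row) -> forecast SM; stands in
               for a trained ML model such as LightGBM
    forcings : array of shape (T, n_features), one row per day
    obs      : array of shape (T,), NaN where no satellite retrieval exists
    sm0      : initial soil moisture state (m3/m3)
    gain     : nudging weight in [0, 1]; assumed value, not from the paper
    """
    sm = np.empty(len(forcings))
    prev = sm0
    for t, forcing in enumerate(forcings):
        forecast = step(prev, forcing)
        if np.isnan(obs[t]):
            # No observation today: keep the model forecast unchanged.
            analysis = forecast
        else:
            # Relax the forecast toward the observation.
            analysis = forecast + gain * (obs[t] - forecast)
        sm[t] = analysis
        prev = analysis  # the updated state feeds the next time step
    return sm

def toy_step(prev_sm, forcing):
    """Hypothetical model: exponential dry-down plus a precipitation term."""
    precip = forcing[0]
    return float(np.clip(0.9 * prev_sm + 0.01 * precip, 0.0, 0.6))

forcings = np.array([[0.0], [5.0], [0.0]])  # daily precipitation (mm)
obs = np.array([np.nan, 0.30, np.nan])      # one retrieval on day 2
sm = nudged_rollout(toy_step, forcings, obs, sm0=0.20)
```

Because the nudged state is passed to the next step, a single observation also influences later days on which no retrieval is available.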
The results indicate that the ML-based assimilated SM dataset (ML-DA) achieves a median correlation (R) of 0.741 and an unbiased root mean square error (ubRMSE) of 0.0437 m³/m³, outperforming SMAP L4 (R = 0.717 and ubRMSE = 0.0452 m³/m³), ERA5-land (R = 0.706 and ubRMSE = 0.0452 m³/m³), and GLDAS (R = 0.633 and ubRMSE = 0.0501 m³/m³). Compared with the three model-based SM products, the ML-DA dataset performs better across time and space and across dry and wet zones. Therefore, the developed ML-DA framework offers significant potential for accurate, spatiotemporally seamless SM simulation globally.
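For readers unfamiliar with the two evaluation metrics, the sketch below computes them with their standard definitions: Pearson correlation, and ubRMSE as the RMSE after removing the mean of each series. The function names and sample values are illustrative only and are not taken from the paper.

```python
import numpy as np

def pearson_r(est, ref):
    """Pearson correlation between estimated and reference SM series."""
    est = np.asarray(est, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return float(np.corrcoef(est, ref)[0, 1])

def ubrmse(est, ref):
    """Unbiased RMSE: RMSE computed on mean-removed series."""
    est = np.asarray(est, dtype=float)
    ref = np.asarray(ref, dtype=float)
    diff = (est - est.mean()) - (ref - ref.mean())
    return float(np.sqrt(np.mean(diff ** 2)))

# Made-up in situ series and an estimate with a constant wet bias.
ref = np.array([0.10, 0.20, 0.30, 0.25])
est = ref + 0.05
r = pearson_r(est, ref)
ub = ubrmse(est, ref)
```

A constant bias leaves both metrics unaffected, which is why ubRMSE is usually reported alongside R when products with different climatologies are compared.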
ISSN: 0196-2892, 1558-0644
DOI: 10.1109/TGRS.2025.3568034