A Hybrid Regression–Kriging–Machine Learning Framework for Imputing Missing TROPOMI NO2 Data over Taiwan

Bibliographic Details
Published in: Remote Sensing (Basel, Switzerland), Vol. 17, no. 12, p. 2084
Main Authors: Valerio, Alyssa; Chen, Yi-Chun; Liu, Chian-Yi; Chen, Yi-Ying; Lin, Chuan-Yao
Format: Journal Article
Language: English
Published: Basel, MDPI AG, 17.06.2025
Summary: This study presents a novel application of a hybrid regression–kriging (RK) and machine learning (ML) framework to impute missing tropospheric NO2 data from the TROPOMI satellite over Taiwan during the winter months of January, February, and December 2022. The proposed approach combines geostatistical interpolation with nonlinear modeling by integrating RK with ML models, specifically comparing gradient boosting regression (GBR), random forest (RF), and K-nearest neighbors (KNN), to determine the most suitable auxiliary predictor. This structure enables the framework to capture both spatial autocorrelation and complex relationships between NO2 concentrations and environmental drivers. Model performance was evaluated using the coefficient of determination (r²), computed against observed TROPOMI NO2 column values filtered by quality assurance criteria. GBR achieved the highest validation r² values of 0.83 for January and February, while RF yielded 0.82 and 0.79 in January and December, respectively. These results demonstrate the model’s robustness in capturing intra-seasonal patterns and nonlinear trends in NO2 distribution. In contrast, models using only static land cover inputs performed poorly (r² < 0.58), emphasizing the limited predictive capacity of such variables in isolation. Interpretability analysis using the SHapley Additive exPlanations (SHAP) method revealed temperature as the most influential meteorological driver of NO2 variation, particularly during winter, while forest cover consistently emerged as a key land-use factor mitigating NO2 levels through dry deposition. By integrating dynamic meteorological variables and static land cover features, the hybrid RK–ML framework enhances the spatial and temporal completeness of satellite-derived air quality datasets. As the first RK–ML application for TROPOMI data in Taiwan, this study establishes a regional benchmark and offers a transferable methodology for satellite data imputation. Future research should explore ensemble-based RK variants, incorporate real-time auxiliary data, and assess transferability across diverse geographic and climatological contexts.
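
The regression–kriging idea summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the trend model is an ordinary least-squares fit standing in for the GBR, RF, or KNN predictors the study compares, and the exponential variogram parameters (`sill`, `range_`) are assumed rather than fitted. All function and variable names are hypothetical. The sketch fits a trend on an auxiliary covariate, kriges the trend residuals at the locations marked missing, adds the two parts, and scores the imputed values with r².

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination between observed and predicted values."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def exp_variogram(h, sill=1.0, range_=1.0):
    """Exponential semivariogram without nugget (parameters assumed, not fitted)."""
    return sill * (1.0 - np.exp(-h / range_))

def ordinary_krige(xy_obs, z_obs, xy_new, **vg):
    """Ordinary kriging of z_obs from xy_obs onto xy_new."""
    n = len(xy_obs)
    d_oo = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=2)
    d_no = np.linalg.norm(xy_new[:, None, :] - xy_obs[None, :, :], axis=2)
    # Kriging system: variogram matrix bordered by the unbiasedness constraint.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_variogram(d_oo, **vg)
    A[n, n] = 0.0
    B = np.ones((n + 1, len(xy_new)))
    B[:n, :] = exp_variogram(d_no, **vg).T
    weights = np.linalg.solve(A, B)[:n]  # drop the Lagrange multiplier row
    return weights.T @ z_obs

def regression_kriging(xy_obs, X_obs, z_obs, xy_new, X_new, **vg):
    """Trend from covariates plus kriged trend residuals."""
    D_obs = np.column_stack([np.ones(len(X_obs)), X_obs])
    D_new = np.column_stack([np.ones(len(X_new)), X_new])
    # Least-squares trend: a simple stand-in for the ML predictor.
    beta, *_ = np.linalg.lstsq(D_obs, z_obs, rcond=None)
    resid = z_obs - D_obs @ beta
    return D_new @ beta + ordinary_krige(xy_obs, resid, xy_new, **vg)

# Synthetic example: a covariate-driven signal plus a smooth spatial field.
rs = np.random.default_rng(0)
n = 200
xy = rs.uniform(0.0, 1.0, size=(n, 2))   # pixel coordinates (arbitrary units)
covariate = rs.normal(size=n)            # hypothetical auxiliary variable
z = (2.0 * covariate + np.sin(3 * xy[:, 0]) + np.cos(3 * xy[:, 1])
     + 0.05 * rs.normal(size=n))

missing = rs.random(n) < 0.25            # pixels treated as gaps
obs = ~missing
z_hat = regression_kriging(xy[obs], covariate[obs, None], z[obs],
                           xy[missing], covariate[missing, None],
                           sill=1.0, range_=0.3)
score = r2_score(z[missing], z_hat)
print(f"r2 on withheld pixels: {score:.3f}")
```

Because the variogram has no nugget, the kriging step reproduces the residuals exactly at observed locations, so the combined prediction matches the observations there and differs only at the gaps. A real application would fit the variogram to the residuals and validate against quality-filtered retrievals, as the summary describes.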
ISSN: 2072-4292
DOI: 10.3390/rs17122084