A Hybrid Regression–Kriging–Machine Learning Framework for Imputing Missing TROPOMI NO2 Data over Taiwan
Published in | Remote sensing (Basel, Switzerland) Vol. 17; no. 12; p. 2084 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | Basel: MDPI AG, 17.06.2025 |
Subjects | |
Summary: | This study presents a novel application of a hybrid regression–kriging (RK) and machine learning (ML) framework to impute missing tropospheric NO2 data from the TROPOMI satellite over Taiwan during the winter months of January, February, and December 2022. The proposed approach combines geostatistical interpolation with nonlinear modeling by integrating RK with ML models, specifically comparing gradient boosting regression (GBR), random forest (RF), and K-nearest neighbors (KNN), to determine the most suitable auxiliary predictor. This structure enables the framework to capture both spatial autocorrelation and complex relationships between NO2 concentrations and environmental drivers. Model performance was evaluated using the coefficient of determination (r²), computed against observed TROPOMI NO2 column values filtered by quality assurance criteria. GBR achieved the highest validation r² values of 0.83 for January and February, while RF yielded 0.82 and 0.79 in January and December, respectively. These results demonstrate the model’s robustness in capturing intra-seasonal patterns and nonlinear trends in NO2 distribution. In contrast, models using only static land cover inputs performed poorly (r² < 0.58), emphasizing the limited predictive capacity of such variables in isolation. Interpretability analysis using the SHapley Additive exPlanations (SHAP) method revealed temperature as the most influential meteorological driver of NO2 variation, particularly during winter, while forest cover consistently emerged as a key land-use factor mitigating NO2 levels through dry deposition. By integrating dynamic meteorological variables and static land cover features, the hybrid RK–ML framework enhances the spatial and temporal completeness of satellite-derived air quality datasets. As the first RK–ML application for TROPOMI data in Taiwan, this study establishes a regional benchmark and offers a transferable methodology for satellite data imputation. Future research should explore ensemble-based RK variants, incorporate real-time auxiliary data, and assess transferability across diverse geographic and climatological contexts. |
---|---|
ISSN: | 2072-4292 |
DOI: | 10.3390/rs17122084 |
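
The summary describes regression–kriging as a trend model fitted on auxiliary predictors plus a geostatistical interpolation of that model's residuals, scored with the coefficient of determination. The following is a minimal illustrative sketch of that structure, not the authors' implementation. Several things are assumptions made only for illustration: a one-variable least-squares line stands in for the ML trend model (GBR, RF, or KNN in the study), the exponential variogram uses fixed rather than fitted parameters, and the coordinates, temperatures, and NO2 values are invented toy numbers. All function names are hypothetical.

```python
import math

def fit_line(x, y):
    """Least-squares slope and intercept; a simple stand-in for the ML trend model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx

def variogram(h, sill=1.0, rng=2.0):
    """Exponential semivariogram with assumed (not fitted) sill and range, no nugget."""
    return sill * (1.0 - math.exp(-h / rng))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ordinary_kriging(points, values, target):
    """Ordinary kriging estimate at `target`; weights are constrained to sum to 1."""
    n = len(points)
    a = [[variogram(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])  # unbiasedness constraint (Lagrange multiplier row)
    b = [variogram(math.dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]
    return sum(w * v for w, v in zip(weights, values))

def rk_predict(points, covariate, observed, target, target_cov):
    """Regression-kriging: trend prediction plus kriged trend residual."""
    slope, intercept = fit_line(covariate, observed)
    residuals = [z - (slope * c + intercept) for c, z in zip(covariate, observed)]
    trend = slope * target_cov + intercept
    return trend + ordinary_kriging(points, residuals, target)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Toy data (invented): pixel coordinates, temperature in deg C, NO2 in arbitrary units.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
temperature = [18.0, 17.5, 16.0, 15.5, 15.0]
no2 = [5.2, 4.9, 3.8, 3.9, 3.1]

# Impute a "missing" pixel at (1.5, 0.5) with an assumed temperature of 16.5.
imputed = rk_predict(points, temperature, no2, (1.5, 0.5), 16.5)
```

Because the variogram has no nugget, the kriging step reproduces the residuals exactly at sampled locations, so this sketch returns the observed value at any observed pixel. A realistic evaluation would therefore compute `r_squared` on held-out pixels rather than on the training pixels.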