A two-stage algorithm to estimate ground-level PM2.5 concentration levels in Madrid (Spain) from AOD satellite data and surface proxies


Bibliographic Details
Published in: Atmospheric Pollution Research, Vol. 16, no. 12, p. 102678
Main Author: Cordero, J.M.
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.12.2025
Summary: Poor air quality in urban areas is an important health risk; therefore, reducing population exposure to pollutants such as PM2.5 is a major concern. Health assessments of this pollutant have typically estimated population exposure from measurements taken by urban networks of Air Quality Monitoring Stations (AQMS). However, the methods used to spatially interpolate these observations often lack a solid physical basis. Mesoscale air quality models provide ground-level concentrations at high spatiotemporal resolution based on urban features, including the distribution of pollution sources; however, they are subject to significant uncertainty. In this work, a novel methodology is presented to produce 1 km² resolution maps of ground-level PM2.5 concentration for the Municipality of Madrid during 2015. To this end, several datasets were used: meteorology, satellite observations of atmospheric optical depth (AOD) from MAIAC, emission data, population, land use, and vegetation land cover. We then applied extreme gradient boosting (XGBoost) machine learning algorithms in two steps: first to fill gaps in the AOD field, and then to estimate ground-level PM2.5 concentration. The predictions of the resulting algorithm, called 2_step_XGBoost, were compared with all available ground-level PM2.5 observations from the AQMS in Madrid, yielding a coefficient of determination (r²) of 0.96, an RMSE of 1.5 μg/m³, and negligible bias. Additionally, a 10-fold cross-validation confirmed the robustness of the algorithm and the independence of the training dataset (r² of 0.94 ± 0.01, RMSE of 0.40 ± 0.04, and MAE of 0.22 ± 0.02). These results highlight the reliability of this approach for future urban health analyses.
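The two-step structure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a k-nearest-neighbours regressor (the hypothetical `knn_predict`) stands in for XGBoost so that the code runs with the Python standard library alone, and `two_stage_estimate` and `scores` are likewise hypothetical names. `scores` defines r² as 1 - SS_res/SS_tot; the abstract does not say whether its r² is this quantity or the squared Pearson correlation.

```python
import math
from typing import List, Optional, Sequence, Tuple

Row = Sequence[float]


def knn_predict(train_X: List[Row], train_y: List[float], query: Row, k: int = 2) -> float:
    """Stand-in regressor (not XGBoost): mean target of the k nearest training rows."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def two_stage_estimate(
    covariates: List[Row],
    aod: List[Optional[float]],
    pm25: List[Optional[float]],
    k: int = 2,
) -> Tuple[List[float], List[float]]:
    """Stage 1 fills AOD gaps; stage 2 maps covariates + filled AOD to PM2.5."""
    # Stage 1: train only where AOD was retrieved (not None), predict the gaps.
    seen = [i for i, a in enumerate(aod) if a is not None]
    X1 = [covariates[i] for i in seen]
    y1 = [aod[i] for i in seen]
    aod_filled = [
        a if a is not None else knn_predict(X1, y1, covariates[i], k)
        for i, a in enumerate(aod)
    ]
    # Stage 2: append the gap-filled AOD to the covariates, train only where
    # PM2.5 was measured (stations), then predict for every row (grid cell).
    features = [list(covariates[i]) + [aod_filled[i]] for i in range(len(aod))]
    stations = [i for i, p in enumerate(pm25) if p is not None]
    X2 = [features[i] for i in stations]
    y2 = [pm25[i] for i in stations]
    pm25_hat = [knn_predict(X2, y2, f, k) for f in features]
    return aod_filled, pm25_hat


def scores(obs: Sequence[float], pred: Sequence[float]) -> dict:
    """r2 as 1 - SS_res/SS_tot, plus RMSE, MAE and mean bias (pred - obs)."""
    n = len(obs)
    mean_obs = sum(obs) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(o - p) for o, p in zip(obs, pred)) / n,
        "bias": sum(p - o for o, p in zip(obs, pred)) / n,
    }
```

Replacing `knn_predict` with a gradient-boosted regressor would not change the data flow: stage 1 only ever trains on rows with a retrieved AOD value, stage 2 only on rows with a station measurement, and both then predict for every row.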
In addition, a Feature Importance (FI) analysis showed that 2_step_XGBoost identified the planetary boundary layer height (PBLH) as the most influential variable, whereas AOD had relatively low explanatory power, a result that should be tested in other case studies.
Highlights:
• New two-step XGBoost algorithm to estimate ambient PM2.5 concentrations.
• Atmospheric Optical Depth from MAIAC was studied as a feature.
• Annual average PM2.5 at 1 km² resolution obtained for the City of Madrid.
• The model achieved high accuracy with cross-validated R² of 0.96 and RMSE of 1.5 μg/m³.
• AOD was found to be strongly dependent on the PBL height.
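The abstract does not state how feature importance was computed; XGBoost itself reports built-in importances such as gain, weight, and cover. As a model-agnostic illustration of the idea, the sketch below swaps in permutation importance instead: shuffle one feature column and measure how much the RMSE grows. It is not necessarily the authors' method, and the names `permutation_importance`, `pblh`, `aod` and the toy model are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Sequence[float]


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root-mean-square error between observations and predictions."""
    return (sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)) ** 0.5


def permutation_importance(
    predict: Callable[[List[Row]], List[float]],
    X: List[Row],
    y: Sequence[float],
    names: Sequence[str],
    n_repeats: int = 5,
    seed: int = 0,
) -> Dict[str, float]:
    """Mean RMSE increase when each feature column is randomly shuffled.

    A large increase means the model relies on that feature; a value near
    zero means the model largely ignores it. (Permutation importance, used
    here as a stand-in for XGBoost's built-in importance scores.)
    """
    rng = random.Random(seed)
    baseline = rmse(y, predict(X))
    importance = {}
    for j, name in enumerate(names):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            increases.append(rmse(y, predict(X_perm)) - baseline)
        importance[name] = sum(increases) / n_repeats
    return importance
```

With a toy model whose predictions depend only on the first column, shuffling the second column leaves the error unchanged, so its importance comes out as exactly zero while the first column's is positive.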
ISSN: 1309-1042
DOI: 10.1016/j.apr.2025.102678