A multi-resolution ensemble model of three decision-tree-based algorithms to predict daily NO₂ concentration in France 2005-2022
Published in | Environmental research p. 119241 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | Netherlands 27.05.2024 |
Subjects | |
Summary: | Understanding and managing the health effects of Nitrogen Dioxide (NO₂) requires high resolution spatiotemporal exposure maps. Here, we developed a multi-stage multi-resolution ensemble model that predicts daily NO₂ concentration across continental France from 2005 to 2022. Innovations of this work include the computation of daily predictions at a 200m resolution in large urban areas and the use of a spatio-temporal blocking procedure to avoid data leakage and ensure fair performance estimation. Predictions were obtained after three cascading stages of modeling: (1) predicting NO₂ total column density from Ozone Monitoring Instrument satellite; (2) predicting daily NO₂ concentrations at a 1km spatial resolution using a large set of potential predictors such as predictions obtained from stage 1, land-cover and road traffic data; and (3) predicting residuals from stage 2 models at a 200m resolution in large urban areas. The latter two stages used a generalized additive model to ensemble predictions of three decision-tree algorithms (random forest, extreme gradient boosting and categorical boosting). Cross-validated performances of our ensemble models were overall very good, with a ten-fold cross-validated R² for the 1 km model of 0.83, and of 0.69 for the 200 m model. All three basis learners participated in the ensemble predictions to various degrees depending on time and space. In sum, our multi-stage approach was able to predict daily NO₂ concentrations with a relatively low error. Ensembling the predictions maximizes the chance of obtaining accurate values if one basis learner fails in a specific area or at a particular time, by relying on the other learners. To the best of our knowledge, this is the first study aiming to predict NO₂ concentrations in France with such a high spatiotemporal resolution, large spatial extent, and long temporal coverage. Exposure estimates are available to investigate NO₂ health effects in epidemiological studies. |
---|---|
ISSN: | 1096-0953 |
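
The spatio-temporal blocking procedure mentioned in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not describe their blocking scheme, so the 50 km cell size, the 30-day window, the greedy fold balancing, and the function names `block_id` and `blocked_folds` are all assumptions made for illustration. Only the ten-fold setting comes from the summary. The idea is that observations close in space and time share a block, and a block is never split between training and test folds, which limits leakage from autocorrelated neighbours.

```python
from collections import defaultdict


def block_id(x_m, y_m, day, cell_m=50_000, window_days=30):
    """Map an observation to a (spatial cell, time window) block.

    cell_m and window_days are hypothetical block sizes, not values
    taken from the article.
    """
    return (int(x_m // cell_m), int(y_m // cell_m), int(day // window_days))


def blocked_folds(observations, n_folds=10, cell_m=50_000, window_days=30):
    """Assign observation indices to folds without splitting any block.

    observations: iterable of (x_m, y_m, day) tuples, with projected
    coordinates in metres and day as an integer day index.
    Returns a list of n_folds lists of observation indices.
    """
    # Group observation indices by their spatio-temporal block.
    blocks = defaultdict(list)
    for i, (x_m, y_m, day) in enumerate(observations):
        blocks[block_id(x_m, y_m, day, cell_m, window_days)].append(i)

    # Greedy balancing (an assumption): place the largest blocks first,
    # each into whichever fold currently holds the fewest observations.
    folds = [[] for _ in range(n_folds)]
    for key in sorted(blocks, key=lambda k: (-len(blocks[k]), k)):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(blocks[key])
    return folds
```

Each fold would then serve once as the held-out set while the models are fitted on the remaining folds, so the reported cross-validated R² reflects predictions for blocks the models never saw during training.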