A multi-resolution ensemble model of three decision-tree-based algorithms to predict daily NO₂ concentration in France 2005-2022

Bibliographic Details
Published in Environmental research p. 119241
Main Authors Barbalat, Guillaume, Hough, Ian, Dorman, Michael, Lepeule, Johanna, Kloog, Itai
Format Journal Article
Language English
Published Netherlands 27.05.2024
Summary: Understanding and managing the health effects of nitrogen dioxide (NO₂) requires high-resolution spatiotemporal exposure maps. Here, we developed a multi-stage, multi-resolution ensemble model that predicts daily NO₂ concentration across continental France from 2005 to 2022. Innovations of this work include the computation of daily predictions at a 200 m resolution in large urban areas and the use of a spatio-temporal blocking procedure to avoid data leakage and ensure fair performance estimation. Predictions were obtained after three cascading stages of modeling: (1) predicting NO₂ total column density from Ozone Monitoring Instrument satellite data; (2) predicting daily NO₂ concentrations at a 1 km spatial resolution using a large set of potential predictors, such as the predictions obtained from stage 1, land-cover data and road traffic data; and (3) predicting residuals from the stage 2 models at a 200 m resolution in large urban areas. The latter two stages used a generalized additive model to ensemble the predictions of three decision-tree algorithms (random forest, extreme gradient boosting and categorical boosting). Cross-validated performance of our ensemble models was overall very good, with a ten-fold cross-validated R² of 0.83 for the 1 km model and of 0.69 for the 200 m model. All three basis learners contributed to the ensemble predictions to varying degrees depending on time and space. In sum, our multi-stage approach was able to predict daily NO₂ concentrations with a relatively low error. Ensembling the predictions maximizes the chance of obtaining accurate values if one basis learner fails in a specific area or at a particular time, by relying on the other learners. To the best of our knowledge, this is the first study aiming to predict NO₂ concentrations in France with such a high spatiotemporal resolution, large spatial extent, and long temporal coverage. Exposure estimates are available to investigate NO₂ health effects in epidemiological studies.
ISSN: 1096-0953
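
The summary mentions a spatio-temporal blocking procedure used to avoid data leakage during cross-validation. The record does not describe how the authors built their blocks, so the following is only a minimal sketch of the general idea: observations that are close in space and time are grouped into one block, and a whole block is always assigned to a single fold. The function names `block_id` and `assign_folds`, the 50 km cell size and the 30-day window are hypothetical choices made for illustration, not values taken from the study.

```python
def block_id(x_km, y_km, day, cell_km=50.0, window_days=30):
    """Map one observation to a (grid column, grid row, time window) block.

    x_km, y_km: projected station coordinates in kilometres (assumed).
    day: integer day index since the start of the study period (assumed).
    cell_km, window_days: hypothetical block sizes, not from the paper.
    """
    return (int(x_km // cell_km), int(y_km // cell_km), day // window_days)


def assign_folds(observations, n_folds=10, cell_km=50.0, window_days=30):
    """Return one fold index per observation.

    All observations sharing a block get the same fold, so a model is never
    validated on data from a block it also saw during training.
    """
    ids = [block_id(x, y, d, cell_km, window_days) for x, y, d in observations]
    # Sort the distinct blocks so the assignment is deterministic,
    # then deal them out to folds in round-robin order.
    fold_of_block = {b: i % n_folds for i, b in enumerate(sorted(set(ids)))}
    return [fold_of_block[b] for b in ids]


if __name__ == "__main__":
    # Toy observations: (x_km, y_km, day index).
    obs = [(10.0, 10.0, 0), (12.0, 14.0, 5), (120.0, 10.0, 0), (10.0, 10.0, 45)]
    print(assign_folds(obs, n_folds=2))
```

In this toy example the first two observations fall in the same 50 km cell and the same 30-day window, so they always land in the same fold. A random split could instead place one of them in training and the other in validation, which tends to inflate performance estimates when neighbouring measurements are correlated.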