The importance of data splitting in combined NOx concentration modelling

Bibliographic Details
Published in The Science of the total environment Vol. 868; p. 161744
Main Authors Kamińska, Joanna A, Kajewska-Szkudlarek, Joanna
Format Journal Article
Language English
Published Netherlands 10.04.2023
Summary: The polluted air breathed every day by those living in large conurbations poses a significant risk to their health. Through effective modelling (prediction) of concentrations of pollutants and identification of the factors influencing them, it should be possible to obtain advance information on dangers and to plan and implement measures to reduce them. This work describes two different modelling approaches: based on the NO concentration of the previous hour (C&RT models); and based on meteorological factors, traffic flow, and past (up to two previous hours) NO and NO concentrations (CA models). For each approach, three alternative machine learning methods were applied: artificial neural network (ANN), random forest (RF), and support vector regression (SVR). The best fits were obtained for the models using ANN and RF (MAPE values in the range 18.3-18.5 %). Poorer fits were found for the SVR models (MAPE equal to 23.4 % for the C&RT approach and 29.3 % for CA). No significant preferences were identified between the C&RT and CA approaches (based on various goodness-of-fit measures). The choice should be determined by the purposes for which the forecast is to be used.
ISSN:1879-1026
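
The summary compares models by MAPE (mean absolute percentage error). As a minimal sketch of how such a value is usually computed, assuming the standard definition (the record does not state the exact formula the authors used), the function name `mape` and the sample concentrations below are hypothetical and chosen only for illustration:

```python
def mape(observed, predicted):
    """Mean absolute percentage error, returned in percent.

    Assumes the standard definition: the mean of |observed - predicted| / |observed|,
    multiplied by 100. Observed values must be non-zero.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one observation is required")
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observed value is zero")
    relative_errors = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Hypothetical hourly concentrations (illustrative numbers, not data from the article).
observed = [40.0, 50.0, 80.0, 100.0]
predicted = [36.0, 55.0, 72.0, 110.0]

score = mape(observed, predicted)
print(f"MAPE = {score:.1f} %")
```

Each prediction in this made-up sample is off by one tenth of the observed value, so the result is about 10 %. Under this definition, lower values indicate a closer fit, which is how the figures quoted in the summary are ranked.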