Identifying pollution sources and predicting urban air quality using ensemble learning methods

Bibliographic Details
Published in: Atmospheric Environment (1994), Vol. 80, pp. 426-437
Main Authors: Singh, Kunwar P.; Gupta, Shikha; Rai, Premanjali
Format: Journal Article
Language: English
Published: Kidlington, Elsevier Ltd, 01.12.2013
Summary: In this study, principal components analysis (PCA) was performed to identify air pollution sources, and tree-based ensemble learning models were constructed to predict the urban air quality of Lucknow (India) using air quality and meteorological databases covering a five-year period. PCA identified vehicular emissions and fuel combustion as the major air pollution sources. The air quality indices showed that air quality was unhealthy during summer and winter. Ensemble models were constructed to discriminate between the seasonal air qualities, to identify the factors responsible for this discrimination, and to predict the air quality indices. Accordingly, single decision tree (SDT), decision tree forest (DTF), and decision treeboost (DTB) models were constructed. Their generalization and predictive performance were evaluated in terms of several statistical parameters and compared with a conventional machine learning benchmark, support vector machines (SVM). The DT and SVM models discriminated the seasonal air quality with misclassification rates (MR) of 8.32% (SDT), 4.12% (DTF), 5.62% (DTB), and 6.18% (SVM) in the complete data. The AQI and CAQI regression models yielded the following correlations between measured and predicted values (R) and root mean squared errors (RMSE) in the complete data: SDT, R = 0.901 and RMSE = 6.67 (AQI), R = 0.825 and RMSE = 9.45 (CAQI); DTF, 0.951 and 4.85 (AQI), 0.922 and 6.56 (CAQI); DTB, 0.959 and 4.38 (AQI), 0.929 and 6.30 (CAQI); SVR (support vector regression), 0.890 and 7.00 (AQI), 0.836 and 9.16 (CAQI). The DTF and DTB models outperformed SVM in both classification and regression, which could be attributed to the bagging and boosting algorithms incorporated in these models. The proposed ensemble models successfully predicted urban ambient air quality and can be used as effective tools for its management.
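The evaluation measures quoted in the summary can be illustrated with a short sketch. The following Python is a minimal example. It assumes that the reported correlation is Pearson's r and that MR is the percentage of misclassified samples. The function names, season labels, and toy values are hypothetical and are not taken from the study.

```python
import math

def misclassification_rate(observed, predicted):
    """Percentage of samples whose predicted class differs from the observed class."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(1 for o, p in zip(observed, predicted) if o != p)
    return 100.0 * wrong / len(observed)

def pearson_r(observed, predicted):
    """Pearson correlation (assumed here) between measured and predicted values."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    sop = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    soo = sum((o - mo) ** 2 for o in observed)
    spp = sum((p - mp) ** 2 for p in predicted)
    return sop / math.sqrt(soo * spp)

def rmse(observed, predicted):
    """Root mean squared error, in the units of the index being predicted."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

# Hypothetical toy values, not data from the study.
seasons_obs = ["summer", "winter", "monsoon", "winter"]
seasons_pred = ["summer", "winter", "winter", "winter"]
aqi_obs = [100.0, 120.0, 140.0]
aqi_pred = [110.0, 120.0, 130.0]

print(misclassification_rate(seasons_obs, seasons_pred))  # 25.0
print(pearson_r(aqi_obs, aqi_pred))  # 1.0
print(rmse(aqi_obs, aqi_pred))
```

The toy predictions are a linear rescaling of the observations. As a result, r is 1.0 while the RMSE is about 8.16, which shows that the two measures capture different aspects of fit.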
• Developed tree ensemble models for seasonal discrimination and air quality prediction.
• PCA used to identify air pollution sources; air quality indices used to assess health risk.
• Bagging and boosting algorithms enhanced the predictive ability of the ensemble models.
• Ensemble classification and regression models performed better than SVMs.
• Proposed models can be used as tools for air quality prediction and management.
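The bagging and boosting ideas named in the highlights can be sketched with regression stumps (one-split trees) on a single predictor. This is a generic textbook form, not the DTF or DTB configuration used by the authors, and it is shown only under those assumptions:

* Bagging here averages stumps fitted to bootstrap resamples.
* Boosting here is least-squares gradient boosting, where each stump is fitted to the current residuals.

All function names, parameters, and data below are hypothetical.

```python
import math
import random

def _mean(values):
    return sum(values) / len(values)

def fit_stump(xs, ys):
    """Fit a one-split regression tree that minimizes the squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not right:  # the largest threshold leaves the right side empty
            continue
        lm, rm = _mean(left), _mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:  # all x identical: fall back to a constant prediction
        m = _mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def bagged_stumps(xs, ys, n_models=25, seed=0):
    """Bagging: average stumps fitted to bootstrap resamples of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)

def boosted_stumps(xs, ys, n_rounds=50, learning_rate=0.1):
    """Least-squares gradient boosting: each stump is fitted to the current residuals."""
    base = _mean(ys)
    residuals = [y - base for y in ys]
    models = []
    for _ in range(n_rounds):
        stump = fit_stump(xs, residuals)
        models.append(stump)
        residuals = [r - learning_rate * stump(x) for x, r in zip(xs, residuals)]
    return lambda x: base + learning_rate * sum(m(x) for m in models)

def rmse(model, xs, ys):
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys))

# Hypothetical toy data: a two-step response that one stump cannot fit exactly.
xs = list(range(20))
ys = [0.0 if x < 5 else 10.0 if x < 15 else 20.0 for x in xs]

single = fit_stump(xs, ys)
bagged = bagged_stumps(xs, ys)
boosted = boosted_stumps(xs, ys)
for name, model in [("single", single), ("bagged", bagged), ("boosted", boosted)]:
    print(name, round(rmse(model, xs, ys), 3))
```

On this toy response, boosting drives the training error well below that of a single stump. Bagging mainly reduces variance across resamples. Training error alone says nothing about generalization, which the study evaluated separately.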
ISSN: 1352-2310; 1873-2844
DOI: 10.1016/j.atmosenv.2013.08.023