Multi-source Machine Learning for AQI Estimation

Bibliographic Details
Published in	2020 IEEE International Conference on Big Data (Big Data), pp. 4567-4576
Main Authors	Duong, Dat Q.; Le, Quang M.; Nguyen-Tai, Tan-Loc; Bo, Dong; Nguyen, Dat; Dao, Minh-Son; Nguyen, Binh T.
Format	Conference Proceeding
Language	English
Published	IEEE 10.12.2020
Subjects	AQI Estimation; Big Data; CatBoost; Extreme Gradient Boosting; Feature extraction; Humidity; LightGBM; Random Forest; Random forests; Temperature distribution; Temperature sensors; Urban areas

Summary:	In many countries, accurate estimation of AQI (Air Quality Index) values and levels is essential for monitoring air pollution where people live. The problem has been an active research subject for many years, and many applications have been developed for personal use. In this work, we investigate a multi-source machine learning approach to approximating the local AQI score at a user's location in a big city. We conduct experiments on three primary data sets, "SEPHLA-MediaEval 2019", "MNR-Air-HCM", and "MNR-HCM", collected in Ho Chi Minh City (Vietnam) and Fukuoka City (Japan). From these data sets we extract several types of attributes: timestamp information, geographical data, sensor data (humidity and temperature), users' emotion tags (such as greenness and calmness), semantic features from images captured by users, and public weather data (temperature, dew point, humidity, wind speed, and pressure) for the related cities. We then compare five machine learning models for estimating the local AQI score and level: Support Vector Machine [1], Random Forest [2], Extreme Gradient Boosting [3], LightGBM [4], and CatBoost [5]. Performance is measured with RMSE, MAE, and R². The experimental results show that random forest trained on sensor data combined with public weather data achieves the best AQI value regression and AQI rank prediction results in many cases.
DOI:	10.1109/BigData50022.2020.9378322
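
Note on the evaluation metrics: the summary names RMSE, MAE, and R² but does not define them. The block below is a minimal sketch of their standard textbook definitions in plain Python. The function names and the sample AQI values are hypothetical and chosen only for illustration; they are not taken from the paper or its data sets.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute residuals."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        # R² is undefined when every true value is identical.
        raise ValueError("R² is undefined for constant y_true")
    return 1.0 - ss_res / ss_tot


# Hypothetical AQI readings and model estimates (illustrative only).
observed = [50, 100, 150]
estimated = [60, 90, 150]

print(f"RMSE = {rmse(observed, estimated):.3f}")
print(f"MAE  = {mae(observed, estimated):.3f}")
print(f"R2   = {r2(observed, estimated):.3f}")
```

Lower RMSE and MAE indicate smaller estimation errors, while an R² closer to 1 indicates that the estimates explain more of the variance in the observed AQI values.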