Analysis of outlier detection rules based on the ASHRAE global thermal comfort database

Bibliographic Details
Published in: Building and Environment, Vol. 234, p. 110155
Main Authors: Zhang, Shaoxing; Yao, Runming; Du, Chenqiu; Essah, Emmanuel; Li, Baizhan
Format: Journal Article
Language: English
Published: Elsevier Ltd, 15 April 2023
Summary: The ASHRAE Global Thermal Comfort Database has been used extensively to analyze specific thermal comfort parameters and models, to evaluate subjective metrics, and in combination with machine learning algorithms. Outlier detection is regarded as an essential step in data preprocessing, yet publications based on this database have paid little attention to the influence of outliers in the raw datasets. This study investigates the filtering performance of different outlier detection methods. Three stochastic-based rules (3-Sigma, Boxplot, and Hampel) were applied and analyzed, using the prediction of thermal preference with a Support Vector Machine (SVM) algorithm as a case study to compare predictions before and after outlier removal. Results show that all three rules can filter some obvious outliers. The Boxplot rule produces the most moderate filtering results, whereas the 3-Sigma rule sometimes fails to detect outliers and the Hampel rule may be overly aggressive, producing false alarms. It was also found that removing a small amount of data when building machine learning models can yield simpler and smoother decision boundaries, which could lead to more energy-efficient and conflict-free solutions.

Highlights:
• Three stochastic-based outlier removal rules were applied to the ASHRAE database.
• The filtering performance was validated using simulated data and data from the ASHRAE database.
• The data distribution influences the selection of outlier removal rules.
• SVM was chosen to demonstrate the effects of outlier removal on predictive outcomes.
• This work selects the optimal outlier removal rule for a given data distribution.
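The summary names the three rules but does not define them. The sketch below assumes their common textbook forms: the 3-Sigma rule flags values more than 3 sample standard deviations from the mean, the Boxplot rule flags values outside Tukey's fences (Q1 − 1.5 × IQR and Q3 + 1.5 × IQR), and the Hampel rule flags values more than 3 × 1.4826 × MAD from the median. The thresholds, the quartile method, the function names, and the temperature values are illustrative assumptions, not the settings or data used in the study. Only the Python standard library is used.

```python
import statistics
from typing import Callable, List, Sequence

# Hypothetical helper names; thresholds are common textbook defaults,
# not necessarily the settings used in the study.


def three_sigma_outliers(data: Sequence[float], k: float = 3.0) -> List[bool]:
    """3-Sigma rule: flag points more than k sample standard deviations from the mean."""
    mu = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    return [abs(x - mu) > k * sd for x in data]


def boxplot_outliers(data: Sequence[float], whisker: float = 1.5) -> List[bool]:
    """Boxplot rule: flag points outside the Tukey fences Q1 - w*IQR and Q3 + w*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    return [x < lower or x > upper for x in data]


def hampel_outliers(data: Sequence[float], k: float = 3.0) -> List[bool]:
    """Hampel rule: flag points more than k scaled MADs from the median."""
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    # 1.4826 makes the MAD a consistent estimate of sigma for Gaussian data.
    # If more than half the values are identical, mad == 0 and every value
    # that differs from the median is flagged.
    return [abs(x - med) > k * 1.4826 * mad for x in data]


def remove_outliers(data: Sequence[float],
                    rule: Callable[[Sequence[float]], List[bool]]) -> List[float]:
    """Keep only the points that the given rule does not flag."""
    return [x for x, flagged in zip(data, rule(data)) if not flagged]


if __name__ == "__main__":
    # Hypothetical air temperatures (degC) with one suspicious reading.
    temps = [22.0, 22.5, 23.0, 23.5, 24.0, 24.5, 25.0, 40.0]
    rules = [("3-Sigma", three_sigma_outliers),
             ("Boxplot", boxplot_outliers),
             ("Hampel", hampel_outliers)]
    for name, rule in rules:
        flagged = [x for x, f in zip(temps, rule(temps)) if f]
        print(f"{name}: {flagged}")
    # 3-Sigma: []
    # Boxplot: [40.0]
    # Hampel: [40.0]
```

With only eight readings, no single value can lie more than (n − 1)/√n ≈ 2.47 sample standard deviations from the mean, so the 3-Sigma rule cannot flag the 40.0 °C reading at all. This is one simple way the rule can miss an outlier, because the outlier inflates the very standard deviation used to judge it, whereas the median- and quartile-based rules are barely affected by a single extreme value.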
ISSN: 0360-1323, 1873-684X
DOI: 10.1016/j.buildenv.2023.110155