The Improvement of Stress Level Detection in Twitter: Imbalance Classification Using SMOTE

Bibliographic Details
Published in: 2022 IEEE International Conference on Computing (ICOCO), pp. 294-298
Main Authors: Danuri, Mohd Shahrul Nizam Mohd; Rahman, Rohizah Abd; Mohamed, Ibrahim; Amin, Azzan
Format: Conference Proceeding
Language: English
Published: IEEE, 14.11.2022
DOI: 10.1109/ICOCO56118.2022.10031684

Summary: This study developed a model to improve stress level detection by applying the Synthetic Minority Oversampling Technique (SMOTE) to imbalanced data classification. SMOTE addresses imbalanced datasets by oversampling the minority class. Data collected from Twitter may seem vague, mainly because of the massive amount of data. The research followed a framework of Data, Expert Data Annotation, Text Pre-processing, and Text Representation and Classification. Bag of Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), and lemmatization were used for text representation. The data were collected only from Twitter, under certain circumstances. Subject Matter Experts (SMEs) on mental health problems annotated the text of the tweets on four levels: Normal, Mild, Moderate, and Severe. The Normal stress level group was relatively large compared to the other groups. Because of this imbalance, SMOTE was used for data augmentation. The results showed that classification using a Support Vector Machine with SMOTE improved over the baseline: increasing the cardinality of the minority class labels produced notably better Macro Avg Recall and Macro Avg F1-Score.
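
The oversampling step described in the summary can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. This is not the authors' implementation. The function name `smote`, its parameters, and the toy two-dimensional vectors (standing in for TF-IDF rows) are assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic vectors interpolated between minority samples.

    Illustrative sketch only: `minority` is a list of equal-length numeric
    vectors that all belong to the same (minority) class.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample (Euclidean distance),
        # excluding the base sample itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy vectors: a large "Normal" group and a small "Severe" group.
normal = [[0.1 * i, 0.2] for i in range(10)]
severe = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]

# Oversample the minority class until both classes have the same size.
balanced_severe = severe + smote(severe, n_new=len(normal) - len(severe), k=2)
print(len(normal), len(balanced_severe))
```

In a setting like the one the summary describes, synthetic samples would normally be generated only from the training split, after text representation and before fitting the classifier, so that the Macro Avg Recall and Macro Avg F1-Score are measured on untouched test data.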