Cyberbullying Detection on Instagram with Optimal Online Feature Selection

Bibliographic Details
Published in	2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 401-408
Main Authors	Yao, Mengfan; Chelmis, Charalampos; Zois, Daphney-Stavroula
Format	Conference Proceeding
Language	English
Published	IEEE 01.08.2018
Subjects	classification; cyberharassment; Feature extraction; Media; online social media; optimization algorithm; Random variables; selection process; Testing; Twitter; Videos

Summary:	Cyberbullying has emerged as a large-scale societal problem that demands accurate methods for its detection in an effort to mitigate its detrimental consequences. While automated, data-driven techniques for analyzing and detecting cyberbullying incidents have been developed, the scalability of existing approaches has largely been ignored. At the same time, the complexities underlying cyberbullying behavior (e.g., social context and changing language) make the automatic identification of "the best subset of features" to use challenging. We address this gap by formulating cyberbullying detection as a sequential hypothesis testing problem. Based on this formulation, we propose a novel algorithm to drastically reduce the number of features used in classification. We demonstrate the utility, scalability and responsiveness of our approach using a real-world dataset from Instagram, the online social media platform with the highest percentage of users reporting experiencing cyberbullying. Our approach improves recall by a staggering 700%, while at the same time reducing the average number of features by up to 99.82% compared to state-of-the-art supervised cyberbullying detection methods, learning approaches that require weak supervision, and traditional offline feature selection and dimensionality reduction techniques.
ISSN:	2473-991X
DOI:	10.1109/ASONAM.2018.8508329
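
Illustrative note: the summary describes casting detection as a sequential hypothesis testing problem so that only a few features need to be examined per item. The sketch below is not the authors' algorithm. It is a generic Wald-style sequential probability ratio test over binary features in a fixed order, assuming the features are conditionally independent given the class. The function name `sprt_classify`, the parameters `p1`, `p0`, `alpha`, and `beta`, and all probabilities in the usage lines are hypothetical. The paper's method also selects which features to examine, which this sketch does not attempt.

```python
import math


def sprt_classify(features, p1, p0, alpha=0.05, beta=0.05):
    """Examine binary features one at a time and stop early when decisive.

    features: 0/1 observations, in the order they will be examined.
    p1[i]: assumed P(feature i = 1 | bullying); p0[i]: same for non-bullying.
    alpha, beta: target false-positive and false-negative rates (assumed).
    Returns (label, number_of_features_used).
    """
    upper = math.log((1 - beta) / alpha)   # cross this: decide class 1
    lower = math.log(beta / (1 - alpha))   # cross this: decide class 0
    llr = 0.0                              # running log-likelihood ratio
    for n, (x, a, b) in enumerate(zip(features, p1, p0), start=1):
        # Add this feature's evidence for class 1 versus class 0.
        llr += math.log(a / b) if x else math.log((1 - a) / (1 - b))
        if llr >= upper:
            return 1, n
        if llr <= lower:
            return 0, n
    # Features exhausted without a decisive crossing: fall back to the sign.
    return (1 if llr > 0 else 0), len(features)


# Hypothetical usage: ten equally informative features.
p1 = [0.9] * 10
p0 = [0.1] * 10
print(sprt_classify([1] * 10, p1, p0))  # (1, 2): stops after two features
print(sprt_classify([0] * 10, p1, p0))  # (0, 2)
```

With these made-up probabilities the test stops after two of the ten features whenever the first two agree, which shows in miniature how a sequential formulation can reduce the average number of features examined.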