Multilingual SMS Spam Detection using BERT and LSTM
Published in | 2024 International Conference on Innovations and Challenges in Emerging Technologies (ICICET) pp. 1 - 6 |
---|---|
Main Authors | , , , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE 07.06.2024 |
Summary: | As digital communication has grown, the fight against spam has intensified, which makes spam identification increasingly important to systems such as social media moderation, email filtering, and comment spam prevention. Machine learning algorithms must be continually improved to stay ahead of newly developed spamming techniques and provide a safe online environment. This study uses a Kaggle dataset originally intended for spam detection. To support multilingual spam detection in French, German, and English, the data required some transformations and transitions. Thorough preprocessing, including stop-word removal, tokenization, and categorization by language, makes the dataset better suited to investigating intricate spam patterns in multilingual settings. Several machine learning models were applied: Multinomial Naive Bayes, XGBoost, LSTM, and BERT. Among the models tested, Multinomial Naive Bayes performed best, with a combined accuracy of 98.1%, making it a reliable choice for spam detection. Built on rigorous data cleaning, exploration, and model evaluation, the work offers useful insights for spam detection across datasets in multiple languages. |
---|---|
DOI: | 10.1109/ICICET59348.2024.10616322 |
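
The summary names tokenization, stop-word removal, and Multinomial Naive Bayes as parts of the pipeline. The following is a minimal sketch of how those steps fit together, not the authors' implementation. The stop-word list, the `tokenize` helper, the `TinyMultinomialNB` class, and the six toy messages are all assumptions made up for illustration; the record does not describe the actual dataset columns, stop-word lists, or hyperparameters.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list covering English, French, German.
STOP_WORDS = {"the", "a", "to", "you", "le", "la", "un", "der", "die", "das", "ein"}


def tokenize(text):
    """Lowercase, keep runs of letters (Unicode-aware), drop stop words."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class TinyMultinomialNB:
    """Multinomial Naive Bayes over word counts with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for t in tokens:
                score += math.log((counts[t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data invented for this sketch (not taken from the Kaggle dataset).
texts = [
    "Win a free prize now, click here",
    "Gagnez un prix gratuit, cliquez ici",
    "Gewinnen Sie einen gratis Preis, hier klicken",
    "Are we still meeting for lunch tomorrow",
    "On se retrouve pour le déjeuner demain",
    "Treffen wir uns morgen zum Mittagessen",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = TinyMultinomialNB().fit(texts, labels)
print(model.predict("Free prize, click now"))
print(model.predict("Treffen wir uns morgen"))
```

A single shared vocabulary across the three languages is the simplest design choice here; training one model per language after the language-categorization step would be an equally plausible reading of the summary.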