Feature Extraction Methods and Classification for Malware Incident News

Bibliographic Details
Published in: 2023 IEEE International Conference on Cryptography, Informatics, and Cybersecurity (ICoCICs), pp. 115-120
Main Authors: Gumilar, Gugum; Budiarto, Eka; Galinium, Maulahikmah; Lim, Charles
Format: Conference Proceeding
Language: English
Published: IEEE 22.08.2023
Summary: Data mining has attracted considerable interest in recent years, including for unstructured data. A commonly discussed task is automatic classification using machine learning methods, since the large volume of data is the main obstacle to manual classification. However, many practitioners still find it difficult to choose the right combination of feature extraction and classification methods, so this research offers suggestions on combinations that can produce better accuracy in text classification. It compares several feature extraction methods: Bag-of-Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), and Word2Vec, focusing on the Skip-gram model. It also evaluates several classification methods: Support Vector Machine (SVM), Decision Tree, Logistic Regression, Gaussian Naive Bayes, K-Nearest Neighbor, Neural Network, Random Forest, and Doc2Vec. The dataset consists of two hundred articles crawled from several web blogs, manually labeled and split into two classes: malware incident news and non-malware incident news. Dataset quality was also measured using the open-source Python library Cleanlab.
DOI:10.1109/ICoCICs58778.2023.10276685
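
The summary pairs feature extraction methods (BoW, TF-IDF) with classifiers such as K-Nearest Neighbor. As a rough illustration of how such a pairing fits together, the following is a minimal standard-library sketch and not the authors' implementation. The toy corpus, the labels, the function names, the whitespace tokenizer, the unsmoothed idf formula log(N/df), and the use of a single nearest neighbor with cosine similarity are all assumptions made for this example; production libraries typically use smoothed idf variants and more careful tokenization.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented for illustration (not the paper's dataset).
TRAIN_TEXTS = [
    "ransomware attack encrypts hospital files",
    "new trojan malware steals banking passwords",
    "company releases quarterly earnings report",
    "city council approves new park budget",
]
TRAIN_LABELS = ["malware", "malware", "non-malware", "non-malware"]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bow_vectors(docs, vocab=None):
    """Bag-of-Words: one raw term-count vector per document."""
    if vocab is None:
        vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[tok] for tok in vocab])
    return vocab, vectors


def fit_idf(docs):
    """Compute idf = log(N / df) per vocabulary term (unsmoothed variant)."""
    vocab, counts = bow_vectors(docs)
    n_docs = len(docs)
    idf = []
    for j in range(len(vocab)):
        df = sum(1 for row in counts if row[j] > 0)
        idf.append(math.log(n_docs / df))
    return vocab, idf


def tfidf_vector(doc, vocab, idf):
    """TF-IDF: raw term counts scaled by the fitted idf weights."""
    _, (counts,) = bow_vectors([doc], vocab)
    return [c * w for c, w in zip(counts, idf)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_1nn(text, train_texts=TRAIN_TEXTS, train_labels=TRAIN_LABELS):
    """Label a text with the class of its most similar training document."""
    vocab, idf = fit_idf(train_texts)
    query = tfidf_vector(text, vocab, idf)
    sims = [cosine(query, tfidf_vector(doc, vocab, idf)) for doc in train_texts]
    best = max(range(len(sims)), key=sims.__getitem__)
    return train_labels[best]


print(predict_1nn("hospital hit by ransomware malware"))
```

Swapping `tfidf_vector` for the raw counts from `bow_vectors` gives the BoW variant of the same pipeline, which is the kind of feature-versus-classifier comparison the summary describes.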