Domain-adaptive entity recognition: unveiling the potential of CSER in cybersecurity and beyond

Bibliographic Details
Published in	International journal of machine learning and cybernetics Vol. 16; no. 5; pp. 2849 - 2867
Main Authors	Marjan, Md. Abu; Amagasa, Toshiyuki
Format	Journal Article
Language	English
Published	Berlin/Heidelberg: Springer Berlin Heidelberg, 01.06.2025; Springer Nature B.V.
Subjects	Artificial Intelligence; Automation; Complex Systems; Computational Intelligence; Conditional random fields; Control; Cybercrime; Cybersecurity; Datasets; Deep learning; Effectiveness; Engineering; Feature fusion; Machine learning; Malware; Mechatronics; Methods; Morphology; Named entity recognition; Neural networks; Original Article; Pattern Recognition; Recognition; Robotics; Semantics; Software; Systems Biology; Threats
ISSN	1868-8071 1868-808X
DOI	10.1007/s13042-024-02424-9

Summary:	In the dynamic field of cybersecurity, precise recognition and identification of cybersecurity-related entities in textual data have become crucial. Existing studies on Named Entity Recognition (NER) in the cybersecurity domain often overlook the challenges posed by data sparsity and the substantial presence of Out-of-Vocabulary (OOV) tokens in Cyber Threat Intelligence (CTI) reports. To tackle these challenges, we introduce the Cybersecurity Entity Recognition (CSER) model, a comprehensive approach crafted to handle the complexities of CTI data and similar intricacies in other domains. The CSER model integrates the outputs of contextual, semantic, and morphological encoders to form a robust feature vector that captures the nuanced patterns, buzzwords, and structural attributes specific to cybersecurity entities. In particular, we employ various deep-learning approaches to capture morphological and contextual features, while pre-trained embeddings are used to capture semantic features. Additionally, a Conditional Random Field (CRF) is employed as a sequential decoder, enhancing the effectiveness of cybersecurity entity identification. Extensive experiments on genuine cybersecurity datasets reveal that the proposed CSER model surpasses contemporary state-of-the-art methods in predictive performance. To validate the effectiveness of the model, the experiments are extended to datasets from the biomedical and materials science domains, providing insight into the model’s adaptability across diverse domains. Our research demonstrates that the CSER model excels in domains with frequent OOV tokens, particularly cybersecurity, and addresses data sparsity effectively. Its capability to manage a substantial volume of OOV tokens enhances performance where traditional models struggle.
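The summary describes two mechanical steps that a small example may make easier to picture: fusing per-token features from several encoders into one vector, and decoding a tag sequence with a CRF. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function names (fuse_features, emission_scores, viterbi_decode), the use of plain concatenation for fusion, the linear scoring layer, and the hand-set transition scores are all hypothetical; the record does not say how CSER fuses features or parameterises its CRF, and no training is shown.

```python
from typing import List

Vector = List[float]


def fuse_features(contextual: List[Vector],
                  semantic: List[Vector],
                  morphological: List[Vector]) -> List[Vector]:
    """Concatenate the three per-token feature vectors (assumed fusion rule)."""
    return [c + s + m for c, s, m in zip(contextual, semantic, morphological)]


def emission_scores(fused: List[Vector], weights: List[Vector]) -> List[Vector]:
    """Score each tag for each token with a dot product (one weight row per tag)."""
    return [[sum(w * x for w, x in zip(row, token)) for row in weights]
            for token in fused]


def viterbi_decode(emissions: List[Vector],
                   transitions: List[Vector]) -> List[int]:
    """Return the highest-scoring tag path for a linear-chain CRF.

    emissions[t][k]   : score of tag k at position t
    transitions[p][k] : score of moving from tag p to tag k
    """
    n_tags = len(transitions)
    score = list(emissions[0])          # best score ending in each tag so far
    backpointers: List[List[int]] = []  # best previous tag per position and tag

    for emit in emissions[1:]:
        new_score, pointers = [], []
        for cur in range(n_tags):
            best_prev = max(range(n_tags),
                            key=lambda p: score[p] + transitions[p][cur])
            new_score.append(score[best_prev] + transitions[best_prev][cur]
                             + emit[cur])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)

    # Walk the backpointers from the best final tag to recover the path.
    best = max(range(n_tags), key=lambda t: score[t])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path
```

With a tag set such as O, B-MAL, I-MAL (indices 0, 1, 2, also an assumption), giving the O to I-MAL transition a large negative score keeps the decoder from starting an entity with an inside tag, which is the kind of sequence-level constraint a CRF decoder adds on top of per-token scores.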