Automatic labeling of the elements of a vulnerability report CVE with NLP

Bibliographic Details
Published in	2022 IEEE 23rd International Conference on Information Reuse and Integration for Data Science (IRI), pp. 164-165
Main Authors	Sumoto, Kensuke, Kanakogi, Kenta, Washizaki, Hironori, Tsuda, Naohiko, Yoshioka, Nobukazu, Fukazawa, Yoshiaki, Kanuka, Hideyuki
Format	Conference Proceeding
Language	English
Published	IEEE 01.08.2022
Subjects	BERT; CVE; Data science; Distortion; Machine learning; named entity recognition; Natural language processing; security knowledge repository; Software; Technological; Transformers
Summary:	Common Vulnerabilities and Exposures (CVE) databases contain information about vulnerabilities in software products and source code. If the individual elements of CVE descriptions can be extracted and structured, the resulting data can be used to search and analyze those descriptions. Herein we propose a method that labels each element in a CVE description by applying Named Entity Recognition (NER). For NER, we use BERT, a transformer-based natural language processing model. NER based on machine learning can label information in CVE descriptions even when the data contain some distortions. In an experiment with manually prepared labels for 1000 CVE descriptions, the proposed method achieves a precision of about 0.81 and a recall of about 0.89. In addition, we devise a way to train the model on data divided by label. The proposed method can be used to label each element of a CVE description automatically.
DOI:	10.1109/IRI54793.2022.00045
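
The precision and recall figures in the summary can be illustrated with a minimal sketch of how NER output is commonly scored. This is not the authors' evaluation code: the BIO tagging scheme, the exact-match span criterion, the label names (VULN, PRODUCT, VERSION), and the example sentence are all assumptions made for illustration, and the record does not say whether the reported scores are token-level or span-level.

```python
from typing import List, Set, Tuple

# A labeled span: (label, start token index, end token index exclusive).
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of labeled spans."""
    spans: Set[Span] = set()
    label, start = None, 0
    # A trailing "O" sentinel closes a span that runs to the last token.
    for i, tag in enumerate(tags + ["O"]):
        if tag.startswith("I-") and label == tag[2:]:
            continue  # still inside the current span
        if label is not None:
            spans.add((label, start, i))
            label = None
        if tag.startswith("B-"):
            label, start = tag[2:], i
    return spans


def precision_recall(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float]:
    """Exact-match span precision and recall."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical CVE-style sentence with hypothetical labels.
tokens = ["Buffer", "overflow", "in", "ExampleApp", "1.2",
          "allows", "remote", "attackers"]
gold_tags = ["B-VULN", "I-VULN", "O", "B-PRODUCT", "B-VERSION", "O", "O", "O"]
pred_tags = ["B-VULN", "I-VULN", "O", "B-PRODUCT", "B-VERSION", "O", "B-VULN", "O"]

precision, recall = precision_recall(bio_to_spans(gold_tags),
                                     bio_to_spans(pred_tags))
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the model finds every gold span but adds one spurious span, so recall is higher than precision, the same direction as the reported 0.89 versus 0.81.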