Prediction of Software Security Vulnerabilities from Source Code Using Machine Learning Methods

Bibliographic Details
Published in	2023 Innovations in Intelligent Systems and Applications Conference (ASYU), pp. 1-6
Main Authors	Mandal, Dilek; Kösesoy, Irfan
Format	Conference Proceeding
Language	English
Published	IEEE, 11 October 2023
Subjects	AST Tree; Doc2Vec; Feature extraction; Logistic regression; Machine Learning Algorithms; Prediction algorithms; Software; Software algorithms; Software Vulnerability; Source coding; Support vector machines; TF-IDF

Abstract	One of the most significant problems in software engineering is the presence of security vulnerabilities in software. Attackers can exploit these vulnerabilities to gain unauthorized access to systems, leak information, corrupt data, and cause service interruptions. Therefore, in addition to developing secure software, detecting existing security vulnerabilities is considered an important research topic. In this study, security vulnerabilities in software source code were predicted using machine learning methods. The OWASP Benchmark test suite was used as the dataset. This dataset consists of Java code and was used to train five machine learning models: Logistic Regression, Decision Tree, Support Vector Machines (SVM), K-Nearest Neighbors, and Random Forest. TF-IDF and Doc2Vec were used to extract feature vectors from the source code. In the experiments, the highest prediction accuracy (0.97) was achieved with TF-IDF features combined with the Decision Tree, SVM, and Logistic Regression algorithms.
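
The TF-IDF feature extraction step named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the record does not describe their tokenizer, weighting variant, or tooling. The sketch assumes a crude identifier-based tokenizer and the plain, unsmoothed formula tf × log(N / df). The function names `tokenize` and `tfidf_vectors` and the two Java fragments are hypothetical, invented only for illustration. The classifier step (for example Logistic Regression) would consume the resulting vectors and is omitted here.

```python
import math
import re
from collections import Counter


def tokenize(source):
    """Split source text into identifier-like tokens (a deliberately crude lexer)."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document.

    Assumption: plain tf * log(N / df) weighting with no smoothing or
    normalization; real libraries usually apply both.
    """
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(documents)
    # Document frequency: number of documents that contain each token.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append([
            (counts[tok] / total) * math.log(n_docs / doc_freq[tok])
            for tok in vocabulary
        ])
    return vocabulary, vectors


# Hypothetical Java fragments, invented for illustration only.
snippets = [
    'String q = "SELECT * FROM users WHERE id=" + request.getParameter("id");',
    'PreparedStatement ps = conn.prepareStatement("SELECT * FROM users WHERE id=?");',
]
vocabulary, vectors = tfidf_vectors(snippets)
```

Tokens that occur in every fragment (such as `SELECT`) receive a weight of zero under this formula, while tokens unique to one fragment (such as `getParameter`) receive a positive weight. That is the property that lets a downstream classifier focus on distinguishing tokens.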
Author_xml	– sequence: 1 givenname: Dilek surname: Mandal fullname: Mandal, Dilek email: mandaldilek@gmail.com organization: Kocaeli University, Computer Engineering, Kocaeli, Turkey – sequence: 2 givenname: Irfan surname: Kösesoy fullname: Kösesoy, Irfan email: irfan.kosesoy@kocaeli.edu.tr organization: Kocaeli University, Software Engineering, Kocaeli, Turkey
DOI	10.1109/ASYU58738.2023.10296747
EISBN	9798350306590
EISSN	2770-7946
EndPage	6
ExternalDocumentID	10296747
Genre	orig-research
IsPeerReviewed	false
IsScholarly	false
PageCount	6
PublicationDateYYYYMMDD	2023-10-11
URI	https://ieeexplore.ieee.org/document/10296747