Prediction of Software Security Vulnerabilities from Source Code Using Machine Learning Methods

Bibliographic Details
Published in 2023 Innovations in Intelligent Systems and Applications Conference (ASYU), pp. 1-6
Main Authors Mandal, Dilek; Kösesoy, Irfan
Format Conference Proceeding
Language English
Published IEEE 11.10.2023
Abstract One of the most significant problems in software engineering is the presence of security vulnerabilities in software. Attackers can exploit these vulnerabilities to gain unauthorized access to systems, leak information, corrupt data, and cause service interruptions. Therefore, in addition to the development of secure software, the detection of existing security vulnerabilities is considered an important research topic. In this study, security vulnerabilities in software source code were predicted using machine learning methods. The OWASP Benchmark test suite was used as the dataset. This dataset consists of Java code and was used to train five machine learning models: Logistic Regression, Decision Tree, Support Vector Machine (SVM), K-Nearest Neighbors, and Random Forest. TF-IDF and Doc2Vec were employed to extract feature vectors from the source code. In the experiments, the highest prediction accuracy (0.97) was achieved with TF-IDF features combined with the Decision Tree, SVM, and Logistic Regression algorithms.
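The pipeline the abstract describes (token-level TF-IDF features fed to a classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the four Java snippets and their labels are invented for illustration and are not taken from the OWASP Benchmark, the tokenizer and the smoothed IDF formula are assumptions, and a 1-nearest-neighbour rule stands in for the K-Nearest Neighbors model named in the abstract. Only the Python standard library is used.

```python
import math
import re
from collections import Counter

# Hypothetical toy snippets (NOT from the OWASP Benchmark); 1 = vulnerable, 0 = safe.
TRAIN = [
    ('String q = "SELECT * FROM users WHERE id=" + request.getParameter("id"); stmt.executeQuery(q);', 1),
    ('Runtime.getRuntime().exec("ls " + request.getParameter("dir"));', 1),
    ('PreparedStatement ps = conn.prepareStatement("SELECT * FROM users WHERE id=?"); ps.setString(1, id);', 0),
    ('String name = ESAPI.encoder().encodeForHTML(input); response.getWriter().println(name);', 0),
]


def tokenize(code):
    """Split source text into lowercased identifier-like tokens (an assumed tokenizer)."""
    return [t.lower() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code)]


def fit_idf(docs):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}


def tfidf(doc, idf):
    """L2-normalised sparse TF-IDF vector; tokens unseen in training are dropped."""
    tf = Counter(tok for tok in doc if tok in idf)
    vec = {tok: count * idf[tok] for tok, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}


def cosine(a, b):
    """Dot product of two L2-normalised sparse vectors, i.e. their cosine similarity."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def predict(code, train_vecs, labels, idf):
    """Return the label of the most similar training snippet (1-nearest neighbour)."""
    query_vec = tfidf(tokenize(code), idf)
    best = max(range(len(train_vecs)), key=lambda i: cosine(query_vec, train_vecs[i]))
    return labels[best]


train_docs = [tokenize(code) for code, _ in TRAIN]
labels = [label for _, label in TRAIN]
idf = fit_idf(train_docs)
train_vecs = [tfidf(doc, idf) for doc in train_docs]

query = 'stmt.executeQuery("SELECT * FROM t WHERE id=" + request.getParameter("id"));'
print(predict(query, train_vecs, labels, idf))
```

With only four training snippets the result says nothing about real-world accuracy; the sketch only shows how source text becomes a weighted token vector that a classifier can compare.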
Author Mandal, Dilek
Kösesoy, Irfan
Author_xml – sequence: 1
  givenname: Dilek
  surname: Mandal
  fullname: Mandal, Dilek
  email: mandaldilek@gmail.com
  organization: Kocaeli University, Computer Engineering, Kocaeli, Turkey
– sequence: 2
  givenname: Irfan
  surname: Kösesoy
  fullname: Kösesoy, Irfan
  email: irfan.kosesoy@kocaeli.edu.tr
  organization: Kocaeli University, Software Engineering, Kocaeli, Turkey
ContentType Conference Proceeding
DOI 10.1109/ASYU58738.2023.10296747
EISBN 9798350306590
EISSN 2770-7946
EndPage 6
ExternalDocumentID 10296747
Genre orig-research
IsPeerReviewed false
IsScholarly false
Language English
PageCount 6
PublicationCentury 2000
PublicationDate 2023-Oct.-11
PublicationDateYYYYMMDD 2023-10-11
PublicationDate_xml – month: 10
  year: 2023
  text: 2023-Oct.-11
  day: 11
PublicationDecade 2020
PublicationTitle 2023 Innovations in Intelligent Systems and Applications Conference (ASYU)
PublicationTitleAbbrev ASYU
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 1
SubjectTerms AST Tree
Doc2Vec
Feature extraction
Logistic regression
Machine Learning Algorithms
Prediction algorithms
Software
Software algorithms
Software Vulnerability
Source coding
Support vector machines
TF-IDF
Title Prediction of Software Security Vulnerabilities from Source Code Using Machine Learning Methods
URI https://ieeexplore.ieee.org/document/10296747