Enhanced Accuracy and Prediction of Novel Random Forest Algorithm Compared over K-Nearest Neighbor in Software Bug Prediction System


Bibliographic Details
Published in: 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), pp. 1-6
Main Authors: Bharath, KS; Nagalakshmi, T. J.; Meenakshisundaram, N.
Format: Conference Proceeding
Language: English
Published: IEEE, 18.04.2024
Summary: This research compares the effectiveness of the traditional K-Nearest Neighbor (KNN) method with a new approach based on random forests (RF) for detecting software bugs (SB). The main goal is to evaluate how well the proposed technique finds and predicts software problems while keeping security risks to a minimum. To gather information on software projects and their associated bug reports, we use the Eclipse Bug Data dataset. This publicly available dataset contains information about the number of lines of code, the complexity of the code, and the number of developers who participated in the project. To evaluate the two approaches, the dataset is randomly split into two groups, each consisting of 10 samples. We train and test the KNN technique with one group, and the RF-based approach with the other. Both approaches are implemented in Python with the scikit-learn machine learning (ML) package. The study used statistical power (G-power) = 0.85, alpha = 0.5, beta = 0.2, and a 95% confidence interval to support the validity and reliability of the findings. In the experiments, the proposed RF-based technique reached 78.59% accuracy, compared with 75.59% for the KNN method. According to these results, the RF-based approach reduced security risks better than the KNN technique for SB prediction. The significance value obtained was 0.000 (p<0.05), indicating statistical significance. The research indicates that the RF-based approach performs better when dealing with missing data and high-dimensional data, both of which are common in software issue prediction.
DOI:10.1109/ADICS58448.2024.10533477
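
The comparison described in the summary can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes a synthetic dataset from scikit-learn's `make_classification` as a stand-in for the Eclipse Bug Data dataset, and it takes ten repeated train/test runs per model as one possible reading of "10 samples" per group. Unlike the study, which gives each method its own group, the sketch evaluates both models on the same splits. The function name `compare_knn_and_rf` and all hyperparameters are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def compare_knn_and_rf(n_runs=10, seed=0):
    """Return per-run test accuracies for KNN and RF on synthetic data."""
    # Assumption: synthetic stand-in for bug data. Rows play the role of
    # software modules, columns of numeric code metrics, label 1 = "buggy".
    X, y = make_classification(
        n_samples=500, n_features=10, n_informative=5, random_state=seed
    )
    scores = {"knn": [], "rf": []}
    for run in range(n_runs):
        # A fresh stratified split per run, shared by both models.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, stratify=y, random_state=seed + run
        )
        # Hyperparameters are illustrative defaults, not values from the paper.
        models = {
            "knn": KNeighborsClassifier(n_neighbors=5),
            "rf": RandomForestClassifier(n_estimators=100, random_state=seed + run),
        }
        for name, model in models.items():
            model.fit(X_train, y_train)
            scores[name].append(accuracy_score(y_test, model.predict(X_test)))
    return scores


if __name__ == "__main__":
    results = compare_knn_and_rf()
    for name, accs in results.items():
        print(f"{name}: mean accuracy = {sum(accs) / len(accs):.4f}")
```

The accuracies printed by this sketch depend entirely on the synthetic data and are not comparable to the 75.59% and 78.59% reported in the summary.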