A comparative analysis between two techniques for the prediction of software defects: fuzzy and statistical linear regression

Bibliographic Details
Published in	Innovations in systems and software engineering Vol. 11; no. 4; pp. 277 - 287
Main Author	Valles-Barajas, Fernando
Format	Journal Article
Language	English
Published	London: Springer London, 01.12.2015
Subjects	Artificial Intelligence; Computer Applications; Computer Science; Original Paper; Software Engineering; Software defect prediction; Fuzzy linear regression; Statistical linear regression
Summary:	Software engineers should estimate the resources (time, people, and software tools, among others) needed to satisfy software project requirements; this activity is carried out in the planning phase. The estimated development time is needed to establish the cost of a software project and to assign human resources to each of its phases. Most companies fail to finish software projects on time because they use a poor estimation technique or none at all. The estimated time must include the time spent eliminating software defects injected during each of the software phases. A comparative analysis of two techniques for software defect estimation (fuzzy linear regression and statistical linear regression) is presented. These two techniques model uncertainty in different ways: statistical linear regression models uncertainty as randomness, whereas fuzzy linear regression models uncertainty as fuzziness. The main objective of this paper was to establish the kind of uncertainty associated with software defect prediction and to contrast these two prediction techniques. The KC1 NASA data set was used for this analysis. Only six of the metrics included in the KC1 data set, together with the lines-of-code metric, were used in the comparison. Descriptive statistics were first used to obtain an overview of the main characteristics of the data set. Linearity between the predictor variables and the variable of interest (number of defects) was checked using scatter plots and Pearson’s correlation coefficient. Multicollinearity was then checked using the inter-correlations among metrics and the variance inflation factor. Best subset regression was applied to identify the most influential subset of predictor variables; this subset was later used to build the fuzzy and statistical regression models. Linearity between the metrics and the number of defects was confirmed. Multicollinearity was not detected among the predictor variables. Best subset regression found that a subset of 5 variables was the most influential. The analysis showed that the statistical regression model in general outperformed the fuzzy regression model. Software defect prediction techniques should be employed carefully in order to produce quality plans. Software engineers should understand a range of prediction techniques and know their strengths and weaknesses. At least for the KC1 data set, the uncertainty in the software defect prediction model is due to randomness, so it is reasonable to use statistical linear regression instead of fuzzy linear regression to build a prediction model.
ISSN:	1614-5046 1614-5054
DOI:	10.1007/s11334-015-0256-4
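
The statistical steps named in the summary (Pearson's correlation coefficient, the variance inflation factor, and best subset regression) can be illustrated with a minimal, dependency-free Python sketch. Everything in it is an assumption made for illustration: the data are synthetic and are not the KC1 data set, the function names (`pearson_r`, `fit_ols`, `vif`, `best_subset`, `make_synthetic`) are hypothetical, and adjusted R-squared is used as the subset selection criterion only because the record does not say which criterion the paper applied. The fuzzy linear regression model is not shown.

```python
import itertools
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(columns, y):
    """Least-squares fit of y on the predictor columns plus an intercept.

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif(columns, j):
    """Variance inflation factor of predictor j: 1 / (1 - R^2 of x_j on the others)."""
    others = [c for i, c in enumerate(columns) if i != j]
    _, r2 = fit_ols(others, columns[j])
    return 1.0 / (1.0 - r2)


def best_subset(columns, y):
    """Try every non-empty predictor subset; return (adjusted R^2, indices) of the best."""
    n = len(y)
    best = None
    for size in range(1, len(columns) + 1):
        for idx in itertools.combinations(range(len(columns)), size):
            _, r2 = fit_ols([columns[i] for i in idx], y)
            adj = 1.0 - (1.0 - r2) * (n - 1) / (n - size - 1)
            if best is None or adj > best[0]:
                best = (adj, idx)
    return best


def make_synthetic(n=200, seed=0):
    """Synthetic stand-in data (NOT KC1): y depends on the first two columns only."""
    rng = random.Random(seed)
    cols = [[rng.uniform(0, 10) for _ in range(n)] for _ in range(3)]
    y = [2.0 + 0.5 * a + 0.3 * b + rng.gauss(0, 0.5) for a, b in zip(cols[0], cols[1])]
    return cols, y


if __name__ == "__main__":
    cols, y = make_synthetic()
    print("Pearson r per predictor:", [round(pearson_r(c, y), 3) for c in cols])
    print("VIF per predictor:", [round(vif(cols, j), 3) for j in range(len(cols))])
    print("Best subset (adjusted R^2, indices):", best_subset(cols, y))
```

A VIF close to 1 indicates that a predictor is nearly uncorrelated with the others; large values signal multicollinearity. Exhaustive subset search grows exponentially with the number of predictors, which stays manageable for a handful of metrics such as the seven mentioned in the summary.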