Improving software fault-prediction for imbalanced data

Bibliographic Details
Published in	2012 International Conference on Innovations in Information Technology, pp. 54-59
Main Author	Shatnawi, R.
Format	Conference Proceeding
Language	English
Published	IEEE 01.03.2012
Subjects	CK metrics; data mining; Data models; fault-proneness; imbalanced data; Measurement; Object oriented modeling; Predictive models; ROC curve; Software engineering; Software quality
ISBN	9781467311007, 1467311006
DOI	10.1109/INNOVATIONS.2012.6207774

Summary:	Fault-proneness has been studied extensively as a quality factor. Predicting the fault-proneness of software modules can help software engineers plan the evolution of a system. Such a plan can be compromised if the prediction models are biased or do not achieve high prediction performance. One major issue that can affect prediction performance is the fault distribution, in particular data imbalance: the majority of modules are fault-free, whereas only a minority are faulty. In this paper, we propose using the fault content (i.e., the number of faults in a module) to oversample the minority class. We applied this technique to a large object-oriented system, Eclipse. The proposed oversampling was tested on three classifiers. The results show better prediction performance than traditional oversampling techniques. The technique is also more convenient than other sampling techniques because it is guided by information taken from the software's history.
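
The oversampling idea described in the summary can be illustrated with a minimal Python sketch. The summary does not state the exact replication rule, so the sketch assumes that each faulty module is repeated once per recorded fault and that fault-free modules are kept once. The function name, the data layout, and the metric values are hypothetical and are not taken from the paper.

```python
from collections import Counter


def oversample_by_fault_count(modules):
    """Build a binary-labelled training list from (metrics, fault_count) pairs.

    Assumption (not confirmed by the source): a module with k > 0 faults
    is included k times with label 1; a fault-free module is included
    once with label 0.
    """
    resampled = []
    for metrics, faults in modules:
        label = 1 if faults > 0 else 0   # 1 = faulty, 0 = fault-free
        copies = max(1, faults)          # fault content drives replication
        resampled.extend([(metrics, label)] * copies)
    return resampled


# Hypothetical modules: (metric vector, number of faults found in the module)
modules = [
    ((12, 3), 0),
    ((40, 9), 3),
    ((8, 1), 0),
    ((25, 6), 1),
    ((5, 2), 0),
]

balanced = oversample_by_fault_count(modules)
class_counts = Counter(label for _, label in balanced)
print(class_counts)
```

In this toy data the faulty class grows from two modules to four training rows while the fault-free class stays at three, so modules with more recorded faults carry more weight when a classifier is trained on the resampled list.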