Research on Software Defect Prediction Based on Integrated Hybrid Sampling
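This paper studies the class-imbalance problem in software defect prediction and proposes a sampling method for imbalanced data. The method targets the loss of classifier performance caused by an imbalanced class distribution in the training set. To avoid the blindness of random sampling, a heuristic hybrid sampling method is used to balance the data: SMOTE over-sampling is applied to the minority class and K-Means clustering-based under-sampling to the majority class. Several individual classifiers are then combined by voting to produce an ensemble prediction. Experimental results show that software defect prediction combining hybrid sampling with ensemble learning classifies well, achieving a higher recall while significantly reducing the false alarm rate.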
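The abstract does not give the sampling parameters, so the following is only a minimal sketch of the hybrid sampling step under stated assumptions: label 1 marks defective (minority) modules; both classes are resampled to a hypothetical common size halfway between the original class counts; SMOTE uses k = 5 neighbours; and the K-Means under-sampling keeps, for each centroid, the nearest real majority sample. All function names (`smote`, `kmeans_undersample`, `hybrid_sample`) are illustrative, and the code uses only the Python standard library for readability rather than speed.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=5, rng=None):
    """Basic SMOTE: create n_new synthetic minority samples, each on the segment
    between a random minority sample and one of its k nearest minority neighbours.
    Needs at least two minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((m for m in minority if m is not base),
                            key=lambda m: euclidean(base, m))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def kmeans(points, k, iters=20, rng=None):
    """Plain Lloyd's K-Means; returns the k centroids."""
    rng = rng or random.Random(0)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def kmeans_undersample(majority, n_keep, rng=None):
    """Cluster the majority class into n_keep clusters and keep, for each centroid,
    the closest real majority sample not already kept (assumed variant)."""
    centroids = kmeans(majority, n_keep, rng=rng)
    remaining = list(range(len(majority)))
    kept = []
    for c in centroids:
        j = min(remaining, key=lambda i: euclidean(majority[i], c))
        remaining.remove(j)
        kept.append(majority[j])
    return kept


def hybrid_sample(X, y, rng=None):
    """Balance a binary dataset (1 = defective/minority, 0 = clean/majority).
    Assumption: both classes end up at the midpoint of the two original counts."""
    rng = rng or random.Random(0)
    minority = [x for x, t in zip(X, y) if t == 1]
    majority = [x for x, t in zip(X, y) if t == 0]
    target = (len(minority) + len(majority)) // 2
    new_min = minority + smote(minority, target - len(minority), rng=rng)
    new_maj = kmeans_undersample(majority, target, rng=rng)
    return new_min + new_maj, [1] * len(new_min) + [0] * len(new_maj)


# Toy usage: 9 clean modules, 3 defective modules, two metrics each.
X_maj = [[float(i), float(i % 3)] for i in range(9)]
X_min = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.0]]
X_bal, y_bal = hybrid_sample(X_maj + X_min, [0] * 9 + [1] * 3)
print(y_bal.count(1), y_bal.count(0))  # 6 6
```

Keeping one real sample per cluster, rather than the centroid itself, is one common form of cluster-based under-sampling; using the centroids directly is another. The abstract does not say which variant, balance ratio, or neighbour count the paper uses.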
Published in | Computer Engineering & Science (计算机工程与科学), Vol. 37, No. 5, pp. 930–936 |
---|---|
Main Author | DAI Xiang, MAO Yu-guang |
Format | Journal Article |
Language | Chinese |
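Published | 2015 |
Affiliations | College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 210016; State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210093 |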
ISSN | 1007-130X |
DOI | 10.3969/j.issn.1007-130X.2015.05.012 |
Bibliography: | We study the class-imbalance problem in software defect prediction and propose an integrated sampling method for classifying class-imbalanced data, so as to improve classification ability. To avoid the blindness of random sampling, we balance the datasets with this integrated sampling method, using SMOTE to over-sample the minority class and K-Means clustering to under-sample the majority class. After obtaining a balanced dataset, we combine multiple single classifiers through ensemble learning. Experimental results show that the software defect prediction algorithm combining integrated sampling with ensemble learning achieves better classification performance, obtaining a higher true positive rate while significantly reducing the false alarm rate. |
Keywords: | unbalanced dataset; SMOTE; K-Means; vote; ensemble learning |
CN: | 43-1258/TP |
Authors: | DAI Xiang, MAO Yu-guang (1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016; 2. State Key Laboratory for Novel So…) |
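The abstract states only that several single classifiers are combined by voting and that results are judged by the true positive rate and the false alarm rate. The sketch below assumes plain majority (hard) voting over three hypothetical base learners (a nearest-centroid rule, k-NN, and a one-feature threshold stump), none of which are named in the source. It computes the two reported figures as PD = TP / (TP + FN) (recall on the defective class) and PF = FP / (FP + TN) (false alarm rate). All function names are illustrative.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_nearest_centroid(X, y):
    """Hypothetical base learner: predict the class whose mean vector is closest."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda c: euclidean(x, centroids[c]))


def fit_knn(X, y, k=3):
    """Hypothetical base learner: majority label among the k nearest training samples."""
    def predict(x):
        nearest = sorted(zip(X, y), key=lambda pair: euclidean(x, pair[0]))[:k]
        return Counter(t for _, t in nearest).most_common(1)[0][0]
    return predict


def fit_stump(X, y, feature=0):
    """Hypothetical base learner: threshold one feature halfway between the class means."""
    def class_mean(label):
        values = [x[feature] for x, t in zip(X, y) if t == label]
        return sum(values) / len(values)
    m0, m1 = class_mean(0), class_mean(1)
    threshold = (m0 + m1) / 2
    # Predict "defective" (1) on whichever side of the threshold the defective mean lies.
    return lambda x: int((x[feature] > threshold) == (m1 > m0))


def majority_vote(models, x):
    """Hard voting: return the label predicted by the most base learners."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


def pd_pf(y_true, y_pred):
    """Return (PD, PF): recall on the defective class and false alarm rate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pd = tp / (tp + fn) if tp + fn else 0.0
    pf = fp / (fp + tn) if fp + tn else 0.0
    return pd, pf


# Toy usage on an already-balanced set (label 1 = defective). Predicting on the
# training data only keeps the example short; a real evaluation needs held-out data.
X = [[0.0, 0.0], [1.0, 0.5], [0.5, 1.0], [5.0, 5.0], [6.0, 5.5], [5.5, 6.0]]
y = [0, 0, 0, 1, 1, 1]
models = [fit_nearest_centroid(X, y), fit_knn(X, y), fit_stump(X, y)]
preds = [majority_vote(models, x) for x in X]
print(pd_pf(y, preds))  # (1.0, 0.0) on this separable toy data
```

An odd number of voters avoids ties for a two-class problem. Soft voting (averaging predicted probabilities) is a common alternative, but the abstract does not indicate which rule the paper uses.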