Research on software fault detection based on PU learning
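Software fault data typically contain relatively few positive examples, and most samples are difficult to label, yet the unlabeled samples carry a great deal of information useful for building a fault detection model. For this setting, a semi-supervised learning method is proposed that builds a classification model from only positive and unlabeled data to detect faults during software development. First, the synthetic minority oversampling technique (SMOTE) is used to oversample the positive examples and balance the class distribution of the dataset. On this basis, a positive set and an unlabeled set are constructed, and POSC 4.5 together with Bagging is used to build a decision tree ensemble classifier for software faults. Comparative experiments on 12 datasets from the NASA MDP repository show that modeling with only positive and unlabeled data achieves a software fault detection rate close to that of supervised learning, and that the ensemble classifier achieves a higher detection rate than a single classifier; the unlabeled sample...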
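The oversampling step mentioned in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic positive is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. The function name `smote`, its parameters, and the toy data are assumptions made for illustration; the record does not give the authors' implementation or settings.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority neighbours (core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [m for m in minority if m is not x]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from x to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic

# Toy "faulty module" feature vectors (hypothetical metric values).
faulty = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(faulty, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two real positives, it stays inside the region already occupied by the minority class, which is why the technique balances the class distribution without simply duplicating samples.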
Published in | Application Research of Computers (计算机应用研究) Vol. 32; no. 11; pp. 3324 - 3327 |
---|---|
Main Author | |
Format | Journal Article |
Language | Chinese |
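Published | College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100; College of Mechanical & Electronic Engineering, Northwest A&F University, Yangling, Shaanxi 712100, 2015 |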
Subjects | |
ISSN | 1001-3695 |
DOI | 10.3969/j.issn.1001-3695.2015.11.028 |
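Summary: | Software fault data typically contain relatively few positive examples, and most samples are difficult to label, yet the unlabeled samples carry a great deal of information useful for building a fault detection model. For this setting, a semi-supervised learning method is proposed that builds a classification model from only positive and unlabeled data to detect faults during software development. First, the synthetic minority oversampling technique (SMOTE) is used to oversample the positive examples and balance the class distribution of the dataset. On this basis, a positive set and an unlabeled set are constructed, and POSC 4.5 together with Bagging is used to build a decision tree ensemble classifier for software faults. Comparative experiments on 12 datasets from the NASA MDP repository show that modeling with only positive and unlabeled data achieves a software fault detection rate close to that of supervised learning, that the ensemble classifier achieves a higher detection rate than a single classifier, and that the size of the unlabeled sample set also affects the software fault detection rate. |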
---|---|
Bibliography: | Keywords: software fault prediction; PU learning; unbalanced data; decision tree; ensemble classifier. 51-1196/TP. Zhang He, Li Mei, Zhang Yang, Cai Xiaoyan (a. College of Information Engineering, b. College of Mechanical & Electronic Engineering, Northwest A&F University, Yangling, Shaanxi 712100, China). In software fault datasets, it is highly likely that only a small set of labeled positive data is available and that most of the data is hard to label, even though the unlabeled data contains a great deal of useful information for building a prediction model for software fault detection. This paper proposes a semi-supervised classification model that predicts faults during the software development process using only positive and unlabeled data. The proposed method first uses SMOTE (synthetic minority oversampling technique) to balance the class distribution by oversampling the rare positive data. It then partitions the resulting dataset into a positive subset and an unlabeled subset. Third, it uses the POSC 4.5 algorithm |
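To make the idea of an ensemble built from only positive and unlabeled data more concrete, the following is a rough sketch of a bagging-style PU learner. It is not POSC 4.5 and not the authors' method: each round bootstraps the unlabeled set, treats the draw as provisional negatives, and trains a one-level decision stump that stands in for a C4.5-style tree. All function names and the toy data are hypothetical.

```python
import random

def fit_stump(X, y):
    """Return the (feature, threshold, polarity) with the fewest training errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                # polarity 1 predicts positive when value >= t, -1 when value < t
                errors = sum(
                    ((row[f] >= t) if polarity == 1 else (row[f] < t)) != bool(label)
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, polarity)
    return best[1:]

def stump_predict(stump, row):
    f, t, polarity = stump
    return int(row[f] >= t) if polarity == 1 else int(row[f] < t)

def pu_bagging(positives, unlabeled, n_rounds=25, seed=0):
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_rounds):
        # Bootstrap the unlabeled set and treat the draw as provisional negatives.
        negatives = [rng.choice(unlabeled) for _ in range(len(positives))]
        X = positives + negatives
        y = [1] * len(positives) + [0] * len(negatives)
        stumps.append(fit_stump(X, y))
    return stumps

def fault_score(stumps, row):
    """Fraction of ensemble members that vote 'faulty' for this row."""
    return sum(stump_predict(s, row) for s in stumps) / len(stumps)

# Toy data: labeled faulty modules and unlabeled modules (one hidden positive).
positives = [[9.0, 40.0], [8.0, 35.0], [10.0, 50.0], [7.5, 45.0]]
unlabeled = [[1.0, 5.0], [2.0, 8.0], [1.5, 6.0], [8.5, 42.0], [2.5, 7.0], [1.0, 4.0]]
ensemble = pu_bagging(positives, unlabeled)
```

Averaging votes over many bootstrap rounds dampens the effect of hidden positives that happen to be drawn as provisional negatives in any single round, which is one plausible reason an ensemble can outperform a single classifier in this setting.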