Research on software defect prediction based on integrated sampling and ensemble learning (基于集成混合采样的软件缺陷预测研究)

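This paper studies the class-imbalance problem in software defect prediction and proposes a sampling method for imbalanced data, addressing the drop in classifier performance caused by class imbalance in the sample set. To avoid the blindness of random sampling, a heuristic hybrid sampling method is used to balance the data: SMOTE over-sampling is applied to the minority class and K-Means clustering under-sampling to the majority class. Multiple single classifiers are then combined by voting into an ensemble for prediction. Experimental results show that the software defect prediction method combining hybrid sampling with ensemble learning classifies well, achieving high recall while significantly reducing the false alarm rate.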
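The hybrid sampling step described in the abstract can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the authors' implementation: the function names (`smote`, `kmeans_undersample`, `hybrid_sample`), the neighbour count `k=3`, the iteration count, and the choice of keeping the real sample nearest to each cluster centroid are all assumptions made for the example.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist2(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

def kmeans_undersample(majority, n_keep, iters=20, rng=None):
    """Cluster the majority class into n_keep groups (plain K-Means) and keep,
    for each non-empty cluster, the real sample closest to its centroid."""
    rng = rng or random.Random(0)
    centroids = [list(p) for p in rng.sample(majority, n_keep)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in majority:
            nearest = min(range(n_keep), key=lambda c: dist2(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return [min(members, key=lambda p: dist2(p, centroids[c]))
            for c, members in enumerate(clusters) if members]

def hybrid_sample(minority, majority, target, rng=None):
    """Over-sample the minority class up to `target` samples and
    under-sample the majority class down to at most `target` samples."""
    rng = rng or random.Random(0)
    new_minority = minority + smote(minority, target - len(minority), rng=rng)
    new_majority = kmeans_undersample(majority, target, rng=rng)
    return new_minority, new_majority
```

In practice one would more likely rely on maintained implementations (for example the SMOTE class in imbalanced-learn and KMeans in scikit-learn); the sketch only makes the two sampling directions concrete. It needs at least two minority samples and at least `target` majority samples.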

Bibliographic Details
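Published in Computer Engineering & Science (计算机工程与科学) Vol. 37; no. 5; pp. 930-936
Main Authors DAI Xiang (戴翔), MAO Yu-guang (毛宇光)
Format Journal Article
Language Chinese
Published State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210093; College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 210016 (2015)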
ISSN 1007-130X
DOI 10.3969/j.issn.1007-130X.2015.05.012

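Author DAI Xiang (戴翔); MAO Yu-guang (毛宇光)
AuthorAffiliation College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 210016; State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210093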
Author_FL DAI Xiang
MAO Yu-guang
ClassificationCodes TP306
ContentType Journal Article
Copyright Copyright © Wanfang Data Co. Ltd. All Rights Reserved.
DocumentTitleAlternate Research on software defect prediction based on integrated sampling and ensemble learning
GrantInformation Funder: National Natural Science Foundation of China (国家自然科学基金资助项目); funder ID: 41301407
IsPeerReviewed true
IsScholarly true
Issue 5
Keywords unbalanced dataset; SMOTE; K-Means; vote; ensemble learning
Notes We study the class-imbalance problem in software defect prediction and propose an integrated sampling method for classifying class-imbalanced data, so as to improve classification ability. To avoid the blindness of random sampling, we use the integrated sampling method to balance datasets, applying SMOTE to over-sample the minority class and K-Means clustering to down-sample the majority class. After obtaining a balanced dataset, we combine multiple single classifiers through ensemble learning. Experimental results show that the software defect prediction algorithm, which combines integrated sampling and ensemble learning, has better classification performance, obtaining a higher true positive rate while significantly reducing the false alarm rate.
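The voting step and the two reported metrics can be illustrated with a short sketch. The record does not define the metrics or the voting rule, so this assumes the usual conventions: simple majority voting over the classifiers' predicted labels, true positive rate (recall) = TP / (TP + FN), and false alarm rate = FP / (FP + TN), with label 1 meaning "defective". The function names are hypothetical.

```python
from collections import Counter

def majority_vote(votes_per_classifier):
    """Combine predictions by simple majority voting.

    votes_per_classifier holds one list of predicted labels per classifier,
    all of the same length. An odd number of classifiers avoids ties on
    binary labels; on a tie, the label seen first among the votes wins."""
    return [Counter(sample_votes).most_common(1)[0][0]
            for sample_votes in zip(*votes_per_classifier)]

def recall_and_false_alarm(y_true, y_pred, positive=1):
    """Return (recall, false alarm rate) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    false_alarm = fp / (fp + tn) if fp + tn else 0.0
    return recall, false_alarm

# Three hypothetical classifiers voting on four modules.
votes = [[1, 0, 1, 0],
         [1, 1, 0, 0],
         [0, 1, 1, 0]]
ensemble = majority_vote(votes)
print(ensemble)                                        # [1, 1, 1, 0]
print(recall_and_false_alarm([1, 0, 1, 0], ensemble))  # (1.0, 0.5)
```

A higher recall with a lower false alarm rate is the combination the abstract reports for the ensemble trained on the balanced data.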
unbalanced dataset ; SMOTE ; K-Means ; vote ; ensemble learning
43-1258/TP
DAI Xiang , MAO Yu-guang ( 1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016; 2. State Key Laboratory for Novel So
PageCount 7
PublicationTitle 计算机工程与科学
PublicationTitleAlternate Computer Engineering & Science
PublicationTitle_FL Computer Engineering and Science
PublicationYear 2015
StartPage 930
SubjectTerms K-Means
SMOTE
unbalanced dataset
vote
ensemble learning
Title Research on software defect prediction based on integrated sampling and ensemble learning (基于集成混合采样的软件缺陷预测研究)
URI http://lib.cqvip.com/qk/94293X/201505/664752935.html
https://d.wanfangdata.com.cn/periodical/jsjgcykx201505012
Volume 37