SPAW-SMOTE: Space Partitioning Adaptive Weighted Synthetic Minority Oversampling Technique For Imbalanced Data Set Learning

Bibliographic Details
Published in Computer journal Vol. 67; no. 5; pp. 1747-1762
Main Authors Zhang, Qiang; He, Junjiang; Li, Tao; Lan, Xiaolong; Fang, Wenbo; Li, Yihong
Format Journal Article
Language English
Published Oxford University Press 22.06.2024

Abstract The problem of data imbalance is common in practice and greatly affects the performance of classifiers. Most solutions balance the data set by generating new minority class samples, but they face three problems: selecting an appropriate region in which to generate samples, blurred classification boundaries and uneven distribution of samples. To solve these problems, we propose a novel oversampling algorithm named space partitioning adaptive weighted synthetic minority oversampling technique (SPAW-SMOTE). We first divide the data space into boundary space and non-boundary space based on spatial partitioning techniques. The number of samples to be generated is then assigned to the different spaces by a purpose-designed adaptive weighting algorithm, which addresses the uneven distribution of samples and the tendency to blur the classification boundary. Finally, we develop a new generation algorithm that reduces the probability of overlapping samples when synthesizing new samples and ensures the diversity of the new samples. Experimental results on 18 real-world data sets show that the average performance (G-mean, F1-measure and Area Under Curve) of SPAW-SMOTE is significantly better than that of other existing oversampling techniques.
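The abstract names three stages (partition the space, allocate sample counts adaptively, generate new samples) but gives no formulas. The sketch below illustrates such a pipeline in plain Python under rules of our own choosing, not the paper's: a minority sample is treated as "boundary" if some but not all of its k nearest neighbours are majority samples, "non-boundary" if none are, and as noise (not used as a seed) if all are; each seed's share of the new samples is weighted by (m + 1)/(k + 1), where m is its majority-neighbour count; and boundary seeds interpolate at most halfway towards a minority neighbour, a hypothetical heuristic meant to keep new points away from the majority class. All function names are hypothetical; this is a SMOTE-style illustration, not the authors' SPAW-SMOTE.

```python
import math
import random


def knn_indices(point, data, k, exclude=None):
    """Indices of the k rows of `data` nearest to `point` (Euclidean), skipping `exclude`."""
    dists = sorted((math.dist(point, row), i) for i, row in enumerate(data) if i != exclude)
    return [i for _, i in dists[:k]]


def partition_minority(X, y, minority, k=5):
    """Hypothetical space partition: split minority seeds into boundary / non-boundary.

    Returns two lists of (index, majority_neighbour_count) pairs.
    """
    boundary, safe = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        if label != minority:
            continue
        neigh = knn_indices(x, X, k, exclude=i)
        n_major = sum(1 for j in neigh if y[j] != minority)
        if n_major == 0:
            safe.append((i, n_major))
        elif n_major < k:
            boundary.append((i, n_major))
        # n_major == k: assumed to be noise, so it is never used as a seed
    return boundary, safe


def allocate(seeds, n_new, k):
    """Hypothetical adaptive weighting: seeds with more majority neighbours get more samples."""
    weights = [(m + 1) / (k + 1) for _, m in seeds]
    total = sum(weights)
    raw = [n_new * w / total for w in weights]
    counts = [int(r) for r in raw]
    # Largest-remainder rounding so the integer counts sum exactly to n_new.
    order = sorted(range(len(raw)), key=lambda t: raw[t] - counts[t], reverse=True)
    for t in order[: n_new - sum(counts)]:
        counts[t] += 1
    return counts


def spaw_like_oversample(X, y, minority, n_new, k=5, rng=None):
    """Generate n_new synthetic minority points by SMOTE-style interpolation."""
    rng = rng or random.Random(0)
    boundary, safe = partition_minority(X, y, minority, k)
    seeds = boundary + safe
    if not seeds or n_new <= 0:
        return []
    counts = allocate(seeds, n_new, k)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    minority_X = [X[i] for i in minority_idx]
    n_boundary = len(boundary)
    synthetic = []
    for s, ((i, _), c) in enumerate(zip(seeds, counts)):
        neigh = knn_indices(X[i], minority_X, k, exclude=minority_idx.index(i))
        if not neigh:
            continue  # a lone minority sample has no partner to interpolate towards
        # Assumed heuristic: boundary seeds move at most halfway towards the neighbour.
        upper = 0.5 if s < n_boundary else 1.0
        for _ in range(c):
            partner = minority_X[rng.choice(neigh)]
            u = rng.uniform(0.0, upper)
            synthetic.append([a + u * (b - a) for a, b in zip(X[i], partner)])
    return synthetic
```

For example, `spaw_like_oversample(X, y, minority=1, n_new=10, k=3)` returns 10 synthetic minority points as lists of floats. The `+ 1` in the weight keeps non-boundary seeds in play, so generation is biased towards the boundary without being concentrated there alone.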
Author_xml – sequence: 1
  givenname: Qiang
  surname: Zhang
  fullname: Zhang, Qiang
– sequence: 2
  givenname: Junjiang
  surname: He
  fullname: He, Junjiang
  email: hejunjiang@scu.edu.cn
– sequence: 3
  givenname: Tao
  surname: Li
  fullname: Li, Tao
– sequence: 4
  givenname: Xiaolong
  surname: Lan
  fullname: Lan, Xiaolong
– sequence: 5
  givenname: Wenbo
  surname: Fang
  fullname: Fang, Wenbo
– sequence: 6
  givenname: Yihong
  surname: Li
  fullname: Li, Yihong
Cites_doi 10.1002/9781118548387
ContentType Journal Article
Copyright The British Computer Society 2023. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
DOI 10.1093/comjnl/bxad098
DatabaseName CrossRef
DatabaseTitle CrossRef
DatabaseTitleList CrossRef

DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 1460-2067
EndPage 1762
ExternalDocumentID 10_1093_comjnl_bxad098
10.1093/comjnl/bxad098
ISSN 0010-4620
IngestDate Tue Jul 01 02:55:11 EDT 2025
Mon Jun 30 08:34:52 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 5
Keywords imbalance data
adaptive spatial weight
classification
oversampling
Language English
License This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/pages/standard-publication-reuse-rights)
PageCount 16
PublicationCentury 2000
PublicationDate 2024-06-22
PublicationDecade 2020
PublicationTitle Computer journal
PublicationYear 2024
Publisher Oxford University Press
References
1. Kaur (2019). A systematic review on imbalanced data challenges in machine learning: applications and solutions.
2. Lin (2020). Text classification feature extraction method based on deep learning for unbalanced data sets. p. 320.
3. Vasighizaker (2018). C-PUGP: a cluster-based positive unlabeled learning method for disease gene prediction and prioritization. Vol. 76, p. 23.
4. Jurgovsky (2018). Sequence classification for credit-card fraud detection. Vol. 100, p. 234.
5. Liu (2021). A fast network intrusion detection system using adaptive synthetic oversampling and LightGBM. Vol. 106.
6. Haixiang (2017). Learning from class-imbalanced data: review of methods and applications. Vol. 73, p. 220.
7. Thabtah (2020). Data imbalance in classification: experimental evaluation. Vol. 513, p. 429.
8. Fernández (2018). SMOTE for learning from imbalanced data: progress and challenges, marking the 15-year anniversary. Vol. 61, p. 863.
9. Chawla (2002). SMOTE: synthetic minority over-sampling technique. Vol. 16, p. 321.
10. Tao (2021). SVDD boundary and DPC clustering technique-based oversampling approach for handling imbalanced and overlapped data. Vol. 234, p. 107588.
11. Han (2005). Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. p. 878.
12. He (2008). ADASYN: adaptive synthetic sampling approach for imbalanced learning. p. 1322.
13. Barua (2013). ProWSyn: proximity weighted synthetic oversampling technique for imbalanced data set learning. p. 317.
14. Tang (2015). KernelADASYN: kernel based adaptive synthetic data generation for imbalanced learning. p. 664.
15. Barua (2012). MWMOTE: majority weighted minority oversampling technique for imbalanced data set learning. Vol. 26, p. 405.
16. Pruengkarn (2017). Multiclass imbalanced classification using fuzzy c-mean and SMOTE with fuzzy support vector machine. p. 67.
17. Tao (2020). Adaptive weighted over-sampling for imbalanced datasets based on density peaks clustering with heuristic filtering. Vol. 519, p. 43.
18. Ma (2017). CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests. Vol. 18, p. 1.
19. Ijaz (2018). Hybrid prediction model for type 2 diabetes and hypertension using DBSCAN-based outlier detection, synthetic minority over sampling technique (SMOTE), and random forest. Vol. 8, p. 1325.
20. Douzas (2018). Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE. Vol. 465, p. 1.
21. Batista (2004). A study of the behavior of several methods for balancing machine learning training data. Vol. 6, p. 20.
22. Guan (2021). SMOTE-WENN: solving class imbalance and small sample problems by oversampling and distance scaling. Vol. 51, p. 1394.
23. Ramentol (2012). SMOTE-RSB*: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory. Vol. 33, p. 245.
24. Bispo (2018). Instance selection and class balancing techniques for cross project defect prediction. p. 552.
25. Sáez (2015). SMOTE-IPF: addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering. Vol. 291, p. 184.
26. Radwan (2017). Enhancing prediction on imbalance data by thresholding technique with noise filtering. p. 399.
27. Koziarski (2017). CCR: a combined cleaning and resampling algorithm for imbalanced data classification. Vol. 27, p. 727.
28. Li (2021). SP-SMOTE: a novel space partitioning based synthetic minority oversampling technique. Vol. 228, p. 107269.
29. Gazzah (2008). New oversampling approaches based on polynomial fitting for imbalanced data sets. p. 677.
30. Kovács (2019). An empirical comparison and evaluation of minority oversampling techniques on a large number of imbalanced datasets. Vol. 83.
31. Cortes (1995). Support-vector networks. Vol. 20, p. 273.
32. Hosmer (2013). Applied Logistic Regression, 3rd ed. doi:10.1002/9781118548387.
33. Pedregosa (2011). Scikit-learn: machine learning in Python. Vol. 12, p. 2825.
34. Alcalá-Fdez (2011). KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. Vol. 17.
SourceID crossref
oup
SourceType Index Database
Publisher
StartPage 1747
Volume 67