Clustering the imbalanced datasets using modified Kohonen self-organizing map (KSOM)

Bibliographic Details
Published in 2017 Computing Conference : 18-20 July 2017, pp. 751-755
Main Authors Ahmad, Azlin; Ismail, Mohd Najib; Yusoff, Rubiyah; Rosli, Nenny Ruthfalydia
Format Conference Proceeding
Language English
Published IEEE 01.07.2017
DOI 10.1109/SAI.2017.8252180

Abstract The distribution of data plays an important role in the success of the learning process in machine learning. Data sets with an imbalanced distribution may lead to biased results, especially in clustering. If the data are insufficient, the algorithm cannot form reliable clusters, which adds randomness to the grouping. Therefore, the KSOM algorithm is modified to improve the clustering process. The modification is based on the exploration and exploitation procedures of the Ant Clustering Algorithm (ACA). To investigate the effectiveness of the modified algorithm, three imbalanced data sets are chosen: glass, Wisconsin diagnostic breast cancer, and tropical wood. According to the results, the modified KSOM is able to produce an accurate number of clusters, reduce the number of overlapping clusters, and slightly improve accuracy.
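
For orientation, the following is a minimal sketch of a plain (unmodified) Kohonen self-organizing map, the baseline that the abstract says is being modified. It is not the authors' algorithm: the abstract does not describe how the ACA exploration and exploitation steps are integrated, so none of that is shown here. The function names (`train_som`, `best_matching_unit`, `make_imbalanced_toy`), the linear decay schedules, the grid size, and the toy data are all assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda pos: sum((wi - xi) ** 2 for wi, xi in zip(weights[pos], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a plain Kohonen SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # Initialise every node with a copy of a randomly chosen sample.
    weights = {
        (r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # assumed linear decay
            sigma = max(sigma0 * (1.0 - frac), 1e-3)  # assumed linear decay
            bmu = best_matching_unit(weights, x)
            for pos, w in weights.items():
                grid_d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def make_imbalanced_toy(n_major=40, n_minor=4, seed=1):
    """Hypothetical 2-D toy set: one large group and one small group."""
    rng = random.Random(seed)
    major = [[rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)] for _ in range(n_major)]
    minor = [[rng.gauss(5.0, 0.3), rng.gauss(5.0, 0.3)] for _ in range(n_minor)]
    return major + minor
```

Calling `train_som(make_imbalanced_toy(), rows=1, cols=2)` and then counting how many samples map to each node via `best_matching_unit` gives a rough feel for the issue the abstract raises: because nodes are updated once per sample, the majority group dominates training, and the minority group may or may not end up with a node of its own.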
Author_xml – sequence: 1
  givenname: Azlin
  surname: Ahmad
  fullname: Ahmad, Azlin
  email: azlin@tmsk.uitm.edu.my
  organization: Fac. of Comput. & Math. Sci., Univ. Teknol. MARA, Shah Alam, Malaysia
– sequence: 2
  givenname: Mohd Najib
  surname: Ismail
  fullname: Ismail, Mohd Najib
  email: najib.ismail@apu.edu.my
  organization: Asia Pacific Univ. of Technol. & Innovation, Kuala Lumpur, Malaysia
– sequence: 3
  givenname: Rubiyah
  surname: Yusoff
  fullname: Yusoff, Rubiyah
  email: rubiyah.kl@utm.my
  organization: Malaysia-Japan Int. Inst. of Technol., Univ. Teknol. Malaysia, Kuala Lumpur, Malaysia
– sequence: 4
  givenname: Nenny Ruthfalydia
  surname: Rosli
  fullname: Rosli, Nenny Ruthfalydia
  email: nenny.kl@utm.my
  organization: Malaysia-Japan Int. Inst. of Technol., Univ. Teknol. Malaysia, Kuala Lumpur, Malaysia
Discipline Computer Science
EISBN 9781509054435
150905443X
SubjectTerms Cancer
clustering
Clustering algorithms
Data mining
Data visualization
Feature extraction
Glass
imbalanced data set
Kohonen Self Organizing map (KSOM)
Neural Network
Self-organizing feature maps
URI https://ieeexplore.ieee.org/document/8252180