Clustering the imbalanced datasets using modified Kohonen self-organizing map (KSOM)
Published in | 2017 Computing Conference : 18-20 July 2017 pp. 751 - 755 |
---|---|
Main Authors | Ahmad, Azlin; Ismail, Mohd Najib; Yusoff, Rubiyah; Rosli, Nenny Ruthfalydia |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.07.2017 |
DOI | 10.1109/SAI.2017.8252180 |
Abstract | The distribution of data plays an important role in determining the success of the learning process in machine learning. Data sets with an imbalanced distribution may lead to biased results, especially in clustering. If the data are insufficient, the algorithm will not be able to form clusters, and this adds randomness to the grouping. Therefore, the KSOM algorithm is modified to improve the clustering process. The modification is based on the exploration and exploitation procedures of the Ant Clustering Algorithm (ACA). To investigate the effectiveness of the modified algorithm, three imbalanced data sets are chosen: glass, Wisconsin diagnostic breast cancer, and tropical wood. According to the results, the modified KSOM is able to produce an accurate number of clusters, reduce the number of overlapping clusters, and slightly improve the accuracy. |
---|---|
Author | Rosli, Nenny Ruthfalydia; Yusoff, Rubiyah; Ahmad, Azlin; Ismail, Mohd Najib |
Author_xml | 1. Ahmad, Azlin (azlin@tmsk.uitm.edu.my), Fac. of Comput. & Math. Sci., Univ. Teknol. MARA, Shah Alam, Malaysia; 2. Ismail, Mohd Najib (najib.ismail@apu.edu.my), Asia Pacific Univ. of Technol. & Innovation, Kuala Lumpur, Malaysia; 3. Yusoff, Rubiyah (rubiyah.kl@utm.my), Malaysia-Japan Int. Inst. of Technol., Univ. Teknol. Malaysia, Kuala Lumpur, Malaysia; 4. Rosli, Nenny Ruthfalydia (nenny.kl@utm.my), Malaysia-Japan Int. Inst. of Technol., Univ. Teknol. Malaysia, Kuala Lumpur, Malaysia |
ContentType | Conference Proceeding |
DBID | 6IE 6IL CBEJK RIE RIL |
DOI | 10.1109/SAI.2017.8252180 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISBN | 9781509054435; 150905443X |
EndPage | 755 |
ExternalDocumentID | 8252180 |
Genre | orig-research |
GroupedDBID | 6IE 6IF 6IK 6IL 6IN AAJGR AAWTH ABLEC ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK IEGSK OCL RIE RIL |
IEDL.DBID | RIE |
IngestDate | Wed Aug 27 02:51:19 EDT 2025 |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
PageCount | 5 |
ParticipantIDs | ieee_primary_8252180 |
PublicationCentury | 2000 |
PublicationDate | 2017-July |
PublicationDateYYYYMMDD | 2017-07-01 |
PublicationDate_xml | – month: 07 year: 2017 text: 2017-July |
PublicationDecade | 2010 |
PublicationTitle | 2017 Computing Conference : 18-20 July 2017 |
PublicationTitleAbbrev | SAI |
PublicationYear | 2017 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssj0001968361 |
Score | 1.7176205 |
SourceID | ieee |
SourceType | Publisher |
StartPage | 751 |
SubjectTerms | Cancer; clustering; Clustering algorithms; Data mining; Data visualization; Feature extraction; Glass; imbalanced data set; Kohonen Self Organizing map (KSOM); Neural Network; Self-organizing feature maps |
Title | Clustering the imbalanced datasets using modified Kohonen self-organizing map (KSOM) |
URI | https://ieeexplore.ieee.org/document/8252180 |
hasFullText | 1 |
linkProvider | IEEE |