Cluster validity analysis using subsampling

Bibliographic Details
Published in SMC'03 Conference Proceedings. 2003 IEEE International Conference on Systems, Man and Cybernetics. Conference Theme - System Security and Assurance (Cat. No.03CH37483), Vol. 2, pp. 1435-1440
Main Authors Abul, O., Lo, A., Alhajj, R., Polat, F., Barker, K.
Format Conference Proceeding
Language English
Published IEEE 2003
Abstract Cluster validity investigates whether generated clusters are true clusters or due to chance. This is usually done with subsampling-based stability analysis. A related problem is estimating the true number of clusters in a given dataset. A number of methods described in the literature address both purposes. In this paper, we propose three methods for estimating confidence in the validity of a clustering result. The first method validates a clustering result by employing supervised classifiers: the dataset is divided into training and test sets, and the accuracy of the classifier is evaluated on the test set. This method computes confidence in the generalization capability of the clustering. The second method is based on the fact that if a clustering is valid, then each of its subsets should be valid as well. The third method is similar to the second but takes the dual approach, i.e., each cluster is expected to be stable and compact. Confidence is estimated by repeating the process a number of times on subsamples. Experimental results illustrate the effectiveness of the proposed methods.
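The first method in the abstract (cluster, treat the cluster labels as classes, train a classifier on one part, and score it on the held-out part, repeated over subsamples) might be sketched roughly as follows. This is an illustration only, not the authors' implementation: the abstract does not name a clustering algorithm or a classifier, so plain k-means and a 1-nearest-neighbour classifier are assumed here, and every function name, parameter, and default fraction (`classifier_confidence`, `sample_frac`, `test_frac`, `rounds`) is hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_labels(points, k, iters=20, rng=None):
    """Plain Lloyd's k-means (an assumed choice); returns one label per point."""
    if rng is None:
        rng = random.Random(0)
    centers = rng.sample(points, k)  # k data points as initial centers
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster went empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def one_nn_predict(train_x, train_y, p):
    """1-nearest-neighbour classifier (an assumed choice of supervised model)."""
    i = min(range(len(train_x)), key=lambda t: sq_dist(p, train_x[t]))
    return train_y[i]

def classifier_confidence(points, k, rounds=10, sample_frac=0.8,
                          test_frac=0.3, seed=0):
    """Mean held-out accuracy of a classifier trained on cluster labels."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(rounds):
        # Draw a subsample and cluster it.
        sub = rng.sample(points, int(sample_frac * len(points)))
        labels = kmeans_labels(sub, k, rng=rng)
        # Split the clustered subsample into training and test indices.
        idx = list(range(len(sub)))
        rng.shuffle(idx)
        n_test = max(1, int(test_frac * len(sub)))
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        train_x = [sub[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        # Accuracy: how often the classifier reproduces the cluster label.
        correct = sum(one_nn_predict(train_x, train_y, sub[i]) == labels[i]
                      for i in test_idx)
        accuracies.append(correct / n_test)
    return sum(accuracies) / len(accuracies)
```

Under these assumptions, a value near 1 would suggest that the cluster assignment generalizes to unseen points, while a low value would suggest that the partition is hard to reproduce from the data. Points are passed as a list of equal-length tuples, for example `classifier_confidence(data, k=2)`.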
Author Affiliations
1. Abul, O.: Dept. Comput. Sci., Calgary Univ., Alta., Canada
2. Lo, A.: Dept. Comput. Sci., Calgary Univ., Alta., Canada
3. Alhajj, R.: Dept. Comput. Sci., Calgary Univ., Alta., Canada
4. Polat, F.
5. Barker, K.
DOI 10.1109/ICSMC.2003.1244614
Discipline Engineering
Sciences (General)
Computer Science
EISSN 2577-1655
ISBN 9780780379527
0780379527
ISSN 1062-922X
SubjectTerms Clustering algorithms
Computer science
Humans
Organizing
Pattern analysis
Sampling methods
Stability analysis
Testing
Visualization
URI https://ieeexplore.ieee.org/document/1244614
Volume 2