Cluster validity analysis using subsampling
Published in | SMC'03 Conference Proceedings. 2003 IEEE International Conference on Systems, Man and Cybernetics. Conference Theme - System Security and Assurance (Cat. No.03CH37483), Vol. 2, pp. 1435-1440 |
---|---|
Main Authors | Abul, O.; Lo, A.; Alhajj, R.; Polat, F.; Barker, K. |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 2003 |
Abstract | Cluster validity investigates whether generated clusters are true clusters or due to chance. This is usually assessed with subsampling stability analysis. A related problem is estimating the true number of clusters in a given dataset. The literature describes a number of methods for both purposes. In this paper, we propose three methods for estimating confidence in the validity of a clustering result. The first method validates a clustering result by employing supervised classifiers: the dataset is divided into training and test sets, and the accuracy of the classifier is evaluated on the test set. This method computes confidence in the generalization capability of the clustering. The second method is based on the premise that if a clustering is valid, then each of its subsets should be valid as well. The third method is similar to the second but takes the dual approach, i.e., each cluster is expected to be stable and compact. Confidence is estimated by repeating the process a number of times on subsamples. Experimental results illustrate the effectiveness of the proposed methods. |
---|---|
Author | Alhajj, R.; Lo, A.; Barker, K.; Abul, O.; Polat, F. |
Author affiliations | 1. Abul, O. (Dept. Comput. Sci., Calgary Univ., Alta., Canada); 2. Lo, A. (Dept. Comput. Sci., Calgary Univ., Alta., Canada); 3. Alhajj, R. (Dept. Comput. Sci., Calgary Univ., Alta., Canada); 4. Polat, F.; 5. Barker, K. |
DOI | 10.1109/ICSMC.2003.1244614 |
Discipline | Engineering Sciences (General); Computer Science |
EISSN | 2577-1655 |
EndPage | 1440 |
ISBN | 9780780379527; 0780379527 |
ISSN | 1062-922X |
IsPeerReviewed | false |
IsScholarly | true |
PublicationTitleAbbrev | ICSMC |
StartPage | 1435 |
SubjectTerms | Clustering algorithms; Computer science; Humans; Organizing; Pattern analysis; Sampling methods; Stability analysis; Testing; Visualization |
URI | https://ieeexplore.ieee.org/document/1244614 |
Volume | 2 |
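
The abstract gives no algorithmic detail, so what follows is only a minimal Python sketch of two ideas it names, under stated assumptions. `classifier_confidence` follows one reading of the first method: cluster the data, treat the cluster labels as classes, train a classifier on a random training split, and report accuracy on the held-out test split, averaged over repeated splits. `subsample_stability` is a generic subsampling stability score (cluster two random subsamples and count how often pairs of shared points are grouped consistently); it is not the paper's second or third method. The clusterer (k-means with farthest-first seeding), the supervised classifier (nearest-centroid), and every function name here are hypothetical choices, not taken from the paper.

```python
# Hypothetical sketch: k-means, the nearest-centroid classifier and the
# pair-agreement stability score are assumptions, not the paper's code.
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Sequence[float], centers: List[Point]) -> int:
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def mean(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def kmeans(data: List[Point], k: int, rng: random.Random, iters: int = 50) -> List[int]:
    """Lloyd's k-means with farthest-first seeding; returns one label per point."""
    centers = [tuple(rng.choice(data))]
    while len(centers) < k:
        # Next seed: the point farthest from every center chosen so far.
        centers.append(tuple(max(data, key=lambda p: min(sq_dist(p, c) for c in centers))))
    labels = [nearest(p, centers) for p in data]
    for _ in range(iters):
        groups = [[p for p, lab in zip(data, labels) if lab == j] for j in range(k)]
        # Recompute centers; an empty cluster keeps its previous center.
        centers = [mean(g) if g else centers[j] for j, g in enumerate(groups)]
        new_labels = [nearest(p, centers) for p in data]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def centroid_classifier(train: List[Point], labels: List[int]) -> Callable[[Point], int]:
    """Nearest-centroid classifier: a stand-in for any supervised model."""
    classes = sorted(set(labels))
    cents = [mean([p for p, lab in zip(train, labels) if lab == c]) for c in classes]
    return lambda p: classes[nearest(p, cents)]


def classifier_confidence(data: List[Point], k: int, runs: int = 20,
                          test_frac: float = 0.3, seed: int = 0) -> float:
    """Cluster all points, train on a random split using the cluster labels as
    classes, score accuracy on the held-out split, and average over runs."""
    rng = random.Random(seed)
    labels = kmeans(data, k, rng)
    idx = list(range(len(data)))
    n_test = max(1, int(len(data) * test_frac))
    total = 0.0
    for _ in range(runs):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        predict = centroid_classifier([data[i] for i in train], [labels[i] for i in train])
        total += sum(predict(data[i]) == labels[i] for i in test) / n_test
    return total / runs


def subsample_stability(data: List[Point], k: int, runs: int = 20,
                        frac: float = 0.8, seed: int = 0) -> float:
    """Cluster two random subsamples per run and return the mean fraction of
    shared point pairs whose same-cluster/different-cluster status agrees."""
    rng = random.Random(seed)
    n, m = len(data), int(len(data) * frac)
    total = 0.0
    for _ in range(runs):
        a, b = rng.sample(range(n), m), rng.sample(range(n), m)
        la = dict(zip(a, kmeans([data[i] for i in a], k, rng)))
        lb = dict(zip(b, kmeans([data[i] for i in b], k, rng)))
        common = sorted(set(a) & set(b))
        pairs = [(i, j) for x, i in enumerate(common) for j in common[x + 1:]]
        agree = sum((la[i] == la[j]) == (lb[i] == lb[j]) for i, j in pairs)
        total += agree / len(pairs) if pairs else 1.0
    return total / runs


if __name__ == "__main__":
    rng = random.Random(1)
    # Two well-separated synthetic blobs (illustrative data only).
    data = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(30)]
    data += [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(30)]
    for k in (2, 3, 4):
        print(k, classifier_confidence(data, k), subsample_stability(data, k))
```

On these two blobs, k = 2 yields 1.0 for both scores. Larger k forces an arbitrary split of one blob, which typically lowers the stability score; scanning k this way is one common heuristic for estimating the number of clusters. Farthest-first seeding is used only to keep the toy k-means from stalling in a poor local optimum.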