Global and Local Information in Clustering Labeled Block Models
Published in | IEEE transactions on information theory Vol. 62; no. 10; pp. 5906 - 5917 |
---|---|
Main Authors | Kanade, Varun; Mossel, Elchanan; Schramm, Tselil |
Format | Journal Article |
Language | English |
Published | New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.10.2016 |
Abstract | The stochastic block model is a classical cluster-exhibiting random graph model that has been widely studied in statistics, physics, and computer science. In its simplest form, the model is a random graph with two equal-sized clusters, with intracluster edge probability p and intercluster edge probability q. We focus on the sparse case, i.e., p, q = O(1/n), which is practically more relevant and also mathematically more challenging. A conjecture of Decelle, Krzakala, Moore, and Zdeborová, based on ideas from statistical physics, predicted a specific threshold for clustering. The negative direction of the conjecture was proved by Mossel, Neeman, and Sly (2012), and more recently, the positive direction was independently proved by Massoulié and by Mossel, Neeman, and Sly. In many real network clustering problems, nodes contain information as well. We study the interplay between node and network information in clustering by studying a labeled block model, where in addition to the edge information, the true cluster labels of a small fraction of the nodes are revealed. In the case of two clusters, we show that below the threshold, a small amount of node information does not affect recovery. On the other hand, we show that for any small amount of information, efficient local clustering is achievable as long as the number of clusters is sufficiently large (as a function of the amount of revealed information). |
---|---|
Author | Schramm, Tselil; Kanade, Varun; Mossel, Elchanan |
Author_xml | 1. Kanade, Varun (varun.kanade@cs.ox.ac.uk), Univ. of California at Berkeley, Berkeley, CA, USA; 2. Mossel, Elchanan (mossel@stat.berkeley.edu), Univ. of California at Berkeley, Berkeley, CA, USA; 3. Schramm, Tselil (tschramm@cs.berkeley.edu), Univ. of California at Berkeley, Berkeley, CA, USA |
CODEN | IETTAW |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Oct 2016 |
DOI | 10.1109/TIT.2016.2516564 |
Discipline | Engineering Physics Computer Science Statistics |
EISSN | 1557-9654 |
EndPage | 5917 |
ExternalDocumentID | 4199274531 10_1109_TIT_2016_2516564 7378292 |
Genre | orig-research Feature |
GrantInformation_xml | – fundername: Simons Foundation and Fondation Sciences Mathématiques de Paris funderid: 10.13039/100000893 – fundername: National Science Foundation through the Division of Mathematical Sciences grantid: DMS 1106999 funderid: 10.13039/100000121 – fundername: Berkeley Chancellor’s Fellowship funderid: 10.13039/100000082 – fundername: National Science Foundation through the Division of Computing and Communication Foundations grantid: CCF 1320105 funderid: 10.13039/100000143 – fundername: Office of Naval Research through the Simons Foundation grantid: N000141110140; 328025 funderid: 10.13039/100000893 – fundername: National Science Foundation within the Division of Graduate Education through the Graduate Research Fellowship Program grantid: DGE 1106400 funderid: 10.13039/100000082 |
ISSN | 0018-9448 |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 10 |
Language | English |
OpenAccessLink | https://drops.dagstuhl.de/opus/volltexte/2014/4738/pdf/55.pdf |
PQID | 1824518718 |
PQPubID | 36024 |
PageCount | 12 |
ParticipantIDs | crossref_primary_10_1109_TIT_2016_2516564 proquest_journals_1824518718 ieee_primary_7378292 |
PublicationCentury | 2000 |
PublicationDate | 2016-Oct. 2016-10-00 20161001 |
PublicationDateYYYYMMDD | 2016-10-01 |
PublicationDate_xml | – month: 10 year: 2016 text: 2016-Oct. |
PublicationDecade | 2010 |
PublicationPlace | New York |
PublicationPlace_xml | – name: New York |
PublicationTitle | IEEE transactions on information theory |
PublicationTitleAbbrev | TIT |
PublicationYear | 2016 |
Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
StartPage | 5906 |
SubjectTerms | blockmodels; Clustering; Clustering algorithms; Computational modeling; Computer science; Context; Physics; Probability; Statistics; Stochastic models; Stochastic processes |
Title | Global and Local Information in Clustering Labeled Block Models |
URI | https://ieeexplore.ieee.org/document/7378292 https://www.proquest.com/docview/1824518718 |
Volume | 62 |
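
The labeled block model described in the abstract can be illustrated with a small sampler. This is only a sketch under stated assumptions, not the authors' construction: the sparse regime is parameterized as p = a/n and q = b/n, each node's label is revealed independently with probability `reveal_fraction`, and the function and parameter names (`sample_labeled_sbm`, `a`, `b`, `reveal_fraction`, `seed`) are hypothetical.

```python
import random


def sample_labeled_sbm(n, a, b, reveal_fraction, seed=0):
    """Sample a two-cluster sparse block model with some labels revealed.

    Assumptions (not taken from the record): p = a / n, q = b / n, and each
    node's true label is revealed independently with probability
    reveal_fraction.
    """
    rng = random.Random(seed)
    p, q = a / n, b / n  # sparse regime: p, q = O(1/n)

    # Two equal-sized clusters (sizes differ by one if n is odd).
    labels = [0] * (n // 2) + [1] * (n - n // 2)
    rng.shuffle(labels)

    # Each unordered pair gets an edge with probability p (same cluster)
    # or q (different clusters), independently of all other pairs.
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            prob = p if labels[u] == labels[v] else q
            if rng.random() < prob:
                edges.append((u, v))

    # Node information: true labels of a small random subset of nodes.
    revealed = {u: labels[u] for u in range(n) if rng.random() < reveal_fraction}
    return labels, edges, revealed
```

For example, `sample_labeled_sbm(200, 5.0, 1.0, 0.1)` returns the hidden labels, the edge list, and a dictionary of revealed labels; a clustering procedure would see only the last two.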