Global and Local Information in Clustering Labeled Block Models

Bibliographic Details
Published in IEEE transactions on information theory Vol. 62; no. 10; pp. 5906 - 5917
Main Authors Kanade, Varun; Mossel, Elchanan; Schramm, Tselil
Format Journal Article
Language English
Published New York IEEE 01.10.2016
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Abstract The stochastic block model is a classical cluster-exhibiting random graph model that has been widely studied in statistics, physics, and computer science. In its simplest form, the model is a random graph with two equal-sized clusters, with intracluster edge probability p and intercluster edge probability q. We focus on the sparse case, i.e., p, q = O(1/n), which is practically more relevant and also mathematically more challenging. A conjecture of Decelle, Krzakala, Moore, and Zdeborová, based on ideas from statistical physics, predicted a specific threshold for clustering. The negative direction of the conjecture was proved by Mossel, Neeman, and Sly (2012), and more recently, the positive direction was independently proved by Massoulié and by Mossel, Neeman, and Sly. In many real network clustering problems, nodes contain information as well. We study the interplay between node and network information in clustering by studying a labeled block model, where in addition to the edge information, the true cluster labels of a small fraction of the nodes are revealed. In the case of two clusters, we show that below the threshold, a small amount of node information does not affect recovery. On the other hand, we show that for any small amount of information, efficient local clustering is achievable as long as the number of clusters is sufficiently large (as a function of the amount of revealed information).
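As an illustration of the model described in the abstract, the following is a minimal sketch of sampling a sparse two-cluster labeled block model. It is not taken from the article: the function names, the parameterization p = a/n and q = b/n, and the choice to reveal each node's label independently with probability `reveal_fraction` are assumptions made for this example. The helper `above_threshold` encodes the conjectured two-cluster threshold in the form it is commonly stated, (a - b)^2 > 2(a + b); the abstract itself does not give the formula.

```python
import random


def sample_labeled_sbm(n, a, b, reveal_fraction, seed=0):
    """Sample a sparse two-cluster labeled block model (illustrative sketch).

    Assumed parameterization: intracluster edge probability p = a/n and
    intercluster edge probability q = b/n, so that p, q = O(1/n).
    """
    rng = random.Random(seed)
    # Two equal-sized clusters: first half labeled 0, second half labeled 1.
    labels = [0 if i < n // 2 else 1 for i in range(n)]
    p, q = a / n, b / n
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            prob = p if labels[u] == labels[v] else q
            if rng.random() < prob:
                edges.append((u, v))
    # Assumption: each true label is revealed independently at random.
    revealed = {i: labels[i] for i in range(n) if rng.random() < reveal_fraction}
    return labels, edges, revealed


def above_threshold(a, b):
    """Commonly stated form of the conjectured threshold: (a - b)^2 > 2(a + b)."""
    return (a - b) ** 2 > 2 * (a + b)
```

Under these assumptions, `sample_labeled_sbm(1000, 5, 1, 0.05)` returns the hidden labels, an edge list with average degree of roughly (a + b)/2, and a dictionary of the revealed labels that a clustering procedure would be allowed to see.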
Author_xml – sequence: 1
  givenname: Varun
  surname: Kanade
  fullname: Kanade, Varun
  email: varun.kanade@cs.ox.ac.uk
  organization: Univ. of California at Berkeley, Berkeley, CA, USA
– sequence: 2
  givenname: Elchanan
  surname: Mossel
  fullname: Mossel, Elchanan
  email: mossel@stat.berkeley.edu
  organization: Univ. of California at Berkeley, Berkeley, CA, USA
– sequence: 3
  givenname: Tselil
  surname: Schramm
  fullname: Schramm, Tselil
  email: tschramm@cs.berkeley.edu
  organization: Univ. of California at Berkeley, Berkeley, CA, USA
CODEN IETTAW
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Oct 2016
DOI 10.1109/TIT.2016.2516564
Discipline Engineering
Physics
Computer Science
Statistics
EISSN 1557-9654
EndPage 5917
Genre orig-research
Feature
GrantInformation_xml – fundername: Simons Foundation and Fondation Sciences Mathématiques de Paris
  funderid: 10.13039/100000893
– fundername: National Science Foundation through the Division of Mathematical Sciences
  grantid: DMS 1106999
  funderid: 10.13039/100000121
– fundername: Berkeley Chancellor’s Fellowship
  funderid: 10.13039/100000082
– fundername: National Science Foundation through the Division of Computing and Communication Foundations
  grantid: CCF 1320105
  funderid: 10.13039/100000143
– fundername: Office of Naval Research through the Simons Foundation
  grantid: N000141110140; 328025
  funderid: 10.13039/100000893
– fundername: National Science Foundation within the Division of Graduate Education through the Graduate Research Fellowship Program
  grantid: DGE 1106400
  funderid: 10.13039/100000082
ISSN 0018-9448
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 10
Language English
OpenAccessLink https://drops.dagstuhl.de/opus/volltexte/2014/4738/pdf/55.pdf
PageCount 12
PublicationDate 2016-Oct.
2016-10-00
20161001
PublicationDateYYYYMMDD 2016-10-01
PublicationDate_xml – month: 10
  year: 2016
  text: 2016-Oct.
PublicationDecade 2010
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on information theory
PublicationTitleAbbrev TIT
PublicationYear 2016
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
References_xml – ident: ref17
  doi: 10.1145/1014052.1014062
– ident: ref29
  doi: 10.1214/aoap/1019487349
– ident: ref14
  doi: 10.1145/2591796.2591857
– year: 2013
  ident: ref24
  article-title: Belief propagation, robust reconstruction, and optimal recovery of block models
  contributor:
    fullname: mossel
– ident: ref9
  doi: 10.1088/1742-5468/2012/12/P12021
– year: 2013
  ident: ref13
  article-title: A proof of the block model threshold conjecture
  contributor:
    fullname: mossel
– ident: ref3
  doi: 10.1073/pnas.0907096106
– volume: 13
  start-page: 817
  year: 2003
  ident: ref25
  article-title: Information flow on trees
  publication-title: Ann Appl Probab
  doi: 10.1214/aoap/1060202828
  contributor:
    fullname: mossel
– ident: ref4
  doi: 10.1016/0196-6774(89)90001-1
– ident: ref11
  doi: 10.1017/S0963548309990514
– ident: ref10
  doi: 10.1103/PhysRevLett.107.065701
– year: 2013
  ident: ref18
  publication-title: Phase Transitions in Community Detection A Solvable Toy Model
  contributor:
    fullname: ver steeg
– ident: ref7
  doi: 10.1109/SFCS.2001.959929
– ident: ref27
  doi: 10.1214/aoap/998926994
– year: 2004
  ident: ref26
  article-title: Survey: Information flow on trees
  doi: 10.1090/dimacs/063/12
  contributor:
    fullname: mossel
– ident: ref19
  doi: 10.1209/0295-5075/90/18002
– ident: ref6
  doi: 10.1002/1098-2418(200103)18:2<116::AID-RSA1001>3.0.CO;2-2
– ident: ref2
  doi: 10.1007/s003579900004
– ident: ref8
  doi: 10.1103/PhysRevE.84.066106
– volume: 2
  start-page: 27
  year: 2002
  ident: ref16
  article-title: Semi-supervised clustering by seeding
  publication-title: Proc 19th ICML
  contributor:
    fullname: basu
– ident: ref30
  doi: 10.1145/1536414.1536493
– ident: ref28
  doi: 10.1007/978-3-642-65371-1
– ident: ref23
  doi: 10.1103/PhysRevE.90.052802
– ident: ref1
  doi: 10.1016/0378-8733(83)90021-7
– year: 2012
  ident: ref12
  article-title: Stochastic block models and reconstruction
  contributor:
    fullname: mossel
– ident: ref22
  doi: 10.1145/2554797.2554831
– year: 2012
  ident: ref21
  article-title: Limits of local-global convergent graph sequences
  contributor:
    fullname: hatami
– start-page: 585
  year: 2002
  ident: ref15
  article-title: Cluster kernels for semi-supervised learning
  publication-title: Proc NIPS
  contributor:
    fullname: chapelle
– ident: ref5
  doi: 10.1016/S0166-218X(97)00133-9
– ident: ref20
  doi: 10.1016/j.ejc.2011.03.008
– year: 2006
  ident: ref31
  contributor:
    fullname: levin
StartPage 5906
SubjectTerms blockmodels
Clustering
Clustering algorithms
Computational modeling
Computer science
Context
Physics
Probability
Statistics
Stochastic models
Stochastic processes
Title Global and Local Information in Clustering Labeled Block Models
URI https://ieeexplore.ieee.org/document/7378292
https://www.proquest.com/docview/1824518718
Volume 62