Data-Adaptive Active Sampling for Efficient Graph-Cognizant Classification

Bibliographic Details
Published in IEEE Transactions on Signal Processing, Vol. 66, no. 19, pp. 5167-5179
Main Authors Berberidis, Dimitris; Giannakis, Georgios B.
Format Journal Article
Language English
Published New York: IEEE, 01.10.2018
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)

Abstract This paper deals with active sampling of graph nodes representing training data for binary classification. The graph may be given or constructed using similarity measures among nodal features. Leveraging the graph for classification builds on the premise that labels across neighboring nodes are correlated according to a categorical Markov random field (MRF). This model is further relaxed to a Gaussian (G)MRF with labels taking continuous values, an approximation that not only mitigates the combinatorial complexity of the categorical model but also offers optimal unbiased soft predictors of the unlabeled nodes. The proposed sampling strategy queries the node whose label disclosure is expected to inflict the largest change on the GMRF; in this sense, it is the most informative node on average. Connections are established to other sampling methods, including uncertainty sampling, variance minimization, and sampling based on the Σ-optimality criterion. A simple yet effective heuristic is also introduced that increases the exploration capabilities of the sampler and reduces the bias of the resultant classifier by adjusting the confidence in the model's label predictions. The novel sampling strategies are based on quantities that are readily available without model retraining, rendering them computationally efficient and scalable to large graphs. Numerical tests using synthetic and real data demonstrate that the proposed methods achieve accuracy comparable or superior to that of state-of-the-art methods, even at reduced runtime.
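The sampling idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the GMRF precision is the graph Laplacian plus a small diagonal term `delta`, labels in {-1, +1}, and an "expected change" score defined as the expected norm of the shift in the soft predictions when one more label is revealed. The function names (`gmrf_predict`, `expected_change_scores`, `select_next_node`) and the parameter `delta` are hypothetical and chosen only for illustration.

```python
import numpy as np

def gmrf_predict(W, labeled, y_labeled, delta=1e-6):
    """Soft predictions for unlabeled nodes under an assumed Laplacian-based GMRF."""
    n = W.shape[0]
    lap = np.diag(W.sum(axis=1)) - W                  # graph Laplacian
    labeled = np.asarray(labeled)
    unl = np.setdiff1d(np.arange(n), labeled)         # unlabeled node indices, sorted
    prec = lap[np.ix_(unl, unl)] + delta * np.eye(len(unl))
    cov = np.linalg.inv(prec)                         # conditional covariance (assumed model)
    mu = -cov @ lap[np.ix_(unl, labeled)] @ np.asarray(y_labeled, dtype=float)
    return unl, np.clip(mu, -1.0, 1.0), cov

def expected_change_scores(mu, cov):
    """Expected norm of the mean update if each candidate label were revealed.

    Revealing y_i shifts the mean by cov[:, i] / cov[i, i] * (y_i - mu_i).
    With P(y_i = +1) = (1 + mu_i) / 2, the expectation of |y_i - mu_i| is 1 - mu_i**2.
    """
    gain = np.linalg.norm(cov, axis=0) / np.diag(cov)
    return (1.0 - mu ** 2) * gain

def select_next_node(W, labeled, y_labeled):
    """Index of the unlabeled node with the largest expected-change score."""
    unl, mu, cov = gmrf_predict(W, labeled, y_labeled)
    return int(unl[np.argmax(expected_change_scores(mu, cov))])
```

Under these assumptions the scores use only the current mean and covariance, so no retraining is needed to rank candidates; the factor 1 - mu_i**2 also shows the link to uncertainty sampling, since it peaks where the soft prediction is closest to zero.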
Author Berberidis, Dimitris
Giannakis, Georgios B.
Author_xml – sequence: 1
  givenname: Dimitris
  orcidid: 0000-0003-3563-6052
  surname: Berberidis
  fullname: Berberidis, Dimitris
  email: bermp001@umn.edu
  organization: Department of Electronic and Communication Engineering and Digital Technology Center, University of Minnesota, Minneapolis, MN, USA
– sequence: 2
  givenname: Georgios B.
  orcidid: 0000-0002-0196-0260
  surname: Giannakis
  fullname: Giannakis, Georgios B.
  email: georgios@umn.edu
  organization: Department of Electronic and Communication Engineering and Digital Technology Center, University of Minnesota, Minneapolis, MN, USA
CODEN ITPRED
CitedBy_id crossref_primary_10_1109_TNNLS_2020_3009682
crossref_primary_10_1016_j_laa_2019_09_031
crossref_primary_10_1016_j_eswa_2024_123903
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
DOI 10.1109/TSP.2018.2866812
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005-present
IEEE All-Society Periodicals Package (ASPP) 1998-Present
IEEE Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Discipline Engineering
EISSN 1941-0476
EndPage 5179
ExternalDocumentID 10_1109_TSP_2018_2866812
8445612
Genre orig-research
GrantInformation_xml – fundername: National Science Foundation
  grantid: 1711471; 1500713; 1442686
  funderid: 10.13039/501100008982
ISSN 1053-587X
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 19
Language English
ORCID 0000-0002-0196-0260
0000-0003-3563-6052
OpenAccessLink https://doi.org/10.1109/tsp.2018.2866812
PageCount 13
PublicationDate 2018-10-01
PublicationPlace New York
PublicationTitle IEEE transactions on signal processing
PublicationTitleAbbrev TSP
PublicationYear 2018
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 5167
SubjectTerms Active learning
Adaptive sampling
Classification
Combinatorial analysis
Confidence
Correlation
Covariance matrices
expected change
graph-based
Graphical representations
Labels
Laplace equations
Markov chains
Minimization
Nodes
Numerical models
Optimality criteria
Optimization
Predictive models
Retraining
Sampling methods
Training
Title Data-Adaptive Active Sampling for Efficient Graph-Cognizant Classification
URI https://ieeexplore.ieee.org/document/8445612
https://www.proquest.com/docview/2117160284/abstract/
Volume 66