Data-Adaptive Active Sampling for Efficient Graph-Cognizant Classification

Bibliographic Details
Published in	IEEE Transactions on Signal Processing, Vol. 66, no. 19, pp. 5167-5179
Main Authors	Berberidis, Dimitris; Giannakis, Georgios B.
Format	Journal Article
Language	English
Published	New York: IEEE, 01.10.2018; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Active learning; Adaptive sampling; Classification; Combinatorial analysis; Confidence; Correlation; Covariance matrices; expected change; graph-based; Graphical representations; Labels; Laplace equations; Markov chains; Minimization; Nodes; Numerical models; Optimality criteria; Optimization; Predictive models; Retraining; Sampling methods; Training
Abstract	This paper deals with active sampling of graph nodes representing training data for binary classification. The graph may be given or constructed using similarity measures among nodal features. Leveraging the graph for classification builds on the premise that labels across neighboring nodes are correlated according to a categorical Markov random field (MRF). This model is further relaxed to a Gaussian (G)MRF with labels taking continuous values, an approximation that not only mitigates the combinatorial complexity of the categorical model, but also offers optimal unbiased soft predictors of the unlabeled nodes. The proposed sampling strategy is based on querying the node whose label disclosure is expected to inflict the largest change on the GMRF, and in this sense it is the most informative on average. Connections are established to other sampling methods including uncertainty sampling, variance minimization, and sampling based on the Σ-optimality criterion. A simple yet effective heuristic is also introduced for increasing the exploration capabilities of the sampler, and reducing bias of the resultant classifier, by adjusting the confidence on the model label predictions. The novel sampling strategies are based on quantities that are readily available without the need for model retraining, rendering them computationally efficient and scalable to large graphs. Numerical tests using synthetic and real data demonstrate that the proposed methods achieve accuracy that is comparable or superior to the state of the art, even at reduced runtime.
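The sampling idea summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the precision matrix (graph Laplacian plus a small ridge), the mapping from soft predictions to label probabilities, the use of the Euclidean norm of the change in the predicted means, and all function names are assumptions made here for illustration. It relies only on the standard Gaussian conditioning identity, under which revealing one more label shifts the remaining means by a rank-one term that can be read off the current covariance without retraining.

```python
import numpy as np


def gmrf_posterior(Q, labeled, y):
    """Conditional mean and covariance of the unlabeled nodes of a
    zero-mean GMRF with precision matrix Q, given labels y on `labeled`."""
    n = Q.shape[0]
    labeled = list(labeled)
    unlabeled = [i for i in range(n) if i not in labeled]
    Q_uu = Q[np.ix_(unlabeled, unlabeled)]
    Q_ul = Q[np.ix_(unlabeled, labeled)]
    cov = np.linalg.inv(Q_uu)
    mean = -cov @ Q_ul @ np.asarray(y, dtype=float)
    return unlabeled, mean, cov


def expected_change_scores(mean, cov):
    """Illustrative score: expected norm of the shift in the predicted means
    if the label (+1 or -1) of each unlabeled node were revealed."""
    # Assumed mapping from soft prediction to P(label = +1).
    p_plus = np.clip((1.0 + mean) / 2.0, 0.0, 1.0)
    # Expected absolute residual |y - mean| over the two possible labels.
    exp_resid = p_plus * np.abs(1.0 - mean) + (1.0 - p_plus) * np.abs(-1.0 - mean)
    # Rank-one update: mean shifts by cov[:, v] / cov[v, v] * (y - mean[v]).
    gain = np.linalg.norm(cov, axis=0) / np.diag(cov)
    return exp_resid * gain


# Toy example: a 5-node path graph; precision = Laplacian + 0.1 * I (assumed).
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
Q = np.diag(A.sum(axis=1)) - A + 0.1 * np.eye(n)

unlabeled, mean, cov = gmrf_posterior(Q, [0], [1.0])
scores = expected_change_scores(mean, cov)
query = unlabeled[int(np.argmax(scores))]
print("next node to query:", query)
```

In this sketch the scores depend only on the current mean and covariance, which mirrors the abstract's point that the sampling quantities are available without refitting the model; the criteria actually used in the paper may differ in detail.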
Author_xml	1. Berberidis, Dimitris (ORCID: 0000-0003-3563-6052; email: bermp001@umn.edu), Department of Electronic and Communication Engineering and Digital Technology Center, University of Minnesota, Minneapolis, MN, USA; 2. Giannakis, Georgios B. (ORCID: 0000-0002-0196-0260; email: georgios@umn.edu), Department of Electronic and Communication Engineering and Digital Technology Center, University of Minnesota, Minneapolis, MN, USA
CODEN	ITPRED
Copyright	Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
DOI	10.1109/TSP.2018.2866812
Discipline	Engineering
EISSN	1941-0476
EndPage	5179
ExternalDocumentID	10_1109_TSP_2018_2866812 8445612
Genre	orig-research
GrantInformation_xml	National Science Foundation (funder ID: 10.13039/501100008982); grant IDs: 1711471; 1500713; 1442686
ISSN	1053-587X
IsDoiOpenAccess	false
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	19
ORCID	0000-0002-0196-0260 0000-0003-3563-6052
OpenAccessLink	https://doi.org/10.1109/tsp.2018.2866812
PageCount	13
PublicationDate	2018-10-01
PublicationPlace	New York
PublicationTitle	IEEE transactions on signal processing
PublicationTitleAbbrev	TSP
Publisher	IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage	5167
URI	https://ieeexplore.ieee.org/document/8445612 https://www.proquest.com/docview/2117160284/abstract/
Volume	66