Data-Adaptive Active Sampling for Efficient Graph-Cognizant Classification
This paper deals with active sampling of graph nodes representing training data for binary classification. The graph may be given or constructed using similarity measures among nodal features. Leveraging the graph for classification builds on the premise that labels across neighboring nodes are corr...
Published in | IEEE transactions on signal processing Vol. 66; no. 19; pp. 5167 - 5179 |
---|---|
Main Authors | Berberidis, Dimitris; Giannakis, Georgios B. |
Format | Journal Article |
Language | English |
Published | New York IEEE 01.10.2018 The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Subjects | |
Abstract | This paper deals with active sampling of graph nodes representing training data for binary classification. The graph may be given or constructed using similarity measures among nodal features. Leveraging the graph for classification builds on the premise that labels across neighboring nodes are correlated according to a categorical Markov random field (MRF). This model is further relaxed to a Gaussian (G)MRF with labels taking continuous values, an approximation that not only mitigates the combinatorial complexity of the categorical model, but also offers optimal unbiased soft predictors of the unlabeled nodes. The proposed sampling strategy is based on querying the node whose label disclosure is expected to inflict the largest change on the GMRF, and in this sense it is the most informative on average. Connections are established to other sampling methods including uncertainty sampling, variance minimization, and sampling based on the Σ-optimality criterion. A simple yet effective heuristic is also introduced for increasing the exploration capabilities of the sampler, and reducing bias of the resultant classifier, by adjusting the confidence on the model label predictions. The novel sampling strategies are based on quantities that are readily available without the need for model retraining, rendering them computationally efficient and scalable to large graphs. Numerical tests using synthetic and real data demonstrate that the proposed methods achieve accuracy that is comparable or superior to the state of the art even at reduced runtime. |
Author | Berberidis, Dimitris; Giannakis, Georgios B. |
Author_xml | – sequence: 1 givenname: Dimitris orcidid: 0000-0003-3563-6052 surname: Berberidis fullname: Berberidis, Dimitris email: bermp001@umn.edu organization: Department of Electronic and Communication Engineering and Digital Technology Center, University of Minnesota, Minneapolis, MN, USA – sequence: 2 givenname: Georgios B. orcidid: 0000-0002-0196-0260 surname: Giannakis fullname: Giannakis, Georgios B. email: georgios@umn.edu organization: Department of Electronic and Communication Engineering and Digital Technology Center, University of Minnesota, Minneapolis, MN, USA |
CODEN | ITPRED |
CitedBy_id | crossref_primary_10_1109_TNNLS_2020_3009682 crossref_primary_10_1016_j_laa_2019_09_031 crossref_primary_10_1016_j_eswa_2024_123903 |
Cites_doi | 10.1007/s00521-014-1643-8 10.1145/2487575.2487641 10.3115/1613715.1613855 10.1109/TSP.2017.2664039 10.1214/ss/1177009939 10.1109/GlobalSIP.2016.7906056 10.1007/978-3-540-88269-5_17 10.1109/ICASSP.2016.7472892 10.1109/CVPR.2015.7298808 10.1109/CVPR.2012.6248050 10.1103/PhysRevE.78.046110 10.1109/ICASSP.2016.7472864 10.1371/journal.pcbi.1000498 10.1613/jair.295 10.1109/TIT.2011.2162269 10.2200/S00429ED1V01Y201207AIM018 10.1109/ICDM.2012.72 10.1007/978-0-387-88146-1 10.1007/s10115-012-0507-8 10.1007/978-3-319-10593-2_37 |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018 |
Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018 |
DOI | 10.1109/TSP.2018.2866812 |
DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005-present IEEE All-Society Periodicals Package (ASPP) 1998-Present IEEE Electronic Library (IEL) CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional |
DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional |
DatabaseTitleList | Technology Research Database |
Discipline | Engineering |
EISSN | 1941-0476 |
EndPage | 5179 |
ExternalDocumentID | 10_1109_TSP_2018_2866812 8445612 |
Genre | orig-research |
GrantInformation_xml | – fundername: National Science Foundation grantid: 1711471; 1500713; 1442686 funderid: 10.13039/501100008982 |
ISSN | 1053-587X |
IngestDate | Fri Sep 13 02:42:57 EDT 2024 Fri Aug 23 01:20:48 EDT 2024 Wed Jun 26 19:27:45 EDT 2024 |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 19 |
LinkModel | DirectLink |
ORCID | 0000-0002-0196-0260 0000-0003-3563-6052 |
OpenAccessLink | https://doi.org/10.1109/tsp.2018.2866812 |
PQID | 2117160284 |
PQPubID | 85478 |
PageCount | 13 |
ParticipantIDs | crossref_primary_10_1109_TSP_2018_2866812 proquest_journals_2117160284 ieee_primary_8445612 |
PublicationCentury | 2000 |
PublicationDate | 2018-10-01 |
PublicationDateYYYYMMDD | 2018-10-01 |
PublicationDate_xml | – month: 10 year: 2018 text: 2018-10-01 day: 01 |
PublicationDecade | 2010 |
PublicationPlace | New York |
PublicationPlace_xml | – name: New York |
PublicationTitle | IEEE transactions on signal processing |
PublicationTitleAbbrev | TSP |
PublicationYear | 2018 |
Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
SourceID | proquest crossref ieee |
SourceType | Aggregation Database Publisher |
StartPage | 5167 |
SubjectTerms | Active learning Adaptive sampling Classification Combinatorial analysis Confidence Correlation Covariance matrices expected change graph-based Graphical representations Labels Laplace equations Markov chains Minimization Nodes Numerical models Optimality criteria Optimization Predictive models Retraining Sampling methods Training |
Title | Data-Adaptive Active Sampling for Efficient Graph-Cognizant Classification |
URI | https://ieeexplore.ieee.org/document/8445612 https://www.proquest.com/docview/2117160284/abstract/ |
Volume | 66 |
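
The sampling idea summarized in the abstract (relax the labels to a Gaussian MRF, then query the node whose disclosure is expected to change the model the most) can be illustrated with a small sketch. This is one possible reading under stated assumptions, not the authors' exact criterion: the GMRF precision matrix is assumed to be the graph Laplacian plus a small ridge `delta`, labels are assumed to lie in {-1, +1}, the clipped soft predictor `mu` is treated as giving P(y = +1) = (1 + mu) / 2, and "change" is measured as the Euclidean norm of the shift in the conditional mean. All function names and the `delta` parameter are hypothetical.

```python
import numpy as np


def gmrf_posterior(W, labeled, y_labeled, delta=1e-2):
    """Condition a zero-mean GMRF on the labeled nodes.

    Assumption: precision matrix = graph Laplacian + delta * I.
    Returns the unlabeled node indices, their conditional mean
    (soft label predictors) and their conditional covariance.
    """
    n = W.shape[0]
    theta = np.diag(W.sum(axis=1)) - W + delta * np.eye(n)
    labeled = np.asarray(labeled, dtype=int)
    unlabeled = np.setdiff1d(np.arange(n), labeled)
    # Standard Gaussian conditioning in precision form.
    G = np.linalg.inv(theta[np.ix_(unlabeled, unlabeled)])
    mu = -G @ theta[np.ix_(unlabeled, labeled)] @ np.asarray(y_labeled, dtype=float)
    return unlabeled, mu, G


def expected_change_scores(mu, G):
    """Expected norm of the mean shift caused by revealing each node.

    Revealing y_v moves the mean by G[:, v] * (y_v - mu_v) / G[v, v].
    With P(y_v = +1) = (1 + mu_v) / 2 (an assumption), the expected
    value of |y_v - mu_v| equals 1 - mu_v**2.
    """
    mu = np.clip(mu, -1.0, 1.0)
    col_norms = np.linalg.norm(G, axis=0)
    return (1.0 - mu ** 2) * col_norms / np.diag(G)


def next_query(W, labeled, y_labeled, delta=1e-2):
    """Return the index of the unlabeled node with the largest score."""
    unlabeled, mu, G = gmrf_posterior(W, labeled, y_labeled, delta)
    return int(unlabeled[np.argmax(expected_change_scores(mu, G))])


# Toy example: a 5-node path graph with opposite labels at its two ends.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
query = next_query(W, labeled=[0, 4], y_labeled=[1.0, -1.0])
```

On this toy path the rule selects the middle node, whose soft prediction is zero and therefore the most ambiguous. The scores reuse only the conditional mean and covariance that are already available, so no retraining is needed to rank candidates, which is in the spirit of the efficiency claim in the abstract.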