Class Prior Estimation from Positive and Unlabeled Data

Bibliographic Details
Published in IEICE Transactions on Information and Systems, Vol. E97.D, no. 5, pp. 1358-1362
Main Authors du Plessis, Marthinus Christoffel; Sugiyama, Masashi
Format Journal Article
Language English
Published The Institute of Electronics, Information and Communication Engineers 2014

Abstract We consider the problem of learning a classifier using only positive and unlabeled samples. In this setting, it is known that a classifier can be successfully learned if the class prior is available. However, in practice, the class prior is unknown and thus must be estimated from data. In this paper, we propose a new method to estimate the class prior by partially matching the class-conditional density of the positive class to the input density. By performing this partial matching in terms of the Pearson divergence, which we estimate directly without density estimation via lower-bound maximization, we can obtain an analytical estimator of the class prior. We further show that an existing class prior estimation method can also be interpreted as performing partial matching under the Pearson divergence, but in an indirect manner. The superiority of our direct class prior estimation method is illustrated on several benchmark datasets.
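The abstract describes fitting a scaled positive-class density θ·p(x|y=1) to the input density p(x) under the Pearson divergence, with the divergence estimated directly through a lower bound instead of by density estimation. The Python sketch below illustrates that idea under assumptions of our own, not the authors' exact estimator: we take PE(θ) = ½ ∫ (θ·p(x|y=1)/p(x) − 1)² p(x) dx, use the convex-conjugate bound PE(θ) ≥ θ·E_pos[r] − E_unl[r] − ½·E_unl[r²], model r(x) = αᵀφ(x) with Gaussian kernels plus a constant basis function, and add a ridge term `lam`. Maximizing over α gives ½(θh − u)ᵀH⁻¹(θh − u), whose minimizer over θ is the closed form θ̂ = hᵀH⁻¹u / hᵀH⁻¹h. The helper names `gaussian_design` and `estimate_class_prior`, the kernel width `sigma`, and the grid of `centers` are hypothetical choices for illustration.

```python
import numpy as np


def _as_2d(a):
    """Return samples as an (n, d) float array; 1-D input is treated as n scalars."""
    a = np.asarray(a, dtype=float)
    return a[:, None] if a.ndim == 1 else a


def gaussian_design(x, centers, sigma):
    """Hypothetical basis: Gaussian kernels at `centers` plus one constant column."""
    x, centers = _as_2d(x), _as_2d(centers)
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (n, b)
    phi = np.exp(-sq_dist / (2.0 * sigma ** 2))
    return np.hstack([phi, np.ones((x.shape[0], 1))])


def estimate_class_prior(x_pos, x_unl, centers, sigma=1.0, lam=1e-3):
    """Fit theta * p(x|y=1) to p(x) by minimizing a lower-bound Pearson divergence estimate.

    For r(x) = alpha^T phi(x), the bound
        theta * E_pos[r] - E_unl[r] - 0.5 * E_unl[r^2]
    is maximized by alpha = H^{-1} (theta * h - u), with value
        0.5 * (theta * h - u)^T H^{-1} (theta * h - u),
    which is minimized in closed form at theta = h^T H^{-1} u / h^T H^{-1} h.
    """
    phi_pos = gaussian_design(x_pos, centers, sigma)
    phi_unl = gaussian_design(x_unl, centers, sigma)
    h = phi_pos.mean(axis=0)          # sample mean of phi over positive samples
    u = phi_unl.mean(axis=0)          # sample mean of phi over unlabeled samples
    H = phi_unl.T @ phi_unl / phi_unl.shape[0]
    H += lam * np.eye(H.shape[0])     # ridge term (an assumption) for numerical stability
    H_inv_h = np.linalg.solve(H, h)   # H is symmetric, so h^T H^{-1} u = (H^{-1} h)^T u
    theta = (H_inv_h @ u) / (H_inv_h @ h)
    return float(np.clip(theta, 0.0, 1.0))  # a class prior must lie in [0, 1]


# Synthetic usage: positives ~ N(0, 1); unlabeled = 30% N(0, 1) + 70% N(8, 1).
rng = np.random.default_rng(0)
x_pos = rng.normal(0.0, 1.0, size=500)
x_unl = np.concatenate([rng.normal(0.0, 1.0, size=300), rng.normal(8.0, 1.0, size=700)])
theta_hat = estimate_class_prior(x_pos, x_unl, centers=np.arange(-3.0, 12.0))
print(f"estimated class prior: {theta_hat:.3f}")
```

With well-separated classes, as in this synthetic example, the estimate should land near the true prior of 0.3. Because the sketch simply minimizes the estimated divergence, its population minimizer is 1/E_pos[p(x|y=1)/p(x)], which is never smaller than the true prior, so it tends to overestimate when the class-conditional densities overlap.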
Affiliation Department of Computer Science, Tokyo Institute of Technology (both authors)
Copyright 2014 The Institute of Electronics, Information and Communication Engineers
DOI 10.1587/transinf.E97.D.1358
Discipline Engineering
Computer Science
EISSN 1745-1361
ISSN 0916-8532
Open Access Yes
Peer Reviewed Yes
Full Text https://www.jstage.jst.go.jp/article/transinf/E97.D/5/E97.D_1358/_article/-char/en
Abbreviated Title IEICE Trans. Inf. & Syst.
References [1] M.C. du Plessis and M. Sugiyama, “Semi-supervised learning of class balance under class-prior change by distribution matching,” ICML 2012, pp.823-830, 2012.
[2] C. Elkan and K. Noto, “Learning classifiers from only positive and unlabeled data,” 14th ACM SIGKDD, pp.213-220, 2008. doi:10.1145/1401890.1401920
[3] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York, NY, USA, 2001. doi:10.1007/978-0-387-21606-5
[4] S. Hido, Y. Tsuboi, H. Kashima, M. Sugiyama, and T. Kanamori, “Inlier-based outlier detection via direct density ratio estimation,” ICDM 2008, pp.223-232, 2008. doi:10.1109/ICDM.2008.49
[5] T. Kanamori, S. Hido, and M. Sugiyama, “Efficient direct density ratio estimation for non-stationarity adaptation and outlier detection,” NIPS 21, pp.809-816, 2009.
[6] T. Kanamori, S. Hido, and M. Sugiyama, “A least-squares approach to direct importance estimation,” J. Machine Learning Research, vol.10, pp.1391-1445, July 2009.
[7] A. Keziou, “Dual representation of φ-divergences and applications,” Comptes Rendus Mathématique, vol.336, no.10, pp.857-862, 2003. doi:10.1016/S1631-073X(03)00215-2
[8] W. Li, Q. Guo, and C. Elkan, “A positive and unlabeled learning algorithm for one-class classification of remote-sensing data,” IEEE Trans. Geosci. Remote Sens., vol.49, no.2, pp.717-725, 2011. doi:10.1109/TGRS.2010.2058578
[9] X. Nguyen, M.J. Wainwright, and M.I. Jordan, “Estimating divergence functionals and the likelihood ratio by convex risk minimization,” IEEE Trans. Inf. Theory, vol.56, no.11, pp.5847-5861, 2010. doi:10.1109/TIT.2010.2068870
[10] M. Sugiyama, “Superfast-trainable multi-class probabilistic classifier by least-squares posterior fitting,” IEICE Trans. Inf. & Syst., vol.E93-D, no.10, pp.2690-2701, Oct. 2010.
[11] V. Vapnik, The Nature of Statistical Learning Theory, Springer, 2000. doi:10.1007/978-1-4757-3264-1
Subjects class-prior change
Classifiers
Density
Divergence
divergence estimation
Estimates
Learning
Matching
Mathematical analysis
Maximization
outlier detection
Pearson divergence
positive and unlabeled learning
Citation IEICE Transactions on Information and Systems, 2014/05/01, Vol.E97.D(5), pp.1358-1362