Linear Algorithms for Robust and Scalable Nonparametric Multiclass Probability Estimation

Bibliographic Details
Published in	Journal of Data Science Vol. 21; no. 4; pp. 658 - 680
Main Authors	Zeng, Liyun; Zhang, Hao Helen
Format	Journal Article
Language	English
Published	中華資料採礦協會, 2023-10-01
Subjects	probability estimation; linear time algorithm; support vector machines; multiclass classification; non-parametric; scalability
ISSN	1683-8602, 1680-743X
DOI	10.6339/22-JDS1069

Abstract	Multiclass probability estimation is the problem of estimating the conditional probability that a data point belongs to each class, given its covariate information. It has broad applications in statistical analysis and data science. Recently, a class of weighted Support Vector Machines (wSVMs) has been developed to estimate class probabilities through ensemble learning for K-class problems (Wu et al., 2010; Wang et al., 2019), where K is the number of classes. The estimators are robust and achieve high accuracy for probability estimation, but their learning is implemented through pairwise coupling, which demands polynomial time in K. In this paper, we propose two new learning schemes, baseline learning and One-vs-All (OVA) learning, to further improve wSVMs in terms of computational efficiency and estimation accuracy. In particular, baseline learning has optimal computational complexity in the sense that it is linear in K. Though not the most computationally efficient, OVA learning is found to have the best estimation accuracy among all the procedures under comparison. The resulting estimators are distribution-free and shown to be consistent. We further conduct extensive numerical experiments to demonstrate their finite-sample performance.
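The scaling contrast in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every function name in it is hypothetical. It assumes that pairwise coupling fits one binary problem per unordered class pair, that baseline learning pairs each remaining class with one fixed baseline class, and that OVA fits each class against the pooled rest. It further assumes that each baseline subproblem yields an estimate q_j of P(Y = j | Y in {j, baseline}, x) with q_j < 1, so that class probabilities follow by normalising the odds q_j / (1 - q_j). The real cost of wSVM learning also depends on the weight grid and on subproblem sample sizes, which this sketch ignores.

```python
from itertools import combinations


def pairwise_subproblems(K):
    """All unordered class pairs (j, k) with j < k: K * (K - 1) / 2 of them."""
    return list(combinations(range(K), 2))


def baseline_subproblems(K, baseline=0):
    """Each non-baseline class paired with one fixed baseline class: K - 1 pairs."""
    return [(j, baseline) for j in range(K) if j != baseline]


def ova_subproblems(K):
    """Each class against all remaining classes pooled together: K problems."""
    return [(j, "rest") for j in range(K)]


def probs_from_baseline(pair_probs, baseline, K):
    """Recover K class probabilities from pairwise estimates against a baseline.

    pair_probs[j] is assumed to estimate P(Y = j | Y in {j, baseline}, x),
    so p_j / p_baseline = q / (1 - q). Normalising these odds gives the
    class probabilities. Assumes every q is strictly below 1.
    """
    odds = [1.0] * K  # the baseline class has odds 1 against itself
    for j in range(K):
        if j != baseline:
            q = pair_probs[j]
            odds[j] = q / (1.0 - q)
    total = sum(odds)
    return [o / total for o in odds]


if __name__ == "__main__":
    for K in (3, 10, 50):
        print(K, len(pairwise_subproblems(K)),
              len(baseline_subproblems(K)), len(ova_subproblems(K)))
    print(probs_from_baseline({1: 0.5, 2: 0.75}, baseline=0, K=3))
```

Under these assumptions the number of pairwise subproblems grows quadratically in K, while the baseline and OVA counts grow linearly; each OVA problem uses the full sample, which is one plausible reading of why OVA is described as less efficient than baseline learning.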
Cites_doi	10.1109/ICDAR.1995.598994 10.1080/10618600.2012.700878 10.1080/10618600.2019.1585260 10.1198/016214506000001383 10.1023/A:1009715923555 10.1109/ICDAR.1997.620583 10.1145/2939672.2939785 10.1016/0022-247X(71)90184-3 10.1198/106186005X25619 10.1007/s10462-017-9586-y 10.1198/jcgs.2010.09206 10.1198/016214504000000098 10.1016/S1535-6108(02)00032-6 10.1007/s42452-020-2266-6 10.1038/35000501 10.1198/016214502753479248 10.1093/biomet/asm077 10.1016/j.ins.2013.12.019 10.1002/cjs.5550340410 10.1016/j.knosys.2015.02.009 10.1080/00207179.2013.801080 10.1198/jasa.2010.tm09107 10.1023/A:1015469627679
EISSN	1683-8602
IsDoiOpenAccess	false
IsOpenAccess	true
IsPeerReviewed	false
IsScholarly	true
Issue	4
License	https://creativecommons.org/licenses/by/4.0
OpenAccessLink	https://jds-online.org/journal/JDS/article/1305/file/pdf
PageCount	23
References
[1] In: Proceedings of the Fourth International Conference on Document Analysis and Recognition, vol. 2, p. 637 (1997). doi:10.1109/ICDAR.1997.620583
[2] Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403(6769), 503 (2000). doi:10.1038/35000501
[3] Classification and Regression Trees (1984).
[4] A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2, 121 (1998). doi:10.1023/A:1009715923555
[5] Projection-free parallel quadratic programming for linear model predictive control. International Journal of Control 86(8), 1367 (2013). doi:10.1080/00207179.2013.801080
[6] In: Proceedings of the Sixth International Conference on Bio-Inspired Computing: Theories and Applications, p. 351 (2011).
[7] In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, p. 785 (2016). doi:10.1145/2939672.2939785
[8] On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research 2, 265 (2001).
[9] An Introduction to Support Vector Machines and other Kernel-based Learning Methods (2000).
[10] A review on multi-class TWSVM. Artificial Intelligence Review 52(2), 775 (2019). doi:10.1007/s10462-017-9586-y
[11] (no bibliographic details in this record)
[12] Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association 97(457), 77 (2002). doi:10.1198/016214502753479248
[13] In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, p. 1321 (2017).
[14] The Elements of Statistical Learning: Data mining, Inference and Prediction (2009).
[15] Classification with reject option. Canadian Journal of Statistics 34(4), 709 (2006). doi:10.1002/cjs.5550340410
[16] In: Proceedings of the Third International Conference on Document Analysis and Recognition, vol. 1, p. 278 (1995). doi:10.1109/ICDAR.1995.598994
[17] In: Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology, p. 109 (1996).
[18] Multiclass distance-weighted discrimination. Journal of Computational and Graphical Statistics 22(4), 953 (2013). doi:10.1080/10618600.2012.700878
[19] Discriminant feature distribution analysis-based hybrid feature selection for online bearing fault diagnosis in induction motors. Journal of Sensors 2016, 1 (2016).
[20] In: Proceedings of the 19th International Conference on Telecommunications, p. 1 (2012).
[21] Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications 33, 82 (1971). doi:10.1016/0022-247X(71)90184-3
[22] Clustering-based ensembles for one-class classification. Information Sciences 264, 182 (2014). doi:10.1016/j.ins.2013.12.019
[23] Multicategory support vector machines, theory, and application to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association 99, 67 (2004). doi:10.1198/016214504000000098
[24] In: Proceedings of the 28th International Conference on Neural Information Processing Systems, vol. 2, p. 2035 (2015).
[25] Support vector machines and the Bayes rule in classification. Data Mining and Knowledge Discovery 6, 259 (2002). doi:10.1023/A:1015469627679
[26] In: Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, p. 291 (2007).
[27] Reinforced multicategory support vector machine. Journal of Computational and Graphical Statistics 20, 901 (2011). doi:10.1198/jcgs.2010.09206
[28] Generalized Linear Models (1989).
[29] In: Proceedings of the 19th International Conference on Neural Information Processing, vol. 2, p. 1 (2012).
[30] In: Proceedings of the 35th Advances in Neural Information Processing Systems, vol. 34, p. 15682 (2021).
[31] In defense of one-vs-all classification. Journal of Machine Learning Research 5, 101 (2004).
[32] Multi-category news classification using support vector machine based classifiers. SN Applied Sciences 2(3), 458 (2020). doi:10.1007/s42452-020-2266-6
[33] A comparison on multi-class classification methods based on least squares twin support vector machine. Knowledge-Based Systems 81, 131 (2015). doi:10.1016/j.knosys.2015.02.009
[34] Statistical Learning Theory (1998).
[35] Spline Models for Observational Data (1990).
[36] Estimation of generalization error: random and fixed inputs. Statistica Sinica 16(2), 569 (2006).
[37] Probability estimation for large margin classifiers. Biometrika 95, 149 (2008). doi:10.1093/biomet/asm077
[38] On L1-norm multiclass support vector machines. Journal of the American Statistical Association 102, 583 (2007). doi:10.1198/016214506000001383
[39] Multiclass probability estimation with support vector machines. Journal of Computational and Graphical Statistics 28(3), 586 (2019). doi:10.1080/10618600.2019.1585260
[40] In: Proceedings of the Seventh European Symposium on Artificial Neural Networks, p. 21 (1999).
[41] Robust model-free multiclass probability estimation. Journal of the American Statistical Association 105, 424 (2010). doi:10.1198/jasa.2010.tm09107
[42] An extension of Karmarkar's projective algorithm for convex quadratic programming. Mathematical Programming 44(1–3), 157 (1989).
[43] Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell 1(2), 133 (2002). doi:10.1016/S1535-6108(02)00032-6
[44] Multicategory large-margin unified machines. Journal of Machine Learning Research 14, 1349 (2013).
[45] Kernel logistic regression and the import vector machine. Journal of Computational and Graphical Statistics 14, 185 (2005). doi:10.1198/106186005X25619
[46] In: Proceedings of the 16th International Conference on Neural Information Processing Systems, p. 49 (2003).
Title	Linear Algorithms for Robust and Scalable Nonparametric Multiclass Probability Estimation
URI	https://www.airitilibrary.com/Article/Detail/16838602-N202310170006-00003
Volume	21