KFC: A clusterwise supervised learning procedure based on the aggregation of distances

Bibliographic Details
Published in	Journal of statistical computation and simulation Vol. 91; no. 11; pp. 2307 - 2327
Main Authors	Has, Sothea; Fischer, Aurélie; Mougeot, Mathilde
Format	Journal Article
Language	English
Published	Abingdon: Taylor & Francis (Taylor & Francis Ltd), 24.07.2021
Subjects	Agglomeration; Aggregation; Applications; Bregman divergences; Classification; Clustering; Computation; Kernel; Machine learning; Methodology; Prediction models; Regression; Statistical distributions; Statistics; Supervised learning; 2010 Mathematics Subject Classification: 62J99, 62P30, 68T05, 68U99
Online Access	Get full text
ISSN	0094-9655 (print); 1563-5163 (online)
DOI	10.1080/00949655.2021.1891539

Abstract	Many machine learning procedures are now available off the shelf and can easily be used to calibrate predictive models on supervised data. However, when the input data consist of more than one unknown cluster, each linked to a different underlying predictive model, fitting a model is a more challenging task. We propose a three-step procedure to solve this problem automatically. The first step aims at capturing the clustering structure of the input data, which may be characterized by several statistical distributions. For each partition, the second step fits a specific predictive model on the data in each cluster. The overall model is then computed by a consensual aggregation of the models corresponding to the different partitions. A comparison on various simulated and real data sets demonstrates the excellent performance of the method across a wide variety of prediction problems.
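The three steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: all function names are hypothetical, the two divergences (squared Euclidean and generalized Kullback-Leibler) are merely examples of Bregman divergences, the local models are assumed to be ordinary least-squares fits, and the Gaussian-kernel weighting in prediction space is only a stand-in for the paper's consensual aggregation. For brevity the sketch also reuses the training data for the aggregation step.

```python
import numpy as np

def sq_euclidean(X, c):
    # Bregman divergence generated by phi(x) = ||x||^2
    return ((X - c) ** 2).sum(axis=1)

def gen_kl(X, c):
    # Generalized Kullback-Leibler divergence (requires strictly positive data)
    return (X * np.log(X / c) - X + c).sum(axis=1)

def assign(X, centers, div):
    # Index of the closest center for each row of X under divergence `div`
    return np.stack([div(X, c) for c in centers], axis=1).argmin(axis=1)

def bregman_kmeans(X, k, div, n_iter=50, seed=0):
    # Step 1: Lloyd-type iterations; for a Bregman divergence the optimal
    # cluster representative is the arithmetic mean of the cluster.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers, div)
        new_centers = []
        for j in range(k):
            members = X[labels == j]
            # Keep the previous center if a cluster becomes empty
            new_centers.append(members.mean(axis=0) if len(members) else centers[j])
        centers = np.stack(new_centers)
    return centers

def fit_local_models(X, y, labels, k):
    # Step 2: one least-squares linear model (with intercept) per cluster
    Xb = np.hstack([X, np.ones((len(X), 1))])
    coefs = []
    for j in range(k):
        mask = labels == j
        if not mask.any():  # fall back to a global fit for an empty cluster
            mask = np.ones(len(X), dtype=bool)
        beta, *_ = np.linalg.lstsq(Xb[mask], y[mask], rcond=None)
        coefs.append(beta)
    return np.stack(coefs)

def predict_local(X, centers, div, coefs):
    # Route each point to its cluster and apply that cluster's model
    labels = assign(X, centers, div)
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb * coefs[labels]).sum(axis=1)

def kernel_aggregate(P_train, y_train, P_query, h=1.0):
    # Step 3 (stand-in): weight training responses by how close the
    # candidate models' predictions are to those made at the query point.
    d2 = ((P_query[:, None, :] - P_train[None, :, :]) ** 2).sum(axis=2)
    d2 = d2 - d2.min(axis=1, keepdims=True)  # guards against all-zero weights
    W = np.exp(-d2 / (2 * h ** 2))
    return (W @ y_train) / W.sum(axis=1)

def kfc_fit_predict(X, y, X_new, k, divergences, h=1.0):
    P_train, P_new = [], []
    for div in divergences:  # one partition, hence one candidate model, per divergence
        centers = bregman_kmeans(X, k, div)
        coefs = fit_local_models(X, y, assign(X, centers, div), k)
        P_train.append(predict_local(X, centers, div, coefs))
        P_new.append(predict_local(X_new, centers, div, coefs))
    return kernel_aggregate(np.column_stack(P_train), y, np.column_stack(P_new), h)

# Toy data: two positive-valued clusters, each with its own linear model
rng = np.random.default_rng(0)
X = np.vstack([rng.uniform(1, 3, (100, 2)), rng.uniform(5, 7, (100, 2))])
y = np.concatenate([
    X[:100] @ np.array([1.0, 2.0]) + rng.normal(0, 0.1, 100),
    X[100:] @ np.array([-3.0, 1.0]) + 10 + rng.normal(0, 0.1, 100),
])
y_hat = kfc_fit_predict(X, y, X, k=2, divergences=[sq_euclidean, gen_kl])
print("in-sample MSE:", np.mean((y_hat - y) ** 2))
```

In this sketch the number of clusters `k` and the bandwidth `h` are fixed by hand; in practice they would need to be tuned, and the aggregation would normally be calibrated on data not used to fit the local models.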
Author	Fischer, Aurélie; Has, Sothea; Mougeot, Mathilde
Author_xml	1. Has, Sothea (LPSM, Université de Paris; email: sothea.has@lpsm.paris, hassothea.math@gmail.com); 2. Fischer, Aurélie (Centre Borelli, Ecole Normale Supérieure Paris-Saclay & ENSIIE); 3. Mougeot, Mathilde (Centre Borelli, Ecole Normale Supérieure Paris-Saclay & ENSIIE)
BackLink	https://hal.science/hal-02280297 (View record in HAL)
ContentType	Journal Article
Copyright	2021 Informa UK Limited, trading as Taylor & Francis Group. Distributed under a Creative Commons Attribution 4.0 International License.
Discipline	Statistics; Mathematics; Computer Science
Genre	Research Article
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
License	Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
ORCID	0009-0009-6346-4519
OpenAccessLink	https://hal.science/hal-02280297
PageCount	21
PublicationDate	2021-07-24
Title	KFC: A clusterwise supervised learning procedure based on the aggregation of distances
URI	https://www.tandfonline.com/doi/abs/10.1080/00949655.2021.1891539 https://www.proquest.com/docview/2549497958 https://hal.science/hal-02280297