KFC: A clusterwise supervised learning procedure based on the aggregation of distances
Published in | Journal of Statistical Computation and Simulation, Vol. 91, no. 11, pp. 2307-2327 |
---|---|
Main Authors | Sothea Has, Aurélie Fischer, Mathilde Mougeot |
Format | Journal Article |
Language | English |
Published | Abingdon, Taylor & Francis (Taylor & Francis Ltd), 24.07.2021 |
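Subjects | Aggregation; Bregman divergences; Classification; Clustering; Kernel; Regression; Supervised learning |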
ISSN | 0094-9655 (print), 1563-5163 (online) |
DOI | 10.1080/00949655.2021.1891539 |
Abstract | Nowadays, many machine learning procedures are available off the shelf and can easily be used to calibrate predictive models on supervised data. However, when the input data consists of more than one unknown cluster, linked to different underlying predictive models, fitting a model is a more challenging task. We propose, in this paper, a three-step procedure to automatically solve this problem. The first step aims at catching the clustering structure of the input data, which may be characterized by several statistical distributions. For each partition, the second step fits a specific predictive model based on the data in each cluster. The overall model is computed by a consensual aggregation of the models corresponding to the different partitions. A comparison of the performances on different simulated and real data assesses the excellent performance of our method in a large variety of prediction problems. |
---|---|
Author | Fischer, Aurélie; Has, Sothea; Mougeot, Mathilde |
Author_xml | 1. Has, Sothea (sothea.has@lpsm.paris, hassothea.math@gmail.com), LPSM, Université de Paris; 2. Fischer, Aurélie, Centre Borelli, Ecole Normale Supérieure Paris-Saclay & ENSIIE; 3. Mougeot, Mathilde, Centre Borelli, Ecole Normale Supérieure Paris-Saclay & ENSIIE |
BackLink | https://hal.science/hal-02280297 (View record in HAL) |
ContentType | Journal Article |
Copyright | 2021 Informa UK Limited, trading as Taylor & Francis Group. Distributed under a Creative Commons Attribution 4.0 International License |
Discipline | Statistics; Mathematics; Computer Science |
EISSN | 1563-5163 |
EndPage | 2327 |
Genre | Research Article |
ISSN | 0094-9655 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 11 |
Keywords | Aggregation; Kernel; Bregman divergences; Classification; Regression; Clustering; 2010 Mathematics Subject Classification: 68U99, 62J99, 62P30, 68T05 |
Language | English |
License | Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
ORCID | 0009-0009-6346-4519 |
OpenAccessLink | https://hal.science/hal-02280297 |
PageCount | 21 |
PublicationCentury | 2000 |
PublicationDate | 2021-07-24 |
PublicationPlace | Abingdon |
PublicationTitle | Journal of Statistical Computation and Simulation |
PublicationYear | 2021 |
Publisher | Taylor & Francis; Taylor & Francis Ltd |
StartPage | 2307 |
SubjectTerms | Agglomeration; Aggregation; Applications; Bregman divergences; Classification; Clustering; Computation; Kernel; Machine Learning; Methodology; Prediction models; Regression; Statistical distributions; Statistics; Supervised learning |
Title | KFC: A clusterwise supervised learning procedure based on the aggregation of distances |
URI | https://www.tandfonline.com/doi/abs/10.1080/00949655.2021.1891539 https://www.proquest.com/docview/2549497958 https://hal.science/hal-02280297 |
Volume | 91 |
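The three-step procedure summarized in the abstract (cluster the inputs, fit one model per cluster, aggregate across partitions) can be illustrated with a small sketch. This is not the authors' implementation. It assumes two Bregman divergences (squared Euclidean and generalized Kullback-Leibler) for the clustering step, ordinary least-squares linear models as the local predictors, and a Gaussian-kernel weighting of training responses as a simplified stand-in for the consensual aggregation. It also reuses the same training sample for fitting and for aggregating. All function names, the bandwidth `h`, and the simulated data are hypothetical.

```python
import numpy as np

# Two Bregman divergences d(x, c), computed row-wise (inputs must be positive
# for the generalized Kullback-Leibler divergence).
def sq_euclidean(x, c):
    return ((x - c) ** 2).sum(axis=-1)

def gen_kl(x, c):
    return (x * np.log(x / c) - x + c).sum(axis=-1)

def distances(X, centers, div):
    # Matrix of divergences from every row of X to every center, shape (n, k).
    return np.stack([div(X, c) for c in centers], axis=1)

def bregman_kmeans(X, k, div, n_iter=50, seed=0):
    # Step 1 (assumed form): Lloyd-style iterations. For a Bregman divergence
    # the best representative of a cluster is its arithmetic mean.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = distances(X, centers, div).argmin(axis=1)
        centers = np.stack([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return centers

def design(X):
    # Add an intercept column.
    return np.column_stack([np.ones(len(X)), X])

def fit_local_models(X, y, labels, k):
    # Step 2 (assumed form): one least-squares linear model per cluster.
    coefs = []
    for j in range(k):
        mask = labels == j
        if not mask.any():  # empty cluster: fall back to a global fit
            mask = np.ones(len(X), dtype=bool)
        coefs.append(np.linalg.lstsq(design(X[mask]), y[mask], rcond=None)[0])
    return np.stack(coefs)

def combine(R_train, y_train, R_new, h=1.0):
    # Step 3 (simplified stand-in): weight each training response by a
    # Gaussian kernel on the distance between vectors of candidate predictions.
    d2 = ((R_new[:, None, :] - R_train[None, :, :]) ** 2).sum(axis=2)
    d2 -= d2.min(axis=1, keepdims=True)  # numerical stabilisation only
    W = np.exp(-d2 / (2 * h ** 2))
    return (W @ y_train) / W.sum(axis=1)

def kfc_predict(X_train, y_train, X_test, k=2, divs=(sq_euclidean, gen_kl)):
    R_train, R_test = [], []
    for div in divs:  # one partition and one candidate model per divergence
        centers = bregman_kmeans(X_train, k, div)
        train_labels = distances(X_train, centers, div).argmin(axis=1)
        coefs = fit_local_models(X_train, y_train, train_labels, k)
        for X, out in ((X_train, R_train), (X_test, R_test)):
            labels = distances(X, centers, div).argmin(axis=1)
            out.append((design(X) * coefs[labels]).sum(axis=1))
    return combine(np.column_stack(R_train), y_train, np.column_stack(R_test))

# Hypothetical data: two input clusters, each with its own linear model.
rng = np.random.default_rng(0)

def simulate(n):
    Xa = rng.uniform(1, 2, size=(n, 2))
    Xb = rng.uniform(5, 6, size=(n, 2))
    ya = 1 + 2 * Xa[:, 0] - Xa[:, 1]
    yb = -3 - Xb[:, 0] + 4 * Xb[:, 1]
    y = np.concatenate([ya, yb]) + rng.normal(0, 0.1, 2 * n)
    return np.vstack([Xa, Xb]), y

X_train, y_train = simulate(100)
X_test, y_test = simulate(50)
y_pred = kfc_predict(X_train, y_train, X_test)
mse = np.mean((y_pred - y_test) ** 2)
```

The aggregation step works on the vectors of candidate predictions and not on the raw inputs, so partitions that agree on a test point reinforce each other while a poorly suited divergence has limited influence. The bandwidth `h` controls how sharply that agreement is rewarded and would normally be tuned, for example by cross-validation.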