KFC: A clusterwise supervised learning procedure based on the aggregation of distances

Bibliographic Details
Published in Journal of statistical computation and simulation Vol. 91; no. 11; pp. 2307-2327
Main Authors Has, Sothea; Fischer, Aurélie; Mougeot, Mathilde
Format Journal Article
Language English
Published Abingdon Taylor & Francis 24.07.2021
Taylor & Francis Ltd
ISSN 0094-9655
1563-5163
DOI 10.1080/00949655.2021.1891539

Abstract Many machine learning procedures are now available off the shelf and can easily be used to calibrate predictive models on supervised data. However, when the input data consist of several unknown clusters, each linked to a different underlying predictive model, fitting a model is a more challenging task. In this paper, we propose a three-step procedure to solve this problem automatically. The first step aims to capture the clustering structure of the input data, which may be characterized by several statistical distributions. For each partition, the second step fits a specific predictive model on the data in each cluster. The overall model is then computed by a consensual aggregation of the models corresponding to the different partitions. A comparison on different simulated and real data sets shows the excellent performance of the method across a large variety of prediction problems.
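The three steps described in the abstract (cluster, fit, combine) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the candidate partitions come from K-means run with two Bregman divergences (squared Euclidean and generalized Kullback-Leibler), that each local model is an ordinary least-squares linear fit, and it swaps in a plain average of the candidate predictions as a stand-in for the consensual aggregation named in the abstract. All function names are hypothetical.

```python
import numpy as np

def sq_euclidean(x, c):
    # Squared Euclidean distance between every row of x and every row of c.
    return ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)

def gen_kl(x, c):
    # Generalized Kullback-Leibler divergence; assumes strictly positive inputs.
    x_, c_ = x[:, None, :], c[None, :, :]
    return (x_ * np.log(x_ / c_) - x_ + c_).sum(axis=2)

def bregman_kmeans(X, k, divergence, n_iter=50, seed=0):
    # Step 1: Lloyd-style iterations. Points are assigned by the chosen
    # divergence; centroids are updated as cluster means.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = divergence(X, centers).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def fit_local_models(X, y, centers, divergence):
    # Step 2: one least-squares linear model (with intercept) per cluster.
    labels = divergence(X, centers).argmin(axis=1)
    A = np.hstack([X, np.ones((len(X), 1))])
    overall = np.linalg.lstsq(A, y, rcond=None)[0]
    coefs = []
    for j in range(len(centers)):
        mask = labels == j
        # Empty clusters fall back to the model fitted on all the data.
        coefs.append(np.linalg.lstsq(A[mask], y[mask], rcond=None)[0]
                     if mask.any() else overall)
    return np.array(coefs)

def predict_local(X, centers, coefs, divergence):
    # Route each point to its nearest centroid and apply that cluster's model.
    labels = divergence(X, centers).argmin(axis=1)
    A = np.hstack([X, np.ones((len(X), 1))])
    return (A * coefs[labels]).sum(axis=1)

def kfc_fit_predict(X_train, y_train, X_test, k=2,
                    divergences=(sq_euclidean, gen_kl)):
    preds = []
    for div in divergences:
        centers = bregman_kmeans(X_train, k, div)
        coefs = fit_local_models(X_train, y_train, centers, div)
        preds.append(predict_local(X_test, centers, coefs, div))
    # Step 3 (assumption): plain averaging replaces consensual aggregation.
    return np.mean(preds, axis=0)
```

Calling `kfc_fit_predict(X_train, y_train, X_test, k=2)` on strictly positive inputs returns one prediction per test row. Averaging is chosen here only because it is short; a consensus-based combiner would weight training responses according to how closely the candidate models agree.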
Author Fischer, Aurélie
Has, Sothea
Mougeot, Mathilde
Author_xml – sequence: 1
  givenname: Sothea
  surname: Has
  fullname: Has, Sothea
  email: sothea.has@lpsm.paris, hassothea.math@gmail.com
  organization: LPSM, Université de Paris
– sequence: 2
  givenname: Aurélie
  surname: Fischer
  fullname: Fischer, Aurélie
  organization: Centre Borelli, Ecole Normale Supérieure Paris-Saclay & ENSIIE
– sequence: 3
  givenname: Mathilde
  surname: Mougeot
  fullname: Mougeot, Mathilde
  organization: Centre Borelli, Ecole Normale Supérieure Paris-Saclay & ENSIIE
Copyright 2021 Informa UK Limited, trading as Taylor & Francis Group
Distributed under a Creative Commons Attribution 4.0 International License
Discipline Statistics
Mathematics
Computer Science
Genre Research Article
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 11
Keywords Aggregation
Bregman divergences
Classification
Clustering
Kernel
Regression
2010 Mathematics Subject Classification 62J99; 62P30; 68T05; 68U99
Language English
License Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
ORCID 0009-0009-6346-4519
OpenAccessLink https://hal.science/hal-02280297
PageCount 21
SubjectTerms Agglomeration
aggregation
Applications
Bregman divergences
classification
Clustering
Computation
kernel
Machine Learning
Methodology
Prediction models
regression
Statistical distributions
Statistics
Supervised learning
URI https://www.tandfonline.com/doi/abs/10.1080/00949655.2021.1891539
https://www.proquest.com/docview/2549497958
https://hal.science/hal-02280297