KFC: A clusterwise supervised learning procedure based on the aggregation of distances

Bibliographic Details
Published in	Journal of Statistical Computation and Simulation, Vol. 91, no. 11, pp. 2307-2327
Main Authors	Has, Sothea; Fischer, Aurélie; Mougeot, Mathilde
Format	Journal Article
Language	English
Published	Abingdon: Taylor & Francis (Taylor & Francis Ltd), 24.07.2021
Subjects	Agglomeration; Aggregation; Applications; Bregman divergences; Classification; Clustering; Computation; Kernel; Machine learning; Methodology; Prediction models; Regression; Statistical distributions; Statistics; Supervised learning; 2010 Mathematics Subject Classification: 62J99, 62P30, 68T05, 68U99
ISSN	0094-9655 1563-5163
DOI	10.1080/00949655.2021.1891539

Summary:	Nowadays, many machine learning procedures are available off the shelf and can easily be used to calibrate predictive models on supervised data. However, when the input data consists of more than one unknown cluster, each linked to a different underlying predictive model, fitting a model is a more challenging task. We propose, in this paper, a three-step procedure to solve this problem automatically. The first step aims at capturing the clustering structure of the input data, which may be characterized by several statistical distributions. For each partition, the second step fits a specific predictive model based on the data in each cluster. The overall model is computed by a consensual aggregation of the models corresponding to the different partitions. A comparison of performance on various simulated and real data sets demonstrates the excellent performance of our method in a large variety of prediction problems.
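
The three steps in the summary can be illustrated with a minimal sketch. The summary does not say which divergences, local models, or aggregation rule the authors use, so everything here is an assumption made for illustration: two Bregman divergences (squared Euclidean and generalized Kullback-Leibler) for the K-means step, a least-squares linear model per cluster, and a Gaussian-kernel consensus rule that averages the responses of held-out points whose candidate predictions resemble those of the query point. All function names and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def sq_euclidean(x, c):
    # Squared Euclidean distance, a Bregman divergence.
    return np.sum((x - c) ** 2, axis=-1)

def gen_kl(x, c):
    # Generalized Kullback-Leibler divergence; needs strictly positive inputs.
    return np.sum(x * np.log(x / c) - x + c, axis=-1)

def assign(X, centers, div):
    # Label each row of X with the index of its closest center under div.
    return np.stack([div(X, c) for c in centers], axis=1).argmin(axis=1)

def bregman_kmeans(X, k, div, n_iter=50, seed=0):
    # Step 1: Lloyd-style K-means; for a Bregman divergence the best
    # centroid of a cluster is its arithmetic mean.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers, div)
        centers = np.stack([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return centers

def fit_local_models(X, y, labels, k):
    # Step 2: one least-squares linear model (with intercept) per cluster.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    coefs = []
    for j in range(k):
        mask = labels == j
        if not mask.any():  # empty cluster: fall back to all the data
            mask = np.ones(len(X), dtype=bool)
        coef, *_ = np.linalg.lstsq(Xb[mask], y[mask], rcond=None)
        coefs.append(coef)
    return np.stack(coefs)

def predict_local(X, centers, coefs, div):
    # Route each point to its cluster, then apply that cluster's model.
    labels = assign(X, centers, div)
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.sum(Xb * coefs[labels], axis=1)

def aggregate(P_ref, y_ref, P_new, bandwidth=1.0):
    # Step 3 (assumed rule): Gaussian-kernel weighted average of reference
    # responses, with weights based on distances between prediction vectors.
    d2 = ((P_new[:, None, :] - P_ref[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2 * bandwidth ** 2))
    # Guard against division by zero if every weight underflows.
    return (w @ y_ref) / np.maximum(w.sum(axis=1), 1e-12)

def make_data(n, rng):
    # Hypothetical toy data: two latent clusters, each with its own linear model.
    X = np.vstack([rng.normal(2.0, 0.3, (n, 2)), rng.normal(6.0, 0.3, (n, 2))])
    X = np.abs(X) + 1e-3  # keep inputs strictly positive for gen_kl
    y = np.concatenate([
        1 + X[:n] @ np.array([1.0, 2.0]),
        -3 + X[n:] @ np.array([-2.0, 0.5]),
    ])
    return X, y + rng.normal(0, 0.1, 2 * n)

rng = np.random.default_rng(0)
X, y = make_data(100, rng)
idx = rng.permutation(len(X))
fit_idx, agg_idx = idx[:100], idx[100:]  # one half fits, the other aggregates
X_new, y_new = make_data(20, rng)

P_agg, P_new = [], []
for div in (sq_euclidean, gen_kl):  # one candidate model per divergence
    centers = bregman_kmeans(X[fit_idx], 2, div)
    labels = assign(X[fit_idx], centers, div)
    coefs = fit_local_models(X[fit_idx], y[fit_idx], labels, 2)
    P_agg.append(predict_local(X[agg_idx], centers, coefs, div))
    P_new.append(predict_local(X_new, centers, coefs, div))

y_hat = aggregate(np.column_stack(P_agg), y[agg_idx],
                  np.column_stack(P_new), bandwidth=0.5)
mae = float(np.mean(np.abs(y_hat - y_new)))
print("mean absolute error on new points:", mae)
```

Splitting the sample so that one half fits the candidate models and the other half drives the aggregation is a choice made for this sketch; it keeps the consensus step from reusing the responses the local models were fitted on.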