Combining clustering of variables and feature selection using random forests


Bibliographic Details
Published in	Communications in statistics. Simulation and computation Vol. 50; no. 2; pp. 426 - 445
Main Authors	Chavent, Marie; Genuer, Robin; Saracco, Jérôme
Format	Journal Article
Language	English
Published	Philadelphia: Taylor & Francis, 01.02.2021
Subjects	Cluster analysis; Clustering; Clustering of variables; Feature selection; Mathematics; Random forests; Statistics; Supervised classification; Variable selection; Variables
Summary:	Standard approaches to high-dimensional supervised classification often include variable selection and dimension reduction. The proposed methodology combines clustering of variables and feature selection. Hierarchical clustering of variables builds groups of correlated variables and summarizes each group by a synthetic variable. The originality of the approach is that the groups of variables are not known a priori. Moreover, the clustering approach handles both numerical and categorical variables. Among all the possible partitions, the most relevant synthetic variables are selected with a procedure using random forests. Numerical performance is illustrated on simulated and real datasets. Selecting groups of variables makes the results easier to interpret.
ISSN:	0361-0918, 1532-4141
DOI:	10.1080/03610918.2018.1563145
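
The pipeline outlined in the summary (cluster the variables, summarize each group by a synthetic variable, then select the relevant synthetic variables with random forests) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes numerical, non-constant variables only, uses `1 - correlation²` with average linkage as the dissimilarity between variables, takes the first principal component of each group as its synthetic variable, picks the partition with the lowest out-of-bag error, and keeps the synthetic variables whose importance is at least the mean importance. All of these choices, and the names `synthetic_variables` and `cluster_then_select`, are assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier


def synthetic_variables(X, labels):
    """Summarize each group of columns by its first principal component.

    Assumption: every column of X is numerical and non-constant.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    groups = np.unique(labels)
    Z = np.column_stack([
        PCA(n_components=1).fit_transform(Xs[:, labels == g])[:, 0]
        for g in groups
    ])
    return Z, groups


def cluster_then_select(X, y, max_groups=10, n_trees=200, seed=0):
    """Hypothetical helper: cluster variables, then select groups with a forest."""
    # Dissimilarity between variables (an assumption): 1 - squared correlation.
    corr = np.corrcoef(X, rowvar=False)
    dist = np.clip(1.0 - corr ** 2, 0.0, None)
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")

    best = None
    # Walk through the partitions of the hierarchy, from coarse to fine.
    for k in range(2, min(max_groups, X.shape[1]) + 1):
        labels = fcluster(tree, t=k, criterion="maxclust")
        Z, groups = synthetic_variables(X, labels)
        forest = RandomForestClassifier(
            n_estimators=n_trees, oob_score=True, random_state=seed
        ).fit(Z, y)
        oob_error = 1.0 - forest.oob_score_
        if best is None or oob_error < best["oob_error"]:
            importances = forest.feature_importances_
            # Simplified selection rule (an assumption): keep the groups
            # whose importance is at least the mean importance.
            keep = groups[importances >= importances.mean()]
            best = {
                "labels": labels,          # group label of each original variable
                "selected_groups": keep,   # labels of the retained groups
                "oob_error": oob_error,
            }
    return best
```

The original variables belonging to a retained group can then be read off with `np.flatnonzero(np.isin(result["labels"], result["selected_groups"]))`, which is what makes a selection of groups straightforward to interpret.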