Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features

Since most machine learning (ML) algorithms are designed for numerical inputs, efficiently encoding categorical variables is a crucial aspect in data analysis. A common problem are high cardinality features, i.e. unordered categorical predictor variables with a high number of levels. We study techni...

Full description

Saved in:
Bibliographic Details
Published inComputational statistics Vol. 37; no. 5; pp. 2671 - 2692
Main Authors Pargent, Florian, Pfisterer, Florian, Thomas, Janek, Bischl, Bernd
Format Journal Article
LanguageEnglish
Published Berlin/Heidelberg Springer Berlin Heidelberg 01.11.2022
Springer Nature B.V
Subjects
Online AccessGet full text

Cover

Loading…