Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features

Bibliographic Details
Published in: Computational Statistics, Vol. 37, No. 5, pp. 2671–2692
Main Authors: Pargent, Florian; Pfisterer, Florian; Thomas, Janek; Bischl, Bernd
Format: Journal Article
Language: English
Published: Berlin/Heidelberg, Springer Berlin Heidelberg, 01.11.2022 (Springer Nature B.V.)

Abstract Since most machine learning (ML) algorithms are designed for numerical inputs, efficiently encoding categorical variables is a crucial aspect of data analysis. A common problem is high-cardinality features, i.e. unordered categorical predictor variables with a high number of levels. We study techniques that yield numeric representations of categorical variables which can then be used in subsequent ML applications. We focus on the impact of these techniques on a subsequent algorithm’s predictive performance and, if possible, derive best practices on when to use which technique. We conducted a large-scale benchmark experiment in which we compared different encoding strategies together with five ML algorithms (lasso, random forest, gradient boosting, k-nearest neighbors, support vector machine) using datasets from regression, binary, and multiclass classification settings. In our study, regularized versions of target encoding (i.e. using target predictions based on the feature levels in the training set as a new numerical feature) consistently provided the best results. Traditionally widely used encodings that make unreasonable assumptions to map levels to integers (e.g. integer encoding) or to reduce the number of levels (possibly based on target information, e.g. leaf encoding) before creating binary indicator variables (one-hot or dummy encoding) were not as effective in comparison.
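To make the winning technique concrete, the following is a minimal sketch of regularized (smoothed) target encoding for a single categorical feature: each level is replaced by its training-set target mean, shrunk toward the global mean so that rare levels are not encoded from a handful of noisy observations. The function name and the smoothing weight `m` are illustrative assumptions, not the exact estimators benchmarked in the article (which also considers generalized linear mixed model and cross-validated variants):

```python
from collections import defaultdict

def smoothed_target_encode(categories, targets, m=10.0):
    """Encode a high-cardinality categorical feature by the per-level
    target mean, shrunk toward the global target mean (regularization).
    `m` is a hypothetical smoothing weight: the fewer observations a
    level has, the more strongly it is pulled toward the global mean."""
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    # shrinkage estimate per level: (n * level_mean + m * global_mean) / (n + m)
    encoding = {
        c: (sums[c] + m * global_mean) / (counts[c] + m)
        for c in counts
    }
    # the global mean also serves as a fallback for unseen levels at test time
    return encoding, global_mean

cats = ["a", "a", "a", "b", "c"]
ys = [1.0, 1.0, 0.0, 1.0, 0.0]
enc, gm = smoothed_target_encode(cats, ys, m=2.0)
```

With `m=2.0`, the frequent level "a" stays close to its raw mean (2/3), while the singleton levels "b" and "c" are pulled substantially toward the global mean of 0.6, which is the regularization effect the abstract credits for the method's strong benchmark performance.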
Authors
– Pargent, Florian; ORCID: 0000-0002-2388-553X; email: florian.pargent@psy.lmu.de; organization: Department of Psychology, Psychological Methods and Assessment, LMU Munich
– Pfisterer, Florian; ORCID: 0000-0001-8867-762X; organization: Department of Statistics, Statistical Learning and Data Science, LMU Munich
– Thomas, Janek; ORCID: 0000-0003-4511-6245; organization: Department of Statistics, Statistical Learning and Data Science, LMU Munich
– Bischl, Bernd; ORCID: 0000-0001-6002-6980; organization: Department of Statistics, Statistical Learning and Data Science, LMU Munich
Copyright The Author(s) 2022
The Author(s) 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.1007/s00180-022-01207-6
Discipline Statistics
Mathematics
EISSN 1613-9658
EndPage 2692
GrantInformation
– Funder: Bayerisches Staatsministerium für Wirtschaft und Medien, Energie und Technologie; grant: 20-3410-2-9-8; funder ID: http://dx.doi.org/10.13039/501100006463
– Funder: Bundesministerium für Bildung, Wissenschaft und Kultur; grant: 01IS18036A; funder ID: http://dx.doi.org/10.13039/501100006604
ISSN 0943-4062
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 5
Keywords Benchmark
Dummy encoding
Supervised machine learning
Generalized linear mixed models
Target encoding
High-cardinality categorical features
OpenAccessLink https://doi.org/10.1007/s00180-022-01207-6
PageCount 22
PublicationDate 2022-11-01
PublicationPlace Berlin/Heidelberg
PublicationTitle Computational statistics
PublicationTitleAbbrev Comput Stat
PublicationYear 2022
Publisher Springer Berlin Heidelberg
Springer Nature B.V
References Chen T, He T, Benesty M, Khotilovich V,Tang Y, Cho H, Chen K, Mitchell R, Cano I, Zhou T, Li M, Xie J, Lin M, Geng Y, Li Y (2018) Xgboost: Extreme gradient boosting. R package version 0.71.2. https://CRAN.Rproject.org/package=xgboost
HancockJTKhoshgoftaarTMSurvey on categorical data for neural networksJ Big Data2020714110.1186/s40537-020-00305-w
FeurerMKleinAEggenspergerKSpringenbergJBlumMHutterFCortesCLawrenceNDLeeDDSugiyamaMGarnettREfficient and robust automated machine learningAdvances in neural information processing systems 282015New YorkCurran Associates Inc29622970
NadeauCBengioYInference for the generalization errorMach Learn20035223928110.1023/A:10240686263661039.68104
GelmanAHillJData analysis using regression and multilevel/hierarchical models2006CambridgeCambridge University Press10.1017/CBO9780511790942
Wright MN, König IR (2019) Splitting on categorical predictors in random forests. PeerJ 7. https://doi.org/10.7717/peerj.6339
De LeeuwJYoungFWTakaneYAdditive structure in qualitative data: an alternating least squares method with optimal scaling featuresPsychometrika19764147150310.1007/BF02296971
Thornton C, Hutter F, Hoos HH, Leyton-Brown K (2013) Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms, In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’13. ACM, New York, NY, USA, pp 847–855. https://doi.org/10.1145/2487575.2487629
R Core Team (2021) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria
Hornik K, Meyer D (2007) Deriving consensus rankings from benchmarking experiments, In: Advances in data analysis. Springer, pp 163–170. https://doi.org/10.1007/978-3-540-70981-7_19
Coors S (2018) Automatic gradient boosting (Master’sthesis). LMU Munich. https://epub.ub.uni-muenchen.de/59108/1/MA_Coors.pdf
KuhnMJohnsonKFeature engineering and selection: a practical approach for predictive models2019ChapmanHall/CRC10.1201/9781315108230
Binder M (2018) mlrCPO: Composable preprocessing operators and pipelines for machine learning. R package version 0.3.4-2. https://github.com/mlr-org/mlrCPO
RodríguezpBautistaMAGonzàlezJEscaleraSBeyond one-hot encoding: lower dimensional target embeddingImage Vis Comput201875213110.1016/j.imavis.2018.04.004
BrownGPocockAMing-JieZLujánMConditional likelihood maximisation: a unifying framework for information theoretic feature selectionJ Mach Learn Res201213276629136931283.68283
Thomas J, Coors S, Bischl B (2018) Automatic gradient boosting. arXiv preprint arXiv:1807.03873
Bates D (2020) Computational methods for mixed models. Vignette for lme4. https://cran.r-project.org/web/packages/lme4/vignettes/Theory.pdf
Therneau T, Atkinson B (2018) Rpart: recursive partitioning and regression trees. R package version 4.1-13. https://CRAN.R-project.org/package=rpart
MairPde LeeuwJA general framework for multivariate analysis with optimal scaling: the r package aspectJ Stat Softw20103212310.18637/jss.v032.i09
Weinberger KQ, Dasgupta A, Langford J, Smola AJ, Attenberg J (2009) Feature hashin for large scale multitask learning. In: Proceedings of the 26th Annual International Conference on Machine Learning (ICML ’09). Association for Computing Machinery, New York, NY, USA, 1113–1120. https://doi.org/10.1145/1553374.1553516
Chambers J, Hastie T (1992) Statistical models. Chapter 2 of statistical models in S, 1st edn. Routledge. https://doi.org/10.1201/9780203738535
Gra̧bczewskiKJankowskiNKaynakOAlpaydinEOjaEXuLTransformations of symbolic data for continuous data oriented modelsArtificial neural networks and neural information processing – ICANN/ICONIP 20032003Berlin, HeidelbergSpringer35936610.1007/3-540-44989-2_43
HothornTLeischFZeileisAHornikKThe design and analysis of benchmark experimentsJ Comput Graph Stat200514675699217020810.1198/106186005X59630
Schliep K, Hechenbichler K (2016) Kknn: Weighted k-nearest neighbors R package version 1.3.1. https://CRAN.R-project.org/package=kknn
VanschorenJvan RijnNBischlBTorgoLOpenML: networked science in machine learningSIGKDD Explor201315496010.1145/2641190.2641198
BatesDMächlerMBolkerBWalkerSFitting linear mixed-effects models using lme4J Stat Softw20156714810.18637/jss.v067.i01
Nießl C, Herrmann M, Wiedemann C,Casalicchio G, Boulesteix A-L (2021) Over-optimism in benchmark studies and the multiplicity of design and analysis options when interpreting their results. WIREs Data Mining and Knowledge Discovery, e1441. https://doi.org/10.1002/widm.1441
BommertASunXBischlBRahnenführerJLangMBenchmark for filter methods for feature selection in high-dimensional classification dataComput Stat Data Anal2020401320910.1016/j.csda.2019.10683907135552
FriedmanJHastieTTibshiraniRRegularization paths for generalized linear models via coordinate descentJ Stat Softw20103312210.18637/jss.v033.i01
Prokopev V (2018) Mean (likelihood) encodings: a comprehensive study. Kaggle Forums
YoungFWDe LeeuwJTakaneYRegression with qualitative and quantitative variables: an alternating least squares method with optimal scaling featuresPsychometrika19764150552910.1007/BF022969720351.92032
Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A (2018) CatBoost: Unbiased boosting with categorical features, in: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (Eds.), Advances in Neural Information Processing Systems 31. Curran Associates, Inc., pp. 6638–6648
Wright MN, Ziegler A (2017) Ranger: a fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw 77:1–17. https://doi.org/10.18637/jss.v077.i01
Meyer D, Hornik K (2018) Relations: data structures and algorithms for relations
CerdaPVaroquauxGKéglBSimilarity encoding for learning with dirty categorical variablesMach Learn201810714771494383527510.1007/s10994-018-5724-2
SecaDMendes-MoreiraJRochaÁAdeliHDzemydaGMoreiraFRamalho CorreiaAMBenchmark of encoders of nominal features for regressionTrends and applications in information systems and technologies2021ChamSpringer International Publishing14615510.1007/978-3-030-72657-7_14
TutzGGertheissJRejoinder: Regularized regression for categorical dataStat Model201616249260351539210.1177/1471082X16652780
LangMBischlBSurmannDBatchtools: tools for r to work on batch systemsJ Open Source Softw201710.21105/joss.00135
BischlBLangMKotthoffLSchiffnerJRichterJStuderusECasalicchioGJonesZMmlr: machine learning in rJ Mach Learn Res2016171535674381392.68007
ChiquetJGrandvaletYRigaillGOn coding effects in regularized categorical regressionStat Modell20161622823710.1177/1471082X16644998
HandDJHenleyWEStatistical classification methods in consumer credit scoring: a reviewJ R Stat Soc A Stat Soc199716052354110.1111/j.1467-985X.1997.00078.x
Fernández-DelgadoMCernadasEBarroSAmorimDDo we need hundreds of classifiers to solve real world classification problems?J Mach Learn Res2014153133318132771551319.62005
CerdaPVaroquauxGEncoding high-cardinality string categorical variablesIEEE Trans Knowl Data Eng202010.1109/TKDE.2020.2992529
Steinwart I, Thomann P (2017) liquidSVM: A fast and versatile SVM package. arXiv: 1702:06899
FerriCHernández-OralloJModroiuRAn experimental comparison of performance measures for classificationPattern Recogn Lett200930273810.1016/j.patrec.2008.08.010
Guo C, Berkhahn F (2016) Entity embeddings of categorical variables. arXiv preprint arXiv:1604.06737
Dehghani M, Tay Y, Gritsenko AA, Zhao Z, Houlsby N, Diaz F, Metzler D, Vinyals O (2021) The benchmark lottery. arXiv preprint arXiv:2107.07002
Micci-BarrecaDA preprocessing scheme for high-cardinality categorical attributes in classification and prediction problemsSIGKDD Explor Newsl20013273210.1145/507533.507538
Probst P, Wright MN, Boulesteix A-L (2019) Hyperparameters and tuning strategies for random forest. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. https://doi.org/10.1002/widm.1301
BoulesteixA-LBinderHAbrahamowiczMSauerbreiWOn the necessity and design of studies comparing statistical methodsBiomet J Biomet Zeitschrift201760216218374449410.1002/bimj.2017001291383.62019
G Brown (1207_CR7) 2012; 13
P Cerda (1207_CR8) 2020
J Vanschoren (1207_CR46) 2013; 15
C Ferri (1207_CR17) 2009; 30
FW Young (1207_CR50) 1976; 41
1207_CR22
J Chiquet (1207_CR12) 2016; 16
D Micci-Barreca (1207_CR31) 2001; 3
1207_CR25
A-L Boulesteix (1207_CR6) 2017; 60
J De Leeuw (1207_CR14) 1976; 41
1207_CR10
1207_CR11
K Gra̧bczewski (1207_CR21) 2003
P Cerda (1207_CR9) 2018; 107
1207_CR13
1207_CR15
G Tutz (1207_CR45) 2016; 16
D Seca (1207_CR40) 2021
1207_CR42
1207_CR43
1207_CR44
M Kuhn (1207_CR27) 2019
1207_CR41
M Fernández-Delgado (1207_CR16) 2014; 15
1207_CR47
1207_CR48
A Gelman (1207_CR20) 2006
1207_CR49
1207_CR3
T Hothorn (1207_CR26) 2005; 14
DJ Hand (1207_CR24) 1997; 160
C Nadeau (1207_CR32) 2003; 52
1207_CR1
P Mair (1207_CR29) 2010; 32
JT Hancock (1207_CR23) 2020; 7
1207_CR33
p Rodríguez (1207_CR38) 2018; 75
B Bischl (1207_CR4) 2016; 17
M Feurer (1207_CR18) 2015
J Friedman (1207_CR19) 2010; 33
1207_CR34
A Bommert (1207_CR5) 2020
1207_CR30
1207_CR39
D Bates (1207_CR2) 2015; 67
1207_CR35
M Lang (1207_CR28) 2017
1207_CR36
1207_CR37
References_xml – reference: FeurerMKleinAEggenspergerKSpringenbergJBlumMHutterFCortesCLawrenceNDLeeDDSugiyamaMGarnettREfficient and robust automated machine learningAdvances in neural information processing systems 282015New YorkCurran Associates Inc29622970
– reference: ChiquetJGrandvaletYRigaillGOn coding effects in regularized categorical regressionStat Modell20161622823710.1177/1471082X16644998
– reference: Probst P, Wright MN, Boulesteix A-L (2019) Hyperparameters and tuning strategies for random forest. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. https://doi.org/10.1002/widm.1301
– reference: FerriCHernández-OralloJModroiuRAn experimental comparison of performance measures for classificationPattern Recogn Lett200930273810.1016/j.patrec.2008.08.010
– reference: BischlBLangMKotthoffLSchiffnerJRichterJStuderusECasalicchioGJonesZMmlr: machine learning in rJ Mach Learn Res2016171535674381392.68007
– reference: Coors S (2018) Automatic gradient boosting (Master’sthesis). LMU Munich. https://epub.ub.uni-muenchen.de/59108/1/MA_Coors.pdf
– reference: BatesDMächlerMBolkerBWalkerSFitting linear mixed-effects models using lme4J Stat Softw20156714810.18637/jss.v067.i01
– reference: GelmanAHillJData analysis using regression and multilevel/hierarchical models2006CambridgeCambridge University Press10.1017/CBO9780511790942
– reference: Hornik K, Meyer D (2007) Deriving consensus rankings from benchmarking experiments, In: Advances in data analysis. Springer, pp 163–170. https://doi.org/10.1007/978-3-540-70981-7_19
– reference: Dehghani M, Tay Y, Gritsenko AA, Zhao Z, Houlsby N, Diaz F, Metzler D, Vinyals O (2021) The benchmark lottery. arXiv preprint arXiv:2107.07002
– reference: Thornton C, Hutter F, Hoos HH, Leyton-Brown K (2013) Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms, In: Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’13. ACM, New York, NY, USA, pp 847–855. https://doi.org/10.1145/2487575.2487629
– reference: YoungFWDe LeeuwJTakaneYRegression with qualitative and quantitative variables: an alternating least squares method with optimal scaling featuresPsychometrika19764150552910.1007/BF022969720351.92032
– reference: Therneau T, Atkinson B (2018) Rpart: recursive partitioning and regression trees. R package version 4.1-13. https://CRAN.R-project.org/package=rpart
– reference: Wright MN, Ziegler A (2017) Ranger: a fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw 77:1–17. https://doi.org/10.18637/jss.v077.i01
– reference: RodríguezpBautistaMAGonzàlezJEscaleraSBeyond one-hot encoding: lower dimensional target embeddingImage Vis Comput201875213110.1016/j.imavis.2018.04.004
– reference: Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A (2018) CatBoost: Unbiased boosting with categorical features, in: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (Eds.), Advances in Neural Information Processing Systems 31. Curran Associates, Inc., pp. 6638–6648
– reference: Fernández-DelgadoMCernadasEBarroSAmorimDDo we need hundreds of classifiers to solve real world classification problems?J Mach Learn Res2014153133318132771551319.62005
– reference: Steinwart I, Thomann P (2017) liquidSVM: A fast and versatile SVM package. arXiv: 1702:06899
– reference: MairPde LeeuwJA general framework for multivariate analysis with optimal scaling: the r package aspectJ Stat Softw20103212310.18637/jss.v032.i09
– reference: BoulesteixA-LBinderHAbrahamowiczMSauerbreiWOn the necessity and design of studies comparing statistical methodsBiomet J Biomet Zeitschrift201760216218374449410.1002/bimj.2017001291383.62019
– reference: Bates D (2020) Computational methods for mixed models. Vignette for lme4. https://cran.r-project.org/web/packages/lme4/vignettes/Theory.pdf
– reference: Guo C, Berkhahn F (2016) Entity embeddings of categorical variables. arXiv preprint arXiv:1604.06737
– reference: Binder M (2018) mlrCPO: Composable preprocessing operators and pipelines for machine learning. R package version 0.3.4-2. https://github.com/mlr-org/mlrCPO
– reference: Meyer D, Hornik K (2018) Relations: data structures and algorithms for relations
– reference: Thomas J, Coors S, Bischl B (2018) Automatic gradient boosting. arXiv preprint arXiv:1807.03873
– reference: Schliep K, Hechenbichler K (2016) Kknn: Weighted k-nearest neighbors R package version 1.3.1. https://CRAN.R-project.org/package=kknn
– reference: Micci-Barreca D (2001) A preprocessing scheme for high-cardinality categorical attributes in classification and prediction problems. SIGKDD Explor Newsl 3:27–32. https://doi.org/10.1145/507533.507538
– reference: Nadeau C, Bengio Y (2003) Inference for the generalization error. Mach Learn 52:239–281. https://doi.org/10.1023/A:1024068626366
– reference: Cerda P, Varoquaux G (2020) Encoding high-cardinality string categorical variables. IEEE Trans Knowl Data Eng. https://doi.org/10.1109/TKDE.2020.2992529
– reference: Hand DJ, Henley WE (1997) Statistical classification methods in consumer credit scoring: a review. J R Stat Soc A Stat Soc 160:523–541. https://doi.org/10.1111/j.1467-985X.1997.00078.x
– reference: Weinberger KQ, Dasgupta A, Langford J, Smola AJ, Attenberg J (2009) Feature hashing for large scale multitask learning. In: Proceedings of the 26th Annual International Conference on Machine Learning (ICML ’09). Association for Computing Machinery, New York, NY, USA, 1113–1120. https://doi.org/10.1145/1553374.1553516
– reference: Hancock JT, Khoshgoftaar TM (2020) Survey on categorical data for neural networks. J Big Data 7:1–41. https://doi.org/10.1186/s40537-020-00305-w
– reference: Wright MN, König IR (2019) Splitting on categorical predictors in random forests. PeerJ 7. https://doi.org/10.7717/peerj.6339
– reference: Chambers J, Hastie T (1992) Statistical models. Chapter 2 of statistical models in S, 1st edn. Routledge. https://doi.org/10.1201/9780203738535
– reference: De Leeuw J, Young FW, Takane Y (1976) Additive structure in qualitative data: an alternating least squares method with optimal scaling features. Psychometrika 41:471–503. https://doi.org/10.1007/BF02296971
– reference: R Core Team (2021) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria
– reference: Nießl C, Herrmann M, Wiedemann C, Casalicchio G, Boulesteix A-L (2021) Over-optimism in benchmark studies and the multiplicity of design and analysis options when interpreting their results. WIREs Data Mining and Knowledge Discovery, e1441. https://doi.org/10.1002/widm.1441
– reference: Brown G, Pocock A, Zhao M-J, Luján M (2012) Conditional likelihood maximisation: a unifying framework for information theoretic feature selection. J Mach Learn Res 13:27–66
– reference: Vanschoren J, van Rijn N, Bischl B, Torgo L (2013) OpenML: networked science in machine learning. SIGKDD Explor 15:49–60. https://doi.org/10.1145/2641190.2641198
– reference: Tutz G, Gertheiss J (2016) Rejoinder: regularized regression for categorical data. Stat Model 16:249–260. https://doi.org/10.1177/1471082X16652780
– reference: Grąbczewski K, Jankowski N (2003) Transformations of symbolic data for continuous data oriented models. In: Kaynak O, Alpaydin E, Oja E, Xu L (eds) Artificial neural networks and neural information processing – ICANN/ICONIP 2003. Springer, Berlin, Heidelberg, pp 359–366. https://doi.org/10.1007/3-540-44989-2_43
– reference: Seca D, Mendes-Moreira J (2021) Benchmark of encoders of nominal features for regression. In: Rocha Á, Adeli H, Dzemyda G, Moreira F, Ramalho Correia AM (eds) Trends and applications in information systems and technologies. Springer International Publishing, Cham, pp 146–155. https://doi.org/10.1007/978-3-030-72657-7_14
– reference: Prokopev V (2018) Mean (likelihood) encodings: a comprehensive study. Kaggle Forums
– reference: Kuhn M, Johnson K (2019) Feature engineering and selection: a practical approach for predictive models. Chapman & Hall/CRC. https://doi.org/10.1201/9781315108230
– reference: Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J Stat Softw 33:1–22. https://doi.org/10.18637/jss.v033.i01
– reference: Cerda P, Varoquaux G, Kégl B (2018) Similarity encoding for learning with dirty categorical variables. Mach Learn 107:1477–1494. https://doi.org/10.1007/s10994-018-5724-2
– reference: Hothorn T, Leisch F, Zeileis A, Hornik K (2005) The design and analysis of benchmark experiments. J Comput Graph Stat 14:675–699. https://doi.org/10.1198/106186005X59630
– reference: Chen T, He T, Benesty M, Khotilovich V, Tang Y, Cho H, Chen K, Mitchell R, Cano I, Zhou T, Li M, Xie J, Lin M, Geng Y, Li Y (2018) Xgboost: extreme gradient boosting. R package version 0.71.2. https://CRAN.R-project.org/package=xgboost
– reference: Lang M, Bischl B, Surmann D (2017) batchtools: tools for R to work on batch systems. J Open Source Softw. https://doi.org/10.21105/joss.00135
– reference: Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020) Benchmark for filter methods for feature selection in high-dimensional classification data. Comput Stat Data Anal. https://doi.org/10.1016/j.csda.2019.106839
Snippet Since most machine learning (ML) algorithms are designed for numerical inputs, efficiently encoding categorical variables is a crucial aspect in data analysis....
SourceID proquest
crossref
springer
SourceType Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 2671
SubjectTerms Algorithms
Best practice
Data analysis
Economic Theory/Quantitative Economics/Mathematical Methods
Machine learning
Mathematics and Statistics
Original Paper
Performance prediction
Probability and Statistics in Computer Science
Probability Theory and Stochastic Processes
Statistics
Support vector machines
Title Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features
URI https://link.springer.com/article/10.1007/s00180-022-01207-6
https://www.proquest.com/docview/2719231219
Volume 37