The choice of scaling technique matters for classification performance
Published in | Applied soft computing Vol. 133; p. 109924 |
---|---|
Main Authors | de Amorim, Lucas B.V.; Cavalcanti, George D.C.; Cruz, Rafael M.O. |
Format | Journal Article |
Language | English |
Published | Elsevier B.V, 01.01.2023 |
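Subjects | Classification; Ensemble of classifiers; Multiple Classifier System; Normalization; Preprocessing; Scaling; Standardization |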
Abstract | Dataset scaling, also known as normalization, is an essential preprocessing step in a machine learning pipeline. It adjusts the scales of the attributes so that they all vary within the same range. This transformation is known to improve the performance of classification models, but there are several scaling techniques to choose from, and this choice is not generally made carefully. In this paper, we run a broad experiment comparing the impact of 5 scaling techniques on the performance of 20 classification algorithms, covering both monolithic and ensemble models, applied to 82 publicly available datasets with varying imbalance ratios. Results show that the choice of scaling technique matters for classification performance, and that the performance difference between the best and the worst scaling technique is relevant and statistically significant in most cases. They also indicate that choosing an inadequate technique can be more detrimental to classification performance than not scaling the data at all. We also show how the performance variation of an ensemble model, considering different scaling techniques, tends to be dictated by that of its base model. Finally, we discuss the relationship between a model's sensitivity to the choice of scaling technique and its performance, and provide insights into its applicability in different model deployment scenarios. Full results and source code for the experiments in this paper are available in a GitHub repository: https://github.com/amorimlb/scaling_matters.
• Compares classification performances after applying five scaling techniques. • Performance difference between best and worst scaling technique is largely relevant. • This difference increases when highly imbalanced datasets are considered. • The performance variation of an ensemble tends to be dictated by that of its base model. • Provides an analysis of sensitivity to the choice of scaling technique vs. model performance. |
---|---|
ArticleNumber | 109924 |
Author | Cavalcanti, George D.C.; de Amorim, Lucas B.V.; Cruz, Rafael M.O. |
Author_xml | 1. de Amorim, Lucas B.V. (ORCID: 0000-0003-2725-6527; email: lucas@ic.ufal.br; Centro de Informática - Universidade Federal de Pernambuco, Brazil); 2. Cavalcanti, George D.C. (ORCID: 0000-0001-7714-2283; Centro de Informática - Universidade Federal de Pernambuco, Brazil); 3. Cruz, Rafael M.O. (ORCID: 0000-0001-9446-1040; École de Technologie Supérieure, Université du Québec, Canada) |
ContentType | Journal Article |
Copyright | 2022 Elsevier B.V. |
DOI | 10.1016/j.asoc.2022.109924 |
Discipline | Computer Science |
EISSN | 1872-9681 |
ExternalDocumentID | 10_1016_j_asoc_2022_109924 S1568494622009735 |
ISSN | 1568-4946 |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | Multiple Classifier System; Preprocessing; Classification; Standardization; Scaling; Ensemble of classifiers; Normalization |
ORCID | 0000-0001-9446-1040 0000-0001-7714-2283 0000-0003-2725-6527 |
PublicationDate | January 2023 |
PublicationTitle | Applied soft computing |
PublicationYear | 2023 |
Publisher | Elsevier B.V |
StartPage | 109924 |
SubjectTerms | Classification; Ensemble of classifiers; Multiple Classifier System; Normalization; Preprocessing; Scaling; Standardization |
Title | The choice of scaling technique matters for classification performance |
URI | https://dx.doi.org/10.1016/j.asoc.2022.109924 |
Volume | 133 |
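The abstract describes scaling as a way of bringing all attributes into the same range before classification. The sketch below is a minimal, standard-library-only illustration of why that can matter for a distance-based classifier. It is not the authors' experimental code (that lives in the GitHub repository named in the abstract). The toy data, the helper names (`fit_min_max`, `fit_z_score`, `transform`, `accuracy`, `compare`) and the choice of min-max and z-score scaling are all assumptions made here for illustration; the abstract does not name the five techniques that were studied.

```python
import math

# Hypothetical toy data: feature 0 decides the class (small range),
# feature 1 is noise with a much larger range.
TRAIN_X = [[0.1, 900.0], [0.2, 100.0], [0.8, 850.0], [0.9, 150.0]]
TRAIN_Y = [0, 0, 1, 1]
TEST_X = [[0.15, 160.0], [0.85, 880.0]]
TEST_Y = [0, 1]


def fit_min_max(rows):
    """Learn per-feature (offset, scale) = (min, max - min) from training rows."""
    return [(min(c), (max(c) - min(c)) or 1.0) for c in zip(*rows)]


def fit_z_score(rows):
    """Learn per-feature (offset, scale) = (mean, population std) from training rows."""
    params = []
    for c in zip(*rows):
        mean = sum(c) / len(c)
        std = math.sqrt(sum((v - mean) ** 2 for v in c) / len(c))
        params.append((mean, std or 1.0))
    return params


def transform(rows, params):
    """Apply (value - offset) / scale to every feature of every row."""
    return [[(v - off) / sc for v, (off, sc) in zip(row, params)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def predict_1nn(train_x, train_y, x):
    """Return the label of the training point closest to x."""
    dists = [sq_dist(x, t) for t in train_x]
    return train_y[dists.index(min(dists))]


def accuracy(train_x, train_y, test_x, test_y):
    """Fraction of test points that 1-NN labels correctly."""
    hits = sum(predict_1nn(train_x, train_y, x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


def compare():
    """Accuracy of 1-NN with no scaling and with each illustrative scaler."""
    results = {"none": accuracy(TRAIN_X, TRAIN_Y, TEST_X, TEST_Y)}
    for name, fit in (("min-max", fit_min_max), ("z-score", fit_z_score)):
        params = fit(TRAIN_X)  # scaling parameters come from training data only
        results[name] = accuracy(
            transform(TRAIN_X, params), TRAIN_Y, transform(TEST_X, params), TEST_Y
        )
    return results


if __name__ == "__main__":
    print(compare())
```

On this toy data the unscaled nearest-neighbour classifier follows the large-range noise feature and misclassifies both test points, while either scaler lets the informative feature count. The example only shows sensitivity to attribute scales; it does not reproduce the paper's comparison across techniques, classifiers, or datasets.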