Efficient permutation testing of variable importance measures by the example of random forests
Hypothesis testing of variable importance measures (VIMPs) is still the subject of ongoing research. This particularly applies to random forests (RF), for which VIMPs are a popular feature. Among recent developments, heuristic approaches to parametric testing have been proposed whose distributional...
Published in | Computational statistics & data analysis Vol. 181; p. 107689 |
---|---|
Main Authors | Hapfelmeier, Alexander; Hornung, Roman; Haller, Bernhard |
Format | Journal Article |
Language | English |
Published | Elsevier B.V, 01.05.2023 |
Abstract | Hypothesis testing of variable importance measures (VIMPs) is still the subject of ongoing research. This particularly applies to random forests (RF), for which VIMPs are a popular feature. Among recent developments, heuristic approaches to parametric testing have been proposed whose distributional assumptions are based on empirical evidence. Other formal tests under regularity conditions were derived analytically. But these approaches can be computationally expensive or even practically infeasible. This problem also occurs with non-parametric permutation tests, which are, however, distribution-free and can generically be applied to any kind of prediction model and VIMP. Embracing this advantage, it is proposed to use sequential permutation tests and sequential p-value estimation to reduce the computational costs associated with conventional permutation tests. These costs can be particularly high in case of complex prediction models. Therefore, RF's popular and widely used permutation VIMP (pVIMP) serves as a practical and relevant application example. The results of simulation studies confirm the theoretical properties of the sequential tests, that is, the type-I error probability is controlled at a nominal level and a high power is maintained with considerably fewer permutations needed compared to conventional permutation testing. The numerical stability of the methods is investigated in two additional application studies. In summary, theoretically sound sequential permutation testing of VIMP is possible at greatly reduced computational costs. Recommendations for application are given. A respective implementation for RF's pVIMP is provided through the accompanying R package rfvimptest. |
---|---|
ArticleNumber | 107689 |
Author | Hornung, Roman; Haller, Bernhard; Hapfelmeier, Alexander |
Author_xml | 1. Hapfelmeier, Alexander (ORCID: 0000-0001-6765-6352; email: alexander.hapfelmeier@tum.de), Institute of General Practice and Health Services Research, School of Medicine, Technical University of Munich, Orleansstraße 47, Munich, 81667, Germany; 2. Hornung, Roman, Institute for Medical Information Processing, Biometry and Epidemiology, University of Munich, Marchioninistr. 15, Munich, 81377, Germany; 3. Haller, Bernhard, Institute of AI and Informatics in Medicine, School of Medicine, Technical University of Munich, Ismaninger Str. 22, Munich, 81675, Germany |
ContentType | Journal Article |
Copyright | 2023 The Author(s) |
DOI | 10.1016/j.csda.2022.107689 |
Discipline | Mathematics |
EISSN | 1872-7352 |
ISSN | 0167-9473 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | p-value; Sequential permutation test; Prediction model; Machine learning; Variable selection |
Language | English |
License | This is an open access article under the CC BY-NC-ND license. |
ORCID | 0000-0001-6765-6352 |
OpenAccessLink | https://www.sciencedirect.com/science/article/pii/S0167947322002699 |
PublicationCentury | 2000 |
PublicationDate | May 2023 |
PublicationDateYYYYMMDD | 2023-05-01 |
PublicationDecade | 2020 |
PublicationTitle | Computational statistics & data analysis |
PublicationYear | 2023 |
Publisher | Elsevier B.V |
Data Anal. doi: 10.1016/j.csda.2012.09.020 – volume: 17 start-page: 841 year: 2016 ident: 10.1016/j.csda.2022.107689_br0410 article-title: Quantifying uncertainty in random forests via confidence intervals and hypothesis tests publication-title: J. Mach. Learn. Res. – volume: 21 start-page: 1 year: 2020 ident: 10.1016/j.csda.2022.107689_br0430 article-title: Randomization as regularization: a degrees of freedom explanation for random forest success publication-title: J. Mach. Learn. Res. – volume: 14 start-page: 1 year: 2014 ident: 10.1016/j.csda.2022.107689_br0480 article-title: Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints publication-title: BMC Med. Res. Methodol. doi: 10.1186/1471-2288-14-137 – year: 2006 ident: 10.1016/j.csda.2022.107689_br0190 article-title: Permutation, Parametric, and Bootstrap Tests of Hypotheses – volume: 13 year: 2018 ident: 10.1016/j.csda.2022.107689_br0310 article-title: On the overestimation of random forest's out-of-bag error publication-title: PLoS ONE doi: 10.1371/journal.pone.0201904 – year: 2021 ident: 10.1016/j.csda.2022.107689_br0340 – volume: 23 start-page: 1 year: 2022 ident: 10.1016/j.csda.2022.107689_br0440 article-title: Getting better from worse: augmented bagging and a cautionary tale of variable importance publication-title: J. Mach. Learn. Res. – volume: 34 start-page: 3711 year: 2018 ident: 10.1016/j.csda.2022.107689_br0460 article-title: The revival of the Gini importance? publication-title: Bioinformatics doi: 10.1093/bioinformatics/bty373 – volume: 27 start-page: 3104 year: 2018 ident: 10.1016/j.csda.2022.107689_br0510 article-title: Individual treatment effect prediction for amyotrophic lateral sclerosis patients publication-title: Stat. Methods Med. Res. doi: 10.1177/0962280217693034 – start-page: 181 year: 1957 ident: 10.1016/j.csda.2022.107689_br0150 article-title: Modified randomization tests for nonparametric hypotheses publication-title: Ann. Math. Stat. 
doi: 10.1214/aoms/1177707045 – volume: 38 start-page: 558 year: 2019 ident: 10.1016/j.csda.2022.107689_br0290 article-title: Standard errors and confidence intervals for variable importance in random forest regression, classification, and survival publication-title: Stat. Med. doi: 10.1002/sim.7803 – volume: 77 start-page: 1 year: 2017 ident: 10.1016/j.csda.2022.107689_br0600 article-title: ranger: a fast implementation of random forests for high dimensional data in C++ and R publication-title: J. Stat. Softw. doi: 10.18637/jss.v077.i01 – volume: 80 start-page: 129 year: 2014 ident: 10.1016/j.csda.2022.107689_br0240 article-title: Variable selection by random forests using data with missing values publication-title: Comput. Stat. Data Anal. doi: 10.1016/j.csda.2014.06.017 – volume: 9 start-page: 1 year: 2008 ident: 10.1016/j.csda.2022.107689_br0540 article-title: Conditional variable importance for random forests publication-title: BMC Bioinform. doi: 10.1186/1471-2105-9-307 |
StartPage | 107689 |
SubjectTerms | data analysis; Machine learning; p-value; prediction; Prediction model; probability; Sequential permutation test; Variable selection |
Title | Efficient permutation testing of variable importance measures by the example of random forests |
URI | https://dx.doi.org/10.1016/j.csda.2022.107689 https://www.proquest.com/docview/2834205071 |
Volume | 181 |
Authors | Hapfelmeier, Alexander; Hornung, Roman; Haller, Bernhard |
ISSN | 0167-9473 |