Generalising Random Forest Parameter Optimisation to Include Stability and Cost

Bibliographic Details
Published in	Machine Learning and Knowledge Discovery in Databases, Vol. 10536, pp. 102-113
Main Authors	Liu, C. H. Bryan; Chamberlain, Benjamin Paul; Little, Duncan A.; Cardoso, Ângelo
Format	Book Chapter
Language	English
Published	Switzerland: Springer International Publishing AG, 2017
Series	Lecture Notes in Computer Science
Subjects	Bayesian optimisation; Machine learning application; Model stability; Parameter tuning; Random forest
Summary:	Random forests are among the most popular classification and regression methods used in industrial applications. To be effective, the parameters of random forests must be carefully tuned. This is usually done by choosing values that minimize the prediction error on a held-out dataset. We argue that error reduction is only one of several metrics that must be considered when optimizing random forest parameters for commercial applications. We propose a novel metric that captures the stability of random forest predictions, which we argue is key for scenarios that require successive predictions. We motivate the need for multi-criteria optimization by showing that in practical applications, simply choosing the parameters that lead to the lowest error can introduce unnecessary costs and produce predictions that are not stable across independent runs. To optimize this multi-criteria trade-off, we present a new framework that efficiently finds a principled balance between these three considerations using Bayesian optimisation. The pitfalls of optimising forest parameters purely for error reduction are demonstrated using two publicly available real-world datasets. We show that our framework leads to parameter settings that are markedly different from the values discovered by error reduction metrics alone.
ISBN:	9783319712727 3319712721
ISSN:	0302-9743 1611-3349
DOI:	10.1007/978-3-319-71273-4_9