A novel feature selection method based on global sensitivity analysis with application in machine learning-based prediction model

Bibliographic Details
Published in Applied Soft Computing Vol. 85; p. 105859
Main Author Zhang, Pin
Format Journal Article
Language English
Published Elsevier B.V 01.12.2019
Summary: Feature selection (FS) is important for identifying the feature subsets that carry useful information and for maximizing model accuracy. This study proposes a novel FS method based on global sensitivity analysis (GSA) for determining the most relevant feature subsets and improving the prediction performance of machine learning (ML)-based models. Features are ranked using the results of three GSA methods: Pearson, Sobol’, and PAWN. The GSA-based FS method is applied to engineering practice in combination with the ML algorithm random forest (RF) to develop a prediction model for tunnelling-induced settlement. For comparison, the feature extraction method principal component analysis (PCA) is also used to develop an RF-based model. The results indicate that the GSA-based FS method effectively determines the significance of the input variables. The prediction performance of the RF-based model integrated with the GSA-based FS method is substantially improved and exceeds that of the model integrated with PCA-based dimensionality reduction.
• A novel global sensitivity analysis-based feature selection method is proposed.
• The proposed feature selection method is integrated with random forest.
• The performance of the proposed feature selection method is compared with principal component analysis.
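The ranking step described in the summary can be illustrated with a minimal sketch. It covers only a Pearson-style ranking, not Sobol’ or PAWN, and it is not the authors' implementation. The function names (`pearson`, `rank_features`, `select_top_k`), the synthetic data, and the choice of `k` are all assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, target):
    """Rank feature names by decreasing |r| with the target (hypothetical helper)."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


def select_top_k(columns, target, k):
    """Keep the k highest-ranked features (k is an assumed cut-off)."""
    ranking, _ = rank_features(columns, target)
    return ranking[:k]


# Synthetic data, assumed for illustration only: the target depends strongly
# on one feature, moderately on a second, and not at all on a third.
rng = random.Random(0)
n = 1000
columns = {
    "strong": [rng.gauss(0, 1) for _ in range(n)],
    "weak": [rng.gauss(0, 1) for _ in range(n)],
    "noise": [rng.gauss(0, 1) for _ in range(n)],
}
target = [
    3.0 * s + 1.0 * w + rng.gauss(0, 0.1)
    for s, w in zip(columns["strong"], columns["weak"])
]

ranking, scores = rank_features(columns, target)
selected = select_top_k(columns, target, k=2)
print(ranking, selected)
```

The selected subset would then be passed to a downstream regressor such as a random forest. Pearson correlation captures only linear dependence, which is presumably one reason the study also draws on variance-based (Sobol’) and distribution-based (PAWN) sensitivity measures.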
ISSN:1568-4946
1872-9681
DOI:10.1016/j.asoc.2019.105859