Feature selection method reducing correlations among features by embedding domain knowledge

Bibliographic Details
Published in Acta materialia Vol. 238; p. 118195
Main Authors Liu, Yue; Zou, Xinxin; Ma, Shuchang; Avdeev, Maxim; Shi, Siqi
Format Journal Article
Language English
Published Elsevier Ltd 01.10.2022
Summary: Selecting proper descriptors, also known as features, is one of the key problems in modeling materials properties with machine learning (ML). Redundant features reduce the accuracy of ML models, and the results of purely data-driven feature selection (FS) methods are often inconsistent with materials domain knowledge. Herein, an FS method embedded with materials domain knowledge, named NCOR-FS, is proposed to select higher-quality features. The method translates materials domain knowledge about highly correlated features into Non-Co-Occurrence Rules (NCORs), which makes it possible to quantify the degree to which feature subsets violate the NCORs and to design an optimization process for the FS method based on a swarm intelligence algorithm. Experiments on seven datasets show that, compared with multiple other FS methods commonly used in the materials field, NCOR-FS selects feature subsets with a more appropriate number of highly correlated features, which improves the prediction accuracy and interpretability of the ML model. NCOR-FS can be applied to any materials system, and the idea of embedding domain knowledge into data-driven algorithms is expected to facilitate the construction of a wide range of ML models embedded with materials domain knowledge.
ISSN: 1359-6454, 1873-2453
DOI: 10.1016/j.actamat.2022.118195
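
The summary says that NCORs make it possible to quantify how strongly a feature subset violates the rules. The record does not give the paper's actual formula, so the following Python sketch is only an illustration under stated assumptions: the rule format (a group of correlated features plus a maximum number allowed to co-occur), the function name `violation_degree`, the scoring (fraction of broken rules), and all feature names are hypothetical and are not taken from the article.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical rule format (an assumption, not the paper's definition):
# a group of features regarded as highly correlated, plus the maximum
# number of them that may appear together in one feature subset.
Rule = Tuple[FrozenSet[str], int]


def violation_degree(subset: Iterable[str], rules: List[Rule]) -> float:
    """Return the fraction of rules broken by the subset (assumed score)."""
    chosen = set(subset)
    if not rules:
        return 0.0
    broken = sum(
        1 for group, max_allowed in rules if len(chosen & group) > max_allowed
    )
    return broken / len(rules)


# Made-up rules and feature names, for illustration only.
rules: List[Rule] = [
    (frozenset({"ionic_radius", "atomic_radius", "covalent_radius"}), 1),
    (frozenset({"electronegativity", "electron_affinity"}), 1),
]

subset_a = ["ionic_radius", "atomic_radius", "electronegativity"]
subset_b = ["ionic_radius", "electronegativity", "valence"]

print(violation_degree(subset_a, rules))  # 0.5: the first rule is broken
print(violation_degree(subset_b, rules))  # 0.0: no rule is broken
```

In a search-based selector, a score of this kind could be added as a penalty to the model's prediction error so that the optimizer favors subsets that respect the rules; how NCOR-FS actually combines the two is described in the article itself, not in this record.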