Evaluating explorative prediction power of machine learning algorithms for materials discovery using k-fold forward cross-validation

Bibliographic Details
Published in: Computational Materials Science, Vol. 171, p. 109203
Main Authors: Xiong, Zheng; Cui, Yuxin; Liu, Zhonghao; Zhao, Yong; Hu, Ming; Hu, Jianjun
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.01.2020
Summary: The materials discovery problem usually aims to identify novel "outlier" materials whose property values are extremely low or high, outside the range of all known materials. It can therefore be framed as an explorative prediction problem. However, the performance of machine learning algorithms for materials property prediction is currently evaluated mostly via k-fold cross-validation (CV) or holdout testing, both of which tend to overestimate explorative prediction performance in discovering novel materials. We propose k-fold-m-step forward cross-validation (kmFCV) as a new way to evaluate exploration performance in materials property prediction, and we conduct a comprehensive benchmark of the exploration performance of a variety of models, built with different materials representations and machine learning algorithms, on predicting materials properties including formation energy, band gap, and superconducting critical temperature. Our results show that although current machine learning models achieve good results under traditional CV, their explorative power, as measured by kmFCV and the proposed exploration accuracy, is actually very low. More advanced explorative machine learning algorithms are strongly needed for new materials discovery.
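The summary does not spell out how kmFCV folds are built or how exploration accuracy is computed. A minimal Python sketch follows. It assumes, without confirmation from this record, that samples are sorted by target value, cut into k contiguous folds, and that each split trains on folds 0 through i and tests on fold i + m, so every test value lies above the training range. The function names kmfcv_splits and exploration_accuracy, and the metric definition used here (the share of beyond-range test samples that the model also predicts above the training maximum), are hypothetical illustrations, not the authors' code.

```python
from typing import List, Sequence, Tuple


def kmfcv_splits(y: Sequence[float], k: int, m: int = 1) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical k-fold-m-step forward CV split generator.

    Assumption: sort samples by ascending target, cut them into k contiguous
    folds, then for each i train on folds 0..i and test on fold i + m.
    """
    if k < 2 or m < 1 or m >= k:
        raise ValueError("need k >= 2 and 1 <= m < k")
    # Sample indices ordered by ascending target value.
    order = sorted(range(len(y)), key=lambda i: y[i])
    n = len(order)
    # Boundaries of k contiguous folds of near-equal size.
    bounds = [round(j * n / k) for j in range(k + 1)]
    folds = [order[bounds[j]:bounds[j + 1]] for j in range(k)]
    splits = []
    for i in range(k - m):
        # Training set: all folds up to and including fold i.
        train = [idx for fold in folds[: i + 1] for idx in fold]
        # Test set: the fold m steps ahead, i.e. higher target values.
        test = list(folds[i + m])
        splits.append((train, test))
    return splits


def exploration_accuracy(train_max: float, y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Hypothetical metric: among test samples whose true value exceeds the
    training maximum, the fraction the model also predicts above it."""
    beyond = [p for t, p in zip(y_true, y_pred) if t > train_max]
    if not beyond:
        return float("nan")  # no beyond-range samples to score
    return sum(p > train_max for p in beyond) / len(beyond)
```

For y = [5, 1, 4, 2, 3, 6, 8, 7] with k = 4 and m = 1, this sketch yields three splits, and each test fold contains only values above the largest training target. Shuffled k-fold CV rarely creates that situation, which is the gap the summary describes.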
ISSN: 0927-0256, 1879-0801
DOI: 10.1016/j.commatsci.2019.109203