Evaluating explorative prediction power of machine learning algorithms for materials discovery using k-fold forward cross-validation

Bibliographic Details
Published in	Computational materials science Vol. 171; p. 109203
Main Authors	Xiong, Zheng; Cui, Yuxin; Liu, Zhonghao; Zhao, Yong; Hu, Ming; Hu, Jianjun
Format	Journal Article
Language	English
Published	Elsevier B.V. 01.01.2020
Subjects	Cross-validation; Exploration; Extrapolation; Interpolation; Machine learning; Materials discovery; Performance evaluation
Summary:	The materials discovery problem usually aims to identify novel “outlier” materials whose property values are extremely low or high, outside the range of all known materials. It can therefore be framed as an explorative prediction problem. However, the performance of machine learning algorithms for materials property prediction is currently evaluated mostly via k-fold cross-validation (CV) or a holdout test, both of which tend to over-estimate explorative prediction performance in discovering novel materials. We propose k-fold-m-step forward cross-validation (kmFCV) as a new way of evaluating exploration performance in materials property prediction, and we conduct a comprehensive benchmark of the exploration performance of a variety of prediction models on materials properties (formation energy, band gap, and superconducting critical temperature) using different materials representations and machine learning algorithms. Our results show that even though current machine learning models can achieve good results under traditional CV, their explorative power is actually very low when measured with the proposed kmFCV method and the proposed exploration accuracy. More advanced explorative machine learning algorithms are strongly needed for new materials discovery.
ISSN:	0927-0256 1879-0801
DOI:	10.1016/j.commatsci.2019.109203
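
The summary names k-fold-m-step forward cross-validation but does not spell out the splitting procedure. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that samples are sorted by property value and cut into k contiguous folds, and that each model is trained on the lowest i folds and tested on the fold m steps further along, so every test value lies outside the training range. The function name `forward_cv_splits` and its parameters are hypothetical.

```python
import numpy as np


def forward_cv_splits(y, k=5, m=1):
    """Yield (train_idx, test_idx) pairs for a forward CV sketch.

    Assumption (not taken from the record): folds are contiguous
    slices of the samples sorted by property value y.
    """
    if k < 2 or not 1 <= m < k:
        raise ValueError("need k >= 2 and 1 <= m < k")
    # Sort sample indices by ascending property value.
    order = np.argsort(y, kind="stable")
    # Cut the sorted indices into k contiguous folds.
    folds = np.array_split(order, k)
    for i in range(k - m):
        # Train on folds 0..i (the lowest property values seen so far).
        train_idx = np.concatenate(folds[: i + 1])
        # Test on the fold m steps ahead, beyond the training range.
        test_idx = folds[i + m]
        yield train_idx, test_idx


# Toy property values; real use would pass measured or computed properties.
y = np.array([0.3, 2.1, 1.0, 0.1, 1.7, 0.8])
splits = list(forward_cv_splits(y, k=3, m=1))
```

Because the folds are ordered by property value, a model that only interpolates well is penalised here, which is the contrast with ordinary k-fold CV that the summary draws. Reversing the sort order would probe extrapolation toward low values instead.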