Evaluating explorative prediction power of machine learning algorithms for materials discovery using k-fold forward cross-validation
Published in | Computational materials science Vol. 171; p. 109203 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published | Elsevier B.V. 01.01.2020 |
Summary: | The materials discovery problem usually aims to identify novel “outlier” materials whose property values are extremely low or high, outside the range of all known materials. It can be framed as an explorative prediction problem. However, the performance of machine learning algorithms for materials property prediction is currently evaluated mostly via k-fold cross-validation (CV) or a holdout test, both of which tend to overestimate explorative prediction performance in discovering novel materials. We propose k-fold-m-step forward cross-validation (kmFCV) as a new way to evaluate exploration performance in materials property prediction, and we conduct a comprehensive benchmark of the exploration performance of a variety of prediction models for materials properties (formation energy, band gap, and superconducting critical temperature) using different materials representations and machine learning algorithms. Our results show that although current machine learning models achieve good results under traditional CV, their explorative power is actually very low, as shown by the proposed kmFCV evaluation method and the proposed exploration accuracy metric. More advanced explorative machine learning algorithms are strongly needed for new materials discovery. |
---|---|
ISSN: | 0927-0256 1879-0801 |
DOI: | 10.1016/j.commatsci.2019.109203 |
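
The summary names k-fold-m-step forward cross-validation but does not spell out how the splits are built. The sketch below shows one plausible reading, and every detail in it is an assumption rather than the authors' stated procedure: samples are ranked by target property value, cut into k contiguous folds, and a model trained on the lowest-valued folds 0..i is tested on fold i + m, so the test set always lies outside the training range. The function name `forward_cv_splits` and its parameters are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def forward_cv_splits(
    y: Sequence[float], k: int = 5, m: int = 1
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for a forward CV sketch.

    Assumed scheme (not confirmed by the summary): rank samples by
    target value, cut the ranking into k contiguous folds, then train
    on folds 0..i and test on fold i + m.
    """
    if k < 2 or not 1 <= m < k:
        raise ValueError("need k >= 2 and 1 <= m < k")
    if len(y) < k:
        raise ValueError("need at least k samples")

    # Sample indices sorted by ascending property value.
    order = sorted(range(len(y)), key=lambda i: y[i])
    n = len(order)

    # Integer fold boundaries; every fold is non-empty because n >= k.
    bounds = [(j * n) // k for j in range(k + 1)]
    folds = [order[bounds[j]:bounds[j + 1]] for j in range(k)]

    for i in range(k - m):
        train = [idx for fold in folds[: i + 1] for idx in fold]
        test = folds[i + m]
        yield train, test


if __name__ == "__main__":
    targets = [3.0, 1.0, 2.0, 5.0, 4.0, 0.5]
    for train_idx, test_idx in forward_cv_splits(targets, k=3, m=1):
        print("train:", train_idx, "test:", test_idx)
```

Under this reading, every test sample has a property value at least as high as any training sample, which is what distinguishes the evaluation from ordinary k-fold CV, where test values are interleaved with training values. Sorting in descending order instead would probe extrapolation toward low property values.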