Information-entropy-driven generation of material-agnostic datasets for machine-learning interatomic potentials

Bibliographic Details
Published in npj Computational Materials Vol. 11; no. 1; pp. 218 - 17
Main Authors P. A. Subramanyam, Aparna; Perez, Danny
Format Journal Article
Language English
Published London Nature Publishing Group UK 07.07.2025
Nature Publishing Group
Nature Portfolio
Summary: In contrast to their empirical counterparts, machine-learning interatomic potentials (MLIAPs) promise to deliver near-quantum accuracy over broad regions of configuration space. However, due to their generic functional forms and extreme flexibility, they can catastrophically fail to capture the properties of novel, out-of-sample configurations, making the quality of the training set a determining factor, especially when investigating materials under extreme conditions. We propose a novel automated dataset generation method based on the maximization of the information entropy of the feature distribution, aiming at an extremely broad coverage of the configuration space in a way that is agnostic to the properties of specific target materials. The ability of the dataset to capture unique material properties is demonstrated on a range of unary materials, including elements with the FCC (Al), BCC (W), HCP (Be, Re and Os), graphite (C), and trigonal (Sb, Te) ground states. MLIAPs trained on this dataset are shown to be accurate over a range of application-relevant metrics, as well as extremely robust over very broad swaths of configuration space, even without dataset fine-tuning or hyper-parameter optimization, making the approach extremely attractive for rapidly and autonomously developing general-purpose MLIAPs suitable for simulations in extreme conditions.
ISSN: 2057-3960
DOI: 10.1038/s41524-025-01602-9