Automated pipeline for superalloy data by text mining


Bibliographic Details
Published in npj Computational Materials Vol. 8; no. 1; pp. 1-12
Main Authors Wang, Weiren, Jiang, Xue, Tian, Shaohan, Liu, Pei, Dang, Depeng, Su, Yanjing, Lookman, Turab, Xie, Jianxin
Format Journal Article
Language English
Published London Nature Publishing Group UK 19.01.2022
Nature Publishing Group
Nature Portfolio
Summary: Data provides a foundation for machine learning, which has accelerated data-driven materials design. The scientific literature contains a large amount of high-quality, reliable data, but automatically extracting these data from the literature remains a challenge. We propose a natural language processing pipeline that captures both chemical composition and property data, enabling the analysis and prediction of superalloys. Within 3 h, 2531 records containing both composition and property data are extracted from 14,425 articles, covering γ′ solvus temperature, density, and solidus and liquidus temperatures. A data-driven model for γ′ solvus temperature is built to predict unexplored Co-based superalloys with high γ′ solvus temperatures within a relative error of 0.81%. We test the predictions through the synthesis and characterization of three alloys. A web-based toolkit is provided as an online open-source platform and is expected to serve as the basis for a general method of searching for targeted materials using data extracted from the literature.
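As a rough illustration of the rule-based property extraction a text-mining pipeline of this kind has to perform, the sketch pulls γ′ solvus temperature values out of a single sentence with a regular expression. It is a hypothetical example, not the authors' method: the pattern, the function name `extract_solvus`, and the 40-character gap allowed between the property phrase and the number are all assumptions. A real pipeline would also need to recognize alloy compositions and link each property value to the alloy it describes.

```python
import re
from typing import List, Tuple

# Hypothetical pattern (an assumption, not the published pipeline):
# the phrase "γ′ solvus temperature" (or "gamma-prime solvus temperature"),
# then up to 40 non-digit characters, then a number and a temperature unit.
SOLVUS_PATTERN = re.compile(
    r"(?:γ\s?′|gamma[- ]prime)\s+solvus\s+temperature[^0-9]{0,40}?"
    r"(\d+(?:\.\d+)?)\s*(°C|K)",
    re.IGNORECASE,
)


def extract_solvus(sentence: str) -> List[Tuple[float, str]]:
    """Return (value, unit) pairs for γ′ solvus temperatures found in a sentence."""
    return [(float(value), unit) for value, unit in SOLVUS_PATTERN.findall(sentence)]


if __name__ == "__main__":
    print(extract_solvus("The γ′ solvus temperature of alloy A was 1150 °C."))
```

A hand-written pattern like this is brittle (it misses ranges, tables, and values stated before the property name), which is one reason a full NLP pipeline is needed to reach thousands of composition-property records.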
ISSN: 2057-3960
DOI: 10.1038/s41524-021-00687-2