Assessment of Predicting Frontier Orbital Energies for Small Organic Molecules Using Knowledge-Based and Structural Information

Bibliographic Details
Published in ACS Engineering Au Vol. 2; no. 4; pp. 360-368
Main Authors Ye, Zong-Rong; Hung, Sheng-Hsuan; Chen, Berlin; Tsai, Ming-Kang
Format Journal Article
Language English
Published American Chemical Society 17.08.2022
Summary: A systematic comparison is presented for predictions of the frontier orbital energies, namely the highest occupied molecular orbital (HOMO) energy (E_H), the lowest unoccupied molecular orbital (LUMO) energy (E_L), and the energy gap (ΔE_HL), of the molecules in the QM9 dataset, which contains more than 120k three-dimensional organic molecule structures determined by first-principles simulations. The target molecular properties (E_H, E_L, and ΔE_HL) are predicted using linear regression (LR), machine learning (random forest, RF), and continuous-filter convolutional neural network (SchNET) approaches. LR and RF models built upon various knowledge-based descriptors, derived from the SMILES of the molecules, predict the target properties with mean absolute errors (MAEs) of 4–6 times the chemical accuracy (0.043 eV). The best approach, SchNET, using the graph representation derived from molecular Cartesian coordinates, is confirmed to give MAEs for E_H, E_L, and ΔE_HL of 0.051, 0.041, and 0.076 eV, respectively. When the bond-step matrix representation is used with the SchNET model, the computational cost of dataset preparation is substantially reduced, and the corresponding MAEs increase moderately to 2–3 times the chemical accuracy. The important descriptors identified in the LR and RF models appear chemically interpretable and consistent with established knowledge of these molecular electronic properties, although these models carry tolerable prediction errors. The combination of the bond-step representation and the SchNET model can provide an assessable and balanced option for the high-throughput screening of organic molecules and the development of the data science approach.
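The summary reports errors both in eV and as multiples of the chemical accuracy (0.043 eV). As a minimal sketch of that arithmetic, the Python below computes a mean absolute error and expresses the reported SchNET MAEs in units of chemical accuracy. The function names and the dictionary keys are hypothetical and chosen for illustration; only the numerical values come from the summary, and this is not the authors' code.

```python
# Chemical accuracy threshold quoted in the summary, in eV.
CHEMICAL_ACCURACY_EV = 0.043


def mean_absolute_error(predicted, reference):
    """Return the mean absolute error between two equal-length sequences."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - r) for p, r in zip(predicted, reference))
    return total / len(predicted)


def in_units_of_chemical_accuracy(mae_ev):
    """Express an MAE given in eV as a multiple of the chemical accuracy."""
    return mae_ev / CHEMICAL_ACCURACY_EV


# SchNET MAEs reported in the summary (graph representation from
# Cartesian coordinates). The key names are illustrative labels only.
reported_mae_ev = {"E_H": 0.051, "E_L": 0.041, "dE_HL": 0.076}

ratios = {
    name: in_units_of_chemical_accuracy(value)
    for name, value in reported_mae_ev.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {reported_mae_ev[name]:.3f} eV = {ratio:.2f} x chemical accuracy")
```

Read this way, the reported SchNET errors sit between roughly one and two times the chemical accuracy, compared with the 4–6 times quoted for the LR and RF models.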
ISSN:2694-2488
DOI:10.1021/acsengineeringau.2c00011