Assessment of Predicting Frontier Orbital Energies for Small Organic Molecules Using Knowledge-Based and Structural Information
Published in | ACS Engineering Au, Vol. 2, no. 4, pp. 360–368 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | American Chemical Society, 17.08.2022 |
Summary: | Predictions of the frontier orbital energies, namely the highest occupied molecular orbital (HOMO) energy (E_H), the lowest unoccupied molecular orbital (LUMO) energy (E_L), and the energy gap (ΔE_HL), are systematically compared for the molecules in the QM9 dataset, which contains more than 120k three-dimensional organic molecule structures determined by first-principles simulations. The target molecular properties (E_H, E_L, and ΔE_HL) are predicted using linear regression (LR), machine learning (random forest, RF), and continuous-filter convolutional neural network (SchNET) approaches. LR and RF models built on various knowledge-based descriptors derived from the molecules' SMILES strings predict the target properties with mean absolute errors (MAEs) of 4–6 times the chemical accuracy (0.043 eV). The best approach, SchNET with a graph representation derived from molecular Cartesian coordinates, gives MAEs for E_H, E_L, and ΔE_HL of 0.051, 0.041, and 0.076 eV, respectively. Introducing a bond-step matrix representation into the SchNET model substantially reduces the computational cost of dataset preparation, while the corresponding MAEs increase moderately to 2–3 times the chemical accuracy. The important descriptors identified in the LR and RF models appear to align with chemical knowledge of these molecular electronic properties, although these models carry tolerable prediction errors. The combination of the bond-step representation and the SchNET model offers an assessable and balanced option for the high-throughput screening of organic molecules and for the development of data science approaches. |
---|---|
ISSN: | 2694-2488 |
DOI: | 10.1021/acsengineeringau.2c00011 |
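
The summary reports model quality as mean absolute errors (MAEs) relative to the chemical accuracy of 0.043 eV. As a rough illustration of that arithmetic only, the sketch below computes an MAE and expresses the three SchNET MAEs quoted in the summary as multiples of that threshold. The function names and the dictionary keys are hypothetical and are not taken from the article.

```python
# Minimal sketch: MAE and its ratio to chemical accuracy.
# All names here are illustrative assumptions, not the authors' code.

CHEMICAL_ACCURACY_EV = 0.043  # threshold quoted in the summary


def mean_absolute_error(predicted, reference):
    """Average of |predicted - reference| over paired values (in eV)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def ratio_to_chemical_accuracy(mae_ev):
    """Express an MAE (in eV) as a multiple of the chemical accuracy."""
    return mae_ev / CHEMICAL_ACCURACY_EV


# SchNET MAEs as quoted in the summary (Cartesian-coordinate graph input).
reported_schnet_mae_ev = {"E_H": 0.051, "E_L": 0.041, "dE_HL": 0.076}

ratios = {
    name: round(ratio_to_chemical_accuracy(mae), 2)
    for name, mae in reported_schnet_mae_ev.items()
}

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name}: {ratio} x chemical accuracy")
```

On this scale, the quoted SchNET MAEs fall between roughly 1 and 2 times the chemical accuracy, compared with the 4–6 times stated for the LR and RF models and the 2–3 times stated for the bond-step representation.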