One class classification as a practical approach for accelerating π-π co-crystal discovery

Bibliographic Details
Published in Chemical science (Cambridge) Vol. 12; no. 5; pp. 1702 - 1719
Main Authors Vriza, Aikaterini, Canaj, Angelos B, Vismara, Rebecca, Kershaw Cook, Laurence J, Manning, Troy D, Gaultois, Michael W, Wood, Peter A, Kurlin, Vitaliy, Berry, Neil, Dyer, Matthew S, Rosseinsky, Matthew J
Format Journal Article
Language English
Published England Royal Society of Chemistry 08.12.2020
The Royal Society of Chemistry
Summary: The implementation of machine learning models has brought major changes to the decision-making process for materials design. One concern for data-driven approaches is the lack of negative data from unsuccessful synthetic attempts, which can produce inherently imbalanced datasets. We propose one-class classification as an effective tool for tackling this limitation in materials design problems. One-class classification learns from a single well-defined class, without counter-examples. An extensive study of different one-class classification algorithms is performed to identify the most appropriate workflow for guiding the discovery of emerging materials belonging to a relatively small class, namely the weakly bound polyaromatic hydrocarbon co-crystals. The two-step approach presented in this study first trains the model on all the known molecular combinations that form this class of co-crystals, extracted from the Cambridge Structural Database (1722 molecular combinations), and then scores possible yet unknown pairs from the ZINC15 database (21 736 possible molecular combinations). By focusing on the highest-ranking pairs, which are predicted to have a higher probability of forming co-crystals, materials discovery can be accelerated by reducing the vast molecular space and directing the synthetic efforts of chemists. Interpretability techniques are then used to seek a more detailed understanding of the molecular properties that cause co-crystallization. The applicability of the methodology is demonstrated by the discovery of two novel co-crystals, namely pyrene-6H-benzo[c]chromen-6-one (1) and pyrene-9,10-dicyanoanthracene (2). Machine learning using one-class classification on a database of existing co-crystals enables the identification of co-formers that are likely to form stable co-crystals, resulting in the synthesis of two co-crystals of polyaromatic hydrocarbons.
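The two-step workflow described in the summary (fit on known positives only, then score and rank unlabelled pairs) can be illustrated with a minimal sketch. This is not the authors' model: it swaps in a simple k-nearest-neighbour distance scorer as a stand-in for the one-class algorithms the study compares, and the two-dimensional descriptor vectors, the pair names, and the class and function names (`KnnOneClassScorer`, `rank_candidates`) are all hypothetical, chosen only for illustration.

```python
import math


def fit_scaler(rows):
    """Per-feature mean and standard deviation of the positive-only training set."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # fall back to 1.0 if a feature is constant
    return means, stds


def scale(row, means, stds):
    """Standardise one descriptor vector with the training statistics."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


class KnnOneClassScorer:
    """Toy one-class scorer: closeness to the known class, no negative examples used."""

    def __init__(self, k=2):
        self.k = k

    def fit(self, positives):
        # Step 1: learn only from pairs known to belong to the class.
        self.means, self.stds = fit_scaler(positives)
        self.train = [scale(r, self.means, self.stds) for r in positives]
        return self

    def score(self, row):
        # Negative mean distance to the k nearest known positives:
        # a higher score means the candidate looks more like the known class.
        z = scale(row, self.means, self.stds)
        dists = sorted(math.dist(z, t) for t in self.train)
        return -sum(dists[: self.k]) / self.k


def rank_candidates(scorer, candidates):
    """Step 2: score unlabelled pairs and sort them from most to least class-like."""
    scored = [(name, scorer.score(vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical descriptor vectors for known co-crystal pairs (made-up numbers).
known_pairs = [[1.0, 0.2], [1.1, 0.25], [0.9, 0.15], [1.05, 0.22]]

# Hypothetical unlabelled candidate pairs to be scored.
candidates = {
    "pair_A": [1.0, 0.2],
    "pair_B": [3.0, 2.0],
    "pair_C": [1.2, 0.3],
}

scorer = KnnOneClassScorer(k=2).fit(known_pairs)
ranking = rank_candidates(scorer, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setting the candidate closest to the known pairs is ranked first and the outlying one last; in practice only the top of such a ranking would be passed on for experimental screening.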
Bibliography: Electronic supplementary information (ESI) available: X-ray crystallographic details and ML models description. CCDC 2014576-2014577. Crystallographic data for pyrene-9,10-dicyanoanthracene and pyrene-6H-benzo[c]chromen-6-one (CIF). Code availability: https://github.com/lrcfmd/cocrystal_design. For ESI and crystallographic data in CIF or other electronic format see DOI: 10.1039/d0sc04263c
ISSN:2041-6520
2041-6539
DOI:10.1039/d0sc04263c