One class classification as a practical approach for accelerating π-π co-crystal discovery
Published in | Chemical science (Cambridge) Vol. 12; no. 5; pp. 1702 - 1719 |
---|---|
Main Authors | , , , , , , , , , , |
Format | Journal Article |
Language | English |
Published | England: Royal Society of Chemistry, 08.12.2020 |
Subjects | |
Summary: | The implementation of machine learning models has brought major changes in the decision-making process for materials design. One matter of concern for the data-driven approaches is the lack of negative data from unsuccessful synthetic attempts, which might generate inherently imbalanced datasets. We propose the application of the one-class classification methodology as an effective tool for tackling these limitations on the materials design problems. This is a concept of learning based only on a well-defined class without counter examples. An extensive study on the different one-class classification algorithms is performed until the most appropriate workflow is identified for guiding the discovery of emerging materials belonging to a relatively small class, that being the weakly bound polyaromatic hydrocarbon co-crystals. The two-step approach presented in this study first trains the model using all the known molecular combinations that form this class of co-crystals extracted from the Cambridge Structural Database (1722 molecular combinations), followed by scoring possible yet unknown pairs from the ZINC15 database (21 736 possible molecular combinations). Focusing on the highest-ranking pairs predicted to have higher probability of forming co-crystals, materials discovery can be accelerated by reducing the vast molecular space and directing the synthetic efforts of chemists. Further on, using interpretability techniques a more detailed understanding of the molecular properties causing co-crystallization is sought after. The applicability of the current methodology is demonstrated with the discovery of two novel co-crystals, namely pyrene-6H-benzo[c]chromen-6-one (1) and pyrene-9,10-dicyanoanthracene (2).
Machine learning using one class classification on a database of existing co-crystals enables the identification of co-formers which are likely to form stable co-crystals, resulting in the synthesis of two co-crystals of polyaromatic hydrocarbons. |
---|---|
Bibliography: | Electronic supplementary information (ESI) available: X-ray crystallographic details and ML models description. Crystallographic data for pyrene-9,10-dicyanoanthracene and pyrene-6H-benzo[c]chromen-6-one (CIF). CCDC 2014576-2014577. For ESI and crystallographic data in CIF or other electronic format see DOI: 10.1039/d0sc04263c. Code availability: https://github.com/lrcfmd/cocrystal_design |
ISSN: | 2041-6520 2041-6539 |
DOI: | 10.1039/d0sc04263c |
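
The two-step workflow described in the summary (fit on known positive examples only, then score and rank unseen pairs) can be illustrated with a minimal sketch. This is not the authors' implementation: the k-nearest-neighbour distance scorer is a simple stand-in for the one-class algorithms the study compares, the two-number feature vectors are made-up placeholders for real molecular descriptors, and the names `knn_score`, `rank_candidates`, `known_pairs`, and `candidate_pairs` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def knn_score(candidate: Vector, known: Sequence[Vector], k: int = 3) -> float:
    """Score a candidate by its closeness to the known (positive-only) class.

    Returns the negated mean Euclidean distance to the k nearest known
    examples, so a higher score means "more like the known class".
    """
    distances = sorted(math.dist(candidate, ref) for ref in known)
    nearest = distances[:k]
    return -sum(nearest) / len(nearest)


def rank_candidates(
    known: Sequence[Vector], candidates: Dict[str, Vector], k: int = 3
) -> List[Tuple[str, float]]:
    """Step 2: score every unseen pair and sort from most to least promising."""
    scored = [(name, knn_score(vec, known, k)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Step 1: the "training set" holds only positive examples (known co-crystals).
# Toy descriptors, assumed for illustration; real inputs would be molecular features.
known_pairs = [(0.9, 1.0), (1.0, 1.1), (1.1, 0.9), (1.0, 1.0)]

# Unlabelled candidate pairs to be ranked (hypothetical values).
candidate_pairs = {
    "pair_A": (1.0, 1.05),
    "pair_B": (3.0, 3.0),
    "pair_C": (1.4, 1.3),
}

ranking = rank_candidates(known_pairs, candidate_pairs, k=3)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

No negative examples appear anywhere in the sketch: candidates are ranked purely by similarity to the known class, which is what allows synthetic effort to be directed at the top of the list.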