Attention-based generative models for de novo molecular design

Bibliographic Details
Published in: Chemical Science (Cambridge), Vol. 12, No. 24, pp. 8362–8372
Main Authors: Dollar, Orion; Joshi, Nisarg; Beck, David A. C.; Pfaendtner, Jim
Format: Journal Article
Language: English
Published: England: The Royal Society of Chemistry, 14.05.2021

Summary: Attention mechanisms have led to many breakthroughs in sequential data modeling but have yet to be incorporated into any generative algorithms for molecular design. Here we explore the impact of adding self-attention layers to generative β-VAE models and show that those with attention are able to learn a complex “molecular grammar” while improving performance on downstream tasks such as accurately sampling from the latent space (“model memory”) or exploring novel chemistries not present in the training data. There is a notable relationship between a model's architecture, the structure of its latent memory, and its performance during inference. We demonstrate that there is an unavoidable tradeoff between model exploration and validity that is a function of the complexity of the latent memory. However, novel sampling schemes may be used that optimize this tradeoff. We anticipate that attention will play an important role in future molecular design algorithms that can make efficient use of the detailed molecular substructures learned by the transformer.
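
The abstract refers to a β-VAE, a variational autoencoder whose objective weights the KL term by a factor β, roughly L = E[log p(x|z)] - β · KL(q(z|x) || p(z)). As a minimal sketch only (the paper's exact architecture, tokenization, and hyperparameters are not given in this record, and all names and default sizes below are hypothetical), a self-attention encoder over SMILES tokens and the corresponding β-VAE loss in PyTorch might look like this:

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnVAEEncoder(nn.Module):
    # Hypothetical encoder: embeds SMILES tokens, applies one self-attention
    # layer, and pools to the mean/log-variance of the latent distribution.
    def __init__(self, vocab_size=64, d_model=128, n_heads=4,
                 latent_dim=32, max_len=120):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.to_mu = nn.Linear(d_model, latent_dim)
        self.to_logvar = nn.Linear(d_model, latent_dim)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        x = self.embed(tokens) + self.pos[:, :tokens.size(1)]
        h, _ = self.attn(x, x, x)               # self-attention over tokens
        h = h.mean(dim=1)                       # pool to one vector per molecule
        return self.to_mu(h), self.to_logvar(h)

def beta_vae_loss(recon_logits, targets, mu, logvar, beta=1.0):
    # Reconstruction (token-level cross-entropy) plus the beta-weighted KL
    # divergence between q(z|x) = N(mu, sigma^2) and the prior N(0, I).
    recon = F.cross_entropy(recon_logits.transpose(1, 2), targets)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

Under this reading, sampling from the "model memory" amounts to decoding latent vectors drawn from the prior, and a larger β enforces a simpler, more regular latent structure, which is one way to interpret the exploration-versus-validity tradeoff the abstract describes.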
Funding: USDOE Office of Energy Efficiency and Renewable Energy (EERE), grant EE0008492
ISSN: 2041-6520, 2041-6539
DOI: 10.1039/D1SC01050F