A Transformer Model for Retrosynthesis

Bibliographic Details
Published in: Artificial Neural Networks and Machine Learning – ICANN 2019: Workshop and Special Sessions, pp. 817–830
Main Authors: Karpov, Pavel; Godin, Guillaume; Tetko, Igor V.
Format: Book Chapter
Language: English
Published: Cham: Springer International Publishing, 2019
Series: Lecture Notes in Computer Science

Summary: We describe a Transformer model for a retrosynthetic reaction prediction task. The model is trained on 45 033 experimental reaction examples extracted from US patents and successfully predicts the reactant set for 42.7% of cases on the external test set. During training we applied different learning-rate schedules and snapshot learning. These techniques prevent overfitting and thus make it possible to dispense with an internal validation dataset, which is advantageous for deep models with millions of parameters. We thoroughly investigated different approaches to training Transformer models and found that snapshot learning, with the weights saved at the learning-rate minima averaged into the final model, works best. When decoding the model's output probabilities, the temperature has a strong influence: setting T = 1.3 improves model accuracy by 1–2%.
ISBN: 3030304922; 9783030304928
ISSN: 0302-9743; 1611-3349
DOI: 10.1007/978-3-030-30493-5_78