Automated extraction of chemical synthesis actions from experimental procedures

Bibliographic Details
Published in: Nature Communications, Vol. 11, no. 1, p. 3601
Main Authors: Vaucher, Alain C.; Zipoli, Federico; Geluykens, Joppe; Nair, Vishnu H.; Schwaller, Philippe; Laino, Teodoro
Format: Journal Article
Language: English
Published: London, Nature Publishing Group UK, 17.07.2020; Nature Publishing Group; Nature Portfolio
Summary: Experimental procedures for chemical synthesis are commonly reported in prose in patents or in the scientific literature. Extracting the details needed to reproduce and validate a synthesis in a chemical laboratory is often a tedious task requiring extensive human intervention. We present a method to convert unstructured experimental procedures written in English into structured synthetic steps (action sequences) reflecting all the operations needed to successfully conduct the corresponding chemical reactions. To achieve this, we design a set of synthesis actions with predefined properties and a deep-learning sequence-to-sequence model based on the transformer architecture to convert experimental procedures to action sequences. The model is pretrained on vast amounts of data generated automatically with a custom rule-based natural language processing approach and refined on manually annotated samples. Predictions on our test set result in a perfect (100%) match of the action sequence for 60.8% of sentences, a 90% match for 71.3% of sentences, and a 75% match for 82.4% of sentences. Extracting experimental operations for chemical synthesis from procedures reported in prose is a tedious task. Here the authors develop a deep-learning model based on the transformer architecture to translate experimental procedures from the field of organic chemistry into synthesis actions.
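
As an illustration of what an action sequence and a percentage match could look like, here is a minimal Python sketch. This record does not say how the match percentage is computed, so the sketch assumes a normalized edit-distance similarity over whole actions. The `Action` schema, the action names, and the helper functions are all hypothetical and are not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One synthesis step: an action type plus its properties (hypothetical schema)."""
    name: str
    properties: tuple = ()


def edit_distance(a, b):
    """Levenshtein distance between two sequences of comparable items."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            # Cheapest of: deletion, insertion, substitution (or match).
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def match_percent(predicted, reference):
    """Assumed metric: 100 * (1 - edit distance / length of the longer sequence)."""
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - edit_distance(predicted, reference) / longest)


def fraction_at_least(scores, threshold):
    """Share of sentences whose match percentage reaches the threshold."""
    return sum(score >= threshold for score in scores) / len(scores)


# Illustrative sentence: "Ethanol was added, the mixture stirred for 2 h and filtered."
reference = [Action("Add", ("ethanol",)), Action("Stir", ("2 h",)), Action("Filter")]
predicted = [Action("Add", ("ethanol",)), Action("Stir", ("1 h",)), Action("Filter")]
score = match_percent(predicted, reference)
```

Under this assumed metric, one wrong property in a three-action sentence counts as one substitution, so the sentence would fall below both the 90% and the 75% thresholds. Scoring on action names alone would be a more lenient alternative.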
ISSN: 2041-1723
DOI: 10.1038/s41467-020-17266-6