Text-Augmented Multimodal LLMs for Chemical Reaction Condition Recommendation
Main Authors | , , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 21.07.2024 |
Summary: | High-throughput reaction condition (RC) screening is fundamental to chemical
synthesis. However, current RC screening suffers from laborious and costly
trial-and-error workflows. Traditional computer-aided synthesis planning (CASP)
tools fail to find suitable RCs due to data sparsity and inadequate reaction
representations. Large language models (LLMs) can now tackle
chemistry-related problems such as molecule design and chemical logic Q&A
tasks. However, LLMs have not yet achieved accurate predictions of chemical
reaction conditions. Here, we present MM-RCR, a text-augmented multimodal LLM
that learns a unified reaction representation from SMILES, reaction graphs, and
a textual corpus for chemical reaction condition recommendation (RCR). To train
MM-RCR, we construct an instruction dataset of 1.2 million Q&A pairs. Our experimental
results demonstrate that MM-RCR achieves state-of-the-art performance on two
open benchmark datasets and exhibits strong generalization capabilities on
out-of-domain (OOD) and high-throughput experimentation (HTE) datasets. MM-RCR
has the potential to accelerate high-throughput condition screening in chemical
synthesis. |
---|---|
DOI: | 10.48550/arxiv.2407.15141 |