InstructBioMol: Advancing Biomolecule Understanding and Design Following Human Instructions
Main Authors | , , , , , , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 10.10.2024 |
Subjects | |
Summary: | Understanding and designing biomolecules, such as proteins and small
molecules, is central to advancing drug discovery, synthetic biology, and
enzyme engineering. Recent breakthroughs in Artificial Intelligence (AI) have
revolutionized biomolecular research, achieving remarkable accuracy in
biomolecular prediction and design. However, a critical gap remains between
AI's computational power and researchers' intuition; closing it calls for natural
language that aligns molecular complexity with human intentions. Large Language Models (LLMs)
have shown potential to interpret human intentions, yet their application to
biomolecular research remains nascent due to challenges including specialized
knowledge requirements, multimodal data integration, and semantic alignment
between natural language and biomolecules. To address these limitations, we
present InstructBioMol, a novel LLM designed to bridge natural language and
biomolecules through a comprehensive any-to-any alignment of natural language,
molecules, and proteins. This model can integrate multimodal biomolecules as
input, and enable researchers to articulate design goals in natural language,
providing biomolecular outputs that meet precise biological needs. Experimental
results demonstrate InstructBioMol can understand and design biomolecules
following human instructions. Notably, it can generate drug molecules with a
10% improvement in binding affinity and design enzymes that achieve an ESP
Score of 70.4, making it the only method to surpass the enzyme-substrate
interaction threshold of 60.0 recommended by the ESP developer. This highlights
its potential to transform real-world biomolecular research. |
---|---|
DOI: | 10.48550/arxiv.2410.07919 |