AsEP: Benchmarking Deep Learning Methods for Antibody-specific Epitope Prediction
Format | Journal Article |
---|---|
Language | English |
Published | 25.07.2024 |
Summary: | Epitope identification is vital for antibody design yet challenging due to
the inherent variability in antibodies. While many deep learning methods have
been developed for general protein binding site prediction tasks, whether they
work for epitope prediction remains an understudied research question. The
challenge is also heightened by the lack of a consistent evaluation pipeline
with sufficient dataset size and epitope diversity. We introduce a filtered
antibody-antigen complex structure dataset, AsEP (Antibody-specific Epitope
Prediction). AsEP is the largest of its kind and provides clustered epitope
groups, allowing the community to develop and test novel epitope prediction
methods. AsEP comes with an easy-to-use interface in Python and pre-built graph
representations of each antibody-antigen complex while also supporting
customizable embedding methods. Based on this new dataset, we benchmarked
various representative general protein-binding site prediction methods and found
that their performance on epitope prediction falls short of expectations. We
thus propose a new method, WALLE, that leverages both protein language models
and graph neural networks. WALLE demonstrates about a 5X performance gain over
existing methods. Our empirical findings show that
epitope prediction benefits from combining sequential embeddings provided by
language models and geometrical information from graph representations,
providing a guideline for future method design. In addition, we reformulate the
task as bipartite link prediction, allowing easy model performance attribution
and interpretability. We open-source our data and code at
https://github.com/biochunan/AsEP-dataset. |
---|---|
DOI: | 10.48550/arxiv.2407.18184 |
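The abstract reformulates epitope prediction as bipartite link prediction between antibody and antigen residues. The following is a minimal, hypothetical sketch of that idea, not the WALLE implementation: it assumes each residue already has an embedding vector, scores every (antibody residue, antigen residue) pair with a sigmoid of the dot product, keeps pairs above a threshold as predicted links, and labels an antigen residue as epitope if it has at least one predicted link. The function names, the dot-product scorer and the 0.5 threshold are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Set, Tuple

Vector = Sequence[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def link_probabilities(ab_emb: List[Vector], ag_emb: List[Vector]) -> List[List[float]]:
    """Hypothetical scorer: probs[i][j] = sigmoid(ab_emb[i] . ag_emb[j])."""
    return [[sigmoid(sum(a * g for a, g in zip(ab, ag))) for ag in ag_emb] for ab in ab_emb]


def predicted_links(probs: List[List[float]], threshold: float = 0.5) -> Set[Tuple[int, int]]:
    """Keep (antibody index, antigen index) pairs whose probability reaches the threshold."""
    return {(i, j) for i, row in enumerate(probs) for j, p in enumerate(row) if p >= threshold}


def epitope_from_links(links: Set[Tuple[int, int]], n_antigen: int) -> List[int]:
    """An antigen residue is labelled 1 (epitope) if any antibody residue links to it."""
    hit = {j for _, j in links}
    return [1 if j in hit else 0 for j in range(n_antigen)]


if __name__ == "__main__":
    ab_emb = [[1.0, 0.0], [0.0, 1.0]]                   # 2 antibody residues (toy embeddings)
    ag_emb = [[2.0, -1.0], [-2.0, -2.0], [-1.0, 3.0]]   # 3 antigen residues (toy embeddings)
    links = predicted_links(link_probabilities(ab_emb, ag_emb))
    print(sorted(links))                                # [(0, 0), (1, 2)]
    print(epitope_from_links(links, len(ag_emb)))       # [1, 0, 1]
```

Because the node-level epitope labels are derived from explicit residue-residue links, each positive prediction can be traced back to the antibody residue(s) responsible for it, which is the attribution and interpretability benefit the abstract points to.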
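The abstract's central finding is that combining sequential embeddings from protein language models with geometric information from graph representations helps epitope prediction. As a hedged illustration of how those two sources can be mixed, and not the WALLE architecture, the sketch below runs one round of unweighted mean-aggregation message passing (GraphSAGE-style, with no learned weights) over a residue graph. The per-residue input vectors stand in for language-model embeddings, and the edge list stands in for structural contacts; both, and the function name, are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def message_pass(node_feats: List[List[float]], edges: List[Tuple[int, int]]) -> List[List[float]]:
    """One round of mean aggregation on an undirected residue graph.

    Each residue's output is its own feature vector (e.g. a language-model
    embedding) concatenated with the mean of its neighbours' vectors, so the
    result mixes sequence-derived features with graph (structural) context.
    Residues without neighbours get a zero vector for the aggregated half.
    """
    n = len(node_feats)
    dim = len(node_feats[0])
    neighbours: Dict[int, List[int]] = {i: [] for i in range(n)}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    out: List[List[float]] = []
    for i in range(n):
        nbrs = neighbours[i]
        if nbrs:
            mean = [sum(node_feats[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        else:
            mean = [0.0] * dim
        out.append(list(node_feats[i]) + mean)
    return out


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # toy stand-ins for per-residue PLM embeddings
    edges = [(0, 1), (1, 2)]                      # toy stand-ins for residue contacts
    for row in message_pass(feats, edges):
        print(row)
    # [1.0, 0.0, 0.0, 1.0]
    # [0.0, 1.0, 1.5, 1.0]
    # [2.0, 2.0, 0.0, 1.0]
```

In a trained model the aggregation would typically be followed by learned linear layers and non-linearities and stacked over several rounds; the fixed mean here only shows how graph structure enriches sequence-derived features.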