Structural pre-training improves physical accuracy of antibody structure prediction using deep learning


Bibliographic Details
Published in	ImmunoInformatics Vol. 11; p. 100028
Main Authors	Kończak, Jarosław, Janusz, Bartosz, Młokosiewicz, Jakub, Satława, Tadeusz, Wróbel, Sonia, Dudzic, Paweł, Krawczyk, Konrad
Format	Journal Article
Language	English
Published	Elsevier B.V. 01.09.2023

More Information
Summary:	The protein folding problem has recently found a practical solution, owing to advances in deep learning. However, some classes of proteins, such as antibodies, are structurally unique, and the general solution still falls short for them. In particular, predicting the CDR-H3 loop, which is instrumental to an antibody's ability to recognize antigens, remains a challenge. Antibody-specific deep learning frameworks have been proposed to tackle this problem, with considerable progress in both accuracy and speed. Often, however, the original networks produce physically implausible bond geometries that must then undergo a time-consuming energy minimization. Here we hypothesized that pre-training the network on a large, augmented set of models with correct physical geometries, rather than on a small set of real antibody X-ray structures, would allow the network to learn better bond geometries. We show that fine-tuning such a pre-trained network for shape prediction on real X-ray structures increases the number of correct peptide bond distances, approximated by the distances between consecutive Cα atoms. We further demonstrate that pre-training allows the network to produce physically plausible shapes on an artificial set of CDR-H3s, showing an ability to generalize across the vast antibody sequence space. We hope that our strategy will benefit the development of deep learning antibody models that rapidly generate physically plausible geometries without the burden of time-consuming energy minimization.
ISSN:	2667-1190
DOI:	10.1016/j.immuno.2023.100028
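The summary assesses peptide bond geometry through distances between consecutive Cα atoms. The following is a minimal Python sketch of such a check, not the authors' evaluation code. It assumes the commonly cited ideal spacing of about 3.8 Å between consecutive Cα atoms in trans peptides and a hypothetical tolerance of ±0.1 Å; the exact criterion used in the paper is not stated in this record. The function name `count_correct_ca_distances` is illustrative only.

```python
import math

# Assumed ideal distance (angstroms) between consecutive C-alpha atoms
# joined by a trans peptide bond; roughly 3.8 A.
IDEAL_CA_CA = 3.8
# Hypothetical tolerance; not taken from the paper.
TOLERANCE = 0.1


def count_correct_ca_distances(ca_coords, ideal=IDEAL_CA_CA, tol=TOLERANCE):
    """Count consecutive C-alpha pairs whose distance is within tol of ideal.

    ca_coords: sequence of (x, y, z) tuples ordered along the chain.
    Returns a tuple (number_of_correct_pairs, total_number_of_pairs).
    """
    correct = 0
    pairs = 0
    # Pair each C-alpha with the next one along the chain.
    for a, b in zip(ca_coords, ca_coords[1:]):
        d = math.dist(a, b)  # Euclidean distance (Python 3.8+)
        pairs += 1
        if abs(d - ideal) <= tol:
            correct += 1
    return correct, pairs


if __name__ == "__main__":
    # Toy trace: two pairs at 3.8 A, one stretched pair at 5.0 A.
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (3.8, 3.8, 5.0)]
    print(count_correct_ca_distances(coords))
```

On the toy trace, two of the three consecutive pairs fall within the assumed tolerance, while the 5.0 Å pair is flagged as implausible. Counting against a fixed ideal keeps the check independent of any structure file format; in practice the coordinates would come from a predicted model.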