Convergent selection in antibody repertoires is revealed by deep learning

Bibliographic Details
Published in bioRxiv
Main Authors Friedensohn, Simon, Neumeier, Daniel, Khan, Tarik A, Csepregi, Lucia, Parola, Cristina, Gorter De Vries, Arthur R, Erlach, Lena, Mason, Derek M, Reddy, Sai T
Format Paper
Language English
Published Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 26.02.2020
Summary: Adaptive immunity is driven by the ability of lymphocytes to undergo V(D)J recombination and generate a highly diverse set of immune receptors (B cell receptors/secreted antibodies and T cell receptors), and by the subsequent clonal selection and expansion of these receptors upon molecular recognition of foreign antigens. These principles lead to remarkable, unique and dynamic immune receptor repertoires. Deep sequencing provides increasing evidence for the presence of commonly shared (convergent) receptors across individual organisms within one species. Convergent selection of specific receptors towards various antigens offers one explanation for these findings. For example, isolated cases of convergence have been reported in antibody repertoires following viral infection or allergy. Recent studies demonstrate that convergent selection of sequence motifs within T cell receptor (TCR) repertoires can be identified on an even wider scale. Here we report that there is extensive convergent selection in antibody repertoires of mice for a range of protein antigens and immunization conditions. We employed a deep learning approach that uses variational autoencoders (VAEs) to model the underlying process of B cell receptor (BCR) recombination, under the assumption that data generation follows a Gaussian mixture model (GMM) in latent space. This provides both a latent embedding and cluster labels that group similar sequences, thus enabling the discovery of a multitude of convergent, antigen-associated sequence patterns. Using a linear, one-versus-all support vector machine (SVM), we confirm that the identified sequence patterns are predictive of antigenic exposure and outperform predictions based on the occurrence of public clones. Recombinant expression of both natural and in silico-generated antibodies possessing convergent patterns confirms their binding specificity to target antigens. Our work highlights the extent to which convergence in antibody repertoires can occur and shows how deep learning can be applied for immunodiagnostics and antibody discovery and engineering.
DOI: 10.1101/2020.02.25.965673
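
The classification step described in the summary (cluster labels used as features for a linear, one-versus-all SVM that predicts antigenic exposure) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy repertoires, the antigen names, the choice of cluster frequencies as features, the hyperparameters, and the plain subgradient-descent training loop, which stands in for whatever SVM solver the authors used.

```python
from collections import Counter


def cluster_frequencies(cluster_labels, n_clusters):
    """Fraction of a repertoire's sequences assigned to each cluster."""
    counts = Counter(cluster_labels)
    total = len(cluster_labels)
    return [counts.get(k, 0) / total for k in range(n_clusters)]


def score(model, x):
    """Linear decision value w.x + b."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_binary_svm(X, y, lam=0.01, lr=0.5, epochs=200):
    """Linear SVM via full-batch subgradient descent on the
    L2-regularized hinge loss. Labels y must be +1 or -1.
    Hyperparameters are arbitrary illustrative choices."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * score((w, b), xi) < 1:  # sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def train_one_vs_all(X, labels):
    """One binary SVM per class: that class versus all others."""
    return {
        c: train_binary_svm(X, [1 if l == c else -1 for l in labels])
        for c in sorted(set(labels))
    }


def predict(models, x):
    """Pick the class whose SVM gives the largest decision value."""
    return max(models, key=lambda c: score(models[c], x))


# Hypothetical toy data: each repertoire is a list of cluster labels
# (as a clustering model might assign them). Cluster 3 is a shared
# background cluster that carries no antigen information.
N_CLUSTERS = 4
repertoires = [
    ([0, 0, 0, 3, 3, 1], "antigen_A"),
    ([0, 0, 3, 3, 0, 2], "antigen_A"),
    ([1, 1, 1, 3, 3, 0], "antigen_B"),
    ([1, 1, 3, 1, 3, 2], "antigen_B"),
    ([2, 2, 2, 3, 3, 0], "antigen_C"),
    ([2, 2, 3, 2, 3, 1], "antigen_C"),
]
X = [cluster_frequencies(r, N_CLUSTERS) for r, _ in repertoires]
labels = [label for _, label in repertoires]

models = train_one_vs_all(X, labels)
new_repertoire = cluster_frequencies([0, 0, 0, 0, 3, 1], N_CLUSTERS)
print(predict(models, new_repertoire))
```

Because the classifier is linear, the learned weight for each cluster can be read directly as an indication of how strongly that cluster is associated with a given antigen, which is one plausible reason to prefer a linear model for this kind of analysis.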