Development of a novel clustering tool for linear peptide sequences


Bibliographic Details
Published in Immunology Vol. 155, no. 3, pp. 331-345
Main Authors Dhanda, Sandeep K.; Vaughan, Kerrie; Schulten, Veronique; Grifoni, Alba; Weiskopf, Daniela; Sidney, John; Peters, Bjoern; Sette, Alessandro
Format Journal Article
Language English
Published England Wiley Subscription Services, Inc 01.11.2018
John Wiley and Sons Inc
Summary: Epitopes identified in large‐scale screens of overlapping peptides often share significant levels of sequence identity, complicating the analysis of epitope‐related data. Clustering algorithms are often used to facilitate these analyses, but available methods are generally insufficient in their capacity to define biologically meaningful epitope clusters in the context of the immune response. To fulfil this need, we developed an algorithm that generates epitope clusters based on representative or consensus sequences. This tool allows the user to cluster peptide sequences on the basis of a specified level of identity by selecting among three different method options. These include the ‘clique method’, in which all members of the cluster must share the same minimal level of identity with each other, and the ‘connected graph method’, in which all members of a cluster must share a defined level of identity with at least one other member of the cluster. In cases where it is not possible to define a clear consensus sequence with the connected graph method, a third option provides a novel ‘cluster‐breaking algorithm’ for consensus sequence driven sub‐clustering. Herein we demonstrate the tool's clustering performance and applicability using (i) a selection of dengue virus epitopes for the ‘clique method’, (ii) sets of allergen‐derived peptides from related species for the ‘connected graph method’ and (iii) large data sets of eluted ligand, major histocompatibility complex binding and T‐cell recognition data captured within the Immune Epitope Database (IEDB) with the newly developed ‘cluster‐breaking algorithm’. This novel clustering tool is accessible at http://tools.iedb.org/cluster2/. We report on a novel tool for clustering linear peptide sequences based on the identity matrix using a network‐based approach.
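
The difference between the ‘connected graph method’ and the ‘clique method’ described in the summary can be illustrated with a minimal Python sketch. This is not the IEDB tool's implementation. The `identity` function is a hypothetical toy measure (best ungapped match fraction over the shorter peptide, which must be non-empty), the function names are invented for illustration, and the clique variant is a simple greedy, order-dependent approximation that does not enumerate maximal cliques.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Toy identity (an assumption, not the tool's measure): slide the
    shorter peptide along the longer one without gaps and return the best
    fraction of matching positions, relative to the shorter length."""
    short, long_ = sorted((a, b), key=len)
    best = 0
    for offset in range(len(long_) - len(short) + 1):
        matches = sum(s == l for s, l in zip(short, long_[offset:]))
        best = max(best, matches)
    return best / len(short)


def connected_clusters(peptides, threshold):
    """Connected-graph style clustering: each member meets the threshold
    with at least one other member (connected components via union-find)."""
    parent = list(range(len(peptides)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(peptides)), 2):
        if identity(peptides[i], peptides[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, pep in enumerate(peptides):
        groups.setdefault(find(i), []).append(pep)
    return list(groups.values())


def greedy_clique_clusters(peptides, threshold):
    """Clique style clustering (greedy sketch): a peptide joins the first
    cluster in which it meets the threshold with every existing member."""
    clusters = []
    for pep in peptides:
        for cluster in clusters:
            if all(identity(pep, member) >= threshold for member in cluster):
                cluster.append(pep)
                break
        else:
            clusters.append([pep])
    return clusters


if __name__ == "__main__":
    # Made-up sequences chosen to form a chain: 1~2 and 2~3, but not 1~3.
    peptides = ["AAAAAAAAAA", "AAAAAAAABB", "AAAAAABBBB", "CCCCCCCCCC"]
    print(connected_clusters(peptides, 0.8))
    print(greedy_clique_clusters(peptides, 0.8))
```

With the made-up chain of peptides above, the connected-graph variant places the first three sequences in one cluster because each is linked to a neighbour, whereas the clique variant splits them because the first and third sequences fall below the threshold with each other.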
ISSN: 0019-2805, 1365-2567
DOI: 10.1111/imm.12984