Faster and more accurate graphical model identification of tandem mass spectra using trellises

Bibliographic Details
Published in Bioinformatics Vol. 32; no. 12; pp. i322-i331
Main Authors Wang, Shengjie; Halloran, John T; Bilmes, Jeff A; Noble, William S
Format Journal Article
Language English
Published England Oxford University Press 15.06.2016
Summary: Tandem mass spectrometry (MS/MS) is the dominant high-throughput technology for identifying and quantifying proteins in complex biological samples. Analysis of the tens of thousands of fragmentation spectra produced by an MS/MS experiment begins by assigning to each observed spectrum the peptide that is hypothesized to be responsible for generating the spectrum. This assignment is typically done by searching each spectrum against a database of peptides. To our knowledge, all existing MS/MS search engines compute scores individually between a given observed spectrum and each possible candidate peptide from the database. In this work, we use a trellis, a data structure capable of jointly representing a large set of candidate peptides, to avoid redundantly recomputing common sub-computations among different candidates. We show how trellises may be used to significantly speed up existing scoring algorithms, and we theoretically quantify the expected speedup afforded by trellises. Furthermore, we demonstrate that compact trellis representations of whole sets of peptides enable efficient discriminative learning of a dynamic Bayesian network for spectrum identification, leading to greatly improved spectrum identification accuracy.
Contact: bilmes@uw.edu or william-noble@uw.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
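The idea of sharing sub-computations among candidate peptides can be illustrated with a much simpler structure than the one the article describes. The sketch below is an assumption-laden toy, not the authors' trellis or scoring function: it uses a plain prefix trie, and the only "computation" shared is the running prefix mass. The names `build_prefix_trie` and `naive_additions` and the four example peptides are hypothetical.

```python
# Illustrative only: a prefix trie, NOT the paper's trellis or its scoring model.
# Monoisotopic residue masses (Da) for a small subset of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "L": 113.08406, "K": 128.09496,
}


def build_prefix_trie(peptides):
    """Build a nested-dict trie in which each node caches its prefix mass.

    Returns the root node and the number of mass additions performed.
    A prefix shared by several peptides is computed only once.
    """
    root = {"mass": 0.0, "children": {}}
    additions = 0
    for peptide in peptides:
        node = root
        for residue in peptide:
            child = node["children"].get(residue)
            if child is None:
                # First time this prefix is seen: one addition, then cached.
                child = {"mass": node["mass"] + RESIDUE_MASS[residue],
                         "children": {}}
                node["children"][residue] = child
                additions += 1
            node = child
    return root, additions


def naive_additions(peptides):
    """Additions needed when every peptide is processed independently."""
    return sum(len(peptide) for peptide in peptides)


# Hypothetical candidate set with overlapping prefixes.
peptides = ["GASP", "GASK", "GAVL", "PAVL"]
root, shared = build_prefix_trie(peptides)
naive = naive_additions(peptides)
print(f"independent: {naive} additions, shared prefixes: {shared} additions")
```

A trie merges only common prefixes; a trellis, as described in the summary, jointly represents a whole candidate set, so the savings reported in the article need not match this toy count.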
ISSN: 1367-4803, 1367-4811, 1460-2059
DOI: 10.1093/bioinformatics/btw269