Efficient indexing of peptides for database search using Tide
Published in | bioRxiv |
---|---|
Main Authors | Adoquaye Acquaye, Frank Lawrence Nii; Kertesz-Farkas, Attila; Noble, William Stafford |
Format | Paper |
Language | English |
Published | Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 1 October 2022 |
Edition | 1.1 |
ISSN | 2692-8205 |
DOI | 10.1101/2022.09.30.510396 |
Abstract | The first step in the analysis of protein tandem mass spectrometry data typically involves searching the observed spectra against a protein database. During database search, the search engine must digest the proteins in the database into peptides, subject to digestion rules that are under user control. The choice of these digestion parameters, as well as the selection of post-translational modifications (PTMs), can dramatically affect the size of the search space and hence the statistical power of the search. The Tide search engine separates the creation of the peptide index from the database search step, thereby saving time by allowing a peptide index to be reused in multiple searches. Here we describe an improved implementation of the indexing component of Tide that consumes roughly four times fewer resources (CPU and RAM) than the previous version and can generate arbitrarily large peptide databases, limited only by the amount of available disk space. We use this improved implementation to explore the relationship between database size and the parameters controlling digestion and PTMs, as well as between database size and statistical power. Our results can help guide practitioners in the proper selection of these important parameters. Competing Interest Statement: The authors have declared no competing interest. Footnotes: http://crux.ms |
Author | Adoquaye Acquaye, Frank Lawrence Nii; Kertesz-Farkas, Attila; Noble, William Stafford |
Author_xml | 1. Adoquaye Acquaye, Frank Lawrence Nii (Department of Data Analysis and Artificial Intelligence and Laboratory on AI for Computational Biology, Faculty of Computer Science, HSE University); 2. Kertesz-Farkas, Attila (Department of Data Analysis and Artificial Intelligence and Laboratory on AI for Computational Biology, Faculty of Computer Science, HSE University); 3. Noble, William Stafford (Paul G. Allen School of Computer Science and Engineering, University of Washington; ORCID 0000-0001-7283-4715; email william-noble@uw.edu) |
Copyright | 2022. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. 2022, Posted by Cold Spring Harbor Laboratory |
Discipline | Statistics; Biology |
Genre | Working Paper/Pre-Print |
License | This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at http://creativecommons.org/licenses/by/4.0 |
Notes | Competing Interest Statement: The authors have declared no competing interest. |
ORCID | 0000-0001-7283-4715 |
OpenAccessLink | https://www.biorxiv.org/content/10.1101/2022.09.30.510396 |
PageCount | 11 |
SubjectTerms | Bioinformatics; Digestion; Mass spectroscopy; Peptides; Post-translation; Search engines; Statistics |