Addressing Statistical Biases in Nucleotide-Derived Protein Databases for Proteogenomic Search Strategies
Published in | Journal of proteome research Vol. 11; no. 11; pp. 5221 - 5234 |
---|---|
Main Authors | Blakeley, Paul; Overton, Ian M; Hubbard, Simon J |
Format | Journal Article |
Language | English |
Published | United States: American Chemical Society, 02.11.2012 |
Abstract | Proteogenomics has the potential to advance genome annotation through high-quality peptide identifications derived from mass spectrometry experiments, which demonstrate that a given gene or isoform is expressed and translated at the protein level. This can advance our understanding of genome function, discovering novel genes and gene structure that have not yet been identified or validated. Because of the high-throughput shotgun nature of most proteomics experiments, it is essential to carefully control for false positives and prevent any potential misannotation. A number of statistical procedures to deal with this are in wide use in proteomics, calculating false discovery rate (FDR) and posterior error probability (PEP) values for groups and individual peptide spectrum matches (PSMs). These methods control for multiple testing and exploit decoy databases to estimate statistical significance. Here, we show that database choice has a major effect on these confidence estimates, leading to significant differences in the number of PSMs reported. We note that standard target:decoy approaches using six-frame translations of nucleotide sequences, such as assembled transcriptome data, apparently underestimate the confidence assigned to the PSMs. This error stems from the inflated and unusual nature of the six-frame database, where for every target sequence there exist five “incorrect” targets that are unlikely to code for protein. The attendant FDR and PEP estimates lead to fewer accepted PSMs at fixed thresholds, and we show that this effect is a product of the database and statistical modeling and not the search engine. A variety of approaches to limit database size and remove noncoding target sequences are examined and discussed in terms of the altered statistical estimates generated and PSMs reported. These results are of importance to groups carrying out proteogenomics, aiming to maximize the validation and discovery of gene structure in sequenced genomes, while still controlling for false positives. |
---|---|
Author | Overton, Ian M; Blakeley, Paul; Hubbard, Simon J |
AuthorAffiliation | The University of Manchester; University of Edinburgh |
AuthorAffiliation_xml | – name: The University of Manchester – name: University of Edinburgh |
Author_xml | – sequence: 1 givenname: Paul surname: Blakeley fullname: Blakeley, Paul – sequence: 2 givenname: Ian M surname: Overton fullname: Overton, Ian M – sequence: 3 givenname: Simon J surname: Hubbard fullname: Hubbard, Simon J email: simon.hubbard@manchester.ac.uk |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/23025403$$D View this record in MEDLINE/PubMed |
ContentType | Journal Article |
Copyright | Copyright © 2012 American Chemical Society |
DOI | 10.1021/pr300411q |
Discipline | Chemistry |
EISSN | 1535-3907 |
EndPage | 5234 |
ExternalDocumentID | 10_1021_pr300411q 23025403 a633174409 |
Genre | Research Support, Non-U.S. Gov't Journal Article |
GrantInformation_xml | – fundername: Biotechnology and Biological Sciences Research Council grantid: BB/I000631/1 – fundername: Medical Research Council |
ISSN | 1535-3893 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 11 |
Keywords | peptide spectrum match; posterior error probability; expressed sequence tag; proteogenomics; false discovery rate |
Language | English |
License | http://pubs.acs.org/page/policy/authorchoice_termsofuse.html |
OpenAccessLink | https://proxy.k.utb.cz/login?url=http://dx.doi.org/10.1021/pr300411q |
PMID | 23025403 |
PageCount | 14 |
PublicationCentury | 2000 |
PublicationDate | 2012-11-02 |
PublicationDateYYYYMMDD | 2012-11-02 |
PublicationDate_xml | – month: 11 year: 2012 text: 2012-11-02 day: 02 |
PublicationDecade | 2010 |
PublicationPlace | United States |
PublicationPlace_xml | – name: United States |
PublicationTitle | Journal of proteome research |
PublicationTitleAlternate | J. Proteome Res |
PublicationYear | 2012 |
Publisher | American Chemical Society |
Publisher_xml | – name: American Chemical Society |
StartPage | 5221 |
SubjectTerms | Base Sequence Databases, Protein Expressed Sequence Tags Genomics Mass Spectrometry Nucleotides - chemistry Probability Proteomics |
Title | Addressing Statistical Biases in Nucleotide-Derived Protein Databases for Proteogenomic Search Strategies |
URI | http://dx.doi.org/10.1021/pr300411q https://www.ncbi.nlm.nih.gov/pubmed/23025403 https://search.proquest.com/docview/1126635675 https://pubmed.ncbi.nlm.nih.gov/PMC3703792 |
Volume | 11 |
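
The abstract refers to target:decoy approaches that estimate FDR for PSMs at a score threshold. The following is a minimal sketch of one common form of that estimate (decoy count divided by target count above the threshold, converted to q-values), and not necessarily the exact procedure used in the article. The function names, the `(score, is_decoy)` tuple layout, and the toy scores are all assumptions made for illustration; the sketch also assumes that a higher score is better, that the decoy database is the same size as the target database, and that tied scores need no special handling.

```python
from typing import List, Tuple

# Hypothetical PSM representation: (search-engine score, matched a decoy sequence?)
PSM = Tuple[float, bool]


def target_decoy_qvalues(psms: List[PSM]) -> List[Tuple[float, bool, float]]:
    """Return PSMs sorted by descending score, each with an estimated q-value.

    The FDR at each rank is estimated as (#decoys so far) / (#targets so far).
    The q-value of a PSM is the smallest FDR at its own rank or any lower rank,
    capped at 1.0.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Forward pass: running FDR estimate at every rank.
    fdrs = []
    targets = 0
    decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / targets if targets else 1.0)

    # Backward pass: enforce monotonic q-values (minimum FDR from here downward).
    qvalues = []
    running_min = 1.0
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


def accepted_targets(psms: List[PSM], threshold: float = 0.01) -> int:
    """Count target PSMs whose q-value is at or below the threshold."""
    return sum(
        1
        for _score, is_decoy, q in target_decoy_qvalues(psms)
        if not is_decoy and q <= threshold
    )


# Toy data only: four target hits and two decoy hits.
example_psms = [(10.0, False), (9.0, False), (8.0, False),
                (7.0, True), (6.0, False), (5.0, True)]
print(accepted_targets(example_psms, threshold=0.01))
```

Under this kind of estimator, anything that raises the number of decoy hits scoring above a given threshold lowers the number of target PSMs accepted at a fixed FDR, which is the sensitivity to database choice that the abstract describes.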