Addressing Statistical Biases in Nucleotide-Derived Protein Databases for Proteogenomic Search Strategies

Bibliographic Details
Published in Journal of proteome research Vol. 11; no. 11; pp. 5221-5234
Main Authors Blakeley, Paul; Overton, Ian M; Hubbard, Simon J
Format Journal Article
Language English
Published United States, American Chemical Society, 02.11.2012
Abstract Proteogenomics has the potential to advance genome annotation through high-quality peptide identifications derived from mass spectrometry experiments, which demonstrate that a given gene or isoform is expressed and translated at the protein level. This can advance our understanding of genome function by discovering novel genes and gene structures that have not yet been identified or validated. Because of the high-throughput shotgun nature of most proteomics experiments, it is essential to control carefully for false positives and prevent any potential misannotation. A number of statistical procedures to deal with this are in wide use in proteomics, calculating false discovery rate (FDR) and posterior error probability (PEP) values for groups and individual peptide spectrum matches (PSMs). These methods control for multiple testing and exploit decoy databases to estimate statistical significance. Here, we show that database choice has a major effect on these confidence estimates, leading to significant differences in the number of PSMs reported. We note that standard target:decoy approaches using six-frame translations of nucleotide sequences, such as assembled transcriptome data, apparently underestimate the confidence assigned to the PSMs. This error stems from the inflated and unusual nature of the six-frame database, where for every target sequence there exist five “incorrect” targets that are unlikely to code for protein. The attendant FDR and PEP estimates lead to fewer accepted PSMs at fixed thresholds, and we show that this effect is a product of the database and statistical modeling and not the search engine. A variety of approaches to limit database size and remove noncoding target sequences are examined and discussed in terms of the altered statistical estimates generated and PSMs reported.
These results are of importance to groups carrying out proteogenomics, aiming to maximize the validation and discovery of gene structure in sequenced genomes, while still controlling for false positives.
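The target:decoy estimate mentioned in the abstract can be illustrated with a minimal sketch. It assumes a concatenated target:decoy search in which each PSM carries a score (higher is better) and a decoy flag, and it uses the simple decoys/targets ratio as the FDR estimate. The function names `target_decoy_qvalues` and `accepted_targets` and the toy scores are hypothetical and are not taken from the article or from any search engine.

```python
from typing import List, Tuple

PSM = Tuple[float, bool]  # (score, is_decoy); higher score means a better match


def target_decoy_qvalues(psms: List[PSM]) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) tuples sorted by descending score.

    FDR at each rank is estimated as (#decoys so far) / (#targets so far),
    which is one common convention for concatenated target:decoy searches.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs: List[float] = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, decoys / max(targets, 1)))
    # q-value: the smallest FDR at this rank or any more permissive rank,
    # which makes the values monotone along the ranked list.
    qvals = fdrs[:]
    for i in range(len(qvals) - 2, -1, -1):
        qvals[i] = min(qvals[i], qvals[i + 1])
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvals)]


def accepted_targets(psms: List[PSM], threshold: float = 0.01) -> int:
    """Count target PSMs whose q-value is at or below the threshold."""
    return sum(
        1
        for _, is_decoy, q in target_decoy_qvalues(psms)
        if not is_decoy and q <= threshold
    )


if __name__ == "__main__":
    # Toy scores invented for illustration only.
    toy = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False)]
    print(accepted_targets(toy, threshold=0.01))
    print(accepted_targets(toy, threshold=0.30))
```

In this sketch, any additional decoy hits that outrank genuine target PSMs raise the estimated FDR at those ranks, so fewer targets pass a fixed threshold. That is the general mechanism by which database composition can change the number of accepted PSMs; the article's own statistical modeling is more detailed than this ratio.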
AuthorAffiliation The University of Manchester
University of Edinburgh
Author_xml – sequence: 1
  givenname: Paul
  surname: Blakeley
  fullname: Blakeley, Paul
– sequence: 2
  givenname: Ian M
  surname: Overton
  fullname: Overton, Ian M
– sequence: 3
  givenname: Simon J
  surname: Hubbard
  fullname: Hubbard, Simon J
  email: simon.hubbard@manchester.ac.uk
BackLink https://www.ncbi.nlm.nih.gov/pubmed/23025403 (View this record in MEDLINE/PubMed)
ContentType Journal Article
Copyright Copyright © 2012 American Chemical Society
DOI 10.1021/pr300411q
Discipline Chemistry
EISSN 1535-3907
EndPage 5234
ExternalDocumentID 10_1021_pr300411q
23025403
a633174409
Genre Research Support, Non-U.S. Gov't
Journal Article
GrantInformation_xml – fundername: Biotechnology and Biological Sciences Research Council
  grantid: BB/I000631/1
– fundername: Medical Research Council
ISSN 1535-3893
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 11
Keywords peptide spectrum match
posterior error probability
expressed sequence tag
proteogenomics
false discovery rate
Language English
License http://pubs.acs.org/page/policy/authorchoice_termsofuse.html
OpenAccessLink https://proxy.k.utb.cz/login?url=http://dx.doi.org/10.1021/pr300411q
PMID 23025403
PQID 1126635675
PQPubID 23479
PageCount 14
PublicationCentury 2000
PublicationDate 2012-11-02
PublicationDateYYYYMMDD 2012-11-02
PublicationDate_xml – month: 11
  year: 2012
  text: 2012-11-02
  day: 02
PublicationDecade 2010
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle Journal of proteome research
PublicationTitleAlternate J. Proteome Res
PublicationYear 2012
Publisher American Chemical Society
Publisher_xml – name: American Chemical Society
Snippet Proteogenomics has the potential to advance genome annotation through high quality peptide identifications derived from mass spectrometry experiments, which...
StartPage 5221
SubjectTerms Base Sequence
Databases, Protein
Expressed Sequence Tags
Genomics
Mass Spectrometry
Nucleotides - chemistry
Probability
Proteomics
Title Addressing Statistical Biases in Nucleotide-Derived Protein Databases for Proteogenomic Search Strategies
URI http://dx.doi.org/10.1021/pr300411q
https://www.ncbi.nlm.nih.gov/pubmed/23025403
https://search.proquest.com/docview/1126635675
https://pubmed.ncbi.nlm.nih.gov/PMC3703792
Volume 11