SCOPEC: a database of protein catalytic domains
Published in | Bioinformatics Vol. 20; no. suppl-1; pp. i130 - i136 |
---|---|
Main Authors | George, Richard A.; Spriggs, Ruth V.; Thornton, Janet M.; Al-Lazikani, Bissan; Swindells, Mark B. |
Format | Journal Article |
Language | English |
Published | England: Oxford University Press; Oxford Publishing Limited (England), 4 August 2004 |
Abstract | Motivation: Domains are the units of protein structure, function and evolution. It is therefore essential to use knowledge of domains when studying the evolution of function, or when assigning function to genome sequence data. For this purpose, we have developed a database of catalytic domains, SCOPEC, by combining structural domain information from SCOP, full-length sequence information from Swiss-Prot, and verified functional information from the Enzyme Classification (EC) database. Two major problems need to be overcome to create a database of domain–function relationships: (1) for sequences, EC numbers are typically assigned to whole sequences rather than to the functional unit, and (2) Protein Data Bank (PDB) structures elucidated from a larger multi-domain protein will often carry an EC annotation although the relevant catalytic domain may lie elsewhere. Results: SCOPEC entries have high-quality enzyme assignments, having passed both computational and manual checks. SCOPEC currently contains entries for 75% of all EC annotations in the PDB. Overall, EC number is fairly well conserved within a superfamily, even when the proteins are distantly related. Initial analysis is encouraging, suggesting that there is a 50:50 chance of conserved function in distant homologues first detected by a third-iteration PSI-BLAST search. Therefore, we envisage that a knowledge-based approach to function assignment using the domain–EC relationships in SCOPEC will achieve a marked improvement over this baseline. Availability: The SCOPEC database is a valuable resource in the analysis and prediction of protein structure and function. It can be obtained or queried at our website http://www.enzome.com |
Author | George, Richard A.; Spriggs, Ruth V.; Thornton, Janet M.; Al-Lazikani, Bissan; Swindells, Mark B. |
Author_xml | 1. George, Richard A. (Inpharmatica Ltd, 60 Charlotte Street, London W1T 2NU, UK); 2. Spriggs, Ruth V. (Inpharmatica Ltd, 60 Charlotte Street, London W1T 2NU, UK); 3. Thornton, Janet M. (European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK); 4. Al-Lazikani, Bissan (Inpharmatica Ltd, 60 Charlotte Street, London W1T 2NU, UK); 5. Swindells, Mark B. (Inpharmatica Ltd, 60 Charlotte Street, London W1T 2NU, UK) |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/15262791 (view this record in MEDLINE/PubMed) |
CODEN | BOINFP |
ContentType | Journal Article |
Copyright | Copyright Oxford Publishing Limited (England), Aug 4 2004 |
DOI | 10.1093/bioinformatics/bth948 |
Discipline | Biology |
EISSN | 1460-2059; 1367-4811 |
EndPage | i136 |
Genre | Research Support, Non-U.S. Gov't; Journal Article |
ISSN | 1367-4803 |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | suppl-1 |
Language | English |
Notes | Contact: richardg@inpharmatica.co.uk |
OpenAccessLink | https://academic.oup.com/bioinformatics/article-pdf/20/suppl_1/i130/580068/bth948.pdf |
PMID | 15262791 |
PublicationDate | 2004-08-04 |
PublicationPlace | Oxford, England |
PublicationTitle | Bioinformatics |
PublicationYear | 2004 |
Publisher | Oxford University Press; Oxford Publishing Limited (England) |
StartPage | i130 |
SubjectTerms | Catalysis; Computer Simulation; Database Management Systems; Databases, Protein; Information Storage and Retrieval - methods; Models, Chemical; Protein Structure, Tertiary; Proteins - chemistry; Sequence Alignment - methods; Sequence Analysis, Protein - methods |
Title | SCOPEC: a database of protein catalytic domains |
URI | https://api.istex.fr/ark:/67375/HXZ-P4P6N55L-T/fulltext.pdf https://www.ncbi.nlm.nih.gov/pubmed/15262791 https://www.proquest.com/docview/198665210/abstract/ https://search.proquest.com/docview/17586533 https://search.proquest.com/docview/66721319 |
Volume | 20 |