Detecting Remote Evolutionary Relationships among Proteins by Large-Scale Semantic Embedding
Published in | PLoS computational biology Vol. 7; no. 1; p. e1001047 |
---|---|
Main Authors | Iain Melvin, Jason Weston, William Stafford Noble, Christina Leslie |
Format | Journal Article |
Language | English |
Published | United States: Public Library of Science (PLoS), 01.01.2011 |
Abstract | Virtually every molecular biologist has searched a protein or DNA sequence database to find sequences that are evolutionarily related to a given query. Pairwise sequence comparison methods (i.e., measures of similarity between query and target sequences) provide the engine for sequence database search and have been the subject of 30 years of computational research. For the difficult problem of detecting remote evolutionary relationships between protein sequences, the most successful pairwise comparison methods involve building local models (e.g., profile hidden Markov models) of protein sequences. However, recent work in massive data domains like web search and natural language processing demonstrates the advantage of exploiting the global structure of the data space. Motivated by this work, we present a large-scale algorithm called ProtEmbed, which learns an embedding of protein sequences into a low-dimensional "semantic space." Evolutionarily related proteins are embedded in close proximity, and additional pieces of evidence, such as 3D structural similarity or class labels, can be incorporated into the learning process. We find that ProtEmbed achieves superior accuracy to widely used pairwise sequence methods like PSI-BLAST and HHSearch for remote homology detection; it also outperforms our previous RankProp algorithm, which incorporates global structure in the form of a protein similarity network. Finally, the ProtEmbed embedding space can be visualized, both at the global level and local to a given query, yielding intuition about the structure of protein sequence space. |
---|---|
AbstractList | Searching a protein or DNA sequence database to find sequences that are evolutionarily related to a query is one of the foundational problems in computational biology. These database searches rely on pairwise comparisons of sequence similarity between the query and targets, but despite years of method refinements, pairwise comparisons still often fail to detect more distantly related targets. In this study, we adapt recent work from natural language processing to exploit the global structure of the data space in this detection problem. In particular, we borrow the idea of a semantic embedding, where by training on a large text data set, one learns an embedding of words into a low-dimensional semantic space such that words embedded close to each other are likely to be semantically related. We present the ProtEmbed algorithm, which learns an embedding of protein sequences into a semantic space where evolutionarily-related proteins are embedded in close proximity. The flexible training algorithm allows additional pieces of evidence, such as 3D structural information, to be incorporated in the learning process and enables ProtEmbed to achieve state-of-the-art performance for the task of detecting targets that have remote evolutionary relationships to the query. |
Audience | Academic |
Author | Noble, William Stafford; Leslie, Christina; Melvin, Iain; Weston, Jason |
AuthorAffiliation | 1 NEC Laboratories America, Princeton, New Jersey, United States of America; 2 Google, New York, New York, United States of America; 3 Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America; 4 Computational Biology Program, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America; Stanford University, United States of America |
Author_xml | – sequence: 1 givenname: Iain surname: Melvin fullname: Melvin, Iain – sequence: 2 givenname: Jason surname: Weston fullname: Weston, Jason – sequence: 3 givenname: William Stafford surname: Noble fullname: Noble, William Stafford – sequence: 4 givenname: Christina surname: Leslie fullname: Leslie, Christina |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/21298082$$D View this record in MEDLINE/PubMed |
ContentType | Journal Article |
Copyright | COPYRIGHT 2011 Public Library of Science. 2011 Melvin et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited: Melvin I, Weston J, Noble WS, Leslie C (2011) Detecting Remote Evolutionary Relationships among Proteins by Large-Scale Semantic Embedding. PLoS Comput Biol 7(1): e1001047. doi:10.1371/journal.pcbi.1001047 |
DOI | 10.1371/journal.pcbi.1001047 |
Discipline | Biology |
DocumentTitleAlternate | Detecting Remote Evolutionary Relationships |
EISSN | 1553-7358 |
Genre | Journal Article Research Support, N.I.H., Extramural |
GeographicLocations | United States |
GrantInformation_xml | – fundername: NIGMS NIH HHS grantid: R01 GM074257 |
ISSN | 1553-7358 1553-734X |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 1 |
Keywords | Biological Evolution; Proteins; Algorithms; Sequence Analysis, DNA |
Language | English |
License | This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. Creative Commons Attribution License |
Notes | Conceived and designed the experiments: JW WSN CL. Performed the experiments: IM. Analyzed the data: IM JW. Wrote the paper: IM JW WSN CL. |
OpenAccessLink | http://journals.scholarsportal.info/openUrl.xqy?doi=10.1371/journal.pcbi.1001047 |
PMID | 21298082 |
PublicationDate | 2011-01-01 |
PublicationPlace | United States |
PublicationPlace_xml | – name: United States – name: San Francisco, USA |
PublicationTitle | PLoS computational biology |
PublicationTitleAlternate | PLoS Comput Biol |
PublicationYear | 2011 |
Publisher | Public Library of Science Public Library of Science (PLoS) |
StartPage | e1001047 |
SubjectTerms | Algorithms; Bioinformatics; Biological Evolution; Computational Biology/Protein Homology Detection; DNA; Markov processes; Methods; Neighborhoods; Physiological aspects; Proteins; Proteins - chemistry; Proteins - genetics; Semantics; Sequence Analysis, DNA; Studies |
Title | Detecting Remote Evolutionary Relationships among Proteins by Large-Scale Semantic Embedding |
URI | https://www.ncbi.nlm.nih.gov/pubmed/21298082 https://www.proquest.com/docview/1872839943 https://www.proquest.com/docview/850560910 https://pubmed.ncbi.nlm.nih.gov/PMC3029239 https://doaj.org/article/2a277e17bdbd4dd5add8156cd8a2073e http://dx.doi.org/10.1371/journal.pcbi.1001047 |
Volume | 7 |
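
The retrieval idea stated in the abstract (evolutionarily related proteins are embedded in close proximity, so candidates can be ranked by how near they lie to a query) can be illustrated with a minimal sketch. It is not the authors' implementation: the protein names and 2-D coordinates are made up for illustration, the helper `rank_by_distance` is hypothetical, and Euclidean distance is an assumed metric, since this record does not say how ProtEmbed learns embeddings or measures proximity.

```python
import math

def rank_by_distance(query, targets):
    """Return target names sorted from nearest to farthest from the query.

    `query` is a coordinate tuple; `targets` maps a name to a coordinate tuple.
    Euclidean distance is an assumption made for this illustration.
    """
    return sorted(targets, key=lambda name: math.dist(query, targets[name]))

# Hypothetical 2-D embeddings; a real embedding would be learned from data.
embeddings = {
    "protein_A": (0.9, 1.1),
    "protein_B": (5.0, 4.0),
    "protein_C": (1.5, 0.5),
}
query_embedding = (1.0, 1.0)

# Targets closest to the query come first in the ranking.
ranking = rank_by_distance(query_embedding, embeddings)
print(ranking)
```

In this toy setting the nearest targets play the role of putative homologs; the ranking is only as meaningful as the embedding, which is the part the paper's training procedure is responsible for.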