Detecting Remote Evolutionary Relationships among Proteins by Large-Scale Semantic Embedding

Bibliographic Details
Published in PLoS computational biology Vol. 7; no. 1; p. e1001047
Main Authors Melvin, Iain, Weston, Jason, Noble, William Stafford, Leslie, Christina
Format Journal Article
Language English
Published United States Public Library of Science 01.01.2011
Public Library of Science (PLoS)

Abstract Virtually every molecular biologist has searched a protein or DNA sequence database to find sequences that are evolutionarily related to a given query. Pairwise sequence comparison methods (i.e., measures of similarity between query and target sequences) provide the engine for sequence database search and have been the subject of 30 years of computational research. For the difficult problem of detecting remote evolutionary relationships between protein sequences, the most successful pairwise comparison methods involve building local models (e.g., profile hidden Markov models) of protein sequences. However, recent work in massive data domains such as web search and natural language processing demonstrates the advantage of exploiting the global structure of the data space. Motivated by this work, we present a large-scale algorithm called ProtEmbed, which learns an embedding of protein sequences into a low-dimensional "semantic space." Evolutionarily related proteins are embedded in close proximity, and additional pieces of evidence, such as 3D structural similarity or class labels, can be incorporated into the learning process. We find that ProtEmbed achieves higher accuracy than widely used pairwise sequence methods such as PSI-BLAST and HHSearch for remote homology detection; it also outperforms our previous RankProp algorithm, which incorporates global structure in the form of a protein similarity network. Finally, the ProtEmbed embedding space can be visualized, both at the global level and locally around a given query, yielding intuition about the structure of protein sequence space.
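The abstract describes ProtEmbed only at a high level. As a minimal sketch of the general idea behind learning such an embedding, assume (none of this is stated in the record) that each protein is already represented by a fixed numeric feature vector, that the embedding is linear, f(x) = W x, that similarity is the dot product f(q) . f(t), and that training minimizes a margin ranking loss over hypothetical (query, related, unrelated) triplets by stochastic gradient descent. The function names, margin, and learning rate are illustrative; this is not the authors' implementation.

```python
import random


def embed(W, x):
    """Hypothetical linear embedding f(x) = W x; W is a list of d rows of length m."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def score(W, q, t):
    """Similarity of query q and target t in the embedding space: f(q) . f(t)."""
    return dot(embed(W, q), embed(W, t))


def sgd_step(W, q, pos, neg, lr=0.01, margin=1.0):
    """One SGD step on the hinge loss max(0, margin - s(q, pos) + s(q, neg)).

    Updates W in place and returns the loss measured before the update.
    """
    fq, fp, fn = embed(W, q), embed(W, pos), embed(W, neg)
    loss = margin - dot(fq, fp) + dot(fq, fn)
    if loss <= 0:
        return 0.0  # pos already outranks neg by the margin: no update
    # Gradient of s(a, b) = f(a) . f(b) w.r.t. W[i][j] is f(b)[i]*a[j] + f(a)[i]*b[j].
    for i, row in enumerate(W):
        for j in range(len(row)):
            grad = (-(fp[i] * q[j] + fq[i] * pos[j])
                    + (fn[i] * q[j] + fq[i] * neg[j]))
            row[j] -= lr * grad
    return loss


def train(W, triplets, epochs=50, lr=0.01, seed=0):
    """Loop over shuffled (query, related, unrelated) triplets for several epochs."""
    rng = random.Random(seed)
    triplets = list(triplets)
    for _ in range(epochs):
        rng.shuffle(triplets)
        for q, pos, neg in triplets:
            sgd_step(W, q, pos, neg, lr=lr)
    return W
```

Because this loss only encodes which target should outrank which for a given query, the other evidence the abstract mentions (3D structural similarity or class labels) could, under this assumed setup, be folded in simply by adding triplets whose "related" member comes from that source.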
Author Summary Searching a protein or DNA sequence database to find sequences that are evolutionarily related to a query is one of the foundational problems in computational biology. These database searches rely on pairwise comparisons of sequence similarity between the query and targets, but despite years of method refinements, pairwise comparisons still often fail to detect more distantly related targets. In this study, we adapt recent work from natural language processing to exploit the global structure of the data space in this detection problem. In particular, we borrow the idea of a semantic embedding: by training on a large text data set, one learns an embedding of words into a low-dimensional semantic space such that words embedded close to each other are likely to be semantically related. We present the ProtEmbed algorithm, which learns an embedding of protein sequences into a semantic space where evolutionarily related proteins are embedded in close proximity. The flexible training algorithm allows additional pieces of evidence, such as 3D structural information, to be incorporated into the learning process and enables ProtEmbed to achieve state-of-the-art performance for the task of detecting targets that have remote evolutionary relationships to the query.
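Once proteins live in a semantic space where related sequences sit close together, answering a query reduces to a nearest-neighbor ranking: embed the query, then sort database proteins by how close they are to it. The sketch below assumes the embedded vectors are already available in a dictionary keyed by a hypothetical protein identifier and uses cosine similarity as the closeness measure; both are illustrative assumptions, not details taken from ProtEmbed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedded vectors (0.0 if either is all zeros)."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def rank_targets(query_vec, database, k=None):
    """Return (target_id, similarity) pairs sorted from most to least similar.

    `database` is a hypothetical dict mapping protein IDs to embedded vectors;
    if k is given, only the top-k targets are returned.
    """
    scored = [(tid, cosine(query_vec, vec)) for tid, vec in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored if k is None else scored[:k]
```

This brute-force scan costs one similarity computation per database protein per query; for very large databases an approximate nearest-neighbor index would be the usual substitute, though the summary does not say how ProtEmbed handles this.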
Audience Academic
AuthorAffiliation 1 NEC Laboratories America, Princeton, New Jersey, United States of America
2 Google, New York, New York, United States of America
3 Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
4 Computational Biology Program, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America
Stanford University, United States of America
ContentType Journal Article
Copyright COPYRIGHT 2011 Public Library of Science
Melvin et al. 2011
2011 Melvin et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited: Melvin I, Weston J, Noble WS, Leslie C (2011) Detecting Remote Evolutionary Relationships among Proteins by Large-Scale Semantic Embedding. PLoS Comput Biol 7(1): e1001047. doi:10.1371/journal.pcbi.1001047
DOI 10.1371/journal.pcbi.1001047
Discipline Biology
DocumentTitleAlternate Detecting Remote Evolutionary Relationships
EISSN 1553-7358
Genre Journal Article
Research Support, N.I.H., Extramural
GeographicLocations United States
GrantInformation NIGMS NIH HHS, grant R01 GM074257
ISSN 1553-7358
1553-734X
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Keywords Biological Evolution
Proteins
Algorithms
Sequence Analysis, DNA
Language English
License This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
Creative Commons Attribution License
Notes Conceived and designed the experiments: JW WSN CL. Performed the experiments: IM. Analyzed the data: IM JW. Wrote the paper: IM JW WSN CL.
PMID 21298082
PublicationDate 2011-01-01
PublicationDecade 2010
PublicationPlace San Francisco, United States
PublicationTitle PLoS computational biology
PublicationTitleAlternate PLoS Comput Biol
PublicationYear 2011
Publisher Public Library of Science
Public Library of Science (PLoS)
StartPage e1001047
SubjectTerms Algorithms
Bioinformatics
Biological Evolution
Computational Biology/Protein Homology Detection
DNA
Markov processes
Methods
Neighborhoods
Physiological aspects
Proteins
Proteins - chemistry
Proteins - genetics
Semantics
Sequence Analysis, DNA
Studies
URI https://www.ncbi.nlm.nih.gov/pubmed/21298082
https://www.proquest.com/docview/1872839943
https://www.proquest.com/docview/850560910
https://pubmed.ncbi.nlm.nih.gov/PMC3029239
https://doaj.org/article/2a277e17bdbd4dd5add8156cd8a2073e
http://dx.doi.org/10.1371/journal.pcbi.1001047
Volume 7