Alignment of protein sequences by their profiles
Published in | Protein science Vol. 13; no. 4; pp. 1071 - 1087 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Bristol: Cold Spring Harbor Laboratory Press, 01.04.2004 |
Summary: | The accuracy of an alignment between two protein sequences can be improved by including other detectably related sequences in the comparison. We optimize and benchmark such an approach that relies on aligning two multiple sequence alignments, each one including one of the two protein sequences. Thirteen different protocols for creating and comparing profiles corresponding to the multiple sequence alignments are implemented in the SALIGN command of MODELLER. A test set of 200 pairwise, structure‐based alignments with sequence identities below 40% is used to benchmark the 13 protocols as well as a number of previously described sequence alignment methods, including heuristic pairwise sequence alignment by BLAST, pairwise sequence alignment by global dynamic programming with an affine gap penalty function by the ALIGN command of MODELLER, sequence‐profile alignment by PSI‐BLAST, Hidden Markov Model methods implemented in SAM and LOBSTER, pairwise sequence alignment relying on predicted local structure by SEA, and multiple sequence alignment by CLUSTALW and COMPASS. The alignment accuracies of the best new protocols were significantly better than those of the other tested methods. For example, the fraction of the correctly aligned residues relative to the structure‐based alignment by the best protocol is 56%, which can be compared with the accuracies of 26%, 42%, 43%, 48%, 50%, 49%, 43%, and 43% for the other methods, respectively. The new method is currently applied to large‐scale comparative protein structure modeling of all known sequences. |
---|---|
Bibliography: | Article and publication are at http://www.proteinscience.org/cgi/doi/10.1110/ps.03379804. Reprint requests to: Marc A. Marti-Renom, Mission Bay Genentech Hall, University of California, San Francisco, San Francisco, CA 94143, USA; e-mail: marcius@salilab.org; fax: (415) 514-4231. |
ISSN: | 0961-8368, 1469-896X |
DOI: | 10.1110/ps.03379804 |