Accurate anchoring alignment of divergent sequences

Bibliographic Details
Published in	Bioinformatics Vol. 22; no. 1; pp. 29 - 34
Main Authors	Huang, Weichun; Umbach, David M.; Li, Leping
Format	Journal Article
Language	English
Published	Oxford: Oxford University Press, 01.01.2006; Oxford Publishing Limited (England)
Subjects	Algorithms; Animals; Base Sequence; Biological and medical sciences; Cluster Analysis; Computational Biology - methods; Fundamental and applied biological sciences. Psychology; Gene Deletion; General aspects; Genome; Genomics; Humans; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Mice; Models, Statistical; Molecular Sequence Data; Promoter Regions, Genetic; Sequence Alignment - methods; Sequence Analysis, Protein - methods; Sequence Homology; Software; Species Specificity; Transcription Factors - chemistry; Divergence; Sequence alignment; Prediction; Identification; Global local method; Functional; Original document; Conserved sequence; Quality; Dynamic programming; Transcription factor; Bioinformatics; Comparative study
Summary:	Motivation: Obtaining high-quality alignments of divergent homologous sequences for cross-species sequence comparison remains a challenge. Results: We propose a novel pairwise sequence alignment algorithm, ACANA (ACcurate ANchoring Alignment), for aligning biological sequences at both local and global levels. Like many fast heuristic methods, ACANA uses an anchoring strategy. However, unlike others, ACANA uses a Smith–Waterman-like dynamic programming algorithm to recursively identify near-optimal regions as anchors for a global alignment. Performance evaluations using a simulated benchmark dataset and real promoter sequences suggest that ACANA is accurate and consistent, especially for divergent sequences. Specifically, we use a simulated benchmark dataset to show that ACANA has the highest sensitivity in aligning constrained functional sites compared to BLASTZ, CHAOS and DIALIGN for local alignment and compared to AVID, ClustalW, DIALIGN and LAGAN for global alignment. Applied to 6007 pairs of human-mouse orthologous promoter sequences, ACANA identified the largest number of conserved regions (defined as over 70% identity over 100 bp) compared to AVID, ClustalW, DIALIGN and LAGAN. In addition, the average length of the conserved regions identified by ACANA was the longest. Thus, we suggest that ACANA is a useful tool for identifying functional elements in cross-species sequence analysis, such as predicting transcription factor binding sites in non-coding DNA. Availability: ACANA software and test sequence data are publicly available at Supplementary information: Supplementary materials are available at Bioinformatics online. Contact: li3@niehs.nih.gov
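The anchoring strategy named in the summary can be illustrated with a minimal sketch. This is not the ACANA implementation: the function names, the scoring values (match=2, mismatch=-1, gap=-2), the linear gap penalty and the min_score cutoff are all assumptions chosen for illustration. The summary says ACANA identifies near-optimal regions; this sketch simply takes the single best-scoring Smith-Waterman local alignment as an anchor and then recurses on the sequence segments to its left and right.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b under assumed, illustrative scores.

    Returns (score, a_start, a_end, b_start, b_end) with half-open ranges.
    A score of 0 means no positive-scoring local alignment exists.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, bi, bj = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best cell to find where the local alignment starts.
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, i, bi, j, bj


def find_anchors(a, b, a_off=0, b_off=0, min_score=6):
    """Recursively collect collinear anchors (hypothetical helper).

    Each anchor is (a_start, a_end, b_start, b_end, score) in coordinates
    of the original sequences. min_score is an assumed cutoff.
    """
    score, a0, a1, b0, b1 = smith_waterman(a, b)
    if score < min_score:
        return []
    # Recurse only on the segments before and after the anchor, so the
    # resulting anchors never overlap and stay in the same order in both
    # sequences.
    left = find_anchors(a[:a0], b[:b0], a_off, b_off, min_score)
    right = find_anchors(a[a1:], b[b1:], a_off + a1, b_off + b1, min_score)
    anchor = (a_off + a0, a_off + a1, b_off + b0, b_off + b1, score)
    return left + [anchor] + right


if __name__ == "__main__":
    # Two conserved flanks separated by a fully divergent middle segment.
    seq1 = "ACGTACGT" + "T" * 20 + "GGCCGGCC"
    seq2 = "ACGTACGT" + "A" * 20 + "GGCCGGCC"
    for anchor in find_anchors(seq1, seq2):
        print(anchor)
```

In a full aligner the unanchored segments between consecutive anchors would still need to be aligned globally; the sketch stops at anchor selection.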
Bibliography:	ark:/67375/HXZ-0R6P731D-L; istex:80A17440C0181CB3FA798F9F2181ABE6F0C5AAE6; Associate Editor: Charlie Hodgman
ISSN:	1367-4803, 1460-2059, 1367-4811
DOI:	10.1093/bioinformatics/bti772