A Novel Algorithm for Finding Interspersed Repeat Regions


Bibliographic Details
Published in Genomics, Proteomics & Bioinformatics Vol. 2; no. 3; pp. 184-191
Main Authors Li, Dongdong; Wang, Zhengzhi; Ni, Qingshan
Format Journal Article
Language English
Published England Elsevier Ltd 01.08.2004
College of Mechatronics Engineering and Automation, National University of Defense Technology, Changsha 410073, China
Elsevier
Summary: The analysis of repeats in DNA sequences is an important subject in bioinformatics. In this paper, we propose a novel projection-assemble algorithm to find unknown interspersed repeats in DNA sequences. The algorithm uses random projection to obtain a candidate fragment set, then exhaustively searches each pair of fragments in that set for potential linkage and assembles the linked fragments. The complexity of the projection-assemble algorithm is nearly linear in the length of the genome sequence, and its memory usage is limited by the hardware. We tested the algorithm on both simulated and real biological data, and the results show that it is efficient. Using this algorithm, we found an unlabeled repeat region that occurs five times in the Escherichia coli genome, with a length of more than 5,000 bp and a mismatch probability of less than 4%.
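The summary gives only the outline of the method (random projection, pairwise linkage search, assembly), so the following Python is a minimal sketch of that outline under stated assumptions, not the authors' implementation. The function names, the fragment length `l`, the projection size `k`, and the diagonal-chaining rule used for assembly are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def random_projection_buckets(seq, l, k, seed=0):
    """Bucket the start positions of all l-mers by k randomly chosen positions.

    Two l-mers that differ only outside the sampled positions share a bucket,
    which is how random projection tolerates mismatches. (Assumed scheme.)
    """
    rng = random.Random(seed)
    sampled = sorted(rng.sample(range(l), k))
    buckets = defaultdict(list)
    for start in range(len(seq) - l + 1):
        lmer = seq[start:start + l]
        key = "".join(lmer[p] for p in sampled)
        buckets[key].append(start)
    # Only buckets with at least two fragments can indicate a repeat.
    return {key: starts for key, starts in buckets.items() if len(starts) > 1}


def candidate_pairs(buckets):
    """Exhaustively enumerate every pair of fragments within each bucket."""
    pairs = set()
    for starts in buckets.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                pairs.add((starts[a], starts[b]))
    return pairs


def assemble(pairs, l):
    """Chain pairs with the same offset into longer regions.

    Hypothetical linkage rule: two pairs are linked when they share an offset
    and their fragments overlap or touch. Returns (start_a, start_b, length).
    """
    by_offset = defaultdict(list)
    for i, j in pairs:
        by_offset[j - i].append(i)
    regions = []
    for offset, starts in by_offset.items():
        starts.sort()
        run_start = prev = starts[0]
        for s in starts[1:]:
            if s > prev + l:  # gap between fragments: close the current run
                regions.append((run_start, run_start + offset, prev + l - run_start))
                run_start = s
            prev = s
        regions.append((run_start, run_start + offset, prev + l - run_start))
    return regions


# Toy sequence with one 12-base repeat planted at positions 5 and 24.
seq = "CCCCC" + "GATTACAGATCT" + "AAAAAAA" + "GATTACAGATCT" + "GGGG"
buckets = random_projection_buckets(seq, l=8, k=5)
pairs = candidate_pairs(buckets)
regions = assemble(pairs, l=8)
```

In this sketch, each region is a pair of start positions plus a length. On real genomes the bucket step would typically be repeated with several random projections, and the pairwise step would need a mismatch check; both are omitted here for brevity.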
ISSN: 1672-0229, 2210-3244
DOI: 10.1016/S1672-0229(04)02024-8