Algorithms for Computing Bidirectional Best Hit r-Window Gene Clusters
Published in | Frontiers in Algorithmics and Algorithmic Aspects in Information and Management pp. 275 - 286 |
---|---|
Main Authors | , , |
Format | Book Chapter |
Language | English |
Published | Berlin, Heidelberg: Springer Berlin Heidelberg, 2011 |
Series | Lecture Notes in Computer Science |
Summary: | Genome rearrangements are large-scale mutations that result in a shuffling of the genes on a genome. Despite these rearrangements, whole genome analysis of modern species has revealed sets of genes that are found close to one another in multiple species. These conserved gene clusters provide useful information on gene function and genome evolution. In this paper, we consider a novel gene cluster model called bidirectional best hit r-window (BBHRW), in which the idea is (a) to capture the “frequency of common genes” in an r-window (an interval of at most r consecutive genes) of each genome and (b) to further strengthen it by the bidirectional best hit criterion. We define two variants of BBHRW using two different similarity measures to define the “frequency of common genes” in two r-windows. The algorithmic problem is then as follows: given two genomes of length n and m, and an integer r, compute all the BBHRW clusters. A straightforward algorithm for solving this problem is an O(nm) algorithm that compares all pairs of r-windows. In this paper, we present faster algorithms (SWBST and SWOT) for solving these two BBHRW variants. Algorithm SWBST is a simpler algorithm that solves the first variant of the BBHRW, while algorithm SWOT solves both variants of the BBHRW. Both algorithms have running time $O((n+m) r \lg r)$. The algorithmic speed-up is achieved via a sliding window approach and the use of efficient data structures. We implemented the algorithms and compared their running times for finding BBHRW clusters conserved in E. coli K-12 (2339 genes) and B. subtilis (2332 genes) with r from 1 to 30 to illustrate the speed-up achieved. We also compared the two similarity measures for these genomes to show that the choice of similarity measure is an important factor for this cluster model. |
---|---|
ISBN: | 9783642212031 3642212034 |
ISSN: | 0302-9743 1611-3349 |
DOI: | 10.1007/978-3-642-21204-8_30 |
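
The straightforward all-pairs baseline mentioned in the summary can be sketched roughly as follows. This is an illustration only, not the chapter's SWBST or SWOT algorithm. The summary does not define the two similarity measures, so the sketch assumes a stand-in measure (the number of distinct genes two windows share), uses windows of exactly r genes rather than at most r, and counts ties as best hits. The function names `windows` and `bbh_window_pairs` are hypothetical.

```python
def windows(genome, r):
    """Gene sets of all windows of exactly r consecutive genes.

    Assumption: the source allows windows of *at most* r genes;
    fixed-length windows are used here to keep the sketch short.
    """
    return [frozenset(genome[i:i + r]) for i in range(len(genome) - r + 1)]


def bbh_window_pairs(genome1, genome2, r):
    """Naive baseline: compare every pair of r-windows.

    Assumed similarity (not taken from the source): the number of
    distinct genes the two windows have in common.
    Returns (i, j, similarity) triples where window j of genome2 is a
    best hit for window i of genome1 and vice versa; ties count.
    """
    w1, w2 = windows(genome1, r), windows(genome2, r)

    # Similarity matrix over all window pairs.
    sim = [[len(a & b) for b in w2] for a in w1]

    # Best score seen from each window's point of view.
    best_for_w1 = [max(row, default=0) for row in sim]
    best_for_w2 = [
        max((sim[i][j] for i in range(len(w1))), default=0)
        for j in range(len(w2))
    ]

    # Keep pairs that are best hits in both directions and share a gene.
    return [
        (i, j, sim[i][j])
        for i in range(len(w1))
        for j in range(len(w2))
        if sim[i][j] > 0 and sim[i][j] == best_for_w1[i] == best_for_w2[j]
    ]


if __name__ == "__main__":
    g1 = ["a", "b", "c", "x", "y"]
    g2 = ["y", "x", "q", "c", "b", "a"]
    for i, j, s in bbh_window_pairs(g1, g2, 3):
        print(f"window {i} of genome 1 <-> window {j} of genome 2: {s} common genes")
```

Every window of one genome is compared against every window of the other, which is the pairwise cost that, according to the summary, the sliding window approach and efficient data structures are meant to avoid.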