Mining repetitive sequences using a big data ecosystem

Bibliographic Details
Published in: 2013 IEEE International Conference on Bioinformatics and Biomedicine, pp. 60-62
Main Authors: Phinney, Michael; Cao, Hongfei; Dhroso, Andi; Shyu, Chi-Ren
Format: Conference Proceeding
Language: English
Published: IEEE, 01.12.2013
Summary: Identifying repetitive gene sequences occurring within DNA sequences that span a collection of species is a challenge that is conceptually simple yet computationally challenging. Biological research suggests that certain regions within genomic sequences may be unchanged for hundreds of millions of years; understanding and identifying these highly preserved regions is a major challenge faced by bioinformaticians. Taking an evolutionary perspective on DNA, pinpointing these repetitive sequences is the first step to understanding functional similarities and diversities. The difficulty of this problem arises from the volume of the data required for analysis; it grows with every genome that is sequenced. Traditional approaches used to identify repetitive sequences often require the pair-wise comparison of chromosomes, which takes a significant amount of time to gather results. When comparing n chromosomes, n(n-1) individual comparisons must be made. To avoid exhaustive pair-wise comparisons, we designed an algorithm that partitions genomic sequences into search key values representing potential repetitive sequences, which are hashed into bins. With the introduction of new genomes, we only process the new sequences and aggregate new results with those that were previously processed.
DOI: 10.1109/BIBM.2013.6732763
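
The binning idea described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed-length sliding window used as the "search key", the in-memory dictionary used as the bins, and the names `add_sequence` and `repeated_keys` are all assumptions made here for illustration. The record does not state how keys are formed or which big data components are used.

```python
from collections import defaultdict


def add_sequence(bins, seq_id, sequence, k):
    """Slide a window of length k over `sequence` and file each key into its bin.

    Assumption: a search key is simply a k-length substring. Windows that
    contain anything other than A, C, G, T (for example N) are skipped.
    """
    sequence = sequence.upper()
    for start in range(len(sequence) - k + 1):
        key = sequence[start:start + k]
        if set(key) <= set("ACGT"):
            bins[key].append((seq_id, start))


def repeated_keys(bins, min_sequences=2):
    """Return the bins whose key occurs in at least `min_sequences` distinct sequences."""
    return {
        key: hits
        for key, hits in bins.items()
        if len({seq_id for seq_id, _ in hits}) >= min_sequences
    }


# Toy input: two short, made-up sequences standing in for chromosomes.
bins = defaultdict(list)
add_sequence(bins, "chrA", "ACGTACGTTT", k=4)
add_sequence(bins, "chrB", "GGACGTAC", k=4)
before = set(repeated_keys(bins))

# A new sequence arrives: only its windows are processed, and the hits are
# merged into the existing bins without revisiting chrA or chrB.
add_sequence(bins, "chrC", "TTTTACGT", k=4)
after = set(repeated_keys(bins))

print(sorted(before))
print(sorted(after))
```

In this sketch each sequence is scanned once, so adding a genome costs time proportional to its own length rather than requiring a fresh comparison against every sequence already processed. A dictionary keyed by the window stands in for whatever distributed hashing scheme the paper actually uses.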