CalcGen Sequence Assembler Using a Spatio-temporally Efficient DNA Sequence Search Algorithm

Bibliographic Details
Published in	Procedia Computer Science Vol. 23; pp. 122-128
Main Authors	Yoon, Kyong Oh; Cho, Sung-Bae
Format	Journal Article
Language	English
Published	Elsevier B.V. 2013
Subjects	assembly analysis; next-generation sequencing (NGS); sequence search algorithm

Summary:	Ultra-high-throughput sequencing technology produces an enormous amount of bio-sequence information, and current advances in the bio-industry are bringing forward the era of personalized medicine based on individual genome information. However, analyzing a massive number of bio-sequences requires large storage, so the analysis sometimes needs a supercomputer and novel software that can handle such a volume of sequence information. For this type of analysis, several sequence-matching algorithms have been devised for alignment and assembly, which are fundamental to analyzing bio-sequences. These algorithms regard nucleotide sequences as strings and compare characters one by one during analysis. They use hash index tables, de Bruijn graphs, the Burrows-Wheeler transform, and so on. In this paper, for time- and space-efficient DNA searching, we propose a simple algorithm that transforms a base sequence into a k-mer integer array and then analyzes the transformed integer array with a unit search operator and a non-unit search operator, resulting in a storage space reduction of about 0.28-fold. Furthermore, based on the proposed algorithm, we have developed a sequence analysis program called CalcGen assembler and show the usefulness of the program with several experiments.
ISSN:	1877-0509
DOI:	10.1016/j.procs.2013.10.016
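
The summary describes transforming a base sequence into a k-mer integer array and searching that array instead of comparing characters. The record does not give the paper's actual encoding or define its unit and non-unit search operators, so the following is only a minimal sketch of the general idea under stated assumptions: a 2-bit code per base (A=0, C=1, G=2, T=3), overlapping k-mers, and a plain integer-equality lookup. The names `BASE_CODE`, `kmer_integers`, and `find_kmer` are hypothetical and are not taken from CalcGen.

```python
# Assumed 2-bit code per base; the paper's actual mapping is not stated in this record.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_integers(seq, k):
    """Return the integer code of every overlapping k-mer in seq, left to right.

    Each k-mer is packed into 2*k bits. Ambiguous bases such as 'N' are not
    handled in this sketch and raise KeyError.
    """
    if k <= 0 or len(seq) < k:
        return []
    mask = (1 << (2 * k)) - 1  # keeps only the lowest 2*k bits
    codes = []
    value = 0
    for i, base in enumerate(seq.upper()):
        # Shift in the new base and drop the base that left the window.
        value = ((value << 2) | BASE_CODE[base]) & mask
        if i >= k - 1:
            codes.append(value)
    return codes


def find_kmer(codes, query, k):
    """Return the positions in codes whose k-mer equals query (len(query) == k)."""
    if len(query) != k:
        raise ValueError("query length must equal k")
    target = kmer_integers(query, k)[0]
    # A single integer comparison replaces k character comparisons.
    return [pos for pos, code in enumerate(codes) if code == target]


codes = kmer_integers("ACGTACGA", 3)
print(find_kmer(codes, "ACG", 3))
```

In this sketch each k-mer occupies 2k bits rather than k one-byte characters, which illustrates why an integer representation can save space; it makes no claim to reproduce the storage figure reported in the summary.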