Efficient Compression of non-repetitive DNA sequences using Dynamic Programming

Bibliographic Details
Published in: 2006 International Conference on Advanced Computing and Communications, Mangalore, India, 20-23 December 2006, pp. 569-574
Main Authors: Srinivasa, K.G.; Jagadish, M.; Venugopal, K.R.; Patnaik, L.M.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.12.2006

Summary: DNA compression has been a subject of great interest since genomic databases became available. Although only two bits are sufficient to encode the four bases of DNA (namely A, G, T and C), the massive size of DNA sequences calls for more efficient compression. General text compression methods do not make use of characteristics specific to DNA sequences. DNA-specific compression algorithms usually take advantage of repeat sequences, and DNA sequences with high repetition rates are best compressed by dictionary-based algorithms. However, segments of DNA that do not reappear in the sequence must be compressed using a different text compression scheme. In this paper, we propose an encoding scheme based on dynamic programming to compress the non-repeat regions of DNA sequences. To test the efficiency of the method, we incorporate the encoding scheme into a DNA-specific algorithm, DNAPack, and compare its performance with various DNA compression algorithms. The results show that our method achieves better results in many cases.
ISBN: 142440715X, 9781424407156
DOI: 10.1109/ADCOM.2006.4289956
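As a rough illustration of the two-bit baseline that the summary mentions, the sketch below packs four bases into each byte. It is not the paper's dynamic-programming encoder, whose details are not given in this record. The mapping A=00, C=01, G=10, T=11 and the function names `pack` and `unpack` are illustrative assumptions.

```python
# Hypothetical sketch of the plain 2-bits-per-base baseline (NOT the paper's DP scheme).
# Assumed mapping: A=00, C=01, G=10, T=11.
BASE_TO_CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
CODE_TO_BASE = {code: base for base, code in BASE_TO_CODE.items()}


def pack(seq: str) -> bytes:
    """Pack a DNA string at 2 bits per base, 4 bases per byte (first base in the high bits)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_CODE[base]
        # Left-align a short final chunk; the unused low bits are zero padding.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, n: int) -> str:
    """Recover the first n bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(CODE_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases[:n])


if __name__ == "__main__":
    packed = pack("ACGTTGCA")
    print(packed.hex())        # 1be4
    print(unpack(packed, 8))   # ACGTTGCA
```

The sequence length has to be stored alongside the packed bytes, because zero padding in the final byte is indistinguishable from trailing `A`s. This fixed 2 bits per base is the reference point that DNA-specific compressors try to beat.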