Fast compression of huge DNA sequence data
Published in | 2012 5th International Conference on Biomedical Engineering and Informatics pp. 885 - 888 |
---|---|
Main Authors | , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.10.2012 |
Subjects | |
Summary: | DNA sequences can be enormous. Several DNA-oriented compression methods exist, such as Biocompress, DNACompress, Cfact, CTW+LZ, and DNADP. These methods achieve high compression ratios, but at a heavy cost in running time. For example, CTW+LZ takes several hours to compress the 227 KB sequence HEMCMVCG, and DNADP takes about 20 minutes to compress the standard benchmark sequences. Here we introduce an improved RLE (run-length encoding) method with lower computational complexity, which significantly reduces running time compared with previous DNA compression programs. Our improved RLE achieves a compression ratio of 1.862 bits per base and takes only about 1 minute on a 2.1 GHz Core 2 Duo processor to compress a 250 MB chromosome sequence file. We also use delta encoding to reduce the second sequence to 4.8 MB. |
---|---|
ISBN: | 9781467311830 1467311839 |
DOI: | 10.1109/BMEI.2012.6512909 |
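
The summary names two general techniques, run-length encoding and delta encoding, without giving details of the paper's improved variant. As a rough illustration only, the textbook forms of both can be sketched as below. This is not the authors' method: the function names are hypothetical, and the delta step assumes the two sequences have equal length and differ only by substitutions.

```python
from itertools import groupby


def rle_encode(seq):
    """Plain run-length encoding: one (base, run_length) pair per run."""
    return [(base, sum(1 for _ in run)) for base, run in groupby(seq)]


def rle_decode(pairs):
    """Invert rle_encode by expanding each (base, run_length) pair."""
    return "".join(base * count for base, count in pairs)


def delta_encode(reference, target):
    """Store only the positions where target differs from reference.

    Assumption for this sketch: equal-length sequences, substitutions only.
    """
    if len(reference) != len(target):
        raise ValueError("this sketch only handles equal-length sequences")
    return [(i, t) for i, (r, t) in enumerate(zip(reference, target)) if r != t]


def delta_decode(reference, delta):
    """Rebuild the target by applying the stored substitutions to reference."""
    bases = list(reference)
    for position, base in delta:
        bases[position] = base
    return "".join(bases)
```

With this sketch, `rle_encode("AAACCGTT")` yields one pair per run of identical bases, and `delta_encode("ACGT", "ACCT")` keeps only the single differing position. Delta encoding is effective when a second sequence is highly similar to one already stored.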