Efficient Compression of non-repetitive DNA sequences using Dynamic Programming

Bibliographic Details
Published in 2006 International Conference on Advanced Computing and Communications: Mangalore, India, 20-23 December 2006, pp. 569-574
Main Authors Srinivasa, K.G., Jagadish, M., Venugopal, K.R., Patnaik, L.M.
Format Conference Proceeding
Language English
Published IEEE 01.12.2006
Abstract DNA compression has been a subject of great interest since genomic databases became available. Although only two bits are sufficient to encode the four bases of DNA (namely A, G, T and C), the massive size of DNA sequences makes efficient compression necessary. General text compression methods do not make use of characteristics specific to DNA sequences. DNA-specific compression algorithms usually take advantage of repeat sequences, and DNA sequences with high repetition rates are best compressed by dictionary-based compression algorithms. However, segments of DNA that do not reappear in the sequence are compressed using a different text compression scheme. In this paper, we propose an encoding scheme, based on a dynamic programming approach, to compress the non-repeat regions of DNA sequences. To test the efficiency of the method, we incorporate the encoding scheme into a DNA-specific algorithm, DNAPack. The performance of this algorithm is compared with that of various DNA compression algorithms. The results show that our method achieves better results in many cases.
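The two-bit figure in the abstract is the baseline that DNA-specific compressors are measured against, usually reported in bits per base. The Python sketch below packs a sequence over A, C, G and T at exactly two bits per base and computes the resulting rate. The function names and the A=00, C=01, G=10, T=11 code assignment are illustrative assumptions; this is neither the paper's dynamic programming encoder nor DNAPack's format, and it assumes uppercase input with no ambiguity symbols such as N.

```python
# Hypothetical 2-bit baseline for DNA; not the authors' DP scheme or DNAPack.
# Assumes the input contains only uppercase A, C, G, T (no N or IUPAC codes).
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # index i decodes the 2-bit value i


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte, first base in the high bits."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODE[base] << (6 - 2 * j)  # raises KeyError on non-ACGT symbols
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, n: int) -> str:
    """Recover the first n bases; n is needed because the last byte may be padded."""
    bases = []
    for byte in data:
        for j in range(4):
            bases.append(BASES[(byte >> (6 - 2 * j)) & 0b11])
    return "".join(bases[:n])


def bits_per_base(compressed_bytes: int, n: int) -> float:
    """Compression rate in bits per base, the usual figure of merit for DNA compressors."""
    return 8 * compressed_bytes / n


if __name__ == "__main__":
    seq = "GATTACA"
    packed = pack(seq)  # 7 bases fit in 2 bytes
    assert unpack(packed, len(seq)) == seq
    print(len(packed), bits_per_base(len(packed), len(seq)))
```

A compressor earns its keep only when it goes below 2 bits per base; for a sequence whose length is a multiple of four this packing sits at exactly 2.0, which is why schemes that exploit repeats, or model non-repeat regions more cleverly, are needed.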
Affiliation Srinivasa, K.G.: Bangalore Univ., Bangalore
DOI 10.1109/ADCOM.2006.4289956
Discipline Computer Science
EISBN 9781424407163, 1424407168
Genre orig-research
ISBN 142440715X, 9781424407156
PageCount 6
PublicationTitleAbbrev ADCOM
Subjects Bioinformatics; Compression algorithms; Data engineering; DNA; Dynamic programming; Educational institutions; Genomics; Protein engineering; Sequences
URI https://ieeexplore.ieee.org/document/4289956