Efficient Compression of non-repetitive DNA sequences using Dynamic Programming
Published in | 2006 International Conference on Advanced Computing and Communications : Mangalore, India, 20-23 December 2006 pp. 569 - 574 |
---|---|
Main Authors | Srinivasa, K.G.; Jagadish, M.; Venugopal, K.R.; Patnaik, L.M. |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.12.2006 |
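Subjects | Bioinformatics; Compression algorithms; Data engineering; DNA; Dynamic programming; Educational institutions; Genomics; Protein engineering; Sequences |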
Abstract | DNA compression has been a subject of great interest since genomic databases became available. Although two bits are sufficient to encode each of the four DNA bases (A, G, T and C), the massive size of DNA sequences creates a need for efficient compression. General text compression methods do not make use of characteristics specific to DNA sequences. DNA-specific compression algorithms usually take advantage of repeated sequences. DNA sequences with high repetition rates are best compressed by dictionary-based compression algorithms. However, segments of DNA that do not reappear in the sequence are compressed using a different text compression scheme. In this paper, we propose an encoding scheme, based on a dynamic programming approach, to compress the non-repeat regions of DNA sequences. To test the efficiency of the method, we incorporate the encoding scheme into a DNA-specific algorithm, DNAPack. The performance of this algorithm is compared with that of various DNA compression algorithms. The results show that our method achieves better results in many cases. |
---|---|
Author | Srinivasa, K.G.; Jagadish, M.; Venugopal, K.R.; Patnaik, L.M. |
Author_xml | – sequence: 1 givenname: K.G. surname: Srinivasa fullname: Srinivasa, K.G. organization: Bangalore Univ., Bangalore – sequence: 2 givenname: M. surname: Jagadish fullname: Jagadish, M. – sequence: 3 givenname: K.R. surname: Venugopal fullname: Venugopal, K.R. – sequence: 4 givenname: L.M. surname: Patnaik fullname: Patnaik, L.M. |
DOI | 10.1109/ADCOM.2006.4289956 |
Discipline | Computer Science |
EISBN | 9781424407163 1424407168 |
EndPage | 574 |
ExternalDocumentID | 4289956 |
Genre | orig-research |
ISBN | 142440715X 9781424407156 |
PageCount | 6 |
PublicationDate | 2006-Dec. |
PublicationTitle | 2006 International Conference on Advanced Computing and Communications : Mangalore, India, 20-23 December 2006 |
PublicationTitleAbbrev | ADCOM |
PublicationYear | 2006 |
Publisher | IEEE |
StartPage | 569 |
SubjectTerms | Bioinformatics; Compression algorithms; Data engineering; DNA; Dynamic programming; Educational institutions; Genomics; Protein engineering; Sequences |
Title | Efficient Compression of non-repetitive DNA sequences using Dynamic Programming |
URI | https://ieeexplore.ieee.org/document/4289956 |