Parsing strategies for BWT compression
Published in | Proceedings DCC 2001. Data Compression Conference pp. 429 - 438 |
---|---|
Main Authors | Isal, R.Y.K.; Moffat, A. |
Format | Conference Proceeding |
Language | English |
Published | IEEE 2001 |
Abstract | Block-sorting is an innovative compression mechanism introduced by Burrows and Wheeler (1994), and has been the subject of considerable scrutiny in the years since it first became public. Block-sorting compression is usually described as involving three steps: permuting the input one block at a time through the use of the Burrows-Wheeler transform (BWT); applying a move-to-front (MTF) transform to each of the permuted blocks; and then entropy coding the output with a Huffman or arithmetic coder. In this paper we prepend a fourth transformation to this sequence: parsing. In the BWT implementations that have been considered to date the unit of transmission has been taken to be the ASCII character. But there is no particular reason why this should be so, and a range of other strategies can be used to construct the sequence of symbols that is fed into the BWT process. We consider some of the issues associated with making this change, and show that in some situations the introduction of a simple parsing stage allows improved compression to be obtained compared to an otherwise equivalent character-based BWT implementation. We also describe an MTF-like ranking transformation that caters better to large-alphabet situations than does the strict MTF rule used in conventional BWT implementations. |
---|---|
Author | Isal, R.Y.K.; Moffat, A. |
Author_xml | 1. Isal, R.Y.K. (Dept. of Comput. Sci. & Software Eng., Melbourne Univ., Vic., Australia); 2. Moffat, A. |
ContentType | Conference Proceeding |
DOI | 10.1109/DCC.2001.917174 |
Discipline | Computer Science |
EndPage | 438 |
ISBN | 0769510310 9780769510316 |
ISSN | 1068-0314 |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
PageCount | 10 |
PublicationCentury | 2000 |
PublicationDate | 20010000 |
PublicationDateYYYYMMDD | 2001-01-01 |
PublicationDate_xml | – year: 2001 text: 20010000 |
PublicationDecade | 2000 |
PublicationTitle | Proceedings DCC 2001. Data Compression Conference |
PublicationTitleAbbrev | DCC |
PublicationYear | 2001 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
StartPage | 429 |
SubjectTerms | Arithmetic Australia Council Computer science Data compression Decoding Dictionaries Entropy coding Software engineering Sorting World Wide Web |
Title | Parsing strategies for BWT compression |
URI | https://ieeexplore.ieee.org/document/917174 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
link | http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjV09T8MwELVoJ6ZCKeJbHhCbU-fDTrxSqCokUIdWdKv8KSGkFNF04dfXZ6dFIAakDEmW6GQ599753juEbqlLnbBWEDhzI4U0kkielUQrJZQVVOVhHNDzC5_Mi6cFW7Q-20ELY60NzWc2gdtwlm9WegOlsqGnFh5Ad1DH87Yo1dqXUyhjVVruuzuAKAeuRTkoyMJ0J8_YAU54PNMa7-yeeWv5k1IxfBiNgDSmSfzWj5krIeWMe1HLvQ5OhdBp8p5sGpXor18-jv-M5ggNvrV9eLrPWsfowNZ91NsNd8DtXj9Bd1MZ6gh43ezcJLAHuPj-dYahDz32z9YDNB8_zkYT0g5VIG8eqzTECF5mUkjjk7PLFJNFwWCqn_FYxShtPB1xFfN_8Epn1JXcaiYq0JHQwlHmTH6KuvWqtmcIq0zmXHkEaHNXUJ3CykrpERZVmbFCnaM-RL38iL4ZyxjwxZ9vL9Fh7O6C6wp1m8-NvfbpvlE3YaG3AEChow |
linkProvider | IEEE |
linkToHtml | http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjV09T8MwELWgDDAVShHfeEBsSZ0PO_FKoSrQVh1a0a3yp4QqpYimC7-es5MWgRiQMiRZoouV3Lvze-8QuiU2stwYHrg9tyAVWgSCxVmgpOTScCITPw5oOGL9afo8o7PaZ9trYYwxnnxmQnfq9_L1Uq1dq6wDpQUA6F20B2mfRpVYa9tQIZTmUbbld7hS2VdbhDkNmZ_vBDW7AxSAaGrrnc01q01_IsI7D92uKxujsHraj6krPun0mpWae-W9Ch3XZBGuSxmqz19Ojv-M5xC1v9V9eLzNW0doxxQt1NyMd8D1136M7sbCdxLwqtz4SWCAuPj-dYIdE71i0BZtNO09Trr9oB6rELwBWikDzVkWCy40pGcbSypSeJs8YRrQipZKQ0Ficwr_8FzFxGbMKMpzpyQhqSXU6uQENYplYU4RlrFImAQMaBKbEhW5tRUCMBaRsTZcnqGWi3r-XjlnzKuAz_-8e4P2-5PhYD54Gr1coIOK6-WOS9QoP9bmCpJ_Ka_9on8Bq_ek7A |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=Proceedings+DCC+2001.+Data+Compression+Conference&rft.atitle=Parsing+strategies+for+BWT+compression&rft.au=Isal%2C+R.Y.K.&rft.au=Moffat%2C+A.&rft.date=2001-01-01&rft.pub=IEEE&rft.isbn=9780769510316&rft.issn=1068-0314&rft.spage=429&rft.epage=438&rft_id=info:doi/10.1109%2FDCC.2001.917174&rft.externalDocID=917174 |
thumbnail_l | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=1068-0314&client=summon |
thumbnail_m | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=1068-0314&client=summon |
thumbnail_s | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=1068-0314&client=summon |
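
The pipeline described in the abstract (parse, then BWT, then MTF, then entropy coding) can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the word/non-word tokenizer, the naive rotation-sorting BWT, the strict MTF rule, and all function names are hypothetical choices made here. The paper's own parsing strategies and its MTF-like ranking transformation are not reproduced, and the final entropy-coding stage is omitted.

```python
import re


def parse_words(text):
    """Assumed parser: split text into alternating word / non-word tokens (lossless)."""
    return re.findall(r"\w+|\W+", text)


def tokens_to_ids(tokens):
    """Map each distinct token to an integer symbol, numbered by first appearance."""
    vocab = {}
    ids = [vocab.setdefault(tok, len(vocab)) for tok in tokens]
    return ids, list(vocab)


def bwt(symbols):
    """Naive BWT of a non-empty sequence: sort all rotations, keep the last column.

    Returns the last column and the row index of the original sequence.
    """
    n = len(symbols)
    order = sorted(range(n), key=lambda i: symbols[i:] + symbols[:i])
    last = [symbols[(i - 1) % n] for i in order]
    return last, order.index(0)


def inverse_bwt(last, primary):
    """Invert the BWT using a stable sort of the last column."""
    order = sorted(range(len(last)), key=lambda i: last[i])
    out, row = [], primary
    for _ in range(len(last)):
        row = order[row]
        out.append(last[row])
    return out


def mtf_encode(symbols, alphabet_size):
    """Strict move-to-front over the integer alphabet 0..alphabet_size-1."""
    table = list(range(alphabet_size))
    ranks = []
    for s in symbols:
        r = table.index(s)
        ranks.append(r)
        table.insert(0, table.pop(r))
    return ranks


def mtf_decode(ranks, alphabet_size):
    """Inverse of mtf_encode."""
    table = list(range(alphabet_size))
    out = []
    for r in ranks:
        s = table.pop(r)
        out.append(s)
        table.insert(0, s)
    return out


def word_bwt_mtf(text):
    """Parse into word symbols, then apply BWT and MTF (entropy coding omitted)."""
    ids, vocab = tokens_to_ids(parse_words(text))
    last, primary = bwt(ids)
    return mtf_encode(last, len(vocab)), primary, vocab
```

In this sketch the symbol fed to the BWT is a whole token rather than a single character, so the alphabet grows with the vocabulary. The linear `table.index` scan in `mtf_encode` shows why a strict MTF list becomes costly on large alphabets, which is the situation the abstract's alternative ranking transformation is meant to address.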