Parsing strategies for BWT compression

Bibliographic Details
Published in: Proceedings DCC 2001. Data Compression Conference, pp. 429-438
Main Authors: Isal, R.Y.K.; Moffat, A.
Format: Conference Proceeding
Language: English
Published: IEEE, 2001
Abstract

Block-sorting is an innovative compression mechanism introduced by Burrows and Wheeler (1994), and has been the subject of considerable scrutiny in the years since it first became public. Block-sorting compression is usually described as involving three steps: permuting the input one block at a time through the use of the Burrows-Wheeler transform (BWT); applying a move-to-front (MTF) transform to each of the permuted blocks; and then entropy coding the output with a Huffman or arithmetic coder. In this paper we prepend a fourth transformation to this sequence: parsing. In the BWT implementations that have been considered to date the unit of transmission has been taken to be the ASCII character. But there is no particular reason why this should be so, and a range of other strategies can be used to construct the sequence of symbols that is fed into the BWT process. We consider some of the issues associated with making this change, and show that in some situations the introduction of a simple parsing stage allows improved compression to be obtained compared to an otherwise equivalent character-based BWT implementation. We also describe an MTF-like ranking transformation that caters better to large-alphabet situations than does the strict MTF rule used in conventional BWT implementations.
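
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The BWT here is a naive textbook version that sorts all rotations, the function names (bwt, inverse_bwt, mtf, parse_chars, parse_words) are hypothetical, and the word parser is only one assumed example of a parsing stage. The paper's own parsing schemes and its MTF-like ranking rule are not reproduced, and the entropy-coding step is omitted.

```python
import re


def parse_chars(text):
    """Conventional parsing: one symbol per character."""
    return list(text)


def parse_words(text):
    """Assumed alternative parsing: alternating word / non-word tokens.

    Joining the tokens back together reproduces the input exactly.
    """
    return re.findall(r"\w+|\W+", text)


def bwt(symbols):
    """Naive BWT over any sequence of orderable symbols.

    Sorts all cyclic rotations (quadratic space, demonstration only) and
    returns (last_column, primary_index), where primary_index is the row
    of the sorted matrix that holds the original sequence.
    """
    seq = list(symbols)
    n = len(seq)
    rows = sorted(range(n), key=lambda i: seq[i:] + seq[:i])
    last = [seq[(i - 1) % n] for i in rows]
    return last, rows.index(0)


def inverse_bwt(last, primary):
    """Invert bwt() by following a stable sort of the last column."""
    n = len(last)
    # order[j] is the position in `last` of the j-th symbol of the
    # sorted first column; sorted() is stable, which keeps ties aligned.
    order = sorted(range(n), key=lambda i: last[i])
    out = []
    j = primary
    for _ in range(n):
        j = order[j]
        out.append(last[j])
    return out


def mtf(symbols):
    """Strict move-to-front: emit each symbol's current rank, then move
    that symbol to the front of the list."""
    table = sorted(set(symbols))
    ranks = []
    for s in symbols:
        r = table.index(s)
        ranks.append(r)
        table.insert(0, table.pop(r))
    return ranks
```

For the character-parsed input "banana", bwt returns the last column "nnbaaa" with primary index 3, and mtf of that column gives the ranks 2, 0, 2, 2, 0, 0. Passing parse_words(text) instead of parse_chars(text) feeds word-level symbols through the same two transforms, which is the kind of substitution the abstract describes; the symbol alphabet then grows with the vocabulary, which is the large-alphabet situation the abstract mentions.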
Author Details
1. Isal, R.Y.K. (Dept. of Comput. Sci. & Software Eng., Melbourne Univ., Vic., Australia)
2. Moffat, A.
DOI 10.1109/DCC.2001.917174
Discipline Computer Science
ISBN 0769510310
9780769510316
ISSN 1068-0314
Subject Terms: Arithmetic; Australia Council; Computer science; Data compression; Decoding; Dictionaries; Entropy coding; Software engineering; Sorting; World Wide Web
URI https://ieeexplore.ieee.org/document/917174