Sub-optimal data compression and the subset sum problem

Bibliographic Details
Published in: International journal of electronics and communications, Vol. 65, no. 1, pp. 53-61
Main Authors: Katti, Raj; Srinivasan, Sudarshan
Format: Journal Article
Language: English
Published: Elsevier GmbH, 2011
Summary: We propose an efficient, sub-optimal prefix code construction method for discrete sources with a finite alphabet and known probability mass function (pmf). It is well known that for a source that emits symbols $x_i$ with probability $p_i$, the optimal codeword lengths are $l_i = \log(1/p_i)$. However, codeword lengths must be integers, and $\log(1/p_i)$ is, in general, not an integer. We propose a method to find binary codewords for $x_i$ whose lengths are initially assumed to be $\lceil \log(1/p_i) \rceil - 1$. Every prefix code must satisfy Kraft's inequality, but our initial codeword lengths may violate it. Using a simplified version of the subset sum problem, we find a minimal set of codeword lengths that must be increased from $\lceil \log(1/p_i) \rceil - 1$ to $\lceil \log(1/p_i) \rceil$ so that Kraft's inequality is satisfied. Although this solution is not optimal, it leads to average codeword lengths that are close to optimal and, in some cases, to codeword lengths that are optimal. Unlike the Huffman code, our solution does not require the probabilities in the pmf to be ordered. The efficiency of our method can be further improved by reducing the size of the subset sum problem. An example using English text shows that our method leads to a solution that is very close to the optimal solution. The proposed method can also be used for encryption, thereby accomplishing compression and encryption simultaneously.
ISSN: 1434-8411, 1618-0399
DOI: 10.1016/j.aeue.2010.01.011
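
The length-selection step described in the summary can be illustrated with a small sketch. It starts every symbol at length ⌈log2(1/p_i)⌉ − 1, computes how far the Kraft sum Σ 2^(−l_i) exceeds 1, and then picks a set of symbols whose lengths are raised by one bit so that the sum drops to 1 or below. Several things here are assumptions rather than the authors' procedure: the logarithm is taken base 2, "minimal" is read as "smallest added expected length", and the set is found by exhaustive search over subsets (workable only for small alphabets) instead of the simplified subset sum method the paper refers to. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def ceil_log2_inv(p):
    """Return ceil(log2(1/p)) for 0 < p < 1 using exact comparisons."""
    length = 0
    while Fraction(1, 2 ** length) > p:
        length += 1
    return length


def kraft_sum(lengths):
    """Exact Kraft sum: the sum of 2**-l over all codeword lengths."""
    return sum(Fraction(1, 2 ** l) for l in lengths)


def suboptimal_lengths(pmf):
    """Illustrative length selection (assumption: brute force, small alphabets).

    Start at ceil(log2(1/p_i)) - 1 and raise a cheapest set of lengths by one
    bit until Kraft's inequality holds.
    """
    probs = [Fraction(p) for p in pmf]
    if any(p <= 0 or p >= 1 for p in probs):
        raise ValueError("each probability must lie strictly between 0 and 1")

    upper = [ceil_log2_inv(p) for p in probs]   # ceil(log2(1/p_i))
    lengths = [u - 1 for u in upper]            # initial, possibly too short
    excess = kraft_sum(lengths) - 1
    if excess <= 0:
        return lengths                          # already a valid length set

    # Raising l_i by one bit lowers the Kraft sum by 2**-(l_i + 1) = 2**-upper[i].
    gain = [Fraction(1, 2 ** u) for u in upper]

    best = None                                 # (added expected length, subset)
    indices = range(len(probs))
    for size in range(1, len(probs) + 1):
        for subset in combinations(indices, size):
            if sum(gain[i] for i in subset) >= excess:
                cost = sum(probs[i] for i in subset)
                if best is None or cost < best[0]:
                    best = (cost, subset)
    if best is None:
        raise ValueError("no valid selection; does the pmf sum to more than 1?")

    for i in best[1]:
        lengths[i] += 1
    return lengths


print(suboptimal_lengths([0.4, 0.3, 0.2, 0.1]))  # [1, 2, 3, 3]
```

For the pmf (0.4, 0.3, 0.2, 0.1) the initial lengths are (1, 1, 2, 3) with Kraft sum 1.375, so the excess is 0.375. Raising the second and third lengths removes 0.25 + 0.125 = 0.375 exactly, giving (1, 2, 3, 3), which happens to coincide with the Huffman lengths for this source.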