Sub-optimal data compression and the subset sum problem
Published in | International journal of electronics and communications Vol. 65; no. 1; pp. 53 - 61 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Elsevier GmbH, 2011 |
Summary: | We propose an efficient, sub-optimal prefix code construction method for discrete sources with a finite alphabet and known probability mass function (pmf). It is well known that for a source that emits symbols $x_i$ with probability $p_i$, the optimal codeword lengths are $l_i = \log(1/p_i)$. However, codeword lengths are integers and $\log(1/p_i)$ is, in general, not an integer. We propose a method to find binary codewords for $x_i$ whose lengths are initially assumed to be $\lceil \log(1/p_i) \rceil - 1$. Every prefix code must satisfy Kraft's inequality, but our initial codeword lengths may not satisfy it. Using a simplified version of the subset sum problem, we find a minimal set of codeword lengths that must be increased from $\lceil \log(1/p_i) \rceil - 1$ to $\lceil \log(1/p_i) \rceil$ so that Kraft's inequality is satisfied. Even though this solution is not optimal, it leads to average codeword lengths that are close to optimal and, in some cases, to codeword lengths that are optimal. Unlike the Huffman code, our solution does not require the ordering of probabilities in the pmf. The efficiency of our method can be further improved by reducing the size of the subset sum problem. The example of English text shows that our method leads to a solution that is very close to the optimal solution. The proposed method can also be used for encryption, thereby accomplishing both compression and encryption simultaneously. |
---|---|
ISSN: | 1434-8411 1618-0399 |
DOI: | 10.1016/j.aeue.2010.01.011 |
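
The length-selection step described in the summary can be illustrated with a short Python sketch. It is not the authors' algorithm: the record does not specify their "simplified" subset sum procedure, so this version substitutes a plain dynamic program. It also assumes base-2 logarithms and reads "minimal set" as the set of increases that adds the least to the average codeword length. The function names `ceil_log2_inv`, `kraft_sum` and `suboptimal_lengths` are hypothetical.

```python
import math


def ceil_log2_inv(p):
    """Smallest integer c with 2**-c <= p, i.e. ceil(log2(1/p))."""
    c = 0
    while 2.0 ** -c > p:
        c += 1
    return c


def kraft_sum(lengths):
    """Left-hand side of Kraft's inequality for binary codeword lengths."""
    return sum(2.0 ** -l for l in lengths)


def suboptimal_lengths(pmf):
    """Start from ceil(log2(1/p_i)) - 1 and lengthen a cheapest subset by one
    bit so that Kraft's inequality holds (assumed reading of the summary)."""
    if len(pmf) < 2 or any(p <= 0 for p in pmf):
        raise ValueError("need at least two symbols with positive probability")

    upper = [ceil_log2_inv(p) for p in pmf]   # ceil(log2(1/p_i))
    lengths = [c - 1 for c in upper]          # initial, possibly invalid
    lmax = max(upper)

    # Work in integer units of 2**-lmax. Raising length i from upper[i] - 1
    # to upper[i] lowers the Kraft sum by weight[i] units.
    weight = [2 ** (lmax - c) for c in upper]
    deficit = 2 * sum(weight) - 2 ** lmax     # units by which Kraft is exceeded
    if deficit <= 0:
        return lengths                        # initial lengths already valid

    # best[r] = (added average length, chosen indices) for a total reduction
    # of r units, with every total >= deficit folded into best[deficit].
    best = [(math.inf, ())] * (deficit + 1)
    best[0] = (0.0, ())
    for i, (w, p) in enumerate(zip(weight, pmf)):
        for r in range(deficit, -1, -1):      # descending: each symbol used once
            cost, chosen = best[r]
            if cost == math.inf:
                continue
            t = min(r + w, deficit)
            if cost + p < best[t][0]:
                best[t] = (cost + p, chosen + (i,))

    cost, chosen = best[deficit]
    if cost == math.inf:
        raise ValueError("no valid set found; do the probabilities sum to 1?")
    for i in chosen:
        lengths[i] += 1
    return lengths
```

For the pmf (0.4, 0.3, 0.2, 0.1) the initial lengths are (1, 1, 2, 3), whose Kraft sum is 1.375. The sketch lengthens the second and third codewords and returns (1, 2, 3, 3), which has a Kraft sum of exactly 1 and an average length of 1.9 bits, the same lengths a Huffman code assigns to this pmf. Note that the table has one entry per unit of deficit, so its size grows with 2 raised to the longest codeword length; the summary's remark about reducing the size of the subset sum problem is not modelled here.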