Natural-Language Text Compression Using Reverse Multi-Delimiter Codes

Bibliographic Details
Published in	Cybernetics and Systems Analysis, Vol. 60, no. 1, pp. 1-12
Main Authors	Anisimov, A. V., Zavadskyi, I. O., Chudakov, T. S.
Format	Journal Article
Language	English
Published	New York: Springer US; Springer Nature B.V., 01.01.2024
Subjects	Algorithms; Artificial Intelligence; Codes; Compression ratio; Control; Cybernetics; Data compression; Data retrieval; Decoding; Mapping; Mathematics; Mathematics and Statistics; Natural language processing; Processor Architectures; Software Engineering/Programming and Operating Systems; Systems Theory; archiver; code; compression; multi-delimiter
ISSN	1060-0396 1573-8337
DOI	10.1007/s10559-024-00641-2

Summary:	This paper studies binary reverse multi-delimiter (RMD) data compression codes. RMD codes have a range of useful properties, such as unique decodability, completeness, universality, synchronizability, recognition using a finite automaton, and the ability for rapid data retrieval within an encoded file. The authors have constructed a simple monotonic mapping from the set of non-negative integers to the codeword set. Based on this mapping, they have developed a fast byte-aligned decoding algorithm. Computer experiments demonstrate that RMD codes can be decoded nearly as quickly as the SCDC code and several times faster than the Fibonacci code. Compared to known codes of a similar type, RMD codes exhibit a better compression ratio for natural language texts (more than four times closer to the entropy bound than SCDC). Additionally, the paper describes a technology for preprocessing natural language texts, which, combined with encoding using RMD codes, enhances the efficiency of powerful modern archivers.