Tag based models for Arabic text compression

Bibliographic Details
Published in: 2017 Intelligent Systems Conference (IntelliSys), pp. 697-705
Main Authors: Alkhazi, Ibrahim S.; Alghamdi, Mansoor A.; Teahan, William J.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.09.2017
Summary: Text compression reduces both the space required to store the information contained in a text and the time needed to transmit it. Compression-based models such as the Prediction by Partial Matching (PPM) compression scheme have proved very effective for many natural language processing tasks, such as authorship ascription, text categorization, and word segmentation, in various languages, including English, Chinese and Arabic. This paper therefore explores an approach to compressing Arabic text that uses part-of-speech tags along with the text itself, based on the PPM compression scheme. The new approach produces significantly better compression results than state-of-the-art compression algorithms for Arabic text.
DOI: 10.1109/IntelliSys.2017.8324370