Applications of N-Grams in Textual Information Systems
Published in | Journal of documentation Vol. 54; no. Jan; pp. 48 - 69 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | 01.01.1998 |
Summary: | The use of n-grams in textual information systems is reviewed to familiarize nonexperts with the basic elements of this approach to textual processing. An n-gram is defined as a sequence of n characters derived from a text string that contains at least n characters. Text is usually divided into di- or trigrams according to the logic of adjacency, binarism, or some nonlocational approach. These approaches may be used for word conflation & information retrieval, and also for text compression. Recent efforts have been made to extend the use of n-grams to indexing, searching, & retrieving spoken documents, e.g., those in multimedia information systems. It is stressed that although the use of n-grams to date has been textually based, they could potentially be applied to any string of symbols, e.g., ASCII characters. Adapted from the source document. |
---|---|
ISSN: | 0022-0418 |
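
The summary's notion of adjacent di- and trigrams, and their use for word conflation, can be illustrated with a minimal sketch. The function names `char_ngrams` and `dice_similarity` are hypothetical, and the Dice coefficient is only one common choice of similarity measure, assumed here for illustration; the record does not say which measure the article discusses.

```python
def char_ngrams(text, n=2):
    """Return the overlapping n-grams of adjacent characters in text.

    A string shorter than n yields an empty list.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def dice_similarity(a, b, n=2):
    """Dice coefficient over the sets of unique n-grams of two strings.

    Assumption: Dice is used as an illustrative measure for word
    conflation; other coefficients could be substituted.
    """
    grams_a = set(char_ngrams(a, n))
    grams_b = set(char_ngrams(b, n))
    if not grams_a and not grams_b:
        return 0.0
    shared = len(grams_a & grams_b)
    return 2 * shared / (len(grams_a) + len(grams_b))


print(char_ngrams("text", 2))                   # ['te', 'ex', 'xt']
print(dice_similarity("retrieval", "retrieve"))  # 0.8
```

Word variants such as "retrieval" and "retrieve" share most of their digrams, so a similarity threshold could group them without a language-specific stemmer. Because the functions operate on any Python string, the same sketch applies to arbitrary symbol sequences, not only natural-language text.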