Applications of N-Grams in Textual Information Systems

Bibliographic Details
Published in	Journal of Documentation Vol. 54; no. Jan; pp. 48-69
Main Authors	Robertson, Alexander M, Willett, Peter
Format	Journal Article
Language	English
Published	01.01.1998

Summary:	The use of n-grams in textual information systems is reviewed to familiarize nonexperts with the basic elements of this approach to textual processing. An n-gram is defined as a string of n characters derived from a text string that contains at least n characters. Characters are usually grouped into digrams or trigrams according to the logic of adjacency, binarism, or some nonlocational approach. These approaches may be used for word conflation and information retrieval, and also for text compression. Recent efforts have been made to extend the use of n-grams to index, search, and retrieve spoken documents, e.g., those in multimedia information systems. It is stressed that although the use of n-grams to date has been textually based, they may potentially be used for any string of symbols, e.g., ASCII characters. Adapted from the source document
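The adjacency-based extraction and word-conflation use mentioned in the summary can be illustrated with a minimal Python sketch. It assumes overlapping n-grams of adjacent symbols and, for conflation, the Dice coefficient over sets of unique digrams. Dice is a common similarity measure in this literature, but it is not necessarily the one the review emphasizes. The function names `ngrams` and `dice` are hypothetical. Because the functions accept any sequence of hashable symbols, the same code works on a list of phoneme labels, in line with the point that n-grams are not restricted to text.

```python
from __future__ import annotations

from typing import Hashable, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def ngrams(seq: Sequence[T], n: int) -> list[tuple[T, ...]]:
    """Return the overlapping n-grams of adjacent symbols in seq.

    Assumption: adjacency-based extraction; returns [] when len(seq) < n.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def dice(a: Sequence[T], b: Sequence[T], n: int = 2) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) over sets of unique n-grams.

    Hypothetical similarity helper for word conflation (digrams by default).
    """
    set_a, set_b = set(ngrams(a, n)), set(ngrams(b, n))
    total = len(set_a) + len(set_b)
    if total == 0:
        return 0.0  # neither input is long enough to yield any n-gram
    return 2 * len(set_a & set_b) / total


if __name__ == "__main__":
    print(ngrams("text", 2))
    print(dice("statistics", "statistical"))
    # Any symbol string works, e.g. hypothetical phoneme labels:
    print(dice(["k", "ae", "t"], ["k", "ae", "t", "s"]))
```

For example, `dice("statistics", "statistical")` finds 6 shared digrams among 7 and 8 unique digrams, giving 2 × 6 / 15 = 0.8. A conflation system along these lines would group word pairs whose score exceeds some chosen threshold.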
ISSN:	0022-0418