Applications of N-Grams in Textual Information Systems

Bibliographic Details
Published in Journal of documentation Vol. 54; no. Jan; pp. 48-69
Main Authors Robertson, Alexander M, Willett, Peter
Format Journal Article
Language English
Published 01.01.1998
Summary: The use of n-grams in textual information systems is reviewed to familiarize nonexperts with the basic elements of this approach to textual processing. An n-gram is defined as a sequence of n characters derived from a text string that contains at least n characters. Text is usually divided into di- or trigrams (n = 2 or 3) according to the logic of adjacency, binarism, or some nonlocational approach. These approaches may be used for word conflation and information retrieval, and also for text compression. Recent efforts have been made to extend the use of n-grams to index, search, and retrieve spoken documents, e.g., those in multimedia information systems. It is stressed that, although the use of n-grams to date has been textually based, they could potentially be used for any sort of string of symbols, e.g., ASCII characters. Adapted from the source document.
ISSN:0022-0418
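
Illustration: the summary's notion of adjacent n-grams and their use for word conflation can be sketched in a few lines of Python. This is a minimal sketch, not the authors' method. The function names `char_ngrams` and `dice_similarity` are hypothetical, and the Dice coefficient is an assumed choice of similarity measure that this record does not name.

```python
def char_ngrams(text, n=2):
    """Return the overlapping (adjacent) character n-grams of text, in order.

    A string shorter than n characters yields an empty list.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def dice_similarity(a, b, n=2):
    """Dice coefficient over the sets of distinct n-grams of two strings.

    Assumption: Dice is used here only as one common way to compare
    n-gram sets; the record itself does not specify a measure.
    """
    grams_a = set(char_ngrams(a, n))
    grams_b = set(char_ngrams(b, n))
    if not grams_a and not grams_b:
        return 0.0  # neither string is long enough to produce an n-gram
    shared = grams_a & grams_b
    return 2 * len(shared) / (len(grams_a) + len(grams_b))


if __name__ == "__main__":
    print(char_ngrams("retrieval", 3))
    print(dice_similarity("retrieve", "retrieval"))
```

Word variants that share most of their digrams score close to 1 and could be conflated above some chosen threshold, while unrelated words score near 0. Because the functions only slice and compare strings, the same sketch works for any sequence of symbols that Python can represent as a string.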