CoCITe-Coordinating Changes in Text

Text streams are ubiquitous and contain a wealth of information, but are typically orders of magnitude too large in scale for comprehensive human inspection. There is a need for tools that can detect and group changes occurring within text streams and substreams, in order to find, structure, and sum...

Full description

Saved in:
Bibliographic Details
Published inIEEE transactions on knowledge and data engineering Vol. 24; no. 1; pp. 15 - 29
Main Authors Wright, J. H., Grothendieck, J.
Format Journal Article
LanguageEnglish
Published New York IEEE 01.01.2012
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Text streams are ubiquitous and contain a wealth of information, but are typically orders of magnitude too large in scale for comprehensive human inspection. There is a need for tools that can detect and group changes occurring within text streams and substreams, in order to find, structure, and summarize these changes for presentation to human analysts. This paper describes a procedure for efficiently finding step changes, trends, bursts, and cyclic changes affecting frequencies of words, or more general lexical items, within streams of documents which may be optionally labeled with metadata. The common phenomenon of over-dispersion is accommodated using mixture distributions. A streaming implementation is described which can process data from a continuous feed. Anomalies can be detected, grouped, and rendered visually for human comprehension.
ISSN:1041-4347
1558-2191
DOI:10.1109/TKDE.2010.250