TOPIC DETECTION OF UNRESTRICTED TEXTS: APPROACHES AND EVALUATIONS
Published in | Applied artificial intelligence Vol. 19; no. 2; pp. 119 - 135 |
---|---|
Main Author | |
Format | Journal Article |
Language | English |
Published | Taylor & Francis Group, 26.01.2005 |
Summary: | Topic detection and tracking refers to automatic techniques for locating topically related, cohesive paragraphs in a stream of text. Most documents are about more than one subject, but many Natural Language Processing (NLP) and Information Retrieval (IR) techniques implicitly assume that a document has just one topic. Even when a document has a single topic, it may address multiple subtopics and various aspects of the primary topic. Hence, dividing documents into topically coherent units and discovering their topics might have many uses. We describe new clues that account for the topical grouping of contiguous portions of the text. These clues are based on general lexical resources, which makes them applicable to unrestricted texts, and they can have many uses, such as helping users find answers to general questions in an information search task, in question-answering systems, or in text summarization. We devise an algorithm for identifying these clues, and we report on their performance as well as the improvements suggested by our experiments. |
---|---|
ISSN: | 0883-9514 1087-6545 |
DOI: | 10.1080/08839510590887441 |