Generic method for detecting focus time of documents

Bibliographic Details
Published in: Information Processing & Management, Vol. 51, no. 6, pp. 851-868
Main Authors: Jatowt, Adam; Au Yeung, Ching Man; Tanaka, Katsumi
Format: Journal Article
Language: English
Published: Oxford, Elsevier Ltd, 01.11.2015; Elsevier Science Ltd

Summary:
• Statistical approach for estimating the focus time of text documents.
• Classification framework for categorizing documents into temporal and atemporal.
• Bi-Temporal Document Representation using document focus time and creation time.

Time is an important aspect of text documents. While some documents are atemporal, many have strong temporal characteristics and contain content related to time. Such documents can be mapped to their corresponding time periods. In this paper, we propose estimating the focus time of documents, defined as the time period to which a document's content refers and considered a dimension complementary to the document's creation time. We propose several estimators of focus time that utilize statistical knowledge from external resources such as news article collections. The advantage of our approach is that document focus time can be estimated even for documents that contain few or no temporal expressions. We evaluate the effectiveness of our methods on diverse datasets of documents about historical events related to 5 countries. Our approach achieves an average error of less than 21 years on collections of Wikipedia pages, extracts from history-related books, and web pages, while using a total time frame of 113 years. We also demonstrate an example classification method to distinguish temporal from atemporal documents.
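The summary says focus time is estimated from statistical knowledge in external, dated collections such as news archives, but it does not spell out the estimators. As a rough illustration only, the Python sketch below assumes one simple scheme: each term is associated with the years of the reference texts it occurs in, a document's focus year is the year with the highest summed association, and a document is labeled temporal when that year distribution has low normalized entropy. All function names, the whitespace tokenization, the argmax rule, the entropy criterion and the 0.8 threshold are hypothetical choices, not the estimators or classifier from the paper. "Average error" is assumed here to mean the mean absolute difference in years.

```python
# Hypothetical sketch of focus-time estimation from a dated reference corpus.
# NOT the paper's estimators: the scoring, argmax rule and entropy threshold
# are illustrative assumptions.
from collections import Counter, defaultdict
import math


def term_year_associations(reference_corpus):
    """Map each term to a probability distribution over years.

    reference_corpus: iterable of (year, text) pairs, e.g. dated news
    articles (hypothetical input format).
    """
    counts = defaultdict(Counter)  # term -> Counter({year: number of texts})
    for year, text in reference_corpus:
        for term in set(text.lower().split()):  # count each term once per text
            counts[term][year] += 1
    assoc = {}
    for term, by_year in counts.items():
        total = sum(by_year.values())
        assoc[term] = {year: c / total for year, c in by_year.items()}
    return assoc


def year_distribution(document, assoc):
    """Sum the year distributions of the document's known terms, then normalize."""
    scores = Counter()
    for term in document.lower().split():
        for year, p in assoc.get(term, {}).items():
            scores[year] += p
    total = sum(scores.values())
    return {year: s / total for year, s in scores.items()} if total else {}


def estimate_focus_year(document, assoc):
    """Return the highest-scoring year, or None if no term is known."""
    dist = year_distribution(document, assoc)
    return max(dist, key=dist.get) if dist else None


def normalized_entropy(dist, num_years):
    """Entropy of a year distribution, scaled to [0, 1] by log(num_years)."""
    if num_years < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in dist.values() if p > 0)
    return h / math.log(num_years)


def is_temporal(document, assoc, threshold=0.8):
    """Label a document temporal if its year distribution is concentrated.

    The 0.8 threshold is an arbitrary illustrative value.
    """
    dist = year_distribution(document, assoc)
    if not dist:
        return False  # no evidence about time at all
    num_years = len({year for d in assoc.values() for year in d})
    return normalized_entropy(dist, num_years) < threshold


def mean_absolute_error(estimates, gold):
    """Average absolute difference in years between estimates and gold years."""
    return sum(abs(e - g) for e, g in zip(estimates, gold)) / len(gold)


if __name__ == "__main__":
    # Toy "reference corpus" of dated texts (made-up examples).
    corpus = [
        (1914, "war archduke assassination sarajevo"),
        (1914, "war mobilization trenches"),
        (1969, "moon landing apollo astronauts"),
        (1969, "apollo moon mission"),
    ]
    assoc = term_year_associations(corpus)
    print(estimate_focus_year("the apollo astronauts reached the moon", assoc))
    print(is_temporal("war apollo", assoc))
    print(mean_absolute_error([1969, 1920], [1969, 1914]))
```

The toy corpus only shows the data flow; a realistic reference collection would also need proper tokenization, stopword handling and smoothing for rare terms before the year distributions become meaningful.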
ISSN: 0306-4573; 1873-5371
DOI: 10.1016/j.ipm.2015.05.001