Automatic new topic identification using multiple linear regression

Bibliographic Details
Published in: Information processing & management, Vol. 42, no. 4, pp. 934-950
Main Author: Ozmutlu, Seda
Format: Journal Article
Language: English
Published: Oxford, Elsevier Ltd, 01.07.2006
Elsevier Science
Elsevier Science Ltd
ISSN: 0306-4573
1873-5371
DOI: 10.1016/j.ipm.2005.10.002

Summary: The purpose of this study is to identify new topics automatically in search engine query logs and to estimate the effect of the statistical characteristics of search engine queries on new topic identification. By applying multiple linear regression and multi-factor ANOVA to a sample data log from the Excite search engine, we demonstrated that the statistical characteristics of Web search queries, such as time interval, search pattern, and position of a query in a user session, affect shifts to a new topic. Multiple linear regression is also a successful tool for estimating topic shifts and continuations. The findings of this study provide statistical proof for the relationship between the non-semantic characteristics of Web search queries and the occurrence of topic shifts and continuations.
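
As a rough illustration of the approach the summary describes, the sketch below fits a multiple linear regression to the three non-semantic query characteristics it names and thresholds the fitted value to label each query as a topic shift or a continuation. This is not the study's code or data: the toy rows, the binary encoding of search pattern, the function names `fit_linear` and `predict_shift`, and the 0.5 cutoff are all assumptions made for the example.

```python
import numpy as np

# Made-up toy data, NOT from the Excite log. One row per query:
#   column 0: time interval since the previous query, in minutes
#   column 1: search pattern code (hypothetical: 0 = terms overlap with
#             the previous query, 1 = no overlap)
#   column 2: position of the query in the user session
X = np.array([
    [0.5, 0, 2],
    [1.0, 0, 3],
    [25.0, 1, 4],
    [0.8, 1, 5],
    [40.0, 0, 2],
    [2.0, 0, 6],
    [30.0, 1, 3],
    [1.5, 0, 4],
], dtype=float)

# Made-up labels: 1 = topic shift, 0 = topic continuation.
y = np.array([0, 0, 1, 0, 1, 0, 1, 0], dtype=float)


def design_matrix(X):
    """Prepend a column of ones so the model has an intercept."""
    return np.column_stack([np.ones(len(X)), X])


def fit_linear(X, y):
    """Ordinary least squares fit; returns [intercept, b1, b2, b3]."""
    coef, *_ = np.linalg.lstsq(design_matrix(X), y, rcond=None)
    return coef


def predict_shift(coef, X, threshold=0.5):
    """Label a query as a shift (1) when the fitted value reaches the
    threshold. The 0.5 cutoff is an assumption of this sketch."""
    return (design_matrix(X) @ coef >= threshold).astype(int)


coef = fit_linear(X, y)
pred = predict_shift(coef, X)
print("coefficients:", coef)
print("predicted shifts:", pred)
```

The fitted coefficients give a first indication of how each characteristic moves the estimate. Testing whether those effects are statistically meaningful, as the study does with multi-factor ANOVA, would need a separate analysis on real session data.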