An information-theoretic perspective of tf–idf measures


Bibliographic Details
Published in: Information Processing & Management, Vol. 39, no. 1, pp. 45-65
Main Author: Aizawa, Akiko
Format: Journal Article
Language: English
Published: Oxford, Elsevier Ltd, 2003
Summary: This paper presents a mathematical definition of the “probability-weighted amount of information” (PWI), a measure of the specificity of terms in documents that is based on an information-theoretic view of retrieval events. The proposed PWI is expressed as a product of the occurrence probabilities of terms and their amounts of information, and it corresponds well with the conventional term frequency–inverse document frequency (tf–idf) measures commonly used in today’s information retrieval systems. The mathematical definition of the PWI is given, together with some illustrative examples of the calculation.
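The summary describes the PWI as a product of term occurrence probabilities and amounts of information. As a minimal sketch of why such a product resembles tf-idf, assume (these estimates are illustrative and not taken from the paper) that the joint probability P(t, d) is the count f(t, d) divided by the total number F of term occurrences in the collection, and that the amount of information carried by term t is -log(df(t)/N) = log(N/df(t)), where N is the number of documents and df(t) the number of documents containing t. Under these assumptions the product is the raw term frequency times idf, scaled by the constant 1/F. The function name `pwi_weights` is hypothetical.

```python
import math
from collections import Counter


def pwi_weights(docs):
    """Hypothetical sketch: weight(t, d) = P(t, d) * log(N / df(t)).

    Assumptions (not from the paper): P(t, d) = f(t, d) / F, where F is the
    total number of term occurrences in the collection, and the amount of
    information of term t is -log(df(t) / N), i.e. the classic idf.
    """
    n_docs = len(docs)                                # N
    counts = [Counter(doc) for doc in docs]           # f(t, d) per document
    total = sum(sum(c.values()) for c in counts)      # F
    df = Counter()
    for c in counts:
        df.update(c.keys())                           # document frequency df(t)
    # Probability of the (term, document) event times its amount of information.
    return [
        {t: (f / total) * math.log(n_docs / df[t]) for t, f in c.items()}
        for c in counts
    ]


if __name__ == "__main__":
    docs = [["a", "b", "a"], ["b", "c"], ["c", "d"]]
    for i, w in enumerate(pwi_weights(docs)):
        print(i, w)
```

For the toy collection above, N = 3 and F = 7, so the weight of "a" in the first document is (2/7)·log 3. A term that occurs in every document would receive weight 0, because log(N/N) = 0, which mirrors how idf discounts non-specific terms.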
ISSN: 0306-4573, 1873-5371
DOI: 10.1016/S0306-4573(02)00021-3