An information-theoretic perspective of tf–idf measures
Published in | Information processing & management Vol. 39; no. 1; pp. 45 - 65 |
---|---|
Main Author | |
Format | Journal Article |
Language | English |
Published | Oxford, Elsevier Ltd, 2003 (Elsevier Science, Elsevier Science Ltd) |
Subjects | |
Summary: | This paper presents a mathematical definition of the “probability-weighted amount of information” (PWI), a measure of specificity of terms in documents that is based on an information-theoretic view of retrieval events. The proposed PWI is expressed as a product of the occurrence probabilities of terms and their amounts of information, and corresponds well with the conventional term frequency–inverse document frequency measures that are commonly used in today’s information retrieval systems. The mathematical definition of the PWI is shown, together with some illustrative examples of the calculation. |
---|---|
ISSN: | 0306-4573 1873-5371 |
DOI: | 10.1016/S0306-4573(02)00021-3 |
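
The summary describes a term weight formed as the product of a term's occurrence probability and its amount of information, and says this corresponds to conventional tf–idf. The Python sketch below illustrates that general reading only; it is not the paper's PWI definition, which this record does not reproduce. The estimators are assumptions: occurrence probability is taken as the term's count in a document divided by the total number of tokens in the collection, and the amount of information as log2(N / df), where N is the number of documents and df the number of documents containing the term. The function name `probability_weighted_information` is hypothetical.

```python
import math
from collections import Counter


def probability_weighted_information(docs):
    """Return {doc_index: {term: weight}} for a list of tokenized documents.

    Assumed estimators (not taken from the paper):
      probability  = count of term in the document / total tokens in collection
      information  = log2(number of documents / document frequency of term)
    The weight is their product, which has the familiar tf * idf shape.
    """
    n_docs = len(docs)
    total_tokens = sum(len(doc) for doc in docs)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))

    weights = {}
    for index, doc in enumerate(docs):
        term_counts = Counter(doc)
        weights[index] = {
            term: (count / total_tokens) * math.log2(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        }
    return weights


if __name__ == "__main__":
    collection = [["a", "b", "a"], ["a", "c"]]
    for index, terms in probability_weighted_information(collection).items():
        print(index, terms)
```

Under these assumptions a term that occurs in every document carries zero information and therefore zero weight, which matches the usual behaviour of inverse document frequency.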