ON A PROBABILISTIC APPROACH TO DETERMINING THE SIMILARITY BETWEEN BOOLEAN SEARCH REQUEST FORMULATIONS

Bibliographic Details
Published in: Journal of Documentation, Vol. 38, no. 1, pp. 14-28
Main Author: Radecki, Tadeusz
Format: Journal Article
Language: English
Published: London: MCB UP Ltd, 01.01.1982; Aslib, etc.
ISSN: 0022-0418; 1758-7379
DOI: 10.1108/eb026719

Summary: A new and promising approach to document clustering consists of utilizing previously formed clusters of queries to cluster documents. To employ this approach in practice, a similarity measure for queries must be available. This requirement does not cause any problem in information retrieval systems in which both the search request formulations and the document representations are sets of weighted or unweighted index terms. However, in most operational retrieval systems, search request formulations are Boolean combinations of index terms. Research into similarity measures for search request formulations of this type has already been undertaken by the author and reported elsewhere. The present paper provides further results of investigations in this area. The novelty of the approach discussed is the incorporation, within the methodology described earlier, of a weighting mechanism to indicate the relative importance of particular attributes of a given Boolean search request formulation. The modification suggested is based on the standard probabilistic approach to information retrieval.
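The summary notes that a query similarity measure is unproblematic when queries are plain sets of weighted or unweighted index terms. A minimal sketch of that easy case follows, using two standard measures: the Dice coefficient for unweighted term sets and cosine similarity for weighted ones. This is an illustration only. It is not the paper's measure for Boolean formulations and does not reproduce its probabilistic weighting; the helper names `dice` and `cosine` and the sample terms are hypothetical.

```python
import math


def dice(q1: set[str], q2: set[str]) -> float:
    """Dice coefficient for unweighted term-set queries: 2|A & B| / (|A| + |B|).

    Illustrative helper (hypothetical name), not the paper's Boolean measure.
    """
    if not q1 and not q2:
        return 0.0  # two empty queries: treat as no evidence of similarity
    return 2 * len(q1 & q2) / (len(q1) + len(q2))


def cosine(q1: dict[str, float], q2: dict[str, float]) -> float:
    """Cosine similarity for weighted term-set queries (term -> weight)."""
    # Dot product over the terms the two queries share.
    dot = sum(w * q2[t] for t, w in q1.items() if t in q2)
    n1 = math.sqrt(sum(w * w for w in q1.values()))
    n2 = math.sqrt(sum(w * w for w in q2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


if __name__ == "__main__":
    a = {"document", "clustering", "query"}
    b = {"document", "clustering", "retrieval", "boolean"}
    print(dice(a, b))  # 2 shared terms: 2*2 / (3 + 4), prints 0.5714285714285714
    print(cosine({"document": 1.0, "clustering": 2.0},
                 {"document": 2.0, "retrieval": 1.0}))
```

Boolean formulations such as `(document AND clustering) OR query` are not sets of terms, so neither function applies to them directly. That gap is the one the paper addresses.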