Comparative Analysis of Relevance for SVM-Based Interactive Document Retrieval
Published in | Journal of advanced computational intelligence and intelligent informatics Vol. 17; no. 2; pp. 149 - 156 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | 20.03.2013 |
Summary: | Support Vector Machines (SVMs) have been applied to interactive document retrieval that uses active learning. In such a retrieval system, the degree of relevance is evaluated using the signed distance from the optimal hyperplane. It is not clear, however, how the signed distance in SVMs relates to the characteristics of the vector space model. We therefore formulated the degree of relevance using the signed distance in SVMs and compared it analytically with a conventional Rocchio-based method. Although vector normalization has been widely used as preprocessing for document retrieval, few studies have explained why it is effective. Based on our comparative analysis, we show theoretically that normalizing document vectors is effective in SVM-based interactive document retrieval. We then propose a cosine kernel suited to SVM-based interactive document retrieval. The effectiveness of the method was compared experimentally with conventional relevance feedback for Boolean, Term Frequency, and Term Frequency-Inverse Document Frequency representations of document vectors. Experimental results on a Text REtrieval Conference data set showed that the cosine kernel is effective for all document representations, especially the Term Frequency representation. |
---|---|
ISSN: | 1343-0130 1883-8014 |
DOI: | 10.20965/jaciii.2013.p0149 |
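
The summary connects vector normalization to a cosine kernel. As a minimal sketch of that connection, the snippet below assumes the kernel is the standard cosine similarity between two document vectors (this record does not give the paper's exact definition) and checks that it matches a plain dot product taken after L2-normalizing both vectors. The vectors `doc_a` and `doc_b` and all function names are hypothetical, chosen only for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale v to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v] if norm > 0 else list(v)


def cosine_kernel(u, v):
    """Assumed kernel: cosine similarity, defined as 0.0 if either vector is zero."""
    denom = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / denom if denom > 0 else 0.0


# Hypothetical term-frequency vectors over a four-term vocabulary.
doc_a = [3.0, 0.0, 1.0, 2.0]
doc_b = [1.0, 1.0, 0.0, 4.0]

# Cosine kernel on the raw vectors.
k_cos = cosine_kernel(doc_a, doc_b)

# Linear (dot-product) kernel on the L2-normalized vectors.
k_lin_normalized = dot(l2_normalize(doc_a), l2_normalize(doc_b))

# The kernel ignores vector length: scaling a document leaves it unchanged.
k_scaled = cosine_kernel([10 * x for x in doc_a], doc_b)
```

Under this assumption, `k_cos`, `k_lin_normalized`, and `k_scaled` agree up to floating-point error, so a linear SVM trained on normalized vectors and an SVM using this kernel on raw vectors see the same Gram matrix.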