Reducing Redundant Information in Search Results Employing Approximation Algorithms

Bibliographic Details
Published in	Database and Expert Systems Applications, pp. 240-247
Main Authors	Makris, Christos; Plegas, Yannis; Stamatiou, Yannis C.; Stavropoulos, Elias C.; Tsakalidis, Athanasios K.
Format	Book Chapter
Language	English
Published	Cham: Springer International Publishing
Series	Lecture Notes in Computer Science
Subjects	approximation algorithms; ranking; redundant information; semantics; shingling; SuperTexts; Web search

Summary:	It is widely accepted that there are many Web documents that contain identical or near-identical information. Modern search engines have developed duplicate detection algorithms to eliminate this problem in the search results, but difficulties still remain, mainly because the structure and the content of the results could not be changed. In this work we propose an effective methodology for removing redundant information from search results. Using previous methodologies, we extract from the search results a set of composite documents called SuperTexts and then, by applying novel approximation algorithms, we select the SuperTexts that better reduce the redundant information. The final results are next ranked according to their relevance to the initial query. We give some complexity results and experimentally evaluate the proposed algorithms.
Bibliography:	This research has been co-financed by the European Union (European Social Fund – ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: Thales; Investing in knowledge society through the European Social Fund.
ISBN:	9783319100845, 331910084X
ISSN:	0302-9743, 1611-3349
DOI:	10.1007/978-3-319-10085-2_22