An Approach to Improving Quality of Crawlers Using Naïve Bayes for Classifier and Hyperlink Filter
Nowadays, most of search engines rely on keywords provided by users. However, keywords may not be sufficiently representative for the main topic of a web page. When searching for a topic, users input their desirable topic in terms of keywords. Keyword-based search engines will return pages that cont...
Saved in:
Published in | Computational Collective Intelligence. Technologies and Applications pp. 525 - 535 |
---|---|
Main Authors | , |
Format | Book Chapter |
Language | English |
Published |
Berlin, Heidelberg
Springer Berlin Heidelberg
|
Series | Lecture Notes in Computer Science |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Nowadays, most of search engines rely on keywords provided by users. However, keywords may not be sufficiently representative for the main topic of a web page. When searching for a topic, users input their desirable topic in terms of keywords. Keyword-based search engines will return pages that contain the keywords even though these pages are not about the topic. This limits the efficiency of these engines as they may return undesirable result.
In this paper, we present an approach to improve the quality of search engines by focusing on web pages related to specific topics. Our system includes three main components: a crawler for gathering web pages, a classifier for classifying web pages by topics, and a hyperlink filter (or distiller) for filtering hyperlinks. We propose Naïve Bayes algorithms for classifier and distiller to enhance the accuracy of the system. We also implement and examine the efficiency of our system by gathering web pages in two topics: Artificial Intelligence and Motorcycle. The result shows that our crawler achieves performance improvements in efficiency over the ones that search by keywords. |
---|---|
ISBN: | 9783642346293 3642346294 |
ISSN: | 0302-9743 1611-3349 |
DOI: | 10.1007/978-3-642-34630-9_54 |