Stratification-Based Outlier Detection over the Deep Web

Bibliographic Details
Published in	Computational Intelligence and Neuroscience Vol. 2016; no. 2016; pp. 514-526
Main Authors	Cui, Zhiming, Gu, Caidong, Fang, Ligang, Sheng, Victor S., Zhao, Pengpeng, Xian, Xuefeng, Yang, Yuanfeng
Format	Journal Article
Language	English
Published	Cairo, Egypt: Hindawi Limited (Hindawi Publishing Corporation), 01.01.2016
Subjects	Algorithms Commerce Data analysis Data mining Database Management Systems Discriminant analysis Distributed processing Humans Information Storage and Retrieval Intelligence Internet Outliers (statistics) Pilots Queries Recall Sampling Variables Websites

More Information
Summary:	For many applications, finding rare instances or outliers can be more interesting than finding common patterns. Existing work in outlier detection has not considered the context of the deep web. In this paper, we argue that, in many scenarios, it is more meaningful to detect outliers over the deep web. In the deep web, users must submit queries through a query interface to retrieve the corresponding data, so traditional data mining methods cannot be applied directly. The primary contribution of this paper is a new data mining method for outlier detection over the deep web. In our approach, the query space of a deep web data source is stratified based on a pilot sample. Building on this stratification, we develop neighborhood sampling and uncertainty sampling with the goal of improving recall and precision. Finally, a careful performance evaluation of our algorithm confirms that our approach can effectively detect outliers in the deep web.
Academic Editor:	Leonardo Franco
ISSN:	1687-5265 1687-5273
DOI:	10.1155/2016/7386517
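The summary describes the approach only at a high level. What follows is a hypothetical sketch of the general idea of stratifying a query-only data source with a pilot sample. It is not the authors' algorithm and does not implement their neighborhood or uncertainty sampling. Everything in it is an assumption made for illustration: the simulated hidden database (`make_hidden_db`), a search form (`query_interface`) that filters by a `category` drop-down and an optional price range and caps results at `limit`, the use of category as the stratification attribute, and the per-stratum z-score threshold `k`.

```python
import random
import statistics


def make_hidden_db(seed=0):
    """Hypothetical hidden database. In a real deep web setting it is not
    visible; it can only be reached through query_interface()."""
    rng = random.Random(seed)
    db = []
    for category, mean in [("books", 20.0), ("phones", 400.0), ("laptops", 900.0)]:
        for _ in range(200):
            db.append({"category": category, "price": rng.gauss(mean, 0.1 * mean)})
    # Two planted outliers (illustrative assumption).
    db.append({"category": "books", "price": 500.0})
    db.append({"category": "phones", "price": 5.0})
    return db


def query_interface(db, category, price_min=None, price_max=None, limit=50):
    """Hypothetical search form: filter by category and an optional price
    range, returning at most the first `limit` matches (a result-page cap)."""
    hits = [
        r for r in db
        if r["category"] == category
        and (price_min is None or r["price"] >= price_min)
        and (price_max is None or r["price"] <= price_max)
    ]
    return hits[:limit]


def detect_outliers(db, categories, pilot_size=30, k=4.0):
    """Stratify the query space by category using a pilot sample, then
    probe each stratum's tails with range queries."""
    outliers = []
    for category in categories:
        # 1. Pilot sample for this stratum. The simulated form returns the
        #    first matches; a real pilot sample would ideally be random.
        pilot = [r["price"] for r in query_interface(db, category, limit=pilot_size)]
        mu = statistics.mean(pilot)
        sd = statistics.stdev(pilot)
        # 2. Per-stratum tail queries: only records far from this stratum's
        #    estimated mean are retrieved.
        low = query_interface(db, category, price_max=mu - k * sd)
        high = query_interface(db, category, price_min=mu + k * sd)
        outliers.extend(low + high)
    return outliers


db = make_hidden_db()
for r in detect_outliers(db, ["books", "phones", "laptops"]):
    print(r["category"], round(r["price"], 2))
```

Stratification matters here because a single global threshold would miss the planted $500 book: once all categories are pooled, $500 is an ordinary price. Estimating statistics per stratum from the pilot sample and then issuing targeted range queries retrieves only each stratum's tails instead of crawling the whole source.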