Finding interesting outliers - a Belief Network based approach

Bibliographic Details
Published in	SoutheastCon 2015, pp. 1-7
Main Authors	Masood, Adnan; Li, Wei
Format	Conference Proceeding
Language	English
Published	IEEE 01.04.2015
Subjects	Bayes methods; Bayesian Belief Network; Data mining; Graphical models; Interestingness Measures; Joints; Outlier Analysis; Probabilistic Graphical Models; Probabilistic logic; Sensitivity; Uncertainty
Summary:	Outliers are deviations from the usual trends in data. Discovering interestingness among outliers, i.e., finding anomalies that are of real interest to subject matter experts, is an active area of research in the data mining and machine learning communities. Because of its subjective nature, the definition of what amounts to 'interesting' varies between domains and subject matter experts. This paper provides an overview of the current state of quantifying measures of interestingness, using Bayesian Belief Networks as background knowledge. Building on this foundation, we also provide a process flow for ranking outliers based on subject matter experts' a priori interestingness. Mining outliers may help discover potential anomalies and fraudulent activities. Meaningful outliers can be retrieved and analyzed by using domain knowledge. Domain knowledge (or background knowledge) is represented using probabilistic graphical models such as Bayesian belief networks. Bayesian networks are graph-based representations used to model and encode mutual relationships between entities. Due to their probabilistic graphical nature, Belief Networks are an ideal way to capture the sensitivity, causal inference, uncertainty and background knowledge in real-world data sets. Bayesian Networks effectively represent the causal relationships between different entities (nodes) using conditional probability. This probabilistic relationship expresses the degree of belief between entities. A quantitative measure that computes changes in this degree of belief acts as a sensitivity measure. In this research paper we provide an overview of interestingness measures and their use to measure sensitivity in belief networks, and review earlier work on the so-called Interestingness Filtering Engine. Building upon this foundation, we propose an iterative model that uses multiple interestingness measures, resulting in better performance and improved sensitivity analysis.
The approach quantitatively validates probabilistic interestingness measures as an effective sensitivity analysis technique in rare class mining.
ISSN:	1091-0050; 1558-058X
DOI:	10.1109/SECON.2015.7132918
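The summary describes interestingness as a quantitative change in the degree of belief held by a Bayesian belief network once an outlier's values are observed. The following is a minimal, hypothetical Python sketch of that idea. It is not the authors' Interestingness Filtering Engine or their iterative model. The three-node network (Fraud, HighAmount, ForeignIP), all of its probabilities, the outlier records, and the functions `posterior_fraud`, `belief_change`, `kl_divergence` and `rank_outliers` are illustrative assumptions. Exact inference is done by brute-force enumeration, which is only practical for very small networks.

```python
"""Hypothetical sketch: rank outliers by how much they shift belief in a
Bayesian belief network (BN). Network structure, variable names and all
probabilities are illustrative assumptions, not taken from the paper."""
import math
from itertools import product

# Hypothetical BN: Fraud -> HighAmount, Fraud -> ForeignIP (all binary).
P_FRAUD = {True: 0.01, False: 0.99}                # prior P(Fraud)
P_AMOUNT = {True: {True: 0.70, False: 0.30},       # P(HighAmount | Fraud)
            False: {True: 0.05, False: 0.95}}
P_FOREIGN = {True: {True: 0.60, False: 0.40},      # P(ForeignIP | Fraud)
             False: {True: 0.10, False: 0.90}}


def joint(fraud, amount, foreign):
    """Joint probability from the BN factorisation P(F) P(A|F) P(L|F)."""
    return P_FRAUD[fraud] * P_AMOUNT[fraud][amount] * P_FOREIGN[fraud][foreign]


def posterior_fraud(evidence):
    """P(Fraud | evidence) by enumerating every full assignment.

    `evidence` maps "amount"/"foreign" to booleans; variables that are not
    observed are summed out.
    """
    weights = {True: 0.0, False: 0.0}
    for fraud, amount, foreign in product([True, False], repeat=3):
        # Skip assignments that contradict the observed evidence.
        if evidence.get("amount", amount) != amount:
            continue
        if evidence.get("foreign", foreign) != foreign:
            continue
        weights[fraud] += joint(fraud, amount, foreign)
    total = weights[True] + weights[False]
    return {state: w / total for state, w in weights.items()}


def belief_change(evidence):
    """Interestingness as the absolute shift in belief |P(F|e) - P(F)|."""
    return abs(posterior_fraud(evidence)[True] - P_FRAUD[True])


def kl_divergence(evidence):
    """Interestingness as KL(P(F|e) || P(F)), in nats."""
    post = posterior_fraud(evidence)
    return sum(p * math.log(p / P_FRAUD[s]) for s, p in post.items() if p > 0)


def rank_outliers(outliers, measure):
    """Sort outlier records from most to least interesting under `measure`."""
    return sorted(outliers, key=lambda rec: measure(rec["evidence"]), reverse=True)


if __name__ == "__main__":
    # Hypothetical outlier records already flagged by some detector.
    outliers = [
        {"id": "tx-1", "evidence": {"amount": True, "foreign": True}},
        {"id": "tx-2", "evidence": {"amount": True, "foreign": False}},
        {"id": "tx-3", "evidence": {"amount": False, "foreign": True}},
    ]
    for measure in (belief_change, kl_divergence):
        ranked = rank_outliers(outliers, measure)
        print(measure.__name__, [rec["id"] for rec in ranked])
```

Absolute belief shift and KL divergence are used here only as two simple, common ways to quantify a change in belief. Either can be replaced by another interestingness measure without touching the ranking step. Because enumeration grows exponentially with the number of unobserved variables, a realistic network would call for a dedicated inference algorithm such as variable elimination.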