Finding interesting outliers - a Belief Network based approach

Bibliographic Details
Published in SoutheastCon 2015, pp. 1-7
Main Authors Masood, Adnan; Li, Wei
Format Conference Proceeding
Language English
Published IEEE 01.04.2015
Summary: Outliers are deviations from the usual trends of data. Discovering interestingness among outliers, i.e. finding anomalies that are of real interest to subject matter experts, is an active area of research in the data mining and machine learning communities. Due to its subjective nature, the definition of what amounts to 'interesting' varies between domains and subject matter experts. This paper provides an overview of the current state of quantification for measures of interestingness, using Bayesian Belief Networks as background knowledge. Building on this foundation, we also provide a process flow for ranking outliers based on a subject matter expert's a priori interestingness. Mining outliers may help discover potential anomalies and fraudulent activities. Meaningful outliers can be retrieved and analyzed by using domain knowledge. Domain knowledge (or background knowledge) is represented using probabilistic graphical models such as Bayesian belief networks. Bayesian networks are graph-based representations used to model and encode mutual relationships between entities. Due to their probabilistic graphical nature, Belief Networks are an ideal way to capture the sensitivity, causal inference, uncertainty and background knowledge in real-world data sets. Bayesian Networks effectively present the causal relationships between different entities (nodes) using conditional probability. This probabilistic relationship shows the degree of belief between entities. A quantitative measure which computes changes in this degree of belief acts as a sensitivity measure. In this research paper we provide an overview of interestingness measures and their use to measure sensitivity in belief networks, and review the earlier work on the so-called Interestingness Filtering Engine. Building upon this foundation, we propose an iterative model that uses multiple interestingness measures, resulting in better performance and improved sensitivity analysis.
The approach quantitatively validates probabilistic interestingness measures as an effective sensitivity analysis technique in rare class mining.
ISSN: 1091-0050, 1558-058X
DOI: 10.1109/SECON.2015.7132918
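
The summary's idea that a change in degree of belief can serve as a sensitivity measure can be illustrated with a minimal sketch. This is not the measure or the Interestingness Filtering Engine from the paper; it assumes a hypothetical two-node network (hypothesis H, such as fraud, pointing to an observed event E) and scores each observation by how far it moves the belief in H away from its prior. The function names, the observation names and all probabilities are made up for illustration.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H | E) by Bayes' rule for a two-node network H -> E."""
    evidence = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    return p_e_given_h * prior / evidence


def belief_change(prior, p_e_given_h, p_e_given_not_h):
    """Absolute shift in the degree of belief in H after observing E."""
    return abs(posterior(prior, p_e_given_h, p_e_given_not_h) - prior)


def rank_by_belief_change(prior, observations):
    """Rank (name, P(E|H), P(E|not H)) tuples, largest belief change first."""
    scored = [(name, belief_change(prior, a, b)) for name, a, b in observations]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical prior and conditional probabilities, for illustration only.
    prior_fraud = 0.01
    observations = [
        ("night_login", 0.10, 0.10),     # equally likely either way
        ("large_transfer", 0.60, 0.05),  # much more likely under fraud
        ("new_device", 0.30, 0.20),      # slightly more likely under fraud
    ]
    for name, score in rank_by_belief_change(prior_fraud, observations):
        print(f"{name}: belief change {score:.4f}")
```

An observation that is equally likely with or without H leaves the belief unchanged and ranks last, while one that is far more likely under H ranks first. A real application would use a full network and an expert-chosen interestingness measure in place of this absolute difference.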