Detection of Sequential Outliers Using a Variable Length Markov Model

Bibliographic Details
Published in	2008 Seventh International Conference on Machine Learning and Applications, pp. 571-576
Main Authors	Low-Kam, C., Laurent, A., Teisseire, M.
Format	Conference Proceeding
Language	English
Published	IEEE 01.12.2008
Subjects	Concentration Inequality; Data analysis; DNA; Genetic mutations; Information Criterion; Machine learning; Outliers; Proteins; Robots; Sequences; Sequential Databases; Size measurement; Testing
Summary:	Mining for outliers in sequential datasets is crucial for sound data analysis, and many approaches for discovering such anomalies have been proposed. However, most of them require a sample of known typical sequences to build the model, and they remain memory-intensive. In this paper we propose an extension of one such approach, based on a probabilistic suffix tree and a measure of similarity. We add a pruning criterion that reduces the size of the tree while improving the model, and a sharp concentration inequality for the measure of similarity, to better sort the outliers. We demonstrate the feasibility of our approach through a set of experiments on a protein database.
ISBN:	0769534953, 9780769534954
DOI:	10.1109/ICMLA.2008.137