On normalization and algorithm selection for unsupervised outlier detection

Bibliographic Details
Published in	Data mining and knowledge discovery, Vol. 34, no. 2, pp. 309-354
Main Authors	Kandanaarachchi, Sevvandi; Muñoz, Mario A.; Hyndman, Rob J.; Smith-Miles, Kate
Format	Journal Article
Language	English
Published	New York: Springer US; Springer Nature B.V., 01.03.2020
Subjects	Algorithms; Artificial Intelligence; Chemistry and Earth Sciences; Computer Science; Data analysis; Data Mining and Knowledge Discovery; Datasets; Information Storage and Retrieval; Outliers (statistics); Physics; Statistics for Engineering; Effect of normalization on outlier detection; Algorithm selection problem for outlier detection; Unsupervised outlier detection; Instance space analysis; Instance space analysis for outlier detection
Summary:	This paper demonstrates that the performance of various outlier detection methods is sensitive to both the characteristics of the dataset and the data normalization scheme employed. To understand these dependencies, we formally prove that normalization affects the nearest neighbor structure and the density of the dataset, and hence which observations could be considered outliers. We then perform an instance space analysis of combinations of normalization and detection methods. This analysis enables the visualization of the strengths and weaknesses of these combinations and gives insight into which combination might obtain the best performance for a given dataset.
ISSN:	1384-5810; 1573-756X
DOI:	10.1007/s10618-019-00661-z
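
Illustration:	The summary's claim that normalization changes the nearest neighbor structure can be seen on a toy example. The sketch below is not taken from the paper: the five-point dataset, the helper names (min_max, median_iqr, nearest_neighbor) and the choice of these two normalization schemes are assumptions made for illustration only. It shows the same point acquiring a different nearest neighbor depending on how the columns are rescaled.

```python
import numpy as np

def min_max(X):
    """Rescale each column to the range [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

def median_iqr(X):
    """Center each column on its median and divide by its interquartile range."""
    q25, q75 = np.percentile(X, [25, 75], axis=0)
    return (X - np.median(X, axis=0)) / (q75 - q25)

def nearest_neighbor(X, i):
    """Index of the row closest to row i in Euclidean distance, excluding i."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf  # a point is not its own neighbor
    return int(np.argmin(d))

# Hypothetical data: the last row has an extreme value in the second column.
X = np.array([
    [0.0,   0.0],
    [1.0,   0.0],
    [0.0,   3.0],
    [4.0,   1.0],
    [3.0, 100.0],
])

nn_min_max = nearest_neighbor(min_max(X), 0)
nn_median_iqr = nearest_neighbor(median_iqr(X), 0)
print(nn_min_max, nn_median_iqr)  # prints: 2 1
```

Under min-max scaling the extreme value compresses the second column, so row 0 is closest to row 2; under median-IQR scaling the second column keeps its spread and row 0 is closest to row 1. Any detection method built on neighbor distances would therefore score these points differently under the two schemes.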