Outlier detection for mixed-type data: A novel approach

Bibliographic Details
Published in	arXiv.org
Main Authors	Costa, Efthymios; Papatsouma, Ioanna
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 09.12.2023
Subjects	Algorithms; Continuity (mathematics); Data analysis; Data mining; Datasets; Machine learning; Outliers (statistics)
ISSN	2331-8422

Summary:	Outlier detection is an important tool for researchers in a wide range of fields. From banking and marketing to the social sciences and healthcare, outlier detection techniques help identify subjects that exhibit different and sometimes peculiar behaviours. When the data set available to the researcher consists of both discrete and continuous variables, outlier detection presents unprecedented challenges. In this paper we propose a novel method that detects outlying observations in mixed-type data, while reducing the required user interaction and providing general guidelines for selecting suitable hyperparameter values. The methodology is assessed through a series of simulations on data sets with varying characteristics and achieves very good performance levels: it detects the majority of outliers while minimising the number of non-outlying observations that are falsely flagged. The ideas and techniques outlined in the paper can be used either as a pre-processing step or in tandem with other data mining and machine learning algorithms to develop novel approaches to challenging research problems.