Analysis of the Effect of Unexpected Outliers in the Classification of Spectroscopy Data

Bibliographic Details
Published in	arXiv.org
Main Authors	Glavin, Frank G; Madden, Michael G
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 14.06.2018
Subjects	Algorithms; Classification; Classifiers; Data analysis; Organic chemistry; Outliers (statistics); Solvents

More Information
Summary:	Multi-class classification algorithms are widely used, but we argue that they are not always ideal from a theoretical perspective, because they assume that every class is well characterized by the training data, whereas in many applications training data for some classes may be entirely absent, rare, or statistically unrepresentative. We evaluate one-sided classifiers as an alternative, since they assume that only one class (the target) is well characterized. We consider the task of identifying whether a substance contains a chlorinated solvent, based on its chemical spectrum. For this application, it is not feasible to collect a statistically representative set of outliers, since that group may contain anything apart from the target chlorinated solvents. Using a new one-sided classification toolkit, we compare a One-Sided k-NN algorithm with two well-known binary classification algorithms, and conclude that the one-sided classifier is more robust to unexpected outliers.
ISSN:	2331-8422
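The summary names a One-Sided k-NN classifier but does not give its scoring rule or threshold. The following is a minimal sketch under stated assumptions, not the authors' toolkit: it assumes each spectrum is a numeric vector sampled on a shared wavenumber grid, scores a sample by its mean Euclidean distance to its k nearest target (chlorinated-solvent) training spectra, and sets the acceptance threshold as a quantile of leave-one-out scores on the target data. The class `OneSidedKNN` and its parameters `k` and `quantile` are hypothetical names chosen for illustration.

```python
import numpy as np


class OneSidedKNN:
    """Sketch of a one-sided k-NN classifier fitted on target-class samples only.

    Hypothetical illustration: the scoring rule (mean distance to the k nearest
    target samples) and the quantile threshold are assumptions, not the
    paper's exact method.
    """

    def __init__(self, k=3, quantile=0.95):
        self.k = k                # neighbours averaged per score (assumed default)
        self.quantile = quantile  # share of training targets to accept (assumed)

    @staticmethod
    def _pairwise(A, B):
        # Euclidean distance between every row of A and every row of B
        diff = A[:, None, :] - B[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))

    def fit(self, X_target):
        self.X_ = np.asarray(X_target, dtype=float)
        if len(self.X_) <= self.k:
            raise ValueError("need more than k target samples")
        d = self._pairwise(self.X_, self.X_)
        np.fill_diagonal(d, np.inf)  # leave-one-out: a sample is not its own neighbour
        loo_scores = np.sort(d, axis=1)[:, : self.k].mean(axis=1)
        self.threshold_ = np.quantile(loo_scores, self.quantile)
        return self

    def score(self, X):
        # Mean distance from each row of X to its k nearest target samples
        d = self._pairwise(np.asarray(X, dtype=float), self.X_)
        return np.sort(d, axis=1)[:, : self.k].mean(axis=1)

    def predict(self, X):
        # True = accepted as target; False = rejected as outlier
        return self.score(X) <= self.threshold_


# Usage on synthetic 5-point "spectra" (made-up data, not from the paper)
rng = np.random.default_rng(0)
X_target = rng.normal(0.0, 1.0, size=(50, 5))
model = OneSidedKNN(k=3).fit(X_target)
print(model.predict(np.zeros((1, 5))), model.predict(np.full((1, 5), 20.0)))
```

Because `fit` sees only target spectra, no outlier examples are needed: anything far from the target data is rejected, which is the property the summary relies on when outliers "may contain anything". The `quantile` setting trades off the two error types; a higher value accepts more genuine targets but also more outliers.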