Prototype Selection for Nearest Neighbor Classification: Taxonomy and Empirical Study

Bibliographic Details
Published in	IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 34, No. 3, pp. 417-435
Main Authors	Garcia, Salvador; Derrac, Joaquin; Cano, Jose Ramon; Herrera, Francisco
Format	Journal Article
Language	English
Published	Los Alamitos, CA: IEEE Computer Society, March 2012
Subjects	Accuracy, Applied sciences, classification, Classification algorithms, Computer science; control theory; systems, condensation, Data processing. List processing. Character string processing, edition, Exact sciences and technology, Memory organisation. Data processing, nearest neighbor, Noise, Noise measurement, Prototype selection, Prototypes, Software, Taxonomy, Training, Performance evaluation, Capability index, Nearest neighbour, Data analysis, Empirical method, Data mining, Supervised classification, Recommendation, Statistical test, Efficiency, Categorization

More Information
Summary:	The nearest neighbor classifier is one of the most widely used and well-known techniques for performing recognition tasks. Despite its simplicity, it has also proven to be one of the most useful algorithms in data mining. However, the nearest neighbor classifier suffers from several drawbacks, such as high storage requirements, low efficiency in classification response, and low noise tolerance. These weaknesses have been studied by many researchers, and many solutions have been proposed. Among them, one of the most promising consists of reducing the data used to establish a classification rule (the training data) by selecting relevant prototypes. Many prototype selection methods exist in the literature, and research in this area is still advancing. These methods differ in several properties, but no formal categorization of them has been established yet. This paper surveys the prototype selection methods proposed in the literature from both a theoretical and an empirical point of view. From a theoretical point of view, we propose a taxonomy based on the main characteristics of prototype selection methods and analyze their advantages and drawbacks. Empirically, we conduct an experimental study on data sets of different sizes to measure their performance in terms of accuracy, reduction capability, and runtime. The results obtained by all the methods studied have been verified by nonparametric statistical tests. Several remarks, guidelines, and recommendations are made for the use of prototype selection in nearest neighbor classification.
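The summary describes prototype selection as shrinking the training set that a nearest neighbor rule must store, and the subject terms name two families of methods, condensation and edition. As an illustration only, the following Python sketch implements one classic method from each family: Hart's Condensed Nearest Neighbor (CNN, condensation) and Wilson's Edited Nearest Neighbor (ENN, edition). It also computes the reduction rate 1 - |S|/|T|. It assumes Euclidean distance, integer class labels, and points stored as tuples. It is not the authors' code or experimental setup. The function names `knn_label`, `cnn`, `enn`, and `reduction_rate` and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]  # assumption: points are tuples of floats


def knn_label(x: Point, X: Sequence[Point], y: Sequence[int],
              k: int = 1, exclude: int = -1) -> int:
    """Majority label among the k stored points nearest to x (Euclidean).

    `exclude` skips one index, giving a leave-one-out vote for a point
    that is itself in X. Ties go to the tied label seen first in
    distance order.
    """
    order = sorted((i for i in range(len(X)) if i != exclude),
                   key=lambda i: math.dist(x, X[i]))
    votes = Counter(y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cnn(X: Sequence[Point], y: Sequence[int]) -> List[int]:
    """Hart's Condensed Nearest Neighbor (a condensation method).

    Start from one instance and keep adding every instance that the
    current subset misclassifies with 1-NN, until a full pass adds
    nothing. Returns sorted indices into X.
    """
    subset, in_subset = [0], {0}
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in in_subset:
                continue
            pred = knn_label(X[i], [X[j] for j in subset], [y[j] for j in subset])
            if pred != y[i]:
                subset.append(i)
                in_subset.add(i)
                changed = True
    return sorted(subset)


def enn(X: Sequence[Point], y: Sequence[int], k: int = 3) -> List[int]:
    """Wilson's Edited Nearest Neighbor (an edition method): keep only the
    instances whose k nearest other instances vote for their own label."""
    return [i for i in range(len(X))
            if knn_label(X[i], X, y, k=k, exclude=i) == y[i]]


def reduction_rate(n_selected: int, n_train: int) -> float:
    """Fraction of the training set discarded by a selection method."""
    return 1.0 - n_selected / n_train


# Hypothetical toy data: two 1-D clusters plus one mislabeled point (index 6).
X = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,), (0.15,)]
y = [0, 0, 0, 1, 1, 1, 1]

print(cnn(X, y))  # [0, 1, 2, 3, 6]: the noisy point forces extra prototypes
kept = enn(X, y)  # [0, 1, 2, 3, 4, 5]: edition removes the noisy point
core = [kept[j] for j in cnn([X[i] for i in kept], [y[i] for i in kept])]
print(core, reduction_rate(len(core), len(X)))  # [0, 3], rate about 0.714

P, Py = [X[i] for i in core], [y[i] for i in core]
print(knn_label((0.4,), P, Py), knn_label((0.8,), P, Py))  # 0 1
```

On this toy data, applying edition before condensation reflects the noise-tolerance concern in the summary. On the raw data, the single mislabeled point causes CNN to keep five of the seven instances. After ENN removes that point, CNN keeps only two prototypes, and 1-NN still separates the two clusters.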
ISSN:	0162-8828, 1939-3539, 2160-9292
DOI:	10.1109/TPAMI.2011.142