Classification with K-Nearest Neighbors Algorithm: Comparative Analysis between the Manual and Automatic Methods for K-Selection

Bibliographic Details
Published in: International Journal of Advanced Computer Science & Applications, Vol. 14, No. 4
Main Authors: Mladenova, Tsvetelina; Valova, Irena
Format: Journal Article
Language: English
Published: West Yorkshire: Science and Information (SAI) Organization Limited, 2023

Summary: Machine learning and its algorithms have been the subject of many and varied studies alongside the development of artificial intelligence in recent years. One of the most popular and widely used classification algorithms is the nearest-neighbors algorithm, in particular k-nearest neighbors (KNN). The algorithm has three main steps: calculating distances, selecting the number of neighbors, and performing the classification itself. The choice of the k parameter, which determines the number of neighbors, is important and has a significant impact on the efficiency of the resulting model. This article describes a study of the influence of the way the k parameter is chosen, manually or automatically. The data sets used in the study were selected to resemble, as closely as possible, the data generated and used by small businesses: heterogeneous, unbalanced, with relatively small volumes and small training sets. From the results obtained, it can be concluded that automatic determination of the value of k can give results close to optimal. Deviations are observed in the accuracy rate and in the behavior of well-known KNN modifications as the neighborhood size increases for some of the tested training data sets, but one cannot expect the same model parameter value (e.g. for k) to be optimally applicable across all data sets.
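The three steps named in the summary (distance calculation, neighbor-count selection, classification) and the idea of choosing k automatically can be sketched as follows. This is an illustrative example only, not the authors' code or data: the classifier is a minimal KNN with Euclidean distance and majority vote, the synthetic two-cluster data set is invented for demonstration, and "automatic" k-selection is shown here as leave-one-out validation over a small candidate list, one common way such selection is done.

```python
# Minimal KNN sketch with automatic k-selection (illustration only,
# not the code from the article). Uses only the Python standard library.
from collections import Counter
import math

def euclidean(a, b):
    # Step 1: distance calculation.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k):
    # Steps 2-3: take the k nearest neighbors and vote on the class.
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda p: euclidean(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def auto_select_k(train_X, train_y, candidates):
    # One possible "automatic" scheme: leave-one-out accuracy
    # over each candidate k; return the best-scoring value.
    best_k, best_acc = candidates[0], -1.0
    for k in candidates:
        hits = 0
        for i in range(len(train_X)):
            rest_X = train_X[:i] + train_X[i + 1:]
            rest_y = train_y[:i] + train_y[i + 1:]
            if knn_predict(rest_X, rest_y, train_X[i], k) == train_y[i]:
                hits += 1
        acc = hits / len(train_X)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc

# Tiny synthetic two-class data set (hypothetical, for demonstration).
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
y = ["a", "a", "a", "b", "b", "b"]

k, acc = auto_select_k(X, y, candidates=[1, 3, 5])
print(knn_predict(X, y, (1.1, 1.0), k))  # query point near the "a" cluster
```

As the summary notes, a k that scores well on one data set need not transfer to another, which is why the selection loop is run per data set rather than fixing k once.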
ISSN: 2158-107X, 2156-5570
DOI: 10.14569/IJACSA.2023.0140444