A STUDY ON THE USE OF NORMALIZED L2-METRIC IN CLASSIFICATION TASKS
Published in | Radìoelektronika, informatika, upravlìnnâ, no. 2, pp. 110-115 |
---|---|
Main Author | |
Format | Journal Article |
Language | English |
Published | 29.06.2025 |
ISSN | 1607-3274; 2313-688X |
DOI | 10.15588/1607-3274-2025-2-9 |
Summary:

Context. In machine learning, similarity measures and distance metrics are pivotal in tasks such as classification, clustering, and dimensionality reduction. The effectiveness of traditional metrics, such as Euclidean distance, can be limited when applied to complex datasets. The object of the study is the processes of data classification and dimensionality reduction in machine learning tasks, in particular the use of metric methods to assess the similarity between objects.

Objective. The study aims to evaluate the feasibility and performance of a normalized L2-metric (Normalized Euclidean Distance, NED) for improving the accuracy of machine learning algorithms, specifically in classification and dimensionality reduction.

Method. We prove mathematically that the normalized L2-metric satisfies the properties of boundedness, scale invariance, and monotonicity. It is shown that NED can be interpreted as a measure of dissimilarity of feature vectors. Its integration into the k-nearest neighbors (kNN) and t-SNE algorithms is investigated using a high-dimensional Alzheimer’s disease dataset. The study implemented four models combining different approaches to classification and dimensionality reduction: Model M1 used kNN with Euclidean distance and no dimensionality reduction, serving as a baseline; Model M2 employed the normalized L2-metric in kNN; Model M3 applied t-SNE for dimensionality reduction followed by kNN with Euclidean distance; and Model M4 combined t-SNE and the normalized L2-metric at both the reduction and classification stages. A hyperparameter optimization procedure was implemented for all models, covering the number of neighbors, the voting type, and the perplexity parameter of t-SNE. Five-fold cross-validation was conducted to evaluate classification quality objectively. Additionally, the impact of data normalization on model accuracy was examined.

Results. Models using NED consistently outperformed models based on Euclidean distance, with the highest classification accuracy of 91.4% achieved when NED was used in both t-SNE and kNN (Model M4). This highlights the adaptability of NED to complex data structures and its advantage in preserving key features in both high- and low-dimensional spaces.

Conclusions. The normalized L2-metric shows potential as an effective measure of dissimilarity for machine learning tasks. It improves the performance of algorithms while maintaining scalability and robustness, which indicates its suitability for a range of applications involving high-dimensional data.
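The record does not reproduce the paper's exact definition of the normalized L2-metric. A minimal sketch follows, assuming one common form, NED(x, y) = ||x − y|| / (||x|| + ||y||), which is consistent with the boundedness and scale-invariance properties claimed in the abstract; the function name ned, the eps guard, and the sample vectors are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def ned(x, y, eps=1e-12):
    """Assumed form of the normalized Euclidean distance:
    ||x - y|| / (||x|| + ||y||)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # eps guards against division by zero when x = y = 0
    return np.linalg.norm(x - y) / (np.linalg.norm(x) + np.linalg.norm(y) + eps)

# Boundedness: the triangle inequality gives ||x - y|| <= ||x|| + ||y||,
# so ned(x, y) always lies in [0, 1].
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, -1.0])
assert 0.0 <= ned(x, y) <= 1.0

# Scale invariance: a common rescaling by c != 0 cancels between the
# numerator and the denominator, so ned(c*x, c*y) == ned(x, y).
assert np.isclose(ned(3.0 * x, 3.0 * y), ned(x, y))
```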
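Under the same assumption, the sketch below shows one way Models M2 and M4 could be assembled with scikit-learn, reusing the ned function above. The random placeholder data stands in for the Alzheimer’s disease dataset, and the hyperparameter grid and t-SNE settings are illustrative; the study's actual data, grids, and evaluation protocol are only in the full text.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Placeholder data standing in for the high-dimensional Alzheimer's
# disease dataset, which this record does not include.
rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(200, 30)))
y = rng.integers(0, 2, size=200)

# Model M2: kNN with the NED metric, no dimensionality reduction.
# The grid mirrors the tuned hyperparameters named in the abstract:
# number of neighbors and voting type.
m2 = GridSearchCV(
    KNeighborsClassifier(metric=ned, algorithm="brute"),
    param_grid={"n_neighbors": [3, 5, 7, 9],
                "weights": ["uniform", "distance"]},
    cv=5,  # five-fold cross-validation, as in the study
)
m2.fit(X, y)

# Model M4: t-SNE with the NED metric for reduction, then NED-based kNN.
# Simplification: t-SNE is fit on all data before cross-validation,
# because t-SNE has no out-of-sample transform; the paper's exact
# protocol may differ.
X_2d = TSNE(n_components=2, perplexity=30.0, metric=ned,
            init="random", random_state=0).fit_transform(X)
m4_scores = cross_val_score(
    KNeighborsClassifier(n_neighbors=5, metric=ned, algorithm="brute"),
    X_2d, y, cv=5)

print(f"M2 best CV accuracy: {m2.best_score_:.3f}")
print(f"M4 mean CV accuracy: {m4_scores.mean():.3f}")
```

Passing a callable metric forces brute-force neighbor search, which matches how scikit-learn handles non-standard distances; on random placeholder data the printed accuracies are of course meaningless and will not reproduce the 91.4% reported in the abstract.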