Application of the K-means clustering method in the organisation of passenger transport in a smart city

Bibliographic Details
Published in	Sučasnij stan naukovih doslìdženʹ ta tehnologìj v promislovostì (Online) no. 1(31); pp. 83 - 101
Main Authors	Matseliukh, Yurii, Lytvyn, Vasyl
Format	Journal Article
Language	English
Published	31.03.2025

More Information
Summary:	Big data clustering methods are increasingly used to support decisions on organizing passenger transportation in a smart city, helping to keep the transport system efficient, adaptable and environmentally friendly. Their relevance stems from growing data volumes, changing demand and the negative environmental impact of transport. The object of the study is the process of clustering data sets on the organization of passenger transportation. The subject of the study is the principles of examining clustering metrics when determining the number of clusters for the execution of transportation schedules. The purpose of the study is to apply the K-means algorithm, guided by quality metrics, to cluster data on organizing passenger transportation in a smart city. The article addresses the following tasks: studying the features of clustering methods and their metrics; analyzing a large-scale heterogeneous data set on the duration of electric transport trips in a medium-sized city; and developing an effective algorithm for choosing a method of determining the number of clusters based on clustering quality metrics. The methods used were analysis, synthesis, generalization, comparison, grouping, cluster analysis, system analysis and the K-means method. The following results were obtained. The choice of clustering method was found to depend on the specifics of the task, the characteristics of the data and the objectives of the transport flow analysis. The data on the duration of electric transport trips were found to have a complex, heterogeneous and raw structure. The K-means method was chosen for the cluster analysis because of the need for an accurate distribution of data between clusters. An algorithm was proposed for choosing a method of determining the number of clusters based on clustering quality metrics, including the elbow method, the silhouette method and the Calinski–Harabasz index.
Clustering is recommended for designing routes with shorter waiting times, fewer transfers and better alignment with passenger needs. Conclusions: the K-means method was used to analyze the duration of electric transport trips. The analysis revealed route sections with differing traffic flow intensity, which depends on their location within the urban area, seasonality and other factors. An algorithm for choosing a method of determining the number of clusters based on internal metrics is proposed.
ISSN:	2522-9818 2524-2296
DOI:	10.30837/2522-9818.2025.1.083
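
The summary names three internal criteria for choosing the number of clusters: the elbow method, the silhouette method and the Calinski–Harabasz index. The sketch below is not the authors' algorithm; it is a minimal illustration of how those three criteria can be computed side by side for K-means. Everything in it is an assumption made for the example: the function names, the one-dimensional simplification (a single trip-duration value per trip), the quantile-based starting centroids, the second-difference elbow heuristic, and the synthetic durations, which are made up and do not come from the article's data set.

```python
import random
import statistics


def assign(values, centroids):
    """Label each value with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: (v - centroids[j]) ** 2)
            for v in values]


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's K-means on 1-D data with a deterministic quantile start (assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    centroids = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    for _ in range(max_iter):
        labels = assign(values, centroids)
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append(statistics.fmean(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return assign(values, centroids), centroids


def inertia(values, labels, centroids):
    """Within-cluster sum of squared distances (the elbow-method quantity)."""
    return sum((v - centroids[lab]) ** 2 for v, lab in zip(values, labels))


def silhouette(values, labels, k):
    """Mean silhouette coefficient, using absolute difference as the distance."""
    clusters = [[v for v, lab in zip(values, labels) if lab == j] for j in range(k)]
    scores = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        others = [c for j, c in enumerate(clusters) if j != lab and c]
        if len(own) < 2 or not others:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(abs(v - u) for u in own) / (len(own) - 1)  # mean intra-cluster distance
        b = min(sum(abs(v - u) for u in c) / len(c) for c in others)  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return statistics.fmean(scores)


def calinski_harabasz(values, labels, centroids):
    """Ratio of between-cluster to within-cluster dispersion, each per degree of freedom."""
    n, k = len(values), len(centroids)
    overall = statistics.fmean(values)
    between = sum(labels.count(j) * (centroids[j] - overall) ** 2 for j in range(k))
    within = inertia(values, labels, centroids)
    return (between / (k - 1)) / (within / (n - k))


# Synthetic trip durations in minutes (made up for illustration only).
rng = random.Random(42)
durations = ([rng.gauss(8, 1.5) for _ in range(40)]
             + [rng.gauss(25, 2.0) for _ in range(40)]
             + [rng.gauss(45, 2.5) for _ in range(40)])

results = {}
for k in range(2, 7):
    labels, centroids = kmeans_1d(durations, k)
    results[k] = {
        "inertia": inertia(durations, labels, centroids),
        "silhouette": silhouette(durations, labels, k),
        "calinski_harabasz": calinski_harabasz(durations, labels, centroids),
    }

best_by_silhouette = max(results, key=lambda k: results[k]["silhouette"])
best_by_ch = max(results, key=lambda k: results[k]["calinski_harabasz"])
# Elbow heuristic (assumption): the k where the inertia curve bends most,
# measured by the largest second difference.
elbow = max(range(3, 6), key=lambda k: results[k - 1]["inertia"]
            - 2 * results[k]["inertia"] + results[k + 1]["inertia"])

for k, m in results.items():
    print(k, round(m["inertia"], 1), round(m["silhouette"], 3),
          round(m["calinski_harabasz"], 1))
print("silhouette:", best_by_silhouette, "CH:", best_by_ch, "elbow:", elbow)
```

The three criteria do not always agree on real, heterogeneous data, which is presumably why the article proposes an algorithm for choosing among them. For multi-dimensional features, an established library implementation would normally replace these hand-written helpers.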