An efficient data processing framework for mining the massive trajectory of moving objects

Bibliographic Details
Published in	Computers, Environment and Urban Systems, Vol. 61, pp. 129–140
Main Authors	Zhou, Yuanchun; Zhang, Yang; Ge, Yong; Xue, Zhenghua; Fu, Yanjie; Guo, Danhuai; Shao, Jing; Zhu, Tiangang; Wang, Xuezhi; Li, Jianhui
Format	Journal Article
Language	English
Published	Oxford: Elsevier Ltd; Elsevier Science Ltd, 01.01.2017
Subjects	Algorithms; Big data; Communication; Compression contribution model; Data mining; Data processing; Efficiency; Experiments; Hash based algorithms; Parallel linear referencing; Parallel processing; Performance evaluation; Trajectory analysis; Trajectory of moving object; Transformations; Two step consistent hashing

More Information
Summary:	• A novel framework for efficient processing of trajectory data of moving objects.
• Propose a big data distribution module based on a two-step consistent hashing algorithm.
• Propose a data transformation module based on a parallel linear referencing strategy.
• Propose a compression-aware I/O performance improvement module.
• Conduct extensive empirical studies on 1.114 TB of large-scale synthetic data and 578 GB of real GPS data.

Recent advances in positioning technology enable the collection of large-scale trajectory data for moving objects. Efficient processing and analysis of massive trajectory data has therefore become an emerging and challenging task for both researchers and practitioners. In this paper, we propose an efficient data processing framework for mining massive trajectory data. The framework includes three modules: (1) a data distribution module, (2) a data transformation module, and (3) a high-performance I/O module. Specifically, we first design a two-step consistent hashing algorithm for the data distribution module that takes load balancing, data locality, and scalability into account. For the data transformation module, we present a parallelization strategy for a linear referencing algorithm that reduces coupling between subtasks, is easy to implement, and keeps communication cost low. Moreover, we propose a compression-aware I/O module to improve processing efficiency. Finally, we conduct a comprehensive performance evaluation on a synthetic dataset (1.114 TB) and a real-world taxi GPS dataset (578 GB). The experimental results demonstrate the advantages of the proposed framework.
ISSN:	0198-9715, 1873-7587
DOI:	10.1016/j.compenvurbsys.2015.03.004
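The summary names a two-step consistent hashing algorithm for the distribution module but gives no details of it. What follows is therefore only a minimal sketch of ordinary single-step consistent hashing with virtual nodes, the standard technique such a scheme builds on. The class name `ConsistentHashRing`, the use of MD5, the `vnodes=100` default, and keying by moving-object ID (so that all GPS points of one object land on the same node, a simple form of data locality) are illustrative assumptions, not details taken from the article.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Single-step consistent hashing with virtual nodes (illustrative sketch,
    NOT the article's two-step algorithm)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes   # assumed number of virtual positions per physical node
        self._positions = []   # sorted hash positions on the ring
        self._owner = {}       # hash position -> physical node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(text):
        # Stable 64-bit ring position from the first 8 bytes of an MD5 digest
        return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:8], "big")

    def add_node(self, node):
        # Place `vnodes` virtual positions for this node around the ring
        for i in range(self.vnodes):
            pos = self._hash(f"{node}#{i}")
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def remove_node(self, node):
        # Drop this node's virtual positions; its keys fall to the next positions
        for i in range(self.vnodes):
            pos = self._hash(f"{node}#{i}")
            self._positions.remove(pos)
            del self._owner[pos]

    def lookup(self, object_id):
        """Return the node that stores every record of `object_id` (hypothetical key scheme)."""
        if not self._positions:
            raise LookupError("ring has no nodes")
        pos = self._hash(object_id)
        # First virtual position clockwise from the key, wrapping past the end
        idx = bisect.bisect_right(self._positions, pos) % len(self._positions)
        return self._owner[self._positions[idx]]
```

Adding a node relocates only the keys whose clockwise successor becomes one of the new node's virtual positions, and all of those keys move to the new node; every other key stays where it was. This bounded data movement is the scalability property that makes consistent hashing attractive for distributing trajectory data across a growing cluster.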
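Linear referencing locates each GPS point along a route as a measure (distance from the start of the route) plus a perpendicular offset. The sketch below assumes planar, projected coordinates and a single polyline route. The functions `locate_point` and `locate_points_parallel` are hypothetical, and the chunked thread pool only illustrates why the task parallelizes with little coupling: each chunk reads the shared route but writes nothing shared. It is not the article's parallel strategy, and a process pool or a cluster would be needed for a real CPU speedup in CPython.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def locate_point(point, line):
    """Return (measure, offset) of `point` against polyline `line`.

    measure: distance along `line` to the closest point on it.
    offset:  perpendicular distance from `point` to that closest point.
    Assumes planar (projected) coordinates.
    """
    px, py = point
    best_offset, best_measure = math.inf, 0.0
    walked = 0.0  # cumulative length of the segments before the current one
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len = math.hypot(dx, dy)
        if seg_len == 0:
            continue  # skip degenerate (zero-length) segments
        # Parameter of the orthogonal projection, clamped onto the segment
        t = ((px - x1) * dx + (py - y1) * dy) / (seg_len ** 2)
        t = max(0.0, min(1.0, t))
        qx, qy = x1 + t * dx, y1 + t * dy
        offset = math.hypot(px - qx, py - qy)
        if offset < best_offset:
            best_offset, best_measure = offset, walked + t * seg_len
        walked += seg_len
    return best_measure, best_offset


def locate_points_parallel(points, line, workers=4, chunk_size=1000):
    """Split points into independent chunks and reference each chunk separately."""
    chunks = [points[i:i + chunk_size] for i in range(0, len(points), chunk_size)]

    def work(chunk):
        return [locate_point(p, line) for p in chunk]

    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(work, chunks):  # map yields results in chunk order
            results.extend(part)
    return results
```

For the route `[(0, 0), (10, 0), (10, 10)]`, the point `(5, 3)` projects onto the first segment at `(5, 0)`, giving a measure of 5 and an offset of 3.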
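The subject keywords mention a compression contribution model, but this record does not say how the compression-aware I/O module decides when compression helps. One simple cost model, assumed here rather than taken from the article, compares reading raw bytes with reading compressed bytes and then decompressing them: with raw size S, I/O bandwidth B, compression ratio r and decompression throughput D (raw bytes per second), compression pays off when S/(r·B) + S/D < S/B. The helpers `io_time` and `should_compress` are hypothetical, and zlib stands in for whatever codec a real system would use.

```python
import zlib


def io_time(size_bytes, bandwidth, ratio=1.0, decompress_rate=None):
    """Estimated seconds to bring `size_bytes` of raw data into memory.

    Transfer moves size/ratio bytes at `bandwidth` bytes/s; if
    `decompress_rate` is given, decoding adds size/decompress_rate seconds.
    """
    transfer = size_bytes / (ratio * bandwidth)
    decode = 0.0 if decompress_rate is None else size_bytes / decompress_rate
    return transfer + decode


def should_compress(sample, bandwidth, decompress_rate, level=6):
    """Measure the zlib ratio on `sample` and compare the two estimated I/O times."""
    compressed = zlib.compress(sample, level)
    ratio = len(sample) / len(compressed)
    raw = io_time(len(sample), bandwidth)
    packed = io_time(len(sample), bandwidth, ratio, decompress_rate)
    return packed < raw, ratio
```

For example, at 100 MB/s of bandwidth, a 4:1 ratio and 400 MB/s decompression, reading 100 MB of raw data costs 0.25 s of transfer plus 0.25 s of decoding, 0.5 s in total versus 1.0 s uncompressed. If decompression is much slower than the storage path, the model rejects compression even for highly compressible trajectory records.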