Scaling Deep Learning Models for Large Spatial Time-Series Forecasting

Neural networks are used for different machine learning tasks, such as spatial time-series forecasting. Accurate modelling of a large and complex system requires large datasets to train a deep neural network that causes a challenge of scale as training the network and serving the model are computati...

Full description

Saved in:
Bibliographic Details
Published in2019 IEEE International Conference on Big Data (Big Data) pp. 1587 - 1594
Main Authors Abbas, Zainab, Ivarsson, Jon Reginbald, Al-Shishtawy, Ahmad, Vlassov, Vladimir
Format Conference Proceeding
LanguageEnglish
Published IEEE 01.12.2019
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Neural networks are used for different machine learning tasks, such as spatial time-series forecasting. Accurate modelling of a large and complex system requires large datasets to train a deep neural network that causes a challenge of scale as training the network and serving the model are computationally and memory intensive. One example of a complex system that produces a large number of spatial time-series is a large road sensor infrastructure deployed for traffic monitoring. The goal of this work is twofold: 1) To model large amount of spatial time-series from road sensors; 2) To address the scalability problem in a real-life task of large-scale road traffic prediction which is an important part of an Intelligent Transportation System.We propose a partitioning technique to tackle the scalability problem that enables parallelism in both training and prediction: 1) We represent the sensor system as a directed weighted graph based on the road structure, which reflects dependencies between sensor readings, and weighted by sensor readings and inter-sensor distances; 2) We propose an algorithm to automatically partition the graph taking into account dependencies between spatial time-series from sensors; 3) We use the generated sensor graph partitions to train a prediction model per partition. Our experimental results on traffic density prediction using Long Short-Term Memory (LSTM) Neural Networks show that the partitioning-based models take 2x, if run sequentially, and 12x, if run in parallel, less training time, and 20x less prediction time compared to the unpartitioned model of the entire road infrastructure. The partitioning-based models take 100x less total sequential training time compared to single sensor models, i.e., one model per sensor. Furthermore, the partitioning-based models have 2x less prediction error (RMSE) compared to both the single sensor models and the entire road model.
DOI:10.1109/BigData47090.2019.9005475