A Chunking Method for Euclidean Distance Matrix Calculation on Large Dataset Using Multi-GPU

Bibliographic Details
Published in 2010 International Conference on Machine Learning and Applications pp. 208 - 213
Main Authors Qi Li, Kecman, V, Salman, R
Format Conference Proceeding
Language English
Published IEEE 01.12.2010
Summary:Calculating a Euclidean distance matrix is a data-intensive operation that becomes computationally prohibitive for large datasets. Recent developments in Graphics Processing Units (GPUs) have produced superb performance on scientific computing problems using massively parallel processing cores. However, due to the limited size of device memory, many GPU-based algorithms have limited capability to solve problems with large datasets. In this paper, a chunking method is proposed to calculate the Euclidean distance matrix on large datasets. The method is designed not only for scalability in a multi-GPU environment but also to maximize the computational capability of each individual GPU device. We first implement a fast GPU algorithm that is suitable for calculating sub-matrices of the Euclidean distance matrix. We then use a Map-Reduce-like framework to split the final distance matrix calculation into many small independent jobs of calculating partial distance matrices, which can be solved efficiently by our GPU algorithm. The framework also dynamically allocates GPU resources to those independent jobs for maximum performance. The experimental results show a speedup of 15x on datasets that contain more than half a million data points.
ISBN:1424492114
9781424492114
DOI:10.1109/ICMLA.2010.38
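
The chunking idea described in the summary can be illustrated with a minimal CPU-only sketch. This is not the authors' implementation: the function names `block_distances` and `chunked_distance_matrix`, the `chunk_size` parameter, and the use of worker threads as stand-ins for GPU devices are all assumptions made for illustration. It shows only the general pattern of splitting the full matrix into independent sub-matrix jobs, handing each job to whichever worker is free, and assembling the partial results.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def block_distances(points, row_range, col_range):
    """Map step: Euclidean distances for one sub-matrix (one independent job)."""
    r0, r1 = row_range
    c0, c1 = col_range
    return [
        [math.dist(points[i], points[j]) for j in range(c0, c1)]
        for i in range(r0, r1)
    ]


def chunked_distance_matrix(points, chunk_size, num_workers=2):
    """Build the full n x n matrix from independently computed chunks."""
    n = len(points)
    starts = range(0, n, chunk_size)
    # One job per (row block, column block) pair; edge blocks may be smaller.
    jobs = [
        ((r, min(r + chunk_size, n)), (c, min(c + chunk_size, n)))
        for r in starts
        for c in starts
    ]
    result = [[0.0] * n for _ in range(n)]
    # Hypothetical stand-in: worker threads play the role of GPU devices.
    # The pool gives the next pending job to whichever worker becomes free.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        blocks = pool.map(lambda job: block_distances(points, *job), jobs)
        # Reduce step: copy each partial matrix into its place in the result.
        for ((r0, r1), (c0, c1)), block in zip(jobs, blocks):
            for i, row in zip(range(r0, r1), block):
                result[i][c0:c1] = row
    return result
```

Calling `chunked_distance_matrix(points, chunk_size=2)` on a short list of coordinate tuples returns a nested list whose entry `[i][j]` is the distance between points `i` and `j`. In a real multi-GPU setting the chunk size would presumably be chosen so that one sub-matrix and its input rows fit in device memory, which is the constraint the summary highlights.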