A GPU-based harmony k-means algorithm for document clustering


Bibliographic Details
Published in: IET Conference Proceedings, p. 3.29
Main Authors: Gao, Zhanchun; Li, Enxing; Jiang, Yanjun
Format: Conference Proceeding
Language: English
Published: Stevenage, UK: IET (The Institution of Engineering & Technology), 2012
Summary: Document clustering is one of the most important tasks in text mining. Clustering algorithms usually represent each document as a high-dimensional vector, which often makes them computationally expensive. Meanwhile, the Graphics Processing Unit (GPU) is increasingly important in parallel computing because of its powerful parallel capacity and high bandwidth. This paper implements a GPU-based Harmony K-means Algorithm (HKA) with NVIDIA's Compute Unified Device Architecture (CUDA) and applies it to document clustering. In our experiment, the GPU-based program achieves a speedup of up to 20 times over the CPU-based program.
ISBN: 9781849196413, 1849196419
DOI: 10.1049/cp.2012.2426