A parallel metaheuristic data clustering framework for cloud
Published in | Journal of parallel and distributed computing Vol. 116; pp. 39 - 49 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Elsevier Inc, 01.06.2018 |
Summary: | High-performance data analytics for the Internet of Things (IoT) has been a promising research subject in recent years because traditional data mining algorithms may not be applicable to IoT big data. One of the main reasons is that the data to be analyzed may exceed the storage capacity of a single machine. Another critical problem in analyzing data from an IoT system is that the computation cost of data analysis tasks may be too high for a single computer system. This paper therefore presents an efficient data clustering framework for metaheuristic algorithms in a cloud computing environment, which explains how to divide the mining tasks of a mining algorithm among different nodes (i.e., the Map process) and then aggregate the mining results from these nodes (i.e., the Reduce process). We further attempted to use the proposed framework to implement data clustering algorithms (e.g., k-means, genetic k-means, and particle swarm optimization) on a standalone system and on Spark. The experimental results show that the performance of the proposed framework makes it useful for developing data clustering algorithms in a cloud computing environment. |
---|---|
ISSN: | 0743-7315 1096-0848 |
DOI: | 10.1016/j.jpdc.2017.10.020 |
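
The Map and Reduce processes mentioned in the summary can be illustrated with a minimal sketch of one k-means iteration. This is not the paper's framework or its code; it is a plain-Python illustration under stated assumptions: each inner list of `partitions` stands in for the data held by one node, and the function names (`nearest`, `map_partition`, `merge`, `kmeans_step`) are hypothetical. On Spark, the same pattern would presumably be expressed with RDD operations rather than local loops.

```python
from functools import reduce


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def map_partition(points, centroids):
    """Map step (hypothetical): one node summarizes its own partition
    as {cluster index: (coordinate sums, point count)}."""
    partial = {}
    for point in points:
        k = nearest(point, centroids)
        total, count = partial.get(k, ([0.0] * len(point), 0))
        partial[k] = ([t + p for t, p in zip(total, point)], count + 1)
    return partial


def merge(a, b):
    """Reduce step (hypothetical): combine two partial summaries."""
    merged = dict(a)
    for k, (total, count) in b.items():
        if k in merged:
            m_total, m_count = merged[k]
            merged[k] = ([x + y for x, y in zip(m_total, total)], m_count + count)
        else:
            merged[k] = (total, count)
    return merged


def kmeans_step(partitions, centroids):
    """Run one k-means iteration: map every partition, reduce, recompute centroids."""
    partials = [map_partition(part, centroids) for part in partitions]
    summary = reduce(merge, partials, {})
    # A cluster that received no points keeps its previous centroid.
    return [
        [t / summary[k][1] for t in summary[k][0]] if k in summary else list(centroids[k])
        for k in range(len(centroids))
    ]


# Two "nodes", each holding part of the data (illustrative values only).
partitions = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
centroids = [(1.0, 1.0), (9.0, 9.0)]
new_centroids = kmeans_step(partitions, centroids)
```

Only the per-cluster sums and counts leave each partition, so the data never has to fit on a single machine, which is the storage concern the summary raises.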