A parallel metaheuristic data clustering framework for cloud

Bibliographic Details
Published in: Journal of Parallel and Distributed Computing, Vol. 116, pp. 39-49
Main Authors: Tsai, Chun-Wei; Liu, Shi-Jui; Wang, Yi-Chung
Format: Journal Article
Language: English
Publisher: Elsevier Inc., 01.06.2018

Summary: High-performance data analytics for the Internet of Things (IoT) has been a promising research subject in recent years because traditional data mining algorithms may not be applicable to IoT big data. One of the main reasons is that the data to be analyzed may exceed the storage capacity of a single machine. Another critical problem when analyzing data from an IoT system is that the computation cost of the analysis tasks may be too high for a single computer system. For these reasons, this paper presents an efficient data clustering framework for metaheuristic algorithms in a cloud computing environment. The framework explains how to divide the mining tasks of a mining algorithm among different nodes (i.e., the Map process) and then aggregate the mining results from these nodes (i.e., the Reduce process). We further used the proposed framework to implement data clustering algorithms (e.g., k-means, genetic k-means, and particle swarm optimization) on a standalone system and on Spark. The experimental results show that the performance of the proposed framework makes it useful for developing data clustering algorithms in a cloud computing environment.
ISSN: 0743-7315; 1096-0848
DOI: 10.1016/j.jpdc.2017.10.020
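The Map/Reduce split described in the summary can be illustrated with plain k-means. The following is a minimal Python sketch under stated assumptions, not the authors' framework: it assumes the data set is already divided into in-memory partitions that stand in for cluster nodes, and the function names (`map_partition`, `reduce_partials`, `kmeans_step`, `kmeans`) are hypothetical. Each "node" computes per-cluster coordinate sums and point counts over its own points (Map). The partial results are then merged and the new centroids are computed from them (Reduce). The metaheuristic variants mentioned in the summary (genetic k-means, particle swarm optimization) are not shown.

```python
from functools import reduce
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
# Partial result of one partition: per-cluster coordinate sums and point counts.
Partial = Tuple[List[List[float]], List[int]]


def closest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid nearest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_partition(partition: Sequence[Point], centroids: Sequence[Point]) -> Partial:
    """Hypothetical Map step: runs on one node over its local points only."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in partition:
        k = closest(point, centroids)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += point[d]
    return sums, counts


def reduce_partials(a: Partial, b: Partial) -> Partial:
    """Hypothetical Reduce step: merge two partial results element-wise."""
    sums = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    counts = [x + y for x, y in zip(a[1], b[1])]
    return sums, counts


def kmeans_step(partitions: Sequence[Sequence[Point]], centroids: List[Point]) -> List[Point]:
    """One k-means iteration: map over every partition, reduce, recompute centroids."""
    partials = [map_partition(part, centroids) for part in partitions]
    sums, counts = reduce(reduce_partials, partials)
    new_centroids: List[Point] = []
    for k, old in enumerate(centroids):
        if counts[k] == 0:
            new_centroids.append(tuple(old))  # empty cluster: keep its old centroid
        else:
            new_centroids.append(tuple(s / counts[k] for s in sums[k]))
    return new_centroids


def kmeans(partitions: Sequence[Sequence[Point]], centroids: List[Point],
           iterations: int = 10) -> List[Point]:
    """Run a fixed number of map/reduce k-means iterations."""
    for _ in range(iterations):
        centroids = kmeans_step(partitions, centroids)
    return centroids


if __name__ == "__main__":
    # Two partitions standing in for two nodes.
    parts = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
    print(kmeans(parts, [(0.0, 0.0), (10.0, 10.0)], iterations=5))
    # → [(0.0, 0.5), (10.0, 10.5)]
```

Only the small per-partition summaries (k sums and k counts), not the raw points, have to be combined across nodes, which is what makes this split attractive when the data exceed a single machine. On Spark, the same shape could plausibly be expressed with `RDD.mapPartitions` for the per-node step and `RDD.reduce` for merging the partials, though the paper's actual implementation may differ.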