Self-Adaptive Framework for Efficient Stream Data Classification on Storm

Bibliographic Details
Published in IEEE transactions on systems, man, and cybernetics. Systems Vol. 50; no. 1; pp. 123-136
Main Authors Deng, Shizhuo; Wang, Botao; Huang, Shan; Yue, Chuncheng; Zhou, Jianpeng; Wang, Guoren
Format Journal Article
Language English
Published New York IEEE 01.01.2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Abstract In the era of big data, stream data classification, a typical data stream application, has become increasingly significant and challenging. In these applications, data classification is much more frequent than model training. The arrival rate of the stream data to be classified is high and time-varying, so classifying stream data efficiently with high throughput is an important problem. In this paper, we first analyze and categorize current data stream machine learning algorithms according to their data structures. Then, we propose a stream data classification topology (SDC-Topology) on Storm. For matrix-based classification algorithms, we propose a self-adaptive stream data classification framework (SASDC-Framework) for efficient stream data classification on Storm. In SASDC-Framework, all the data sets arriving in the same unit of time are partitioned into subsets of nearly optimal partition size and processed in parallel. To select the nearly optimal partition size efficiently, we adopt a bisection method strategy and an inverse distance weighted strategy. Extreme learning machine (ELM), a fast and accurate machine learning method based on matrix computation, is used to test the efficiency of our proposals. According to the evaluation results, the throughput of SASDC-Framework is 8 to 35 times higher than that of SDC-Topology, and the best throughput exceeds 40,000 prediction requests per second in our environment.
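The abstract names an inverse distance weighted strategy for choosing a partition size but gives no details, so the following is only a generic sketch of inverse distance weighting, not the authors' method. It assumes, purely for illustration, that a few (arrival rate, good partition size) pairs have already been measured and that a size for an unseen arrival rate is wanted. The function name `idw_estimate`, the sample values, and the `power` parameter are all hypothetical.

```python
def idw_estimate(samples, x, power=2.0):
    """Inverse-distance-weighted estimate of y at x.

    samples: list of (x_i, y_i) pairs, here assumed to be
             (arrival rate, measured good partition size).
    power:   exponent applied to the distance; larger values give
             nearby samples more influence.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for xi, yi in samples:
        distance = abs(x - xi)
        if distance == 0:
            # Exact match with a measured point: return its value directly.
            return float(yi)
        weight = 1.0 / distance ** power
        weighted_sum += weight * yi
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical measurements: (requests per second, partition size).
measured = [(1000, 100.0), (3000, 300.0)]
estimated_size = idw_estimate(measured, 2000)
```

An estimate produced this way always lies between the smallest and largest measured values, so in practice it would serve as a starting guess that a search such as bisection could then refine.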
Author Huang, Shan
Zhou, Jianpeng
Yue, Chuncheng
Deng, Shizhuo
Wang, Botao
Wang, Guoren
Author_xml – sequence: 1
  givenname: Shizhuo
  orcidid: 0000-0002-6863-8516
  surname: Deng
  fullname: Deng, Shizhuo
  organization: School of Computer Science and Engineering, Northeastern University, Shenyang, China
– sequence: 2
  givenname: Botao
  orcidid: 0000-0002-0186-0219
  surname: Wang
  fullname: Wang, Botao
  email: wangbotao@cse.neu.edu.cn
  organization: School of Computer Science and Engineering, Northeastern University, Shenyang, China
– sequence: 3
  givenname: Shan
  surname: Huang
  fullname: Huang, Shan
  organization: School of Computer Science and Engineering, Northeastern University, Shenyang, China
– sequence: 4
  givenname: Chuncheng
  surname: Yue
  fullname: Yue, Chuncheng
  organization: School of Computer Science and Engineering, Northeastern University, Shenyang, China
– sequence: 5
  givenname: Jianpeng
  surname: Zhou
  fullname: Zhou, Jianpeng
  organization: School of Computer Science and Engineering, Northeastern University, Shenyang, China
– sequence: 6
  givenname: Guoren
  surname: Wang
  fullname: Wang, Guoren
  organization: School of Computer Science and Engineering, Northeastern University, Shenyang, China
CODEN ITSMFE
CitedBy_id crossref_primary_10_1049_rpg2_12766
crossref_primary_10_1007_s12559_018_9557_x
crossref_primary_10_1109_TSMC_2020_3019531
crossref_primary_10_1016_j_knosys_2021_106749
crossref_primary_10_1007_s13042_020_01158_8
crossref_primary_10_3390_sym12081292
crossref_primary_10_1007_s10586_021_03462_6
crossref_primary_10_1016_j_bdr_2022_100356
crossref_primary_10_1109_TSMC_2021_3102978
crossref_primary_10_1007_s11280_022_01042_1
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
DBID 97E
RIA
RIE
AAYXX
CITATION
7SC
7SP
7TB
8FD
FR3
H8D
JQ2
L7M
L~C
L~D
DOI 10.1109/TSMC.2017.2757029
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Mechanical & Transportation Engineering Abstracts
Technology Research Database
Engineering Research Database
Aerospace Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
DatabaseTitle CrossRef
Aerospace Database
Technology Research Database
Computer and Information Systems Abstracts – Academic
Mechanical & Transportation Engineering Abstracts
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Engineering Research Database
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts Professional
DatabaseTitleList
Aerospace Database
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISSN 2168-2232
EndPage 136
ExternalDocumentID 10_1109_TSMC_2017_2757029
8082128
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China; NSFC
  grantid: U1401256; 61332014; 61332006
  funderid: 10.13039/501100001809
– fundername: National Natural Science Foundation of China
  grantid: 61173030
  funderid: 10.13039/501100001809
ID FETCH-LOGICAL-c293t-6fb2bcf4eca529c8261b3ff3cacbd58184209a0f4931d829c8e13627ad736a243
IEDL.DBID RIE
ISSN 2168-2216
IngestDate Fri Sep 13 02:39:19 EDT 2024
Fri Aug 23 03:29:58 EDT 2024
Wed Jun 26 19:27:36 EDT 2024
IsPeerReviewed true
IsScholarly true
Issue 1
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c293t-6fb2bcf4eca529c8261b3ff3cacbd58184209a0f4931d829c8e13627ad736a243
ORCID 0000-0002-0186-0219
0000-0002-6863-8516
PQID 2333542991
PQPubID 75739
PageCount 14
ParticipantIDs ieee_primary_8082128
proquest_journals_2333542991
crossref_primary_10_1109_TSMC_2017_2757029
PublicationCentury 2000
PublicationDate 2020-Jan.
2020-1-00
20200101
PublicationDateYYYYMMDD 2020-01-01
PublicationDate_xml – month: 01
  year: 2020
  text: 2020-Jan.
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on systems, man, and cybernetics. Systems
PublicationTitleAbbrev TSMC
PublicationYear 2020
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
SourceID proquest
crossref
ieee
SourceType Aggregation Database
Publisher
StartPage 123
SubjectTerms Algorithms
Artificial intelligence
Artificial neural networks
Classification
Data structures
Data transmission
Datasets
extreme learning machine (ELM)
Learning systems
Machine learning
partition strategy
Partitions
Proposals
Real-time systems
Storm
Storms
stream data
Throughput
Topology
Training
Title Self-Adaptive Framework for Efficient Stream Data Classification on Storm
URI https://ieeexplore.ieee.org/document/8082128
https://www.proquest.com/docview/2333542991/abstract/
Volume 50