Self-Adaptive Framework for Efficient Stream Data Classification on Storm
Published in | IEEE transactions on systems, man, and cybernetics. Systems Vol. 50; no. 1; pp. 123-136 |
---|---|
Main Authors | Deng, Shizhuo; Wang, Botao; Huang, Shan; Yue, Chuncheng; Zhou, Jianpeng; Wang, Guoren |
Format | Journal Article |
Language | English |
Published | New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2020 |
Abstract | In the era of big data, stream data classification, a typical data stream application, has become increasingly important and challenging. In such applications, classification requests are far more frequent than model training. The rate at which stream data arrive for classification is high and time-varying, so classifying the stream data efficiently with high throughput is an important problem. In this paper, we first analyze and categorize current data stream machine learning algorithms according to their data structures. We then propose a stream data classification topology (SDC-Topology) on Storm. For classification algorithms based on matrices, we propose a self-adaptive stream data classification framework (SASDC-Framework) for efficient stream data classification on Storm. In SASDC-Framework, all data sets arriving within the same unit of time are partitioned into subsets of nearly optimal partition size and processed in parallel. To select the nearly optimal partition size efficiently, we adopt a bisection method strategy and an inverse distance weighted strategy. Extreme learning machine (ELM), a fast and accurate machine learning method based on matrix computation, is used to test the efficiency of our proposals. According to the evaluation results, the throughput of SASDC-Framework is 8-35 times higher than that of SDC-Topology, and the best throughput exceeds 40000 prediction requests per second in our environment. |
---|---|
Author | Huang, Shan; Zhou, Jianpeng; Yue, Chuncheng; Deng, Shizhuo; Wang, Botao; Wang, Guoren |
Author_xml | 1. Deng, Shizhuo (ORCID 0000-0002-6863-8516), School of Computer Science and Engineering, Northeastern University, Shenyang, China; 2. Wang, Botao (ORCID 0000-0002-0186-0219, wangbotao@cse.neu.edu.cn), School of Computer Science and Engineering, Northeastern University, Shenyang, China; 3. Huang, Shan, School of Computer Science and Engineering, Northeastern University, Shenyang, China; 4. Yue, Chuncheng, School of Computer Science and Engineering, Northeastern University, Shenyang, China; 5. Zhou, Jianpeng, School of Computer Science and Engineering, Northeastern University, Shenyang, China; 6. Wang, Guoren, School of Computer Science and Engineering, Northeastern University, Shenyang, China |
CODEN | ITSMFE |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
DOI | 10.1109/TSMC.2017.2757029 |
Discipline | Engineering |
EISSN | 2168-2232 |
EndPage | 136 |
ExternalDocumentID | 10_1109_TSMC_2017_2757029 8082128 |
Genre | orig-research |
GrantInformation_xml | National Natural Science Foundation of China (NSFC), grant IDs U1401256, 61332014, 61332006 (funder ID 10.13039/501100001809); National Natural Science Foundation of China, grant ID 61173030 (funder ID 10.13039/501100001809) |
ISSN | 2168-2216 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 1 |
Language | English |
ORCID | 0000-0002-0186-0219 0000-0002-6863-8516 |
PageCount | 14 |
PublicationDate | 2020-01-01 |
PublicationPlace | New York |
PublicationTitle | IEEE transactions on systems, man, and cybernetics. Systems |
PublicationTitleAbbrev | TSMC |
PublicationYear | 2020 |
Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
StartPage | 123 |
SubjectTerms | Algorithms; Artificial intelligence; Artificial neural networks; Classification; Data structures; Data transmission; Datasets; extreme learning machine (ELM); Learning systems; Machine learning; partition strategy; Partitions; Proposals; Real-time systems; Storm; Storms; stream data; Throughput; Topology; Training |
Title | Self-Adaptive Framework for Efficient Stream Data Classification on Storm |
URI | https://ieeexplore.ieee.org/document/8082128 https://www.proquest.com/docview/2333542991/abstract/ |
Volume | 50 |
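The abstract names two strategies for choosing a nearly best partition size, a bisection method and inverse distance weighting, but does not spell out how either is applied. The following is only a minimal sketch of the two generic techniques, not the authors' implementation. It assumes (hypothetically) that processing cost is unimodal in the partition size, and that previously tuned (data-set size, partition size) pairs are available for interpolation. The names `bisect_partition_size`, `idw_estimate`, and `toy_cost` are illustrative and do not come from the paper.

```python
def bisect_partition_size(cost, low, high):
    """Find the integer partition size in [low, high] that minimizes `cost`.

    Assumption (not from the paper): `cost` is unimodal, i.e. it decreases
    and then increases as the partition size grows.
    """
    while low < high:
        mid = (low + high) // 2
        # If the cost is already rising at mid, the minimum is at or left of mid.
        if cost(mid) <= cost(mid + 1):
            high = mid
        else:
            low = mid + 1
    return low


def idw_estimate(samples, query, power=2.0):
    """Inverse distance weighted estimate at `query`.

    `samples` is a list of (x, value) pairs, here read as
    (data-set size, previously tuned partition size). Nearer samples
    get larger weights: weight = 1 / distance ** power.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, value in samples:
        distance = abs(query - x)
        if distance == 0:
            # Exact match: return the known value directly.
            return float(value)
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def toy_cost(partition_size, records=10_000):
    """Hypothetical cost model: per-partition overhead plus per-record work
    that grows with partition size. Purely illustrative."""
    partitions = -(-records // partition_size)  # ceiling division
    return partitions * 5.0 + partition_size * 0.02


if __name__ == "__main__":
    best = bisect_partition_size(lambda s: (s - 37) ** 2, 1, 1000)
    print("bisection result:", best)
    history = [(1_000, 40), (5_000, 120), (20_000, 400)]
    print("IDW estimate for 8000 records:", idw_estimate(history, 8_000))
```

In a setup like this, the bisection search would be run when no tuning history exists, and the interpolation would give a quick estimate for a new data-set size from earlier results. Whether the paper combines the two in this way is an assumption.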