Parallel Classification of Spatial Points Into Geographical Regions

Bibliographic Details
Published in 2019 18th International Symposium on Parallel and Distributed Computing (ISPDC), pp. 9-15
Main Authors Tarmur, Sanver; Ozturan, Can
Format Conference Proceeding
Language English
Published IEEE 01.06.2019
DOI 10.1109/ISPDC.2019.000-3

Abstract The amount of data generated by social media, social networks and distributed platforms such as blockchain has reached very high levels. Various data analysis methods can be applied to this big data. One of these methods is to classify geo-tagged social network data in order to report the geographical area associated with the data. We propose an efficient parallel classification approach and implement a classifier tool that is capable of processing huge amounts of data. To test our approach, we collect Twitter data over the five densest areas of Turkey. Important factors affecting the classification performance include the spatial indexing and parallelization strategies. Hierarchical Triangular Mesh (HTM) and R-Tree spatial indexes are used for indexing regions. For parallel processing of data streams, the classifier tool is implemented on the Apache Spark and Kafka platforms in order to obtain high scalability. To show the effectiveness of our method, we perform tests in the Amazon Web Services (AWS) cloud environment and compare our method against a method that implements HTM on a Microsoft SQL Server. Results show that a 1.6- to 4.5-fold speed-up is obtained and that Twitter data collected over a month can be processed effectively in three hours.
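
The classification step described in the abstract, assigning each geo-tagged point to the region that contains it, can be illustrated with a minimal single-process sketch. This is not the authors' implementation: it uses a plain bounding-box check as a stand-in for the HTM and R-Tree indexes, a standard ray-casting point-in-polygon test, and no Spark or Kafka. The names `Region`, `point_in_polygon` and `classify`, as well as the region names and coordinates, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude); assumed planar for this sketch


@dataclass
class Region:
    name: str
    polygon: List[Point]  # vertices in order, first vertex not repeated at the end

    def bbox(self) -> Tuple[float, float, float, float]:
        """Axis-aligned bounding box: (min_x, min_y, max_x, max_y)."""
        xs = [p[0] for p in self.polygon]
        ys = [p[1] for p in self.polygon]
        return min(xs), min(ys), max(xs), max(ys)


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray casting: toggle 'inside' for each edge crossed by a ray going right from pt."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # The edge straddles the horizontal line through pt (so y1 != y2 here).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def classify(pt: Point, regions: List[Region]) -> Optional[str]:
    """Return the name of the first region containing pt, or None."""
    for region in regions:
        min_x, min_y, max_x, max_y = region.bbox()
        # Cheap bounding-box filter; a real spatial index would avoid this linear scan.
        if min_x <= pt[0] <= max_x and min_y <= pt[1] <= max_y:
            if point_in_polygon(pt, region.polygon):
                return region.name
    return None


# Hypothetical rectangular regions and sample points.
regions = [
    Region("region_a", [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]),
    Region("region_b", [(3.0, 0.0), (5.0, 0.0), (5.0, 2.0), (3.0, 2.0)]),
]
points = [(1.0, 1.0), (4.0, 1.5), (2.5, 1.0)]
labels = [classify(p, regions) for p in points]
print(labels)
```

Because each point is classified independently of the others, the per-point function is the kind of work that can be spread across partitions of a data stream, which is presumably what makes the problem a good fit for a parallel platform.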
Author Tarmur, Sanver
Ozturan, Can
Author_xml – sequence: 1
  givenname: Sanver
  surname: Tarmur
  fullname: Tarmur, Sanver
  organization: Bogazici University
– sequence: 2
  givenname: Can
  surname: Ozturan
  fullname: Ozturan, Can
  organization: Bogazici University
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/ISPDC.2019.000-3
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
EISBN 9781728138015
1728138019
EndPage 15
ExternalDocumentID 8790951
Genre orig-research
GroupedDBID 6IE
6IL
CBEJK
RIE
RIL
IEDL.DBID RIE
IngestDate Thu Jun 29 18:39:04 EDT 2023
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
PageCount 7
ParticipantIDs ieee_primary_8790951
PublicationCentury 2000
PublicationDate 2019-Jun
PublicationDateYYYYMMDD 2019-06-01
PublicationDate_xml – month: 06
  year: 2019
  text: 2019-Jun
PublicationDecade 2010
PublicationTitle 2019 18th International Symposium on Parallel and Distributed Computing (ISPDC)
PublicationTitleAbbrev ISPDC
PublicationYear 2019
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 9
SubjectTerms apache spark
hierarchical triangular mesh
HTM
Indexing
parallel processing
point classification
Servers
Sparks
Spatial databases
Spatial indexes
spatial indexing
streaming data classification
Twitter
Title Parallel Classification of Spatial Points Into Geographical Regions
URI https://ieeexplore.ieee.org/document/8790951