Parallel Classification of Spatial Points Into Geographical Regions
Published in | 2019 18th International Symposium on Parallel and Distributed Computing (ISPDC), pp. 9-15 |
---|---|
Main Authors | Tarmur, Sanver; Ozturan, Can |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.06.2019 |
DOI | 10.1109/ISPDC.2019.000-3 |
Abstract | The amount of data generated by social media, social networks, and distributed platforms such as blockchain has reached very high levels. Various data analysis methods can be applied to this big data. One of these methods is to classify geo-tagged social network data in order to report the geographical area associated with the data. We propose an efficient parallel classification approach and implement a classifier tool that is capable of processing huge amounts of data. To test our approach, we collect Twitter data over the five densest areas of Turkey. Important factors affecting classification performance include the spatial indexing and parallelization strategies. Hierarchical Triangular Mesh (HTM) and R-Tree spatial indexes are used for indexing regions. For parallel processing of data streams, the classifier tool is implemented on the Apache Spark and Kafka platforms in order to obtain high scalability. To show the effectiveness of our method, we perform tests in the Amazon Web Services (AWS) cloud environment and compare our method against a method that implements HTM on Microsoft SQL Server. Results show that a 1.6- to 4.5-fold speed-up is obtained and that Twitter data collected over a month can be processed effectively in three hours. |
---|---|
Author | Tarmur, Sanver; Ozturan, Can |
Author affiliations | Tarmur, Sanver (Bogazici University); Ozturan, Can (Bogazici University) |
CODEN | IEEPAD |
EISBN | 9781728138015 (ISBN-13); 1728138019 (ISBN-10) |
EndPage | 15 |
ExternalDocumentID | 8790951 |
Genre | orig-research |
PageCount | 7 |
PublicationTitleAbbrev | ISPDC |
StartPage | 9 |
SubjectTerms | apache spark; hierarchical triangular mesh; HTM; Indexing; parallel processing; point classification; Servers; Sparks; Spatial databases; Spatial indexes; spatial indexing; streaming data classification |
URI | https://ieeexplore.ieee.org/document/8790951 |
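The abstract describes classifying geo-tagged points into geographical regions with the help of a spatial index. As a rough illustration only, the Python sketch below shows the filter-and-refine pattern that index-based point classification commonly follows: a cheap bounding-box test prunes candidate regions, and an exact even-odd (ray-casting) point-in-polygon test decides membership. It is not the authors' implementation. It assumes planar (longitude, latitude) coordinates and polygons without holes, uses a flat linear scan over bounding boxes in place of an R-Tree, and does not implement HTM. The names `Region`, `RegionClassifier`, `bounding_box`, and `classify_partition`, and the example polygons, are hypothetical.

```python
"""Filter-and-refine point-to-region classification (illustrative sketch).

Assumptions (not from the paper): planar (x, y) = (longitude, latitude)
coordinates, polygons without holes, and a linear scan over bounding boxes
standing in for an R-Tree. HTM is not implemented here.
"""
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude) treated as planar
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


@dataclass
class Region:
    """A named region bounded by a simple polygon (hypothetical type)."""
    name: str
    polygon: Sequence[Point]  # vertices in order; the closing edge is implicit


def bounding_box(polygon: Sequence[Point]) -> BBox:
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_polygon(p: Point, polygon: Sequence[Point]) -> bool:
    """Even-odd rule: cast a ray in the +x direction and count edge crossings."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


class RegionClassifier:
    """Hypothetical classifier: bounding-box filter, then exact polygon test."""

    def __init__(self, regions: Iterable[Region]) -> None:
        # Precompute every bounding box once; an R-Tree would organize these
        # hierarchically instead of keeping a flat list.
        self._entries: List[Tuple[BBox, Region]] = [
            (bounding_box(r.polygon), r) for r in regions
        ]

    def classify(self, p: Point) -> Optional[str]:
        x, y = p
        for (min_x, min_y, max_x, max_y), region in self._entries:
            # Filter: the cheap box test rejects most non-matching regions.
            if min_x <= x <= max_x and min_y <= y <= max_y:
                # Refine: exact test only for surviving candidates.
                if point_in_polygon(p, region.polygon):
                    return region.name
        return None  # the point falls in no region

    def classify_partition(
        self, points: Iterable[Point]
    ) -> Iterator[Tuple[Point, Optional[str]]]:
        """Lazily label a batch (partition) of points."""
        for p in points:
            yield p, self.classify(p)


if __name__ == "__main__":
    # Hypothetical regions in arbitrary planar units.
    regions = [
        Region("A", [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]),
        Region("B", [(3.0, 0.0), (5.0, 0.0), (4.0, 2.0)]),
    ]
    clf = RegionClassifier(regions)
    batch = [(1.0, 1.0), (4.0, 0.5), (4.9, 1.9), (6.0, 6.0)]
    for point, label in clf.classify_partition(batch):
        print(point, label)
    # (1.0, 1.0) A
    # (4.0, 0.5) B
    # (4.9, 1.9) None   <- inside B's bounding box but outside the triangle
    # (6.0, 6.0) None
```

Because `classify_partition` consumes an iterable and yields results lazily, a function of this shape could in principle be passed to PySpark's `RDD.mapPartitions`, so that a whole partition of points is labeled per call. How the paper actually distributes work across Spark and Kafka, and how its HTM and R-Tree indexes are built, is described in the full text and is not reproduced by this sketch.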