A Machine Learning based Approach to Identify User Interests from Social Data
Published in | Pattern Recognition and Image Analysis (IPRIA), International Conference on, pp. 1-6 |
---|---|
Main Authors | Tahir, Rida; Naeem, M. Asif |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 21.10.2022 |
ISSN | 2049-3630 |
DOI | 10.1109/INMIC56986.2022.9972956 |
Abstract | Social media platforms such as Twitter, Facebook, and Instagram are a common source of information about individuals, such as their needs, interests, and opinions. Our major contribution in this paper is to identify user interests and desires related to the fashion industry in Pakistan. Since people in Pakistan mostly write tweets and reviews in Roman Urdu, the dataset used in this research comprised Roman Urdu tweets and Google Maps reviews. From the literature, we observed that little work has been done on Roman Urdu tweets and reviews because Roman Urdu is a low-resource language. In terms of methodology, we applied LDA, LSA, and BERT for topic modeling; VADER combined with TextBlob and DistilBERT for sentiment analysis; and K-Means for identifying clusters of users with similar interests. In our experiments, we used 15,000 tweets and 6,000 Google reviews. We were able to create five distinct clusters for each brand. These clusters were further used to track users based on their interests. We evaluated our approach empirically using Cohen's Kappa and achieved a score of 0.45, which indicates moderate agreement between human and machine. |
---|---|
Author | Tahir, Rida; Naeem, M. Asif |
Author_xml | – sequence: 1 givenname: Rida surname: Tahir fullname: Tahir, Rida email: ridatahir38@gmail.com organization: School of Computing, National University of Computer and Emerging Sciences, Islamabad, Pakistan – sequence: 2 givenname: M. Asif surname: Naeem fullname: Naeem, M. Asif email: asif.naeem@nu.edu.pk organization: School of Computing, National University of Computer and Emerging Sciences, Islamabad, Pakistan |
ContentType | Conference Proceeding |
DBID | 6IE 6IL CBEJK RIE RIL |
DOI | 10.1109/INMIC56986.2022.9972956 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
EISBN | 9798350397109 |
EISSN | 2049-3630 |
EndPage | 6 |
ExternalDocumentID | 9972956 |
Genre | orig-research |
GroupedDBID | 6IE 6IF 6IL 6IN AAJGR AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK OCL RIE RIL |
IEDL.DBID | RIE |
IngestDate | Wed Aug 27 02:26:04 EDT 2025 |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
PageCount | 6 |
ParticipantIDs | ieee_primary_9972956 |
PublicationCentury | 2000 |
PublicationDate | 2022-Oct.-21 |
PublicationDateYYYYMMDD | 2022-10-21 |
PublicationDate_xml | – month: 10 year: 2022 text: 2022-Oct.-21 day: 21 |
PublicationDecade | 2020 |
PublicationTitle | Pattern Recognition and Image Analysis (IPRIA), International Conference on |
PublicationTitleAbbrev | INMIC |
PublicationYear | 2022 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssj0003204295 |
SourceID | ieee |
SourceType | Publisher |
StartPage | 1 |
SubjectTerms | BERT; Blogs; Coherence; Industries; K-Means; LDA; Machine learning; Marketing campaigns; Multimedia Web sites; NLP; Sentiment analysis; Social networking (online); Targeted Advertising; Training |
Title | A Machine Learning based Approach to Identify User Interests from Social Data |
URI | https://ieeexplore.ieee.org/document/9972956 |
hasFullText | 1 |
inHoldings | 1 |
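
Note on the evaluation metric: the abstract reports a Cohen's Kappa score of 0.45 between human and machine labels. The record does not include the authors' code or data, so the following is only a minimal sketch of how such a score is typically computed. The function name `cohen_kappa` and the two toy label lists are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal frequencies, summed over all labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # If chance agreement is already 1, both raters used a single identical
    # label everywhere; treat that as perfect agreement.
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels (NOT data from the paper): a human annotator and a
# model each assign a sentiment label to the same ten texts.
human = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
machine = ["pos", "pos", "neg", "pos", "pos", "neg", "neg", "neg", "pos", "pos"]

kappa = cohen_kappa(human, machine)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is lower than the simple fraction of matching labels. On the commonly used Landis and Koch scale, values between 0.41 and 0.60 are described as moderate agreement, which is consistent with how the abstract characterizes its score of 0.45.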