A Machine Learning based Approach to Identify User Interests from Social Data

Bibliographic Details
Published in Pattern Recognition and Image Analysis (IPRIA), International Conference on, pp. 1-6
Main Authors Tahir, Rida; Naeem, M. Asif
Format Conference Proceeding
Language English
Published IEEE 21.10.2022
ISSN 2049-3630
DOI 10.1109/INMIC56986.2022.9972956

Abstract Social media platforms such as Twitter, Facebook, and Instagram are a common source for extracting information about individuals, such as their needs, interests, and opinions. Our major contribution in this paper is to identify user interests and desires related to the fashion industry in Pakistan. Since people in Pakistan mostly write tweets and reviews in Roman Urdu, the dataset used in this research consists of Roman Urdu tweets and Google Maps reviews. Our review of the literature showed that little work has been done on Roman Urdu tweets and reviews, because Roman Urdu is a low-resource language. In terms of methodology, we applied LDA, LSA, and BERT for topic modeling; VADER combined with TextBlob, as well as DistilBERT, for sentiment analysis; and K-Means for identifying clusters of users with similar interests. In our experiments, we used 15,000 tweets and 6,000 Google reviews. We were able to create five distinct clusters for each brand, which were then used to track users based on their interests. We evaluated the performance of our approach and validated it empirically using Cohen's kappa, achieving a score of 0.45, which indicates moderate agreement between human and machine.
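The abstract reports agreement between human and machine as a Cohen's kappa score. As a minimal sketch (not the authors' code), the statistic for two annotators labelling the same items can be computed as below. The label values and the helper names `cohens_kappa` and `landis_koch` are hypothetical, and the interpretation bands assume the widely used Landis and Koch scale, under which 0.41 to 0.60 counts as "moderate", consistent with how the abstract describes 0.45.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items on which both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: computed from each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both raters used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Map a kappa value to a verbal band (assumption: Landis and Koch scale)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical sentiment labels from a human annotator and from a model.
human = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"]
machine = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "pos"]

kappa = cohens_kappa(human, machine)
print(kappa, landis_koch(kappa))  # 0.5 moderate
```

Here the raters agree on 6 of 8 items (observed agreement 0.75), but because each rater splits the labels 4/4, chance agreement is already 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5. In practice, scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same unweighted statistic.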
Authors
– sequence 1: Rida Tahir, ridatahir38@gmail.com, School of Computing, National University of Computer and Emerging Sciences, Islamabad, Pakistan
– sequence 2: M. Asif Naeem, asif.naeem@nu.edu.pk, School of Computing, National University of Computer and Emerging Sciences, Islamabad, Pakistan
ContentType Conference Proceeding
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
EISBN 9798350397109
EISSN 2049-3630
EndPage 6
ExternalDocumentID 9972956
Genre orig-research
IsPeerReviewed false
IsScholarly false
Language English
PageCount 6
PublicationCentury 2000
PublicationDate 2022-Oct.-21
PublicationDateYYYYMMDD 2022-10-21
PublicationDecade 2020
PublicationTitle Pattern Recognition and Image Analysis (IPRIA), International Conference on
PublicationTitleAbbrev INMIC
PublicationYear 2022
Publisher IEEE
StartPage 1
SubjectTerms BERT
Blogs
Coherence
Industries
K-Means
LDA
Machine learning
Marketing campaigns
Multimedia Web sites
NLP
Sentiment analysis
Social networking (online)
Targeted Advertising
Training
Title A Machine Learning based Approach to Identify User Interests from Social Data
URI https://ieeexplore.ieee.org/document/9972956