Multidimensional Private Information Portrait in Social Network Users

Bibliographic Details
Published in International journal of advanced computer science & applications Vol. 14; no. 12
Main Authors Shan, Fangfang; Wang, Mengyi; Sun, Huifang
Format Journal Article
Language English
Published West Yorkshire Science and Information (SAI) Organization Limited 2023
Summary: To address users' weak privacy awareness and frequent disclosure of private information in social networks, this paper proposes a multidimensional privacy information portrait model for users of Chinese social networks. Because the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm does not consider the distribution of feature terms among and within classes, the paper uses a TF-IDF algorithm based on the bag-of-words model to calculate the sensitivity of user privacy information. To handle the diversity of user privacy information, the paper proposes the PROLM (positive reverse order lookaround matching) algorithm and combines it with FlashText+ (an improved FlashText) and SMA (a string matching algorithm) to form PROLM_FlashText+_SMA, which extracts a user's personal privacy information, identifies the location where it appears, and returns its sensitivity. A BERT (Bidirectional Encoder Representations from Transformers)-Softmax classification model then classifies the privacy information as high, moderate, or mild, and a multidimensional privacy information portrait of the user is constructed from the privacy information and its sensitivity. Experiments show that PROLM_FlashText+_SMA reaches an accuracy of 93.63% for privacy information extraction, and that the BERT-Softmax model reaches an overall F1 score of 0.9798 on the test set for privacy information classification, outperforming the baseline comparison model.
ISSN: 2158-107X
2156-5570
DOI: 10.14569/IJACSA.2023.0141228
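
For readers unfamiliar with the TF-IDF weighting named in the summary, the following is a minimal sketch of the classic formulation only. It is not the paper's sensitivity computation, whose details this record does not give; the function name `tf_idf` and the sample posts are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Classic TF-IDF: (term count / document length) * log(N / document frequency).
    This is an illustrative assumption, not the variant used in the paper.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, pre-tokenized posts (not data from the paper).
posts = [
    ["my", "phone", "number", "is", "listed"],
    ["my", "home", "address", "is", "listed"],
    ["my", "phone", "is", "new"],
]
scores = tf_idf(posts)
```

In this sketch a term that occurs in every post, such as "my", receives a weight of zero, while a term confined to a single post, such as "address", receives a higher weight than one shared by two posts, such as "phone".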