Crowdsourced data leaking user's privacy while using anonymization technique
Published in | Mehran University research journal of engineering and technology Vol. 44; no. 2; pp. 93 - 116 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | Mehran University of Engineering and Technology, 01.04.2025 |
ISSN | 0254-7821 2413-7219 |
DOI | 10.22581/muet1982.2954 |
Summary: | Because big educational data holds considerable value, many research institutes have collected large volumes of student behavioral data. To make full use of that value, the collected data may be shared with third parties, such as data experts worldwide. However, sharing may pose privacy risks to data owners, even though data collectors usually anonymize the data before crowdsourcing. To demonstrate that anonymization alone is insufficient to protect user privacy, we conducted an experimental study using offline and online behavioral traces collected through campus cards and smartphones. Our study demonstrates that a student’s identity can be determined with high probability from anonymized behavioral and payment traces. The analysis shows that only ten features are sufficient to uniquely identify an individual: Transmission Control Protocol (TCP) synchronization attempts, content length, downlink traffic, last acknowledgement packet delay, uplink traffic, cell ID, base station ID, offline payment time (day, hour), online payment time (day, hour, minute), and point of sale ID (POS_ID). Five standard supervised learning classifiers were used to predict user identity: Extra Tree, Bagging, Decision Tree, K-Nearest Neighbor (KNN), and Random Forest. They achieved accuracies of 99.99%, 99.95%, 99.02%, 98.84%, and 99.56%, respectively. |
---|---|
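The re-identification idea in the summary can be illustrated with a minimal sketch. It is not the authors' pipeline: the records, the pseudonyms (`u1`, `u2`, `u3`), the four-feature subset, and the helper names `distance`, `predict`, and `unique_fraction` are all hypothetical, and a plain 1-nearest-neighbor rule with Hamming distance stands in for the five classifiers named above.

```python
from collections import defaultdict

# Hypothetical anonymized traces: (pseudonym, cell_id, pos_id, day, hour).
# All values are invented for illustration; none come from the study.
TRAIN = [
    ("u1", 101, 7, 1, 12),
    ("u1", 101, 7, 2, 12),
    ("u2", 102, 7, 1, 18),
    ("u2", 102, 9, 3, 18),
    ("u3", 103, 4, 1, 8),
    ("u3", 103, 4, 2, 8),
]


def distance(a, b):
    """Hamming distance: number of feature positions that differ."""
    return sum(x != y for x, y in zip(a, b))


def predict(features, train=TRAIN):
    """Return the pseudonym of the closest training record (1-NN)."""
    nearest = min(train, key=lambda row: distance(row[1:], features))
    return nearest[0]


def unique_fraction(rows):
    """Fraction of records whose feature tuple belongs to exactly one pseudonym."""
    owners = defaultdict(set)
    for user, *features in rows:
        owners[tuple(features)].add(user)
    unique = sum(len(owners[tuple(features)]) == 1 for _, *features in rows)
    return unique / len(rows)


if __name__ == "__main__":
    # A new, unlabeled trace that shares cell, POS, and hour with one of u2's records.
    print(predict((102, 9, 2, 18)))
    print(unique_fraction(TRAIN))
```

Even in this toy setting, removing names does not help once the remaining features are distinctive: a new trace is matched to a pseudonym by feature overlap alone, which is the risk the study measures at scale.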