Grey maximum distance to average vector based on quasi identifier attribute

Bibliographic Details
Published in	2017 International Conference on Grey Systems and Intelligent Services (GSIS) p. 119
Main Authors	Zhang, Qishan; Liu, Hong; Wu, Yingyi; Lin, Chuyue; Lin, Gongcheng
Format	Conference Proceeding
Language	English
Published	IEEE 01.08.2017
Subjects	Data privacy; Euclidean distance; Loss measurement; Privacy; Safety; Sensitivity; Weight measurement

Summary:	In the era of big data we are surrounded by many kinds of data, interactions between data sources are increasingly frequent, and this raises personal privacy issues. In practice, attackers can infer the identity of the person to whom sensitive information belongs by aggregating data from other sources. Privacy leakage causes inconvenience in personal life and can even lead to loss of property and threats to personal safety. How to protect private information effectively while it is exchanged and shared has therefore become an active research topic. k-anonymity is an effective privacy-preserving method: a k-anonymous model prevents personal identity from being directly identified and thus makes it difficult to determine the owner of sensitive information. However, k-anonymity places no constraint on the distribution of sensitive values within equivalence classes, which leaves it vulnerable to homogeneity attacks and background knowledge attacks and can leak some sensitive attribute values. MDAV (Maximum Distance to Average Vector) is one of the algorithms for k-anonymous models. MDAV uses Euclidean distance to measure the homogeneity between records, which treats all attributes equally and masks their differing importance. In practice, however, attributes differ in importance and need to be treated differently. Because attribute importance in MDAV affects the risk of privacy disclosure, the existing literature mainly relies on either (I) subjective measurement of attribute importance or (II) Euclidean distance as the homogeneity measure, which treats each attribute equally. The subjective approach is not easy to realize, and the equal-weight approach ignores the differences in attribute importance, which affects the effectiveness of MDAV.
In this paper, from the perspective of quasi-identifier attributes, grey relational analysis is introduced to improve the distance measure, and a novel algorithm, GMDAV (Grey Maximum Distance to Average Vector), is proposed for k-anonymity. For the distance between tuples, considering both the importance of the quasi-identifier attributes and the similarity of the important attributes, a comprehensive measure is proposed: a weighted Euclidean distance whose attribute weights are determined by grey relational analysis. The information loss of GMDAV must likewise be evaluated according to attribute importance. MDAV is usually evaluated with the IL model, which treats the loss of all attributes equally and therefore cannot test the validity of GMDAV. Building on the IL model and taking attribute importance into account, an attribute information loss model based on grey weights (AIL) is put forward. Finally, experiments were conducted on three classical datasets (Tarragona, Census and EIA), with AIL and DLD (Distance Linked Disclosure) used for evaluation. Under AIL, GMDAV's information loss is lower than MDAV's on all three datasets, and the magnitude of information loss grows with dataset size (Tarragona < Census < EIA). Under DLD, the privacy disclosure risk of both algorithms gradually decreases as k increases, and GMDAV achieves a lower disclosure risk than MDAV, because GMDAV focuses on reducing the information loss of the important attributes, while the distortion of less important attributes may be larger. The experimental results show that GMDAV can effectively reduce information loss and reduce the overall disclosure risk of the important attribute values.
ISBN:	1509066675 9781509066674
ISSN:	2166-9449
DOI:	10.1109/GSIS.2017.8077683
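
Illustrative sketch (not taken from the paper): the summary describes GMDAV only in prose, so the Python below is a minimal, hedged illustration of the general idea, namely MDAV microaggregation driven by a weighted Euclidean distance whose attribute weights come from grey relational grades. Everything specific here is an assumption: the function names `grey_relational_weights`, `weighted_sq_dist` and `mdav` are hypothetical, the choice of a sensitive attribute as the reference sequence is a guess, the distinguishing coefficient `rho = 0.5` is only the conventional default, and the grouping loop follows the commonly described fixed-size MDAV procedure rather than the authors' exact algorithm.

```python
import numpy as np


def _min_max(a):
    # Scale each sequence to [0, 1]; constant sequences are left at 0.
    lo, hi = a.min(axis=0), a.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (a - lo) / span


def grey_relational_weights(X, reference, rho=0.5):
    """Normalised grey relational grades of each column of X w.r.t. reference.

    ASSUMPTION: the reference sequence and rho are illustrative choices;
    the summary does not say how the paper derives its weights.
    """
    X = np.asarray(X, dtype=float)
    ref = np.asarray(reference, dtype=float)
    delta = np.abs(_min_max(X) - _min_max(ref)[:, None])
    dmin, dmax = delta.min(), delta.max()
    if dmax == 0:  # every attribute coincides with the reference
        return np.full(X.shape[1], 1.0 / X.shape[1])
    coeff = (dmin + rho * dmax) / (delta + rho * dmax)
    grade = coeff.mean(axis=0)  # one grade per attribute
    return grade / grade.sum()


def weighted_sq_dist(X, point, w):
    # Squared weighted Euclidean distance from every row of X to point.
    return ((X - point) ** 2 * w).sum(axis=1)


def mdav(X, k, w):
    """Fixed-size MDAV grouping; returns a list of index arrays of size >= k."""
    X = np.asarray(X, dtype=float)
    if len(X) < k:
        raise ValueError("need at least k records")
    remaining = np.arange(len(X))
    groups = []

    def farthest_from(point):
        d = weighted_sq_dist(X[remaining], point, w)
        return remaining[np.argmax(d)]

    def take_group(seed):
        # Group the k remaining records closest to the seed record.
        nonlocal remaining
        d = weighted_sq_dist(X[remaining], X[seed], w)
        chosen = remaining[np.argsort(d, kind="stable")[:k]]
        remaining = np.setdiff1d(remaining, chosen)
        groups.append(chosen)

    while len(remaining) >= 3 * k:
        r = farthest_from(X[remaining].mean(axis=0))  # farthest from centroid
        s = farthest_from(X[r])                       # farthest from r
        take_group(r)
        take_group(s)
    if len(remaining) >= 2 * k:
        take_group(farthest_from(X[remaining].mean(axis=0)))
    groups.append(remaining)  # last k .. 2k-1 records form one group
    return groups


# Usage on synthetic data (hypothetical; not one of the paper's datasets).
rng = np.random.default_rng(0)
X = rng.normal(size=(23, 3))
sensitive = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=23)
w = grey_relational_weights(X, sensitive)
groups = mdav(X, k=3, w=w)
anonymized = X.copy()
for g in groups:
    anonymized[g] = X[g].mean(axis=0)  # replace records by group centroid
```

Setting every weight to `1 / d` reduces the distance to the plain (scaled) Euclidean distance of standard MDAV, which makes the two variants easy to compare on the same data. An AIL-style evaluation would presumably weight each attribute's loss in the same way, but the summary gives no formula, so none is sketched here.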