Protecting Privacy When Sharing and Releasing Data with Multiple Records per Person

Bibliographic Details
Published in	Journal of the Association for Information Systems, Vol. 21, no. 6, pp. 1461-1485
Main Authors	Kartal, Hasan; Li, Xiao-Bai
Format	Journal Article
Language	English
Published	Atlanta: Association for Information Systems, 01.01.2020
Subjects	Datasets; Frequency distribution; Information systems; Privacy; Releasing; Risk assessment
Summary:	This study concerns the risks of privacy disclosure when sharing and releasing a dataset in which each individual may be associated with multiple records. Existing data privacy approaches and policies typically assume that each individual in a shared dataset corresponds to a single record, leading to an underestimation of the disclosure risks in multiple records per person scenarios. We propose two novel measures of privacy disclosure to arrive at a more appropriate assessment of disclosure risks. The first measure assesses individual-record disclosure risk based upon the frequency distribution of individuals’ occurrences. The second measure assesses sensitive-attribute disclosure risk based upon the number of individuals affiliated with a sensitive value. We show that the two proposed disclosure measures generalize the well-known k-anonymity and l-diversity measures, respectively, and work for scenarios with either a single record or multiple records per person. We have developed an efficient computational procedure that integrates the two proposed measures and a data quality measure to anonymize the data with multiple records per person when sharing and releasing the data for research and analytics. The results of the experimental evaluation using real-world data demonstrate the advantage of the proposed approach over existing techniques for protecting privacy while preserving data quality.
ISSN:	1536-9323
DOI:	10.17705/1jais.00643
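
The Summary's point that record-based checks can underestimate risk when one person has several records can be illustrated with a toy example. The sketch below is not the authors' measures or their anonymization procedure. It only contrasts counting records with counting distinct persons inside each group of records that share the same quasi-identifier values, and counts how many distinct persons are tied to each sensitive value. All field names (`pid`, `zip`, `age`, `dx`), function names, and data are hypothetical.

```python
from collections import defaultdict

def group_by_quasi_ids(records, quasi_ids):
    """Group records into equivalence classes by their quasi-identifier values."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[q] for q in quasi_ids)].append(rec)
    return groups

def record_level_k(records, quasi_ids):
    """Classic k-anonymity view: size of the smallest class, counted in records."""
    groups = group_by_quasi_ids(records, quasi_ids)
    return min(len(g) for g in groups.values())

def person_level_k(records, quasi_ids, person_key):
    """Person-aware view: size of the smallest class, counted in distinct persons."""
    groups = group_by_quasi_ids(records, quasi_ids)
    return min(len({rec[person_key] for rec in g}) for g in groups.values())

def persons_per_sensitive_value(records, quasi_ids, sensitive, person_key):
    """Count distinct persons tied to each sensitive value within each class."""
    persons = defaultdict(set)
    for key, group in group_by_quasi_ids(records, quasi_ids).items():
        for rec in group:
            persons[(key, rec[sensitive])].add(rec[person_key])
    return {k: len(v) for k, v in persons.items()}

# Hypothetical toy data: person A contributes three records to one class.
records = [
    {"pid": "A", "zip": "30301", "age": "30-39", "dx": "flu"},
    {"pid": "A", "zip": "30301", "age": "30-39", "dx": "asthma"},
    {"pid": "A", "zip": "30301", "age": "30-39", "dx": "flu"},
    {"pid": "B", "zip": "30302", "age": "40-49", "dx": "flu"},
    {"pid": "C", "zip": "30302", "age": "40-49", "dx": "diabetes"},
]
QUASI_IDS = ("zip", "age")

print(record_level_k(records, QUASI_IDS))         # 2
print(person_level_k(records, QUASI_IDS, "pid"))  # 1
print(persons_per_sensitive_value(records, QUASI_IDS, "dx", "pid"))
```

On this toy data the record-level count is 2 while the person-level count is 1, because the three records in the first class all belong to one person. That class also shows two distinct sensitive values, yet each is tied to the same single person.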