Privacy-preserving data publishing for cluster analysis

Fung, Benjamin C.M., Wang, Ke, Wang, Lingyu and Hung, Patrick C.K. (2009) Privacy-preserving data publishing for cluster analysis. Data & Knowledge Engineering, 68 (6). pp. 552-575. ISSN 0169023X

Official URL: http://dx.doi.org/10.1016/j.datak.2008.12.001


Releasing person-specific data could reveal sensitive information about individuals. k-anonymization is a promising privacy protection mechanism in data publishing. Although substantial research has been conducted on k-anonymization and its extensions in recent years, only a few prior works have considered releasing data for a specific data analysis purpose. This paper presents a practical data publishing framework for generating a masked version of data that preserves both individual privacy and information usefulness for cluster analysis. Experiments on real-life data suggest that focusing on preserving cluster structure during masking yields significantly better cluster quality than masking without such focus. The major challenge of masking data for cluster analysis is the lack of class labels that could guide the masking process. Our approach converts the problem into its counterpart for classification analysis, in which class labels encode the cluster structure of the data, and presents a framework for evaluating cluster quality on the masked data.
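
The abstract describes the framework only at a high level: cluster the raw data, reuse the cluster structure as class labels, mask the data under k-anonymity while trying to preserve those labels, then evaluate clustering on the masked data. The following is a minimal Python sketch of that general idea under our own assumptions; it is not the authors' algorithm. The names `mask_for_clustering`, `candidate_widths`, and `purity` are hypothetical, the paper's label-guided masking is replaced by a brute-force search over fixed-width interval recodings of numeric quasi-identifiers, label purity is an assumed stand-in for the paper's information metric, and the Rand index is used as a simple stand-in measure of how well the masked data reproduces the original clusters.

```python
from collections import Counter, defaultdict
from itertools import combinations, product


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means, seeded deterministically with the first k distinct points."""
    centroids = [list(p) for p in list(dict.fromkeys(points))[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of its members (left in place if empty).
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def generalize(points, widths):
    """Global recoding: replace each value by the lower bound of its width-sized interval."""
    return [tuple(w * (v // w) for v, w in zip(p, widths)) for p in points]


def is_k_anonymous(masked, k):
    """Every combination of generalized quasi-identifier values must occur at least k times."""
    return all(n >= k for n in Counter(masked).values())


def purity(masked, labels):
    """Hypothetical score: fraction of records whose group's majority label is their own."""
    groups = defaultdict(list)
    for q, lab in zip(masked, labels):
        groups[q].append(lab)
    return sum(Counter(ls).most_common(1)[0][1] for ls in groups.values()) / len(labels)


def rand_index(a, b):
    """Pair-counting agreement between two clusterings (1.0 means identical partitions)."""
    pairs = list(combinations(range(len(a)), 2))
    return sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs) / len(pairs)


def mask_for_clustering(points, n_clusters, k, candidate_widths):
    # Steps 1-2: cluster the raw records and treat the cluster IDs as class labels.
    labels = kmeans(points, n_clusters)
    best = None
    # Step 3: among k-anonymous recodings, keep the one whose groups best preserve the labels.
    for widths in product(*candidate_widths):
        masked = generalize(points, widths)
        if is_k_anonymous(masked, k):
            score = purity(masked, labels)
            if best is None or score > best[0]:
                best = (score, widths, masked)
    if best is None:
        raise ValueError("no candidate widths achieve k-anonymity")
    _, widths, masked = best
    # Step 4: re-cluster the masked data (interval midpoints) and compare with the original.
    midpoints = [tuple(v + w / 2 for v, w in zip(q, widths)) for q in masked]
    return widths, masked, rand_index(labels, kmeans(midpoints, n_clusters))


if __name__ == "__main__":
    # Toy (age, hours-per-week) records with two obvious groups.
    points = [(22, 40), (58, 20), (24, 45), (26, 42), (28, 48), (52, 22), (54, 25), (56, 28)]
    widths, masked, score = mask_for_clustering(
        points, n_clusters=2, k=2, candidate_widths=[(5, 10, 30), (5, 10, 20)])
    print(widths, score)  # (5, 10) 1.0
```

Listing finer widths first and replacing the current choice only on a strictly better score means ties go to the least generalized recoding. A realistic implementation would instead refine taxonomy trees or interval hierarchies attribute by attribute rather than enumerate a small grid.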

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Concordia Institute for Information Systems Engineering
Item Type: Article
Authors: Fung, Benjamin C.M. and Wang, Ke and Wang, Lingyu and Hung, Patrick C.K.
Journal or Publication: Data & Knowledge Engineering
Digital Object Identifier (DOI): 10.1016/j.datak.2008.12.001
Keywords: privacy, knowledge discovery, anonymity, cluster analysis
ID Code: 36251
Deposited On: 22 Dec 2011 18:55
Last Modified: 18 Jan 2018 17:36
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
