中国科学院机构知识库网格
Chinese Academy of Sciences Institutional Repositories Grid
差分隐私下的一种频繁序列模式挖掘方法

文献类型:期刊论文

作者卢国庆 ; 张啸剑 ; 丁丽萍 ; 李彦峰 ; 廖鑫
刊名计算机研究与发展
出版日期2015
卷号52期号:12页码:2789-2801
关键词频繁序列模式 数据挖掘 差分隐私 隐私保护 前缀树
ISSN号1000-1239
其他题名Frequent Sequential Pattern Mining under Differential Privacy
中文摘要频繁序列模式挖掘是数据挖掘领域的1个基本问题,然而模式本身及其支持度计数都有可能泄露用户隐私信息.差分隐私(differential privacy, DP)作为一种新出现的隐私保护技术,定义了一个相当严格的攻击模型,通过添加噪音使数据失真达到隐私保护的目的.由于序列数据内在序列性和高维度的特点 ,给差分隐私应用于频繁序列模式挖掘带来了挑战.对此提出了一种基于交互式差分隐私保护框架的频繁序列模式挖掘算法Diff-FSPM(differen tial-privacy frequent sequential pattern mining).该算法利用指数机制获取最优序列长度,并采用一种维规约策略获得原始序列数据集的规约表示,有效降低序列维度的影响;应用前缀树压缩频繁 序列模式,利用拉普拉斯机制产生的噪音扰动频繁模式的真实支持度计数,同时采用闭频繁序列模式和Markov假设,有效分配隐私预算,并利用一致性约束后 置处理,增强输出模式的可用性.理论角度证明算法满足epsilon-差分隐私,实验结果验证算法具有较好的可用性.
英文摘要Frequent sequential pattern mining is an exploratory problem in the field of data mining. However, directly releasing the discovered frequent patterns and the corresponding true supports may reveal the individuals' privacy. The state-of-the-art solution for this problem is differential privacy, which offers a strong degree of privacy protection by adding noise. Due to the inherent sequentiality and high-dimensionality in sequences, it is challenging to apply differential privacy to frequent sequential pattern mining. To address those problems, this paper presents a differentially private method called Diff-FSPM that uses an interactive way to find frequent patterns. To reduce the impact of dimensionality, Diff-FSPM first employs the exponential mechanism to obtain the optimal sequential length for truncating each sequence in the original dataset. After that, Diff-FSPM relies on a prefix-tree to compress all the frequent patterns, and then utilizes the Laplace mechanism to perturb the true support of the frequent patterns. To efficiently allocate the privacy budget, Diff-FSPM uses the closet frequent pattern and Markov assumption to guide the mining process. Finally, Diff-FSPM adopts a post-processing technique with consistency constraint to boost the accuracy of the returned noisy support counts. We theoretically prove that Diff-FSPM satisfies epsilon-differential privacy, and the experimental results show that it outperforms the existing methods in terms of utility.
收录类别CSCD
语种中文
CSCD记录号CSCD:5591487
公开日期2016-12-09
源URL[http://ir.iscas.ac.cn/handle/311060/17390]  
专题软件研究所_软件所图书馆_期刊论文
推荐引用方式
GB/T 7714
卢国庆,张啸剑,丁丽萍,等. 差分隐私下的一种频繁序列模式挖掘方法[J]. 计算机研究与发展,2015,52(12):2789-2801.
APA 卢国庆,张啸剑,丁丽萍,李彦峰,&廖鑫.(2015).差分隐私下的一种频繁序列模式挖掘方法.计算机研究与发展,52(12),2789-2801.
MLA 卢国庆,et al."差分隐私下的一种频繁序列模式挖掘方法".计算机研究与发展 52.12(2015):2789-2801.

入库方式: OAI收割

来源:软件研究所

浏览0
下载0
收藏0
其他版本

除非特别说明,本系统中所有内容都受版权保护,并保留所有权利。