Slides - Personal Web Pages

Download Report

Transcript Slides - Personal Web Pages

On the Lower Bound of Reconstruction Error for Spectral Filtering Based PPDM
Songtao Guo, Xintao Wu, Yingjiu Li
Motivation
Additive noise perturbation model
~
U  U V
Perturbed data
Original data
Attacker’s question:
How close the estimated data using SF is to the original one?
(Upper Bound?)
Noise
Data owner’s question:
How
much
should
be added
to preserve
privacy
at a givendata
tolerated
level?
(Lower
Additive Randomization has been a primary tool to hide sensitive private information during privacy preserving data mining. The previous work
based
onnoise
Spectral
Filtering
empirically
showed
that individual
can be
separated
fromBound?)
the perturbed one and as a res
However, the explicit relation between the effects of perturbation and the accuracy of the reconstructed data still remains as a challenging problem.
 1 || Uˆ  U ||  2
Spectral Filtering
SVD based reconstruction algorithm
~
Estimate individual values of U from the perturbed data U
1.
2.
~
Input: U , a given perturbed data set
V , a noise data set
Output: Û , a reconstructed data
--- H.Kargupta et al. ICDM 2003
~ ~ ~T
~
Apply EVD on the covariance matrix of U : A  QQ
Using random matrix theory, the pair of V min and V max , which provide the theoretical bounds of the eigenvalues
corresponding to the matrix VTV, are obtained.
~
k  max{ i | i  V max }
3. Extract
the
first
k
components
of
A
as
the
principal
components
by
~
~
~
~,e
~ , e
~
e






1
2
k are the corresponding eigenvectors.
2
k are the first k largest eigenvalues of A and
• 1
~
~
~
~
~

• Qk  [e1e2 ek ] forms an orthonormal basis of a subspace .
4.
5.
Find the orthogonal projection on to
~ ~T
Get estimate data set: P~  Qk Qk
~
~
ˆ
: U  UP~
BEGIN
~
~ ~ ~ ~T
U
1 Apply SVD on
to get U  L DR
2 Apply SVD on V and assume  V
max
is the largest singular value
~
3 Determine the first k components of U by k  min{ i | ~i  2 V } 1
k
~~ T
~
~
ˆ
4 Reconstructing the data as U  U k    i li ri (k   )
i 1
END
Lower bound
New strategy to determine k
~ ~ ~ ~T
ˆ
The lower bound of SVD reconstruction U  U k  Lk Dk Rk is
Strategy 1(old):
~
k  max{ i | i  V }
|| U k  U || F || Uˆ  U || F
Strategy 2(new):
~
k  min{ i | i  2V }  1
~  2 }  1
k

min{
i
|

where
i
V
The lower bound of SVD is the lower bound of SF since SVD reconstruction is proved to be equivalent to PCA.
f (k )  Uˆ  U
The lower bound represents the best estimate the attacker can achieve by the spectral filtering technique.
f (k  1)  f (k )  Ve~k 1e~kT1  Ue~k 1e~kT1
Compare with the upper bound (Guo and Wu, SAC06)
|| Ve~ e~
T
k 1 k 1
||  V
2
F
Noise affection
~
Due to     V
~
ˆ
|| U  U || F || U || F
|| Ue~k 1e~kT1 ||2F  k 1
Information gain
, the strategy 2 is approximate optimal.
2 || E || F
 || VP || F
~
(k  || E ||2 )  2 || E || F
T
T
T
E

V
U

U
V

V
V is the derived perturbation on the original covariance matrix A = UTU.
• where
• The upper bound determines how close the estimated data achieved by attackers is from the original one. It imposes
a serious threat of privacy breaches
ECML/PKDD
September 18th-22nd, 2006 Berlin, Germany