Transcript client
SpaceTwist: A Flexible Approach for
Hiding Query User Location
Speaker: Man Lung Yiu
Aalborg University
Joint work with
Christian S. Jensen, Xuegang Huang, Hua Lu
1
Outline
Motivation
Related work
Our solution and its privacy analysis
Experimental results
Conclusions
2
Why location privacy?
Queries in location-based services (LBS)
POI Points-of-interest (e.g., cinema locations)
Nearest neighbor (NN) query
server
Find the closest POI to user location q
Client-server architecture
client
Client (user) sends the point q to server (LBS server)
Server reports the result (i.e., p1) back to client
Server may not be trusted
p1
q
p2
p3 p4
p6
p5
Want to find my result
What should I
do?
Don’t want to leak
my location
3
Related work: spatial cloaking
Extend the point q into a cloaked region Q’
K-anonymous region, trusted anonymizer [Mokbel et al., 2006]
Other types: dummy, obfuscation
Other architectures: peer communication, client itself
Q'
u3
Computes a candidate result set that contains the result of
any possible query location in Q’
Example: candidate set: {p1, p2, p3, p4, p5, p6}
Returns the candidate result set
Disadvantages
Server incurs high processing and communication cost
Requires specialized query processing algorithms, not readily
implemented in existing LBS servers
u2
Anonymizer
Server receives Q’ (instead of q)
q
u1
p1
p2
p6
Q'
p5
p3 p4
LBS server
4
Related work:
transformation-based matching
5
6
0
4
Evaluates the query in a transformed space
No guarantee for the exact result
Theoretical study [Indyk et al., 2006]
7
3
0
2
q
13
7
p3
p1
5
6
12
8
14
12
p2
11
13
1
15
4
8
2
14
10
9
15
11
10
Hilbert transformation [Khoshgozaran et al., 2007]
3
1
A protocol with asymptotic communication cost N
9
Key (H) for specifying the Hilbert ordering, known by client
and a trusted entity but not server
Preprocessing: a trusted entity converts each point p
(e.g., restaurant) to the value H(p), uploads it to server
Query time: client sends H(q) to server, which reports the
closest Hilbert value to H(q), client decodes the reported
value into the result location
Double Hilbert curve, improve result accuracy
drawbacks
5
Features of our solution
Our solution (SpaceTwist)
Fundamental differences from previous approaches
No cloaked region (unlike spatial cloaking)
Query evaluated in the original space (unlike transformation-based matching)
Readily applicable on existing systems
retrieves POI’s from the server incrementally
until the client is guaranteed to have accurate results
Simple client-server architecture (i.e., no trusted components, peers)
Simple server-side query processing: incremental nearest neighbor search
Granular search (optional server-side functionality)
Reduces communication cost but guarantees accuracy bound of results
Spatial cloaking incurs high cost at the server
Transformation-based matching does not offer result accuracy guarantees
6
SpaceTwist: overview
supply space
Anchor location (fake client location)
Define an ordering of points in the space
Client fetches points from server incrementally
Supply space
Space of objects retrieved from the server
Target space guaranteed to cover the actual result
demand space
the beginning
Grows as more objects retrieved
Demand space
Supply space known by both server and client
anchor
Demand space known only by client
supply space
anchor
Shrinks when a “better” result is found
Termination: supply space contains the demand space
demand space
the end
7
Transmission of points
Communication cost
Points are sent from server to client through (TCP/IP) packets
Inefficient to use one packet for one point
Multiple points are packed into a packet before transmission
number of packets received by the client
Packet capacity : number of points in a packet
Actual value of ?
Depends on Maximum Transmission Unit (MTU)
Our experiments: MTU=576 bytes, and =67
8
SpaceTwist: example
p
1
q'
q
Input: user location q, anchor location q’
Client asks server to report points in ascending distance
from anchor q’ iteratively [Hjaltason et al. 1999]
Supply space radius , initially 0
2
q'
q
2nd point
Nearest neighbor distance to user (found so far)
Update to dist(q,p) when a point p closer to q is found
Supply space covers demand space
Guarantee that exact nearest neighbor of q found
1
Distance of the current reported point from anchor q’
Stop when dist(q,q’) + ≤
p
p
Demand space radius , initially
Note: server only knows q’ and reported points
1st point
p
1
p
p
2
q
q'
3rd point
3
9
Privacy analysis
1
m-1
2
m
……
What does the server (malicious attacker) know?
Anchor location q’
Reported points (in reported order): p1, p2, …, pm
Termination condition: dist(q,q’) + ≤
Possible query location qc
Client did not stop at the point p(m-1)
dist(qc, q’) + min{ dist(qc, pi) : i[1,(m-1)] } > dist(q’, p(m-1))
Client stops at the point pm
dist(qc, q’) + min{ dist(qc, pi) : i[1,m] } ≤ dist(q’, pm)
Inferred privacy region : the set of all possible qc
Quantification of privacy
Privacy value: (q, ) = average dist. of location in from q
10
Visualization of
User q
A ring with center at q’
Radius approx. dist(q,q’)
What if the server considers searching on
a small sample instead of the whole dataset
Seen points
Visualization with different types of points
Characteristics of (i.e., possible locations qc)
Anchor q'
=4
Low communication cost
becomes large at low data density
But less accurate result
How this can be done?
coarser granularity
11
Granular search requirement
Granular search: search POI’s at coarser granularity
Advantages
Reduce communication cost
Enhance location privacy protection
Accuracy requirement
User specifies an error bound
A point pP is a relaxed NN of q if
dist(q, p) + min { dist(q, p’) : p’P }
Actual NN distance
12
Granular search
Given an error bound , impose a grid in the
space with cell length = / 2
As in incremental search, the server still reports
points in ascending distance from anchor q’
Server discards a data point p if it falls in the
same cell of any reported point
With granular searching (anchor q’)
Server reports p1, client updates its NN to p1
Server discards p2, p3
Server reports p4, client updates its NN to p4
p
p
4
q
p
2
1
p
3
q'
regular grid
Client receives fewer points and has a larger
inferred privacy region
13
Granular search implementation
Materialization of results not feasible
Error bound only known at query time
Different users specify different values of
Data points are indexed by a (disk-based) R-tree on server
We extend the incremental NN search [Hjaltason et al. 1999]
Use a cell list V to keep track of the cells of reported points
Discard entries or points that are fully covered by cells in V
Remove cells in V when they are not useful anymore
p2
c2
c1
p3
p
q' e1 e 5
2
p
1
c3
p4
p6
c5
c4
c6
e3
p7
14
Parameter selection guide
Appropriate parameter values for the user (client)?
Error bound
Set = vmax tmax based on
tmax: maximum time delay acceptable by user
vmax: maximum travel speed (walking, cycling, driving)
Anchor point q’
Decide the anchor distance dist(q, q’)
Based on privacy value, i.e., privacy value at least dist(q, q’)
Or, based on acceptable value of m (communication cost)
Set the anchor q’ to a random location with distance
dist(q, q’) from q
15
Experimental study
Our solution GST (Granular SpaceTwist)
Spatial datasets (domain: [0,10000]2)
Two real datasets: SC (172188 pts), TG (556696 pts)
Synthetic uniform random UI datasets
Performance metrics (workload size=100)
Client-side: SpaceTwist ; Server-side: granular search
packet capacity=67,
derived from MTU
Communication cost (in number of packets)
Result error (result NN distance – actual NN distance)
Privacy value of inferred privacy region
Default parameter values
Anchor distance dist(q,q’): 200
Error bound : 200, Data size N (million): 0.5
16
Transformation-based matching
vs. GST
Hilbert transformation [Khoshgozaran et al., 2007]
SHB: single Hilbert curve
DHB: two orthogonal Hilbert curves
GST computes result with low error
Very low error on real data (skewed) distribution
Stable error for different data distribution
result error
17
Spatial cloaking vs. GST
Our problem setting: no trusted middleware
Competitor: client-side spatial cloaking (CLK)
Trusted third party cloaking not applicable to our problem!
CLK: enlarge q into a square with side length 2*dist(q,q’)
Extent comparable to inferred privacy region of GST
GST produces result at low communication cost
Low cost even at high privacy
Cost independent of N
varying dist(q,q’)
varying data size N
communication cost (# of packets)
18
Effect of error bound
communication cost
result error
privacy value
19
Effect of anchor distance
communication cost
result error
privacy value
20
Effect of data size (UI data only)
communication cost
result error
privacy value
21
Conclusion
Develop a novel solution for protecting location
privacy of query users
Advantages
SpaceTwist at client
Granular search at server
Low communication cost (due to granular searching)
Low result error
Sufficient privacy protection
Future work
Extension for other location based queries
Road network application
22
References
M. F. Mokbel, C.-Y. Chow, and W. G. Aref. The New Casper:
Query Processing for Location Services without
Compromising Privacy. In VLDB, 2006.
P. Indyk and D.Woodruff. Polylogarithmic Private Approximations
and Efficient Matching. In Theory of Cryptography
Conference, 2006.
A. Khoshgozaran and C. Shahabi. Blind Evaluation of Nearest
Neighbor Queries Using Space Transformation to Preserve
Location Privacy. In SSTD, 2007.
G. R. Hjaltason and H. Samet. Distance Browsing in Spatial
Databases. TODS, 24(2):265–318, 1999.
23