Transcript client

SpaceTwist: A Flexible Approach for
Hiding Query User Location
Speaker: Man Lung Yiu
Aalborg University
Joint work with
Christian S. Jensen, Xuegang Huang, Hua Lu
1
Outline





Motivation
Related work
Our solution and its privacy analysis
Experimental results
Conclusions
2
Why location privacy?

Queries in location-based services (LBS)


POI Points-of-interest (e.g., cinema locations)
Nearest neighbor (NN) query



server
Find the closest POI to user location q
Client-server architecture


client
Client (user) sends the point q to server (LBS server)
Server reports the result (i.e., p1) back to client
Server may not be trusted
p1
q
p2
p3 p4
p6
p5
Want to find my result
What should I
do?
Don’t want to leak
my location
3
Related work: spatial cloaking

Extend the point q into a cloaked region Q’




K-anonymous region, trusted anonymizer [Mokbel et al., 2006]
Other types: dummy, obfuscation
Other architectures: peer communication, client itself
Q'
u3
Computes a candidate result set that contains the result of
any possible query location in Q’


Example: candidate set: {p1, p2, p3, p4, p5, p6}
Returns the candidate result set
Disadvantages


Server incurs high processing and communication cost
Requires specialized query processing algorithms, not readily
implemented in existing LBS servers
u2
Anonymizer
Server receives Q’ (instead of q)


q
u1
p1
p2
p6
Q'
p5
p3 p4
LBS server
4
Related work:
transformation-based matching
5
6
0
4



Evaluates the query in a transformed space
No guarantee for the exact result
Theoretical study [Indyk et al., 2006]


7
3
0
2
q
13
7
p3
p1
5
6
12
8
14
12
p2
11
13
1
15
4
8
2
14
10
9
15
11
10
Hilbert transformation [Khoshgozaran et al., 2007]




3
1
A protocol with asymptotic communication cost N
9
Key (H) for specifying the Hilbert ordering, known by client
and a trusted entity but not server
Preprocessing: a trusted entity converts each point p
(e.g., restaurant) to the value H(p), uploads it to server
Query time: client sends H(q) to server, which reports the
closest Hilbert value to H(q), client decodes the reported
value into the result location
Double Hilbert curve, improve result accuracy
drawbacks
5
Features of our solution

Our solution (SpaceTwist)



Fundamental differences from previous approaches



No cloaked region (unlike spatial cloaking)
Query evaluated in the original space (unlike transformation-based matching)
Readily applicable on existing systems



retrieves POI’s from the server incrementally
until the client is guaranteed to have accurate results
Simple client-server architecture (i.e., no trusted components, peers)
Simple server-side query processing: incremental nearest neighbor search
Granular search (optional server-side functionality)

Reduces communication cost but guarantees accuracy bound of results


Spatial cloaking incurs high cost at the server
Transformation-based matching does not offer result accuracy guarantees
6
SpaceTwist: overview
supply space

Anchor location (fake client location)



Define an ordering of points in the space
Client fetches points from server incrementally
Supply space

Space of objects retrieved from the server



Target space guaranteed to cover the actual result


demand space
the beginning
Grows as more objects retrieved
Demand space


Supply space known by both server and client
anchor
Demand space known only by client
supply space
anchor
Shrinks when a “better” result is found
Termination: supply space contains the demand space
demand space
the end
7
Transmission of points

Communication cost




Points are sent from server to client through (TCP/IP) packets
Inefficient to use one packet for one point
Multiple points are packed into a packet before transmission


number of packets received by the client
Packet capacity : number of points in a packet
Actual value of ?


Depends on Maximum Transmission Unit (MTU)
Our experiments: MTU=576 bytes, and =67
8
SpaceTwist: example
p

1

q'
q


Input: user location q, anchor location q’
Client asks server to report points in ascending distance
from anchor q’ iteratively [Hjaltason et al. 1999]


Supply space radius , initially 0



2


q'
q
2nd point
Nearest neighbor distance to user (found so far)
Update  to dist(q,p) when a point p closer to q is found
Supply space covers demand space
Guarantee that exact nearest neighbor of q found
1

Distance of the current reported point from anchor q’
Stop when dist(q,q’) +  ≤ 
p

p
Demand space radius , initially 


Note: server only knows q’ and reported points
1st point
p
1
p
p
2

q
q' 
3rd point
3
9
Privacy analysis
1
m-1
2
m
……

What does the server (malicious attacker) know?




Anchor location q’
Reported points (in reported order): p1, p2, …, pm
Termination condition: dist(q,q’) +  ≤ 
Possible query location qc

Client did not stop at the point p(m-1)


dist(qc, q’) + min{ dist(qc, pi) : i[1,(m-1)] } > dist(q’, p(m-1))
Client stops at the point pm

dist(qc, q’) + min{ dist(qc, pi) : i[1,m] } ≤ dist(q’, pm)

Inferred privacy region : the set of all possible qc

Quantification of privacy

Privacy value: (q, ) = average dist. of location in  from q
10
Visualization of 
User q




A ring with center at q’
Radius approx. dist(q,q’)
What if the server considers searching on
a small sample instead of the whole dataset




Seen points
Visualization with different types of points
Characteristics of  (i.e., possible locations qc)


Anchor q'
=4
Low communication cost
 becomes large at low data density
But less accurate result
How this can be done?
coarser granularity
11
Granular search requirement


Granular search: search POI’s at coarser granularity
Advantages



Reduce communication cost
Enhance location privacy protection
Accuracy requirement


User specifies an error bound 
A point pP is a relaxed NN of q if
dist(q, p)   + min { dist(q, p’) : p’P }
Actual NN distance
12
Granular search




Given an error bound , impose a grid in the
space with cell length  =  / 2
As in incremental search, the server still reports
points in ascending distance from anchor q’
Server discards a data point p if it falls in the
same cell of any reported point
With granular searching (anchor q’)




Server reports p1, client updates its NN to p1
Server discards p2, p3
Server reports p4, client updates its NN to p4
p
p
4
q
p
2
1
p
3
q'

regular grid
Client receives fewer points and has a larger
inferred privacy region 
13
Granular search implementation

Materialization of results not feasible




Error bound  only known at query time
Different users specify different values of 
Data points are indexed by a (disk-based) R-tree on server
We extend the incremental NN search [Hjaltason et al. 1999]



Use a cell list V to keep track of the cells of reported points
Discard entries or points that are fully covered by cells in V
Remove cells in V when they are not useful anymore
p2
c2
c1
p3
p
q' e1 e 5
2
p
1
c3
p4
p6
c5
c4
c6
e3
p7
14
Parameter selection guide


Appropriate parameter values for the user (client)?
Error bound 




Set = vmax  tmax based on
tmax: maximum time delay acceptable by user
vmax: maximum travel speed (walking, cycling, driving)
Anchor point q’

Decide the anchor distance dist(q, q’)



Based on privacy value, i.e., privacy value at least dist(q, q’)
Or, based on acceptable value of m (communication cost)
Set the anchor q’ to a random location with distance
dist(q, q’) from q
15
Experimental study

Our solution GST (Granular SpaceTwist)


Spatial datasets (domain: [0,10000]2)



Two real datasets: SC (172188 pts), TG (556696 pts)
Synthetic uniform random UI datasets
Performance metrics (workload size=100)




Client-side: SpaceTwist ; Server-side: granular search
packet capacity=67,
derived from MTU
Communication cost (in number of packets)
Result error (result NN distance – actual NN distance)
Privacy value of inferred privacy region 
Default parameter values


Anchor distance dist(q,q’): 200
Error bound : 200, Data size N (million): 0.5
16
Transformation-based matching
vs. GST

Hilbert transformation [Khoshgozaran et al., 2007]



SHB: single Hilbert curve
DHB: two orthogonal Hilbert curves
GST computes result with low error


Very low error on real data (skewed) distribution
Stable error for different data distribution
result error
17
Spatial cloaking vs. GST


Our problem setting: no trusted middleware
Competitor: client-side spatial cloaking (CLK)


Trusted third party cloaking not applicable to our problem!
CLK: enlarge q into a square with side length 2*dist(q,q’)


Extent comparable to inferred privacy region  of GST
GST produces result at low communication cost


Low cost even at high privacy
Cost independent of N
varying dist(q,q’)
varying data size N
communication cost (# of packets)
18
Effect of error bound
communication cost
result error
privacy value
19
Effect of anchor distance
communication cost
result error
privacy value
20
Effect of data size (UI data only)
communication cost
result error
privacy value
21
Conclusion

Develop a novel solution for protecting location
privacy of query users



Advantages




SpaceTwist at client
Granular search at server
Low communication cost (due to granular searching)
Low result error
Sufficient privacy protection
Future work


Extension for other location based queries
Road network application
22
References
M. F. Mokbel, C.-Y. Chow, and W. G. Aref. The New Casper:
Query Processing for Location Services without
Compromising Privacy. In VLDB, 2006.
P. Indyk and D.Woodruff. Polylogarithmic Private Approximations
and Efficient Matching. In Theory of Cryptography
Conference, 2006.
A. Khoshgozaran and C. Shahabi. Blind Evaluation of Nearest
Neighbor Queries Using Space Transformation to Preserve
Location Privacy. In SSTD, 2007.
G. R. Hjaltason and H. Samet. Distance Browsing in Spatial
Databases. TODS, 24(2):265–318, 1999.
23