Nearest Neighbor Alg for Peer-to-Peer

Download Report

Transcript Nearest Neighbor Alg for Peer-to-Peer

An Efficient Nearest Neighbor (NN) Algorithm
for Peer-to-Peer (P2P) Settings
Ahmed Sabbir Arif
Graduate Student, York University
30 November 2006
1
Introduction

Though Peer-to-peer (P2P) networks are becoming increasingly
popular as a powerful means for data exchange, methods for
accessing complex data such as ‘spatial’ data on P2P networks are
still at their infancy. In my presentation I will demonstrate a new and
efficient algorithm for acquiring complex data off P2P Network.

Organization of this presentation:
Nearest Neighbor (NN) Algorithm
P2P Network
How NN has been used in P2P data acquisition
Problem with traditional strategy
New algorithm
 Comparison





30 November 2006
2
An Heuristic Approach to Travelling Salesman
Problem (TSP)

Nearest Neighbor (NN) was one of the first algorithms used to
determine a solution to the Travelling Salesman Problem (TSP).

It is NP-hard

4 of the algorithm: 2
These are the steps
1.
2.
3.
4.
5.
Pick an arbitrary starting vertex.
Terminate if the current vertex5has no unmarked edges.
Find the unmarked edge from the current vertex with the least
weight and mark it.
1
Select the vertex
at
the
other
end of the edge.
3
6
Repeat from step 2.
30 November 2006
3
An Heuristic Approach to Travelling Salesman
Problem (TSP) cont.

It doesn’t in general compute the optimal result but takes little time
to execute.

Reliable only for special cases of the TSP where the triangle
inequality is satisfied.

In the worst case, the algorithm can compute tours that are by an
arbitrary factor larger than the optimal tour:

For every constant r there is an instance of the TSP such that the length
of the tour computed by the NN algorithm >= (r * length of the optimal
tour)
30 November 2006
4
Peer-2-Peer Network
Client
Server
Client
Server
Client
Server
30 November 2006
5
Quadtree Data Structure



A quadtree is a tree data structure in which each internal node has
up to 4 children.
Used to partition a 2D space by recursively subdividing it into 4
quadrants/regions.
The regions may be square, rectangular or of arbitrary shapes.
30 November 2006
6
Distributed Hash Table (DHT) Chord

A class of decentralized distributed systems that partition ownership
of a set of keys among participating nodes, and can efficiently route
messages to the unique owner of any given key.

Can map peer address too.

Chord adapts efficiently as nodes join, leave.

There is a high probability that it will find a file in O(log n) time for
n peers.
30 November 2006
7
Usual Query Strategy
Chord
routing
algorithm
Control Point
Quad Tree
Chord
Keys
Data storage
Bucket 1
30 November 2006
Data storage
Bucket 2
Data storage
Bucket 3
IP
Data storage
Bucket 4
8
Problems with Usual Strategies





Retrieving data is not obvious because in P2P there is no central
server/administration.
Bad with special data.
Can’t work in parallel (works sequentially).
Approaches with a single root, that increases the possibility of
getting an non-optimal result.
Single point of failure makes all tree operation start from the peer
who stores the root control point.
30 November 2006
9
The Minimum Level ƒmin

Single point of failure makes all tree operation start from the peer
who stores the root control point.

To avoid this ƒmin is introduced


Forces objects to be stored (and query process to start) at a level
l  ƒmin
ƒmax is also introduced to avoid objects being stored at l  ƒmax
30 November 2006
10
A Practical Approach to Parallelism




Have to avoid all-to-all communication.
There remains a possibility where the spatial data structure will
return a closer object rather than the object that is more relevant,
this has to be fixed.
1st iteration:
Can be done:
By maintaining multiple entry point.
 Has a single peer’s point of view.

NN
query
priority queue
a
b
..
..
..
initiates
Q-tree
30 November 2006
11
Worst Case (WC) Criterion

After the 1st iteration we have:

Many blocks (and the peers who maintain them) instead of single peer.
 Have chance to have more NN to be contacted.

2nd iteration decides which ones to be contacted using WC criterion

Ensures that the relevant peers that can still help finding a closer
neighbor for the next nearest neighbor are contacted
 Works even for ƒmin >0
30 November 2006
12
Worst Case (WC) Criterion cont.

Let's assume:




Quire point: q
The current nearest neighbor is at distance: m
Position is the priority queue: t
Element: e

The algorithm computes the maximum distance MaxDist(q, t)
Processes all elements e in the priority queue whose distance
Dist(q,e) < Min(MaxDist(q, t), m)

Goal is to look at two pieces of information:

1.
2.
The first spatial object in the priority queue that, in the worst case,
can be the next NN.
The maximum possible distance from the query point to an object in
the top element of the priority queue.
30 November 2006
13
NNQuery
AA
AB
BA
BB
AC
AD
BC
BD
Control point
X
Y
WC-1
CA
CB
DA
CC
CD
DC
DB
WC-2
Z
DD
Query point
30 November 2006
14
Analysis of this New Algorithm

This new NN algorithm assumes:

Every peer has perfect quadtree with height h.
 There is no way to determine if the peer added before still holds the
data.
 P2P network matches their standard.

Traditional P2P NN Algorithm


Sequential, that’s why Complexity is: O(4h)
This new algorithm is parallel




NN will be found in: O(h) time
Increment rate would be: O(4h/2)
Overall complexity: O(2h)
It’s a big improvement considering P2P network
30 November 2006
15
References
[1] Tanin, E., Nayar, D., and Samet, H. 2005. An efficient nearest neighbor
algorithm for P2P settings, Proceedings of the 2005 National Conference on
Digital Government Research, Atlanta, Georgia, May 15-18, 2005, pages 2128.
[2] G. Kedem. The Quad-CIF Tree: A data structure for hierarchical online
algorithms, Proceedings of the 19th Design Automation Conference, Las
Vegas, NV, June 1982, pages 352-357.
[3] Wikipedia, Available: http://en.wikipedia.org/
30 November 2006
16
Thank you!
Questions
?
30 November 2006
17