Transcript slides

Lecture 3
Nearest Neighbor Algorithms
Shang-Hua Teng
What is Algorithm?
• A computable set of steps to achieve a
desired result from a given input
• Example:
– Input: An array A of n numbers
n
– Desired result
 ak
a1 a2  an
k 1
• Pseudo-code of Algorithm SUM
Pseudo-code of Algorithm SUM
Algorithm SUM  A  a1
a2  an 
s  a1
for k  2 to n
s  s  ak
return s
Complexity:
• Input Size n
• Number of steps: n-1 additions
Example 2:
Integer Multiplication
c=ab
• When do we need to multiply two very
large numbers?
– In Cryptography and Network Security
• message as numbers
• encryption and decryption need to multiply numbers
How to multiply 2 n-bit numbers

************
************
************
************
************
************
************
************
************
************
************
************
************
************
************************
Input Size : 2n bits
Complexity
2
n bit operations
Asymptotic Notation of
Complexity
• as input size grows, how fast does the running
time grow.
– T1(n) = 100 n
– T2(n) = n2
•
•
•
•
Which algorithm is better?
When n < 100 is small then T2 is smaller
As n becomes larger, T2 grows much faster
To solve large-scale problem, algorithm1 is
preferred.
Asymptotic Notation
(Removing the constant factor)
• The Q Notation
Q(g(n)) = { f(n): there exist positive c1 and c2 and
n0 such that 0  c1g (n)  f (n)  c2 g (n)
for all n > n0}
• For example T(n) = 4nlog n + n = Q(nlog n)
• For example n – 1 = Q(n)
Asymptotic Notation
(Removing the constant factor)
• The Big-O Notation
O(g(n)) = { f(n): there exist positive c and
n0 such that
0  f (n)  cg (n)
for all n > n0}
• For example T(n) = 4nlog n + n = O(nlog n)
• But also T(n) = 4nlog n + n = O(n2)
Nearest Neighbor Problem:
General Formulation
Input : a set of n points in d dimensions
P  p1 p2  pn 
Output :
For each point p  P the nearest point to p
Nearest Neighbor Problem
Applications
• Points could be web-page, closest neighbor
is the most similar web-page
• Points could be people, closest neighbor
could be the best friend
• Points could be biological spices, the closest
neighbor could be the closest spice
• …
An
2
O(dn )
time Algorithm
Algorithm NN P   p1
p2  pn 
for all i  [1, n], j  [1, n]
compute d i,j   ||p i - p j ||
for i  1 to n
dist i   
for j  1 to n
if i  j and d i,j   dist [i ]
then
dist [i ]  d i,j ; NN [i ]  j
return NN , dist
Why O(dn2) time?
Can We do better?
• Yes, Handout #4, by Jon Louis Bentley
When d  1, On log n  time

For d  1, O n log
d -1

n time
One-Dimensional Geometry
If we can order points from small to large,
then we just need to look at the left
neighbor and right neighbor of each point to
find its nearest neighbor
Reduce to Sorting
• Input: Array A[1...n], of elements in an
arbitrary order; array size n
Output: Array A[1...n] of the same elements,
but in the non-decreasing order
Divide and Conquer
• Divide the problem into a number of
sub-problems (similar to the original
problem but smaller);
• Conquer the sub-problems by solving them
recursively (if a sub-problem is small
enough, just solve it in a straightforward
manner.
• Combine the solutions to the sub-problems
into the solution for the original problem
Algorithm Design Paradigm I
• Solve smaller problems, and use solutions to the
smaller problems to solve larger ones
– Divide and Conquer
• Correctness: mathematical induction
Merge Sort
• Divide the n-element sequence to be sorted into
two subsequences of n/2 element each
• Conquer: Sort the two subsequences
recursively using merge sort
• Combine: merge the two sorted subsequences
to produce the sorted answer
• Note: during the recursion, if the subsequence
has only one element, then do nothing.
Merge-Sort(A,p,r)
A procedure sorts the elements in the sub-array
A[p..r] using divide and conquer
• Merge-Sort(A,p,r)
– if p >= r, do nothing
– if p< r then q  ( p  r ) / 2
• Merge-Sort(A,p,q)
• Merge-Sort(A,q+1,r)
• Merge(A,p,q,r)
• Starting by calling Merge-Sort(A,1,n)
A = MergeArray(L,R)
Assume L[1:s] and R[1:t] are two sorted arrays
of elements: Merge-Array(L,R) forms a single
sorted array A[1:s+t] of all elements in L and R.
• A = MergeArray(L,R)
– L[ s  1]  ; R[t  1]  
– i  1; j  1
– for k 1 to s + t
• do if L[i]  R[ j ]
– then A[k ]  L[i ]; i  i  1
– else A[k ]  R[ j ]; j  j  1
Complexity of MergeArray
• At each iteration, we perform 1 comparison, 1
assignment (copy one element to A) and 2
increments (to k and i or j )
• So number of operations per iteration is 4.
• Thus, Merge-Array takes at most 4(s+t) time.
• Linear in the size of the input.
Merge (A,p,q,r)
Assume A[p..q] and A[q+1..r] are two sorted
Merge(A,p,q,r) forms a single sorted array A[p..r].
• Merge (A,p,q,r)
–
–
–
–
s  q - p  1; t  r - q;
L  A[ p..q]; R  A[q  1, r ]
L[ s  1]  ; R[t  1]  
A[ p..r ]  MergeArray ( L, R)
Merge-Sort(A,p,r)
A procedure sorts the elements in the sub-array
A[p..r] using divide and conquer
• Merge-Sort(A,p,r)
– if p >= r, do nothing
– if p< r then q  ( p  r ) / 2
• Merge-Sort(A,p,q)
• Merge-Sort(A,q+1,r)
• Merge(A,p,q,r)
Running Time of Merge-Sort
• Running time as a function of the input size,
that is the number of elements in the array A.
• The Divide-and-Conquer scheme yields a
clean recurrences.
• Assume T(n) be the running time of mergesort for sorting an array of n elements.
• For simplicity assume n is a power of 2, that
is, there exists k such that n = 2k .
Recurrence of T(n)
• T(1) = 1
• for n > 1, we have
T (n)  2T (n / 2)  4n
1

T ( n)  
2T (n / 2)  4n
if n = 1
if n > 1
Solution of Recurrence of T(n)
T(n) = 4 nlog n + n = O(nlog n)
• Picture Proof by Recursion Tree