Transcript Analysis

Analysis
CS 367 – Introduction to Data Structures
Overview
• Why study data structures?
– biggest reason for a computer is to store,
analyze, and retrieve data
– data structures are used to store data
– type of data structure chosen effects
• amount of data that can be stored
• retrieval speed
• speed of analysis
Data Structures
• Data structure selection involves tradeoffs
– speed vs. capacity vs. flexibility vs. complexity
• Consider an array versus a vector
– which is faster?
– which has a larger storage capacity?
– which has more flexibility in data placement?
– which is more complex?
– which one is better?
• Must always consider which
characteristics are most important for the
data being handled
Algorithms
• Always do some kind of work on data
– at least sort or search through it
• Many different routes to same destination
– consider 10 different people writing the same
program
• Picking the best route requires analysis
– most concerned with how long an algorithm
will take
Analysis
• Computation Complexity
– how much effort and cost is associated with a
specific algorithm
• How do we measure computational
complexity?
– run algorithm on a set of data and time it?
• results depend on speed of computer used
– a PC vs. a Cray supercomputer
• results depend on language algorithm written in
– C vs. Java vs. Basic
Analysis
• Need a mathematical technique to analyze
algorithms
– needs to be free of any real-time units (ms)
– should relate logical time unit to data size
• t = cn
t->logical time, c->constant, n->data size
• notice that time does not need to given in seconds
• Several different techniques have been
suggested
– we will only consider a few of them
Big-O Notation
• Formal, mathematical definition
f(n) is O(g(n)) if there exist positive numbers c and N
such that f(n) ≤ cg(n) for all n ≥ N.
– Wow! That probably makes no sense at all
• Now let’s put it into English
– we want to find some function (g(n)) that is
always greater than or equal to the original
function (f(n))
• for small values of f(n), g(n) may not be equal or
greater
• However, once n reaches a certain size, then g(n)
will always be greater than or equal to f(n)
Big-O Notation
• Still confused? Don’t feel bad, let’s see an
example
– consider the following function:
f(n) = n2 + 2n + 35
– now consider this next function:
• g(n) = n2
– by the previous definition, we need to find a c
and an N that satisfy the following:
• cg(n) ≥ f(n) for all n ≥ N
Big-O Notation
• So let’s find a c and an N
– if c=2 then for what values of n is cg(n) ≥ f(n)?
• solve the following equation for n:
– cn2 ≥ n2 + 2n + 35
• n ≥ 7 (this means N = 7 from previous definition)
• it is clear that there are an infinite number of c
and N possibilities
– what’s important, is g(n) is the same for all of them
– but there’s an infinite number of possible g(n), too
• g(n) = n2, n3, n4, …
– pick the smallest g(n) that works
Big-O Notation
• The previous slide showed that g(n) = n2
– Hence, our Big-O notation is O(n2) for the
function f(n) = n2 + 2n + 35
• Notice that all but the first term of f(n) were
discarded to find g(n)
– this works because for large n, the largest
exponent dominates
– consider n=1, n=5, n=100
Analysis
• So how does all this math help algorithm
analysis?
– consider the following piece of code
for(i=sum=0; i<n; i++)
sum += a[i];
– before loop starts, 2 assignments (i and sum)
– 2 assignments within the loop (sum and i)
– equation relating time to n is:
f(n) = 2 + 2n
– computational complexity is:
O(n)
Analysis
• Consider another example
int rowsum[n];
int firstcolsum;
for(i=firstcolsum=0; i<n; i++) {
firstcolsum += a2d[i][0];
for(j=rowsum[i]=0; j<n; j++)
rowsum[i] += a2d[i][j];
}
– before loop starts: 2 assignments
– inside outer loop: 4 assignments
– inside inner loop: 2 assignments
– equation: f(n) = 2 + 4n + 2n*n = 2 + 4n + 2n2
– computational complexity: O(n2)
Analysis
• Analysis can get much more sophisticated
• Let’s compare 2 array search algorithms
– brute force
• check the first element, then second, third, etc.
– binary search
•
•
•
•
•
requires a sorted array
start at the center and divide the array into two
if less than center, search lower half
if greater than center, search upper half
repeat until the number is found
Brute Force Analysis
• Code
int bruteforce(int [] array, int key) {
for(int i=0; i<array.length; i++) {
if(array[i] == key)
return i;
}
return -1;
• Analysis
– best case: iterations = 1
– worst case: iterations = n
– average case: iterations = n/2
– average computational complexity = O(n)
Binary Search Analysis
• Code
int binarySearch(int [] array, int key) {
int lo = 0, hi = array.length – 1, mid;
while(lo <= hi) {
mid = (lo + hi) / 2;
if(key < array[mid])
hi = mid – 1;
if(key > array[mid])
lo = mid + 1;
else
return mid;
}
return -1;
Binary Search Analysis
• Analysis
– best case: iterations = 1
– worst case: iterations = log2 n
– average case: give equation
• don’t worry about how this was achieved
• see the book if you want details
• it should be clear that calculating complexity is not
always easy or straightforward
– average computational complexity: O(log2 n)
Big-Ω Notation
• Formal, mathematical definition
f(n) is Ω(g(n)) if there exist positive numbers c and N
such that f(n) ≥ cg(n) for all n ≥ N.
– only real difference between Big-Ω and Big-O
is that Big-O is an upper bound on a function
and Big-Ω is a lower bound
– all the descriptions of Big-O apply – just
reverse the “≤” sign in Big-O to a “≥” sign for
Big- Ω
Big-Θ Notation
• Formal, mathematical definition
f(n) is Θ(g(n)) if there exist positive numbers c1, c2, and N such
that c1g(n) ≤ f(n) ≤ c2g(n) for all n ≥ N.
• Again, we consider the following equation
– f(n) = n2 + 2n + 35
• We have shown that this is O(n2)
• It is easy to show that it is also Ω(n2)
– if c=1 and g(n)=n2 then cg(n) ≤ f(n)
• If c1=1, c2=2, and N=7, we satisfy the above
definition and the computational complexity
(using Big-Θ) becomes Θ(n2)
Words of Warning
• There are many other, more sophisticated
methods of computational complexity
• Those shown here do not always paint a
complete picture
• However, these techniques can provide
some good insight into an algorithm