Transcript Lecture 5

CS503: Fifth Lecture, Fall 2008
Recursion and Linked Lists.
Michael Barnathan
Here’s what we’ll be learning:
• Theory:
–
–
–
–
Recursion.
Recursive data structures.
The “Divide and Conquer” paradigm.
Memoization (as in “memo” + “ization”).
• Data Structures:
– Linked Lists.
• We are going to keep coming back to recursion
throughout the semester.
– But it should be easier for you each time we cover it.
– We’ll stop covering it when you’re sufficiently familiar with it.
Recursion: Definition
• A function is recursive if it calls itself as a step in solving a
problem.
– Why would we want to do this?
• A data structure is recursive if it can be defined in terms of
a smaller version of itself.
• Recursion is used when the problem can be broken down
into smaller instances of the same problem.
– Or “subproblems”.
– These subproblems are more easily solved.
• Often by breaking them into even smaller subproblems.
• Of course, at some point, we have to stop.
• The solutions to these subproblems are then merged into a
solution for the whole problem.
Recursive Structures: Example.
• Before we discussed sorting, I asked you how you would
sort a 3 element array.
• We couldn’t figure that out immediately, so I asked how to
do it on a 2-element array.
– Compare the elements and swap.
• Then I asked you how to extend that to a 3-element array.
– Do two comparisons.
• A size-n array can be recursively defined as a single element
and an array of size n-1.
– The sorting problem was easier to solve for small arrays.
– By extending the (easier) problem from small to large, we came
up with a general sorting algorithm (bubble sort).
Recursion: Why?
• Oftentimes, a problem can be solved by reducing it to a smaller
version of itself.
• If your solution is a function, you can call that function inside of
itself to solve the smaller problems.
• There are two components of a recursive solution:
– The reduction: this generates a solution to the larger problem through
solution of the smaller problems.
– The base case: this solves the problem outright when it’s “small
enough”.
• The key:
– You won’t be able to follow what is going on in every recursive call at
once; don’t think of it this way.
– Instead, think of a recursive function as a reduction procedure for
reducing a problem to a smaller instance of itself. Only when the
problem is very small do you attempt a direct solution.
Example:
•
Using a loop, print all of the numbers from 1 to n.
void printTo(int n) {
for (int i =1; i <= n; i++)
System.out.println(i);
}
•
Now do it using recursion.
– Stop and think before coding. Never rush into a recursive algorithm.
– Problem: Print every number from 1 to n.
– How can we break this problem up?
•
•
Print every number from 1 to n-1, then print n.
“Print n” is easy: System.out.println(n);
– How can we print every number from 1 to n-1?
– How about calling our solution with n-1?
•
This is going to call it with n-2…
–
–
•
This is going to call it with n-3...
» …
And print n-3.
And print n-2.
– And print n-1.
•
•
And print n.
And then we’re done! Right?
Where does it end!?
When does it end!?
• This is a fundamental question when writing
any recursive algorithm: where do we stop?
• The easiest place to stop is when n < 1.
– What do we do when n < 1?
– Well, we already outputted all of the numbers
from 1 to n. That was the goal.
– So we do nothing. We “return;”.
So here it is.
void printTo(int n) {
if (n < 1)
//Base case.
return;
else {
//Recursive step: 1 to n is 1 to n-1 and n.
printTo(n-1);
//Print 1 to n-1.
System.out.println(n);
//Print n.
}
}
• Question: What if we printed n before calling printTo?
• Don’t try to trace each call. Think about what this is doing.
The Reduction:
• We reduced the problem of printing 1 .. n to the problem of
printing 1 .. n-1 and printing n.
– A smaller instance of the same problem.
– So we solved it using the same function.
• Because we took one element off the end at a time, this is
called tail recursion.
• When it became “small enough” (n < 1), we solved it
directly.
– By stopping, since the output was already correct.
– If we kept going, we’d print 0, -1, …, which would be incorrect.
• That’s how recursion works.
– But this is a simple example.
• What’s the complexity of this algorithm?
Splitting into pieces.
• What we just saw was a “one piece” problem.
– We reduced the problem to one smaller problem.
• n -> (n-1).
– These are the easiest to solve.
– These solutions are usually linear.
• What if we split the problem into two smaller
problems at each step?
– Say we wanted to find the nth number in the
Fibonacci series.
Recursive Fibonacci:
• The Fibonacci series is a series in which each
term is the sum of the two previous terms.
– Recursive definition.
• The first two terms are 1.
– Base case (without this, we’d have a problem).
• It looks like this:
– 1, 1, 2, 3, 5, 8, 13, 21, 34, …
• And here’s its function:
– F(n) = F(n-1) + F(n-2)
– F(1) = F(2) = 1
Fibonacci: two-piece recursion.
• In order to find the nth Fibonacci number, we
need to simply add the n-1th and n-2th
Fibonacci numbers.
• Ok, so here’s a Java function fib(int n):
int fib(int n) {
}
• What would we write for the base case?
Fibonacci base case
int fib(int n) {
if (n <= 2)
return 1;
}
• That was simple. The recursive part?
Fibonacci base case
int fib(int n) {
if (n <= 2)
return 1;
return fib(n-1) + fib(n-2);
}
• Ok, that wasn’t too bad.
Solving multi-piece recursion.
• Often you get a direct solution from the recursive
call in one-piece recursion.
• But when you split into more than one piece, you
must often merge the solutions.
– In the case of Fibonacci, we did this by adding.
– Sometimes it will be more complex than this.
• Recursion usually looks like this:
– Call with smaller problems.
– Stop at the base case.
– Merge the subproblems as we go back up.
Divide and Conquer
• The practice of splitting algorithms up into
manageable pieces, solving the pieces, and
merging the solutions is called the “divide and
conquer” algorithm paradigm.
• It is a “top down” approach, in that you start
with something big and split it into smaller
pieces.
• Recursion isn’t necessary for these algorithms,
but it is often useful.
Memoization
•
•
•
•
What if we wanted to analyze fib?
Well, there’s one call to fib(n)…
Which makes two calls: fib(n-1) and fib(n-2)…
Which makes four calls: fib(n-2), fib(n-3),
fib(n-3), and fib(n-4)…
• Which makes eight…
• …
• Uh oh.
Why is it exponential?
• So this is O(2^n). Not good.
• And yet if you were to do it in a loop, you
could do it in linear time.
• But there’s something wrong with the way
we’re calling that causes an exponential result.
– There’s a lot of work being repeated.
• We repeat n-2 twice, n-3 four times, n-4 eight…
• In fact, this is the reason why it’s exponential!
• Fortunately, we can reduce this.
Memoization
• “Memoization” (no “r”) is the practice of caching
subproblem solutions in a table.
• (Come to think of it, they could have left the “r” in and
it would still be an accurate term).
• So when we find fib(5), we store the result in a table.
The next time it gets called, we just return the table
value instead of recomputing.
– So we save one call. What’s the big deal?
– fib(5) calls fib(4) and fib(3), fib(4) calls fib(3) and fib(2),
fib(3) calls…
– You actually just saved an exponential number of calls by
preventing fib(5) from running again.
Implementing Memoization
•
•
Use a member array for the table and wrap the actual work in a private function.
The fib() function looks up the answer in the table and calls that function if it’s not found.
class Fibonacci {
private static int[] fibresults = new int[n+1];
//Or use a Vector for dynamic sizing.
public int fib(int n) {
if (fibresults[n] <= 0)
//Not in table.
fibresults[n] = fib_r(n); //Fill it in.
return fibresults[n];
}
private int fib_r(int n) {
if (n <= 2)
return 1;
return fib(n-1) + fib(n-2);
}
}
//This does the real work.
//Base case.
//Note that we call “fib”, not “fib_r”.
The Gain
• When we store the results of that extra work,
this algorithm becomes linear.
• Finding F(50) without memoization takes 1
minute and 15 seconds on rockhopper.
• Finding F(50) with memoization takes 0.137s.
• The space cost of storing the table was linear.
– Because we’re storing for one variable, n.
Optimal Solution to Fibonacci
• The Fibonacci series has a closed form.
• That means we can find F(n) in constant time:
• F(n) = (phi^n - (-1/phi)^n) / sqrt(5).
• Phi is the Golden Ratio, approx. 1.618.
• It pays to research the problem you’re solving.
Linked Lists
• We said that arrays were:
– Contiguous.
– Homogenous.
– Random access.
• What if we drop the contiguousness?
• That is, adjacent elements in the list are no
longer adjacent in memory.
• It turns out that you lose random access, but
gain some other properties in return.
Linked Lists
• A linked list is simply a collection of elements in which
each points to the next element.
• For example:
1
2
3
• This is accomplished by storing a reference to the next
node in each node:
class Node<DataType> {
public DataType data;
public Node<DataType> next;
}
Variations
• Doubly linked lists contain pointers to the next and
previous nodes. The Java “LinkedList” class is doubly-linked.
– This class has a similar interface to Vector.
1
2
3
• Circularly linked lists are linked lists in which the last
element points back to the first:
– Seldom used, usually for “one every x” problems.
– To traverse one of these, stop when the next element is equal to
where you started.
1
0
2
3
CRUD: Linked Lists.
•
•
•
•
Insertion:
Access:
Updating an element:
Deleting an element:
• Search:
• Merge:
• Let’s start with access and insertion.
?
?
?
?
?
?
Node Access
• Elements are no longer contiguous in memory.
• We can no longer jump to the ith element.
• Now we have to start at the beginning and
follow the reference to the next node i times.
• Therefore, access is linear.
• This is called sequential access.
– Because every node must be visited in sequence.
Access element 3:
1
2
3
Node Insertion
• To insert into a list, all we need to do is change
the “next” pointer of the node before it and
point the new node to the one after.
• This is a constant-time operation (provided
we’re already in position to insert).
• Example:
Insert “5” after “1”:
1
5
2
3
Node1.next = new Node(5, Node1.next);
Merging Two Lists
• Linked lists have the unique property of
permitting merges to be carried out in
constant time (if you store the last node).
• In an array, you’d need to allocate a large array
and copy each of the two arrays to it.
• In a list, you simply change the pointer before
the target and the last pointer in the 2nd list.
• Example: 1
2
3
1
2
Merge at “2”:
6
5
4
6
5
3
4
CRUD: Linked Lists.
•
•
•
•
Insertion:
Access:
Updating an element:
Deleting an element:
• Search:
• Merge:
O(1)
O(N)
O(1)
?
O(N).
O(1).
• Binary search will not work on sequential access data structures.
– Moving around the array to find the middle is O(n), so we may as well
use linear search.
• Updating a node is as simple as changing its value.
• That leaves deletion.
Deletion
•
•
•
•
•
Insertion in reverse.
Easier to do in a doubly-linked list.
Store the next node from the target.
Remove the target node.
Set previous node’s next to the stored next.
Delete “5”:
1
5
2
3
CRUD: Linked Lists.
•
•
•
•
Insertion:
Access:
Updating an element:
Deleting an element:
• Search:
• Merge:
O(1)
O(N)
O(1)
O(1)
O(N).
O(1).
• Dynamically sized by nature.
– Just stick a new node at the end.
• Modifications are fast, but node access is the killer.
– And you need to access the nodes before performing other operations on
them.
• Three main uses:
– When search/access is not very important (e.g. logs, backups).
– When you’re merging and deleting a lot.
– When you need to iterate through the list sequentially anyway.
Les Adieux, L’Absence, Le Retour
• That was our first lecture on recursion.
• There will be others - it’s an important topic.
• The lesson:
– Self-similarity is found everywhere in nature:
trees, landscapes, rivers, and even organs exhibit
it. Recursion is not a primary construct for arriving
at solutions, but a method for analyzing these
natural patterns.
• Next class: Linked Lists 2, Stacks, and Queues.
• Begin thinking about project topics.