Transcript Powerpoint
Algorithms
Session 7
LBSC 790 / INFM 718B
Building the Human-Computer Interface
Agenda: Weeks 7 and 8
• Questions
• Some useful algorithms
• Project
• Some useful data structures
– Including Java implementations
Why Study Algorithms?
• Some generic problems come up repeatedly
– Sorting
– Searching
– Graph traversal
• Need a way to compare alternative solutions
• Reusing algorithms is easy and productive
– Focusing on the algorithm reveals the key ideas
– Language and interface make reusing code hard
Sorting
• Given an array, put the elements in order
– Numerical or lexicographic
• Desirable characteristics
–
–
–
–
Fast
In place (don’t need a second array)
Able to handle any values for the elements
Easy to understand
Insertion Sort
• Simple, able to handle any data
• Grow a sorted array from the beginning
– Create an empty array of the proper size
– Pick the elements one at a time in any order
– Put them in the new array in sorted order
• If the element is not last, make room for it
– Repeat until done
• Can be done in place if well designed
Insertion Sort
90 11 27 31
4
16 11 37
Insertion Sort
90 11 27 31
90
4
16 11 37
Insertion Sort
90 11 27 31
11 90
4
16 11 37
Insertion Sort
90 11 27 31
11 27 90
4
16 11 37
Insertion Sort
90 11 27 31
11 27 31 90
4
16 11 37
Insertion Sort
90 11 27 31
4
4
11 27 31 90
16 11 37
Insertion Sort
90 11 27 31
4
4
16 11 37
11 16 27 31 90
Insertion Sort
90 11 27 31
4
4
16 11 37
11 11 16 27 31 90
Insertion Sort
90 11 27 31
4
4
16 11 37
11 11 16 27 31 37 90
Insertion Sort
• Sorting can actually be done in place
– Never need the same element in both arrays
• Every insertion can cause lots of copying
– If there are N elements, need to do N insertions
– Worst case is about N/2 copys per insertion
– N elements can take nearly N2 operations to sort
• But each operation is very fast
– So this is fine if N is small (20 or so)
Merge Sort
• Fast, able to handle any data
– But can’t be done in place
• View the array as a set of small sorted arrays
– Initially only the 1-element “arrays” are sorted
• Merge pairs of sorted arrays
– Repeatedly choose the smallest element in each
– This produces sorted arrays that are twice as long
• Repeat until only one array remains
Merge Sort
90 11 27 31
11 90
4
16 11 37
Merge Sort
90 11 27 31
11 90 27 31
4
16 11 37
Merge Sort
90 11 27 31
4
16 11 37
11 90 27 31
4
16
Merge Sort
90 11 27 31
4
16 11 37
11 90 27 31
4
16 11 37
Merge Sort
11 27 31 90
11 90 27 31
4
16 11 37
Merge Sort
11 27 31 90
4
11 16 37
11 90 27 31
4
16 11 37
Merge Sort
11 27 31 90
4
4
11 16 37
11 11 16 27 31 37 90
Merge Sort
• Each array size requires N steps
– But 8 elements requires only 3 array sizes
k
• In general, 2 elements requires k array sizes
– So the complexity is N*log(N)
• No faster sort (based on comparisons) exists
– Faster sorts require assumptions about the data
– There are other N*log(N) sorts, though
• Merge sort is most often used for large disk files
Computational Complexity
• Run time typically depends on:
– How long things take to set up
– How many operations there are in each step
– How many steps there are
• Insertion sort can be faster than merge sort
– One array, one operation per step
– But N*log(N) eventually beats N2 for large N
• And once it does, the advantage increases rapidly
Divide and Conquer
• Split a problem into simpler subproblems
– Keep doing that until trivial subproblems result
• Solve the trivial subproblems
• Combine the results to solve a larger problem
– Keep doing that until the full problem is solved
• Merge sort illustrates divide and conquer
– But it is a general strategy that is often helpful
Recursion
• Divide and conquer problems are recursive
– Solve the same problem at increasing granularity
• Construct a Java method to solve the problem
– Divide the problem into subproblems
– Call the same method to solve each subproblem
• Unless the subproblems are trivial
– Use the parameters to control the granularity
• See this week’s notes page for merge sort example
A Recipe for Recursion
• First, craft a divide and conquer strategy
• Create a non-recursive top-level method
– Calls recursive method with initial parameters
• In the recursive method:
– First solve the problem if it is trivial and return
• Be sure you eventually get here!
– Otherwise, split the problem and call itself
– Combine the results and return them
Search
• Find something by following links
– Web pages
– Connections in the flight finder
– Winning moves in chess
• This may seem like an easy problem
– But computational complexity can get really bad
– Simple tricks can help in some cases
Web Crawlers
• Goal is to find everything on the web
– Build a balanced tree, sorted by search terms
• Start anywhere, follow every link
• If every page has 1 kB of text and 10 links
– Then 10 levels would find a terabyte of data!
• Avoid links that are likely to be uninteresting
• Detect duplicates quickly with hashing
Outside-In Search
• Explore multiple paths between two points
– Usually trying to find the best by some measure
• Flight finder searches like a web crawler
– Every possible continuation of every route
• Also search backward from the destination
• Assuming 10 departures per airfield:
– 3 connections takes Flight finder 10,000 steps
– 1 connection twice would take 200 steps
How to Win at Chess
• The paths are the legal moves
– And the “places” are possible board positions
• You are seeking to make things better
– Your opponent seeks to make things worse
• Such “zero sum games” are common
– Although many lack chess’ shared information
• Any problem structure makes search easier
– The trick is to exploit constraints effectively
Minimax Searching
• Decide how many half-moves to look ahead
• Develop a scoring strategy for final positions
– Based on piece count and positional factors
• Follow a promising path
– Helpful to guess the best moves for each side
• With several moves available, pick the best
– But stop any search that can’t improve things
Minimax Search Example
Min
0
Max
0
3
0
0
-3
5
-3
3
3
5
Traveling Salesperson Problem
• Find the cheapest way of visiting 10 cities
– Given the airfare between every city pair
– Only visit each city once, finish where you start
• There are only 90 city pairs
– But there are a LOT of possible tours
• The best known algorithm is VERY slow
– Because the problem is “NP complete”
NP Complete Problems
• No “polynomial time” algorithm is known
2
3
4
– Not N , N , N …
• Haven’t proved that none exists
– But if it does, many hard problems would be easy
• Approximate solutions with heuristic methods
– Greedy methods
– Genetic algorithms
– Simulated annealing
What’s Wrong With Arrays?
• Must specify maximum size when declared
– And the maximum possible size is always used
• Can only index with integers
– For efficiency they must be densely packed
• Adding new elements is costly
– If the elements are stored in order
• Every element must be the same type
What’s Good About Arrays?
• Can get any element quickly
– If you know what position it is in
• Natural data structure to use with a loop
– Do the same thing to different data
• Efficiently uses memory
– If the array is densely packed
• Naturally encodes an order among elements
Linked Lists
• A way of making variable length arrays
– In which insertions and deletions are easy
• Very easy to do in Java
• But nothing comes for free
– Finding an element can be slow
– Extra space is needed for the links
– There is more complexity to keep track of
Making a Linked List
• In Java, all objects are accessed by reference
– Object variables store the location of the object
• New instances must be explicitly constructed
• Add reference to next element in each object
– Handy to also have a reference to the prior one
• Keep a reference to the first object
– And then walk down the list using a loop
Linked List Example
first
Jill
Joe
Tom
Public static main (String[] argv) {
Student first;
…
}
Public class Student {
int String name;
public Student next;
}
Linked List Operations
• Add an element
– Easy to put it in sorted order
• Examine every element
– Just as fast as using an array
• Find just one element
– May be as slow as examining every element
• Delete an element after you find it
– Fast if you keep both next and prior links
first
Jill
Linked List Insertion
Joe
Tom
public void insert(String newName) {
Student temp = first;
boolean done = false;
while (!done) {
if ((temp.next==null) ||
(temp.next.name.follows(newName))){
Student new =
new Student(name, temp.next);
temp.next=new;
done = true;
}
temp = temp.next;
}}
Trees
• Linked list with multiple next elements
– Just as easy to create as linked lists
– Binary trees are useful for relationships like “<“
• Insertions and deletions are easy
• Useful for fast searching of large collections
– But only if the tree is balanced
• Efficiently balancing trees is complex, but possible
Binary Tree Example
Public class Student {
int String name;
public Student left;
public Student right;
}
root
Joe
Jill
Tom
Data Structures in Java
• Resizable array [O(n) insertion, O(1) access]:
– ArrayList
• Linked list [O(1) insertion, O(n) access, sorted]:
– LinkedList
• Hash table [object index, unsorted, O(1)]:
– HashSet (key only)
– HashMap (key+value)
• Balanced Tree [object index, sorted, O(log n)]:
– TreeSet (key only)
– TreeMap (key+value)
Hashtables
• Find an element nearly as fast as in an array
– With easy insertion and deletion
– But without the ability to keep things in order
• Fairly complex to implement
– But Java defines a class to make it simple
• Helpful to understand how it works
– “One size fits all” approaches are inefficient
How Hashtables Work
• Create an array with enough room
– It helps a lot to guess the right size first
• Choose a variable in each object as the key
– But it doesn’t need to be an integer
• Choose a spot in the array for each key value
– Using a fast mathematical function
– Best if things are scattered around well
• Choose how to handle multiple keys at one spot
Java HashMap Class
• Hashtables are objects like any other
– import java.util.*
– Must be declared and instantiated
• The constructor optionally takes a size parameter
• put(key, value) adds an element
• containsKey(key) checks for an element
• get(key) returns the “value” object for that key
Stacks
• Maintain an implicit order
– Last In-First Out (LIFO)
• Easy additions and deletions
– push and pop operations
• Maps naturally to certain problems
– Interrupt a task to compute an intermediate value
• Implemented as a Java class
– import java.util.*
Choosing a Data Structure
• What operations do you need to perform?
– Reading every element is typically easy
– Other things depend on the representation
• Hashing finds single elements quickly
– But cannot preserve order
• Stacks and linked lists preserve order easily
– But they can only read one element at any time
• Balanced trees are best when you need both
• Which operations dominate the processing?