Cornell Computer Science
Topics
Will be covered
Important Programming Concepts
Types, Recursion/Induction, Asymptotic Complexity
The Toolbox (Data Structures)
Arrays, Linked Lists, Trees (BSTs, Heaps), Hashtables
Practical Things
Searching, Sorting, Abstract Data Types
Graphs
Threads and Concurrency
Topics
Sort of Important
Don’t have a panic attack over these topics!
GUIs
Just do the problems on old prelims/finals
Software engineering
Don’t write horrible code on coding questions
But they are both important in the real world!
Topics
Will Not Be Covered
Lectures 24 and above
Java Virtual Machine
Distributed Systems and Cloud Computing
Balancing Trees (e.g.: AVL trees)
But do know what a balanced and unbalanced tree is
Recurrences
Network Flow
Typing
Primitive Types
boolean, int, double, etc…
Test equality with == and !=
Compare with <, <=, >, and >=
Pass by Value
void f(int x) {
x--;
}
int x = 10;
f(x);
// x == 10
[Call-stack diagram: f's copy of x decrements from 10 to 9; main's x stays 10]
Typing
Reference types
Actual object is stored elsewhere
Variable contains a reference to the object
== tests equality for the reference
equals() tests equality for the object
x == y implies that x.equals(y)
x.equals(y) does not imply x == y
How do we compare objects of type T?
Implement the Comparable<T> interface
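A minimal sketch of implementing Comparable<T>, using a hypothetical Point class (not from the slides) ordered by squared distance from the origin:

```java
// Hypothetical example: Points compared by squared distance from the origin.
public class Point implements Comparable<Point> {
    private final int x, y;

    public Point(int x, int y) { this.x = x; this.y = y; }

    // Negative if this < other, zero if "equal", positive if this > other.
    @Override
    public int compareTo(Point other) {
        return Integer.compare(x * x + y * y,
                               other.x * other.x + other.y * other.y);
    }
}
```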
Pass by Reference (really: the reference is passed by value)
void f(ArrayList<Integer> l) {
l.add(2);
l = new ArrayList<Integer>();
}
ArrayList<Integer> l = new ArrayList<Integer>();
l.add(1);
f(l);
// l contains 1, 2
[Call-stack diagram: f's l and main's l reference the same list {1,2}; reassigning l inside f points only f's copy at a new empty list]
Typing
We know that type B can implement/extend A
B is a subtype of A; A is a supertype of B
The real type of the object is its dynamic type
This type is known only at run-time
Object can act like a supertype of its dynamic type
Cannot act like a subtype of its dynamic type
Variables/arguments of type A accept any subtype of A
A is a supertype of the static type
Static type is a supertype of the dynamic type
Typing
Static type is compile-time, determined by code
Dynamic type might be a subtype of the static type
Casting temporarily changes the static type
Upcasts are always safe
Always cast to a supertype of the dynamic type
Downcasts may not be safe
Can downcast to a supertype of the dynamic type
Cannot downcast to a subtype of the dynamic type
Typing
If B extends A, and B and A both have function foo()
Which foo() gets called?
Answer depends on the dynamic type
If the dynamic type is B, B’s foo() will be called
Even if foo() is invoked inside a function of A
Exception: static functions
Static functions are not associated with any object
Thus, they do not have any type
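The dispatch rule above can be sketched with hypothetical classes A and B (names chosen to match the slide, not taken from real code):

```java
class A {
    String foo() { return "A.foo"; }
    String callFoo() { return foo(); }  // dispatched on the dynamic type
}

class B extends A {
    @Override
    String foo() { return "B.foo"; }
}

public class Dispatch {
    public static void main(String[] args) {
        A a = new B();                    // static type A, dynamic type B
        System.out.println(a.foo());      // prints B.foo
        System.out.println(a.callFoo());  // prints B.foo, even though callFoo is in A
    }
}
```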
Induction and Recursion
Recursion
Basic examples
Factorial : n! = n(n-1)!
Combinations: Pascal’s triangle
Recursive structure
Tree (tree t = root with right/left subtree)
Depth first search
Don’t forget base case (in proof, in your code)
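The factorial example above, as a minimal recursive method with its base case:

```java
public class Factorial {
    // n! = n * (n-1)!, with base case 0! = 1
    static long factorial(long n) {
        if (n == 0) return 1;           // base case -- don't forget it!
        return n * factorial(n - 1);    // recursive case
    }

    public static void main(String[] args) {
        System.out.println(factorial(5)); // prints 120
    }
}
```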
Induction and Recursion
Induction
Can do induction on previous recursive problems
Algorithm correctness proof (DFS)
Math equation proof
Induction and Recursion
Step 1
Base case
Step 2
Suppose n is the variable you’re doing induction on
Suppose the equation holds when n = k
Strong induction: suppose it holds for all n ≤ k
Step 3
Prove that the equation still holds when n = k+1, using the assumptions in Step 2
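The three steps, worked for a classic equation proof (an illustrative example, not from the slides):

```latex
% Claim: 1 + 2 + \dots + n = \frac{n(n+1)}{2} for all n \ge 1.
% Step 1 (base case): n = 1: left side is 1, right side is \frac{1 \cdot 2}{2} = 1.
% Step 2 (hypothesis): suppose 1 + 2 + \dots + k = \frac{k(k+1)}{2}.
% Step 3 (n = k+1, using Step 2):
1 + 2 + \dots + k + (k+1) = \frac{k(k+1)}{2} + (k+1) = \frac{(k+1)(k+2)}{2}
```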
Asymptotic Complexity
f(n) is O(g(n)) if:
∃ (c, n0) such that ∀ n ≥ n0, f(n) ≤ c⋅g(n)
∃ - there exists; ∀ - for all
(c, n0) is called the witness pair
f(n) is O(g(n)) roughly means f(n) ≤ g(n)
Big-O notation is a model for running time
Don’t need to know the computer’s specs
Models usually, but not always, work in real life
Asymptotic Complexity
Meaning of n0
We can compare two integers
How can we compare two functions?
Answer is which function grows faster
Fast vs. slow car
60-mph car with no headstart
40-mph car with a headstart
n0 is when the fast car overtakes the slow one
Functions have a dominant term
2n³ + 4n² + n + 2: 2n³ is the dominant term
Asymptotic Complexity
Meaning of c
Cannot get a precise measurement
Famous election recounts (Bush/Gore, Coleman/Franken)
Algorithm’s speed on a 2 GHz versus a 1 GHz processor
Hard to measure constant for the dominant term
c is the fudge factor
Change the speed by a constant factor
A 2 GHz processor is at most twice as fast as a 1 GHz one (c = 2)
2n³ is O(n³)
Fast and slow car have asymptotically equal speed
Asymptotic Complexity
(Assume c, d are constants)
Logarithmic vs. Logarithmic: log(nᶜ), log(nᵈ) “equal”
Difference is a constant factor ( log(nᶜ) = c·log(n) )
Logarithm’s base also does not matter
Logarithmic vs. Polynomial: log n is O(nᶜ)
Corollary: O(n log n) better than O(n²)
Polynomial vs. Polynomial: nᶜ is O(nᵈ) if c ≤ d
Polynomial vs. Exponential: nᶜ is O(dⁿ) for any d > 1
Exponential running time almost always too slow!
Arrays
Arrays have a fixed capacity
If more space needed…
Allocate larger array
Copy smaller array into larger one
The entire operation takes O(n) time
Arrays have random access
Can read any element in O(1) time
(Doubly) Linked Lists
Each node has three parts
Value stored in node
Next node
Previous node
Also have access to head, tail of linked list
Very easy to grow linked list in both directions
Downside: sequential access
O(n) time to access something in the middle
Trees
Recursive data structure
A tree is…
A single node
Zero or more subtrees below it
Every node (except the root) has one parent
Properties of trees must hold for all nodes in the tree
Each node is the root of some tree
Makes sense for recursive algorithms
Tree
Binary Trees
Each node can have at most two children
We usually distinguish between left, right child
Trees we study in CS 2110 are binary trees
Binary Trees
Binary Search Trees
Used to sort data inside a tree
For every node with value x:
Every node in the left subtree has a value < x
Every node in the right subtree has a value > x
Binary search tree is not guaranteed to be balanced
If it is balanced, we can find a node in O(log n) time
Binary Search Trees
Binary Search Trees
Completely
unbalanced,
but still a BST
Not a Binary Search Tree
8 is in the left
subtree of 5
Not a Binary Search Tree
3 is the left
child of 2
Adding to a Binary Search Tree
Adding 4 to the BST
Start at root (5)
4<5
Go left to 2
4>2
Go right to 3
4>3
Add 4 as right child of 3
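The walk above can be sketched as a recursive insert on a minimal Node class (a sketch, not the course's actual implementation):

```java
public class BST {
    static class Node {
        int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    // Walk left/right from the root until an empty spot is found, as in
    // the slide's example of adding 4.
    static Node insert(Node root, int v) {
        if (root == null) return new Node(v);          // empty spot: add here
        if (v < root.value) root.left = insert(root.left, v);
        else if (v > root.value) root.right = insert(root.right, v);
        return root;                                   // duplicates ignored
    }
}
```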
Tree Traversals
Converts the tree into a list
Works on any binary tree
Do not need a binary search tree
Traverse node and its left and right subtrees
Subtrees are traversed recursively
Preorder: node, left subtree, right subtree
Inorder: left subtree, node, right subtree
Produces a sorted list for binary search trees
Postorder: left subtree, right subtree, node
Tree Traversals
Inorder Traversal
In(5)
In(2), 5, In(7)
1, 2, In(3), 5, In(7)
1, 2, 3, 4, 5, In(7)
1, 2, 3, 4, 5, 6, 7, 9
Binary Heaps
Weaker condition on each node
A node is smaller than its immediate children
Don’t care about entire subtree
Don’t care if right child is smaller than left child
Smallest node guaranteed to be at the root
No guarantees beyond that
Guarantee also holds for each subtree
Heaps grow top-to-bottom, left-to-right
Shrink in opposite direction
Binary Heaps
Binary Heaps
Adding to a heap
Find lowest unfilled level of tree
Find leftmost empty spot, add node there
New node could be smaller than its parent
Swap it up if it’s smaller
If swapped, could still be smaller than its new parent
Keep swapping up until the parent is smaller
Binary Heaps
Removing from a heap
Take out element from the top of the heap
Find rightmost element on lowest level
Make this the new root
New root could be larger than one/both children
Swap it with the smallest child
Keep swapping down…
Binary Heaps
Can represent a heap as an array
Root element is at index 0
For an element at index i
Parent is at index (i – 1)/2 (integer division)
Children are at indices 2i + 1, 2i + 2
n-element heap takes up first n spots in the array
New elements grow heap by 1
Removing an element shrinks heap by 1
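The add (swap up) and remove (swap down) operations on the array representation can be sketched as a minimal min-heap (an illustrative sketch, assuming int elements):

```java
import java.util.ArrayList;
import java.util.List;

public class MinHeap {
    private final List<Integer> a = new ArrayList<>();

    public void add(int v) {
        a.add(v);                        // leftmost empty spot, lowest level
        int i = a.size() - 1;
        while (i > 0 && a.get(i) < a.get((i - 1) / 2)) {
            swap(i, (i - 1) / 2);        // smaller than parent: swap up
            i = (i - 1) / 2;
        }
    }

    public int poll() {
        int root = a.get(0);
        int last = a.remove(a.size() - 1);   // rightmost element, lowest level
        if (!a.isEmpty()) {
            a.set(0, last);                  // becomes the new root
            int i = 0;
            while (true) {
                int l = 2 * i + 1, r = 2 * i + 2, smallest = i;
                if (l < a.size() && a.get(l) < a.get(smallest)) smallest = l;
                if (r < a.size() && a.get(r) < a.get(smallest)) smallest = r;
                if (smallest == i) break;
                swap(i, smallest);           // swap down with smaller child
                i = smallest;
            }
        }
        return root;
    }

    private void swap(int i, int j) {
        int t = a.get(i); a.set(i, a.get(j)); a.set(j, t);
    }
}
```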
Hashtables
Motivation
Sort n numbers between 0 and 2n – 1
Sorting integers within a certain range
More specific than comparable objects with unlimited range
General lower bound of O(n log n) may not apply
Can be done in O(n) time with counting sort
Create an array of size 2n
The ith entry counts all the numbers equal to i
For each number, increment the correct entry
Can also find a given number in O(1) time
Hashtables
Cannot do this with arbitrary data types
Integer alone can have over 4 billion possible values
For a hashtable, create an array of size m
Hash function maps each object to an array index
between 0 and m – 1 (in O(1) time)
Hash function makes sorting impossible
Quality of a hash function is based on how many elements map to the same index in the hashtable
Need to expect O(1) collisions
Hashtables
Dealing with collisions
In counting sort, one array entry contains only elements of the same value
The hash function can map different objects to the same index of the hashtable
Chaining
Each entry of the hashtable is a linked list
Linear Probing
If h(x) is taken, try h(x) + 1, h(x) + 2, h(x) + 3, …
Quadratic probing: h(x) + 1, h(x) + 4, h(x) + 9, …
Hashtables
Table Size
If too large, we waste space
If too small, everything collides with each other
Probing falls apart if the number of items (n) is almost the size of the hashtable (m)
Typically have a load factor 0 < λ ≤ 1
Resize table when n/m exceeds λ
Resizing changes m
Have to reinsert everything into new hashtable
Hashtables
Table doubling
Double the size every time we exceed our load factor
Worst case is when we just doubled the hashtable
Consider all prior times we doubled the table
n + n/2 + n/4 + n/8 + … < 2n
Insert n items in O(n) time
Average O(1) time to insert one item
Some operations take O(n) time
This also works for growing an ArrayList
Hashtables
Java, hashCode(), and equals()
hashCode() assigns an object an integer value
Java maps this integer to a number between 0 and m – 1
If x.equals(y), x and y should have the same hashCode()
Insert object with one hashCode()
Won’t find it if you look it up with a different hashCode()
If you override equals(), you must also override hashCode()
Different objects can have the same hashCode()
If this happens too often, we have too many collisions
Only equals() can determine if they are equal
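The equals()/hashCode() contract above, sketched with a hypothetical Account class (Objects.hash is a real java.util helper; the class itself is illustrative):

```java
import java.util.Objects;

public class Account {
    private final String owner;
    private final int number;

    public Account(String owner, int number) {
        this.owner = owner;
        this.number = number;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Account)) return false;
        Account other = (Account) o;
        return number == other.number && owner.equals(other.owner);
    }

    // Overriding equals() without hashCode() breaks HashSet/HashMap lookups:
    // equal objects must report the same hashCode().
    @Override
    public int hashCode() {
        return Objects.hash(owner, number);
    }
}
```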
Searching
Unsorted lists
Element could be anywhere, O(n) search time
Sorted lists – binary search
Try middle element
Search left half if too large, right half if too small
Each step cuts search space in half, O(log n) steps
Binary search requires random access
Searching a sorted linked list is still O(n)
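Binary search on a sorted array, as described above (a standard sketch):

```java
public class BinarySearch {
    // Returns the index of target in sorted array a, or -1 if absent.
    static int search(int[] a, int target) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;       // try the middle element
            if (a[mid] == target) return mid;
            if (a[mid] < target) lo = mid + 1;  // too small: search right half
            else hi = mid - 1;                  // too large: search left half
        }
        return -1;
    }
}
```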
Sorting
Many ways to sort
Same high-level goal, same result
What’s the difference?
Algorithm and data structures used dictate running time
They also dictate space usage
Each algorithm has its own flavor
Once again, assume random access to the list
Sorting
Swap operation: swap(x, i, j)
temp = x[i]
x[i] = x[j]
x[j] = temp
Many sorts are a fancy set of swap instructions
Modifies the array in place, very space-efficient
Not space efficient to copy a large array
Insertion Sort
for (i = 0…n-1)
Take element at index i, swap it back one spot until…
you hit the beginning of the list
previous element is smaller than this one
Ex.: 4, 7, 8, 6, 2, 9, 1, 5 – swap 6 back twice
After k iterations, first k elements relatively sorted
4 iterations: 4, 6, 7, 8, 2, 9, 1, 5
O(n2) time, in-place sort
O(n) for sorted lists, good for nearly sorted lists
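The loop above, written out as swaps (a minimal in-place sketch):

```java
public class InsertionSort {
    static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            // Swap a[j] back one spot until it hits the beginning
            // of the list or the previous element is smaller.
            for (int j = i; j > 0 && a[j - 1] > a[j]; j--) {
                int t = a[j]; a[j] = a[j - 1]; a[j - 1] = t;
            }
        }
    }
}
```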
Selection Sort
for (i = 0…n-1)
Scan array from index i to the end
Swap smallest element into index i
Ex.: 1, 3, 5, 8, 9, 6, 7 – swap 6, 8
After k iterations, first k elements absolutely sorted
4 iterations: 1, 3, 5, 6, 9, 8, 7
O(n2) time, in-place sort
Also O(n2) for sorted lists
Merge Sort
Copy left half, right half into two smaller arrays
Recursively run merge sort each half
Base case: 1 or 2 element array
Merge two sorted halves back into original array
Ex. (1, 3, 6, 9), (2, 5, 7, 8) – (1, 2, 3, 5, 6, 7, 8, 9)
Running Time: O(n log n)
Merge takes O(n) time
Split the list in half about log(n) times
Also uses O(n) extra space!
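The copy/recurse/merge steps above can be sketched as follows (note the O(n) extra space for the two half-copies):

```java
import java.util.Arrays;

public class MergeSort {
    static void sort(int[] a) {
        if (a.length <= 1) return;                       // base case
        int[] left = Arrays.copyOfRange(a, 0, a.length / 2);
        int[] right = Arrays.copyOfRange(a, a.length / 2, a.length);
        sort(left);                                      // recursively sort halves
        sort(right);
        // Merge the two sorted halves back into the original array.
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length)
            a[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        while (i < left.length) a[k++] = left[i++];
        while (j < right.length) a[k++] = right[j++];
    }
}
```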
Quick Sort
Randomly pick a pivot
Partition into two (unequal) halves
Left partition smaller than pivot, right partition larger
Recursively run quick sort on both partitions
Expected Running Time: O(n log n)
Partition takes O(n) time
Pivot could be smallest, largest element (n partitions)
Expected to split list O(log n) times
In-place sort: partition can be done in-place
Heap Sort
Use a max-heap (represented as an array)
Can move items up/down the heap with swaps
for (i = 0…n-1)
Add element at index i to the heap
First pass builds the heap
for (i = n-1…0)
Remove largest element, put it in spot i
O(n log n), in-place sort
Abstract Data Types
Lists
Stacks
LIFO
Queues
FIFO
Sets
Dictionaries (Maps)
Priority Queues
Java API
E.g.: ArrayList is an ADT list backed by an array
Abstract Data Types
Priority Queue
Implemented as heap
peek() - look at heap root : O(1)
poll() - heap “delete” op : O(log n)
add() - heap “add” op : O(log n)
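These operations map directly onto java.util.PriorityQueue (a min-heap by default):

```java
import java.util.PriorityQueue;

public class PQDemo {
    public static void main(String[] args) {
        PriorityQueue<Integer> pq = new PriorityQueue<>(); // min-heap by default
        pq.add(5);                       // heap "add": O(log n)
        pq.add(1);
        pq.add(3);
        System.out.println(pq.peek());   // prints 1 -- heap root, O(1)
        System.out.println(pq.poll());   // prints 1 -- heap "delete", O(log n)
        System.out.println(pq.poll());   // prints 3
    }
}
```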
What is a graph?
A graph has vertices
A graph has edges between two vertices
n – number of vertices; m – number of edges
Directed vs. undirected graph
Directed edges can only be traversed one way
Undirected edges can be traversed both ways
Weighted vs. unweighted graph
Edges could have weights/costs assigned to them
What is a graph?
What makes a graph special?
Cycles!!!
What is a graph without a cycle?
Undirected graphs
Trees
Directed graphs
Directed acyclic graph (DAG)
Topological Sort
Topological sort is for directed graphs
Indegree: number of edges entering a vertex
Outdegree: number of edges leaving a vertex
Topological sort algorithm
Delete a vertex with an indegree of 0
Delete its outgoing edges, too
Repeat until no vertices have an indegree of 0
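The delete-indegree-0 loop above can be sketched with an adjacency list (a sketch assuming vertices are numbered 0..n-1):

```java
import java.util.*;

public class TopoSort {
    // adj.get(u) lists the vertices u has edges to.
    static List<Integer> topoSort(List<List<Integer>> adj) {
        int n = adj.size();
        int[] indegree = new int[n];
        for (List<Integer> out : adj)
            for (int v : out) indegree[v]++;
        Deque<Integer> zero = new ArrayDeque<>();
        for (int v = 0; v < n; v++)
            if (indegree[v] == 0) zero.add(v);      // deletable vertices
        List<Integer> order = new ArrayList<>();
        while (!zero.isEmpty()) {
            int u = zero.remove();                  // delete a vertex with indegree 0
            order.add(u);
            for (int v : adj.get(u))                // delete its outgoing edges too
                if (--indegree[v] == 0) zero.add(v);
        }
        return order;  // size < n means a cycle blocked the sort
    }
}
```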
Topological Sort
[Slide diagram: a directed graph on vertices A–E worked through the topological sort algorithm]
Topological Sort
What can’t topological sort delete?
Cycles!!!
Every node in a cycle has an indegree of at least 1
Need to delete another node in the cycle first
A graph is a DAG iff topological sort deletes every vertex
iff - if and only if
Graph Searching
Works on directed and undirected graphs
Component – set of vertices connected by edges
Component starts as a single vertex
No edges are selected yet
Travel edges to add vertices to this component
From a vertex in the component to a new vertex
Graph Searching
Implementation details
Why is choosing a random path risky?
Cycles!!!
Could traverse a cycle forever
Need to keep track of vertices in your component
No cycles if you do not visit a vertex twice
Graph search algorithms need:
A set of vertices in the component (visited)
A collection of vertices to visit (toVisit)
Graph Searching
Add the start vertex to toVisit
Pick a vertex from toVisit
If the vertex is in visited, do nothing
Could add same vertex to toVisit twice before visiting it
If the vertex is not in visited:
Add vertex (and the edge to it) to visited
Follow its edges to neighboring vertices
Add each neighboring vertex to toVisit if it is not in visited
Repeat until there are no more vertices to visit
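The visited/toVisit loop above, sketched as BFS (swap the queue for a stack to get DFS; the adjacency-map shape is an assumption):

```java
import java.util.*;

public class GraphSearch {
    // Returns the component containing start. Uses a queue (BFS);
    // using a Deque as a stack instead would give DFS.
    static Set<Integer> component(Map<Integer, List<Integer>> adj, int start) {
        Set<Integer> visited = new HashSet<>();
        Queue<Integer> toVisit = new ArrayDeque<>();
        toVisit.add(start);                  // add the start vertex to toVisit
        while (!toVisit.isEmpty()) {
            int u = toVisit.remove();        // pick a vertex from toVisit
            if (!visited.add(u)) continue;   // already visited: do nothing
            for (int v : adj.getOrDefault(u, List.of()))
                if (!visited.contains(v))    // follow edges to new vertices
                    toVisit.add(v);
        }
        return visited;
    }
}
```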
Graph Searching
Running time analysis
Can check if vertex is in visited in O(1) time
Use an array or HashSet
Each edge added to toVisit once in a directed graph
Algorithm has at most m iterations
Running time determined by ADT used for toVisit
Stack, queue: O(1) to add, delete; O(m) total
Priority queue: O(log m) to add, delete; O(m log m) total
Graph Searching
Depth-first search
Uses a stack
Goes as deep as it can before taking a new path
Breadth-first search
Uses a queue
Graph Searching: DFS
[Slide diagram: DFS on a graph with vertices A–E; edges considered include ∅-A, A-B, B-C, B-E, E-D]
Graph Searching: BFS
[Slide diagram: BFS on the same graph; edges considered include ∅-A, A-B, B-C, B-E, C-D, E-D]
Minimum Spanning Tree
MSTs apply to undirected graphs
Take only some of the edges in the graph
Spanning – all vertices connected together
Tree – connected, with no cycles
For all spanning trees, m = n – 1
All unweighted spanning trees are MSTs
Need to find MST for a weighted graph
Minimum Spanning Tree
Initially, we have no edges selected
We have n 1-vertex components
Find edge between two components
Don’t add edge between vertices in same component
Creates a cycle
Pick smallest edge between two components
This is a greedy strategy, but it somehow works
If edge weights are distinct, only one possible MST
Doesn’t matter which algorithm you choose
Minimum Spanning Trees
Kruskal’s algorithm
Process edges from least to greatest
Each edge either
Connects two vertices in two different components (take it)
Connects two vertices in the same component (throw it out)
O(m log m) running time
O(m log m) to sort the edges
Need union-find to track components and merge them
Does not alter running time
Minimum Spanning Trees
[Slide diagram: weighted graph on vertices A–G (edge weights 1–12) worked through Kruskal’s algorithm]
Minimum Spanning Trees
Prim’s algorithm
Graph search algorithm
Like BFS, but it uses a priority queue
Priority is the edge weight
Size of heap is O(m); running time is O(m log m)
Minimum Spanning Trees
[Slide diagram: Prim’s algorithm on the same weighted graph, showing the priority queue of candidate edges with their weights]
Shortest Path Algorithm
For directed and undirected graphs
Find shortest path between two vertices
Start vertex is the source
End vertex is the sink
Use Dijkstra’s algorithm
Assumes positive edge weights
Graph search algorithm using a priority queue
Weight is now entire path length
Cost of path to node + cost of edge to next vertex
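Dijkstra's algorithm as described above, sketched with a priority queue keyed on total path length (the adjacency representation with {vertex, weight} pairs is an assumption):

```java
import java.util.*;

public class Dijkstra {
    // adj.get(u) holds {v, weight} pairs; returns shortest distances from source.
    static int[] shortestPaths(List<List<int[]>> adj, int source) {
        int n = adj.size();
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[source] = 0;
        // Priority is the length of the entire path to the vertex.
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> a[1] - b[1]);
        pq.add(new int[]{source, 0});
        boolean[] done = new boolean[n];
        while (!pq.isEmpty()) {
            int u = pq.poll()[0];
            if (done[u]) continue;          // stale entry: already finalized
            done[u] = true;
            for (int[] e : adj.get(u)) {
                int v = e[0], w = e[1];
                if (dist[u] + w < dist[v]) {    // cost of path + cost of edge
                    dist[v] = dist[u] + w;
                    pq.add(new int[]{v, dist[v]});
                }
            }
        }
        return dist;
    }
}
```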
Shortest Path Algorithm
[Slide diagram: Dijkstra’s algorithm from source A on a weighted graph, showing candidate paths and their total lengths]
Motivation for Threads
Splitting the Work
Multi-core processors can multi-task
Can’t multi-task with a single thread!
Operating system simulates multi-tasking
Obvious: Multi-tasking is faster
Less Obvious: Not all threads are active
Waiting for a movie to download from DC++
Multi-task: Study for CS 2110 while you wait!
Operating system runs another thread while one waits
Thread Basics
Code – Set of instructions, stored as data
A paper with instructions does not do anything
Code by itself does not run or execute
Thread – Runs/executes code
Multiple people can follow the same instructions
Multiple threads can run the same code
Thread Basics
Making the Code
Write a class implementing the Runnable interface
Or use an anonymous inner class on the fly…
Code to execute goes in the run() method
Running a Thread
Thread thread = new Thread(Runnable target);
Multiple threads could be constructed with the same target
Start a thread: thread.start();
Wait until a thread finishes: thread.join();
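The construct/start/join sequence above, in a minimal sketch (a lambda stands in for a named Runnable class):

```java
public class ThreadDemo {
    static int result = 0;

    public static void main(String[] args) {
        // The code to execute goes in the Runnable's run() method;
        // here an anonymous lambda stands in for a named class.
        Runnable target = () -> result = 42;
        Thread thread = new Thread(target);  // several threads could share target
        thread.start();                      // begin executing target.run()
        try {
            thread.join();                   // wait until the thread finishes
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        System.out.println(result);          // prints 42: join makes the write visible
    }
}
```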
Concurrency
Two or more active threads
Hard to predict each thread’s “speed”
Nondeterministic behavior
Two active threads, one data structure
Both perform writes, or one reads while another writes
No problems if both read data
Updates are not atomic!
E.g.: One heap operation (such as add) takes multiple steps
Wait until update completes to read/write again
Data in a bad state until this happens
Concurrency
Avoiding race conditions (write/write, read/write)
Imagine a room with a key
Must acquire the key before entering the room
Enter room, do many things, leave room, drop off key
Mutual exclusion, or mutexes
Only one thread can acquire a mutex
Other threads wait until the mutex is released
Synchronization
Could use Java’s Semaphore class with a count of 1
Primitive, many corner cases like exceptions
Better: synchronized (x) { /* body */ }
Thread 1 acquires a lock on object
x is a reference type; lock acquired on its memory location
Thread 2 must wait if it wants a lock on x
Also waits if it wants a lock on y if x == y
Synchronized functions: synchronized (this)
Only one synchronized function running per object
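A minimal sketch of synchronized functions guarding a non-atomic update (the counter class is illustrative):

```java
public class SyncCounter {
    private int count = 0;

    // count++ is not atomic (read, add, write), so only one thread
    // may run these synchronized methods on a given object at a time.
    public synchronized void increment() { count++; }
    public synchronized int get() { return count; }

    // Two threads increment 10,000 times each; with the lock, no update is lost.
    public static int run() {
        SyncCounter c = new SyncCounter();
        Runnable work = () -> { for (int i = 0; i < 10000; i++) c.increment(); };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start(); t2.start();
        try {
            t1.join(); t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return c.get();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 20000
    }
}
```

Without synchronized, the two threads' read-add-write steps can interleave and the final count is often less than 20,000.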
Synchronization
Only synchronize when you have to
Get key, enter room, chat with BFF on phone for 1 hour…
People waiting for key are very angry!
Deadlock via bugs
Thread 1 never releases lock on A, Thread 2 waiting for A
Deadlock via circular waiting
Thread 1 has lock on A, waiting for lock on B
Thread 2 has lock on B, waiting for lock on A
Solution: always acquire locks in same order