
Topics
 Will be covered
 Important Programming Concepts
   Types, Recursion/Induction, Asymptotic Complexity
 The Toolbox (Data Structures)
   Arrays, Linked Lists, Trees (BSTs, Heaps), Hashtables
 Practical Things
   Searching, Sorting, Abstract Data Types
 Graphs
 Threads and Concurrency
Topics
 Sort of Important
 Don't have a panic attack over these topics!
 GUIs
   Just do the problems on old prelims/finals
 Software engineering
   Don't write horrible code on coding questions
 But they are both important in the real world!
Topics
 Will Not Be Covered
 Lectures 24 and above
   Java Virtual Machine
   Distributed Systems and Cloud Computing
 Balancing trees (e.g., AVL trees)
   But do know what a balanced and unbalanced tree is
 Recurrences
 Network Flow
Typing
 Primitive Types
 boolean, int, double, etc…
 Test equality with == and !=
 Compare with <, <=, >, and >=
Pass by Value

void f(int x) {
  x--;        // decrements f's local copy: 10 becomes 9
}

int x = 10;
f(x);
// x == 10: main's x is unchanged; only the copy in f's stack frame changed
Typing
 Reference types
 Actual object is stored elsewhere
 Variable contains a reference to the object
 == tests equality for the reference
 equals() tests equality for the object
   x == y implies that x.equals(y)
   x.equals(y) does not imply x == y
 How do we compare objects of type T?
   Implement the Comparable<T> interface
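A minimal sketch of implementing Comparable<T> (the Point class and its ordering are made up for illustration):

public class Point implements Comparable<Point> {
    private final int x, y;

    public Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public int compareTo(Point other) {
        // negative if this < other, zero if equal, positive if this > other
        if (x != other.x) return Integer.compare(x, other.x);
        return Integer.compare(y, other.y);
    }
}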
Pass by Reference

void f(ArrayList<Integer> l) {
  l.add(2);                      // mutates the caller's list: {1} becomes {1,2}
  l = new ArrayList<Integer>();  // rebinds only f's local reference to a new, empty list
}

ArrayList<Integer> l = new ArrayList<Integer>();
l.add(1);
f(l);
// l contains 1, 2: main's l still refers to the original list
Typing
 We know that type B can implement/extend A
 B is a subtype of A; A is a supertype of B
 The real type of the object is its dynamic type
 This type is known only at run-time
 Object can act like a supertype of its dynamic type
 Cannot act like a subtype of its dynamic type
 Variables/arguments of type A accept any subtype of A
 A is a supertype of the static type
 Static type is a supertype of the dynamic type
Typing
 Static type is compile-time, determined by code
 Dynamic type might be a subtype of the static type
 Casting temporarily changes the static type
 Upcasts are always safe
   Always cast to a supertype of the dynamic type
 Downcasts may not be safe
   Can downcast to a supertype of the dynamic type
   Cannot downcast to a subtype of the dynamic type
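A small sketch of safe and unsafe casts (Animal and Dog are hypothetical classes):

class Animal { }
class Dog extends Animal { }

Animal a = new Dog();    // upcast: static type Animal, dynamic type Dog; always safe
Dog d = (Dog) a;         // downcast OK: Dog is the dynamic type, not a subtype of it
Animal b = new Animal(); // dynamic type Animal
// Dog bad = (Dog) b;    // compiles, but throws ClassCastException at run-time:
                         // Dog is a subtype of the dynamic type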
Typing
 If B extends A, and B and A both have function foo()
 Which foo() gets called?
 Answer depends on the dynamic type
   If the dynamic type is B, B's foo() will be called
   Even if foo() is invoked inside a function of A
 Exception: static functions
   Static functions are not associated with any object
   Thus, calls to them are resolved with the static type, not the dynamic type
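A minimal sketch of this dispatch rule (class names mirror the slide; the method bodies are made up):

class A {
    void foo() { System.out.println("A.foo"); }
    void bar() { foo(); }   // foo() invoked inside a function of A
}

class B extends A {
    @Override void foo() { System.out.println("B.foo"); }
}

A x = new B();   // static type A, dynamic type B
x.foo();         // prints "B.foo": dispatch uses the dynamic type
x.bar();         // also prints "B.foo", even though bar() lives in A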
Induction and Recursion
 Recursion
 Basic examples
   Factorial: n! = n·(n−1)!
   Combinations (Pascal's triangle)
 Recursive structures
   Tree (a tree is a root with right/left subtrees)
   Depth-first search
 Don't forget the base case (in proofs, and in your code)
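A minimal sketch of recursive factorial, with the base case made explicit:

static long factorial(int n) {
    if (n <= 1) return 1;          // base case: 0! = 1! = 1
    return n * factorial(n - 1);   // recursive case: n! = n·(n-1)!
}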
Induction and Recursion
 Induction
 Can do induction on the previous recursive problems
   Algorithm correctness proofs (DFS)
   Math equation proofs
Induction and Recursion
 Step 1
   Base case
 Step 2
   Suppose n is the variable you're doing induction on; assume the equation holds when n = k
   Strong induction: assume it holds for all n ≤ k
 Step 3
   Prove that the equation still holds when n = k + 1, by making use of the assumptions in Step 2
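A short worked example of these three steps (the classic sum formula; not from the slides):
 Claim: 1 + 2 + … + n = n(n+1)/2 for all n ≥ 1
 Step 1 (base case, n = 1): 1 = 1·2/2
 Step 2 (hypothesis): suppose 1 + 2 + … + k = k(k+1)/2
 Step 3 (n = k+1): 1 + 2 + … + k + (k+1) = k(k+1)/2 + (k+1) = (k+1)(k+2)/2, which is the formula with n = k+1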
Asymptotic Complexity
 f(n) is O(g(n)) if:
   ∃ (c, n0) such that ∀ n ≥ n0, f(n) ≤ c⋅g(n)
   ∃ – there exists; ∀ – for all
   (c, n0) is called the witness pair
 f(n) is O(g(n)) roughly means f(n) ≤ g(n)
 Big-O notation is a model for running time
   Don't need to know the computer's specs
   The model usually, but not always, matches real life
Asymptotic Complexity
 Meaning of n0
 We can compare two integers
 How can we compare two functions?
   Answer: which function grows faster
 Fast vs. slow car
   60-mph car with no headstart
   40-mph car with a headstart
   n0 is when the fast car overtakes the slow one
 Functions have a dominant term
   2n^3 + 4n^2 + n + 2: 2n^3 is the dominant term
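For example, a witness pair for the polynomial above (values chosen for illustration): 2n^3 + 4n^2 + n + 2 ≤ 2n^3 + 4n^3 + n^3 + 2n^3 = 9n^3 for all n ≥ 1, so (c, n0) = (9, 1) witnesses that 2n^3 + 4n^2 + n + 2 is O(n^3).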
Asymptotic Complexity
 Meaning of c
 Cannot get a precise measurement
   Famous election recounts (Bush/Gore, Coleman/Franken)
   Algorithm's speed on a 2 GHz versus a 1 GHz processor
   Hard to measure the constant for the dominant term
 c is the fudge factor
   Changes the speed by a constant factor
   2 GHz is at most twice as fast as 1 GHz (c = 2)
   2n^3 is O(n^3)
   Fast and slow car have asymptotically equal speed
Asymptotic Complexity
 (Assume c, d are constants)
 Logarithmic vs. logarithmic: log(n^c), log(n^d) are "equal"
   Difference is a constant factor ( log(n^c) = c·log(n) )
   Logarithm's base also does not matter
 Logarithmic vs. polynomial: log n is O(n^c)
   Corollary: O(n log n) is better than O(n^2)
 Polynomial vs. polynomial: n^c is O(n^d) for c ≤ d
 Polynomial vs. exponential: n^c is O(d^n) for d > 1
 Exponential running time is almost always too slow!
Arrays
 Arrays have a fixed capacity
 If more space is needed…
   Allocate a larger array
   Copy the smaller array into the larger one
   The entire operation takes O(n) time
 Arrays have random access
   Can read any element in O(1) time
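A minimal sketch of the grow-and-copy operation (doubling the capacity is one common choice):

static int[] grow(int[] old) {
    int[] bigger = new int[old.length * 2];   // allocate a larger array
    for (int i = 0; i < old.length; i++) {    // copy the smaller array over: O(n)
        bigger[i] = old[i];
    }
    return bigger;
}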
(Doubly) Linked Lists
 Each node has three parts
 Value stored in node
 Next node
 Previous node
 Also have access to head, tail of linked list
 Very easy to grow linked list in both directions
 Downside: sequential access
 O(n) time to access something in the middle
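A minimal sketch of a doubly-linked-list node (field names are illustrative):

class Node<T> {
    T value;        // value stored in the node
    Node<T> next;   // next node (null at the tail)
    Node<T> prev;   // previous node (null at the head)
}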
Trees
 Recursive data structure
 A tree is…
 A single node
 Zero or more subtrees below it
 Every node (except the root) has one parent
 Properties of trees must hold for all nodes in the tree
 Each node is the root of some tree
 Makes sense for recursive algorithms
Tree
 [Diagram: an example tree]
Binary Trees
 Each node can have at most two children
 We usually distinguish between left, right child
 Trees we study in CS 2110 are binary trees
Binary Trees
 [Diagram: an example binary tree]
Binary Search Trees
 Keep data in sorted order inside a tree
 For every node with value x:
   Every node in the left subtree has a value < x
   Every node in the right subtree has a value > x
 A binary search tree is not guaranteed to be balanced
 If it is balanced, we can find a node in O(log n) time
Binary Search Trees
 [Diagram: an example BST]
 [Diagram: a completely unbalanced tree that is still a BST]
Not a Binary Search Tree
 [Diagram: not a BST, because 8 is in the left subtree of 5]
Not a Binary Search Tree
 [Diagram: not a BST, because 3 is the left child of 2]
Adding to a Binary Search Tree
 Adding 4 to the BST (sketch below)
 Start at the root (5): 4 < 5, so go left to 2
 4 > 2, so go right to 3
 4 > 3, so add 4 as the right child of 3
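A minimal sketch of BST insertion following this walk (BstNode is a made-up helper class):

class BstNode {
    int value;
    BstNode left, right;
    BstNode(int value) { this.value = value; }
}

static BstNode insert(BstNode root, int x) {
    if (root == null) return new BstNode(x);   // empty spot: add the new node here
    if (x < root.value) {
        root.left = insert(root.left, x);      // smaller values go left
    } else if (x > root.value) {
        root.right = insert(root.right, x);    // larger values go right
    }
    return root;                               // duplicates are ignored in this sketch
}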
Tree Traversals
 Converts the tree into a list
 Works on any binary tree
   Does not need to be a binary search tree
 Traverse the node and its left and right subtrees
   Subtrees are traversed recursively
 Preorder: node, left subtree, right subtree
 Inorder: left subtree, node, right subtree
   Produces a sorted list for binary search trees
 Postorder: left subtree, right subtree, node
Tree Traversals
 Inorder Traversal
 In(5)
 In(2), 5, In(7)
 1, 2, In(3), 5, In(7)
 1, 2, 3, 4, 5, In(7)
 1, 2, 3, 4, 5, 6, 7, 9
Binary Heaps
 Weaker condition on each node
 A node is smaller than its immediate children
 Don’t care about entire subtree
 Don’t care if right child is smaller than left child
 Smallest node guaranteed to be at the root
 No guarantees beyond that
 Guarantee also holds for each subtree
 Heaps grow top-to-bottom, left-to-right
 Shrink in opposite direction
Binary Heaps
 [Diagram: an example binary min-heap]
Binary Heaps
 Adding to a heap
 Find the lowest unfilled level of the tree
 Find the leftmost empty spot and add the node there
 New node could be smaller than its parent
   Swap it up if it's smaller
 If swapped, it could still be smaller than its new parent
   Keep swapping up until the parent is smaller
Binary Heaps
 Removing from a heap
 Take out element from the top of the heap
 Find rightmost element on lowest level
 Make this the new root
 New root could be larger than one/both children
 Swap it with the smallest child
 Keep swapping down…
Binary Heaps
 Can represent a heap as an array (sketch below)
 Root element is at index 0
 For an element at index i
   Parent is at index (i − 1)/2 (integer division)
   Children are at indices 2i + 1 and 2i + 2
 An n-element heap takes up the first n spots in the array
   New elements grow the heap by 1
   Removing an element shrinks the heap by 1
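A minimal sketch of the add operation on an array-backed min-heap (the caller tracks the current size n and ensures the array has capacity):

static void add(int[] heap, int n, int x) {
    heap[n] = x;                              // grow the heap by 1
    int i = n;
    while (i > 0) {
        int parent = (i - 1) / 2;             // parent index
        if (heap[parent] <= heap[i]) break;   // parent is smaller: done
        int tmp = heap[i]; heap[i] = heap[parent]; heap[parent] = tmp;
        i = parent;                           // keep swapping up
    }
}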
Hashtables
 Motivation
 Sort n numbers between 0 and 2n − 1
 Sorting integers within a certain range
   More specific than comparable objects with unlimited range
   General lower bound of O(n log n) may not apply
 Can be done in O(n) time with counting sort (sketch below)
   Create an array of size 2n
   The ith entry counts all the numbers equal to i
   For each number, increment the correct entry
 Can also find a given number in O(1) time
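A minimal sketch of counting sort under these assumptions (all values in 0 to 2n − 1):

static int[] countingSort(int[] a) {
    int[] count = new int[2 * a.length];     // one counter per possible value
    for (int x : a) count[x]++;              // tally each number: O(n)
    int[] sorted = new int[a.length];
    int j = 0;
    for (int v = 0; v < count.length; v++)   // write each value out count[v] times
        for (int k = 0; k < count[v]; k++)
            sorted[j++] = v;
    return sorted;
}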
Hashtables
 Cannot do this with arbitrary data types
   Integer alone can have over 4 billion possible values
 For a hashtable, create an array of size m
 Hash function maps each object to an array index between 0 and m − 1 (in O(1) time)
   Hash function makes sorting impossible
 Quality of a hash function is based on how many elements map to the same index in the hashtable
   Need to expect O(1) collisions
Hashtables
 Dealing with collisions
 In counting sort, one array entry contains only elements of the same value
 The hash function can map different objects to the same index of the hashtable
 Chaining
   Each entry of the hashtable is a linked list
 Linear probing
   If h(x) is taken, try h(x) + 1, h(x) + 2, h(x) + 3, …
   Quadratic probing: h(x) + 1, h(x) + 4, h(x) + 9, …
Hashtables
 Table size
 If too large, we waste space
 If too small, everything collides with everything else
   Probing falls apart if the number of items (n) is almost the size of the hashtable (m)
 Typically have a load factor 0 < λ ≤ 1
   Resize the table when n/m exceeds λ
 Resizing changes m
   Have to reinsert everything into the new hashtable
Hashtables
 Table doubling
 Double the size every time we exceed our load factor
   Worst case is when we just doubled the hashtable
   Consider all prior times we doubled the table
   n + n/2 + n/4 + n/8 + … < 2n
 Insert n items in O(n) total time
   Average O(1) time to insert one item
   Some individual operations take O(n) time
 This also works for growing an ArrayList
Hashtables
 Java, hashCode(), and equals()
 hashCode() assigns an object an integer value
   Java maps this integer to a number between 0 and m − 1
 If x.equals(y), x and y should have the same hashCode()
   Insert an object with one hashCode(), and you won't find it if you look it up with a different hashCode()
   If you override equals(), you must also override hashCode()
 Different objects can have the same hashCode()
   If this happens too often, we have too many collisions
   Only equals() can determine if they are equal
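A minimal sketch of overriding the two methods together (Pair is a made-up class):

import java.util.Objects;

public class Pair {
    private final int a, b;

    public Pair(int a, int b) { this.a = a; this.b = b; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Pair)) return false;
        Pair p = (Pair) o;
        return a == p.a && b == p.b;
    }

    @Override
    public int hashCode() {
        return Objects.hash(a, b);   // equal pairs always get equal hash codes
    }
}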
Searching
 Unsorted lists
 Element could be anywhere, O(n) search time
 Sorted lists – binary search
 Try middle element
 Search left half if too large, right half if too small
 Each step cuts search space in half, O(log n) steps
 Binary search requires random access
 Searching a sorted linked list is still O(n)
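A minimal sketch of binary search on a sorted array (returns an index of x, or -1 if absent):

static int binarySearch(int[] a, int x) {
    int lo = 0, hi = a.length - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;      // middle element; avoids (lo + hi) overflow
        if (a[mid] == x) return mid;
        else if (a[mid] < x) lo = mid + 1; // too small: search right half
        else hi = mid - 1;                 // too large: search left half
    }
    return -1;
}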
Sorting
 Many ways to sort
 Same high-level goal, same result
 What's the difference?
   The algorithm and data structures used dictate the running time
   They also dictate the space usage
 Each algorithm has its own flavor
 Once again, assume random access to the list
Sorting
 Swap operation: swap(x, i, j)
   temp = x[i]
   x[i] = x[j]
   x[j] = temp
 Many sorts are a fancy set of swap instructions
 Modifies the array in place, very space-efficient
   Not space-efficient to copy a large array
Insertion Sort
 for (i = 0…n−1)
   Take the element at index i and swap it back one spot until…
   you hit the beginning of the list, or
   the previous element is smaller than this one
 Ex.: 4, 7, 8, 6, 2, 9, 1, 5 – swap 6 back twice
 After k iterations, the first k elements are relatively sorted
   4 iterations: 4, 6, 7, 8, 2, 9, 1, 5
 O(n^2) time, in-place sort
   O(n) for sorted lists, good for nearly sorted lists
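A minimal sketch of insertion sort as described above:

static void insertionSort(int[] a) {
    for (int i = 1; i < a.length; i++) {
        // swap a[j] back one spot until the previous element is smaller
        for (int j = i; j > 0 && a[j - 1] > a[j]; j--) {
            int tmp = a[j]; a[j] = a[j - 1]; a[j - 1] = tmp;
        }
    }
}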
Selection Sort
 for (i = 0…n−1)
   Scan the array from index i to the end
   Swap the smallest element into index i
 Ex.: 1, 3, 5, 8, 9, 6, 7 – swap 6 and 8
 After k iterations, the first k elements are absolutely sorted
   4 iterations: 1, 3, 5, 6, 9, 8, 7
 O(n^2) time, in-place sort
   Also O(n^2) for sorted lists
Merge Sort
 Copy the left half and right half into two smaller arrays
 Recursively run merge sort on each half
   Base case: 1- or 2-element array
 Merge the two sorted halves back into the original array
   Ex.: (1, 3, 6, 9), (2, 5, 7, 8) → (1, 2, 3, 5, 6, 7, 8, 9)
 Running time: O(n log n)
   Merge takes O(n) time
   Split the list in half about log(n) times
 Also uses O(n) extra space!
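A minimal sketch of merge sort along these lines:

import java.util.Arrays;

static void mergeSort(int[] a) {
    if (a.length <= 1) return;                          // base case
    int[] left = Arrays.copyOfRange(a, 0, a.length / 2);
    int[] right = Arrays.copyOfRange(a, a.length / 2, a.length);
    mergeSort(left);                                    // sort each half recursively
    mergeSort(right);
    int i = 0, j = 0, k = 0;                            // merge back into a: O(n)
    while (i < left.length && j < right.length)
        a[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
    while (i < left.length) a[k++] = left[i++];
    while (j < right.length) a[k++] = right[j++];
}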
Quick Sort
 Randomly pick a pivot
 Partition into two (unequal) halves
   Left partition smaller than the pivot, right partition larger
 Recursively run quick sort on both partitions
 Expected running time: O(n log n)
   Partition takes O(n) time
   Pivot could be the smallest or largest element (n partitions)
   Expected to split the list O(log n) times
 In-place sort: partition can be done in place
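A minimal sketch of quick sort with a random pivot and an in-place partition (one of several standard partition schemes; call it as quickSort(a, 0, a.length - 1)):

static void quickSort(int[] a, int lo, int hi) {
    if (lo >= hi) return;
    int p = lo + (int) (Math.random() * (hi - lo + 1));  // random pivot index
    int tmp = a[p]; a[p] = a[hi]; a[hi] = tmp;           // stash the pivot at the end
    int pivot = a[hi], i = lo;                           // a[lo..i-1] < pivot
    for (int j = lo; j < hi; j++) {
        if (a[j] < pivot) {                              // smaller elements go left
            tmp = a[i]; a[i] = a[j]; a[j] = tmp;
            i++;
        }
    }
    tmp = a[i]; a[i] = a[hi]; a[hi] = tmp;               // pivot lands between partitions
    quickSort(a, lo, i - 1);                             // recurse on both partitions
    quickSort(a, i + 1, hi);
}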
Heap Sort
 Use a max-heap (represented as an array)
   Can move items up/down the heap with swaps
 for (i = 0…n−1)
   Add the element at index i to the heap
   The first pass builds the heap
 for (i = n−1…0)
   Remove the largest element, put it in spot i
 O(n log n), in-place sort
Abstract Data Types
 Lists
 Stacks
 LIFO
 Queues
 FIFO
 Sets
 Dictionaries (Maps)
 Priority Queues
 Java API
 E.g.: ArrayList is an ADT list backed by an array
Abstract Data Types
 Priority Queue
 Implemented as a heap
   peek() – look at heap root: O(1)
   poll() – heap "delete" op: O(log n)
   add() – heap "add" op: O(log n)
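For example, using Java's PriorityQueue (backed by a binary heap):

import java.util.PriorityQueue;

PriorityQueue<Integer> pq = new PriorityQueue<>();
pq.add(5);                       // O(log n)
pq.add(1);
pq.add(3);
System.out.println(pq.peek());   // 1 – smallest element, O(1)
System.out.println(pq.poll());   // 1 – removes the root, O(log n)
System.out.println(pq.poll());   // 3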
What is a graph?
 A graph has vertices
 A graph has edges between two vertices
 n – number of vertices; m – number of edges
 Directed vs. undirected graphs
   Directed edges can only be traversed one way
   Undirected edges can be traversed both ways
 Weighted vs. unweighted graphs
   Edges could have weights/costs assigned to them
What is a graph?
 What makes a graph special?
 Cycles!!!
 What is a graph without a cycle?
   Undirected graphs: trees
   Directed graphs: directed acyclic graphs (DAGs)
Topological Sort
 Topological sort is for directed graphs
 Indegree: number of edges entering a vertex
 Outdegree: number of edges leaving a vertex
 Topological sort algorithm (sketch below)
   Delete a vertex with an indegree of 0
   Delete its outgoing edges, too
   Repeat until no vertices have an indegree of 0
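A minimal sketch of this algorithm (vertices numbered 0..n−1 and an adjacency-list representation are assumed):

import java.util.*;

static List<Integer> topoSort(List<List<Integer>> adj) {
    int n = adj.size();
    int[] indegree = new int[n];
    for (List<Integer> edges : adj)
        for (int v : edges) indegree[v]++;
    Deque<Integer> ready = new ArrayDeque<>();   // vertices with indegree 0
    for (int v = 0; v < n; v++)
        if (indegree[v] == 0) ready.add(v);
    List<Integer> order = new ArrayList<>();
    while (!ready.isEmpty()) {
        int u = ready.remove();                  // delete a vertex with indegree 0
        order.add(u);
        for (int v : adj.get(u))                 // delete its outgoing edges, too
            if (--indegree[v] == 0) ready.add(v);
    }
    return order;   // contains fewer than n vertices iff the graph has a cycle
}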
Topological Sort
 [Diagram: topological sort worked example on a five-vertex graph (A–E)]
Topological Sort
 What can't topological sort delete?
 Cycles!!!
 Every node in a cycle has an indegree of at least 1
   Would need to delete another node in the cycle first
 A graph is a DAG iff topological sort deletes the whole graph
   iff – if and only if
Graph Searching
 Works on directed and undirected graphs
 Component – a set of vertices connected by edges
 A component starts as a single vertex
   No edges are selected yet
 Travel along edges to add vertices to this component
   From a vertex in the component to a new vertex
Graph Searching
 Implementation details
 Why is choosing a random path risky?
 Cycles!!!
 Could traverse a cycle forever
 Need to keep track of vertices in your component
 No cycles if you do not visit a vertex twice
 Graph search algorithms need:
 A set of vertices in the component (visited)
 A collection of vertices to visit (toVisit)
Graph Searching
 Add the start vertex to toVisit
 Pick a vertex from toVisit
 If the vertex is in visited, do nothing
   Could add the same vertex to toVisit twice before visiting it
 If the vertex is not in visited:
   Add the vertex (and the edge to it) to visited
   Follow its edges to neighboring vertices
   Add each neighboring vertex to toVisit if it is not in visited
 Repeat until there are no more vertices to visit (sketch below)
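A minimal sketch of this loop (vertices numbered 0..n−1; using the Deque as a stack gives DFS, while using it as a queue would give BFS):

import java.util.*;

static Set<Integer> search(List<List<Integer>> adj, int start) {
    Set<Integer> visited = new HashSet<>();
    Deque<Integer> toVisit = new ArrayDeque<>();
    toVisit.push(start);                     // add the start vertex to toVisit
    while (!toVisit.isEmpty()) {
        int u = toVisit.pop();               // pick a vertex from toVisit
        if (visited.contains(u)) continue;   // already visited: do nothing
        visited.add(u);
        for (int v : adj.get(u))             // follow edges to neighbors
            if (!visited.contains(v)) toVisit.push(v);
    }
    return visited;                          // the component containing start
}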
Graph Searching
 Running time analysis
 Can check if a vertex is in visited in O(1) time
   Use an array or HashSet
 Each edge is added to toVisit once in a directed graph
   Algorithm has at most m iterations
 Running time determined by the ADT used for toVisit
   Stack, queue: O(1) to add/delete; O(m) total
   Priority queue: O(log m) to add/delete; O(m log m) total
Graph Searching
 Depth-first search
 Uses a stack
 Goes as deep as it can before taking a new path
 Breadth-first search
 Uses a queue
Graph Searching: DFS
 [Diagram: DFS on a five-vertex graph (A–E); the stack holds edges such as ∅-A, A-B, B-C, B-E, E-D]
Graph Searching: BFS
 [Diagram: BFS on the same graph; the queue holds edges such as ∅-A, A-B, B-C, B-E, C-D, E-D]
Minimum Spanning Tree
 MSTs apply to undirected graphs
 Take only some of the edges in the graph
   Spanning – all vertices connected together
   Tree – no cycles
 For all spanning trees, m = n − 1
 All unweighted spanning trees are MSTs
 Need to find the MST for a weighted graph
Minimum Spanning Tree
 Initially, no edges are selected
   We have n one-vertex components
 Find an edge between two components
   Don't add an edge between vertices in the same component
   That creates a cycle
 Pick the smallest edge between two components
   This is a greedy strategy, but it somehow works
 If edge weights are distinct, there is only one possible MST
   It doesn't matter which algorithm you choose
Minimum Spanning Trees
 Kruskal's algorithm
 Process edges from least to greatest weight
 Each edge either:
   Connects two vertices in two different components (take it)
   Connects two vertices in the same component (throw it out)
 O(m log m) running time
   O(m log m) to sort the edges
   Need union-find to track components and merge them
     Does not alter the running time
Minimum Spanning Trees
 [Diagram: Kruskal's algorithm worked example on a weighted graph with vertices A–G and edge weights from 1 to 12]
Minimum Spanning Trees
 Prim's algorithm
 A graph search algorithm
   Like BFS, but it uses a priority queue
   Priority is the edge weight
 Size of the heap is O(m); running time is O(m log m)
Minimum Spanning Trees
 [Diagram: Prim's algorithm worked example on the same weighted graph, showing the priority queue of candidate edges and their weights at each step]
Shortest Path Algorithm
 For directed and undirected graphs
 Find the shortest path between two vertices
   The start vertex is the source
   The end vertex is the sink
 Use Dijkstra's algorithm
   Assumes positive edge weights
   A graph search algorithm using a priority queue
   The priority is now the entire path length
     Cost of the path so far + cost of the edge to the next vertex
Shortest Path Algorithm
 [Diagram: Dijkstra's algorithm worked example on a weighted graph (A–G), showing the priority queue of paths with their total lengths at each step]
Motivation for Threads
 Splitting the work
 Multi-core processors can multi-task
   Can't multi-task with a single thread!
   The operating system simulates multi-tasking
 Obvious: multi-tasking is faster
 Less obvious: not all threads are active
   Waiting for a movie to download from DC++
   Multi-task: study for CS 2110 while you wait!
   The operating system runs another thread while one waits
Thread Basics
 Code – Set of instructions, stored as data
 A paper with instructions does not do anything
 Code by itself does not run or execute
 Thread – Runs/executes code
 Multiple people can follow the same instructions
 Multiple threads can run the same code
Thread Basics
 Making the code
 Write a class implementing the Runnable interface
   Or use an anonymous inner class on the fly…
 Code to execute goes in the run() method
 Running a thread
   Thread thread = new Thread(Runnable target);
   Multiple threads could be constructed with the same target
   Start a thread: thread.start();
   Wait until a thread finishes: thread.join();
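A minimal sketch putting these pieces together (the Greeter class and its message are made up):

public class Greeter implements Runnable {
    @Override
    public void run() {   // code to execute goes here
        System.out.println("hello from " + Thread.currentThread().getName());
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable target = new Greeter();
        Thread t1 = new Thread(target);   // two threads, same target
        Thread t2 = new Thread(target);
        t1.start();
        t2.start();
        t1.join();                        // wait until both threads finish
        t2.join();
    }
}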
Concurrency
 Two or more active threads
 Hard to predict each thread's "speed"
   Nondeterministic behavior
 Two active threads, one data structure
 Both perform writes, or one reads while another writes
   No problems if both only read the data
 Updates are not atomic!
   E.g.: one heap operation (such as add) takes multiple steps
 Wait until an update completes before reading/writing again
   The data is in a bad state until the update finishes
Concurrency
 Avoiding race conditions (write/write, read/write)
 Imagine a room with a key
   Must acquire the key before entering the room
   Enter room, do many things, leave room, drop off the key
 Mutual exclusion, or mutexes
   Only one thread can hold a mutex at a time
   Other threads wait until the mutex is released
Synchronization
 Could use Java's Semaphore class with a count of 1
   Primitive; many corner cases, like exceptions
 Better: synchronized (x) { /* body */ }
 Thread 1 acquires a lock on the object
   x is a reference type; the lock is acquired on its memory location
 Thread 2 must wait if it wants a lock on x
   Also waits if it wants a lock on y and x == y
 Synchronized functions: synchronized (this)
   Only one synchronized function running per object
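A minimal sketch of guarding a shared counter with synchronized (SafeCounter is a made-up class):

public class SafeCounter {
    private int count = 0;

    public void increment() {
        synchronized (this) {   // at most one thread per object inside this block
            count++;            // read-modify-write is not atomic on its own
        }
    }

    public synchronized int get() {   // synchronized method: same lock, (this)
        return count;
    }
}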
Synchronization
 Only synchronize when you have to
   Get the key, enter the room, chat with your BFF on the phone for an hour…
   The people waiting for the key are very angry!
 Deadlock via bugs
   Thread 1 never releases its lock on A; Thread 2 waits for A forever
 Deadlock via circular waiting
   Thread 1 has a lock on A and is waiting for a lock on B
   Thread 2 has a lock on B and is waiting for a lock on A
 Solution: always acquire locks in the same order