Data Structures and Algorithms


Data Structures and Algorithms
PLSD210
Key Points
Lectures 1 & 2
• Data Structures and Algorithms
• The key to your professional reputation
• A much more dramatic effect can be made on
the performance of a program by changing to a
better algorithm than by
• hacking
• converting to assembler
• Buy a text for long-term reference!
• Professional software engineers have
algorithms text(s) on their shelves
• Hackers have user manuals for the latest
software package
• Which they’ll have to throw away next year anyway!
Lecture 2
• Most algorithms operate on data
collections, so define
• Collection Abstract Data Type (ADT)
• Methods
• Constructor / Destructor
• Add / Delete
• Find
• Sort (later)
• ….
Lecture 2
• Array implementation
• Specify maximum size in Constructor
• Allocate array with sufficient space
• Add method
• Takes constant time
• Independent of current size of collection
• Find method
• Linear search
• Worst case time: n x constant
• Average time: n/2 x constant
(if the key is always found!)
up to n x constant
(depending on the probability of finding the key)
• Worst case analysis preferred - easier and more relevant!
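A minimal array-backed sketch of that interface, assuming the illustrative names above; Add runs in constant time, Find is a linear search:

    #include <stdlib.h>
    #include <string.h>

    struct t_collection {
        int item_cnt;      /* current number of items           */
        int max_items;     /* capacity fixed by the constructor */
        int item_size;
        void **items;      /* array of pointers to items        */
    };
    typedef struct t_collection *Collection;

    Collection ConsCollection( int max_items, int item_size ) {
        Collection c = malloc( sizeof(struct t_collection) );
        c->items = malloc( max_items * sizeof(void *) );
        c->item_cnt = 0;
        c->max_items = max_items;
        c->item_size = item_size;
        return c;
    }

    /* Add: O(1) - independent of the current size of the collection */
    void AddToCollection( Collection c, void *item ) {
        if ( c->item_cnt < c->max_items )
            c->items[c->item_cnt++] = item;
    }

    /* Find: linear search - n x constant in the worst case */
    void *FindInCollection( Collection c, void *key ) {
        for ( int i = 0; i < c->item_cnt; i++ )
            if ( memcmp( c->items[i], key, c->item_size ) == 0 )
                return c->items[i];
        return NULL;    /* not found */
    }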
Searching
• Find method
• Binary search
• Sort the data into ascending (or descending) order
• Check the middle item
• If desired key is less, go left
• If match, finished!
• If desired key is greater, go right
• Each step halves the interval: n → n/2 → n/4 → n/8 → …
• You can divide n in half log2 n times
• Therefore binary search takes time = c log2 n
• In “big Oh”, this is O( log n )
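A minimal iterative sketch of binary search over a sorted array of ints; each probe halves the interval, so at most about log2 n probes are needed:

    /* Binary search: a[0..n-1] must be sorted in ascending order */
    int bin_search( int a[], int n, int key ) {
        int lo = 0, hi = n - 1;
        while ( lo <= hi ) {
            int mid = lo + (hi - lo) / 2;            /* check the middle item */
            if ( key < a[mid] )      hi = mid - 1;   /* go left               */
            else if ( key > a[mid] ) lo = mid + 1;   /* go right              */
            else                     return mid;     /* match - finished!     */
        }
        return -1;                                   /* not found             */
    }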
Binary Search vs Sequential Search
• Find method
• Sequential
• Worst case time: c1 n
• Binary search
• Worst case time: c2 log2n
Compare n with log n
[Graph: Time vs n (0-60), plotting n against 4 log2 n.
Binary search is more complex and has a higher constant factor, so it can
lose on small problems - but we're not interested in small problems.
For large problems, we're interested in the growing gap between n and log n.]
Logs
• Base 2 is by far the most common in this course
• Assume base 2 unless otherwise noted!
Lecture 3 - Key Points
• Linked Lists
• Dynamic allocation gives flexibility
• Use as much or as little space as needed
• Addition - constant time
• Searching - proportional to n
• Deletion - proportional to n
• Constant in doubly linked if given node
• Variants
• LIFO (simplest)
• FIFO (add tail pointer)
• Doubly linked
• Scan in either direction
• Circularly linked
• Scan entire list from any point
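A singly linked list sketch in C (illustrative names): addition at the head is constant time and search is proportional to n. Adding and deleting at the head gives LIFO behaviour; a tail pointer would give constant-time FIFO addition as well:

    #include <stdlib.h>

    struct t_node {
        int item;
        struct t_node *next;
    };

    /* Add at the head: constant time, and only as much space as needed */
    struct t_node *list_add( struct t_node *head, int item ) {
        struct t_node *n = malloc( sizeof(struct t_node) );
        n->item = item;
        n->next = head;
        return n;                          /* new head of the list */
    }

    /* Search: proportional to n */
    struct t_node *list_find( struct t_node *head, int key ) {
        for ( struct t_node *p = head; p != NULL; p = p->next )
            if ( p->item == key ) return p;
        return NULL;
    }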
Lecture 4 - Key Points
• Stacks
• Just a collection with LIFO semantics
• Add, Delete usually named push, pop
• Recursion
• Stack frame to store function context
• Permits recursive calls
• Basis for some simple, elegant
and efficient solutions!
• but, sometimes too simple
• Fibonacci - elegant but not efficient
(explanation to come!)
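The naive recursive Fibonacci referred to above, as a sketch: each call spawns two more, so although the code is elegant, the number of stack frames (and the running time) grows exponentially with n:

    /* Elegant but not efficient: exponentially many recursive calls */
    long fib( int n ) {
        if ( n <= 1 ) return n;                 /* base cases: fib(0)=0, fib(1)=1 */
        return fib( n - 1 ) + fib( n - 2 );     /* two recursive calls per frame  */
    }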
Lecture 4 - Key Points
• Searching
• Binary search
• O(log n) search time
• but addition is expensive
• Trees
• Recursive Data Structure
• Sub-trees are themselves trees
• Searched recursively
• Search time proportional to height, O(logn)
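A sketch of recursive search in a binary search tree: each step descends into one sub-tree, so the time is proportional to the height of the tree - O(log n) when the tree is reasonably balanced:

    struct t_tree_node {
        int key;
        struct t_tree_node *left, *right;    /* sub-trees are themselves trees */
    };

    struct t_tree_node *tree_find( struct t_tree_node *t, int key ) {
        if ( t == NULL )    return NULL;     /* empty sub-tree: not found */
        if ( key < t->key ) return tree_find( t->left, key );
        if ( key > t->key ) return tree_find( t->right, key );
        return t;                            /* match */
    }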
Lecture 10 - Key Points
• The Sorting Repertoire
• Insertion   O(n²)        Guaranteed
• Bubble      O(n²)        Guaranteed
• Heap        O(n log n)   Guaranteed
• Quick       O(n log n)   Most of the time!
              O(n²)        Pathological case
• Bin         O(n)         Keys in small range
              (O(n+m), m = range of keys)
• Radix       O(n)         Bounded keys/duplicates
              (O(n log n) otherwise)
• Bin and Radix sort
• Distinguished by ability to
• map the key to a small integer or
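A bin (counting) sort sketch for the simplest case where the items are just integer keys already mapped into the small range 0..m-1; counting occurrences and reading the bins back gives O(n+m) time, effectively O(n) when m is small:

    #include <stdlib.h>

    void bin_sort( int a[], int n, int m ) {   /* keys assumed to lie in 0..m-1 */
        int *count = calloc( m, sizeof(int) );
        for ( int i = 0; i < n; i++ )
            count[a[i]]++;                     /* drop each key into its bin    */
        int k = 0;
        for ( int v = 0; v < m; v++ )          /* read the bins back in order   */
            for ( int c = 0; c < count[v]; c++ )
                a[k++] = v;
        free( count );
    }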
Lecture 10 - Key Points
• Radix Sort
• Any suitable radix can be used
• Different bases can be used in each phase
• Memory management must be efficient to
achieve O(n)
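A radix sort sketch using radix 256 (one byte per phase): each phase is a stable counting sort on one byte, so 32-bit keys are sorted in four passes, and reusing a single temporary buffer keeps the memory management cheap enough for O(n) overall:

    #include <stdlib.h>
    #include <string.h>

    void radix_sort( unsigned a[], int n ) {
        unsigned *tmp = malloc( n * sizeof(unsigned) );
        for ( int shift = 0; shift < 32; shift += 8 ) {   /* 4 phases, base 256 */
            int count[257] = { 0 };
            for ( int i = 0; i < n; i++ )                 /* count each byte value */
                count[ ((a[i] >> shift) & 0xFF) + 1 ]++;
            for ( int d = 0; d < 256; d++ )               /* prefix sums -> start positions */
                count[d + 1] += count[d];
            for ( int i = 0; i < n; i++ )                 /* stable scatter into tmp */
                tmp[ count[(a[i] >> shift) & 0xFF]++ ] = a[i];
            memcpy( a, tmp, n * sizeof(unsigned) );
        }
        free( tmp );
    }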
Lecture 13 - Key Points
• m-way trees
• Major use: databases
• m adjusted so that a node fits in a physical disc block
• Searching - O( log n )
• B-trees
• m-way tree in which each node is at least half-full
• B+-trees
• No data in nodes
• Leaf nodes
• Contain pointers to data
• Linked together to enable fast in-order traversal
of the tree
• Insertion
• Split full nodes, promote a key if necessary
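A search-only sketch of an m-way tree node; M here is an assumed order chosen so one node roughly fills a disc block, and the insertion/splitting logic of a real B-tree is omitted:

    #define M 64                        /* assumed order, sized to fit a disc block  */

    struct t_mway_node {
        int n_keys;                     /* keys actually stored (at most M-1)        */
        int key[M - 1];
        struct t_mway_node *child[M];   /* child[i]: keys between key[i-1] and key[i] */
    };

    struct t_mway_node *mway_find( struct t_mway_node *t, int key ) {
        if ( t == NULL ) return NULL;
        int i = 0;
        while ( i < t->n_keys && key > t->key[i] )    /* scan keys within the node */
            i++;
        if ( i < t->n_keys && key == t->key[i] ) return t;
        return mway_find( t->child[i], key );         /* descend into the sub-tree */
    }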
Lecture 13 - Key Points
• Hash Tables
• If we can derive an address from a key directly
⇒ O(1) searching
• Constraints
• Limited range - table has to fit in memory
• Keys dense in range - otherwise too much
space wasted
• Duplicates ⇒ O(n) in worst case
Key Points - Lecture 14
• Hash Tables
• O(1) searching complexity
• Address calculated directly from key with hash
function
• Collisions
• Table Organisation
• Linked lists
• Overflow areas
• Re-hashing
• Use a second hash function
• Linear
• Susceptible to clustering
• Quadratic
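A minimal chained hash table sketch (linked list per slot): the hash function gives the slot address directly, so search is O(1) on average, and colliding keys simply go on that slot's list. TABLE_SIZE is an arbitrary prime chosen for the example:

    #include <stdlib.h>

    #define TABLE_SIZE 101                    /* a prime; avoid powers of 2 with mod */

    struct t_entry {
        int key;
        struct t_entry *next;
    };

    static struct t_entry *table[TABLE_SIZE];

    static int hash( int key ) { return (unsigned) key % TABLE_SIZE; }

    void ht_insert( int key ) {
        int h = hash( key );
        struct t_entry *e = malloc( sizeof(struct t_entry) );
        e->key = key;
        e->next = table[h];                   /* push onto the slot's chain */
        table[h] = e;
    }

    struct t_entry *ht_find( int key ) {
        for ( struct t_entry *e = table[hash( key )]; e != NULL; e = e->next )
            if ( e->key == key ) return e;    /* O(1) on average, O(n) if all keys collide */
        return NULL;
    }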
Key Points - Lecture 14
• Hash Functions
• Assume we can construct a function to map
keys to a range of integers
• Methods for reducing this range to [0, m)
• Division
• mod m
• Don’t use m = 2^p
• Multiplication
• Multiply by A and extract p middle bits
• m = 2^p OK
• Universal Hashing
• Animations!!
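A sketch of the division and multiplication methods; the constant A (Knuth's golden-ratio fraction of 2^32) and the choice p = 10 (so m = 1024) are assumptions made for the example:

    #include <stdint.h>

    #define P 10                                     /* table size m = 2^p = 1024 */

    /* Division method: key mod m. Avoid m = 2^p here - it would just keep
       the low-order p bits of the key; a prime m mixes in all the bits.  */
    unsigned hash_div( unsigned key, unsigned m ) {
        return key % m;
    }

    /* Multiplication method: multiply by A and extract p middle bits of
       the product; m = 2^p is fine with this method.                     */
    unsigned hash_mul( uint32_t key ) {
        const uint32_t A = 2654435769u;              /* ~ 0.6180 * 2^32            */
        return (uint32_t)( key * A ) >> ( 32 - P );  /* top p bits of the low word */
    }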