Transcript Document

COSC 3100 – Data
Structures and Algorithms
(Intro, Part I)
Instructor: Mohammad Tanviruzzaman
(Tanvir)
7/20/2015
1
Who am I ?
China
USA
Bangladesh
• Name is, in short, Tanvir
India
• Undergrad from Department
of Computer
Science and Engineering (CSE) of
Bangladesh University of Engineering and
Technology (BUET)
• Worked in software company for 2 yrs
• Presently doing PhD in Computational
Sciences Program in Department of
Mathematics, Statistics, and Computer
Science (MSCS) in MU
7/20/2015
2
Syllabus
•
•
•
•
3 tests (10% each)
Final exam (25%)
5 home works (6% each)
3 programming assignments (5%
each)
• Bonuses !
7/20/2015
3
Outline (Ch 1, Sec 1.1-1.3)
• Why study algorithms?
• What is an algorithm? What are its
properties?
• Three examples of Greatest Common
Divisor, gcd(m, n)
• How to specify an algorithm?
• What is algorithmic problem solving?
• Algorithm Design Strategies
• Important Problem Types
7/20/2015
4
Why study algorithms?
• From “Algorithmi”, latin form of AlKhwārizmī, Persian Mathematician
• What is it
– Briefly speaking, algorithms are
procedural solutions to problems
– Algorithms are not answers, but rather
precisely defined procedures for
getting answers. (e.g., sorting 3
numbers)
7/20/2015
5
Why study? (contd.)
• Cornerstone of computer science.
Programs will not exist without
algorithms.
• Algorithm design techniques, or
problem solving strategies are useful
in fields beyond computer science.
7/20/2015
6
Why study? (contd.)
• Prominent computer scientist Donald
Knuth:
“… It has often been said that a person
does not really understand something until
after teaching it to someone else.
Actually a person does not really
understand something until after teaching
it to a computer, i.e., expressing it as an
algorithm…”
7/20/2015
7
Algorithms
• An algorithm is a sequence of unambiguous
instructions for solving a computational
problem, i.e., for obtaining a required
output for any legitimate input in a finite
amount of time. problem
algorithm
input
7/20/2015
“computer”
output
8
Example of computational
problem: Sorting
• Statement of problem:
– Input: a sequence of N numbers <a1, a2, …, aN>
– Output: a reordering of the input sequence <a’1,
a’2, …, a’N> so that a’i ≤ a’j whenever i < j
– Instance: the sequence <5, 3, 2, 8, 3> becomes
<2, 3, 3, 5, 8> after sorting
– Algorithms:
• Selection sort
• Insertion sort
• Merge sort and many more
7/20/2015
9
Properties of Algorithms
• What makes an algorithm different
from a recipe?
– Finiteness:
• terminates after a finite number of steps
– Definiteness:
• each step must be rigorously and unambiguously
specified, e.g., “stir until lumpy”
– Input:
• valid inputs must be clearly specified
7/20/2015
10
Properties of Algorithms
(contd.)
• Output:
– data that result upon completion of the
algorithm must be specified
• Effectiveness:
– steps must be simple and basic, e.g., check
if m is prime
7/20/2015
11
Examples
Is the following a legitimate algorithm?
i <- 1
while i <= 10 do
a <- i + 1
print value of a
Not finite, goes on and on…
7/20/2015
12
Examples of Algorithms – Computing
Greatest Common Divisor of Two nonnegeative, not-both zero Integers
• gcd(m, n): the largest integer that
divides both m and n
• First try - Euclid’s Algorithm:
– Idea: gcd(m, n) = gcd(n, m mod n)
– 60 = 2×24 + 12 and 24 = 2×12 + 0
– gcd(60, 24) = gcd(24, 12) = gcd(12, 0) = 12
7/20/2015
13
Greatest Common Divisor
(Euclid’s Algorithm), gcd(m, n)
• Step 1: If n = 0, return value of m as
the answer and stop; otherwise,
proceed to Step 2.
• Step 2: Divide m by n and assign the
value of the remainder to r.
• Step 3: Assign the value of n to m
and the value of r to n. Go to Step 1.
7/20/2015
14
Methods of Specifying Algorithms
• Natural language
– Ambiguous, e.g., “Mike ate his sandwitch on a
bed of lettuce.”
• Pseudocode
– A mixture of natural language and programming
language-like structures
– Precise and succinct
– Pseudocode in this course
• Omits declaration of variables
• Use indentation to show scope of for, if, and while
• Use <- for assignment
7/20/2015
15
Pseudocode for (Euclid’s
Algorithm), gcd(m, n)
ALGORITHM Euclid(m, n)
// Computes gcd(m, n) by Euclid’s algorithm
// Input: Two nonnegative, not-both-zero integers m and n
//Output: Greatest common divisor of m and n
while n ≠ 0 do
r <- m mod n
m <- n
gcd(m, n) = gcd(n, m mod n)
gcd(24, 60) = gcd(60, 24 mod
n <- r
return m
60)
What is the minimum number of divisions
What happens if m < n ?
among all inputs 1 ≤ m, n ≤ 10 ?
7/20/2015
16
Second try: Consecutive
Integer Checking, gcd(m, n)
• Step 1: Assign the value of min{m, n} to q.
• Step 2: Divide m by q. If the remainder is 0,
go to Step 3; otherwise, go to Step 4.
• Step 3: Divide n by q. If the remainder is 0,
return the value of q as the answer and stop;
otherwise, proceed to Step 4.
• Step 4: Decrease the value of q by 1. Go to
Step 2.
Does it work when m or n is 0?
Which one is faster, Euclid’s or this one?
7/20/2015
17
Third try: Middle-school,
gcd(m, n)
Finite,
Definite,
Input,
Output,
Basic
• Step 1: Find prime factors of m.
• Step 2: Find prime factors of n.
• Step 3: Identify all common prime factors
of m and n (if p is a prime factor occurring
pm and pn times in m and n, it should be
repeated min{pm,pn} times)
• Step 4: Compute product of all common
factors and return product as the answer.
Is it legitimate algorithm?
7/20/2015
18
What can we learn from the
three examples of gcd(m, n) ?
• Each step must be basic and unambiguous
• Same algorithm, but different
representations (different pseudocodes)
• Same problem, but different algorithms,
based on different ideas and having
dramatically different speeds.
– gcd(31415, 14142) = 1; Euclid takes ~0.08 ms
whereas Consecutive Integer Checking takes
~0.55 ms, about 7 times speedier
7/20/2015
19
Fundamentals of Algorithmic
Problem Solving
• Understanding the problem
– Ask questions, do a few small examples
by hand, think about special cases, etc.
– An input is an instance of the problem
the algorithm solves
– Specify exactly the set of instances the
algorithm needs to handle
– Example: gcd(m, n)
7/20/2015
20
Fundamentals of Algorithmic
Problem Solving (contd.)
• Decide on
– Exact vs. approximate solution
• Cannot solve exactly, e.g., extracting square
roots, solving definite integrals, etc.
• Algorithms for exact solution can be
unacceptably slow
– Appropriate Data Structure
7/20/2015
21
Fundamentals of Algorithmic
Problem Solving (contd.)
• Design algorithm
• Prove correctness of the algorithm
– Yields required output for ever
legitimate input in finite time
– E.g., Euclid’s: gcd(m, n) = gcd(n, m mod n)
• Second integer gets smaller on every
iteration, because (m mod n) can be 0, 1, …,
n-1 thus less than n
• The algorithm terminates when the second
integer is 0
7/20/2015
22
Fundamentals of Algorithmic
Problem Solving (contd.)
• Analyze algorithm
– Time efficiency: how fast it runs
– Space efficiency: How much extra
memory it uses
– Simplicity: easier to understand, usually
contains fewer bugs, sometimes simpler
is more efficient, but not always!
– Generality: e.g., whether two integers
are relatively prime, use gcd(m, n)
7/20/2015
23
Fundamentals of Algorithmic
Problem Solving (contd.)
• Coding algorithm
– Write in a programming language for a
real machine
– Standard tricks: (Programming Pearls by Bentley)
• Compute loop invariant (which does not
change value in the loop) outside loop
• Replace expensive operation by cheap ones
– E.g., check if n is even
» n%2 == 0 is more expensive than (n & 1) == 0
7/20/2015
24
Understand
the problem
Decide on: exact vs.
approximate solving,
data structures
Design
algorithm
Prove
correctness
Analyze the
algorithm
infinite
Code the
algorithm
7/20/2015
finite
25
Algorithm Design Strategies
• Powerful set of design techniques applicable
to wide range of problems
• Brute force (Chapter 3)
• Decrease and Conquer (Chapter 4)
• Divide and Conquer (Chapter 5)
• Transform and Conquer (Chapter 6)
• Space and time tradeoffs (Chapter 7)
• Dynamic Programming (Chapter 8)
• Greedy Approach (Chapter 9)
• Backtracking and Branch and Bound (Chapter 12)
• Parallel and Probabilistic Algorithms
7/20/2015
26
What we have seen?
• What is an algorithm? What are the
properties?
• How to specify an algorithm?
• Steps from a problem towards coding
• One problem different algorithms
having different speeds
7/20/2015
27
Important Problem Types
• Sorting
• Searching
• String Processing
• Graph Problems
7/20/2015
28
Sorting
• Rearrange items of a given list in
non-decreasing order
– Input: a sequence of N numbers <a1, a2,
…, aN>
– Output: a reordering <a’1, a’2, …, a’N> of
input sequence such that a’1 ≤ a’2 ≤ … ≤ a’N.
• Why sorting?
7/20/2015
29
Why Sorting?
• Uniqueness testing
– How to test if elements of collection S
are all distinct?
– Sort and repeated items are immediate
neighbors
– One pass through elements of S, testing
if S[i] = S[i+1] for any 1 ≤ i ≤ N-1 then
finishes the job.
7/20/2015
30
Why Sorting? (contd.)
• Deleting duplicates
– How can we remove all but one copy of
any repeated elements in S ?
– Sort and sweep does the job.
• Prioritize events
– Sort by deadlines
• Median finding
– After sorting, K-th largest elements is
S[k]
7/20/2015
31
Why Sorting? (contd.)
• Frequency counting
– Sort and sweep
• Set intersection/union
– If sets are sorted merge them
repeatedly taking the smaller of the two
head elements, placing them into the
new set if desired, and then deleting
the head from the appropriate list.
7/20/2015
32
Why Sorting? (contd.)
• Finding a target pair
– How can we test whether there are two
integers x, y ∈ S such that x + y = z for some
target z ?
– Instead of testing all possible pairs, sort the
numbers in increasing order and sweep. As S[i]
increases with i, its possible partner j such
that S[j] = z − S[i] decreases. Thus decreasing
j appropriately as i increases gives a nice
solution.
• Efficient searching
– Sort and perform efficient binary search
7/20/2015
33
Sorting (contd.)
• Sorting key
– A specially chosen piece of information
used to guide sorting, e.g., sort student
records by names
• Examples of sorting algorithms
– Selection sort, bubble sort, quick sort,
merge sort, heap sort, …
• Evaluate sorting complexity
– Number of key comparisons
7/20/2015
34
Sorting (contd.)
• Two properties
– Stability: if preserves relative order of
any two equal elements in input
• If student records are already sorted
alphabetically, performing stable sorting by
grades preserves alphabetic order for
students with same grades
– In place: if it does not require extra
memory except for a few memory units
7/20/2015
35
Selection Sort
ALGORITHM SelectionSort(A[0…n-1])
//Input: An array A[0…n-1]
//Output: An array A[0…n-1] sorted in nondecreasing order
60
42
75
for i <- 0 to n-2 do
i=0 27
42
75
min <- i
i=1 27
42
75
for j <- i+1 to n-1 do i=2 27 42 60
if A[j] < A[min] i=3 27 42 60
min <- j
swap A[i] and A[min]
83
27
83
60
83
60
83
75
75
83
Is it stable? In place?
7/20/2015
36
Searching
• Find a given value called a search key
in a given set
• Examples of Searching Algorithms
– Sequential searching
– Binary searching
7/20/2015
37
String Processing
• A string is a sequence of characters from an
alphabet.
• Text strings: letters, numbers, and special
characters
• Bit strings: sequence of 0’s and 1’s
• Gene sequence: string of characters from fourcharacter alphabet {A, C, G, T} for Adenine,
Cytosine, Guanine, and Thymine, constituents of
DNA
• String matching: search for a given word/pattern
in a text or gene sequence
7/20/2015
38
Graph Problems
• Informally
– A graph is a collection of points called vertices, some of
which are connected by line segments called edges.
• Modeling
– World Wide Web (WWW)
– Communication Networks
– Project Scheduling
• Examples of graph algorithms
a
c
b
d
e
f
– Graph traversal (how to reach all points in a network ?)
– Shortest-path (what is the best route between two
cities ?)
– Topological sorting (is a set of courses with
prerequisites consistent?)
7/20/2015
39
Graph Problems (contd.)
1
4
2
Bridges of Königsberg
1
4
3
a
b
2
3
7/20/2015
Minimum
how
newstroll,
bridges
needed
Could you,
in many
a single
cross
all 7
40
to
make exactly
such a stroll
possible?
bridges
once and
return at beginning?
Graph Problems (contd.)
b
a
d
c
e
f
a
7/20/2015
a
b
c
e
Minimum how many colors
needed to color 6 regions
so that no two neighboring
regions receive same color ?
c
d
f
b
e
d
f
41
Fundamental Data Structures
• Linear Data Structures
– Stacks and Queues
• Graphs
• Trees
7/20/2015
42
Linear Data Structures
• Arrays
– A sequence of n items of the same data
type, stored contiguously in memory and
made accessible through array indices.
indices
A:
0
10
1
4
2
3
4
5
6
7
17
93
8
70
1
47
e.g.,
A[4] = 8,
A[2] = 17,
A[7] = 47
7/20/2015
43
Linear Data Structures (contd.)
• Linked List
– A sequence of nodes with links (called
pointers) to neighbors
– Singly or Doubly linked lists
header
Data
Pointer
header
7/20/2015
44
Linear Data Structure (contd.)
• Array vs. Linked List
– Access time: fixed vs. variable
– Update cost: higher vs. lower
– Storage Anticipation: needed vs. not
needed
7/20/2015
45
Linear Data Structures (contd.)
• Stack:
–
–
–
–
–
Insertion/deletion only at the top
Last In First Out (LIFO)
Two operations: push and pop
E.g., recursion needs stack
java.util.Stack
7/20/2015
46
Linear Data Structures (contd.)
• Queues:
– Insertion from rear,
deletion from front
– First In First Out
(FIFO)
– Two operations:
enqueue, dequeue
7/20/2015
47
Linear Data Structures (contd.)
• Priority queues:
– Select item of highest priority among
dynamically changing candidates. E.g., schedule
job in a shared computer
– Principal operations:
• Find largest, delete largest, add new element
• Add/delete must result in priority queue
– Array or Sorted array based implementation is
inefficient
– “Heap” is better (We shall see in Transform
and Conquer)
7/20/2015
48
Graphs
• A graph, G = < V, E > is defined by a
pair of sets:
– V is called vertices and E is called edges
a
c
b
d
e
f
V = {a, b, c, d, e, f}
E = { (a, c), (c, b), (b, f), (f, e), (e, d), (d, a), (c, e) }
7/20/2015
49
Graphs (contd.)
• Undirected graphs and Directed
Graphs (Digraphs)
• We usually will not allow graphs to
have loops
• What is the maximum number of edges in
an undirected graph with |V| vertices?
a
c
b
d
e
f
Directed graph
7/20/2015
50
Representing Graphs
• Adjacency matrix
– N × N boolean matrix
– A[i, j] = 1 if there is an edge from i-th
vertex to j-th vetex
– Adjacency matrix of an undirected
graph is symmetric. (why?)
• Adjacency list
– A collection of linked lists, one for each
vertex, that contain all neighbors of a
vertex
7/20/2015
51
Representing Graph (contd.)
a
b
c
d
e
f
a
b c
d e f
0
0
1
1
0
0
0
0
1
0
0
1
1
0
0
0
1
0
1
1
0
0
1
0
a
b
c
d
e
f
7/20/2015
0
0
1
1
0
1
0
1
0
0
1
0
c
c
a
a
c
b
d
f
b
e
d
e
a
c
b
d
e
f
e
f
52
Weighted Graphs
• Edges have numbers associated with
them, e.g., distance between two
cities
a b c d
a
1
c
5
7
2
7/20/2015
a
b
c
d
b
∞ 5
5 ∞
1 7
∞ 4
1
7
∞
2
∞
4
2
∞
a
b, 5
c, 1
b
a, 5
c, 7
d, 4
c
a, 1
b, 7
d, 2
d
b, 4
c, 2
4
d
53
Graph properties
• Path(u, v)
– Sequence of adjacent vertices starting at u
and ending at v
– Simple path: all edges distinct
– Length of path is # of vertices - 1
• Connected Graphs
– Between every pair of vertices there is a path
• Connected Component
– Maximal connected subgraph
7/20/2015
54
Tree
• A tree is a connected acyclic graph.
– |E| = |V|-1
• Forest: has no cycle but may be
disconnected
• Between every two vertices there is
exactly one simple path. Why ?
• Any vertex can be regarded as the root,
and then we have a rooted tree.
7/20/2015
55
Tree (contd.)
root
a
b
c
d
g
Level 0
e
Level 1
f
Level 2
• Ancestors: For vertex v, all the
vertices on the simple path from root
to v are called v’s ancestors.
7/20/2015
56
Tree (contd.)
• Descendants: All vertices for which
v is an ancestor are descendants of v.
• Parent, child, sibling:
– If (u, v) is the last edge of the simple
path from root to v, then u is parent of
v and v is a child of u.
– Vertices having same parents are
siblings.
• Leaves:
– A vertex without children is a leaf.
7/20/2015
57
Tree (contd.)
• Subtree: v with all its descendants is
the subtree rooted at v.
• Depth of vertex: Length of the
simple path from root to v is v’s
depth.
• Height of tree: The length of the
longest simple path from root to leaf.
7/20/2015
58
Ordered Tree
• An ordered tree is a rooted tree in
which all children of each vertex are
ordered.
• Binary tree:
– An ordered tree, each vertex has at
most two children and each children is
designated as either left child or right
child.
– log 2 𝑛 ≤ ℎ ≤ 𝑛 − 1, here n is the
number of nodes in the tree and h is
tree’s height
7/20/2015
59
Ordered Tree
• Binary Search Tree:
– Each vertex is assigned a number
– A number assigned to a parental vertex
is larger than all numbers in the left
subtree and smaller than all numbers in
the right subtree.
9
5
1
12
7
10
4
7/20/2015
60
Summary
• An algorithm is a sequence of unambiguous
instructions for solving a problem in finite time.
• Pseudocode is a succinct representation of
algorithm.
• The same problem can often be solved by several
algoithms.
• Algorithms operate on data, thus data structuring
becomes critical for efficient algorithmic problem
solving.
• A good algorithm is usually the result of repeated
efforts.
7/20/2015
61
Homework 1
• Due on 26th (next Thursday) in class,
hard copies
• Please explain your answers, do not
just write yes/no, so that while
grading I may consider your thought
process as an input, you may receive
good points even if the answer is
wrong!
7/20/2015
62