Lecture 2C - Basic Data Structures


Basic data structures
› Linked lists
› Queues
› Stacks
› Balanced binary trees
Why data structures?
› Programs manipulate data
› Data should be organized so manipulations will be efficient
- Search (e.g. Finding a word/file/web page)
› Better programs are powered by good data structures
› Naïve choices are often much less efficient than clever choices
› Data structures are existing tools that can help you
- Guide your design
- and save you time (avoid re-inventing the wheel)
Linked list
› A linked list is
- a collection of items (stored in “positions” in the list)
- that supports the following operations
- addFirst( newItem )
- Add newItem at the beginning of the list
- addLast( newItem )
- Add newItem at the end of the list
- addAfter( existingPosition, newItem )
- Add newItem after existingPosition
- getFirst( )
- getLast( )
- …
Singly Linked Lists
› A singly linked list is a data structure consisting of a sequence of nodes
› Each node stores
- an element (elem)
- a link to the next node (next)

[Figure: a chain of nodes A → B → C → D, each holding an elem field and a next link]
Singly Linked Lists
› "Head" points to the first element in the list.
› "Tail" points to the last element in the list.

[Figure: Head → A → B → C → D ← Tail]
Inserting at the Head
1. Allocate a new node
2. Insert new element
3. Have new node point to first element
4. Have Head point to new node
5. Extra checks…

[Figure: a new node is linked in before A and Head is updated to point to it]
Remove first element
1. Update Head to point to next node in the list
2. Delete the former first node

[Figure: Head advances from A to B and node A is deleted]
Inserting at the Tail
1. Allocate a new node
2. Insert new element
3. Have new node point to null
4. Have old last node point to new node
5. Update Tail to point to new node
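The head and tail operations above can be sketched in Java as follows. This is an illustrative sketch, not the lecture's own code: the class name `SinglyList`, the nested `Node` class, and the method names are my own. Note how `addFirst`, `removeFirst`, and `addLast` all take constant time, and how the "extra checks" handle the empty-list cases.

```java
// Minimal singly linked list sketch with Head and Tail pointers;
// names are illustrative. addFirst, removeFirst, addLast are all O(1).
public class SinglyList<E> {
    private static class Node<E> {
        E elem; Node<E> next;
        Node(E elem, Node<E> next) { this.elem = elem; this.next = next; }
    }
    private Node<E> head, tail;

    public void addFirst(E e) {              // steps 1-4 of "Inserting at the Head"
        head = new Node<>(e, head);          // new node points to old first element
        if (tail == null) tail = head;       // extra check: list was empty
    }

    public E removeFirst() {
        if (head == null) throw new IllegalStateException("empty list");
        E e = head.elem;
        head = head.next;                    // update Head to the next node
        if (head == null) tail = null;       // list became empty
        return e;
    }

    public void addLast(E e) {
        Node<E> node = new Node<>(e, null);  // new last node points to null
        if (tail == null) head = node;       // empty list: node is also the head
        else tail.next = node;               // old last node points to the new one
        tail = node;                         // update Tail
    }
}
```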
Removing at the Tail
› Removing at the tail of a singly
linked list cannot be efficient!
› There is no constant-time way
to update the tail to point to the
previous node
Doubly Linked List
› A doubly linked list is often more convenient!
› Nodes store:
- element
- link to the previous node (prev)
- link to the next node (next)
› Special trailer and header nodes

[Figure: sentinel header and trailer nodes, with the element-carrying nodes/positions doubly linked between them]
Insertion
› We visualize operation insertAfter(p, X), which returns position q

[Figure: in the list A, B, C with p at A, a new node X is linked in after p; the result is A, X, B, C with q pointing to X]
Deletion
› We visualize remove(p), where p == last()

[Figure: in the list A, B, C, D with p at D, node D is unlinked from its neighbours, leaving A, B, C]
Worst-case running time
› In a doubly linked list
+ insertion at head or tail is O(1)
+ deletion at head or tail is O(1)
- Find element requires O(n)
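The O(1) claims above can be illustrated with a sentinel-based doubly linked list sketch. The class and method names below are my own; the private helpers `insertAfter` and `remove` show the constant-time pointer surgery, and the sentinels remove all special cases for empty lists.

```java
// Doubly linked list sketch with sentinel header and trailer nodes, as in
// the slides; insertion and deletion at either end are O(1). Names are mine.
public class DoublyLinkedList<E> {
    private static class Node<E> {
        E elem; Node<E> prev, next;
        Node(E elem, Node<E> prev, Node<E> next) {
            this.elem = elem; this.prev = prev; this.next = next;
        }
    }
    private final Node<E> header = new Node<>(null, null, null);    // before first
    private final Node<E> trailer = new Node<>(null, header, null); // after last
    { header.next = trailer; }

    // insertAfter(p, x): splice a new node q between p and p.next -- O(1)
    private Node<E> insertAfter(Node<E> p, E x) {
        Node<E> q = new Node<>(x, p, p.next);
        p.next.prev = q;
        p.next = q;
        return q;
    }
    // remove(p): unlink p from its neighbours -- O(1)
    private E remove(Node<E> p) {
        p.prev.next = p.next;
        p.next.prev = p.prev;
        return p.elem;
    }

    public void addFirst(E x) { insertAfter(header, x); }
    public void addLast(E x) { insertAfter(trailer.prev, x); }
    public E removeFirst() { return remove(header.next); }
    public E removeLast() { return remove(trailer.prev); }
    public boolean isEmpty() { return header.next == trailer; }
}
```

Finding an element by value still requires walking the list, which is why find remains O(n).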
The Queue data structure
› The Queue data structure stores
arbitrary objects
› Insertions and deletions follow the
first-in first-out (FIFO) scheme
› Insertions are at the rear of the
queue and removals are at the
front of the queue
› Main queue operations:
- enqueue(object): inserts an element
at the end of the queue
- object dequeue(): removes and
returns the element at the front of the
queue
› Auxiliary queue operations:
- object front(): returns the
element at the front without
removing it
- integer size(): returns the
number of elements stored
- boolean isEmpty(): indicates
whether no elements are stored
Example
Operation    Output   Q
enqueue(5)   –        (5)
enqueue(3)   –        (5, 3)
dequeue()    5        (3)
enqueue(7)   –        (3, 7)
dequeue()    3        (7)
front()      7        (7)
dequeue()    7        ()
dequeue()    "error"  ()
isEmpty()    true     ()
enqueue(9)   –        (9)
enqueue(7)   –        (9, 7)
size()       2        (9, 7)
enqueue(3)   –        (9, 7, 3)
enqueue(5)   –        (9, 7, 3, 5)
dequeue()    9        (7, 3, 5)
Applications of Queues
› Direct applications
- Waiting lists
- Access to shared resources (e.g., printer)
- Simulation
› Indirect applications
- Auxiliary data structure for algorithms
- Component of other data structures
Queue Interface in Java
public interface Queue<E> {
public int size();
public boolean isEmpty();
public E front()
throws EmptyQueueException;
public void enqueue(E element);
public E dequeue()
throws EmptyQueueException;
}
Queue implementation using singly linked lists
› Note that we need to keep pointers to both the first and
the last nodes in the list
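One possible singly-linked realization of this queue is sketched below. It is my own illustrative code: the class name `LinkedQueue` and the use of a plain `RuntimeException` in place of `EmptyQueueException` are assumptions, not the lecture's implementation. Enqueue happens at the tail and dequeue at the head, so both run in O(1).

```java
// Singly-linked queue sketch: head is the front, tail is the rear.
public class LinkedQueue<E> {
    private static class Node<E> {
        E elem; Node<E> next;
        Node(E elem) { this.elem = elem; }
    }
    private Node<E> head, tail;  // front and rear of the queue
    private int size;            // counter: ++ on enqueue, -- on dequeue

    public int size() { return size; }
    public boolean isEmpty() { return size == 0; }

    public E front() {
        if (isEmpty()) throw new RuntimeException("queue is empty");
        return head.elem;
    }
    public void enqueue(E e) {   // insert at the rear (tail)
        Node<E> node = new Node<>(e);
        if (isEmpty()) head = node; else tail.next = node;
        tail = node;
        size++;
    }
    public E dequeue() {         // remove at the front (head)
        if (isEmpty()) throw new RuntimeException("queue is empty");
        E e = head.elem;
        head = head.next;
        if (head == null) tail = null;
        size--;
        return e;
    }
}
```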
Queues
› public interface Queue<E> {
›   public int size();
›   public boolean isEmpty();
›   public E front();
›   public void enqueue(E element);
›   public E dequeue();
› }
Size is a counter starting at 0
- incremented on each "enqueue"
- decremented on each "dequeue"
isEmpty = (Is size == 0)?

[Figures: a sequence of slides animating enqueue at the Tail and dequeue at the Head of a singly linked list A, B, C, D]
These operations can be performed in
O(1) time per operation.
The Stack
› The Stack data structure stores arbitrary objects
› Insertions and deletions follow the last-in first-out (LIFO) scheme
› Think of a spring-loaded plate dispenser
› Main stack operations:
- push(object): inserts an element
- object pop(): removes and returns the last inserted element
› Auxiliary stack operations:
- object top(): returns the last inserted element without removing it
- integer size(): returns the number of elements stored
- boolean isEmpty(): indicates whether no elements are stored
Stack
› public interface Stack {
›   public int size();
›   public boolean isEmpty();
›   public Object top();
›   public void push(Object o);
›   public Object pop();
› }
Size is a counter starting at 0
- incremented on each "push"
- decremented on each "pop"
isEmpty = (Is size == 0)?

[Figures: a sequence of slides animating push and pop on a singly linked list A, B, C, D; both operations work at the Head, so no Tail pointer is needed]
These operations can be performed in
O(1) time per operation.
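A linked-list stack matching this interface could look like the sketch below. It is an assumption-laden sketch of my own: I use a generic `LinkedStack<E>` rather than the raw `Object` interface above, and a plain `RuntimeException` for the empty case. Because push and pop both operate at the head, no tail pointer is needed and every operation is O(1).

```java
// Linked-list stack sketch: the head of the list is the top of the stack.
public class LinkedStack<E> {
    private static class Node<E> {
        E elem; Node<E> next;
        Node(E elem, Node<E> next) { this.elem = elem; this.next = next; }
    }
    private Node<E> head;  // top of the stack
    private int size;      // counter: ++ on push, -- on pop

    public int size() { return size; }
    public boolean isEmpty() { return size == 0; }

    public E top() {
        if (isEmpty()) throw new RuntimeException("stack is empty");
        return head.elem;
    }
    public void push(E e) {   // insert at the head: O(1)
        head = new Node<>(e, head);
        size++;
    }
    public E pop() {          // remove at the head: O(1)
        if (isEmpty()) throw new RuntimeException("stack is empty");
        E e = head.elem;
        head = head.next;
        size--;
        return e;
    }
}
```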
Parentheses Matching
› Each "(", "{", or "[" must be paired with a matching ")", "}", or "]"
- correct: ( )(( )){([( )])}
- correct: ((( )(( )){([( )])}))
- incorrect: )(( )){([( )])}
- incorrect: ({[ ])}
- incorrect: (
Parentheses Matching Algorithm
Algorithm ParenMatch(X, n):
  Input: An array X of n tokens, each of which is either a grouping symbol, a
         variable, an arithmetic operator, or a number
  Output: true if and only if all the grouping symbols in X match

  Let S be an empty stack
  for i = 0 to n-1 do
    if X[i] is an opening grouping symbol then
      S.push(X[i])
    else if X[i] is a closing grouping symbol then
      if S.isEmpty() then
        return false {nothing to match with}
      if S.pop() does not match the type of X[i] then
        return false {wrong type}
  if S.isEmpty() then
    return true {every symbol matched}
  else
    return false {some symbols were never matched}
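A direct Java transcription of this pseudocode might look like the following sketch, using `java.util.ArrayDeque` as the stack. The class name `ParenMatch` and the string-scanning details are my own; the algorithm is the one above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of ParenMatch: scan the tokens, push openers, pop and check closers.
public class ParenMatch {
    public static boolean isMatched(String expression) {
        Deque<Character> stack = new ArrayDeque<>();
        String opening = "({[", closing = ")}]";   // matching symbols share an index
        for (char c : expression.toCharArray()) {
            if (opening.indexOf(c) >= 0) {
                stack.push(c);                     // opening symbol: remember it
            } else if (closing.indexOf(c) >= 0) {
                if (stack.isEmpty()) return false; // nothing to match with
                if (opening.indexOf(stack.pop()) != closing.indexOf(c))
                    return false;                  // wrong type
            }                                      // other tokens are ignored
        }
        return stack.isEmpty();                    // leftover openers never matched
    }
}
```

For example, `isMatched("( )(( )){([( )])}")` yields true, while `isMatched("({[ ])}")` yields false.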
HTML Tag Matching
For fully-correct HTML, each <name> should pair with a matching </name>
<body>
<center>
<h1> The Little Boat </h1>
</center>
<p> The storm tossed the little
boat like a cheap sneaker in an
old washing machine. The three
drunken fishermen were used to
such treatment, of course, but
not the tree salesman, who even as
a stowaway now felt that he
had overpaid for the voyage. </p>
<ol>
<li> Will the salesman die? </li>
<li> What color is the boat? </li>
<li> And what about Naomi? </li>
</ol>
</body>
Trees

[Figure: an example tree with root "Make Money Fast!" and children "Stock Fraud", "Ponzi Scheme", "Bank Robbery"]
What is a Tree
› In computer science, a tree is an abstract model of a hierarchical structure
› A tree consists of nodes with a parent-child relation
› Applications:
- Organization charts
- File systems
- Programming environments

[Figure: organization chart with root Computers"R"Us; children Sales, Manufacturing, R&D; Sales has children US and International (Europe, Asia, Canada); Manufacturing has children Laptops and Desktops]
Tree Terminology
› Root: node without parent (A)
› Internal node: node with at least one child (A, B, C, F)
› External node (a.k.a. leaf): node without children (E, I, J, K, G, H, D)
› Ancestors of a node: parent, grandparent, great-grandparent, etc.
› Depth of a node: number of ancestors
› Height of a tree: maximum depth of any node (3)
› Descendant of a node: child, grandchild, great-grandchild, etc.
› Subtree: tree consisting of a node and its descendants

[Figure: tree with root A; children B, C, D; B's children E, F; C's children G, H; F's children I, J, K; the subtree rooted at F is highlighted]
Binary Trees
› A binary tree is a tree with the following properties:
- Each internal node has at most two children (exactly two for proper binary trees)
- The children of a node are an ordered pair
› We call the children of an internal node left child and right child
› Alternative recursive definition: a binary tree is either
- a tree consisting of a single node, or
- a tree whose root has an ordered pair of children, each of which is a binary tree
› Applications:
- arithmetic expressions
- decision processes
- searching

[Figure: binary tree with root A; children B, C; B's children D, E; C's children F, G; D's children H, I]
Binary Trees
› Notation
n  number of nodes
e  number of external nodes
i  number of internal nodes
h  height
Properties (of proper binary trees):
- e = i + 1
- n = 2e - 1
- h ≤ i
- h ≤ (n - 1)/2
- e ≤ 2^h
- h ≥ log2 e
- h ≥ log2 (n + 1) - 1
Binary Trees
› A node is represented by an object storing
- Element
- Parent node
- Left child node
- Right child node
› Node objects implement the Position ADT

[Figure: linked structure for the tree with root B, children A and D, and D's children C and E; each node object holds its element plus parent, left-child, and right-child references]
Binary Search Trees - Ordered Dictionaries
› Keys are assumed to come from a total order.
› Operations
- insert(key): insert key into the dictionary
- delete(key): delete key from the dictionary
- boolean find(key): does the key exist in the dictionary?
Binary Search
› Binary search can perform operation find(k) on a dictionary
implemented by means of an array-based sequence, sorted by key
- at each step, the number of candidate items is halved
- terminates after O(log n) steps
› Example: find(7)

  0  1  3  4  5  7  8  9  11  14  16  18  19

- l=0, h=12, m=6: value 8 > 7, so continue in the left half
- l=0, h=5, m=2: value 3 < 7, so continue in the right half
- l=3, h=5, m=4: value 5 < 7, so continue in the right half
- l = m = h = 5: value 7 found
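The halving loop can be sketched in Java as follows; the class and method names are my own. The method returns the index of the key in the sorted array, or -1 if it is absent, after O(log n) iterations.

```java
// Iterative binary search over a sorted int array.
public class BinarySearch {
    public static int find(int[] a, int key) {
        int l = 0, h = a.length - 1;          // candidate range [l, h]
        while (l <= h) {
            int m = (l + h) / 2;              // middle of the candidate range
            if (a[m] == key) return m;
            else if (a[m] < key) l = m + 1;   // discard the left half
            else h = m - 1;                   // discard the right half
        }
        return -1;                            // key not present
    }
}
```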
Binary Search Trees
› A binary search tree is a binary tree storing keys (or key-value entries) at its internal nodes and satisfying the following property:
- Let u, v, and w be three nodes such that u is in the left subtree of v and w is in the right subtree of v. We have
  key(u) ≤ key(v) ≤ key(w)
› External nodes do not store items

[Figure: BST with root 6; left child 2 (children 1 and 4); right child 9 (child 8)]
Search
› To search for a key k, we trace a downward path starting at the root
› The next node visited depends on the outcome of the comparison of k with the key of the current node
› If we reach a leaf, the key is not found and we return null
› Example: find(4)
- Call TreeSearch(4, root)

Algorithm TreeSearch(k, v)
  if T.isExternal(v)
    return v
  if k < key(v)
    return TreeSearch(k, T.left(v))
  else if k = key(v)
    return v
  else { k > key(v) }
    return TreeSearch(k, T.right(v))

[Figure: the search path for 4 in the tree rooted at 6: 6 (go left), 2 (go right), 4 (found)]
Insertion
› To perform operation insert(k), we search for key k (using TreeSearch)
› Assume k is not already in the tree, and let w be the leaf reached by the search
› We insert k at node w and expand w into an internal node
› Example: insert 5

[Figure: the search for 5 follows 6 (left), 2 (right), 4 (right) to the leaf w; after insertion, w is an internal node storing 5]
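Search and insertion can be sketched together on a plain BST of int keys, with null children standing in for external nodes. The `BST` class below is my own illustrative code under that assumption, not the lecture's implementation.

```java
// BST sketch: null children play the role of external nodes.
public class BST {
    public static class Node {
        int key; Node left, right;
        Node(int key) { this.key = key; }
    }
    public Node root;

    // TreeSearch: returns the node storing k, or null if k is not present.
    public Node search(int k, Node v) {
        if (v == null) return null;               // reached an external position
        if (k < v.key) return search(k, v.left);
        else if (k == v.key) return v;
        else return search(k, v.right);           // k > key(v)
    }

    // insert(k): walk down as in search and expand the leaf reached.
    public void insert(int k) { root = insert(root, k); }
    private Node insert(Node v, int k) {
        if (v == null) return new Node(k);        // expand the external position
        if (k < v.key) v.left = insert(v.left, k);
        else if (k > v.key) v.right = insert(v.right, k);
        return v;                                 // k already present: no change
    }
}
```

Building the example tree (6, 2, 9, 1, 4, 8) and calling `search(4, root)` follows exactly the path shown in the figure.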
Deletion
› To perform operation remove(k), we search for key k
› Assume key k is in the tree, and let v be the node storing k
› If node v has a leaf child w, we remove v and w from the tree with operation removeExternal(w), which removes w and its parent
› Example: remove 4

[Figure: the search path 6 (left), 2 (right) reaches v storing 4, whose child 5 has the leaf w; after removeExternal(w), 5 takes v's place, giving the tree 6 with subtrees 2 (1, 5) and 9 (8)]
Deletion (cont.)
› We consider the case where the key k to be removed is stored at a node v whose children are both internal
- we find the internal node w that follows v in an inorder traversal
- we copy key(w) into node v
- we remove node w and its left child z (which must be a leaf) by means of operation removeExternal(z)
› Example: remove 3

[Figure: v stores 3 and has two internal children; its inorder successor w stores 5, with leaf left child z; after copying 5 into v and calling removeExternal(z), the tree is rooted at 1 with v now storing 5]
Performance
› Consider a dictionary
with n items implemented
by means of a binary
search tree of height h
- the space used is O(n)
- methods find, insert and
remove take O(h) time
› The height h is O(n) in
the worst case and O(log
n) in the best case
AVL Trees
› AVL trees are balanced.
› An AVL Tree is a binary search tree such that for every internal node v of T, the heights of the children of v can differ by at most 1.
› Local property that guarantees a global property.

An example of an AVL tree where the heights are shown next to the nodes:

[Figure: AVL tree with root 44 (height 4); children 17 (height 2) and 78 (height 3); 17's child 32 (height 1); 78's children 50 (height 2) and 88 (height 1); 50's children 48 (height 1) and 62 (height 1)]
Height of an AVL Tree
Theorem: The height of an AVL tree storing n keys is O(log n).
Proof: Let us bound n(h): the minimum number of internal nodes of an AVL tree of height h.
› We easily see that n(1) = 1 and n(2) = 2
› For h > 2, an AVL tree of height h contains the root node, one AVL subtree of height h-1 and another of height h-2.
› That is, n(h) = 1 + n(h-1) + n(h-2)
› Knowing n(h-1) > n(h-2), we get n(h) > 2n(h-2). So
  n(h) > 2n(h-2), n(h) > 4n(h-4), n(h) > 8n(h-6), … (by induction),
  n(h) > 2^i · n(h-2i)
› Solving the base case we get: n(h) > 2^(h/2 - 1)
› Taking logarithms: h < 2 log2 n(h) + 2
› Thus the height of an AVL tree is O(log n)
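As a sanity check on the proof, the recurrence n(h) = 1 + n(h-1) + n(h-2) can be computed directly. The `AvlHeight` class below is my own; it confirms that the minimum node count grows exponentially in h, which is exactly why h is O(log n).

```java
// Minimum number of internal nodes of an AVL tree of height h,
// per the recurrence n(1) = 1, n(2) = 2, n(h) = 1 + n(h-1) + n(h-2).
public class AvlHeight {
    public static long minNodes(int h) {
        if (h == 1) return 1;
        if (h == 2) return 2;
        return 1 + minNodes(h - 1) + minNodes(h - 2);
    }
}
```

For example, minNodes(3) = 4 and minNodes(4) = 7, and each value exceeds the 2^(h/2 - 1) lower bound from the proof.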
Inserting with balanced height
› Insert node into binary search tree as usual
- Insert occurs at leaves
- Increases height of some nodes along path to root
› Walk up towards root
- If unbalanced height is found, restructure unbalanced
region with rotation operation
Insertion in an AVL Tree
› Insertion is as in a binary search tree
› Always done by expanding an external node.
› Example:

[Figure: before insertion, the AVL tree is rooted at 44 with subtrees 17 (32) and 78 (50: 48, 62; 88); after inserting w = 54 below 62, the nodes on the path up are labeled z = 78, y = 50, x = 62 (c=z, a=y, b=x)]
Restructuring
› let (a, b, c) be an inorder listing of x, y, z
› perform the rotations needed to make b the topmost node of the three

case 1: single rotation (a left rotation about a)
[Figure: a=z has right child b=y, whose right child is c=x; the left rotation about a makes b the topmost node, with children a and c and the subtrees T0, T1, T2, T3 reattached in inorder]

case 2: double rotation (a right rotation about c, then a left rotation about a)
[Figure: a=z has right child c=y, whose left child is b=x; after the two rotations b is the topmost node, with children a and c]

(the other two cases are symmetrical)
Insertion in an AVL Tree

[Figure: before insertion the tree is 44 with subtrees 17 (32) and 78 (50: 48, 62; 88); inserting 54 below 62 unbalances the tree at 78; after a double rotation, 62 becomes the root of that subtree, with children 50 (48, 54) and 78 (88), and the tree is balanced again]
Removal in an AVL Tree
› Removal begins as in a binary search tree, which means the node removed will become an empty external node. Its parent, w, may cause an imbalance.
› Example:

[Figure: before deletion of 32 the tree is 44 with subtrees 17 (32) and 62 (50: 48, 54; 78: 88); after the deletion, node 44 is unbalanced]
Rebalancing after a removal
› Let z be the first unbalanced node encountered while travelling up
the tree from w. Also, let y be the child of z with the larger height,
and let x be the child of y with the larger height.
› We perform restructure(x) to restore balance at z.
› As this restructuring may upset the balance of another node
higher in the tree, we must continue checking for balance until the
root of T is reached
[Figure: z = 44 is the first unbalanced node (w is its child 17); y = 62 is z's taller child and x = 78 is y's taller child; restructure(x) makes 62 the topmost node, with children 44 (17; 50: 48, 54) and 78 (88)]
Running Times for AVL Trees
› a single restructure is O(1)
- using a linked-structure binary tree
› find is O(log n)
- height of tree is O(log n), no restructures needed
› insert is O(log n)
- initial find is O(log n)
- Restructuring up the tree, maintaining heights is O(log n)
› remove is O(log n)
- initial find is O(log n)
- Restructuring up the tree, maintaining heights is O(log n)
Summary data structures
› Queues
- Enqueue, dequeue, first and size operations in O(1) time.
› Stacks
- Push, pop, top and size operations in O(1) time
› Balanced binary trees (e.g. AVL trees)
- Insert, delete and find operations in O(log n) time
Sorting
› There are many many algorithms for sorting:
- Insertion sort
- Selection sort
- Bubble sort
- QuickSort
- MergeSort
- Shell sort
- …
See examples on: http://www.sorting-algorithms.com/
Merging
› The key to Merge Sort is merging two sorted lists into one, such that if you have two sorted lists X (x1 ≤ x2 ≤ … ≤ xm) and Y (y1 ≤ y2 ≤ … ≤ yn) the resulting list is Z (z1 ≤ z2 ≤ … ≤ zm+n)
› Example:
L1 = { 3 8 9 }   L2 = { 1 5 7 }
merge(L1, L2) = { 1 3 5 7 8 9 }
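The merge step can be sketched on arrays as follows; the `Merge` class is my own. At each step the smaller of the two front elements is moved to the result, so the whole merge takes O(m + n) time.

```java
// Merge two sorted arrays into one sorted array.
public class Merge {
    public static int[] merge(int[] x, int[] y) {
        int[] z = new int[x.length + y.length];
        int i = 0, j = 0, k = 0;
        while (i < x.length && j < y.length)       // take the smaller front element
            z[k++] = (x[i] <= y[j]) ? x[i++] : y[j++];
        while (i < x.length) z[k++] = x[i++];      // copy any leftovers from X
        while (j < y.length) z[k++] = y[j++];      // copy any leftovers from Y
        return z;
    }
}
```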
Merging (cont.)

[Figure: step-by-step merge of X = {3, 10, 23, 54} and Y = {1, 5, 25, 75}; at each step the smaller front element moves to the result, yielding {1, 3, 5, 10, 23, 25, 54, 75}]

Time: O(n)
Divide And Conquer
› Merging two lists of one element each is the same as sorting them.
› Merge sort divides up an unsorted list until the above condition is met and then merges the divided parts back together in pairs.
› Specifically this can be done by recursively dividing the unsorted list in half, merge sorting the right side then the left side and then merging the right and left back together.
Merge Sort Algorithm
Given a list L with a length k:
› If k == 1 → the list is sorted
› Else:
- Merge Sort the left side (1 thru k/2)
- Merge Sort the right side (k/2+1 thru k)
- Merge the right side with the left side
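The three steps above can be sketched as follows; the `MergeSort` class is my own, and for simplicity it returns a new sorted array rather than sorting in place.

```java
import java.util.Arrays;

// Recursive merge sort: split in half, sort each side, merge the results.
public class MergeSort {
    public static int[] sort(int[] a) {
        if (a.length <= 1) return a;  // k == 1: a single element is sorted
        int[] left = sort(Arrays.copyOfRange(a, 0, a.length / 2));
        int[] right = sort(Arrays.copyOfRange(a, a.length / 2, a.length));
        // Merge the two sorted halves.
        int[] z = new int[a.length];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length)
            z[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        while (i < left.length) z[k++] = left[i++];
        while (j < right.length) z[k++] = right[j++];
        return z;
    }
}
```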
Merge Sort Example

[Figure: the list 99 6 86 15 58 35 86 4 0 is recursively split into halves down to single elements, then merged back in sorted order: {6 99}, {15 86}, {35 58 86}, {0 4} → {6 15 86 99}, {0 4 35 58 86} → {0 4 6 15 35 58 86 86 99}]
Merge Sort Analysis
For the original problem, we have a cost of cn (the divide and merge steps), plus two subproblems each of size n/2 and running time T(n/2).
Each of the size-n/2 problems has a cost of cn/2 plus two subproblems, each costing T(n/4).

[Figure: recursion tree; the root costs cn and has two children T(n/2); expanding one level, each cn/2 node has two children T(n/4)]
Recursion Tree for Merge Sort
Continue expanding until the problem size reduces to 1.
• Each level has total cost cn.
• Each time we go down one level, the number of subproblems doubles, but the cost per subproblem halves → cost per level remains the same.
• There are lg n + 1 levels; the height is lg n. (Assuming n is a power of 2; can be proved by induction.)
• Total cost = sum of costs at each level = (lg n + 1)cn = cn lg n + cn = Θ(n log n).

[Figure: recursion tree with level costs cn; cn/2 + cn/2; four nodes of cn/4; …; n leaves of cost c]
Summary: Sorting
• Sorting n values can be done in O(n log n) time.
• No comparison-based algorithm can do it faster; thus sorting has a lower bound of Ω(n log n) [in the general case].