No Slide Title

Download Report

Transcript No Slide Title

CSE331 – Lecture 21
– Heaps, Binary Files, and Bit Sets
Chapter 14
File Structure
Complete Binary Tree Example
Direct File Access
Maximum and Minimum Heaps
bitVector Class
Heap Insertion
Lossless Compression
pushHeap()
popHeap()
Lossy Compression
Adjusting popHeap()
Example of Building Huffman
Tree
Heap Sort
Summary Slides
Heapifying
1
Main Index
Contents
Complete Binary Tree
5
v[0]
1
3
v[2]
v[1]
6
9
v[4]
v[3]
7
v[7]
2
0
v[8]
2
v[5]
8
v[9]
Main Index
Contents
4
v[6]
Indexing & Other Heap Properties

Node[ j ] has ...

Node[ 2*j+1 ] as left child
Node[ 2*j+2 ] as right child
Node[ (j-1)/2 ] as parent




3
MAX Heap : parent > both children
MIN Heap: parent < both children
Main Index
Contents
Maximum and Minimum Heaps
55
40
52
50
20
11
10
25
10
5
22
(A) M aximum Heap (9 nodes)
(B) M aximum Heap (4 nodes)
10
5
11
15
50
10
25
20
52
30
40
55
22
(C) M inimum Heap (9 nodes)
4
30
15
Main Index
(D) M inimum Heap (4 nodes)
Contents
Heap Before and After Insertion
of 50
63
63
v[0]
v[0]
30
40
30
v[2]
v[1]
25
10
v[4]
v[3]
5
3
18
v[7]
v[8]
v[9]
8
v[5]
v[2]
v[1]
38
25
10
v[6]
8
v[4]
v[3]
v[5]
5
3
18
50
v[7]
v[8]
v[9]
v[10]
(a)
5
40
(b)
Main Index
Contents
38
v[6]
Example of tree reordering in
pushHeap()
63
63
63
v[0]
...
30
...
v[0]
v[1]
...
50
v[9]
30
25
v[10]
18
v[9]
...
v[1]
30
v[4]
25
v[10]
Step 2 Compare 50 and 30
(Exchange v[4] and v[1])
Main Index
...
50
v[4]
Step 1 Compare 50 and 25
(Exchange v[10] and v[4])
6
v[0]
v[1]
v[4]
18
...
50
Contents
18
25
v[9]
v[10]
Step 3 Compare 50 and 63
(50 in correct location)
pushHeap() – new value added at end
& bubbles upward
template <typename T, typename Compare>
void pushHeap(vector<T>& v, int last, Compare comp)
{
// new item v[last-1] & v[0] to v[last-2] are in heap order
int currentPos = last-1;
// current node
int parentPos = (currentPos-1)/2; // it’s parent
T target = v[last-1];
// new value being pushed
while (currentPos != 0) // not at root yet
{
if (comp(target,v[parentPos])) // out of order
{
// copy parent down & move up a level in tree
v[currentPos] = v[parentPos];
currentPos = parentPos;
parentPos = (currentPos-1)/2;
}
else // heap condition is ok. break
break;
}
// copy target to correct location
v[currentPos] = target;
}
7
Main Index
Contents
Exchanging first & last elements
in popHeap()
63
18
v[0]
v[0]
30
40
30
40
v[1]
v[2]
v[1]
v[2]
25
10
v[4]
v[3]
5
3
18
v[7]
v[8]
v[9]
8
38
10
v[5]
v[6]
v[3]
v[4]
5
3
63
v[7]
v[8]
v[9]
8
38
v[5]
v[6]
After exchanging the root
and last element in the heap
Before a deletion
8
25
Main Index
Contents
Adjusting the semi-heap in
popHeap()
...
40
40
v[0]
v[0]
...
18
38
v[2]
v[2]
8
38
8
v[5]
v[6]
v[5]
Main Index
v[6]
Step 2: Exchange 18 and 38
Step 1: Exchange 18 and 40
9
18
Contents
PopHeap()
template <typename T, typename Compare>
void popHeap(vector<T>& v, int last, Compare comp)
{
T temp;
// exchange first and last elements in heap
temp = v[0];
v[0] = v[last-1];
v[last-1] = temp;
// trickle down root over the range [0, last-1)
adjustHeap(v, 0, last-1, comp);
}
10
Main Index
Contents
adjustHeap() – converts semi-heap to
heap : range [first,last)
template <typename T, typename Compare>
void adjustHeap(vector<T>& v, int first, int last,
Compare comp)
{
int currentPos = first;
// start at v[first]
T target = v[first];
// save value temporarily
int childPos = 2 * currentPos + 1; // select left child
while (childPos <= last-1)
// more nodes to check
{
if ((childPos+1 <= last-1) &&
comp(v[childPos+1], v[childPos]))
childPos = childPos + 1;
// select right child
if (comp(v[childPos],target))
{
v[currentPos] = v[childPos];
// copy selected child up
currentPos = childPos;
// move down
childPos = 2 * currentPos + 1; // left child
}
else // target belongs at currentPos
break;
}
v[currentPos] = target; // copy target to currentPos
}
11
Main Index
Contents
Heap Sort

Two steps

(1) Build a heap (Heapify the vector)
(2) For each Index from v.sixe()-1 to 1 do

–
–


12
Swap v[0] and v[Index] : This places a single item in
the sorted portion of the vector (following the heap)
Adjust semi-heap (vector within range [0,Index) )
To sort In Order : Use MAX Heap
To sort In Reverse Oreder : Use MIN Heap
Main Index
Contents
Building a Heap (Heapifying a Vector)
9
17
12
30
65
20
50
4
60
19
Initial Vector
9
17
12
30
65
50
4
20
60
19
adjustHeap () at 4 causes no changes
(A)
13
Main Index
Contents
Example of Heapifying a Vector
(Cont…)
9
9
65
30
20
50
4
60
12
17
12
65
60
30
19
4
19
65
9
12
4
20
30
17
12
19
adjustHeap() at 1 moves 12 down two levels
(D)
14
60
50
60
65
50
17
adjustHeap () at 2 moves 17 down
(C)
adjustHeap () at 3 moves 30 down
(B)
30
20
50
Main Index
19
4
20
17
9
adjustHeap() at 0 moves 9 down three levels
(E)
Contents
Making a Heap: Heapifying using
makeHeap()
template <typename T, typename Compare>
void makeHeap(vector<T>& v, Compare comp)
{
int heapPos, lastPos;
// compute the size of the heap and the index
// of the last parent
lastPos = v.size();
heapPos = (lastPos - 2)/2;
// filter down every parent in order from last parent
// down to root
while (heapPos >= 0)
{
adjustHeap(v,heapPos, lastPos, comp);
heapPos--;
}
}
15
Main Index
Contents
Example of heap sort
int arr[] = {50, 20, 75, 35, 25};
vector<int> v(arr, 5);
75
35
50
25
20
Heapified Tree
16
Main Index
Contents
Example of Implementing heap
sort (Cont….)
Calling popHeap() with last = 4
deletes 50 and stores it in h[3]
Calling popHeap() with last = 5
deletes 75 and stores it in h[4]
50
35
1
2
1
20
25
75
20
17
20
35
25
75
50
Main Index
25
Calling popHeap() with last = 2
deletes 25 and stores it in h[1]
1
25
50
2
75
50
Calling popHeap() with last = 3
deletes 35 and stores it in h[2]
20
35
Contents
35
75
heapSort() - in “d_sort.h”
template <typename T, typename Compare>
void heapSort (vector<T>& v, Compare comp)
{
// "heapify" the vector v
makeHeap(v, comp);
int i, n = v.size();
// iteration that determines elements v[n-1] ... v[1]
for(i = n; i > 1;i--)
{
// call popHeap() to move next largest to v[n-1]
popHeap(v, i, comp);
}
}
18
Main Index
Contents
Program using heapSort()
#include <iostream>
#include <vector>
#include <functional>
#include "d_sort.h“
#include "d_util.h“
#include "d_random.h“
using namespace std;
//
//
//
//
for
for
for
for
greater<T>() operator
heapSort()
writeVector()
randomNumber
int main() {
vector<int> v;
randomNumber rnd;
int i;
// create a vector with 15 random integers
for (i = 0; i < 15; i++)
v.push_back(rnd.random(100));
cout << "Sort in ascending order" << endl << "
";
heapSort(v,greater<int>()); // also callable with less<int>()
writeVector(v);
cout << endl;
return 0;
}
19
Main Index
Contents



File Structure
A text file contains ASCII characters with a newline
sequence separating lines.
A binary file consists of data objects that vary from a
single character (byte) to more complex structures that
include integers, floating point values, programmergenerated class objects, and arrays.
each data object in a file a record
File as a direct access structure
20
R0
R1
R2
R3
R4
Ri
Rn-2 Rn-1
0
1
2
3
4
currentPos
n-2 n-1
Main Index
Contents



offset
beg
21
Direct File Access
The functions seekg() and seekp() allow the
application to reposition the current file pointers.
The seek functions take an offset argument that
measures the number of bytes from the
beginning (beg), ending (end), or current position
(cur) in the file.
If a file is used for both input and output, use the
seek functions tellg() and seekg().
expands
offset
file
offset offset
cur
Main Index
end
Contents

x
Implementing the bitVector
Class
bitMask() returns an unsigned character value
containing a 1 in the bit position representing i.
x
x
x
0
x
x
x
member[vectorIndex(i)]
Set bit i
0
0
0
0
1
0
0
0
x
x
x
bitMask(i) | member[vectorIndex(i)] x
x
x
x
1
x
x
x
~bitMask(i ) & member[vectorIndex(i)] x
x
x
x
0
x
x
x
bitMask(i)
x
x
x
x
1
member[vectorIndex(i)]
Clear bit i
1
1
1
1
0
1
1
1
~bitMask(i)
22
Main Index
Contents



Lossless Compression
data compression loses no information
original data can be recovered exactly from the
compressed data.
normally apply to "discrete" data, such as text, word
processing files, computer applications, and so forth
This paper ...
...
...
...
...
Submitted by
J. Q. Student
23
Compress
1001011101010111001010
...
...
...
1010010110111010110011
Main Index
Contents
Reconstruct
This paper ...
...
...
...
...
Submitted by
J. Q. Student



Lossy Compression
loses some information during compression and
the data cannot be recovered exactly
shrink the data further than lossless compression
techniques.
Sound files often use this type of compression,
10110101011101
...
10111011101110
10110111
11011010110111
...
11010110110101
11011011011011
24
Compress
1101011010101110100010
...
...
...
0100100100000101101001
Main Index
Contents
Reconstruct
10110101011101
...
10111011101110
11011
...
11010110110101
11011011011011
Huffman Tree – data codes are from
edges (root to leaf)
25
Main Index
Contents






26
Huffman Codes
Variable length binary codes
Shortest codes for most frequently occurring
data value
Longest codes for least frequently occurring
data values
Codes have unique prefixes
Codes are determined from binary tree built
from priority queue of data value/frequency
pairs
Tree is built bottom-up (always balanced)
Main Index
Contents
Building Huffman Tree
d:6
a:16
b:4
e:20
f:3
c:8
Priority Queue
27
Main Index
Contents
Building Huffman Tree (Cont…)
d:6
a:16
7
e:20
f:3
c:8
Priority Queue
28
Main Index
Contents
b:4
Building Huffman Tree (Cont…)
29
Main Index
Contents
Building Huffman Tree (Cont…)
30
Main Index
Contents
Summary Slide 1
§- Heap
- an array-based tree that has heap order
- maximum heap: if v[i] is a parent, then v[i]  v[2i+1]
and v[i]  v[2i+2] (a parent is  its children)
- root, v[0], is the maximum value in the vector
- minimum heap: the parent is  its children.
- v[0] is the minimum value
- Insertion: place the new value at the back of the
heap and filtering it up the tree.
31
Main Index
Contents
Summary Slide 2
§- Heap (Cont…)
- Deletion: exchanging its value with the back of the
heap and then filtering the new root down the tree,
which now has one less element.
- Insert and delete running time: O(log2 n)
- heapifying: apply the filter-down operation to the
interior nodes, from the last interior node in the tree
down to the root
- running time: O(n)
- The O(n log2 n) heapsort algorithm heapifies a
vector and erases repeatedly from the heap,
locating each deleted value in its final position.
32
Main Index
Contents
Summary Slide 3
§- Binary File
- a sequence of 8-bit characters without the
requirement that a character be printable and with
no concern for a newline sequence that terminates
lines
- often organized as a sequence of records: record 0,
record 1, record 2, ..., record n-1.
- uses for both input and output, and the C++ file
<fstream> contains the operations to support these
types of files.
- the open() function must use the attribute ios::binary
33
Main Index
Contents
Summary Slide 4
§- Binary File (Cont…)
- For direct access to a file record, use the function
seekg(), which moves the file pointer to a file record.
- accepts an argument that specifies motion from
the beginning of the file (ios::beg), from the
current position of the file pointer (ios::cur), and
from the end of the file (ios::end)
- use read() function to inputs a sequence of bytes
from the file into block of memory and write()
function to output from a block of memory to a
binary file
34
Main Index
Contents
Summary Slide 5
§- Bit Manipulation Operators
- | (OR), & (AND), ^ (XOR), ~ (NOT), << (shift left),
and >> (shift right)
- use to perform operations on specific bits within a
character or integer value.
- The class, bitVector, use operator overloading
- treat a sequence of bits as an array, with bit 0 the
left-most bit of the sequence
- bit(), set(), and clear() allow access to specific bits
- the class has I/O operations for binary files and the
stream operator << that outputs a bit vector as an
ASCII sequence of 0 and 1 values.
35
Main Index
Contents
Summary Slide 6
§- File Compression Algorithm
- encodes a file as sequence of characters that
consume less disk space than the original file.
- Two types of compression algorithms:
1) lossless compression
– restores the original file.
– approach: count the frequency of occurrence
of each character in the file and assign a
prefix bit code to each character
- file size: the sum of the products of each
bit-code length and the frequency of
occurrence of the corresponding character.
36
Main Index
Contents
Summary Slide 7
§- File Compression Algorithm (Cont…)
2) lossy compression
– loses some information during compression
and the data cannot be recovered exactly
– normally used with sound and video files
- The Huffman compression algorithm builds optimal
prefix codes by constructing a full tree with the most
frequently occurring characters and shorter bit
codes near the top of the tree. The less frequently
occurring characters occur near the bottom of the
tree and have longer bit codes.
37
Main Index
Contents
Summary Slide 8
§- File Compression Algorithm (Cont…)
- If the file contains n distinct characters, the loop
concludes after n-1 iterations, having built the
Huffman Tree.
- implementation requires the use of a heap, bit
operations, and binary files
- The use of the bitVector class simplifies the
construction of the classes hCompress and
hDecompress, which perform Huffman
compression and decompression.
- works better with textfiles; they tend to have fewer
unique characters than binary files.
38
Main Index
Contents