Transcript Exercise 2

Algorithm Programming 1
89-210
Exercise 2
Bar-Ilan University
2007-2008 ‫תשס"ח‬
by Moshe Fresko
Exercise 2 – Compression Algorithms
1.
Define an interface “Compression” in “Compression.java” as:
interface Compression {
void compress(InputStream is, OutputStream os) throws IOException ;
void decompress(InputStream is, OutputStream os) throws IOException ;
void stop() ;
}
// Note that these definitions are as general as possible to let flexibility in the
input and output media chosen.
2.
Define the LZW compression algorithm in file LZW.java
3.
This class must implement the above Compress interface.
Exercise 2 – Compression Algorithms
1.
2.
Write an application program “main()” in “LZW.java” that will run in the
following form
java LZW compress/decompress InputFile OutputFile
It will run the given Algorithm for compressing/decompressing the given
input file into the given output file.
Example:
java LZW compress MyTextFile.txt MySmallFile.lzw
will compress MyTextFile.txt into MySmallFile.lzw
java LZW decompress MySmallFile.lzw MyNewTextFile.txt
will uncompress the given file into a new file
Exercise 2 – Compression Algorithms

You can use the following two classes to keep a dynamic list and an associative
array



ArrayList (with interface List)
HashMap (with interface Map)
Example:
// For keeping a dynamic list
List myList = new ArrayList() ;
// For keeping an associative array
Map myMap = new HashMap() ;

List




Map




boolean add(Object)
int size()
Object get(int)
Object put(Object key,Object value)
boolean containsKey(Object key)
Object get(Object key)
For more information, look at the Java API specification page.
Exercise 2 – Compression Algorithms

Algorithm Parameters:




LZW: The maximal initial Dictionary size is 512 entries which requires 9 bits for
writing. First 256 entries are the ascii characters.
After reaching the end of the dictionary it will double the dictionary size. Which means
from that point on the representation of an entry will be 10 bits and will be a dictionary
of max size 1024.
This will continue until 16 bits. After reaching 216 entries, it will stop inserting entries
into the dictionary.
Please take the number of max bits (16 in our case) as a parameter into the constructor.
The default constructor will start it with 16.



In each dictionary sizes (9 bits=512, 10 bits=1024 etc.) the last element (all 1-s) will be
kept for the End-Of-Stream mark.




LZW() { this(16); }
LZW(int numOfMaxBits) { /*Initialization*/ }
This Mark (all 1-s) will be written after the whole InputStream is compressed (in .compress()
function).
Parallel to that it will be the mark for the end-of-stream when reading from input-stream of
decompress. (in .decompress() function)
Note that the number of bits must be increased ONE STEP BEFORE the dictionary is full.
You have to use CompactInputStream/CompactOutputStream from the 1st
Exercise to write the bits into an output or to read from input.
Exercise # 2
Compression Algorithms

Important:

You submit via submitex.










Course number: 89-210
Exercise: ex2
Two files will be delivered. Compression.java and LZW.java
If you use your version of Compact-Streams, submit also them. If you use the
version from the Course Site, there is no need for submission.
It must compile/work under Java 1.4 / 1.5 / 1.6. You can try it on the Unix
environment (on Sunshine).
Do not write debugging information to the console or any other place.
Write enough comments. Write wherever you think there is a need for the
understanding of the code.
Write your code according to the OOP principles.
Everybody must do it alone.
Deadline:

1 Jan 2008
Exercise # 2
How to Re-Size the Number-of-Bits?





In the Beginning of the compression/decompression process, the
dictionary size is 256 and the number of bits to be written/read is 9.
In each step of the LZW algorithm (compress/decompress) the dictionary
size will increase by one.
When dictionary size is 400 we must read/write 9-bit numbers in the range
0-399. If we read/write 511 (9 bit of all 1-s), this means we
reached/marked the end of stream (end of compressed data).
When dictionary size is 512, it means the number 511 has an entry in the
dictionary. That means that we cannot use it as the end-of-stream mark. In
that case the number of bits will be increased to 10 and the end-of-stream
mark will be 1023 (10 bits of 1-s).
This will continue until there are 216-1 entries in the dictionary, at that
point we will keep the dictionary in its current state. The end-of-stream
mark will be 65535 (16 bits of 1-s).