Complexity of Mergesort

Transcript Complexity of Mergesort

CISC121 – Lecture 12
• Last time:
– Efficient recursive and non-recursive sorts.
– Analyzing the complexity of recursive methods.
• Today
– Last lecture!
Summer 2007
CISC121 - Prof. McLeod
1
You Will Need To:
• Look at exercise 5 and assignment 5.
• If you wish to replace any of your assignment
marks you can submit the “makeup” assignment
on the day of the final exam.
Summer 2007
CISC121 - Prof. McLeod
2
Final Exam…
• On the 16th, unless other arrangements have
been made.
• Exam topics are listed off the course web site.
• Exambank has 18 exams from 1996 to 2006.
• I will not provide full exam solutions, but will be
happy to discuss individual exam problems.
• Ana or Krista can supply extra tutoring if needed.
Ask them if you want an exam prep tutorial.
Summer 2007
CISC121 - Prof. McLeod
3
Today
• Analyze the complexity of mergesort.
• Number representation and roundoff error.
• Movie!
• “Satisfaction” survey
Summer 2007
CISC121 - Prof. McLeod
4
Mergesort – “aMergeSort” Code
• Code for sorting arrays:
public static void aMergeSort (int[] A) {
aMergeSort(A, 0, A.length-1);
} // end aMergeSort
public static void aMergeSort (int[] A, int first,
int last) {
if (first < last) {
int mid = (first + last) / 2;
aMergeSort(A, first, mid);
aMergeSort(A, mid + 1, last);
aMerge(A, first, last);
} // end if
} // end aMergeSort recursive
Summer 2007
CISC121 - Prof. McLeod
5
Mergesort – “aMerge” Code
private static void aMerge (int[] A, int first, int
last) {
int mid = (first + last) / 2;
int i1 = 0, i2 = first, i3 = mid + 1;
int[] temp = new int[last - first + 1];
while (i2 <= mid && i3 <= last) {
if (A[i2] < A[i3]) {
temp[i1] = A[i2]; i2++;
} else {
temp[i1] = A[i3]; i3++;
}
i1++;
} // end while
Summer 2007
CISC121 - Prof. McLeod
6
Mergesort – “aMerge” Code - Cont.
while (i2 <= mid) {
temp[i1] = A[i2]; i2++; i1++;
} // end while
while (i3 <= last) {
temp[i1] = A[i3]; i3++; i1++;
} // end while
i1 = 0; i2 = first;
while (i2 <= last) {
A[i2] = temp[i1]; i1++; i2++;
} // end while
} // end aMerge
Summer 2007
CISC121 - Prof. McLeod
7
Complexity of Mergesort
• Consider the aMergeSort code shown above:
• Suppose that the entire method takes t(n) time,
where n is A.length. We want to know the big
O notation for t(n).
• There are no loops in aMergeSort, just some
constant time operations, the two recursive calls
and the call to aMerge.
n n
t ( n )  a  t    t    tmerge ( n )
2 2
Summer 2007
CISC121 - Prof. McLeod
8
Complexity of Mergesort - Cont.
• What is the time function for aMerge?
• There is some O(1) stuff and four loops that are
O(n):
tmerge (n )  a  cn
• So,
n n 
t (n )  a  t    t    a  cn
2 2
Summer 2007
CISC121 - Prof. McLeod
9
Complexity of Mergesort - Cont.
• So far, we have not made any mention of the
state of the data. Does it make any difference if
the data is in reverse order (worst case), random
order (average case) or in order already (best
case)?
• Express t(n) in a recursive expression:
if n  1
b

t (n)  
n n
a  t  2   t  2   cn otherwise

Summer 2007
CISC121 - Prof. McLeod
10
Complexity of Mergesort - Cont.
• Assume that n is a power of 2:
b

t (n)  
n
a  2t  2   cn

if n  1
otherwise
• (It is easy enough to show that the proof still holds
when n is not a power of two - but I’m not going to
do that here).
Summer 2007
CISC121 - Prof. McLeod
11
Complexity of Mergesort - Cont.
• Substitute n/2 for n, to get t(n/2):
n
 n  n
t    a  2t  2   c  
2
2  2
or,

 n   n 
t ( n )  a  2 a  2t  2   c    cn
 2   2 

n
t ( n )  3a  2 t  2   2cn
2 
2
Summer 2007
(i  2 )
CISC121 - Prof. McLeod
12
Complexity of Mergesort - Cont.
• Do the next unrolling, which will be n/22:
n
t (n )  7a  2 t  3   3cn
2 
3
(i  3)
• So, after i unrolling’s:
n
t ( n )  2  1a  2 t  i   icn
2 
i
Summer 2007
i
CISC121 - Prof. McLeod
13
Complexity of Mergesort - Cont.
• This recursion stops when the anchor case, n  1
is encountered. This will occur when:
n
1
i
2
or when,
2i  n, or i  log n
• Substituting this back in the equation on the
previous slide:
Summer 2007
CISC121 - Prof. McLeod
14
Complexity of Mergesort - Cont.
• At the anchor case:
n
t ( n )  n  1a  nt    (log n )cn
n
or,
t ( n )  n  1a  nt (1)  cn log n
or,
t ( n )  n  1a  nb  cn log n  j  kn  cn log n
• Now the equation can be simplified to yield the big O
notation, which indicates that t(n) is O(nlog(n)).
Summer 2007
CISC121 - Prof. McLeod
15
public static void quickSort (int[] A, int first, int
last) {
int lower = first + 1;
int upper = last;
swap(A, first, (first+last)/2);
int pivot = A[first];
while (lower <= upper) {
while (A[lower] < pivot) lower++;
while (A[upper] > pivot) upper--;
if (lower < upper) swap(A, lower++, upper--);
else lower++;
}
swap(A, upper, first);
if (first < upper - 1) quickSort(A, first, upper-1);
if (upper + 1 < last) quickSort(A, upper+1, last);
} // end quickSort(subarrays)
Summer 2007
CISC121 - Prof. McLeod
16
Complexity of Quicksort
• The worst case is when a near-median value is
not chosen – the pivot value is always a maximum
or a minimum value. Now the algorithm is O(n2).
• However, if the pivot values are always near the
median value of the arrays, the algorithm is
O(nlog(n)) – which is the best case. (See the
derivation of this complexity for merge sort).
• The average case also turns out to be O(nlog(n)).
Summer 2007
CISC121 - Prof. McLeod
17
Number Representation
• Binary numbers or “base 2” is a natural
representation of numbers to a computer.
• As a transition, hexadecimal (or “hex”, base 16)
numbers are also used.
• Octal (base 8) numbers are used to a lesser
degree.
• Decimal (base 10) numbers are *not* naturally
represented in computers.
Summer 2007
CISC121 - Prof. McLeod
18
Number Representation - Cont.
• In base 2 (digits either 0 or 1):
r=2, a binary number: (110101.11)2=
1×25+1×24+0×23+1×22+0×21+1×20 +1×2-1 +1×2-2 =
=53.75 (in base 10)
“r” is the “radix” or the base of the number
Summer 2007
CISC121 - Prof. McLeod
19
Number Representation - Cont.
• Octal Numbers: a base-8 system with 8 digits: 0,
1, 2, 3, 4, 5, 6 and 7:
•
For example:
(127.4)8 = 1×82+2×81+7×80+4×8-1=87.5
Summer 2007
CISC121 - Prof. McLeod
20
Number Representation - Cont.
• Hexadecimal Numbers: a base-16 system with
16 digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E,
and F:
•
For example:
(B65F)16 = 11×163+6×162+5×161+15×160 = 46687.
Summer 2007
CISC121 - Prof. McLeod
21
Number Representation - Cont.
• The above series show how you can convert from
binary, octal or hex to decimal.
• How to convert from decimal to one of the other
bases?:
• integral part: divide by r and keep the remainder.
• decimal part: multiply by r and keep the carry
• “r” is the base - either 2, 8 or 16
Summer 2007
CISC121 - Prof. McLeod
22
Number Representation - Cont.
• For example,
convert 625.7610
to binary:
• So, 62510 is
10011100012
Divisor(r)
Dividend
2
625
2
312 (quotient)
2
156
2
78
0
2
39
0
2
19
1
2
9
1
2
4
1
2
2
0
2
1
0 most
1 significant
digit
0
Summer 2007
Remainder
CISC121 - Prof. McLeod
least
1 significant
digit
0
23
Number Representation - Cont.
• For the “0.7610”
part:
• So, 0.7610 is
0.11000010102
• 625.76 is:
Multiplier(r)
Carry
2
0 .76
2
1 .52 (product)
1
2
1 .04
1
2
0 .08
0
2
0 .16
0
2
0 .32
0
2
0 .64
0
2
1 .28
1
2
0 .56
0
2
1 .02
1
(1001110001.1100001010)2
Summer 2007
Multiplicand
CISC121 - Prof. McLeod
...
24
Number Representation - Cont.
• Converting between binary, octal and hex is much
easier - done by “grouping” the numbers:
• For example:
(010110001101011.111100000110)2=(?)8
010 110 001 101 011 . 111 100 000 110
(2
Summer 2007
6
1
5
3
.
7
4
CISC121 - Prof. McLeod
0
6)8
25
Number Representation - Cont.
• Another example:
(2C6B.F06)16=(?)2
(2
( 0010
Summer 2007
C
6
1100 0110
B
.
F
0
6)16
1011 . 1111 0000 0110)2
CISC121 - Prof. McLeod
26
From Before: Integer Primitive Types in
Java
• For byte, from -128 to 127, inclusive (1 byte).
• For short, from -32768 to 32767, inclusive (2
bytes).
• For int, from -2147483648 to 2147483647,
inclusive (4 bytes).
• For long, from -9223372036854775808 to
9223372036854775807, inclusive (8 bytes).
• A “byte” is 8 bits, where a “bit” is either 1 or 0.
Summer 2007
CISC121 - Prof. McLeod
27
Storage of Integers
• An “un-signed” 8 digit binary number can range
from 00000000 to 11111111
• 00000000 is 0 in base 10.
• 11111111 is 1x20 + 1x21 + 1x22 + … + 1x27 = 255,
base 10.
Summer 2007
CISC121 - Prof. McLeod
28
Storage of Integers - Cont.
• So, how can a negative binary number be stored?
• One way is to use the Two’s Complement
system of storage.
• Make the most significant bit a negative number:
• So, the lowest “signed” binary 8 digit number is
now: 10000000, which is -1x27, or -128 base 10.
Summer 2007
CISC121 - Prof. McLeod
29
Storage of Integers - Cont.
• Two’s Complement System:
Summer 2007
binary
base 10
10000000
-128
10000001
-127
11111111
-1
00000000
0
00000001
1
01111111
127
CISC121 - Prof. McLeod
30
Storage of Integers - Cont.
• For example, the binary number
10010101 is
1x20 + 1x22 + 1x24 - 1x27
= 1 + 4 + 16 - 128
= -107 base 10
• Now you can see how the primitive integer type,
byte, ranges from -128 to 127.
Summer 2007
CISC121 - Prof. McLeod
31
Storage of Integers - Cont.
• Suppose we wish to add 1 to the largest byte
value:
01111111
+00000001
• This would be equivalent to adding 1 to 127 in
base 10 - the result would normally be 128.
• In base 2, using two’s compliment, the result of
the addition is 10000000, which is -128 in base
10!
• So integer numbers wrap around, in the case of
overflow - no warning is given in Java!
Summer 2007
CISC121 - Prof. McLeod
32
Storage of Integers - Cont.
• An int is stored in 4 bytes using “two’s
complement”.
• An int ranges from:
10000000 00000000 00000000 00000000
to
01111111 11111111 11111111 11111111
or -2147483648 to 2147483647 in base 10
Summer 2007
CISC121 - Prof. McLeod
33
Real Primitive Types
• For float, (4 bytes) roughly ±1.4 x 10-38 to ±3.4
x 1038 to 7 significant digits.
• For double, (8 bytes) roughly ±4.9 x 10-308 to
±1.7 x 10308 to 15 significant digits.
Summer 2007
CISC121 - Prof. McLeod
34
Storage of Real Numbers
• The system used to store real numbers in Java
complies with the IEEE standard number 754.
• Like an int, a float is stored in 4 bytes or 32
bits.
• These bits consist of 24 bits for the mantissa and
8 bits for the exponent:
mantissa
exponent
00000000 00000000 00000000 00000000
Summer 2007
CISC121 - Prof. McLeod
35
Storage of Real Numbers - Cont.
• So a value is stored as:
value = mantissa  2exponent
• The exponent for a float can range from 2-128 to
2128, which is about 10-38 to 1038.
• The float mantissa must lie between -1.0 and
1.0 exclusive, and will have about 7 significant
digits when converted to base 10.
Summer 2007
CISC121 - Prof. McLeod
36
Storage of Real Numbers - Cont.
• The double type is stored using 8 bytes or 64
bits - 53 bits for the mantissa, and 11 bits for the
exponent.
• The exponent gives numbers between 2-1024 and
21024, which is about 10-308 and 10308.
• The mantissa allows for the storage of about 16
significant digits in base 10.
• (Double.MAX_VALUE is:
1.7976931348623157E308)
Summer 2007
CISC121 - Prof. McLeod
37
Storage of Real Numbers - Cont.
• See the following web site for more info:
http://grouper.ieee.org/groups/754/
• Or:
http://en.wikipedia.org/wiki/IEEE_floatingpoint_standard
Summer 2007
CISC121 - Prof. McLeod
38
Storage of Real Numbers - Cont.
• So, a real number can only occupy a finite
amount of storage in memory.
• This effect is very important for two kinds of
numbers:
– Numbers like 0.1 that can be written exactly in base
10, but cannot be stored exactly in base 2.
– Real numbers (like  or e) that have an infinite number
of digits in their “real” representation can only be
stored in a finite number of digits in memory.
• And, we will see that it has an effect on the
accuracy of mathematical operations.
Summer 2007
CISC121 - Prof. McLeod
39
Roundoff Error
• Consider 0.1:
(0.1)10 = (0.0 0011 0011 0011 0011 0011…)2
• What happens to the part of a real number that
cannot be stored?
• It is lost - the number is either truncated or
rounded (truncated in Java).
• The “lost part” is called the Roundoff Error.
Summer 2007
CISC121 - Prof. McLeod
40
Storage of “Real” or “Floating-Point” Numbers
- Cont.
• Compute:
10000
 0.1
i 1
• And, compare to 1000.
float sum = 0;
for (int i = 0; i < 10000; i++)
sum += 0.1;
System.out.println(sum);
Summer 2007
CISC121 - Prof. McLeod
41
Storage of “Real” or “Floating-Point” Numbers
- Cont.
• Prints a value of 999.9029 to the screen.
• If sum is declared to be a double then the
value: 1000.0000000001588 is printed to the
screen.
• So, the individual roundoff errors have piled up to
contribute to a cumulative error in this
calculation.
• As expected, the roundoff error is smaller for a
double than for a float.
Summer 2007
CISC121 - Prof. McLeod
42
Roundoff Error – Cont.
• This error is referred to in two different ways:
• The absolute error:
absolute error = |x - xapprox|
• The relative error:
relative error = (absolute error)  |x|
Summer 2007
CISC121 - Prof. McLeod
43
Roundoff Error - Cont.
• So for the calculation of 1000 as shown above,
the errors are:
Type
Absolute
Relative
float
0.0971
9.71E-5
double
1.588E-10
1.588E-13
• The relative error on the storage of 0.1 is the
absolute error divided by 1000.
Summer 2007
CISC121 - Prof. McLeod
44
The Effects of Roundoff Error
• Roundoff error can have an effect on any
arithmetic operation carried out involving real
numbers.
• For example, consider subtracting two numbers
that are very close together:
• Use the function
f ( x )  1  cos( x )
for example. As x approaches zero, cos(x)
approaches 1.
Summer 2007
CISC121 - Prof. McLeod
45
The Effects of Roundoff Error
• Using double variables, and a value of x of
1.0E-12, f(x) evaluates to 0.0.
• But, it can be shown that the function f(x) can also
be represented by f’(x):
2
sin ( x)
f ' ( x)  f ( x) 
1  cos( x)
• For x = 1.0E-12, f’(x) evaluates to 5.0E-25.
• The f’(x) function is less susceptible to roundoff
error.
Summer 2007
CISC121 - Prof. McLeod
46
The Effects of Roundoff Error - Cont.
• Another example. Consider the smallest root of
the polynomial: ax2+bx+c=0:
 b  b  4ac
x1 
2a
2
• What happens when ac is small, compared to b?
• It is known that for the two roots, x1 and x2:
c
x1 x2 
a
Summer 2007
CISC121 - Prof. McLeod
47
The Effects of Roundoff Error - Cont.
• Which leads to an equation for the root which is
not as susceptible to roundoff error:
x1 
 2c
b  b  4ac
2
• This equation approaches –c/b instead of zero
when ac << b2.
Summer 2007
CISC121 - Prof. McLeod
48
The Effects of Roundoff Error - Cont.
• The examples above show what can happen
when two numbers that are very close are
subtracted.
• Remember that this effect is a direct result of
these numbers being stored with finite accuracy in
memory.
Summer 2007
CISC121 - Prof. McLeod
49
The Effects of Roundoff Error - Cont.
• A similar effect occurs when an attempt is made to add a
comparatively small number to a large number:
boolean aVal = ((1.0E10 + 1.0E-20)==1.0E10);
System.out.println(aVal);
• Prints out true to the screen
• Since 1.0E-20 is just too small to affect any of the bit
values used to store 1.0E10. The small number would
have to be about 1.0E-5 or larger to affect the large
number.
• So, keep this behaviour in mind when designing
expressions!
Summer 2007
CISC121 - Prof. McLeod
50
The Effect on Summations
• Taylor Series are used to approximate many
functions. For example:
( 1)i 1 xi
ln( 1  x )  
i
i 1

• For ln(2):
( 1)i 1
1 1 1
ln( 2)  
 1     ...
i
2 3 4
i 1

Summer 2007
CISC121 - Prof. McLeod
51
The Effect on Summations – Cont.
• Since we cannot loop to infinity, how many terms
would be sufficient?
• Since the sum is stored in a finite memory space,
at some point the terms to be added will be much
smaller than the sum itself.
• If the sum is stored in a float, which has about 7
significant digits, a term of about 1x10-8 would not
be significant. So, i would be about 108 - that’s a
lot of iterations!
Summer 2007
CISC121 - Prof. McLeod
52
The Effect on Summations - Cont.
• On testing using a float, it took 33554433
iterations and 25540 msec to compute! (sum no
longer changing, value = 0.6931375)
• Math.log(2) = 0.6931471805599453
• So, roundoff error had a significant effect and the
summation did not even provide the correct value.
A float could only provide about 5 correct
significant digits, tops.
• For double, about 1015 iterations would be
required! (I didn’t try this one…)
• So, this series does not converge quickly, and
roundoff error has a strong effect on the answer!
Summer 2007
CISC121 - Prof. McLeod
53
The Effect on Summations - Cont.
• Here is another way to compute natural logs:
1
1 x 
2 i 1
ln 
x
  2
1 x 
i 0 2i  1

• Using x = 1/3 will provide ln(2).
Summer 2007
CISC121 - Prof. McLeod
54
The Effect on Summations - Cont.
• For float, this took 8 iterations and <1msec
(value = 0.6931472).
• Math.log(2) = 0.6931471805599453
• For double, it took 17 iterations, <1 msec to give
the value = 0.6931471805599451
• Using the Windows calculator ln(2) =
0.69314718055994530941723212145818 (!!)
• So, the use of the 17 iterations still introduced a
slight roundoff error.
Summer 2007
CISC121 - Prof. McLeod
55
Aside - Extended Precision in Windows
Summer 2007
CISC121 - Prof. McLeod
56
Numeric Calculations
• Error is introduced into a calculation through two
sources (assuming the formulae are correct!):
– The inherent error in the numbers used in the
calculation.
– Error resulting from roundoff error.
• Often the inherent error dominates the roundoff
error.
• But, watch for conditions of slow convergence or
ill-conditioned matrices, where roundoff error will
accumulate or is amplified and end up swamping
out the inherent error.
Summer 2007
CISC121 - Prof. McLeod
57
Numeric Calculations - Cont.
• Once a number is calculated, it is very important
to be able to estimate the error using both
sources, if necessary.
• The error must be known in order that the number
produced by your program can be reported in a
valid manner.
• This is a non-trivial topic in numeric calculation
that we will not discuss in this course.
Summer 2007
CISC121 - Prof. McLeod
58
Real World Roundoff Error Disasters
• Arianne 5 Launch
• Patriot Missiles
• See the movie!
Summer 2007
CISC121 - Prof. McLeod
59
Patriot Missile Problem
• The Patriot’s tracking system used a 24 bit
number to keep track of the number of tenth
seconds passed since the tracking system was
turned “on”.
• 0.1 in binary is
0.0001100110011001100110011001100....
• If you only have 24 bits to store this number then
the error is
0.0000000000000000000000011001100…. or
0.000000095 in base 10
Summer 2007
CISC121 - Prof. McLeod
60
Patriot Missile Problem, Cont.
• So over 100 hours of operation:
0.000000095×100×60×60×10=0.34 seconds out.
• A scud is moving a mach 5 = 1,676 metres per
second, so the error is 0.34×1,676 = 570 metres.
Summer 2007
CISC121 - Prof. McLeod
61

Complexity of Mergesort

Transcript Complexity of Mergesort

Directory