Floating point variables of different lengths

Trade-off: accuracy vs. memory space
• Recall that the computer can combine adjacent bytes in the
RAM memory to form larger memory cells
Trade-off: accuracy vs. memory space
(cont.)
• Effect of combining memory cells:
  Combination   # bits in memory cell   Capability
  1 byte        8 bits                  2^8 = 256 possible patterns
  2 bytes       16 bits                 2^16 = 65536 possible patterns
  4 bytes       32 bits                 2^32 = 4294967296 possible patterns
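The pattern counts in the table can be reproduced with a few lines of Java. This is only a sketch: the class name BitPatterns and the method patterns are made up for illustration and do not come from the slides.

```java
// Hypothetical helper class (not from the slides).
class BitPatterns
{
   // Number of distinct bit patterns in a memory cell made of numBytes bytes.
   static long patterns(int numBytes)
   {
      int numBits = 8 * numBytes;   // 8 bits per byte
      return 1L << numBits;         // 2^numBits (only valid for numBytes <= 7)
   }

   public static void main(String[] args)
   {
      System.out.println(patterns(1));   // 256
      System.out.println(patterns(2));   // 65536
      System.out.println(patterns(4));   // 4294967296
   }
}
```

Every extra byte multiplies the number of patterns by 256, which is why larger memory cells can represent more distinct values.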
Trade-off: accuracy vs. memory space
(cont.)
• Trade-off between accuracy and memory usage
• To obtain a higher accuracy (= more significant
digits), we need to combine more
memory cells
Another trade-off: speed
• Fact:
• Arithmetic expressions that use higher accuracy
take longer to compute
(It takes longer to multiply two 15-digit numbers than two
2-digit numbers)
Another trade-off: speed (cont.)
• Trade-off between accuracy and speed
• When a Java program needs to use higher-accuracy
numbers, it will not only use more memory, but it will
also take longer to complete.
Floating point numbers in Java
• Java provides 2 different sizes of floating point numbers
(and variables):
• Single precision floating point numbers (have lower accuracy
and use less memory)
• Double precision floating point numbers (have higher accuracy
and use more memory)
Floating point numbers in Java (cont.)
• This offers programmers more flexibility:
• Computations that have low accuracy requirements can
use a single precision to save memory space and run
faster
• Computations that have high accuracy requirements
must use double precision to attain the required accuracy
Floating point numbers in Java (cont.)
• Single precision floating point variable:
• uses 4 consecutive bytes of memory as a single 32 bit
memory cell
• A single precision floating point variable can represent
a floating point number:
• in the range from about −10^38 to 10^38
• and with about 7 decimal digits of accuracy
Floating point numbers in Java (cont.)
• A double precision floating point variable is a variable that:
• uses 8 consecutive bytes of memory as a single 64 bit
memory cell
• A double precision floating point variable can
represent a floating point number:
• in the range from about −10^308 to 10^308
• and with about 15 decimal digits of accuracy
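The sizes, ranges, and digit counts quoted for the two types can be checked against the constants that Java's standard Float and Double classes provide. The class name FloatSizes is a made-up name for this sketch.

```java
// Hypothetical demo class (not from the slides).
class FloatSizes
{
   public static void main(String[] args)
   {
      System.out.println(Float.BYTES);    // 4
      System.out.println(Float.SIZE);     // 32
      System.out.println(Double.BYTES);   // 8
      System.out.println(Double.SIZE);    // 64

      // Largest finite values: roughly 10^38 and 10^308
      System.out.println(Float.MAX_VALUE);    // 3.4028235E38
      System.out.println(Double.MAX_VALUE);   // 1.7976931348623157E308

      // The same computation keeps more digits in double precision
      float  f = 1.0f / 3.0f;
      double d = 1.0 / 3.0;
      System.out.println(f);   // 0.33333334
      System.out.println(d);   // 0.3333333333333333
   }
}
```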
Defining single and double precision
floating point variables
• We have already learned how to define double precision
floating point variables:
double variableName ;
• The syntax used to define single precision floating point
variables is similar to the one used to define double precision
floating point variables
The only difference is that we need to use a different
keyword to denote single precision floating point variables
Defining single and double precision
floating point variables (cont.)
• Syntax to define single precision floating point variables:
float variableName ;
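A minimal sketch of both definitions side by side. The class name DefineVars and the variable names are illustrative only. The trailing f on a number marks it as a float value.

```java
// Hypothetical demo class (not from the slides).
class DefineVars
{
   public static void main(String[] args)
   {
      double d;     // double precision: 8 bytes
      float  f;     // single precision: 4 bytes

      d = 2.5;      // a number without a suffix is a double
      f = 2.5f;     // the suffix f makes the number a float
      // f = 2.5;   // would not compile: a double value cannot be
      //            // stored in a float variable without a cast

      System.out.println(d);   // 2.5
      System.out.println(f);   // 2.5
   }
}
```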
Warning: float and double are considered
as different types
• What determines a type:
• Each data type uses a unique data encoding method
• Because single and double precision floating point
numbers use different encoding methods, Java considers
them to be different types
Warning: float and double are considered
as different types (cont.)
• You can use the tool below (next slide) to experience how
a single precision floating point number is encoded:
• Type in a decimal number in the field named Decimal
representation
• Press enter to see the 32-bit pattern used to encode the
decimal number
Warning: float and double are considered
as different types (cont.)
• Check the website for Decimal representation.
• The row of tiny squares shows the 32 bits
• A checked square represents a bit 1 and an unchecked square
represents a bit 0
Warning: float and double are considered
as different types (cont.)
• The following webpage lets you compare the different
encodings used in single and double precision:
http://mathcs.emory/~cheung/Courses/170/Syllabus/04/IEEE-float/IEEE-754-encoding.html
• Usage: Enter a decimal number in the Decimal FloatingPoint field and press Not Rounded
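The bit patterns can also be printed from Java itself, using the standard methods Float.floatToIntBits and Double.doubleToLongBits. The following is a sketch: the class ShowBits and its helper methods are hypothetical names introduced here, not part of the course material.

```java
// Hypothetical helper class (not from the slides).
class ShowBits
{
   // Pad a binary string with leading zeros up to the given width.
   static String pad(String bits, int width)
   {
      StringBuilder sb = new StringBuilder();
      for (int i = bits.length(); i < width; i++)
         sb.append('0');
      return sb.append(bits).toString();
   }

   // The 32 bits used to encode a float.
   static String floatBits(float f)
   {
      return pad(Integer.toBinaryString(Float.floatToIntBits(f)), 32);
   }

   // The 64 bits used to encode a double.
   static String doubleBits(double d)
   {
      return pad(Long.toBinaryString(Double.doubleToLongBits(d)), 64);
   }

   public static void main(String[] args)
   {
      // The same decimal number, encoded in two different ways
      System.out.println(floatBits(0.1f));
      System.out.println(doubleBits(0.1));
   }
}
```

The two printed rows differ in length and in content, which is the reason Java treats float and double as different types.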
Converting (casting) to a single or a double
precision representation
• The computer has built-in machine instructions to convert
between different encodings
• The Java programming language provides access to the
computer's conversion operations through a number of
conversion operators
• Computer jargon:
• Casting operation = a type conversion operation
Converting (casting) to a single or a double
precision representation (cont.)
• Java's casting operators for single and double precision
floating point numbers:
(float) --- convert to the single precision floating point representation
(double) --- convert to the double precision floating point representation
Converting (casting) to a single or a double
precision representation (cont.)
• Example 1: conversion sequence float ⇒ double ⇒ float
public class Casting01
{
   public static void main(String[] args)
   {
      float x;    // Define single precision floating point
      double y;   // Define double precision floating point

      x = 3.1415927f;   // f denotes "float"
      y = (double) x;   // **** convert to double representation

      System.out.print("Original single precision x = ");
      System.out.println(x);
      System.out.print("Converted double precision y = ");
      System.out.println(y);

      x = (float) y;    // **** convert to float representation

      System.out.print("Re-converted single precision x = ");
      System.out.println(x);
   }
}
Converting (casting) to a single or a double
precision representation (cont.)
Output:
Original single precision x = 3.1415927
Converted double precision y = 3.1415927410125732
Re-converted single precision x = 3.1415927
Converting (casting) to a single or a double
precision representation (cont.)
• Notes:
• The trailing letter f in "3.1415927f" denotes a float
typed number
(Yes, even numbers are typed in Java)
• Notice that the accuracy of variable x was preserved after
the second conversion
Converting (casting) to a single or a double
precision representation (cont.)
• Example 2: conversion sequence double ⇒ float ⇒ double
public class Casting02
{
   public static void main(String[] args)
   {
      double x;   // Define double precision floating point
      float y;    // Define single precision floating point

      x = 3.14159265358979;   // no trailing f: a double
      y = (float) x;          // **** convert to float representation

      System.out.print("Original double precision x = ");
      System.out.println(x);
      System.out.print("Converted single precision y = ");
      System.out.println(y);

      x = (double) y;         // **** convert to double representation

      System.out.print("Re-converted double precision x = ");
      System.out.println(x);
   }
}
Converting (casting) to a single or a double
precision representation (cont.)
• Output:
Original double precision x = 3.14159265358979
Converted single precision y = 3.1415927
Re-converted double precision x = 3.1415927410125732
Converting (casting) to a single or a double
precision representation (cont.)
• Notes:
• A decimal number without a trailing "f" has the data type
double
• Notice that we have lost many digits of accuracy in the
double ⇒ float conversion !!!
Priority level of the casting operators
• I want to emphasize that:
• (double)
• (float)
are operators
Priority level of the casting operators
(cont.)
• The casting operators in Java are unary operators (i.e., they have
1 operand)
Analogy:
Unary negation operator
−x (negates the value in variable x)
Casting operator is a unary operator
(float)x (converts the value in variable x)
Priority level of the casting operators
(cont.)
• Operators in Java have a priority level
Priority level of casting operators:
  Operator             Priority   Note
  ( .... )             Highest
  (float) (double) −   Higher     Unary operator, e.g.: (float) 3.0
  * /                  High       Binary operator, e.g.: 4*5
  + -                  Lowest     Binary operator, e.g.: 4+5
Priority level of the casting operators
(cont.)
• When operators of different priority appear in one single
arithmetic expression, then the operator with the highest
priority is executed first.
• It's the same as what you have learned in
Elementary School...
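The priority of the casting operator matters as soon as a cast appears next to a binary operator. The sketch below contrasts a cast applied before a division with a cast applied to the result of the division. The class CastPriority and its two methods are hypothetical names. The first method also relies on the automatic float to double conversion that is covered in the later slides on automatic conversions.

```java
// Hypothetical demo class (not from the slides).
class CastPriority
{
   // The cast has higher priority than /, so only a is converted to float.
   // The division then mixes a float and a double and is done in double.
   static double castFirst(double a, double b)
   {
      return (float) a / b;
   }

   // Parentheses have the highest priority: the division is done first
   // (in double precision) and the cast converts its result.
   static float divideFirst(double a, double b)
   {
      return (float) (a / b);
   }

   public static void main(String[] args)
   {
      System.out.println(castFirst(0.1, 1.0));     // 0.10000000149011612
      System.out.println(divideFirst(0.1, 1.0));   // 0.1
   }
}
```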
Phenomenon: loss of accuracy
• In the previous 2 examples, we have observed the
following phenomenon:
Phenomenon: loss of accuracy (cont.)
• Observation:
• When we convert a float to a double and then back to a
float, there is no loss in accuracy
• When we convert a double to a float and then back to a
double, there is a high loss in accuracy
Phenomenon: loss of accuracy (cont.)
• We can understand why we lose accuracy by looking at each
step of the conversion process:
Phenomenon: loss of accuracy (cont.)
• Explanation:
• The float ⇒ double ⇒ float steps start with a
widening conversion, which keeps every digit, so the
conversion back to float recovers the original value
• The double ⇒ float ⇒ double steps start with a
narrowing conversion, which drops digits that the
conversion back to double cannot restore
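Both round trips can be tested directly by comparing the value before and after the two conversions. This is a sketch with made-up names (RoundTrip, floatSurvives, doubleSurvives).

```java
// Hypothetical demo class (not from the slides).
class RoundTrip
{
   // float => double => float: widening first, then narrowing
   static boolean floatSurvives(float f)
   {
      return (float) (double) f == f;
   }

   // double => float => double: narrowing first, then widening
   static boolean doubleSurvives(double d)
   {
      return (double) (float) d == d;
   }

   public static void main(String[] args)
   {
      System.out.println(floatSurvives(3.1415927f));         // true
      System.out.println(doubleSurvives(3.14159265358979));  // false
      System.out.println(doubleSurvives(0.5));               // true
   }
}
```

The last line shows that a narrowing conversion does not always lose digits: 0.5 fits exactly in a float. The conversion is still called unsafe because the programmer cannot count on that in general.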
Overflow condition
• When converting a higher accuracy type to a lower
accuracy type, you may cause an overflow condition
• Overflow:
• Each data type can represent a certain range of values
• Overflow = storing an out-of-range value into a variable
Overflow condition
• Example:
• A double typed variable can store a value in the range of
−10^308 ... 10^308
• A float typed variable can only store a value in the range
of −10^38 ... 10^38
Overflow condition (cont.)
• Here is a Java program with an overflow condition:
public class Overflow1
{
   public static void main(String[] args)
   {
      double d;   // range: -10^(308) .. 10^(308)
      float f;    // range: -10^(38) .. 10^(38)

      d = 3.1415e100;   // In range of "double", out of range of "float"
      f = (float) d;    // Overflow !!!

      System.out.print("d = ");
      System.out.println(d);
      System.out.print("f = ");
      System.out.println(f);
   }
}
Overflow condition (cont.)
• Output of this program:
d = 3.1415E100
f = Infinity
Overflow condition (cont.)
• Conclusion:
• When you convert a value from a higher accuracy
type to a lower accuracy type, you may cause a
significant loss of information
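A program can detect this kind of overflow by checking whether the converted value has become Infinity. The following is one possible sketch, using the standard methods Float.isInfinite and Double.isInfinite. The class OverflowCheck and the method overflowsFloat are hypothetical names.

```java
// Hypothetical helper class (not from the slides).
class OverflowCheck
{
   // True if d is a finite double that turns into Infinity (or -Infinity)
   // when it is converted to a float.
   static boolean overflowsFloat(double d)
   {
      return !Double.isInfinite(d) && Float.isInfinite((float) d);
   }

   public static void main(String[] args)
   {
      System.out.println(overflowsFloat(3.1415e100));   // true
      System.out.println(overflowsFloat(3.1415e10));    // false
   }
}
```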
Safe and unsafe conversion operations
• Safe and unsafe conversions:
• Safe conversion = a conversion from one
representation (encoding) to another representation
(encoding) where there is no loss in accuracy
• Unsafe conversion = a conversion from one
representation (encoding) to another representation
(encoding) where there is significant loss in accuracy
Safe and unsafe conversion operations
(cont.)
• We saw in the previous examples that:
• float ⇒ double is a safe conversion
• double ⇒ float is an unsafe conversion
Expressions containing values of different
types
• It is common to use different data types in the same Java
program
• Little known fact about a computer:
• A computer can only operate on data of the same data type
In other words:
• A computer can only add two double typed values
• A computer can only subtract two double typed values
• And so on.
Expressions containing values of different
types (cont.)
• A computer can only add two float typed values
• A computer can only subtract two float typed values
• And so on.
• A computer does not have an instruction to add (or subtract) a
double typed value and a float typed value
(This has to do with the encoding method used for different
types of data)
Expressions containing values of different
types (cont.)
• Operations on different types of value:
• In order to perform any operation on two values of
differing types, the computer must:
1. Convert one of the types into the other type
2. Perform the operation on the values (now of the
same type)
Automatic conversion performed in Java
• It is extremely inconvenient for programmers to have to
write conversion operations when values of different types
are used in the same expression
• Java makes writing programs less painful by providing a
number of automatic conversions
The automatic conversions are all safe conversions
Automatic conversion performed in Java
(cont.)
• Java conversion rule:
• lower type OP higher type ⇒ higher type OP higher type
• When values and/or variables of different types are
used in an arithmetic expression (OP), the values
and/or variables of the less accurate type are
automatically converted to the more accurate type
• The operation (OP) is then performed on 2 values of
the more accurate type
• The result of the operation is also of the more
accurate type
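One way to observe the type of a result is to pass it to a pair of methods with the same name, one taking a float and one taking a double, and see which one Java selects. This is only a sketch: the class Promotion and the method typeOf are made-up names, and it uses method overloading, which these slides do not cover.

```java
// Hypothetical demo class (not from the slides).
class Promotion
{
   // Java picks the version whose parameter type matches the argument.
   static String typeOf(float v)  { return "float";  }
   static String typeOf(double v) { return "double"; }

   public static void main(String[] args)
   {
      float  x = 1.5f;
      double y = 2.25;

      System.out.println(typeOf(x + x));   // float
      System.out.println(typeOf(x + y));   // double (x was converted first)
      System.out.println(typeOf(y + y));   // double
   }
}
```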
Automatic conversion performed in Java
(cont.)
• Assignment statement: type1 = type2 ;
• If type1 is a higher accuracy type than type2, then
the type2 value is automatically converted to type1
before the assignment statement is executed.
(Because the conversion was safe)
• If type1 is a lower accuracy type than type2, then the
assignment statement is not allowed. You will need to
use a casting operator to make the assignment
statement valid.
Automatic conversion performed in Java
(cont.)
• Examples:
   float x;
   double y;

   y = x;           ===> higher accuracy type = lower accuracy type
                         1. the float value in x is converted to a double
                         2. the (converted) value is assigned to y

   x = y;           ===> lower accuracy type = higher accuracy type
                         This assignment is NOT allowed (see rules above)
                         (This is because the conversion is unsafe)

   x = (float) y;   ===> 1. The casting operator (float) converts the
                            double value into a float value
                         2. The (converted) float value is assigned to x

   y = x + y;       ===> x + y
                         1. the float value in x is converted to a double
                         2. then + is performed on 2 double values
                         y = double result
                         3. The result is double and can be assigned to y
                            (because y is a double typed variable)

   x = x + y;       ===> x + y
                         1. the float value in x is converted to a double
                         2. then + is performed on 2 double values
                         x = double result
                         3. The result is double and cannot be assigned to x
                            (because x is a float typed variable)
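The assignment rules can be tried out in a small program. In the sketch below the two statements that Java rejects are commented out so that the program compiles. The class AssignRules and the method addAsFloat are hypothetical names used only for this illustration.

```java
// Hypothetical demo class (not from the slides).
class AssignRules
{
   // A double result can be stored in a float only through a cast.
   static float addAsFloat(float x, double y)
   {
      return (float) (x + y);
   }

   public static void main(String[] args)
   {
      float  x = 1.5f;
      double y = 2.25;

      y = x;                  // allowed: x is converted to a double (safe)
      // x = y;               // NOT allowed: double to float is unsafe
      x = (float) y;          // allowed: the cast makes the conversion explicit
      y = x + y;              // allowed: x + y is a double, y is a double
      // x = x + y;           // NOT allowed: x + y is a double, x is a float
      x = addAsFloat(x, y);   // allowed: the result is cast to float

      System.out.println(x);   // 4.5
      System.out.println(y);   // 3.0
   }
}
```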
