CIS 234: Numbering Systems & Character Codes


CIS 234: Character Codes
Dr. Ralph D. Westfall
April, 2011
Problem 1 (other PowerPoint)

computers only understand binary
coded data (zeros and ones)


00000000, 11111111, 01010101
people like to count in decimals
00000000=0, 11111111=255, 01010101=85

1st problem: it is extremely hard for
people to work with binary data
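The binary and decimal values on this slide can be checked with a few lines of Java. This is only a minimal sketch; the class and method names (BinaryDemo, toDecimal, toBinary8) are made up for illustration and are not from the slides.

```java
// Sketch: converting between 8-digit binary strings and decimal values.
class BinaryDemo {
    // Parse a string of 0s and 1s as a base-2 number.
    static int toDecimal(String bits) {
        return Integer.parseInt(bits, 2);
    }

    // Format a value from 0 to 255 as exactly 8 binary digits.
    static String toBinary8(int value) {
        String bits = Integer.toBinaryString(value & 0xFF);
        // Pad on the left with spaces, then turn the spaces into zeros.
        return String.format("%8s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(toDecimal("00000000")); // 0
        System.out.println(toDecimal("11111111")); // 255
        System.out.println(toDecimal("01010101")); // 85
        System.out.println(toBinary8(85));         // 01010101
    }
}
```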
Problems 2a and 2b

since computers only work with
numbers, they need to use numbers to
identify letters to print or show on
screen e.g., 01000001=65=A


people who don't read English also use
computers
next problem: what kind of numbering
should be used for different languages?
Problem 2 Solution


using binary data to display characters
make up a "coding scheme" that
assigns characters to numbers


ASCII code: 7-8 bits (1 byte)
Unicode: 16 bits (2 bytes)
ASCII Code

used for teletypes before computers


128 characters in original ASCII
0 to 31 (decimal) control the machine
7 (BEL) rings bell
8 (BS) backspace key
10 (LF) line feed (go down 1 line)
13 (CR) carriage return (to left of page)
Java: '\n' = 10 (LF) only, '\r' = 13 (CR); a Windows line ending is "\r\n", 13 and 10 together (2 bytes)
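The control codes listed above can be inspected from Java, where each escape sequence stands for a single character. A small sketch follows; the class name ControlCharDemo and the helper code() are hypothetical.

```java
// Sketch: numeric codes of a few ASCII control characters in Java.
class ControlCharDemo {
    // Return the numeric code stored in a char.
    static int code(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println(code('\u0007'));  // 7  (BEL)
        System.out.println(code('\b'));      // 8  (BS)
        System.out.println(code('\n'));      // 10 (LF), one character
        System.out.println(code('\r'));      // 13 (CR), one character
        System.out.println("\r\n".length()); // 2: CR followed by LF
    }
}
```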
ASCII Characters


A = 41 hex (65 decimal), Z = 5A h (90)
a = 61 hex (97 decimal), z = 7A h (122)


see calculator (String or ASCII choices)
space character = 20 hex (32 decimal)


see how space character code is used in
browser Address textbox
; (semicolon) = 3B hex (59 decimal)
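As a rough check of the codes on this slide, the sketch below prints each character with its hex and decimal code. The class AsciiCodes and the method describe() are illustrative names, not part of the course material.

```java
// Sketch: show the hex and decimal ASCII code of a character.
class AsciiCodes {
    // Build a line such as: 'A' = 41 hex (65 decimal)
    static String describe(char c) {
        String hex = Integer.toHexString(c).toUpperCase();
        return "'" + c + "' = " + hex + " hex (" + (int) c + " decimal)";
    }

    public static void main(String[] args) {
        // The characters mentioned on the slide, including the space.
        for (char c : "AZaz ;".toCharArray()) {
            System.out.println(describe(c));
        }
    }
}
```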
Printable ASCII Characters
(space)
ASCII image is from Wikipedia
ASCII Numbers

codes are for characters on screen and
do NOT equal the values of the
characters

Code numeric values can NOT be used in
calculations without adjustments
0 = 30 hex (ASCII 0 is really 48 decimal)
9 = 39 hex (57 decimal)
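The sketch below illustrates why digit characters cannot be used in calculations without an adjustment: adding two digit characters adds their codes, not their values. The names DigitCodes and digitValue are assumptions made for this example.

```java
// Sketch: a digit character's code is not its mathematical value.
class DigitCodes {
    // Subtract the code of '0' (48) to get the value 0 to 9.
    static int digitValue(char digit) {
        return digit - '0';
    }

    public static void main(String[] args) {
        char five = '5';   // code 53
        char three = '3';  // code 51
        System.out.println(five + three);                         // 104 (53 + 51)
        System.out.println(digitValue(five) + digitValue(three)); // 8
    }
}
```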
Unicode

ASCII is a 7-bit encoding scheme (8 bits in extended versions)


128-256 character limit
Unicode is a 16-bit scheme



Uni comes from the word universal
can code 65,536 characters (actually more)
Java uses Unicode encoding so that it can
be used for many different languages
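The "65,536 characters (actually more)" point can be seen from Java's own constants: a char holds 16 bits, and characters above that range take two chars. A minimal sketch, with UnicodeRange and charsNeeded as made-up names:

```java
// Sketch: Java char size and the full Unicode range.
class UnicodeRange {
    // Number of 16-bit chars Java needs to store one Unicode code point.
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(Character.SIZE);            // 16 (bits per char)
        System.out.println((int) Character.MAX_VALUE); // 65535
        System.out.println(Character.MAX_CODE_POINT);  // 1114111
        System.out.println(charsNeeded(0x41));         // 1 (the letter A)
        System.out.println(charsNeeded(0x1F600));      // 2 (beyond 16 bits)
    }
}
```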
Unicode - 2

Unicode characters for many languages


Western alphabets: Latin (English), Greek,
Cyrillic (Russian), etc.
Unicode uses 00000000 + ASCII for English


00000000 01000001 = A (65 decimal)
Asian characters: CJK (Chinese, Japanese,
Korean) has over 20,000 characters

many character systems require installing
special fonts onto user's computer
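To see the 16-bit pattern for A shown above, a Java char can be printed as two groups of 8 bits. This is a sketch only; SixteenBitDemo and toBits16 are hypothetical names.

```java
// Sketch: print a Java char as 16 binary digits in two bytes.
class SixteenBitDemo {
    static String toBits16(char c) {
        // Pad the binary form on the left to 16 digits.
        String bits = String.format("%16s", Integer.toBinaryString(c))
                            .replace(' ', '0');
        return bits.substring(0, 8) + " " + bits.substring(8);
    }

    public static void main(String[] args) {
        System.out.println(toBits16('A'));      // 00000000 01000001
        System.out.println(toBits16('\u03B1')); // 00000011 10110001 (Greek alpha)
    }
}
```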
Using Unicode in Java
char letter = 'A' ; //easiest way
char letter = '\u0041' ; // also = 'A'
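char letter = '\u3220' ; // Chinese character for 1, in parentheses
// or '\u3280' ; // Chinese character for 1, in a circle
// the plain Chinese character for 1 is '\u4E00'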


\ (backslash) = escape character
\u means Unicode (#s are in hexadecimal)
char sound = '\u0007' ; // BEL
sounds speakers when "printed" to screen
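The declarations above can be combined into one small program. In this sketch the variable names differ so that it compiles as a single unit, and the helper label() is an assumption added for illustration. Whether the Chinese characters display correctly depends on the console's fonts and encoding, so only their code labels are printed.

```java
// Sketch: Unicode escapes in Java char literals.
class UnicodeEscapes {
    // Format a char as a label such as U+0041.
    static String label(char c) {
        return String.format("U+%04X", (int) c);
    }

    public static void main(String[] args) {
        char plain = 'A';        // easiest way
        char escaped = '\u0041'; // same character, written as an escape
        System.out.println(plain == escaped); // true
        System.out.println(label(escaped));   // U+0041

        char parenthesizedOne = '\u3220'; // from the slide
        char plainOne = '\u4E00';         // plain character for 1
        System.out.println(label(parenthesizedOne)); // U+3220
        System.out.println(label(plainOne));         // U+4E00

        char sound = '\u0007'; // BEL
        System.out.println((int) sound); // 7
    }
}
```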
Review Questions




How many bits are there in ASCII code?
How many bits are there in Unicode?
True or False: All ASCII codes can be
seen as characters on the screen
How many characters can be printed
using ASCII? Using Unicode? (match 2)

around 90, around 12,000, over 50,000
Review Questions - 2



Why was Unicode created to handle
over 50,000 characters?
Give an example of what some nonprintable ASCII character does on a
computer or screen
How does Java code need to handle
calculations on numeric characters
entered on the screen by the user
Review Questions - 3


Is a space a character?
What is the Chinese character for the
number 1? 2? 3?


this will NOT be on a test!
see answers on next slide
Chinese Characters: 3, 2 and 1
三 (3), 二 (2), 一 (1)
Appendix

the following slides show how ASCII
characters can be read from the
keyboard and converted to values that
can be used for mathematical
calculations
Reading Characters in DOS
int iInit = System.in.read() ;


gets numeric value of character it reads
if character is A, iInit = 65 (decimal)
char cInit = (char) System.in.read() ;

(char) "casts" (converts) numeric value to
character type
System.out.println(iInit) ; //number
System.out.println(cInit) ; //character
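The reads above can be tried without typing anything by feeding a simulated keyboard stream. In this sketch the class ReadCharDemo and the helper readCode() are hypothetical, and the ByteArrayInputStream stands in for System.in.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Sketch: reading a character as a number and as a char.
class ReadCharDemo {
    // Read one byte and return its numeric code (-1 at end of input).
    static int readCode(InputStream in) {
        try {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Simulated input; pass System.in to read from a real console.
        InputStream in = new ByteArrayInputStream(
                "AB".getBytes(StandardCharsets.US_ASCII));
        int iInit = readCode(in);         // numeric value of 'A'
        char cInit = (char) readCode(in); // cast the next code to a char
        System.out.println(iInit); // 65
        System.out.println(cInit); // B
    }
}
```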
Reading Characters in Java - 2


2 characters sent when hit Enter key
CR (13) and then LF (10 decimal)
when accepting keyboard input from
DOS window in Java, need to "absorb"
both characters from Enter keystroke
System.in.read(); System.in.read();


reads characters, doesn't store (=) them
program is now ready to read next input
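A sketch of the "absorb" step follows, assuming Windows-style input where Enter sends CR and then LF. On systems where Enter sends only LF, the second absorbing read would swallow a real character. The names EnterKeyDemo and readTwoInitials are made up, and the input is simulated.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Sketch: discard the CR and LF left behind by the Enter key.
class EnterKeyDemo {
    // Read a character, absorb CR and LF, then read the next character.
    static String readTwoInitials(InputStream in) {
        try {
            char first = (char) in.read();
            in.read(); // absorb CR (13); result is not stored
            in.read(); // absorb LF (10); result is not stored
            char second = (char) in.read();
            return "" + first + second;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Simulates typing R, Enter, W, Enter in a DOS window.
        InputStream in = new ByteArrayInputStream(
                "R\r\nW\r\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(readTwoInitials(in)); // RW
    }
}
```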
Using Characters for Math




numbers (characters) read from
keyboard have numeric values
need to convert character's decimal
value to its mathematical value
0 = 30 h (48 decimal), 9 = 39 h (57)
math value = decimal value - 48
int quantity = System.in.read() - 48 ;
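The conversion above can be wrapped in a small helper. The range check is an addition that is not on the slide, and the names DigitInput and readDigit are hypothetical. Input is simulated so the sketch runs as written.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Sketch: turn a typed digit character into a number usable in math.
class DigitInput {
    static int readDigit(InputStream in) {
        try {
            int code = in.read();
            // Added check (an assumption): accept only codes 48 to 57.
            if (code < '0' || code > '9') {
                throw new IllegalArgumentException("not a digit: code " + code);
            }
            return code - 48; // math value = decimal value - 48
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Simulates the user typing 7; pass System.in for real input.
        InputStream in = new ByteArrayInputStream(
                "7".getBytes(StandardCharsets.US_ASCII));
        int quantity = readDigit(in);
        System.out.println(quantity);     // 7
        System.out.println(quantity * 2); // 14
    }
}
```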