0 - Ohio State Computer Science and Engineering

CSE 360: Introduction to Computer
Systems
Course Notes
Bettina Bair ([email protected])
http://carmen.osu.edu
http://www.cse.ohio-state.edu/~bbair
Copyright © 1998-2006 by Rick Parent, Todd Whittaker, Bettina Bair, Pete Ware, Wayne Heym
CSE360
1
Section Details



MTWF 9:30 & 2:30, DL 305
Bettina Bair ([email protected])
Homepage:
– http://www.cse.ohio-state.edu/~bbair


Office: Dreese Labs 493
Hours: MW 10:30, TF 1:30
– or by appointment


Phone: 292-2565
Grader:
– Hamid Ettefagh ([email protected])
– John Colvin ([email protected])
Topics of Discussion
• Course description
• Required texts
• Policies
• Syllabus
• Expectations
Description:

Introduction to computer architecture at the
machine language and assembly language
level; assembly language programming and
lab.

Prerequisites: CSE 214 or 222 or H222
Text:
1. Computer Systems: Architecture, Organization, and Programming, Arthur B. Maccabe, Irwin, 1993.
2. Sparc Architecture, Assembly Language Programming & C, Richard Paul, Prentice Hall – a good reference, if you are interested
3. Class handouts
4. Material online at http://carmen.osu.edu
Grading Policy:





An assigned grader will grade all homeworks and labs –
your lecturer will grade all exams.
Missed assignments or tests without prior approval will
receive a grade of zero.
Reasonable excuses must be given in writing to me one
week prior to the due date or test date, at which time the
circumstances will be evaluated, and approval granted or
rejected.
No late homeworks or labs will be accepted.
Exams are closed book, closed notes, and cover all of the
material up to that point.
Grading Weights:
Homeworks (6)   25%   as assigned
Labs (3)        25%   as assigned
Midterm         20%   around the 6th week
Final           30%   as indicated in master schedule
Grading Scale - to be determined
Students with Disabilities
If you need an accommodation based on the
impact of a disability, please contact me to arrange
an appointment as soon as possible.
 Office for Disability Services

– verifies the need for accommodations
– Helps develop accommodation strategies.

If you have not previously contacted the Office for
Disability Services, I encourage you to do so.
Academic Misconduct
Academic misconduct is defined as any activity
which tends to compromise the academic
integrity of the institution, or subvert the
educational process.
 University policy requires that all cases of
suspected academic misconduct be submitted to
the Committee for Academic Misconduct for a
hearing and evaluation.

– Any academic misconduct will be dealt with via the
appropriate University authorities.
Academic Misconduct
Homework, lab assignments, and exams are to be
your own work.
 High-level discussion of assignments is
encouraged, but the more specific your discussion,
the closer you come to cheating.

– The policy on collaboration with others is fairly liberal
-- but please don't be tempted to test its limits.
Academic Misconduct
You may not write or otherwise record any part of
your solution to an assignment while someone is
helping you.
 You may not take a physical or electronic copy of
any part of a solution to an assignment from
anyone.
 You may not give a physical or electronic copy of
any part of a solution to an assignment to anyone.

Academic Misconduct

You are encouraged to talk with others (especially
others in the class) about the design, logic, and
implementation of a program.
– Do not give anyone or take from anyone written or
recorded material
– Do write up your own solution without assistance.

Professional ethics:
– You may not turn in an assignment solution from a
previous quarter's offering of the course
Expectations
Read your e-mail
 Read, reply to the class discussion group on
Carmen
 Attend class (it’s correlated to results!)
 Complete homeworks and labs on time
 Read the assigned pages from the text

Can I change my section?

Not until Brutus updates
– at the end of the first week
– only if there are seats available.

Priority will be given
– CSE Majors that are Graduating Seniors
– CSE Majors
– People who attend class the first week
Can I work on assignments from home?
Submission via Carmen “dropbox”
 HW: MS Word, PDF, or text format
 Labs:

– Submitted as text formatted source (*.s) file
– Require access to ISEM application

Available thru your CSE account: stdsun.cse.ohio-state.edu
– SSH, telnet and file transfer (ftp) protocols are useful
– Read more about remote access on Carmen

ISEM may also be available online – where? How? I don’t know.
Who do I approach if I have a problem with grading?
• For labs and homework, contact your grader first
– See me if not resolved
• For exams, contact me
The Carmen Discussion Group
carmen.osu.edu
 It’s a place for students to discuss issues related to
course work.
 Post any questions you might have.
 Use discretion when making a posting.
 Look out for important announcements.
 Instructors/Graders answer questions whenever
they can.

Course Objectives

Principles of Computer Organization and
Architecture
– Basic Machine Representation of Signed
Integers, Character Strings, Arrays, Stacks,
Records, Linked Lists;

Assembly Language Programming.
– Fundamentals of Computer Instruction Set
Architectures;
– Low Level Algorithms for Data Manipulation and
Conversion and Parameter Passing
150+ Years of Amazing Computers
Sherman, set the WABAC Machine to the year 1822…
Babbage’s Difference Engine, 1822
Babbage's
difference
engine No. 2,
finally built in
1991
Could hold 7
numbers of 31
decimal digits
Could tabulate 7th
degree
polynomials
Ada Lovelace, the first programmer
Mathematician, Patron
 Wrote a program for
Babbage’s (theoretical)
Analytical Engine to
calculate the Bernoulli
sequence, in 1843
 In 1979, a
contemporary
programming language
was named Ada in her
honour.

1890: Hollerith Tabulating System


Census Counter
Hollerith Tabulating
System Was A System Of
Machines
– Punch,
– Tabulator
– Sorting Box

Hollerith's Business
Joined A Firm That Later
Became IBM.
1943-45: ENIAC
Electronic Numerical Integrator And Computer
Built To Compute Ballistics Tables For U.S. Army Artillery During World War II.
– 1,000 Times Faster Than Any Existing Device.
External Plug Wires Used To Program The Machine
Principal Designers, J. Presper Eckert And John Mauchly
Cost, About $400,000
Vacuum Tubes
 ENIAC
– Used Some 18,000
Vacuum Tubes.
– 30 Feet By 50 Feet
– Weighed 30 Tons
The ENIAC was a decimal machine!
Programming the Eniac
Original Eniac Programmers
The Bug



In 1947, engineers found
A moth stuck in one of the
components.
Taped it in their logbook
Labeled it "first actual
case of bug being found."
Grace Hopper (1906-1992)

1953: Invented The Compiler
– Translates English Language Instructions Into Language Of The Target Computer
– "Lazy" And Hoped That "The Programmer May Return To Being A Mathematician."
Led To The Development Of The Business Language Cobol.
Retired From The U.S. Navy As A Rear Admiral.
IAS (1946-1952)
 Institute
For Advanced
Study At Princeton
University.
 Designed And
Directed By John Von
Neumann.
 Cost: Several Hundred
Thousand Dollars.
Used externally stored programs that could be loaded and executed.
1949: Core Memory
A Small Ring, Or Core, Of Ferrite (A Ferromagnetic Ceramic) Can Be Magnetized In Either Of Two Opposite Directions.
A Core Can Be Used For Storing One Bit Of Information.
For Almost 15 Years, 'Core' Was The Most Important Memory Device.
The Invention Of Core Memory Was A Leap Forward In Cost-effectiveness And Reliability.
1950s Assembly Programming Class
This would be
so much easier
with a
computer…
1965: PDP8





Programmed Data Processor
50,000+ Sold
Cost: $18,000.
Speed: 1.5 Micro-second Cycle
Time
Primary Memory: 4K
– 12-bit Word Core Memory

Power: 780 Watts
What does cycle time mean?
1960s/70s Card Reader
Card is pre-printed with FORTRAN field layouts
1977: TRS-80
Radio Shack "Trash-80,"
4K Of Memory
Could Not Handle Lowercase Letters
Only Three Error Messages:
– "HOW?" Whenever The User Tried To Perform An Illegal Function
– "What" When A Syntax Error Occurred
– "Sorry" When The Available Memory Ran Out
Cost Only $400!
Some 55,000 Machines Sold In First Year
1979: Vic-20



Processor Speed: 1.0227 Mhz.
ROM: 16kb
RAM: 5kb (3.5kb User Memory)
– Expandable To 32kb.

Screen: 22 Columns By 23
Rows.
– Character Dot Matrix: 8 By 8 Or
8 By 16 (User Programmable).
– Screen Dot Matrix: 176 By 184
With Up To 16 Colors.


Sound: 3 Voices Plus White
Noise.
Media: Tape Drive
Bettina’s first PC!
1984: Macintosh

Revolutionary Graphical User Interface (GUI).
– A Device Called A Mouse
– Pictorial Symbols (Icons) On The Screen.
– Select Commands, Call Up Files, Start Programs, Etc.
Original Selling Price: $2,495
What if you had to build your own
computer – from scratch?
Course Objectives


Understanding the architecture (how the
computer executes assembly language
instructions) is the more important aspect of a
course at this level.
The fundamental concept to understand is that
everything in the computer is represented by ones
and zeros (by electric current flowing or not
flowing at a specific place, or by something being
magnetized one direction or the other, etc.).
Course Objectives


At the lowest level, this course will cover various
binary formats of assembly language instructions
and various ways in which data can be
represented using ones and zeros and how these
can be organized into a program.
At high levels, assembly language programming
techniques will be studied and a specific
assembly language will be used to illustrate these
techniques.
Homework #0-0
Log into Carmen
See if you can find the following:
– Contact information for your instructor.
– Course policy on late assignments
– Course notes (slides)
– Reading assignment for the second class-meeting
– Dropbox and deadline for first homework
– Story of Mel, A Real Programmer in the discussion group
Homework #0-1



Purchase the textbook written by Maccabe.
Read the assigned material for the week
Pledge to do the reading assignment before
each class meeting.
Homework #0-10
Login to your CS unix account, on stdsun.cse.ohio-state.edu.
Your default password is the last four digits of your social security number followed by your first and last initials.
– For example, Luke Skywalker, whose social security number is 123-45-6789, has a password of 6789ls.
In a CSE laboratory room, you will have to log in to the Windows PC first.
– Your initial password there is the same as for UNIX except that it has an additional exclamation mark (‘!’) at the end. Luke Skywalker’s initial Windows password is 6789ls!
Make a Table on an Index Card

Show Different Representations of Numeric
Values.
– Column Headings Should be:
Decimal Octal Hexadecimal Binary
One Row for Each Numeric Value.

Show, in Increasing Order,
– Representations for 0, 1, 2, 3, 4, … 20
– Then, $2^5$, $2^6$, … $2^{16}$
– Finally $2^{20}$, $2^{30}$, $2^{31}$, $2^{32}$
For Example,
Note       Decimal  Octal  Hex  Binary  Roman  Nat’l Lang
           0        0      0    0              zero
$2^0$      1        1      1    1       I      one
$2^1$      2        2      2    10      II     two
..         And so on.
           20       24     14   10100   XX     twenty
$2^5$      32       40     20   100000  XXXII
..         And so on.
$2^{16}$
$2^{20}$
$2^{30}$
$2^{31}$
$2^{32}$
Information Representation 1

Positional Number Systems: position of character
in string indicates a power of the base (radix).
Common bases: 2, 8, 10, 16. (What base are we
using to express the names of these bases?)
– Base ten (decimal): digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 form
the alphabet of the decimal system.
E.g., $316_{10}$ =
– Base eight (octal): digits 0, 1, 2, 3, 4, 5, 6, 7 form the
alphabet.
E.g., $474_8$ =
Information Representation 2
– Base 16 (hexadecimal): digits 0-9 and A-F.
E.g., $\mathrm{13C}_{16}$ =
– Base 2 (binary): digits (called “bits”) 0, 1 form the
alphabet.
E.g., 100110 =
– In general, radix r representations use the first r chars in
{0…9, A...Z} and have the form $d_{n-1}d_{n-2} \dots d_1d_0$.
Summing $d_{n-1}r^{n-1} + d_{n-2}r^{n-2} + \dots + d_0r^0$ will convert
to base 10. Why to base 10?
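The summation on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `to_decimal` and the constant `DIGITS` are made up for this sketch and are not part of the course material.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # first r characters are the base-r alphabet

def to_decimal(text: str, radix: int) -> int:
    """Evaluate d_{n-1} r^{n-1} + ... + d_0 r^0 for a digit string."""
    value = 0
    for ch in text.upper():
        digit = DIGITS.index(ch)
        if digit >= radix:
            raise ValueError(f"{ch!r} is not a base-{radix} digit")
        # Horner's rule: multiplying the running total by the radix shifts
        # every earlier digit up by one power, which equals the power sum.
        value = value * radix + digit
    return value

print(to_decimal("474", 8))    # 4*64 + 7*8 + 4
print(to_decimal("13C", 16))   # 1*256 + 3*16 + 12
```

Both calls print the same number, which shows that the octal and hexadecimal examples on these slides name one value.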
Information Representation 3

Base Conversions
– Convert to base 10 by multiplication of powers
E.g., $10012_5 = (\quad)_{10}$
– Convert from base 10 by repeated division
E.g., $632_{10} = (\quad)_8$
– Converting base x to base y: convert base x to base 10
then convert base 10 to base y
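A minimal sketch of the repeated-division method, and of the two-step "base x to base 10 to base y" route, might look like this. The names `from_decimal` and `convert` are hypothetical; the first step uses Python's built-in `int(text, base)`.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def from_decimal(value: int, radix: int) -> str:
    """Repeated division: each remainder is the next digit, least significant first."""
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        value, remainder = divmod(value, radix)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))  # remainders come out in reverse order

def convert(text: str, from_radix: int, to_radix: int) -> str:
    """Base x to base y by way of an ordinary integer."""
    return from_decimal(int(text, from_radix), to_radix)

print(from_decimal(632, 8))
print(convert("10012", 5, 8))
```

The two printed strings are identical, since the base-5 and base-10 examples on the slide denote the same value.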
Information Representation 4
– Special case: converting among binary, octal, and
hexadecimal is easier
Go through the binary representation, grouping in sets of 3 or 4.
E.g., $11011001_2$ = 11 011 001 = $331_8$
$11011001_2$ = 1101 1001 = $\mathrm{D9}_{16}$
E.g., $\mathrm{C3B}_{16} = (\quad)_8$
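The grouping trick can be sketched as follows. The helper names `regroup` and `hex_to_octal` are assumptions made for this illustration.

```python
def regroup(bits: str, group: int) -> str:
    """Read a bit string in groups of 3 (octal) or 4 (hexadecimal)."""
    width = -(-len(bits) // group) * group      # round length up to a multiple of group
    padded = bits.zfill(width)                  # pad on the left with zeros
    chunks = [padded[i:i + group] for i in range(0, width, group)]
    return "".join("0123456789ABCDEF"[int(chunk, 2)] for chunk in chunks)

def hex_to_octal(hex_text: str) -> str:
    """Expand each hex digit to 4 bits, then regroup the bits in threes."""
    bits = "".join(format(int(ch, 16), "04b") for ch in hex_text)
    return regroup(bits, 3).lstrip("0") or "0"

print(regroup("11011001", 3))   # 331
print(regroup("11011001", 4))   # D9
```

No arithmetic on the whole number is needed; each group of bits is translated on its own, which is why these conversions are easier than going through base 10.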
Information Representation 5

What is special about binary?
– The basic component of a computer system is a
transistor (transfer resistor): a two state device which
switches between logical “1” and “0” (actually
represented as voltages on the range 5V to 0V).
– Octal and hexadecimal are bases in powers of 2, and are
used as a shorthand way of writing binary. A
hexadecimal digit represents 4 bits, half of a byte.
1 byte = 8 bits. A bit is a binary digit.
– Get comfortable converting among decimal, binary,
octal, hexadecimal. Converting from decimal to
hexadecimal (or binary) is easier going through octal.
Information Representation 6
Binary  Hex  Decimal    Binary  Hex  Decimal
0000    0    0          1000    8    8
0001    1    1          1001    9    9
0010    2    2          1010    A    10
0011    3    3          1011    B    11
0100    4    4          1100    C    12
0101    5    5          1101    D    13
0110    6    6          1110    E    14
0111    7    7          1111    F    15
Information Representation 7

Ranges of values
– Q: Given k positions in base n, how many values can
you represent?
– A: $n^k$ values over the range $(0 \dots n^k-1)_{10}$
n=10, k=3: $10^3=1000$, range is $(0 \dots 999)_{10}$
n=2, k=8: $2^8=256$, range is $(0 \dots 255)_{10}$
n=16, k=4: $16^4=65536$, range is $(0 \dots 65535)_{10}$
– Q: How are negative numbers represented?
Information Representation 8

Integer representation:
– Value and representation are distinct. E.g., 12 may be
represented as XII, $\mathrm{C}_{16}$, $12_{10}$, and $1100_2$. Note: -12 may
be represented as $-\mathrm{C}_{16}$, $-12_{10}$, and $-1100_2$.
– Simple and efficient use of hardware implies using a
specific number of bits, e.g., a 32-bit string, in a binary
encoding. Such an encoding is “fixed width.”
– Four methods: (fixed-width) simple binary, signed
magnitude, binary coded decimal, and 2’s complement.
– Simple binary: as seen before, all numbers are assumed
to be positive, e.g., 8-bit representation of
$66_{10} = 0100\ 0010_2$ and $194_{10} = 1100\ 0010_2$
Information Representation 9
– Signed magnitude: simple binary with leading sign bit.
0 = positive, 1 = negative. E.g., 8-bit signed mag.:
$66_{10} = 0100\ 0010_2$
$-66_{10} = 1100\ 0010_2$
What ranges of numbers may be expressed in 8 bits?
Largest:
Smallest:
Extend 1100 0010 to 12 bits:
Information Representation 10
Problems: (1) Compare the signed magnitude numbers
1000 0000 and 0000 0000. (2) Must have “subtraction”
hardware in addition to “addition” hardware.
– Binary Coded Decimal (BCD): use a 4 bit pattern to
express each digit of a base 10 number
0000 = 0    0001 = 1    0010 = 2    0011 = 3
0100 = 4    0101 = 5    0110 = 6    0111 = 7
1000 = 8    1001 = 9    1010 = +    1011 = -
E.g.,
 123 : 0000 0001 0010 0011
+123 : 1010 0001 0010 0011
-123 : 1011 0001 0010 0011
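The BCD table above translates directly into a lookup. The sketch below follows the slide's layout (an optional sign code first, then one 4-bit pattern per decimal digit, padded with 0000 to a fixed number of patterns). The function name `to_bcd` and its `nibbles` parameter are assumptions made for illustration.

```python
SIGN = {"+": "1010", "-": "1011"}   # sign codes taken from the slide's table

def to_bcd(text: str, nibbles: int = 4) -> str:
    """Encode a decimal string such as '123' or '-123' as 4-bit BCD patterns."""
    sign_code = []
    if text[0] in SIGN:
        sign_code = [SIGN[text[0]]]
        text = text[1:]
    digit_codes = [format(int(ch), "04b") for ch in text]   # one pattern per digit
    padding = ["0000"] * (nibbles - len(sign_code) - len(digit_codes))
    return " ".join(sign_code + padding + digit_codes)

for example in ("123", "+123", "-123"):
    print(example, ":", to_bcd(example))
```

Each digit is encoded separately, so no conversion of the whole number to binary ever takes place.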
Information Representation 11
BCD Disadvantages:
– Takes more memory. 32 bit simple binary can represent more
than 4 billion discrete values. 32 bit BCD can hold a sign and
7 digits (or 8 digits for unsigned values) for a maximum of
110 million values, a 97% reduction.
– More difficult to do arithmetic. Essentially, we must force the
Base 2 computer to do Base 10 arithmetic.
BCD Advantages:
– Used in business machines and languages, i.e., in COBOL for
precise decimal math.
– Can have arrays of BCD numbers for essentially arbitrary
precision arithmetic.
Information Representation 12
– Two’s Complement
Used by most machines and languages to represent integers. Fixes the -0 in the signed magnitude, and simplifies machine hardware arithmetic.
Divides bit patterns into a positive half and a negative half (with zero considered positive); n bits creates a range of $[-2^{n-1} \dots 2^{n-1}-1]$.
CODE  Simple  Signed  2’s comp
0000  0       +0      0
0001  1       1       1
0010  2       2       2
0011  3       3       3
0100  4       4       4
0101  5       5       5
0110  6       6       6
0111  7       7       7
1000  8       -0      -8
1001  9       -1      -7
1010  10      -2      -6
1011  11      -3      -5
1100  12      -4      -4
1101  13      -5      -3
1110  14      -6      -2
1111  15      -7      -1
Information Representation 13
– Representation in 2’s complement; i.e., represent i in
n-bit 2’s complement, where $-2^{n-1} \le i \le +2^{n-1}-1$
Positive numbers: same as simple binary
Negative numbers:
– Obtain the n-bit simple binary equivalent of | i |
– Obtain its negation as follows:
• Invert the bits of that representation
• Add 1 to the result
Ex.: convert $-320_{10}$ to 16-bit 2’s complement
Ex.: extend the 12-bit 2’s complement number
1101 0111 1000 to 16 bits.
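The invert-and-add-one recipe can be written out step by step. This is a sketch under the slide's definitions; `twos_complement` and `sign_extend` are names invented here, and sign extension is shown as the usual "repeat the sign bit" rule.

```python
def twos_complement(i: int, n: int) -> str:
    """Return the n-bit 2's complement bit string for i."""
    if not -(1 << (n - 1)) <= i <= (1 << (n - 1)) - 1:
        raise OverflowError(f"{i} does not fit in {n} bits")
    if i >= 0:
        return format(i, f"0{n}b")                 # same as simple binary
    magnitude = format(-i, f"0{n}b")               # n-bit simple binary of |i|
    inverted = "".join("1" if b == "0" else "0" for b in magnitude)
    return format((int(inverted, 2) + 1) % (1 << n), f"0{n}b")   # add 1

def sign_extend(bits: str, n: int) -> str:
    """Widen a 2's complement bit string to n bits by copying its sign bit."""
    return bits[0] * (n - len(bits)) + bits

print(twos_complement(-6, 4))       # matches the 1010 row of the 4-bit table
print(sign_extend("1010", 8))
```

The same two functions can be used to check the slide's exercises once they have been worked by hand.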
Information Representation 14

Binary Arithmetic
– Addition and subtraction only for now
– Rules: similar to standard addition and subtraction, but
only working with 0 and 1.
0 + 0 = 0      0 - 0 = 0
1 + 0 = 1      1 - 0 = 1
0 + 1 = 1      1 - 1 = 0
1 + 1 = 10     10 - 1 = 1
– Must be aware of possible overflow.
Ex.: 8-bit signed magnitude 0101 0110 + 0110 0011 =
Ex.: 8-bit signed magnitude 0101 0110 - 0110 0011 =
Information Representation 15

2’s Complement binary arithmetic
– Addition and subtraction are the same operation
– Still must be aware of overflow.
Ex.: 8 bit 2’s complement: $23_{10} + 45_{10}$ =
Ex.: 8 bit 2’s complement: $23_{10} - 45_{10}$ =
Ex.: 8 bit 2’s complement: $100_{10} + 45_{10}$ =
Information Representation 16
– 2’s Complement overflow
• Opposite signs on operands can’t overflow
• If operand signs are same, but result’s sign is
different, must have overflow
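The sign rule for overflow can be checked mechanically. The sketch below models an n-bit adder in Python by masking to n bits; the function name `add_twos_complement` is hypothetical, and subtraction is expressed as adding the negated operand, as the previous slide suggests.

```python
def add_twos_complement(a: int, b: int, n: int = 8):
    """Add two values as an n-bit 2's complement adder would.

    Returns (result as a signed integer, overflow flag).
    """
    mask = (1 << n) - 1
    sign = 1 << (n - 1)
    raw = (a + b) & mask                       # keep only the low n bits
    same_operand_signs = (a & sign) == (b & sign)
    overflow = same_operand_signs and (raw & sign) != (a & sign)
    result = raw - (1 << n) if raw & sign else raw   # reinterpret as signed
    return result, overflow

print(add_twos_complement(23, 45))     # fits in 8 bits
print(add_twos_complement(23, -45))    # opposite signs, cannot overflow
print(add_twos_complement(100, 45))    # same signs, result sign flips
```

The third call reports overflow because 145 lies outside the 8-bit range of -128 to 127.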
Information Representation 17

Characters and Strings
– EBCDIC, Extended Binary Coded Decimal Interchange Code
• Used by IBM in mainframes (360 architecture and descendants).
• Earliest system
– ASCII, American Standard Code for Information Interchange.
• Most common system
– Unicode, http://www.unicode.org
• New international standard
• Variable length encoding scheme with either 8- or 16-bit minimum
• “a unique number for every character, no matter what the platform, no
matter what the program, no matter what the language.”
Information Representation 18

ASCII
– see table 1.7 on pg. 18.
• In Unix, run “man ascii”.
– 7 bit code
• Printable characters for human interactions
• Control characters for non-human communication (computer-computer, computer-peripheral, etc.)
– 8-bit code: most significant bit may be set
• Extended ASCII (IBM), includes graphical symbols and lines
• ISO 8859, several international standards
• Unicode’s UTF-8, variable length code with 8-bit minimum
ASCII
• Easy to decode
– But takes up a predictable amount of space
• Upper and lower case characters are 0x20 ($32_{10}$) apart
• ASCII representation of ‘3’ is not the same as the
binary representation of 3.
– To convert ASCII to binary (an integer), ‘3’-‘0’ = 3
• Line feed (LF) character
– $000\ 1010_2$ = 0x0a = $10_{10}$
– ‘\n’ = 0xa
Character  ASCII Binary  ASCII Hex
‘ ’        010 0000      0x20
‘A’        100 0001      0x41
‘a’        110 0001      0x61
‘R’        101 0010      0x52
‘r’        111 0010      0x72
‘0’        011 0000      0x30
‘3’        011 0011      0x33
Information Representation 19

Decode:
1000001, 1010011, 1000011, 1001001, 1001001, 0100000, 1101001,
1110011, 0100000, 1100101, 1100001, 1110011, 1111001, 0000000
– Or (in hex):
41 53 43 49 49 20 69 73 20 65 61 73 79 00
How many bytes is this?
What’s the use of the ’00’?
String definition is programming language dependent.
C, C++: strings are arrays of characters terminated by a null byte.
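The decoding exercise can be checked with a short script. This is a sketch only: `decode_c_string` is a made-up helper that treats the byte sequence the way C treats a string, stopping at the first null byte.

```python
HEX_DUMP = "41 53 43 49 49 20 69 73 20 65 61 73 79 00"

def decode_c_string(hex_dump: str) -> str:
    """Decode ASCII bytes up to (not including) the terminating null byte."""
    data = bytes.fromhex(hex_dump)     # whitespace between byte pairs is skipped
    end = data.index(0)                # position of the null terminator
    return data[:end].decode("ascii")

print(len(bytes.fromhex(HEX_DUMP)), "bytes including the terminator")
print(repr(decode_c_string(HEX_DUMP)))

# The character arithmetic from the ASCII slide:
print(ord("3") - ord("0"))             # digit character to integer
print(hex(ord("a") - ord("A")))        # distance between cases
```

The null byte carries no character; it only marks where the string ends, which is why it is counted in the storage size but not in the decoded text.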
Information Representation 20

Simple data compression
– ASCII codes are fixed length.
– Huffman codes are variable length and based on
statistics of the data to be transmitted.
• Assign the shortest encoding to the most common character.
– In English, the letter ‘e’ is the most common.
– Either establish a Huffman code for an entire class of messages,
– Or create a new Huffman code for each message, sending/storing
both the coding scheme and the message.
• “a widely used and very effective technique for compressing
data; savings of 20% to 90% are typical, depending on the
characteristics of the file being compressed.” (Cormen, p. 337)
ECL - Expected Code Length
Char     Fixed len encoding  Freq  Var len encoding  # bits  Expected # bits
♥        00                  .5    1                 1       .5
♦        01                  .25   01                2       .5
♠        10                  .15   001               3       .45
♣        11                  .10   000               3       .3
Avg len  2                                                   1.75
Information Representation 21

Huffman Tree for “a man a plan a canal panama”
– Determine frequencies of letters (example ignores spaces)
Letter  Count  Frequency
‘a’     10     0.476190
‘c’     1      0.047619
‘l’     2      0.095238
‘m’     2      0.095238
‘n’     4      0.190476
‘p’     2      0.095238
– Create a forest of single node trees.
• Choose the two trees having the smallest total frequencies (the two
“smallest” trees)
• Merge them together (lesser frequency as the left subtree).
• Continue merging until only one tree remains.
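The merging procedure can be sketched with a priority queue. This is an illustration rather than the course's reference implementation: the function name `huffman_codes` is invented, ties between equal counts are broken by insertion order, and so individual codes may differ from the slides even though the total encoded length of a Huffman code is the same.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(message: str) -> dict:
    """Build a Huffman code by repeatedly merging the two smallest trees."""
    order = count()                      # tie-breaker so trees are never compared
    heap = [(n, next(order), ch) for ch, n in Counter(message).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, smaller = heapq.heappop(heap)
        n2, _, larger = heapq.heappop(heap)
        # lesser frequency becomes the left subtree
        heapq.heappush(heap, (n1 + n2, next(order), (smaller, larger)))
    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, str):
            codes[tree] = prefix or "0"  # single-symbol message edge case
        else:
            walk(tree[0], prefix + "1")  # '1' follows the left branch
            walk(tree[1], prefix + "0")  # '0' follows the right branch

    walk(heap[0][2], "")
    return codes

message = "a man a plan a canal panama".replace(" ", "")
codes = huffman_codes(message)
total_bits = sum(len(codes[ch]) for ch in message)
print(codes)
print(total_bits, "bits, versus", 3 * len(message), "bits at 3 bits per letter")
```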
Information Representation 22
Reading a ‘1’ calls for following the left branch.
Reading a ‘0’ calls for following the right branch.
Decoding using the tree: To decode ‘0001’, start at root and follow r_child, r_child, r_child, l_child, revealing encoded ‘m’.
Huffman Tree for "a man a plan a canal panama"
1.0
├─ 1: 'a' .4762
└─ 0: .5238
   ├─ 1: 'n' .1905
   └─ 0: .3333
      ├─ 1: .1428
      │  ├─ 1: 'c' .0476
      │  └─ 0: 'l' .0952
      └─ 0: .1905
         ├─ 1: 'm' .0952
         └─ 0: 'p' .0952
Information Representation 23

Comparison of Huffman and 3-bit code example
– 3-bit: 000 011000100 000 101010000100 000
001000100000010 101000100000011000 = 63 bits
– Huffman: 1 0001101 1 00000010101 1
001110110010 0000101100011 = 46 bits
– Savings of 17 bits, or 27% of original message
Letter  3-bit code  Huffman Code  Count  H length  3 length
‘a’     000         1             10     10        30
‘c’     001         0011          1      4         3
‘l’     010         0010          2      8         6
‘m’     011         0001          2      8         6
‘n’     100         01            4      8         12
‘p’     101         0000          2      8         6
Totals                                   46        63
Tree for: ABE DEFACED A FADED BED
Letter  freq
A       4/19
B       2/19
C       1/19
D       5/19
E       5/19
F       2/19
[Figure: Huffman tree with internal nodes 9/19, 10/19, 5/19, 3/19 and leaves A, F, E, C, D, B]
ECL for: ABE DEFACED A FADED BED
Letter  freq   code  ecl
A       4/19   11    8/19
B       2/19   1000  8/19
C       1/19   1001  4/19
D       5/19   01    10/19
E       5/19   00    10/19
F       2/19   101   6/19
ecl = 2.42
Use the same encodings to decode
11 10000011010001
11100100
1001111000
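The expected code length and the decoding step can both be sketched from the table above. The code table and counts are copied from the slide; the function names `expected_code_length`, `encode`, and `decode` are assumptions made for this illustration.

```python
from fractions import Fraction

CODES = {"A": "11", "B": "1000", "C": "1001", "D": "01", "E": "00", "F": "101"}
COUNTS = {"A": 4, "B": 2, "C": 1, "D": 5, "E": 5, "F": 2}

def expected_code_length(codes: dict, counts: dict) -> Fraction:
    """Sum of (relative frequency x code length) over all symbols."""
    total = sum(counts.values())
    return sum(Fraction(counts[ch], total) * len(codes[ch]) for ch in codes)

def encode(text: str, codes: dict) -> str:
    return "".join(codes[ch] for ch in text)

def decode(bits: str, codes: dict) -> str:
    """Read bits left to right, emitting a symbol whenever a code matches.

    This works because no code is a prefix of another.
    """
    lookup = {code: ch for ch, code in codes.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            symbols.append(lookup[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code")
    return "".join(symbols)

ecl = expected_code_length(CODES, COUNTS)
print(ecl, "=", round(float(ecl), 2))
print(decode(encode("FADED", CODES), CODES))
```

The same `decode` function can be applied to the three bit strings on the slide to check answers worked by hand.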
Parity: Simple error detection
• Data transmission, aging media, static interference, dust on media, etc. demand the
ability to detect errors.
– Ex.: send ASCII ‘S’: send 1010011, but
receive 1010010 (‘R’)?
• Single bit errors detected by using parity checking.
• Parity, here, is “the state of being odd or even.”
Information Representation 24

How to detect a 1-bit error:
– Add a 1-bit parity to make an odd or even number of
1 bits per byte.
Char  ASCII     Even parity  Odd Parity
‘S’   101 0011  0101 0011    1101 0011
‘E’   100 0101  1100 0101    0100 0101
– Parity bit is stripped by hardware after checking.
Sender/receiver both agree to odd or even parity.
– 2 flipped bits in the same encoding are not detected.
What if parity bit is flipped?
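A parity bit is just a count of 1 bits taken modulo 2. The following sketch reproduces the table above; `add_parity` and `check_parity` are hypothetical names, and the parity bit is placed in the leading position as on the slide.

```python
def add_parity(code7: str, even: bool = True) -> str:
    """Prefix a 7-bit code with a parity bit."""
    ones = code7.count("1")
    parity_bit = ones % 2 if even else 1 - ones % 2
    return str(parity_bit) + code7

def check_parity(byte8: str, even: bool = True) -> bool:
    """True if the count of 1 bits matches the agreed parity."""
    return byte8.count("1") % 2 == (0 if even else 1)

sent = add_parity("1010011")            # ASCII 'S' with even parity
one_error = "01010010"                  # last bit flipped in transit
two_errors = "01010000"                 # last two bits flipped
print(sent, check_parity(sent))
print(one_error, check_parity(one_error))
print(two_errors, check_parity(two_errors))
```

The last line shows the limitation noted on the slide: two flipped bits leave the parity unchanged, so the check passes even though the data is wrong.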
Information Representation 25

Two meanings for Hamming distance.
1. Specific. A count of the number of bits different in two
encodings.
E.g., dist(1100, 1001) =
dist(0101, 1101) =
2. General. The minimum over all distinct pairs in an
entire code.



The ASCII encoding scheme has a Hamming distance of 1.
A simple parity encoding scheme has a Hamming distance of 2.
Hamming distance serves as a measure of the
robustness of error checking (as a measure of the
redundancy of the encoding).
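Both meanings of Hamming distance can be sketched in a few lines. The names `hamming` and `code_distance` are invented for this illustration, and a 3-bit code stands in for ASCII to keep the example small.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Specific meaning: number of positions in which two encodings differ."""
    if len(a) != len(b):
        raise ValueError("encodings must have the same length")
    return sum(x != y for x, y in zip(a, b))

def code_distance(codewords) -> int:
    """General meaning: minimum distance over all distinct pairs in the code."""
    return min(hamming(a, b) for a, b in combinations(codewords, 2))

plain = [format(i, "03b") for i in range(8)]                    # every 3-bit pattern
with_parity = [str(w.count("1") % 2) + w for w in plain]        # even parity added

print(code_distance(plain))
print(code_distance(with_parity))
```

Adding the parity bit raises the minimum distance by one, which is exactly what lets a single flipped bit be detected.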
Basic Components 1

Terminology from Ch. 2:
– Flip flop: basic storage device that holds 1 bit
– D flip flop: special flip flop that outputs the last value
that was input to it (a data signal).
– Clock: two different meanings: (1) a control signal that
oscillates (low to high voltage) every x nanoseconds;
(2) the “write select” line for a flip flop.
[Figure: D Flip Flop with Data In, Clock, and Data Out; clock waveform marking one cycle]
Basic Components 2
– Register: collection of flip flops with parallel load.
Clock (or “write select”) signal controlled. Stores
instructions, addresses, operands, etc.
– Bus: Collection of related data lines (wires).
[Figure: 8 Bit Register with an 8-line Input Bus (d7 … d0), a Clock input, and an 8-line Output Bus]
Basic Components 3
– Combinational circuits: implement Boolean functions.
No feedback in the circuit, output is strictly a function
of input.

Gates: and, or, not, xor
[Figure: gate symbols for AND, OR, NOT, XOR]
E.g., xy + z
[Figure: circuit with inputs x, y, z and output f]
Basic Components 4
– Gates can be used in combination to implement a
simple (half) adder.
Addition creates a value, plus a carry-out.
$Z = X \oplus Y$
$CO = X \cdot Y$
X  Y  Z  CO
0  0  0  0
0  1  1  0
1  0  1  0
1  1  0  1
[Figure: half adder circuit with inputs X, Y and outputs Z, CO]
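The half adder's two equations map onto Python's bitwise operators, which makes the truth table easy to regenerate. The function name `half_adder` is an assumption for this sketch.

```python
def half_adder(x: int, y: int):
    """Return (Z, CO): the sum bit is XOR, the carry-out is AND."""
    return x ^ y, x & y

print("X Y Z CO")
for x in (0, 1):
    for y in (0, 1):
        z, co = half_adder(x, y)
        print(x, y, z, co)
```

Reading CO and Z together as a two-bit number gives the arithmetic sum of X and Y, which is the sense in which these two gates perform addition.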
Basic Components 5
– Sequential Circuits: introduce feedback into the circuit.
Outputs are functions of input and current state.
[Figure: flip flop with inputs D and C and output Q]
– Multiplexers: combinational circuits that use n bits to
select an output from $2^n$ input lines.
[Figure: 4 to 1 MUX with inputs i0, i1, i2, i3, select lines s0 s1, and output f]
Basic Components 6

Von Neumann
Architecture
– Can access either
instructions or data from
memory in each cycle.
– One path to memory
(von Neumann bottleneck)
– Stored program system. No
distinction between
programs and data
[Figure: Main Memory System linked by an Address Pathway and a Data and Instruction Pathway to the Operational Registers, Arithmetic and Logic Unit, Program Counter, Control Unit, and Input/Output System]
Basic Components 7
Examples of Von Neumann architecture to be
explored in this course:




SAM: tiny, good for learning architecture
MIPS: text’s example assembly language
SPARC: labs
M68HC11: used in ECE 567 (taken by CSE majors)
Roughly, the order of presentation in this course is as
follows:



A couple of days on the Main Memory System
Weeks on the Central Processing Unit (CPU)
Finish the course with the I/O System
Memory Subsystem – the busses
The number of elements depends on the size of the address bus.
• If k=3, how many addresses?
• If k=4, how many addresses?
# Addresses = $2^k$
[Figure: memory of n-bit addressable elements at addresses 000, 001, 010, 011, 100, 101, with a k-bit Address Bus and an n-bit Data Bus]
Memory Subsystem – the busses
Capacity depends on how many bits in each element, or the size of the data bus.
• If n=1 and k=3, how many bits? If n=2?
• If n=8 and k=3, how many Bytes?
Bit capacity = $2^k \cdot n$
[Figure: memory of n-bit addressable elements at addresses 000, 001, 010, 011, 100, 101, with a k-bit Address Bus and an n-bit Data Bus]
Memory Element & Address Sizes
• If a machine’s memory is 5-bit addressable, then, at each distinct
address, 5 bits are stored. The contents at each address are represented by 5 bits.
• If 3 bits are used to represent memory addresses, then the memory can have at
most $2^3 = 8$ distinct addresses.
• Such a memory can store at most $8 \times 5 = 40$ bits of data.
• If the data bus is 10 bits wide, then up to 10 bits at a time can be transferred
between memory and processor; this is a 10-bit word.
Address (Decimal)  Address (Binary)  Contents
0                  000               00011
1                  001               01111
2                  010               01110
3                  011               10100
4                  100               00101
5                  101               01110
6                  110               10100
7                  111               10011
Memory Subsystem - Addressability
Addressability is the size of the memory element
The size of the element may be smaller than the size of the data bus.
– If n=8, only 1 Byte Addressable
– If n=16, 1 or 2 Byte Addressable
How does Addressability affect capacity?
[Figure: memory of n-bit addressable elements at addresses 000, 001, 010, 011, 100, 101, with a k-bit Address Bus and an n-bit Data Bus]
Memory Subsystem - Addressing
Memory may be organized into banks, with bit labels
The GLOBAL address of each addressable element would be: [relative address] & [bank address]
Relative address  Bank 0  Bank 1
000               000 0   000 1
001               001 0   001 1
010               010 0   010 1
011               011 0   011 1
100               100 0   100 1
101               101 0   101 1
See the pattern that forms?
Memory Subsystem - Alignment
Data bus (32) is 4x the size of addressable element (8bit).
So, you may read (or write) one or more Bytes at a time…
But only from/to the same row of memory!
Row  Bank 00  Bank 01  Bank 10  Bank 11
000  000 00   000 01   000 10   000 11
001  001 00   001 01   001 10   001 11
010  010 00   010 01   010 10   010 11
011  011 00   011 01   011 10   011 11
100  100 00   100 01   100 10   100 11
101  101 00   101 01   101 10   101 11
Okay to read/write
2 Bytes from 10010?
2B from 01011?
4B from 01100?
4B from 00101?
Memory Subsystem - Alignment
Where are operands of various sizes positioned?
– 1 Byte Aligned
• on any address
– 2 Byte Aligned
• on “halfword” boundary
• addresses divisible by 2 (end in hex 0,2,4,6,8,A,C,E)
– 4 Byte Aligned
• on “word” boundary
• addresses divisible by 4 (end in hex 0,4,8,C)
[Figure: four-bank memory (Bank 00, Bank 01, Bank 10, Bank 11) with rows 000 to 101, 8bit elements, and a 32-line Data Bus]
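The alignment rule reduces to a divisibility test on the address. The sketch below is an illustration only; `is_aligned` is a made-up name, and the sample addresses are the 5-bit addresses used on the preceding slides.

```python
def is_aligned(address: int, size: int) -> bool:
    """An operand of `size` bytes is aligned when its address is divisible by size."""
    return address % size == 0

requests = [
    (0b10010, 2),
    (0b01011, 2),
    (0b01100, 4),
    (0b00101, 4),
]
for address, size in requests:
    print(format(address, "05b"), f"{size}B", is_aligned(address, size))
```

For sizes that are powers of two, the same test can be read off the address bits: a halfword address ends in 0 and a word address ends in 00, which are the bank bits in the figure.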
Basic Components 11

Byte ordering: how numeric data is stored in memory
– Ex.: $247896511_{10} = \mathrm{0EC699BF}_{16}$
– Stored at address 0
Big Endian: High order (big end) is at byte 0
Byte      0   1   2   3
Contents  0E  C6  99  BF
Little Endian: Low order (little end) is at byte 0
Byte      0   1   2   3
Contents  BF  99  C6  0E
Contrast with bit ordering
Bit    7  6  5  4  3  2  1  0
Value  1  0  1  1  1  1  1  1
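Python's built-in `int.to_bytes` makes the two byte orders easy to compare. The wrapper name `byte_layout` is an assumption for this sketch; the value is the one from the slide.

```python
def byte_layout(value: int, size: int, order: str) -> list:
    """List the bytes of `value` in memory order, starting at the lowest address."""
    return [format(b, "02X") for b in value.to_bytes(size, order)]

value = 0x0EC699BF
print(value)
for order in ("big", "little"):
    print(order, byte_layout(value, 4, order))

# Reading the bytes back with the wrong order gives a different number.
print(hex(int.from_bytes(value.to_bytes(4, "big"), "little")))
```

The last line illustrates why two machines must agree on byte order before exchanging binary data.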
Basic Components 12

Read/Write operations: must know the address to
read or write. (read = fetch = load, write = store)
• CPU puts address on address bus
• CPU sends read signal
– (R/W=1, CS=1)
– (Read/don’t Write, Chip Select)
• Wait
• Memory puts data on data bus
– reset (CS=0)
[Figure: memory chip with address lines A0, A1, … A(m-1), control lines CS and R/W, and data lines D0, D1, … D(n-1)]
Basic Components 13

Types of memory:
– ROM: Read Only Memory: non-volatile (doesn’t get erased when
powered down; it’s a combinational circuit!)
– PROM: Programmable ROM: use a ROM burner to write data to it
initially. Can’t be re-written.
– EPROM: Erasable PROM. Uses UV light to erase.
– EEPROM: Electrically Erasable PROM.
– RAM: Random access memory. Can efficiently read/write any
location (unlike sequential access memory). Used for main
memory.

Many variations (types) of RAM, all volatile
– SDRAM, DDR SDRAM
– RDRAM
– www.tomshardware.com
Instructional Sparc Emulator - ISEM

Editing, Assembling, Linking, and Loading
– There are three components to the Instructional SPARC Emulator
(ISEM) package that we use for this class:
 the assembler,
 the linker, and
 the emulator/debugger.
Instructional Sparc Emulator - ISEM –

Editing
– There are a number of programs that you can use to create your
source files.
 Emacs is probably the most popular;
 vi is also available, but its command syntax is difficult to learn
and use;
 using pine program, you can use the pico editor, which
combines many features of Emacs into a simple menu-driven
facility.
– Start Emacs by “xemacs sourcefile.s &”, which creates the file
called sourcefile.s.
– Use the tutorial, accessed by typing "Ctrl-H Ctrl-H t".
– For other editors, you are on your own.
Example Sparc Assembly Language
Instructions
% type xmp0.s
        .data                   ! Assembler directive: data starts here
A_m:    .word '?'
B_m:    .word 0x30
C_m:    .word 0
        .text                   ! Assembler directive, instructions start here
start:                          ! Label (symbolic constant) for this address
        set A_m, %r2            ! Put address A_m into register 2
        ld [%r2], %r2           ! Use r2 as an indirect address for a load (read)
        set B_m, %r3            ! Put address B_m into register 3
        ld [%r3], %r3           ! Read from B_m and replace r3 w/ value at addr B_m
        sub %r2, %r3, %r2       ! Subtract r3 from r2, save in r2
        set C_m, %r4            ! Put address C_m into register 4
        st %r2, [%r4]           ! Store (write) r2 to memory at address C_m
terminate:                      ! Label for address where 'ta 0' instruction stored
        ta 0                    ! Stop the program
beyond_end:                     ! Label for address beyond the end of this program

About the data section: A_m, B_m, and C_m are symbolic constants. Furthermore, each is an address of a certain-sized chunk of memory. Here, each chunk is four bytes (one word) long. When the program gets loaded, each of these chunks stores a number in 2's complement encoding, as follows: at address C_m, zero; at B_m, 48; at A_m, 0x3F = 077 = 63.
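To see what xmp0.s computes, here is a minimal Python sketch that mirrors its register and memory moves. It is not ISEM: the dictionary `memory` and the variables `r2` and `r3` are stand-ins chosen for illustration, and addresses are replaced by label names.

```python
# Word values that xmp0.s places in the .data section (labels stand in for addresses).
memory = {"A_m": ord("?"), "B_m": 0x30, "C_m": 0}

r2 = memory["A_m"]   # set A_m, %r2 ; ld [%r2], %r2
r3 = memory["B_m"]   # set B_m, %r3 ; ld [%r3], %r3
r2 = r2 - r3         # sub %r2, %r3, %r2
memory["C_m"] = r2   # set C_m, %r4 ; st %r2, [%r4]

# The same value of A_m in hexadecimal, octal, and decimal.
print(hex(memory["A_m"]), oct(memory["A_m"]), memory["A_m"])
print(hex(memory["C_m"]))
```

The value left in C_m is 0x3F - 0x30, which can be compared against a memory dump of the program after it has run.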
Instructional Sparc Emulator - ISEM

Assembling
– The assembler is called "isem-as", and is the GNU Assembler
(GAS), configured to cross-assemble to a SPARC object format.
– It is used to take your source code, and produce object code that
may be linked and run on the ISEM emulator.
– The syntax for invoking the assembler is:
isem-as [-a[ls]] sourcefile.s -o objectfile.o
– The input is read from sourcefile.s, and the output is written to
objectfile.o.
– The option "-a" tells the assembler to produce a listing file. The
sub-options "l" and "s" tell the assembler to include the assembly
source in the listing file and produce a symbol table, respectively.
Instructional Sparc Emulator - ISEM

The listing file
– Will identify all the syntactic errors in your program, and it will
warn you if it identifies "suspicious" behavior in your source file.
– Column 1 identifies a line number in your source file.
– Column 2 is an offset for where this instruction or data resides in
memory.
– Column 3 is the image of what is put in memory, either the
machine instructions or the representation of the data.
– The final column is the source code that produced the line.
– At the bottom of the file you will find the symbol table.
– Again, the symbols are represented as offsets that are relocated
when the program is loaded into memory.
isem-as -als labn.s -o labn.o >! labn.lst
   1                            .data
   2  0000  0000003F    A_m:    .word '?'
   3  0004  00000030    B_m:    .word 0x30
   4  0008  00000000    C_m:    .word 0
   5  000c  00000000            .text
   6                    start:
   7  0000  05000000            set A_m, %r2
   7        8410A000
   8  0008  C4008000            ld [%r2], %r2
   9  000c  07000000            set B_m, %r3
   9        8610E000
  10  0014  C600C000            ld [%r3], %r3
  11  0018  84208003            sub %r2, %r3, %r2
  12  001c  09000000            set C_m, %r4
  12        88112000
  13  0024  C4210000            st %r2, [%r4]
  14                    terminate:
  15  0028  91D02000            ta 0
  16  002c  01000000    beyond_end:
DEFINED SYMBOLS
  xmp0.s:2    .data:00000000  A_m
  xmp0.s:3    .data:00000004  B_m
  xmp0.s:4    .data:00000008  C_m
  xmp0.s:6    .text:00000000  start
  xmp0.s:14   .text:00000028  terminate
  xmp0.s:16   .text:0000002c  beyond_end
NO UNDEFINED SYMBOLS

Columns, left to right: line in source file (.s); offset to address in memory; contents at address in memory; source text. Labels are symbolic offsets.
Instructional Sparc Emulator - ISEM

Linking
– Linking turns a set of raw object file(s) into an executable program.
– From the manual page, "ld combines a number of object and archive files,
relocates their data and ties up symbol references. Often the last step in
building a new compiled program to run is a call to ld."
– Several object files are combined into one executable using ld; the
separate files could reference symbols from one another.
– The output of the linker is an executable program.
– The syntax for the linker is as follows:
isem-ld objectfile.o [-o execfile]
Examples
% isem-ld foo.o -o foo    Links foo.o into the executable foo.
% isem-ld foo.o           Links foo.o into the executable a.out.
Instructional Sparc Emulator - ISEM

Loading/Running
– Execute the program and test it in the emulation environment.
– The program "isem" is used to do this, and the majority of its features are
covered in your lab manual.
– Invoke isem as follows
isem [execfile]
Examples
% isem foo    Invokes the emulator, loads the program foo
% isem        Invokes the emulator, no program is loaded
– Once you are in the emulator, you can run your program by typing "run" at the prompt.
ISEM Debugging Tools 1
% isem xmp0
Instructional SPARC Emulator
Copyright 1993 - Computer Science Department
University of New Mexico
ISEM comes with ABSOLUTELY NO WARRANTY
ISEM Ver 1.00d : Mon Jul 27 16:29:45 EDT 1998
Loading File: xmp0
2000 bytes loaded into Text region at address 8:2000
2000 bytes loaded into Data region at address a:4000
PC: 08:00002020   nPC: 00002024   PSR: 0000003e   N:0 Z:0 V:0 C:0
start : sethi 0x10, %g2
ISEM> run
Program exited normally.
Assembly language programs are not notoriously chatty.
ISEM Debugging Tools 2
• reg
– Gives values of all 32 general registers
– Also PC
• symb
– Shows the resolved values of all symbolic constants
• dump [addr]
– Either symbol or hex address
– Gives the values stored in memory

ISEM> reg
  ----0--- ----1--- ----2--- ----3--- ----4--- ----5--- ----6--- ----7---
G 00000000 00000000 0000000f 00000030 00004008 00000000 00000000 00000000
O 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
L 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
I 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
PC: 08:0000204c   nPC: 00002050   PSR: 0000003e   N:0 Z:0 V:0 C:0
beyond_end : sethi 0x0, %g0
ISEM> symb
Symbol List
A_m : 00004000
B_m : 00004004
.
.
.
terminate : 00004028
ISEM> dump A_m
0a:00004000  00 00 00 3f 00 00 00 30 00 00 00 0f 00 00 00 00  ...?...0........
0a:00004010  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0a:00004020  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ISEM Debugging Tools

break [addr]
– Set breakpoints in execution
– Once execution is stopped, you can look at the contents of registers
and memory.

trace
– Causes one (or more) instruction(s) to be executed
– Registers are displayed
– Handy for sneaking up on an error when you’re not sure where it is.
ISEM Debugging Tools

For the all-time “most wanted” list of errors (and their
fixes)
ISEM Debugging

If you still need help
– Print a fresh copy of your source
– Make good notes describing the error
– Visit your lecturer or grader
– Post a question to the discussion board
Basic Components 14

CPU: executes instructions -- primitive operations
that the computer can perform.
– E.g.,
• arithmetic: A+B
• data movement: A := B
• control: if expr goto label
• logical: AND, OR, XOR…
Instructions specify both the operation and the operands. An encoded operand is often a location in memory where the value of interest may be found (address of value of interest).
Basic Components 15
– Instruction set: all instructions for a machine.
Instruction format specifies number and type of
operands.

Ex.: Could have an instruction like
ADD A, B, R
Where A, B, and R are the addresses of operands in memory.
The result is R := A+B.
Addr
Memory
Label
0
8
A
4
9
B
17
R
8
C
Basic Components 16
– Actually, the “instruction” might be represented in a
source file as:
0x41444420412C20422C20520A …
(each pair of hex digits is the ASCII code of one character of “ADD A, B, R”, followed by a newline)
As such, it is an assembly language instruction.
– An assembler might translate it to, say, 0x504C, the
machine’s representation of the instruction.
As such, it is a machine language instruction.
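The source-file view of the instruction can be checked with a short Python sketch: it decodes the hexadecimal string from the slide as ASCII. The variable names are illustrative only.

```python
# The bytes of the source text, written as one hexadecimal string on the slide.
encoded = bytes.fromhex("41444420412C20422C20520A")

# Each byte is the ASCII code of one character of the assembly language line.
text = encoded.decode("ascii")
print(repr(text))

# Show the byte-to-character pairing explicitly.
for byte in encoded:
    print(f"0x{byte:02X} -> {chr(byte)!r}")
```

The assembled form (0x504C in the slide's example) is a different encoding of the same instruction, chosen by the machine's instruction format rather than by ASCII.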
A Simple Instruction Set 1

Simple instruction set: the Accumulator machine.
– Simplify instruction set by only allowing one operand.
Accumulator implied to be the second operand.
– Accumulator is a special register. Similar to a simple calculator.
• ADD addr      ACC <- ACC + M[addr]
• SUB addr      ACC <- ACC - M[addr]
• MPY addr      ACC <- ACC * M[addr]
• DIV addr      ACC <- ACC / M[addr]
• LOAD addr     ACC <- M[addr]
• STORE addr    M[addr] <- ACC
A Simple Instruction Set 2

Ex.: C = AB + CD
Address
LOAD
MPY
STORE
LOAD
MPY
ADD
STORE
Accumulator
CSE360
20
21
30
22
23
30
22
!
!
!
!
!
!
!
Acc<-M[20]
Acc<-Acc*M[21]
M[30]<-Acc
Acc<-M[22]
Acc<-Acc*M[23]
Acc<-Acc+M[30]
M[22]<-Acc
0000
0001
0010
0011
1100
1110
Symbolic
Contents
1)
2)
3)
4)
5)
20
A
0001
21
B
22
C
23
D
0011
1110
0100
temp
0010
0010
…
30
Try C=2A+B
Try C=A+2
111
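The trace above can be reproduced with a minimal Python sketch of the accumulator machine. It is an illustration, not the course's simulator: the function name `run` and the use of a dictionary for memory are assumptions, and DIV is taken to be integer division.

```python
def run(program, memory):
    """Execute (mnemonic, addr) pairs on a one-register accumulator machine."""
    acc = 0
    for op, addr in program:
        if op == "LOAD":
            acc = memory[addr]
        elif op == "STORE":
            memory[addr] = acc
        elif op == "ADD":
            acc = acc + memory[addr]
        elif op == "SUB":
            acc = acc - memory[addr]
        elif op == "MPY":
            acc = acc * memory[addr]
        elif op == "DIV":
            acc = acc // memory[addr]  # assumption: integer division
        else:
            raise ValueError(f"unknown op-code {op}")
    return acc

# C = AB + CD with A=1, B=2, C=3, D=4 at addresses 20-23 and temp at 30.
program = [("LOAD", 20), ("MPY", 21), ("STORE", 30),
           ("LOAD", 22), ("MPY", 23), ("ADD", 30), ("STORE", 22)]
memory = {20: 0b0001, 21: 0b0010, 22: 0b0011, 23: 0b0100, 30: 0}
final_acc = run(program, memory)
print(format(final_acc, "04b"), format(memory[30], "04b"))
```

The two "Try" exercises can be checked the same way by changing `program`.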
An Instruction (Encoding) Format


Machine language: Converting from assembly language to
machine language is called assembling.
Assume 8-bit architecture. Each instruction may be 8 bits.
3 bits hold the op-code and 5 bits hold the operand.
Bits 7-5: op-code. Bits 4-0: operand.
• How much memory can we address?
• How many op-codes can we have?

Operation  Code
ADD        000
SUB        001
MPY        010
DIV        011
LOAD       100
STORE      101
A Simple Instruction Set 4
Convert the mnemonic op-codes into binary codes.
• Hand assemble our program:
• Instructions are stored in consecutive memory:

Addr  Memory       Mnemonic
0     100 10100    LOAD A
1     010 10101    MPY B
2     101 11110    STORE temp
3     100 10110    LOAD C
4     010 10111    MPY D
5     000 11110    ADD temp
6     101 10110    STORE C
…
20    4            A
21    5            B
22    6            C
23    7            D
…
30    20           temp
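Hand assembly for this 8-bit format can be sketched in Python as follows. The op-code table is the one given on the slides; the function names `assemble` and `disassemble` are made up for the illustration.

```python
# Op-codes from the slide: 3 bits each.
OPCODES = {"ADD": 0b000, "SUB": 0b001, "MPY": 0b010,
           "DIV": 0b011, "LOAD": 0b100, "STORE": 0b101}

def assemble(mnemonic, addr):
    """Pack a 3-bit op-code (bits 7-5) and a 5-bit operand (bits 4-0)."""
    if not 0 <= addr < 2 ** 5:
        raise ValueError("operand does not fit in 5 bits")
    return (OPCODES[mnemonic] << 5) | addr

def disassemble(byte):
    """Split an 8-bit instruction back into (mnemonic, addr)."""
    names = {code: name for name, code in OPCODES.items()}
    return names[byte >> 5], byte & 0b11111

program = [("LOAD", 20), ("MPY", 21), ("STORE", 30),
           ("LOAD", 22), ("MPY", 23), ("ADD", 30), ("STORE", 22)]
machine_code = [assemble(op, addr) for op, addr in program]
for word in machine_code:
    print(format(word, "08b"))
```

Each printed byte should match the Memory column of the hand-assembled table once the space between op-code and operand is ignored.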
Simple Accumulator Machine
[Figure: SAM datapath. PC, IR, MAR, MDR, and ACC share a bus; also shown are INC, Decode (Op, Addr), two 2-to-1 MUXes, the ALU, Memory, and Timing and Control. Control signals are numbered 0 through 14.]
Simple Accumulator Machine
(SAM)

REGISTERS
– ACC – Accumulator, stores program values
– IR - Instruction Register, holds the instruction during
interpretation
– MAR - Memory Address Register, stores address to
read/write to/from
– MDR - Memory Data Register, stores data from
memory, either written/read
– PC - Program Counter, stores the address of the next
instruction
Simple Accumulator Machine
(SAM)

Combinational Circuits
– ALU - Arithmetic and logic unit, implements the
operations (eg, +,-,*,/)
– Decode - Instruction decoder, splits off the opcode and
operands
– INC - Incrementer, increments the PC
– MUX - Multiplexer, controls inputs to PC and ACC
Simple Accumulator Machine
(SAM)

Sequential Circuit
– Timing and control - asserts control signals, clock

Combination of flip-flops, circuits and capacitors
– Memory – stores instructions and data
A Simple Instruction Set 6
– Control signals: control functional units to determine
order of operations, access to bus, loading of registers,
etc.
Number  Operation
0       ACC -> bus
1       load ACC
2       PC -> bus
3       load PC
4       load IR
5       load MAR
6       MDR -> bus
7       load MDR
8       ALU -> ACC
9       INC -> PC
10      ALU operation
11      ALU operation
12      Addr -> bus
13      CS
14      R/W
A Simple Instruction Set 7
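State  Operations
0      PC to bus, load MAR, INC to PC, load PC
1      CS, R/W
2      MDR to bus, load IR
3      Addr to bus, load MAR
4      ACC to bus, load MDR (taken when OP=store: Y)
5      CS
6      CS, R/W (taken when OP=store: N)
7      MDR to bus, load ACC (taken when OP=load: Y)
8      MDR to bus, ALU to ACC, ALU op, load ACC (taken when OP=load: N)
The diagram groups these states into a Fetch phase followed by an Execute phase.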
State 0: Control Signals 2, 5, 9, 3
Put the address of the next instruction in the Addr Register and Inc. PC.
[Figure: SAM datapath and fetch/execute state diagram, repeated for State 0]
State 1: Control Signals 13, 14
Fetch the word of memory at Address, and load into Data Register.
[Figure: SAM datapath and fetch/execute state diagram, repeated for State 1]
State 2: Control Signals 6, 4
Send the word from the Data Register to the Instruction Register.
[Figure: SAM datapath and fetch/execute state diagram, repeated for State 2]
State 3: Control Signals 12, 5
Put the address from the instruction in the Address Register.
[Figure: SAM datapath and fetch/execute state diagram, repeated for State 3]
After State 3, what values are now stored
in each register?
• PC
• MAR
• MDR
• IR
• ACC
State 4: Control Signals 0, 7
Take the value from the ACCumulator and store it in the Data Register.
[Figure: SAM datapath and fetch/execute state diagram, repeated for State 4]
State 5: Control Signal 13
Write the data from the Data Register to the address stored in the MAR.
[Figure: SAM datapath and fetch/execute state diagram, repeated for State 5]
State 6: Control Signals 13, 14
Load the word at the Address from the Addr Reg into the Data Register.
[Figure: SAM datapath and fetch/execute state diagram, repeated for State 6]
After State 6, what values are now stored
in each register?
• PC
• MAR
• MDR
• IR
• ACC
State 7: Control Signals 6, 1
Load the word from Data Register into the ACCumulator.
[Figure: SAM datapath and fetch/execute state diagram, repeated for State 7]
State 8: Control Signals 6, 8, 10/11, 1
Use word from the Data Register for Arith Op and put result in ACC.
[Figure: SAM datapath and fetch/execute state diagram, repeated for State 8]
New Instruction
•What is necessary to implement a new instruction?
•New states?
•New control signals?
•New fetch/execute cycle?
•An Example: SWAP
Exchange value in Accumulator with value at Address
•SWAP addr ! Acc <- #M[addr], M[addr] <- #Acc
New Instruction

What changes to fetch/execute cycle?
– The fetch part of the cycle usually remains the same.
– Recall the values stored in registers after each state
• E.g., After State 6, what values are in each register? (PC, MAR, MDR, IR, ACC)
• Handy to have #M[addr] in MDR
– Start after state 6 then… .
[Figure: fetch/execute state diagram from the earlier slides]
New State 9: Control Signals 6, 4
Save the data value from the MDR in the Instruction Register (MDR -> bus, load IR).
[Figure: SAM datapath, repeated for new State 9]
New State 10: Control Signals 0, 7
Send the ACCumulator value to the Data Register.
ACC -> bus, load MDR
[Figure: SAM datapath, repeated for new State 10]
New State 11: Control Signals 15?, 1
Put the saved value from the IR into the ACCumulator (IR -> bus, load ACC).
Note: there is no control signal in the current architecture that is the opposite of 4 (load IR), so we would have to create a new control signal (IR to bus) in addition to creating these new states.
[Figure: SAM datapath, repeated for new State 11]
New State 12 (Old 5): Control Signals 13
Write the data from the Data Register to the address stored in the MAR.
CS
[Figure: SAM datapath, repeated for new State 12]
New Instruction Solution
• Changes to States: added 9 through 12
• Changes to Signals: added 15: IR -> bus
• Changes to Fetch/Execute: new register transfer language (RTL)
State 0:   PC -> bus, load MAR, INC -> PC, load PC
State 1:   CS, R/W
State 2:   MDR -> bus, load IR
State 3:   Addr -> bus, load MAR
State 6:   CS, R/W
State 9:   MDR -> bus, load IR
State 10:  ACC -> bus, load MDR
State 11:  IR -> bus, load ACC
State 12:  CS
What if we had added MAR->bus instead of IR->bus?
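The execute part of SWAP can be followed as plain register transfers. The Python sketch below is only an illustration of the RTL listed for the solution: the function name `swap` and the dictionary used for memory are assumptions, and each local variable stands for the register of the same name.

```python
def swap(memory, acc, addr):
    """Exchange ACC with M[addr], one assignment per register transfer."""
    mar = addr          # Addr -> bus, load MAR
    mdr = memory[mar]   # CS, R/W (memory read into MDR)
    ir = mdr            # MDR -> bus, load IR (IR used as a temporary)
    mdr = acc           # ACC -> bus, load MDR
    acc = ir            # IR -> bus, load ACC (the new signal 15)
    memory[mar] = mdr   # CS (memory write from MDR)
    return acc

memory = {22: 6}
acc = swap(memory, 9, 22)
print(acc, memory[22])
```

Note that the instruction held in IR is overwritten along the way, while MAR still holds the operand address needed for the final write.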
Instruction Set Architectures 1

RISC vs. CISC
– Complex Instruction Set Computer (CISC):
• Many, powerful instructions.
• High code density to address the Von Neumann Bottleneck.
• Instructions have varying lengths, number of operands, formats, and clock cycles in execution.
– Reduced Instruction Set Computer (RISC):
• Fewer, less powerful, optimized instructions.
• Requires simpler, faster hardware.
• Instructions have fixed length, number of operands, formats, and similar number of clock cycles in execution.
Instruction Set Architectures 2

Motivation: memory is comparatively slow.
– 10x to 20x slower than processor.
– Need to minimize number of trips to memory.
• Provide faster storage in the processor -- registers.
• Registers (16, 32, 64 bits wide) are used for intermediate storage for calculations, or repeated operands.
Accumulator machine
– One data register -- ACC.
– 2 memory accesses per instruction -- one for the instruction and one for the operand.
• Add more registers (R0, R1, R2, …, Rn)
Instruction Set Architectures 3

How many addresses to specify?
– With binary operations, need to know two source operands, a destination, and the operation.
• E.g., op (dest_operand) (src_op1) (src_op2)
– Based on number of operands, could have:
• 3 addr. machine: both sources and dest are named.
• 2 addr. machine: both sources named, dest is a source.
• 1 addr. machine: one source named, other source and dest. is the accumulator.
• 0 addr. machine: all operands implicit and available on the stack.
Instruction Set Architectures 4

1-address architecture: a:=ab+cde
– Memory only
Code         # mem refs
LOAD 100     2
MPY 104      2
STORE 100    2
LOAD 108     2
MPY 112      2
MPY 116      2
ADD 100      2
STORE 100    2
– Using registers
Code         # mem refs
LOAD 100     2
MPY 104      2
STORE R2     1
LOAD 108     2
MPY 112      2
MPY 116      2
ADD R2       1
STORE 100    2
1½-address architecture: at least one operand must always be a register. (½ address is register, 1 address is the memory operand: LOAD 100, R1).
– Like an accumulator machine, but with many accumulators.
Instruction Set Architectures 5

3-address architecture: a:=ab+cde
– Using memory only:
Code (# mem refs not listed)
MPY 100, 100, 104    ;a:=ab
MPY 200, 108, 112    ;t:=cd
MPY 200, 116, 200    ;t:=et
ADD 100, 200, 100    ;a:=t+a
– Using registers:
Code (# mem refs not listed)
MPY R2, 100, 104     ;t1:=ab
MPY R3, 108, 112     ;t2:=cd
MPY R3, 116, R3      ;t2:=et2
ADD 100, R3, R2      ;a:=t1+t2
Memory
100  (a)
104  (b)
108  (c)
112  (d)
116  (e)
...
200  (t)
What about instruction size?
Instruction Set Architecture

How does instruction size affect addressing?
– 16-bit instruction, 3 address, 6 instructions
• Opcode = 3 bits (2^3 = 8)
• Operand = (size - opcode) / #addr = (16 - 3) / 3 = 13 / 3 = 4 bits (rounded down)
– How many addresses will be supported?
– What if the instruction were 32 bit?
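The field-size arithmetic can be written as a small Python sketch. The function name `field_widths` is made up, and the sketch assumes every operand field gets the same width with any leftover bits unused.

```python
def field_widths(instruction_bits, num_opcodes, num_operands):
    """Return (opcode_bits, operand_bits) for a fixed-length instruction."""
    # Smallest number of bits that can tell num_opcodes op-codes apart.
    opcode_bits = (num_opcodes - 1).bit_length()
    # Split the remaining bits evenly, rounding down.
    operand_bits = (instruction_bits - opcode_bits) // num_operands
    return opcode_bits, operand_bits

opcode_bits, operand_bits = field_widths(16, 6, 3)
print(opcode_bits, operand_bits, 2 ** operand_bits)
```

Calling `field_widths(32, 6, 3)` gives the corresponding split for a 32-bit instruction.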
Instruction Set Architectures 6

2-address architecture: a:=ab+cde
– Using memory only:
Code              # mem refs
MPY 100, 104      ;a:=ab      4
MOVE 200, 108     ;t:=c       3
MPY 200, 112      ;t:=td      4
MPY 200, 116      ;t:=te      4
ADD 100, 200      ;a:=t+a     4
– Using registers:
Code              # mem refs
MPY 100, 104      ;a:=ab      4
MOVE R2, 108      ;R2:=c      2
MPY R2, 112       ;R2:=R2d    2
MPY R2, 116       ;R2:=R2e    2
ADD 100, R2       ;a:=t+a     3
Memory
100  (a)
104  (b)
108  (c)
112  (d)
116  (e)
...
200  (t)
Most CISC arch. this way, making 1 operand implicit
Instruction Set Architectures 7

0-address architecture: a:=ab+cde
– Stack machine: All operands are implicit. Only push
and pop touch memory. All other operands are pulled
from the top of stack, and result is pushed on top.
E.g., HP calculators.
Code      # mem refs
PUSH A    2
PUSH B    2
MPY       1
PUSH C    2
PUSH D    2
PUSH E    2
MPY       1
MPY       1
ADD       1
POP A     2
[Figure: stack contents as the program runs: A*B; then C, D, E above it; D*E; C*D*E; finally A*B + C*D*E]
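The stack program can be traced with a minimal Python sketch of a 0-address machine. The function name `run_stack`, the string form of the instructions, and the sample values for A through E are assumptions made for the illustration.

```python
def run_stack(program, memory):
    """Run PUSH/POP/MPY/ADD instructions; only PUSH and POP name memory."""
    stack = []
    for instruction in program:
        op, *operand = instruction.split()
        if op == "PUSH":
            stack.append(memory[operand[0]])
        elif op == "POP":
            memory[operand[0]] = stack.pop()
        elif op in ("MPY", "ADD"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right if op == "MPY" else left + right)
        else:
            raise ValueError(f"unknown instruction {instruction}")
    return stack

memory = {"A": 2, "B": 3, "C": 4, "D": 5, "E": 6}
program = ["PUSH A", "PUSH B", "MPY", "PUSH C", "PUSH D", "PUSH E",
           "MPY", "MPY", "ADD", "POP A"]
leftover = run_stack(program, memory)
print(memory["A"], leftover)
```

With these sample values the final POP stores A*B + C*D*E back into A and leaves the stack empty.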
Instruction Set Architectures 8
Load/Store Architectures -- RISC
Use of registers is simple and efficient. Therefore, the only instructions that can access memory are load and store. All others reference registers.
Code               # mem refs
LOAD R2, 100       ;R2 <- a             2
LOAD R3, 104       ;R3 <- b             2
LOAD R4, 108       ;R4 <- c             2
LOAD R5, 112       ;R5 <- d             2
LOAD R6, 116       ;R6 <- e             2
MPY R2, R2, R3     ;R2 <- ab            1
MPY R3, R4, R5     ;R3 <- cd            1
MPY R3, R3, R6     ;R3 <- (cd)e         1
ADD R2, R2, R3     ;R2 <- ab+(cd)e      1
STORE 100, R2      ;a <- ab+(cd)e       2
Instruction Set Architectures 9

Why load/store architectures?
– Number of instructions (hence, memory references to fetch them) is high, but can work without waiting on memory.
• CISC machines tend to need to have their more complex instructions interpreted in micro code
– More room in CPU for registers and memory cache.
– Easier to overlap instruction execution through pipelining.
[Figure: four staggered "Fetch …. execute" sequences, illustrating overlapped execution]
Instruction Set Architectures 9

Side effects
– Register interlock: delaying execution until memory read completes.
• Machine waits when necessary, to avoid erroneous results.
ld [%r1], %r2
add %r2, 100, %r3
– Branch delays: instruction after branch is always executed.
Instruction scheduling
– Rearranging instructions to maximize efficiency of pipelining
• To prevent register interlock (loads on SPARC)
• To use branch delay slots (branches on SPARC).
SPARC Assembly Language 1

SPARC (Scalable Processor ARChitecture)
– Used in Sun workstations, descended from RISC-II
developed at UC Berkeley
– General Characteristics:
• 32-bit word size (integer, address, register size, etc.)
• Byte-addressable memory
• RISC load/store architecture, 32-bit instruction, few addressing modes
• Many registers (32 general purpose, 32 floating point, various special purpose registers)
– ISEM: Instructional SPARC Emulator - nicer than a real machine for learning to write assembly language programs.
SPARC Assembly Language 2

Structure
– Line oriented: 4 types of lines
• Blank - Ignored
• Labeled – Any line may be labeled. Creates a symbol in listing. Labels must begin with a letter (other than ‘L’), then any alphanumeric characters. Label must end with a colon “:”. Label just assigns a name to an address.
• Assembler Directives - E.g., .data, .word, .text, etc.
• Instructions
– Comments start after “!” character and go to the end of the line.
        .data
x_m:    .word 0x42
y_m:    .word 0x20
z_m:    .word 0
        .text
start:  set x_m, %r2
        ld [%r2], %r2       ! Load x into reg 2
        set y_m, %r3
        ld [%r3], %r3       ! Load y into reg 3
SPARC Assembly Language 3

Directives: Instructions to the assembler
– Not executed by the machine
• .data -- following section contains declarations
– Each declaration reserves and initializes a certain number of bits of storage for each of zero or more operands in the declaration.
• .word -- 32 bits
• .half -- 16 bits
• .byte -- 8 bits
E.g.,
        .data
w:      .half 27000
x:      .byte 8
y:      .byte 'm', 0x6e, 0x0, 0, 0
z:      .word 0x3C5F
• .text -- following section contains executable instructions
SPARC Assembly Language 11
– More assembler directives (.asciz and .ascii):

Each of the following two directives is equivalent:
– msg01: .asciz "a phrase"
– msg01: .byte 'a', ' ', 'p', 'h', 'r'
.byte 'a', 's', 'e', 0


Note that .asciz generates one byte for each character between
the quote (") marks in the operand, plus a null byte at the end.
The .ascii directive does not generate that extra byte. Each of
the following three directives is equivalent:
– digits: .ascii "0123456789"
– digits: .byte '0', '1', '2', '3', '4', '5'
.byte '6', '7', '8', '9'
– digits: .byte 0x30, 0x31, 0x32, 0x33, 0x34
.byte 0x35, 0x36, 0x37, 0x38, 0x39
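The byte-for-byte equivalences above can be checked with a short Python sketch. It only models the bytes each directive would emit; the variable names are illustrative.

```python
# .asciz "a phrase": one byte per character plus a terminating null byte.
asciz_bytes = "a phrase".encode("ascii") + b"\x00"
byte_list = bytes([ord(c) for c in "a phrase"] + [0])

# .ascii "0123456789": one byte per character, no terminator.
ascii_bytes = "0123456789".encode("ascii")
hex_list = bytes([0x30, 0x31, 0x32, 0x33, 0x34,
                  0x35, 0x36, 0x37, 0x38, 0x39])

print(len(asciz_bytes), asciz_bytes == byte_list)
print(len(ascii_bytes), ascii_bytes == hex_list)
```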
SPARC Assembly Language

Memory alignment: .align 4
– Used when mixing allocations of bytes, words, halfwords, etc.
and need word boundary alignment

Reserve bytes of space: .skip 20
– Useful for allocating large amounts of space (e.g.,
arrays)

Create a symbolic constant: .set mask, 0x0f
– Can now use the word “mask” anywhere we could use
the constant 0x0f previously
SPARC Assembly Language 4

Registers -- 32 bits wide
– 32 general purpose integer registers, known by several names to the assembler
• %r0-%r7 also known as %g0-%g7 global registers -- Note, %r0 always contains value 0.
• %r8-%r15 also known as %o0-%o7 output registers
• %r16-%r23 also known as %l0-%l7 local registers
• %r24-%r31 also known as %i0-%i7 input registers
• Use the %r0-%r31 names for now. Other names are used in procedure calls.
– 32 floating point registers %f0-%f31. Each reg. is single precision. Double prec. uses reg. pairs.
SPARC Assembly Language 5

Assembly language
– 3-address operations - format different from book
op src1, src2, dest !opposite of text
E.g.,
add %r1, %r2, %r3       !%r3 <- %r1 + %r2
or %r2, 0x0004, %r2     !%r2 <- %r2 bitwise-or 0x0004
– Contrast SPARC with MIPS (used in the book)
• indirect address notation: @addr vs [addr]
• operand order, especially the destination register
• register notation: R2 vs. %r2
• branches
SPARC Assembly Language 6
– 2-address operations: load and store
ld [addr], %r2    ! %r2 <- M[addr]
st %r2, [addr]    ! M[addr] <- %r2
– Use set to put an address (a label, a symbolic
constant) into a register, followed by ld to load the
data itself.
set x_m, %r1 !put addr x_m into %r1
ld [%r1],%r2 !use addr in %r1 to load %r2
SPARC Assembly Language 7

Immediate values: operand is not an address, but a
value
E.g., add %rs, siconst13, %rd !%rd <- %rs + const
• Immediate value coded as 13 bit 2’s complement. Range is, then, -2^12 … 2^12 - 1, or -4096 to 4095.
• Immediate values can be specified in decimal, hexadecimal, octal, or binary. E.g., add %r2, 0x1A, %r2
• Constant is coded into instruction itself, therefore available after fetching the instruction (no extra trip to memory for an operand).
• On SPARC, no special notation for differentiating constants from addresses because no ambiguity in a load/store architecture.
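The 13-bit range can be checked with a small Python sketch. The helper names `fits_simm13`, `encode_simm13`, and `decode_simm13` are made up; they model only the 2's complement field, not a full instruction.

```python
def fits_simm13(value):
    """True if value is representable as a 13-bit 2's complement number."""
    return -2 ** 12 <= value <= 2 ** 12 - 1

def encode_simm13(value):
    """Return the 13-bit field that holds value."""
    if not fits_simm13(value):
        raise ValueError("immediate out of range")
    return value & 0x1FFF

def decode_simm13(field):
    """Sign-extend a 13-bit field back to a Python integer."""
    return field - 0x2000 if field & 0x1000 else field

for candidate in (4095, 4096, -4096, -4097, 0x1A):
    print(candidate, fits_simm13(candidate))
```

Constants outside this range need a different mechanism, such as the synthetic set instruction.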
SPARC Assembly Language 8

Synthetic Instructions: assembler translates one
“instruction” into one or more machine instructions.
– set : used to load a 32-bit signed integer constant into a register.
Has 2 operands - 32 bit value and register number. How does that
fit into a 32 bit instruction?
E.g.,
set iconst32, %rd
set -10, %r3
set x_m, %r4
set ’=’, %r8
– clr %rd : used to set all bits in a register to 0. How?
– mov %rs, %rd : copies a register.
– neg %rs, %rd : copies the negation of a register.
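One way to see how a 32-bit constant fits: SPARC assemblers typically expand set into a sethi (upper 22 bits) followed by an or (lower 10 bits), which is consistent with a listing that shows two machine words for each set. The Python sketch below models only that split; the function names are made up for the illustration.

```python
def split_for_set(value):
    """Split a 32-bit constant into the parts used by sethi and or."""
    value &= 0xFFFFFFFF      # view the constant as 32 raw bits
    hi22 = value >> 10       # operand of sethi: upper 22 bits
    lo10 = value & 0x3FF     # immediate of or: lower 10 bits
    return hi22, lo10

def rebuild(hi22, lo10):
    """What the register holds after sethi followed by or."""
    return (hi22 << 10) | lo10

hi22, lo10 = split_for_set(0x4000)
print(hex(hi22), hex(lo10))
print(hex(rebuild(*split_for_set(-10))))
```

For an address such as 0x4000 the upper part is 0x10, which is the operand one would expect to see on the sethi when the emulator disassembles the first half of a set.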
SPARC Assembly Language 9
– Operand sizes
• double word = 8 bytes, word = 4 bytes, half word = 2 bytes, byte = 8 bits. Recall memory alignment issues.
set x_m, %r2        !Put addr x_m in %r2
ld [%r2], %r1       !load word
ldsb [%r2], %r1     !load byte, sign extended
ldub [%r2], %r1     !load byte, extend with 0’s
st %r1, [%r2]       !store word, addr is mult of 4
stb %r1, [%r2]      !store byte, any address
sth %r1, [%r2]      !store half word, address is even
– Characters use 8 bits
• ldub to load a character
• stb to store a character
SPARC Assembly Language 10
– Traps : provide initial help with I/O, also used in operating systems programming.
• ta 0 : terminate program
• ta 1 : output ASCII character from %r8
• ta 2 : input ASCII character into %r8
• ta 4 : output integer from %r8 in unsigned hexadecimal
• ta 5 : input integer into %r8, can be decimal, octal, or hex
E.g.,
set '=', %r8        !put '=' in %r8
ta 1                !output the '='
ta 5                !read in value into %r8
mov %r8, %r1        !copy %r8 into %r1
set 0x0a, %r8       !load a newline into %r8
ta 1                !output the newline
SPARC Assembly Language 12
– Quick review of instructions so far:
• ld [addr], %rd            ! %rd <- M[addr]
• st %rd, [addr]            ! M[addr] <- %rd
• op %rs1, %rs2, %rd        ! op is ALU op
• op %rs, siconst13, %rd    ! %rd <- %rs op const
• set siconst32, %rd        ! %rd <- const
• ta #                      ! trap signal
– Have actually seen many more variants, e.g., ldub, ldsb, sth, clr, mov, neg, add, sub, smul, sdiv, umul, udiv, etc. Can evaluate just about any simple arithmetic expression.
Review: Sparc Loads, Stores
        .data
x_m:    .word 0xa1b2c3d4
        .skip 12
        .text
        set x_m, %r2
        ld [%r2], %r3
        ldsb [%r2], %r4
        ldub [%r2], %r5
        st %r3, [%r2+4]
        sth %r3, [%r2+8]
        stb %r3, [%r2+12]
        ta 0
After this runs, what values are in %r2-5, and memory locations starting at byte address x_m?
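One way to check an answer to the review question is to model the 16 bytes starting at x_m in Python. This is a sketch under stated assumptions: memory is big-endian (as on SPARC), offsets are relative to x_m, and the helper names `ld`, `ldsb`, and `ldub` simply mirror the instruction mnemonics. The value of %r2 itself is the address of x_m, which depends on where the program is loaded.

```python
import struct

# .word 0xa1b2c3d4 followed by .skip 12 (modelled here as zero bytes).
mem = bytearray(struct.pack(">I", 0xA1B2C3D4)) + bytearray(12)

def ld(offset):
    """Load a 32-bit big-endian word."""
    return struct.unpack_from(">I", mem, offset)[0]

def ldub(offset):
    """Load one byte, extended with zeros."""
    return mem[offset]

def ldsb(offset):
    """Load one byte, sign extended to 32 bits."""
    byte = mem[offset]
    return (byte - 0x100) & 0xFFFFFFFF if byte & 0x80 else byte

r3 = ld(0)
r4 = ldsb(0)
r5 = ldub(0)
struct.pack_into(">I", mem, 4, r3)            # st  %r3, [%r2+4]
struct.pack_into(">H", mem, 8, r3 & 0xFFFF)   # sth %r3, [%r2+8]
mem[12] = r3 & 0xFF                           # stb %r3, [%r2+12]

print(hex(r3), hex(r4), hex(r5))
print(mem.hex(" ", 4))
```

The half-word and byte stores write the low-order bits of the register, which is why the later words of memory begin with c3 d4 and d4.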
Flow of Control 1

In addition to sequential execution, need ability to
repeatedly and conditionally execute program fragments.
– High level language has: while, for, do, repeat, case, if-then-else,
etc.
– Assembler has if, goto.
– Compare: high level vs. pseudo-assembler, implementation of f=n!
High level:
f = 1;
i = 2;
while (i <= n)
{
    f = f * i;
    i = i + 1;
}
Pseudo-assembler:
        f = 1
        i = 2
loop:   if (i > n) goto done
        f = f * i
        i = i + 1
        goto loop
done:   ...
Flow of Control 2
– Branch -- put a new address in the program counter.
Next instruction comes from the new address,
effectively, a “goto”.
– Unconditional branch
• (book) BRANCH addr    ! PC <- addr
• (SPARC) ba addr       ! PC <- addr
– Conditional branch
• (book) BRcc R1, R2, target
“if R1 cc R2 then PC <- target” and cc is comparison operation (e.g., LT is <, GE is >=, etc.)
Flow of Control 4

Other conditions (from text, very similar to MIPS)
BRLT Rn, Rm, target    ; if Rn < Rm then PC <- target
BRLE Rn, Rm, target    ; if Rn <= Rm then PC <- target
BREQ Rn, Rm, target    ; if Rn = Rm then PC <- target
BRNE Rn, Rm, target    ; if Rn != Rm then PC <- target
BRGE Rn, Rm, target    ; if Rn >= Rm then PC <- target
BRGT Rn, Rm, target    ; if Rn > Rm then PC <- target
Can implement high level control structures now.
– Factorial example, using the book’s assembly language:
        LOAD R1, #1            ; R1 = f = 1
        LOAD R2, #2            ; R2 = i = 2
        LOAD R3, n             ; R3 = n
loop:   BRGT R2, R3, done      ; branch if i > n
        MPY R1, R1, R2         ; f = f * i
        ADD R2, R2, #1         ; i = i + 1
        BRANCH loop            ; goto loop
done:   STORE f, R1            ; f = n!
Flow of Control 3

Evaluating conditional branches
– Evaluate condition
– If condition is true, then PC <- target, else PC <- PC+1
Consider changes to the fetch-execute cycle given earlier for accumulator machine.
•Do data paths need to change?
•New control paths?
•New opcodes?
•New instruction formats?
[Figure: flowchart. After Fetch (PC to bus, etc.), Execute tests OP=BRANCH and OP=BRcc; for BRANCH, or for BRcc with Cond=T, the action is Addr to bus, load PC.]
Flow of Control 5

Condition Codes
– Book’s assembly language has 3-address branches. SPARC uses
1-address branches. Must use condition codes.
– Non-MIPS machines use condition codes to evaluate branches.
Condition Code Register (CCR) holds these bits. SPARC has 4-bit
CCR.
N
Z
V
C
– N: Negative, Z: Zero, V: Overflow, C: Carry. All are shown in a
trace, or in the reg command under ISEM.
– Condition codes are not changed by normal ALU instructions.
Must use special instructions ending with cc, e.g., addcc.
ALU Hardware 1

Recall the half-adder
– Full-adder adds three single digit binary numbers.
Results in a sum, and a carry out.
Cin  X  Y  Sum  Cout
0    0  0  0    0
0    0  1  1    0
0    1  0  1    0
0    1  1  0    1
1    0  0  1    0
1    0  1  0    1
1    1  0  0    1
1    1  1  1    1
[Figure: full adder (FA) block with inputs x, y, cin and outputs Sum, cout]
ALU Hardware 2

Now cascade the full adder hardware
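[Figure: ripple-carry adder. A chain of full adders (FA) adds register x and register y into register z; the carry-in of the low-order FA is 0 and the high-order FA produces cout.]
How are CCR bits set? (Above is a ripple-carry adder.)
– C-bit = Cout
– V-bit = Cout ⊕ Cn-1
– Z-bit = ¬(rzn-1 ∨ rzn-2 ∨ rzn-3 ∨ ... ∨ rz0)
– N-bit = rzn-1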
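The full-adder truth table and the four CCR rules can be checked with a short simulation. This is a sketch in Python, assuming an n-bit ripple-carry adder for addition only; the function names `full_adder` and `ripple_add` are made up for this example.

```python
def full_adder(x, y, cin):
    """Add three single bits; return (sum, carry_out)."""
    s = x ^ y ^ cin
    cout = (x & y) | (cin & (x ^ y))
    return s, cout


def ripple_add(a, b, n=32):
    """Add two n-bit values with cascaded full adders.

    Returns (result, flags), where flags holds the N, Z, V, C bits.
    """
    carry = 0                 # carry-in of the low-order adder is 0
    carry_into_msb = 0
    result = 0
    for i in range(n):
        if i == n - 1:
            carry_into_msb = carry          # this is Cn-1
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    flags = {
        "N": (result >> (n - 1)) & 1,       # sign bit of the result
        "Z": int(result == 0),              # NOR of all result bits
        "V": carry ^ carry_into_msb,        # Cout xor Cn-1
        "C": carry,                         # Cout
    }
    return result, flags
```

With 4-bit operands, `ripple_add(7, 1, 4)` gives the result 8 with V set (signed overflow) and C clear, while `ripple_add(15, 1, 4)` gives 0 with Z and C set and V clear.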
169
Flow of Control 6
        .text
start:  set 1, %r2
        set 0xFFFFFFFE, %r1       ! –2 in 32-bit 2’s comp
cc_set: subcc %r1, %r2, %r3       ! r3<= -2-1
end:    ta 0

ISEM> reg
   ----0--- ----1--- ----2--- ----3--- ----4--- ----5--- ----6--- ----7---
G  00000000 fffffffe 00000001 00000000 00000000 00000000 00000000 00000000
O  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
L  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
I  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
PC: 08:00002028   nPC: 0000202c   PSR: 0000003e   N:0 Z:0 V:0 C:0
cc_set : subcc %g1, %g2, %g3
ISEM> trace
   ----0--- ----1--- ----2--- ----3--- ----4--- ----5--- ----6--- ----7---
G  00000000 fffffffe 00000001 fffffffd 00000000 00000000 00000000 00000000
O  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
L  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
I  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
PC: 08:0000202c   nPC: 00002030   PSR: 00b0003e   N:1 Z:0 V:0 C:0
170
Flow of Control 7
– Setting the condition codes


Regular ALU operations don’t set condition codes.
Use addcc, subcc, smulcc, sdivcc, etc., to set condition
codes.
– Consider
subcc %r1, %r2, %r0
%r1
%r2
1
0
0
1
1
1
N
Z
V
C
Do the values in the CCR tell us anything about the
relationship between %r1 and %r2?
CSE360
171
Flow of Control 8
– Branches use logic to evaluate CCR (SPARC)
Operation                        Assembler Syntax   Branch Condition
Branch always                    ba target          1 (always)
Branch never                     bn target          0 (never)
Branch not equal                 bne target         ¬Z
Branch equal                     be target          Z
Branch greater                   bg target          ¬(Z ∨ (N ⊕ V))
Branch less or equal             ble target         Z ∨ (N ⊕ V)
Branch greater or equal          bge target         ¬(N ⊕ V)
Branch less                      bl target          N ⊕ V
Branch greater, unsigned         bgu target         ¬(C ∨ Z)
Branch less or equal, unsigned   bleu target        C ∨ Z
Branch carry clear               bcc target         ¬C
Branch carry set                 bcs target         C
Branch positive                  bpos target        ¬N
Branch negative                  bneg target        N
Branch overflow clear            bvc target         ¬V
Branch overflow set              bvs target         V
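The branch conditions can be tried out against the flags produced by a compare. The sketch below, in Python, assumes 32-bit operands and models `subcc` with C acting as the borrow of the unsigned subtraction; `subcc_flags`, `CONDITIONS`, and `branch_taken` are names invented for this illustration, and only a subset of the mnemonics is listed.

```python
MASK = 0xFFFFFFFF


def subcc_flags(rs1, rs2):
    """Return (N, Z, V, C) as set by 'subcc rs1, rs2' on 32-bit values."""
    a, b = rs1 & MASK, rs2 & MASK
    r = (a - b) & MASK
    a31, b31, r31 = a >> 31, b >> 31, r >> 31
    n = r31
    z = int(r == 0)
    # Signed overflow: operands have different signs and the result's
    # sign differs from the sign of the first operand.
    v = (a31 & (1 - b31) & (1 - r31)) | ((1 - a31) & b31 & r31)
    c = int(a < b)                       # borrow out of the subtraction
    return n, z, v, c


CONDITIONS = {
    "ba":   lambda n, z, v, c: True,
    "bn":   lambda n, z, v, c: False,
    "bne":  lambda n, z, v, c: not z,
    "be":   lambda n, z, v, c: bool(z),
    "bg":   lambda n, z, v, c: not (z or (n ^ v)),
    "ble":  lambda n, z, v, c: bool(z or (n ^ v)),
    "bge":  lambda n, z, v, c: not (n ^ v),
    "bl":   lambda n, z, v, c: bool(n ^ v),
    "bgu":  lambda n, z, v, c: not (c or z),
    "bleu": lambda n, z, v, c: bool(c or z),
}


def branch_taken(mnemonic, rs1, rs2):
    """Would the branch be taken right after 'cmp rs1, rs2'?"""
    return bool(CONDITIONS[mnemonic](*subcc_flags(rs1, rs2)))
```

For example, after comparing 0xFFFFFFFE with 1, `bl` is taken (signed: -2 < 1) and so is `bgu` (unsigned: 0xFFFFFFFE > 1), which is why signed and unsigned branches need different flag logic.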
CSE360
172
Flow of Control 9
– Setting Condition Codes (continued)

Synthetic instruction cmp %rs1, %rs2
– Sets CCR, but doesn't modify any registers.
– Implemented as subcc %rs1, %rs2, %g0

Back to the factorial example (SPARC)
      set 1, %r1            ! %r1 = f = 1
      set 2, %r2            ! %r2 = i = 2
      set n, %r3            ! Get loc of n
      ld [%r3], %r3         ! Put n in %r3
loop: cmp %r2, %r3          ! Set CCR (i?n)
      bg done               ! i > n done
      nop                   ! Branch delay
      umul %r1, %r2, %r1    ! f = f * i
      add %r2, 1, %r2       ! i = i + 1
      ba loop               ! Goto loop
      nop                   ! Branch delay
done: set f, %r3            ! Get loc of f
      st %r1, [%r3]         ! f = n!
173
Flow of Control 10
– Branch delay slots: unique to RISC architecture

Non-technical explanation: processor is running so fast, it can’t
make a quick turn.
– Instruction following branch is always executed.



Technical explanation: the efficiency advantage of pipelining
is greater if the following instruction, which has almost
completed execution, is allowed to complete.
Compilers take advantage of branch delay slots by putting a
useful instruction there if possible.
For our purposes, use the nop (no operation) instruction to fill
branch delay slots.
Beware! Forgetting the nop will be a large source of errors in your programs!
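The effect of a branch delay slot can be seen with a toy model of the PC/nPC pair. This is only a sketch in Python: the three-instruction mini machine (`inc`, `ba`, `halt`), instruction indices used as addresses, and the function name `run` are assumptions for illustration, and the annul bit is ignored.

```python
def run(program):
    """Run a toy program with SPARC-style delayed branches.

    Each step executes the instruction at PC, then PC <- nPC and
    nPC <- branch target (if a branch was executed) or nPC + 1.
    """
    counters = {}
    pc, npc = 0, 1
    while True:
        op = program[pc]
        next_npc = npc + 1
        if op[0] == "halt":
            break
        if op[0] == "inc":
            counters[op[1]] = counters.get(op[1], 0) + 1
        elif op[0] == "ba":
            next_npc = op[1]          # takes effect one instruction late
        pc, npc = npc, next_npc
    return counters


PROGRAM = [
    ("ba", 3),        # 0: branch to index 3
    ("inc", "x"),     # 1: delay slot, still executed
    ("inc", "y"),     # 2: skipped
    ("inc", "z"),     # 3: branch target
    ("halt",),        # 4
]
```

Running `run(PROGRAM)` increments x and z but never y: the instruction in the delay slot executes even though the branch is taken, which is why a forgotten nop changes program behavior.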
CSE360
174
High Level Control Structures 1

Converting high level control structures
– You get to be the “compiler”.

Some compilers convert the source language (C, Pascal,
Modula 2, etc.) into assembly language and then assemble the
result to an object file. GNU C, C++ do this to GAS (Gnu
Assembler).
– if-then-else, while-do, repeat-until are all possible to
create in a structured way in assembly language.
CSE360
175
High Level Control Structures 2

General guidelines
– Break down into independent (or nested) logical units
– Convert to if/goto pseudo-code.
f = 1;
for (i=2; i<=n; i++)
f = f * i;
f=1
i=2
loop: if (i>n) goto done
f = f*i
i = i+1
goto loop
done: ...
– Mechanical, step-by-step, non-creative process
CSE360
176
High Level Control Structures 3

if-then-else
      if (a<b)
         c = d + 1;
      else
         c = 7;
if/goto
      if (a >= b) goto else
      c = d + 1
      goto end
else: c = 7
end:

init: set a, %r2          ! get &a into r2
      ld [%r2], %r2       ! get a into r2
      set b, %r3          ! get &b into r3
      ld [%r3], %r3       ! get b into r3
if:   cmp %r2, %r3        ! a ?? b (want >=)
      bge else            ! a >= b, do else
      nop
      set d, %r5          ! get &d into r5
      ld [%r5], %r5       ! get d into r5
      add %r5, 1, %r4     ! r4 <- d+1
      ba end
      nop
else: set 7, %r4          ! get 7 into r4
end:  set c, %r5          ! get &c into r5
      st %r4, [%r5]       ! c <- r4
177
High Level Control Structures 4

while loops:
      while (a<b)
         a = a+1;
      c = d;
if/goto:
whle: if (a>=b) goto done
body: a = a+1
      goto whle
done: c = d

init: set a, %r4          ! get &a into r4
      ld [%r4], %r2       ! get a into r2
      set b, %r3          ! get &b into r3
      ld [%r3], %r3       ! get b into r3
whle: cmp %r2, %r3        ! a ?? b (want >=)
      bge done            ! a >= b skip body
      nop
body: add %r2, 1, %r2     ! r2 = a + 1
      st %r2, [%r4]       ! a = a + 1
      ba whle             ! repeat loop body
      nop
done: set c, %r5          ! get &c into r5
      ...
CSE360
178
High Level Control Structures 5

repeat-until loops:
      repeat
         …
      until (a>b)
if/goto:
repeat: …
        if (a<=b) goto repeat

rpt:  ...
      ...
      set a, %r2          ! get &a into r2
      ld [%r2], %r2       ! get a into r2
      set b, %r3          ! get &b into r3
      ld [%r3], %r3       ! get b into r3
      cmp %r2, %r3        ! a <= b?
      ble rpt             ! do body again
      nop
CSE360
179
High Level Control Structures 6

Complex condition
      if((a<b)and(b>=c))
         …
Primitive Language
      if (a>=b) then goto skip
      if (b<c) then goto skip
body: ...
      ...
skip: ...

      if((a<b)or(b>=c))
         …
Primitive Language
      if (a<b) then goto body
      if (b<c) then goto skip
body: ...
      ...
skip: ...

These can be combined and used in if/else or while loops.
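One way to convince yourself that the two if/goto translations match the original compound conditions is to test them exhaustively over a small range. The sketch below is in Python; each early `return` stands in for a `goto`, and the function names are hypothetical.

```python
def and_form(a, b, c):
    """Body runs only if (a<b) and (b>=c); returns True if body is reached."""
    if a >= b:
        return False      # goto skip
    if b < c:
        return False      # goto skip
    return True           # fell through to body


def or_form(a, b, c):
    """Body runs if (a<b) or (b>=c); returns True if body is reached."""
    if a < b:
        return True       # goto body
    if b < c:
        return False      # goto skip
    return True           # fell through to body


def check(values=range(-2, 3)):
    """Compare both translations against the high-level conditions."""
    for a in values:
        for b in values:
            for c in values:
                if and_form(a, b, c) != ((a < b) and (b >= c)):
                    return False
                if or_form(a, b, c) != ((a < b) or (b >= c)):
                    return False
    return True
```

Note that the "and" form branches on the negation of each test, while the "or" form branches on the first test as written and negates only the last one.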
180
Flow of Control 11
– Optimizing code: change order of instructions, combine
instructions, take advantage of branch delay slots.

Factorial example again. (for i:=n downto 1 do…)
      set 1, %r1            ! %r1=f=1
      set n, %r2            ! Get loc of n
      ld [%r2], %r2         ! Put n in %r2
loop: umul %r1, %r2, %r1    ! f=f*n
      subcc %r2, 1, %r2     ! Decrement n
      bg loop               ! Repeat
      nop                   ! Branch delay
      set f, %r3            ! Get loc of f
      st %r1, [%r3]         ! f=n!
Reduced 7 instructions in loop to just 4.
(You gain no advantage if you optimize code in your labs.)
181
Synthetic Instructions

Remember lab0?
        .data
x_m:    .word 0x42
y_m:    .word 0x20
z_m:    .word 0
        .text
start:  set x_m, %r2
        ld [%r2], %r2
        set y_m, %r3
        ld [%r3], %r3
        and so on…
Suppose you gave this command to ISEM (after loading):
ISEM> dump start
start
05 00 00 10 84 10 a0 00 c4 00 80 00 07 00 00 10
Could you find the set instruction?
CSE360
182
Instruction Encodings 1

First, Instruction Encoding is how instructions are assembled
– All instructions must fit into 32 bits.
Register-register: op=10, i=0
   bits:   31-30 | 29-25 | 24-19 | 18-14 | 13 | 12-5 | 4-0
   field:  op    | rd    | op3   | rs1   | i  | asi  | rs2
Register-immediate: op=10, i=1
   bits:   31-30 | 29-25 | 24-19 | 18-14 | 13 | 12-0
   field:  op    | rd    | op3   | rs1   | i  | simm13
Floating point: op=10, i=0
   field:  op    | rd    | op3   | rs1   | opf | rs2
183
Instruction Encodings 2

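Call instructions: op=01
   bits:   31-30 | 29-0
   field:  op    | disp30
Branch instructions: op=00, op2=010
   bits:   31-30 | 29 | 28-25 | 24-22 | 21-0
   field:  op    | a  | cond  | op2   | disp22
SETHI instructions: op=00, op2=100
   bits:   31-30 | 29-25 | 24-22 | 21-0
   field:  op    | rd    | op2   | imm22
Ex.: add %r2, %r3, %r4
   bits:   31-30 | 29-25 | 24-19  | 18-14 | 13 | 12-5     | 4-0
   field:  op    | rd    | op3    | rs1   | i  | asi      | rs2
   value:  10    | 00100 | 000000 | 00010 | 0  | 00000000 | 00011
in hexadecimal: 88008003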
CSE360
184
Decoding an Instruction
05 00 00 10 (hexadecimal)
0000 0101 0000 0000 0000 0000 0001 0000 (binary)
Instruction Group (bits 30:31) = 00
Destination Register (bits 25:29) = 00010
Op Code (bits 22:24) = 100
Constant (bits 0:21) = 0000000000000000010000
Meaning: sethi 0x10, %r2
%r2 <-- 00000000000000000100000000000000 (0x4000)
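The field layouts can be exercised by packing and unpacking the bit fields directly. This is a sketch in Python covering only the register-register format and SETHI; the helper names `encode_reg_reg`, `encode_sethi`, and `decode_sethi` are invented for this example and are not part of any assembler.

```python
def encode_reg_reg(op3, rd, rs1, rs2):
    """Encode a register-register instruction (op=10, i=0, asi=0)."""
    return (0b10 << 30) | (rd << 25) | (op3 << 19) | (rs1 << 14) | rs2


def encode_sethi(imm22, rd):
    """Encode 'sethi imm22, %rd' (op=00, op2=100)."""
    return (0b00 << 30) | (rd << 25) | (0b100 << 22) | (imm22 & 0x3FFFFF)


def decode_sethi(word):
    """Split a SETHI instruction word into its fields."""
    fields = {
        "op": (word >> 30) & 0x3,
        "rd": (word >> 25) & 0x1F,
        "op2": (word >> 22) & 0x7,
        "imm22": word & 0x3FFFFF,
    }
    # sethi places imm22 in the upper 22 bits and clears the low 10 bits.
    fields["value"] = (fields["imm22"] << 10) & 0xFFFFFFFF
    return fields
```

Encoding `add %r2, %r3, %r4` (op3 = 000000) with `encode_reg_reg(0, 4, 2, 3)` reproduces 0x88008003, and decoding 0x05000010 recovers rd = 2 and imm22 = 0x10, so %r2 receives 0x4000.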
CSE360
185
Understanding SET Synthetic
Usually used to put a memory address (a 32-bit constant) into a register.
For example, set 0x4004, %r3
Can do neither ‘add %r0, 0x4004, %r3’ nor ‘or %r0, 0x4004, %r3’. Why not?
SET is a synthetic instruction which may be implemented in two steps.
#1  sethi 0x10, %r3       ! Puts 0x10 in the Most Significant 22 bits
      %r3 (before)    0x12481248
      0x10            imm22 = 0x10, low 10 bits don't care (x)
      sethi
      %r3 (after)     0x00004000
#2  or %r3, 0x0004, %r3   ! Puts 0x0004 in the least significant bits
      %r3             0x00004000
      0x0004          0x00000004
      OR
      %r3             0x00004004
Machine language encoding for 'set 0x4004, %r3'
      sethi 0x10, %r3       0x 07 00 00 10
      or %r3, 4, %r3        0x 86 10 E0 04
186
SET Synthetic Instruction

set iconst, rd
      sethi %hi(iconst), rd
      or    rd, %lo(iconst), rd
   --or--
      sethi %hi(iconst), rd
   --or--
      or    %g0, iconst, rd
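The split performed by %hi and %lo is a 22-bit/10-bit division of the 32-bit constant. A minimal sketch in Python, assuming that split; `split_set` and `apply_set` are hypothetical helper names.

```python
MASK = 0xFFFFFFFF


def split_set(iconst):
    """Return (%hi(iconst), %lo(iconst)): the upper 22 and lower 10 bits."""
    iconst &= MASK
    return iconst >> 10, iconst & 0x3FF


def apply_set(iconst):
    """Model the two-instruction expansion of 'set iconst, rd'."""
    hi, lo = split_set(iconst)
    rd = (hi << 10) & MASK    # sethi %hi(iconst), rd  (low 10 bits cleared)
    rd |= lo                  # or    rd, %lo(iconst), rd
    return rd
```

For 0x4004 this gives %hi = 0x10 and %lo = 4. When %lo is zero the `or` can be dropped, and when the constant fits in a 13-bit signed immediate a single `or %g0, iconst, rd` is enough, which matches the three forms listed above.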
187
SPARC Assembly Language

Memory alignment: .align 4
– Used when mixing allocations of bytes, words, halfwords, etc.
and need word boundary alignment

Reserve bytes of space: .skip 20
– Useful for allocating large amounts of space (e.g.,
arrays)

Create a symbolic constant: .set mask, 0x0f
– Can now use the word “mask” anywhere we could use
the constant 0x0f previously
CSE360
188
SET and Symbolic Addresses

C-style example of pointer data type
      char x;           // object of type character
      char * ptr;       // pointer to character type
      ptr = &x;         // ptr has address of x (points to x)
      *ptr = 'a';       // store 'a' at address in ptr
Assembly language equivalent
        .data
x_m:    .byte 0           ! reserve character space; x_m = &x; [x_m] = x
        .align 4          ! align to word boundary
ptr_m:  .word 0           ! pointer variable; [ptr_m] = ptr
        .text
        set x_m, %r1      ! get address x_m into %r1
        set ptr_m, %r2    ! get address ptr_m into %r2
        st %r1, [%r2]     ! make [ptr_m] point to [x_m]
        set 'a', %r3      ! put character 'a' into r3
        set ptr_m, %r2    ! get address ptr_m into %r2
        ld [%r2], %r1     ! get address [ptr_m], i.e. x_m, into %r1
        stb %r3, [%r1]    ! store 'a' at address [ptr_m], i.e., ptr
[Figure: registers r1 = x_m, r2 = ptr_m, r3 = 'a'; memory at x_m holds 'a'; memory at ptr_m holds x_m, i.e., addr of x.]
CSE360
189
Bitwise Operations 1

Bit Manipulation Instructions
– Bitwise logical operations
and %rs1, %rs2, %rd
      10010011… (32 bits)
      01111001…
      x  y | x∧y
      0  0 |  0
      0  1 |  0
      1  0 |  0
      1  1 |  1
or %rs1, %rs2, %rd
      10010011… (32 bits)
      01111001…
      x  y | x+y
      0  0 |  0
      0  1 |  1
      1  0 |  1
      1  1 |  1
xor %rs1, %rs2, %rd
      10010011… (32 bits)
      01111001…
      x  y | x⊕y
      0  0 |  0
      0  1 |  1
      1  0 |  1
      1  1 |  0
190
Bitwise Operations 2

andn %rs1, %rs2, %rd
      10010011… (32 bits)
      01111001…
      x  y | x∧¬y
      0  0 |  0
      0  1 |  0
      1  0 |  1
      1  1 |  0
orn %rs1, %rs2, %rd
      10010011… (32 bits)
      01111001…
      x  y | x∨¬y
      0  0 |  1
      0  1 |  0
      1  0 |  1
      1  1 |  1
not %rs, %rd
      10010011… (32 bits)
      x | ¬x
      0 |  1
      1 |  0
Recall the cc operations, so andcc, orcc, etc. are available.
(However, there is no notcc; use xnorcc.)
191
Bitwise Operations 3




CSE360
For what kinds of things are these bit level operations used?
Recall the synthetic operation clr, and mov.
clr %r2          ≡   or %r0, %r0, %r2
mov %r2, %r3     ≡   or %r0, %r2, %r3
Masking operations: Want to select a bit or group of bits from
a set of 32. E.g., convert lower (or upper) to upper case:
‘a’ in binary is 01100001
‘A’ in binary is 01000001
All we need to do is “turn off” the bit in position 5.
and %r1, 0b11011111, %r1 will turn off that bit!
What if we subtract 32 (0b100000) from %r1?
What about converting upper to lower case?
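The masking idea can be tried directly on character codes. This is a sketch in Python, assuming ASCII letters only (the masks do the wrong thing for digits and punctuation); `to_upper` and `to_lower` are names chosen for this example.

```python
CASE_BIT = 0b00100000          # bit position 5 distinguishes 'a' from 'A'


def to_upper(ch):
    """Turn off bit 5, as 'and %r1, 0b11011111, %r1' would."""
    return chr(ord(ch) & 0b11011111)


def to_lower(ch):
    """Turn on bit 5, as 'or %r1, 0b00100000, %r1' would."""
    return chr(ord(ch) | CASE_BIT)
```

Unlike subtracting 32, the `and` mask is safe to apply to a letter that is already upper case: `to_upper("A")` is still "A", whereas `ord("A") - 32` is 33, the code for "!".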
192
Bitwise Operations 4
– Bitwise shifting operations

Shift logical left: sll %rs1, %rs2, %rd
%rs1: data to be shifted
%rs2: shift count
%rd: destination register
E.g.,
set 0xABCD1234, %r2
sll %r2, 3, %r3
%r2: 1010 1011 1100 1101 0001 0010 0011 0100
%r3: 0101 1110 0110 1000 1001 0001 1010 0000

CSE360
sll is equivalent to multiplying by a power of 2 (barring
overflow). (In the decimal system, what’s a shortcut for
multiplying by a power of ten?)
193
Bitwise Operations 5

Shift Logical Right: srl %rs1, %rs2, %rd
– Shifts right instead of left, inserting zeros.

Arithmetic shifts: propagate the sign bit when shifting right,
e.g., sra. (Left shift doesn't change.)
– Almost equivalent to dividing by a power of 2.

Rotating shifts: Bits that would have gone into the bit bucket
are shifted in instead. (E.g., rr, rl)
Rotate Right
Rotate Left
– Rotate not implemented in SPARC
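The shift variants differ only in what gets shifted in. A sketch in Python on 32-bit values follows; since SPARC has no rotate instruction, `rotl` shows one common way to build it from two shifts and an or. All function names are made up for this illustration.

```python
MASK = 0xFFFFFFFF


def sll(x, n):
    """Shift logical left: zeros enter on the right."""
    return (x << n) & MASK


def srl(x, n):
    """Shift logical right: zeros enter on the left."""
    return (x & MASK) >> n


def sra(x, n):
    """Shift arithmetic right: the sign bit is propagated."""
    x &= MASK
    if x & 0x80000000:
        x -= 1 << 32              # reinterpret as a negative number
    return (x >> n) & MASK


def rotl(x, n):
    """Rotate left, built from two shifts and an or."""
    n %= 32
    x &= MASK
    return ((x << n) | (x >> (32 - n))) & MASK
```

`sll(0xABCD1234, 3)` gives 0x5E6891A0, matching the example above. The "almost" in "almost equivalent to dividing" shows up with negative odd numbers: `sra` of -7 by 1 yields -4 (rounding toward minus infinity), while integer division that truncates would give -3.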
CSE360
194
Addressing Modes 1

Addressing Modes
– How do we specify operand values?



In a register, location is encoded in the instruction.
As a constant, immediate value is in the instruction.
In memory, operand is somewhere in memory, location may
only be known at runtime.
– Memory operands:

CSE360
Effective address: actual location of operand in memory. This
may be calculated implicitly (e.g., by a displacement in the
instruction) or may be calculated by the programmer in code.
195
Addressing Modes 2
– Summary of addressing modes:
Mode                Example                   Loc. Of Operand             Suitable for              SPARC?
Immediate           add %r1, 100, %r1         instruction                 Constants                 Yes
Register Direct     add %r1, %r2, %r1         %r2                         Integers, constants       Yes
Memory Direct       add %r1, [2000], %r2      mem[2000]                   Integers, constants       No
Memory Indirect     add %r1, [[2000]], %r2    mem[mem[2000]]              Pointers                  No
Register Indirect   ld [%r1], %r2             mem[%r1]                    Pointers                  Yes
Register Indexed    st %r1, [%r2+%r3]         mem[%r2+%r3]                Arrays                    Yes
Register Displaced  st %r1, [%r2+x]           mem[%r2+x]                  Records                   Yes
Post Increment      ld [%r1]+, %r2            mem[%r1], increment %r1     Arrays, strings, stacks   No
Pre Decrement       ld -[%r1], %r2            decrement %r1, mem[%r1]     Arrays, strings, stacks   No
196
Addressing Modes 3
– Memory Direct addressing

Entire address is in the instruction (not in SPARC).
E.g., accumulator machine: each instruction had an opcode and
a hard address in memory.
– Can’t be done on SPARC because an address is 32 bits, which is
the length of an instruction. No room for opcodes, etc. Can be
done in CISC because multi-word instructions are permitted.
– Memory Indirect addressing

CSE360
Pointer to operand is in memory. Instruction specifies location
of pointer. Requires three memory fetches (one each for
instruction, pointer, and data). Not in RISC machines because
instruction is too slow; such an instruction would cause its own
register interlock!
197
Addressing Modes 4

Register Indirect addressing
– Register has address of operand (a pointer). Instruction
specifies register number, effective address is contents
of register.
        .data
n_m:    .word 5             ! initialize n to 5
        .text
        set n_m, %r1        ! %r1 has n_m, pointer to n
        ld [%r1], %r3       ! fetch n into %r3
– Simulating Register Indirect addressing on SPARC


CSE360
SPARC doesn't truly have register indirect addressing.
Assembler converts ‘st %r2, [%r1]’ into ‘st %r2, [%r1+%r0]’
198
Addressing Modes 5

Ex.: sum up array of integers:
        .data
n_m:    .word 5             ! Size of array
a_m:    .word 4,2,5,8,3     ! 5 word array
sum_m:  .word 0             ! Sum of elements
b_m:    .skip 5*4           ! another 5 word array
        .text
        clr %r2             ! r2 will hold sum
        set n_m, %r3        ! r3 points to n
        ld [%r3], %r3       ! r3 gets array size
        set a_m, %r4        ! r4 points to array a
loop:   ld [%r4], %r5       ! Load element of a into r5
        add %r5, %r2, %r2   ! sum = sum + element
        add %r4, 4, %r4     ! Incr ptr by word size
        subcc %r3, 1, %r3   ! Decrement counter
        bg loop             ! Loop until count = 0
        nop                 ! Branch delay slot
        set sum_m, %r1      ! r1 points to sum
        st %r2, [%r1]       ! Store sum
        ta 0                ! done
[Figure: memory layout (n_m = 5; a_m, a_m+4, a_m+8, a_m+12, a_m+16 = 4, 2, 5, 8, 3; sum_m) and a trace table of r2, r3, r4, r5 per loop iteration, with r3 counting 5, 4, 3, 2, 1 and r4 stepping a_m, a_m+4, a_m+8, a_m+12, a_m+16.]
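The pointer-stepping loop can be mirrored in a high-level language by treating memory as a byte array and advancing an offset by the word size. This is a sketch in Python using the standard `struct` module with big-endian 32-bit words; `sum_array` and the placement of the array at offset 0 are assumptions for illustration.

```python
import struct

WORD = 4
# Array a: five big-endian 32-bit words, starting at offset 0.
MEMORY = bytearray(struct.pack(">5i", 4, 2, 5, 8, 3))


def sum_array(memory, base, count):
    """Sum 'count' words starting at byte offset 'base'.

    Like the assembly loop, the test is at the bottom, so the body
    always runs at least once.
    """
    total = 0                     # clr %r2
    ptr = base                    # set a_m, %r4
    while True:
        (element,) = struct.unpack_from(">i", memory, ptr)   # ld [%r4], %r5
        total += element          # add %r5, %r2, %r2
        ptr += WORD               # add %r4, 4, %r4
        count -= 1                # subcc %r3, 1, %r3
        if count <= 0:            # bg loop falls through when count <= 0
            break
    return total
```

`sum_array(MEMORY, 0, 5)` visits offsets 0, 4, 8, 12, and 16 and returns 22.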
199
Register Indexed & Displaced
Recall these Assembler directives
 Reserve bytes of space: .skip 20
 Create a symbolic constant: .set offset,
0x16
Register Indexed and Displaced addressing modes
help us work with pointers, arrays, and records in
assembly language.
CSE360
200
Addressing Modes 7
– Register Indexed addressing


Suitable for accessing successive elements of the same type in a data structure.
Ex.: Swap elements A[i] and A[k] in array
        .data
A:      .skip 24*4          ! reserve array[0..23] of int
        .text               ! assume i is in %r2 and k is in %r3
        set A, %r4          ! beginning of array ptr.
        sll %r2, 2, %r2     ! “multiply” i by 4
        sll %r3, 2, %r3     ! “multiply” k by 4
        ld [%r2+%r4], %r7   ! r7 <- a[i]
        ld [%r3+%r4], %r8   ! r8 <- a[k]
        st %r8, [%r2+%r4]   ! a[i] <- r8
        st %r7, [%r3+%r4]   ! a[k] <- r7
[Figure: array A at A, A+4, A+8, A+12 with r4 = A; r2 = 001 and r3 = 0010 become 100 and 1000 after sll; registers r7 and r8 hold the loaded elements. Effective address calculations!]
201
Addressing Modes 8
 Array
mapping functions: used by
compilers to determine addresses of array
elements.
– Must know upper bound, lower bound, and size
of elements of array.


Total storage = (upper - lower + 1)*element_size
Address offset for element at index
k = (k - lower)*element_size
Address (byte) offset for A[3] = (3-0)*4 = 12
This is for 1 dimensional arrays only!
CSE360
202
Addressing Modes 9

1D array mapping functions: Want an array of n
elements, each element is 4 bytes in size, array
starts at address arr.
–
–
–
–
Total storage is 4n bytes
First element is at arr+0
Last element is at arr+4(n-1)
kth (k can range from 0…n-1) element is at arr+4k. Array
uses zero-based indexing.
arr+0
k=0
arr+4
k=1
arr+8
k=2
arr+12
arr+16
arr+20
k=3
k=4
k=5
array of 6 elements, 4 bytes each
CSE360
203
Addressing Modes 10

2D array mapping functions: must linearize the 2D concept; e.g., map the 2D structure into 1D memory.
         0     1     2     3     4
   0    0,0   0,1   0,2   0,3   0,4
   1    1,0   1,1   1,2   1,3   1,4
   2    2,0   2,1   2,2   2,3   2,4
   3 Rows (0...2), 5 Columns (0...4)
– Convert into 1D array in memory
   0,0 | 0,1 | 0,2 | 0,3 | 0,4 | 1,0 | 1,1 | ..... | 2,3 | 2,4
204
Addressing Modes 11

2 ways to convert to 1D
– Row major order (Pascal, C, Modula-2) stores first by rows, then by columns. E.g.,
   0,0 | 0,1 | 0,2 | 0,3 | 0,4 | 1,0 | 1,1 | ..... | 2,3 | 2,4
– Column major order (FORTRAN) stores first by columns then by rows. E.g.,
   0,0 | 1,0 | 2,0 | 0,1 | 1,1 | 2,1 | 0,2 | ..... | 1,4 | 2,4
205
Addressing Modes


Row major 2D array mapping function:
Given an array starting at address arr, that is x rows by y columns, each element is m bytes in size, and indices start at zero, then element (i, j) may be found at location:
   arr + (y × i + j) × m
Offset to A (0,2) = (5 * 0 + 2) * element size
         0     1     2     3     4
   0    0,0   0,1   0,2   0,3   0,4
   1    1,0   1,1   1,2   1,3   1,4
   2    2,0   2,1   2,2   2,3   2,4
   3 Rows (0...2), 5 Columns (0...4)
   0,0 | 0,1 | 0,2 | 0,3 | 0,4 | 1,0 | 1,1 | ..... | 2,3 | 2,4
206
Addressing Modes 12
– 3D array mapping function: natural extension of 2D
function. Store by row, then column, then depth.
[Figure: two 3-row by 5-column planes (depth 0 and depth 1) labeled with element offsets, e.g., (0,0,0) at +0, (0,0,1) at +1, (0,1,0) at +2, (0,1,1) at +3, (0,4,1) at +9, (1,0,0) at +10.]
3 Rows, 5 Columns, 2 Depth
– Array starting at arr with x rows, y columns, depth z, m
element size. Element (i, j, k) is found at location:
arr + (z(yi + j) + k)m
CSE360
207
Addressing Modes 15
– Displacement Addressing

Suitable for accessing the individual fields of record data
structures. Each field can be of a different type.
Logical view of a record
      Name    20 Characters
      Age     Integer
      DOB     Integer
Use .set directive to establish offsets to fields within records. Then use displacement addressing to access those fields.
Actual layout of record in memory
      person+0     Name    20 bytes
      person+20    Age     4 bytes
      person+24    DOB     4 bytes
208
Addressing Modes 16

Ex.: Add 1 to the age field in a person record
        .data
        .set name, 0          ! offset to name field
        .set age, 20          ! offset to age field
        .set dob, 24          ! offset to date of birth
person: .skip 28              ! size of a person record
        .text
        ....
        set person, %r1       ! get addr of person record
        ld [%r1+age], %r2     ! get the age of the person
        add %r2, 1, %r2       ! increment age by 1
        st %r2, [%r1+age]     ! store back to record

CSE360
Problem: alignment in memory. May have to waste some
space in the person record in order to have the integer fields
align on a word boundary.
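Displacement addressing on a record amounts to "base address plus constant field offset". The sketch below models the person record in Python as a 28-byte buffer with the same offsets, using the standard `struct` module and big-endian 32-bit integers; `increment_age` is a made-up helper name.

```python
import struct

NAME, AGE, DOB = 0, 20, 24        # field offsets, as set with .set
RECORD_SIZE = 28                  # .skip 28


def increment_age(record):
    """Load the age field at record+AGE, add 1, and store it back."""
    (age,) = struct.unpack_from(">i", record, AGE)    # ld [%r1+age], %r2
    struct.pack_into(">i", record, AGE, age + 1)      # st %r2, [%r1+age]


def make_person(name, age, dob):
    """Build a record: 20-byte name field followed by two integers."""
    record = bytearray(RECORD_SIZE)
    struct.pack_into(">20sii", record, 0, name, age, dob)
    return record
```

Here the 20-byte name field happens to end on a word boundary, so the integer fields are aligned without padding; a 21-character name field would need 3 bytes of padding before `age`.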
209
Addressing Modes 17
– Auto-increment and Auto-decrement addressing


CSE360
SPARC does not support these modes. They may be simulated
using register indirect addressing followed by an add or
subtract of the size of the element on that register.
Useful for traversing arrays forward (auto-increment) and
backward (auto-decrement). Also useful for stacks and queues
of data elements.
210
Subroutines 1

Subroutine (also function, method, procedure, or
subprogram)
– a portion of code within a larger program, which performs a specific task
and can be relatively independent of the remaining code.

Advantages of subroutines
–
–
–
–
–
reducing the duplication of code in a program
enabling reuse of code across multiple programs
decomposing complex problems into simpler pieces
improving readability of a program
hiding or regulating part of the program
Requires little hardware support, mostly protocols
and conventions to handle parameters.
CSE360
211
Subroutines 2
 Terminology
– Caller: the code (which could be a subroutine
itself) which invokes the subroutine of interest
– Callee: the subroutine being invoked by the
caller
– Function: subroutine that returns one or more
values back to the caller and exactly one of
these values is distinguished as the return value
– Return value: the distinguished value returned
by a function
CSE360
212
Subroutines 3

Terminology (continued)
– Procedure: a subroutine that may return values to the
caller (through the subroutine’s parameter(s)), but none
of these values is distinguished as the return value
– Return address: address of the subroutine call
instruction
– Parameters: information passed to/from a subroutine
(a.k.a. arguments)
– Subroutine linkage: a protocol for passing parameters
between the caller and the callee
CSE360
213
Subroutines 4

Calling a subroutine
– Assembly language syntax for calling a subroutine
call label
nop
– Must change the program counter (as in a branch
instruction) however, we must also keep track of where
to resume execution after the subroutine finishes. Call
instruction handles this atomically (i.e., without
interruption) by:
%r15 ← PC
(PC ← nPC)
nPC ← label
CSE360
214
Subroutines 4

Returning from a subroutine
– Assembly language syntax for returning from a
subroutine
retl
nop

Again, must change the program counter to return to
an instruction after the one that called the subroutine.
The address of the instruction that called it was saved
in %r15, and we must skip over the branch delay slot
as well. So, this is accomplished by:
nPC ← %r15+8
CSE360
215
Subroutines 5

Parameter passing: 2 approaches
– Register based linkage: pass parameters solely through
registers. Has the advantage of speed, but can only pass a
few parameters, and it won’t support nested subroutine calls.
Such a subroutine is called a leaf subroutine.
– Stack based linkage: pass parameters through the run-time
stack. Not as fast, but can pass more parameters and have
nested subroutine calls (including recursion).
CSE360
216
Register-based Linkage 1
– Subroutine linkage:
   Startup Sequence: load parameters and return address into registers, branch to subroutine.
   Prologue: if non-leaf procedure then save return address to memory, save registers used by callee.
   Epilogue: place return parameters into registers, restore registers saved in prologue, restore saved return address, return.
   Cleanup Sequence: work with returned values
[Figure: the caller runs its Startup Sequence and calls the callee; the callee runs Prologue, Body, and Epilogue, then returns (retl); the caller runs its Cleanup Sequence.]
217
Register-based Linkage 2
– Example: Print subroutine.
        .text
main:   set 1, %r1            ! Initialize r1 and r2
        set 3, %r2
        mov %r1, %r8          ! Print %r1
        call print
        nop
        mov %r2, %r8          ! Print %r2
        call print
        nop
        add %r1, %r2, %r8     ! Do our calculation
        call print            ! Print the result (expect '4')
        nop
        ta 0
print:  set '0', %r1          ! Ascii value of zero
        or %r8, %r1, %r2      ! Treat r8 as parameter
        mov %r2, %r8          ! Move into output register
        ta 1                  ! Output character
        mov '\n', %r8         ! Output end of line (newline)
        ta 1
        retl                  ! Return
        nop
What’s wrong with the above code?
218
Register-based Linkage 3
– Which registers can leaf subroutines change?

Convention for optimized leaf procedures:
Register(s)        Use                  Mentionable?
%r0                Zero                 Yes
%r1                Temporary            Yes
%r2-%r7            Caller’s variables   No
%r8                Return value         Yes
%r8-%r13           Parameters           Yes
%r14               Stack pointer        No
%r15               Return address       Yes, but preserve
%r30               Frame pointer        No
%r16-%r29, %r31    Caller’s variables   No
The subroutine must not use the value in any other register except to
save it to memory somewhere and restore it before returning to the
caller.
Problem: how can a subroutine call another subroutine? How can a
subroutine call itself?
219
Register-based Linkage 4
– Example: procedure to print linked list of ints.
[Figure: head → 5 → 7 → 4 → 1 → nil]
        .data
        .set dta, 0             ! offset in record to data
        .set ptr, 4             ! offset in record to next pointer
head:   .word 0
        .text
main:   . . . .                 ! does all init and allocation of list
        set head, %r8           ! prepare parameter to traverse proc
        ld [%r8], %r8           ! follow head pointer to first node
        call trav               ! call subroutine
        nop                     ! branch delay
        . . . .
trav:   mov %r8, %r1            ! copy pointer to %r1
loop:   cmp %r1, 0              ! check for null pointer
        be done                 ! null pointer means we are done
        nop                     ! branch delay
        ld [%r1+dta], %r8       ! follow pointer and get data field
        ta 4                    ! print data field
        ld [%r1+ptr], %r1       ! get pointer to next record
        ba loop
        nop                     ! branch delay
done:   retl
        nop
CSE360
220
Parameter Passing 1
– Review of parameter passing mechanisms:




CSE360
Pass by value copy: parameters to subroutine are copies upon
which the subroutine acts.
Pass by result copy: parameters are copies of results produced
by the subroutine.
Pass by reference copy: parameters to subroutine are (copies
of) addresses of values upon which the subroutine acts. Callee
is responsible for saving each result to memory at the location
referred to by the appropriate parameter.
Hybrid: some parameters passed by value copy, some by result
copy, and/or some by reference copy. Callee is responsible for
saving results for reference parameters.
221
Parameter Passing 2
– Parameter passing notes:


Array or record parameters typically are passed by reference
copy (efficiency reasons). Primitive data types may be passed
either way.
Conventions among languages allows any language to call
functions in any other language:
– Pascal: VAR parameters are passed by reference copy; all others
are passed by value copy.
– C: all parameters are passed by value copy. Must explicitly pass
a pointer if you want a reference parameter.
– C++: like Pascal, can pass by value or reference copy.
– FORTRAN: all things passed by reference copy (even
constants).
– ADA: pass by value/result copy.
CSE360
222
Parameter Passing 3
        .text
! Example 10.1 of Lab Manual
! pr_str – print a null terminated string
! Parameters:  %r8 – pointer to string (initially)
!
! Temporaries: %r8 – the character to be printed
!              %r9 – pointer to string
!
pr_str: mov %r8, %r9        ! we need %r8 for the “ta 1” below
pr_lp:  ldub [%r9], %r8     ! load character
        cmp %r8, 0          ! check for null
        be pr_dn
        nop
        ta 1                ! print character
        ba pr_lp
        inc %r9             ! increment the pointer (in branch delay slot)
pr_dn:  retl
        nop
CSE360
223
Parameter Passing 4

Summary from text (p. 220)
– Pass by value copy: For small “in” parameters. Subroutines
cannot alter the originals whose copies are passed as parameters.
– Pass by value/result copy: For small “in/out” parameters.
Caller’s cleanup sequence stores values of any “in/out”
parameters.
– Pass by reference copy: for “in/out” parameters of all sizes, and
large “in” parameters. “Out” values are provided by changing
memory at those addresses. (Note: pass by reference copy is
passing an address by value copy.)
CSE360
224
Parameter Passing 5
– Write Sparc code for the caller and callee for the following subroutine using register based parameter passing
! global_function Integer subchr (A, B, C)
!   Substitutes character C for each B in string [A],
!   and returns count of changes.
!                                     // In comments, "[A+index]" is denoted by "ch".
!       index = 0
!       count = 0
! LOOP: if [A+index]=0 go to END      // while (ch != 0) {
!       if [A+index]≠B go to INC      //   if (ch == B) {
!       [A+index]=C                   //     ch = C;
!       count=count+1                 //     count++; }
! INC:  index=index+1                 //   index++;
!       go to LOOP                    // }
! END:
Assume
        .data                   ! data section
C_m:    .byte 'I'               ! parameter C
B_m:    .byte 'i'               ! parameter B
A_m:    .asciz "i will tip"     ! parameter A
        .align 4
R_m:    .word 0                 ! for storing result count
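Before writing the SPARC version, it can help to have a reference implementation of the pseudo-code to compare results against. This is a sketch in Python that follows the LOOP/INC/END structure line by line; it models the string as a mutable byte array with a terminating zero, which is an assumption standing in for the `.asciz` data.

```python
def subchr(a, b, c):
    """Replace each byte equal to b with c in the null-terminated
    byte array a (modified in place); return the number of changes."""
    index = 0
    count = 0
    while a[index] != 0:          # LOOP: if [A+index]=0 go to END
        if a[index] == b:         #       if [A+index]≠B go to INC
            a[index] = c          #       [A+index]=C
            count += 1            #       count=count+1
        index += 1                # INC:  index=index+1
    return count                  # END:
```

For the data given above, `subchr(bytearray(b"i will tip\0"), ord("i"), ord("I"))` returns 3 and leaves the buffer holding "I wIll tIp".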
225
Stack-based Linkage 1

Stack based linkage
– Advantages





Permits subroutines to call others.
Allows a larger number of parameters to be passed.
Permits records and arrays to be passed by value copy.
Saving of registers by callee is “built-in”.
A way for callee to reserve memory for other uses is “built-in”, too.
– Disadvantages


Slower than register based
More complex protocol
– Why a stack?

CSE360
Subroutine calls and returns happen in a last-in first-out order (LIFO).
Also known as a runtime stack, parameter stack, or subroutine stack.
226
Stack-based Linkage 2

Items “saved” on the stack in one activation record
– Parameters to the subroutine
– Old values of registers used in the subroutine
– Local memory variables used in subroutine
– Return value and return address
Say A() calls B(), B() calls C(), and C() calls A()
Runtime Stack (top first)
      2nd stack frame for A
      1st stack frame for C
      1st stack frame for B
      1st stack frame for A
Expanded View (of one frame, top first)
      Local variables
      Saved general purpose registers
      Return addresses
      Return values
      Parameters
227
Stack-based Linkage 3
– Stack based linkage parameter passing convention
   Startup sequence:
   – Push parameters
   – Push space for return value
   Prologue
   – Push registers that are changed (including return address)
   – Allocate space for local variables
   Epilogue
   – Restore general purpose registers
   – Free local variable space
   – Use return address to return
   Cleanup Sequence
   – Pop and save returned values
   – Pop parameters
[Figure: the caller runs its Startup Sequence and calls the callee; the callee runs Prologue, Body, and Epilogue, then returns (retl); the caller runs its Cleanup Sequence.]
CSE360
228
Stack-based Linkage 4
– Stack based parameter passing example:

Register %r14 ≡ %sp ≡ stack pointer
– Invariant: Always indicates the top of the stack (it has the
address in memory of the last item on stack, usually a word).
– Moved when items are “pushed” onto the stack.
– Due to interruptions (system interrupts (I/O) and exceptions),
values stored above %sp (at addresses less than %sp) can change
at any time! Hence, any access above %sp is unsafe!

Register %r30 ≡ %fp ≡ frame pointer
– Indicates the previous stack pointer. Activation record is from
(some subroutine-specific number of words before) the %fp to
the %sp.
– Invariant: %fp is constant within a subroutine (after prologue).
CSE360
229
Stack-based Linkage 5
– Stack based parameter passing example:

Want to implement the following subroutine (also a caller):
! global_function Integer subchr (A, B, C)
!   Substitutes character C for all B in string A,
!   and returns count of changes.
!                                     // In comments, "*(A+index)" is denoted by "ch".
!       index = 0
!       count = 0
! LOOP: if *(A+index)=0 go to END     // while (ch != 0) {
!       if *(A+index)≠B go to INC     //   if (ch == B) {
!       *(A+index)=C                  //     ch = C;
!       count=count+1                 //     count++; }
! INC:  index=index+1                 //   index++;
!       go to LOOP                    // }
! END:
        .data                   ! data section
C_m:    .byte 'I'               ! parameter C
B_m:    .byte 'i'               ! parameter B
A_m:    .asciz "i will tip"     ! parameter A
        .align 4
R_m:    .word 0                 ! for storing result count
230
Stack-based Linkage 6
        .data                   ! data section
C_m:    .word 'I'               ! parameter C
B_m:    .word 'i'               ! parameter B
A_m:    .asciz "i will tip"     ! parameter A
        .align 4                ! align to word address
stack:  .skip 250*4             ! allocate 250 word stack
bstak:                          ! point to bottom of stack
R_m:    .word 0                 ! reserve for count
        .text
! Program’s one-time initialization
start:  set bstak, %sp          ! set initial stack ptr
        mov %sp, %fp            ! set initial frame ptr
! STARTUP SEQUENCE to call subchr()
        sub %sp, 16, %sp        ! move stack ptr
        set A_m, %r1            ! A is passed by reference
        st %r1, [%sp+4]         ! push address on stack
        set B_m, %r1            ! B is passed by value
        ld [%r1], %r1           ! get value of B
        st %r1, [%sp+8]         ! push parameter B on stack
        set C_m, %r1            ! C is passed by value
        ld [%r1], %r1           ! get value of C
        st %r1, [%sp+12]        ! push parameter C on stack
! SUBROUTINE CALL
        call subchr             ! make subroutine call
        nop                     ! branch delay slot
! CLEANUP SEQUENCE
        ld [%sp], %r1           ! pop return value off stack
        add %sp, 16, %sp        ! pop stack
        set R_m, %r2            ! get address of R
        st %r1, [%r2]           ! store R
        . . .                   ! the rest of the program
[Figure: stack after the startup sequence, from %sp toward higher addresses: return value, addr (a), b, c; %fp points just past c.]
231
Stack-based Linkage 7
! SUBROUTINE PROLOGUE
subchr: sub %sp, 32, %sp        ! open 8 words on stack
        st %fp, [%sp+28]        ! Save old frame pointer
        add %sp, 32, %fp        ! old sp is new fp
        st %r15, [%fp-8]        ! save return address
        st %r8, [%fp-12]        ! Save gen. register
        …                       ! Save r9-r13, omitted
! SUBROUTINE BODY
ld_reg: ld [%fp+4], %r8         ! "pop" (load) addr of A
        ld [%fp+8], %r9         ! "pop" (load) value of B
        ld [%fp+12], %r10       ! "pop" (load) value of C
        clr %r12                ! count
        clr %r13                ! index
loop:   ldub [%r8+%r13], %r11   ! load a string chr
        cmp %r11, 0x0           ! is chr=null?
        be done                 ! then go to done
        cmp %r11, %r9           ! is chr<>B? (branch delay)
        bne inc                 ! then go to inc
        nop                     ! branch delay slot
        stb %r10, [%r8+%r13]    ! change chr to C
        add %r12, 1, %r12       ! increment count
inc:    add %r13, 1, %r13       ! increment index
        ba loop                 ! do next chr
        nop                     ! branch delay slot
done:   st %r12, [%fp+0]        ! "push" (store) count on stack
! EPILOGUE
        …                       ! Restore r9-r13, omitted
        ld [%fp-12], %r8        ! Restore r8
        ld [%fp-8], %r15        ! get saved return address
        ld [%fp-4], %fp         ! Get old value of frame ptr
        add %sp, 32, %sp        ! Restore stack pointer
        retl                    ! return to caller
        nop                     ! branch delay slot

Stack inside subchr:
%sp -> ...
       %r9
       %r8
       return addr
       old frame ptr
%fp -> Return value
       addr (a)
       b
       c
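The loop that subchr implements can be checked with a short Python model. This is only a sketch of the algorithm from the pseudocode, not of the SPARC stack linkage; the function name and the use of a bytearray for the NUL-terminated string are assumptions made for illustration.

```python
def subchr(a: bytearray, b: int, c: int) -> int:
    """Replace every byte equal to b with c in the NUL-terminated buffer a.

    Returns the number of replacements, like the count in the pseudocode.
    """
    index = 0
    count = 0
    while a[index] != 0:          # LOOP: stop at the terminating NUL
        if a[index] == b:         # only matching characters are changed
            a[index] = c
            count += 1
        index += 1                # INC
    return count


# Same data as the slides: A = "i will tip", B = 'i', C = 'I'
text = bytearray(b"i will tip\0")
changes = subchr(text, ord("i"), ord("I"))
print(changes, text[:-1].decode())
```

Because A is passed by reference, the caller's buffer is modified in place, while B and C are plain values.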
Stack-based Linkage 8

General Guidelines
– Keep Startups, Cleanups, Prologues, and Epilogues
standard (but not necessarily identical); easy to cut,
paste, and modify.
– Caller: leave space for return value on the TOP of the
stack.
– Callee: always save and restore locally used registers.
– Pass data structures and arrays by reference, all others
by value (efficiency).
Our Fourth Example Architecture
Motorola M68HC11
 Called “HC11” for short
 Used in ECE 567, a course required of CSE
majors
 References:

– Data Acquisition and Process Control with the
M68HC11 Microcontroller, 2nd Ed., by F. F. Driscoll,
R. F. Coughlin, and R. S. Villanucci, Prentice-Hall,
2000.
– M68HC11 Processor Manual, on Carmen
Another Reference
Late in an academic term (such as now), you can
hope to access on-line lecture notes from the
Electrical and Computer Engineering course,
ECE 265.
 Visit http://www.ece.osu.edu
 Under “Academic Program”, click on the link
“ECE Course Listings”.
 Find 265 and click on the link “Syllabus of this
quarter”.

HC11 compared with Sparc (1)
HC11                                      Sparc
CISC                                      RISC, Load/Store
Instruction encoding lengths vary         Instruction encoding lengths constant
(8 to 32 bits)                            (32 bits)
About 316 instructions                    About 175 instructions
4 16-bit user registers, one of which     32 32-bit user integer registers
is divided into two 8-bit registers
HC11 compared with Sparc (2)
HC11                                      Sparc
8-bit data bus                            32-bit data bus
16-bit address bus                        32-bit address bus
8-bit addressable                         8-bit addressable
Instruction execution not overlapped      Instruction execution overlapped in a pipeline
HC11 compared with Sparc (3)
A Strange Fact: The HC11 architecture “allows
accessing an operand from an external memory
location with no execution-time penalty.”
[p. 27, M68HC11 Processor Manual]
 Reason: The HC11 requirements state that the
CPU cycle must be kept long enough to
accommodate a memory access within one cycle.
This seeming miracle is accomplished by keeping
processor speed slow enough.

HC11 Programmer’s Model (1)
7     Accumulator A     0 | 7     Accumulator B     0
15               Accumulator D                      0
X Index Register
Y Index Register
Stack Pointer (SP)
Program Counter (PC)
HC11 Programmer’s Model (2)
Condition Code Register (CCR)
Bit:   7  6  5  4  3  2  1  0
Flag:  S  X  H  I  N  Z  V  C
C  Carry/Borrow
V  Overflow
Z  Zero
N  Negative
I  I Interrupt Mask
H  Half-Carry
X  X Interrupt Mask
S  Stop
HC11 Assembly Language Format
(1)
Like Sparc, it is line-oriented.
 A line may:

– Be blank (containing no printable characters),
– Be a comment line, the first printable character being
either a semicolon (‘;’) or an asterisk (‘*’), or
– Have the following format (“[] means an optional
field”):
[Label] Operation [Operand field] [Comment field]
HC11 Assembly Language Format
(2)

Label:
– begins in column 1, ending either with a space or a
colon (‘:’)
– Contains 1 to 15 characters
– Case sensitive
– The first character may not be a decimal digit (0-9)
– Characters may be upper- or lowercase letters, digits 0-9, period (‘.’), dollar sign (‘$’), or underscore (‘_’)
HC11 Assembly Language Format
(3)

Operation:
– Cannot begin in column 1
– Contains:




Instruction mnemonic,
Assembler directive, or
Macro call (we haven’t studied macro expansion in this course)
Operand field:
– Terminated by a space or tab character,
– So multiple operands are separated by commas (‘,’)
without using any spaces or tabs
HC11 Assembly Language Format
(4)

Comment field:
– Begins with the first space character following the
operand field (or following the operation, if there is no
operand field)
– So no special printable character is required to begin a
comment field
– But it appears to be conventional to begin a comment
field with a semicolon (‘;’)
Prefixes for Numeric Constants
Encoding       HC11         Sparc
Decimal        No symbol    No symbol
Hexadecimal    $            0x
Octal          @            0
Binary         %            0b
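As a small illustration of the HC11 column of the prefix table, the following Python sketch converts a constant written with an HC11 prefix into an integer. The function name is hypothetical, and the sketch handles only the four numeric encodings in the table (not character constants or labels).

```python
def parse_hc11_constant(text: str) -> int:
    """Convert an HC11-style numeric constant such as "$1C" or "%11100" to an int."""
    bases = {"$": 16, "@": 8, "%": 2}   # prefix -> radix, from the table
    if text[0] in bases:
        return int(text[1:], bases[text[0]])
    return int(text, 10)                # no symbol means decimal


for constant in ("10", "$1C", "@17", "%11100"):
    print(constant, "=", parse_hc11_constant(constant))
```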
Assembler Directives (1)
Meaning                            HC11    Sparc
Set location counter (origin)      ORG     .data or .text
End of source                      END     Doesn’t have
Equate symbol to a value           EQU     .set
Form constant byte                 FCB     .byte
Assembler Directives (2)
Meaning                            HC11    Sparc
Form double byte                   FDB     .half
Form character string constant     FCC     .ascii
Reserve memory byte or bytes       RMB     .skip
HC11 Addressing Modes
Immediate (IMM)
 Extended (EXT)
 Direct (DIR)
 Inherent (INH)
 Relative (REL)
 Indexed (INDX, INDY)

Immediate (IMM)
Assembler interprets the # symbol to mean the
immediate addressing mode
Examples
– LDAA #10
– LDAA #$1C
– LDAA #@17
– LDAA #%11100
– LDAA #’C’
– LDAA #LABEL
Extended (EXT)
Lack of # symbol indicates extended or direct
addressing mode. These are forms of memory
direct addressing, like SAM.
 “Extended” means full 16-bit address, whereas
“Direct” means directly to a low address, specified
using only the least significant 8 bits of the
address.
 Examples

– LDAA $2025
– LDAA LABEL
Direct (DIR)

Examples
– LDAA $C2
– LDAA LABEL
Inherent (INH)
All operands are implicit (i.e., inherent in the
instruction)
 Examples: ABA, SBA, DAA
 ABA means add the contents of register B to the
contents of A, placing the sum in A (A + B -> A)
SBA means A – B -> A
 DAA means to adjust the sum that got placed in A
by the previous instruction to the correct BCD
result; e.g., $09 + $26 yields $2F in A, then DAA
changes this to $35.
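The DAA example can be reproduced with a simplified Python model of decimal adjustment after an addition. The function name is hypothetical, and the parameters half_carry and carry stand in for the H and C bits left by the preceding add; treat this as a sketch of the idea rather than a cycle-accurate description of the HC11.

```python
def daa(a: int, half_carry: bool = False, carry: bool = False):
    """Adjust the binary sum in a (0..255) to packed BCD.

    Returns (adjusted value, carry out).
    """
    adjust = 0
    if half_carry or (a & 0x0F) > 9:   # low digit is not a valid BCD digit
        adjust |= 0x06
    if carry or a > 0x99:              # high digit overflowed past 9
        adjust |= 0x60
    return (a + adjust) & 0xFF, carry or a > 0x99


# $09 + $26 gives $2F in binary; DAA turns it into $35 (9 + 26 = 35).
binary_sum = (0x09 + 0x26) & 0xFF
adjusted, carry_out = daa(binary_sum)
print(hex(binary_sum), "->", hex(adjusted))
```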

Relative (REL)
Used only for branch instructions
Relative to the address of the following instruction
(the new value of the PC)
Signed offset from -128 to +127 bytes
Examples
– BGE -18
– BHS 27
– BGT LABEL
Indexed (INDX, INDY)
Uses the contents of either the X or Y register and
adds it to a (positive, unsigned) offset contained in
the instruction to calculate the effective address
 Example

– LDAA 4,X
Interrupts

When an interrupt is acknowledged, the CPU’s
hardware saves the registers’ contents on the
stack. An interrupt service routine ends with an
RTI instruction. This instruction automatically
restores the CPU register values from the copies
on the stack.
Condition Code Register (CCR)
It’s reasonably safe to say that every instruction
that changes a register (A, B, D, X, Y, SP) affects
the CCR appropriately. Unlike Sparc, there are no
arithmetic instructions that do not set condition
codes.
 There do exist instructions that compare a register
to a memory location by subtracting the memory
contents from the register and throwing the result
away, but setting the CCR (CMPA, CMPB, CPD,
CPX, CPY).

HC11 Condition Code Register

The H bit is turned on by an 8-bit addition
operation when there is a carry from the lower-order
nibble into the higher-order nibble, that is to
say, from bit 3 into bit 4.
     1 000     (carries)
  0000 1111
+ 0000 1000
-----------
  0001 0111
HC11 Condition Code Register

The Z bit is turned on when the result is zero.
0000 0000

The N bit is turned on when the result is negative
according to the appropriately-sized 2's
complement encoding scheme.
1010 1010
HC11 Condition Code Register

The V bit is turned on when, under the
appropriately-sized 2's complement interpretation
of the two source operands and the result, the
result is wrong.
              2’s Comp     Simple Binary
  0100           +4             +4
+ 1100           -4            +12
------         ------         ------
  0000            0             0??
2’s Comp result is correct, so V-bit is off.
Simple Binary result is incorrect, so C-bit is on.
HC11 Condition Code Register

The C bit is turned on when, under the simple
binary interpretation of the two source operands
and the result, the result is wrong.
              2’s Comp     Simple Binary
  0111           +7             +7
+ 0111           +7             +7
------         ------         ------
  1110          -2??            14
2’s Comp result is incorrect, so V-bit is on.
Simple Binary result is correct, so C-bit is off.
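The flag definitions on the last few slides can be checked with a short Python sketch of an n-bit adder. The function name and the dictionary result are assumptions for illustration; the H flag is reported only for 8-bit additions, matching the slide's definition.

```python
def add_with_flags(a: int, b: int, bits: int = 8) -> dict:
    """Add two unsigned bit patterns and report HC11-style condition codes."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    total = a + b
    result = total & mask
    return {
        "result": result,
        # H: carry from bit 3 into bit 4 (defined here for 8-bit adds only)
        "H": ((a & 0x0F) + (b & 0x0F)) > 0x0F if bits == 8 else None,
        "N": bool(result & sign),          # negative in 2's complement
        "Z": result == 0,
        # V: operands have the same sign but the result's sign differs
        "V": bool(~(a ^ b) & (a ^ result) & sign),
        "C": total > mask,                 # carry out of the top bit
    }


print(add_with_flags(0b00001111, 0b00001000))      # H example
print(add_with_flags(0b0100, 0b1100, bits=4))      # V off, C on
print(add_with_flags(0b0111, 0b0111, bits=4))      # V on, C off
```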
Example HC11 Program

Problem: Produce the following waveforms on the
three least significant bits (LSBs) of parallel 8-bit
output Port B (mapped to $1004), where we name
the bits X, Y, and Z in increasing order of
significance (X is bit 0; Y is bit 1; Z is bit 2).
[Figure: waveforms for X (10 ms), Y (20 ms), and Z (15 ms)]
Example Source File, p. 1
STACK:  EQU $00FF         ; set stack pointer
PORTB:  EQU $1004         ; set address of Port B
        ORG 0
DELAY1: FCB 10            ; set the waveform times
DELAY2: FCB 20            ; for X, Y, and Z
DELAY3: FCB 15
Example Source File, p. 2
        ORG $E000         ; program starts at $E000
MAIN:   LDS #STACK        ; initialize stack pointer
L0:     LDAA #1           ; set X on Port B to 1
        STAA PORTB
        LDAB DELAY1       ; delay for 10 ms
L1:     JSR DELAY_1MS
        DECB
        BNE L1
Example Source File, p. 3
        LDAA #%00000010   ; set Y on Port B to 1
        STAA PORTB
        LDAB DELAY2       ; delay for 20 ms
L2:     JSR DELAY_1MS
        DECB
        BNE L2
        LDAA #%00000100   ; set Z on Port B to 1
        STAA PORTB
        LDAB DELAY3       ; delay for 15 ms
L3:     JSR DELAY_1MS
        DECB
        BNE L3
        BRA L0            ; continue to cycle
Example Source File, p. 4
DELAY_1MS: PSHB           ; subr. to delay for 1 ms
        LDAB #198
DELAY:  DECB
        BRN DELAY
        NOP
        BNE DELAY
        PULB
RETURN: RTS
        ORG $FFFE         ; initialize reset vector
RESET:  FDB MAIN
        END
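The choice of 198 iterations in DELAY_1MS can be sanity-checked with a little arithmetic. The sketch below is written under two assumptions that are not stated on the slides: a 2 MHz E clock, and the per-instruction cycle counts listed in the table (as recalled from the HC11 instruction set tables; verify them against the M68HC11 Processor Manual before relying on them).

```python
# Assumed cycle counts per instruction (check against the processor manual).
CYCLES = {
    "JSR": 6,    # extended addressing, in the caller
    "PSHB": 3,
    "LDAB": 2,   # immediate
    "DECB": 2,
    "BRN": 3,    # branch never: used here only to burn time
    "NOP": 2,
    "BNE": 3,
    "PULB": 4,
    "RTS": 5,
}
E_CLOCK_HZ = 2_000_000   # assumption: 2 MHz E clock


def delay_1ms_cycles(iterations: int = 198) -> int:
    """Total cycles for one call to DELAY_1MS, including the JSR."""
    per_iteration = sum(CYCLES[op] for op in ("DECB", "BRN", "NOP", "BNE"))
    overhead = sum(CYCLES[op] for op in ("JSR", "PSHB", "LDAB", "PULB", "RTS"))
    return overhead + iterations * per_iteration


cycles = delay_1ms_cycles()
microseconds = cycles * 1_000_000 // E_CLOCK_HZ
print(cycles, "cycles =", microseconds, "microseconds")
```

Under these assumptions the loop body costs 10 cycles, so 198 iterations plus 20 cycles of call and return overhead come to one millisecond.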
Traps and Exceptions 1

Traps, Exceptions, and Extended Operations
– Other side of low level programming -- the interface
between applications and peripherals
– OS provides access and protocols
Traps and Exceptions 2
– BIOS: Basic Input/Output System
Subroutines that control I/O
No need for you to write them as application programmer
OS interfaces application with BIOS through traps (extended operations (XOPs))
[Figure: layers, top to bottom: Applications software; BIOS; devices (Keyboard, Screen, Mouse, Disk)]
Traps and Exceptions 3
– Where are OS traps kept? Two approaches:
Transient monitor: traps kept in a library that is copied into the application at link-time
[Figure: Appl 1 through Appl 4, each with its own copy of the OS routines]
Resident monitor: always keep OS in main memory; applications share the trap routines.
[Figure: Appl 1 through Appl 6 sharing a single copy of the OS routines]
OS routines monitor devices. Frequently used routines kept resident; others loaded as needed.
Traps and Exceptions 4
– (Assuming a res. monitor) How to find I/O routines?

Store routines in memory, and make a call to a hard address.
E.g., call 256
– When new OS is released, need to recompile all application
programs to use different addresses.

Use a dispatcher
– Dispatcher is a subroutine that takes a parameter (the trap
number). Dispatcher knows where all routines actually are in
memory, and makes the branch for you. Dispatcher subroutine
must always exist in the same location.
[Figure: the Application calls the Dispatcher, which branches to one of the routines BIOS 1 through BIOS n]
Traps and Exceptions 5

Use vectored linking
– Branch table exists at a well known location. The address of each trap subroutine is stored in the table, indexed by the trap number.
– On RISC, usually about 4 words reserved in the table. If the trap routine is larger than 4 words, can call the actual routine.
Address              Entry             Address
(1-word entries)                       (4-word entries)
100                  Addr of trap 0    100
104                  Addr of trap 1    116
108                  Addr of trap 2    132
100+4n               Addr of trap n    100+16n
Traps and Exceptions 6
– Levels of privilege





Supervisor mode - can access every resource
User mode - limited access to resources
OS routines operate in supervisor mode, access is determined
by bit in PSW (processor status word).
XOP (book’s notation) can always be executed, sets privilege
to supervisor mode (ta)
RTX (book’s notation) can only be executed by the OS, and
returns privilege to user mode (rett)
Traps and Exceptions 7
– Exceptions



Caused by invalid use of resource. E.g., divide by zero, invalid
address, illegal operation, protection violation, etc.
Control transferred automatically to exception handler routine.
Similar to trap or XOP transfer.
Exceptions vs. XOPs
– XOPs explicit in code, exceptions are implicit
– XOPs service request and return to application; exceptions print
message and abort (unless masked).
– On SPARC, trap table has 256 entries.
0-127 are reserved for exceptions and external interrupts. 128-255 are used for XOPs. Trap table begins at address 0x0000.
Each entry is 4 instructions (16 bytes) long.
Traps and Exceptions 8
– Trap example: non-blocking read ta 3

If there is nothing in the keyboard buffer, return with a
message that nothing is there. Otherwise, put the character
into register 8.
– Status of the keyboard is kept in a memory location, as
is the (one-character) keyboard buffer. Memory
mapped devices.
! ta 3 returns character if one is there, otherwise
! it returns 0x8000000 into %r8
        set 0x8000000, %r8    ! set default return val
        set KbdStatus, %r1    ! KbdStatus is memory loc
        ld [%r1], %r1         ! read status (1 is ready)
        andcc %r1, 1, %r1     ! check status
        be rtn                ! can’t read anything
        set KbdBuff, %r1      ! KbdBuff is memory loc
        ld [%r1], %r8         ! get character
rtn:    rett                  ! return to caller
Traps and Exceptions 9
– Trap execution: ta 3


Calculate trap address: 3 * 16 + 0x0800 = 16 * (3 + 0x080)
Save nPC and PSW to memory
– SPARC uses register windows
– Assumes local registers are available




Set privilege level to supervisor mode
Update PC with trap address (and make nPC = PC + 4) (jumps to trap
table)
Trap table has instruction ba ta3_handler
rett
– Restores PC (from saved nPC value) and PSW (resets to
user mode)
– Returns to application program
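The trap address calculation for ta 3 can be written out as a short Python sketch. The constants come from the slides (table at address 0x0000, 16-byte entries, XOPs occupying entries 128-255); the function names are hypothetical.

```python
TRAP_TABLE_BASE = 0x0000   # trap table begins at address 0x0000
ENTRY_BYTES = 16           # each entry is 4 instructions (16 bytes)
FIRST_XOP_ENTRY = 128      # entries 128-255 are used for XOPs


def trap_entry_address(entry: int) -> int:
    """Address of trap table entry number `entry` (0..255)."""
    return TRAP_TABLE_BASE + ENTRY_BYTES * entry


def ta_entry_address(n: int) -> int:
    """Address reached by `ta n`, which uses entry 128 + n."""
    return trap_entry_address(FIRST_XOP_ENTRY + n)


# ta 3: 3 * 16 + 0x0800 = 16 * (3 + 0x080)
print(hex(ta_entry_address(3)))
```

The 0x0800 term in the slide's formula is simply 128 * 16, the byte offset of the first XOP entry.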
Programmed I/O 1
Programmed I/O
– Early approach: Isolated I/O
Special instructions to do input and output, using two operands: a register and an I/O address.
CPU puts device address on address bus, and issues an I/O instruction to load from or store to the device.
Programmed I/O 2
Isolated I/O
[Figure: the CPU connects to Memory and to I/O through separate sets of lines, each with an addr bus, a data bus, and read/write]
Memory Mapped I/O


No special I/O instructions. Treat the I/O device like a memory address. Hardware checks to see if the memory address is in the I/O device range, and makes the adjustment.
Use high addresses (not “real” memory) for I/O memory maps. E.g., 0xFFFF0000 through 0xFFFFFFFF.
[Figure: the CPU shares one addr bus, data bus, and read/write line with Memory and I/O; the address space is divided into memory, unused, and I/O regions]
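The address check that memory-mapped I/O relies on can be sketched in Python. The Bus and Register classes are hypothetical stand-ins for the hardware decoder and a device register; only the I/O range starting at 0xFFFF0000 comes from the slide.

```python
IO_BASE = 0xFFFF0000   # start of the I/O range in the slide's example


class Register:
    """A hypothetical one-word device register."""

    def __init__(self, value: int = 0):
        self.value = value


class Bus:
    """Routes loads and stores to RAM or to a device by address alone."""

    def __init__(self):
        self.ram = {}   # sparse model of ordinary memory
        self.io = {}    # address -> Register for memory-mapped devices

    def load(self, addr: int) -> int:
        if addr >= IO_BASE:               # address is in the I/O range
            return self.io[addr].value
        return self.ram.get(addr, 0)

    def store(self, addr: int, value: int) -> None:
        if addr >= IO_BASE:
            self.io[addr].value = value
        else:
            self.ram[addr] = value


bus = Bus()
bus.io[0xFFFF0000] = Register()        # e.g. a device data register
bus.store(0x1000, 42)                  # ordinary memory write
bus.store(0xFFFF0000, ord("A"))        # the same store reaches the device
print(bus.load(0x1000), bus.load(0xFFFF0000))
```

The program uses the same load and store operations for both cases, which is why no special I/O instructions are needed.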
Programmed I/O 3
– Advantages of each
Memory mapped: reduced instruction set, reduced redundancy in hardware.
Isolated: don’t have to give up memory address space on machines with little memory
Programmed I/O - UARTs

UARTs
– Universal Asynchronous Receiver Transmitter
[Figure: the Keyboard sends the bits 01101010 serially to the UART, which passes them to the CPU in parallel]
– Asynchronous = not on the same clock.
– Handshake coordinates communication between two devices.
– A kind of programmed I/O.
UARTs 1

UART registers
– Control: set up at init, speed, parity, etc.
– Status: transmit empty, receive ready, etc.
– Transmit: output data
– Receive: input data
– All four needed for bidirectional communications.
– Status/control, transmit/receive often combined. Why?
[Figure: UART block diagram. The Control bus, Address bus, and Data bus connect to the Control Reg, Status Reg, Transmit Reg, and Receive Reg; the Transmit Reg and Receive Reg connect to the Transmit Logic and Receive Logic]
UARTs 2

Memory mapped UARTs
– Both memory and I/O “listen” to the address bus. The appropriate device will act based on the addresses.
– Keyboards and Printers require three addresses (when addresses are not combined).
– Modems require four.
– (why?)
Address      Register
FFFF 0000    UART 1 data
FFFF 0004    UART 1 status
FFFF 0008    UART 1 control
FFFF 000C    UART 2 xmit
FFFF 0010    UART 2 recv
FFFF 0014    UART 2 status
FFFF 0018    UART 2 control
FFFF 001C    UART 3 xmit
and so on
[Figure: the CPU, Memory, UART1, and UART2 share the Address bus, Control bus, and Data bus]
Programmed I/O 4

Programmed I/O Characteristics:
– Used to determine if device is ready (can it be read or
written).
– Each device has a status register in addition to the data
register.
– Like previous trap example, must check status before
getting data.
– Involves polling loops.
Programmed I/O – Polling
Ex.: ta 2 handler (blocking keyboard input)
ta_2_handler:
        set KbdBuff, %r1      ! get addr of kbd buffer
        set KbdStatus, %r9    ! get addr of kbd status
wait:   ld [%r9], %r10        ! get status
        andcc %r10, 1, %r10   ! check if ready
        be wait               ! loop until ready
        nop                   ! branch delay
        ld [%r1], %r8         ! get data
        rett                  ! return from trap
[Figure: the CPU keeps asking "Are you ready? Are you ready now? How about NOW?" while the device answers "Nope. Not yet. Hang on."]
Can’t afford to wait like this. Computer is millions of times faster than a typist. Also, multi-tasking operating systems can’t wait.
Special purpose computers can wait. E.g., microwave oven controllers.
Must have a better way! Interrupts are the answer!
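The cost of the polling loop can be made concrete with a small Python simulation. The Keyboard class is a hypothetical device model whose status register reports "ready" only after a given number of polls; blocking_read mirrors the wait loop of the ta 2 handler.

```python
class Keyboard:
    """Hypothetical keyboard whose status bit turns on after some polls."""

    def __init__(self, polls_until_ready: int, char: int):
        self.polls_until_ready = polls_until_ready
        self.char = char

    def status(self) -> int:
        if self.polls_until_ready > 0:
            self.polls_until_ready -= 1
            return 0          # bit 0 clear: nothing to read yet
        return 1              # bit 0 set: a character is waiting

    def data(self) -> int:
        return self.char


def blocking_read(kbd: Keyboard):
    """Busy-wait on the status register, then read the data register."""
    wasted_polls = 0
    while (kbd.status() & 1) == 0:   # same test as andcc / be wait
        wasted_polls += 1
    return kbd.data(), wasted_polls


ch, wasted = blocking_read(Keyboard(polls_until_ready=5, char=ord("a")))
print(chr(ch), "after", wasted, "wasted polls")
```

Every wasted poll is CPU time that did no useful work, which is the motivation for interrupts.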
Interrupts and DMA transfers 1

Programmed (polled) I/O used busy waiting.
– Advantages: simpler hardware
– Disadvantages: wastes time

Interrupts (IRQs on PCs)
– I/O device “requests” service from CPU.
– CPU can execute program code until interrupted.
Solves busy waiting problems.
– Interrupt handlers are run (like traps) whenever an
interrupt occurs. Current application program is
suspended.
Interrupts and DMA transfers 2
Servicing an interrupt
– I/O controller generates interrupt, sets request line “high”.
– CPU detects interrupt at beginning of fetch/execute cycle (for interrupts “between” instructions).
– CPU saves state of running program, invokes interrupt handler.
– Handler services request; sets the request line “low”.
– Control is returned to the application program.
[Figure: timeline. The Application Program runs until an interrupt is detected; the Interrupt Handler services the request and clears the interrupt; the Application Program then resumes]
Interrupts and DMA transfers 3


Changes to fetch/execute cycle
[Figure: flowchart. Interrupt Pending? If Y: Save PC, Save PSW, PSW=new PSW, PC=handler_addr. Then, or if N: PC -> bus, load MAR, INC to PC, load PC]
Problems
– Requires additional hardware in Timing & Control.
– Queuing of interrupts
– Interrupting an interrupt handler (solution: priorities and maskable interrupts)
– Interrupts that must be serviced within an instruction
– How to find address of interrupt handler
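The modified fetch/execute cycle can be sketched as a tiny Python simulation that records which addresses are fetched. Everything here is a simplifying assumption for illustration: 4-byte instructions, a fixed handler address and length, a PSW reduced to a mode string, and interrupts that arrive while the handler runs are simply ignored.

```python
HANDLER_ADDR = 100   # hypothetical address of the interrupt handler
HANDLER_LEN = 2      # hypothetical number of instructions in the handler


def run(num_cycles: int, interrupt_cycles: set) -> list:
    """Return the list of addresses fetched over num_cycles cycles."""
    pc, psw = 0, "user"
    saved = []                       # saved (PC, PSW) pairs
    fetched = []
    for cycle in range(num_cycles):
        # Interrupt check happens at the beginning of the cycle.
        if cycle in interrupt_cycles and psw == "user":
            saved.append((pc, psw))              # Save PC, Save PSW
            pc, psw = HANDLER_ADDR, "supervisor" # PSW=new PSW, PC=handler_addr
        fetched.append(pc)                       # PC -> bus, load MAR
        pc += 4                                  # INC to PC, load PC
        # End of handler: restore the interrupted program's state.
        if psw == "supervisor" and pc == HANDLER_ADDR + 4 * HANDLER_LEN:
            pc, psw = saved.pop()
    return fetched


print(run(6, {2}))
```

The application's instruction at address 8 is delayed, not lost: it is fetched as soon as the handler returns.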
Interrupts and DMA transfers 4
Example: interrupt driven string output
– Want to print a string without busy waiting.
– Want to return to the application as fast as possible
Trap handler implementation

Install trap handler into trap table
– Buffer is like circular queue
– only outputs, at most, one character
disp_buf:  .skip 256    ! buffers string to print
disp_frnt: .byte 0      ! offset to front of queue
disp_bck:  .byte 0      ! offset to back of queue
ta_6_handler:
! Copy str from mem[%r8] to mem[disp_buf+disp_bck]
! disp_bck = (disp_bck+len(str)) mod 256
! If display is ready
!   If first char is not null, then output it
!   disp_frnt = (disp_frnt+1) mod 256
        rett            ! Return from trap
[Figure: disp_buf as a circular queue of undisplayed bytes, with disp_frnt at the oldest byte and disp_bck at the newest byte]
Interrupt handler implementation

This too outputs only one character at most, but when display becomes
ready again, it generates another interrupt which invokes this routine!
display_IRQ_handler:
! Save any registers used
! If disp_frnt != disp_bck (queue is not empty)
!   Get char at mem[disp_buf+disp_frnt]
!   If char is not null, then output it
!   disp_frnt = (disp_frnt+1) mod 256
! Restore registers and set the request line “low”
        rett            ! Return from trap
Uses the UART for transmission.
[Figure: the display tells the CPU "I’m ready!"; CPU and Memory shown]
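The cooperation between the trap handler and the interrupt handler can be modeled in Python. The class and method names are hypothetical, the display is modeled as a list of output bytes, and the sketch omits any check for the queue overflowing its 256 bytes.

```python
class DisplayQueue:
    """Circular queue shared by the ta 6 handler and the display IRQ handler."""

    SIZE = 256

    def __init__(self):
        self.buf = bytearray(self.SIZE)   # disp_buf
        self.frnt = 0                     # disp_frnt: offset of oldest byte
        self.bck = 0                      # disp_bck: offset past newest byte
        self.ready = True                 # display can accept a character
        self.shown = []                   # what the display has printed

    def _output_one(self) -> None:
        if self.frnt != self.bck:         # queue is not empty
            self.shown.append(self.buf[self.frnt])
            self.frnt = (self.frnt + 1) % self.SIZE
            self.ready = False            # display is busy until next IRQ

    def ta_6(self, text: bytes) -> None:
        """Trap handler: queue the string, output at most one character."""
        for byte in text:
            self.buf[self.bck] = byte
            self.bck = (self.bck + 1) % self.SIZE
        if self.ready:
            self._output_one()

    def display_irq(self) -> None:
        """Interrupt handler: display is ready again, output one character."""
        self.ready = True
        self._output_one()


q = DisplayQueue()
q.ta_6(b"hi!")                  # returns to the application right away
first = bytes(q.shown)          # only one character so far
while q.frnt != q.bck:          # each "I'm ready" interrupt prints one more
    q.display_irq()
print(first, bytes(q.shown))
```

The application never busy-waits: the trap returns after one character, and the remaining output is driven by the display's own interrupts.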
Interrupts and DMA transfers 5

Problems with interrupt driven I/O
– CPU is involved with each interrupt
– Each interrupt corresponds to transfer of a single byte
– Lots of overhead for large amounts of data (blocks of 512 bytes)
[Figure: the Device Controller interrupts the CPU and transfers one byte of data; the CPU executes 10s or 100s of instructions per byte and transfers one word of data to Memory]
Interrupts and DMA transfers 6

DMA (Direct Memory Access)
– Want I/O without CPU intervention
– Want larger than one byte data transfers
– Solution: add a new device that can talk to both I/O devices and memory without the CPU; a “specialized” CPU strictly for data transfers.
[Figure: the DMA Controller connects the Device Controller and Memory, alongside the CPU]
Interrupts and DMA transfers 7

Steps to a DMA transfer
– CPU specifies a memory address, the operation
(read/write), byte count, and disk block location to the
DMA controller (or specify other I/O device).
– DMA controller initiates the I/O, and transfers the data
to/from memory directly
– DMA controller interrupts the CPU when the entire
block transfer is completed.

Problem
– Conflicts accessing memory. Can either arbitrate
access or get a more expensive dual ported memory
system.
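The steps of a DMA transfer can be sketched in Python. The DMAController class and its parameter names are hypothetical; the device block is modeled as a bytes object and the completion interrupt as a callback, so this shows only the division of labor, not real hardware timing or bus arbitration.

```python
class DMAController:
    """Hypothetical DMA controller that moves a whole block without the CPU."""

    def __init__(self, memory: bytearray, interrupt_cpu):
        self.memory = memory
        self.interrupt_cpu = interrupt_cpu   # called once per completed block

    def start(self, mem_addr: int, operation: str, byte_count: int,
              device_block: bytearray) -> None:
        # Step 1 (CPU): address, operation, byte count, and block are given.
        # Step 2 (controller): transfer the data to/from memory directly.
        if operation == "read":              # device -> memory
            self.memory[mem_addr:mem_addr + byte_count] = device_block[:byte_count]
        elif operation == "write":           # memory -> device
            device_block[:byte_count] = self.memory[mem_addr:mem_addr + byte_count]
        else:
            raise ValueError("operation must be 'read' or 'write'")
        # Step 3 (controller): a single interrupt when the block is done.
        self.interrupt_cpu()


memory = bytearray(1024)
interrupts = []
dma = DMAController(memory, lambda: interrupts.append("block done"))
disk_block = bytearray(range(256)) * 2       # one 512-byte block
dma.start(mem_addr=256, operation="read", byte_count=512, device_block=disk_block)
print(len(interrupts), "interrupt for", 512, "bytes")
```

Compared with interrupt driven I/O, the CPU sees one interrupt per block instead of one per byte.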