Chapter 2 SPARC ARCHITECTURE

CSC 3210
Computer Organization and Programming
Chapter 2
SPARC Architecture
Dr. Anu Bourgeois
Introduction
• SPARC is a load/store architecture
• Registers are used for all arithmetic and logical operations
• 32 registers are available at a time
• Only load and store instructions access memory
Registers
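• Registers are accessed directly for rapid computation
• 32 registers, divided into 4 sets:
-- Global: %g0 - %g7
-- In: %i0 - %i7
-- Out: %o0 - %o7
-- Local: %l0 - %l7
• %g0 always reads as 0; writes to it are discarded
• %o6, %o7, %i6, %i7: do not use for general computation (they hold the stack pointer, the frame pointer, and the subroutine return addresses)
• Register size = 32 bits each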
Table of Registers
Global registers     Out registers         Local registers      In registers
Register  Synonym    Register  Synonym     Register  Synonym    Register  Synonym
%g0*      %r0        %o0       %r8         %l0       %r16       %i0       %r24
%g1       %r1        %o1       %r9         %l1       %r17       %i1       %r25
%g2       %r2        %o2       %r10        %l2       %r18       %i2       %r26
%g3       %r3        %o3       %r11        %l3       %r19       %i3       %r27
%g4       %r4        %o4       %r12        %l4       %r20       %i4       %r28
%g5       %r5        %o5       %r13        %l5       %r21       %i5       %r29
%g6       %r6        %o6       %r14, %sp   %l6       %r22       %i6       %r30, %fp
%g7       %r7        %o7#      %r15        %l7       %r23       %i7^      %r31
* -- Always discards writes and returns zero
# -- Called subroutine return address
^ -- Subroutine return address
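The numbering pattern in the table (globals from %r0, outs from %r8, locals from %r16, ins from %r24) can be captured in a few lines. The following is a minimal Python sketch, not part of the original slides; the helper name r_number is hypothetical.

```python
# Base %r number of each register set, following the table above.
BASE = {"g": 0, "o": 8, "l": 16, "i": 24}

def r_number(name):
    """Return the %r synonym number for a register name such as '%o6'.

    r_number is a hypothetical helper used only for illustration.
    """
    # %sp and %fp are synonyms for %o6 and %i6.
    if name == "%sp":
        name = "%o6"
    elif name == "%fp":
        name = "%i6"
    kind, index = name[1], int(name[2:])
    if kind not in BASE or not 0 <= index <= 7:
        raise ValueError("unknown register: " + name)
    return BASE[kind] + index

print(r_number("%l3"))  # 19
```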
SPARC Assembler
• The SPARC assembler, as, is a 2-pass assembler
• First pass:
– Updates the location counter without paying attention to undefined labels used as operands
– Defines each label symbol as the current value of the location counter
• Second pass:
– Substitutes values for labels used as operands
– Ignores label definitions (labels followed by colons), since they were recorded in the first pass
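The two passes can be illustrated with a toy model. This is a simplified Python sketch rather than how as is actually implemented: it assumes every statement occupies 4 bytes, that operands are separated by spaces, and it only resolves labels instead of producing machine code. The function name assemble is hypothetical.

```python
def assemble(lines, start=0, size=4):
    """Toy two-pass label resolver (illustration only, no machine code)."""
    # Pass 1: walk the source, advancing the location counter and
    # recording each label at the current location. Operands that name
    # labels not yet seen are ignored for now.
    symbols = {}
    statements = []
    location = start
    for line in lines:
        line = line.strip()
        if ":" in line:
            label, line = line.split(":", 1)
            symbols[label.strip()] = location
            line = line.strip()
        if line:
            statements.append((location, line))
            location += size
    # Pass 2: substitute the recorded value for every operand that is a label.
    output = []
    for location, line in statements:
        parts = line.split()
        resolved = [str(symbols.get(op, op)) for op in parts[1:]]
        output.append((location, " ".join([parts[0]] + resolved)))
    return symbols, output

source = ["start: ba done", "nop", "done: nop"]
symbols, output = assemble(source)
print(symbols)  # {'start': 0, 'done': 8}
print(output)   # [(0, 'ba 8'), (4, 'nop'), (8, 'nop')]
```

The forward reference to done is the reason two passes are needed: when ba done is first read, the address of done is not yet known.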
Assembly Language Programs
• Programs are line based
• Use mnemonics which generate machine code upon assembling
• Statements may be labeled
• Comments: ! or /* … */
/* instructions to add and to subtract the
   contents of %o0 and %o1 */
start:  add     %o0, %o1, %l0   !l0=o0+o1
        sub     %o0, %o1, %l1   !l1=o0-o1
Pseudo-ops
• Statements that do not generate machine code
– e.g. data definitions, statements to provide the assembler information
• Generally start with a period
a:      .word   3
• Can be labeled
        .global main
main:
Compiling Code – 2 step process
• The C compiler calls as to produce the object files
• Object files are the machine code
• Next it calls the linker to combine .o files with library routines to produce the executable program, a.out
Compiling a C program
% gcc -S program.c : produces the .s assembly language file
% gcc expr.s -o expr : assembles the program and produces the executable file
NOTE: You will only do this for the 1st assignment
Start of Execution
• The C compiler expects to start execution at an address main
• The label must be at the first statement to execute and declared to be global
        .global main
main:   save    %sp, -96, %sp
• The save instruction provides space to save registers for the debugger
Macros
• If we have macros defined, then the program should be a .m file
• We can expand the macros to produce a .s file by running m4 first
% m4 expr.m > expr.s
% gcc expr.s -o expr
SPARC Instructions
• 3 operands: 2 source operands and 1 destination operand
• Source registers are unchanged
• Result stored in destination register
• Constants: -4096 ≤ c < 4096
op      reg_rs1, reg_rs2, reg_rd
op      reg_rs1, imm, reg_rd
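The range -4096 ≤ c < 4096 is exactly what a 13-bit two's-complement field can hold, which is the width of the immediate field in these instructions. The following Python sketch checks a constant against that range; the function names are hypothetical and not part of any assembler.

```python
def fits_immediate(c):
    """True if c fits the signed 13-bit immediate: -4096 <= c < 4096."""
    return -4096 <= c < 4096

def encode_immediate(c):
    """Return the 13-bit two's-complement bit pattern for c.

    encode_immediate is a hypothetical helper for illustration.
    """
    if not fits_immediate(c):
        raise ValueError("constant out of range: %d" % c)
    return c & 0x1FFF  # keep the low 13 bits

print(fits_immediate(4095))   # True
print(fits_immediate(4096))   # False
print(encode_immediate(-1))   # 8191 (all 13 bits set)
```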
Sample Instructions
clr reg_rd
• Clears a register to zero
mov reg_or_imm, reg_rd
• Copies content of source to destination
add reg_rs1, reg_or_imm, reg_rd
• Adds oper1 + oper2 → destination
sub reg_rs1, reg_or_imm, reg_rd
• Subtracts oper1 - oper2 → destination
Multiply and Divide
• No multiply or divide instruction available in SPARC
• Use function call instead
• Must use %o0 and %o1 for sources and %o0 holds result
a = b * c
        mov     b, %o0
        mov     c, %o1
        call    .mul
        nop                     ! delay slot after the call
a = b ÷ c
        mov     b, %o0
        mov     c, %o1
        call    .div
        nop                     ! delay slot after the call
Instruction Cycle
• Instruction cycle broken into 4 stages:
– Instruction fetch: fetch & decode instruction, obtain any operands, update PC
– Execute: execute arithmetic instruction, compute branch target address, compute memory address
– Memory access: access memory for load or store instruction; fetch instruction at target of branch instruction
– Store results: write instruction results back to register file
Pipelining
• SPARC is a RISC machine: the goal is to complete one instruction per cycle
• Overlap stages of different instructions to achieve parallel execution
• Can obtain a speedup of up to a factor of 4
• Hardware does not have to run 4 times faster: break the h/w into 4 parts that run concurrently
Pipelining
• Sequential: each h/w stage is idle 75% of the time. For i instructions, time_ex = 4 * i cycles
• Parallel: each h/w stage is working once the pipeline is filled. For i instructions, time_ex = 3 + i cycles
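Plugging numbers into the two formulas shows how the speedup approaches, but never quite reaches, a factor of 4. This is a small Python sketch of the arithmetic only; the function names are chosen for illustration.

```python
STAGES = 4  # fetch, execute, memory access, store results

def sequential_cycles(i):
    """Cycles for i instructions with no overlap: 4 * i."""
    return STAGES * i

def pipelined_cycles(i):
    """Cycles for i instructions when pipelined: 3 to fill the pipeline, then i."""
    return (STAGES - 1) + i

def speedup(i):
    """Ratio of sequential to pipelined time for i >= 1 instructions."""
    return sequential_cycles(i) / pipelined_cycles(i)

for i in (1, 10, 100, 1000):
    print(i, sequential_cycles(i), pipelined_cycles(i), round(speedup(i), 2))
```

For a single instruction there is no gain (4 cycles either way); for 100 instructions the times are 400 and 103 cycles.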
Data Dependencies – Load Delay Problem
        ld      [%o0], %o1
        add     %o1, %o2, %o2   ! needs %o1 before the load has delivered it
Branch Delay Problem
• Branch target address not available until after execution of branch instruction
• Insert branch delay slot instruction
Branch delays
• Try to place an instruction after the branch that is useful; can also use nop
• The instruction following a branch instruction will always be fetched
• Updating the PC determines which instruction to fetch next
        cmp     %l0, %l1
        bg      next
        mov     %l2, %l3
        sub     %l3, 20, %l4
Condition true: branch to next
Condition false: continue to sub

Pipeline diagram (F = fetch, E = execute, M = memory access, W = write results):
cmp     F E M W
bg        F E M W
mov         F E M W
???           F E M W

• bg execute stage: determine if branch taken; if true, update PC with the target
• mov fetch stage (same cycle): fetch instruction from memory[PC], update PC (PC++), obtain operands
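The effect of the delay slot on execution order can be traced with a small model. This is a Python sketch under simplifying assumptions: instructions are identified by list index rather than by address, and a pair of counters (pc and npc, the current and next instruction) stands in for the hardware. The entry named next is a placeholder for whatever instruction sits at the label next.

```python
def execution_order(program, branch_taken):
    """Return instruction names in the order they execute.

    program: list of (name, target) pairs; target is the index of the
    branch destination, or None for a non-branch instruction.
    """
    order = []
    pc, npc = 0, 1                  # current and next instruction
    while pc < len(program):
        name, target = program[pc]
        order.append(name)
        if target is not None and branch_taken:
            new_npc = target        # takes effect after the delay slot
        else:
            new_npc = npc + 1
        pc, npc = npc, new_npc
    return order

program = [
    ("cmp", None),
    ("bg", 4),       # branch to index 4, the label next
    ("mov", None),   # delay slot: already fetched when bg executes
    ("sub", None),
    ("next", None),  # placeholder for the instruction at label next
]
print(execution_order(program, True))   # ['cmp', 'bg', 'mov', 'next']
print(execution_order(program, False))  # ['cmp', 'bg', 'mov', 'sub', 'next']
```

In both cases mov executes, which is why the instruction after a branch must be either useful in both paths or a nop.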
Actual SPARC Code: expr.m
Expanding Macros
• After running through m4: % m4 expr.m > expr.s
• Produce executable: % gcc expr.s -o expr
• Execute file: % ./expr
The Debugger – gdb
• Used to verify correctness and find bugs
• Can also execute a program, stop execution at any point, and single-step execution
• After assembling the program and placing the output into expr, launch gdb: % gdb expr
• To run code in gdb, type “r”:
(gdb) r
gdb Commands
• Breakpoints can be set at any address to stop execution in order to check the status of the program and registers
• To set a breakpoint at a label:
(gdb) b main
Breakpoint 1 at 0x106a8
(gdb)
• Typing “c” continues execution until it reaches the next breakpoint or end of code
• Can print contents of a register
(gdb) p $l1
$2 = -8
(gdb)
• Best way to learn is by practice
Filling Delay Slots
• The call instruction is called a delayed control transfer instruction: it changes the address from where future instructions will be fetched
• The following instruction is called a delayed instruction, and is located in the delay slot
• The delayed instruction is executed before the branch/call happens
• By using a nop for the delay slot, we are still wasting a cycle
• Instead, we may be able to move the instruction prior to the branch instruction into the delay slot.
Filling Delay Slots
• Move sub instructions to the delay slots to eliminate nop instructions
        .global main
main:   save    %sp, -96, %sp
        mov     9, %l0          !initialize x
        sub     %l0, 1, %o0     !(x - 1) into %o0
        call    .mul
        sub     %l0, 7, %o1     !(x - 7) into %o1
        call    .div
        sub     %l0, 11, %o1    !(x - 11) into %o1, the divisor
        mov     %o0, %l1        !store it in y
        ret                     ! end the program
        restore
Filling Delay Slots
• Executing the mov instruction, while fetching the sub instruction
EXECUTE → mov     9, %l0          !initialize x
FETCH   → sub     %l0, 1, %o0     !(x - 1) into %o0
Filling Delay Slots
• Now executing the sub instruction, while fetching the call instruction
EXECUTE → sub     %l0, 1, %o0     !(x - 1) into %o0
FETCH   → call    .mul
Filling Delay Slots
• Now executing the call instruction, while fetching the sub instruction
EXECUTE → call    .mul
FETCH   → sub     %l0, 7, %o1     !(x - 7) into %o1
• Execution of call will update the PC to fetch from the mul routine, but since sub was already fetched, it will be executed before any instruction from the mul routine
Filling Delay Slots
• Now executing the sub instruction, while fetching from the mul routine
EXECUTE → sub     %l0, 7, %o1     !(x - 7) into %o1
FETCH   → .mul:   save …..
Filling Delay Slots
• Now executing the save instruction, while fetching the next instruction from the mul routine
EXECUTE → .mul:   save …..
FETCH   →         ……
Filling Delay Slots
• While executing the last instruction of the mul routine, will come back to main and fetch the call .div instruction
EXECUTE → ……  (last instruction of the mul routine)
FETCH   → call    .div
• At this point %o0 has the result from the multiply routine; this is the first operand for the divide routine
• The subtract instruction in the delay slot, sub %l0, 11, %o1, will compute the 2nd operand before starting execution of the divide routine
2.9 Branching
Instructions for testing and branching:
2.9.1 Testing
The information about the state of execution of an instruction is saved in the following flags:
Z  zero      whether the result was zero
N  negative  whether the result was negative
V  overflow  whether the result was too large for the register
C  carry     whether the result generated a carry out
Special add and sub instructions:
‘cc’ is appended to the mnemonic, and the instruction sets condition codes Z, N, V, and C to save the state of execution.
E.g.
addcc   reg_rs1, reg_or_imm, reg_rd
subcc   reg_rs1, reg_or_imm, reg_rd
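How the four flags follow from a 32-bit add or subtract can be modeled directly. The sketch below is a Python illustration, not SPARC code: the function names mirror the mnemonics for readability, and for subtraction C is modeled as a borrow, meaning the second operand is larger than the first when both are viewed as unsigned.

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide

def addcc(a, b):
    """Model of a 32-bit add that also returns the Z, N, V, C flags."""
    a &= MASK
    b &= MASK
    total = a + b
    result = total & MASK
    sa, sb, sr = a >> 31, b >> 31, result >> 31   # sign bits
    flags = {
        "Z": int(result == 0),
        "N": sr,
        "V": int(sa == sb and sr != sa),  # same-sign operands, different-sign result
        "C": int(total > MASK),           # carry out of bit 31
    }
    return result, flags

def subcc(a, b):
    """Model of a 32-bit subtract (a - b) that also returns the flags."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    sa, sb, sr = a >> 31, b >> 31, result >> 31
    flags = {
        "Z": int(result == 0),
        "N": sr,
        "V": int(sa != sb and sr != sa),  # operands differ in sign, result sign flipped
        "C": int(b > a),                  # borrow, as unsigned values
    }
    return result, flags

print(subcc(5, 5)[1])           # {'Z': 1, 'N': 0, 'V': 0, 'C': 0}
print(addcc(0x7FFFFFFF, 1)[1])  # {'Z': 0, 'N': 1, 'V': 1, 'C': 0}
```

The second call adds 1 to the largest positive signed value: the result looks negative (N = 1) because it overflowed (V = 1).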
2.9.2 Branches
• Branch instructions are similar to call instructions.
• They will specify the label of the destination instruction.
• These too are delayed control transfer instructions.
Branch instructions test the condition codes in order to determine if the branching condition exists:
b_icc   label
where b_icc stands for one of the branches testing the integer condition codes.
Table of signed number branches
Assembler Mnemonic   Unconditional Branches
ba                   Branch always, goto
bn                   Branch never

Assembler Mnemonic   Signed Arithmetic Branches
bl                   Branch on less than zero
ble                  Branch on less or equal to zero
be                   Branch on equal to zero
bne                  Branch on not equal to zero
bge                  Branch on greater or equal to zero
bg                   Branch on greater than zero
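Each signed branch in the table corresponds to a test on the N, Z, and V flags: "less than" is N xor V, so that the comparison stays correct even when the subtraction overflowed. The following Python sketch spells out those tests; the function name branch_taken is hypothetical.

```python
def branch_taken(mnemonic, N, Z, V):
    """Return True if the given branch is taken for flag values N, Z, V."""
    less = bool(N) != bool(V)        # N xor V: signed "less than"
    conditions = {
        "ba": True,
        "bn": False,
        "bl": less,
        "ble": bool(Z) or less,
        "be": bool(Z),
        "bne": not Z,
        "bge": not less,
        "bg": not (bool(Z) or less),
    }
    return conditions[mnemonic]

# Flags after comparing 3 with 5 (3 - 5 is negative, no overflow):
print(branch_taken("bl", N=1, Z=0, V=0))  # True
print(branch_taken("bg", N=1, Z=0, V=0))  # False
```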
2.10 Control statements
2.10.1 While:
The condition of a while loop is to be evaluated before the loop is executed, and if the condition is not met, the loop, including the first instruction of the loop, is not to be executed.
Consider the C equivalent of the while loop:
while (a <= 17)
{
    a = a + b;
    c++;
}
Annulled Conditional Branches:
- If the condition is true, the branch is taken and the delay slot instruction is executed; if the condition is false, the delay slot instruction is annulled
- The delay slot instruction is still fetched in either case; only its execution is annulled, which wastes a cycle when the condition is false
2.10.2 Do
Consider a Do loop:
2.10.3 For
For structure in C:
for ( ex1; ex2; ex3 ) st
Express the above definition as:
ex1;
while ( ex2 ) {
    st
    ex3;
}
Thus the translation of
for (a=1; a<= b; a++)
c *= a;
would be:
2.10.4 If Then
The statement following the relational expression is to be branched over if the condition is not true. To accomplish this, we need to logically complement the sense of the branch, following the relational expression evaluation, before the code for the statement.
Table of complements of the branches
Condition   Complement
bl          bge
ble         bg
be          bne
bne         be
bge         bl
bg          ble
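The table can be checked mechanically: for every combination of the N, Z, and V flags, a branch and its complement should never both be taken and never both fall through. This Python sketch performs that check; the names COMPLEMENT, taken, and complements_are_exact are chosen for illustration.

```python
from itertools import product

# The complement table above, as a lookup.
COMPLEMENT = {"bl": "bge", "ble": "bg", "be": "bne",
              "bne": "be", "bge": "bl", "bg": "ble"}

def taken(mnemonic, N, Z, V):
    """Return True if the signed branch is taken for flags N, Z, V."""
    less = bool(N) != bool(V)  # N xor V
    return {
        "bl": less,
        "ble": bool(Z) or less,
        "be": bool(Z),
        "bne": not Z,
        "bge": not less,
        "bg": not (bool(Z) or less),
    }[mnemonic]

def complements_are_exact():
    """True if each branch and its complement always disagree."""
    for branch, other in COMPLEMENT.items():
        for N, Z, V in product((0, 1), repeat=3):
            if taken(branch, N, Z, V) == taken(other, N, Z, V):
                return False
    return True

print(complements_are_exact())  # True
```

So to skip the body of an if statement whose condition is a > b, the code branches with ble, the complement of bg.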
For example, to translate
2.10.5 If Else
An if-else statement allows us to do better with regard to filling the delay slot.
Consider:
if ((a + b) >= c) {
    a += b;
    c++;
} else {
    a -= b;
    c--;
}
c += 10;
We will complement the initial test so that, if the condition is false, we branch over the then code to the else code.