oldchapter2part1

Download Report

Transcript oldchapter2part1

CS/COE0447
Computer Organization & Assembly
Language
Chapter 2 Part 1
1
Topics to cover in Chapter 2
•
•
•
•
MIPS operations and operands
MIPS registers
Memory view
Instruction encoding
• Arithmetic operations
• Logic operations
• Memory transfer operations
2
MIPS Operations/Operands
•
•
“Operation” (instruction)
“Operand”
•
MIPS operations
–
–
–
–
–
–
–
•
Arithmetic operations (integer/floating-point) (add, sub,…)
Logical operations (and, or,…)
Shift operations (shift a certain number of bits to the left or right)
Compare operations (do something if one operand is less than another,…)
Load/stores to transfer data from/to memory
Branch/jump operations
System control operations/coprocessor operations
MIPS operands
–
–
–
–
General-purpose registers
Fixed registers, e.g., HI/LO registers
Memory location
Immediate value
3
MIPS Arithmetic
rd
rs
rt
• <op> <rdestination> <rsource1> <rsource2>
• All arithmetic instructions have 3 operands
– Operand order is fixed: destination first
– 32 registers (page 2 of green card)
• Examples
– add $t0, $s0, $s2
– sub $s0, $t0, $t1
# $t0 = $s0 + $s2
# $s0 = $t0 – $t1
4
MIPS Registers
$zero
$at
$v0
$v1
$a0
$a1
$a2
$a3
$t0
$t1
$t2
$t3
$t4
$t5
$t6
$t7
32 bits
32 bits
r0
r1
r2
r3
r4
r5
r6
r7
r8
r9
r10
r11
r12
r13
r14
r15
r16
r17
r18
r19
r20
r21
r22
r23
r24
r25
r26
r27
r28
r29
r30
r31
General-Purpose Registers
32 bits
$s0
$s1
$s2
$s3
$s4
$s5
$s6
$s7
$t8
$t9
$k0
$k1
$gp
$sp
$fp
$ra
HI
LO
PC
Special-Purpose Registers
5
General-Purpose Registers
• GPR: all can be used as operands in instructions
• Still, conventions and limitations exist to keep GPRs from being
used arbitrarily
–
–
–
–
r0, termed $zero, always has a value “0”
r31, termed $ra (return address), is reserved for subroutine call/return
Etc. (we’ll see others later)
Register usage and related software conventions are summarized in “application
binary interface” (ABI), which is important when writing system software such as
an assembler and a compiler
6
Instruction Encoding
• Instructions are encoded in binary numbers
– Assembler translates assembly programs into binary numbers
– Machine decodes binary numbers to figure out what the instruction is
– MIPS has “fixed” 32-bit instruction encoding
• MIPS has several instruction formats
–
–
–
–
R-format: arithmetic instructions
I-format: transfer/branch/immediate format
J-format: jump instruction format
(FI/FR-format: floating-point instruction format)(later chapter)
7
MIPS Instruction Formats
Name
Fields
Comments
6 bits
5 bits
5 bits
5 bits
5 bits
6 bits
All MIPS instructions 32 bits
R-format
op
rs
rt
rd
shamt
funct
Arithmetic/logic instruction format
I-format
op
rs
rt
J-format
op
address/immediate
target address
Transfer, branch, immediate format
Jump instruction format
8
R-Format Instructions
• Define “fields” of the following number of bits
each: 6 + 5 + 5 + 5 + 5 + 6 = 32
6
5
5
5
5
6
• For simplicity, each field has a name:
opcode
rs
rt
rd
shamt funct
For shift instructions:
“shift amount”
9
R-Format Example
• MIPS Instruction:
add
$8,$9,$10
Decimal number per field representation:
Binary number per field representation:
hex representation:
decimal representation:
On Green Card: Format in column 1, opcodes in column 3
(Let’s look and then come back)
10
M I P S Reference Data: CORE INSTRUCTION SET
NAME
MNEMON-IC
FORMAT
OPERATION (in
Verilog)
OPCODE/
FUNCT
(hex)
Add
add
R
R[rd] = R[rs] + R[rt] (1)
Add Immediate
addi
I
R[rt] = R[rs] +
SignExtImm (1)(2)
8hex
Branch On
Equal
beq
I
if(R[rs]==R[rt])
PC=PC+4+
BranchAddr (4)
4hex
0 / 20hex
Later (1) May cause overflow exception
(2) SignExtImm = { 16{immediate[15]}, immediate }
(3) ZeroExtImm = { 16{1b’0}, immediate }
(4) BranchAddr = { 14{immediate[15]}, immediate, 2’b0}
11
R-Format Instructions (REMINDER)
• Define “fields” of the following number of bits
each: 6 + 5 + 5 + 5 + 5 + 6 = 32
6
5
5
5
5
6
• For simplicity, each field has a name:
opcode
rs
rt
rd
shamt funct
12
R-Format Example
• MIPS Instruction:
add
$8,$9,$10
Decimal number per field representation:
Binary number per field representation:
Now let’s fill this in
13
R-Format Example
• MIPS Instruction:
add
$8,$9,$10
Decimal number per field representation:
0
9
10
8
0
32
Binary number per field representation:
000000 01001 01010 01000 00000 100000
hex representation:
decimal representation:
012A 4020hex
hex
19,546,144ten
14
I-Format Instructions
• Define “fields” of the following number of bits
each: 6 + 5 + 5 + 16 = 32
6
5
5
16
• For simplicity, each field has a name:
opcode
rs
rt
immediate
Let’s do an example using addi
15
M I P S Reference Data: CORE INSTRUCTION SET
NAME
MNEMON-IC
FORMAT
OPERATION (in
Verilog)
OPCODE/
FUNCT
(hex)
Add
add
R
R[rd] = R[rs] + R[rt] (1)
Add Immediate
addi
I
R[rt] = R[rs] +
SignExtImm (1)(2)
8hex
Branch On
Equal
beq
I
if(R[rs]==R[rt])
PC=PC+4+
BranchAddr (4)
4hex
0 / 20hex
(1) May cause overflow exception
(2) SignExtImm = { 16{immediate[15]}, immediate }
(3) ZeroExtImm = { 16{1b’0}, immediate }
(4) BranchAddr = { 14{immediate[15]}, immediate, 2’b0}
16
I-Format Example
• MIPS Instruction:
addi
$8,$9,7
Decimal number per field representation:
Binary number per field representation:
17
I-Format Example
• MIPS Instruction:
addi
$8,$9,7
Decimal number per field representation:
Binary number per field representation:
hex representation:
hex
FILL IN THESE VALUES!!!!!
18
M I P S Reference Data: CORE INSTRUCTION SET
NAME
MNEMON-IC
FORMAT
OPERATION (in
Verilog)
OPCODE/
FUNCT
(hex)
Add
add
R
R[rd] = R[rs] + R[rt] (1)
Add Immediate
addi
I
R[rt] = R[rs] +
SignExtImm (1)(2)
8hex
Branch On
Equal
beq
I
if(R[rs]==R[rt])
PC=PC+4+
BranchAddr (4)
4hex
0 / 20hex
(1) May cause overflow exception
(2) SignExtImm = { 16{immediate[15]}, immediate }
(3) ZeroExtImm = { 16{1b’0}, immediate }
(4) BranchAddr = { 14{immediate[15]}, immediate, 2’b0}
19
Executing the addi instruction
addi $8,$9,7
Suppose 
$8
$9 0x00000023
Instruction from earlier slide: format.
Immediate = 0x0007 FINISH THIS!!!!
So, you add 00000023
00000007
20
Logic Instructions
Name
R-format
Fields
op
rs
rt
Comments
rd
shamt
funct
Logic instruction format
• Bit-wise logic operations
• <op> <rdestination> <rsource1> <rsource2>
• Examples
–
–
–
–
and $t0, $s0, $s2
or $s0, $t0, $t1
nor $s0, $t0, $t1
xor $s0, $t0, $t1
# $t0 = $s0 ^ $s2
# $s0 = $t0 | $t1
# $s0 = ~($t0 | $t1)
# $s0 = $t0 ^ $t1
21
Dealing with Immediates
Name
I-format
•
rs
rt
16-bit immediate
Transfer, branch, immediate format
a = a + 1;
b = b – 4;
c = d | 0x04;
Some frequently used arithmetic/logic instructions have their “immediate”
version
–
–
–
–
•
op
Comments
Many operations involve small “immediate value”
–
–
–
•
Fields
addi $t0, $t1, 16
andi $t2, $t3, 254
ori $t5, $t6, 4
xori $t7, $t7, 16
# t0 <- $t1 + 16
# t2 <- $t3 & 0xfe
# t5 <- $t6 | 0x04
# t7 <- $t7 ^ 0x10
There is no “subi”; Why?
22
Long Immediates
• Sometimes we need a long immediate, e.g., 32 bits
– Do we need an instruction to load a 32-bit constant value to a register?
• MIPS requires that we use two instructions
– lui $t0, 1010101001010101
• Then we get
lower-order 16 bits
$t0 1010101001010101 0000000000000000
– ori $t0, $t0, 1100110000110011
$t0
1010101001010101
1100110000110011
23
Memory Transfer Instructions
• Only two instructions provided
– LOAD: move data from memory to register
• lw $t3, 4($t2)
# $t3 <- M[$t2 + 4]
– STORE: move data from register to memory
• sw $t4, 16($t1) # M[$t1 + 16] <- $t4
• Support for other data types than 32-bit word needed
– 16-bit half-word
• “short” type in C
• 16-bit processing is common in signal processing
• lh/sh (load half/store half)
– 8-bit byte
• “char” type in C
• 8-bit processing is common in controller applications
• lb/sb (load byte/store byte)
24
Address Calculation
• Memory address is specified with (register, constant)
– Register to keep “base address”
– Constant field to keep “offset”
– Address is “base address” + “offset”, i.e., register + constant
• The offset can be positive or negative
– It is written in terms of number of bytes for load/store instructions
• If no register needs to be used then use register 0
– Register 0 always stores value 0 (“hardwired”)
• If no offset, then offset is 0
• MIPS uses this simple address calculation method, but other
architectures such as PowerPC or x86 have different methods
25
Machine Code Example
void swap(int v[], int k)
{
int temp;
temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;
}
swap:
sll
add
lw
lw
sw
sw
jr
$t0, $a1, 2
$t1, $a0, $t0
$t3, 0($t1)
$t4, 4($t1)
$t4, 0($t1)
$t3, 4($t1)
$ra
26
Memory View
• Viewed as a large, single-dimension 8-bit array with an
address (“byte address”)
• A memory address is an index into the array
BYTE
1
BYTE
2
BYTE
3
BYTE
4
BYTE
5
BYTE
…
0
27
Memory Organization
• Words are aligned
– 2 least significant bits of address are 0s
0
WORD
4
WORD
8
WORD
12
WORD
16
WORD
20
WORD
…
• 232 bytes with byte addresses from 0 to 232
–1
• 230 words with byte addresses 0, 4, 8, …,
232 – 4
• Half-words are aligned
– LSB of address is 0
• Addressing within a word
– Which byte is first and which byte is last?
– Big-endian vs. Little-endian
MSB
0
0
LSB
1
2
MSB
0
3
3
LSB
2
1
0
28
Peer Instruction: $s3=i, $s4=j,
$s5=@A
Loop: addiu $s4,$s4,1
# j = j + 1
sll
addu
lw
slti
beq
addiu
slti
bne
$t1,$s3,2
$t1,$t1,$s5
$t0,0($t1)
$t1,$t0,10
$t1,$0, Loop
$s3,$s3,1
$t1,$t0, 0
$t1,$0, Loop
#
#
#
#
#
#
#
#
$t1 = 4 * i
$t1 = @ A[i] do j =
$t0 = A[i]
$t1 = $t0 < 10 while
goto Loop [not delayed]
i = i + 1
$t1 = $t0 < 0
goto Loop [not delayed]
j + 1
(______);
What C code properly fills in the blank in loop on right?
1:
2:
3:
4:
5:
6
A[i++]
A[i++]
A[i]
A[i++]
A[i]
>=
>=
>=
>=
>=
10
10
10
10
10
|
||
||
&&
A[i]
A[i++]
A[i]
A[i++]
<
<
<
<
0
0
0
0
None of the above
29
Instruction
Which Peer
instruction
has same representation as 35ten?
A. add $0, $0, $0
B. subu $s0,$s0,$s0
C. lw $0, 0($0)
D. addi $0, $0, 35
E. subu $0, $0, $0
F. Trick question!
Instructions are not numbers
• Use Green Card handout to answer
31
More on Alignment
A misaligned access
–
•
How do we define a word at address 3?
–
–
•
lw $t1, 3($zero)
Data 0, 1, 2, 3
• If you meant this, use address 0 instead of 3
Data 3, 4, 5, 6
• If you meant this, it is misaligned!
• Certain hardware may support this, but usually not
• Get a byte from address 3, a word from address 4
and manipulate data to get what you want
0
0
1
2
3
4
4
5
6
7
8
8
9
10 11
…
•
Alignment issue does not exist for byte
accesses
34
Shift Instructions
Name
R-format
Fields
op
NOT
USED
rt
Comments
rd
shamt
funct
shamt is “shift amount”
• Bit-wise logic operations
• <op> <rdestination> <rsource> <shamt>
• Examples
– sll $t0, $s0, 4
– srl $s0, $t0, 2
# $t0 = $s0 << 4
# $s0 = $t0 >> 2
• Shift amount can be in a register (“shamt” field not used)
• Shift right arithmetic (SRA) keeps the sign of a number
– sra $s0, $t0, 4
35
Control
• Decision-making instructions
– Alter the control flow, i.e., change the “next” instruction to be executed
• MIPS conditional branch instructions
– bne $t0, $t1, LABEL
– beq $t0, $t1, LABEL
• Example
if (i == h) h =i+j;
YES
NO
bne $s0, $s1, LABEL
add $s3, $s0, $s1
LABEL: …
(i == h)?
h=i+j;
LABEL:
36
Control, cont’d
• MIPS unconditional branch instruction (jump)
– j LABEL
• Example
– f, g, and h are in registers $s3, $s4, and $s5
bne $s4, $s5, ELSE
add $s3, $s4, $s5
j EXIT
ELSE: sub $s3, $s4, $s5
EXIT: …
if (i == h) f=g+h;
else f=g–h;
YES
NO
(i == h)?
f=g+h;
f=g–h
EXIT
37
Control, cont’d
• Loops
– “while” loops
• Example on page 74
– “for” loops
38
Control, cont’d
• We have beq and bne; what about branch-if-less-than?
– slt
if (s1<s2) t0=1;
else t0=0;
slt $t0, $s1, $s2
• Can you make a “pseudo” instruction “blt $s1, $s2,
LABEL”?
• Assembler needs a temporary register to do this
– $at is reserved for this purpose
39
Instruction Format
Branch/
Immediate
•
rt
16-bit immediate
The 16-bit immediate value is in signed, 2’s complement form
Addressing in branch instructions
–
–
–
•
rs
Address in the instruction is not a 32-bit number – it’s only 16-bit
–
•
op
The 16-bit number in the instruction specifies the number of “instructions” to be skipped
Memory address is obtained by adding this number to the PC
Next address = PC + 4 + sign_extend(16-bit immediate << 2)
Example
–
beq $s1, $s2, 100
4
17
18
25
40
Instruction Format, cont’d
Jump
op
26-bit immediate
• The address of next instruction is obtained by concatenating with PC
– Next address = {PC[31:28],IMM[25:0],00}
– Address boundaries of 256MB
• Example
– j 10000
2
2500
41
MIPS Addressing Modes
•
Immediate addressing (I-format)
op
•
•
op
rs
rt
op
rs
rt
immediate
rd
shamt
funct
immediate
Base addressing (load/store) [register + offset]
rs
rt
offset
PC-relative addressing (beq/bne) [PC + 4 + offset]
op
•
rt
Register addressing (R-/I-format)
op
•
rs
rs
rt
offset
Pseudo-direct addressing (j) [concatenation w/ PC]
op
26-bit address
42
Summary
43
Register Usage Convention
NAME
REG. NUMBER
USAGE
$zero
0
zero
$at
1
reserved for assembler usage
$v0~$v1
2~3
values for results and expression eval.
$a0~$a3
4~7
arguments
$t0~$t7
8~15
temporaries
$s0~$s7
16~23
temporaries, saved
$t8~$t9
24~25
more temporaries
$k0~$k1
26~27
reserved for OS kernel
$gp
28
global pointer
$sp
29
stack pointer
$fp
30
frame pointer
$ra
31
return address
44
Stack and Frame Pointers
•
Stack pointer ($sp)
–
–
–
–
•
Procedure frame
–
–
•
Keeps the address to the top of the
stack
$29 is reserved for this purpose
Stack grows from high address to low
Typical stack operations are
push/pop
Contains saved registers and local
variables
“Activation record”
Frame pointer ($fp)
–
–
–
–
Points to the first word of a frame
Offers a stable reference pointer
$30 is reserved for this
Some compilers don’t use $fp
45
“C Program” Down to “Numbers”
swap:
muli
add
lw
lw
sw
sw
jr
$2, $5, 4
$2, $4, $2
$15, 0($2)
$16, 4($2)
$16, 0($2)
$15, 4($2)
$31
void swap(int v[], int k)
{
int temp;
temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;
}
00000000101000010…
00000000000110000…
10001100011000100…
10001100111100100…
10101100111100100…
10101100011000100…
00000011111000000…
46
To Produce an Executable
source file
.asm/.s
assembler
object file
.obj/.o
source file
.asm/.s
assembler
object file
.obj/.o
source file
.asm/.s
assembler
object file
.obj/.o
linker
executable
.exe
library
.lib/.a
47
An Assembler
• Expands macros
– Macro is a sequence of operations conveniently defined by a user
– A single macro can expand to many instructions
• Determines addresses and translates source into binary numbers
–
–
–
–
Start from address 0
Record in “symbol table” addresses of labels
Resolve branch targets and complete branch instructions’ encoding
Record instructions that need be fixed after linkage
• Packs everything in an object file
• “Two-pass assembler”
– To handle forward references
48
An Object File
• Header
– Size and position of other pieces of the file
• Text segment
– Machine codes
• Data segment
– Binary representation of the data in the source
• Relocation information
– Identifies instructions and data words that depend on absolute addresses
• Symbol table
– Keeps addresses of global labels
– Lists unresolved references
• Debugging information
– Contains a concise description of the way in which the program was compiled
49
Assembler Directives
• Guides the assembler to properly handle following
codes with certain considerations
• .text
– Tells assembler that codes follow
• .data
– Tells assembler that data follow
• .align
– Directs aligning the following items
• .global
– Tells to treat the following symbol as global
• .asciiz
– Tells to handle the following as a “string”
50
Code Example
.text
.align
.globl
2
main
main:
subu
…
$sp, $sp, 32
lw
…
la
lw
jal
…
jr
$t6, 28($sp)
loop:
.data
.align
$a0, str
$a1, 24($sp)
printf
$ra
0
str:
.asciiz “The sum from 0 … 100 is %d\n”
51
Macro Example
.data
int_str:
.macro
.asciiz “%d”
.text
print_int($arg)
la $a0, int_str
mov $a1, $arg
jal printf
.end_macro
…
print_int($7)
la $a0, int_str
mov $a1, $7
jal printf
52
Linker
53
Procedure Calls
• Argument passing
– First 4 arguments are passed through $a0~$a3
– More arguments are passed through stack
• Result passing
– First 2 results are passed through $v0~$v1
– More results can be passed through stack
• Stack manipulations can be tricky and error-prone
• More will be discussed in recitations
54
SPIM
55
High-Level Lang. vs. Assembly
High-level language
Assembly
Example
C, Fortran, Java, …
-
+
High productivity
– Short description & readability
Portability
Low productivity
– Long description & low readability
Not portable
–
Limited optimization capability in certain
cases
With proper knowledge and experiences,
fully optimized codes can be written
56
Interpreter vs. Compiler
Interpreter
Compiler
Concept
Line-by-line translation and execution
Whole program translation and native
execution
Example
Java, BASIC, …
C, Fortran, …
+
Interactive
Portable (e.g., Java Virtual Machine)
High performance
–
Low performance
Binary not portable to other machines or
generations
57
Compiler Structure
• Compiler is a software with a variety of functions
implemented inside it
– Front-end
• Deals with high-level language constructs and translates
them into more relevant tree-like or list-format internal
representation (IR)
• Symbols (e.g., a[i+8*j]) are still available
– Back-end
• Back-end IR that more or less correspond to machine
instructions
• With more steps, IR becomes machine instructions
58
Compiler Structure, cont’d
• Front-end
– Scanning
• Takes the input source program and chops the programs into
recognizable “tokens”
• Also known as “lexical analysis” and “LEX” tool is used
– Parsing
• Takes the token stream, checks the syntax, and produces abstract
syntax trees
• “YACC” or “BISON” tools are used
– Semantic analysis
• Takes the abstract syntax trees, performs type checking, and builds
a symbol table
– IR generation
• Similar to assembly language program, but assumes unlimited
registers, etc.
59
Compiler Structure, cont’d
• Back-end
– Local optimization
• Optimizations within a basic block
• Common Sub-expression Elimination (CSE), copy propagation, constant
propagation, dead code elimination, …
– Global optimization
• Optimizations that deal with multiple basic blocks
– Loop optimizations
• Loops are so important that many compiler and architecture optimizations
target them
• Induction variable removal, loop invariant removal, strength reduction, …
– Register allocation
• Try to keep as many variables in registers as possible
– Machine-dependent optimizations
• Utilize any useful instructions provided by the target machine
60
Code Example (AST)
while (save[i] == k)
i+=1;
61
Code Example (IR)
62
Code Example (CFG)
63
Code Example (CFG), cont’d
64
Code Example (Reg. Alloc.)
65
Compiler Example
• gcc (GNU open-source C Compiler)
– cpp: C pre-processor
• Macro expansion (#define)
• File expansion (#include)
• Handles other directives (#if, #else, #pragma)
– cc1: C compiler (front-end & back-end)
• Compiles your C program
– gas: assembler
• Assembles compiler-generated assembly codes
– ld: linker
• Links all the object files and library files to generate an
executable
66
Optimization and Compile Time
• Optimizations are a must
– Code quality is much improved - 2 ~ 10 times improvement is not
uncommon
• Turning on optimizations can incur a significant compile
time overhead
– For a big, complex software project, compile time is an issue
– Not that serious in many cases as compiling is a rare event compared to
program execution
– Considering the resulting code quality, you pay this fee
– Now, computers became fast!
67
Building a Compiler
•
A modern compiler is a very complex software tool
–
–
•
Traditional tools to create a compiler
–
–
•
LEX
YACC/BISON
Start from existing open-source compilers
–
–
–
•
Need a very careful design of data structures
All good software engineering techniques applied
LCC
GCC
There are many other research prototypes!
If you are interested in compiler technologies
–
–
CS1621 – Structure of Programming Languages
CS1622 – Introduction to Compiler Design
68
Summary
•
Operations and operands
•
In MIPS assembly language
–
–
•
Some lessons
–
–
•
Registers replace C variables
One instruction (simple operation) per line
Simpler is better
• e.g., A few instruction formats vs. many complex instruction formats
Smaller is faster
• e.g., 32 GPRs vs. 128 GPRs
Instructions
–
–
–
Arithmetic
Logic
Memory transfer
69