Ch2 Instruction Set and Assembly Language Programming
Instruction Set &
Assembly Language Programming
Jianjian SONG
Software Institute, Nanjing
University
Content
Computer Architecture Taxonomy
ARM Architecture Introduction
ARM Instruction Set
ARM Assembly Language Programming
1. Computer Architecture
Taxonomy
What is architecture?
Architecture & Organization 1
Architecture is those attributes visible to the
programmer
Instruction set, number of bits used for data
representation, I/O mechanisms, addressing
techniques.
e.g. Is there a multiply instruction?
Organization is how features are implemented
Control signals, interfaces, memory technology.
e.g. Is there a hardware multiply unit or is it done
by repeated addition?
Architecture & Organization 2
All Intel x86 family share the same
basic architecture
The IBM System/370 family share the
same basic architecture
This gives code compatibility
At least backwards
Organization differs between different
versions
von Neumann architecture
Memory holds data, instructions.
Central processing unit (CPU) fetches
instructions from memory.
Separate CPU and memory distinguishes
programmable computer.
CPU registers help out: program
counter (PC), instruction register (IR),
general-purpose registers, etc.
CPU + memory
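[Figure: the CPU's PC holds address 200 and drives the address lines to memory; memory returns the instruction stored at location 200, ADD r5,r1,r3, over the data lines into the IR.]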
Harvard architecture
[Figure: the CPU, with its PC, connects to two separate memories, a data memory and a program memory, each through its own address and data lines.]
von Neumann vs. Harvard
Harvard can’t use self-modifying code.
Harvard allows two simultaneous
memory fetches.
Most DSPs use Harvard architecture for
streaming data:
greater memory bandwidth;
more predictable bandwidth.
RISC vs. CISC
Complex instruction set computer
(CISC):
many addressing modes;
many operations.
Reduced instruction set computer
(RISC):
load/store;
pipelinable instructions.
Load-store Architecture
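Processing instructions (such as ADD and SUB) operate only on values held in registers (or specified directly in the instruction), and always place the result back in a register. The only operations on memory are loading a memory value into a register (load instructions) and storing a register value to memory (store instructions).
By comparison, a typical CISC processor allows a memory value to be added (ADD) to a register, and sometimes allows a register value to be added (ADD) to memory.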
Instruction set characteristics
Fixed vs. variable length.
Addressing modes.
Number of operands.
Types of operands.
Programming model
Programming model: registers visible to
the programmer.
Some registers are not visible (e.g. IR).
Multiple implementations
Successful architectures have several
implementations:
varying clock speeds;
different bus widths;
different cache sizes;
etc.
2. ARM Architecture Introduction
ARM (Advanced RISC Machines)
ARM is a design company and an IP supplier: it licenses its designs to partners, who manufacture chips with their own distinctive features.
What is IP? Intellectual Property
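Features of ARM
ARM has the general characteristics of a RISC architecture:
Large number of registers
Most operations are performed on registers; data moves between memory and registers through load/store instructions.
Simple addressing modes
Fixed-length instruction format
In addition:
Small size, low power, low cost, high performance
Dual 16-bit/32-bit instruction sets
Many partners worldwide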
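Versions and extensions of the ARM architecture
Six versions
ARMv1 ~ ARMv6
Extensions of the ARM architecture
Thumb (T variant): 16-bit instruction set, used to improve instruction density;
DSP (E variant): arithmetic instructions for DSP applications;
Jazelle (J variant): allows direct execution of Java bytecode.
What is instruction density?
The number of machine instructions that fit in a unit of memory, for the same sequence of operations.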
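Naming format of ARM architecture versions
Naming string:
ARM
vx (x: instruction set version number, 1~6)
Characters denoting variants (e.g. T, E, J)
The character x denotes the exclusion of certain features.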
ARM processor families
ARM7 family
ARM9 family
ARM9E family
ARM10 family
SecureCore family
Intel StrongARM
Intel XScale
3. ARM Instruction Set
ARM assembly language
ARM programming model
ARM memory organization
ARM data operations
ARM flow of control
Assembly language
Why assembly language?
One-to-one with instructions (more or less).
Basic features:
One instruction per line.
Labels provide names for addresses (usually in
first column).
Instructions often start in later columns.
Columns run to end of line.
ARM assembly language example
label1 ADR r4,c
       LDR r0,[r4]    ; a comment
       ADR r4,d
       LDR r1,[r4]
       SUB r0,r0,r1   ; comment
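General encoding format of ARM instructions
Bits 31-28: cond
Bits 27-26: 00
Bit 25: X
Bits 24-21: opcode
Bit 20: S
Bits 19-16: Rn
Bits 15-12: Rd
Bits 11-0: Shifter_operand
opcode: encoding of the operation
cond: encoding of the execution condition
S: whether the operation affects the value of CPSR
Rn: number of the register holding the first operand
Rd: number of the destination register
Shifter_operand: the second operand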
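The field layout can be made concrete with a small decoder. The following is only a sketch in C: the names DpFields and decode_dp are hypothetical, and it assumes the word really is a data-processing instruction (bits 27-26 equal to 00). Bit 25, shown as X in the layout, selects an immediate second operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of an ARM data-processing instruction (hypothetical struct name). */
typedef struct {
    unsigned cond;     /* bits 31-28: execution condition */
    unsigned imm;      /* bit 25: 1 if the second operand is an immediate */
    unsigned opcode;   /* bits 24-21: operation, e.g. 0x4 is ADD */
    unsigned s;        /* bit 20: 1 if the instruction updates CPSR */
    unsigned rn;       /* bits 19-16: first operand register */
    unsigned rd;       /* bits 15-12: destination register */
    unsigned operand2; /* bits 11-0: shifter operand */
} DpFields;

/* Split a 32-bit instruction word into its fields by shifting and masking. */
void decode_dp(uint32_t insn, DpFields *out)
{
    assert(out != NULL);
    out->cond     = (insn >> 28) & 0xFu;
    out->imm      = (insn >> 25) & 0x1u;
    out->opcode   = (insn >> 21) & 0xFu;
    out->s        = (insn >> 20) & 0x1u;
    out->rn       = (insn >> 16) & 0xFu;
    out->rd       = (insn >> 12) & 0xFu;
    out->operand2 = insn & 0xFFFu;
}
```

Decoding the word 0xE0815003, which encodes ADD r5,r1,r3, gives cond = 0xE (always), opcode = 0x4, S = 0, Rn = 1, Rd = 5 and a shifter operand of 3 (register r3, no shift).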
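Basic addressing modes of ARM instructions
Register addressing
e.g. ADD R0, R1, R2 ; (R1)+(R2)→R0
Immediate addressing
e.g. ADD R3, R3, #2 ; (R3)+2→R3
Register indirect addressing
e.g. LDR R0, [R3] ; ((R3))→R0
Register indexed addressing (base plus offset)
e.g. LDR R0, [R1, #4] ; ((R1)+4)→R0
Relative addressing
e.g. B rel ; (PC)+rel→PC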
Pseudo-ops
Some assembler directives don’t
correspond directly to instructions:
Define current address.
Reserve storage.
Constants.
ARM programming model
[Figure: sixteen 32-bit registers, r0 through r15, with r15 serving as the PC, plus the CPSR (bits 31 to 0), which holds the N, Z, C and V flags.]
Endianness
Relationship between bit and byte/word
ordering defines endianness:
little-endian: bit 31 [ byte 3 | byte 2 | byte 1 | byte 0 ] bit 0
big-endian:    bit 31 [ byte 0 | byte 1 | byte 2 | byte 3 ] bit 0
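To see the two byte orders side by side, here is a minimal C sketch. The function names are hypothetical; mem[0] stands for the lowest byte address of the word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian: the least significant byte goes to the lowest address. */
void store_little_endian(uint32_t word, uint8_t mem[4])
{
    assert(mem != NULL);
    mem[0] = (uint8_t)(word & 0xFFu);
    mem[1] = (uint8_t)((word >> 8) & 0xFFu);
    mem[2] = (uint8_t)((word >> 16) & 0xFFu);
    mem[3] = (uint8_t)((word >> 24) & 0xFFu);
}

/* Big-endian: the most significant byte goes to the lowest address. */
void store_big_endian(uint32_t word, uint8_t mem[4])
{
    assert(mem != NULL);
    mem[0] = (uint8_t)((word >> 24) & 0xFFu);
    mem[1] = (uint8_t)((word >> 16) & 0xFFu);
    mem[2] = (uint8_t)((word >> 8) & 0xFFu);
    mem[3] = (uint8_t)(word & 0xFFu);
}

/* Returns 1 if the machine running this code stores words little-endian. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    return *(const uint8_t *)&probe == 1;
}
```

Storing 0x11223344 puts 0x44 at the lowest address in little-endian order and 0x11 at the lowest address in big-endian order.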
ARM data types
Word is 32 bits long.
Word can be divided into four 8-bit
bytes.
ARM addresses can be 32 bits long.
Address refers to byte.
Address 4 starts at byte 4.
Can be configured at power-up as
either little- or big-endian mode.
ARM status bits
Arithmetic, logical, and shifting operations can set the CPSR bits (when the S bit of the instruction is set):
N (negative), Z (zero), C (carry), V (overflow).
Examples:
-1 + 1 = 0: NZCV = 0110.
2^31 - 1 + 1 = -2^31: NZCV = 1001.
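How the four flags come out of an addition can be sketched in C. This models only a 32-bit add, and the function name add_set_flags is hypothetical; the flags are packed as a 4-bit value in N, Z, C, V order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add two 32-bit values and report the resulting flags as NZCV (N is bit 3). */
uint32_t add_set_flags(uint32_t a, uint32_t b, unsigned *nzcv)
{
    assert(nzcv != NULL);
    uint32_t r = a + b;                       /* wraps modulo 2^32 */
    unsigned n = (unsigned)(r >> 31);         /* sign bit of the result */
    unsigned z = (r == 0);                    /* result is zero */
    unsigned c = (r < a);                     /* unsigned carry out of bit 31 */
    /* Signed overflow: operands share a sign that differs from the result's. */
    unsigned v = (unsigned)(((~(a ^ b) & (a ^ r)) >> 31) & 1u);
    *nzcv = (n << 3) | (z << 2) | (c << 1) | v;
    return r;
}
```

With a = 0xFFFFFFFF (-1) and b = 1 the result is 0 with Z and C set, so NZCV = 0110. With a = 0x7FFFFFFF (2^31 - 1) and b = 1 the result is 0x80000000: N and V are set, and there is no carry out.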
Instructions Overview
Data instructions
Move Instructions
Load/Store instructions
Comparison instructions
Branch instructions
ARM data instructions
Basic format:
ADD r0,r1,r2
Computes r1+r2, stores in r0.
Immediate operand:
ADD r0,r1,#2
Computes r1+2, stores in r0.
ARM data instructions
ADD, ADC : add (w. carry)
SUB, SBC : subtract (w. carry)
RSB, RSC : reverse subtract (w. carry)
MUL, MLA : multiply (and accumulate)
AND, ORR, EOR
BIC : bit clear
LSL, LSR : logical shift left/right
ASL, ASR : arithmetic shift left/right
ROR : rotate right
RRX : rotate right extended with C
Data operation varieties
Logical shift: fills with zeroes.
Arithmetic shift right: fills with copies of the sign bit (ones for a negative value).
RRX performs a 33-bit rotate, with the C bit from CPSR placed above the sign bit.
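The difference between these shift varieties is easy to check with a small C model. This is a sketch only: the function names are hypothetical, and shift amounts are restricted to 1..31 to keep the C well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Logical shift right: vacated high bits are filled with zeroes. */
uint32_t lsr32(uint32_t x, unsigned n)
{
    assert(n >= 1 && n <= 31);
    return x >> n;
}

/* Arithmetic shift right: vacated high bits copy the original sign bit. */
uint32_t asr32(uint32_t x, unsigned n)
{
    assert(n >= 1 && n <= 31);
    uint32_t shifted = x >> n;
    if (x & UINT32_C(0x80000000)) {
        shifted |= (uint32_t)~(UINT32_C(0xFFFFFFFF) >> n); /* set the top n bits */
    }
    return shifted;
}

/* RRX: rotate right by one through the carry, a 33-bit rotate.
   The old carry enters at bit 31; the old bit 0 becomes the new carry. */
uint32_t rrx32(uint32_t x, unsigned carry_in, unsigned *carry_out)
{
    assert(carry_out != NULL && carry_in <= 1);
    *carry_out = (unsigned)(x & 1u);
    return ((uint32_t)carry_in << 31) | (x >> 1);
}
```

For the input 0x80000000 shifted by 4, lsr32 yields 0x08000000 while asr32 yields 0xF8000000, which is the same value divided by 16 when read as a signed number.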
ARM move instructions
MOV, MVN : move (negated)
MOV r0, r1 ; sets r0 to r1
ARM load/store instructions
LDR, LDRH, LDRB : load (half-word,
byte)
STR, STRH, STRB : store (half-word,
byte)
Addressing modes:
register indirect : LDR r0,[r1]
with second register : LDR r0,[r1,-r2]
with constant : LDR r0,[r1,#4]
ARM comparison instructions
CMP : compare
CMN : negated compare
TST : bit-wise test
TEQ : bit-wise test for equivalence (exclusive OR)
These instructions set only the NZCV
bits of CPSR.
ARM branch instructions
B: Branch
BL: Branch and Link
ARM ADR pseudo-op
Cannot refer to an address directly in an
instruction.
Generate value by performing
arithmetic on PC.
ADR pseudo-op generates instruction
required to calculate address:
ADR r1,FOO
Example: C assignments
C:
x = (a + b) - c;
Assembler:
ADR r4,a       ; get address for a
LDR r0,[r4]    ; get value of a
ADR r4,b       ; get address for b, reusing r4
LDR r1,[r4]    ; get value of b
ADD r3,r0,r1   ; compute a+b
ADR r4,c       ; get address for c
LDR r2,[r4]    ; get value of c
C assignment, cont’d.
SUB r3,r3,r2   ; complete computation of x
ADR r4,x       ; get address for x
STR r3,[r4]    ; store value of x
Example: C assignment
C:
y = a*(b+c);
Assembler:
ADR r4,b       ; get address for b
LDR r0,[r4]    ; get value of b
ADR r4,c       ; get address for c
LDR r1,[r4]    ; get value of c
ADD r2,r0,r1   ; compute partial result
ADR r4,a       ; get address for a
LDR r0,[r4]    ; get value of a
C assignment, cont’d.
MUL r2,r2,r0 ; compute final value for y
ADR r4,y ; get address for y
STR r2,[r4] ; store y
Example: C assignment
C:
z = (a << 2) |
(b & 15);
Assembler:
ADR r4,a           ; get address for a
LDR r0,[r4]        ; get value of a
MOV r0,r0,LSL #2   ; perform shift
ADR r4,b           ; get address for b
LDR r1,[r4]        ; get value of b
AND r1,r1,#15      ; perform AND
ORR r1,r0,r1       ; perform OR
C assignment, cont’d.
ADR r4,z ; get address for z
STR r1,[r4] ; store value for z
Additional addressing modes
Base-plus-offset addressing:
LDR r0,[r1,#16]
Loads from location r1+16
Auto-indexing increments base register:
LDR r0,[r1,#16]!
Adds 16 to r1, then loads r0 from the new r1.
Post-indexing fetches, then does offset:
LDR r0,[r1],#16
Loads r0 from r1, then adds 16 to r1.
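The three forms differ only in when, and whether, the base register changes. A minimal C model, assuming a hypothetical memory array of 32-bit words indexed by byte address divided by 4 (all names here are illustrative):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read the word at a byte address from the model memory. */
static uint32_t load_word(const uint32_t *mem, uint32_t addr)
{
    return mem[addr / 4];
}

/* LDR r0,[r1,#off] : load from base+off, base register unchanged. */
uint32_t ldr_offset(const uint32_t *mem, const uint32_t *base, uint32_t off)
{
    assert(mem != NULL && base != NULL);
    return load_word(mem, *base + off);
}

/* LDR r0,[r1,#off]! : add off to the base first, then load from the new base. */
uint32_t ldr_preindex(const uint32_t *mem, uint32_t *base, uint32_t off)
{
    assert(mem != NULL && base != NULL);
    *base += off;
    return load_word(mem, *base);
}

/* LDR r0,[r1],#off : load from the old base, then add off to the base. */
uint32_t ldr_postindex(const uint32_t *mem, uint32_t *base, uint32_t off)
{
    assert(mem != NULL && base != NULL);
    uint32_t value = load_word(mem, *base);
    *base += off;
    return value;
}
```

Starting from a base of 0 and an offset of 16, the first two forms read the word at byte 16 and the third reads the word at byte 0; only the last two leave the base register at 16.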
ARM flow of control
All operations can be performed
conditionally, testing CPSR:
EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE,
LT, GT, LE
Branch operation:
B #100
Can be performed conditionally.
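Each condition suffix is a test on the NZCV flags. The tests can be written out as a C sketch; the enum and function names are hypothetical, and the flags are passed as a 4-bit value in N, Z, C, V order.

```c
#include <assert.h>

typedef enum {
    COND_EQ, COND_NE, COND_CS, COND_CC, COND_MI, COND_PL, COND_VS,
    COND_VC, COND_HI, COND_LS, COND_GE, COND_LT, COND_GT, COND_LE
} Cond;

/* Returns 1 if an instruction with the given condition would execute. */
int cond_passes(Cond cond, unsigned nzcv)
{
    assert(nzcv <= 0xFu);
    unsigned n = (nzcv >> 3) & 1u;
    unsigned z = (nzcv >> 2) & 1u;
    unsigned c = (nzcv >> 1) & 1u;
    unsigned v = nzcv & 1u;

    switch (cond) {
    case COND_EQ: return z;             /* equal */
    case COND_NE: return !z;            /* not equal */
    case COND_CS: return c;             /* carry set */
    case COND_CC: return !c;            /* carry clear */
    case COND_MI: return n;             /* negative */
    case COND_PL: return !n;            /* positive or zero */
    case COND_VS: return v;             /* overflow */
    case COND_VC: return !v;            /* no overflow */
    case COND_HI: return c && !z;       /* unsigned higher */
    case COND_LS: return !c || z;       /* unsigned lower or same */
    case COND_GE: return n == v;        /* signed greater or equal */
    case COND_LT: return n != v;        /* signed less than */
    case COND_GT: return !z && n == v;  /* signed greater than */
    case COND_LE: return z || n != v;   /* signed less or equal */
    }
    return 0;
}
```

For example, comparing 3 with 5 computes 3 - 5, which sets only N (NZCV = 1000), so LT passes and GE fails.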
Example: if statement
C:
if (a < b) { x = 5; y = c + d; } else x = c - d;
Assembler:
; compute and test condition
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
ADR r4,b ; get address for b
LDR r1,[r4] ; get value for b
CMP r0,r1 ; compare a < b
BGE fblock ; if a >= b, branch to false block
If statement, cont’d.
; true block
MOV r0,#5 ; generate value for x
ADR r4,x ; get address for x
STR r0,[r4] ; store x
ADR r4,c ; get address for c
LDR r0,[r4] ; get value of c
ADR r4,d ; get address for d
LDR r1,[r4] ; get value of d
ADD r0,r0,r1 ; compute y
ADR r4,y ; get address for y
STR r0,[r4] ; store y
B after ; branch around false block
If statement, cont’d.
; false block
fblock ADR r4,c ; get address for c
LDR r0,[r4] ; get value of c
ADR r4,d ; get address for d
LDR r1,[r4] ; get value for d
SUB r0,r0,r1 ; compute c-d
ADR r4,x ; get address for x
STR r0,[r4] ; store value of x
after ...
Example: Conditional instruction
implementation
; true block
MOVLT r0,#5      ; generate value for x
ADRLT r4,x       ; get address for x
STRLT r0,[r4]    ; store x
ADRLT r4,c       ; get address for c
LDRLT r0,[r4]    ; get value of c
ADRLT r4,d       ; get address for d
LDRLT r1,[r4]    ; get value of d
ADDLT r0,r0,r1   ; compute y
ADRLT r4,y       ; get address for y
STRLT r0,[r4]    ; store y
Example: switch statement
C:
switch (test) { case 0: … break; case 1: … }
Assembler:
ADR r2,test ; get address for test
LDR r0,[r2] ; load value for test
ADR r1,switchtab ; load address for switch table
LDR r15,[r1,r0,LSL #2] ; index switch table
switchtab DCD case0
DCD case1
...
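The same table-driven dispatch can be written at the C level as an array of function pointers. This is a sketch with hypothetical handlers case0 and case1 and made-up return values; unlike the assembly above, it checks the index against the table size before jumping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical case bodies; each returns a value so the result is observable. */
static int case0(void) { return 100; }
static int case1(void) { return 200; }

typedef int (*CaseHandler)(void);

/* Plays the role of switchtab: one code address per case value. */
static const CaseHandler switchtab[] = { case0, case1 };

/* Index the table with the switch value and call the selected entry.
   Returns -1 when the value has no table entry. */
int dispatch(unsigned test)
{
    size_t count = sizeof switchtab / sizeof switchtab[0];
    if (test >= count) {
        return -1;
    }
    assert(switchtab[test] != NULL);
    return switchtab[test]();
}
```

The scaling by 4 that LSL #2 performs in the assembly is implicit here: indexing an array of pointers multiplies the index by the pointer size.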
Example: FIR filter
C:
for (i=0, f=0; i<N; i++)
f = f + c[i]*x[i];
Assembler
; loop initiation code
MOV r0,#0 ; use r0 for i
MOV r8,#0 ; use separate index for arrays
ADR r2,N ; get address for N
LDR r1,[r2] ; get value of N
MOV r2,#0 ; use r2 for f
FIR filter, cont’d.
ADR r3,c ; load r3 with base of c
ADR r5,x ; load r5 with base of x
; loop body
loop LDR r4,[r3,r8] ; get c[i]
LDR r6,[r5,r8] ; get x[i]
MUL r4,r4,r6 ; compute c[i]*x[i]
ADD r2,r2,r4 ; add into running sum
ADD r8,r8,#4 ; add one word offset to array index
ADD r0,r0,#1 ; add 1 to i
CMP r0,r1 ; exit?
BLT loop ; if i < N, continue
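The register usage of the loop can be mirrored in C by keeping a separate byte offset next to the loop counter, as the assembly does with r8 and r0. This is a sketch assuming 32-bit integer coefficients and samples; the function name fir is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of c[i]*x[i] for i in [0, n), using an explicit byte offset. */
int32_t fir(const int32_t *c, const int32_t *x, int32_t n)
{
    assert(c != NULL && x != NULL);
    int32_t f = 0;        /* running sum, r2 in the assembly */
    size_t offset = 0;    /* byte offset into both arrays, r8 */

    for (int32_t i = 0; i < n; i++) {   /* i is r0, n is r1 */
        const int32_t *ci = (const int32_t *)((const unsigned char *)c + offset);
        const int32_t *xi = (const int32_t *)((const unsigned char *)x + offset);
        f += *ci * *xi;                 /* MUL then ADD */
        offset += sizeof(int32_t);      /* one word, 4 bytes */
    }
    return f;
}
```

One difference is worth noting: the C for loop tests i < n before the first iteration, while the assembly tests at the bottom of the loop, so the assembly body runs once even when N is 0.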
ARM subroutine linkage
Branch and link instruction:
BL foo
Copies the return address (the address of the instruction after the BL) to r14.
To return from subroutine:
MOV r15,r14
Nested subroutine calls
Nesting/recursion requires a coding convention. The code below assumes a full-descending stack addressed through r13:
f1 LDR r0,[r13]          ; load arg into r0 from stack
   ; call f2()
   STR r14,[r13,#-4]!    ; push f1’s return adrs
   STR r0,[r13,#-4]!     ; push arg to f2 on stack
   BL f2                 ; branch and link to f2
   ; return from f1()
   ADD r13,r13,#4        ; pop f2’s arg off stack
   LDR r15,[r13],#4      ; pop return adrs into PC and return
Summary
Load/store architecture
Most instructions are RISCy, operate in
single cycle.
Some multi-register operations take longer.
All instructions can be executed
conditionally.
4. ARM Assembly Language
Programming
Why and when to use?
AT&T format and Intel format
Grammar of ARM assembly language
Examples
Why and when to use?
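Low-level routines in an operating system kernel deal directly with hardware and need special-purpose instructions.
Special CPU instructions
Time efficiency of frequently executed code
Space efficiency of programs (e.g. the operating system's boot loader)
Refer to Section 1.5 of “Linux内核源代码情景分析” (Zhejiang University Press)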
AT&T format and Intel format
Grammar of ARM assembly language
Statements
Program format
Statements
A statement is one of:
Instruction
Pseudo-operation (directive)
Macro
Statement format
{ symbol } { instruction | directive | pseudo-instruction } { ;comment }
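Pseudo-operations
Symbol definition pseudo-operations
Data definition pseudo-operations
Assembly control pseudo-operations
Frame description pseudo-operations
Reporting pseudo-operations
Other pseudo-operations
Pseudo-operations for variables
Declare a global variable and initialize it: GBLA, GBLL, GBLS
Declare a local variable and initialize it: LCLA, LCLL, LCLS
Assign a value to a variable: SETA, SETL, SETS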
Example
           GBLA objectsize     ; declare a global arithmetic variable
objectsize SETA 0xff           ; assign a value to the variable
           SPACE objectsize    ; use the variable
           GBLL statusB
statusB    SETL {TRUE}
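Pseudo-operations for data constants
EQU
name EQU expr {, type}
Usually placed in .inc files
Allocating memory
SPACE
{label} SPACE byte_num
Allocates a block of memory and initializes it to 0
DCB
{label} DCB expr {, expr}
Allocates a run of bytes and initializes them with expr
DCD
{label} DCD expr {, expr}
Allocates a run of words (the allocated memory is word-aligned) and initializes them with expr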
MACRO and MEND
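Subroutines and macros
Use macro assembly when the routine is short but takes many parameters
Macro body
MACRO: start of a macro definition
MEND: end of a macro definition
Usually placed in .mac files
Format
MACRO
{$label} macroname {$para1, $para2, ...}
... ;code
MEND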
Example
MACRO
$label xmac $p1
... ;code
$label.loop1 ; label internal to the macro body
... ;code
BGE $label.loop1
$label.loop2 ; label internal to the macro body
... ;code
BL $p1 ; parameter p1 is the name of a subroutine
BGT $label.loop2
... ;code
MEND
Example (cont’d)
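Result of expanding the macro call “abc xmac subr1”
... ;code
abcloop1 ; $label in the internal label is replaced by abc
... ;code
BGE abcloop1 ; $label in the internal label is replaced by abc
abcloop2 ; $label in the internal label is replaced by abc
... ;code
BL subr1 ; parameter p1 is replaced by the actual value subr1
BGT abcloop2
... ;code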
Other pseudo-operations
AREA: defines a code section or data section
AREA sectionname {, attr1} {, attr2}
ENTRY: program entry point
END: end of the source program
Other pseudo-operations (cont’d)
GET/INCLUDE
INCLUDE filename
EXPORT
EXPORT symbol {[WEAK]}
IMPORT
IMPORT symbol {[WEAK]}
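Pseudo-instructions
ADR
ADR{cond} register, expr
Loads a PC-relative or register-relative address into a register
The assembler replaces it with one instruction
ADRL
ADRL{cond} register, expr
ADRL can reach a larger address range than ADR
The assembler replaces it with two instructions
LDR
LDR{cond} register, =[expr | label_expr]
Loads a 32-bit constant or address into a register
NOP
No operation, e.g. MOV R0, R0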
Program format
Source files are organized in sections
Code sections and data sections
The AREA pseudo-operation
Example
Review
Computer architecture and ARM
architecture
Instruction set
Assembly language programming
Program structure
Statements