Lecture 3 - Institute of Information Science, Academia Sinica

Embedded Processor Architecture and Programming
王建民
Institute of Information Science, Academia Sinica
July 2008
Contents
Introduction
 Computer Architecture
 ARM Architecture
 Development Tools
 GNU Development Tools
 ARM Instruction Set
 ARM Assembly Language
 ARM Assembly Programming
 GNU ARM ToolChain
 Interrupts and Monitor

Lecture 3
ARM Architecture
Outline



Overview
ARM Architecture
ARM Processor Core
Introduction to ARM
Advanced RISC Machines
 Founded in November 1990
  Spun out of Acorn Computers
 Designs the ARM range of RISC processor cores
 Licenses ARM core designs to semiconductor partners who fabricate and sell to their customers
  ARM does not fabricate silicon itself
 Also develops technologies to assist with the design-in of the ARM architecture
  Software tools, boards, debug hardware, application software, bus architectures, peripherals, etc.
Why ARM here?
ARM is the most licensed, and thus the most widespread, family of processor cores in the world.
Used especially in portable devices due to low power consumption and reasonable performance (MIPS/watt)
Several interesting extensions are available or in development, such as the Thumb instruction set and the Jazelle Java machine
ARM Partnership Model
ARM Powered Products
History of the ARM Architecture
Architecture versions, the features each introduced, and example cores:
 1, 2, 3: Early ARM architectures
 4: Halfword and signed halfword / byte support; System mode (SA-110, SA-1110)
 4T: Thumb instruction set (ARM7TDMI, ARM720T, ARM9TDMI, ARM940T)
 5TE: Improved ARM/Thumb interworking; CLZ; saturated maths; DSP multiply-accumulate instructions (ARM1020E, XScale, ARM9E-S, ARM966E-S)
 5TEJ: Jazelle; Java bytecode execution (ARM7EJ-S, ARM926EJ-S, ARM9EJ-S, ARM1026EJ-S)
 6: SIMD instructions; multi-processing; V6 memory architecture (VMSA); unaligned data support (ARM1136EJ-S)
Example ARM-based System
[Figure: block diagram of an example ARM-based system. Labels: ARM Core, 16 bit RAM, 32 bit RAM, 8 bit ROM, I/O, Peripherals, Interrupt Controller, nIRQ, nFIQ]
AMBA
[Figure: block diagram of an AMBA-based system. Labels: ARM, Arbiter, Reset, TIC, External Bus Interface, External ROM, External RAM, Decoder, On-chip RAM, Bridge, Bus Interface, Timer, Interrupt Controller, Remap/Pause. System Bus: AHB or ASB. Peripheral Bus: APB]
AMBA
 Advanced Microcontroller Bus Architecture
ADK
 Complete AMBA Design Kit
ACT
 AMBA Compliance Testbench
PrimeCell
 AMBA compliant peripherals
The RealView Product Families
Compilation Tools
 ARM Developer Suite (ADS): Compilers (C/C++ ARM & Thumb), Linker & Utilities
 RealView Compilation Tools (RVCT)
Debug Tools
 AXD (part of ADS)
 Trace Debug Tools
 Multi-ICE
 Multi-Trace
 RealView Debugger (RVD)
 RealView ICE (RVI)
 RealView Trace (RVT)
Platforms
 ARMulator (part of ADS)
 Integrator™ Family
 RealView ARMulator ISS (RVISS)
ARM Debug Architecture
[Figure: ARM debug architecture. Labels: Debugger (+ optional trace tools), Ethernet, JTAG port, Trace Port, TAP controller, EmbeddedICE Logic, ETM, ARM core]
EmbeddedICE Logic
 Provides breakpoints and processor/system access
JTAG interface (ICE)
 Converts debugger commands to JTAG signals
Embedded Trace Macrocell (ETM)
 Compresses real-time instruction and data access trace
 Contains ICE features (trigger & filter logic)
Trace port analyzer (TPA)
 Captures trace in a deep buffer
ARM Architecture
32-bit RISC-processor core
 Fixed length 32-bit instructions
 3-address instruction format
 Load/store architecture
 Pipelined execution (ARM7: 3 stages)
Cache (depending on the implementation)
Bus structure
 Von Neumann-type bus structure (ARM7)
 Harvard-type bus structure (ARM9)
Coprocessor support
Simple structure, hence a reasonably good speed/power consumption ratio
ARM Features
Operating states
 ARM: 32-bit ARM instruction set
 Thumb: 16-bit Thumb instruction set
 Jazelle cores can also execute Java bytecode
Memory formats
 Little-endian
 Big-endian
6 data types
7 operating modes
37 32-bit integer registers
Exception support
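The two memory formats differ only in the order in which the bytes of a word are stored. As a minimal sketch using Python's standard `struct` module (the helper name `word_bytes` and the value 0x12345678 are illustrative assumptions, not part of the lecture), the same 32-bit word is laid out in memory as follows:

```python
import struct

def word_bytes(value, little_endian=True):
    """Return the 4 bytes of a 32-bit word in memory order (lowest address first)."""
    fmt = "<I" if little_endian else ">I"
    return struct.pack(fmt, value & 0xFFFFFFFF)

word = 0x12345678  # arbitrary example value
print(word_bytes(word, little_endian=True).hex())   # 78563412
print(word_bytes(word, little_endian=False).hex())  # 12345678
```

In little-endian format the least significant byte sits at the lowest address; in big-endian format the most significant byte does.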
Data Types
The ARM is a 32-bit architecture.
When used in relation to the ARM:
 Byte means 8 bits
 Halfword means 16 bits (two bytes), aligned on a 2-byte boundary
 Word means 32 bits (four bytes), aligned on a 4-byte boundary
Both signed and unsigned data types are supported.
ARM coprocessors support floating point values.
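The alignment rules above amount to requiring that an address be a multiple of the data size. A small sketch of that check (the names `SIZES` and `is_aligned` are hypothetical, chosen only for this illustration):

```python
# Sizes in bytes of the three data sizes named on the slide.
SIZES = {"byte": 1, "halfword": 2, "word": 4}

def is_aligned(address, data_type):
    """True if the address is a multiple of the data type's size."""
    return address % SIZES[data_type] == 0

print(is_aligned(0x1002, "halfword"))  # True
print(is_aligned(0x1002, "word"))      # False
```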
Processor Modes
The ARM has seven basic operating modes:
 User: unprivileged mode under which most tasks run
 FIQ: entered when a high priority (fast) interrupt is raised
 IRQ: entered when a low priority (normal) interrupt is raised
 Supervisor: entered on reset and when a Software Interrupt instruction is executed
 Abort: used to handle memory access violations
 Undef: used to handle undefined instructions
 System: privileged mode using the same registers as user mode
  Not in ARM Architectures 1, 2 or 3
Privileged Modes
Most programs operate in User mode.
Modes other than User mode are collectively known as privileged modes.
Privileged modes are used to service interrupts or exceptions, or to access protected resources.
Privileged modes have more rights to access memory systems and coprocessors.
Registers
ARM has 37 registers, all of which are 32 bits long.
 1 dedicated program counter
 1 dedicated current program status register
 5 dedicated saved program status registers
 30 general purpose registers
The current processor mode governs which of several banks is accessible. Each mode can access
 a particular set of r0-r12 registers
 a particular r13 (the stack pointer, sp) and r14 (the link register, lr)
 the program counter, r15 (pc)
 the current program status register, cpsr
Privileged modes (except System) can also access
 a particular spsr (saved program status register)
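The total of 37 can be checked by enumerating the banks. The sketch below assumes the banking shown elsewhere in this lecture (FIQ banks r8-r14; IRQ, Supervisor, Abort and Undef each bank r13 and r14); the function name and the `_fiq`-style suffixes are labels chosen for this illustration only.

```python
def register_inventory():
    """List every physical register, grouped as on the slide."""
    general = [f"r{i}" for i in range(15)]             # r0-r14, User/System set
    general += [f"r{i}_fiq" for i in range(8, 15)]     # FIQ banks r8-r14
    for mode in ("irq", "svc", "abt", "und"):          # these bank r13 and r14 only
        general += [f"r13_{mode}", f"r14_{mode}"]
    spsrs = [f"spsr_{m}" for m in ("fiq", "irq", "svc", "abt", "und")]
    return {"general": general, "pc": ["r15"], "cpsr": ["cpsr"], "spsr": spsrs}

inventory = register_inventory()
print(len(inventory["general"]))                  # 30
print(sum(len(v) for v in inventory.values()))    # 37
```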
ARM Register Set
Current visible registers (Abort mode): r0-r12, r13 (sp), r14 (lr), r15 (pc), cpsr, spsr
Banked out registers:
 User: r13 (sp), r14 (lr)
 FIQ: r8-r12, r13 (sp), r14 (lr), spsr
 IRQ: r13 (sp), r14 (lr), spsr
 SVC: r13 (sp), r14 (lr), spsr
 Undef: r13 (sp), r14 (lr), spsr
Register Organization Summary
 User: r0-r12, r13 (sp), r14 (lr), r15 (pc), cpsr
 FIQ: User mode r0-r7, r15, and cpsr; banked r8-r12, r13 (sp), r14 (lr), spsr
 IRQ: User mode r0-r12, r15, and cpsr; banked r13 (sp), r14 (lr), spsr
 SVC: User mode r0-r12, r15, and cpsr; banked r13 (sp), r14 (lr), spsr
 Undef: User mode r0-r12, r15, and cpsr; banked r13 (sp), r14 (lr), spsr
 Abort: User mode r0-r12, r15, and cpsr; banked r13 (sp), r14 (lr), spsr
Thumb state low registers: r0-r7
Thumb state high registers: r8-r15
Note: System mode uses the User mode register set
Example: User to FIQ Mode
Registers in use in User mode: r0-r7, r8-r12, r13 (sp), r14 (lr), r15 (pc), cpsr
Registers in use in FIQ mode: r0-r7, FIQ-mode r8-r12, FIQ-mode r13 and r14, r15 (pc), cpsr, FIQ-mode spsr
On the exception:
 Return address calculated from User mode PC value and stored in FIQ mode LR
 User mode CPSR copied to FIQ mode SPSR
Access Registers using Instructions
No breakdown of currently accessible registers.
 All instructions can access r0-r14 directly.
 Most instructions also allow use of the PC.
Specific instructions allow access to CPSR and SPSR.
When in a privileged mode, it is also possible to load / store the (banked out) user mode registers to or from memory.
 See later for details.
Program Status Registers (1)
The program status registers contain:
 Condition code flags: hold information about the most recently performed ALU operation.
 Interrupt disable bits: control the enabling and disabling of interrupts.
 T-bit: reflects the operating state.
 Mode bits: set the processor operating mode.
 Reserved bits: unused.
To maintain compatibility with future ARM processors, you must not alter any of the reserved bits.
Program Status Registers (2)
Bit layout:
 Bit 31: N
 Bit 30: Z
 Bit 29: C
 Bit 28: V
 Bit 27: Q
 Bit 24: J
 Bits 23-8: undefined
 Bit 7: I
 Bit 6: F
 Bit 5: T
 Bits 4-0: mode
Byte fields: f = bits 31-24, s = bits 23-16, x = bits 15-8, c = bits 7-0
Condition code flags
 N = Negative result from ALU
 Z = Zero result from ALU
 C = ALU operation Carried out
 V = ALU operation oVerflowed
Sticky Overflow flag - Q flag
 Architecture 5TEJ only
 Indicates if saturation has occurred
J bit
 Architecture 5TEJ only
 J = 1: Processor in Jazelle state
Interrupt Disable bits
 I = 1: Disables the IRQ.
 F = 1: Disables the FIQ.
T Bit
 Architecture xT only
 T = 0: Processor in ARM state
 T = 1: Processor in Thumb state
Mode bits
 Specify the processor mode
Condition Flags
Negative (N=‘1’)
 Logical instruction: No meaning
 Arithmetic instruction: Bit 31 of the result has been set. Indicates a negative number in signed operations
Zero (Z=‘1’)
 Logical instruction: Result is all zeroes
 Arithmetic instruction: Result of operation was zero
Carry (C=‘1’)
 Logical instruction: After Shift operation, ‘1’ was left in carry flag
 Arithmetic instruction: Result was greater than 32 bits
oVerflow (V=‘1’)
 Logical instruction: No meaning
 Arithmetic instruction: Result was greater than 31 bits. Indicates a possible corruption of the sign bit in signed numbers
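To make the arithmetic column concrete, the sketch below models a flag-setting 32-bit addition in Python. It is an illustration of the flag definitions only, not of any particular core's ALU; the function name `adds` is borrowed from the ARM mnemonic, and the rest is an assumption of this example.

```python
MASK32 = 0xFFFFFFFF

def adds(a, b):
    """Add two 32-bit values; return (result, flags) as a flag-setting ADD would."""
    a &= MASK32
    b &= MASK32
    full = a + b                      # unbounded Python integer sum
    result = full & MASK32            # what fits in a 32-bit register
    n = result >> 31                  # bit 31 of the result
    z = int(result == 0)              # result is all zeroes
    c = int(full > MASK32)            # result needed more than 32 bits
    # Signed overflow: both operands share a sign that the result does not have.
    v = ((~(a ^ b)) & (a ^ result)) >> 31 & 1
    return result, {"N": n, "Z": z, "C": c, "V": v}

print(adds(0x7FFFFFFF, 1))  # (2147483648, {'N': 1, 'Z': 0, 'C': 0, 'V': 1})
print(adds(0xFFFFFFFF, 1))  # (0, {'N': 0, 'Z': 1, 'C': 1, 'V': 0})
```

The first call overflows the signed range (the sign bit is corrupted, so V is set) without producing a carry; the second produces a carry out of bit 31 but is a correct signed result (-1 + 1 = 0).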
Mode Bits
M[4:0]  Processor Mode
10000   User
10001   FIQ
10010   IRQ
10011   Supervisor
10111   Abort
11011   Undefined
11111   System
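Putting the bit positions and the mode encodings together, a program status register value can be decoded with a few shifts and masks. This is a minimal sketch; the function name `decode_cpsr` and the example value 0x600000D3 are assumptions made for illustration.

```python
# Mode encodings from the Mode Bits table.
MODES = {
    0b10000: "User", 0b10001: "FIQ", 0b10010: "IRQ", 0b10011: "Supervisor",
    0b10111: "Abort", 0b11011: "Undefined", 0b11111: "System",
}

def decode_cpsr(cpsr):
    """Split a 32-bit program status register value into its named fields."""
    def bit(n):
        return (cpsr >> n) & 1
    return {
        "N": bit(31), "Z": bit(30), "C": bit(29), "V": bit(28),
        "Q": bit(27), "J": bit(24),
        "I": bit(7), "F": bit(6), "T": bit(5),
        "mode": MODES.get(cpsr & 0x1F, "invalid"),
    }

fields = decode_cpsr(0x600000D3)  # arbitrary example value
print(fields["Z"], fields["C"], fields["I"], fields["F"], fields["T"])  # 1 1 1 1 0
print(fields["mode"])  # Supervisor
```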
Program Counter (r15)
When the processor is executing in ARM state:
 All instructions are 32 bits wide.
 All instructions must be word aligned.
 The pc value is stored in bits [31:2], with bits [1:0] undefined.
When the processor is executing in Thumb state:
 All instructions are 16 bits wide.
 All instructions must be halfword aligned.
 The pc value is stored in bits [31:1], with bit [0] undefined.
When the processor is executing in Jazelle state:
 All instructions are 8 bits wide.
 The processor performs a word access to read 4 instructions at once.
Link Register (r14)
r14 is used as the subroutine link register (LR). It stores the return address, calculated from the PC, when Branch with Link operations are performed.
Thus, to return from a linked branch:
 MOV r15, r14
or
 MOV pc, lr
Exception Handling (1)
Exceptions arise whenever the normal flow of a program has to be halted temporarily.
When an exception occurs, the ARM:
 Stores the return address in LR_<mode>
 Copies CPSR into SPSR_<mode>
 Sets appropriate CPSR bits
  Change to ARM state
  Change to exception mode
  Disable interrupts (if appropriate)
 Sets PC to fetch the next instruction from the relevant vector address
The Vector Table
Address  Exception
0x00     Reset
0x04     Undefined Instruction
0x08     Software Interrupt
0x0C     Prefetch Abort
0x10     Data Abort
0x14     (Reserved)
0x18     IRQ
0x1C     FIQ
The vector table can be at 0xFFFF0000 on ARM720T and on ARM9/10 family devices
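The exception entry steps and the vector table can be combined into a small state model. The sketch below is a simplification under stated assumptions: the function and table names are hypothetical, the entry modes follow the Processor Modes slide, and the interrupt-disable rule (IRQ disabled on every exception, FIQ disabled only on Reset and FIQ) comes from the ARM architecture rather than from these slides. For Reset the saved `lr` and `spsr` values carry no meaning.

```python
VECTORS = {
    "Reset": 0x00, "Undefined Instruction": 0x04, "Software Interrupt": 0x08,
    "Prefetch Abort": 0x0C, "Data Abort": 0x10, "IRQ": 0x18, "FIQ": 0x1C,
}
ENTRY_MODE = {  # M[4:0] value of the mode each exception enters
    "Reset": 0b10011, "Software Interrupt": 0b10011,
    "Undefined Instruction": 0b11011,
    "Prefetch Abort": 0b10111, "Data Abort": 0b10111,
    "IRQ": 0b10010, "FIQ": 0b10001,
}

def enter_exception(kind, cpsr, return_address, high_vectors=False):
    """Return the register values the core holds after taking an exception."""
    new_cpsr = (cpsr & ~0x3F) | ENTRY_MODE[kind]  # clear T (ARM state), set mode
    new_cpsr |= 0x80                              # I = 1: disable IRQ
    if kind in ("Reset", "FIQ"):
        new_cpsr |= 0x40                          # F = 1: disable FIQ
    base = 0xFFFF0000 if high_vectors else 0x00000000
    return {
        "lr": return_address,   # LR_<mode>
        "spsr": cpsr,           # SPSR_<mode> receives the old CPSR
        "cpsr": new_cpsr,
        "pc": base + VECTORS[kind],
    }

state = enter_exception("IRQ", cpsr=0x60000010, return_address=0x8004)
print(hex(state["cpsr"]), hex(state["pc"]))  # 0x60000092 0x18
```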
Exception Handling (2)
Exceptions are always entered in ARM state.
After the exception has been processed, control normally flows back to the original task.
To return, the exception handler needs to:
 Clear the disable interrupt flags that were set on entry
 Restore CPSR from SPSR_<mode>
 Restore PC from LR_<mode>
The last two steps must happen atomically as part of a single instruction.
Exception Handling (3)
Exception  Return instruction
BL         MOV PC, R14
SWI        MOVS PC, R14_svc
UDEF       MOVS PC, R14_und
PABT       SUBS PC, R14_abt, #4
FIQ        SUBS PC, R14_fiq, #4
IRQ        SUBS PC, R14_irq, #4
DABT       SUBS PC, R14_abt, #8
RESET      Not applicable
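The only thing that varies between the return instructions in the table is the constant subtracted from the banked link register. As a rough sketch of that arithmetic (the names `RETURN_OFFSET` and `resume_address` are hypothetical, and the CPSR restore performed by the S-suffixed instructions is not modelled):

```python
# Constant subtracted from LR_<mode> on return, taken from the table above.
RETURN_OFFSET = {"SWI": 0, "UDEF": 0, "PABT": 4, "FIQ": 4, "IRQ": 4, "DABT": 8}

def resume_address(exception, lr):
    """Address loaded into PC by the return instruction for this exception."""
    return (lr - RETURN_OFFSET[exception]) & 0xFFFFFFFF

print(hex(resume_address("IRQ", 0x8008)))   # 0x8004
print(hex(resume_address("DABT", 0x8008)))  # 0x8000
```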
Quiz #1
What registers are used to store the program counter and link register?
What is r13 often used to store?
Which mode or modes have the fewest registers available? How many, and why?
ARM7TDMI Organization
Register Bank
 2 read ports and 1 write port
 In addition, 1 read port and 1 write port for PC
Barrel Shifter
ALU
Address Register and Incrementer
Data Register
Instruction Decoder and Control Logic
Pipelined Execution
Cycle              1        2        3        4        5        6        7
PC                 200      204      208      20C      210      214      218
Address  Instruction
200      ADD       Fetch    Decode   Execute
204      SUB                Fetch    Decode   Execute
208      MOV                         Fetch    Decode   Execute
20C      AND                                  Fetch    Decode   Execute
210      ORR                                           Fetch    Decode   Execute
When cycle = 3, PC = 208
 ADD instruction (addr=200=PC-8) in the execute stage
 SUB instruction (addr=204=PC-4) in the decode stage
 MOV instruction (addr=208=PC) in the fetch stage
3-Stage Pipeline
Three instructions are in progress simultaneously, each at a different stage
For data processing instructions
 Latency = 3 cycles
 Throughput = 1 instruction / cycle
When accessing PC, PC = address of the instruction being executed + 8
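The overlap of the three stages, and the resulting offset of 8 between the PC and the executing instruction, can be reproduced with a short scheduling sketch. It assumes an ideal pipeline of single-cycle instructions in ARM state (4 bytes per instruction); the function name `schedule` and the base address 0x200 are choices made for this example.

```python
STAGES = ("Fetch", "Decode", "Execute")

def schedule(program, base=0x200):
    """Map each cycle to {stage: (address, mnemonic)} for an ideal 3-stage pipeline."""
    timeline = {}
    for i, mnemonic in enumerate(program):
        for s, stage in enumerate(STAGES):
            cycle = i + s + 1                     # instruction i is fetched in cycle i + 1
            timeline.setdefault(cycle, {})[stage] = (base + 4 * i, mnemonic)
    return timeline

timeline = schedule(["ADD", "SUB", "MOV", "AND", "ORR"])
cycle3 = timeline[3]
print(cycle3["Execute"])  # (512, 'ADD'), i.e. address 0x200
print(cycle3["Fetch"])    # (520, 'MOV'), i.e. address 0x208
print(cycle3["Fetch"][0] - cycle3["Execute"][0])  # 8
```

In every cycle where all three stages are occupied, the address being fetched (the PC) is 8 bytes ahead of the instruction being executed.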
Instruction Fetch and Decode
Data Processing Instructions
Operations
 Arithmetic operations: ADD, SUB, …
 Logic operations: AND, ORR, …
 Register operations: MOV, CMP, …
Operands
 Register-Register
 Register-Immediate
All operations can be executed in a single clock cycle.
Register-Register Operation
Register-Immediate Operation
Multi-Cycle Instructions
Data Transfer Instructions: LDR and STR
 1st cycle: Compute a memory address, similar to a data processing instruction.
 2nd cycle: Load data from memory into the read data register, or store data to memory.
 3rd cycle: Transfer data from the read data register to the Register Bank (LDR only).
Branch Instructions: BL
 1st cycle: similar to address calculation.
 2nd cycle: saves the return address.
 3rd cycle: adjusts the value in the link register.
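The cycle counts above give a simple way to estimate how long the execute stage is busy for a sequence of instructions. The sketch below is a deliberately rough model: it assumes LDR takes 3 execute cycles, STR 2, BL 3, and that every other mnemonic is a single-cycle data processing instruction; it ignores memory wait states and the instructions discarded after a branch. The names `EXECUTE_CYCLES`, `execute_cycles` and `total_cycles` are hypothetical.

```python
# Execute-stage cycles for the multi-cycle instructions described above.
EXECUTE_CYCLES = {"LDR": 3, "STR": 2, "BL": 3}
PIPELINE_FILL = 2  # fetch and decode cycles before the first instruction executes

def execute_cycles(program):
    """Total cycles the data path is occupied by the given mnemonics."""
    return sum(EXECUTE_CYCLES.get(op, 1) for op in program)  # default: 1 cycle

def total_cycles(program):
    """Cycles from the first fetch until the last instruction finishes executing."""
    return PIPELINE_FILL + execute_cycles(program)

print(execute_cycles(["ADD", "STR", "AND", "MOV", "CMP"]))  # 6
print(total_cycles(["ADD", "STR", "AND", "MOV", "CMP"]))    # 8
```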
Address Calculation
Store Data and Auto-Indexing
Pipelining for STR
Cycles 1 to 8; stage sequence for each operation:
 ADD: Fetch, Decode, Execute
 STR: Fetch, Decode, Addr. calc., Data xfer
 AND: Fetch, Decode, Execute
 MOV: Fetch, Decode, Execute
 CMP: Fetch, Decode, Execute
Memory access once in every cycle
Data path used once in every cycle
Decoder generates control signals for the data path in the next cycle(s)
2nd Cycle of Load Data
3rd Cycle of Load Data
2nd Cycle of Branch
3rd Cycle of Branch
Pipelining for BL
Cycles 1 to 8; stage sequence for each operation:
 ADD: Fetch, Decode, Execute
 BL: Fetch, Decode, Target calc., Link return, Adjust
 ?: Fetch, Decode
 ??: Fetch
 AND: Fetch, Decode, Execute
 MOV: Fetch, Decode, Execute