ARM Systems-on-chip - Electrical & Computer Engineering

CPE 626: Advanced VLSI Design
L01
Department of Electrical and Computer Engineering
University of Alabama in Huntsville

Outline
- Computer Engineering: Motivation, Present, Future
- Computer Engineering Methodology
- Power as a Design Constraint
- Stored-program Computer: MU0 Example
- Digital System Modeling: Motivation

Why Computer Engineering?
CHANGE! It is exciting. It has never been more exciting!
It impacts every aspect of human life.
[Photos: PC, 2002; PDA, 2002; ENIAC, 1946 (an early general-purpose electronic computer, though not a stored-program machine as originally built); Bionic, 2002]

Why Such Change?
- Continuous growth in performance, due to advances in technology (CMOS VLSI) and innovations in computer design (RISC, RAID, ILP)
- Lower cost, due to simpler development and higher volumes
- Together these have significantly enhanced the capability available to the computer user
- Example: today's PC, costing less than $1000, has more performance, main memory, and disk storage than a $1 million computer of the 1970s

Computer Engineering Methodology
[Figure: iterative design cycle. Evaluate existing systems for bottlenecks → simulate new designs and organizations → implement the next-generation system → evaluate again. Influences on the cycle: market, applications, technology trends, implementation complexity, benchmarks, workloads.]

Technology Trends

         Capacity          Speed/Latency
Logic    4x in 3 years     1.54x per year
DRAM     4x in 3-4 years   2x in 10 years
Disk     4x in 3-4 years   2x in 10 years

State of the art: Intel Pentium 4, 2.2 GHz, 0.13 microns, 42 million transistors

Reuters, Monday 11 June 2001: Intel engineers have designed and manufactured the world's smallest and fastest transistor, 0.02 microns in size. This will open the way for microprocessors of 1 billion transistors, running at 20 GHz by 2007.

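The rates in the table compound very differently over time. The sketch below, which assumes the rates hold exactly and steadily, converts each "Nx in Y years" figure into an equivalent per-year factor and compares logic speed growth (1.54x per year) with DRAM/disk speed growth (2x in 10 years). The helper names `annual_factor` and `speed_gap` are hypothetical.

```python
def annual_factor(total_factor: float, years: float) -> float:
    """Per-year growth factor equivalent to 'total_factor x in `years` years'."""
    return total_factor ** (1.0 / years)


def speed_gap(years: float) -> float:
    """Logic speed growth (1.54x/year) divided by DRAM/disk speed growth (2x/10 years)."""
    return 1.54 ** years / 2 ** (years / 10)


for label, total, years in [
    ("Logic capacity, 4x in 3 years", 4, 3),
    ("DRAM/disk capacity, 4x in 4 years", 4, 4),
    ("DRAM/disk speed, 2x in 10 years", 2, 10),
]:
    print(f"{label}: {annual_factor(total, years):.3f}x per year")

print(f"Logic vs. DRAM speed gap after 10 years: {speed_gap(10):.1f}x")
```

Under these assumed steady rates, logic speed pulls ahead of DRAM/disk speed by about 37x over a decade, even though capacities grow at similar rates.
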
Pentium III Die Photo
EBL/BBL - Bus logic (front-side and back-side)
MOB - Memory Order Buffer
Packed FPU - MMX Fl. Pt. (SSE)
IEU - Integer Execution Unit
FAU - Fl. Pt. Arithmetic Unit
MIU - Memory Interface Unit
DCU - Data Cache Unit
PMH - Page Miss Handler
DTLB - Data TLB
BAC - Branch Address Calculator
RAT - Register Alias Table
SIMD - Packed Fl. Pt.
RS - Reservation Station
BTB - Branch Target Buffer
IFU - Instruction Fetch Unit (+I$)
ID - Instruction Decode
ROB - Reorder Buffer
MS - Micro-instruction Sequencer
First Pentium III (Katmai): 9.5 M transistors, 12.3 × 10.4 mm die in 0.25-micron technology with 5 layers of aluminum

Pentium 4 Die Photo
- 42M transistors (PIII: 26M)
- 217 mm² (PIII: 106 mm²)
- L1 execution cache: buffers 12,000 micro-ops
- 8 KB data cache
- 256 KB L2 cache

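Dividing transistor count by die area gives a rough density comparison of the chips on the two die-photo slides. This is a minimal sketch; the helper name `density_per_mm2` is hypothetical, and the figures are copied from the slides.

```python
def density_per_mm2(transistors: float, area_mm2: float) -> float:
    """Average number of transistors per square millimetre of die."""
    return transistors / area_mm2


dies = {
    "Pentium III Katmai (9.5M, 12.3 x 10.4 mm)": (9.5e6, 12.3 * 10.4),
    "Pentium III (26M, 106 mm^2)": (26e6, 106.0),
    "Pentium 4 (42M, 217 mm^2)": (42e6, 217.0),
}
for name, (count, area) in dies.items():
    print(f"{name}: {density_per_mm2(count, area):,.0f} transistors/mm^2")
```

An average like this mixes dense cache arrays with sparser logic, so it only supports a coarse comparison between dies.
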
Future Applications
- Desktop: 90% of cycles will be spent on media applications
  - video encode/decode, polygon and image-based graphics
  - audio processing, compression, music, speech recognition/synthesis
  - modulation/demodulation at audio and video rates
- Scientific desktops: high-performance floating point and graphics
- Commercial servers: support for databases and transaction processing, enhancements for reliability, support for scalability
- Embedded computing: special support for graphics or video, power limitations

Future Directions
- Conditions
  - new workloads are characterised by more exploitable parallelism
  - dominant wire delays on a billion-transistor chip will force hardware to be more distributed
- Novel architectural techniques
  - Exploit parallelism
    o multiprocessor on chip
    o simultaneous multithreading
  - CPU-memory integration
    o memory tolerating techniques
    o flexible hierarchy to adapt to application
  - Reconfigurable computing
Goal: develop architectural techniques that exploit semiconductor technology and workload characteristics in order to maximize performance at low cost.

Power as a Design Constraint
Power becomes a critical issue:
- Portable and mobile platforms
  - battery-operated devices
- Desktops, server farms
  - Reliability?
  - Power consumption: IT consumes 10% of the electricity in the US
  - Power density: 30 W/cm² in the Alpha 21364 (3x that of a typical hot plate)

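Power as a Design Constraint (cont’d)
$$P = A C V^2 f + A V I_{short} f + V I_{leak}$$
- $A C V^2 f$: dynamic power consumption ($C$: switched capacitance, $f$: clock frequency)
- $A V I_{short} f$: power due to short-circuit current during transitions
- $V I_{leak}$: power due to leakage current
Ways to reduce power:
- $A$ (activity of gates): turn off unused parts, or use design techniques that minimize the number of transitions
- Reduce the supply voltage $V$; this also lowers the maximum clock frequency: $f_{max} \propto \frac{(V - V_t)^2}{V}$
- Reduce the threshold voltage $V_t$ to recover speed; this increases leakage exponentially: $I_{leak} \propto \exp\left(-\frac{q V_t}{kT}\right)$
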
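To see the trade-offs numerically, the sketch below evaluates the three relations above. The operating points (A = 0.1, C = 1 nF, f = 1 GHz, V lowered from 1.5 V to 1.2 V, V_t = 0.4 V, a 0.1 V threshold reduction, T = 300 K) are hypothetical values chosen only for illustration, and the two proportionalities are compared as ratios because the slide gives no constants. The function names are also hypothetical.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant k, J/K
Q_E = 1.602176634e-19   # elementary charge q, C


def total_power(a, c, v, f, i_short, i_leak):
    """P = A*C*V^2*f + A*V*I_short*f + V*I_leak, term by term."""
    dynamic = a * c * v ** 2 * f
    short_circuit = a * v * i_short * f
    leakage = v * i_leak
    return dynamic + short_circuit + leakage


def fmax_relative(v, vt):
    """A quantity proportional to f_max: (V - Vt)^2 / V."""
    return (v - vt) ** 2 / v


def leakage_relative(vt, temp_k=300.0):
    """A quantity proportional to I_leak: exp(-q*Vt / (k*T))."""
    return math.exp(-Q_E * vt / (K_B * temp_k))


# Hypothetical operating points, dynamic term only (I_short = I_leak = 0)
dynamic_ratio = total_power(0.1, 1e-9, 1.2, 1e9, 0, 0) / total_power(0.1, 1e-9, 1.5, 1e9, 0, 0)
speed_ratio = fmax_relative(1.2, 0.4) / fmax_relative(1.5, 0.4)
leak_ratio = leakage_relative(0.3) / leakage_relative(0.4)

print(f"dynamic power, 1.2 V vs 1.5 V (same f): {dynamic_ratio:.2f}")   # 0.64
print(f"f_max, 1.2 V vs 1.5 V (Vt = 0.4 V): {speed_ratio:.2f}")
print(f"leakage, Vt = 0.3 V vs 0.4 V: {leak_ratio:.0f}x")
```

With these assumed numbers, a 20% supply reduction cuts dynamic power to 64% but also drops f_max to roughly two-thirds. Recovering speed by lowering V_t by 0.1 V multiplies leakage by roughly 48x under this idealized exponent.
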
Recap: Computer Architecture
- Computer architecture describes the user's view of the computer: visible registers, data types, instruction set, instruction formats, memory management table structures, exception handling
- Computer organization describes the implementation of the architecture, which is invisible to the user: pipeline structure, caches, TLB, ...

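Stored-program computer
[Figure: a processor (with registers) connected to a memory that holds both instructions and data, at addresses 00..00₁₆ to FF..FF₁₆; the processor sends addresses to memory, fetches instructions, and reads and writes data]
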
Typical Hierarchy
- Transistors
- Logic gates, memory cells, special circuits
- Single-bit adders, MUXs, flip-flops, decoders, encoders
- Word-wide adders, MUXs, registers, decoders, buses
- ALUs, shifters, register files, memory blocks
- Processor, peripheral cells, cache memories, MMUs
- Integrated system chips
- PCBs
- Mobile phones, laptops, PCs, engine controllers
[Figure: transistor-level logic gate between the Vdd and Vss supply rails, with inputs A and B and output labelled A.B]

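The hierarchy can be made concrete by building each level only from the level below it. This minimal Python sketch uses hypothetical function names: a 2-input NAND gate (gate level), a single-bit full adder built only from NAND gates, and a word-wide ripple-carry adder built from full adders.

```python
def nand(a: int, b: int) -> int:
    """Gate level: 2-input NAND on single bits (0 or 1)."""
    return 1 - (a & b)


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Single-bit adder built only from NAND gates; returns (sum, carry_out)."""
    n1 = nand(a, b)
    half = nand(nand(a, n1), nand(b, n1))          # a XOR b
    n2 = nand(half, cin)
    total = nand(nand(half, n2), nand(cin, n2))    # a XOR b XOR cin
    carry = nand(n1, n2)                           # (a AND b) OR ((a XOR b) AND cin)
    return total, carry


def ripple_adder(x: int, y: int, width: int = 16) -> tuple[int, int]:
    """Word-wide adder: `width` single-bit adders chained through their carries."""
    carry, result = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry


print(ripple_adder(1234, 4321))    # (5555, 0)
print(ripple_adder(0xFFFF, 1))     # (0, 1): the sum wraps and the carry out is set
```

Each function calls only the one below it, mirroring how a word-wide adder is assembled from single-bit adders, which are in turn assembled from logic gates.
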
MU0 – A Simple Processor
Instruction format (16 bits):
  opcode (4 bits) | S (12 bits)

Instruction set:
Instruction  Opcode  Effect
LDA S        0000    ACC := mem16[S]
STO S        0001    mem16[S] := ACC
ADD S        0010    ACC := ACC + mem16[S]
SUB S        0011    ACC := ACC - mem16[S]
JMP S        0100    PC := S
JGE S        0101    if ACC >= 0 PC := S
JNE S        0110    if ACC != 0 PC := S
STP          0111    stop

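Because instructions and data share one memory, an MU0 instruction is just a 16-bit word. This minimal sketch packs and unpacks that word, assuming the opcode occupies bits 15..12 and S occupies bits 11..0, in the order shown in the format. The helpers `encode` and `decode` are hypothetical names.

```python
# Opcodes from the instruction-set table; the 4-bit opcode sits above the 12-bit S field.
OPCODES = {
    "LDA": 0b0000, "STO": 0b0001, "ADD": 0b0010, "SUB": 0b0011,
    "JMP": 0b0100, "JGE": 0b0101, "JNE": 0b0110, "STP": 0b0111,
}
MNEMONICS = {code: name for name, code in OPCODES.items()}


def encode(mnemonic: str, s: int = 0) -> int:
    """Pack a mnemonic and a 12-bit address S into one 16-bit MU0 word."""
    if not 0 <= s <= 0xFFF:
        raise ValueError("S must fit in 12 bits")
    return (OPCODES[mnemonic] << 12) | s


def decode(word: int) -> tuple[str, int]:
    """Split a 16-bit word back into (mnemonic, S)."""
    opcode, s = (word >> 12) & 0xF, word & 0xFFF
    if opcode not in MNEMONICS:
        raise ValueError(f"opcode {opcode:04b} is not in the instruction-set table")
    return MNEMONICS[opcode], s


print(hex(encode("ADD", 0x123)))   # 0x2123
print(decode(0x5FFF))              # ('JGE', 4095)
```

Opcodes 1000 to 1111 do not appear in the table, so this sketch rejects them rather than guessing a meaning.
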
MU0 Datapath Example
- Program Counter – PC
- Accumulator – ACC
- Arithmetic-Logic Unit – ALU
- Instruction Register – IR
- Instruction Decode and Control Logic
Follow the principle that memory will be the limiting factor in the design: each instruction takes exactly as many clock cycles as the number of memory accesses it must make.
[Figure: MU0 datapath; PC, IR, ACC, ALU, and control logic connected to memory through the address bus and the data bus]

MU0 Datapath Design
- Assume that each instruction starts when it has arrived in the IR
- Step 1: EX (execute)
  - LDA S: ACC <- Mem[S]
  - STO S: Mem[S] <- ACC
  - ADD S: ACC <- ACC + Mem[S]
  - SUB S: ACC <- ACC - Mem[S]
  - JMP S: PC <- S
  - JGE S: if (ACC >= 0) PC <- S
  - JNE S: if (ACC != 0) PC <- S
- Step 2: IF (fetch the next instruction)
  - Either the PC or the address in the IR is issued to fetch the next instruction
  - The address is incremented in the ALU and the result is saved into the PC
- Initialization
  - A reset input starts executing instructions from a known address; here it is 000 (hex)
    o provide zero at the ALU output and then load it into the PC register

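The two steps above can be turned into a small instruction-level model. The sketch below is a hypothetical Python interpreter, not the RTL design. It assumes 16-bit two's-complement words, a 4096-word memory, and that reset begins with a fetch from address 000. It counts one clock cycle per memory access, following the datapath principle stated earlier, so LDA/STO/ADD/SUB take two cycles (operand access plus next fetch) and jumps take one.

```python
MASK16 = 0xFFFF  # ACC and memory words are 16 bits


def run_mu0(program: list[int], max_cycles: int = 10_000) -> tuple[int, int, list[int]]:
    """Instruction-level MU0 model. Returns (ACC, clock cycles, final memory) at STP."""
    mem = list(program) + [0] * (4096 - len(program))  # 12-bit address space
    acc = 0
    # Assumed reset behaviour: the first instruction is fetched from address 000.
    ir, pc, cycles = mem[0], 1, 1
    while cycles < max_cycles:
        opcode, s = ir >> 12, ir & 0xFFF
        if opcode == 0b0111:                      # STP: stop, no further accesses
            return acc, cycles, mem
        if opcode <= 0b0011:                      # Step 1 (EX): one data access
            if opcode == 0b0000:                  # LDA S
                acc = mem[s]
            elif opcode == 0b0001:                # STO S
                mem[s] = acc
            elif opcode == 0b0010:                # ADD S
                acc = (acc + mem[s]) & MASK16
            else:                                 # SUB S
                acc = (acc - mem[s]) & MASK16
            cycles += 1
            next_addr = pc
        elif opcode <= 0b0110:                    # jumps: no EX memory access
            taken = (opcode == 0b0100                                  # JMP
                     or (opcode == 0b0101 and (acc & 0x8000) == 0)     # JGE: sign bit clear
                     or (opcode == 0b0110 and acc != 0))               # JNE
            next_addr = s if taken else pc
        else:
            raise ValueError(f"undefined opcode {opcode:04b}")
        # Step 2 (IF): fetch from PC or from S, then save address + 1 in PC
        ir = mem[next_addr]
        pc = (next_addr + 1) & 0xFFF
        cycles += 1
    raise RuntimeError("no STP within max_cycles")


prog = [0] * 32
prog[0:5] = [0x0014, 0x3015, 0x1014, 0x6001, 0x7000]  # LDA 20; SUB 21; STO 20; JNE 1; STP
prog[20], prog[21] = 3, 1                              # counter n = 3, constant 1
acc, cycles, mem = run_mu0(prog)
print(acc, cycles, mem[20])                            # 0 18 0
```

The example loop counts a value down from 3 to 0. Every memory-reference instruction costs two cycles and every jump costs one, which is where the total of 18 cycles comes from.
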
MU0 RTL Organization
- Control logic signals
  - Asel (address select: PC or IR)
  - Bsel (ALU B-input select)
  - ACCce (ACC change enable)
  - PCce (PC change enable)
  - IRce (IR change enable)
  - ACCoe (ACC output enable)
  - ALUfs (ALU function select)
  - MEMrq (memory request)
  - RnW (read/write)
  - Ex/ft (execute/fetch)

MU0 control logic
(Inputs are left of |, outputs are right of |; x = don't care.)

Instruction  Opcode  Reset  Ex/ft  ACCz  ACC15 | Asel  Bsel  ACCce  PCce  IRce  ACCoe  ALUfs  MEMrq  RnW  Ex/ft
Reset        xxxx    1      x      x     x     | 0     0     1      1     1     0      =0     1      1    0
LDA S        0000    0      0      x     x     | 1     1     1      0     0     0      =B     1      1    1
             0000    0      1      x     x     | 0     0     0      1     1     0      B+1    1      1    0
STO S        0001    0      0      x     x     | 1     x     0      0     0     1      x      1      0    1
             0001    0      1      x     x     | 0     0     0      1     1     0      B+1    1      1    0
ADD S        0010    0      0      x     x     | 1     1     1      0     0     0      A+B    1      1    1
             0010    0      1      x     x     | 0     0     0      1     1     0      B+1    1      1    0
SUB S        0011    0      0      x     x     | 1     1     1      0     0     0      A-B    1      1    1
             0011    0      1      x     x     | 0     0     0      1     1     0      B+1    1      1    0
JMP S        0100    0      x      x     x     | 1     0     0      1     1     0      B+1    1      1    0
JGE S        0101    0      x      x     0     | 1     0     0      1     1     0      B+1    1      1    0
             0101    0      x      x     1     | 0     0     0      1     1     0      B+1    1      1    0
JNE S        0110    0      x      0     x     | 1     0     0      1     1     0      B+1    1      1    0
             0110    0      x      1     x     | 0     0     0      1     1     0      B+1    1      1    0
STOP         0111    0      x      x     x     | 1     x     0      0     0     0      x      0      1    0

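The control table is a pure lookup from inputs to outputs, so it can be checked in software before it is turned into gates. This minimal Python sketch encodes the rows of the table. The function and constant names are hypothetical, `None` stands for a don't-care (x), and the output `Ex/ft` is read here as the value Ex/ft takes in the next cycle, which is our interpretation of the table.

```python
SIGNALS = ("Asel", "Bsel", "ACCce", "PCce", "IRce", "ACCoe", "ALUfs", "MEMrq", "RnW", "Ex/ft")

# Output rows copied from the table; None marks a don't-care (x).
RESET_ROW = (0, 0, 1, 1, 1, 0, "=0", 1, 1, 0)
FETCH_FROM_PC = (0, 0, 0, 1, 1, 0, "B+1", 1, 1, 0)  # address from PC, PC := PC + 1
FETCH_FROM_S = (1, 0, 0, 1, 1, 0, "B+1", 1, 1, 0)   # address from IR (S), PC := S + 1
EXECUTE_ROWS = {
    0b0000: (1, 1, 1, 0, 0, 0, "=B", 1, 1, 1),        # LDA S
    0b0001: (1, None, 0, 0, 0, 1, None, 1, 0, 1),     # STO S
    0b0010: (1, 1, 1, 0, 0, 0, "A+B", 1, 1, 1),       # ADD S
    0b0011: (1, 1, 1, 0, 0, 0, "A-B", 1, 1, 1),       # SUB S
}
STOP_ROW = (1, None, 0, 0, 0, 0, None, 0, 1, 0)


def mu0_control(reset: int, opcode: int, ex_ft: int, acc_z: int, acc_15: int) -> dict:
    """Return the control outputs for one cycle, following the table row by row."""
    if reset:
        row = RESET_ROW
    elif opcode in EXECUTE_ROWS:                 # two-cycle instructions
        row = EXECUTE_ROWS[opcode] if ex_ft == 0 else FETCH_FROM_PC
    elif opcode == 0b0100:                       # JMP: always fetch from S
        row = FETCH_FROM_S
    elif opcode == 0b0101:                       # JGE: taken when ACC15 == 0 (ACC >= 0)
        row = FETCH_FROM_S if acc_15 == 0 else FETCH_FROM_PC
    elif opcode == 0b0110:                       # JNE: taken when ACCz == 0 (ACC != 0)
        row = FETCH_FROM_S if acc_z == 0 else FETCH_FROM_PC
    elif opcode == 0b0111:                       # STOP: no memory request, state frozen
        row = STOP_ROW
    else:
        raise ValueError(f"opcode {opcode:04b} is not in the table")
    return dict(zip(SIGNALS, row))


print(mu0_control(0, 0b0101, 0, 0, 1)["Asel"])   # 0: JGE not taken, fetch from PC
```

Grouping the repeated rows into two shared fetch rows makes the regularity of the table visible. Every second cycle and every jump is a fetch, differing only in whether the address comes from the PC or from S.
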
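MU0 ALU Design
- ALU functions: A+B, A-B, B, B+1, and 0 (used only when reset is active), so 4 functions must be selectable in normal operation
- Aen (enable operand A)
- Binv (invert operand B)
- Cin (carry in)
[Figure: one ALU bit slice; operand A gated by Aen, operand B optionally inverted by Binv, and a full adder with carry-in Cin producing sum and carry-out Cout; reset forces the output to zero]
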
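The four functions can all come from one adder by setting Aen, Binv, and Cin. This minimal bit-slice sketch follows the figure: each slice gates A with Aen, XORs B with Binv, and passes its carry-out to the next slice. The settings table `ALU_SETTINGS` is one consistent assignment derived here, not taken from the slide, and the names are hypothetical.

```python
# One consistent choice of (Aen, Binv, Cin) for each function selected by ALUfs.
ALU_SETTINGS = {
    "A+B": (1, 0, 0),
    "A-B": (1, 1, 1),   # A + (NOT B) + 1 is two's-complement subtraction
    "B":   (0, 0, 0),
    "B+1": (0, 0, 1),
}


def mu0_alu(a: int, b: int, aen: int, binv: int, cin: int, reset: int = 0, width: int = 16) -> int:
    """Bit-slice model of the ALU: per slice, A is gated by Aen and B is XORed with Binv."""
    if reset:                                   # reset forces a zero output
        return 0
    carry, result = cin, 0
    for i in range(width):
        ai = ((a >> i) & 1) & aen               # Aen = 0 presents 0 instead of A
        bi = ((b >> i) & 1) ^ binv              # Binv = 1 inverts B
        result |= (ai ^ bi ^ carry) << i        # sum output of this slice
        carry = (ai & bi) | (ai & carry) | (bi & carry)   # Cout feeds the next slice
    return result                               # the final Cout is not used here


print(hex(mu0_alu(5, 3, *ALU_SETTINGS["A-B"])))        # 0x2
print(hex(mu0_alu(0, 0x0FFF, *ALU_SETTINGS["B+1"])))   # 0x1000
```

Using Cin for both the "+1" of B+1 and the "+1" of two's-complement subtraction is what lets a single adder cover all four functions.
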
Digital System Modeling: Motivation
- Requirements specification
- Functional specification
- Testing and verification of the design
- Formal verification of the correctness of the design
- Automatic synthesis

Gajski and Kuhn’s Y Chart

Level             Behavioral          Structural         Physical/Geometry
Architectural     Systems             Processor          Physical partitions
Algorithmic       Algorithms          Hardware modules   Clusters
Functional block  Register transfer   ALUs, registers    Floor plans
Logic             Logic               Gates, FFs         Cell, module plans
Circuit           Transfer functions  Transistors        Rectangles

Domains:
- Functional (behavioral) – operations performed by the system
- Structural – how the system is composed
- Geometry – how the system is laid out in physical space