EECC550 - Shaaban | studyslide.com

Computer Organization EECC 550
Weeks 1-3:
• Introduction: Modern Computer Design Levels, Components, Technology Trends, Register Transfer Notation (RTN). [Chapters 1, 2]
• Instruction Set Architecture (ISA) Characteristics and Classifications: CISC Vs. RISC. [Chapter 2]
• MIPS: An Example RISC ISA. Syntax, Instruction Formats, Addressing Modes, Encoding & Examples. [Chapter 2]
• Central Processor Unit (CPU) & Computer System Performance Measures. [Chapter 4]
• CPU Organization: Datapath & Control Unit Design. [Chapter 5]
Weeks 4-5:
– MIPS Single Cycle Datapath & Control Unit Design.
– MIPS Multicycle Datapath and Finite State Machine Control Unit Design.
– Microprogrammed Control Unit Design. [Chapter 5]
Week 6:
• Microprogramming Project
• Midterm Review and Midterm Exam
Weeks 7-8:
• CPU Pipelining. [Chapter 6]
• The Memory Hierarchy: Cache Design & Performance. [Chapter 7]
• The Memory Hierarchy: Main & Virtual Memory. [Chapter 7]
Week 9:
• Input/Output Organization & System Performance Evaluation. [Chapter 8]
Week 10:
• Computer Arithmetic & ALU Design. [Chapter 3] If time permits.
Week 11:
• Final Exam.
EECC550 - Shaaban
#1 Lec # 1 Winter 2005 11-29-2005
Computing System History/Trends + Instruction Set Architecture (ISA) Fundamentals
• Computing Element Choices:
– Computing Element Programmability
– Spatial vs. Temporal Computing
– Main Processor Types/Applications
• General Purpose Processor Generations
• The Von Neumann Computer Model
• CPU Organization (Design)
• Recent Trends in Computer Design/performance
• Hierarchy of Computer Architecture
• Hardware Description: Register Transfer Notation (RTN)
• Computer Architecture Vs. Computer Organization
• Instruction Set Architecture (ISA):
– Definition and purpose
– ISA Specification Requirements
– Main General Types of Instructions
– ISA Types and characteristics
– Typical ISA Addressing Modes
– Instruction Set Encoding
– Instruction Set Architecture Tradeoffs
– Complex Instruction Set Computer (CISC)
– Reduced Instruction Set Computer (RISC)
– Evolution of Instruction Set Architectures
(Chapters 1, 2)
Computing Element Choices
• General Purpose Processors (GPPs): Intended for general purpose computing (desktops, servers, clusters..)
• Application-Specific Processors (ASPs): Processors with ISAs and architectural features tailored towards specific application domains
– E.g. Digital Signal Processors (DSPs), Network Processors (NPs), Media Processors, Graphics Processing Units (GPUs), Vector Processors??? ...
• Co-Processors: A hardware (hardwired) implementation of specific algorithms with limited programming interface (augment GPPs or ASPs)
• Configurable Hardware:
– Field Programmable Gate Arrays (FPGAs)
– Configurable array of simple processing elements
• Application Specific Integrated Circuits (ASICs): A custom VLSI hardware solution for a specific computational task
• The choice of one or more depends on a number of factors including:
- Type and complexity of computational algorithm (general purpose vs. specialized)
- Desired level of flexibility/programmability
- Development cost/time
- Power requirements
- Performance requirements
- System cost
- Real-time constraints
The main goal of this course is the study of fundamental design techniques for General Purpose Processors
Computing Element Choices
[Figure: spectrum of computing elements. Programmability/flexibility is highest for General Purpose Processors (GPPs) and decreases through Application-Specific Processors (ASPs), Configurable Hardware, and Co-Processors to Application Specific Integrated Circuits (ASICs); specialization, development cost/time, and performance/chip area/watt (computational efficiency) increase in the same order.]
Processor: Programmable computing element that runs programs written using a pre-defined set of instructions
Selection Factors:
- Type and complexity of computational algorithms (general purpose vs. specialized)
- Desired level of flexibility
- Performance
- Development cost
- System cost
- Power requirements
- Real-time constraints
Computing Element Choices: Computing Element Programmability
Fixed Function:
• Computes one function (e.g. FP-multiply, divider, DCT)
• Function defined at fabrication time
• e.g. hardware (ASICs)
Programmable:
• Computes “any” computable function (e.g. Processors)
• Function defined after fabrication
Parameterizable Hardware:
• Performs limited “set” of functions
• e.g. Co-Processors
Processor = Programmable computing element that runs programs written using pre-defined instructions
Computing Element Choices: Spatial vs. Temporal Computing
• Spatial (using hardware)
• Temporal (using software/program running on a processor)
[Figure: a computation laid out in hardware (spatial) next to the same computation as a sequence of instructions executed by a processor (temporal).]
Processor = Programmable computing element that runs programs written using a pre-defined set of instructions
Main Processor Types/Applications
• General Purpose Processors (GPPs) - high performance.
– RISC or CISC: Intel P4, IBM Power4, SPARC, PowerPC, MIPS ...
– Used for general purpose software
– Heavy weight OS - Windows, UNIX
– Workstations, Desktops (PC’s), Clusters
• Embedded processors and processor cores
– e.g: Intel XScale, ARM, 486SX, Hitachi SH7000, NEC V800...
– Often require Digital signal processing (DSP) support or other application-specific support (e.g network, media processing)
– Single program
– Lightweight, often realtime OS or no OS
– Examples: Cellular phones, consumer electronics .. (e.g. CD players)
• Microcontrollers
– Extremely cost/power sensitive
– Single program
– Small word size - 8 bit common
– Highest volume processors by far
– Examples: Control systems, Automobiles, toasters, thermostats, ...
(Margin labels: embedded processors and microcontrollers are examples of Application-Specific Processors; cost/complexity increases toward GPPs, volume increases toward microcontrollers.)
The main goal of this course is the study of fundamental design techniques for General Purpose Processors
The Processor Design Space
[Figure: performance versus processor cost (chip area, power, complexity). Microcontrollers: cost is everything. Embedded processors: real-time constraints, specialized applications, low power/cost constraints. Microprocessors (GPPs): performance is everything & software rules. Application specific architectures for performance.]
General Purpose Processor/Computer System Generations
Classified according to implementation technology:
• The First Generation, 1946-59: Vacuum Tubes, Relays, Mercury Delay Lines:
– ENIAC (Electronic Numerical Integrator and Computer): First electronic computer, 18000 vacuum tubes, 1500 relays, 5000 additions/sec (completed 1945, unveiled 1946).
– First stored program computer: EDSAC (Electronic Delay Storage Automatic Calculator), 1949.
• The Second Generation, 1959-64: Discrete Transistors.
– e.g. IBM mainframes
• The Third Generation, 1964-75: Small and Medium-Scale Integrated (MSI) Circuits.
– e.g. mainframes (IBM 360), minicomputers (DEC PDP-8, PDP-11).
• The Fourth Generation, 1975-Present: The Microcomputer. VLSI-based Microprocessors (single-chip processor)
– First microprocessor: Intel’s 4-bit 4004 (2300 transistors), 1971.
– Personal Computers (PCs), laptops, PDAs, servers, clusters …
– Reduced Instruction Set Computer (RISC) 1984
Common factor among all generations:
All target the Von Neumann Computer Model or paradigm
The Von-Neumann Computer Model
• Partitioning of the programmable computing engine into components:
– Central Processing Unit (CPU): Control Unit (instruction decode, sequencing
of operations), Datapath (registers, arithmetic and logic unit, buses).
– Memory: Instruction (program) and operand (data) storage.
– Input/Output (I/O).
– The stored program concept: Instructions from an instruction set are
fetched from a common memory and executed one at a time.
[Figure: Computer System block diagram. CPU (Control; Datapath: registers, ALU, buses), Memory (instructions, data), and I/O Devices (Input, Output).]
Major CPU Performance Limitation: The Von Neumann computing model implies sequential execution one instruction at a time
Generic CPU Machine Instruction Processing Steps
(Implied by The Von Neumann Computer Model)
1. Instruction Fetch: Obtain instruction from program storage (memory)
2. Instruction Decode: Determine required actions and instruction size
3. Operand Fetch: Locate and obtain operand data
4. Execute: Compute result value or status
5. Result Store: Deposit results in storage for later use
6. Next Instruction: Determine successor or next instruction
Major CPU Performance Limitation: The Von Neumann computing model implies sequential execution one instruction at a time
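The six processing steps can be sketched as a software loop. This is only an illustration in Python, assuming a hypothetical toy accumulator machine: the opcodes (load, add, store, halt), the tuple instruction format, and the dictionary memory are inventions for this sketch, not part of the slides.

```python
def run(program, memory):
    """Execute a toy accumulator-machine program one instruction at a time."""
    pc = 0    # program counter: index of the next instruction
    acc = 0   # accumulator register
    while True:
        instruction = program[pc]            # 1. instruction fetch
        opcode, operand_addr = instruction   # 2. instruction decode
        next_pc = pc + 1                     # 6. default successor instruction
        if opcode == "load":
            acc = memory[operand_addr]       # 3. operand fetch, 5. result kept in acc
        elif opcode == "add":
            acc = acc + memory[operand_addr] # 3. operand fetch, 4. execute
        elif opcode == "store":
            memory[operand_addr] = acc       # 5. result store to memory
        elif opcode == "halt":
            return memory
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        pc = next_pc


# Hypothetical program computing A = B + C.
memory = {"A": 0, "B": 2, "C": 3}
program = [("load", "B"), ("add", "C"), ("store", "A"), ("halt", None)]
run(program, memory)
```

The loop makes the stated limitation visible: nothing about instruction i+1 starts until instruction i has finished.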
Hardware Components of Computer Systems
Five classic components of all computers:
1. Control Unit; 2. Datapath; 3. Memory; 4. Input; 5. Output
[Figure: Computer = Processor (active; Control Unit and Datapath) + Memory (passive; where programs and data live when running) + I/O Devices (Input: keyboard, mouse, etc.; Disk; Output: display, printer, etc.). Components 1-2 form the Processor; components 4-5 form I/O.]
CPU Organization
• Datapath Design:
– Capabilities & performance characteristics of principal Functional Units (FUs) (e.g., Registers, ALU, Shifters, Logic Units, ...)
– Ways in which these components are interconnected (bus connections, multiplexors, etc.).
– How information flows between components.
• Control Unit Design:
– Logic and means by which such information flow is controlled.
– Control and coordination of FUs operation to realize the targeted
Instruction Set Architecture to be implemented (can either be
implemented using a finite state machine or a microprogram).
• Hardware description with a suitable language, possibly
using Register Transfer Notation (RTN).
[Figure: A typical microprocessor layout: the Intel Pentium Classic (1993 - 1997, 60 MHz - 233 MHz), with the Control Unit, the Datapath, and the First Level of Memory (Cache) labeled.]
Computer System Components
• CPU Core: 1 GHz - 3.8 GHz, 4-way superscalar, RISC or RISC-core (x86):
– Deep instruction pipelines
– Dynamic scheduling
– Multiple FP, integer FUs
– Dynamic branch prediction
– Hardware speculation
• Caches (all non-blocking):
– L1: 16-128K, 1-2 way set associative (on chip), separate or unified
– L2: 256K-2M, 4-32 way set associative (on chip), unified
– L3: 2-16M, 8-32 way set associative (off or on chip), unified
• Front Side Bus (FSB), examples:
– Alpha, AMD K7: EV6, 200-400 MHz
– Intel PII, PIII: GTL+ 133 MHz
– Intel P4: 800 MHz
• Memory Controller (off or on-chip) and Memory Bus:
– SDRAM PC100/PC133: 100-133 MHz, 64-128 bits wide, 2-way interleaved, ~900 MBytes/sec (64-bit)
– Current standard: Double Data Rate (DDR) SDRAM PC3200: 200 MHz DDR, 64-128 bits wide, 4-way interleaved, ~3.2 GBytes/sec (one 64-bit channel), ~6.4 GBytes/sec (two 64-bit channels)
– RAMbus DRAM (RDRAM): 400 MHz DDR, 16 bits wide (32 banks), ~1.6 GBytes/sec
• I/O Buses:
– Example: PCI, 33-66 MHz, 32-64 bits wide, 133-528 MBytes/sec
– PCI-X: 133 MHz, 64 bit, 1024 MBytes/sec
• I/O Devices (attached through adapters, controllers, NICs): Memory, Disks, Displays, Keyboards, Networks
• Chipset: North Bridge, South Bridge
• I/O Subsystem
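The bandwidth figures on this slide follow from clock rate, bus width, and transfers per clock. The Python sketch below reproduces a few of them under the usual peak-rate assumption (every clock edge that can carry data does); the function name and its parameters are hypothetical, chosen only for this illustration.

```python
def peak_bandwidth_bytes_per_sec(clock_hz, width_bits, transfers_per_clock=1):
    """Peak transfer rate of a bus: clock * transfers per clock * width in bytes."""
    return clock_hz * transfers_per_clock * width_bits // 8


# DDR SDRAM PC3200: 200 MHz clock, two transfers per clock, one 64-bit channel.
ddr_one_channel = peak_bandwidth_bytes_per_sec(200_000_000, 64, transfers_per_clock=2)
# Two 64-bit channels behave like one 128-bit channel at peak.
ddr_two_channels = peak_bandwidth_bytes_per_sec(200_000_000, 128, transfers_per_clock=2)
# RDRAM: 400 MHz clock, two transfers per clock, 16 bits wide.
rdram = peak_bandwidth_bytes_per_sec(400_000_000, 16, transfers_per_clock=2)
# PCI at 66 MHz, 64 bits wide.
pci_66_64 = peak_bandwidth_bytes_per_sec(66_000_000, 64)
# SDRAM PC100 and PC133 with a 64-bit bus bracket the slide's ~900 MBytes/sec.
pc100 = peak_bandwidth_bytes_per_sec(100_000_000, 64)
pc133 = peak_bandwidth_bytes_per_sec(133_000_000, 64)
```

These are peak numbers only; sustained bandwidth is lower because of refresh, bank conflicts, and protocol overhead.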
Performance Increase of Workstation-Class
Microprocessors 1987-1997
Integer SPEC92 Performance
> 100x performance increase in one decade
Microprocessor Transistor Count Growth Rate
[Figure: transistors per chip (log scale, 1,000 to 100,000,000) versus year (1970 to 2000), following Moore’s Law. Labeled points: i4004 (2300 transistors, circa 1970), i8080, i8086, i80286, i80386, i80486, Pentium, Sparc Ultra: 5.2 million, Pentium Pro: 5.5 million, PowerPC 620: 6.9 million, Alpha 21164: 9.3 million, Alpha 21264: 15 million. Currently > 1 billion.]
Moore’s Law: 2X transistors/chip every 1.5 years
~ 500,000x transistor density increase in the last 35 years
Increase of Capacity of VLSI Dynamic RAM (DRAM) Chips
year | size (Megabit)
1980 | 0.0625
1983 | 0.25
1986 | 1
1989 | 4
1992 | 16
1996 | 64
1999 | 256
2000 | 1024
[Figure: DRAM chip capacity in bits (log scale, 1,000 to 1,000,000,000) versus year (1970 to 2000), with labeled points at 64k bit, 256k bit, 1 M bit, 16 M bit, and 1024 M bit = 1 G bit.]
1.55X/yr, or doubling every 1.6 years (also follows Moore’s Law)
~ 17,000x DRAM chip capacity increase in 20 years
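The growth figures can be checked against the table with a few lines of Python. This is a rough sketch that uses only the 1980 and 2000 endpoints, so the rate it yields is an endpoint-to-endpoint average and comes out slightly above the slide's 1.55X/yr; the variable names are made up for the example.

```python
import math

# DRAM chip capacity in megabits, copied from the table above.
dram_megabits = {1980: 0.0625, 1983: 0.25, 1986: 1, 1989: 4,
                 1992: 16, 1996: 64, 1999: 256, 2000: 1024}

first_year, last_year = 1980, 2000
years = last_year - first_year

# Overall capacity increase across the table (the slide's "~17,000x").
growth_factor = dram_megabits[last_year] / dram_megabits[first_year]

# Average yearly growth and doubling time implied by the two endpoints.
annual_rate = growth_factor ** (1 / years)
doubling_time_years = years / math.log2(growth_factor)

# Doubling time implied by a steady 1.55X/yr rate, for comparison.
doubling_time_at_1_55 = math.log(2) / math.log(1.55)
```

A steady 1.55X/yr corresponds to doubling roughly every 1.6 years, which matches the slide's pairing of the two numbers.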
Computer Technology Trends:
Evolutionary but Rapid Change
• Processor:
– 1.5-1.6X performance improvement every year; over 100X performance in last decade.
• Memory:
– DRAM capacity: > 2x every 1.5 years; 1000X size in last decade.
– Cost per bit: Improves about 25% or more per year.
– Only 15-25% performance improvement per year.
• Disk:
– Capacity: > 2X in size every 1.5 years.
– Cost per bit: Improves about 60% per year.
– 200X size in last decade.
– Only 10% performance improvement per year, due to mechanical limitations.
• Expected State-of-the-art PC by end of year 2005:
– Processor clock speed: > 4000 MegaHertz (4 GigaHertz)
– Memory capacity: > 4000 MegaBytes (4 GigaBytes)
– Disk capacity: > 500 GigaBytes (0.5 TeraBytes)
A Simplified View of The
Software/Hardware Hierarchical Layers
Hierarchy of Computer Architecture
[Figure: layered hierarchy. Software: Application (high-level language programs), Operating System, Compiler (assembly language programs, machine language program). Software/hardware boundary: Instruction Set Architecture (ISA), with Firmware (e.g. BIOS (Basic Input/Output System), Microprogram), the instruction set processor, and the I/O system. Hardware: Datapath & Control (Register Transfer Notation (RTN)), Digital Design (Logic Diagrams), Circuit Design (Circuit Diagrams), Layout (VLSI placement & routing).]
The ISA forms an abstraction layer that sets the requirements for both compiler and CPU designers
Levels of Program Representation
High Level Language Program:
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
↓ Compiler
Assembly Language Program (MIPS Assembly Code):
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
↓ Assembler
Machine Language Program:
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
(Software above this point, hardware below)
↓ Machine Interpretation
Control Signal Specification (Register Transfer Notation (RTN), Microprogram):
    ALUOP[0:3] <= InstReg[9:11] & MASK
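To see that the four MIPS instructions implement the three-line swap, the sequence can be mimicked in Python. This is a sketch under simplifying assumptions: memory is a dictionary keyed by byte address holding whole words, registers are a dictionary keyed by register number, and the function name is hypothetical.

```python
def swap_adjacent_words(memory, base_address):
    """Mimic: lw $15,0($2); lw $16,4($2); sw $16,0($2); sw $15,4($2)."""
    regs = {2: base_address}              # $2 holds the address of v[k]
    regs[15] = memory[regs[2] + 0]        # lw $15, 0($2)   temp   = v[k]
    regs[16] = memory[regs[2] + 4]        # lw $16, 4($2)   $16    = v[k+1]
    memory[regs[2] + 0] = regs[16]        # sw $16, 0($2)   v[k]   = v[k+1]
    memory[regs[2] + 4] = regs[15]        # sw $15, 4($2)   v[k+1] = temp
    return memory


# v[k] lives at byte address 100 and v[k+1] at 104 (4-byte words).
word_memory = {100: 7, 104: 9}
swap_adjacent_words(word_memory, 100)
```

The offsets 0 and 4 reflect 4-byte array elements: v[k+1] sits one word after v[k].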
A Hierarchy of Computer Design
Level | Name | Modules | Primitives | Descriptive Media
Low Level - Hardware:
1 | Electronics | Gates, FF’s | Transistors, Resistors, etc. | Circuit Diagrams
2 | Logic | Registers, ALU’s ... | Gates, FF’s …. | Logic Diagrams
3 | Organization | Processors, Memories | Registers, ALU’s … | Register Transfer Notation (RTN)
Firmware:
4 | Microprogramming | Assembly Language | Microinstructions | Microprogram
High Level - Software:
5 | Assembly language programming | OS Routines | Assembly language Instructions | Assembly Language Programs
6 | Procedural Programming | Applications, Drivers .. | OS Routines, High-level Languages | High-level Language Programs
7 | Application | Systems | Procedural Constructs | Problem-Oriented Programs
Hardware Description
• Hardware visualization:
– Block diagrams (spatial visualization):
Two-dimensional representations of functional units and their
interconnections.
– Timing charts (temporal visualization):
Waveforms where events are displayed vs. time.
• Register Transfer Notation (RTN):
– A way to describe microoperations capable of being performed
by the data flow (data registers, data buses, functional units) at
the register transfer level of design (RT).
– Also describes conditional information in the system which causes operations to come about.
– A “shorthand” notation for microoperations.
• Hardware Description Languages:
– Examples: VHDL: VHSIC (Very High Speed Integrated Circuits) Hardware Description Language, Verilog.
Register Transfer Notation (RTN)
• Dependent RTN: When RTN is used after the data flow is
assumed to be frozen. No data transfer can take place over a
path that does not exist. No statement implies a function the
data flow hardware is incapable of performing.
• Independent RTN: Describes actions on registers without regard to nonexistence of direct paths or intermediate registers. No predefined data flow, i.e. no datapath design yet.
• The general format of an RTN statement:
Conditional information: Action1; Action2
• The conditional statement is often an AND of literals (status and control signals) in the system (a p-term). The p-term is said to imply the action.
• Possible actions include transfer of data to/from registers/memory, data shifting, functional unit operations, etc.
RTN Statement Examples
A ← B   or   R[A] ← R[B]
where R[X] means the content of register X
– A copy of the data in entity B (typically a register) is placed in Register A
– If the destination register has fewer bits than the source, the destination accepts only the lowest-order bits.
– If the destination has more bits than the source, the value of the source is sign extended to the left.
CTL · T0: A = B
– The contents of B are presented to the input of combinational circuit A
– The action to the right of “:” takes place when control signal CTL is active and signal T0 is active.
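The two width rules for A ← B (truncate to the lowest-order bits, or sign extend to the left) can be illustrated with bit masks. The Python function below is a minimal sketch; its name and its (value, source width, destination width) interface are assumptions made for this example and are not RTN itself.

```python
def register_transfer(source_value, source_bits, dest_bits):
    """Return the bit pattern that lands in the destination for A <- B."""
    source_mask = (1 << source_bits) - 1
    dest_mask = (1 << dest_bits) - 1
    value = source_value & source_mask
    if dest_bits <= source_bits:
        # Destination is narrower (or equal): keep only the lowest-order bits.
        return value & dest_mask
    # Destination is wider: replicate the source sign bit into the new high bits.
    sign_bit = value >> (source_bits - 1)
    if sign_bit:
        value |= dest_mask & ~source_mask
    return value


truncated = register_transfer(0xAB, source_bits=8, dest_bits=4)
negative_extended = register_transfer(0x80, source_bits=8, dest_bits=16)
positive_extended = register_transfer(0x7F, source_bits=8, dest_bits=16)
```

Here 0x80 is -128 as an 8-bit two's complement value, and it remains -128 after extension to 16 bits.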
RTN Statement Examples
MD ← M[MA]   or   MD ← Mem[MA]
– Means the memory data (MD) register receives the contents of the main memory (M or Mem) as addressed from the Memory Address (MA) register.
AC(0), AC(1), AC(2), AC(3)
– Register fields are indicated by parentheses.
– The concatenation operation is indicated by a comma.
– Bit AC(0) is bit 0 of the accumulator AC
– The above expression means AC bits 0, 1, 2, 3
– More commonly represented by AC(0-3)
E · T3: CLRWRITE
– The control signal CLRWRITE is activated when the condition E · T3 is active.
Computer Architecture Vs. Computer Organization
• The term Computer architecture is sometimes erroneously restricted to computer instruction set design, with other aspects of computer design called implementation.
• More accurate definitions:
– Instruction Set Architecture (ISA): The actual programmer-visible instruction set, which serves as the boundary or interface between the software and hardware. The ISA forms an abstraction layer that sets the requirements for both compiler and CPU designers.
– Implementation of a machine has two components:
• Organization (CPU microarchitecture, CPU design): includes the high-level aspects of a computer’s design such as: the memory system, the bus structure, the internal CPU unit which includes implementations of arithmetic, logic, branching, and data transfer operations.
• Hardware (hardware design and implementation): Refers to the specifics of the machine such as detailed logic design and packaging technology.
• In general, Computer Architecture refers to the above three aspects:
1- Instruction set architecture. 2- Organization. 3- Hardware.
Instruction Set Architecture (ISA)
“... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.”
– Amdahl, Blaauw, and Brooks, 1964.
The ISA forms an abstraction layer that sets the requirements for both compiler and CPU designers
The instruction set architecture is concerned with:
• Organization of programmable storage (memory & registers):
Includes the amount of addressable memory and number of
available registers.
• Data Types & Data Structures: Encodings & representations.
• Instruction Set: What operations are specified.
• Instruction formats and encoding.
• Modes of addressing and accessing data items and instructions
• Exceptional conditions.
Computer Instruction Sets
• Regardless of computer type, CPU structure, or
hardware organization, every machine instruction must
specify the following:
– Opcode: Which operation to perform. Example: add,
load, and branch. Opcode = Operation Code
– Where to find the operand or operands, if any: Operands
may be contained in CPU registers, main memory, or I/O
ports.
– Where to put the result, if there is a result: May be
explicitly mentioned or implicit in the opcode.
– Where to find the next instruction: Without any explicit
branches, the instruction to execute is the next instruction
in the sequence or a specified address in case of jump or
branch instructions.
Instruction Set Architecture (ISA) Specification Requirements
• Instruction Format or Encoding:
– How is it decoded?
• Location of operands and result (addressing modes):
– Where other than memory?
– How many explicit operands?
– How are memory operands located?
– Which can or cannot be in memory?
• Data type and Size.
• Operations
– Which are supported?
• Successor instruction:
– Jumps, conditions, branches.
• Fetch-decode-execute is implicit.
[Figure: instruction processing steps: Instruction Fetch, Instruction Decode, Operand Fetch, Execute, Result Store, Next Instruction.]
Main General Types of Instructions
• Data Movement Instructions, possible variations:
– Memory-to-memory.
– Memory-to-CPU register.
– CPU-to-memory.
– Constant-to-CPU register.
– CPU-to-output.
– etc.
• Arithmetic Logic Unit (ALU) Instructions.
• Branch (Control) Instructions:
– Unconditional jumps.
– Conditional branches.
Examples of Data Movement Instructions
Instruction | Meaning | Machine
MOV A,B | Move 16-bit data from memory loc. A to loc. B | VAX11
lwz R3,A | Move 32-bit data from memory loc. A to register R3 | PPC601
li $3,455 | Load the 32-bit integer 455 into register $3 | MIPS R3000
MOV AX,BX | Move 16-bit data from register BX into register AX | Intel X86
LEA.L (A0),A2 | Load the address pointed to by A0 into A2 | MC68000
Examples of ALU Instructions
Instruction | Meaning | Machine
MULF A,B,C | Multiply the 32-bit floating point values at mem. locations A and B, and store result in loc. C | VAX11
nabs r3,r1 | Store the negative absolute value of register r1 in r3 | PPC601
ori $2,$1,255 | Store the logical OR of register $1 with 255 into $2 | MIPS R3000
SHL AX,4 | Shift the 16-bit value in register AX left by 4 bits | Intel X86
ADD.L D0,D1 | Add the 32-bit values in registers D0, D1 and store the result in register D1 | MC68000
Examples of Branch Instructions
Instruction | Meaning | Machine
BLBS A, Tgt | Branch to address Tgt if the least significant bit at location A is set. | VAX11
bun r2 | Branch to location in r2 if the previous comparison signaled that one or more values was not a number. | PPC601
beq $2,$1,32 | Branch to location PC+4+32 if contents of $1 and $2 are equal. | MIPS R3000
JCXZ Addr | Jump to Addr if contents of register CX = 0. | Intel X86
BVS next | Branch to next if overflow flag in CC is set. | MC68000
Operation Types in The Instruction Set
Operator Type | Examples
Arithmetic and logical | Integer arithmetic and logical operations: add, or
Data transfer | Loads-stores (move on machines with memory addressing)
Control | Branch, jump, procedure call, and return, traps.
System | Operating system call/return, virtual memory management instructions ...
Floating point | Floating point operations: add, multiply ....
Decimal | Decimal add, decimal multiply, decimal to character conversion
String | String move, string compare, string search
Media | The same operation performed on multiple data (e.g Intel MMX, SSE)
Instruction Usage Example:
Top 10 Intel X86 Instructions
Rank | Instruction | Integer average (percent of total executed)
1 | load | 22%
2 | conditional branch | 20%
3 | compare | 16%
4 | store | 12%
5 | add | 8%
6 | and | 6%
7 | sub | 5%
8 | move register-register | 4%
9 | call | 1%
10 | return | 1%
Total | | 96%
Observation: Simple instructions dominate instruction usage frequency.
CISC to RISC observation
Types of Instruction Set Architectures
According To Operand Addressing Fields
Memory-To-Memory Machines:
– Operands obtained from memory and results stored back in memory by any
instruction that requires operands.
– No local CPU registers are used in the CPU datapath.
– Include:
• The 4 Address Machine.
• The 3-address Machine.
• The 2-address Machine.
The 1-address (Accumulator) Machine:
– A single local CPU special-purpose register (accumulator) is used as the source of
one operand and as the result destination.
The 0-address or Stack Machine:
– A push-down stack is used in the CPU.
General Purpose Register (GPR) Machines:
– The CPU datapath contains several local general-purpose registers which can
be used as operand sources and as result destinations.
– A large number of possible addressing modes.
– Load-Store or Register-To-Register Machines: GPR machines where only
data movement instructions (loads, stores) can obtain operands from memory
and store results to memory.
CISC to RISC observation (load-store simplifies CPU design)
Types of Instruction Set Architectures
Memory-To-Memory Machines: The 4-Address Machine
• No program counter (PC) or other CPU registers are used.
• Instruction encoding has four address fields to specify:
– Location of first operand.
– Location of second operand.
– Place to store the result.
– Location of next instruction.
Instruction: add Res, Op1, Op2, Nexti
Meaning: Res ← Op1 + Op2
or more precise RTN: M[ResAddr] ← M[Op1Addr] + M[Op2Addr]
Instruction Format (encoding):
Field | Bits | Purpose
Opcode (add) | 8 | Which operation
ResAddr | 24 | Where to put result
Op1Addr | 24 | Where to find operands
Op2Addr | 24 | Where to find operands
NextiAddr | 24 | Where to find next instruction
Instruction Size: 13 bytes
Can address 2^24 bytes = 16 MBytes
[Figure: memory holding Op1 at Op1Addr, Op2 at Op2Addr, Res at ResAddr, and Nexti at NextiAddr; the CPU contains only the adder (+).]
Types of Instruction Set Architectures
Memory-To-Memory Machines: The 3-Address Machine
• A program counter (PC) is included within the CPU which points to the next instruction.
• No CPU storage (general-purpose registers).
Instruction: add Res, Op1, Op2
Meaning: Res ← Op1 + Op2
or more precise RTN:
M[ResAddr] ← M[Op1Addr] + M[Op2Addr]
PC ← PC + 10 (increment PC)
Instruction Format (encoding):
Field | Bits | Purpose
Opcode (add) | 8 | Which operation
ResAddr | 24 | Where to put result
Op1Addr | 24 | Where to find operands
Op2Addr | 24 | Where to find operands
Instruction Size: 10 bytes
Can address 2^24 bytes = 16 MBytes
[Figure: memory holding Op1 at Op1Addr, Op2 at Op2Addr, Res at ResAddr, and Nexti at NextiAddr; the CPU contains the adder (+) and a 24-bit Program Counter (PC) that tells where to find the next instruction.]
Types of Instruction Set Architectures
Memory-To-Memory Machines: The 2-Address Machine
• The 2-address Machine: Result is stored in the memory address of one of the operands.
Instruction: add Op2, Op1
Meaning: Op2 ← Op1 + Op2
or more precise RTN:
M[Op2Addr] ← M[Op1Addr] + M[Op2Addr]
PC ← PC + 7 (increment PC)
Instruction Format (encoding):
Field | Bits | Purpose
Opcode (add) | 8 | Which operation
Op2Addr | 24 | Where to find operands; where to put result
Op1Addr | 24 | Where to find operands
Instruction Size: 7 bytes
[Figure: memory holding Op1 at Op1Addr, Op2 and then Res at Op2Addr, and Nexti at NextiAddr; the CPU contains the adder (+) and a 24-bit Program Counter (PC) that tells where to find the next instruction.]
Types of Instruction Set Architectures
The 1-address (Accumulator) Machine
• A single accumulator in the CPU is used as the source of one operand and result destination.
Instruction: add Op1
Meaning: Acc ← Acc + Op1
or more precise RTN:
Acc ← Acc + M[Op1Addr]
PC ← PC + 4 (increment PC)
Instruction Format (encoding):
Field | Bits | Purpose
Opcode (add) | 8 | Which operation
Op1Addr | 24 | Where to find operand1
Instruction Size: 4 bytes
[Figure: memory holding Op1 at Op1Addr and Nexti at NextiAddr; the CPU contains the adder (+), the Accumulator (where to find operand2, and where to put result), and a 24-bit Program Counter (PC) that tells where to find the next instruction.]
Types of Instruction Set Architectures
The 0-address (Stack) Machine
• A push-down stack is used in the CPU.
TOS = Top Entry in Stack
SOS = Second Entry in Stack
Instruction: push Op1
Meaning: TOS ← M[Op1Addr]
Instruction Format: Opcode (push), 8 bits | Op1Addr (where to find operand), 24 bits. Size: 4 bytes
Instruction: add
Meaning: TOS ← TOS + SOS
Instruction Format: Opcode (add), 8 bits. Size: 1 byte
Instruction: pop Res
Meaning: M[ResAddr] ← TOS
Instruction Format: Opcode (pop), 8 bits | ResAddr (memory destination), 24 bits. Size: 4 bytes
[Figure: memory holding Op1 at Op1Addr, Op2 at Op2Addr, Res at ResAddr, and Nexti at NextiAddr; the CPU contains the stack (TOS: Op2, then Res; SOS: Op1), the adder (+), and a 24-bit Program Counter (PC); push moves data from memory to the stack and pop moves it back.]
Types of Instruction Set Architectures
General Purpose Register (GPR) Machines
• CPU contains several general-purpose registers which can be used as operand sources and result destination.
Instruction: load R8, Op1
Meaning:
R8 ← M[Op1Addr]
PC ← PC + 5
Instruction Format: Opcode (load), 8 bits | R8, 3 bits | Op1Addr (where to find operand1), 24 bits
Size = 4.375 bytes rounded up to 5 bytes
Instruction: add R2, R4, R6
Meaning:
R2 ← R4 + R6
PC ← PC + 3
Instruction Format: Opcode (add), 8 bits | R2 (destination), 3 bits | R4 (operand), 3 bits | R6 (operand), 3 bits
Size = 2.125 bytes rounded up to 3 bytes
Instruction: store R2, Op2
Meaning:
M[Op2Addr] ← R2
PC ← PC + 5
Instruction Format: Opcode (store), 8 bits | R2, 3 bits | Op2Addr (destination), 24 bits
Size = 4.375 bytes rounded up to 5 bytes
Here the add instruction has three register specifier fields, while load and store instructions have one register specifier field and one memory address specifier field.
[Figure: memory holding Op1 at Op1Addr and Nexti at NextiAddr; the CPU contains registers R1-R8, the adder (+), and a 24-bit Program Counter (PC); load moves data from memory to a register and store moves it back.]
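The instruction sizes quoted for the 4-, 3-, 2-, 1-, 0-address and GPR formats all come from summing the field widths and rounding up to whole bytes. A small Python check, using the field widths from the slides (8-bit opcode, 24-bit memory address, 3-bit register specifier); the function name and the dictionary labels are hypothetical:

```python
import math


def instruction_size(field_bits):
    """Return (exact size in bytes, size rounded up to whole bytes)."""
    total_bits = sum(field_bits)
    return total_bits / 8, math.ceil(total_bits / 8)


# Field widths in bits, opcode first.
formats = {
    "4-address add": [8, 24, 24, 24, 24],
    "3-address add": [8, 24, 24, 24],
    "2-address add": [8, 24, 24],
    "1-address add": [8, 24],
    "stack push/pop": [8, 24],
    "stack add": [8],
    "GPR load/store": [8, 3, 24],
    "GPR add": [8, 3, 3, 3],
}

sizes = {name: instruction_size(bits) for name, bits in formats.items()}

# A 24-bit address field reaches 2**24 bytes, i.e. 16 MBytes.
addressable_mbytes = 2 ** 24 // 2 ** 20
```

Only the GPR formats are not byte multiples, which is why the slide rounds 4.375 up to 5 bytes and 2.125 up to 3 bytes.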
Expression Evaluation Example with 3-, 2-, 1-, 0-Address, And GPR Machines
For the expression A = (B + C) * D - E, where A-E are in memory
3-Address:
    add A, B, C
    mul A, A, D
    sub A, A, E
3 instructions; code size: 30 bytes; 9 memory accesses for data
2-Address:
    load A, B
    add A, C
    mul A, D
    sub A, E
4 instructions; code size: 28 bytes; 12 memory accesses for data
1-Address (Accumulator):
    load B
    add C
    mul D
    sub E
    store A
5 instructions; code size: 20 bytes; 5 memory accesses for data
0-Address (Stack):
    push B
    push C
    add
    push D
    mul
    push E
    sub
    pop A
8 instructions; code size: 23 bytes; 5 memory accesses for data
GPR, Register-Memory:
    load R1, B
    add R1, C
    mul R1, D
    sub R1, E
    store A, R1
5 instructions; code size: 25 bytes; 5 memory accesses for data
GPR, Load-Store:
    load R1, B
    load R2, C
    add R3, R1, R2
    load R1, D
    mul R3, R3, R1
    load R1, E
    sub R3, R3, R1
    store A, R3
8 instructions; code size: 34 bytes; 5 memory accesses for data
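The stack-machine column can be checked by running its program on a tiny interpreter that also tallies code size and data memory accesses. This Python sketch assumes the instruction sizes from the stack machine slide (4 bytes for push/pop, 1 byte for arithmetic) and assumes that sub and mul combine the second stack entry with the top entry in that order; the slides define only add explicitly, so the operand order for sub is an assumption.

```python
def run_stack_program(program, memory):
    """Run a 0-address program; return (data memory accesses, code size in bytes)."""
    stack = []
    data_accesses = 0
    code_bytes = 0
    for instruction in program:
        opcode = instruction[0]
        if opcode == "push":
            stack.append(memory[instruction[1]])  # read operand from memory
            data_accesses += 1
            code_bytes += 4                       # 8-bit opcode + 24-bit address
        elif opcode == "pop":
            memory[instruction[1]] = stack.pop()  # write result to memory
            data_accesses += 1
            code_bytes += 4
        else:
            top = stack.pop()
            second = stack.pop()
            if opcode == "add":
                stack.append(second + top)
            elif opcode == "mul":
                stack.append(second * top)
            elif opcode == "sub":
                stack.append(second - top)        # assumed order: SOS - TOS
            else:
                raise ValueError(f"unknown opcode: {opcode}")
            code_bytes += 1                       # opcode only
    return data_accesses, code_bytes


# A = (B + C) * D - E with sample values.
data_memory = {"A": 0, "B": 2, "C": 3, "D": 4, "E": 5}
stack_program = [("push", "B"), ("push", "C"), ("add",), ("push", "D"),
                 ("mul",), ("push", "E"), ("sub",), ("pop", "A")]
accesses, code_size = run_stack_program(stack_program, data_memory)
```

The counts match the slide: five data accesses (four pushes and one pop) and 23 bytes of code.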
Typical ISA Addressing Modes
For GPR ISAs

Addressing Mode    Sample Instruction       Meaning
Register           Add R4, R3               R4 ← R4 + R3
Immediate          Add R4, #3               R4 ← R4 + 3
Displacement       Add R4, 10 (R1)          R4 ← R4 + Mem[10 + R1]
Indirect           Add R4, (R1)             R4 ← R4 + Mem[R1]
Indexed            Add R3, (R1 + R2)        R3 ← R3 + Mem[R1 + R2]
Absolute           Add R1, (1001)           R1 ← R1 + Mem[1001]
Memory indirect    Add R1, @ (R3)           R1 ← R1 + Mem[Mem[R3]]
Autoincrement      Add R1, (R2) +           R1 ← R1 + Mem[R2]
                                            R2 ← R2 + d
Autodecrement      Add R1, - (R2)           R2 ← R2 - d
                                            R1 ← R1 + Mem[R2]
Scaled             Add R1, 100 (R2) [R3]    R1 ← R1 + Mem[100 + R2 + R3*d]
Addressing Modes Usage Example
For 3 programs running on VAX, ignoring direct register mode:

Displacement:                    42% avg, 32% to 55%
Immediate:                       33% avg, 17% to 43%
Register deferred (indirect):    13% avg, 3% to 24%
Scaled:                          7% avg, 0% to 16%
Memory indirect:                 3% avg, 1% to 6%
Misc:                            2% avg, 0% to 3%

75% displacement & immediate
88% displacement, immediate & register indirect.
Observation: In addition to register direct, the displacement,
immediate, and register indirect addressing modes are important.
CISC to RISC observation
(fewer addressing modes simplify CPU design)
Displacement Address Size Example
Avg. of 5 SPECint92 programs v. avg. 5 SPECfp92 programs
[Figure: bar chart with a vertical axis from 0% to 30% and a horizontal axis "Displacement Address Bits Needed" from 0 to 15, with one series for Int. Avg. and one for FP Avg.]
1% of addresses > 16-bits
12 - 16 bits of displacement needed
CISC to RISC observation
Instruction Set Encoding
Considerations affecting instruction set encoding:
– To have as many registers and addressing modes as
possible.
– The impact of the size of the register and addressing
mode fields on the average instruction size and on the
average program size.
– To encode instructions into lengths that will be easy to
handle in the implementation. At a minimum, lengths should be
a multiple of bytes.
• Fixed length encoding: Faster and easier to implement in
hardware, e.g. it simplifies the design of pipelined CPUs.
• Variable length encoding: Produces smaller instructions.
• Hybrid encoding.
CISC to RISC observation
Three Examples of Instruction Set Encoding
Variable Length Encoding: VAX (1-53 bytes)
    Operations & no. of operands | Address specifier 1 | Address field 1 | ... | Address specifier n | Address field n

Fixed Length Encoding: MIPS, PowerPC, SPARC (all instructions are 4 bytes each)
    Operation | Address field 1 | Address field 2 | Address field 3

Hybrid Encoding: IBM 360/370, Intel 80x86
    Operation | Address Specifier | Address field
    Operation | Address Specifier 1 | Address Specifier 2 | Address field
    Operation | Address Specifier | Address field 1 | Address field 2
Instruction Set Architecture Tradeoffs
• 3-address machine: shortest code sequence; a large number of bits
per instruction; large number of memory accesses.
• 0-address (stack) machine: Longest code sequence; shortest
individual instructions; more complex to program.
Machine = CPU or ISA
• General purpose register machine (GPR):
– Operands are addressed by selecting among a small set of
registers using a short register address (all new ISAs since
1975).
– Advantages of GPR:
• Low number of memory accesses. Faster, since register access
is currently still much faster than memory access.
• Registers are easier for compilers to use.
• Shorter, simpler instructions.
• Load-Store Machines: GPR machines where memory addresses
are only included in data movement instructions (loads/stores)
between memory and registers (all new ISAs designed after 1980).
CISC to RISC observation (load-store simplifies CPU design)
ISA Examples

Machine            General Purpose    Architecture                       Year
                   Registers
EDSAC              1                  accumulator                        1949
IBM 701            1                  accumulator                        1953
CDC 6600           8                  load-store                         1963
IBM 360            16                 register-memory                    1964
DEC PDP-8          1                  accumulator                        1965
DEC PDP-11         8                  register-memory                    1970
Intel 8008         1                  accumulator                        1972
Motorola 6800      1                  accumulator                        1974
DEC VAX            16                 register-memory, memory-memory     1977
Intel 8086         1                  extended accumulator               1978
Motorola 68000     16                 register-memory                    1980
Intel 80386        8                  register-memory                    1985
MIPS               32                 load-store                         1985
HP PA-RISC         32                 load-store                         1986
SPARC              32                 load-store                         1987
PowerPC            32                 load-store                         1992
DEC Alpha          32                 load-store                         1992
HP/Intel IA-64     128                load-store                         2001
AMD64 (EM64T)      16                 register-memory                    2003
Examples of GPR Machines
For Arithmetic/Logic (ALU) Instructions

Max. number of        Max. number of         ISAs
memory addresses      operands allowed
0                     3                      SPARC, MIPS, PowerPC, ALPHA
1                     2                      Intel 80386, Motorola 68000
2 or 3                2 or 3                 VAX
Complex Instruction Set Computer (CISC) ISAs
• Emphasizes doing more with each instruction.
• Motivated by the high cost of memory and hard disk
capacity when original CISC architectures were proposed:
– When the M6800 was introduced: 16K RAM = $500, 40M hard disk = $55,000
– When the MC68000 was introduced (circa 1980): 64K RAM = $200, 10M HD = $5,000
• Original CISC architectures evolved with faster, more
complex CPU designs, but backward instruction set
compatibility had to be maintained.
• Wide variety of addressing modes:
• 14 in MC68000, 25 in MC68020
• A number of instruction modes for the location and number of
operands:
• The VAX has 0- through 3-address instructions.
• Variable-length or hybrid instruction encoding is used.
Example CISC ISAs
Motorola 680X0
18 addressing modes:
• Data register direct.
• Address register direct.
• Immediate.
• Absolute short.
• Absolute long.
• Address register indirect.
• Address register indirect with postincrement.
• Address register indirect with predecrement.
• Address register indirect with displacement.
• Address register indirect with index (8-bit).
• Address register indirect with index (base).
• Memory indirect postindexed.
• Memory indirect preindexed.
• Program counter indirect with index (8-bit).
• Program counter indirect with index (base).
• Program counter indirect with displacement.
• Program counter memory indirect postindexed.
• Program counter memory indirect preindexed.
Operand size:
• Range from 1 to 32 bits, 1, 2, 4, 8, 10, or 16 bytes.
Instruction Encoding:
• Instructions are stored in 16-bit words.
• The smallest instruction is 2 bytes (one word).
• The longest instruction is 5 words (10 bytes) in length.
Example CISC ISA: Intel 80386
12 addressing modes:
• Register.
• Immediate.
• Direct.
• Base.
• Base + Displacement.
• Index + Displacement.
• Scaled Index + Displacement.
• Based Index.
• Based Scaled Index.
• Based Index + Displacement.
• Based Scaled Index + Displacement.
• Relative.
Operand sizes:
• Can be 8, 16, 32, 48, 64, or 80 bits long.
• Also supports string operations.
Instruction Encoding:
• The smallest instruction is one byte.
• The longest instruction is 12 bytes long.
• The first bytes generally contain the opcode, mode specifiers, and register fields.
• The remaining bytes are for address displacement and immediate data.
Reduced Instruction Set Computer (RISC) ISAs (~1984)
• Focuses on reducing the number and complexity of
instructions of the machine. (Machine = CPU or ISA)
• Reduced number of cycles needed per instruction.
– Goal: At least one instruction completed per clock cycle.
• Designed with CPU instruction pipelining in mind.
• Fixed-length instruction encoding.
• Only load and store instructions access memory.
• Simplified addressing modes.
– Usually limited to immediate, register indirect, register
displacement, indexed.
• Delayed loads and branches.
• Prefetch and speculative execution.
• Examples: MIPS, HP PA-RISC, SPARC, Alpha, PowerPC.
Example RISC ISA: PowerPC
8 addressing modes:
• Register direct.
• Immediate.
• Register indirect.
• Register indirect with immediate index (loads and stores).
• Register indirect with register index (loads and stores).
• Absolute (jumps).
• Link register indirect (calls).
• Count register indirect (branches).
Operand sizes:
• Four operand sizes: 1, 2, 4 or 8 bytes.
Instruction Encoding:
• Instruction set has 15 different formats with many minor variations.
• All are 32 bits in length.
Example RISC ISA: HP Precision Architecture (HP PA-RISC)
7 addressing modes:
• Register
• Immediate
• Base with displacement
• Base with scaled index and displacement
• Predecrement
• Postincrement
• PC-relative
Operand sizes:
• Five operand sizes ranging in powers of two from 1 to 16 bytes.
Instruction Encoding:
• Instruction set has 12 different formats.
• All are 32 bits in length.
Example RISC ISA: SPARC
5 addressing modes:
• Register indirect with immediate displacement.
• Register indirect indexed by another register.
• Register direct.
• Immediate.
• PC relative.
Operand sizes:
• Four operand sizes: 1, 2, 4 or 8 bytes.
Instruction Encoding:
• Instruction set has 3 basic instruction formats with 3 minor variations.
• All are 32 bits in length.
Example RISC ISA: DEC Alpha AXP
4 addressing modes:
• Register direct.
• Immediate.
• Register indirect with displacement.
• PC-relative.
Operand sizes:
• Four operand sizes: 1, 2, 4 or 8 bytes.
Instruction Encoding:
• Instruction set has 7 different formats.
• All are 32 bits in length.
RISC ISA Example: MIPS R3000 (32-bit)
Instruction Categories:
• Load/Store.
• Computational.
• Jump and Branch.
• Floating Point (using coprocessor).
• Memory Management.
• Special.
5 Addressing Modes:
• Register direct (arithmetic).
• Immediate (arithmetic).
• Base register + immediate offset (loads and stores).
• PC relative (branches).
• Pseudodirect (jumps).
Registers:
• R0 - R31
• PC
• HI
• LO
Operand Sizes:
• Memory accesses in any multiple between 1 and 4 bytes.
Instruction Encoding: 3 Instruction Formats, all 32 bits wide.
    OP | rs | rt | rd | sa | funct
    OP | rs | rt | immediate
    OP | jump target
MIPS is the target ISA for CPU design in this course
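To illustrate how the three 32-bit MIPS formats pack their fields, here is a minimal Python sketch. The field widths (6-bit OP, 5-bit rs/rt/rd/sa, 6-bit funct, 16-bit immediate, 26-bit jump target) are the standard MIPS32 widths and are not stated on the slide; the function names are illustrative only.

```python
def encode_r(op, rs, rt, rd, sa, funct):
    """R-type word: OP(6) | rs(5) | rt(5) | rd(5) | sa(5) | funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (sa << 6) | funct

def encode_i(op, rs, rt, immediate):
    """I-type word: OP(6) | rs(5) | rt(5) | immediate(16)."""
    # Masking keeps a negative offset as its 16-bit two's complement pattern.
    return (op << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)

def encode_j(op, target):
    """J-type word: OP(6) | jump target(26)."""
    return (op << 26) | (target & 0x3FFFFFF)

if __name__ == "__main__":
    # add $t0, $t1, $t2  (registers 8, 9, 10; OP = 0, funct = 0x20)
    print(f"{encode_r(0, 9, 10, 8, 0, 0x20):08X}")
    # lw $t0, 4($t1)     (OP = 0x23, base register 9, destination 8)
    print(f"{encode_i(0x23, 9, 8, 4):08X}")
    # j with a 26-bit word target of 0x100 (OP = 2)
    print(f"{encode_j(2, 0x100):08X}")
```

Because every format is exactly 32 bits and the OP field is always in the same position, the hardware can start decoding before it knows which format it is looking at, which is the fixed-length encoding advantage noted earlier.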
Evolution of Instruction Set Architectures
Single Accumulator (EDSAC 1949)
Accumulator + Index Registers
(Manchester Mark I, IBM 700 series 1953)
Separation of Programming Model
from Implementation
High-level Language Based
(B5000 1963)
Concept of an ISA Family
(IBM 360 1964)
General Purpose Register (GPR) Machines
Complex Instruction Sets (CISC)
(Vax, Motorola 68000, Intel x86 1977-80)
Load/Store Architecture
(CDC 6600, Cray 1 1963-76)
Reduced Instruction Set Computer (RISC)
(MIPS, SPARC, HP-PA, PowerPC, . . . 1984..)