EE 382 Computer Organization and Design
EE382
Processor Design
Stanford University
Winter Quarter 1998-1999
Instructor: Michael Flynn
Teaching Assistant: Steve Chou
Administrative Assistant: Susan Gere
Lecture 1 - Introduction
Slide 1
Class Objectives
Learn theoretical analysis and limits
— develop intuition
— project long-term trends and bound the design space more efficiently than simulation
Learn models for VLSI component cost tradeoffs
— emphasis on the microprocessor
Learn modeling techniques for computer system performance
— emphasis on queuing
Put it all together to balance system performance and cost
— emphasis on multiprocessors, memory, and I/O
— practical examples and design targets
Slide 2
Course Prerequisites
Computer Architecture and Organization (EE282)
— Instruction Set Architecture
— Machine Organization
— Basic Pipeline Design
— Cache Organization
— Branch Prediction
— Superscalar Execution
• In-Order
• Out-of-Order
Statistics
— Basic probability
• distribution functions
• statistical measures
— Familiarity with stochastic processes and Markov models is helpful, but not required
Slide 3
Course Information
Access to the course web page is necessary
http://www-leland.stanford.edu/class/ee382/
— Course info, assignments, old exams, design tools, FAQs, ...
Textbook and reference material
— Computer Architecture: Pipelined and Parallel Processor Design, Michael J. Flynn
Problem set and design problem philosophy
— Learn by doing: maximize learning/effort
Exam philosophy
— Extend what you have learned
— Open-book, not a speed or trick contest
You are expected to give us feedback
— Questions, office hours, email, surveys
Slide 4
Grading
Problem Sets and Design Problems 40%
— 6 problem sets
— 2 design problems
Midterm 20%
Final Exam 40%
— Covers entire course
— Scheduled March 15, 8:30-11:30AM
Slide 5
Key Concepts of Abstraction
Instruction Set Architecture (ISA)
— Functional interface for assembly-language programmer
— Examples: SGI MIPS, Sun SPARC, PowerPC, HPPA, DEC Alpha, Intel (x86), IBM System/390, IBM AS/400
Implementation (Machine Organization)
— Partitioning into units and logic design
— Examples
• Intel386 CPU, Intel486 CPU, Pentium® Processor, Pentium® Pro Processor
• Alpha 21064, 21164, 21264
Realization
— Physical fabrication and assembly
— Examples
• IBM 709 ('54) built with vacuum tubes and 7090 ('59) built with transistors
• Pentium Processor in 0.8 µm, 0.6 µm, 0.35 µm BiCMOS/CMOS
Slide 6
Instruction Set Architecture
“... the attributes of a [computing] system as seen by the
programmer, i.e. the conceptual structure and
functional behavior, as distinct from the organization of
the data flow and controls, the logical design, and the
physical implementation.”
Amdahl, Blaauw, and Brooks, 1964
Consists of:
— Organization of storage
— Data types
— Encodings and representations (instruction formats)
— Instruction (or Operation Code) Set
— Modes for addressing data items and instructions
— Program visible exceptional conditions
Specifies requirements for binary compatibility across implementations
Slide 7
Instruction Set Types
Load/Store (L/S)
— Only load and store instructions refer to memory
• no memory ALU ops
— Used by several microprocessors
• PowerPC, HP PA, DEC Alpha
Register/Memory (R/M)
— ALU operations can have either source or destination in memory
— Used by mainframes and most microprocessors
• IBM System/370, Intel Architecture (x86), all x86 compatibles
Register or Memory (R+M)
— ALU operations can have any/all operands in memory
— Not commonly used now
• DEC VAX
Slide 8
L/S ISA General Characteristics
32 GPRs x 32b... more recently 64b
instr size: 32b
instr types
— R1 <- R2 op R3 for ALU ops
— R1 <-> MEM [RB,D] for LD/ST
Slide 9
R/M ISA General Characteristics
16 GPRs x 32b
instr size: 16b, 32b, 48b
instr types
— RR: R1 <- R1 op R2
— RM: R1 <- R1 op MEM [RB,RX,D]
— MM: MEM1 [RB,RX,D] <- MEM1 [RB,RX,D] op MEM2 [RB,RX,D]; used for character and decimal ops only
Slide 10
ISA Syntax Terminology
OP.type destination, source1, source2
— e.g., ADD.F R1,R2,R3 puts the result of a floating-point add in floating register 1
— OP without a type implies integer type unless fp is clear from the context
— destination is always the first operand, so a store is ST MEM [RB,RX,D], R2
Slide 11
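As a worked illustration of the L/S and R/M instruction styles from Slides 8-10, written in the syntax just defined, here is a minimal sketch in Python. This is not part of the original slides; the displacements dA, dB, dC and the base/index registers RB, RX are my own illustrative assumptions.

```python
# Hypothetical sketch: the statement A = B + C written in the course's
# "OP.type destination, source1, source2" syntax for the two main ISA styles.

load_store = [
    "LD  R2, MEM [RB,dB]",           # R2 <- B   (only LD/ST refer to memory)
    "LD  R3, MEM [RB,dC]",           # R3 <- C
    "ADD R1, R2, R3",                # R1 <- R2 + R3
    "ST  MEM [RB,dA], R1",           # A  <- R1
]

register_memory = [
    "LD  R1, MEM [RB,RX,dB]",        # R1 <- B
    "ADD R1, R1, MEM [RB,RX,dC]",    # RM form: one ALU source comes from memory
    "ST  MEM [RB,RX,dA], R1",        # A  <- R1
]

# L/S needs more, but simpler and fixed-size, instructions than R/M.
print(len(load_store), "L/S instructions vs.", len(register_memory), "R/M instructions")
```

Real R/M ISAs such as System/370 spell these as two-operand instructions; the three-operand form above just follows the destination-first convention of Slide 11.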
ISA Assumptions
Assume all instruction sets have a PSW and condition codes (CC)
Branch is BC.CC target; target is either a register or memory
Unconditional branch is BR, even though it's implemented with BC
Other branches: BCT, BAL (branch and link)
Slide 12
Moore’s Law
[Figure: transistors per die (log scale, 10^2 to 10^8) vs. year, 1970-2000; one curve for memory (1K through 16M DRAM generations) and one for microprocessors (4004, 8080, 8086, 80286, Intel386, Intel486, Pentium)]
Moore's Law: the number of transistors per chip increases 4X every 3 years (CAGR = 60%)
Source: Intel
Slide 13
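A quick arithmetic check of the caption above, as a hedged sketch rather than anything from the original slides: 4X every 3 years is the cube root of 4 per year, which is where the roughly 60% compound annual growth rate comes from.

```python
# Sketch: relate "4X transistors every 3 years" to the ~60% CAGR quoted above.
cagr = 4 ** (1 / 3) - 1
print(f"CAGR = {cagr:.1%}")                        # ~58.7% per year, rounded to 60% on the slide

# Illustrative projection under that rate (the starting point is an assumption, not slide data):
transistors = 1_000_000 * (1 + cagr) ** 6          # two 3-year generations
print(f"After 6 years: about {transistors / 1e6:.0f}M transistors")   # 16M, i.e. 4 x 4
```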
Die Size Growth
[Figure: die size (mm^2, log scale 10 to 1000) vs. year, 1975-2000; a logic curve (8086, 68000, 80286, 68020, 80386, 80486, 68040, Pentium) and a DRAM curve (64K through 16M)]
Source: Intel
Slide 14
Finer Lithography
[Figure: lithography resolution (µm, log scale 0.01 to 10) vs. year, '83-'01; curves for resolution, overlay, and CD control across the 0.8, 0.5, 0.35, and 0.25 µm generations]
Source: Intel
Slide 15
Limits on scaling
As device sizes get smaller, it becomes harder to maintain the rate at which feature sizes are scaled down
It currently appears that around 50 nm several factors may limit scaling:
— hot carrier effects
— time-dependent dielectric breakdown
— gate tunneling current
— short-channel effects and their effect on VT
Slide 16
Beyond CMOS MOSFETs
If these "limits" prove real, there are alternative technologies with systems implications:
— low-temperature CMOS
— subthreshold logic
— new gate oxide materials
— SOI
Slide 17
Fabrication Facility Costs
[Figure: fabrication facility cost (millions of dollars, log scale 1 to 10,000) vs. year, 1965-2000]
Moore’s Second Law: Fab Costs Grow 40% Per Year
Source: VLSI Research, Inc.
Slide 18
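The same compound-growth arithmetic applies to Moore's Second Law. The sketch below is my own back-of-the-envelope check, not from the slides; the 40% rate is taken from the caption and the 10,000x span from the chart's vertical axis.

```python
import math

# Sketch: implications of "fab costs grow 40% per year" (rate from the caption above).
rate = 0.40

doubling_time = math.log(2) / math.log(1 + rate)
print(f"Cost doubles roughly every {doubling_time:.1f} years")          # ~2.1 years

# Years needed for a 10,000x increase, the span of the chart's vertical axis:
years_for_10000x = math.log(10_000) / math.log(1 + rate)
print(f"A 10,000x increase takes about {years_for_10000x:.0f} years")   # ~27 years
```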
Microprocessor Business Model
New "generation" of silicon technology every 2.5-3 years
— 30% reduction in linear dimensions => 50% reduction in area (arithmetic sketch below)
— 30% reduction in device delay => 50% increase in speed
— Used to reduce cost and improve performance of the previous-generation microprocessor
— Used to enable a new generation of microprocessor with a wider, more parallel, more functional machine organization
— Incremental changes between generations
Business growth enables investment in new technology
— Driven by performance, new applications, and "dancing bunny people"
Slide 19
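The area and speed bullets above follow from the 30% linear shrink: area scales as the square of the linear dimension, and speed roughly as the inverse of device delay. A minimal sketch of that arithmetic (my own check, not from the slides):

```python
# Sketch: derive the slide's scaling numbers from a 30% reduction in linear dimensions.
linear_shrink = 0.70          # each linear dimension becomes 0.7x per generation

area_ratio = linear_shrink ** 2
print(f"Area: {area_ratio:.2f}x of the previous area (~{1 - area_ratio:.0%} reduction)")

delay_ratio = 0.70            # device delay also drops ~30% per generation
speedup = 1 / delay_ratio - 1
print(f"Speed: ~{speedup:.0%} faster, which the slide rounds to ~50%")
```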
Performance Growth
[Figure 1.20 from P&H: workstation performance vs. year, 1987-1997; data points include SUN-4/260, MIPS M/120, MIPS M2000, IBM RS6000, HP 9000/750, IBM POWER 100, DEC AXP/500, DEC Alpha 4/266, DEC Alpha 5/300, DEC Alpha 5/500, and DEC Alpha 21264/600; performance scale 0 to 1200]
Workstation Performance Improving 54% per year
That’s almost 1% per week!
Slide 20
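A quick check of the "almost 1% per week" remark, as a back-of-the-envelope sketch: the 54% annual rate is from the caption, while the 10-year span is an assumption matching the chart.

```python
# Sketch: check "54% per year is almost 1% per week" from the caption above.
annual_growth = 0.54

weekly_growth = (1 + annual_growth) ** (1 / 52) - 1
print(f"Weekly growth: {weekly_growth:.2%}")         # ~0.83% per week

decade_growth = (1 + annual_growth) ** 10            # roughly the 1987-1997 span of the chart
print(f"Over 10 years: about {decade_growth:.0f}x")  # ~75x
```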
PC Shipment Growth
Performance Growth and New Applications Drive Volume
Source: Dataquest by A. Yu in IEEE Micro 12/96
Slide 21
System Price/Performance
1965: IBM System 360/50, 0.15 MIPS, 64 KB, $1M ($6.6M per MIPS)
1977: DEC VAX11/780, 1 MIPS, 1 MB, $200K ($200K per MIPS)
1998: Dell Dimension XPS-300, 725 MIPS, 64 MB, $2412 (1/4/98) ($3.33 per MIPS)
Photographs from Virtual Computing History Group
Slide 22
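The cost-per-MIPS figures above are simple division; the sketch below (illustrative only, using the slide's numbers as given) reproduces them and the overall improvement in price/performance.

```python
# Sketch: reproduce the $/MIPS figures from the table above (numbers as given on the slide).
systems = [
    ("IBM System 360/50 (1965)",      0.15, 1_000_000),
    ("DEC VAX11/780 (1977)",          1.0,    200_000),
    ("Dell Dimension XPS-300 (1998)", 725.0,     2412),
]

for name, mips, price in systems:
    print(f"{name}: ${price / mips:,.2f} per MIPS")

# Overall improvement in price/performance from 1965 to 1998:
improvement = (1_000_000 / 0.15) / (2412 / 725)
print(f"About {improvement:,.0f}x improvement in $/MIPS over 33 years")
```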
Representative System
[Diagram: one or more CPUs, each with pipelines, registers, an L1 Icache, an L1 Dcache, and an L2 cache, connected through a chipset to memory and I/O bus(es)]
Slide 23
Summary
Current architectures exploit parallelism for performance
— Multiple pipelines and caches
— Multiprocessors
Technology costs are increasing rapidly
— High volume is critical to recover costs
• interface standards and evolution necessary
— Product success depends on cost-effective area allocation and partitioning
Technology capacity and performance increasing rapidly
— Critical to evaluate broad space of design options at each generation
• Opportunity to learn from the past and to innovate
Theoretical analysis and modeling, combined with design targets, are powerful tools for developing computer systems.
This course will help prepare you to apply those tools in your future career, whether in theory or in practice.
Slide 24
This Week
Check access to the web page
— Make sure you can read and print
— First problem set will be posted by Friday
Reading
— Scan Chapter 1
— Sections 2.1, 2.2
Room Change
— move to Gates B03
— no festival Friday lecture
Slide 25