CPSC 321 Computer Architecture

CPSC 321
Computer Architecture
Fall 2006
Lecture 1
Introduction and Five Components of a Computer
Adapted from CS 152 Spring 2002 UC Berkeley
Copyright (C) 2001 UCB
Course Instructor
Rabi Mahapatra
E-mail: ([email protected]),
Sections: 501-503:MWF 12:40 – 1:30 PM
• 520B, HRBB tel: 845-5787
• Office Hours: After the Class
TA Information
• Suman K Mandal
Email:
Office:
Office Hours:
• Lei Wu Phone:
E-mail: ([email protected])
Office: 526, HRBB tel: 571-2640
Office Hour: TBD
Course Information [contd…]
• Grading: Projects, Assignments, Exams
– Assignments: 20%
– Mid Term: 25%
– Finals: 25%
– Projects: 30%
• Labs
– MIPS (Assembly Programming), Verilog (HDL)
• Projects
– Project 1: MIPS
– Projects 2 & 3: Verilog (Datapath Design)
Course Information [contd…]
• Book (Required)
– Computer Organization and Design: The Hardware/Software Interface, Third Edition,
David A. Patterson and John L. Hennessy, Morgan Kaufmann Publishers.
Do not get the second edition.
REFERENCES:
– Digital Design
M. Morris Mano, 3rd Edition, Prentice Hall
– The Verilog Hardware Description Language
Thomas & Moorby, 5th Edition, Kluwer Academic Publishers
– Check the course webpage for other materials and links
Course Information [contd…]
• Course Webpage
– http://courses.cs.tamu.edu/rabi/cpsc321/
• CS Accounts
– Use your CS account to turn in work and to check any course-related email
Course Overview
[Figure: Booth multiplier datapath with multiplicand register, sign extension (32=>34), 34x2 MUX, 34-bit ALU, Booth encoder, control logic, and HI/LO registers]
Computer Arithmetic
[Figure: datapaths]
Single/multicycle Datapaths
Course Overview [contd…]
[Figure: overlapped execution of successive instructions through the IFetch, Dcd, Exec, Mem, and WB stages]
Pipelining
Performance
[Figure: memory]
Memory Systems
What’s In It For Me ?
• In-depth understanding of the inner workings of
modern computers, their evolution, and the tradeoffs present at the hardware/software boundary.
– Insight into which operations are fast or slow, and which are easy or hard to
implement in hardware
• Experience with the design process in the
context of a large, complex (hardware) design.
– Functional Spec --> Control & Datapath --> Physical
implementation
– Modern CAD tools
Computer Architecture - Definition
• Computer Architecture = ISA + MO
• Instruction Set Architecture
– What the executable can “see” as underlying hardware
– Logical View
• Machine Organization
– How the hardware implements the ISA
– Physical View
Computer Architecture – Changing Definition
• 1950s to 1960s: Computer Architecture Course:
– Computer Arithmetic
• 1970s to mid 1980s: Computer Architecture Course:
– Instruction Set Design, especially ISA appropriate for compilers
• 1990s: Computer Architecture Course:
– Design of CPU, memory system, I/O system, Multiprocessors, Networks
• 2000s: Computer Architecture Course:
– Non Von-Neumann architectures, Reconfiguration
• DNA Computing, Quantum Computing ????
Some Examples …
° Digital Alpha (v1, v3), 1992-97, RIP soon
° HP PA-RISC (v1.1, v2.0), 1986-96, RIP soon
° Sun SPARC (v8, v9), 1987-95
° SGI MIPS (MIPS I, II, III, IV, V), 1986-96
° IA-16/32 (8086, 286, 386, 486, Pentium, MMX, SSE, …), 1978-1999
° IA-64 (Itanium), 1996-now
° AMD64/EM64T, 2002-now
° IBM POWER (PowerPC, …), 1990-now
° Many dead processor architectures live on in microcontrollers
The MIPS R3000 ISA (Summary)
• Instruction Categories
– Load/Store
– Computational
– Jump and Branch
– Floating Point
• coprocessor
– Memory Management
– Special
Registers: R0 - R31, PC, HI, LO
3 Instruction Formats: all 32 bits wide
OP | rs | rt | rd | sa | funct
OP | rs | rt | immediate
OP | jump target
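The three formats can be illustrated with a small field extractor. This is a minimal sketch, assuming the standard MIPS field widths (6-bit OP, 5-bit rs/rt/rd/sa, 6-bit funct, 16-bit immediate, 26-bit jump target); the function name `decode_fields` is hypothetical and not part of the course material.

```python
def decode_fields(word):
    """Split a 32-bit MIPS instruction word into the fields used by the three formats.

    Every field is extracted regardless of format; the opcode decides
    which of them are meaningful for a given instruction.
    """
    return {
        "op": (word >> 26) & 0x3F,         # bits 31..26
        "rs": (word >> 21) & 0x1F,         # bits 25..21
        "rt": (word >> 16) & 0x1F,         # bits 20..16
        "rd": (word >> 11) & 0x1F,         # bits 15..11 (register format)
        "sa": (word >> 6) & 0x1F,          # bits 10..6  (shift amount)
        "funct": word & 0x3F,              # bits 5..0   (register format)
        "immediate": word & 0xFFFF,        # bits 15..0  (immediate format)
        "jump_target": word & 0x03FFFFFF,  # bits 25..0  (jump format)
    }


# Assemble "lw $15, 0($2)" by hand: lw has opcode 0x23, rs = 2, rt = 15, immediate = 0.
word = (0x23 << 26) | (2 << 21) | (15 << 16) | 0
fields = decode_fields(word)
print(hex(word), fields["op"], fields["rs"], fields["rt"], fields["immediate"])
```

Because all three formats keep OP in the same position, the hardware can read the opcode before it knows which format it is looking at.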
“What” is Computer Architecture ?
[Figure: levels of abstraction, top to bottom, with a CPSC 321 marker]
Application
Operating System
Compiler, Firmware
Instruction Set Architecture (Instr. Set Proc., I/O system)
Datapath & Control
Digital Design
Circuit Design
Layout
• Coordination of many levels of abstraction
• Under a rapidly changing set of forces
• Design, Measurement, and Evaluation
Impact of changing ISA
• Early 1990’s Apple switched instruction set
architecture of the Macintosh
– From Motorola 68000-based machines
– To PowerPC architecture
• Intel 80x86 Family: many implementations
of same architecture
– program written in 1978 for 8086 can be run
on latest Pentium chip
Factors affecting ISA ???
[Figure: Computer Architecture at the center, surrounded by the factors that shape it]
Technology, Programming Languages, Applications, Operating Systems, History, Cleverness
ISA: Critical Interface
[Figure: the instruction set as the interface between software (above) and hardware (below)]
Examples: 80x86 50,000,000 vs. MIPS 5500,000 ???
The Big Picture
[Figure: the five components of a computer]
Processor: Control and Datapath
Memory
Input
Output
Since 1946 all computers have had 5 components!!!
Example Organization
• TI SuperSPARC™ TMS390Z50 in Sun SPARCstation 20
[Figure: block diagram of the SPARCstation 20. Labeled blocks: MBus Module; SuperSPARC (Floating-point Unit, Integer Unit, Inst Cache, Data Cache, Ref MMU, Store Buffer, Bus Interface); L2 $ with CC; MBus; L64852 MBus control; M-S Adapter; DRAM Controller; SBus; SBus DMA; SBus Cards; SCSI; Ethernet; STDIO (serial, kbd, mouse, audio, RTC); Floppy]
Technology Trends
• Processor
– logic capacity: about 30% per year
– clock rate: about 20% per year
• Memory
– DRAM capacity: about 60% per year (4x every 3 years)
– Memory speed: about 10% per year
– Cost per bit: improves about 25% per year
• Disk
– capacity: about 60% per year
– Total use of data: 100% per 9 months!
• Network Bandwidth
– Bandwidth increasing more than 100% per year!
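The annual rates above compound, which is where "4x every 3 years" comes from. A minimal sketch of that arithmetic follows; the helper names `growth_factor` and `doubling_time_years` are hypothetical, and the rates are the approximate figures quoted on the slide.

```python
import math


def growth_factor(annual_rate, years):
    """Total growth after compounding `annual_rate` (0.60 means 60%) for `years` years."""
    return (1.0 + annual_rate) ** years


def doubling_time_years(annual_rate):
    """Years needed to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


# DRAM capacity at about 60% per year: roughly 4x in 3 years.
dram_3yr = growth_factor(0.60, 3)

# Clock rate (about 20%/year) versus memory speed (about 10%/year):
# the ratio between the two widens every year.
gap_10yr = growth_factor(0.20, 10) / growth_factor(0.10, 10)

print(round(dram_3yr, 3), round(doubling_time_years(0.60), 2), round(gap_10yr, 2))
```

The widening ratio between processor and memory speed is one reason memory systems get their own part of the course.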
Technology Trends
[Figure: Microprocessor Logic Density. Transistors per chip (log scale, 1,000 to 100,000,000) versus year (1965 to 2005); labeled points include i4004, i8086, i80286, i80386, i80486, Pentium, SU MIPS, R3010, R4400, and R10000; families shown: i80x86, M68K, MIPS, Alpha]
DRAM chip capacity
Year    DRAM Size
1980    64 Kb
1983    256 Kb
1986    1 Mb
1989    4 Mb
1992    16 Mb
1996    64 Mb
1999    256 Mb
2002    1 Gb
In ~1985 the single-chip processor (32-bit) and the single-board computer emerged
In the 2002+ timeframe, these may well look like mainframes compared to a single-chip
computer (maybe 2 chips)
Technology Trends
Smaller feature sizes – higher speed, density
ECE/CS 752; copyright J. E. Smith, 2002 (Univ. of Wisconsin)
Technology Trends
Number of transistors doubles every 18 months
(amended to 24 months)
ECE/CS 752; copyright J. E. Smith, 2002 (Univ. of Wisconsin)
Levels of Representation
High Level Language Program
temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;
Compiler
Assembly Language Program
lw $15, 0($2)
lw $16, 4($2)
sw $16, 0($2)
sw $15, 4($2)
Assembler
Machine Language Program
0000 1001 1100 0110 1010 1111 0101 1000
1010 1111 0101 1000 0000 1001 1100 0110
1100 0110 1010 1111 0101 1000 0000 1001
0101 1000 0000 1001 1100 0110 1010 1111
Machine Interpretation
Control Signal Specification
ALUOP[0:3] <= InstReg[9:11] & MASK
Execution Cycle
Instruction Fetch: Obtain instruction from program storage
Instruction Decode: Determine required actions and instruction size
Operand Fetch: Locate and obtain operand data
Execute: Compute result value or status
Result Store: Deposit results in storage for later use
Next Instruction: Determine successor instruction
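The steps of the execution cycle can be sketched as a loop over a toy program. This is an illustration only: the tuple-based instruction format, the `add`/`sub`/`halt` operations, and the register names are hypothetical and are not the MIPS ISA.

```python
def run(program, registers):
    """Interpret a toy program, one pass through the execution cycle per instruction."""
    pc = 0
    while pc < len(program):
        instr = program[pc]                      # instruction fetch
        op, dest, src1, src2 = instr             # instruction decode
        if op == "halt":
            break
        a, b = registers[src1], registers[src2]  # operand fetch
        if op == "add":                          # execute
            result = a + b
        elif op == "sub":
            result = a - b
        else:
            raise ValueError(f"unknown operation: {op}")
        registers[dest] = result                 # result store
        pc += 1                                  # next instruction
    return registers


program = [
    ("add", "r3", "r1", "r2"),   # r3 = r1 + r2
    ("sub", "r4", "r3", "r1"),   # r4 = r3 - r1
    ("halt", None, None, None),
]
regs = run(program, {"r1": 5, "r2": 7, "r3": 0, "r4": 0})
print(regs)
```

In this sketch the successor is always the next instruction; a branch or jump would instead assign a new value to `pc` in the last step.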
The Role of Performance
Example of Performance Measure
Performance Metrics
• Response Time
– Delay between the start and end time of a task
• Throughput
– Number of tasks completed per given time
• New: Power/Energy
– Energy per task, power
Examples
(Throughput/Performance)
• Replace the processor with a faster
version?
– 3.8 GHz instead of 3.2 GHz
• Add an additional processor to a system?
– Core Duo instead of P4
Measuring Performance
• Wall-clock time (or Total Execution Time)
• CPU Time
– User Time
– System Time
Try using the time command on a UNIX system
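The same three quantities that the UNIX time command reports (real, user, sys) can also be read from inside a program. A minimal sketch using the Python standard library follows; `measure` and `busy_work` are hypothetical names, and the measured values vary from machine to machine and from run to run.

```python
import os
import time


def measure(task):
    """Run `task` and report wall-clock, user CPU, and system CPU time in seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    task()
    cpu_end = os.times()
    wall_end = time.perf_counter()
    return {
        "wall": wall_end - wall_start,               # total elapsed (response) time
        "user": cpu_end.user - cpu_start.user,       # CPU time spent in the program itself
        "system": cpu_end.system - cpu_start.system, # CPU time spent in the OS on its behalf
    }


def busy_work():
    # CPU-bound placeholder workload (an assumption, chosen only to take measurable time).
    return sum(i * i for i in range(200_000))


timings = measure(busy_work)
print(timings)
```

For a CPU-bound task, user time is usually close to wall-clock time; waiting on I/O or on other processes makes wall-clock time larger than user plus system time.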
Relating the Metrics
• Performance = 1 / Execution Time
• CPU Execution Time = CPU clock cycles for a program × Clock cycle time
• CPU clock cycles = Instructions for a program × Average clock cycles per instruction
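These relations can be checked with a few lines of arithmetic. In the sketch below the instruction count and the cycles per instruction are made-up values; the 3.2 GHz and 3.8 GHz clock rates are the ones from the earlier processor-upgrade example, and clock cycle time is taken as 1 / clock rate.

```python
def cpu_time(instruction_count, cycles_per_instruction, clock_rate_hz):
    """CPU execution time in seconds = clock cycles x clock cycle time."""
    clock_cycles = instruction_count * cycles_per_instruction
    clock_cycle_time = 1.0 / clock_rate_hz
    return clock_cycles * clock_cycle_time


# Hypothetical program: 1 billion instructions at an average of 2 cycles each.
t_slow = cpu_time(1_000_000_000, 2.0, 3.2e9)
t_fast = cpu_time(1_000_000_000, 2.0, 3.8e9)

# Performance = 1 / execution time, so the performance ratio is t_slow / t_fast.
speedup = t_slow / t_fast
print(t_slow, t_fast, speedup)
```

With the instruction count and cycles per instruction unchanged, the speedup equals the ratio of the clock rates, 3.8 / 3.2.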
Amdahl’s Law
• Pitfall: Expecting the improvement of one aspect of a
machine to increase performance by an amount
proportional to the size of improvement
Amdahl’s Law [contd…]
• A program runs in 100 seconds on a machine, with multiply
operations responsible for 80 seconds of this time. How much do I
have to improve the speed of multiplication if I want my program to
run five times faster?
• Execution time after improvement =
(exec time affected by improvement / amount of improvement) + exec
time unaffected
exec time after improvement = (80 seconds / n) + (100 – 80) seconds
We want performance to be 5 times faster, so the target is 100 / 5 = 20 seconds =>
20 seconds = (80 seconds / n) + 20 seconds
0 = 80 / n !!!!
No finite n satisfies this: the 20 seconds not spent on multiplication already use up the entire target time.
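The calculation can be replayed in code. This is a minimal sketch of the formula above; the function names are hypothetical, and returning `None` for an unreachable target is a choice made for this example.

```python
def time_after_improvement(total, affected, n):
    """Execution time when the affected part becomes n times faster."""
    return affected / n + (total - affected)


def required_improvement(total, affected, target_speedup):
    """Factor n needed on the affected part to reach an overall speedup.

    Returns None when the target cannot be reached by any finite n.
    """
    target_time = total / target_speedup
    unaffected = total - affected
    if target_time <= unaffected:
        return None
    return affected / (target_time - unaffected)


# 100-second program, 80 seconds of it in multiplication.
print(required_improvement(100, 80, 4))     # 16.0: multiply must become 16x faster
print(required_improvement(100, 80, 5))     # None: the 5x target is unreachable
print(time_after_improvement(100, 80, 16))  # 25.0 seconds, i.e. 4x overall
```

Even a 4x overall speedup needs multiplication to become 16 times faster, and the required factor grows without bound as the target approaches 5x.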
Amdahl’s Law [contd…]
• Opportunity for improvement is affected by
how much time the event consumes
• Make the common case fast
• Very high speedup requires making nearly
every case fast
• Focus on overall performance, not one
aspect
Summary
• Computer Architecture = Instruction Set Architecture + Machine
Organization
• All computers consist of five components
– Processor: (1) datapath and (2) control
– (3) Memory
– (4) Input devices and (5) Output devices
• Not all “memory” is created equal
– Cache: fast (expensive) memory is placed closer to the
processor
– Main memory: less expensive memory, so we can have more of it
• Interfaces are where the problems are - between functional units
and between the computer and the outside world
• Need to design against constraints of performance, power, area and
cost
Summary
• Performance is in the “eye of the beholder”
Seconds/program =
(Instructions/program) × (Clock cycles/instruction) × (Seconds/clock cycle)
• Amdahl’s Law: “Make the Common Case Faster”