EE282-Autumn 2002 - Oregon State University

ECE472
Computer Architecture
Patrick Chiang
TA: Kang-Min Hu
Department of Electrical Engineering
Oregon State University
http://eecs.oregonstate.edu/~pchiang
EE472 – Spring 2007
Lecture 1 - 1
P. Chiang, with Slide Help from
C. Kozyrakis (Stanford)
Is this class for you?
• This class will not be easy
– My first quarter of teaching computer architecture at Oregon State
– Assumes good mastery of basic assembly language programming
– What is the class makeup?
• ECE: 1/2
• CS: 1/2
– This is “ECE472”, and emphasizes the hardware side of Comp. Arch.
• There is CS472 in Spring 2008 quarter
• Class Breakdown
– 5 Homeworks: 10%
– 1 Midterm: 20%
– 1 Project: 30%
– 1 Final: 40%
• Average grade: around B/B+, with some flexibility
Today: What’s the big picture?
• Syllabus: Given this Thursday
• Start with the C-code
• Do the assembly language
• FIRST: How to evaluate whether a computer is “fast”, or “good”?
– Execution Time (time to run one or more processes)
– Power
– Cost
– Flexibility (complexity, programmability)
What do Computer Architects Do?
[Figure: the computer architect at the center, connecting Applications and Software Requirements, Interfaces (API, ISA, Link, I/O Chan), Machine Organization (Regs, IR), Technology (ECE471: Digital VLSI), and Measurement & Analysis]
The science/art of constructing efficient systems for computing tasks
What is Computer Architecture?
• Understanding every level of the complete system:
– Software
– Compiler
– Computer Architecture
– VLSI digital circuit design
• For SOC, even analog/mixed-signal design
– Devices
• For an engineer, you must understand “depth” and “breadth”
– Everything is related
– Must understand every level of the problem to make the right “choices”
• Cannot just black-box and say: “Not my problem. Someone else will solve it.”
• Choice of where you want to go next depends on understanding changes along the entire vertical structure
– How is the technology changing?
– Are there fundamental shifts?
• i.e. multi-core, parallel processing
Execution Time = ?
Write Some C Code for Me
• C code
• What does the compiler do?
– Assembly language
Now that we have assembly code, how do we
evaluate performance?
• Execution time =
• Is execution time the only metric for performance?
• What about power?
• What about cost?
• What about usability/programmability?
Notice one thing about your C Code:
Application Specific
• Where are you running this code?
– Laptop
– Desktop
– Cellphone
– Google Server Farm
– Digital Signal Processor
• Each application has completely different fundamentals and constraints
Do a DSP Calculation Now
• Write C-code for DSP
– i.e. Polygon Rendering for X-box Halo 3
– MP3 Decode
• Write assembly code for this:
Do a Transaction Processing Code Now
• Google query?
Processor-based Digital Systems
• Systems with a programmable, general-purpose processor
– Advantages ??
• Computers are the canonical example
– PCs, laptops, workstations, …
• However, most processors are embedded or in servers
– Game consoles, PDAs, cell phones, …
– Printers, car electronics system, …
– Web servers, database servers, …
FUTURE: Why are we going here--?
Overall System Architecture
• Multiple interacting layers
– Term “architecture” used with all of them
• This class focuses on
– Hardware architecture
• Memory, interconnect, IO
• Clusters
• Reliability & low power systems
– Hardware-software interaction
• Programming for performance
• OS support
• Cluster programming
• Virtual machines & security
[Figure: system layer stack. Software: Application, Libraries, Operating System (Drivers, VM SW, Scheduler). Hardware: Processor, VM HW, System Bus, Main Memory Controller, Graphics HW, IO Bus(es), IO Controller, Net Controller]
Application: Constraints & Opportunities
• Applications drive machine ‘balance’
– Scientific computations
• Floating-point performance
• Main memory bandwidth
– Transaction/web processing
• ??
– Multimedia processing
• ??
– Embedded control
• ??
Architecture concepts typically exploit application behavior
Applications Change over Time
• Data-sets & memory requirements → larger
– Cache & memory architecture become more critical
• Standalone → networked
– IO integration & system software become more critical
• Single task → multiple tasks
– Parallel architectures become critical
• Limited IO requirements → rich IO requirements
– 60s: tapes & punch cards
– 70s: character oriented displays
– 80s: video displays, audio, hard disks
– 90s: 3D graphics; networking, high-quality audio
– 00s: real-time video, immersion, …
Application Properties to
Exploit in Computer Design
• Locality in memory/IO references
– Programs work on subset of instructions/data at any point in time
– Both spatial and temporal locality
• Parallelism
– Data-level (DLP): same operation on every element of a data sequence
– Instruction-level (ILP): independent instructions within sequential program
– Thread-level (TLP): parallel tasks within one program
– Multi-programming: independent programs
– Pipelining
• Predictability
– Control-flow direction, memory references, data values
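As a rough illustration of why locality matters to a cache, the sketch below simulates a small direct-mapped cache and compares hit rates for three access patterns. The cache geometry (64-byte lines, 256 lines), the function name `hit_rate`, and the three address traces are assumptions chosen for this example; they do not come from the lecture.

```python
def hit_rate(addresses, line_bytes=64, num_lines=256):
    """Return the hit fraction of a direct-mapped cache for a byte-address trace."""
    tags = [None] * num_lines  # one stored line number per cache slot
    hits = 0
    total = 0
    for addr in addresses:
        line = addr // line_bytes   # which memory line holds this byte
        slot = line % num_lines     # direct-mapped: each line maps to one slot
        if tags[slot] == line:
            hits += 1
        else:
            tags[slot] = line       # miss: fetch the line, evict the old one
        total += 1
    return hits / total if total else 0.0


# Hypothetical traces of 4096 accesses each.
sequential = [4 * i for i in range(4096)]       # spatial locality: walk 4-byte words
strided = [64 * i for i in range(4096)]         # no locality: a new line every access
repeated = [4 * (i % 64) for i in range(4096)]  # temporal locality: reuse 256 bytes

for name, trace in [("sequential", sequential),
                    ("strided", strided),
                    ("repeated", repeated)]:
    print(f"{name:10s} hit rate = {hit_rate(trace):.4f}")
```

The sequential walk hits on 15 of every 16 accesses because each miss brings in a whole line, the strided walk never hits, and the repeated trace misses only on its first touch of each line.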
Technology Trends & Constraints:
Yearly Improvement
• Integrated circuits: logic
– 60% more devices per chip
– 15% faster devices
– Long wires don’t improve
• Integrated circuits: DRAM
– 60% more devices per chip
– 7% reduction in latency
– 14% increase in bandwidth
• Magnetic Disks
– 60% to 100% increase in density
• IO/networking
– Little improvement in latency
– Large improvements in bandwidth through fast/wide signaling
[Figure: chips from 1992, 1995, 1998, and 2001; by 2001, 64x more devices since 1992 and 4x faster devices]
Changes in Technology & Applications lead to
Changes in Architecture
• 1970s
– Multi-chip CPUs
– Semiconductor memory very expensive
– Complex instruction sets (good code density)
– Microcoded control
• 1980s
– 5K – 500 K transistors
– Single-chip, pipelined CPUs
– On-chip memory possible
– Simple, hard-wired control
– Simple instruction sets
– Small on-chip caches
• 1990s
– 1 M - 64M transistors, 64b CPUs
– Complex control to exploit instruction-level parallelism
– Deep pipelines
– Multi-level caches
• 2000s
– 100 M - 5 B transistors
– Slow wires, power consumption, design, complexity, memory latency, IO bottlenecks, …
– Multiprocessors & parallel systems
– Support & programming for parallelism?
– <<your Ph.D. thesis goes here>>
Keeps computer architecture interesting and challenging
Rules of Thumb in Data Engineering
by J. Gray and Prashant Shenoy
Storage
1. Moore’s Law: Things get 4x denser every three years.
2. You need an extra bit of addressing every 18 months.
3. Storage capacities increase 100x per decade.
4. Storage device throughput increases 10x per decade.
5. Disk data cools 10x per decade.
6. Disk page sizes increase 5x per decade.
7. NearlineTape:OnlineDisk:RAM storage cost ratios are approximately
1:3:300.
8. In ten years RAM will cost what disk costs today.
9. A person can administer a million dollars of disk storage
– Disks are replacing tapes as backup devices.
– On random workloads, disk mirroring is preferable to RAID5 parity because it
spends disk space (which is plentiful) to save disk accesses (which are precious).
Metrics of Efficiency
• Desktop computing ($500 - $3K)
– Metrics: ??
– Prominent processors: Intel Pentium, AMD Athlon, PowerPC G5
• Server computing ($3K - $1M)
– Metrics: ??
– Prominent processors: IBM Power5, Sun UltraSparc, AMD Opteron
• Embedded computing ($10 - $500)
– Metrics: ??
– Prominent processors: ARM, MIPS, Motorola 68K, many others
Diversity in requirements leads to diversity in architectures
Performance Metrics
Plane              DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747         6.5 hours     610 mph    470          286,700
BAC/Sud Concorde   3 hours       1350 mph   132          178,200
• Latency or execution time or response time
– Wall-clock time to complete a task
– Important if all we have to run is a single task or a time-critical task
• Bandwidth or throughput or execution rate
– Number of tasks completed per unit of time
• Bandwidth = total amount of work / total execution time
– Metric is independent of exact number of tasks executed
– Important when we have many tasks to run
• What about Power? What about Cost? What about Reliability?
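The table's throughput column can be reproduced from the other columns, which makes the latency versus throughput contrast concrete. This is a small sketch using the slide's own figures; the dictionary layout and the helper name `throughput_pmph` are hypothetical.

```python
# Figures taken from the slide's table (DC to Paris).
planes = {
    "Boeing 747": {"trip_hours": 6.5, "speed_mph": 610, "passengers": 470},
    "Concorde": {"trip_hours": 3.0, "speed_mph": 1350, "passengers": 132},
}


def throughput_pmph(plane):
    """Passenger-miles per hour: passengers carried times cruising speed."""
    return plane["passengers"] * plane["speed_mph"]


b747 = planes["Boeing 747"]
concorde = planes["Concorde"]

# Latency view: how many times longer the 747 trip takes.
latency_ratio = b747["trip_hours"] / concorde["trip_hours"]
# Throughput view: how many times more passenger-miles per hour the 747 moves.
throughput_ratio = throughput_pmph(b747) / throughput_pmph(concorde)

print(f"Concorde is {latency_ratio:.2f}x faster per trip")
print(f"747 delivers {throughput_ratio:.2f}x the throughput")
```

Which plane is "better" depends on the metric: the Concorde wins on latency, the 747 on throughput.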
Examples
• Latency metric: program execution time in seconds
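$$\text{CPUtime} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Cycles}}{\text{Program}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
$$= \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}} = IC \times CPI \times CCT$$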
– Your system architecture can affect all of them
• CPI: memory latency, IO latency, …
• CCT: cache organization, …
• IC: OS overhead, …
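The product of instruction count, cycles per instruction, and clock cycle time is easy to experiment with. Below is a minimal sketch; the function name `cpu_time` and all of the numbers (2 billion instructions, CPI values, a 1 GHz clock) are made-up inputs for illustration, not measurements from the lecture.

```python
def cpu_time(instruction_count, cpi, clock_cycle_time_s):
    """CPU time in seconds as IC x CPI x CCT."""
    return instruction_count * cpi * clock_cycle_time_s


# Hypothetical baseline: 2e9 instructions, CPI of 1.5, 1 ns cycle (1 GHz).
baseline = cpu_time(2e9, 1.5, 1e-9)

# Hypothetical change: a better memory system lowers CPI to 1.2,
# with the same program and the same clock.
improved = cpu_time(2e9, 1.2, 1e-9)

print(f"baseline: {baseline:.2f} s")
print(f"improved: {improved:.2f} s")
print(f"ratio:    {baseline / improved:.2f}x")
```

Because the three factors multiply, a change that improves one factor only helps if it does not make another factor worse by a larger ratio.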
A is Faster than B?
• Given the CPUtime for machines A and B, A is X times faster than B means:
$$X = \frac{\text{CPUTime}_B}{\text{CPUTime}_A}$$
• Example: CPUtimeA=3.4sec & CPUtimeB=5.3sec, then
– A is 5.3/3.4≈1.56 times faster than B, or about 56% faster
• If you start with bandwidth metrics of performance, use inverse ratio
$$X = \frac{\text{BandWidth}_A}{\text{BandWidth}_B}$$
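The two ratios on this slide give the same answer when bandwidth is just the inverse of execution time. The sketch below checks that with the slide's example times; the function names are hypothetical.

```python
def times_faster_from_time(cpu_time_a, cpu_time_b):
    """X such that A is X times faster than B, from execution times."""
    return cpu_time_b / cpu_time_a


def times_faster_from_bandwidth(bandwidth_a, bandwidth_b):
    """Same X, from rates (tasks per second): the ratio is inverted."""
    return bandwidth_a / bandwidth_b


time_a, time_b = 3.4, 5.3  # seconds, from the slide's example

x_time = times_faster_from_time(time_a, time_b)
# If each machine runs one task back to back, its rate is 1 / time.
x_rate = times_faster_from_bandwidth(1.0 / time_a, 1.0 / time_b)

print(f"from times:     {x_time:.4f}")
print(f"from bandwidth: {x_rate:.4f}")
print(f"A is about {(x_time - 1.0) * 100:.0f}% faster than B")
```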
Speedup and Amdahl’s Law
• Speedup = CPUtimeold / CPUtimenew
• Given an optimization x that accelerates fraction fx of program by a
factor of Sx, how much is the overall speedup?
$$\text{Speedup} = \frac{\text{CPUTime}_{old}}{\text{CPUTime}_{new}} = \frac{\text{CPUTime}_{old}}{\text{CPUTime}_{old}\left[(1 - f_x) + \frac{f_x}{S_x}\right]} = \frac{1}{(1 - f_x) + \frac{f_x}{S_x}}$$
• Lessons from Amdahl’s law
– Make common cases fast: as fx→1, speedup→Sx
– But don’t overoptimize common case: as Sx→∞, speedup→ 1 / (1-fx)
• Speedup is limited by the fraction of the code that can be accelerated
• Uncommon case will eventually become the common one
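Amdahl's law and its two limiting cases can be checked numerically. This is a minimal sketch; the function name `amdahl_speedup` and the sample values of fx and Sx are assumptions chosen for illustration.

```python
def amdahl_speedup(f_x, s_x):
    """Overall speedup when a fraction f_x of the program is sped up by s_x."""
    if not 0.0 <= f_x <= 1.0:
        raise ValueError("f_x must be between 0 and 1")
    if s_x <= 0.0:
        raise ValueError("s_x must be positive")
    return 1.0 / ((1.0 - f_x) + f_x / s_x)


# Half the program made 10x faster: well short of 10x overall.
print(amdahl_speedup(0.5, 10.0))

# As f_x approaches 1, the overall speedup approaches s_x.
print(amdahl_speedup(1.0, 10.0))

# As s_x grows without bound, the speedup approaches 1 / (1 - f_x).
print(amdahl_speedup(0.9, 1e12))
```

With 90% of the program accelerated, even an arbitrarily large Sx cannot push the overall speedup past 10.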
Amdahl’s Law Example
• If Sx=100, what is the overall speedup as a function of fx?
[Figure: Speedup vs Optimized Fraction; x-axis: Fraction of Code Optimized (0 to 1); y-axis: Speedup (0 to 100)]
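The points of this curve can be tabulated directly from Amdahl's law with Sx=100. The sketch below is one way to do it; the helper name `speedup` and the choice of eleven evenly spaced fractions are assumptions for this example.

```python
def speedup(f_x, s_x):
    """Amdahl's law: overall speedup for optimized fraction f_x and factor s_x."""
    return 1.0 / ((1.0 - f_x) + f_x / s_x)


S_X = 100.0

# (fraction optimized, overall speedup) for fractions 0.0, 0.1, ..., 1.0
curve = [(i / 10, speedup(i / 10, S_X)) for i in range(11)]

for f_x, s in curve:
    print(f"fx = {f_x:.1f}  speedup = {s:6.2f}")
```

The table shows how sharply the curve bends: at fx=0.5 the overall speedup is still below 2, and at fx=0.9 it is below 10, so nearly all of the 100x gain arrives only as fx gets very close to 1.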
Historical Trend for Computer Performance
[Figure: Integer Performance (log scale, 1 to 1000) vs. year (1985 to 2001), trend line of 55% faster per year. Processors plotted: Intel 386, 486, Pentium, Pentium 2, Pentium 3, Pentium 4, Itanium; Alpha 21064, 21164, 21264; Sparc, SuperSparc, Sparc64; Mips; HP PA; PowerPC; AMD K6, K7]
To Put it Into Perspective
• 1982-2000: computers getting 55% faster per year
– Total of 4,000x
– Significant cost improvements as well
• What if other areas showed similar improvement rates?
– Cars: 176,000 mph or 64,000 miles/gal
– Airplanes: LA to NY in 5.5sec (MACH 3200)
– Wheat: 320,000 bushels per acre
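The compounding behind the 55%-per-year figure can be checked in a few lines. This is a sketch with hypothetical function names. By this arithmetic, 18 years of 55% growth compounds to roughly 2,700x and about 19 years are needed to reach 4,000x, so the slide's total is best read as a rounded figure.

```python
import math


def compound_growth(rate_per_year, years):
    """Total improvement factor after compounding a yearly rate."""
    return (1.0 + rate_per_year) ** years


def years_to_reach(factor, rate_per_year):
    """Years of compounding needed to reach a total improvement factor."""
    return math.log(factor) / math.log(1.0 + rate_per_year)


growth_18 = compound_growth(0.55, 18)  # 1982 to 2000
growth_19 = compound_growth(0.55, 19)
years_for_4000x = years_to_reach(4000.0, 0.55)

print(f"18 years at 55%/year: {growth_18:,.0f}x")
print(f"19 years at 55%/year: {growth_19:,.0f}x")
print(f"years to reach 4,000x: {years_for_4000x:.1f}")
```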
Digital System Cost
• Cost is a very important design constraint
– Most digital systems are consumer electronics products
• Cost distribution for $1K PC
– Processor board: 37%
• Processor, memory, …
– IO devices: 37%
• Hard disk, DVD, monitor, keyboard, …
– Software: 20%
– Cabinet: 6%
• Integrated circuits represent significant part of the system cost
– Processor, memory, hard disk controller, graphics chips, networking chip
Cost of Integrated Circuits
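$$\text{IC cost} = \frac{\text{Die cost} + \text{Testing cost} + \text{Packaging cost}}{\text{Final test yield}}$$
$$\text{Die cost} = \frac{\text{Wafer cost}}{\text{Dies per wafer} \times \text{Die yield}}$$
$$\text{Die yield} = \text{Wafer yield} \times \left(1 + \frac{\text{Defect density} \times \text{Die area}}{\alpha}\right)^{-\alpha}$$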
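The three cost relations on the preceding slide (IC cost, die cost, die yield) chain together, so a short calculation shows how die area feeds into final chip cost. The sketch below is written under stated assumptions: the yield model takes a process parameter `alpha` as in the common textbook form of this equation, and every number in the usage example (wafer cost, dies per wafer, defect density, alpha, test and packaging cost) is invented for illustration.

```python
def die_yield(wafer_yield, defect_density, die_area, alpha):
    """Fraction of good dies; defect_density and die_area must use matching units."""
    return wafer_yield * (1.0 + defect_density * die_area / alpha) ** (-alpha)


def die_cost(wafer_cost, dies_per_wafer, yield_fraction):
    """Wafer cost spread over the good dies only."""
    return wafer_cost / (dies_per_wafer * yield_fraction)


def ic_cost(die, testing, packaging, final_test_yield):
    """Cost per shippable chip after final test losses."""
    return (die + testing + packaging) / final_test_yield


# Hypothetical process: 0.5 defects per cm^2, alpha = 4, perfect wafer yield.
small = die_yield(1.0, 0.5, 1.0, 4.0)  # 1 cm^2 die
large = die_yield(1.0, 0.5, 2.0, 4.0)  # 2 cm^2 die

# Hypothetical $5,000 wafer holding 100 small dies or 50 large dies.
small_die = die_cost(5000.0, 100, small)
large_die = die_cost(5000.0, 50, large)

print(f"yield: small {small:.3f}, large {large:.3f}")
print(f"die cost: small ${small_die:.2f}, large ${large_die:.2f}")
print(f"IC cost (small): ${ic_cost(small_die, 5.0, 10.0, 0.95):.2f}")
```

Doubling the die area more than doubles the die cost here, because the larger die both yields worse and fits fewer times on the wafer.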
Chip Cost is a Function of Size
[Figure: Unpackaged Cost ($0 to $250) vs. Chip Size (0 to 20 mm)]
Chip cost increases roughly with (die area)^4
Cost – Performance Tradeoff
• The trade-off
– Chip cost is primarily a function of (die area)^4
– But bigger dies provide more resources for higher performance
• The goal of a good architect
– Find the knee of the performance-cost curve OR
– Get maximum performance for a fixed cost target
[Figure: Performance vs. Cost curve]
Other Cost Contributors
• Testing cost
– Cost/die = (cost/hour x test time) / yield
– Could be $10-$20 or more for complex chips
• IC Packaging
– Depends on die size, number of pins, and power dissipation
• Cost of cooling system
– <2W no heat-sink, <10W no fan, >100W liquid/spray cooling
• And most of all, do not forget VOLUME
– Cost of a modern IC fabrication facility: >$2B
– Cost of a set of masks for a wafer: $0.5M - $1M
– Design NRE cost: often ~$10M
– Need volume to amortize all this cost…
Cost Vs Price
• Price is really what your customer cares about
• Price components for a system vendor
– Component cost: buying the parts
• 47% of list price for $1K PC
– Direct costs: labor, warranties, dealing with scrap, …
• 10% of list price for $1K PC
– Gross margin: company overhead
• R&D, marketing, sales, buildings, maintenance, taxes, …
• 19% of list price for $1K PC
– Average discount: plan for volume discounts…
• 25% of list price for $1K PC
• As computers become commodity components, price matters a lot!
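The percentages above can be turned into dollar amounts, or used to estimate a list price from a parts budget. This is a rough sketch; the names `PRICE_SHARES`, `list_price_from_components`, and `breakdown` are hypothetical, and it assumes the shares for the $1K PC carry over unchanged to other price points.

```python
# Shares of list price for the $1K PC example on the slide.
# They add up to 101%, presumably because each figure is rounded.
PRICE_SHARES = {
    "component cost": 0.47,
    "direct costs": 0.10,
    "gross margin": 0.19,
    "average discount": 0.25,
}


def list_price_from_components(component_cost):
    """Estimate list price, assuming parts stay at 47% of list price."""
    return component_cost / PRICE_SHARES["component cost"]


def breakdown(list_price):
    """Dollar amount of each price component for a given list price."""
    return {name: share * list_price for name, share in PRICE_SHARES.items()}


for name, dollars in breakdown(1000.0).items():
    print(f"{name:17s} ${dollars:7.2f}")

print(f"list price for $470 of parts: ${list_price_from_components(470.0):.2f}")
```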
Historical Trend for Processor Price
Summary
• Computer architecture:
– Design of efficient systems given the requirements of applications and the
capabilities/constraints of technology
– Need to look a few years ahead with both applications & technology
• Applications
– Look for locality, parallelism, and predictability
• Technology
– Dealing with latency, power, and reliability are the upcoming challenges
• Performance & cost
– Two important efficiency metrics for most systems
– Latency Vs. bandwidth performance metrics
– Cost Vs. price
Multiple Processors on Single Chip
• Two processors on single-chip
• Two chips (w/ two processors) in single package
• 16 – 64 – 256 processors on single die
– Stream Processors
– Sun Niagara
• http://www.ece.ucdavis.edu/~ocin06/talks/ho.pdf
What does Moore’s Law buy you?