
MS108 Computer System I
Lecture 1 Introduction
Prof. Xiaoyao Liang
2015/3/4
1
Course Details
• Time: Wed 10:00-11:40am, Fri 10:00-11:40am
• Location: Xia Yuan (下院) 201
• Course Website: http://www.cs.sjtu.edu.cn/~liang-xy/ms108/MS108-L*.ppt
http://www.cs.sjtu.edu.cn/~liang-xy/ms108/hw*.pdf
• Instructor: Xiaoyao Liang, [email protected]
• TA: TBD
• Textbook: Computer Architecture: A Quantitative Approach, Fifth Edition (English reprint edition), John L. Hennessy and David A. Patterson, China Machine Press, published January 1, 2012, ISBN 9787111364580
• Reference: Computer Organization and Design: The Hardware/Software Interface (3rd or 4th edition), David A. Patterson and John L. Hennessy, China Machine Press
• Grades: Homework (40%), Attendance (10%), Midterm Exam (20%), Project (30%)
2
Course Prerequisites
• Computing Hardware or similar
– Logic Design (computer arithmetic)
– Basic ISA (what is a RISC instruction)
– Pipelining (control/data hazards, forwarding)
– We will review the above during the first couple of weeks
• C Programming, Linux
• Background in compilers, OS, or circuits/VLSI is a plus but not required
3
Course Importance
• Embarrassing if you are a BS in CS/CE and can’t make sense of the following terms: DRAM, pipelining, cache hierarchies, I/O, virtual memory
• Embarrassing if you are a BS in CS/CE and can’t decide which processor to buy: 3 GHz Core 2 Duo or 1 GHz ARM (helps us reason about performance/power)
• Obvious first step for chip designers, compiler/OS writers
• Will knowledge of the hardware help me write better programs?
4
Course Importance
• Memory management: if we understand how/where data is placed, we can help ensure that relevant data is nearby
• Thread management: if we understand how threads interact, we can write smarter multi-threaded programs
– Why do we care about multi-threaded programs?
Average Joe Programmer vs. Stephaney Programmer
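To make the memory-management point concrete, here is a minimal sketch of a direct-mapped cache simulator that counts misses for two traversal orders of the same matrix. The cache geometry (64 lines of 64 bytes), the 256x256 matrix of 8-byte elements, and all function names are assumptions chosen for illustration, not parameters from the course.

```python
def count_misses(addresses, num_lines=64, line_bytes=64):
    """Count misses in a direct-mapped cache for a sequence of byte addresses."""
    tags = [None] * num_lines          # block currently held by each cache line
    misses = 0
    for addr in addresses:
        block = addr // line_bytes     # which memory block the address falls in
        index = block % num_lines      # which cache line that block maps to
        if tags[index] != block:       # line holds a different block: miss
            tags[index] = block
            misses += 1
    return misses

# Assumed layout: a ROWS x COLS matrix of 8-byte elements stored row by row.
ROWS, COLS, ELEM_BYTES = 256, 256, 8

def address(r, c):
    """Byte address of element (r, c) in a row-major layout."""
    return (r * COLS + c) * ELEM_BYTES

row_order = [address(r, c) for r in range(ROWS) for c in range(COLS)]
col_order = [address(r, c) for c in range(COLS) for r in range(ROWS)]

row_misses = count_misses(row_order)
col_misses = count_misses(col_order)
print("row-by-row misses:      ", row_misses)
print("column-by-column misses:", col_misses)
```

Under these assumptions the row-by-row walk misses once per 64-byte block, while the column-by-column walk misses on every access, even though both touch exactly the same data.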
5
Course Topics
• Focus on what modern computer architects worry about (both academia and industry)
• Get through the basics of modern processor design
• Understand the interfaces between architecture and system software (compilers, OS)
• System architecture and I/O (disks, memory, multiprocessors)
• Look at technology trends, recent research ideas, and the future of computing hardware
6
Course Arrangement
• Introduction and Performance Metrics (1 week)
• ISA/Basic Pipelining Review (2 weeks)
• Hardware ILP (2 weeks)
• Software ILP (1 week)
• Caches/Memory (2 weeks)
• Modern Processor Case Studies (2 weeks)
• Multiprocessors/Multithreading (2 weeks)
• Input/Output and Interconnects (1 week)
• Research Trends (1 week)
• Impact of Technology Trends on Architecture (1 week)
7
What is Computer Architecture
8
Computers Are Everywhere
• General-Purpose Laptop/Desktop
– Productivity, interactive graphics, video, audio
– Optimize price-performance
– Examples: Intel Core 2 Duo, Nvidia GTX
• Embedded Computers
– PDAs, cell phones, sensors => Price, lifetime
– Examples: iPhone, iPad, Android phone
– Game machines, network µPs => Price-performance
– Examples: Sony PS, Xbox, IBM 750FX
• Data Centers
– HPC, Cloud => Price, throughput, power, cooling
– Examples: Google, Amazon
9
Microprocessor Capacity
2X transistors/chip every 1.5 years
Called “Moore’s Law”
Microprocessors have become smaller, denser, and more powerful.
Gordon Moore (co-founder of Intel) observed in 1965 that the number of transistors on a chip was doubling about every year, and in 1975 revised the rate to about every two years. The figure of roughly 18 months is the commonly quoted version.
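The doubling rule on this slide can be written as N(t) = N0 * 2^(t/T), with T = 1.5 years. A minimal sketch follows; the function name and the starting count of one million transistors are assumptions made only for illustration.

```python
def projected_transistors(initial_count, years, doubling_period_years=1.5):
    """Project a transistor count, assuming a fixed doubling period."""
    return initial_count * 2 ** (years / doubling_period_years)

# Assumed round starting point, not a figure taken from the slides.
start = 1_000_000
for years in (1.5, 3, 15):
    growth = projected_transistors(start, years) / start
    print(f"after {years} years: {growth:.0f}x the starting count")
```

With an 18-month doubling period, 15 years corresponds to ten doublings, which is why the growth curves on the following slide are drawn on a logarithmic axis.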
10
Microprocessor Speed
[Figure: two plots against year. Growth in transistors per chip (log scale, 1,000 to 100,000,000; 1970-2005). Increase in clock rate (MHz, log scale, 0.1 to 1000; 1970-2000). Processors labeled: i4004, i8080, i8086, i80286, i80386, R2000, R3000, Pentium, R10000.]
Why bother with parallel programming? Just wait a year or two…
11
Microprocessor Performance
Move to multi-processor
12
Limit #1: Power Density
Can soon put more transistors on a chip than can afford to turn on.
-- Patterson ‘07
Scaling clock speed (business as usual) will not work
[Figure: power density (W/cm2, log scale, 1 to 10000) vs. year, 1970-2010. Intel processors plotted: 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium®, P6. Reference levels marked: hot plate, nuclear reactor, rocket nozzle, Sun’s surface. Source: Patrick Gelsinger, Intel]
13
Limit #2: ILP Tapped Out
Application performance was increasing by 52% per year, as measured by the SPECint benchmarks
(figure from Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006)
• ½ due to transistor density
• ½ due to architecture changes, e.g., Instruction Level Parallelism (ILP)
• VAX: 25%/year, 1978 to 1986
• RISC + x86: 52%/year, 1986 to 2002
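To see what these annual rates add up to, the sketch below compounds each rate over its period. The function and variable names are assumptions; the rates and year ranges are the ones on the slide.

```python
def cumulative_speedup(annual_rate, start_year, end_year):
    """Total speedup from compounding a fixed annual improvement rate."""
    return (1 + annual_rate) ** (end_year - start_year)

vax_era = cumulative_speedup(0.25, 1978, 1986)    # 8 years at 25%/year
risc_era = cumulative_speedup(0.52, 1986, 2002)   # 16 years at 52%/year

print(f"1978-1986 at 25%/year: about {vax_era:.1f}x")
print(f"1986-2002 at 52%/year: about {risc_era:.0f}x")
```

Compounded, 25% per year over 8 years gives roughly a 6x gain, while 52% per year over 16 years gives a gain on the order of 800x, which is why losing that growth rate matters so much.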
14
Limit #2: ILP Tapped Out
• Superscalar (SS) designs were the state of the art; many forms of parallelism not visible to programmer
– multiple instruction issue
– dynamic scheduling: hardware discovers parallelism between instructions
– speculative execution: look past predicted branches
– non-blocking caches: multiple outstanding memory ops
• You may have heard of these before, but you haven’t needed to know about them to write software
• Unfortunately, these sources have been used up
15
Limit #3: Chip Yield
Manufacturing costs and yield problems limit use of density
• Moore’s (Rock’s) 2nd law: fabrication costs go up
• Yield (% usable chips) drops
• Parallelism can help
– More, smaller, simpler processors are easier to design and validate
– Can use partially working chips, e.g., Cell processor (PS3)
16
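The "partially working chips" point on the chip-yield slide can be illustrated with a simple binomial model. This is a sketch under loud assumptions: each core is taken to work independently with the same probability, and the 90% per-core figure is invented for illustration. The 7-of-8 case mirrors the commonly cited PS3 Cell configuration, in which seven of the eight SPEs are enabled.

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability that at least k of n independent cores work, each with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

core_yield = 0.9  # assumed per-core probability of being defect-free

all_eight = prob_at_least(8, 8, core_yield)       # chip sellable only if every core works
seven_of_eight = prob_at_least(7, 8, core_yield)  # chip sellable with one core disabled

print(f"need 8 of 8 cores: {all_eight:.1%} of chips usable")
print(f"need 7 of 8 cores: {seven_of_eight:.1%} of chips usable")
```

Under these assumed numbers, tolerating a single bad core raises the share of usable chips from about 43% to about 81%.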
Current Situation
• Chip density continues to increase
– Clock speed is not
– Number of processor cores may double instead
• There is little or no hidden parallelism (ILP) to be found
• Parallelism must be exposed to and managed by software
Source: Intel, Microsoft (Sutter) and Stanford (Olukotun, Hammond)
17
Abstraction
• As architects, our main job is to deal with tradeoffs
– Performance, Power, Die Size, Complexity, Application Support, Functionality, Compatibility, Reliability, etc.
• Technology trends, applications… How do we deal with all of this to make real tradeoffs?
• Abstractions allow this to happen
• Focus is on metrics of these abstractions
– Performance, Cost, Availability, Power
18
Computer Components
• Input/output devices
• Secondary storage: non-volatile, slower, cheaper
• Primary storage: volatile, faster, costlier
• Communication: Bus, cable
• CPU/processor
19
IC Manufacturing
20
Processor Technology Trend
• Integrated circuit technology
– Transistor density: 35%/year
– Die size: 10-20%/year
– Integration overall: 40-55%/year
• DRAM capacity: 25-40%/year (slowing)
• Flash capacity: 50-60%/year
– 15-20X cheaper/bit than DRAM
• Magnetic disk technology: 40%/year
– 15-25X cheaper/bit than Flash
– 300-500X cheaper/bit than DRAM
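As a quick sanity check on the cost-per-bit figures, the sketch below chains the Flash-vs-DRAM and disk-vs-Flash ranges and compares the result with the stated disk-vs-DRAM range. The helper name and the idea of multiplying range endpoints are assumptions made for this illustration.

```python
def chain(range_a, range_b):
    """Multiply two (low, high) ratio ranges endpoint by endpoint."""
    return (range_a[0] * range_b[0], range_a[1] * range_b[1])

flash_vs_dram = (15, 20)    # Flash is 15-20X cheaper per bit than DRAM
disk_vs_flash = (15, 25)    # disk is 15-25X cheaper per bit than Flash
stated_disk_vs_dram = (300, 500)

implied = chain(flash_vs_dram, disk_vs_flash)
stated_within_implied = (
    implied[0] <= stated_disk_vs_dram[0] and stated_disk_vs_dram[1] <= implied[1]
)

print("implied disk-vs-DRAM range:", implied)
print("stated range falls inside it:", stated_within_implied)
```

The stated 300-500X range sits inside the range implied by the other two figures, so the three numbers are mutually consistent as rough estimates.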
21
Memory and IO Technology Trend
• Bandwidth or throughput
– Total work done in a given time
– 10,000-25,000X improvement for processors
– 300-1200X improvement for memory and disks
• Latency or response time
– Time between start and completion of an event
– 30-80X improvement for processors
– 6-8X improvement for memory and disks
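A common first-order model ties the two metrics together: time for a transfer is roughly latency plus size divided by bandwidth. The sketch below uses that model with made-up device numbers (10 ms latency, 100 MB/s bandwidth); both the numbers and the function name are assumptions for illustration only.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order model: fixed latency plus size divided by bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s

LATENCY_S = 0.010        # assumed 10 ms access latency
BANDWIDTH_BPS = 100e6    # assumed 100 MB/s sustained bandwidth

for size in (4096, 100_000_000):
    total = transfer_time(size, LATENCY_S, BANDWIDTH_BPS)
    share = LATENCY_S / total
    print(f"{size} bytes: {total:.5f} s total, latency is {share:.1%} of it")
```

In this model small transfers are dominated by latency and large ones by bandwidth, so a device whose bandwidth improves much faster than its latency speeds up large transfers far more than small ones.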
22
Bandwidth Vs. Latency
23