WELCOME TO PARALLEL COMPUTER ARCHITECTURE

WELCOME TO COMPUTER ARCHITECTURE
Course Information
Textbook
– Hennessy & Patterson: Computer Architecture: A Quantitative Approach (3rd edition), Morgan Kaufmann Publishers, 2002
• available in some computer bookstores
• also from: www.china-pub.com …
Class policies
• Assignments
– 4-5
• Project
– 1-2
• Exam
• http://www.tjut.edu.cn/xuebao/arch/
A good way to summarize this class:
• It should teach/review the fundamentals of how a
processor actually does computation
• It should teach how a processor really does computation
– Another way to say this might be: “It should teach the
fundamentals of how a commercial processor does computation”
• It should show how tradeoffs/different options are
considered in the quest for better performance
• It should teach you how to evaluate tradeoffs
Goal of the course
• The goal is to understand the remarkable performance
improvement that has taken place over the last 20
years
– Focus mostly on architectural aspects
– A little on integrated circuits
• What is computer architecture?
– The structure of a computer that a machine language
programmer must understand to write a correct (timing
independent) program for that machine
[Amdahl, Blaauw and Brooks, 1964 (IBM 360 team)]
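Interface
Figure: the layers of a computer system. Software (S/W): Application, library, Operating System, compiler, assembler. The instruction set architecture (ISA) is the interface between the software and the hardware (H/W).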
Computer Architecture Topics 1
• Input/Output and Storage
– Disk, RAID, NIC, FC, SCSI
• Memory Hierarchy: L1 cache, one or more levels of caches (L2, L3…), memory
– Basics: DRAM, SRAM; other technologies (RDRAM)
– Interleaving, bus protocols…
– Coherence, block placement/replacement, bandwidth, latency….
• Instruction Set Architecture
– Addressing, protection, exception handling
• Pipelining and Instruction Level Parallelism
– Pipelining, hazard resolution, superscalar, prediction, speculation
Computer Architecture Topics 2
Figure: processor (P) and memory (M) nodes connected through switches (S) by an interconnection network.
• Shared memory, message passing, data parallelism
• Interconnection network: network interfaces; topologies, routing, latency, bandwidth, reliability
Why learn it?
• Q1:
– You are thinking about buying a new computer. The
new computer includes the latest-and-greatest, super-fast
AMD processor. Your current computer includes
an Intel processor.
– To figure out whether you’d like to buy the new
machine, you want to know how much faster it will run
your programs.
– Borrowing the AMD computer from the store, you
conduct several experiments to figure out how fast
each machine runs your programs.
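A minimal sketch of the comparison Q1 asks for, assuming you have measured the execution time of each of your programs on both machines. "How much faster" is the speedup, old time divided by new time; the program names and timings below are hypothetical.

```python
# Hypothetical measured execution times in seconds: (Intel machine, AMD machine).
times = {
    "compile": (120.0, 80.0),
    "render": (300.0, 250.0),
    "spreadsheet": (10.0, 10.0),
}

def speedup(old_time: float, new_time: float) -> float:
    """How many times faster the new machine is: old time / new time."""
    return old_time / new_time

# Speedup of each program on its own.
per_program = {name: speedup(old, new) for name, (old, new) in times.items()}

# Speedup of the whole workload, i.e. of running every program once.
total_old = sum(old for old, _ in times.values())
total_new = sum(new for _, new in times.values())
overall = speedup(total_old, total_new)

for name, s in per_program.items():
    print(f"{name}: {s:.2f}x")
print(f"whole workload: {overall:.2f}x")
```

The per-program numbers differ, so the answer depends on which programs you actually run and how often.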
• Q2:
– Assume your boss has asked you to achieve a speedup of 1.125
for your company’s flagship processor.
– In order to do this, she has suggested that you add a new
instruction to the instruction set that can replace existing
instructions and decrease the total instruction count in a program.
– The addition you are considering would bring the instruction
count down by a factor of 1.25 for the programs that run on your
processor. Should you take this addition to your boss and
consider your project complete?
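One way to reason about Q2 is the CPU performance equation, CPU time = instruction count × CPI × clock cycle time. The normalized sketch below shows that cutting the instruction count by 1.25x meets the 1.125x target only if CPI and clock cycle time together grow by no more than 1.25 / 1.125 ≈ 1.11x; the 15% cycle-time penalty in the last line is a hypothetical example, not a property of any real instruction.

```python
def cpu_time(instruction_count: float, cpi: float, cycle_time: float) -> float:
    """CPU time = instruction count x cycles per instruction x clock cycle time."""
    return instruction_count * cpi * cycle_time

def speedup(ic_reduction: float, cpi_growth: float = 1.0, cycle_growth: float = 1.0) -> float:
    """Speedup when IC shrinks by ic_reduction while CPI and cycle time grow by the given factors."""
    old = cpu_time(1.0, 1.0, 1.0)  # normalized baseline processor
    new = cpu_time(1.0 / ic_reduction, cpi_growth, cycle_growth)
    return old / new

target = 1.125
ic_reduction = 1.25

# If CPI and cycle time were unchanged, the instruction-count cut alone would suffice.
print(f"{speedup(ic_reduction):.3f}")  # 1.250

# Largest combined growth in CPI x cycle time that still meets the target.
max_overhead = ic_reduction / target
print(f"CPI x cycle time may grow by at most {max_overhead:.3f}x")

# Hypothetical case: the new instruction stretches the clock cycle by 15%.
print(speedup(ic_reduction, cycle_growth=1.15) >= target)  # False
```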
• Q3:
– You have just joined the architecture group of a company that
builds processors for embedded systems. You have been
assigned the enviable task of designing a new instruction set
architecture (ISA) from scratch for a new line of processors for
your company.
– Your boss tells you that by far the most expensive component is
instruction memory, so you should try your hardest to minimize
its size. (Note that data memory is not expensive, for some
obscure reason.) Your boss has also told you that you have to
use a register-based ISA; i.e., a stack- or accumulator-based ISA
is not an option.
Computer Design Process 1
• Iterative process: Ideas/Concepts → Estimate Cost/Projected Performance → Tradeoff → Modify/Tune → Computer
Computer Design Process 2
• OR, viewed as a cycle: Evaluate Existing Systems for Bottlenecks → Simulate New Designs and Organizations → Implement Next Generation System → evaluate again, with Technology Trends as an input
Chip Manufacturing Process
Figure: a wafer with a crystal lattice structure
– Wafer is a circular piece of silicon containing numerous rectangular dies
– One die is essentially a chip
Integrated Circuits 1
Figure: a wafer, a die, and a Pentium 4 processor
Integrated Circuits 2
• Example: an 8-inch wafer (8 inches in diameter) can contain 564 MIPS R20000 dies implemented in a 0.18 µm process
Figure: the same 20 defects on wafers with small and large dies
Gross dies | Defects | Bad dies | Yield
264        | 20      | 20       | 92%
54         | 20      | 16       | 70%
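Yield here is simply the fraction of gross dies that survive. This small sketch recomputes the two percentages from the counts above, which makes the point of the figure explicit: the same number of defects costs far more good dies when the dies are large.

```python
def yield_fraction(gross_dies: int, bad_dies: int) -> float:
    """Fraction of the gross dies on a wafer that are good."""
    return (gross_dies - bad_dies) / gross_dies

# The two wafers from the figure: same 20 defects, small dies vs. large dies.
small = yield_fraction(264, 20)
large = yield_fraction(54, 16)
print(f"small dies: {small:.0%}, large dies: {large:.0%}")  # small dies: 92%, large dies: 70%
```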
Architect’s job
• Design and engineer various parts of a
computer system to maximize performance and
programmability within the technology limits and
cost budget
– Technology limit could mean
• process/circuit technology (how fast can a
transistor be switched?)
• interconnect technology (how fast can a wire
communicate?), in the case of microprocessor
architecture
– Cost limit could mean ..
• It isn't easy!
Computer Architecture: Trends
Processor Technology Trends
Figure: processor performance over time; ■ 1.58x per year, ▲ 1.35x per year
Processor performance: 1.58x per year
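To see how much the gap between the two curves matters, compound each annual rate over a decade. The 10-year horizon is an arbitrary choice for illustration.

```python
def compound(rate_per_year: float, years: int) -> float:
    """Total improvement after growing by rate_per_year for the given number of years."""
    return rate_per_year ** years

for rate in (1.58, 1.35):
    print(f"{rate}x per year for 10 years -> {compound(rate, 10):.0f}x")
```

Over ten years, 1.58x per year compounds to roughly 97x, while 1.35x per year gives only about 20x.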
58% growth rate
• Two major architectural reasons
– Advent of RISC
– Introduction of caches
• Two major impacts
– Highest performance microprocessors today
outperform supercomputers designed less than 10
years ago
– Microprocessor-based products have come to dominate all
sectors of computing
• desktops and workstations are built around microprocessors;
minicomputers have been replaced by microprocessor-based servers, mainframes
by multiprocessors, and supercomputers are built out of
commodity microprocessors (a trend also driven by cost)
The computer market
• Three major sectors
– Desktop:
• ranges from low-end PCs to high-end workstations; market trend
is very sensitive to price-performance ratio
– Server
• used in large-scale computing or service-oriented markets such as
heavy-weight scientific computing, databases, web services, etc.;
reliability, availability and scalability are very important; servers
are normally designed for high throughput
– Embedded
• fast-growing sector; very price-sensitive; present in most day-to-day appliances such as microwave ovens, washing machines,
printers, network switches, palmtops, cell phones, smart cards,
game engines; software is usually specialized/tuned for one
particular system
The applications
• Very different in three sectors
– This difference is the main reason for different design styles in these
three areas
– Desktop market
• demands leading-edge microprocessors and high-performance
graphics engines; must offer balanced performance for a wide
range of applications; customers are willing to spend a
reasonable amount of money for high performance, i.e., the metric
is price-performance
– Server market
• integrates high-end microprocessors into scalable
multiprocessors; throughput is very important, whether floating-point, graphics, or transaction throughput
– Embedded market
• adopts high-end microprocessor techniques while paying close
attention to low price and low power; processors are either
general purpose (to some extent) or application-specific
Technology trends
• Very important to understand in order to extend the useful life of a
design
– IC technology
• transistor density increases by 35% per year; die size increases
by 10% to 20% per year;
• when combined, transistor count on chip increases by 55% per
year
– (roughly follows Moore’s law)
– DRAM technology
• density increases by 40% to 60% per year; cycle time decreases
by about 1/3 every 10 years; bandwidth per chip increases about
twice as fast as latency decreases
– Disk technology
• density increases by more than 100% per year; access time
decreases by 1/3 every 10 years
– Network technology
• bandwidth is the main focus and roughly doubles every year
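The per-chip transistor figure follows from multiplying the two independent growth factors listed above. This sketch shows that 1.35 × (1.10 to 1.20) gives roughly 1.49 to 1.62 per year, which brackets the ~55% quoted for transistor count.

```python
density_growth = 1.35            # transistor density: +35% per year
die_size_growth = (1.10, 1.20)   # die size: +10% to +20% per year

# Transistors per chip = density x die area, so the yearly factors multiply.
for die in die_size_growth:
    combined = density_growth * die
    print(f"die +{die - 1:.0%}/yr -> transistors per chip +{combined - 1:.1%}/yr")
```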
Transistors Per Die Trends
Figure: transistors per die over time (source: www.icknowledge.com)
• Intel Pentium 4 3.4 GHz: 178 M transistors on a 237 mm² die
• 130 nm Itanium 2: 410 M transistors on a 374 mm² die
• Transistor count doubles every 18 months
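The two chips above can be compared by transistor density, and the 18-month doubling rule can be converted to an annual factor. This is a back-of-the-envelope sketch that uses only the figures quoted on the slide.

```python
# Figures quoted on the slide: (transistors in millions, die area in mm^2).
chips = {
    "Pentium 4 3.4 GHz": (178, 237),
    "Itanium 2 (130 nm)": (410, 374),
}

for name, (mtransistors, area_mm2) in chips.items():
    print(f"{name}: {mtransistors / area_mm2:.2f} M transistors per mm^2")

# "Doubles every 18 months" expressed as an annual growth factor: 2 ** (12 / 18).
annual = 2 ** (12 / 18)
print(f"doubling every 18 months = {annual:.2f}x per year")
```

An 18-month doubling time corresponds to about 1.59x per year, close to the 55% per year figure for transistor count.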
Die Size Trends
Figure: die size over time (source: www.icknowledge.com)
Die size increases by 10% to 20% per year
DRAM Technology Trends
Year | Size      | Cycle time
1980 | 64 Kbits  | 250 ns
1983 | 256 Kbits | 220 ns
1986 | 1 Mbits   | 190 ns
1989 | 4 Mbits   | 165 ns
1992 | 16 Mbits  | 145 ns
1995 | 64 Mbits  | 125 ns
1998 | 256 Mbits | 100 ns
DRAM density: 1.60x per year (4x in three years)
Access time has improved by 1/3 in 10 years
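Both summary lines can be checked against the table. This sketch takes the average annual density growth between the first and last rows, and the cycle-time reduction from 1980 to 1989 (the closest pair of rows to a ten-year span).

```python
# DRAM generations from the table: (year, size in Kbits, cycle time in ns).
dram = [
    (1980, 64, 250), (1983, 256, 220), (1986, 1024, 190), (1989, 4096, 165),
    (1992, 16384, 145), (1995, 65536, 125), (1998, 262144, 100),
]

first_year, first_size, first_cycle = dram[0]
last_year, last_size, _ = dram[-1]
years = last_year - first_year

# Average annual density growth implied by the first and last rows.
density_per_year = (last_size / first_size) ** (1 / years)
print(f"density: {density_per_year:.2f}x per year")

# Cycle-time reduction over the first nine years of the table (1980 -> 1989).
reduction = 1 - dram[3][2] / first_cycle
print(f"cycle time 1980-1989: {reduction:.0%} shorter")
```

The table gives about 1.59x per year and a 34% shorter cycle time, consistent with the 1.60x and 1/3 figures above.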
Hard-Disk Technology Trends
Figure: disk density over time (source: IBM HDD Evolution by Ed Grochowski at Almaden)
Disk density: 1.50x - 1.60x per year (4x in three years)
Technology trends
• Feature size is an important parameter in process
technology
– Minimum size of transistor or wire
• 10 µm in 1971, 0.09 µm today (some foundries still use 0.13 µm)
– Transistor density grows quadratically as feature size drops:
halving the feature size roughly quadruples the number of transistors per unit area
– A drop in transistor size dictates a drop in operating voltage for
proper functioning
• therefore supply voltage normally goes down as transistors
shrink
– All these lead to a complex relationship between process technology
and performance: transistor performance turns out to improve roughly linearly as feature size decreases
• Summary: a quadratic growth in transistor count
comes with a linear improvement in frequency
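Plugging the two feature sizes from the slide into the quadratic rule gives a feel for the numbers. This is an idealized sketch: it ignores die-size growth, and the linear factor is only an upper-bound proxy for the frequency gain.

```python
feature_1971_um = 10.0
feature_today_um = 0.09

linear = feature_1971_um / feature_today_um  # each dimension shrinks by this factor
density = linear ** 2                        # transistors per unit area grow quadratically

print(f"linear shrink: {linear:.0f}x")
print(f"transistor density gain: {density:,.0f}x")
```

A roughly 111x linear shrink corresponds to a density gain of about 12,000x, which is the "quadratic versus linear" gap summarized above.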
Technology trends
• Ever-shrinking process technology
– Shorter gate length of transistors
– Transistors can be clocked at a faster rate
– Transistors also get smaller
– Can afford to pack more on the die
– And die size is also increasing
– What to do with so many transistors?
Technology trends
• Could increase L2 or L3 cache size
– Does not help much beyond a certain point
– Burns more power
• Could improve microarchitecture
– Better branch predictor or novel designs to improve
instruction-level parallelism (ILP)
• If single-thread performance cannot be improved,
have to look for thread-level parallelism (TLP)
– Multiple cores on the die (chip multiprocessors)
• IBM POWER4, POWER5, Intel Itanium 2
(Montecito planned)
Parallelism
• How to use so many transistors?
– Circuit-level parallelism
• 4 bits (Intel 4004) → 16 bits (Intel 80286) → 32 bits → 64-bit processors now (Alpha processors, then Sun
UltraSPARC, now AMD Athlon64, Intel Itanium; 64-bit
Xeon is in the making)
– Instruction-level parallelism (ILP)
• extract independent instructions in hardware and
execute them in parallel
– Fighting memory latency: caches, multithreading
Technology: x86 Architecture Progression
Chip               | Date  | Transistor count | Initial MIPS
4004               | 11/71 | 2,300            | 0.06
8008               | 4/72  | 3,500            | 0.06
8080               | 4/74  | 6,000            | 0.1
8086               | 6/78  | 29,000           | 0.3
8088               | 6/79  | 29,000           | 0.3
286                | 2/82  | 134,000          | 0.9
386                | 10/85 | 275,000          | 5
486                | 4/89  | 1.2 million      | 20
Pentium            | 3/93  | 3.1 million      | 100
Pentium Pro        | 3/95  | 5.5 million      | 300
Pentium III (Xeon) | 2/99  | 10 million       | 500?
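The table can be turned into an implied doubling time for transistor count. This sketch uses only the two endpoint rows (4004 and Pentium III) and treats each date as the start of its month, which is an approximation.

```python
import math

# Two rows of the table: (year with fractional month, transistor count).
i4004 = (1971 + (11 - 1) / 12, 2_300)         # 11/71
pentium3 = (1999 + (2 - 1) / 12, 10_000_000)  # 2/99

years = pentium3[0] - i4004[0]
growth = pentium3[1] / i4004[1]

# Solve growth = 2 ** (years / doubling_time) for the doubling time.
doubling_time = years * math.log(2) / math.log(growth)
print(f"{growth:,.0f}x in {years:.2f} years -> doubles every {doubling_time:.2f} years")
```

For this x86 series, the endpoints imply a doubling roughly every 2.25 years (about 27 months). Rules of thumb such as "every 18 months" depend on which chips and dates are chosen, so they should be treated as approximate.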
Technology trends
• To remain competitive in the market, you really need to plan for
future technology
– Roughly 5-6 years to design a new architecture for a high-end
microprocessor
– Once it is released, it must stay on the market
long enough to generate the expected revenue
– Need to follow market demand closely and be able to
project it judiciously over the next decade
– Depending on market demand, need to figure out what can be
afforded at a reasonable price using next-generation technology
• Failure to project demand: your product is useless
• Failure to project technology: your competitor will exploit that to
come up with a better design
Cost and price
• Cost
– manufacturing expense
• Price
– the price at which the product is sold
• Competition decreases the gap between cost and price
(simple economics)
• Cost normally decreases over time as manufacturing
moves down the learning curve
– The learning curve shows up in the yield: the percentage of manufactured devices that
survive testing
– Better yield naturally lowers cost
• Increased production volume also lowers cost
– Larger purchasing volumes, better manufacturing efficiency, and development cost
amortized over more units
Price Trends (Pentium III)
Price Trends (DRAM memory)
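Integrated Circuit Cost
$$\text{Cost of IC} = \frac{\text{Cost of die} + \text{Cost of testing} + \text{Cost of packaging}}{\text{Final test yield}}$$
$$\text{Cost of die} = \frac{\text{Cost of wafer}}{\text{Dies per wafer} \times \text{Die yield}}$$
$$\text{Die yield} = \text{Wafer yield} \times \left(1 + \frac{\text{Defects per unit area} \times \text{Die area}}{\alpha}\right)^{-\alpha}, \qquad \alpha \approx 4 \text{ for today's processors}$$
$$\text{Dies per wafer} = \frac{\text{Wafer area}}{\text{Die area}} - \text{Edge effects correction}$$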
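The cost equations above can be chained into a small calculator. This is a sketch under stated assumptions. The concrete edge-effects term, π × diameter / √(2 × die area), is the approximation commonly used in Hennessy & Patterson and is not given on the slide. The wafer size, wafer cost, defect density, and test/packaging costs are hypothetical inputs chosen only for illustration.

```python
import math

def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> int:
    """Wafer area / die area, minus an edge correction for partial dies at the rim.

    Assumption: edge correction = pi * diameter / sqrt(2 * die area).
    """
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_correction = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return int(wafer_area / die_area_cm2 - edge_correction)

def die_yield(defects_per_cm2: float, die_area_cm2: float,
              wafer_yield: float = 1.0, alpha: float = 4.0) -> float:
    """Wafer yield x (1 + defects per unit area x die area / alpha) ** -alpha."""
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** -alpha

def cost_of_die(wafer_cost: float, dies: int, yield_: float) -> float:
    """Cost of wafer / (dies per wafer x die yield)."""
    return wafer_cost / (dies * yield_)

def cost_of_ic(die_cost: float, test_cost: float, package_cost: float,
               final_test_yield: float) -> float:
    """(Cost of die + testing + packaging) / final test yield."""
    return (die_cost + test_cost + package_cost) / final_test_yield

# Hypothetical inputs: 20 cm wafer costing $5000, 0.4 defects per cm^2.
for area in (1.0, 1.5):
    n = dies_per_wafer(20, area)
    y = die_yield(0.4, area)
    d = cost_of_die(5000, n, y)
    ic = cost_of_ic(d, test_cost=5.0, package_cost=10.0, final_test_yield=0.95)
    print(f"die {area} cm^2: {n} dies, yield {y:.0%}, "
          f"${d:.2f} per good die, ${ic:.2f} per IC")
```

With these made-up numbers, growing the die from 1.0 to 1.5 cm² nearly doubles the cost per good die: there are fewer dies per wafer and a smaller fraction of them work.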
Why all this talk about money???
• Add a new architectural gizmo to chip
• Chip die size increases
– Fewer dies per wafer
– More defective dies
• Die testing more expensive
– Must test whether gizmo works
• Die package more expensive
– Larger package, maybe more pins
– Gizmo needs power, may need better heat sink
Another way of looking at it…
Figure: what makes up a system's price
• Components: CPU, DRAM, Disk, Case,…
• Direct costs: Labor, Scrap, Warranty, …
• Gross margin: R&D, Marketing, Profits, …
Component Cost vs. System Price
• Increase CPU price by $100
• Then
– Direct costs go up by ~$20
– Indirect costs go up by ~$40
– Discount goes up by ~$50
• List price of the system goes up by $210
– Then, if fewer systems are sold because of this,
indirect costs go up…
Cost, price, perf.
• Conversion of cost to price is a complex exercise
– Loosely speaking, average selling price (ASP) = component cost + direct
manufacturing cost + gross margin of company
– List price = ASP + average discount
– A computer designer may not always be able to sell at list price due
to competition
• Depending on the target market, cost, performance, or both
may be important
– At one end of the spectrum lies the supercomputer market, where
performance is the only goal
– At the other end is a portion of the embedded systems market, where low
price is the only goal, e.g. cell phone processors
– PC, workstation, and low-end server designers need to juggle cost
and performance judiciously
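The two pricing relations above can be written out directly and applied to the $100 CPU example. This sketch treats the "indirect costs" from that example as the gross-margin term. That mapping is an assumption, and the function name is hypothetical.

```python
def list_price(component_cost: float, direct_cost: float,
               gross_margin: float, average_discount: float) -> float:
    """List price = ASP + average discount, where ASP = components + direct cost + gross margin."""
    asp = component_cost + direct_cost + gross_margin
    return asp + average_discount

# The CPU example: a $100 increase in component cost drags the other terms up with it.
delta = list_price(component_cost=100, direct_cost=20, gross_margin=40, average_discount=50)
print(f"list price rises by ${delta:.0f}")  # list price rises by $210
```

Because every term scales with component cost, a change to the CPU shows up at roughly twice its size in the list price.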