Providing Fast and Safe Access to Next

VLSI and Computer Architecture Trends
ECE 25
Fall 2012
1
A Brief History
• 1958: First integrated circuit
– Flip-flop using two transistors
– From Texas Instruments
Courtesy Texas Instruments
• 2011
– Intel 10 Core Xeon
Westmere-EX
• 2.6 billion transistors
• 32 nm process
Courtesy Intel
2
Moore’s Law
• Historical growth rate
– 2x transistors & clock speeds every 2 years over 50 years
– 10x every 6-7 years
• Enables dramatically more complex algorithms that were previously infeasible
– Dramatically more realistic video games and graphics animation (e.g. PlayStation 4, Xbox 360 Kinect, Nintendo Wii)
– 1 Mb/s DSL to 10 Mb/s Cable to 2.4 Gb/s Fiber to Homes
– 2G to 3G to 4G wireless communications
– MPEG-1 to MPEG-2 to MPEG-4 to H.264 video compression
– 480 x 270 (0.13 million pixels) NTSC to 1920x1080 (2 million
pixels) HDTV resolution to 2880x1800 (5 million pixels) Retina
Display
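The two growth rates quoted on this slide (2x every 2 years, 10x every 6-7 years) can be cross-checked with a minimal sketch. The function names are illustrative and not from the slides, and steady exponential doubling is assumed.

```python
import math

def years_to_grow(factor, doubling_period_years=2.0):
    """Years needed to grow by `factor`, assuming steady doubling."""
    return doubling_period_years * math.log2(factor)

def growth_over(years, doubling_period_years=2.0):
    """Total growth factor after `years` of steady doubling."""
    return 2.0 ** (years / doubling_period_years)

ten_x_years = years_to_grow(10)      # about 6.64 years, i.e. "6-7 years"
fifty_year_growth = growth_over(50)  # 2**25, about 33.5 million x

print(f"10x growth takes {ten_x_years:.2f} years")
print(f"50 years of doubling gives {fifty_year_growth:,.0f}x")
```

Under this assumption, 10x every 6-7 years is the same rate as 2x every 2 years, just restated.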
3
Standard Cells
NOR-3
XOR-2
4
Standard Cell Layout
5
NVIDIA GeForce 8800
(600+ million transistors, about 60+ million gates)
6
NVIDIA Kepler2 GK104 GPU
(1536 cores, 3.54 billion transistors)
7
Subwavelength Lithography
Challenges
Source: Raul Camposano, 2003
8
NRE Mask Costs
Source: MIT Lincoln Labs, M. Fritze, October 2002
9
ASIC NRE Costs Not Justified for Many Applications
• A complex ASIC will have an NRE (non-recurring engineering) cost of over $40M = $28M (NRE design cost) + $12M (NRE mask cost)
• Many “ASIC” applications will not have the
volume to justify a $40M NRE cost
• e.g. a $30 IC with a 33% margin would require
sales of 4M units (x $10 profit/IC) just to recoup
$40M NRE Cost
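The break-even arithmetic on this slide can be sketched as follows. The helper name is hypothetical, and the per-unit profit uses the slide's own rounding of a 33% margin on a $30 IC to $10.

```python
import math

def break_even_units(nre_cost_usd, profit_per_unit_usd):
    """Units that must be sold before per-unit profit covers the NRE cost."""
    return math.ceil(nre_cost_usd / profit_per_unit_usd)

nre_design_usd = 28_000_000
nre_mask_usd = 12_000_000
nre_total_usd = nre_design_usd + nre_mask_usd  # $40M, as on the slide

profit_per_unit_usd = 10  # slide's rounding of 33% of a $30 IC

units = break_even_units(nre_total_usd, profit_per_unit_usd)
print(f"Break-even volume: {units:,} units")
```

Any product expected to ship well under this volume cannot recoup the NRE cost at this price and margin.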
10
Power Density a Key Issue
• Motivated mainly by power limits
• $P_{\text{total}} = P_{\text{dynamic}} + P_{\text{leakage}}$
• $P_{\text{dynamic}} = \frac{1}{2} \alpha C V_{DD}^2 f$
• Problem: power (heat dissipation) density has been growing exponentially because clock frequency ($f$) and transistor count have been doubling every 2 years
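A minimal sketch of the dynamic power formula on this slide, showing that power is linear in clock frequency and quadratic in supply voltage. The numeric values for the activity factor, capacitance, voltage, and frequency are hypothetical and chosen only for illustration.

```python
def dynamic_power(alpha, capacitance_f, vdd_v, freq_hz):
    """P_dynamic = 1/2 * alpha * C * VDD^2 * f, in watts."""
    return 0.5 * alpha * capacitance_f * vdd_v ** 2 * freq_hz

# Hypothetical operating point (not from the slides).
base = dynamic_power(alpha=0.1, capacitance_f=1e-9, vdd_v=1.0, freq_hz=1e9)

# Doubling the clock frequency doubles dynamic power.
doubled_clock = dynamic_power(alpha=0.1, capacitance_f=1e-9, vdd_v=1.0, freq_hz=2e9)

# Halving the supply voltage cuts dynamic power to one quarter.
halved_vdd = dynamic_power(alpha=0.1, capacitance_f=1e-9, vdd_v=0.5, freq_hz=1e9)

print(f"2x frequency: {doubled_clock / base:.2f}x power")
print(f"0.5x VDD:     {halved_vdd / base:.2f}x power")
```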
11
Power Density a Key Issue
• Had scaling continued at the previous pace, high-speed processors would have reached the power density of a nuclear reactor by 2005, a rocket nozzle by 2010, and the sun's surface by 2015.
Courtesy: Intel
12
Before Multicore Processors
• e.g. Intel Itanium II
– 6-Way Integer Unit < 2% die area
– Cache logic > 50% die area
• Most of the chip is there to keep these 6 integer units running at “peak” rate
• Main issue: the ratio of external DRAM latency (50 ns) to internal clock period (0.25 ns) is 200:1
• Performance was increased through higher clock frequency and more complex pipelining & speculative execution
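The 200:1 ratio above can be restated in clock cycles and issue slots. This is a rough sketch that assumes, for illustration only, that the core stalls completely for the whole DRAM access with no overlap. Real processors hide part of this latency, which is what the caches and speculation are for.

```python
dram_latency_ns = 50.0   # external DRAM latency from the slide
clock_period_ns = 0.25   # internal clock period from the slide
issue_width = 6          # 6-way integer unit

# Clock cycles that elapse during one DRAM access.
stall_cycles = dram_latency_ns / clock_period_ns

# Integer issue slots that go unused if the core fully stalls (assumption).
lost_issue_slots = stall_cycles * issue_width

print(f"One DRAM access spans {stall_cycles:.0f} cycles")
print(f"Up to {lost_issue_slots:.0f} integer issue slots idle per access")
```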
Figure labels: INT6, Cache logic
13
Multicore Era
• Multicore era
– Operate at lower voltage and lower clock frequency
– Simpler processor cores
– Increase performance by more cores per chip
• e.g. Intel 10 Core Xeon Westmere-EX
– 1.73-2.66 GHz (vs. previous Xeons at 4 GHz)
1 core
14
Embedded Multicore Processors
• Embedded multicore processors replacing ASICs
– Much simpler processor cores, much smaller caches
• e.g. Tilera TILE-Gx: 100 processors
15
What Does the Future Look Like?
Corollary of Moore’s law: Number of cores will
double every 18 months
Year        '02    '05    '08    '11    '14
Research     16     64    256   1024   4096
Industry      4     16     64    256   1024
Source: MIT, A. Agarwal, 2009
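The research and industry series on this slide follow directly from doubling every 18 months, which is 4x every 3 years. A minimal sketch, using a hypothetical helper name and taking the 2002 values as the starting points:

```python
def projected_cores(start_cores, start_year, year, doubling_months=18):
    """Core count at `year`, assuming doubling every `doubling_months`."""
    elapsed_months = (year - start_year) * 12
    return start_cores * 2 ** (elapsed_months / doubling_months)

years = [2002, 2005, 2008, 2011, 2014]

# Starting points are the '02 values on the slide.
industry = [round(projected_cores(4, 2002, y)) for y in years]
research = [round(projected_cores(16, 2002, y)) for y in years]

for y, r, i in zip(years, research, industry):
    print(f"{y}: research {r:5d}, industry {i:5d}")
```

Both projected series match the figures quoted on the slide.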
16
ITRS Roadmap
• Semiconductor Industry Association forecast
– Intl. Technology Roadmap for Semiconductors
17
Power Revisited
• $P_{\text{total}} = P_{\text{dynamic}} + P_{\text{leakage}}$
• $P_{\text{dynamic}} = \frac{1}{2} \alpha C V_{DD}^2 f$
• Historically, $P_{\text{leakage}}$ was negligible and $P_{\text{dynamic}}$ dominated.
• Power could be controlled by reducing $V_{DD}$ (which used to be 5 V and is now about 1 V).
• Lowering $V_{DD}$ requires lowering the threshold voltage, which makes $P_{\text{leakage}}$ more dominant. This is why $V_{DD}$ is no longer being reduced much, or only very slowly.
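As a worked example of why supply-voltage reduction was such an effective lever, consider the dynamic power formula with the activity factor, capacitance, and frequency held fixed (an assumption made only to isolate the voltage term). Going from 5 V to 1 V then gives:

```latex
\frac{P_{\text{dynamic}}(V_{DD} = 1\,\text{V})}{P_{\text{dynamic}}(V_{DD} = 5\,\text{V})}
  = \frac{\tfrac{1}{2}\,\alpha C\,(1\,\text{V})^2 f}{\tfrac{1}{2}\,\alpha C\,(5\,\text{V})^2 f}
  = \left(\frac{1}{5}\right)^{2}
  = \frac{1}{25}
```

With the supply voltage now near 1 V and held there by leakage, this quadratic saving is largely no longer available.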
18
The Utilization Wall
(Source: S. Swanson, M. Taylor 2010)
• Scaling theory
– S=
– Exponentially increasing
problem!
                      Classical scaling   Leakage-limited scaling
Device count          S^2                 S^2
Device frequency      S                   S
Device power (cap)    1/S                 1/S
Device power (VDD)    1/S^2               ~1
Utilization           1                   1/S^2
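The utilization rows follow from multiplying the other four scaling factors: that product is the power needed to run every device at full frequency, and under a fixed power budget the usable fraction is its reciprocal. A minimal sketch, treating the leakage-limited "~1" entry as exactly 1 (an assumption):

```python
def utilization(s, leakage_limited):
    """Fraction of the chip usable at full frequency under a fixed power budget."""
    device_count = s ** 2
    device_frequency = s
    power_cap = 1 / s  # per-device power scaling from capacitance
    # Per-device power scaling from VDD: 1/S^2 classically, ~1 when leakage-limited.
    power_vdd = 1.0 if leakage_limited else 1 / s ** 2
    full_chip_power = device_count * device_frequency * power_cap * power_vdd
    return 1 / full_chip_power  # power budget normalized to 1

for s in (2, 4):
    print(f"S={s}: classical {utilization(s, False):.4f}, "
          f"leakage-limited {utilization(s, True):.4f}")
```

Under classical scaling the product stays at 1, so the whole chip can be used. With VDD fixed, the usable fraction falls as 1/S^2, which is the exponentially growing problem noted on the slide.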
19
Age of “Dark Silicon”
(Source: S. Swanson, M. Taylor 2010)
Spectrum of tradeoffs between # cores and frequency, e.g. 65 nm → 32 nm (S = 2):
• 65 nm: 4 cores @ 3 GHz
• 32 nm:
– 2x4 cores @ 3 GHz (8 cores dark) (industry's choice)
– ...
– 4 cores @ 2x3 GHz (12 cores dark)
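The two 32 nm endpoints on this slide can be reproduced with a simple model. It assumes that per-core switching power scales with frequency and with 1/S from capacitance, and that VDD stays fixed. Under those assumptions a fixed power budget lets the product cores × frequency grow only by S. The function name and the model itself are illustrative, not taken from the source.

```python
def dark_silicon_options(base_cores, base_ghz, s):
    """List (active cores, GHz, dark cores) choices under a fixed power budget."""
    max_cores = base_cores * s ** 2      # area allows S^2 more cores
    max_ghz = base_ghz * s               # devices could switch S times faster
    budget = base_cores * base_ghz * s   # core-GHz the power budget supports
    options = []
    cores = base_cores
    while cores <= max_cores:
        ghz = min(max_ghz, budget / cores)
        options.append((cores, ghz, max_cores - cores))
        cores *= 2
    return options

# 65 nm baseline from the slide: 4 cores @ 3 GHz, scaled to 32 nm (S = 2).
options = dark_silicon_options(base_cores=4, base_ghz=3.0, s=2)
for cores, ghz, dark in options:
    print(f"{cores:2d} cores @ {ghz:.1f} GHz, {dark:2d} cores dark")
```

The first two results match the slide's 4 cores @ 2x3 GHz (12 dark) and 2x4 cores @ 3 GHz (8 dark). The third, 16 cores at half the baseline frequency, is an extrapolation of the same model and does not appear on the slide.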
20
Moving Back to Specialized Silicon?
• “Traditional” operating system
– Provides many “modules” in software that “applications” call at run-time.
– OS modules are only “loaded” in memory when used.
– Most OS modules are unused at any moment.
• Moving OS into silicon?
– Power is more expensive than area.
– Specialized logic can improve energy efficiency 10-1000x.
– Possible idea: move power-intensive OS modules into silicon or build specialized hardware accelerators.
– Most of these “silicon modules” would be “dark”: only “lit up” when called.
– In other words, “waste” silicon to save power.
Figure label: Dark Silicon
Source: S. Swanson, M. Taylor, GreenDroid Project, 2010
21
Questions?
22