Principles of Computer Architecture Dr. Mike Frank


Semiconductor Technology
Basics
Why Semiconductors?
• Conductors always have a high concentration
of electrons in conduction bands
– states that are free to move through the material
• Insulators always have virtually zero electrons
in such bands
– conduction band energy is too high
– all the electrons are stuck in valence bands
• localized to particular atoms/molecules in the material
• Semiconductors have a conduction band whose
electron population is easily manipulated
– Sensitive to dopants, applied potentials, temperature
Electronic Structure of Silicon
• Silicon, atomic number: 14
– s+p orbitals of shell 3 are (together) half full
[Figure: orbital-filling diagram for Si, showing the 1s, 2s, 2p, 3s, and 3p orbitals]
– Like in Carbon (element 6), s,p orbitals can
rearrange to form four sp3 hybrid orbitals w.
tetrahedral symmetry:
– Each Si can share electrons with 4 neighboring Si’s
to fill all the sp3 hybrid orbitals... Stable tetrahedral lattice,
like diamond
Electrons & Holes
• At normal temperatures,
– a small percentage of
shell-3 electrons will be
free of the bond orbitals
• wandering thru the lattice…
– leaving a “hole” in the lattice point they left
• a hole acts like a positively charged particle
• Once created, holes can “move,” too…
– by a nearby electron hopping over to fill them
– however, hole mobility is usually lower than that of
electrons
Donor & Acceptor Dopants
• Boron (element 5) is one electron shy of having
a half-empty shell 2 that would fit Si lattice
[Figure: orbital-filling diagram for B, showing the 1s, 2s, 2p, 3s, and 3p orbitals]
– Boron atoms readily accept extra mobile electrons
and lock them in place, forming a negative B- ion
• Reduces free-electron concentration, increases hole
concentration when implanted into silicon
• Phosphorus (element 15) has one too many
shell-3 electrons to fit in Si lattice
– Donates the extra electron readily to conduction band
– Forms P+ ion
• Increases free-electron conc., decreases hole conc.
[Figure: orbital-filling diagram for P, showing the 1s, 2s, 2p, 3s, and 3p orbitals]
p-type vs. n-type Silicon
• Pure silicon:
– Has an equal number of positive & negative charge
carriers (holes & electrons, resp.)
• Acceptor-doped (e.g., boron-doped) silicon:
– Has a charge-carrier concentration heavily
dominated by positive charge carriers (holes, h+)
• Balanced by negative, immobile ions of acceptor atom
– We call it a “p-type” semiconductor.
• Donor-doped (e.g., phosphorus-doped) silicon
– Has charge-carrier concentration heavily dominated
by negative charge carriers (electrons, e-)
• Balanced by positive, immobile ions of donor atom
– Call it “n-type” semiconductor
pn junctions
• What happens when you put p-type and n-type
silicon in direct contact with each other?
– Near the junction, electrons from the n and holes
from the p diffuse into & annihilate each other!
– Forms a depletion region free of charge carriers
[Figure: pn junction. The p-type side holds mobile holes (h+) and fixed B- ions; the n-type side holds mobile electrons (e-) and fixed P+ ions; the depletion region between them contains only the fixed ions.]
pn junction electrostatics
[Figure: the same pn junction (cf. Pierret ‘96), with plots across the depletion region of charge density, electric field, and electrostatic potential; the step in potential is the built-in voltage.]
npn MOSFET (n-FET)
Metal-Oxide-Semiconductor Field-Effect Transistor
[Figure: n-p-n structure under a gate electrode with bias Vbias. The plot shows electron potential energy (the negative of electric potential) as seen by electrons, when Vbias > 0 and when gate voltage > Vt.]
CMOS Inverters
(a) CMOS inverter structure. (b) Transition curves.
Semiconductor Technology
Scaling
Technology Scaling: Notation
• Historically, device feature length scales have
decreased by ~12%/year.
– So: feature length $\ell \propto 0.88^{\text{year}}$
– $1/\ell \propto (1/0.88)^{\text{year}} \approx 1.14^{\text{year}}$
• up 14%/year
• Meanwhile, typical CPU die diameters have
increased by ~2.3%/year. (Less stable trend.)
– Diameter $D \propto 1.023^{\text{year}}$
– $1/D \propto 0.978^{\text{year}}$
• Quantities that are constant over time are
written as $\propto 1$
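The yearly rates above can be checked numerically. The sketch below is only an illustration: the helper names `annual_factor` and `percent_per_year` are hypothetical, and the two input rates are simply the figures quoted on the slide.

```python
# Annual scaling factors quoted on the slide (treated here as given inputs).
FEATURE_LENGTH_PER_YEAR = 0.88  # feature length shrinks ~12%/year
DIE_DIAMETER_PER_YEAR = 1.023   # die diameter grows ~2.3%/year


def annual_factor(length_power: float, diameter_power: float = 0.0) -> float:
    """Yearly multiplier of a quantity that scales as
    (feature length)**length_power * (die diameter)**diameter_power."""
    return (FEATURE_LENGTH_PER_YEAR ** length_power
            * DIE_DIAMETER_PER_YEAR ** diameter_power)


def percent_per_year(factor: float) -> float:
    """Convert a yearly multiplier into a percent change per year."""
    return (factor - 1.0) * 100.0


inverse_length = annual_factor(-1)       # 1/0.88
inverse_diameter = annual_factor(0, -1)  # 1/1.023
print(f"1/length:   x{inverse_length:.3f}/year ({percent_per_year(inverse_length):+.1f}%)")
print(f"1/diameter: x{inverse_diameter:.3f}/year ({percent_per_year(inverse_diameter):+.1f}%)")
```

Any quantity in the later slides that scales as a product of powers of feature length and die diameter can be passed through the same helper.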
Resistance Scaling
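• Fixed-shape wire (any shape):
$R \propto \rho\ell/(wt) \propto \ell/\ell^2 = 1/\ell$
– All dimensions scaling equally.
– E.g. a local interconnect in a small scaled logic block / functional unit
[Figure: wire of length ℓ, width w, and thickness t, with current flowing along its length]
• Constant-length thin wire: $R \propto 1/(wt) \propto 1/\ell^2$
• Thin cross-chip wire: $R \propto D/(wt) \propto D/\ell^2$, where D is the die diameter
– Up 33%/year!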
– Long-distance wires have to be extra thick to be fast
• But, fewer thick wires can fit!
Capacitance Scaling
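• Fixed-shape structure (any):
$C \propto \varepsilon\ell w/s \propto \ell\cdot\ell/\ell = \ell$
– E.g. scaled devices/wires
• Per unit wire length:
– $C \propto \varepsilon w/s \propto \ell/\ell \propto 1$ (constant)
• Cross-chip thin wire: $C \propto D$ (D: die diameter)
• Per unit area: $C \propto \varepsilon/s \propto 1/\ell$
– E.g., total on-chip cap./cm2
[Figure: conductor of width w separated from its neighbor by spacing s]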
Some 1st-order
Semiconductor Scaling Laws
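• Voltages $V \propto \ell$ (due to e.g. punch-through), with ℓ the feature length
• Long-term: temperature $T \propto \ell$ (prevents leakage)
• Resistance:
– Fixed-shape wire: $R \propto \rho\ell/(wt) \propto \ell/\ell^2 = 1/\ell$
– Thin cross-chip wire: $R \propto D/\ell^2$ (D: die diameter)
• Capacitance:
– Fixed-shape structure: $C \propto \varepsilon\ell w/s \propto \ell^2/\ell = \ell$
– Per unit wire length: $C \propto 1$ (constant)
– Cross-chip wire: $C \propto D$
– Per unit area: $C \propto 1/s \propto 1/\ell$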
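As a rough check of the laws summarized above, the following sketch turns each one into a yearly multiplier. It assumes only the 0.88/year and 1.023/year rates quoted earlier in the slides; the dictionary keys are hypothetical labels.

```python
L = 0.88   # yearly multiplier of feature length (from the slides)
D = 1.023  # yearly multiplier of die diameter (from the slides)

# Yearly multiplier for each first-order law, written as powers of L and D.
laws = {
    "R_fixed_shape": 1 / L,         # R ~ 1/l
    "R_cross_chip": D / L**2,       # R ~ D/l^2
    "C_fixed_shape": L,             # C ~ l
    "C_per_unit_length": 1.0,       # constant
    "C_cross_chip": D,              # C ~ D
    "C_per_unit_area": 1 / L,       # C ~ 1/l
}

for name, factor in laws.items():
    print(f"{name:20s} {100 * (factor - 1):+6.1f} %/year")
```

With unrounded inputs the cross-chip resistance comes out near +32%/year; the slide's 33% figure follows from using the rounded 1.14 for 1/0.88.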
Why Voltage Scaling?
• For many years, logic voltages were maintained at
fairly constant levels as transistors shrank
– TTL 5V logic – was standard for many years
– later 3.3 V, now: ~1V within leading-edge CPUs
• Further shrinkage w/o voltage scaling is no longer
possible, due to various effects:
– Punch-through
– Device degradation from hot carriers
– Gate-insulator failure
– Carrier velocity saturation
• In general, things break down at high field strengths
– constant-field voltage scaling may be preferred
Punch-Through
[Figure: n-p-n transistor under a gate electrode with bias Vbias, showing the potential seen by electrons at zero bias, moderate bias, strong bias, and very strong bias]
Need for Voltage Scaling
[Figure: the same n-p-n device drawn at two sizes, both under the same Vbias]
Smaller size & same voltage → higher electric field strengths → easier punch-through
Long-term Temperature Scaling?
• May be needed in the long term.
• Sub-threshold power dissipation across “off” transistors
is based on the leakage current density $\propto \exp(-V_t/\phi_T)$
– $V_t$ is the threshold voltage
• Must scale down with Vdd, or else transistor can’t turn on!
– $\phi_T$ is the thermal voltage at temperature T
• Equal to $k_B T/q$, where q is electron charge magnitude
• Voltage spread of individual electrons fr. thermal noise
• As voltages decrease,
– leakage power will dominate
– devices will become unable to store charge
• Unless (eventually), $T \propto V \propto \ell$
• Only alternative to low T: Scaling halts!
– Probably what must happen, because low temps.
imply slow rate of quantum evolution.
• Unfortunately, lower T → fewer charge carriers!
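To make the leakage argument concrete, here is a minimal sketch of the thermal voltage and the exponential leakage factor. The threshold voltages and temperatures are hypothetical operating points chosen for illustration, not values from the slides.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # electron charge magnitude, C


def thermal_voltage(temperature_k: float) -> float:
    """Thermal voltage phi_T = k_B * T / q, in volts."""
    return K_B * temperature_k / Q_E


def leakage_factor(v_threshold: float, temperature_k: float) -> float:
    """Relative sub-threshold leakage, exp(-V_t / phi_T)."""
    return math.exp(-v_threshold / thermal_voltage(temperature_k))


# Hypothetical operating points (assumptions for illustration only).
base = leakage_factor(0.30, 300.0)            # V_t = 0.30 V at 300 K
scaled_v_only = leakage_factor(0.15, 300.0)   # halve V_t, keep T
scaled_v_and_t = leakage_factor(0.15, 150.0)  # halve both V_t and T

print(f"phi_T at 300 K: {thermal_voltage(300.0) * 1e3:.2f} mV")
print(f"leakage grows x{scaled_v_only / base:.0f} if only V_t is halved")
print(f"leakage ratio if T is halved too: {scaled_v_and_t / base:.3f}")
```

Halving the threshold voltage alone raises the leakage factor by orders of magnitude, while scaling temperature along with it leaves the factor unchanged, which is the point of the $T \propto V$ condition.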
Delay Scaling
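• Charging time delay $t \propto RC$:
– Through fixed shape conductor: $RC \propto (1/\ell)\cdot\ell = 1$
– Thin constant-length wire: $RC \propto 1/\ell^2$
– Via cross-die thin wire: $RC \propto (D/\ell^2)\cdot D = D^2/\ell^2$ (D: die diameter), up 36%/yr!
– Through a transistor: $RC \propto 1\cdot\ell = \ell$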
• Implications:
– Transistors increasingly faster than long thin wires.
– Even becoming faster than fixed-shape wires!
– Local communication among chip elements is
becoming increasingly favored!
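The delay trends can be reproduced with a few lines of arithmetic. This sketch assumes the 0.88/year and 1.023/year rates from the notation slide and multiplies the R and C scaling laws for each case; the variable names are hypothetical.

```python
L = 0.88   # yearly multiplier of feature length (from the slides)
D = 1.023  # yearly multiplier of die diameter (from the slides)

# Yearly multiplier of RC delay for each case (R factor times C factor).
rc_fixed_shape = (1 / L) * L              # R ~ 1/l, C ~ l
rc_constant_length_wire = (1 / L**2) * 1  # R ~ 1/l^2, C constant
rc_cross_die_wire = (D / L**2) * D        # R ~ D/l^2, C ~ D
rc_transistor = 1.0 * L                   # on-resistance constant, C ~ l

for name, factor in [
    ("fixed-shape conductor", rc_fixed_shape),
    ("thin constant-length wire", rc_constant_length_wire),
    ("cross-die thin wire", rc_cross_die_wire),
    ("transistor", rc_transistor),
]:
    print(f"{name:26s} {100 * (factor - 1):+6.1f} %/year")
```

The cross-die figure comes out near +35%/year with unrounded inputs; the slide's 36% uses the rounded value 1.14 for 1/0.88.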
Performance scaling
• Performance characteristics:
– Clock frequency for small, transistor-delay-dominated local structures: $f \propto 1/t \propto 1/\ell$ (up 14%/yr)
– Transistor density (per area): $d \propto 1/\ell^2$
– Perf. density $R_A = fd \propto 1/\ell^3$; chip area: $A \propto D^2$ (D: die diameter)
– Total raw performance (local transitions / chip /
time): $R = fdA \propto D^2/\ell^3 \propto 1.55^{\text{year}}$
• Increases 55% each year!
• Nearly doubles every 18 months (like Moore’s Law).
• Raw performance has (in the past) been harnessed for
improvements in serial microprocessor performance.
• Future architectures will need to move to more parallel
programming models to fully use further improvements.
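A quick numerical check of the raw-performance growth rate and its doubling time, assuming the same yearly rates as before (0.88 for feature length, 1.023 for die diameter); the variable names are hypothetical.

```python
import math

L = 0.88   # yearly multiplier of feature length (from the slides)
D = 1.023  # yearly multiplier of die diameter (from the slides)

frequency = 1 / L     # f ~ 1/l
density = 1 / L**2    # d ~ 1/l^2
area = D**2           # A ~ D^2
raw_performance = frequency * density * area  # R = f * d * A

doubling_time_months = 12 * math.log(2) / math.log(raw_performance)

print(f"raw performance: x{raw_performance:.3f} per year")
print(f"doubling time:   {doubling_time_months:.1f} months")
```

The unrounded result is slightly below the slide's 1.55 (which uses 1.14 cubed), and the doubling time lands a little over 18 months.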
Charges & Currents
• Charges & fields:
– Charge on a structure: $Q = CV \propto \ell\cdot\ell = \ell^2$
– Surface charge density: $Q/A \propto \ell^2/\ell^2 = 1$
– Electric field strengths: $E = V/\ell \propto 1$
• Currents (resistivity $\rho$: constant):
– Peak current densities: $J = E/\rho \propto 1$
– Peak current in a wire: $I = JA \propto \ell^2$
– Channel-crossing times: $t = \ell/v \propto \ell$
• Due to constant electron saturation velocity v ≈ 200,000 mph (roughly 100 nm/ps)
– Current in an on-transistor: $I = Q/t \propto \ell^2/\ell = \ell$
– Effective trans. on-resistance: $R = V/I \propto \ell/\ell = 1$
• ~4-20 kΩ is typical for a min-sized transistor
Interconnect Scaling
• Since transistor delay $d_t$ scales as $\ell$ (the feature length),
• And wire delay $d_w$ (w. scaled cross-section size) for a
wire of length $L$ scales as
$RC \propto (L/wt)(Lw/s) = L^2/(st) \propto L^2/\ell^2$,
• Then to keep $d_w < d_t$ (1-cycle access) requires:
$L^2/\ell^2 < \ell$
$L^2 < \ell\cdot\ell^2 = \ell^3$
$L < \ell^{3/2}$
• So wire length in units of transistor length $\ell_t \propto \ell$ is
$L/\ell_t < \ell^{3/2}/\ell = \ell^{1/2}$ (down 6%/year)
• So number of devices accessible within a constant × $d_t$
in 2-D goes as $(\ell^{1/2})^2 = \ell$, in 3-D as $(\ell^{1/2})^3 = \ell^{3/2}$.
– Circuits must be increasingly local.
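The locality argument can be checked numerically. The sketch below assumes the 0.88/year feature-length rate from the slides and evaluates the reach of a one-cycle wire in units of transistor length, plus the resulting device counts; the variable names are hypothetical.

```python
L = 0.88  # yearly multiplier of feature length (from the slides)

# Max one-cycle wire length scales as l^(3/2); in units of transistor
# length (which scales as l) the reach therefore scales as l^(1/2).
wire_reach = L ** 1.5 / L

devices_2d = wire_reach ** 2  # devices reachable in a planar layout
devices_3d = wire_reach ** 3  # devices reachable in a 3-D layout

print(f"reach (transistor lengths): {100 * (wire_reach - 1):+.1f} %/year")
print(f"reachable devices, 2-D:     {100 * (devices_2d - 1):+.1f} %/year")
print(f"reachable devices, 3-D:     {100 * (devices_3d - 1):+.1f} %/year")
```

All three figures are negative, so the neighborhood reachable within a fixed number of transistor delays shrinks every year.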
Energy and Power
• Energy:
– Energy on a structure: $E \propto QV \propto CV^2 \propto \ell\cdot\ell^2 = \ell^3$
– Energy per-area: $E_A \propto CV^2/A \propto \ell^3/\ell^2 = \ell$
– Energy densities: $E/\ell^3 \propto \ell^3/\ell^3 \propto 1$ (not a problem)
• Power levels:
– Per-area power: $P_A = E_A f \propto \ell\cdot(1/\ell) = 1$ (not a problem)
– Power per die: $P = P_A A \propto D^2$ (D: die diameter; up ~5%/year)
• Power-per-performance: $P_A/R_A \propto 1/(1/\ell^3) = \ell^3$
• But, if constant-field scaling is not used (and it has not been, very
much, and cannot be much further), all the above scaling rates get
increased by the square of the field strength (F) scaling rate.
– Because $V \propto F\cdot\ell$, and E and P scale with $V^2$.
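The effect of departing from constant-field scaling can be sketched as follows. This assumes the yearly rates from the notation slide, and, as a simplifying assumption not stated on the slide, that clock frequency still scales as 1/ℓ when the field strength changes. The function names are hypothetical.

```python
L = 0.88   # yearly multiplier of feature length (from the slides)
D = 1.023  # yearly multiplier of die diameter (from the slides)


def energy_per_op(field: float = 1.0) -> float:
    """Yearly multiplier of switching energy: E ~ C V^2 ~ l * (F l)^2."""
    return field**2 * L**3


def power_per_area(field: float = 1.0) -> float:
    """Yearly multiplier of P_A = E_A * f ~ (F^2 l) * (1/l) = F^2.
    Assumes frequency keeps scaling as 1/l (an assumption)."""
    return field**2


def power_per_die(field: float = 1.0) -> float:
    """Yearly multiplier of total die power, P_A * D^2."""
    return power_per_area(field) * D**2


constant_voltage_field = 1 / L  # F grows as 1/l when V is held fixed

print(f"constant field:   E x{energy_per_op():.3f}, die power x{power_per_die():.3f}")
print(f"constant voltage: E x{energy_per_op(constant_voltage_field):.3f}, "
      f"die power x{power_per_die(constant_voltage_field):.3f}")
```

Under these assumptions, holding voltage fixed makes die power grow by roughly a third per year instead of about 5%.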
3-D Scalability?
• Consider stacking circuits in 3-D within a
constant volume.
• # of layers n: $n \propto 1/\text{thickness} \propto 1/\ell$ (ℓ: feature length)
• Total power: $P_T = P(\text{flat chip}) \times n \propto 1\cdot(1/\ell) = 1/\ell$
• Enclosing surface area $A_E$: $\propto 1$
• Power flux (if not recycled): $P_T/A_E \propto (1/\ell)/1 = 1/\ell$
– For this to be possible, coolant velocity &/or
thermal conductivity must also increase as $1/\ell$!
• Probably not feasible.
• Power recycling is needed to scale in 3-D!
Semiconductor Technology
Limits
Types of Limits
• Meindl ‘95 identifies several kinds of limits on
VLSI (from most to least fundamental):
– Theoretical limits (focus on energy & delay)
• Fundamental limits (such as we already discussed)
• Material limits (dependent on materials used)
• Device limits (dependent on structure & geometry)
• Circuit limits (dependent on circuit styles used)
• System limits (dependent on architecture & packaging)
– Practical limits
• Design limits
• Manufacturing limits
Fundamental Limits
• Thermodynamic limits
– Minimum dissipation per bit erasure
• kT ln 2 limit. More stringent limits for reliability coming up.
– Subthreshold conduction leakage currents
• $I_{on}/I_{off} \lesssim \exp(V_{dd}/\phi_T)$, with $\phi_T$ the thermal voltage
• Quantum mechanical limits
– Tunneling leakage currents (cf. Mead ’94, next slide)
– Energy-time uncertainty principle $\Delta E \gtrsim h/\Delta t$
• Related to Margolus-Levitin bound $t_{nop} \ge \tfrac{1}{2}h/(E - E_0)$
• Electromagnetic (relativistic) limits
– Speed-of-light lower bound on delay for an
interconnect of a given length, $t \ge \ell/c$.
Tunneling Limit on Device Size
• This graph plots the de Broglie wavelength
$\lambda = h(2mE)^{-1/2}$ of electrons of effective mass m having
kinetic energy equal to a given barrier height E.
• This is also the min. barrier width needed to prevent
electrons from tunneling with probability greater than $3.5\times10^{-6}$.
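For a sense of scale, the wavelength formula can be evaluated directly. This is a minimal sketch: the function name and the `mass_ratio` parameter are hypothetical, and the default uses the free-electron mass, whereas the effective mass in a real material differs.

```python
import math

H = 6.62607015e-34      # Planck constant, J*s
M_E = 9.1093837015e-31  # free-electron rest mass, kg
EV = 1.602176634e-19    # joules per electron-volt


def de_broglie_wavelength(energy_ev: float, mass_ratio: float = 1.0) -> float:
    """Wavelength h / sqrt(2 m E) in meters, for an electron of effective
    mass mass_ratio * M_E and kinetic energy energy_ev (in eV)."""
    momentum = math.sqrt(2.0 * mass_ratio * M_E * energy_ev * EV)
    return H / momentum


for barrier_ev in (0.1, 1.0, 4.0):
    print(f"E = {barrier_ev:4.1f} eV -> lambda = "
          f"{de_broglie_wavelength(barrier_ev) * 1e9:.2f} nm")
```

Because the wavelength falls only as the inverse square root of the barrier height, raising the barrier fourfold merely halves the minimum barrier width.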
Material Limits
• Carrier mobility (carrier velocity/field strength)
– Affects carrier velocity, on-current, transition time
– 6x higher in GaAs than in Si, but only at low field
• Carrier saturation velocity (max velocity)
– Nearly equal for Si and GaAs.
– Velocity maxes out @ ~100 nm/ps
– Occurs @ ~1-10 V/μm in Si (depends on doping)
• Breakdown field strength Ec
– 33% higher in GaAs than Si
• Thermal conductivity – next slide
• Dielectric constants – slide after
Thermal Conductivity
• For a given (device+heat-sink) structure, $P \propto K\,\Delta T$
– P - rate of heat removal (power)
– K - thermal conductivity of materials used
– ΔT - how much hotter is device than its surroundings
• K is 3x lower in GaAs than in Si
– Implies that GaAs is 3x slower than Si when speed is limited by
conductive cooling through substrate (often true)!
• Highest known K: Diamond!
– K ≈ 2 kW/(m·K), 14 times higher than Silicon!
– Can be a semiconductor if Boron-doped, or an insulator if not.
• Also has high mobility, high breakdown voltage, & good tolerance for
high-temperature operation.
– NTT recently demonstrated a diamond semiconductor capable
of 81 GHz frequencies in analog applications.
• Apollo Diamond in Massachusetts is developing a cheap manufacturing
capability for single-crystal diamond wafers using CVD.
Dielectric Constants
• Dielectric constants $\kappa = \varepsilon/\varepsilon_0 = C/C_0$. SiO2: $\kappa \approx 4$
– Want high κ in thin gate dielectrics,
• To maximize channel surface-charge density, & thus on-current, for given VG,on,
• But avoid very low thickness w. high tunneling leakage.
• But, material must also be an insulator! (SrTiO3: κ = 310!)
– Want low κ for thick interconnect (“field”)
insulators
• To minimize parasitic C and delay of interconnects
• Lowest κ possible is that of vacuum (1). Air is close.
Some Device Limits
• MOSFET channel length
– Generally, the lower, the better!
• Reduces load capacitance & thus load charging time.
– But, lengths are lower-bounded by the following:
• Manufacturing limits, such as lithography wavelengths.
• Supply voltage lower-limits to keep a decent Ion/Ioff.
• Depletion region thickness due to dopant density limits.
• Yield, in the face of threshold variation due to statistical
fluctuation in dopant concentrations.
• Source-to-drain tunneling.
• Distributed RC network response time
– Limited by:
• Resistivity ρ of wires (e.g. the recent shift from Al to Cu)
• Dielectric constant κ of insulators (at most, 4x less than SiO2 is possible)
• Widths, lengths of wires: limited by basic geometry
Circuit Limits
• Power supply voltage limits (later)
• Switching energy limits (later)
• Gate delays:
– Fundamentally limited by transistor characteristics, RC
network charging times
• each of which is limited as per previous slide
– There is a fastest possible logic gate in any given
device technology
• esp. considering it has to be switched by similar gates
– Static CMOS & its close relatives (precharged domino,
NORA) are probably close to the fastest-possible gates
using CMOS transistors in a given tech. generation.
System Limits
We’ll discuss these more later in the course…
• Architectural limits
• Power dissipation
• Heat removal capability of packaging
• Cycle time requirements
• Physical size
Design & Design-Verification Limits
• Increasing complexity (# of devices/chip) leads
to continual new challenges in:
– Design organization
• modularity vs. efficiency
– Automatic circuit synthesis & layout
• circuit optimization
– Design verification
• layout-vs-schematic
• logic-level simulation
• analog (e.g. SPICE) modeling
– Testing and design-for-testability
• test coverage
Manufacturing Limits
See the ITRS ‘01 roadmap for these.
• Lithography resolution, tools
• Dopant implantation techniques
• Process changes for new device structures
• Assembly & packaging
• Yield enhancement
• Environmental / safety / health considerations
• Metrology (measurement)
• Product cost & factory cost
“Red brick wall” could be reached as early as 2006! --ITRS ‘03
Possible Endpoints for Electronics
• Merkle’s minimal “quantum FET”
• Mesoscale nanoelectronic devices based on
metal or semiconductor “islands”
– E.g. Single-electron transistors, quantum dots,
resonant tunneling transistors.
• Various organic molecular electronic devices
– diodes, transistors
• Inorganic atomic-scale devices
– 1-atom-wide chains of conductor/semiconductor
atoms precisely positioned on/in substrates
• Also discuss: Superconducting devices
Energy Limits in Electronics
• Origin of CV2/2 switching energy dissipation
• Thermal reliability bounds on CV2 scaling
– Voltage limits
– Capacitance limits
• Leakage trends in MOSFETs
Limit on Switching Energy
• Consider temporarily connecting a single
unknown bit to ground.
– Average dissipation is 1/4 CV2.
– At least T log 2 average dissipation is required to
erase a bit by Landauer’s principle.
– Therefore, $CV^2 \ge 4T\log 2 = 4k_BT\ln 2$.
[Figure: a node holding an unknown bit (0/1?, entropy log 2) is discharged to 0 (entropy log 1 = 0), dissipating CV2/4 on average]
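The switching-energy bound can be given numbers with a short calculation. This sketch assumes room temperature (300 K) and a hypothetical 1 V logic swing; neither value comes from the slide, and the function names are illustrative.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def min_cv2(temperature_k: float) -> float:
    """Lower bound on C*V^2 in joules: 4 * k_B * T * ln 2."""
    return 4.0 * K_B * temperature_k * math.log(2.0)


def min_capacitance(voltage: float, temperature_k: float) -> float:
    """Smallest node capacitance (farads) meeting the bound at a given swing."""
    return min_cv2(temperature_k) / voltage**2


bound = min_cv2(300.0)             # assumed room temperature
c_min = min_capacitance(1.0, 300.0)  # assumed 1 V swing

print(f"C*V^2 >= {bound:.3e} J at 300 K")
print(f"at V = 1 V: C >= {c_min * 1e18:.4f} aF")
```

The bound is far below the capacitance of practical nodes, which is why the reliability argument on the following slides gives the tighter constraint.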
Reliability w. Thermal Noise
• Consider N logic nodes, 1 of which is high.
– Don’t know which: Entropy = log N.
• Then, connect them all to ground temporarily.
– Want them all to be 0, with high probability.
– Logical entropy is now 0.
• Log N entropy must be exported elsewhere.
• Requires T log N expenditure of energy.
– But, only ½CV2 energy was dissipated!
• So, to reliably do N arbitrary irreversible bit
operations requires at least $\tfrac{1}{2}CV^2 \ge T\log N = k_BT\ln N$
energy per logic node.
Illustration of Scenario
[Figure: N logic nodes, exactly one of them high (entropy log N), are all discharged to 0 (entropy 0), dissipating CV2/2; hence ½CV2 ≥ T log N]
Thermal Capacitance
• What is the minimum entropy generation for a
structure of given capacitance C?
– Consider minimal node voltage V = (ln R)φT
• Needed to get desired on/off ratio of R.
• Let the thermal capacitance $C_T :\equiv q_e/\phi_T$.
– At room temperature CT = 6 aF.
• Then we can derive an expression for minimum
entropy generation for our structure:
S  ½(log N) C/CT
• This implies that $C \ge 2(\ln N)\,C_T$ at minimum V.
Voltage Bounds for Reliability
• Suppose we are stuck with a given C. Then the
minimum voltage that we can tolerate is
$V \ge \phi_T\sqrt{2\,(C_T/C)\,\ln N}$
– One implication: If some nodes have C less than
thermal capacitance, then voltages cannot actually
approach the thermal voltage.
• Other lower bounds on node voltages:
$V \gg \phi_T$ - to switch FETs strongly on & off
V >> VT - to avoid defects due to threshold variation
In Particular Generations
• Year 2001 technology, aggressive low-power:
– 9 knats per transistor-switching op
• Year 2012 projection:
– 2 knats
– 30x what’s needed for $10^{27}$ reliability (ln N=60)
• 1e9 nodes lasting 1e9 seconds at 1e9 hertz w/o error
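The reliability target and the quoted margin can be checked with simple arithmetic. In this sketch the 300 K temperature used to convert nats to joules is an assumption, not a value from the slide.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

# Reliability target from the slide: nodes x seconds x hertz without error.
total_ops = 1e9 * 1e9 * 1e9
ln_n = math.log(total_ops)

projected_nats = 2000.0  # "2 knats" per transistor-switching op (2012 projection)

margin_vs_slide = projected_nats / 60.0  # using the slide's rounded ln N = 60
margin_vs_exact = projected_nats / ln_n

# Energy equivalent at an assumed 300 K: each nat corresponds to k_B * T.
energy_300k = projected_nats * K_B * 300.0

print(f"ln N = {ln_n:.2f}")
print(f"margin: {margin_vs_slide:.1f}x (rounded), {margin_vs_exact:.1f}x (exact)")
print(f"2 knats at 300 K = {energy_300k:.2e} J")
```

Both margins land a little above 30, consistent with the "30x" figure on the slide.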