
10/27: Lecture Topics
• Survey results
• Current Architectural Trends
• Operating Systems Intro
– What is an OS?
– Issues in operating systems
Superscalar Pipelines
• Superscalar pipelines can execute
multiple instructions at once
– 2+ instructions in any stage of the pipeline
• Some processors allow 8 instructions to
be issued at once
• Most programs can only take advantage
of 1 or 2 issue slots
Out-of-Order Execution
• Allows you to execute any instruction
that you can
• Enables more issue slots to be filled
• Often out-of-order execution, but in-order commit
– that is, write back results in the order they
should have occurred
• Note: IA-64 is in-order
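To make the "more issue slots filled" point concrete, here is a minimal sketch of a toy issue model. It is a hypothetical illustration, not any real CPU: the four-instruction program, the 3-cycle load latency, and the 2-wide issue width are all assumptions, and it ignores commit order and register reuse (as if registers were renamed).

```python
# Toy issue model (hypothetical, not any real CPU): each instruction is
# (name, dest_register, source_registers, latency_in_cycles).
PROGRAM = [
    ("lw  r1", "r1", ("r10",), 3),
    ("add r2", "r2", ("r1", "r1"), 1),   # needs the first load
    ("lw  r3", "r3", ("r10",), 3),
    ("add r4", "r4", ("r3", "r3"), 1),   # needs the second load
]


def cycles_to_issue(program, width, out_of_order):
    """Count cycles until every instruction has issued."""
    # For each instruction, find the earlier instructions that produce its inputs.
    last_writer = {}
    producers = []
    for i, (_, dest, srcs, _) in enumerate(program):
        producers.append([last_writer[s] for s in srcs if s in last_writer])
        last_writer[dest] = i

    done_at = {}                      # instruction index -> cycle its result is ready
    pending = list(range(len(program)))
    cycle = 0
    while pending:
        issued = []
        for i in pending:
            if len(issued) == width:
                break                 # all issue slots used this cycle
            ready = all(p in done_at and done_at[p] <= cycle for p in producers[i])
            if ready:
                issued.append(i)
            elif not out_of_order:
                break                 # in-order: a stalled instruction blocks the rest
        for i in issued:
            done_at[i] = cycle + program[i][3]
        pending = [i for i in pending if i not in issued]
        cycle += 1
    return cycle


in_order = cycles_to_issue(PROGRAM, width=2, out_of_order=False)
reordered = cycles_to_issue(PROGRAM, width=2, out_of_order=True)
print(in_order, reordered)
```

In this toy program the in-order machine needs 7 cycles to issue everything, because the second load waits behind the stalled add. The out-of-order machine needs 4, because both loads issue together in the first cycle.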
Longer Pipelines
• Pipelines are getting longer
– original RISC pipelines had 5 stages
– pipelines now have up to 20 stages
• Allows the clock cycle to be very fast
• Okay as long as you can accurately
predict branches (or get rid of them)
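A rough back-of-the-envelope sketch of why long pipelines depend on branch prediction. The model is a simplification, and the numbers are assumptions rather than measurements: 20% of instructions are branches, and a misprediction wastes about one cycle per pipeline stage.

```python
def effective_cpi(branch_fraction, accuracy, flush_penalty, base_cpi=1.0):
    """Average cycles per instruction when every mispredicted branch
    wastes flush_penalty cycles (simplified model)."""
    return base_cpi + branch_fraction * (1.0 - accuracy) * flush_penalty


# Assumed workload: 20% of instructions are branches.
for depth in (5, 20):
    for accuracy in (0.80, 0.95):
        cpi = effective_cpi(0.20, accuracy, flush_penalty=depth)
        print(f"{depth:2d} stages, {accuracy:.0%} accurate: CPI ~ {cpi:.2f}")
```

Under these assumptions a 20-stage pipeline with 80% accurate prediction averages about 1.8 cycles per instruction, while 95% accuracy brings it down to about 1.2.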
Speculation
• Prediction
– better branch predictors (95% accurate)
– predict many levels of branches
– predict variable values
– predict load addresses
• Simultaneously execute both paths of a branch
• Execute instructions even if there could be a
dependency
– sw after lw could be the same address, but probably
not
– let the sw execute and then fix it if you were wrong
Predicated Execution
• Predicated execution allows conditional
moves and conditional adds instead of
only conditional branches
• Avoids branches, which are bad because
pipelines are so long
• In IA-64, almost everything is predicated
(many 1-bit predicate registers)
• HW problem with movn and movz was an
example of this
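As an illustration of the conditional moves mentioned above, here is a small sketch that models the MIPS movz and movn semantics in Python and uses them to compute a maximum without a conditional branch in the modeled instruction sequence. The helper names and the max example are assumptions for illustration. Python's own conditional expression stands in for what the hardware does in a single instruction.

```python
def movz(rd, rs, rt):
    """movz rd, rs, rt: rd becomes rs if rt == 0, otherwise rd is unchanged."""
    return rs if rt == 0 else rd


def movn(rd, rs, rt):
    """movn rd, rs, rt: rd becomes rs if rt != 0, otherwise rd is unchanged."""
    return rs if rt != 0 else rd


def slt(rs, rt):
    """slt: set to 1 if rs < rt, else 0."""
    return 1 if rs < rt else 0


def max_without_branch(a, b):
    # Equivalent instruction sequence (register names are illustrative):
    #   slt  t0, a, b      # t0 = 1 when a < b
    #   move v0, a         # assume a is the larger value
    #   movn v0, b, t0     # overwrite with b only if t0 != 0
    t0 = slt(a, b)
    v0 = a
    v0 = movn(v0, b, t0)
    return v0


print(max_without_branch(3, 7), max_without_branch(7, 3))
```

Every instruction in the sequence always executes, so there is no branch to predict and nothing to flush from the pipeline.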
VLIW
• Long Instruction Words (LIW) and Very
Long Instruction Words (VLIW)
– each instruction contains multiple smaller
instructions that execute in parallel
– (V)LIW instructions can be 128 to 1024
bits long and contain 3 to 16 instructions
• It's the compiler's job to find
independent instructions to execute
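A minimal sketch of the compiler's packing job, under simplifying assumptions: every register is written at most once, all instructions take one cycle, and any instruction can go in any slot. The greedy strategy and the five-instruction program are hypothetical. Real VLIW compilers use far more elaborate scheduling.

```python
def pack_bundles(program, slots):
    """Greedily pack (name, dest, srcs) instructions into bundles of at most
    `slots` instructions, so that no instruction shares a bundle with, or
    precedes, the instruction that produces one of its inputs."""
    bundle_of = {}   # dest register -> index of the bundle that produces it
    bundles = []
    for name, dest, srcs in program:
        # Must come after every bundle that produces one of our inputs.
        earliest = max((bundle_of[s] + 1 for s in srcs if s in bundle_of), default=0)
        b = earliest
        while b < len(bundles) and len(bundles[b]) >= slots:
            b += 1                       # bundle is full, try the next one
        while len(bundles) <= b:
            bundles.append([])           # open new bundles as needed
        bundles[b].append(name)
        bundle_of[dest] = b
    return bundles


VLIW_PROGRAM = [
    ("mul1", "t0", ("a", "b")),
    ("mul2", "t1", ("c", "d")),
    ("add",  "t2", ("t0", "t1")),   # depends on both multiplies
    ("sub",  "t3", ("a", "c")),     # independent of everything above
    ("neg",  "t4", ("t2",)),        # depends on add
]

packed = pack_bundles(VLIW_PROGRAM, slots=3)
print(packed)
```

With three slots per bundle, this program packs into three bundles: the two multiplies and the subtract together, then the add, then the negate. Four of the nine slots stay empty because the dependency chain leaves nothing independent to put in them.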
Register Windows
• Saving registers on the stack during
procedure calls hurts performance
• Register windows use a stack of
registers that are allocated to a
procedure as it needs them
Procedure   Local Name   Actual Name
            ...          ...
                         r76
                         r75
                         r74
                         r73
Baz()       t2           r72
            t1           r71
            t0           r70
Bar()       t1           r69
            t0           r68
Foo()       t2           r67
            t1           r66
            t0           r65
            ...          ...
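The mapping above (Foo at r65 to r67, Bar at r68 to r69, Baz at r70 to r72) can be sketched as a simple allocator. This is a hypothetical model, assuming Foo calls Bar and Bar calls Baz. The class and method names are made up for illustration, and it leaves out what happens when the physical registers run out.

```python
class RegisterWindows:
    """Toy model: each call gets a fresh block of physical registers."""

    def __init__(self, base):
        self.next_free = base   # number of the next unallocated physical register
        self.frames = []        # one (procedure, first_register, size) per active call

    def call(self, procedure, n_locals):
        """Allocate n_locals registers to the called procedure."""
        self.frames.append((procedure, self.next_free, n_locals))
        self.next_free += n_locals

    def ret(self):
        """Release the registers of the procedure that is returning."""
        _, first, _ = self.frames.pop()
        self.next_free = first

    def physical(self, local_index):
        """Translate local name t<local_index> of the running procedure."""
        _, first, size = self.frames[-1]
        if not 0 <= local_index < size:
            raise IndexError("procedure has no such local register")
        return f"r{first + local_index}"


rw = RegisterWindows(base=65)
rw.call("Foo", 3)
foo_t0 = rw.physical(0)     # Foo's t0
rw.call("Bar", 2)
bar_t1 = rw.physical(1)     # Bar's t1
rw.call("Baz", 3)
baz_t2 = rw.physical(2)     # Baz's t2
rw.ret()                    # Baz returns, Bar is running again
bar_t0 = rw.physical(0)
print(foo_t0, bar_t1, baz_t2, bar_t0)
```

No register is copied to memory on a call or return. Only the translation from local name to actual register changes, which is where the performance gain comes from.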
Smarter Compilers
• VLIW requires good compilers
• Predicated execution and speculation need
help from the compiler
• Old architectures had instructions to emulate
high-level constructs (bad)
• New architectures provide many general
instructions and instruction options
• IA-64 will keep compiler writers busy for a
decade
Multiple CPUs on a Chip
• Chip multiprocessors
– multiple simple CPUs, but share a cache
– can run multiple programs simultaneously
– single programs are no faster
– like a multiprocessor machine but cheaper
• Simultaneous Multithreading (SMT)
– more complex CPUs
– like chip multiprocessors + superscalar + out-of-order
– also improves single program performance
– developed at UW
– memory bandwidth is an issue for both
Funky Hardware on a Chip
• We can squeeze more and more transistors
on a chip
• What do we do with them?
• Bigger caches (boring)
• Put programmable hardware on the CPU
– FPGAs can be (re)programmed quickly
– hardware runs 1000X faster than software
• Graphics specific hardware
• Instruction Co-Processors
• Simultaneously run two copies of all
programs to avoid hardware glitches
Low Power
• CPUs are being put in everything, even
devices that have very small batteries
(tiny sensors)
• Need to make CPUs that use very little
power (only as much as they need)
– reduce the CPU clock frequency
– allow the OS to turn off part of the chip
• Transmeta is building chips that
emulate Intel x86, but with less power
Time to Market
• It used to be solely about being the
fastest
• Now being adequate is enough
• Being the first technology to fill a need
is the most important