Crusoe processor (Transmeta)
TM5400/5600
TM5500/5800
TM6000
Jason Law
Byeong Kil Lee
Outline
• Crusoe technology
• Crusoe processors / architecture
• Code morphing software
• Crusoe hardware support for code morphing
• LongRun power management
• Performance comparison
• Conclusion
Crusoe Technology
• Crusoe processor = Software + hardware
Code Morphing software
• Dynamically translates x86 instructions into VLIW instructions
• Provides x86 compatibility
• Optimization and scheduling by software
VLIW hardware
• 128-bit Very Long Instruction Word (VLIW) processor
• Simple and fast
• Fewer transistors
[Figure: Crusoe VLIW diagram; labels: low power, x86 compatibility, PC performance, 1/4, 3/4]
Crusoe Processors
L1 cache: 128 KB
DDR SDRAM (100 to 133 MHz)
SDRAM (66 to 133 MHz)
Features
• Lighter
• Longer
• Cooler
• x86 compatibility (Windows / Linux)
• Upgradeable (by software)
• Lower cost
• MMX support (no support for SSE / 3DNow!)
• Target: ultra-light mobile notebooks, internet appliances, high-density servers, embedded devices
• Products: Sony, Fujitsu, NEC, RLX Technologies, ….
Crusoe Architecture
TM5800
Cont.
• VLIW CPU : executing up to 4 operations in each cycle
– Molecule: long instruction word (128-bit molecule)
– All atoms within a molecule are executed in parallel; molecules are executed in order
• 2 ALUs, 1 FP unit, 1 load/store unit, 1 branch unit
• In-order 7-stage integer/10-stage FP pipeline
• 64 integer registers, 32 FP registers
Crusoe vs. x86
• In the figure, blue is silicon and yellow is software
• Crusoe's blue (silicon) part is smaller
• All of that hardware was moved off the die and into software
Code Morphing Software
A dynamic translation system that resides in ROM and is the first program to start executing at boot
• Drawing the H/W and S/W line
– Software: decoding x86 instructions and generating parallel molecule
– Hardware: execute using a simple, high-speed VLIW engine
• Decoding and scheduling
– Translation cache: CMS translates instructions once and saves the resulting translation for re-use, so translation is skipped the next time the same code executes
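The translate-once behaviour can be illustrated with a minimal sketch. The class name `TranslationCache`, the `toy_translate` stand-in, and the string "molecules" are all hypothetical; the real CMS data structures are not described in these slides.

```python
from typing import Callable, Dict, List


class TranslationCache:
    """Toy translate-once cache keyed by the x86 address of a code block."""

    def __init__(self, translate: Callable[[int], List[str]]) -> None:
        self._translate = translate          # hypothetical translator callback
        self._cache: Dict[int, List[str]] = {}
        self.translations = 0                # how many times we really translated

    def lookup(self, x86_address: int) -> List[str]:
        molecules = self._cache.get(x86_address)
        if molecules is None:
            # First encounter: pay the translation cost and remember the result.
            molecules = self._translate(x86_address)
            self._cache[x86_address] = molecules
            self.translations += 1
        # Later encounters reuse the saved translation.
        return molecules


def toy_translate(x86_address: int) -> List[str]:
    """Stand-in for the real translator: returns a fake list of molecules."""
    return [f"molecule_for_{x86_address:#x}"]


cache = TranslationCache(toy_translate)
first = cache.lookup(0x1000)
second = cache.lookup(0x1000)   # served from the cache, no second translation
```

After both lookups, `cache.translations` is 1 and `second` is the same object as `first`.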
Code Morphing Software
Caching
• Translation cache :
– Resides in a separate memory space
– The size can be set at boot time, or the OS can adjust it
• Crusoe's CMS monitors actual execution
– Keeps track of which blocks of code execute most often and optimizes them accordingly
– Keeps track of which branches are most often taken and annotates the code accordingly
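The monitoring described above amounts to keeping counters per block and per branch. A minimal sketch, assuming a hypothetical `ExecutionProfile` class (the actual CMS bookkeeping is not given in the slides):

```python
from collections import Counter
from typing import List


class ExecutionProfile:
    """Toy profile: counts block executions and branch outcomes."""

    def __init__(self) -> None:
        self.block_counts: Counter = Counter()
        self.branch_taken: Counter = Counter()
        self.branch_total: Counter = Counter()

    def record_block(self, address: int) -> None:
        self.block_counts[address] += 1

    def record_branch(self, address: int, taken: bool) -> None:
        self.branch_total[address] += 1
        if taken:
            self.branch_taken[address] += 1

    def hottest_blocks(self, count: int) -> List[int]:
        """Addresses of the most frequently executed blocks, hottest first."""
        return [address for address, _ in self.block_counts.most_common(count)]

    def taken_fraction(self, address: int) -> float:
        """Fraction of recorded executions in which the branch was taken."""
        total = self.branch_total[address]
        return self.branch_taken[address] / total if total else 0.0


profile = ExecutionProfile()
for block in (0x2000, 0x1000, 0x2000, 0x2000):
    profile.record_block(block)
for outcome in (True, True, True, False):
    profile.record_branch(0x2004, outcome)

hot = profile.hottest_blocks(1)
bias = profile.taken_fraction(0x2004)
```

Here `hot` holds the block at 0x2000 and `bias` is 0.75; an optimizer would use such numbers to decide what to re-translate and how to annotate branches.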
Code Morphing Software
Filtering & Prediction
• Filtering : a wide choice of execution modes for x86 code
– Interpretation (no translation overhead),
– Translation,
– Highly optimized translation (takes longest to generate, but runs faster once translated)
• Prediction
– Highly biased branch: favor the frequently taken path
– Otherwise: execute both paths and select the result later
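A minimal sketch of how filtering and prediction decisions could be driven by execution counts. The thresholds `INTERPRET_LIMIT`, `OPTIMIZE_LIMIT`, and `BIAS_THRESHOLD` are made-up values for illustration; the slides do not state what CMS actually uses.

```python
INTERPRET_LIMIT = 50     # assumed: below this, translation is not worth its cost
OPTIMIZE_LIMIT = 1000    # assumed: above this, spend time on full optimization
BIAS_THRESHOLD = 0.9     # assumed: a branch this one-sided counts as highly biased


def choose_mode(execution_count: int) -> str:
    """Pick an execution mode for a block based on how often it has run."""
    if execution_count < INTERPRET_LIMIT:
        return "interpret"
    if execution_count < OPTIMIZE_LIMIT:
        return "translate"
    return "optimize"


def branch_strategy(taken: int, not_taken: int) -> str:
    """Favor one path for a highly biased branch, otherwise execute both."""
    total = taken + not_taken
    if total == 0:
        return "execute-both-select-later"
    if max(taken, not_taken) / total >= BIAS_THRESHOLD:
        return "favor-taken" if taken > not_taken else "favor-not-taken"
    return "execute-both-select-later"
```

With these assumed thresholds, `choose_mode(3)` gives "interpret", `choose_mode(5000)` gives "optimize", and `branch_strategy(95, 5)` gives "favor-taken".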
Code Morphing Software
Translation Process
• 1st pass (frontend)
– Translates the x86 instructions into a simple sequence of atoms (using temporary registers)
• 2nd pass (optimizer)
– Applies well-known compiler optimizations: common subexpression elimination, loop invariant removal, dead code elimination
• 3rd pass (scheduler)
– Reorders the optimized atoms and groups them into individual molecules
– Because scheduling is done in software, it can use more effective scheduling algorithms and consider a larger window of instructions
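The third pass can be sketched as a greedy scheduler that packs atoms into molecules. This is an illustration only, not the CMS algorithm: the `Atom` type and register names are hypothetical, every atom is assumed to take one cycle, and each temporary register is assumed to be written once. The unit counts follow the architecture slide (2 ALU, 1 FP, 1 load/store, 1 branch; up to 4 atoms per molecule).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

UNIT_SLOTS = {"alu": 2, "fp": 1, "mem": 1, "branch": 1}
MAX_ATOMS = 4  # a molecule holds up to 4 atoms


@dataclass(frozen=True)
class Atom:
    name: str
    unit: str                        # which functional unit executes it
    dest: str                        # register written ("" if none)
    sources: Tuple[str, ...] = ()    # registers read


def schedule(atoms: List[Atom]) -> List[List[Atom]]:
    """Place each atom in the earliest molecule that respects its
    data dependences and the per-molecule unit limits."""
    molecules: List[List[Atom]] = []
    ready_at: Dict[str, int] = {}    # register -> first molecule that may read it
    for atom in atoms:
        index = max((ready_at.get(src, 0) for src in atom.sources), default=0)
        while True:
            if index == len(molecules):
                molecules.append([])
            molecule = molecules[index]
            used = sum(1 for other in molecule if other.unit == atom.unit)
            if len(molecule) < MAX_ATOMS and used < UNIT_SLOTS[atom.unit]:
                molecule.append(atom)
                break
            index += 1               # no free slot here, try the next molecule
        if atom.dest:
            ready_at[atom.dest] = index + 1
    return molecules


atoms = [
    Atom("ld", "mem", "t1", ("esp",)),
    Atom("add1", "alu", "t2", ("eax", "t1")),   # needs the loaded value
    Atom("add2", "alu", "t3", ("ebx",)),
    Atom("sub", "alu", "t4", ("ecx",)),
    Atom("and", "alu", "t5", ("edx",)),
]
packed = [[atom.name for atom in molecule] for molecule in schedule(atoms)]
```

For this input the independent `add2` and `sub` are hoisted next to the load, while `add1` waits one molecule for `t1` and `and` spills over because both ALU slots of the first molecule are full.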
Advantages of the Code Morphing Software
Traditional x86 Processors
• Translate each x86 instruction every time it is encountered
• Full of complex, power-hungry transistors
Crusoe Processor with Code Morphing software
• Translates instructions once, saving the resultant translation in a cache for re-use
• Much of the processor functionality is implemented in software
– Fewer logic transistors, less power
– Uses effective optimization/scheduling algorithms
– Uses a larger window of instructions
– …
Crusoe Hardware Support for Code Morphing
Crusoe hardware has been designed specifically with dynamic translation in mind.
• Crusoe's solution for exceptions
– All registers holding x86 state are shadowed (two copies of each register: a working copy and a shadow copy)
– Normal atoms update only the working copy of a register
i) If no exception is encountered: a "commit" operation copies all working registers into the shadow registers
ii) If an exception occurs: a "rollback" operation copies the shadow register values back into the working registers
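The commit/rollback scheme can be modelled in a few lines. This is a software sketch of the idea only; the class name `ShadowedRegisterFile` is hypothetical, and the real mechanism is implemented in hardware.

```python
from typing import Dict, Iterable


class ShadowedRegisterFile:
    """Toy model: each x86 register has a working copy and a shadow copy."""

    def __init__(self, names: Iterable[str]) -> None:
        self.working: Dict[str, int] = {name: 0 for name in names}
        self.shadow: Dict[str, int] = dict(self.working)

    def write(self, name: str, value: int) -> None:
        # Normal atoms touch only the working copy.
        self.working[name] = value

    def commit(self) -> None:
        # No exception occurred: working state becomes the official x86 state.
        self.shadow = dict(self.working)

    def rollback(self) -> None:
        # An exception occurred: discard everything since the last commit.
        self.working = dict(self.shadow)


regs = ShadowedRegisterFile(["eax", "ebx"])
regs.write("eax", 5)
regs.commit()
regs.write("eax", 99)   # speculative updates inside a translation...
regs.write("ebx", 7)
regs.rollback()         # ...are undone when an exception is raised
```

After the rollback, `regs.working` is back to the last committed state, with `eax` equal to 5 and `ebx` equal to 0.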
Cont.
• Store operations are controlled by holding store data in a "gated store buffer"
– Stores are released to the memory system only at the time of a commit
– On a rollback, stores not yet committed are dropped from the store buffer
• Safe reordering of loads ahead of stores (alias hardware)
– The load becomes a "load-and-protect" (records the address and size of the loaded data)
– The store becomes a "store-under-alias-mask" (checks for protected regions)
* If the store would overwrite previously loaded data, the processor raises an exception and the runtime system can take corrective action.
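Both mechanisms can be sketched together in software. Everything here is a simplified, hypothetical model: the names `SpeculativeMemory` and `AliasViolation` are invented, memory is a dictionary of addresses to values, and clearing the protected regions on commit and rollback is an assumption rather than something the slides state.

```python
from typing import Dict, List, Tuple


class AliasViolation(Exception):
    """Raised when a store overlaps a region protected by an earlier load."""


class SpeculativeMemory:
    def __init__(self) -> None:
        self.memory: Dict[int, int] = {}            # committed memory
        self.pending: List[Tuple[int, int]] = []    # gated store buffer
        self.protected: List[Tuple[int, int]] = []  # (address, size) regions

    def load_and_protect(self, address: int, size: int) -> int:
        self.protected.append((address, size))
        # Newest pending store to this address wins over committed memory.
        for pending_address, value in reversed(self.pending):
            if pending_address == address:
                return value
        return self.memory.get(address, 0)

    def store_under_alias_mask(self, address: int, size: int, value: int) -> None:
        for start, length in self.protected:
            if address < start + length and start < address + size:
                raise AliasViolation(hex(address))
        self.pending.append((address, value))       # held back until commit

    def commit(self) -> None:
        for address, value in self.pending:
            self.memory[address] = value
        self.pending.clear()
        self.protected.clear()                      # assumption: reset on commit

    def rollback(self) -> None:
        self.pending.clear()                        # uncommitted stores are dropped
        self.protected.clear()


mem = SpeculativeMemory()
mem.store_under_alias_mask(0x100, 4, 42)
mem.rollback()                        # the store never reaches memory
mem.store_under_alias_mask(0x100, 4, 7)
mem.commit()                          # now it does

mem.load_and_protect(0x200, 4)        # load hoisted above a later store
try:
    mem.store_under_alias_mask(0x202, 4, 1)   # overlaps the protected bytes
    violation_detected = False
except AliasViolation:
    violation_detected = True
```

In this run only the committed store reaches `mem.memory`, and the overlapping store sets `violation_detected` to true, which is the point where a runtime system would roll back and re-execute more conservatively.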
Sample Translation Code
[Figure: x86 instructions and the translated VLIW molecules; the example uses 2 integer ALU atoms in a molecule]
LongRun Power Management
• Crusoe was designed for good performance at very
low power
• Power = 1/2 · C · V² · F
• Reduce transistor count to decrease capacitance
• Scale voltage and frequency dynamically to give just
enough performance for current workload
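As a worked example of the power relation on this slide, the sketch below compares two operating points. The capacitance, voltages, and frequencies are hypothetical numbers chosen for illustration, not Transmeta specifications.

```python
def dynamic_power(capacitance: float, voltage: float, frequency: float) -> float:
    """Dynamic power in watts: 1/2 * C * V^2 * F (farads, volts, hertz)."""
    return 0.5 * capacitance * voltage ** 2 * frequency


# Hypothetical operating points, for illustration only.
CAPACITANCE = 1e-9                      # assumed switched capacitance in farads
full_speed = dynamic_power(CAPACITANCE, 1.6, 700e6)
scaled_down = dynamic_power(CAPACITANCE, 1.2, 500e6)

power_ratio = scaled_down / full_speed  # roughly 0.40 for these inputs
```

Under these assumed values, dropping frequency by about 29% and voltage by 25% cuts dynamic power by roughly 60%, because voltage enters the formula squared.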
LongRun Power Management
Dynamic Power Management
• Frequency changes in steps of 33 MHz
• Voltage changes in steps of 25 mV
• Supports up to 200 frequency/voltage changes per
second
• Can give cubic reductions in power consumption
– Reduces V² and F together
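A minimal sketch of the stepping idea and of where the cubic reduction comes from. The 33 MHz step is taken from the slide; the frequency range passed in the example, the function names, and the assumption that voltage can be lowered in proportion to frequency are all hypothetical.

```python
import math

FREQ_STEP_MHZ = 33   # frequency step size from the slide


def select_frequency(demand_mhz: float, min_mhz: int, max_mhz: int) -> int:
    """Lowest frequency, in 33 MHz steps above min_mhz, that covers the demand."""
    steps = math.ceil(max(demand_mhz - min_mhz, 0) / FREQ_STEP_MHZ)
    return min(min_mhz + steps * FREQ_STEP_MHZ, max_mhz)


def relative_power(scale: float) -> float:
    """Power relative to full speed when V and F are both scaled by `scale`.

    Follows from P = 1/2 * C * V^2 * F: scale^2 from V^2 times scale from F.
    """
    return scale ** 3


# Hypothetical 300-700 MHz range, for illustration only.
light_load = select_frequency(350, 300, 700)
heavy_load = select_frequency(1000, 300, 700)
half_speed_power = relative_power(0.5)
```

With the assumed range, a 350 MHz demand is served at 366 MHz, an excessive demand is clamped to 700 MHz, and running at half speed and half voltage would need one eighth of the full-speed dynamic power.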
LongRun Power Management
Conventional Power Profile
LongRun Power Management
LongRun Power Profile
LongRun Power Management
ACPI Standard
• ACPI - Advanced Configuration and Power Interface
– joint standard of Microsoft, Intel, and Toshiba
• System level technique to reduce power
• Allows three low-power states that can be alternated
– AutoHALT - processor executes the HLT instruction
• Processor stops its internal clock
– QuickStart - Southbridge gives processor STPCLK signal
• Processor maintains cache coherency
– Deep Sleep - Southbridge disables processor CLK input
• Southbridge maintains cache coherency
LongRun Power Management
ACPI vs. LongRun
LongRun Power Management
Intel SpeedStep
• Statically lowers voltage/frequency settings at startup
• Two operating points:
– AC power -- full performance
– DC (battery) power -- slightly lower performance
• Coarse granularity misses opportunities for power savings
LongRun Power Management
How LongRun Compares
Performance
The 700 MHz TM5400 was quoted as having comparable performance to a 500-550 MHz Pentium III.
Transmeta didn't offer any conventional benchmarks. Rather, it compared the power utilized on a mobile Pentium III to the power utilized on a Crusoe when completing various tasks.
It appears that Transmeta would like to dictate to the mobile industry that power is what it's all about, not speed. That is Transmeta's strong suit, but some normal benchmarks would have been nice. Why not show them? If Crusoe did well in those benchmarks, do you think Transmeta wouldn't show them? I'm convinced that the Crusoe is not performing as well as mobile AMD or Intel chips. For the markets it's aimed at, that's not too big a deal, but I'd like to know.
- From an article by Rob Hughes, Jan 20, 2000
[Figure: Relative Performance While Mobile (on Batteries), TM5800 vs. Pentium III ULV, 2001; vertical axis from 0 to 1.0]
[Figure: CPUmark99 v1.1 Comparison; CPU + Core Logic power in watts, axis from 0 to 8.0]
[Figure: Business Graphics Winmark v1.1 Comparison; CPU + Core Logic power in watts, axis from 0 to 8.0]
Conclusion
• Combination of hardware and software
• Using software
– To decompose complex instructions into simple atoms
– To schedule and optimize the atoms for parallel execution
– Saves millions of logic transistors
– Cuts power consumption (60~70%)
– Enables aggressive code optimization techniques
• LongRun power management
– Cuts power consumption by a factor of 2 to 10