RegionScout+Intelligent Checkpointing

Memory State Compressors for Gigascale Checkpoint/Restore
www.eecg.toronto.edu/aenao
Andreas Moshovos
Moshovos ©
Gigascale Checkpoint/Restore
[Figure: instruction stream spanning many instructions]
Several Potential Uses:
  Debugging
  Runtime Checking
  Reliability
  Gigascale Speculation
Key Issues & This Study
Track and Restore Memory State
I/O?
This Work: Memory State Compression
Goals:
  Minimize On-Chip Resources
  Minimize Performance Impact
Contributions:
  Used Value Prediction to simplify compression hardware
  Fast, Simple and Inexpensive
  Benefits whether used alone or not
Outline

Gigascale Checkpoint/Restore

Compressor Architecture: Challenges

Value-Prediction-Based Compressors

Evaluation
Our Approach to Gigascale CR (GCR)
[Figure: timeline with numbered steps 1 to 5. A memory block is checkpointed on its first write; a restore writes back all checkpointed memory blocks.]
Checkpoint: prior contents of the blocks that were written into
Current Memory State + Checkpoint = Previous Memory State
Checkpoints: Can be large (Mbytes) and we may want many
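The first-write rule above can be illustrated with a small software sketch. This is a toy model, not the hardware mechanism from the talk: the class name `CheckpointedMemory`, its methods, and the 16-word block size are all assumptions made for illustration.

```python
BLOCK_WORDS = 16  # assumed block size in words (hypothetical)


class CheckpointedMemory:
    """Toy memory that saves a block's old contents on its first write."""

    def __init__(self, num_blocks):
        self.blocks = [[0] * BLOCK_WORDS for _ in range(num_blocks)]
        # block index -> copy of the block as it was before its first write
        self.checkpoint = {}

    def begin_checkpoint(self):
        """Start a new checkpoint interval with an empty log."""
        self.checkpoint = {}

    def write(self, block, word, value):
        # Only the first write to a block in this interval is logged.
        if block not in self.checkpoint:
            self.checkpoint[block] = list(self.blocks[block])
        self.blocks[block][word] = value

    def restore(self):
        """Current state + checkpoint = previous state."""
        for block, old in self.checkpoint.items():
            self.blocks[block] = list(old)
        self.checkpoint = {}

    def checkpoint_size_words(self):
        return len(self.checkpoint) * BLOCK_WORDS
```

In this sketch the checkpoint grows with the number of distinct blocks written in the interval, not with the number of writes, which is why long intervals can still produce large checkpoints.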
Checkpoint Storage Requirements
[Figure: Max. Checkpoint Size in Bytes versus Checkpoint Interval in Instructions for gcc, mesa, and twolf.]
Architecture of a GCR Compressor
[Figure: L1 Data Cache, in-buffer, Compressor, Alignment Network, out-buffer, Main Memory. Labels: Size, Resources & Performance.]
Previous work: Compressor = Dictionary-Based
Relatively Slow, Complex Alignment, on the order of 10K Transistors
64K In-Buffer: ~3.7% Avg. Slowdown
Our Compression Architecture
[Figure: L1 Data Cache, in-buffer, VP stage (VP Compressor with Simple Alignment), optional Dictionary Compressor with Alignment Network, out-buffer, Main Memory.]
Standalone: ~Compression, -Resources
In Combination: -Resources (in-buffer), +Compression, +Performance
Value-Predictor-Based Compression
[Figure: each value in the input stream goes to a Value Predictor. If predicted, the output stream gets a 1; if mispredicted, it gets a 0 followed by the value.]
Example
[Figure: three steps over time. Input 0: output 0, 0. Input 22: output 0, 22. Input 22: output 1.]
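A minimal sketch of the scheme in the example, assuming a last-outcome predictor that starts empty so the first value always mispredicts. The function names, the token representation, and the 32-bit word size are assumptions for illustration, not details taken from the slides.

```python
def vp_compress(values):
    """Encode each value as (1,) if predicted, or (0, value) if mispredicted."""
    last = None  # assumed: predictor starts empty, so the first value mispredicts
    out = []
    for v in values:
        if v == last:
            out.append((1,))       # predicted: flag only
        else:
            out.append((0, v))     # mispredicted: flag plus the value
            last = v               # last-outcome update
    return out


def vp_decompress(tokens):
    """Rebuild the values by replaying the same predictor updates."""
    last = None
    values = []
    for tok in tokens:
        if tok[0] == 0:
            last = tok[1]
        values.append(last)
    return values


def compressed_bits(tokens, word_bits=32):
    """Size of the encoded stream, assuming a 1-bit flag and 32-bit values."""
    return sum(1 if tok[0] == 1 else 1 + word_bits for tok in tokens)
```

For the input 0, 22, 22 this yields `[(0, 0), (0, 22), (1,)]`, which under the assumed sizes is 67 bits instead of 96. The decompressor needs no side information because it maintains an identical predictor.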
Block VP-Based Compressor
[Figure: the input stream is a cache block and its address; word 0 through word 15 each go through their own single-entry value predictor (VP). The output stream is a header (one word) of per-word 0/1 flags followed by the values of the mispredicted words. Half-word alignment.]
Shown is Last-Outcome Predictor
Studied Others (four combinations per word)
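The block-level variant can be sketched as follows, assuming 16 words per block, one last-outcome predictor per word position, and a header word whose low 16 bits are the per-word flags. The class names and the byte layout are hypothetical; the block address and the half-word alignment shown on the slide are left out of this sketch.

```python
import struct

WORDS_PER_BLOCK = 16  # word 0 through word 15, as on the slide


class BlockVPCompressor:
    def __init__(self):
        # One single-entry last-outcome predictor per word position.
        self.last = [None] * WORDS_PER_BLOCK

    def compress(self, words):
        assert len(words) == WORDS_PER_BLOCK
        header = 0
        payload = []
        for i, w in enumerate(words):
            if w == self.last[i]:
                header |= 1 << i      # bit i set: word i was predicted
            else:
                payload.append(w)     # mispredicted words follow the header
                self.last[i] = w
        # Assumed layout: little-endian 32-bit header, then 32-bit words.
        return struct.pack("<%dI" % (1 + len(payload)), header, *payload)


class BlockVPDecompressor:
    def __init__(self):
        self.last = [None] * WORDS_PER_BLOCK

    def decompress(self, data):
        header, *payload = struct.unpack("<%dI" % (len(data) // 4), data)
        remaining = iter(payload)
        words = []
        for i in range(WORDS_PER_BLOCK):
            predicted = (header >> i) & 1
            if not predicted:
                self.last[i] = next(remaining)
            words.append(self.last[i])
        return words
```

With this layout a block in which every word repeats its previous value shrinks to the header alone, while a block with no correct predictions costs one extra word.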
Evaluation
Compression Rates
  Compared with LZW
Performance
  As a function of in-buffer size
Methodology

SimpleScalar v3

SPEC CPU 2000 with reference inputs

Ignore first checkpoint to avoid artificially skewing the results

Simulated up to:
  80 billion instructions (compression rates)
  5 billion instructions (performance)

8-way OOO Superscalar

64K L1D, L1I, 1M UL2
Compression Rate vs. LZW
[Figure: compression rate per benchmark (gzip, vpr, gcc, mesa, mcf, equake, ammp, parser, gap, vortex, bzip2, twolf, AVG) for LZW-16 bits, LO, and LO+LZW. Y-axis from 0% to 100%.]
256M Instructions Checkpoint Interval
Performance Degradation
[Figure: per-benchmark bars (gzip, vpr, gcc, mesa, mcf, equake, ammp, parser, gap, vortex, bzip2, twolf, AVG) for LZW 1K, LZW 64K, and LO+LZW 1K. Y-axis from 0.88 to 1.00.]
LZW + 64K buffer = ~3.7% slowdown
LZW + LO + 1K buffer = 1.6% slowdown
Summary
Memory State Compression for Gigascale CR
Many Potential Applications
Used Simple Value-Prediction Compressors
Can be Used Alone
Can be Combined with Dictionary-based Compressors
Few Resources
Low Complexity
Fast Performance
Reduced on-chip buffering
Better Performance
Main memory compression?