RegionScout+Intelligent Checkpointing
Memory State Compressors for Gigascale Checkpoint/Restore
www.eecg.toronto.edu/aenao
Andreas Moshovos
Moshovos ©
Gigascale Checkpoint/Restore
Instruction Stream
Many instructions
Several Potential Uses:
Debugging
Runtime Checking
Reliability
Gigascale Speculation
Key Issues & This Study
Track and Restore Memory State
I/O?
This Work: Memory State Compression
Goals:
Minimize On-Chip Resources
Minimize Performance Impact
Contributions:
Used Value Prediction to simplify compression hardware
Fast, Simple and Inexpensive
Benefits whether used alone or not
Outline
Gigascale Checkpoint/Restore
Compressor Architecture: Challenges
Value-Prediction-Based Compressors
Evaluation
Our Approach to Gigascale CR (GCR)
Checkpoint memory block on first write
Restore all checkpointed memory blocks
Checkpoint: blocks that were written into
Current Memory State + Checkpoint = Previous Memory State
Checkpoints: Can be large (Mbytes) and we may want many
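The checkpoint-on-first-write idea can be illustrated with a minimal software sketch. This is not the hardware mechanism from the slides; the class name `CheckpointedMemory`, the 16-word block size, and the dictionary used as checkpoint storage are assumptions made only for illustration.

```python
BLOCK_WORDS = 16  # assumed block size in words (illustrative only)


class CheckpointedMemory:
    """Hypothetical model: memory as a list of blocks plus one open checkpoint."""

    def __init__(self, num_blocks):
        self.blocks = [[0] * BLOCK_WORDS for _ in range(num_blocks)]
        self.checkpoint = {}  # block index -> old contents of that block

    def begin_checkpoint(self):
        # Start a new checkpoint interval with an empty log.
        self.checkpoint = {}

    def write(self, block, word, value):
        # Save the old block contents only on the first write in this interval.
        if block not in self.checkpoint:
            self.checkpoint[block] = list(self.blocks[block])
        self.blocks[block][word] = value

    def restore(self):
        # Current memory state + checkpoint = previous memory state.
        for block, old in self.checkpoint.items():
            self.blocks[block] = old
        self.checkpoint = {}


mem = CheckpointedMemory(num_blocks=4)
mem.write(1, 0, 7)        # state before the checkpoint
mem.begin_checkpoint()
mem.write(1, 0, 99)       # first write to block 1: old block is logged
mem.write(1, 1, 5)        # second write to block 1: nothing new is logged
mem.write(2, 3, 42)       # first write to block 2: old block is logged
logged_blocks = len(mem.checkpoint)
mem.restore()
```

Only blocks that were written into appear in the log, so the checkpoint size grows with the number of distinct blocks touched in the interval rather than with the number of writes.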
Checkpoint Storage Requirements
[Figure: Max. Checkpoint Size in Bytes vs. Checkpoint Interval in Instructions, for gcc, mesa, and twolf]
Architecture of a GCR Compressor
[Figure components: L1 Data Cache, in-buffer, Compressor, Alignment Network, out-buffer, Main Memory; annotations: Size, Resources & Performance]
Previous work: Compressor = Dictionary-Based
Relatively slow, complex alignment, on the order of 10K transistors
64K In-Buffer: ~3.7% Avg. Slowdown
Our Compression Architecture
[Figure components: L1 Data Cache, in-buffer, VP stage (VP Compressor, Simple Alignment), optional Dictionary Compressor and Alignment Network, out-buffer, Main Memory]
Standalone: ~Compression, -Resources
In Combination: -Resources (in-buffer), +Compression, +Performance
Value-Predictor-Based Compression
Input stream: value -> Value Predictor -> Output stream
predicted: output 1
mispredicted: output 0, value
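Example (time runs top to bottom)
Input 0 -> VP -> Output: 0, 0 (mispredicted)
Input 22 -> VP -> Output: 0, 22 (mispredicted)
Input 22 -> VP -> Output: 1 (predicted)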
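The value-predictor scheme can be sketched in a few lines of Python. This is an illustrative model only: the function names `vp_compress` and `vp_decompress`, the tuple encoding of the output stream, and the cold (empty) initial predictor state are assumptions, chosen so that the input sequence 0, 22, 22 behaves as in the example.

```python
def vp_compress(values):
    """Last-outcome value-predictor compression of a stream of integers.

    Emits (1,) when the predictor was right and (0, value) when it was wrong.
    """
    last = None  # assumed cold predictor: the first value always mispredicts
    out = []
    for v in values:
        if v == last:
            out.append((1,))      # predicted: a single flag is enough
        else:
            out.append((0, v))    # mispredicted: flag plus the actual value
            last = v              # last-outcome update
    return out


def vp_decompress(tokens):
    """Rebuild the stream by running the same predictor on the decoder side."""
    last = None
    values = []
    for tok in tokens:
        if tok[0] == 0:
            last = tok[1]         # take the value carried in the stream
        values.append(last)       # otherwise reuse the predicted value
    return values


tokens = vp_compress([0, 22, 22])
restored = vp_decompress(tokens)
```

The decoder needs no side information beyond the flags because it updates its own predictor exactly as the compressor did.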
Block VP-Based Compressor
Input stream: cache block (address, word 0 ... word 15), one VP per word
Output stream: header (one word) with a predicted (1) / mispredicted (0) flag per word, followed by the values of the mispredicted words
Half-word alignment
Single-entry predictors
Shown is Last-Outcome Predictor
Studied Others (four combinations per word)
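A block-level variant can be sketched as follows, assuming 16 words per block, one single-entry last-outcome predictor per word position, and a header in which bit i is 1 when word i was predicted. The class names, the bit ordering, and the cold initial predictor state are assumptions for illustration; the block address and the half-word alignment step are left out.

```python
WORDS_PER_BLOCK = 16  # matches "word 0 ... word 15"


class BlockVPCompressor:
    """Hypothetical sketch: one last-outcome predictor per word position."""

    def __init__(self):
        self.last = [None] * WORDS_PER_BLOCK  # assumed cold predictors

    def compress(self, words):
        assert len(words) == WORDS_PER_BLOCK
        header = 0
        mispredicted = []
        for i, w in enumerate(words):
            if self.last[i] == w:
                header |= 1 << i          # predicted: set bit i, emit nothing
            else:
                mispredicted.append(w)    # mispredicted: emit the word
                self.last[i] = w
        return header, mispredicted


class BlockVPDecompressor:
    """Mirror of the compressor; keeps identical predictor state."""

    def __init__(self):
        self.last = [None] * WORDS_PER_BLOCK

    def decompress(self, header, mispredicted):
        values = iter(mispredicted)
        words = []
        for i in range(WORDS_PER_BLOCK):
            if not (header >> i) & 1:
                self.last[i] = next(values)
            words.append(self.last[i])
        return words


comp, decomp = BlockVPCompressor(), BlockVPDecompressor()
block1 = list(range(16))
block2 = list(block1)
block2[3] = 99                            # only word 3 changes
h1, m1 = comp.compress(block1)
h2, m2 = comp.compress(block2)
out1 = decomp.decompress(h1, m1)
out2 = decomp.decompress(h2, m2)
```

In this sketch the second block compresses to the header plus a single word, since 15 of its 16 words repeat the values last seen at the same positions.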
Evaluation
Compression Rates
Compared with LZW
Performance
As a function of in-buffer size
Methodology
SimpleScalar v3
SPEC CPU 2000 with reference inputs
Ignore first checkpoint to avoid artificially skewing the results
Simulated up to:
80 billion instructions (compression rates)
5 billion instructions (performance)
8-way OOO Superscalar
64K L1D, L1I, 1M UL2
Compression Rate vs. LZW
[Figure: compression rate (0% to 100%) for LZW-16 bits, LO, and LO+LZW on gzip, vpr, gcc, mesa, mcf, equake, ammp, parser, gap, vortex, bzip2, twolf, and AVG]
256M Instructions Checkpoint Interval
Performance Degradation
[Figure: y-axis from 0.88 to 1.00 for LZW 1K, LZW 64K, and LO+LZW 1K on gzip, vpr, gcc, mesa, mcf, equake, ammp, parser, gap, vortex, bzip2, twolf, and AVG]
LZW + 64K buffer = ~3.7% slowdown
LZW + LO + 1K buffer = 1.6% slowdown
Summary
Memory State Compression for Gigascale CR
Many Potential Applications
Used Simple Value-Prediction Compressors
Can be Used Alone: Few Resources, Low Complexity, Fast Performance
Can be Combined with Dictionary-based Compressors: Reduced on-chip buffering, Better Performance
Main memory compression?