Improving
Application Performance
through Swap Compression
R. Cervera, T. Cortes,
Y. Becerra and S. Lucas
Motivation
Problem
Large applications
Highly-loaded systems
Laptops
Laptops usually have fewer resources than desktops
Working at home
Home machines are usually smaller than office machines
Solution
Swapping mechanism
But … IT IS VERY SLOW !!!
Objectives
Increase swapping performance
Without adding more memory
Increase swapping space
Without adding more disk
Performance is more important than space
Modify the kernel as little as possible
Mechanism feasible in user-level libraries
Traditional Swapping Path
[Figure: processes A-D generate page faults; the operating system performs swap-ins and swap-outs directly against the swap device]
Proposal
Cache for swapped pages
If the system uses memory for the cache, the applications lose this memory
Problem: less fast memory, more slow memory
Compression of swapped pages
More than one compressed page fits in each cache buffer
Compressed Swapping v1.0
[Figure: on swap-out the operating system compresses pages into in-memory cache buffers; full buffers are written to the swap device]
Hit Ratio and Its Kinds
There are two kinds of cache hits
Hits due to writes
Page recently swapped out
Hits due to reads
Page brought into the cache with another request
Many more hits due to writes than due to reads
Different swap-in and swap-out order
No spatial locality
Reads produce interference in the cache
Optimizations
Different read and write paths
Reads do not place information in the cache
Reads use the cache (hits due to writes)
Reads do not modify the cache
Writes place information in the cache
Writes modify the cache
Batched writes
Write many cache buffers to disk at the same time
Write them contiguously
A page is never split between two buffers
Reads perform, at most, one disk access
Compressed Swapping v2.0
[Figure: as v1.0, with compressed pages cached in memory, but swap-ins bypass cache insertion and swap-outs are batched contiguously to the swap device]
Evaluated Environment
Environment
Pentium II at 350 MHz
64 Mbytes of memory
Ultra-SCSI disk, 128 Mbytes swap partition
Benchmarks

Bench         Processes   I/O         Compression
fft           1           none        64.9%
fft x10       10          none        61.2%
sort          1           begin/end   46.5%
sort x6       6           begin/end   51.3%
simulator     1           begin       6.7%
simulator x5  5           begin       1.1%
xanim         1           begin/end   27.8%
xanim x4      4           begin/end   37.9%
Performance Evaluation
Parameters: cache size = 1 Mbyte, cleaning threshold = 50%
Speedups better than 1.2
FFT: speedup = 0.96
Bad compression ratio
Working set
Simulator x5: better performance
Good compression ratio
All pages fit in the cache
[Chart: speedup per benchmark (fft, sort, sim, xanim); y-axis 0 to 2.5, one bar reaching 6.5]
Cache-size Influence
No perfect cache size
Large caches are bad
Use too much application memory
Recommended size
Around 1 Mbyte
[Chart: speedup vs. cache size (1024 to 4096 Kbytes) for fft, fft x10, sort, sort x6, sim, sim x5, xanim, xanim x4; y-axis 0 to 2.5]
Batched-write Influence
Small values are bad
Small writes
Latency is not hidden
Large values
Good for mono-process benchmarks
Bad for multi-process benchmarks
Reads conflict with cleans
Recommended value
Around 10%
[Chart: speedup vs. batched-write size (0 to 70% of the cache) for fft, fft x10, sort, sort x6, sim, sim x5, xanim, xanim x4; y-axis 0 to 2.5]
New-swap Capacity
Depends on
Compression ratio
Fragmentation
Allows larger applications
Some improvement achieved
[Chart: effective swap capacity in Mbytes (0 to 256) for fft, sort, sim, xanim]
Related Work
Compressed cache (Douglis93)
No difference between reads and writes
Limited batched writes
Performance gains only for applications with a high compression ratio
Single process benchmarks
MagnaRAM
Memory compression for Windows
Not very good results
Conclusions
Simple mechanism to:
Increase performance of large applications
Increase swap space
Easy to implement
Modified
6 routines and 2 files
Added
9 routines and 1 include file (.h)
Easy to port from one version to another
15 minutes
Can be used in out-of-core applications