Swap Compression: a way to Increase Application


Improving
Application Performance
through Swap Compression
R. Cervera, T. Cortes,
Y. Becerra and S. Lucas
Motivation
 Problem



Large applications
Highly-loaded systems
Laptops
 Laptops usually have fewer resources than desktops

Homework
 Home machines are usually smaller than office machines
 Solution

Swapping mechanism
 But … IT IS VERY SLOW !!!
Objectives
 Increase swapping performance

Without adding more memory
 Increase swapping space

Without adding more disk
 Performance is more important than space
 Modify the kernel as little as possible
 Mechanism feasible in user-level libraries
Traditional Swapping Path
[Diagram: processes A to D, the operating system and the swap device; a page fault leads to a swap in from the swap device, and evicted pages are swapped out to it]
Proposal
 Cache for swapped pages


Problem: if the system uses memory for the cache, the applications lose this memory
 Less fast memory
 More slow memory
 Compression of swapped pages

More than one compressed page per buffer
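
The idea of storing more than one compressed page per buffer can be sketched as follows. This is a minimal illustration and not the authors' kernel code: the page size, the buffer size, the use of zlib, the first-fit placement and the names `pack_pages` and `demo_pages` are all assumptions made for the example.

```python
import os
import zlib

PAGE_SIZE = 4096     # assumed page size in bytes
BUFFER_SIZE = 4096   # assumed cache-buffer size in bytes


def pack_pages(pages):
    """Compress pages and pack them into fixed-size buffers (first fit).

    pages: iterable of (page_id, data) with len(data) <= BUFFER_SIZE.
    Returns a list of buffers; each buffer is a dict with the free space
    left and a list of (page_id, is_compressed, blob) entries.
    A page is always stored whole inside a single buffer.
    """
    buffers = []
    for page_id, data in pages:
        blob = zlib.compress(data)
        is_compressed = len(blob) < len(data)
        if not is_compressed:
            blob = data  # keep incompressible pages as they are
        for buf in buffers:
            if buf["free"] >= len(blob):
                break  # first buffer with enough room
        else:
            buf = {"free": BUFFER_SIZE, "entries": []}
            buffers.append(buf)
        buf["entries"].append((page_id, is_compressed, blob))
        buf["free"] -= len(blob)
    return buffers


def demo_pages():
    """Four highly compressible pages followed by one random page."""
    pages = [(i, bytes(PAGE_SIZE)) for i in range(4)]
    pages.append((4, os.urandom(PAGE_SIZE)))
    return pages
```

With the demo input, the four zero-filled pages share one buffer, while the random page does not compress and takes a buffer of its own.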
Compressed Swapping v1.0
[Diagram: a cache of compressed pages sits between the operating system and the swap device]
Hit Ratio and Its Kinds
 There are two kinds of cache hits

Hits due to write
 Page recently swapped out

Hits due to read
 Page brought to the cache with another request
 Many more hits due to write than hits due to read

Different swap-in and swap-out order
 No spatial locality
 Reads cause interference in the cache
Optimizations
 Different read and write paths

Reads do not place information in the cache
 Reads use the cache (hits due to write)
 Reads do not modify the cache

Writes place information in the cache
 Writes modify the cache
 Batched writes

Write many cache buffers to disk at the same time
 Write them contiguously
 A page is never split between two buffers

Reads perform, at most, one disk access
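
The separate read and write paths and the batched writes can be modeled with a small user-level toy. It is only a sketch under stated assumptions: a dictionary stands in for the swap device, zlib stands in for the compressor, and the class name `CompressedSwapCache` and the parameter `clean_fraction` (the share of the cache written out per cleaning pass) are hypothetical, since the slides do not define how the batch size is expressed.

```python
import zlib
from collections import OrderedDict


class CompressedSwapCache:
    """Toy model of a write-only-fill cache of compressed pages."""

    def __init__(self, capacity=1 << 20, clean_fraction=0.10):
        self.capacity = capacity              # cache size in bytes
        self.clean_fraction = clean_fraction  # assumed: share freed per batch
        self.cache = OrderedDict()            # page_id -> blob, oldest first
        self.used = 0                         # bytes held in the cache
        self.disk = {}                        # stands in for the swap device
        self.disk_writes = 0                  # batched write operations
        self.disk_reads = 0                   # single-page read operations

    def swap_out(self, page_id, data):
        """Write path: every swapped-out page is placed in the cache."""
        blob = zlib.compress(data)
        old = self.cache.pop(page_id, None)
        if old is not None:
            self.used -= len(old)
        self.cache[page_id] = blob
        self.used += len(blob)
        if self.used > self.capacity:
            self._clean()

    def _clean(self):
        """Move the oldest pages to disk in one batch."""
        target = self.capacity * self.clean_fraction
        freed = 0
        batch = {}
        while self.cache and (freed < target or self.used > self.capacity):
            page_id, blob = self.cache.popitem(last=False)
            batch[page_id] = blob
            freed += len(blob)
            self.used -= len(blob)
        self.disk.update(batch)  # a real system would write these contiguously
        self.disk_writes += 1

    def swap_in(self, page_id):
        """Read path: may hit in the cache but never modifies it."""
        blob = self.cache.get(page_id)
        if blob is None:
            blob = self.disk[page_id]  # at most one disk access
            self.disk_reads += 1
        return zlib.decompress(blob)
```

Because `swap_in` never inserts into the cache, every hit in this model is a hit due to write, which matches the observation that such hits dominate.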
Compressed Swapping v2.0
[Diagram: the operating system, the cache of compressed pages and the swap device, with the separate read and write paths described above]
Evaluated Environment
 Environment



Pentium II, 350 MHz
64 Mbytes of memory
Ultra-SCSI disk, 128 Mbytes swap partition
 Benchmarks

Bench          Processes   I/O          Compression
fft            1           none         64.9%
fft x10        10          none         61.2%
sort           1           begin/end    46.5%
sort x6        6           begin/end    51.3%
simulator      1           begin        6.7%
simulator x5   5           begin        1.1%
xanim          1           begin/end    27.8%
xanim x4       4           begin/end    37.9%
Performance Evaluation

 Parameters

Cache size = 1 Mbyte
Cleaning threshold = 50%
 Better performance

Speedups better than 1.2
 FFT

Speedup = 0.96
Bad compression ratio
Working set
 Simulator x5

Good compression ratio
All pages fit in the cache

[Figure: speedup per benchmark (fft, sort, sim, xanim); y axis from 0.0 to 2.5, with one value labeled 6.5]
Cache-size Influence
 No perfect cache size
 Large caches are bad

Use too much application memory
 Recommended size

Around 1 Mbyte

[Figure: results for each benchmark (fft, fft x10, sort, sort x6, sim, sim x5, xanim, xanim x4) as a function of cache size; x axis ticks 0, 1024, 2048, 3072, 4096; y axis from 0 to 2.5]
Batched-write Influence
 Small values are bad

Small writes
 Latency is not hidden
 Large values

Mono-process bench
 Good

Multi-process bench
 Reads conflict with cleans
 Recommended value

Around 10%

[Figure: results for each benchmark (fft, fft x10, sort, sort x6, sim, sim x5, xanim, xanim x4) as a function of the batched-write value; x axis from 0 to 70 in steps of 10; y axis from 0.0 to 2.5]
New-swap Capacity


 Some improvement achieved

Allows larger applications
 Depends on

Compression ratio
Fragmentation

[Figure: new swap capacity per benchmark (fft, sort, sim, xanim); y axis from 0 to 256 in steps of 32]
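
How compression ratio and fragmentation bound the new swap capacity can be illustrated with a simple estimate. The formula below is a simplified model assumed for this example, not one given in the slides: the compression ratio is taken as compressed size divided by original size, fragmentation as the fraction of the device left unused inside buffers, and the function name `effective_swap_capacity` is hypothetical.

```python
def effective_swap_capacity(device_mbytes, compression_ratio, fragmentation=0.0):
    """Estimate how many Mbytes of uncompressed pages fit on the swap device.

    compression_ratio: compressed size / original size (0.5 halves each page).
    fragmentation: fraction of the device lost to unused space in buffers.
    """
    if not 0.0 < compression_ratio <= 1.0:
        raise ValueError("compression_ratio must be in (0, 1]")
    if not 0.0 <= fragmentation < 1.0:
        raise ValueError("fragmentation must be in [0, 1)")
    usable = device_mbytes * (1.0 - fragmentation)
    return usable / compression_ratio
```

Under this model, a 128 Mbyte partition with a ratio of 0.5 and no fragmentation would hold 256 Mbytes of pages, and 25% fragmentation would reduce that to 192 Mbytes. The ratio and fragmentation values here are illustrative inputs, not measurements from the evaluation.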
Related Work
 Compressed cache (Douglis93)




No difference between reads and writes
Limited batched writes
Performance gains only for applications with high compression ratio
Single process benchmarks
 MagnaRAM

Memory compression for Windows
 Not very good results
Conclusions
 Simple mechanism to:


Increase performance of large applications
Increase swap space
 Easy to implement

Modified
 6 routines and 2 files

Added
 9 routines and 1 include file (.h)

Easy to port from one version to another
 15 minutes
 Can be used in out-of-core applications