ols-2007-resource_management_beancounters


Resource Management:
Beancounters
Pavel Emelianov
Denis Lunev
Kirill Korotaev
Agenda

Current state of resource management in the Linux kernel

Beancounters overview

User memory management

I/O accounting

Kernel memory management

Network buffers accounting

Performance
Current state

Per-process accounting and limiting (rlimits)
    Manages individual processes
    Memory limits are mostly ignored by the kernel

Group-based management
    Absent

Global statistics
    Not suitable for group isolation

Operating system resources

Memory

CPU time

IO bandwidth

Networking bandwidth

Disk space
Beancounters basics

A beancounter manages a group of tasks

Resource counter parameters
    held: the current consumption level
    limit: the maximum allowed level of consumption
    barrier: the "shortage warning" line, at which each resource controller may take some precautions
    fails: the number of rejected allocations

A beancounter is assigned once during a process's lifetime
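
The four counter parameters can be read as a small state machine. The sketch below is only an illustration of that reading, not the kernel code: the class name, the method names and the choice to compare `held + amount` against `limit` are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ResourceCounter:
    """One resource of a beancounter, using the four parameters from the slide."""
    limit: int        # maximum allowed level of consumption
    barrier: int      # "shortage warning" line, at or below the limit
    held: int = 0     # current consumption level
    fails: int = 0    # number of rejected allocations

    def charge(self, amount: int) -> bool:
        """Try to account `amount` units; reject if the limit would be exceeded."""
        if self.held + amount > self.limit:
            self.fails += 1
            return False
        self.held += amount
        return True

    def uncharge(self, amount: int) -> None:
        """Return `amount` units, never dropping below zero."""
        self.held = max(0, self.held - amount)

    def short(self) -> bool:
        """True once consumption is above the barrier."""
        return self.held > self.barrier


rss = ResourceCounter(limit=100, barrier=80)
print(rss.charge(90), rss.short())   # accepted, but already above the barrier
print(rss.charge(20), rss.fails)     # rejected, because 90 + 20 > 100
```

What a controller does once `short()` is true is left open here, since the slide only says that each controller "may take some precautions".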
Accounting details
[Diagram labels: User space (Process); Kernel space (Beancounter, kernel object)]
Beancounters controlled resources

User memory
    Length of mappings
    RSS
    Locked pages
    Dirty page cache

Miscellaneous resources
    Number of tasks
    Number of files
    Number of sockets
    Number of file locks
    Number of PTYs
    Number of signals
    Active dentry cache

Kernel memory

Network buffers
User memory management

VMA lengths accounting
    Graceful rejects of VM region allocation
    Take precautions against overcommitment

RSS accounting
    Real memory usage
    OOM killer priorities

Dirty page cache accounting
    IO statistics and scheduling
VMA lengths accounting
[Diagram labels: Task address space; "Lengths of mappings" resource; "RSS" resource]

VMAs classification
    reclaimable: shared file mappings
    unreclaimable: private and anonymous

Pages classification
    used: touched pages
    unused: parts of mapped regions
VMA lengths accounting pros'n'cons

Pros
    A way to track the host commitment level
    Graceful rejects of address space growth

Cons
    Hard limiting of address space growth
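
A graceful reject means the mapping call itself fails, before any page is used. The sketch below illustrates that under stated assumptions: only private anonymous (unreclaimable) mappings are charged, lengths are counted in whole pages, and the class and method names are hypothetical.

```python
import errno

PAGE_SIZE = 4096  # assumed page size


class MappingsCounter:
    """Hypothetical "lengths of mappings" counter, measured in pages."""

    def __init__(self, limit_pages: int):
        self.limit = limit_pages
        self.held = 0
        self.fails = 0

    def map_region(self, length: int, private: bool, anonymous: bool) -> int:
        """Return 0 on success or -ENOMEM, like a failing mmap-style call."""
        pages = -(-length // PAGE_SIZE)          # round up to whole pages
        unreclaimable = private and anonymous    # classification from the slides
        if not unreclaimable:
            return 0                             # assumption: other VMAs are not charged
        if self.held + pages > self.limit:
            self.fails += 1
            return -errno.ENOMEM                 # rejected before any page is touched
        self.held += pages
        return 0


mc = MappingsCounter(limit_pages=4)
print(mc.map_region(3 * PAGE_SIZE, private=True, anonymous=True))   # fits
print(mc.map_region(2 * PAGE_SIZE, private=True, anonymous=True))   # would exceed the limit
```

The task sees an error it can handle. The cost is the "con" on the slide: the limit is hard, even if the rejected region would never have been touched.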
RSS accounting
[Diagram labels: First touch; N touches; page; page beancounter; beancounter]
Drawbacks

Additional pointer on the struct page

Extra locking during page faults
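
One way to read the "first touch" and "N touches" labels is as per-page reference counting. The sketch below is a guess at that idea, not the actual implementation: a dictionary stands in for the extra pointer on `struct page`, all names are hypothetical, and every beancounter that maps the page is charged one whole page (splitting a shared page is a separate question).

```python
class Beancounter:
    def __init__(self, name):
        self.name = name
        self.rss_held = 0   # pages currently accounted as RSS


class Page:
    def __init__(self):
        # beancounter -> number of mappings of this page from that beancounter;
        # stands in for the additional pointer on struct page
        self.page_bc = {}


def touch(page, bc):
    """Map `page` into a task of `bc`; only the first touch changes RSS."""
    if bc not in page.page_bc:
        page.page_bc[bc] = 0
        bc.rss_held += 1
    page.page_bc[bc] += 1


def unmap(page, bc):
    """Drop one mapping; the last unmap returns the page to the counter."""
    page.page_bc[bc] -= 1
    if page.page_bc[bc] == 0:
        del page.page_bc[bc]
        bc.rss_held -= 1


bc = Beancounter("bc1")
page = Page()
for _ in range(3):
    touch(page, bc)
print(bc.rss_held)   # still one page, however many times it was touched
```

In a real kernel both functions would run on the page fault and unmap paths under a lock, which is where the "extra locking" drawback comes from.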
Shared pages accounting

Account the page to the first beancounter
    Non-uniform statistics for similar beancounters

Account a whole page for each beancounter
    The values accounted are not related to the actual memory usage

Account fractions of the page to all beancounters
    The "middle" way used in the beancounters
Page fractions accounting
[Diagram: a page shared by beancounters BC1 to BC4, each accounted a fraction (½ or ¼) of the page]

Algorithm benefits
    O(1) algorithm of adding and removing
    The sum of RSS over all beancounters equals the number of pages actually in use
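
The key property of fractional accounting is that the shares of one page always add up to exactly one page. The sketch below keeps that invariant with a simple halving rule. It is not the authors' algorithm and makes no attempt at their O(1) bookkeeping; the class, the choice of donor and the choice of heir are all assumptions.

```python
from fractions import Fraction


class SharedPage:
    """Tracks which share of one page is accounted to each beancounter."""

    def __init__(self):
        self.shares = {}   # beancounter name -> Fraction of the page

    def add(self, name):
        """A new beancounter takes half of the largest existing share."""
        if not self.shares:
            self.shares[name] = Fraction(1)
            return
        donor = max(self.shares, key=self.shares.get)
        self.shares[donor] /= 2
        self.shares[name] = self.shares[donor]

    def remove(self, name):
        """A leaving beancounter hands its share to the smallest remaining holder."""
        freed = self.shares.pop(name)
        if self.shares:
            heir = min(self.shares, key=self.shares.get)
            self.shares[heir] += freed


page = SharedPage()
for bc in ("BC1", "BC2", "BC3"):
    page.add(bc)
print(page.shares)                 # one holder at 1/2, two at 1/4
print(sum(page.shares.values()))   # always exactly one page
```

With this removal rule a share can stop being a power of two (for example 3/4), so this sketch would not stay inside the ½ and ¼ pattern the diagram shows. Only the sum invariant is demonstrated.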
Dirty page cache accounting
[Diagram labels: First touch; N touches; Dirty; Clean; Unmap; Last unmap; IO beancounter]
RSS accounting pros'n'cons

Pros
    Node memory utilization statistics
    Asynchronous IO scheduling
    Ground for fair page reclamation

Cons
    Performance issues
    Memory consumption by auxiliary data structures
Kernel memory management

Reason
    Limited normal zone
    Mainly for 32-bit arches

Major problem
    Object freeing context
    Reference counters
    RCU
Kernel MM data structures (pages)

Buddy page allocator
    Additional pointer on the struct page

Vmalloc
    [Diagram labels: struct vm_struct; 0th page's pointer]
Kernel MM data structures (slab)

Array of pointers after the slab
[Diagram labels: struct slab; kmem_bufctl_t[N]; beancounters; N objects]
Kernel MM drawbacks

A slab can carry fewer objects

Slabs could become "offslab"

Slab name    # of objects       Offslab-ness
             Before   After     Before   After
Size-32      113      101       –        –
Size-64      59       56        –        –
Size-128     30       29        –        –
Size-256     15       15        –        –
Size-512     8        8         +        +
Size-1024    4        4         +        +
Size-2048    2        2         +        +
Size-4096    1        1         +        +
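
The object counts in the table can be reproduced with a simple capacity model. Every constant below is an assumption chosen for a 32-bit kernel: 4096-byte slabs, a 28-byte `struct slab`, a 4-byte `kmem_bufctl_t` and a 4-byte beancounter pointer per object. The real allocator also applies alignment rules that this sketch ignores.

```python
SLAB_BYTES = 4096        # assumed: one 4 KiB page per slab
STRUCT_SLAB_BYTES = 28   # assumed size of struct slab on a 32-bit kernel
BUFCTL_BYTES = 4         # assumed sizeof(kmem_bufctl_t)
BC_POINTER_BYTES = 4     # assumed size of one beancounter pointer


def objects_per_slab(obj_size, with_beancounters, offslab):
    """Number of objects that fit into one slab under the assumptions above."""
    if offslab:
        # management data lives elsewhere, so the whole slab holds objects
        return SLAB_BYTES // obj_size
    per_object = obj_size + BUFCTL_BYTES
    if with_beancounters:
        per_object += BC_POINTER_BYTES
    return (SLAB_BYTES - STRUCT_SLAB_BYTES) // per_object


for size in (32, 64, 128, 256, 512, 1024, 2048, 4096):
    offslab = size >= 512     # the "+" rows of the table
    before = objects_per_slab(size, False, offslab)
    after = objects_per_slab(size, True, offslab)
    print(f"Size-{size}: {before} -> {after}")
```

Under these assumptions the model gives 113 and 101 objects for Size-32, 59 and 56 for Size-64, 30 and 29 for Size-128, and no change from Size-256 upward. The per-object pointer costs the most where objects are smallest.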
Kernel MM pros'n'cons

Pros
    Tracking of kernel memory usage

Cons
    None (all are already optimized out)
Network buffers accounting
Mainstream accounting shortcomings
    slab overhead is not included
        up to 30% for usual Ethernet frames
        unpredictable difference for non-Ethernet MTU
    no way to recalculate skb->truesize
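
The slab overhead comes from rounding each buffer up to the size of the cache it is allocated from. The sketch below illustrates that effect only, assuming power-of-two general caches. The request sizes are made up for the example and are not taken from the slides.

```python
def kmalloc_bucket(nbytes, smallest=32):
    """Round a request up to the next power-of-two cache size (assumed policy)."""
    size = smallest
    while size < nbytes:
        size *= 2
    return size


def slab_overhead(nbytes):
    """Extra memory actually consumed, as a fraction of the requested size."""
    return (kmalloc_bucket(nbytes) - nbytes) / nbytes


# Hypothetical buffer sizes: roughly an Ethernet-sized frame plus bookkeeping,
# and a larger non-Ethernet MTU.
for request in (1600, 9100):
    print(request, kmalloc_bucket(request), f"{slab_overhead(request):.0%}")
```

An accounting scheme that charges only the requested size misses the difference between the two columns. The gap also varies with the MTU, which is why the slide calls the difference unpredictable.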

Implementation basics
    Separate accounting for
        send and receive buffers
        TCP and all the other types of traffic
    Implementation is straightforward:
        account actual memory usage for objects with undefined or infinite lifetime
    select(2) compatibility
    Buffer space guarantees
Packets context handling
[Diagram labels: beancounter; process; socket; SKB; Network]
Performance

RSS accounting is the bottleneck

Test name           No RSS, %    Full, %
Process creation    97%          91%
Execl Throughput    99%          91%
Pipe Throughput     100%         99%
Shell Scripts       96%          87%
File Read           99%          98%
File Write          101%         99%
Main future directions

Optimization

Pre-charging



On-demand accounting



Active dentry cache
RSS
RSS limits


Kernel memory
VMAs lengths
Page reclamation
Better TCP window management
That's all folks


Questions?
Comments?
http://download.openvz.org/~xemul/