ols-2007-resource_management_beancounters
Resource Management: Beancounters
Pavel Emelianov
[email protected]
Denis Lunev
[email protected]
Kirill Korotaev
[email protected]
Agenda
Current state of resource management in the Linux kernel
Beancounters overview
User memory management
I/O accounting
Kernel memory management
Network buffers accounting
Performance
Current state
Per-process accounting and limiting (rlimits)
Manages individual processes
Memory limits are mostly ignored by the kernel
Group-based management
Absent
Global statistics
Not suitable for group isolation
Operating system resources
Memory
CPU time
IO bandwidth
Networking bandwidth
Disk space
Beancounters basics
A beancounter manages a group of tasks
Resource counter parameters
held – the current consumption level
limit – the maximal allowed level of consumption
barrier – the "shortage warning" line; past it, each resource controller may take some precautions
fails – the number of rejected allocations
A beancounter is assigned once during a process's lifetime
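The four counter parameters above can be illustrated with a minimal sketch. The class name `ResourceCounter`, its methods, and the rule that a charge is rejected only when it would exceed `limit` are assumptions made for illustration; the slides do not show the kernel implementation.

```python
from dataclasses import dataclass


@dataclass
class ResourceCounter:
    """One resource of a beancounter; field names follow the slide."""
    limit: int        # maximal allowed level of consumption
    barrier: int      # "shortage warning" line, expected to be <= limit
    held: int = 0     # current consumption level
    fails: int = 0    # number of rejected allocations

    def charge(self, amount: int) -> bool:
        """Try to account `amount` units; reject if it would exceed the limit."""
        if self.held + amount > self.limit:
            self.fails += 1
            return False
        self.held += amount
        return True

    def uncharge(self, amount: int) -> None:
        """Return `amount` units, never dropping below zero."""
        self.held = max(0, self.held - amount)

    def above_barrier(self) -> bool:
        """True once consumption has crossed the warning line."""
        return self.held > self.barrier


# Hypothetical "number of files" resource with made-up values.
numfile = ResourceCounter(limit=10, barrier=8)
print(numfile.charge(9), numfile.above_barrier())  # accepted, past the barrier
print(numfile.charge(2), numfile.fails)            # rejected, failure recorded
```

In this sketch the barrier only reports a condition; what a controller does once it is crossed is left open, as on the slide.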
Accounting details
[Diagram: Process (user space); Beancounter and kernel object (kernel space)]
Beancounters controlled resources
User memory: Length of mappings, RSS, Locked pages, Dirty page cache
Miscellaneous resources: Number of tasks, Number of files, Number of sockets, Number of file locks, Number of PTYs, Number of signals, Active dentry cache
Kernel memory
Network buffers
User memory management
VMA lengths accounting
Graceful rejects of VM region allocation
Take precautions against overcommitment
RSS accounting
Real memory usage
OOM killer priorities
Dirty page cache accounting
IO statistics and scheduling
VMA lengths accounting
Task address space
VMAs classification (“Lengths of mappings” resource)
unreclaimable: private and anonymous
reclaimable: shared file mappings
Pages classification (“RSS” resource)
unused: parts of mapped regions
used: touched pages
VMA lengths accounting pros'n'cons
Pros
A way to track the host commitment level
Graceful rejection of address space growth
Cons
Hard limiting of address space growth
RSS accounting
[Diagram: First touch and N touches; labels: beancounter, page, page beancounter]
Drawbacks
Additional pointer on the struct page
Extra locking during page faults
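The "first touch" versus "N touches" distinction can be sketched as follows. This is a hypothetical model, not the kernel code: the class `PageAccounting`, its dictionaries, and the `unmap` path are assumptions, and for simplicity it charges a whole page to every beancounter that maps it.

```python
from collections import defaultdict


class PageAccounting:
    """Hypothetical bookkeeping: which beancounters map a page, and how often."""

    def __init__(self):
        self.rss = defaultdict(int)   # beancounter name -> pages charged
        self.mappings = {}            # page id -> {beancounter name: map count}

    def touch(self, page, bc):
        users = self.mappings.setdefault(page, {})
        if bc not in users:
            # First touch by this beancounter: charge the page to it.
            users[bc] = 0
            self.rss[bc] += 1
        # Every touch (first or N-th) bumps the per-page map count.
        users[bc] += 1

    def unmap(self, page, bc):
        users = self.mappings[page]
        users[bc] -= 1
        if users[bc] == 0:
            # Last unmap by this beancounter: uncharge the page.
            del users[bc]
            self.rss[bc] -= 1
            if not users:
                del self.mappings[page]


acct = PageAccounting()
acct.touch("page0", "bc1")   # first touch: bc1 is charged
acct.touch("page0", "bc1")   # second touch: no additional charge
print(acct.rss["bc1"])
```

The per-page dictionary stands in for the extra pointer on struct page, and a real implementation would need locking around both paths, which is the cost the drawbacks above refer to.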
Shared pages accounting
Account the page to the first beancounter
Account a whole page for each beancounter
Non uniform statistics for similar beancounters
The values accounted are not related to the actual memory usage
Account page fractions to all beancounters
The “middle” way used in the beancounters
Page fractions accounting
Algorithm benefits
O(1) algorithm of adding and removing
The sum of RSS on all beancounters is an amount of all actually used pages
[Diagram: one shared page split among beancounters BC1 to BC4 in fractions of ½ and ¼]
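One way to get constant-time insertion with power-of-two fractions is sketched below. It is an illustration under assumptions, not the algorithm from the slides: the class `SharedPage`, the deque ordering, and the "halve the largest share" rule are hypothetical, and removal is not covered.

```python
from collections import deque
from fractions import Fraction


class SharedPage:
    """Hypothetical tracker of per-beancounter fractions of one shared page."""

    def __init__(self, first_bc):
        # Each entry is [beancounter name, shift]; its share is 1 / 2**shift.
        # Invariant: the head of the deque always holds a largest share.
        self.holders = deque([[first_bc, 0]])

    def add(self, bc):
        """O(1): halve the largest share and give one half to the newcomer."""
        head = self.holders.popleft()
        head[1] += 1
        self.holders.append(head)
        self.holders.append([bc, head[1]])

    def shares(self):
        return {name: Fraction(1, 2 ** shift) for name, shift in self.holders}


page = SharedPage("BC1")
for name in ("BC2", "BC3"):
    page.add(name)
print(page.shares())
```

Because each `add` splits one existing share into two equal halves, the shares of a page always sum to exactly one page, which is the property that makes the per-beancounter RSS values add up to the number of pages actually in use.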
Dirty page cache accounting
[Diagram: page states Clean and Dirty; events First touch, N touches, Unmap, Last unmap; IO beancounter]
RSS accounting pros'n'cons
Pros
Node memory utilization statistics
Asynchronous IO scheduling
Ground for fair page reclamation
Cons
Performance issues
Memory consumption by auxiliary data structures
Kernel memory management
Reason
Limited normal zone
Mainly for 32-bit arches
Major problem
Object freeing context
Reference counters
RCU
Kernel MM data structures (pages)
Buddy page allocator: additional pointer on the struct page
Vmalloc: struct vm_struct, 0th page's pointer
Kernel MM data structures (slab)
Array of pointers after the slab
[Diagram labels: struct slab, kmem_bufctl_t[N], beancounters, N objects]
Kernel MM drawbacks
A slab can carry fewer objects
Slabs could become “offslab”

Slab name    # of objects        Offslab-ness
             Before    After     Before    After
Size-32      113       101       –         –
Size-64      59        56        –         –
Size-128     30        29        –         –
Size-256     15        15        –         –
Size-512     8         8         +         +
Size-1024    4         4         +         +
Size-2048    2         2         +         +
Size-4096    1         1         +         +
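The object counts in the table can be reproduced with a simplified capacity model. All sizes below are assumptions (a 4096-byte page, a 28-byte on-slab struct slab, 4-byte kmem_bufctl_t and pointer entries, as on a 32-bit kernel), and the model ignores alignment and colouring that a real allocator applies.

```python
PAGE_SIZE = 4096      # assumed slab size: one 4 KiB page
SLAB_HEADER = 28      # assumed sizeof(struct slab) on a 32-bit kernel
BUFCTL = 4            # assumed sizeof(kmem_bufctl_t)
BC_POINTER = 4        # assumed size of the added per-object beancounter pointer


def objects_per_slab(obj_size, per_object_overhead, offslab):
    """Objects that fit in one page for a given per-object management cost."""
    if offslab:
        # Management data lives elsewhere, so the whole page holds objects.
        return PAGE_SIZE // obj_size
    return (PAGE_SIZE - SLAB_HEADER) // (obj_size + per_object_overhead)


for size in (32, 64, 128, 256, 512, 1024, 2048, 4096):
    offslab = size >= 512   # as in the table: size-512 and larger are off-slab
    before = objects_per_slab(size, BUFCTL, offslab)
    after = objects_per_slab(size, BUFCTL + BC_POINTER, offslab)
    print(f"size-{size}: {before} -> {after}")
```

Under these assumed sizes the model gives the same before and after counts as the table: only the small on-slab caches lose capacity, because the extra pointer array competes with objects for space in the page.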
Kernel MM pros'n'cons
Pros
Tracking of kernel memory usage
Cons
None (all are already optimized out)
Network buffers accounting
Mainstream accounting shortcomings
slab overhead is not included
up to 30% for usual Ethernet frames
unpredictable difference for non-Ethernet MTUs
no way to recalculate skb->truesize
Implementation basics
Separate accounting for
send and receive buffers
TCP and all the other types of traffic
Implementation is straightforward: account actual memory usage for objects with undefined or infinite lifetime
select(2) compatibility
Buffer space guarantees
Packets context handling
[Diagram labels: beancounter, process, socket, SKB, SKB, Network]
Performance
RSS accounting – the bottleneck

Test name           No RSS, %    Full, %
Process creation    97%          91%
Execl Throughput    99%          91%
Pipe Throughput     100%         99%
Shell Scripts       96%          87%
File Read           99%          98%
File Write          101%         99%
Main future directions
Optimization
Pre-charging
On-demand accounting
Active dentry cache
RSS
RSS limits
Kernel memory
VMAs lengths
Page reclamation
Better TCP window management
That's all folks
Questions?
Comments?
http://download.openvz.org/~xemul/