CIT 470: Advanced Network and
System Administration
Performance Monitoring
CIT 470: Advanced Network and System Administration
Slide #1
Topics
1. Performance monitoring.
2. Performance tuning.
3. CPU
4. Memory
5. Disk
6. Network
Slide #2
Performance Monitoring
Identify which aspect of performance is the problem:
Latency: delay until initial access.
Throughput: rate of transfer/processing.
Identify which system component is the bottleneck:
CPU
Memory
Disk
Network
Slide #3
Performance Tuning Process
1. Learn the customer’s problem.
   Identify specifically what’s wrong.
2. Find the problem’s cause and fix it.
   1. When does the problem occur?
   2. Has anything about the system changed?
   3. What critical resource is affecting performance?
3. Have the right tools.
   Historical monitoring data will show what’s normal and identify any trends.
Slide #4
Experimenter Effect
Monitoring the system affects performance.
Monitoring tools use system resources.
If you’ve monitored the system consistently, the monitoring overhead is already part of normal behavior, so it won’t distort your measurements.
Slide #5
Performance Problem Solutions
1. Get more of needed resource.
Ex: Upgrade processor, use striped disk array.
2. Reduce system requirements.
Ex: Kill processes, move services to other hosts.
3. Eliminate inefficiency and waste.
Ex: Produce a static home page every 15 minutes instead of regenerating it on each access.
4. Ration resource usage.
Ex: Set process priorities with renice.
Ex: Limit process resource usage with limit (csh) or ulimit (sh, bash).
Slide #6
Monitoring Processes
uptime
Provides aggregate data about system load.
ps
Shows running processes with CPU, mem usage.
top
Updated list of running processes + summaries.
vmstat
Summary data about processes and CPU usage.
Slide #7
Uptime
Uptime provides the following data
How long system has been running.
Number of users logged in.
Average number of runnable processes (the load average) over the last 1, 5, and 15 minutes.
Rule of thumb: want a load average under 3.
Uptime example
> uptime
17:40 up 126 days, 8:03, 6 users, load average: 1.40, 1.03, 0.55
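The load averages can be pulled out of the uptime line for use in a script. This is a minimal sketch, assuming the output format in the example above (or the BSD "load averages:" variant) and a C locale. The function names load_averages and load_below are hypothetical, and the limit passed in is up to the caller (3 is the rule of thumb from this slide).

```shell
# Print the 1-, 5-, and 15-minute load averages from uptime output on stdin.
load_averages() {
    sed -n 's/.*load averages\{0,1\}: *//p' | tr -d ','
}

# Succeed if the 1-minute load average read from stdin is below the limit in $1.
load_below() {
    load_averages | awk -v max="$1" '{ exit !($1 < max) }'
}
```

Possible usage: uptime | load_below 3 || echo "load is high"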
Slide #8
vmstat
Number of runnable (r) and blocked (b) processes.
Memory (virtual, free, buffered, cached).
Blocks/second transferred in (bi) and out (bo).
Interrupts/sec (in) and context switches/sec (cs).
CPU usage by user (us), system (sy), idle (id), and waiting on I/O (wa).
> vmstat 5 4
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy  id wa
 0  0 395716  45176 211284  88480    0    0     1     2    1    2  9  3  88  0
 0  0 395716  45168 211300  88480    0    0     0    50 1035 1677  0  0 100  0
 0  0 395716  45168 211300  88480    0    0     0     0 1040 1670  0  0  99  0
 0  0 395716  45168 211300  88480    0    0     0     0 1033 1660  0  0 100  0
Slide #9
Identifying CPU Shortages
1. Short-term CPU spikes are normal.
2. Consistently high number of runnable processes (r) in vmstat.
3. Consistently high total CPU usage (sy+us).
4. High system time compared to user time, together with a high context switch rate, indicates the system is thrashing between processes instead of doing user work.
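Checks 2 and 3 can be applied to captured vmstat output. The sketch below assumes the 16-column Linux vmstat layout shown on the vmstat slide (r in field 1, us in field 13, sy in field 14). The function name cpu_pressure and both thresholds are hypothetical and should be set from your own baseline data.

```shell
# Read vmstat output on stdin and count samples that look CPU-starved.
# $1 = run-queue limit (for example the number of CPUs)
# $2 = threshold for us+sy, in percent
cpu_pressure() {
    awk -v maxr="$1" -v busy="$2" '
        $1 ~ /^[0-9]+$/ {                   # data rows start with the r column
            n++
            if ($1 > maxr)          runq++  # more runnable processes than the limit
            if ($13 + $14 >= busy)  cpu++   # us is field 13, sy is field 14
        }
        END { printf "samples=%d runq_high=%d cpu_high=%d\n", n, runq, cpu }
    '
}
```

Possible usage: vmstat 5 12 | cpu_pressure 2 90. A shortage is suggested only when most samples are flagged, since short spikes are normal.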
Slide #10
Changing Process Priorities
Nice values
Positive values lower priorities.
Negative values increase priorities (root only).
If you know a process will be a CPU hog,
nice -n 5 command_name (csh built-in syntax: nice +5 command_name)
If you detect a CPU hog after it’s started,
renice 5 PID
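The effect of a nice increment can be checked without starting a real job, because nice with no command prints the niceness it is running at. A minimal sketch; the function name niceness_after is hypothetical:

```shell
# Print the niceness a child process gets when started with `nice -n INCREMENT`.
# The inner `nice`, given no command, prints the niceness it was started at.
niceness_after() {
    nice -n "$1" nice
}
```

Possible usage: niceness_after 5 prints the current niceness plus 5, capped at the system maximum (19 on Linux).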
Slide #11
Managing Processes with kill
TERM (15, default)
Requests process termination (Ctrl-c sends the similar INT signal).
Processes can catch or ignore the signal.
KILL (9)
Terminates process execution.
Processes cannot catch or ignore.
Processes blocked in uninterruptible I/O will not die until the I/O completes.
STOP
Suspends process execution until SIGCONT (Ctrl-z sends the catchable variant, TSTP).
Useful for moving CPU hog out of way temporarily.
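The three signals can be tried safely on a throwaway process. This sketch uses sleep as a stand-in for a CPU hog; the function name exit_status_after is hypothetical. It relies on the shell convention that a process killed by a signal reports exit status 128 plus the signal number, which holds in bash and dash but is an assumption for other shells.

```shell
# Start a throwaway background job, pause and resume it, then send the
# signal named in $1 and print the exit status that wait reports.
exit_status_after() {
    sleep 60 >/dev/null 2>&1 &
    pid=$!
    kill -s STOP "$pid"     # suspend the job (cannot be caught or ignored)
    kill -s CONT "$pid"     # let it run again
    kill -s "$1" "$pid"     # TERM can be caught or ignored, KILL cannot
    status=0
    wait "$pid" || status=$?
    echo "$status"
}
```

Possible usage: exit_status_after TERM, then exit_status_after KILL, to compare the reported statuses.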
Slide #12
Imposing Limits on Processes
CPU time
ulimit -t secs
Maximum file size
ulimit -f KB
Maximum data segment
ulimit -d KB
Maximum stack size
ulimit -s KB
Maximum physical mem
ulimit -m KB
Maximum core size
ulimit -c KB
Maximum number procs
ulimit -u n
Maximum virtual mem
ulimit -v KB
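Limits set with ulimit apply to the current shell and everything it starts, so a common pattern is to set them in a subshell around a single command. A minimal sketch, assuming a shell whose ulimit built-in supports -t (bash and dash do; POSIX only requires -f). The function name run_with_cpu_limit is hypothetical.

```shell
# Run a command under a CPU-time limit of $1 seconds.
# The subshell keeps the limit from sticking to the calling shell.
run_with_cpu_limit() {
    secs=$1
    shift
    (
        ulimit -t "$secs" || exit 1
        "$@"
    )
}
```

Possible usage: run_with_cpu_limit 60 ./long_job, where ./long_job stands for any command. The kernel signals the process once it has used 60 seconds of CPU time.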
Slide #13
Monitoring Memory
Use free to see how memory is used.
System will use most free memory for caching.
System will swap out inactive processes.
Don’t worry until free < 5% of total memory.
Use vmstat to detect paging activity.
A page out (so) rate consistently greater than 0 indicates a memory shortage.
A high page in (si) rate alone is less conclusive, as the system also uses the paging facility to load programs into memory.
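The 5% rule of thumb can be checked from the output of free. This sketch assumes the usual layout in which the "Mem:" line carries total in field 2 and free in field 4. The function name free_pct is hypothetical. It looks only at the free column, so the result is conservative: memory used for caching can be reclaimed.

```shell
# Read `free` output on stdin and print free memory as a whole
# percentage of total memory (fraction truncated).
free_pct() {
    awk '$1 == "Mem:" && $2 > 0 { printf "%d\n", $4 * 100 / $2 }'
}
```

Possible usage: [ "$(free | free_pct)" -lt 5 ] && echo "free memory below 5%"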
Slide #14
Managing Memory
1. Improving paging capacity.
Add new swapfiles with swapon.
Add new swap partitions.
2. Improving paging performance.
Use swap partitions instead of swap files.
Distribute swap resources across disks.
3. Migrate memory hogs to another host.
4. Add more memory.
Slide #15
Monitoring Disk I/O
Use iostat to get per disk statistics.
Transactions per second (tps).
Blocks read/written per second.
Managing disk performance problems.
Distribute heavily used data across disks/controllers.
Get more or faster disks.
Use RAID or LVM striping.
Slide #16
iostat
> iostat 2
Linux 2.6.15-23-386 (zim)    03/26/2007

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.55    0.18    3.22    0.09    0.00   87.96

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
hde               0.69         8.18         9.43   89783416  103565744
hdh               0.15         1.33         3.37   14590831   36969599
hdc               0.00         0.00         0.00       9548          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.17    0.00    0.17    0.00    0.00   99.67

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
hde               0.33         0.00        21.33          0        128
hdh               0.00         0.00         0.00          0          0
hdc               0.00         0.00         0.00          0          0
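To see which disk carries most of the load, the device rows of iostat output can be scanned for the highest tps value. A minimal sketch, assuming the report layout of this iostat example (a "Device" header line followed by one row per disk, with tps in field 2). The function name busiest_device is hypothetical.

```shell
# Read iostat output on stdin and print the device with the highest tps
# seen in any report, followed by that tps value.
# Prints nothing if no device shows any activity.
busiest_device() {
    awk '
        /^Device/   { in_dev = 1; next }    # device rows follow this header
        /^avg-cpu/  { in_dev = 0 }
        NF == 0     { in_dev = 0 }          # a blank line ends the device rows
        in_dev && $2 + 0 > max { max = $2 + 0; dev = $1 }
        END { if (dev != "") printf "%s %.2f\n", dev, max }
    '
}
```

Possible usage: iostat 2 5 | busiest_device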
Slide #17
Managing Disk Capacity
Detecting disk resource usage.
List all partition usage with df -h
Identify high usage directories with du
Summary data: du -s
Highest usage directories: du -k / | sort -rn
Use find to detect disk hogs.
Use find -size to search for big files.
Use -atime +X to identify files that haven’t been accessed in more than X days.
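The find options above can be wrapped into two small helpers. This is a sketch, assuming a find that accepts the k size suffix (GNU and BSD find do; POSIX only specifies 512-byte blocks). The function names big_files and stale_files are hypothetical.

```shell
# List regular files under directory $1 that are larger than $2 KB,
# with their size in KB, biggest first.
big_files() {
    find "$1" -type f -size +"$2"k -exec du -k {} + | sort -rn
}

# List regular files under directory $1 not accessed in more than $2 days.
stale_files() {
    find "$1" -type f -atime +"$2"
}
```

Possible usage: big_files /home 102400 lists files over 100 MB, and stale_files /home 90 lists candidates for compression or removal.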
Slide #18
Managing Disk Shortages
1. Add more disks.
2. Move files to remote fileservers.
3. Eliminate unnecessary files.
4. Compress large infrequently used files.
5. Impose disk quotas on users.
   Soft limit: can be violated temporarily.
   Hard limit: cannot be violated.
Slide #19
Network Statistics
> netstat -s
Tcp:
    294750 active connections openings
    18042 passive connection openings
    9 failed connection attempts
    6195 connection resets received
    5 connections established
    90553783 segments received
    90005258 segments send out
    16483 segments retransmited
    1389 bad segments received.
    15620 resets sent
Ip:
    91081007 total packets received
    6 with invalid headers
    28 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    91080973 incoming packets delivered
    90418413 requests sent out
Udp:
    270975 packets received
    336 packets to unknown port received.
    6 packet receive errors
    324228 packets sent
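One figure worth deriving from these counters is the share of TCP segments that had to be retransmitted, since a rising share can point to network congestion or loss. A minimal sketch, assuming the Linux net-tools wording of the counter lines (including its spellings "send out" and "retransmited"). The function name tcp_retransmit_pct is hypothetical.

```shell
# Read `netstat -s` output on stdin and print retransmitted TCP segments
# as a percentage of segments sent.
tcp_retransmit_pct() {
    awk '
        /segments sen[dt] out/  { sent = $1 }
        /segments retransmi/    { retrans = $1 }
        END { if (sent > 0) printf "%.4f\n", retrans * 100 / sent }
    '
}
```

Possible usage: netstat -s | tcp_retransmit_pct. For the counters on this slide the result is about 0.018%.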
Slide #20