Traffic Monitoring and Analysis
James Won-Ki Hong
Department of Computer Science and Engineering
POSTECH, Korea
POSTECH
CSED702Y: Software Defined Networking
1/46
Outline
Introduction
Motivation
Research Issues and Goals
Active Monitoring Techniques
Passive Monitoring Techniques
Introduction (1/9)
Growth of Internet Users
The number of Internet users is growing
Source: www.internetworldstats.com
Introduction (2/9)
Growth of Internet Traffic
Internet traffic has increased dramatically
(Exabyte = 1 million terabytes ≈ 2^60 bytes)
Source: Cisco
Introduction (3/9)
Stand-alone applications can now utilize networking
Cooperative editing: Abiword, ACE, MS SharePoint Workspace
Browser-based software: Google Docs, Google Wave
Game console: Microsoft XBOX, Sony Playstation, Nintendo Wii
Network applications
Online games, shopping, banking, stock trading, network storage,
P2P applications
VOD, EOD (Education on Demand), VOIP, IPTV
[Figure: Online game, VoIP, VOD]
Introduction (4/9)
Client-Server
Traditional structure
Peer-to-Peer (P2P)
A newer model for file sharing and transfer
Generates high volume of traffic
[Figure: a server with its clients (client-server) next to peers exchanging discovery, content, and transfer queries (P2P)]
Structures of applications are changing!
Introduction (5/9)
Types of Traffic
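Static sessions vs. Dynamic sessions
[Figure: a static session connects, uses a static protocol and port, then disconnects; a dynamic session connects, negotiates and allocates over a control channel, uses a dynamic protocol and port for data, then disconnects]
Bursty data transfer vs. Streaming data transfer
[Figure: packet arrival patterns in the network for bursty and streaming transfer]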
Traffic types are diverse and growing in number!
Introduction (6/9)
Internet Protocol Distribution
Transport Protocol Distribution (POSTECH Internet Junction Traffic, 2003.09.16 – 19:36)
Protocol   Flows              Packets              Bytes
TCP        32,515 (14.4%)     1,797,176 (86.3%)    1,339,396,630 (96.8%)
UDP        54,561 (24.2%)     141,769 (6.8%)       27,812,586 (2.0%)
ICMP       138,253 (61.3%)    141,247 (6.7%)       15,720,410 (1.1%)
Others     125 (0.0%)         474 (0.0%)           32,160 (0.0%)
The number of UDP flows is increasing because of P2P applications
The number of ICMP flows is increasing because of Internet worms
Introduction (7/9)
Internet Protocol Distribution
Transport Protocol Distribution (POSTECH Internet Junction Traffic, 2011.03.28 – 18:15)
Protocol   Flows              Packets              Bytes
TCP        42,533 (5.8%)      1,677,721 (38.7%)    1,288,490,188 (39.9%)
UDP        678,800 (93.4%)    2,621,440 (60.5%)    1,932,735,283 (59.9%)
ICMP       4,452 (0.6%)       31,256 (0.7%)        2,516,582 (0.1%)
Others     445 (0.0%)         3,099 (0.0%)         570,726 (0.0%)
The number of UDP flows is increasing because of P2P, gaming, and
multimedia streaming applications
Introduction (8/9)
Port Number Usage in TCP/UDP
Port number distribution in bytes (POSTECH Internet Junction Traffic, 2003.09.16 – 19:36)
[Figure: pie charts of the TCP Server Listening Port Number Distribution and the UDP Port Number Distribution, each split into <1024 and >=1024; slice labels 2%, 41%, 98%, 59%]
Proportion of Internet applications
[Figure: pie chart over HTTP, FTP, TELNET, SMTP, and Others; slice labels 54%, 21%, 20%, 5%, 0%]
Introduction (9/9)
Port Number Usage in TCP/UDP
Port number distribution in bytes (POSTECH Internet Junction Traffic, 2011.03.28 – 18:15)
[Figure: pie charts of the UDP Port Number Distribution and the TCP Server Listening Port Number Distribution, each split into < 1024 and Others; slice labels 0.75%, 0.18%, 99.25%, 99.82%]
Proportion of Internet applications
[Figure: pie chart over http, ssl, tcp encap., smtp, pop, rtsp, ssh, and Others; slice labels 11.403%, 2.484%, 84.986%]
Motivation (1/2)
Needs of Service Providers
Understand the behavior of their networks
Provide fast, high-quality, reliable service to satisfy customers and
thus reduce churn rate
Plan for network deployment and expansion
SLA monitoring, Network security
Increase Revenue!
• Usage-based billing for network users (like telephone calls)
• Marketing using CRM data
Needs of Customers
Want to get their money’s worth
Fast, reliable, high-quality, secure, virus-free Internet access
Meeting service providers' needs ultimately means satisfying their customers!
Motivation (2/2)
Application Areas
Network Problem Determination and Analysis
Traffic Report Generation
Intrusion & Hacking Attack (e.g., DoS, DDoS) Detection
Service Level Monitoring (SLM)
Network Planning
Usage-based Billing
Customer Relationship Management (CRM)
Marketing
Issues in Traffic Monitoring
Choices
Single-point vs. Multi-point monitoring
• Number of probing or test-packet generation points
In-service vs. Out-of-service monitoring
• Whether monitoring is executed while the network is in service or not
Continuous vs. On-demand monitoring
• Whether monitoring runs continuously or on demand
Packet vs. Flow-based monitoring
• Whether packets or flows are collected from network devices
One-way vs. Bi-directional monitoring
• Monitor the forward path only, or both the forward and return paths
Trade-offs
Network bandwidth
Processing overhead
Accuracy
Cost
Problems
Capturing Packets
High-speed networks (Mbps → Gbps → Tbps)
High-volume traffic
Streaming media (Windows Media, Real Media, Quicktime)
P2P traffic
Network Security Attacks
Flow Generation & Storage
What packet information should be saved to support various analyses?
How to minimize storage requirements?
Analysis
How to analyze the data and generate the needed results quickly?
What kinds of information need to be generated? This depends on the
applications
Research & Development Goals
Develop Methods to
Capture all packets
Generate flows
Store flows efficiently
Analyze data efficiently
Generate various reports or information that are suitable for various
application areas
Develop a Flexible, Scalable Traffic Monitoring and
Analysis System for
High-speed
High-volume
Rich media IP networks
Network Monitoring Metrics (1/5)
[Figure: tree of network monitoring metrics]
Network Monitoring Metrics
• Availability: Connectivity, Functionality
• Loss: One way loss, RT loss
• Delay: One way delay, RT delay, Delay variance
• Utilization: Capacity, Bandwidth, Throughput
Network Monitoring Metrics (2/5)
Availability
The percentage of a specified time interval during which the system
was available for normal use
What is supposed to be available?
• Service, Host, Network
Availabilities are usually reported as a single monthly figure
• 99.99% availability means that the service is unavailable for 4 minutes during a
month
One can test availability by sending suitable packets and observing
the answering packets (latency, packet loss)
Metrics
• Connectivity: the physical connectivity of network elements
• Functionality: whether the associated system works well or not
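The monthly availability figure above can be checked with a little arithmetic. The sketch below assumes a 30-day month and that availability is estimated from periodic probes; the function names are illustrative, not part of the lecture.

```python
def downtime_minutes(availability_pct: float, days: int = 30) -> float:
    """Downtime (minutes) implied by an availability percentage over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


def availability_from_probes(answered: int, sent: int) -> float:
    """Availability (%) estimated as the share of probes that were answered."""
    if sent <= 0:
        raise ValueError("at least one probe must be sent")
    return 100.0 * answered / sent


if __name__ == "__main__":
    # 99.99% over an assumed 30-day month comes to a little over 4 minutes.
    print(f"{downtime_minutes(99.99):.2f} minutes of downtime")
    print(f"{availability_from_probes(8639, 8640):.3f}% available")
```

With these assumptions, 99.99% corresponds to roughly 4.3 minutes per month, which matches the rounded figure of 4 minutes.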
Network Monitoring Metrics (3/5)
Packet Loss
The fraction of packets lost in transit from one host to another during a
specified time interval
Internet packet transport works on a best-effort basis, i.e., a router
may drop packets depending on its current conditions
A moderate level of packet loss is not in itself intolerable
• Some real-time services, e.g., VoIP, can tolerate some packet losses
• TCP resends lost packets at a slower rate
Metrics
• One way loss
• Round Trip (RT) loss
Network Monitoring Metrics (4/5)
Delay (Latency)
The time taken for a packet to travel from one host to another
Round Trip Time (RTT)
• Forward transport delay + server delay + backward transport delay
Forward transport delay is often not the same as backward
transport delay (may use different paths)
For streaming applications, high delay or delay variation (jitter) can
degrade user-perceived QoS
Metrics
• One way delay
• Round Trip Time (delay)
• Delay variance (jitter)
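A minimal sketch of the delay metrics listed above. The RTT decomposition follows the slide; the jitter estimator (mean absolute difference between consecutive delay samples) is only one of several common definitions and is an assumption here, as are the function names.

```python
from statistics import mean, pvariance


def round_trip_time(forward_ms: float, server_ms: float, backward_ms: float) -> float:
    """RTT = forward transport delay + server delay + backward transport delay."""
    return forward_ms + server_ms + backward_ms


def jitter_ms(delays_ms: list) -> float:
    """Assumed jitter estimate: mean absolute difference of consecutive delays."""
    diffs = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    return mean(diffs) if diffs else 0.0


if __name__ == "__main__":
    samples = [10, 12, 11, 15]  # made-up one-way delays in ms
    print(round_trip_time(20, 5, 30))   # forward and backward delays may differ
    print(jitter_ms(samples))
    print(pvariance(samples))           # plain variance, an alternative view
```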
Network Monitoring Metrics (5/5)
Throughput
The rate at which data is sent through the network, usually
expressed in bytes/sec, packets/sec, or flows/sec
Be careful in choosing the interval; a long interval will average out
short-term bursts in the data rate
• A good compromise is to use one- to five-minute intervals, and to produce daily,
weekly, monthly, and yearly plots
Link Utilization over a specified interval is simply the throughput for
the link expressed as a percentage of the access rate
Metrics
• Link Capacity (Mbps, Gbps)
• Throughput (bytes/sec, packets/sec, flows/sec)
• Utilization (%)
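As a rough illustration of the throughput and utilization definitions above, the sketch below converts a byte counter delta over an interval into bits per second and then into a percentage of link capacity. The sample numbers and function names are made up for the example.

```python
def throughput_bps(byte_count: int, interval_s: float) -> float:
    """Average throughput in bits/sec over the measurement interval."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return byte_count * 8 / interval_s


def utilization_pct(throughput: float, capacity_bps: float) -> float:
    """Link utilization: throughput as a percentage of the link capacity."""
    return 100.0 * throughput / capacity_bps


if __name__ == "__main__":
    # Hypothetical 5-minute sample on a 100 Mbps link.
    bps = throughput_bps(1_875_000_000, 300)
    print(f"{bps / 1e6:.1f} Mbps, {utilization_pct(bps, 100e6):.1f}% utilized")
```

Because the value is an average over the interval, a short burst at line rate inside a five-minute window would still show up only as a modest utilization figure.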
Traffic Monitoring Approaches (1/4)
Passive Monitoring
Active Monitoring
Traffic Monitoring Approaches (2/4)
Active Monitoring
Performed by sending test (probe) traffic into the network
• Generate test packets periodically or on-demand
• Measure performance of test packets or responses
• Take the statistics
Imposes extra traffic on the network and distorts its behavior in the process
Test packets can be blocked by firewalls or processed at low priority by
routers
Mainly used to monitor network performance
[Figure: a test packet generator sends a test packet (probe) to a target host, which returns a response]
Traffic Monitoring Approaches (3/4)
Passive Monitoring
Carried out by observing network traffic
• Collect packets from a link or network flows from a router
• Perform analysis on captured packets for various purposes
Mirroring or flow export can degrade network device performance
Used to perform various traffic usage/characterization analyses or
intrusion detection
[Figure: packet capture on a network link feeds flow generation; the resulting flow data, or flow data exported by a router, feeds traffic analysis, which yields traffic information]
Traffic Monitoring Approaches (4/4)
Comparison of Two Monitoring Approaches
                   Active Monitoring                     Passive Monitoring
Configuration      Multi-point                           Single or multi-point
Data size          Small                                 Large
Network overhead   Additional traffic                    Device overhead; no overhead if splitter is used
Purpose            Delay, packet loss, availability      Throughput, traffic pattern, trend, & detection
CPU Requirement    Low to Moderate                       High
Advantages         - Gives some benefit at the initial   - Measured results may show the real network
                     stage of network construction,        characteristics
                     when passive monitoring yields      - Does not need to generate additional probe
                     little data                           messages
Disadvantages      - Cannot reflect network              - Captured data has massive volume
                     characteristics                     - Needs an additional facility to capture the
                   - Needs to generate probe messages,     mirrored packets from the network
                     which may cause extra network
                     overhead
Active Monitoring Techniques
ICMP-based Method
Diagnose network problems
Availability / Round-trip delay / Round-trip packet loss
TCP-based Method
One-way bandwidth / Round trip bandwidth
Bulk transfer rate
UDP-based Method
One-way packet loss / Round trip bandwidth
ICMP-based Method (1/5)
Active Monitoring – ICMP
Internet Control Message Protocol (ICMP), RFC 792
The purpose of ICMP messages is to provide feedback about
problems in the IP network environment
Delivered in IP packets
ICMP message format
• 4-byte ICMP header (type, code, checksum) followed by an optional message
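To make the message format concrete, here is a small sketch that builds an ICMP Echo Request in memory: the 4-byte header (type, code, checksum) followed by the echo-specific identifier, sequence number, and payload. It only constructs bytes and does not send anything; the function names are illustrative.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type for Echo Request (RFC 792)


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Header (type, code, checksum) + identifier + sequence + payload."""
    body = struct.pack("!HH", ident, seq) + payload
    unchecked = struct.pack("!BBH", ICMP_ECHO_REQUEST, 0, 0) + body
    checksum = internet_checksum(unchecked)
    return struct.pack("!BBH", ICMP_ECHO_REQUEST, 0, checksum) + body


if __name__ == "__main__":
    packet = build_echo_request(ident=1, seq=1, payload=b"ping")
    print(packet.hex())
    # Re-running the checksum over a valid message yields zero.
    print(internet_checksum(packet) == 0)
```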
ICMP-based Method (2/5)
ICMP Functions
To announce network errors
• If a network, host, port is unreachable, ICMP Destination Unreachable Message
is sent to the source host
To announce network congestion
• When a router runs out of buffer queue space, ICMP Source Quench Message is
sent to the source host
To assist troubleshooting
• ICMP Echo Message is sent to a host to test if it is alive - used by ping
To announce timeouts
• If a packet’s TTL field drops to zero, ICMP Time Exceeded Message is sent to the
source host - used by traceroute
ICMP-based Method (3/5)
ICMP Drawbacks
ICMP messages may be blocked (i.e., dropped) by firewalls and
processed at low priority by routers
ICMP has also received bad press because it has been used in many denial of
service (DoS) attacks and because of the number of sites
generating monitoring traffic
As a consequence, some ISPs disable ICMP even though this
potentially causes poor performance and does not comply with
RFC 1009 (Requirements for Internet Gateways)
In spite of these limitations, ICMP is still the most widely used protocol in active
network measurements
ICMP-based Method (4/5)
Ping
A simple application that runs on a host, typically supplied as part of
the host's operating system
Uses ICMP ECHO_REQUEST and ECHO_RESPONSE (Echo Reply) packets
Provides round-trip time and packet loss
To measure averages, run ping at regular intervals and record
the site's latency and packet loss
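The averaging step could look like the sketch below, which summarizes a series of ping results collected at regular intervals. It assumes each sample is either a round-trip time in milliseconds or `None` for a probe that got no reply; collecting the samples (for example by running the system ping tool) is left out.

```python
def summarize_ping(rtts_ms: list) -> dict:
    """Summarize ping samples; None marks a probe with no reply."""
    sent = len(rtts_ms)
    replies = [r for r in rtts_ms if r is not None]
    summary = {
        "sent": sent,
        "received": len(replies),
        "loss_pct": 100.0 * (sent - len(replies)) / sent if sent else 0.0,
    }
    if replies:
        summary["min_ms"] = min(replies)
        summary["avg_ms"] = sum(replies) / len(replies)
        summary["max_ms"] = max(replies)
    return summary


if __name__ == "__main__":
    # Made-up samples: four probes, one of them lost.
    print(summarize_ping([10.0, None, 14.0, 12.0]))
```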
ICMP-based Method (5/5)
Traceroute
Produces a hop-by-hop listing for each router along the path to the
target host
For each hop, it prints the round-trip time for the router
Algorithm: uses ICMP and the TTL field in the IP header
• Send a probe packet with TTL=1 (ICMP here; the classic Unix traceroute sends UDP probes)
• The first router sends back ICMP TIME_EXCEEDED
• Then send a probe with TTL=2 and hear back from the second
router
• Continue until the destination is reached or the maximum TTL is exceeded (default max
TTL=30)
It shows only the forward path
• The reverse path is seldom the same
• To trace the reverse path, one must run traceroute on the remote host
(reverse traceroute server, Looking Glass Server)
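The TTL loop can be sketched without raw sockets by simulating the network. In the example below, `simulated_probe` is a hypothetical stand-in for sending one probe and waiting for the reply, and the path is a made-up list of hop names; a real implementation would need raw sockets and elevated privileges.

```python
def simulated_probe(path: list, ttl: int) -> tuple:
    """Hypothetical probe: returns (responder, reached_destination).

    A router that decrements the TTL to zero answers with TIME_EXCEEDED;
    once the TTL is large enough, the destination itself answers.
    """
    if ttl < len(path):
        return path[ttl - 1], False
    return path[-1], True


def traceroute(path: list, max_ttl: int = 30) -> list:
    """Increase the TTL one hop at a time until the destination replies."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, reached = simulated_probe(path, ttl)
        hops.append((ttl, responder))
        if reached:
            break
    return hops


if __name__ == "__main__":
    for ttl, hop in traceroute(["router-1", "router-2", "target-host"]):
        print(ttl, hop)
```

Only the forward path is discovered this way, because each reply merely identifies the router at which the probe's TTL ran out.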
TCP-based Method
TCP – Throughput
[Figure: NTP-synchronized measurement source and destination machines; 100 KB is transferred over TCP between local times t1 and t2]
$$\text{Throughput (Mbps)} = \frac{10^5 \times 8}{t_2\,(\mu s) - t_1\,(\mu s)}$$
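A small sketch of the throughput formula: bits transferred divided by the elapsed time in microseconds gives Mbps directly, since 1 bit/μs equals 1 Mbps. The timestamps are assumed to come from NTP-synchronized clocks; the sample values are made up.

```python
def tcp_throughput_mbps(bytes_sent: int, t1_us: int, t2_us: int) -> float:
    """Throughput in Mbps from a byte count and start/end timestamps in microseconds."""
    elapsed_us = t2_us - t1_us
    if elapsed_us <= 0:
        raise ValueError("t2 must be later than t1 (are the clocks synchronized?)")
    return bytes_sent * 8 / elapsed_us  # bits per microsecond == Mbps


if __name__ == "__main__":
    # 100 KB (10^5 bytes) transferred in a hypothetical 80 ms.
    print(tcp_throughput_mbps(10**5, t1_us=0, t2_us=80_000), "Mbps")
```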
UDP-based Method
UDP – One Way Loss
[Figure: NTP-synchronized measurement source and destination machines; the source sends 100 KB over UDP in packets of 1000 bytes]
$$\text{One-way loss (\%)} = 100 - \frac{\text{Received packet count}}{\text{Sent packet count}} \times 100$$
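The loss formula translates directly into code. The sketch below assumes the sender and receiver each report a packet count for the same test run; the counts used are made up.

```python
def one_way_loss_pct(sent_packets: int, received_packets: int) -> float:
    """One-way loss (%) = 100 - received / sent * 100."""
    if sent_packets <= 0:
        raise ValueError("no packets were sent")
    return 100.0 - 100.0 * received_packets / sent_packets


if __name__ == "__main__":
    # 100 KB sent as 100 packets of 1000 bytes; suppose 97 arrive.
    print(one_way_loss_pct(100, 97), "% one-way loss")
```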
Packet Capturing (1/2)
Packet Capturing
Packets can be captured using Port Mirroring or a Network Splitter (Tap)
[Figure: a probe system attached by mirroring and a probe system attached by splitting]
               Port Mirroring                     Network Splitter
How it works   - Copies all packets passing on    - Splits the signal and sends one signal to the
                 a port to another port             original path and another to the probe
Advantage      - No extra hardware required       - No processing overhead on router/switch
Disadvantage   - Processing overhead on           - Splitter hardware required
                 router/switch
Packet Capturing (2/2)
Difficulties in packet capturing
Massive amount of data
• How much packet data is generated from a 100 Mbps network in an hour?
Port speed × In&Out × Link utilization × sec/hour = traffic volume per hour
100 Mbps × 2 × 0.5 × 3600 = 360 Gbit
Traffic volume / avg. packet length × bytes of packet data stored per packet = data size
360 Gbit / (1500 × 8) × 30 = 0.9 Gbyte ≈ 1 Gbyte
Processing of high-speed packets
• Processing time for a 100 Mbps network
Port speed × In&Out × Link utilization / average packet length
= 100 Mbps × 2 × 0.5 / (1500 × 8) ≈ 8333 packets/sec => 0.12 msec/packet
                                            100 Mbps    1 Gbps       1 Tbps
Data size per hour (assume 0.5 link util)   1 Gbyte     10 Gbyte     10 Tbyte
Processing time per packet                  0.12 msec   0.012 msec   0.012 μsec
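The back-of-the-envelope figures can be reproduced with the sketch below. It uses the slide's assumptions (both directions counted, 0.5 link utilization, 1500-byte average packets, 30 bytes stored per packet); the function and parameter names are illustrative.

```python
def capture_load(port_speed_bps: float, utilization: float = 0.5,
                 directions: int = 2, avg_packet_bytes: int = 1500,
                 stored_bytes_per_packet: int = 30) -> tuple:
    """Return (packets/sec, seconds available per packet, stored bytes per hour)."""
    bits_per_sec = port_speed_bps * directions * utilization
    packets_per_sec = bits_per_sec / (avg_packet_bytes * 8)
    stored_bytes_per_hour = packets_per_sec * 3600 * stored_bytes_per_packet
    return packets_per_sec, 1.0 / packets_per_sec, stored_bytes_per_hour


if __name__ == "__main__":
    for label, speed in [("100 Mbps", 100e6), ("1 Gbps", 1e9), ("1 Tbps", 1e12)]:
        pps, sec_per_packet, bytes_per_hour = capture_load(speed)
        print(f"{label}: {pps:,.0f} packets/sec, "
              f"{sec_per_packet * 1e3:.6f} msec/packet, "
              f"{bytes_per_hour / 1e9:,.1f} Gbyte/hour")
```

The per-packet time budget shrinks in proportion to link speed, which is why full capture becomes impractical on very fast links.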
Sampling
Why Do We Need Sampling?
If the rate is too high to capture all packets reliably, there is no
alternative but to sample the packets
Sampling algorithms: every Nth packet or fixed time interval
[Figure: (a) 2:1 sampling over packets 1 to 11; (b) 1 msec sampling over a 0 to 4 msec timeline]
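Both sampling schemes can be sketched in a few lines. Which packet of each group or interval is kept is an assumption here (the first one); the packet timestamps are made up.

```python
def sample_every_nth(packets: list, n: int) -> list:
    """Count-based sampling: keep the first packet of every group of n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return packets[::n]


def sample_by_interval(timed_packets: list, interval_ms: float) -> list:
    """Time-based sampling: keep the first packet seen in each interval.

    `timed_packets` is a list of (timestamp_ms, packet) pairs sorted by time.
    """
    sampled, seen_intervals = [], set()
    for ts_ms, packet in timed_packets:
        bucket = int(ts_ms // interval_ms)
        if bucket not in seen_intervals:
            seen_intervals.add(bucket)
            sampled.append((ts_ms, packet))
    return sampled


if __name__ == "__main__":
    print(sample_every_nth(list(range(1, 12)), 2))  # 2:1 sampling over packets 1..11
    timed = [(0.1, "a"), (0.4, "b"), (1.2, "c"), (1.9, "d"), (3.5, "e")]
    print(sample_by_interval(timed, 1.0))           # 1 msec sampling
```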
Flow Generation
Flow
A flow is a collection of packets with the same {SRC and DST IP
address, SRC and DST port number, protocol number}
Flow data can be collected directly from routers, or from a standalone flow
generator with packet capturing capability
Popular flow formats
• NetFlow (Cisco), sFlow (sFlow.org), IPFIX (IETF)
Issues in flow generation
• What information should be included in flow data?
• How to generate flow data from raw packet information efficiently?
• How to save bulk flow data into a DB or binary file in a collector?
• How long should the data be preserved?
[Figure: a packet stream grouped into flow 1, flow 2, flow 3, and flow 4]
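A minimal sketch of flow generation from captured packets, grouping by the 5-tuple defined above. The packet dictionaries and their field names are hypothetical; a real flow generator would also handle flow timeouts and export.

```python
def generate_flows(packets: list) -> dict:
    """Aggregate packets into flows keyed by the 5-tuple."""
    flows = {}
    for pkt in packets:
        key = (pkt["src_ip"], pkt["dst_ip"],
               pkt["src_port"], pkt["dst_port"], pkt["proto"])
        flow = flows.setdefault(
            key, {"packets": 0, "bytes": 0, "start": pkt["ts"], "end": pkt["ts"]})
        flow["packets"] += 1
        flow["bytes"] += pkt["length"]
        flow["start"] = min(flow["start"], pkt["ts"])
        flow["end"] = max(flow["end"], pkt["ts"])
    return flows


if __name__ == "__main__":
    sample = [  # made-up packets: two belong to the same TCP flow
        {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 1234,
         "dst_port": 80, "proto": 6, "length": 1500, "ts": 0.0},
        {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 1234,
         "dst_port": 80, "proto": 6, "length": 500, "ts": 0.2},
        {"src_ip": "10.0.0.3", "dst_ip": "10.0.0.2", "src_port": 5353,
         "dst_port": 53, "proto": 17, "length": 80, "ts": 0.1},
    ]
    for key, flow in generate_flows(sample).items():
        print(key, flow)
```

Storing one record per flow instead of one per packet is what keeps the storage requirement manageable.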
Flow: NetFlow
Cisco NetFlow
An option configurable in Cisco routers that exports data on each IP
flow passed through an interface
NetFlow Export Datagram
[Figure: a header followed by a sequence of flow records]
Header
· Sequence number
· Record count
· Version number
Flow format of Version 5
Usage
• Packet Count
• Byte Count
From/To
• Source IP Address
• Destination IP Address
Time of Day
• Start Timestamp
• End Timestamp
Application
• Source TCP/UDP Port
• Destination TCP/UDP Port
Port Utilization
• Input Interface Port
• Output Interface Port
QoS
• Type of Service
• TCP Flags
• Protocol
Routing and Peering
• Next Hop Address
• Source AS Number
• Dest. AS Number
• Source Prefix Mask
• Dest. Prefix Mask
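As a rough sketch, an export datagram with the fields above could be decoded as follows. The byte layout (a 24-byte header followed by 48-byte records) reflects the commonly documented version 5 export format as understood here and should be checked against Cisco's documentation before use; the dictionary keys are illustrative names.

```python
import socket
import struct

# Assumed v5 layout: 24-byte header, then `count` records of 48 bytes each.
HEADER = struct.Struct("!HHIIIIBBH")
RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")


def parse_netflow_v5(datagram: bytes) -> list:
    """Decode the flow records of one export datagram into dictionaries."""
    header = HEADER.unpack_from(datagram, 0)
    version, count = header[0], header[1]
    if version != 5:
        raise ValueError(f"not a version 5 datagram: {version}")
    records = []
    for i in range(count):
        f = RECORD.unpack_from(datagram, HEADER.size + i * RECORD.size)
        records.append({
            "src_ip": socket.inet_ntoa(f[0]), "dst_ip": socket.inet_ntoa(f[1]),
            "next_hop": socket.inet_ntoa(f[2]),
            "input_if": f[3], "output_if": f[4],
            "packets": f[5], "bytes": f[6],
            "start": f[7], "end": f[8],
            "src_port": f[9], "dst_port": f[10],
            "tcp_flags": f[12], "protocol": f[13], "tos": f[14],  # f[11] is padding
            "src_as": f[15], "dst_as": f[16],
            "src_mask": f[17], "dst_mask": f[18],
        })
    return records


if __name__ == "__main__":
    # Build one synthetic datagram (made-up values) and decode it again.
    header = HEADER.pack(5, 1, 0, 0, 0, 0, 0, 0, 0)
    record = RECORD.pack(socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"),
                         socket.inet_aton("0.0.0.0"), 1, 2, 10, 1500, 0, 1000,
                         1234, 80, 0, 0x18, 6, 0, 0, 0, 24, 24, 0)
    print(parse_netflow_v5(header + record))
```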
Flow: sFlow
sFlow
Described in RFC 3176: “InMon Corporation's sFlow: A Method for
Monitoring Traffic in Switched and Routed Networks”
sFlow is a monitoring technology that gives visibility into the use of
networks, enabling performance optimization, accounting/billing for
usage, and defense against security threats
sFlow samples packets using statistical sampling theory
Format of Version 4
• Packet Header Data
  • Header Protocol (Format of sampled header)
  • Frame_length
  • Header bytes
• Packet IP v4 Data
  • Length
  • Protocol (IP Protocol Type)
  • src_ip / dst_ip
  • src_port / dst_port
  • TCP flags
  • tos
Traffic Analysis Aspects
Spatial Aspect
The patterns of traffic flow relative to the network topology
Important for proper network design and planning
Identification of bottleneck & avoidance of congestion
Example: Flow aggregation by src, dst IP address or AS number
Temporal Aspect
The stochastic behavior of a traffic flow, described in statistical terms
Important for resource management and traffic control
Important for traffic shaping and caching policies
Example: Packets or bytes per hour, day, week, month
Composition of Traffic
A breakdown of traffic according to the contents, application, packet
length, flow duration
Helps to explain its temporal and spatial characteristics
Example: game, streaming media traffic for a week from peer ISP
Traffic Classification/Identification (1/2)
Traffic Classification
Classifying traffic based on features passively observed in the traffic,
and according to specific classification goals
Types of Traffic Classification
Port-based approaches
• E.g., TCP ports 20 and 21 → FTP, TCP port 80 → HTTP
Payload-based approaches
• E.g., payload starting with 0x13 "BitTorrent protocol" → BitTorrent
Machine Learning (ML)-based approaches
• Connection-related statistical information, including connection duration, inter-packet arrival time, and packet
                Accuracy   Strength                       Weakness
Port-based      Low        Low computational cost         Low accuracy
Payload-based   High       Most accurate method           High computational cost; exhaustive signature generation
ML-based        High       Can handle encrypted traffic   High computational cost
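The port-based and payload-based approaches can be combined in a small sketch. The port table holds only the examples named above, and the single signature is the BitTorrent handshake prefix (byte 0x13, i.e. 19, followed by the 19-character string "BitTorrent protocol"); the function name and return format are illustrative.

```python
PORT_TABLE = {20: "FTP", 21: "FTP", 80: "HTTP"}                  # well-known TCP ports
PAYLOAD_SIGNATURES = {b"\x13BitTorrent protocol": "BitTorrent"}  # handshake prefix


def classify(dst_port: int, payload: bytes = b"") -> tuple:
    """Return (application, method); payload signatures take precedence over ports."""
    for signature, application in PAYLOAD_SIGNATURES.items():
        if payload.startswith(signature):
            return application, "payload"
    if dst_port in PORT_TABLE:
        return PORT_TABLE[dst_port], "port"
    return "unknown", "none"


if __name__ == "__main__":
    print(classify(80))
    # A BitTorrent handshake on port 80 is misread by port alone
    # but identified by its payload.
    print(classify(80, b"\x13BitTorrent protocol" + b"\x00" * 8))
    print(classify(51413))
```

This also shows the trade-off in the table: the port lookup is cheap but easily fooled, while payload matching needs packet contents and a maintained signature set.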
Traffic Classification/Identification (2/2)
From the Perspective of Network Layers
Network Layer
• IP, ARP, RARP, etc.
Transport Layer
• TCP, UDP, ICMP, etc.
Application Layer
• HTTP, HTTPS, SMTP, FTP, TELNET, SSH, POP, etc.
Classification Level in Practice (Classification Output)
Traffic clustering
• Bulk transfer, small transaction, etc.
Application-type breakdown
• Web, game, P2P, messenger, streaming, mail, etc.
Application protocol breakdown
• HTTP, HTTPS, SMTP, FTP, TELNET, SSH, POP, etc.
Application breakdown
• BitTorrent, MSN, NateOn, Filezilla FTP, etc.
Q&A