Traffic Monitoring and Analysis


James Won-Ki Hong
Department of Computer Science and Engineering
POSTECH, Korea
[email protected]
POSTECH
CSED702Y: Software Defined Networking
1/46
Outline
 Introduction
 Motivation
 Research Issues and Goals
 Active Monitoring Techniques
 Passive Monitoring Techniques
Introduction (1/9)
 Growth of Internet Users
 The number of Internet users is growing
Source: www.internetworldstats.com
Introduction (2/9)
 Growth of Internet Users
 Internet traffic has increased dramatically
(Exabyte = 1 million terabytes ≈ 2^60 bytes)
Source: Cisco
Introduction (3/9)
 Stand-alone applications can now utilize networking
 Cooperative editing: Abiword, ACE, MS SharePoint Workspace
 Browser-based software: Google Docs, Google Wave
 Game console: Microsoft XBOX, Sony Playstation, Nintendo Wii
 Network applications
 Online games, shopping, banking, stock trading, network storage,
P2P applications
 VOD, EOD (Education on Demand), VOIP, IPTV
[Figure: online game, VoIP, VOD]
Introduction (4/9)
 Client-Server
 Traditional structure
 Peer-to-Peer (P2P)
 New concept between file sharing and transferring
 Generates high volume of traffic
[Figure: client-server structure (client, server) vs. P2P structure (peers exchanging discovery, content, and transfer queries)]
Structures of applications are changing!
Introduction (5/9)
 Types of Traffic
 Static sessions vs. Dynamic sessions
[Figure: a static session (connect, use static protocol and port, disconnect) vs. a dynamic session (connect, negotiate & allocate on a control channel, use dynamic protocol and port on a data channel, disconnect)]
 Bursty data transfer vs. Streaming data transfer
[Figure: packets on the network for bursty vs. streaming transfer]
Traffic types are varied and increasing!
Introduction (6/9)
 Internet Protocol Distribution
Protocol   Flows             Packets             Bytes
TCP        32,515 (14.4%)    1,797,176 (86.3%)   1,339,396,630 (96.8%)
UDP        54,561 (24.2%)    141,769 (6.8%)      27,812,586 (2.0%)
ICMP       138,253 (61.3%)   141,247 (6.7%)      15,720,410 (1.1%)
Others     125 (0.0%)        474 (0.0%)          32,160 (0.0%)
POSTECH Internet Junction Traffic, 2003.09.16 – 19:36
 Transport Protocol Distribution
 The number of UDP flows is increasing due to P2P applications
 The number of ICMP flows is increasing due to Internet worms
Introduction (7/9)
 Internet Protocol Distribution
Protocol   Flows             Packets             Bytes
TCP        42,533 (5.8%)     1,677,721 (38.7%)   1,288,490,188 (39.9%)
UDP        678,800 (93.4%)   2,621,440 (60.5%)   1,932,735,283 (59.9%)
ICMP       4,452 (0.6%)      31,256 (0.7%)       2,516,582 (0.1%)
Others     445 (0.0%)        3,099 (0.0%)        570,726 (0.0%)
POSTECH Internet Junction Traffic, 2011.03.28 – 18:15
 Transport Protocol Distribution
 The number of UDP flows is increasing due to P2P, gaming &
multimedia streaming applications
Introduction (8/9)
 Port Number Usage in TCP/UDP
 Port number distribution in bytes
[Figure: pie charts "TCP Server Listening Port Number Distribution" and "UDP Port Number Distribution", each split into <1024 and >=1024; slice labels 2%, 41%, 98%, 59%]
 Proportion of Internet applications
[Figure: pie chart of HTTP, FTP, TELNET, SMTP, Others; slice labels 54%, 21%, 20%, 5%, 0%]
POSTECH Internet Junction Traffic, 2003.09.16 – 19:36
Introduction (9/9)
 Port Number Usage in TCP/UDP
 Port number distribution in bytes
[Figure: pie charts "UDP Port Number Distribution" and "TCP Server Listening Port Number Distribution", each split into < 1024 and Others; slice labels 0.75%, 99.25%, 0.18%, 99.82%]
 Proportion of Internet applications
[Figure: pie chart of http, ssl, tcp encap., smtp, pop, rtsp, ssh, Others; slice labels 11.403%, 2.484%, 84.986%]
POSTECH Internet Junction Traffic, 2011.03.28 – 18:15
Motivation (1/2)
 Needs of Service Providers
 Understand the behavior of their networks
 Provide fast, high-quality, reliable service to satisfy customers and
thus reduce churn rate
 Plan for network deployment and expansion
 SLA monitoring, Network security
 Increase Revenue!
• Usage-based billing for network users (like telephone calls)
• Marketing using CRM data
 Needs of Customers
 Want to get their money’s worth
 Fast, reliable, high-quality, secure, virus-free Internet access
Meeting Service Providers’ Needs Means Satisfying Their Customers!
Motivation (2/2)
 Application Areas
 Network Problem Determination and Analysis
 Traffic Report Generation
 Intrusion & Hacking Attack (e.g., DoS, DDoS) Detection
 Service Level Monitoring (SLM)
 Network Planning
 Usage-based Billing
 Customer Relationship Management (CRM)
 Marketing
Issues in Traffic Monitoring
 Choices
 Single-point vs. Multi-point monitoring
• Number of probing or test packet generation points
 In-service vs. Out-of-service monitoring
• Whether monitoring should be executed during service or not
 Continuous vs. On-demand monitoring
• Whether monitoring runs continuously or on demand
 Packet vs. Flow-based monitoring
• Whether to collect packets or flows from network devices
 One-way vs. Bi-directional monitoring
• Monitor forward path only / forward and return path
 Trade-offs
 Network bandwidth
 Processing overhead
 Accuracy
 Cost
Problems
 Capturing Packets
 High-speed networks (Mbps -> Gbps -> Tbps)
 High-volume traffic
 Streaming media (Windows Media, Real Media, Quicktime)
 P2P traffic
 Network Security Attacks
 Flow Generation & Storage
 What packet information to save to perform various analyses?
 How to minimize storage requirements?
 Analysis
 How to analyze and generate the needed data quickly?
 What kinds of information need to be generated? -> Depends on
applications
Research & Development Goals
 Develop Methods to
 Capture all packets
 Generate flows
 Store flows efficiently
 Analyze data efficiently
 Generate various reports or information that are suitable for various
application areas
 Develop a Flexible, Scalable Traffic Monitoring and
Analysis System for
 High-speed
 High-volume
 Rich media IP networks
Network Monitoring Metrics (1/5)
[Figure: tree of network monitoring metrics. Availability: connectivity, functionality. Loss: one way loss, RT loss. Delay: one way delay, RT delay, delay variance. Capacity, utilization, bandwidth, throughput.]
Network Monitoring Metrics (2/5)
 Availability
 The percentage of a specified time interval during which the system
was available for normal use
 What is supposed to be available?
• Service, Host, Network
 Availabilities are usually reported as a single monthly figure
• 99.99% availability means that the service is unavailable for about 4 minutes during a
month (0.01% of a 30-day month is 4.32 minutes)
 One can test availability by sending suitable packets and observing
the answering packets (latency, packet loss)
 Metrics
• Connectivity: the physical connectivity of network elements
• Functionality: whether the associated system works well or not
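The availability figures above can be checked with a short calculation. This is a minimal sketch, assuming a 30-day month and probe-based testing; the function names are illustrative and do not come from the slides.

```python
def allowed_downtime_minutes(availability_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted over `days` at the given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


def measured_availability(probes_sent: int, probes_answered: int) -> float:
    """Availability (%) estimated from test packets and their answers."""
    if probes_sent <= 0:
        raise ValueError("at least one probe must be sent")
    return 100.0 * probes_answered / probes_sent


if __name__ == "__main__":
    # 99.99% over an assumed 30-day month
    print(f"{allowed_downtime_minutes(99.99):.2f} minutes of downtime allowed")
    # Hypothetical probe counts, for illustration only
    print(f"{measured_availability(10_000, 9_999):.2f}% measured availability")
```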
Network Monitoring Metrics (3/5)
 Packet Loss
 The fraction of packets lost in transit from one host to another during a
specified time interval
 Internet packet transport works on a best-effort basis, i.e., a router
may drop packets depending on its current conditions
 A moderate level of packet loss is not in itself intolerable
• Some real-time services, e.g., VoIP, can tolerate some packet losses
• TCP resends lost packets at a slower rate
 Metrics
• One way loss
• Round Trip (RT) loss
Network Monitoring Metrics (4/5)
 Delay (Latency)
 The time taken for a packet to travel from a host to another
 Round Trip Time (RTT)
• Forward transport delay + server delay + backward transport delay
 Forward transport delay is often not the same as backward
transport delay (may use different paths)
 For streaming applications, high delay or delay variation (jitter) can
cause degradation on user-perceived QoS
 Metrics
• One way delay
• Round Trip Time (delay)
• Delay variance (jitter)
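The delay metrics above can be computed from a list of RTT samples. This is a minimal sketch, assuming jitter is reported as the population variance of the samples (other definitions of jitter exist); the sample values are made up for illustration.

```python
from statistics import mean, pvariance


def round_trip_time(forward_ms: float, server_ms: float, backward_ms: float) -> float:
    """RTT = forward transport delay + server delay + backward transport delay."""
    return forward_ms + server_ms + backward_ms


def delay_summary(rtt_samples_ms):
    """Mean RTT and delay variance (used here as the jitter figure)."""
    return {
        "mean_ms": mean(rtt_samples_ms),
        "variance_ms2": pvariance(rtt_samples_ms),
    }


if __name__ == "__main__":
    samples = [10.0, 12.0, 11.0, 15.0, 12.0]  # hypothetical RTT samples in msec
    print(delay_summary(samples))
```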
Network Monitoring Metrics (5/5)
 Throughput
 The rate at which data is sent through the network, usually
expressed in bytes/sec, packets/sec, or flows/sec
 Be careful in choosing the interval; a long interval will average out
short-term bursts in the data rate
• A good compromise is to use one- to five-minute intervals, and to produce daily,
weekly, monthly, and yearly plots
 Link Utilization over a specified interval is simply the throughput for
the link expressed as a percentage of the access rate
 Metrics
• Link Capacity (Mbps, Gbps)
• Throughput (bytes/sec, packets/sec, flows/sec)
• Utilization (%)
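The effect of the measurement interval on throughput and utilization can be shown with a small example. This is a sketch under assumed numbers: a 100 Mbps link and hypothetical per-second byte counts containing a 5-second burst.

```python
def throughput_bps(byte_count: int, interval_sec: float) -> float:
    """Throughput in bits per second over the interval."""
    return byte_count * 8 / interval_sec


def utilization_percent(byte_count: int, interval_sec: float, access_rate_bps: float) -> float:
    """Throughput expressed as a percentage of the link's access rate."""
    return 100 * throughput_bps(byte_count, interval_sec) / access_rate_bps


LINK_BPS = 100_000_000  # assumed 100 Mbps link

# Hypothetical per-second byte counts: 55 quiet seconds, then a 5-second burst
per_second_bytes = [1_000_000] * 55 + [12_500_000] * 5

# One-second intervals expose the burst; a one-minute interval averages it out
peak = max(utilization_percent(b, 1, LINK_BPS) for b in per_second_bytes)
average = utilization_percent(sum(per_second_bytes), 60, LINK_BPS)

if __name__ == "__main__":
    print(f"peak 1-second utilization: {peak:.1f}%")
    print(f"1-minute utilization:      {average:.1f}%")
```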
Traffic Monitoring Approaches (1/4)
Passive Monitoring
Active Monitoring
Traffic Monitoring Approaches (2/4)
 Active Monitoring
 Performed by sending test (probe) traffic into the network
• Generate test packets periodically or on demand
• Measure performance of test packets or responses
• Take the statistics
 Imposes extra traffic on the network and can distort its behavior in the process
 Test packets can be blocked by firewalls or processed at low priority by
routers
 Mainly used to monitor network performance
[Figure: a test packet generator sends a probe to the target host, which returns a response]
Traffic Monitoring Approaches (3/4)
 Passive Monitoring
 Carried out by observing network traffic
• Collect packets from a link or network flows from a router
• Perform analysis on captured packets for various purposes
 Network device performance can degrade due to mirroring or flow export
 Used to perform various traffic usage/characterization analyses or
intrusion detection
[Figure: packet capture on a network link feeds flow generation, or a router exports flow data directly; traffic analysis turns the flow data into traffic information]
Traffic Monitoring Approaches (4/4)
 Comparison of Two Monitoring Approaches
Configuration
• Active: Multi-point
• Passive: Single or multi-point
Data size
• Active: Small
• Passive: Large
Network overhead
• Active: Additional traffic
• Passive: Device overhead; no overhead if a splitter is used
Purpose
• Active: Delay, packet loss, availability
• Passive: Throughput, traffic pattern, trend, & detection
CPU Requirement
• Active: Low to Moderate
• Passive: High
Advantages
• Active: Gives some benefit at the initial stage of network construction, when passive monitoring yields little data
• Passive: Measured results may show the real network characteristics; does not need to generate additional probe messages
Disadvantages
• Active: Cannot reflect network characteristics; needs to generate probe messages, which may cause extra overhead on the network
• Passive: Captured data has a massive volume; needs an additional facility to capture the mirrored packets from the network
Active Monitoring Techniques
 ICMP-based Method
 Diagnose network problems
 Availability / Round-trip delay / Round-trip packet loss
 TCP-based Method
 One-way bandwidth / Round-trip bandwidth
 Bulk transfer rate
 UDP-based Method
 One-way packet loss / Round-trip bandwidth
ICMP-based Method (1/5)
 Active Monitoring – ICMP
 Internet Control Message Protocol (ICMP), RFC 792
 The purpose of ICMP messages is to provide feedback about
problems in the IP network environment
 Delivered in IP packets
 ICMP message format
• 4-byte common ICMP header (type, code, checksum), followed by type-specific fields and an optional message
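As an illustration of the ICMP message format, the following sketch builds an Echo message in memory. It assumes the RFC 792 Echo layout (type 8, code 0, checksum, identifier, sequence number) and the standard Internet checksum; it does not send anything, and the function names are illustrative.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of the data, complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """ICMP Echo message: type=8, code=0, checksum, identifier, sequence, payload."""
    # The checksum is computed with the checksum field set to zero
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


if __name__ == "__main__":
    packet = build_echo_request(identifier=1, sequence=1, payload=b"ping")
    print(packet.hex())
```

A receiver can validate a message by running the same checksum over the whole packet; a correct packet yields zero.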
ICMP-based Method (2/5)
 ICMP Functions
 To announce network errors
• If a network, host, port is unreachable, ICMP Destination Unreachable Message
is sent to the source host
 To announce network congestion
• When a router runs out of buffer queue space, ICMP Source Quench Message is
sent to the source host
 To assist troubleshooting
• ICMP Echo Message is sent to a host to test if it is alive - used by ping
 To announce timeouts
• If a packet’s TTL field drops to zero, ICMP Time Exceeded Message is sent to the
source host - used by traceroute
ICMP-based Method (3/5)
 ICMP Drawbacks
 ICMP messages may be blocked (i.e., dropped) by firewall and
processed at low priority by router
 ICMP has also received bad press by being used in many denial of
service (DoS) attacks and because of the number of sites
generating monitoring traffic
 As a consequence some ISPs disable ICMP even though this
potentially causes poor performance and does not comply with
RFC 1009 (Requirements for Internet Gateways)
 In spite of these limitations, ICMP is still most widely used in active
network measurements
ICMP-based Method (4/5)
 Ping
 A simple application that runs on a host, typically supplied as part of
the host's operating system
 Uses ICMP ECHO_REQUEST and ECHO_RESPONSE packets
 Provides round-trip time and packet loss
 For average measurement, run ping at regular intervals so as to
measure the site's latency and packet loss
ICMP-based Method (5/5)
 Traceroute
 Produces a hop-by-hop listing for each router along the path to the
target host
 For each hop, it prints the round-trip time for the router
 Algorithm: uses ICMP and TTL field in the IP header
• Send an ICMP packet with TTL=1
• First router sends back ICMP TIME_EXCEEDED
• Then send ICMP packet with TTL=2 and hear back from the second
router
• Continue till the destination is reached or TTL expires (default max
TTL=30)
 It shows you only the forward path
• The reverse path is seldom the same
• To trace the reverse path one must run traceroute on the remote host
(reverse traceroute server, Looking Glass Server)
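The TTL-based algorithm can be sketched as a simulation. No packets are sent: the path is a hypothetical list of hop names, and the sketch assumes each router answers with TIME_EXCEEDED while the destination answers the ICMP probe with an echo reply.

```python
def simulate_traceroute(path, max_ttl=30):
    """Simulate traceroute over `path`, a list of hop names ending at the target.

    Returns a list of (ttl, responder, message) tuples.
    """
    results = []
    for ttl in range(1, min(max_ttl, len(path)) + 1):
        responder = path[ttl - 1]  # the hop at which this probe's TTL reaches zero
        if ttl == len(path):
            # The probe reached the target host, so the trace is complete
            results.append((ttl, responder, "ECHO_REPLY"))
            break
        results.append((ttl, responder, "TIME_EXCEEDED"))
    return results


if __name__ == "__main__":
    for hop in simulate_traceroute(["router-1", "router-2", "target"]):
        print(hop)
```

If the path is longer than max_ttl, the loop stops at the TTL limit without reaching the target, matching the default maximum of 30 noted above.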
TCP-based Method
TCP – Throughput
[Figure: NTP-synchronized measurement source and destination machines; the source sends 100 KB over TCP starting at local time t1, and the destination finishes receiving at local time t2]
\[ \text{Throughput (Mbps)} = \frac{10^{5} \times 8}{t_2(\mu s) - t_1(\mu s)} \]
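The throughput formula translates directly into code. This is a minimal sketch, assuming both timestamps come from NTP-synchronized clocks and are given in microseconds; bits per microsecond is numerically equal to Mbps. The function name and the 80 msec transfer time are illustrative.

```python
def tcp_throughput_mbps(bytes_sent: int, t1_us: float, t2_us: float) -> float:
    """Throughput in Mbps for `bytes_sent` bytes transferred between t1 and t2 (microseconds)."""
    elapsed_us = t2_us - t1_us
    if elapsed_us <= 0:
        raise ValueError("t2 must be later than t1; check clock synchronization")
    return bytes_sent * 8 / elapsed_us  # bits per microsecond == Mbps


if __name__ == "__main__":
    # 100 KB (10^5 bytes) received 80 msec after the send started
    print(tcp_throughput_mbps(10**5, t1_us=0, t2_us=80_000), "Mbps")
```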
UDP-based Method
UDP – One Way Loss
[Figure: NTP-synchronized measurement source and destination machines; the source sends 100 KB over UDP as packets of 1000 bytes each]
\[ \text{One-way loss (\%)} = 100 - \frac{\text{Received Packet Count}}{\text{Sent Packet Count}} \times 100 \]
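The one-way loss formula can be sketched as follows, assuming the sender and receiver each report a packet count for the same test run. The counts in the example are hypothetical.

```python
def one_way_loss_percent(sent_packets: int, received_packets: int) -> float:
    """One-way loss (%) = 100 - received / sent * 100."""
    if sent_packets <= 0:
        raise ValueError("sent_packets must be positive")
    if not 0 <= received_packets <= sent_packets:
        raise ValueError("received_packets must be between 0 and sent_packets")
    return 100.0 - 100.0 * received_packets / sent_packets


if __name__ == "__main__":
    # 100 KB sent as 100 packets of 1000 bytes; suppose 97 arrive
    print(one_way_loss_percent(100, 97), "% loss")
```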
Packet Capturing (1/2)
 Packet Capturing
 Packets can be captured using Port Mirroring or a Network Splitter (Tap)
[Figure: a probe system attached by mirroring and a probe system attached by splitting]
Port Mirroring
• How it works: Copies all packets passing on a port to another port
• Advantage: No extra hardware required
• Disadvantage: Processing overhead on router/switch
Network Splitter
• How it works: Splits the signal and sends one signal to the original path and another to the probe
• Advantage: No processing overhead on router/switch
• Disadvantage: Splitter hardware required
Packet Capturing (2/2)
 Difficulties in packet capturing
 Massive amount of data
• How much packet data is generated from a 100 Mbps network in an hour?
 Port speed × In&Out × Link utilization × sec/hour = traffic volume per hour
100 Mbps × 2 × 0.5 × 3600 = 360 Gbit
 Traffic volume / avg. packet length × bytes of packet data kept per packet = data size
360 Gbit / (1500 × 8) × 30 = 0.9 Gbyte ≈ 1 Gbyte
 Processing of high-speed packets
• Processing time for a 100 Mbps network
Port speed × In&Out × Link utilization / average packet length
= 100 Mbps × 2 × 0.5 / (1500 × 8) = 8333 packets/sec => 0.12 msec/packet
                                            100 Mbps    1 Gbps       1 Tbps
Data size per hour (assume 0.5 link util)   1 Gbyte     10 Gbyte     10 Tbyte
Processing time per packet                  0.12 msec   0.012 msec   0.012 μsec
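The capture arithmetic can be reproduced for any link speed. This sketch uses the slide's assumptions as defaults (both directions captured, 0.5 link utilization, 1500-byte average packets, 30 bytes of data kept per packet); the function and parameter names are illustrative.

```python
def capture_estimate(port_speed_bps: float,
                     utilization: float = 0.5,
                     avg_packet_bytes: int = 1500,
                     kept_bytes_per_packet: int = 30):
    """Return (stored bytes per hour, packets per second, seconds available per packet)."""
    bits_per_sec = port_speed_bps * 2 * utilization      # inbound + outbound
    packets_per_sec = bits_per_sec / (avg_packet_bytes * 8)
    stored_bytes_per_hour = packets_per_sec * 3600 * kept_bytes_per_packet
    return stored_bytes_per_hour, packets_per_sec, 1 / packets_per_sec


if __name__ == "__main__":
    for label, speed in [("100 Mbps", 100e6), ("1 Gbps", 1e9), ("1 Tbps", 1e12)]:
        stored, pps, sec_per_packet = capture_estimate(speed)
        print(f"{label}: {stored / 1e9:.1f} Gbyte/hour, "
              f"{pps:.0f} packets/sec, {sec_per_packet * 1e3:.6f} msec/packet")
```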
Sampling
 Why Do We Need Sampling?
 If the rate is too high to capture all packets reliably, there is no
alternative but to sample the packets
 Sampling algorithms: every Nth packet or fixed time interval
[Figure: (a) 2:1 sampling over a sequence of packets numbered 1 to 11; (b) 1 msec sampling over a timeline from 0 msec to 4 msec]
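Both sampling algorithms can be sketched in a few lines. The choices here are assumptions, since the slide does not specify them: count-based sampling keeps every Nth packet starting with the Nth, and time-based sampling keeps the first packet seen in each interval.

```python
def sample_every_nth(packets, n):
    """Count-based sampling: keep packets number N, 2N, 3N, ..."""
    return [p for i, p in enumerate(packets, start=1) if i % n == 0]


def sample_by_interval(timed_packets, interval_ms):
    """Time-based sampling: keep the first packet observed in each interval.

    `timed_packets` is a list of (timestamp_ms, packet) pairs sorted by time.
    """
    sampled = []
    last_bucket = None
    for timestamp_ms, packet in timed_packets:
        bucket = int(timestamp_ms // interval_ms)
        if bucket != last_bucket:
            sampled.append(packet)
            last_bucket = bucket
    return sampled


if __name__ == "__main__":
    print(sample_every_nth(list(range(1, 12)), 2))  # 2:1 sampling over packets 1..11
    arrivals = [(0.2, "a"), (0.5, "b"), (1.1, "c"), (1.9, "d"), (3.4, "e")]
    print(sample_by_interval(arrivals, 1))          # 1 msec sampling
```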
Flow Generation
 Flow
 A flow is a collection of packets with the same {SRC and DST IP
address, SRC and DST port number, protocol number}
 Flow data can be collected from routers directly, or from a standalone flow
generator with packet-capturing capability
 Popular flow formats
• NetFlow (Cisco), sFlow (sFlow.org), IPFIX (IETF)
 Issues in flow generation
• What information should be included in flow data?
• How to generate flow data from raw packet information efficiently?
• How to save bulk flow data into a DB or binary file in a collector?
• How long should the data be preserved?
[Figure: a packet stream grouped into flow 1, flow 2, flow 3, and flow 4]
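Flow generation from captured packets can be sketched as grouping by the 5-tuple. This is a minimal example, assuming each packet has already been parsed into a dictionary; the key names and the per-flow counters are illustrative, and flow timeouts are left out.

```python
def generate_flows(packets):
    """Group packets into flows keyed by (src IP, dst IP, src port, dst port, protocol)."""
    flows = {}
    for pkt in packets:
        key = (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"], pkt["dst_port"], pkt["proto"])
        flow = flows.setdefault(
            key, {"packets": 0, "bytes": 0, "start": pkt["ts"], "end": pkt["ts"]}
        )
        flow["packets"] += 1
        flow["bytes"] += pkt["length"]
        flow["start"] = min(flow["start"], pkt["ts"])
        flow["end"] = max(flow["end"], pkt["ts"])
    return flows


if __name__ == "__main__":
    # Hypothetical parsed packets; protocol 6 is TCP and 17 is UDP
    captured = [
        {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 40000, "dst_port": 80,
         "proto": 6, "length": 1500, "ts": 0.00},
        {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 40000, "dst_port": 80,
         "proto": 6, "length": 500, "ts": 0.25},
        {"src_ip": "10.0.0.3", "dst_ip": "10.0.0.2", "src_port": 5353, "dst_port": 53,
         "proto": 17, "length": 80, "ts": 0.10},
    ]
    for key, stats in generate_flows(captured).items():
        print(key, stats)
```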
Flow: NetFlow
 Cisco NetFlow
 An option configurable in Cisco routers that exports data on each IP
flow passed through an interface
 NetFlow Export Datagram
• Header: sequence number, record count, version number
• Followed by a series of flow records
 Flow format of Version 5
• From/To: Source IP Address, Destination IP Address
• Usage: Packet Count, Byte Count
• Time of Day: Start Timestamp, End Timestamp
• Application: Source TCP/UDP Port, Destination TCP/UDP Port
• Port Utilization: Input Interface Port, Output Interface Port
• QoS: Type of Service, TCP Flags, Protocol
• Routing and Peering: Next Hop Address, Source AS Number, Dest. AS Number, Source Prefix Mask, Dest. Prefix Mask
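A collector first reads the export datagram header to learn the version, record count, and sequence number. The sketch below assumes the commonly documented 24-byte NetFlow v5 header layout (version, count, system uptime, UNIX seconds, UNIX nanoseconds, flow sequence, engine type, engine ID, sampling interval); the layout should be checked against Cisco's documentation before use, and the class and function names are illustrative.

```python
import struct
from typing import NamedTuple

# Assumed v5 header layout, network byte order, 24 bytes in total
V5_HEADER = struct.Struct("!HHIIIIBBH")


class NetFlowV5Header(NamedTuple):
    version: int
    count: int              # number of flow records in this datagram
    sys_uptime_ms: int
    unix_secs: int
    unix_nsecs: int
    flow_sequence: int      # sequence number, useful for detecting lost exports
    engine_type: int
    engine_id: int
    sampling_interval: int


def parse_v5_header(datagram: bytes) -> NetFlowV5Header:
    """Parse the header at the start of a NetFlow v5 export datagram."""
    if len(datagram) < V5_HEADER.size:
        raise ValueError("datagram shorter than a v5 header")
    header = NetFlowV5Header(*V5_HEADER.unpack_from(datagram))
    if header.version != 5:
        raise ValueError(f"not a NetFlow v5 datagram (version {header.version})")
    return header


if __name__ == "__main__":
    # Synthetic datagram for illustration: a header followed by zeroed record bytes
    raw = V5_HEADER.pack(5, 2, 1000, 1301336100, 0, 42, 0, 0, 0) + bytes(96)
    print(parse_v5_header(raw))
```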
Flow: sFlow
 sFlow
 Described in RFC 3176: “InMon Corporation's sFlow: A Method for
Monitoring Traffic in Switched and Routed Networks”
 sFlow is a monitoring technology that gives visibility into the use of
networks, enabling performance optimization, accounting/billing for
usage, and defense against security threats
 sFlow samples packets using statistical sampling theory
 Format of Version 4
• Packet Header Data: Header Protocol (format of sampled header), Frame_length, Header bytes
• Packet IP v4 Data: Length, Protocol (IP Protocol Type), src_ip / dst_ip, src_port / dst_port, TCP flags, tos
Traffic Analysis Aspects
 Spatial Aspect
 The patterns of traffic flow relative to the network topology
 Important for proper network design and planning
 Identification of bottlenecks & avoidance of congestion
 Example: Flow aggregation by src/dst IP address or AS number
 Temporal Aspect
 The stochastic behavior of a traffic flow, described in statistical terms
 Important for resource management and traffic control
 Important for traffic shaping and caching policies
 Example: Packets or bytes per hour, day, week, month
 Composition of Traffic
 A breakdown of traffic according to the contents, application, packet
length, flow duration
 Helps to explain its temporal and spatial characteristics
 Example: game, streaming media traffic for a week from peer ISP
Traffic Classification/Identification (1/2)
 Traffic Classification
 Classifying traffic based on features passively observed in the traffic,
and according to specific classification goals
 Types of Traffic Classification
 Port-based approaches
• E.g., TCP port 20 and 21 -> FTP, TCP port 80 -> HTTP
 Payload-based approaches
• E.g., “0x13BitTorrent protocol” -> BitTorrent
 Machine Learning (ML)-based approaches
• Connection-related statistical information, including connection duration, interpacket arrival time, and packet
                Accuracy   Strength                       Weakness
Port-based      Low        Low computational cost         Low accuracy
Payload-based   High       Most accurate method           High computational cost; exhaustive signature generation
ML-based        High       Can handle encrypted traffic   High computational cost
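The port-based and payload-based approaches can be combined in a small classifier. This sketch assumes a tiny port table taken from the examples above and a single payload signature: the BitTorrent handshake, which starts with the length byte 19 (0x13) followed by the string "BitTorrent protocol". Real classifiers use far larger port and signature sets.

```python
# Port table limited to the examples on the slide
WELL_KNOWN_TCP_PORTS = {20: "FTP", 21: "FTP", 80: "HTTP"}

# BitTorrent handshake prefix: length byte 0x13 (19) + protocol string
BITTORRENT_SIGNATURE = b"\x13BitTorrent protocol"


def classify(dst_port: int, payload: bytes = b"") -> str:
    """Try the payload signature first, then fall back to the port table."""
    if payload.startswith(BITTORRENT_SIGNATURE):
        return "BitTorrent"
    return WELL_KNOWN_TCP_PORTS.get(dst_port, "Unknown")


if __name__ == "__main__":
    handshake = BITTORRENT_SIGNATURE + bytes(8)
    print(classify(80))                # port match
    print(classify(6881, handshake))   # payload match
    print(classify(80, handshake))     # payload overrides a misleading port
    print(classify(12345))             # no match
```

Checking the payload first reflects the table above: the port lookup is cheap but inaccurate, so it serves only as a fallback.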
Traffic Classification/Identification (2/2)
 From the Perspective of Network Layers
• Network Layer: IP, ARP, RARP, etc.
• Transport Layer: TCP, UDP, ICMP, etc.
• Application Layer: HTTP, HTTPS, SMTP, FTP, TELNET, SSH, POP, etc.
 Classification Level in Practice (Classification Output)
• Traffic clustering: Bulk transfer, small transaction, etc.
• Application-type breakdown: Web, game, P2P, messenger, streaming, mail, etc.
• Application protocol breakdown: HTTP, HTTPS, SMTP, FTP, TELNET, SSH, POP, etc.
• Application breakdown: BitTorrent, MSN, NateOn, Filezilla FTP, etc.
Q&A