Characterizing and detecting relayed traffic: A case study using Skype

Kyoungwon Suh, Daniel R. Figueiredo,
Jim Kurose, Don Towsley
University of Massachusetts
to appear in INFOCOM 2006
Presented by Ming-Tsang
Outline
• Introduction
• Problem Definition
• Characterization of Skype-Relayed Traffic
– Experiment I: two controlled Skype nodes
– Experiment II: a controlled relay node
• Detection of Skype-Relayed Traffic
• Conclusion
Relay Node
• Acts as a bridge between remote nodes
• Advantages
– Traversing firewalls or NATs
– Avoiding congested or faulty paths
• Disadvantages
– User: slower communication, financial costs
– ISP: a large amount of relayed traffic
Multimedia Relayed Traffic
• No previous work has investigated multimedia traffic relays
• ISPs are likely interested in the question, “is the traffic relayed through our network?”
A burst of packets
• A contiguous piece of a flow
– With a minimum average data rate (at least 10 kbps)
– With a minimum duration (at least 30 s)
• Exponentially Weighted Moving Average
$A_i = (1 - \alpha) A_{i-1} + \alpha I_i$
– $I_i$: the average data rate of the i-th second of the flow
– A burst starts if $A_i > R_1$ ($\alpha = 0.75$)
– A burst ends if $A_i < R_2$ ($\alpha = 0.15$)
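The burst definition above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the slide gives no values for R1 and R2, so they are caller-supplied parameters, and the code assumes the α = 0.75 smoothing is used while looking for a burst start and α = 0.15 while inside a burst. The function name and the per-second input list are hypothetical.

```python
def detect_bursts(rates_kbps, r1, r2, alpha_start=0.75, alpha_end=0.15,
                  min_duration_s=30, min_avg_rate_kbps=10.0):
    """Return (start, end) second indices (end exclusive) of bursts.

    rates_kbps[i] is the average data rate of the i-th second of a flow.
    r1 / r2 are the start / end thresholds (values not given on the slide).
    """
    bursts = []
    avg = 0.0          # EWMA value A_i
    start = None       # index where the current burst began, if any
    for i, rate in enumerate(rates_kbps):
        # Assumption: fast smoothing outside a burst, slow smoothing inside.
        alpha = alpha_start if start is None else alpha_end
        avg = (1 - alpha) * avg + alpha * rate
        if start is None and avg > r1:
            start = i
        elif start is not None and avg < r2:
            bursts.append((start, i))
            start = None
    if start is not None:              # flow ended while still in a burst
        bursts.append((start, len(rates_kbps)))

    # Keep only bursts that meet the minimum duration and average rate.
    kept = []
    for s, e in bursts:
        duration = e - s
        mean_rate = sum(rates_kbps[s:e]) / duration
        if duration >= min_duration_s and mean_rate >= min_avg_rate_kbps:
            kept.append((s, e))
    return kept
```

With the slow end-of-burst smoothing, short silences inside a call do not immediately terminate the burst, which seems to be the intent of using a smaller α for the end condition.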
Problem Definition
• Two bursts are candidates for carrying relayed traffic if they have
– Opposite directions
– The same end-host within the network being monitored
– Different end-hosts outside the monitored network
• The problem of relay detection is to determine whether two such candidate bursts are in fact carrying relayed traffic
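The three conditions above translate directly into a filter over pairs of bursts. The following is an illustrative sketch only: the `Burst` record, its field names, and the `is_inside` predicate (which decides whether an address belongs to the monitored network) are assumptions introduced for the example.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Burst:
    src: str       # source IP address
    dst: str       # destination IP address
    start: float   # start time in seconds
    end: float     # end time in seconds
    nbytes: int    # bytes carried by the burst


def candidate_pairs(bursts, is_inside):
    """Return (incoming, outgoing) burst pairs that could be relayed traffic."""
    pairs = []
    for a, b in combinations(bursts, 2):
        for incoming, outgoing in ((a, b), (b, a)):
            same_inside_host = (is_inside(incoming.dst)
                                and incoming.dst == outgoing.src)
            outside_ends = (not is_inside(incoming.src)
                            and not is_inside(outgoing.dst))
            different_outside = incoming.src != outgoing.dst
            if same_inside_host and outside_ends and different_outside:
                pairs.append((incoming, outgoing))
    return pairs
```

The pairwise loop is quadratic in the number of bursts; grouping bursts by inside host first would be a natural refinement for a long trace.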
Statistical Metrics
• $S_{i,j}$ ($E_{i,j}$): the difference between the start (end) times of bursts i and j
– Should be very small, as packets cannot be stored at the relay node for a long time
• $B_{i,j}$: the ratio between the number of bytes carried by bursts i and j
– Should be close to one, since Skype does not perform complex transformations of the relayed traffic
• $X_{i,j}$: the maximum cross-correlation between time series $Y_i$ and $Y_j$
– Should be very high
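As a rough illustration, the four metrics can be computed as below. Several details are assumptions, since the slide does not spell them out: the byte ratio is taken as larger over smaller (so it is always at least 1), the time series are per-interval byte counts, and the maximum cross-correlation is the largest Pearson correlation over a small range of lags (`max_lag` is a hypothetical parameter).

```python
import math


def _pearson(x, y):
    """Pearson correlation of two sequences, truncated to a common length."""
    n = min(len(x), len(y))
    if n < 2:
        return 0.0
    x, y = x[:n], y[:n]
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def max_cross_correlation(y_i, y_j, max_lag=5):
    """Largest correlation between y_i and y_j over lags in [-max_lag, max_lag]."""
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            r = _pearson(y_i, y_j[lag:])    # y_j delayed relative to y_i
        else:
            r = _pearson(y_i[-lag:], y_j)   # y_i delayed relative to y_j
        best = max(best, r)
    return best


def relay_metrics(start_i, end_i, bytes_i, y_i, start_j, end_j, bytes_j, y_j):
    """Return the metrics S, E, B, X for a pair of bursts i and j."""
    return {
        "S": abs(start_i - start_j),
        "E": abs(end_i - end_j),
        "B": max(bytes_i, bytes_j) / min(bytes_i, bytes_j),  # assumed >= 1
        "X": max_cross_correlation(y_i, y_j),
    }
```

Searching over lags matters because the outgoing burst is a slightly delayed copy of the incoming one; correlating at lag zero only would understate the similarity.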
Experiment I
Start Time Difference
• Left: All are less than 5s
• Right: Little effect when introducing packet loss
End Time Difference
• Left: All are less than 5s
• Right: longer, especially in TCP_in_TCP_out,
due to packet retransmissions and the loss of
TCP FIN
Maximum Cross-Correlation
(Figure annotation: many packets are retransmitted in the 1st burst, but not in the 2nd)
• UDP_in_UDP_out yields the best value, because it is not shaped by congestion or flow control
• Regardless of transport protocol combination and packet loss rate, 95% of all Skype-relayed traffic has a value of at least 0.37
Burst Size Ratio
• Left: very close to 1
• Right: the largest ratio occurs in TCP_in_TCP_out, as many packets are retransmitted
Experiment II
Start (End) Time Difference
• Both differences are small for the vast majority of relays
• Consistent with the results of the previous
experiment when no packet loss is introduced
Maximum Cross-Correlation
• Similar to the distributions of UDP_in_UDP_out
and UDP_in_TCP_out
• Because 96% of relays are UDP_in_UDP_out
and UDP_in_TCP_out types
Burst Size Ratio
• 99% of ratios lie below 1.15, indicating that the vast majority of bursts in Skype have very similar sizes
Heuristic for Skype Traffic Identification
• Identify the IP addresses of all hosts that have executed Skype
– Inspect the payload of packets sent to the well-known server during the version check
• Determine the port number used by each Skype host to send/receive traffic
– Count how many times a given source port number is
used right after a Skype version message occurs
– If the same port number is used many times to
different hosts within the next few packets, it is the
Skype port
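The port-counting step might look roughly like the sketch below. It is a simplified illustration, not the heuristic as implemented in the paper: the packet tuple layout, the `is_version_msg` flag (standing in for the payload inspection), and the `window` and `min_hosts` parameters are all assumptions, since the slide only says "the next few packets" and "many times".

```python
from collections import defaultdict


def guess_skype_ports(packets, window=10, min_hosts=3):
    """Guess the Skype port of each host that sent a version message.

    packets: iterable of (src_ip, src_port, dst_ip, is_version_msg) in
    capture order. Returns {host_ip: port}.
    """
    dests = defaultdict(set)   # (host, src_port) -> distinct destinations
    remaining = {}             # host -> packets still to inspect
    for src_ip, src_port, dst_ip, is_version_msg in packets:
        if is_version_msg:
            remaining[src_ip] = window   # start (or restart) the window
            continue
        if remaining.get(src_ip, 0) > 0:
            remaining[src_ip] -= 1
            dests[(src_ip, src_port)].add(dst_ip)

    best = {}                  # host -> (port, number of distinct hosts)
    for (host, port), hosts in dests.items():
        if len(hosts) >= min_hosts and len(hosts) > best.get(host, (None, 0))[1]:
            best[host] = (port, len(hosts))
    return {host: port for host, (port, _) in best.items()}
```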
Detection Scenario & Benchmark
• After applying the heuristic, remove non-voice flows that have no
bursts or have a very low bit rate
• Pairs of flows that share a common host within the network and whose start and end time differences are both smaller than 30 s are treated as true Skype-relayed traffic
• 17-hour packet trace: 381 true Skype relays among 12,193 possible relayed bursts
True Positive/Negative Ratios
• True positive = (number of true Skype relays classified as Skype relays) / (number of true Skype relays)
• False positive = (1 – true negative) = (number of false Skype relays classified as Skype relays) / (number of false Skype relays)
• The higher the true positive and true negative ratios, the better the performance
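The two ratios defined above can be computed from ground-truth labels and classifier outputs as in this small sketch. The function name and the boolean-list inputs are illustrative choices, not taken from the paper.

```python
def detection_ratios(is_true_relay, predicted_relay):
    """Return (true_positive_ratio, true_negative_ratio).

    is_true_relay[k]   -- ground truth for the k-th candidate pair
    predicted_relay[k] -- classifier decision for the k-th candidate pair
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(is_true_relay, predicted_relay):
        if truth and pred:
            tp += 1
        elif truth and not pred:
            fn += 1
        elif not truth and not pred:
            tn += 1
        else:
            fp += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0   # equals 1 - false positive ratio
    return tpr, tnr
```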
Start (End) Time Difference
• By increasing the threshold, true positive ratio↑,
but true negative ratio↓
• At 1s, both ratios are larger than 0.9
Burst Size Ratio
Maximum Cross-Correlation
• When the threshold is set to 0.55, the metric yields 0.92 for both ratios
• Among the individual metrics, maximum cross-correlation provides the best criterion for detection
Multiple Metrics
• A trade-off between true positive and true
negative ratios
• To find the thresholds for multiple metrics,
perform a brute-force search over the parameter
space
• 0.96 for both ratios
– 11 s for start time difference
– 13 s for end time difference
– 1.33 for burst size ratio
– 0.38 for maximum cross-correlation
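A brute-force search of this kind can be sketched as follows. The decision rule (a pair is a relay when S, E and B are at or below their thresholds and X is at or above its threshold) and the objective (maximize the smaller of the two ratios) are assumptions made for the example; the slide does not state either, and the grids are supplied by the caller.

```python
from itertools import product


def classify(m, s_max, e_max, b_max, x_min):
    """Assumed rule: all four metrics must pass their thresholds."""
    return (m["S"] <= s_max and m["E"] <= e_max
            and m["B"] <= b_max and m["X"] >= x_min)


def brute_force_thresholds(samples, labels, s_grid, e_grid, b_grid, x_grid):
    """Try every threshold combination; return (best_thresholds, best_score).

    samples: list of dicts with keys "S", "E", "B", "X"
    labels:  list of booleans, True for a true relay
    Score is min(true positive ratio, true negative ratio).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_score, best_thresholds = -1.0, None
    for thresholds in product(s_grid, e_grid, b_grid, x_grid):
        tp = tn = 0
        for m, is_relay in zip(samples, labels):
            pred = classify(m, *thresholds)
            if is_relay and pred:
                tp += 1
            elif not is_relay and not pred:
                tn += 1
        tpr = tp / n_pos if n_pos else 0.0
        tnr = tn / n_neg if n_neg else 0.0
        score = min(tpr, tnr)
        if score > best_score:      # ties keep the first combination found
            best_score, best_thresholds = score, thresholds
    return best_thresholds, best_score
```

The cost grows with the product of the four grid sizes, so coarse grids are a sensible starting point before refining around the best combination.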
Conclusion
• Propose several metrics to characterize
relayed traffic
• With those metrics, propose a
methodology for detection of Skype-relayed traffic based on thresholds
• The approach relies solely on flow-level
traffic characterizations, rather than on
application- or protocol-specific
information
Ming-Tsang’s Thoughts
• They take full advantage of the properties of voice calls to define the metrics
• However, are the metrics suitable for other multimedia applications?
• Although the use of relay nodes brings advantages to network application developers, people are starting to consider what problems it may introduce.