
Project Schedule
Victor Gau, Yi-Hsien Wang, Trevor Bosaw,
and Jenq-Neng Hwang
2007.12.14
Current P2P Identification Products
Many companies have products to detect P2P traffic, usually to control
or limit the amount of bandwidth these applications consume.
Products can be:
• Port-based
• DPI-based
• Flow-based
Project: Build a Streaming Media Detector

To develop a streaming media (both unicast and P2P) traffic identifier and
controller that enables networks to identify which flows contain streaming
media.
As carriers upgrade networks, the ability to deliver Internet video is becoming
a key differentiator.
Enabling theoretical 20-100 Mbps connectivity to the home is only important if
consumer applications (Joost, Babelgum, Xunlei, ppstream, etc.) can be
delivered with high QoS.
State-of-the-art edge routers can support flow-based QoS.
By elevating the QoS of flows needed for streaming media, the consumer
experience can be enhanced.





Related work
P2P Optimized Traffic Control


Riad Hartani and Joe Neil, Caspian Networks
http://www.apricot.net/apricot2005/slides/KT8_2.pdf
Optimizing the Internet Quality of Service and Economics for the Digital
Generation


Lawrence Roberts, Anagran Inc.
http://www.itu.int/osg/spu/presentations/2006/worldtelecom2006/lroberts-itutelecom2006.pdf

Anagran flow router (FR-1000)
http://www.anagran.com/
Tasks
1. Classify network traffic



• Identify streaming media traffic on a flow-by-flow basis, using port, DPI, and flow information.
• Implement a control model to update and manage new signatures for both DPI and flows.
• Analyze the accuracy of these methods on a real-world network that heavily uses streaming media applications.
http://www.apricot.net/apricot2005/slides/KT8_2.pdf
Consideration 1
• Which one should be used in this project?
  • Port/signature-based identification
    • Accuracy of identifying known protocols
    • Ability to identify encrypted or new P2P protocols
  • Flow-based identification
    • Ability to identify encrypted or new P2P protocols
    • False positive rate (could annoy users)
• We will design our own algorithms.
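To make the trade-off above concrete, here is a minimal sketch of port/signature-based identification. The lookup tables and function names are hypothetical and for illustration only; they are not the project's design. The one protocol fact relied on is that a BitTorrent peer handshake begins with the length byte 19 followed by the string "BitTorrent protocol".

```python
from typing import Optional

# Hypothetical tables for illustration; a real system would load and
# update these from a managed signature store.
WELL_KNOWN_PORTS = {
    80: "HTTP",
    21: "FTP",
    23: "telnet",
    6881: "BitTorrent",  # traditional default port, easily changed by users
}

# The BitTorrent peer handshake starts with byte 19 (0x13) followed by
# the ASCII string "BitTorrent protocol".
PAYLOAD_SIGNATURES = {
    b"\x13BitTorrent protocol": "BitTorrent",
}


def identify_by_port(src_port: int, dst_port: int) -> Optional[str]:
    """Return a label if either port appears in the port table."""
    return WELL_KNOWN_PORTS.get(dst_port) or WELL_KNOWN_PORTS.get(src_port)


def identify_by_signature(payload: bytes) -> Optional[str]:
    """Return a label if the payload starts with a known signature."""
    for prefix, label in PAYLOAD_SIGNATURES.items():
        if payload.startswith(prefix):
            return label
    return None


def identify(src_port: int, dst_port: int, payload: bytes) -> Optional[str]:
    """Try the payload signature first, then fall back to the port table."""
    return identify_by_signature(payload) or identify_by_port(src_port, dst_port)
```

A known protocol on any port is caught by its signature, but an encrypted or unknown payload on a non-standard port returns None, which is the gap that flow-based identification is meant to cover.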
Consideration 2
• What performance metrics will be used in this project?
  • Accuracy of detection on a per-flow basis
    • Further deep dives on accuracy: per byte, per 5-tuple, per packet.
  • Average delay on a per-flow basis before identification is accomplished
  • Ease of updating patterns and flow behavior
  • False negatives: identification of background P2P file delivery as streaming media is very undesirable.
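As an illustration of the delay metric above, the following sketch averages the time between a flow's first packet and the moment it is identified. The input layout (a pair of timestamps per flow, with None for flows that were never identified) is an assumption made for this example.

```python
from statistics import mean
from typing import Iterable, Optional, Tuple


def average_identification_delay(
    flows: Iterable[Tuple[float, Optional[float]]]
) -> Optional[float]:
    """Average seconds from first packet to identification.

    Each item is (first_packet_time, identified_time). Flows that were
    never identified (identified_time is None) are skipped here; they
    would be counted separately as misses.
    """
    delays = [
        identified - first
        for first, identified in flows
        if identified is not None
    ]
    return mean(delays) if delays else None
```

For example, flows identified 0.5 s and 1.5 s after their first packets give an average delay of 1.0 s.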
Task 1
Classify Network Traffic
1.1 Capture Packets
• Construct a detailed list of current P2P and unicast streaming
methods and clients/trackers/servers, as well as the methods these
applications employ to defeat traffic shaping (for example, Azureus
supports encryption, proxying of control traffic, etc.).
• Capture packets generated by popular Internet applications.
• P2P File Sharing and Streaming




eMule, FastTrack, BitTorrent, Gnutella, …
Joost, Babelgum, ppstream, Xunlei,
HTTP, FTP, mail, streaming, game, telnet, …
Instant Messaging Services

Skype, MSN, Yahoo! Messenger
• Capture real-world packets.
Packet Information
• Src [IP, port]
• Dest [IP, port]
• Type (TCP or UDP)
• Size
• Arrival time
• Payload
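The fields above map naturally onto a per-packet record. The sketch below is one possible representation, with hypothetical names (Packet, flow_key, group_into_flows). It assumes a flow is the unidirectional 5-tuple of source IP, source port, destination IP, destination port, and protocol.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str            # "TCP" or "UDP"
    size: int             # bytes
    arrival_time: float   # seconds
    payload: bytes = b""


FlowKey = Tuple[str, int, str, int, str]


def flow_key(p: Packet) -> FlowKey:
    """Unidirectional 5-tuple (an assumption; flows could also be bidirectional)."""
    return (p.src_ip, p.src_port, p.dst_ip, p.dst_port, p.proto)


def group_into_flows(packets: Iterable[Packet]) -> Dict[FlowKey, List[Packet]]:
    """Group packets by 5-tuple, keeping each flow in arrival order."""
    flows: Dict[FlowKey, List[Packet]] = defaultdict(list)
    for p in sorted(packets, key=lambda pkt: pkt.arrival_time):
        flows[flow_key(p)].append(p)
    return dict(flows)
```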
1.2 Extract Characteristics
• Flow duration
• Packet counts of the flow
• Average packet rate of the flow
• Packet size distribution of the flow
• Total bytes of the flow
• Average transmission rate of the flow
…
1.3 Find Particular Flow Behavior
• Find the behavior observed by others.

T. Karagiannis et al.



F. G. Chou



Both TCP and UDP (Ratio)
Ratio of the number of distinct ports versus
number of distinct IPs
Packet size switching frequency per flow
Packet size standard deviation per flow
…
• Observe the behavior ourselves.
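The behaviors listed above can be expressed as simple statistics. The sketch below shows one possible reading of each. The exact definitions are assumptions made for illustration and are not taken from the cited authors: the TCP/UDP ratio is taken as the fraction of peers reached over both protocols, and switching frequency as the fraction of consecutive packets whose sizes differ.

```python
from statistics import pstdev
from typing import Dict, Iterable, Optional, Sequence, Set, Tuple


def tcp_udp_pair_ratio(connections: Iterable[Tuple[str, str]]) -> Optional[float]:
    """Fraction of peer IPs contacted over both TCP and UDP.

    connections: (peer_ip, proto) pairs for one host.
    """
    protos: Dict[str, Set[str]] = {}
    for ip, proto in connections:
        protos.setdefault(ip, set()).add(proto)
    if not protos:
        return None
    both = sum(1 for seen in protos.values() if {"TCP", "UDP"} <= seen)
    return both / len(protos)


def port_to_ip_ratio(endpoints: Iterable[Tuple[str, int]]) -> Optional[float]:
    """Number of distinct ports divided by number of distinct IPs."""
    pairs = list(endpoints)
    ips = {ip for ip, _ in pairs}
    ports = {port for _, port in pairs}
    return len(ports) / len(ips) if ips else None


def size_switching_frequency(sizes: Sequence[int]) -> float:
    """Fraction of consecutive packet pairs whose sizes differ."""
    if len(sizes) < 2:
        return 0.0
    switches = sum(1 for a, b in zip(sizes, sizes[1:]) if a != b)
    return switches / (len(sizes) - 1)


def size_std_dev(sizes: Sequence[int]) -> float:
    """Population standard deviation of packet sizes in a flow."""
    return pstdev(sizes) if sizes else 0.0
```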
1.4 Design Program
• Packet parser/analyzer
• Flow identifier based on a neural network
…
1.5 Test & Improve the Program
• Use the identifier on real-world data
• Optimize its performance


Detection rate
False positive rate


Per Flow
Per Byte
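The detection rate and false positive rate can be reported on both bases with the same bookkeeping, weighting each flow either by 1 (per flow) or by its byte count (per byte). The sketch below assumes each flow is given as (is_streaming, predicted_streaming, byte_count); the function name and the output layout are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def detection_metrics(
    flows: Iterable[Tuple[bool, bool, int]]
) -> Dict[str, Dict[str, Optional[float]]]:
    """Detection rate and false positive rate, per flow and per byte."""
    counts = {
        "flow": {"tp": 0, "fn": 0, "fp": 0, "tn": 0},
        "byte": {"tp": 0, "fn": 0, "fp": 0, "tn": 0},
    }
    for actual, predicted, nbytes in flows:
        if actual and predicted:
            cell = "tp"   # streaming flow, correctly detected
        elif actual:
            cell = "fn"   # streaming flow, missed
        elif predicted:
            cell = "fp"   # non-streaming flow flagged as streaming
        else:
            cell = "tn"   # non-streaming flow, correctly ignored
        counts["flow"][cell] += 1
        counts["byte"][cell] += nbytes

    result: Dict[str, Dict[str, Optional[float]]] = {}
    for basis, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        result[basis] = {
            "detection_rate": c["tp"] / positives if positives else None,
            "false_positive_rate": c["fp"] / negatives if negatives else None,
        }
    return result
```

The two bases can diverge sharply: missing one large streaming flow barely changes the per-flow detection rate but can dominate the per-byte figure.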