Transcript ppt - MMLab

Googling the Internet
Unconstrained Endpoint Profiling
Ionut Trestian, Supranamaya Ranjan,
Aleksandar Kuzmanovic, Antonio Nucci
Reviewed by Lee Young Soo
Introduction
To understand what people are doing on the Internet, analyze operational network traces.
• Obtaining 'raw' packet traces from operational networks can be very hard.
• Accurately classifying traffic online at high speeds is an inherently hard problem.
Unconstrained Endpoint Profiling
• Introduction of a novel methodology that applies when:
  - No operational traces are available
  - Packet-level traces are available
  - Sampled flow-level traces are available
• Internet access trend analysis for four world regions.
Methodology
• Rule Generation
  - Query Google using a sample 'seed set' of random IP addresses from the networks in four world regions.
  - Keep only the top N keywords that can be meaningfully used for endpoint classification.
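The rule-generation step can be sketched as a keyword frequency count over the search hits of the seed set. This is only a minimal illustration: the hit texts, the stop-word list, and the function name `top_keywords` are assumptions made up for the example, and no real search API is called.

```python
import re
from collections import Counter

# Hypothetical hit texts for a seed set of IP addresses (illustrative data only).
SEED_HITS = {
    "192.0.2.10": ["Counter-Strike game server list", "game server status page"],
    "192.0.2.20": ["open proxy list updated daily", "free proxy server list"],
    "192.0.2.30": ["mail server blacklist report"],
}

# Assumed stop words that carry no value for classification.
STOP_WORDS = {"the", "a", "of", "and", "list", "page"}


def top_keywords(hits_by_ip, n):
    """Return the n most frequent non-stop-word keywords across all hit texts."""
    counts = Counter()
    for hit_texts in hits_by_ip.values():
        for text in hit_texts:
            for word in re.findall(r"[a-z]+", text.lower()):
                if word not in STOP_WORDS:
                    counts[word] += 1
    return [word for word, _ in counts.most_common(n)]


if __name__ == "__main__":
    print(top_keywords(SEED_HITS, 3))
```

In practice the candidate keywords would still need manual review, since frequency alone does not say whether a keyword is meaningful for classification.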
Methodology
• Web Classifier
  - Rapid URL search
  - Hit text search
• Example URL: www.robtex.com/dns/32.net.ru.html
Methodology
• IP tagging
  - URL-based tagging
  - General hit-text-based tagging
  - Hit-text-based tagging for forums
    - A post date and username in the vicinity of the IP address => forum user
    - Presence of the following keywords: http:\, ftp:\, ppstream:\, mms:\ => http share, ftp share, streaming node
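The tagging rules above might be sketched roughly as follows. The keyword table, the regular expressions for dates and usernames, the 60-character "vicinity" window, and the mapping of ppstream:\ and mms:\ to "streaming node" are all assumptions for illustration; the slides do not give the concrete rule set.

```python
import re

# Assumed URL keyword -> tag rules (hypothetical; not taken from the slides).
URL_RULES = {"forum": "forum", "proxy": "proxy", "blacklist": "malicious"}

# Assumed mapping from the keywords on the slide to the tags on the slide.
SHARE_PREFIXES = {
    "http:\\": "http share",
    "ftp:\\": "ftp share",
    "ppstream:\\": "streaming node",
    "mms:\\": "streaming node",
}

# Hypothetical patterns for a post date and a username.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
USER_RE = re.compile(r"\b(?:posted by|user(?:name)?:)\s*\w+", re.IGNORECASE)


def tag_hit(ip, url, hit_text, window=60):
    """Return the set of tags derived from one search hit for the given IP."""
    tags = set()

    # URL-based tagging: a known keyword appears in the URL.
    for keyword, tag in URL_RULES.items():
        if keyword in url.lower():
            tags.add(tag)

    # Forum tagging: post date and username close to the IP in the hit text.
    pos = hit_text.find(ip)
    if pos != -1:
        nearby = hit_text[max(0, pos - window): pos + len(ip) + window]
        if DATE_RE.search(nearby) and USER_RE.search(nearby):
            tags.add("forum user")

    # Share and streaming tagging: protocol keywords present in the hit text.
    lowered = hit_text.lower()
    for prefix, tag in SHARE_PREFIXES.items():
        if prefix in lowered:
            tags.add(tag)

    return tags


if __name__ == "__main__":
    print(tag_hit("200.101.18.182", "inforum.insite.com", ""))
```

A real classifier would need far more robust patterns; the point here is only the order of checks: URL first, then the text surrounding the IP address.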
Methodology
• Examples
  - 200.101.18.182-inforum.insite.com: URL-based tagging
  - 61.172.249.13-ttzai.com: hit-text-based tagging for forums
Information comes from
• Web logs
• Proxy logs
• Forums
• Malicious lists
• Server lists
• P2P communication
Evaluation
• When no traces are available.
• When packet-level traces are available.
• When sampled traces are available.
When No Traces are Available
• Apply the unconstrained endpoint approach to a subset of the IP ranges belonging to the four ISPs shown in the table above.
When No Traces are Available
Correlation with operational traces.
Correlation with other sources.
The unconstrained endpoint profiling approach can be effectively used to estimate application popularity trends.
When Packet-Level Traces are Available
BLINC
• Off-line tool
• Cannot classify, particularly at the application level
• Variable-quality results for different traces
UEP
• Superior classification results
• Operates efficiently online
When Packet-Level Traces are Available
• Collect the most popular 5% of IP addresses and tag them by applying the methodology.
• Use this information to classify the traffic flows.
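The two steps above could look roughly like this in code. It is a sketch under assumptions: a flow is reduced to a (source IP, destination IP) pair, popularity is measured by the number of flows an IP takes part in, and the tags come from a lookup table that would be filled by the tagging methodology. The function names are made up for the example.

```python
import math
from collections import Counter


def popular_endpoints(flows, fraction=0.05):
    """Return the most popular fraction of IPs, ranked by number of flows."""
    counts = Counter()
    for src, dst in flows:
        counts[src] += 1
        counts[dst] += 1
    k = max(1, math.ceil(len(counts) * fraction))
    return [ip for ip, _ in counts.most_common(k)]


def classify_flows(flows, tags_by_ip):
    """Label each flow with the tag of a tagged endpoint, or 'unknown'."""
    labels = []
    for src, dst in flows:
        labels.append(tags_by_ip.get(dst) or tags_by_ip.get(src) or "unknown")
    return labels


if __name__ == "__main__":
    # Illustrative flows only: ten clients contacting one server.
    demo_flows = [("10.0.0.%d" % i, "192.0.2.80") for i in range(1, 11)]
    top = popular_endpoints(demo_flows)
    # In the real workflow these tags would come from the web-based tagging.
    demo_tags = {ip: "web server" for ip in top}
    print(classify_flows(demo_flows, demo_tags))
```

Because popular endpoints take part in many flows, tagging only a small fraction of IPs can still label a large share of the traffic.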
When Sampled Traces are Available
• Due to sampling, an insufficient amount of data remains in the trace, and hence the graphlets approach simply does not work.
• Popular endpoints are still present in the trace despite sampling.
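A small calculation illustrates why popular endpoints survive sampling. It assumes independent random packet sampling with a fixed rate, which is a simplification and not necessarily the sampling scheme of the traces in the talk: an endpoint seen in k packets then appears in the sampled trace with probability 1 - (1 - rate)^k.

```python
def survival_probability(packets, sampling_rate):
    """Probability that at least one of `packets` packets is sampled,
    assuming each packet is kept independently with `sampling_rate`."""
    return 1.0 - (1.0 - sampling_rate) ** packets


if __name__ == "__main__":
    # Compare a rarely seen endpoint with increasingly popular ones at 1% sampling.
    for k in (1, 10, 100, 1000):
        print(k, survival_probability(k, 0.01))
```

Under this assumption the probability grows quickly with the packet count, so an endpoint with many packets is almost certain to remain visible while one with a single packet almost never is.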
When Sampled Traces are Available
• The endpoint approach remains largely unaffected by sampling.
Endpoint Profiling
• Endpoint Clustering
  - Clustering has been employed in networking before: the Autoclass algorithm.
  - A set of tagged IP addresses from a region's network is the input to the endpoint clustering algorithm.
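To make the input and output of the clustering step concrete, here is a deliberately simple stand-in. It is not Autoclass (which is a Bayesian clustering algorithm); it only groups endpoints that carry exactly the same set of tags. The function name and sample tags are assumptions for the example.

```python
from collections import defaultdict


def group_by_tag_profile(tags_by_ip):
    """Group IPs whose tag sets are identical (a naive stand-in for clustering)."""
    clusters = defaultdict(list)
    for ip, tags in tags_by_ip.items():
        clusters[frozenset(tags)].append(ip)
    return dict(clusters)


if __name__ == "__main__":
    # Illustrative tagged endpoints only.
    tagged = {
        "192.0.2.1": {"browsing"},
        "192.0.2.2": {"browsing", "chat"},
        "192.0.2.3": {"browsing"},
    }
    for profile, ips in group_by_tag_profile(tagged).items():
        print(sorted(profile), ips)
```

A probabilistic method can additionally merge endpoints with similar but not identical tag sets, which exact grouping cannot do.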
Endpoint Profiling
• Browsing, alone or along with chat or mail, seems to be the most common behavior.
Endpoint Profiling
• Traffic Locality
Conclusion
• UEP
  - Accurately predicts application and protocol usage trends when no network traces are available.
  - Dramatically outperforms BLINC when packet-level traces are available.
  - Retains high classification capabilities when flow-level traces are available.
• Profiles endpoints residing in four different world regions
  - Network applications and protocols used in these regions.
  - Characteristics of endpoint classes that share similar access patterns.
  - Clients' locality properties.