MMLab
Googling the Internet
Unconstrained Endpoint Profiling
Ionut Trestian, Supranamaya Ranjan,
Aleksandar Kuzmanovic, Antonio Nucci
Reviewed by Lee Young Soo
Introduction
To understand what people are doing on the Internet, analyze operational network traces.
Obtaining ‘raw’ packet traces from operational networks can be very hard.
Accurately classifying traffic online at high speeds is an inherently hard problem.
Unconstrained Endpoint Profiling
Introduces a novel methodology that works when:
No operational traces are available
Packet-level traces are available
Sampled flow-level traces are available
Analyzes Internet access trends in four world regions.
Methodology
Rule Generation
Query Google using a sample ‘seed set’ of random IP addresses from networks in four world regions.
Select the top N keywords that could be meaningfully used for endpoint classification.
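The keyword-ranking step can be sketched as follows, assuming the hit texts returned for the seed-set queries are already available as strings. The slide does not say how words are filtered or counted. Here a small hypothetical stop-word list is used, each word counts at most once per hit, and deciding which of the top N are "meaningful" is left to manual inspection.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual filtering is not given on the slide.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "on", "is"}

def top_keywords(hit_texts, n):
    """Rank words across the hit texts returned for seed-set IP queries.

    Each word is counted at most once per hit text (document frequency),
    so a single verbose page cannot dominate the ranking.
    """
    counts = Counter()
    for text in hit_texts:
        # Words of two or more characters starting with a letter; skips bare IPs and ports.
        words = set(re.findall(r"[a-z][a-z0-9]+", text.lower()))
        counts.update(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]
```

The returned list is only a candidate set; a human would still judge which keywords actually indicate an endpoint class.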
Methodology
Web Classifier
Rapid URL search
Hit text search
Example URL: www.robtex.com/dns/32.net.ru.html
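A minimal sketch of the two-stage web classifier, assuming a hypothetical keyword-to-class rule table (not the paper's actual rules). The cheap check on the URL string runs first, and the hit text is searched only when the URL gives no answer.

```python
# Hypothetical rule table: keyword -> website class. Illustrative only.
RULES = {
    "forum": "forum",
    "proxy": "proxy list",
    "blacklist": "malicious list",
    "spam": "malicious list",
    "server": "server list",
}

def classify_hit(url, hit_text):
    """Classify one search hit, trying the cheap URL check first."""
    url_l = url.lower()
    # Rapid URL search: the URL alone often reveals the site type.
    for keyword, label in RULES.items():
        if keyword in url_l:
            return label
    # Hit text search: fall back to the text snippet returned with the hit.
    text_l = hit_text.lower()
    for keyword, label in RULES.items():
        if keyword in text_l:
            return label
    return None  # unclassified
```

Ordering the checks this way avoids scanning snippet text for the many hits whose URL already identifies the site.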
Methodology
IP tagging
URL based tagging
General hit text based tagging
Hit text based tagging for Forums
Post date and username appear near the IP address
=> forum user
Presence of the following keywords: http://, ftp://, ppstream://, mms://
=> HTTP share, FTP share, streaming node
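The hit-text tagging rules above can be sketched with regular expressions. Assumptions: a URL scheme counts only when it is attached directly to the IP (e.g. `ftp://1.2.3.4/`), and the date and username patterns (`DATE_RE`, `USER_RE`) and the 100-character `window` are hypothetical stand-ins for "in the vicinity".

```python
import re

# Scheme prefix -> tag, following the slide's keyword list.
SCHEME_TAGS = {
    "http": "http share",
    "ftp": "ftp share",
    "ppstream": "streaming node",
    "mms": "streaming node",
}
# Hypothetical patterns for a forum post's date and username.
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
USER_RE = re.compile(r"(?:posted by|user(?:name)?)[:\s]+\w+", re.IGNORECASE)

def tag_ip(ip, hit_text, window=100):
    """Return the set of tags for `ip` based on the hit text around it."""
    tags = set()
    # Boundaries stop "1.2.3.4" from matching inside "11.2.3.45".
    pattern = r"(?<![\d.])" + re.escape(ip) + r"(?!\d|\.\d)"
    for m in re.finditer(pattern, hit_text):
        # Scheme attached directly to the IP, e.g. "ftp://1.2.3.4".
        prefix = hit_text[:m.start()]
        for scheme, tag in SCHEME_TAGS.items():
            if prefix.endswith(scheme + "://"):
                tags.add(tag)
        # A post date and a username near the IP suggest a forum user.
        nearby = hit_text[max(0, m.start() - window):m.end() + window]
        if DATE_RE.search(nearby) and USER_RE.search(nearby):
            tags.add("forum user")
    return tags
```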
Methodology
Examples
200.101.18.182 - inforum.insite.com
URL-based tagging
61.172.249.13 - ttzai.com
Hit text based tagging for forums
Information comes from:
Web logs
Proxy logs
Forums
Malicious list
Server list
P2P communication
Evaluation
When No Traces are Available.
When Packet-Level Traces are Available.
When Sampled Traces are Available.
When No Traces are Available
Apply the unconstrained endpoint approach to a subset of the IP ranges belonging to the four ISPs shown in the table above.
When No Traces are Available
Correlation with operational traces.
Correlation with other sources.
The unconstrained endpoint profiling approach can be used effectively to estimate application popularity trends.
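The slides do not name the correlation measure. As one plausible choice, the Pearson coefficient below compares two equal-length lists of per-application popularity values, for example UEP estimates versus shares measured in an operational trace. Both inputs and the choice of Pearson are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)
```

A value near 1 would mean the endpoint-based estimates rank applications the same way the trace does.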
When Packet-Level Traces are Available
BLINC
Off-line tool
Cannot classify some traffic, particularly at the application level
Result quality varies across traces
UEP
Superior classification results
Operates efficiently online
When Packet-Level Traces are Available
Collect the most popular 5% of IP addresses in the trace and tag them by applying the methodology.
Use these tags to classify the traffic flows.
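A sketch of these two steps, assuming flows are `(src_ip, dst_ip)` pairs and popularity means the number of flows an address appears in. The tagging itself (the search-based methodology) is represented by a ready-made `endpoint_tags` dictionary. The rule that a flow inherits its destination's tag, falling back to its source's, is an illustrative assumption.

```python
from collections import Counter

def popular_endpoints(flows, fraction=0.05):
    """Return the most popular `fraction` of IPs, ranked by flow count."""
    counts = Counter()
    for src, dst in flows:
        counts[src] += 1
        counts[dst] += 1
    k = max(1, int(len(counts) * fraction))
    return [ip for ip, _ in counts.most_common(k)]

def classify_flows(flows, endpoint_tags):
    """Label each flow with the tag of a profiled endpoint, if any."""
    labels = []
    for src, dst in flows:
        # Prefer the destination (often the server), then the source.
        labels.append(endpoint_tags.get(dst) or endpoint_tags.get(src))
    return labels  # None marks an unclassified flow
```

Only the popular addresses need to be searched and tagged; classifying a flow afterwards is a dictionary lookup, which is what makes online operation cheap.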
When Sampled Traces are Available
Due to sampling, too little data remains in the trace, so the graphlet approach (BLINC) simply does not work.
Popular endpoints are still present in the trace despite sampling.
When Sampled Traces are Available
The endpoint approach remains largely unaffected by sampling.
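Why popular endpoints survive sampling can be shown with a toy simulation. It uses independent per-packet (Bernoulli) sampling as a simple stand-in for sampled NetFlow, and a synthetic trace with placeholder addresses: one endpoint with 10,000 packets and 1,000 endpoints with one packet each.

```python
import random
from collections import Counter

def sample_packets(packets, rate, seed=0):
    """Keep each packet independently with probability `rate`."""
    rng = random.Random(seed)
    return [p for p in packets if rng.random() < rate]

# Synthetic trace (placeholder addresses): one popular endpoint, many rare ones.
popular = "1.1.1.1"
rare = ["10.0.%d.%d" % (i // 256, i % 256) for i in range(1000)]
packets = [popular] * 10_000 + rare

sampled = Counter(sample_packets(packets, rate=0.01))
rare_kept = sum(1 for ip in rare if ip in sampled)
print(popular in sampled, rare_kept)
```

At a 1% rate the popular endpoint keeps about 100 packets, while only around 1% of the one-packet endpoints appear at all. Per-host behavioral signatures lose most of their input, but a lookup keyed on popular endpoints still finds its targets.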
Endpoint Profiling
Endpoint Clustering
Clustering has been employed in networking before, e.g., the AutoClass algorithm.
Input to the endpoint clustering algorithm: a set of tagged IP addresses from a region’s network.
Endpoint Profiling
Browsing alone, and browsing combined with chat or mail, seem to be the most common behaviors.
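AutoClass is a Bayesian mixture-model clusterer and is not reproduced here. As a much simpler stand-in, the sketch groups endpoints whose tag sets are identical and ranks the groups by size. The input format (IP mapped to a set of tags such as "browsing", "chat", "mail") is an assumption for illustration.

```python
from collections import Counter

def cluster_by_profile(endpoint_tags):
    """Group endpoints with identical tag sets; return (profile, size), largest first."""
    profiles = Counter(frozenset(tags) for tags in endpoint_tags.values() if tags)
    return [(sorted(profile), size) for profile, size in profiles.most_common()]
```

Exact-match grouping cannot merge near-identical profiles the way a probabilistic model can, but it is enough to see which behavior combinations dominate a region.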
Endpoint Profiling
Traffic Locality
Conclusion
UEP
Accurately predicts application and protocol usage trends when no network traces are available.
Dramatically outperforms BLINC when packet traces are available.
Retains high classification capability when sampled flow-level traces are available.
Profiles endpoints residing in four different world regions, revealing:
Network applications and protocols used in these regions.
Characteristics of endpoint classes that share similar access patterns.
Clients’ locality properties.