CMU Active Learning Talk (Transcript)


Network Mapping and
Anomaly Detection
Athina Markopoulou (Irvine)
Robert Calderbank (Princeton)
Rob Nowak (Madison)
MURI Kickoff Meeting, September 19, 2009
Outline
Challenges
- Applications
- Mathematics
Preliminary Results
- Detecting Malicious Traffic Sources
(Athina Markopoulou)
- Network Topology Identification
- Network-wide Anomaly Detection
Research Directions
Application Challenges
Network Mapping: Infer network topology/connectivity
from minimal measurements
Detecting Topology Changes: Quickly sense changes
in routing or connectivity
Network-wide Anomalies: Detect weak and distributed
patterns of anomalous network activity
Predicting Malicious Traffic: Identify network sources
that are likely to launch future attacks
Mathematical Challenges
Vastly Incomplete Data: Impossible to monitor a network
everywhere and all the time. Where and when should we
measure?
Large-scale Inference: Inference of high-dimensional
signals/graphs from noisy and incomplete data. Robust
statistical data analysis and scalable algorithms are crucial.
Network Representations: Statistical analysis matched to
network structures. Can network data be ‘sparsified’ using
new representations and transformations?
Network Prediction Models: New ‘network-centric’ statistical
methods are needed to cluster network nodes for robust
prediction from limited datasets.
Predicting Malicious Traffic Sources
Predictive Blacklisting
as an Implicit Recommendation System
• Problem: predict sources of malicious traffic on the Internet
– Blacklists:
• lists of worst offenders (source IP addresses or prefixes)
• used to block (or to further scrub) traffic originating from those sources
– Goal:
• predict malicious sources that are likely to attack a victim in the future, based on past logs
– Prediction vs. detection:
• strictly speaking, this is not "detection"
• but it does require finding patterns in the data
Traditional Blacklist Generation
• Local Worst Offender List (LWOL)
– Most prolific local offenders
– Reactive but not proactive
• Global Worst Offender List (GWOL)
– Most prolific global offenders
– Might contain irrelevant offenders
– Non-prolific attackers are elusive to GWOL
• State of the art: collaborative blacklisting
– J. Zhang, P. Porras, and J. Ullrich, "Highly Predictive Blacklisting", USENIX Security 2008 (best paper award)
– Key idea: a victim is likely to be attacked not only by sources that attacked this particular victim, but also by sources that attacked "similar victims"
– Methodology: use link analysis (PageRank) on the victim-similarity graph to predict future attacks
– First methodological development on this problem in a long time!
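The link-analysis step can be sketched as follows. This is a toy illustration of power-iteration PageRank on a weighted victim-similarity graph, not the HPB authors' implementation; the victims and similarity weights are invented for illustration.

```python
def pagerank(adj, damping=0.85, iters=100):
    """Power-iteration PageRank on a weighted graph.
    adj: node -> {neighbor: edge weight}."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {}
        for v in nodes:
            # mass flowing into v from each u, proportional to the
            # similarity weight of edge (u, v) among u's out-edges
            inflow = sum(
                rank[u] * adj[u][v] / sum(adj[u].values())
                for u in nodes if v in adj[u]
            )
            new[v] = (1 - damping) / n + damping * inflow
        rank = new
    return rank

# Toy victim-similarity graph: victims A and B share many attackers
# (weight 3), C is loosely connected (weights are illustrative only).
sim = {
    "A": {"B": 3.0, "C": 1.0},
    "B": {"A": 3.0, "C": 1.0},
    "C": {"A": 1.0, "B": 1.0},
}
ranks = pagerank(sim)  # A and B end up ranked above C
```

Sources that attacked highly-ranked (well-connected) victims are then promoted in each victim's predictive blacklist.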
Formulating Predictive Blacklisting
as an Implicit Recommendation System
Recommendation system (e.g., Netflix, Amazon) ↔ Predictive Blacklisting
• Users ↔ Attackers
• Items (movies) ↔ Victims
• R = Rating Matrix ↔ R(t) = Attack Matrix
[Figure: side-by-side partially observed matrices; "?" marks unobserved entries]
Collaborative Filtering (CF)
Different techniques capture different patterns in the data
• Multi-level prediction
– Individual level: (attacker, victim) pairs
• use time series to project past trends
– Local level: neighborhood-based CF
• group "similar" victim networks (k-NN)
– the notion of similarity accounts for common attackers and time
• group attackers attacking the same victims
– find them using the cross-association (CA) method
– [Global level: factorization-based CF (in progress)]
• find latent factors in the data using, e.g., SVD
• Combine ratings from the different predictors
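A minimal sketch of the neighborhood-based (k-NN) step: score attacker a for victim v by summing the attack counts that a inflicted on the victims most similar to v. The victim names, attacker names, and counts below are invented, and the similarity measure (cosine over attack-count vectors) is one plausible choice, not necessarily the one used in the talk.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse attack-count vectors."""
    common = set(u) & set(v)
    dot = sum(u[a] * v[a] for a in common)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_scores(target, logs, k=2):
    """Predict attacker scores for `target` from its k most similar victims.
    logs: victim -> {attacker: past attack count}."""
    sims = sorted(
        ((cosine(logs[target], logs[v]), v) for v in logs if v != target),
        reverse=True,
    )[:k]
    scores = {}
    for s, v in sims:
        for attacker, count in logs[v].items():
            scores[attacker] = scores.get(attacker, 0.0) + s * count
    return scores

# Toy logs: victims v1 and v2 see similar attackers; a3 has hit v2
# but not yet v1, so it should rank high in v1's predicted blacklist.
logs = {
    "v1": {"a1": 5, "a2": 3},
    "v2": {"a1": 4, "a2": 2, "a3": 6},
    "v3": {"a9": 7},
}
scores = knn_scores("v1", logs, k=1)
blacklist = sorted(scores, key=scores.get, reverse=True)  # a3 ranks first
```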
Tested our approach on Dshield data (six months of logs)
• Dshield.org is a central repository of shared logs
– Several victim organizations (e.g., Princeton, UCI) submit their IDS logs (flow data)
– The repository analyzes the logs and provides a predictive blacklist, tailored to each victim
Several different patterns co-exist in the data and should be detected and used for prediction.
Preliminary results
• A combination of methods significantly improves the hit count of the blacklist
– up to 70% (57% on average) compared to the state of the art (HPB)
[Figure: hit counts of the combined method vs. the state-of-the-art HPB and the older GWOL]
• and there is much room for improvement
Challenges & Future Directions
• Get closer to the upper bound
– Latent factor techniques
– Dealing with missing data
• Adversarial model
• Scalability
• Hopefully, interactions with others in this group
• F. Soldo, A. Le, A. Markopoulou, http://arxiv.org/abs/0908.2007
Network Mapping
Existing methods: active probing (e.g., traceroute)
[Image: Internet map, Lumeta Corporation]
New approach: passive monitoring
The Data
Hop-counts from end-hosts to monitors;
extracted from TTL fields of traffic at monitors
[Figure: matrix of hop-counts from end-hosts (rows) to monitors (columns); "?" marks unobserved entries]
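The extraction of hop-counts from TTL fields can be sketched as follows. The trick (a common heuristic, not spelled out on the slide) is that operating systems use a small set of standard initial TTL values, so the nearest standard value above the observed TTL yields the hop count.

```python
# Standard initial TTL values used by common OS network stacks.
INITIAL_TTLS = (32, 64, 128, 255)

def hop_count(observed_ttl):
    """Estimate hops traveled from the TTL seen at the monitor.
    Assumes the sender started from the smallest standard initial
    TTL that is >= the observed value (a heuristic, not exact)."""
    initial = min(t for t in INITIAL_TTLS if t >= observed_ttl)
    return initial - observed_ttl

# A packet arriving with TTL 115 most likely started at 128,
# so it traveled about 13 hops.
```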
Clustering End-Hosts
[Figure: hop-count matrix from end-hosts to honeypots/monitors, with many entries missing]
Problem: Use hop-count data to automatically
cluster end-hosts into topologically relevant
groups (e.g., subnets)
Intuition: End-hosts with similar hop-counts are
probably close together
Challenge: Clustering with missing data
[Figure: 2-D histogram of hop-counts; ellipses indicate end-hosts from different subnets]
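The clustering intuition above can be sketched with a simple greedy grouping that tolerates missing entries: compare hosts only on the hop-counts both have observed. The hop-count vectors and the distance threshold below are invented for illustration; this is not the mixture-model method of the results slide.

```python
def distance(u, v):
    """Mean absolute difference over coordinates observed in both
    vectors (None marks a missing hop-count)."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    if not pairs:
        return float("inf")
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def greedy_cluster(hosts, radius=2.0):
    """Assign each host to the first cluster whose representative
    is within `radius`; otherwise start a new cluster."""
    reps, labels = [], []
    for h in hosts:
        for i, r in enumerate(reps):
            if distance(h, r) <= radius:
                labels.append(i)
                break
        else:
            reps.append(h)
            labels.append(len(reps) - 1)
    return labels

# Toy hop-count vectors to three monitors (None = unobserved).
hosts = [
    [5, 13, 10],
    [5, None, 11],   # close to the first host: same cluster
    [14, 4, None],   # far away: different cluster (e.g., другой subnet)
]
labels = greedy_cluster(hosts)  # hosts 0 and 1 group together
```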
Matrix Completion
[Figure: incomplete hop-count matrix with many missing entries]
The SVD shows the complete hop-count matrix is approximately low-rank (rank r); the observed hop-counts are random samples of its entries.
[Figure: singular value decay; incomplete vs. completed hop-count matrix]
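The completion idea can be sketched with alternating least squares on the observed entries. Real hop-count matrices are low-rank rather than exactly rank-1; this toy uses rank 1 (M ≈ u·vᵀ) to keep the idea visible, and the matrix values are invented.

```python
def rank1_complete(M, mask, iters=500):
    """Alternating least squares for M ≈ u v^T using only the
    observed entries (mask[i][j] is True where M[i][j] is known)."""
    m, n = len(M), len(M[0])
    u, v = [1.0] * m, [1.0] * n
    for _ in range(iters):
        for i in range(m):  # fix v, solve each u[i] by least squares
            num = sum(M[i][j] * v[j] for j in range(n) if mask[i][j])
            den = sum(v[j] ** 2 for j in range(n) if mask[i][j])
            if den:
                u[i] = num / den
        for j in range(n):  # fix u, solve each v[j]
            num = sum(M[i][j] * u[i] for i in range(m) if mask[i][j])
            den = sum(u[i] ** 2 for i in range(m) if mask[i][j])
            if den:
                v[j] = num / den
    return [[u[i] * v[j] for j in range(n)] for i in range(m)]

# Toy rank-1 "hop-count" matrix (outer product of [1,2,3] and [2,1,4,3])
# with two entries hidden; the 0s stand in for the unknown values.
M = [[2, 0, 4, 3],
     [4, 2, 8, 6],
     [6, 3, 12, 0]]
mask = [[True, False, True, True],
        [True, True, True, True],
        [True, True, True, False]]
completed = rank1_complete(M, mask)
```

With enough observed entries per row and column, the hidden entries (here 1 and 9) are recovered from the low-rank structure alone.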
Results
[Figure: mixture-model clusters from complete data vs. clusters recovered from 25% of the data, plotted against the fraction of complete data (0 to 1)]
Network-wide Anomaly Detection
Distributed Network Anomaly Detection
Observation model: each node reports a noisy measurement of a binary 0/1 activation pattern with unknown signal strength (e.g., y_i = μ·x_i + noise, with x_i ∈ {0,1} and μ unknown).
• Weak in strength (small signal μ): invisible in any per-node signature
• Weak in extent (few affected nodes): invisible in the network-wide aggregate
Detecting weak and sparse patterns
Prior work: Can detect weak and unstructured patterns by exploiting
multiplicity. (Ingster, Jin-Donoho’03, Abramovich et al ’01)
Subtle adaptive testing procedures: Higher criticism, False discovery control
[Figure: detectability/localizability thresholds as a function of signal strength and sparsity (# active nodes)]
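The higher-criticism procedure mentioned above compares the ordered p-values to their expected uniform positions. A minimal sketch following the standard Donoho and Jin definition (the p-values below are invented):

```python
import math

def higher_criticism(pvalues):
    """Higher-criticism statistic:
    HC = max_i sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))),
    where p_(1) <= ... <= p_(n) are the sorted p-values."""
    p = sorted(pvalues)
    n = len(p)
    hc = float("-inf")
    for i, pi in enumerate(p, start=1):
        if 0 < pi < 1:  # skip degenerate p-values
            hc = max(hc, math.sqrt(n) * (i / n - pi) / math.sqrt(pi * (1 - pi)))
    return hc

# Under the null, p-values look uniform and HC stays small; a few
# unusually small p-values (a sparse, weak signal) inflate it.
null_like = [0.1, 0.3, 0.5, 0.7, 0.9]
sparse_signal = [0.0001, 0.001, 0.5, 0.7, 0.9]
```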
Now you see it, now you don't
[Figure: network anomaly patterns; interactions among nodes]
In addition to multiplicity, can we exploit the (possibly non-local) dependencies between node measurements to boost performance?
The method must be adaptive to the network interaction structure.
How do node interactions affect thresholds of detectability/localizability?
Modeling network anomaly patterns
Latent multi-scale dependencies underlie the observed network node measurements.
Latent multi-scale Ising model: p(x) ∝ exp(θ · #{edges (i,j) : x_i = x_j}), where θ is the strength of interaction and the count is the number of edge agreements.
Hierarchical clustering and basis learning
Hierarchical correlation clustering
Unbalanced Haar Basis
Theorem: Consider a latent multi-scale Ising model with uniform node interaction strength. With high probability,
(1) the correct dependency structure (tree) can be learnt from the observations x by hierarchical correlation clustering, and
(2) the number of non-zero basis coefficients for an x drawn at random is much smaller than for an i.i.d. network.
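The hierarchical correlation clustering step can be sketched as agglomerative merging of the most correlated measurement series. This is a generic sketch, not the exact procedure underlying the theorem; the node series below are invented, and representing a merged cluster by the element-wise average is one simple choice.

```python
def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0

def correlation_tree(series):
    """Agglomerative clustering: repeatedly merge the pair of clusters
    whose representative series are most correlated; returns the list
    of merges (a, b, new_id), which encodes a binary tree."""
    clusters = {i: list(s) for i, s in enumerate(series)}
    next_id = len(series)
    merges = []
    while len(clusters) > 1:
        ids = list(clusters)
        _, a, b = max(
            (correlation(clusters[a], clusters[b]), a, b)
            for i, a in enumerate(ids)
            for b in ids[i + 1:]
        )
        # merged cluster represented by the element-wise average
        clusters[next_id] = [(u + v) / 2 for u, v in zip(clusters[a], clusters[b])]
        del clusters[a], clusters[b]
        merges.append((a, b, next_id))
        next_id += 1
    return merges

# Toy node measurements: nodes 0 and 1 move together, node 2 is anticorrelated,
# so nodes 0 and 1 merge first.
series = [[1, 2, 3, 4], [1.1, 2.0, 3.2, 3.9], [4, 3, 2, 1]]
merges = correlation_tree(series)
```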
Detection of anomalies in transform domain
Weak patterns are amplified by the sparsifying transform adapted to network topology, while
noise characteristics remain the same.
[Figure: network data vs. wavelet coefficients; the signal is focused and its strength amplified in the transform domain]
Detection vs. Signal Strength
[Figure: detection probability (0 to 1) vs. signal strength (0.01 to 0.1); detection using wavelet coefficients outperforms detection using the original data]
Coherent activation patterns result in few non-zero basis
coefficients and can be detected with much smaller signal strength
Internet anomaly detection
[Figure: monitors observing an unknown network]
Sample delay covariance matrix; example basis vectors learnt from O(log p) network measurements using hierarchical clustering.
Compression achieved for real Internet RTT data.
Research Directions
Active Sensing: Sequential algorithms that automatically
decide where, what, and when to measure.
Online Large-scale Inference: Online and near-real-time
network monitoring to detect topology changes and traffic
anomalies.
Wireless Network Sensing: Exploitation of sparsity and
diversity in wireless networks for fast and robust
identification of network-wide characteristics.
New Network Representations: Relationships between
wavelet representations and persistent homology.
Extra slides
Network Discovery
[Figure: hop-counts (10, 6, 5, 13, 11) from end-hosts to monitors]
The network structure is unknown; infer network routing/topology by 'triangulation'.
Network Discovery
[Figure: the same setting with some hop-counts unobserved, marked '?']
Unfortunately, many hop-counts are not observed.