Anomaly Detection in Communication Networks

Download Report

Transcript Anomaly Detection in Communication Networks

Association Mining via
Co-clustering of Sparse Matrices
Brian Thompson*, Linda Ness†,
David Shallcross†, Devasis Bassu†
*
†
Clustering
 Given: A sparse binary matrix
 Goal: Cluster the rows so that similar rows are in the
same cluster
R1
R2
R3
 Challenges:
 Don’t know the number of clusters a priori
 Need solution to be efficient; making all pairwise
comparisons is too expensive
Association Mining via Co-clustering of Sparse Matrices
Co-Clustering
 Given: A sparse binary matrix
 Goal: Cluster the rows and columns so that they form
large, dense biclusters
R1
R2
R3
 Challenges:
C1 C2 C3
 Don’t know the number of clusters a priori
 Need solution to be efficient; making all pairwise
comparisons is too expensive
Association Mining via Co-clustering of Sparse Matrices
The 1-minute talk
What do we want to do?
Association Mining via Co-clustering of Sparse Matrices
Event
Timestamp
Alice: “Go to WSDM!”
Feb. 7, 2012 9:45pm
Bob  Chris: (private)
Feb. 8, 2012 12:30am
Chris: “RT @Alice ...”
Feb. 8, 2012 12:37am
Edge-centric
Alice
Node-centric
Dave
Dave
Alice
Bob
Bob
Chris
Chris
Eve
Eve
3.0
18.7
0.3
13.6
Association Mining via Co-clustering of Sparse Matrices
The 2-minute talk
What is our approach?
Association Mining via Co-clustering of Sparse Matrices
Problem Description
 Given: A network (G;T)
discrete-event sequence:
 G = (V,E) is a graph
 T is a set of discrete-event sequences
corresponding to elements of G
 Goals:
 Identify recent correlated activity
 Measure influence of one entity on another
 Challenges:
 Scalability - comparing every set or even pair of entities is
too expensive
 Variability – different entities have very different properties
Association Mining via Co-clustering of Sparse Matrices
Approach
8:00 am
User:
10:00 am
12:00 pm
NOW!
alice1337
recency
User: bob_iz_kool
pairwise gap
Inter-arrival Time Distribution
xmin
xmax
Association Mining via Co-clustering of Sparse Matrices
The 5-minute talk
How does our model address
temporal variability in a network?
Association Mining via Co-clustering of Sparse Matrices
The REWARDS Model
REneWal theory Approach for Real-time Data Streams
 We model a stream of communication data as a
renewal process: a sequence of time-stamped events
sampled from a distribution of inter-arrival times (IATs)
Inter-Arrival Time Distribution
xmin
xmax
Discrete-event sequence:
t1 t2
t3 t4
Association Mining via Co-clustering of Sparse Matrices
t5
The REWARDS Model
REneWal theory Approach for Real-time Data Streams
 Given a stream of time-stamped events, we estimate
the parameters of the renewal process for each node
or edge based on its event inter-arrival times
Inter-Arrival Time Distribution
xmin
xmax
Discrete-event sequence:
t1 t2
t3 t4
Association Mining via Co-clustering of Sparse Matrices
t5
Recency
 Goal: highlight recent activity
 Key idea: more recent = more relevant
8:00 am
User:
10:00 am
12:00 pm
NOW!
alice1337
User: bob_iz_kool
 We define the recency of Φ at time 𝑡 to be:
𝑅𝑒𝑐Φ 𝑡 = 1 −
𝐴𝑔𝑒 ∗
𝐹Φ
𝐴𝑔𝑒Φ 𝑡
This normalizes the data so we can compare sequences
with vastly different temporal properties.
Association Mining via Co-clustering of Sparse Matrices
Pairwise Gaps
 Goal: measure influence of entity A on entity B
 Key idea: study pairwise (A,B)-gaps
8:00 am
10:00 am
12:00 pm
NOW!
User: alice1337
User: bob_iz_kool
 We can derive the limit distribution
𝐺𝑎𝑝 ∗
𝐹Φ,Ψ
of pairwise
gaps between consecutive (Φ, Ψ) event pairs assuming
independence, and compare with actual data.
Association Mining via Co-clustering of Sparse Matrices
Divergence
 Based on the Kolmogorov-Smirnov statistic:
Fn(x)
F(x)
1
Compares EDF Fn(x)
to hypothetical CDF F(x)
0.8
0.6
0.4
𝑲𝑺 𝑭𝒏 || 𝑭 = 𝐬𝐮𝐩 𝑭𝒏 (𝒙) − 𝑭(𝒙)
KS = 0.32
0.2
0
0
0.2
0.4
0.6
0.8
1
 Recency divergence compares recency values for a set
of nodes or edges to the Triangle(0,1) distribution
 Gap divergence compares pairwise (A,B)-gaps to the
theoretical distribution if A and B were independent
Association Mining via Co-clustering of Sparse Matrices
LBNL Case Study
Association Mining via Co-clustering of Sparse Matrices
Acknowledgements/Disclaimer
 This research was supported by the Intelligence
Advanced Research Projects Activity (IARPA) via Air
Force Research Laboratory (AFRL) contract number
FA8650-10-C-706. The U.S. Government is authorized to
reproduce and distribute reprints for Governmental
purposes notwithstanding any copyright annotation
thereon. The views and conclusions contained herein are
those of the authors and should not be interpreted as
necessarily representing the official policies or
endorsements, either expressed or implied, of IARPA,
AFRL, or the U.S. Government.
 Any misinformation, mistakes, or misunderstanding
resulting from this talk are solely the fault of the speaker.
Association Mining via Co-clustering of Sparse Matrices
Association Mining via Co-clustering of Sparse Matrices