Anomaly Detection in Communication Networks
Download
Report
Transcript Anomaly Detection in Communication Networks
Association Mining via
Co-clustering of Sparse Matrices
Brian Thompson*, Linda Ness†,
David Shallcross†, Devasis Bassu†
*
†
Clustering
Given: A sparse binary matrix
Goal: Cluster the rows so that similar rows are in the
same cluster
R1
R2
R3
Challenges:
Don’t know the number of clusters a priori
Need solution to be efficient; making all pairwise
comparisons is too expensive
Association Mining via Co-clustering of Sparse Matrices
Co-Clustering
Given: A sparse binary matrix
Goal: Cluster the rows and columns so that they form
large, dense biclusters
R1
R2
R3
Challenges:
C1 C2 C3
Don’t know the number of clusters a priori
Need solution to be efficient; making all pairwise
comparisons is too expensive
Association Mining via Co-clustering of Sparse Matrices
The 1-minute talk
What do we want to do?
Association Mining via Co-clustering of Sparse Matrices
Event
Timestamp
Alice: “Go to WSDM!”
Feb. 7, 2012 9:45pm
Bob Chris: (private)
Feb. 8, 2012 12:30am
Chris: “RT @Alice ...”
Feb. 8, 2012 12:37am
Edge-centric
Alice
Node-centric
Dave
Dave
Alice
Bob
Bob
Chris
Chris
Eve
Eve
3.0
18.7
0.3
13.6
Association Mining via Co-clustering of Sparse Matrices
The 2-minute talk
What is our approach?
Association Mining via Co-clustering of Sparse Matrices
Problem Description
Given: A network (G;T)
discrete-event sequence:
G = (V,E) is a graph
T is a set of discrete-event sequences
corresponding to elements of G
Goals:
Identify recent correlated activity
Measure influence of one entity on another
Challenges:
Scalability - comparing every set or even pair of entities is
too expensive
Variability – different entities have very different properties
Association Mining via Co-clustering of Sparse Matrices
Approach
8:00 am
User:
10:00 am
12:00 pm
NOW!
alice1337
recency
User: bob_iz_kool
pairwise gap
Inter-arrival Time Distribution
xmin
xmax
Association Mining via Co-clustering of Sparse Matrices
The 5-minute talk
How does our model address
temporal variability in a network?
Association Mining via Co-clustering of Sparse Matrices
The REWARDS Model
REneWal theory Approach for Real-time Data Streams
We model a stream of communication data as a
renewal process: a sequence of time-stamped events
sampled from a distribution of inter-arrival times (IATs)
Inter-Arrival Time Distribution
xmin
xmax
Discrete-event sequence:
t1 t2
t3 t4
Association Mining via Co-clustering of Sparse Matrices
t5
The REWARDS Model
REneWal theory Approach for Real-time Data Streams
Given a stream of time-stamped events, we estimate
the parameters of the renewal process for each node
or edge based on its event inter-arrival times
Inter-Arrival Time Distribution
xmin
xmax
Discrete-event sequence:
t1 t2
t3 t4
Association Mining via Co-clustering of Sparse Matrices
t5
Recency
Goal: highlight recent activity
Key idea: more recent = more relevant
8:00 am
User:
10:00 am
12:00 pm
NOW!
alice1337
User: bob_iz_kool
We define the recency of Φ at time 𝑡 to be:
𝑅𝑒𝑐Φ 𝑡 = 1 −
𝐴𝑔𝑒 ∗
𝐹Φ
𝐴𝑔𝑒Φ 𝑡
This normalizes the data so we can compare sequences
with vastly different temporal properties.
Association Mining via Co-clustering of Sparse Matrices
Pairwise Gaps
Goal: measure influence of entity A on entity B
Key idea: study pairwise (A,B)-gaps
8:00 am
10:00 am
12:00 pm
NOW!
User: alice1337
User: bob_iz_kool
We can derive the limit distribution
𝐺𝑎𝑝 ∗
𝐹Φ,Ψ
of pairwise
gaps between consecutive (Φ, Ψ) event pairs assuming
independence, and compare with actual data.
Association Mining via Co-clustering of Sparse Matrices
Divergence
Based on the Kolmogorov-Smirnov statistic:
Fn(x)
F(x)
1
Compares EDF Fn(x)
to hypothetical CDF F(x)
0.8
0.6
0.4
𝑲𝑺 𝑭𝒏 || 𝑭 = 𝐬𝐮𝐩 𝑭𝒏 (𝒙) − 𝑭(𝒙)
KS = 0.32
0.2
0
0
0.2
0.4
0.6
0.8
1
Recency divergence compares recency values for a set
of nodes or edges to the Triangle(0,1) distribution
Gap divergence compares pairwise (A,B)-gaps to the
theoretical distribution if A and B were independent
Association Mining via Co-clustering of Sparse Matrices
LBNL Case Study
Association Mining via Co-clustering of Sparse Matrices
Acknowledgements/Disclaimer
This research was supported by the Intelligence
Advanced Research Projects Activity (IARPA) via Air
Force Research Laboratory (AFRL) contract number
FA8650-10-C-706. The U.S. Government is authorized to
reproduce and distribute reprints for Governmental
purposes notwithstanding any copyright annotation
thereon. The views and conclusions contained herein are
those of the authors and should not be interpreted as
necessarily representing the official policies or
endorsements, either expressed or implied, of IARPA,
AFRL, or the U.S. Government.
Any misinformation, mistakes, or misunderstanding
resulting from this talk are solely the fault of the speaker.
Association Mining via Co-clustering of Sparse Matrices
Association Mining via Co-clustering of Sparse Matrices