Balancing Risk and Utility in Flow Trace Anonymization
Martin Burkhart, ETH Zurich
Joint work with Daniela Brauckhoff, Elisa Boschi, Martin May
Motivation
Sharing of traffic measurements is crucial
Only a limited set of sources available
Reproducibility of results
Dynamics / variability of traffic
Get the big picture (e.g. Internet Storm Center)
Keep up with globalized attacks (e.g. botnets)
More and more traces are collected but not shared
Data protection legislation
Security concerns
Competitive advantage
State-Of-The-Art: Anonymization
Black Marking
Truncation
E.g. last bits of IP addresses
Permutation
Random
(Partial) Prefix-preserving IP address permutation
Enumeration
E.g. Timestamps: keep the logical order of events
Categorization
Randomization (data mining community)
K-Anonymity (data mining community)
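Three of the listed techniques can be illustrated with a minimal sketch. The function names, the fixed replacement value "0.0.0.0", and the choice to permute only among the observed addresses are assumptions made for illustration; they are not taken from the talk.

```python
import random

def black_mark(_address):
    # Black marking: overwrite the field with a fixed constant
    # ("0.0.0.0" is an arbitrary choice for this sketch).
    return "0.0.0.0"

def random_permutation(addresses, seed=0):
    # Random 1:1 permutation: every distinct address is mapped to exactly
    # one other (or the same) address from the observed set.
    distinct = sorted(set(addresses))
    shuffled = distinct[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(distinct, shuffled))

def enumerate_timestamps(timestamps):
    # Enumeration: replace each timestamp by its rank, which keeps only
    # the logical order of events.
    rank = {t: i for i, t in enumerate(sorted(set(timestamps)))}
    return [rank[t] for t in timestamps]
```

Because the permutation is a bijection, counts and frequencies per address are unchanged, which is what makes such mappings open to the frequency-based attacks discussed in the case study.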
The Tradeoff in Anonymization
It's a trade-off
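RU-Maps
[Figure: RU-map plotting Risk(t) (y-axis) against Utility(t) (x-axis), where t is the anonymization strength. Marked points: Algorithm X at t=0.1, t=0.2, t=0.4 and t=0.7, prefix-preserving permutation, and random permutation. A "sweet spot" region is highlighted.]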
Not quantitatively studied, lack of metrics
Strongly dependent on the application / attacker model
A Case Study: IP Address Truncation
Techniques that permute IP addresses 1:1 are reversible
Characteristic object sizes/frequencies, behavioral profiling, fingerprint active ports, exploit prefix structure
Apply IP address truncation and evaluate the risk and utility dimensions
Lower risk:
Hosts are aggregated to subnets
IP address       8 bits trunc.   16 bits trunc.
123.45.67.89     123.45.67.0     123.45.0.0
123.45.67.123    123.45.67.0     123.45.0.0
123.45.12.34     123.45.12.0     123.45.0.0
Lower utility:
Resolution of entities is reduced
Quantifying the tradeoff: How bad is it in numbers?
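The truncation examples on this slide can be reproduced with a short sketch. The function name truncate_ip is hypothetical; the code only assumes that truncating x bits means zeroing the x least significant bits of an IPv4 address.

```python
import ipaddress

def truncate_ip(address: str, bits: int) -> str:
    """Zero the `bits` least significant bits of an IPv4 address."""
    if not 0 <= bits <= 32:
        raise ValueError("bits must be between 0 and 32")
    value = int(ipaddress.IPv4Address(address))
    # Build a 32-bit mask with the lowest `bits` bits cleared.
    mask = (0xFFFFFFFF << bits) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(value & mask))
```

For example, truncate_ip("123.45.67.89", 8) and truncate_ip("123.45.67.123", 8) both yield "123.45.67.0", so the two hosts can no longer be told apart.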
Internal vs. External Prefixes
[Figure: unique prefix count (log scale) versus prefix length (32-x), for external and internal (AS 559) prefixes. Annotations: x=8, Factor 3, Factor 53.]
Asymmetry in prefixes
Is this reflected in
Risk reduction?
Utility reduction?
Measuring Utility of Truncated Data
Specific application: anomaly detection
Compare detection quality of scans and (D)DoS attacks in original and truncated data
Two IP-based metrics
Unique address count
Address entropy
3 weeks of NetFlow data
~ 43 billion flows
SWITCH network
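The two IP-based metrics can be sketched as follows. The talk does not define them formally, so this assumes that address entropy is the Shannon entropy (in bits) of the empirical address distribution over the flows of one time bin; the function names are hypothetical.

```python
import math
from collections import Counter

def unique_count(addresses):
    # Number of distinct addresses seen in the time bin.
    return len(set(addresses))

def address_entropy(addresses):
    # Shannon entropy (bits) of the empirical address distribution:
    # each flow contributes one observation of its address.
    counts = Counter(addresses)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Applying both functions to the same flows before and after truncation shows how each metric reacts when hosts are merged into subnets.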
Measuring Detection Quality
Ground truth: Manual identification of scans/(D)DoS attacks
Run a Kalman filter on metric timeseries
Utility measured by AUC (area under the ROC curve)
Vary the detection threshold to obtain the ROC curve
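The AUC computation can be sketched by sweeping the threshold over the detector output. This assumes that each time bin has an anomaly score (for instance the magnitude of the Kalman filter residual, which the slide does not specify) and a ground-truth label; the function name roc_auc is hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve obtained by varying the alarm threshold."""
    positives = sum(1 for l in labels if l)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    # One ROC point per threshold: a bin raises an alarm if score >= threshold.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l)
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and not l)
        points.append((fp / negatives, tp / positives))

    # Trapezoidal integration over the (false positive rate, true positive rate) points.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

An AUC of 1.0 means every attack bin scores higher than every normal bin; 0.5 corresponds to a detector that ranks bins no better than chance.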
Utility of Truncated Data
Internal metrics degrade faster than external metrics
Counts degrade faster than Entropy
Approximating Risk of Host Identification
In general: Truncation of x bits leads to 2^(32-x) prefixes with 2^x addresses per prefix
But: only a fraction (A) of potential addresses is usually active
[Figure: example prefix 129.130.80.x with host addresses 1, 2, 3, ..., 10, 11, 12, ..., 240, 241, ..., 254, 255, of which only a fraction is active, e.g. A = 10%]
Hence, on average A*2^x active addresses per prefix
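As a small worked example of the counting argument above, the following sketch computes the number of prefixes and the average number of active addresses per prefix. The function names are hypothetical, and A is treated as uniform across prefixes, which is a simplification.

```python
def prefix_count(x: int) -> int:
    # Truncating x bits of a 32-bit address leaves 2^(32-x) possible prefixes.
    return 2 ** (32 - x)

def active_per_prefix(x: int, active_fraction: float) -> float:
    # Each prefix covers 2^x addresses, of which a fraction A is active on average.
    return active_fraction * 2 ** x
```

With x = 8 and A = 10%, a prefix covers 256 addresses, of which about 25.6 are active on average.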
Risk of Truncated Data
$\mathrm{risk}(x) \approx \frac{1}{2^x A}$
$A_{in} \approx 10.5\%$ (total: 2.2 million)
$A_{ext} \approx 0.08\%$ (total: 4.3 billion)
Risk for external addresses is higher due to sparsity!
Constant offset: $\log_2\left(\frac{A_{in}}{A_{ext}}\right) \approx 7$
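The risk approximation and the constant offset can be checked numerically with a minimal sketch. It assumes risk(x) is the reciprocal of the average number of active addresses per prefix, A*2^x; capping the value at 1 is an addition made here so that the result stays a probability, and the function names are hypothetical.

```python
import math

def risk(x: int, active_fraction: float) -> float:
    # Chance of picking the right host among the A * 2^x active addresses
    # that share a truncated prefix (capped at 1; the cap is an assumption).
    return min(1.0, 1.0 / (active_fraction * 2 ** x))

def offset_bits(a_in: float, a_ext: float) -> float:
    # Extra bits that must be truncated from external addresses to reach
    # the same risk as for internal addresses.
    return math.log2(a_in / a_ext)

A_IN = 0.105    # internal active fraction, as given on the slide
A_EXT = 0.0008  # external active fraction, as given on the slide
```

With these rounded fractions, risk(8, A_IN) is roughly 0.037, risk(12, A_IN) roughly 0.002, and risk(16, A_EXT) roughly 0.02, while offset_bits(A_IN, A_EXT) is close to 7. Small deviations from the figures in the talk are to be expected because the fractions are rounded.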
The Risk-Utility Tradeoff
[Figure: risk-utility map for the truncation levels: no truncation, 4 bits, 8 bits, 12 bits, 16 bits]
best tradeoff
Metric             x    Utility   Risk
internal entropy   8    0.94      0.035
internal entropy   12   0.87      0.002
external entropy   16   0.97      0.02
Conclusion
We made a quantitative evaluation of the risk-utility tradeoff in anonymization
Entropy is much more resistant to truncation than unique counts
Risk and utility degrade faster for internal addresses
For detection of scans and (D)DoS attacks, it is possible to get a good tradeoff with high utility and low risk
Thank You for the Attention