The Devil and Packet Trace
Anonymization
Authors:
Ruoming Pang, Mark Allman, Vern Paxson
and Jason Lee
Published: ACM SIGCOMM Computer Communication
Review, Volume 36, Issue 1, January 2006
Presenter: Ping Wang
Overview
Problem
How to anonymize packet traces before they are
released
Goal
Preserve as much information as possible
Background
Why share?
Verify previous results
Compare competing ideas on the same
data
Provide a broader view
Who shares?
NLANR’s PMA packet traces
CAIDA’s skitter measurement
LBNL’s internal traffic
Background cont.
Available anonymization tools
tcpdpriv
ipsumdump
tcpurify
Not general enough; most of them focus
only on header fields, primarily IP addresses
A New tool - tcpmkpub
Provides a general framework for
anonymizing traces
It is based on explicit rules for each
header field
An example specification
 All fields must be specified with a name, a length, and an
action (“KEEP”, “ZERO”, or a function)
An example specification cont.
Supports case statement for the header
fields which can vary
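A rule table of this kind can be sketched in a few lines of Python. This is only an illustration of the name / length / action idea. It is not tcpmkpub's own specification syntax, and the header layout, TOY_HEADER_RULES, reverse_bytes and apply_rules are all hypothetical names invented for the sketch.

```python
# Hypothetical rule table: (field name, length in bytes, action).
# An action is "KEEP", "ZERO", or a function from bytes to bytes.

def reverse_bytes(value: bytes) -> bytes:
    """Placeholder transform standing in for a real anonymization function."""
    return value[::-1]

TOY_HEADER_RULES = [
    ("type", 1, "KEEP"),
    ("flags", 1, "ZERO"),
    ("address", 4, reverse_bytes),
]

def apply_rules(header: bytes, rules) -> bytes:
    """Walk the header field by field and apply each field's action."""
    out = bytearray()
    offset = 0
    for name, length, action in rules:
        field = header[offset:offset + length]
        if len(field) != length:
            raise ValueError(f"header too short for field {name!r}")
        if action == "KEEP":
            out += field
        elif action == "ZERO":
            out += bytes(length)
        elif callable(action):
            new_value = action(field)
            if len(new_value) != length:
                raise ValueError(f"action for {name!r} changed the field length")
            out += new_value
        else:
            raise ValueError(f"unknown action for field {name!r}")
        offset += length
    return bytes(out)
```

Because every field must appear in the table, a field without a rule causes an error instead of passing through untouched. A case statement could be modeled, under the same assumptions, as a dictionary from a selector field's value to a rule list.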
Anonymization Policy
 Checksums
 Link layer
 Network layer
 Transport layer
Checksums
Replace the original checksum C0 with Cc
(recomputed over the anonymized packet)
For checksums that cannot be verified as correct
The packet has been corrupted
Insert “1”
The original packet is truncated
Use Cc (note in meta-data)
For optional checksums, as in UDP,
use zero as the checksum
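The checksum policy above can be sketched as follows. internet_checksum is the standard 16-bit one's-complement Internet checksum (RFC 1071). choose_checksum and its status labels are hypothetical names for this sketch, and reading the "optional" case as "the sender did not use the checksum" is an assumption.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement Internet checksum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"              # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:               # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def choose_checksum(status: str, recomputed: int):
    """Return (checksum to write, meta-data note or None).

    status is a label for the ORIGINAL packet's checksum (assumed labels):
      "ok"        - verified correct
      "corrupted" - packet fully captured but checksum is wrong
      "truncated" - packet cut short, checksum cannot be verified
      "absent"    - optional checksum not used (assumed reading of the UDP case)
    recomputed is Cc, the checksum computed over the anonymized packet.
    """
    if status == "ok":
        return recomputed, None
    if status == "corrupted":
        return 1, None
    if status == "truncated":
        return recomputed, "checksum unverifiable: packet truncated"
    if status == "absent":
        return 0, None
    raise ValueError(f"unknown checksum status {status!r}")
```

Writing a fixed marker for corrupted packets keeps the fact that the checksum was bad without carrying over the original value.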
Link layer
Ethernet address is 6 bytes
High 3 bytes represent the NIC vendor
Scrambling the entire 6-byte address is not
good for research
Scrambling only the lower 3 bytes does not
hide the vendor
Remap these two parts separately
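Remapping the two halves separately might look like the sketch below. MacRemapper is a hypothetical class, and the seeded random tables are an assumption chosen for brevity. A real tool would need a secret key and would have to decide how to treat broadcast and multicast addresses.

```python
import random

class MacRemapper:
    """Remap the vendor half and the NIC half of a MAC address independently."""

    def __init__(self, seed: int):
        # random.Random is NOT cryptographically secure; illustration only.
        self._rng = random.Random(seed)
        self._vendor_map = {}
        self._nic_map = {}

    def _remap_part(self, table: dict, part: bytes) -> bytes:
        """Give each distinct 3-byte value a distinct random 3-byte value."""
        if part not in table:
            used = set(table.values())
            while True:
                candidate = self._rng.getrandbits(24).to_bytes(3, "big")
                if candidate not in used:
                    break
            table[part] = candidate
        return table[part]

    def remap(self, mac: bytes) -> bytes:
        if len(mac) != 6:
            raise ValueError("an Ethernet address is 6 bytes")
        vendor = self._remap_part(self._vendor_map, mac[:3])
        nic = self._remap_part(self._nic_map, mac[3:])
        return vendor + nic
```

Two interfaces from the same vendor still share their high 3 bytes after remapping, so vendor-level grouping remains possible while the real vendor identifier is hidden.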
Network layer (1) – focus on IP address
External addresses
Use the prefix-preserving address
anonymization scheme proposed in other paper
Internal addresses
Do not use the prefix-preserving address
anonymization scheme
Use a prefix which is not used by external
addresses within the anonymized packets
Subnet and host portions are mapped
separately.
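The prefix-preserving property used for external addresses can be illustrated with a small sketch: each output bit is the input bit XORed with a keyed pseudorandom function of the bits before it. This follows the general idea of such schemes, but the use of HMAC-SHA256 and the function names are assumptions, not the construction from the cited paper.

```python
import hashlib
import hmac
import ipaddress

def prefix_preserving(addr: str, key: bytes) -> str:
    """Anonymize an IPv4 address so that shared prefixes stay shared."""
    original = int(ipaddress.IPv4Address(addr))
    result = 0
    for i in range(32):
        prefix = original >> (32 - i)          # the i bits above bit i (0 when i == 0)
        msg = i.to_bytes(1, "big") + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (original >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)  # flip depends only on the prefix
    return str(ipaddress.IPv4Address(result))

def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()
```

If two addresses share exactly k leading bits, their anonymized forms also share exactly k leading bits, because the flip decision for each bit is a function of the preceding bits only.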
Network layer (2)
Scanners
Many organizations run a scanner as part of
security operation
Tend to hit addresses in some order, like a.b.c.1,
a.b.c.2, a.b.c.3, etc.
Keep the scanner’s IP address uniform across the
trace, and flag it in the meta-data. For the
destinations of the scans, use a different mapping. For
example: X1, X2 belong to one subnet Y
Traffic not involving a scanner: map to X’1, X’2 in subnet Y’
Traffic involving a scanner: map to X’’1, X’’2 in subnets Z1 and Z2
Network layer (3)
Multicast addresses
preserved
Private addresses
preserved
Invalid addresses
Remap them as if the subnet existed, but note this
in the meta-data.
Transport layer
Preserve both port numbers and sequence
numbers
Rewrite timestamp options
Transform the timestamp into separate
increasing counters
Reason: Clock drift manifested in timestamp
options can be leveraged to fingerprint a
physical machine
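The timestamp rewrite can be sketched as below. Keeping one counter per source host and numbering values in first-seen order are assumptions made for the sketch. TimestampRewriter is a hypothetical name.

```python
class TimestampRewriter:
    """Replace TCP timestamp values with per-host counters unrelated to time."""

    def __init__(self):
        # host -> {original timestamp value: counter value}
        self._tables = {}

    def rewrite(self, host: str, tsval: int) -> int:
        table = self._tables.setdefault(host, {})
        if tsval not in table:
            # The next unused counter for this host. The output increases only
            # if the host's timestamps are first seen in increasing order.
            table[tsval] = len(table) + 1
        return table[tsval]
```

Equal timestamps from the same host still map to the same value, so an echoed timestamp could be matched by looking it up in the peer's table, but the spacing between values no longer reflects the host's clock.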
Testing
Can the transformed traces really be used?
Use p0f to do OS fingerprinting
Use tcpsum to find the number of packets and
bytes in both the original and transformed
traces
Test cont.
Are the transformed traces really
anonymous?
Check tcpmkpub’s own log file
Look for known strings in the anonymized traces
e.g. “Document”, “Setting”, “ConfirmFileOp”
Look for IP addresses
Look for string versions of IP addresses
Look for MAC addresses
Check timestamps
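A leak check of this kind can be sketched as a byte search over the anonymized output. find_leaks is a hypothetical helper, not part of tcpmkpub. A 4-byte binary match can also occur by chance, so hits are candidates to inspect, not proof of a leak.

```python
import ipaddress

def find_leaks(payload: bytes, strings, ip_addrs):
    """Return (kind, value) pairs for sensitive values found in payload."""
    hits = []
    for text in strings:
        if text.encode("ascii") in payload:
            hits.append(("string", text))
    for ip in ip_addrs:
        packed = ipaddress.IPv4Address(ip).packed   # 4-byte network-order form
        if packed in payload:
            hits.append(("ip-binary", ip))
        if ip.encode("ascii") in payload:           # dotted-decimal text form
            hits.append(("ip-text", ip))
    return hits
```

Usage, assuming the original addresses are known to the person validating the trace: call find_leaks on the bytes of the anonymized trace with the original addresses and a list of marker strings, then review every hit by hand.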
Paper contributions
Develops a tool, tcpmkpub, for
implementing arbitrary anonymization
policies;
Uses meta-data to help researchers deal
with lost information
Invalid checksum, scanner IP
Beyond IP address obfuscation, explore
many other dangerous details
timestamp, Ethernet addresses, etc.
Paper weaknesses
Gives only two experiments to show that the
anonymized traces are useful
Could have given some anonymization
results to make the policy clearer.
For example, in the scanner case, what addresses
a.b.c.1, a.b.c.2, a.b.c.3 would look
like if they are involved in scanning traffic, and
what if not
Future work
Keep more consistency between the
original and anonymized traces
Study online anonymization
Provide a tool which can be easily used for
validating the anonymized traces
Provide a tool for creating an
anonymization policy for tcpmkpub
Questions?