The Devil and Packet Trace
Anonymization
Authors:
Ruoming Pang, Mark Allman, Vern Paxson
and Jason Lee
Published: ACM SIGCOMM Computer Communication
Review, Volume 36, Issue 1, January 2006
Presenter: Ping Wang
Overview
Problem
How to anonymize packet traces before they are
released
Goal
Preserve as much information as possible
Background
Why share?
Verify previous results
Compare competing ideas on the same
data
Provide a broader view
Who shares?
NLANR’s PMA packet traces
CAIDA’s skitter measurement
LBNL’s internal traffic
Background cont.
Available anonymization tools
tcpdpriv
ipsumdump
tcpurify
Not general enough; most of them focus only on
header fields, primarily IP addresses
A New tool - tcpmkpub
Provides a general framework for
anonymizing traces
It is based on explicit rules for each
header field
An example specification
All fields must be specified with a name, a length, and
an action (“KEEP”, “ZERO”, or a function)
An example specification cont.
Supports a case statement for header
fields that can vary
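The rule-per-field idea can be sketched in a few lines of Python. This is only an illustration of the concept: the rule table, the field names, and the `scramble_id` helper are hypothetical and are not tcpmkpub's actual specification syntax. The case statement for variable fields is left out.

```python
KEEP = "KEEP"
ZERO = "ZERO"


def scramble_id(value: bytes) -> bytes:
    """Placeholder anonymization function (hypothetical): inverts every bit."""
    return bytes(b ^ 0xFF for b in value)


# Each rule is (field name, length in bytes, action), as on the slide.
# The header layout below is a toy example, not a real protocol header.
TOY_HEADER_RULES = [
    ("version", 1, KEEP),
    ("flags", 1, ZERO),
    ("ident", 2, scramble_id),
]


def apply_rules(header: bytes, rules) -> bytes:
    """Walk the header field by field and apply each field's action."""
    out = bytearray()
    offset = 0
    for name, length, action in rules:
        field = header[offset:offset + length]
        if len(field) != length:
            raise ValueError(f"header too short for field {name!r}")
        if action == KEEP:
            out += field
        elif action == ZERO:
            out += bytes(length)
        else:
            # Any other action is assumed to be a callable.
            new_value = action(field)
            if len(new_value) != length:
                raise ValueError(f"action for {name!r} changed the field length")
            out += new_value
        offset += length
    return bytes(out)
```

Because every field must appear in the table, a field that nobody thought about cannot slip through unanonymized by default.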
Anonymization Policy
Checksums
Link layer
Network layer
Transport layer
Checksums
Replace the original checksum C0 with Cc
For checksums that cannot be verified
The packet has been corrupted
Insert “1”
The original packet is truncated
Use Cc (note in meta-data)
Where the checksum is optional, as in UDP,
use zero as the checksum
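The checksum cases above can be sketched as follows. `internet_checksum` is the standard one's-complement Internet checksum (RFC 1071). `published_checksum` is a hypothetical helper that mirrors only the three cases listed on the slide; it assumes `cc` is the checksum recomputed for the packet being published, which the slide does not spell out. The optional-checksum case is not modelled.

```python
def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, as used by IP, TCP and UDP."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def published_checksum(original_ok: bool, truncated: bool, cc: int):
    """Return (checksum to publish, meta-data note or None).

    Assumption: cc is the recomputed checksum for the published packet.
    """
    if truncated:
        # Cannot be verified: use Cc and record that fact in the meta-data.
        return cc, "checksum could not be verified: packet truncated"
    if not original_ok:
        # Corrupted packet: insert the marker value 1.
        return 1, None
    return cc, None
```

A header whose checksum field already holds the correct value sums to zero, which is how a receiver (or an anonymizer) verifies it.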
Link layer
Ethernet address is 6 bytes
High 3 bytes represent the NIC vendor
Scrambling the entire 6-byte address is not
good for research
Scrambling only the lower 3 bytes is not
good for the vendor
Remap the two parts separately
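Remapping the two halves separately could look like the sketch below. It uses a keyed hash (HMAC-SHA256) truncated to 3 bytes, which is an assumption made for illustration and not necessarily what tcpmkpub does. A truncated hash is not a permutation, so collisions are possible, and the sketch does not treat the broadcast or multicast bits specially.

```python
import hashlib
import hmac


def remap_mac(mac: bytes, key: bytes) -> bytes:
    """Remap the vendor half and the NIC half of a MAC address independently."""
    if len(mac) != 6:
        raise ValueError("an Ethernet address is 6 bytes")
    # Distinct labels keep the two mappings independent of each other.
    vendor = hmac.new(key, b"vendor" + mac[:3], hashlib.sha256).digest()[:3]
    nic = hmac.new(key, b"nic" + mac[3:], hashlib.sha256).digest()[:3]
    return vendor + nic
```

Two interfaces from the same vendor still share their (anonymized) upper 3 bytes, so per-vendor analysis remains possible without exposing the real vendor code.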
Network layer (1) – focus on IP addresses
External addresses
Use the prefix-preserving address
anonymization scheme proposed in another paper
Internal addresses
Do not use the prefix-preserving address
anonymization scheme
Use a prefix that is not used by external
addresses within the anonymized packets
Subnet and host portions are mapped
separately
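For the external addresses, prefix-preserving anonymization means that two addresses sharing a k-bit prefix are mapped to two addresses that also share exactly a k-bit prefix. The sketch below illustrates that property in the style of bit-by-bit schemes such as Crypto-PAn. It is not the cited scheme itself: HMAC-SHA256 is swapped in as the pseudorandom function, and the function names are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Flip each bit based on a keyed function of the bits that precede it."""
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = 0
    for i in range(32):
        # The flip decision for bit i depends only on the first i input bits,
        # so addresses with a common prefix get identical flips on that prefix.
        digest = hmac.new(key, bits[:i].encode(), hashlib.sha256).digest()
        flip = digest[0] & 1
        out = (out << 1) | (int(bits[i]) ^ flip)
    return str(ipaddress.IPv4Address(out))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()
```

The first bit where two inputs differ receives the same flip in both, so the outputs differ at that bit too and the shared prefix length is preserved exactly.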
Network layer (2)
Scanners
Many organizations run a scanner as part of
their security operations
Scanners tend to hit addresses in some order, like a.b.c.1,
a.b.c.2, a.b.c.3, etc.
Keep the scanner’s IP address uniform across the
trace, and flag it in the meta-data. For the
destinations of the scans, use a different mapping. For
example, if X1 and X2 belong to one subnet Y:
Traffic not involving the scanner: map to X’1, X’2 in subnet Y’
Traffic involving the scanner: map to X’’1, X’’2 in subnets Z1 and Z2
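The two mappings for internal destinations can be sketched as below. Everything here is an assumption made for illustration: subnets are taken to be /24, a truncated HMAC stands in for the real mapping (so collisions are possible), and the requirement that the resulting prefix be unused by external addresses is not modelled.

```python
import hashlib
import hmac
import ipaddress


def _prf(key: bytes, label: bytes, data: bytes, nbits: int) -> int:
    """Keyed pseudorandom value of nbits bits (nbits <= 32)."""
    digest = hmac.new(key, label + data, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - nbits)


def map_internal(addr: str, key: bytes, in_scan: bool) -> str:
    value = int(ipaddress.IPv4Address(addr))
    subnet, host = value >> 8, value & 0xFF  # assumes /24 subnets
    if in_scan:
        # Scan traffic: the new subnet depends on the whole address, so
        # neighbours such as a.b.c.1 and a.b.c.2 are scattered (Z1, Z2).
        raw = value.to_bytes(4, "big")
        new_subnet = _prf(key, b"scan-subnet", raw, 24)
        new_host = _prf(key, b"scan-host", raw, 8)
    else:
        # Normal traffic: subnet and host portions are mapped separately,
        # so hosts in subnet Y all land in the same subnet Y'.
        new_subnet = _prf(key, b"subnet", subnet.to_bytes(3, "big"), 24)
        new_host = _prf(key, b"host", host.to_bytes(1, "big"), 8)
    return str(ipaddress.IPv4Address((new_subnet << 8) | new_host))
```

Using a separate mapping for scan traffic keeps an attacker from lining up the scanner's sequential probes against the anonymized addresses seen in ordinary traffic.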
Network layer (3)
Multicast addresses
preserved
Private addresses
preserved
Invalid addresses
Remap them as if the subnet existed, but note this
in the meta-data
Transport layer
Preserve both port numbers and sequence
numbers
Rewrite timestamp options
Transform the timestamps into separate
increasing counters
Reason: clock drift manifested in timestamp
options can be leveraged to fingerprint a
physical machine
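One way to turn timestamp options into separate increasing counters is sketched below. The class and its per-host table are hypothetical; the sketch assumes each host's timestamp values are seen in non-decreasing order and ignores wraparound and the echoed timestamp field.

```python
class TimestampRewriter:
    """Replace each host's TCP timestamp values with a per-host counter."""

    def __init__(self):
        self._tables = {}  # host -> {original timestamp value: counter}

    def rewrite(self, host: str, tsval: int) -> int:
        table = self._tables.setdefault(host, {})
        if tsval not in table:
            # A new value advances the counter by exactly one, whatever the
            # real gap was, so clock rate and drift are no longer visible.
            table[tsval] = len(table) + 1
        return table[tsval]
```

The ordering of timestamps from one host survives, which is what most analyses need, while the tick rate that enables machine fingerprinting does not.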
Testing
Can the transformed traces really be used?
Use p0f to do OS fingerprinting
Use tcpsum to find the number of packets and
bytes in both the original and transformed
traces
Testing cont.
Are the transformed traces really
anonymous?
Check tcpmkpub’s own log file
Look for certain strings in the anonymized traces
e.g. “Document”, “Setting”, “ConfirmFileOp”
Look for values such as IP addresses
Look for string versions of IP addresses
Look for MAC addresses
Check timestamps
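A leak check of this kind can be sketched as a simple byte search over the anonymized trace. The function name and its inputs are hypothetical, and searching for both the 4-byte binary form and the dotted-decimal text form of each known address is one reading of the slide. Short binary patterns can match by chance, so hits need manual review.

```python
import ipaddress


def find_leaks(trace_bytes: bytes, sensitive_strings, ip_addrs):
    """Return a list of (kind, value) pairs found in the anonymized trace."""
    hits = []
    for text in sensitive_strings:
        if text.encode() in trace_bytes:
            hits.append(("string", text))
    for ip in ip_addrs:
        # Binary form, as the address would appear in a packet header.
        if ipaddress.IPv4Address(ip).packed in trace_bytes:
            hits.append(("raw-ip", ip))
        # Text form, as the address might appear inside a payload.
        if ip.encode() in trace_bytes:
            hits.append(("text-ip", ip))
    return hits
```

An empty result does not prove anonymity; it only shows that these particular known values did not survive the transformation.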
Paper contributions
Develops a tool, tcpmkpub, for
implementing arbitrary anonymization
policies
Uses meta-data to help researchers deal
with lost information
Invalid checksums, scanner IPs
Goes beyond IP address obfuscation to explore
many other dangerous details
Timestamps, Ethernet addresses, etc.
Paper weaknesses
Gives only two experiments to show that the
anonymized traces are useful
Could have given some anonymization
results to make the policy clearer
For example, in the scanner case, what addresses
a.b.c.1, a.b.c.2, a.b.c.3 would look
like if they are involved in scanning traffic, and
what if not
Future work
Keep more consistency between the
original and anonymized traces
Study online anonymization
Provide a tool that can easily be used for
validating the anonymized traces
Provide a tool for creating an
anonymization policy for tcpmkpub
Questions?