Automatic for the people


Automatic for the people:
Reducing inadvertent leaks by
personal machines
Landon Cox
Duke University
Inadvertent leaks
• Usability and privacy: A Study of Kazaa ...
‣ Good and Krekelberg, CHI, 2003
‣ In 12 hours, found 150 inboxes on Kazaa
‣ Observed people downloading dummy inbox
• Problem hasn’t gone away
Stories from 2009
Technical solution?
• Servers: Asbestos, HiStar, Flume
• Languages: Jif, Laminar, Resin
• Desktop: PrivacyScope, TightLip
[Figure: a reference monitor mediates process flows to the network, IPC, and files according to a policy; the policy can come from the user, admin, dev, or automation]
Automatic policy specification
• State of the art: pattern matching
‣ Look for strings that look like SSNs, CCs, etc.
‣ find_SSNs, Firefly, SENF, Spider, etc.
‣ A bit brittle and error-prone
‣ High false positive/negative rates
• Let’s take a different approach
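The minimal Python sketch below shows why pattern matching is brittle. It uses the kind of regular expression a scanner might apply to find SSNs. The pattern `SSN_PATTERN` and the sample strings are hypothetical and are not taken from any of the tools named above.

```python
import re

# Hypothetical SSN pattern: 3 digits, 2 digits, 4 digits, hyphen-separated.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssn_like(text: str) -> list:
    """Return every substring that merely looks like an SSN."""
    return SSN_PATTERN.findall(text)


if __name__ == "__main__":
    samples = [
        "SSN: 123-45-6789",        # matched: the intended hit
        "SSN: 123456789",          # missed: same digits, no hyphens (false negative)
        "Order part 555-12-3456",  # matched: not an SSN at all (false positive)
    ]
    for text in samples:
        print(f"{text!r:28} -> {find_ssn_like(text)}")
```

The same format check produces both a false negative and a false positive. Patching one case tends to open another, which is the brittleness the slide refers to.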
Key observations
1) Personal machines often cache sensitive data
2) Servers force clients to access files using crypto
3) Crypto is general technique, used across admin. domains and applications
RedFlag overview
• Identifies processes that store decrypted data
‣ Unobtrusive (requires no user input)
‣ Compatible with legacy applications
‣ Compatible with existing Internet protocols
• High-level insights
‣ Stop trying to figure out what sensitive data looks like
‣ Use heuristics of how sensitive data is handled
Caveats
• We cannot stop all inadvertent leaks
• But we can stop a large, important class of leaks
Trust and threat model
• Uncompromised host
• No IP spoofing or DNS hijacking
• Correct, trusted reference monitor (take your pick)
• Buggy/absent access-control policies
RedFlag system overview
[Figure: RedFlag pipeline: Monitor sockets → Inspect process → Compose rules]
Monitoring sockets
• Goal
‣ Try to identify incoming encrypted data
‣ Only at application level (e.g., SSL)
• Easy for most widely used apps
‣ Look at remote port (e.g., 443 or 993)
• Not always sufficient
‣ Non-standard ports: Skype, Groove, GroupWise
‣ XMPP sends both SSL and non-SSL data to the same port (5222/TCP)
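The port check can be sketched as a lookup table, as in the hypothetical example below. The port list is an illustrative assumption and is not RedFlag's actual table. Any port outside it is treated as ambiguous and handed to a further check.

```python
# Illustrative assumption: ports whose protocols use SSL/TLS from the first byte.
WELL_KNOWN_SSL_PORTS = {
    443: "HTTPS",
    465: "SMTPS",
    993: "IMAPS",
    995: "POP3S",
}


def classify_remote_port(port: int) -> str:
    """Return "ssl" if the remote port implies application-level encryption,
    otherwise "ambiguous" (e.g., Skype on a non-standard port, or XMPP on
    5222, which can carry both SSL and non-SSL data)."""
    return "ssl" if port in WELL_KNOWN_SSL_PORTS else "ambiguous"


if __name__ == "__main__":
    for port in (443, 993, 5222):
        print(port, classify_remote_port(port))
```

A static table is cheap enough to consult on every connection. That cheapness is why the port test comes before anything more expensive.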
Information entropy
• Compute entropy score for ambiguous ports
‣ Negligible performance overhead
‣ If score is above threshold (~7.9 bits/byte), invoke inspection process
• Can induce false positives
‣ Compressed data sent in the clear (e.g., MP3s)
‣ On-the-fly compression schemes (e.g., HTTP Content-Encoding: gzip)
• Luckily, doesn't need to be 100% accurate
‣ Really just a performance optimization to save work
‣ Only used as a first-pass filter
‣ Mistakes are corrected in the inspection phase
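A minimal sketch of the entropy filter follows. It computes the Shannon entropy of the byte-value distribution, H = Σ p·log2(1/p) in bits per byte, and compares it to the ~7.9 threshold from the slide. The function names are hypothetical. The slides do not say how large a sample RedFlag scores, and short samples bias the estimate low (n bytes can never exceed log2(n) bits/byte), so the window size is an assumption left to the caller.

```python
import math
import os
from collections import Counter

# Approximate threshold from the slide, in bits per byte (maximum is 8.0).
ENTROPY_THRESHOLD = 7.9


def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution in `data`."""
    if not data:
        return 0.0
    n = len(data)
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())


def should_inspect(data: bytes, threshold: float = ENTROPY_THRESHOLD) -> bool:
    """First-pass filter: flag the connection if its data looks random."""
    return entropy_bits_per_byte(data) > threshold


if __name__ == "__main__":
    print(should_inspect(os.urandom(64 * 1024)))                 # ciphertext-like
    print(should_inspect(b"GET /index.html HTTP/1.1\r\n" * 100))  # plain text
```

Compressed data such as MP3s also scores near 8 bits/byte, which is exactly the false-positive case listed above. The inspection phase is what filters those out.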
Inspect process
• Goals of inspection
‣ Infer when file write depends on network read
‣ Determine whether file write is decrypted data
• Use taint-tracking
‣ Too slow to perform in critical path of desktop apps
‣ Perform asynchronously via deterministic replay
‣ Fork if network monitor flags process (port or entropy)
‣ Log libc calls in original, use log in replay process
‣ Attach taint-tracker to replayed process (e.g., PIN)
‣ Perform analysis on a free core in the background
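The log-then-replay idea can be illustrated with a toy Python analog. RedFlag itself does this at the libc level with a modified Jockey. Here `Recorder`, `Replayer`, `recv`, and `app` are hypothetical names, and the XOR step is only a stand-in for decryption. The original run executes each nondeterministic call and logs its result. The replay run returns the logged results in order, so it reproduces the original's behavior exactly and a slow taint tracker can follow it later.

```python
import random
from typing import Any, Callable, List, Tuple


class Recorder:
    """Original process: run nondeterministic calls and log (name, result)."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, Any]] = []

    def call(self, fn: Callable[[], Any]) -> Any:
        result = fn()
        self.log.append((fn.__name__, result))
        return result


class Replayer:
    """Replay process: return logged results instead of re-running calls."""

    def __init__(self, log: List[Tuple[str, Any]]) -> None:
        self.log = log
        self.pos = 0

    def call(self, fn: Callable[[], Any]) -> Any:
        name, result = self.log[self.pos]
        if name != fn.__name__:
            raise RuntimeError(f"replay diverged: expected {name}, got {fn.__name__}")
        self.pos += 1
        return result


def recv() -> bytes:
    # Stand-in for a network read whose contents differ on every run.
    return bytes(random.randrange(256) for _ in range(8))


def app(io) -> bytes:
    # Deterministic logic; all nondeterminism goes through io.call().
    data = io.call(recv)
    return bytes(b ^ 0x5A for b in data)  # toy stand-in for decryption


if __name__ == "__main__":
    rec = Recorder()
    original = app(rec)
    replayed = app(Replayer(rec.log))
    print(original == replayed)
```

Only the logged calls need to be intercepted in the critical path. The expensive instrumentation runs against the replay on a spare core.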
Taint tracking
• Implement with PIN
‣ Rewrite instructions to propagate taint
‣ Record taint in shadow memory
• Key questions
‣ What are the taint sources?
‣ What info to send to the policy composer?
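Source table
ID   Source
1    74.125.45.83:443
2    10.212.1.3:443
...  ...
[Figure: every byte in the process address space (e.g., "<!DOCTYPE html PUBLIC ...") has a one-byte taint label in shadow memory; label 000001 refers to source ID 1. When tainted bytes are written to a file, the policy composer receives "/tmp/attach.pdf, 74.125.45.83:443".]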
Fine when there is no ambiguity about the source
But what about ambiguous ports?
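The shadow-memory scheme can be modeled in a few lines of Python. The real tracker is a PIN tool that rewrites instructions. This toy version only models the bookkeeping: a source table, one label per byte, propagation on copy, and a report at the file-write sink. The class and method names are hypothetical.

```python
from typing import Dict, List


class ShadowMemory:
    """Toy byte-granularity taint tracker.

    memory[i] holds a data byte; shadow[i] holds its taint label
    (0 = untainted, otherwise an ID in the source table).
    """

    def __init__(self, size: int) -> None:
        self.memory = bytearray(size)
        self.shadow = [0] * size
        self.sources: Dict[int, str] = {}  # ID -> "ip:port"
        self.next_id = 1

    def _check(self, addr: int, n: int) -> None:
        if addr < 0 or addr + n > len(self.memory):
            raise IndexError("access outside toy address space")

    def _id_for(self, source: str) -> int:
        for label, s in self.sources.items():
            if s == source:
                return label
        label = self.next_id
        self.sources[label] = source
        self.next_id += 1
        return label

    def network_read(self, addr: int, data: bytes, source: str) -> None:
        """Taint source: bytes read from a socket get that socket's ID."""
        self._check(addr, len(data))
        self.memory[addr:addr + len(data)] = data
        self.shadow[addr:addr + len(data)] = [self._id_for(source)] * len(data)

    def copy(self, dst: int, src: int, n: int) -> None:
        """Propagation: a memcpy-like copy moves data and labels together."""
        self._check(src, n)
        self._check(dst, n)
        self.memory[dst:dst + n] = self.memory[src:src + n]
        self.shadow[dst:dst + n] = self.shadow[src:src + n]

    def file_write(self, path: str, addr: int, n: int) -> List[str]:
        """Sink: report "<path>, <source>" for each source feeding the write."""
        self._check(addr, n)
        labels = sorted({label for label in self.shadow[addr:addr + n] if label})
        return [f"{path}, {self.sources[label]}" for label in labels]


if __name__ == "__main__":
    mem = ShadowMemory(64)
    mem.network_read(0, b"<!DOCTYPE html", "74.125.45.83:443")
    mem.copy(32, 0, 14)
    print(mem.file_write("/tmp/attach.pdf", 32, 14))
```

This works because each label maps to a single, unambiguous source.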
Ambiguous ports
• Search process memory for AES s-boxes
‣ S-boxes are set by algorithm designer
‣ S-boxes are unlikely to appear randomly
‣ (also look for well-known transformations)
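The sketch below computes the 256-byte AES S-box from its definition, the multiplicative inverse in GF(2^8) followed by the FIPS-197 affine transform, and then scans a byte string for it. It is not RedFlag's search code. It scans a plain `bytes` object, whereas reading a live library's data section (for example via /proc/<pid>/mem on Linux) is left out. It also does not look for transformed tables such as T-tables. The function names are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_aes_sbox() -> bytes:
    """AES S-box: GF(2^8) inverse (0 maps to 0), then the affine transform."""
    inverse = [0] * 256
    for x in range(1, 256):
        for y in range(1, 256):
            if gf_mul(x, y) == 1:
                inverse[x] = y
                break
    sbox = bytearray(256)
    for x in range(256):
        b = inverse[x]
        sbox[x] = b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63
    return bytes(sbox)


AES_SBOX = build_aes_sbox()


def find_sbox_offsets(image: bytes, table: bytes = AES_SBOX) -> list:
    """Return every offset at which the full 256-byte table appears."""
    offsets = []
    start = image.find(table)
    while start != -1:
        offsets.append(start)
        start = image.find(table, start + 1)
    return offsets


if __name__ == "__main__":
    image = b"\x00" * 1000 + AES_SBOX + b"\x00" * 500  # fake data section
    print(find_sbox_offsets(image))
```

A full 256-byte permutation is very unlikely to occur by chance. Matching the whole table is therefore a strong signal that the image contains an AES implementation.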
Ambiguous ports
• If we find s-boxes in a library data section
‣ Assume image is a crypto library
‣ Vast majority of crypto libraries include AES implementation
• Instrument lib to set “crypto bit” of inbound taint labels
‣ If crypto bit == 1, network data was “routed” through crypto lib
‣ If crypto bit == 0, assume network data was not decrypted
• Also use s-boxes as taint source
‣ Data derived from s-boxes have “AES bit” set
‣ Can use to gauge strength of crypto algorithm
Taint label (byte): 1 | 1 | 0 0 0 0 0 1 = AES bit | crypto bit | ID index
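The one-byte label can be packed and unpacked with bit masks. This sketch assumes the left-to-right order in the slide's figure maps to most-to-least significant bits: AES bit = bit 7, crypto bit = bit 6, and a 6-bit ID index in bits 0 to 5. That mapping is an assumption, and the helper names are hypothetical.

```python
AES_BIT = 0x80     # bit 7 (assumed): data derived from an AES s-box
CRYPTO_BIT = 0x40  # bit 6 (assumed): data routed through a crypto library
ID_MASK = 0x3F     # bits 0-5 (assumed): index into the source table


def make_label(source_id: int, crypto: bool = False, aes: bool = False) -> int:
    """Pack a source ID and the two flag bits into one label byte."""
    if not 0 <= source_id <= ID_MASK:
        raise ValueError("source ID must fit in 6 bits")
    return (AES_BIT if aes else 0) | (CRYPTO_BIT if crypto else 0) | source_id


def decode_label(label: int) -> dict:
    """Unpack a label byte into its three fields."""
    return {
        "aes": bool(label & AES_BIT),
        "crypto": bool(label & CRYPTO_BIT),
        "id": label & ID_MASK,
    }


def through_crypto_lib(label: int) -> int:
    """What an instrumented crypto library would do to labels of its output."""
    return label | CRYPTO_BIT


if __name__ == "__main__":
    label = make_label(1, crypto=True, aes=True)
    print(format(label, "08b"), decode_label(label))
```

Under this layout, the slide's example 1 1 0 0 0 0 0 1 decodes to AES bit set, crypto bit set, source ID 1.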
Compose rules
• Taint-tracking gives three pieces of info
‣ Description of network source
‣ If data was routed through crypto library
‣ If data was derived from AES s-box
• Can use this to compose policies
Compose rules
• Same source
‣ Allow sensitive files to be copied back to their source
‣ Raise alert otherwise
‣ Generalize hostnames (e.g., *.google.com)
• Obfuscation vs. confidentiality
‣ Many P2P clients use crypto to obfuscate
‣ Aren’t trying to protect data, so use weak algorithms
‣ (e.g., BitTorrent and LimeWire explicitly do not support AES)
‣ If ambiguous port + no AES, then ignore file
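The two rules can be combined into one decision function, sketched below under stated assumptions. The `TaintInfo` fields mirror the three pieces of information from taint tracking. `generalize_host` naively keeps the last two DNS labels (so mail.google.com becomes *.google.com), which would be wrong for names like example.co.uk. All names here are hypothetical and not RedFlag's API.

```python
import ipaddress
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class TaintInfo:
    source_host: str      # where the file's data originally came from
    ambiguous_port: bool  # True if the remote port did not imply SSL
    crypto_bit: bool      # data was routed through a crypto library
    aes_bit: bool         # data was derived from an AES s-box


def generalize_host(host: str) -> str:
    """Naive generalization: keep the last two DNS labels; leave IPs alone."""
    try:
        ipaddress.ip_address(host)
        return host
    except ValueError:
        pass
    labels = host.lower().split(".")
    return host.lower() if len(labels) <= 2 else "*." + ".".join(labels[-2:])


def is_sensitive(t: TaintInfo) -> bool:
    if not t.ambiguous_port:
        return True       # standard SSL port: decrypted data is sensitive
    if not t.crypto_bit:
        return False      # never passed through a crypto library
    return t.aes_bit      # ambiguous port + no AES: likely obfuscation, ignore


def check_upload(t: TaintInfo, dest_host: str) -> str:
    """Allow copies back to the (generalized) source; alert otherwise."""
    if not is_sensitive(t):
        return "allow"
    dest = dest_host.lower()
    pattern = generalize_host(t.source_host)
    same_source = dest == t.source_host.lower() or fnmatchcase(dest, pattern)
    return "allow" if same_source else "alert"


if __name__ == "__main__":
    mail = TaintInfo("mail.google.com", False, True, True)
    print(check_upload(mail, "docs.google.com"))     # back to the source domain
    print(check_upload(mail, "upload.example.org"))  # somewhere else
```

Here `fnmatchcase` is used instead of `fnmatch` so that matching does not depend on the host OS's case rules.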
RedFlag implementation
• Runs on Ubuntu 8.10
• Modified Jockey for logging/replay
‣ Supports multi-threaded programs
‣ User-level thread library
• PIN tool for tainting
‣ Based on sequential taint tracker from Speck
‣ Modified to allow tainting during replay
‣ Implemented s-box search, crypto and AES bits in taint label
Evaluation
• Accuracy
‣ How well can RedFlag identify crypto libraries using s-boxes?
‣ How well does RedFlag categorize sensitive files?
• Performance
‣ Will asynchronous taint-tracking fall behind?
Identifying crypto libraries
• Looked at 10 Ubuntu programs
‣ Email: checkgmail, Thunderbird
‣ IM: Pidgin
‣ P2P: Azureus, LimeWire, Skype, Transmission
‣ Web: Firefox, Opera, wget
• Successfully identified crypto libs in all
‣ Including custom implementations, plugins (Flash Player)
‣ Interesting case: Opera folds crypto into its executable
Categorizing sensitive files
• Non-sensitive files
‣ Used Firefox
‣ Loaded 30 most popular websites (Alexa)
‣ RedFlag produced no false positives/negatives
• Sensitive files
‣ Downloaded 17 representative sensitive docs
‣ Firefox, Thunderbird, Pidgin
Categorizing sensitive files
Taint-tracking performance
Conclusions
• RedFlag automates policy specification
‣ Heuristic-based approach
‣ Monitor process behavior, not file content
‣ Sensitive files usually downloaded using crypto
‣ Deal with ambiguous ports using entropy scores, AES s-boxes
• Evaluation highlights
‣ Automatically identified crypto libraries
‣ Correctly categorized files in 45/47 scenarios
‣ No false positives, three false negatives
‣ Sufficient idle time in long-running process
Thanks!
I’m happy to take questions