Mapping Internet Sensors with Probe Response
By John Bethencourt, Jason Franklin, and Mary Vernon
Computer Sciences Department
University of Wisconsin, Madison
Published in the Proceedings of the 14th USENIX Security
Symposium
Presented by: Peter Matthews
Outline
Internet Sensor Networks
Probe Response Attack
Case Study: SANS Internet Storm Center
Simulation Results
Generalizing the attack
Countermeasures
Related Work
Conclusion
Internet Sensor Networks
A collection of systems which monitor portions of the
Internet and produce statistics related to Internet
traffic patterns and anomalies.
Log collection and analysis centers
Collaborative intrusion detection systems
Honeypots / honeynets
Network telescopes / darknets
Internet Sensor Networks
Internet sensors are useful for distributed intrusion
detection and monitoring tasks such as:
Quickly detecting worm outbreaks and new attacks
before a large number of vulnerable systems are
compromised
Providing useful aggregation of the occurrence of
relatively rare events
Determining the prevalence of malicious activity like
port scans, DoS attacks, and botnets
Blacklisting hosts controlled by malicious users
Sensor Privacy is Critical
The integrity of an Internet sensor network is based upon the
assumption that the IP addresses of systems that serve as sensors
are secret.
If the addresses were known, information obtained by the
network could no longer be considered an accurate picture of
Internet activity because attackers could
Avoid the addresses in malicious activity like port scanning and
worm spreading, allowing it to go undetected
Skew sensor statistics to hide malicious activity
Flood the addresses with extraneous activity, causing valid
information to be lost in noise
Damage is permanent
Organizations cannot easily change the IP addresses available to
them
Internet sensor networks cannot arbitrarily pick who will
participate
General Attack Idea
Probe an IP address with activity that will be reported if the
address is monitored
Wait for next report to be published, check for the activity,
and decide whether the address was monitored
Repeat for every IP address
Making the Problem Tractable
There are simply too many addresses to check sequentially
~2.1 billion valid, routable IP addresses
Most logs only submitted to the ISC hourly
So, check many in parallel
Very small fraction of IP addresses are monitored, so send
same probe to many addresses
If no activity is reported then can rule out all of them
Else, report provides the number of monitored addresses
Since activity reported by port, send probes with different
ports to run many independent tests at the same time
Divide and conquer
Partition the IP space into search intervals to manage this
parallelism
Only one TCP packet necessary for each probe
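As a rough back-of-envelope estimate of why this is tractable (assuming 40-byte TCP SYN packets and ignoring link-layer overhead; these figures are illustrative, not taken from the paper), one pass over the routable address space needs on the order of 84 GB of upload traffic:

    # Back-of-envelope estimate of one full probing pass (illustrative numbers only).
    ROUTABLE_ADDRESSES = 2_100_000_000    # ~2.1 billion, as stated above
    SYN_PACKET_BYTES = 40                 # assumed minimal TCP/IP SYN, no options

    total_bytes = ROUTABLE_ADDRESSES * SYN_PACKET_BYTES
    print(f"One pass sends roughly {total_bytes / 1e9:.0f} GB")

    # Time for one pass at the upload rates considered later (bits per second).
    for name, bps in [("T1", 1.544e6), ("Fractional T3", 38.4e6), ("OC6", 384e6)]:
        seconds = total_bytes * 8 / bps
        print(f"{name}: about {seconds / 86400:.2f} days per pass")

The totals reported later in the talk are larger because the attack runs several stages, sends multiple copies of each probe to cope with noise, and pays per-packet overhead.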
Probe Response Algorithm
The basic probe response algorithm operates in two
stages.
Stage I -- Probe the entire routable IP space to count the
number of sensors in each search interval, Si. Drop
empty search intervals.
Stage II -- Iteratively probe each remaining interval, Ri,
discarding empty intervals until all individual sensors
are located.
First Stage
Begin with list of 2.1 billion valid IP addresses and n
low utilization ports
Divide IP range into n search intervals S1, S2, …, Sn
Send TCP SYN packet on port Pi to each address in Si
Wait time interval and retrieve port report
Rule out intervals corresponding to ports with no
activity
[Diagram: first stage probing, with packets on port P1 sent to interval S1, port P2 to S2, port P3 to S3, …, port Pn to Sn]
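A minimal sketch of the first stage, assuming hypothetical send_syn and get_port_report helpers (a real implementation would craft raw TCP SYN packets and scrape the published port reports; neither helper comes from the paper):

    # First stage (sketch, not the paper's code). send_syn(ip, port) is assumed to
    # emit one TCP SYN; get_port_report() is assumed to return {port: activity count}
    # once the next report is published.
    def first_stage(addresses, ports, send_syn, get_port_report):
        n = len(ports)
        size = -(-len(addresses) // n)                 # ceiling division
        intervals = [addresses[i*size:(i+1)*size] for i in range(n)]

        for port, interval in zip(ports, intervals):
            for ip in interval:
                send_syn(ip, port)                     # one packet per probed address

        report = get_port_report()                     # wait for the next port report
        # Keep only the intervals whose port showed activity, with their counts.
        return [(interval, report[port])
                for port, interval in zip(ports, intervals)
                if report.get(port, 0) > 0]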
Second Stage
Distribute ports among k remaining intervals R1, R2,
…, Rk, assigning n/k ports to each
For each Ri
Divide into n/k + 1 subintervals
Send a probe on port Pj to each address in the jth
subinterval
No need to probe last subinterval, as can infer number
of monitored addresses it contains from total for parent
interval
Repeat second stage with non-empty subintervals until
all addresses are marked as monitored or unmonitored
[Diagram: sample run of the second stage, with ports P1–P4 assigned to successive subintervals]
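A sketch of one second-stage refinement step under the same hypothetical helpers; the count for the last subinterval is inferred from the parent interval's total rather than probed:

    # Refine one non-empty interval (sketch; helpers as in the first-stage sketch).
    def refine(interval, total_count, ports, send_syn, get_port_report):
        m = len(ports) + 1                             # n/k ports -> n/k + 1 subintervals
        size = -(-len(interval) // m)
        subs = [interval[i*size:(i+1)*size] for i in range(m)]

        for port, sub in zip(ports, subs):             # the last subinterval gets no probes
            for ip in sub:
                send_syn(ip, port)

        report = get_port_report()
        counts = [report.get(port, 0) for port in ports]
        counts.append(total_count - sum(counts))       # inferred count for the unprobed subinterval

        # Only the non-empty subintervals need further refinement.
        return [(sub, c) for sub, c in zip(subs, counts) if c > 0]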
Noise
Ports       Reports of Activity
561         ≤ 5
19,364      ≤ 10
41,357      ≤ 15
51,959      ≤ 20
56,305      ≤ 25
What if other activity is
present in port reports?
For a large number of ports,
there is a very low average level
of activity
Use only ports that
consistently have less than k
reports per time interval
Send k packets in each probe
Divide the number of reports
by k and round down
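The compensation rule on this slide amounts to a floor division; the value of k and the report counts below are illustrative only:

    # Noise compensation (illustrative values, assuming a port with fewer than k noise reports).
    k = 5                                     # copies of each probe

    def sensors_from_report(reported_count, k):
        # Floor-divide so that up to k-1 noise reports cannot add a phantom sensor.
        return reported_count // k

    # Example: 3 monitored addresses each hit k times, plus 2 noise reports.
    reported = 3 * k + 2
    print(sensors_from_report(reported, k))   # -> 3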
Case Study: SANS Internet Storm Center
To evaluate the threat of probe response attacks, the
paper analyzes the feasibility of mapping the SANS
Internet Storm Center
One of the most important examples of systems
which collect, analyze, and report data from
Internet sensors
A challenging network to map
Over 680,000 IP addresses monitored
These are broadly scattered across the IP
address space
ISC Sensors
Currently, the ISC collects packet filter logs
Logs primarily contain failed connection attempts
More than 2,000 organizations and individuals participate
Logs are typically uploaded on an hourly basis
ISC Reports
The ISC publishes a number of
reports and statistics
Attack uses “port reports”
Lists the amount of activity on
each destination port
Simulation Scenarios
T1 attacker: 1.544 Mbps of upload bandwidth
Fractional T3 attacker: 38.4 Mbps of upload bandwidth
OC6 attacker: 384 Mbps of upload bandwidth
Monitored locations are based on an accurate, if obfuscated,
representation of the IP addresses monitored by the ISC
Results
Bandwidth        Set of Addresses                 Data Sent   Time to Map
OC6              ISC                              1300 GB     2 days, 22 hours
Fractional T3    ISC                              687 GB      4 days, 16 hours
T1               ISC                              440 GB      33 days, 17 hours
Fractional T3    Average cluster size ≥ 10        ~600 GB     ~2 days
Fractional T3    Totally random (no clustering)   ~860 GB     ~9 days
Supersets and Subsets
Superset of the monitored addresses
E.g., only interested in simply avoiding detection
If a T1 attacker allows a .001 false negative rate
Runtime reduced from 33 days and 17 hrs to 15 days and 18 hrs
Misses 26 percent of the sensors
Subset of the monitored addresses
E.g., only interested in flooding the monitored
addresses with noise
If a T3 attacker allows a .94 false positive rate
Runtime reduced from 112 to 78 hrs
Returns 3.5 million false positives
Key Simulation Results
Probe response attacks are feasible and thus pose a real
threat
Both a real set of monitored IP addresses and various
synthetic sets can be mapped in reasonable time
Mapping is possible even with very limited resources
An attacker with only a DSL line could do it in ~4 months
Time to complete only depends on upload bandwidth
Does not require significant state or complete TCP
handshakes
Thus, distributing the attack across a botnet would pose no difficulty
Generalizing the Attack
In the attack just described, an attacker gains information by:
Sending probes with different destination ports to
different ranges of IP addresses
Noting for which ports activity is reported
Using this information to determine the set of IP
addresses that could have received probes
The destination port, which appears in the outgoing probe
and later in the port reports, thus serves as a covert channel:
the attacker is effectively sending a message to themselves
Covert Channels
Many fields that typically appear in Internet sensor
network reports are suitable for use by attackers as
covert channels in a message to themselves
Using one or more of these
covert channels, an attacker
can encode information about a
destination IP address in a
packet
Time / date
Source IP
Source network
Source port
Destination network
Destination port
Protocol (TCP or UDP)
Packet length
Captured payload signature
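As an illustration only (not a construction from the paper), the sketch below spreads the 32 bits of a probed destination address across two 16-bit report fields, the destination port and the source port; the attack described earlier needs nothing more than the destination port plus interval bookkeeping:

    import ipaddress

    # Illustrative covert-channel encoding: recover the probed destination address
    # from two fields that commonly survive into reports.
    def encode(dst_ip):
        ip = int(ipaddress.IPv4Address(dst_ip))
        dst_port = (ip >> 16) & 0xFFFF     # high 16 bits -> destination port
        src_port = ip & 0xFFFF             # low 16 bits  -> source port
        return dst_port, src_port

    def decode(dst_port, src_port):
        return str(ipaddress.IPv4Address((dst_port << 16) | src_port))

    ports = encode("192.0.2.7")
    print(ports, decode(*ports))           # -> (49152, 519) 192.0.2.7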
Countermeasures
Existing approaches to hide sensitive report data
Hashing, Encryption, and Permutations
Hash report fields such as source address to hide data but
allow test for matching
Prefix-preserving permutations obscure source / destination
host addresses while still allowing useful analysis of
relationships between hosts
Bloom Filters
A data structure that allows space-efficient set membership
tests with a configurable false positive rate
Vulnerable to iterative probe response attacks
These do not prevent probe attacks, as sufficient covert
channels remain
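A minimal Bloom filter sketch (sizes and hash construction chosen arbitrarily here) showing why set membership remains testable, which is exactly the property an iterative probe response attack exploits:

    import hashlib

    # Tiny Bloom filter over source addresses (illustrative parameters only).
    class Bloom:
        def __init__(self, m_bits=1 << 20, k_hashes=4):
            self.m, self.k = m_bits, k_hashes
            self.bits = bytearray(m_bits // 8)

        def _positions(self, item):
            for i in range(self.k):
                h = hashlib.sha256(f"{i}:{item}".encode()).digest()
                yield int.from_bytes(h[:8], "big") % self.m

        def add(self, item):
            for p in self._positions(item):
                self.bits[p // 8] |= 1 << (p % 8)

        def __contains__(self, item):
            return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

    bf = Bloom()
    bf.add("198.51.100.23")
    print("198.51.100.23" in bf)   # True: anyone holding the filter can test candidates
    print("203.0.113.9" in bf)     # almost certainly False (subject to the false positive rate)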
IPv6
Increases IP addresses from 32 bits to 128 bits
2^128 ≈ 3.4 × 10^38
Makes TCP/UDP scanning impractical
Effective countermeasure if widely adopted
However, IPv6 currently accounts for a tiny fraction of the
addresses in use and of the traffic on the public Internet
Like egress filtering, widespread adoption is difficult to
achieve
Limiting Information
Restrict the information provided in public reports in some
way
Keep reports private
Eliminate all public access to reports
Limits the usefulness of the sensor network
Only publish most significant events
Still provides some useful information, but not a complete
depiction of Internet conditions
Attackers may be able to avoid detection by keeping their
levels of activity below thresholds
Query limiting
Require payment, computational tasks, or CAPTCHA to
perform query
Only slows mapping attacks
Delayed Reporting
Retain the reports for a specified period of time
before their release
For example, release last week's data
Forces attacker to either wait a long period
between iterations of attack or use a non-adaptive
algorithm
Delaying reports greatly reduces the usefulness of
the sensor network in providing real-time
notification of new attacks and other phenomena
Random Input Sampling
Use a random sample of the logs received in a time
interval to generate reports
E.g., suppose an analysis center has an 80% likelihood of
discarding a log it receives
To reduce the false negative rate, the attacker would
have to send multiple probes
To reduce the false negative rate from 80% to 1%, an
attacker would need roughly a twenty-fold increase in
bandwidth
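The twenty-fold figure follows from treating each probe as being discarded independently with probability 0.8; a quick check under that assumption:

    import math

    drop_prob = 0.80     # chance a single log entry is discarded
    target_fn = 0.01     # desired false negative rate

    # Need drop_prob ** m <= target_fn for m probes per address, so:
    m = math.ceil(math.log(target_fn) / math.log(drop_prob))
    print(m)             # -> 21, i.e. roughly a twenty-fold increase in probes and bandwidth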
Conclusion – Paper Strengths
Presents a simple, clear, and feasible attack
Presents a good survey of existing and proposed
countermeasures to attacks on sensor anonymity
Well written
Conclusion – Paper Weaknesses
Countermeasures are not explored to the same depth as the
attack
No simulation-based examination of countermeasure
effectiveness
Simulation may be overly simplistic
Real-world effects like packet loss could significantly
degrade accuracy, since each step of the iterative
algorithm builds on the results of the last
Future Work
Non-adaptive algorithms
Mentioned in the paper; these do not require an iterative approach
Effective countermeasures
Combining random sampling, query limiting, and a minimal
level of randomly delayed log reporting?
Is there simply some inevitable trade-off between utility
and anonymity?
How could such a trade-off be quantified?
Public-private approaches – significant events made public,
the rest requiring privileged access
An opposite, if idealistic, approach – make sensors so
pervasive that anonymity is no longer required