Mapping Internet Sensors with Probe Response


By John Bethencourt, Jason Franklin, and Mary Vernon
Computer Sciences Department
University of Wisconsin, Madison
Published in the Proceedings of the 14th USENIX Security
Symposium
Presented by: Peter Matthews
Outline
 Internet Sensor Networks
 Probe Response Attack
 Case Study: SANS Internet Storm Center
 Simulation Results
 Generalizing the attack
 Countermeasures
 Related Work
 Conclusion
Internet Sensor Networks
 A collection of systems which monitor portions of the
Internet and produce statistics related to Internet
traffic patterns and anomalies.
 Log collection and analysis centers
 Collaborative intrusion detection systems
 Honeypots / honeynets
 Network telescopes / darknets
Internet Sensor Networks
 Internet sensors are useful for distributed intrusion
detection and monitoring such as:
 Quickly detecting worm outbreaks and new attacks
before a large number of vulnerable systems are
compromised
 Providing useful aggregation of the occurrence of
relatively rare events
 Determining the prevalence of malicious activity like
port scans, DoS attacks, and botnets
 Blacklisting hosts controlled by malicious users
Sensor Privacy is Critical
 The integrity of an Internet sensor network is based upon the
assumption that the IP addresses of systems that serve as sensors
are secret.
 If the addresses were known, information obtained by the
network could no longer be considered an accurate picture of
Internet activity because attackers could
 Avoid the addresses in malicious activity like port scanning and
worm spreading, allowing it to go undetected
 Skew sensor statistics to hide malicious activity
 Flood the addresses with extraneous activity, causing valid
information to be lost in noise
 Damage is permanent
 Organizations cannot easily change the IP addresses available to
them
 Internet sensor networks cannot arbitrarily pick who will
participate
General Attack Idea
 Probe an IP address with activity that will be reported if the
address is monitored
 Wait for next report to be published, check for the activity,
and decide whether the address was monitored
 Repeat for every IP address
Making the Problem Tractable
 There are simply too many addresses to check sequentially
 ~2.1 billion valid, routable IP addresses
 Most logs only submitted to the ISC hourly
 So, check many in parallel
 Very small fraction of IP addresses are monitored, so send
the same probe to many addresses
 If no activity is reported, then all of them can be ruled out
 Else, the report provides the number of monitored addresses
 Since activity reported by port, send probes with different
ports to run many independent tests at the same time
 Divide and conquer
 Partition the IP space into search intervals to manage this
parallelism
 Only one TCP packet necessary for each probe
Probe Response Algorithm
 The basic probe response algorithm operates in two
stages.
 Stage I -- Probe the entire routable IP space to count the
number of sensors in each search interval, Si. Drop
empty search intervals.
 Stage II -- Iteratively probe each remaining interval, Ri,
discarding empty intervals until all individual sensors
are located.
First Stage
 Begin with the list of ~2.1 billion valid IP addresses and n
low-utilization ports
 Divide the IP range into n search intervals S1, S2, …, Sn
 Send a TCP SYN packet on port Pi to each address in Si
 Wait one time interval and retrieve the port report
 Rule out intervals corresponding to ports with no
activity
[Diagram: the IP space split into search intervals S1, S2, S3, …, Sn, with probe packets sent on port P1 to S1, port P2 to S2, port P3 to S3, …, port Pn to Sn]
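As a rough illustration, the first stage can be sketched in Python (this is my own sketch, not the authors' implementation; `send_syn` is a hypothetical stub standing in for emitting an actual TCP SYN packet):

```python
# Illustrative sketch of Stage I: partition the address list into n search
# intervals, probe each interval on its own low-utilization port, then keep
# only the intervals whose port shows activity in the next port report.

def send_syn(addr, port):
    # Placeholder: a real attacker would emit a single TCP SYN packet here.
    pass

def stage_one(addresses, ports):
    """Probe each search interval on its own port; return port -> interval."""
    n = len(ports)
    size = len(addresses) // n + 1              # addresses per interval
    probes = {}
    for i, port in enumerate(ports):
        interval = addresses[i * size:(i + 1) * size]
        for addr in interval:
            send_syn(addr, port)                # one packet per address
        probes[port] = interval
    return probes

def prune(probes, port_report):
    """Drop intervals whose probe port showed no activity in the report."""
    return {p: ivl for p, ivl in probes.items() if port_report.get(p, 0) > 0}
```

For instance, probing ten addresses split across ports 80 and 81, then seeing activity only on port 80, rules out the entire second interval in one step.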
Second Stage
 Distribute ports among k remaining intervals R1, R2,
…, Rk, assigning n/k ports to each
 For each Ri
 Divide into n/k + 1 subintervals
 Send a probe on port Pj to each address in the jth
subinterval
 No need to probe the last subinterval, as the number
of monitored addresses it contains can be inferred
from the parent interval's total
 Repeat second stage with non-empty subintervals until
all addresses are marked as monitored or unmonitored
[Diagram: an interval divided into subintervals, each probed on a distinct port P1, P2, P3, P4]
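The recursive narrowing of the second stage can be sketched as follows (an illustrative Python sketch, not the authors' code; `count(sub)` models probing a subinterval on a dedicated port and reading the sensor count back from the next port report, and the split factor of 4 is an arbitrary choice for the example):

```python
# Illustrative sketch of Stage II: recursively split each non-empty interval
# into subintervals, probe all but the last on distinct ports, and infer the
# last subinterval's count by subtraction from the parent interval's total.

def stage_two(interval, total, count):
    """Locate the monitored addresses in `interval`, given it holds `total` sensors."""
    if total == 0:
        return []                      # empty interval: discard without probing
    if len(interval) <= 1:
        return list(interval)          # narrowed down to a single address
    m = 4                              # subintervals per split (n/k + 1 in the talk)
    size = -(-len(interval) // m)      # ceiling division
    subs = [s for s in (interval[i * size:(i + 1) * size] for i in range(m)) if s]
    found, counted = [], 0
    for sub in subs[:-1]:              # probe all but the last subinterval
        c = count(sub)
        found += stage_two(sub, c, count)
        counted += c
    # The last subinterval is never probed: its count is inferred by subtraction.
    found += stage_two(subs[-1], total - counted, count)
    return found
```

For example, with sensors at addresses 3, 17, and 42 in a 64-address space, the recursion pinpoints exactly those three addresses.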
Sample Run

[Diagram omitted: sample run of the probe response algorithm]

Noise

Number of Ports   Reports of Activity
561               ≤ 5
19,364            ≤ 10
41,357            ≤ 15
51,959            ≤ 20
56,305            ≤ 25

 What if other activity is present in port reports?
 For a large number of ports, there is a very low
average level of activity
 Use only ports that consistently have fewer than k
reports per time interval
 Send k packets in each probe
 Divide the number of reports by k and round down
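The counting trick above can be sketched in a few lines (illustrative, the function name is my own): each monitored address contributes exactly k report entries because the probe is duplicated k times, while background noise on a suitably chosen port contributes fewer than k, so integer division removes the noise.

```python
# Noise cancellation: with k-fold duplicated probes on a port whose background
# noise is known to stay below k reports per interval, floor division by k
# yields the exact number of monitored addresses probed on that port.

def sensors_from_report(report_count, k):
    """Estimate the number of monitored addresses behind a noisy port report."""
    return report_count // k
```

For instance, 3 sensors probed with k = 6 copies each, plus 4 noise reports, gives a report count of 22, and `sensors_from_report(22, 6)` recovers 3.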
Case Study: SANS Internet Storm Center
 To evaluate the threat of probe response attacks, the
paper analyzes the feasibility of mapping the SANS
Internet Storm Center
 One of the most important examples of systems
which collect, analyze, and report data from
Internet sensors
 A challenging network to map
 Over 680,000 IP addresses monitored
 These are broadly scattered across the IP
address space
ISC Sensors
 Currently, the ISC collects packet filter logs
 Logs primarily contain failed connection attempts
 More than 2,000 organizations and individuals participate
 Logs are typically uploaded on an hourly basis
ISC Reports
 The ISC publishes a number of
reports and statistics
 Attack uses “port reports”
 Lists the amount of activity on
each destination port
Simulation Scenarios
 T1 attacker: 1.544 Mbps of upload bandwidth
 Fractional T3 attacker: 38.4 Mbps of upload
bandwidth
 OC6 attacker: 384 Mbps of upload bandwidth
 Monitored locations based on an accurate, if obfuscated,
representation of the IP addresses monitored by the
ISC
Results
Bandwidth       Set of Addresses                 Data Sent   Time to Map
OC6             ISC                              1300 GB     2 days, 22 hours
Fractional T3   ISC                              687 GB      4 days, 16 hours
T1              ISC                              440 GB      33 days, 17 hours
Fractional T3   Average cluster size ≥ 10        ~600 GB     ~2 days
Fractional T3   Totally random (no clustering)   ~860 GB     ~9 days
Supersets and Subsets
 Superset of the monitored addresses
 E.g., only interested in simply avoiding detection
 If a T1 attacker allows a .001 false negative rate
 Runtime is reduced from 33 days, 17 hrs to 15 days, 18 hrs
 Misses 26 percent of the sensors
 Subset of the monitored addresses
 E.g., only interested in flooding the monitored
addresses with noise
 If a T3 attacker allows a .94 false positive rate
 Runtime is reduced from 112 to 78 hrs
 Returns 3.5 million false positives
Key Simulation Results
 Probe response attacks are feasible and thus pose a real
threat
 Both a real set of monitored IP addresses and various
synthetic sets can be mapped in reasonable time
 Mapping is possible even with very limited resources
 An attacker with only a DSL line could do it in ~4 months
 Time to complete only depends on upload bandwidth
 Does not require significant state or complete TCP
handshakes
 Thus, botnet utilization would not pose a problem
Generalizing the Attack
 In this attack, an attacker gains information by:
 Sending probes with different destination ports to
different ranges of IP addresses
 Noting for which ports activity is reported
 Using this information to determine the set of IP
addresses that could have received probes
 The destination port appearing in the probe sent out
and later in the port reports is used by the attacker as a
covert channel in a message to themselves
Covert Channels
 Many fields typically appearing in Internet sensor
network reports are suitable for use by attackers as
covert channels in a message to themselves
 Using one or more of these covert channels, an attacker
can encode information about a destination IP address in a
packet
 Candidate fields:
 Time / date
 Source IP
 Source network
 Source port
 Destination network
 Destination port
 Protocol (TCP or UDP)
 Packet length
 Captured payload signature
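The destination-port channel used by the basic attack can be made concrete with a minimal sketch (my own illustration, not from the paper; the base port 33000 is an arbitrary choice):

```python
# Covert channel via destination port: the port encodes which search interval
# a probe targeted, so seeing that port in a public port report tells the
# attacker which interval contains monitored addresses.
BASE_PORT = 33000  # arbitrary low-utilization starting port for the example

def encode_interval(index):
    """Pick the probe's destination port for search interval `index`."""
    return BASE_PORT + index

def decode_interval(port):
    """Recover the search-interval index from a port seen in a report."""
    return port - BASE_PORT
```

Any of the other fields listed above (source address, packet length, payload signature, …) could carry the interval index the same way, which is why masking only one field is not a sufficient countermeasure.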
Countermeasures
 Existing approaches to hide sensitive report data
 Hashing, Encryption, and Permutations
 Hash report fields such as source address to hide data but
still allow tests for matching
 Prefix-preserving permutations obscure source / destination
host addresses while still allowing useful analysis of
relationships between hosts
 Bloom Filters
 Data structure allowing space-efficient set membership
tests with a configurable false positive rate
 Vulnerable to iterative probe response attacks
 These do not prevent probe attacks, as sufficient covert
channels remain
IPv6
 Increases IP addresses from 32 bits to 128 bits
 2^128 = 3.4 × 1038
 Makes TCP/UDP scanning impractical
 Effective countermeasure if widely adopted
 However, IPv6 currently accounts for a tiny fraction of the
addresses in use and the traffic on the public Internet
 Like egress filtering, widespread adoption is difficult to
achieve
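A back-of-the-envelope check (my own arithmetic, not from the paper) shows why the larger address space defeats scanning:

```python
# Even at one billion probes per second, exhaustively scanning the 2^128
# IPv6 address space would take on the order of 10^22 years.
PROBES_PER_SECOND = 1e9
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
years_to_scan = 2 ** 128 / PROBES_PER_SECOND / SECONDS_PER_YEAR
# years_to_scan is roughly 1.1e22
```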
Limiting Information
 Restrict the information provided in public reports in some
way
 Keep reports private
 Eliminate all public access to reports
 Limits the usefulness of the sensor network
 Only publish most significant events
 Still provides some useful information, but not a complete
depiction of Internet conditions
 Attackers may be able to avoid detection by keeping their
levels of activity below thresholds
 Query limiting
 Require payment, computational tasks, or CAPTCHA to
perform query
 Only slows mapping attacks
Delayed Reporting
 Retain the reports for a specified period of time
before their release
 For example, release last week's data
 Forces attacker to either wait a long period
between iterations of attack or use a non-adaptive
algorithm
 Delaying reports greatly reduces the usefulness of
the sensor network in providing real-time
notification of new attacks and other phenomena

Random Input Sampling
 Use a random sample of the logs received in a time
interval in order to generate reports
 E.g., suppose an analysis center has an 80% likelihood of
discarding a log it receives
 To reduce the false negative rate, the attacker would
have to send multiple probes
 To reduce the false negative rate from 80% to 1%, an
attacker would need roughly a twenty-fold increase in
bandwidth
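The twenty-fold figure follows from a short calculation (my own working of the arithmetic stated above):

```python
import math

# If the analysis center keeps each log with probability 0.2 (drops 80%),
# a probed sensor goes unreported only when all k duplicate probes are
# dropped, which happens with probability 0.8**k. Solving 0.8**k <= 0.01
# gives the smallest bandwidth multiplier that pushes the false negative
# rate down to 1%.
k = math.ceil(math.log(0.01) / math.log(0.8))
# k works out to 21, i.e. roughly a twenty-fold increase in probe traffic.
```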
Conclusion – Paper Strengths
 Presents a simple, clear, and feasible attack
 Presents a good survey of existing and proposed
countermeasures to attacks on sensor anonymity
 Well written
Conclusion – Paper Weaknesses
 Countermeasures are not explored to same depth as
attack
 No simulation-based examination of countermeasure
effectiveness
 Simulation may be overly simplistic
 Unfortunate real-world effects like packet loss might
have a fairly significant impact, given that each step of the
iterative algorithm builds on the results of the last
Future Work
 Non-adaptive algorithms
Mentioned in the paper; these do not require an iterative approach
 Effective countermeasures
 Combining random sampling, query limiting, and minimal
level of randomly delayed log reporting?
 Is there simply some inevitable trade-off between utility
and anonymity?
 How can such a trade-off be quantified?
 Public-private approaches – significant events made public,
the rest requiring privileged access
 An opposite, if idealistic, approach – make sensors so
pervasive that anonymity is no longer required