Cores - Systems and Networking - University of California, San Diego
Download
Report
Transcript Cores - Systems and Networking - University of California, San Diego
Surviving Internet Catastrophes
Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith
Marzullo, and Geoffrey M. Voelker
Hot Topics in Operating Systems (HotOS’03)
USENIX Annual Technical Conference (USENIX’05)
University of California, San Diego - 2004
A typical day of an Internet worm…
Host A runs the
Widows OS
Host B runs runs
the Sux OS
I exploit a
vulnerability in the
Widows OS…
Shut
up!!
A
Data
…but not in the Sux
OS!
B
2
Outline
Introduction
System model
Host diversity
Searching for replica sets
heuristics and simulations
The Phoenix Recovery System
Implementations
Security issues
Prototype evaluation
Conclusions
3
Setting up the stage
Past worm outbreaks
Code red (2001): compromised over 359,000 hosts
Nimda (2001): multiple forms of infection
Slammer (2003): fastest worm in history (90% of vulnerable hosts
in 10 minutes)
Witty (2004): first to contain malicious payload
Coping with worms
Containment is hard [Moore03]
Not possible if human intervention required
Automatic detection [Singh04]
Problem: Network evasion
Recover from catastrophes [HotOS03]
Goal: minimize data loss
4
Defining the problem
How are Internet pathogens successful?
Shared vulnerabilities
Vulnerability: design or implementation flaw in a software system
Survivable data
Replicate data
Informed replication
Replica sets based on shared vulnerabilities
How do we identify sets of vulnerabilities?
Common software systems
Leverage Internet diversity
5
Challenges
Understand the limitations
Appropriate settings
Quantify diversity
Searching for replica sets
Scalable
Balance load
Small replica sets
6
System model
A set of hosts (H)
A host fails by losing its state
A set of attributes (A)
Attribute = software system
Operating systems + Applications
Attributes (Software systems)
{
,
,
{
,
,
{
,
}
One operating system
Applications
A set of configurations
( C 2 A)
Conf : H C
{
,
Configurations
Configuration
}
,
,
}
}
7
Cores
A set S H is a core iff: A' A : a A' : h S : a Conf (h)
S is minimal
{
,
,
{
,
,
{
{
,
,
}
Configurations
Ideally A’ = A
}
,
,
}
Cores
}
8
Host diversity
Diversity: distribution of configurations
Skewed: not uniform
Study of the UCSD network
nmap tool
Port scans: detect open ports
OS fingerprinting: guess OS out of error messages
Total number of scanned devices: 11,963
2,963 general-purpose hosts (port data + OS)
Conservative assumptions
Same open port = run the same service
Ignore OS versions
9
Top 10 operating systems and services
OS
Service
Windows
54.1%
netbios-ssn
55.3%
Solaris
10.1%
epmap
50.4%
Mac OS X
10.0%
microsoft-ds
39.0%
Linux
10.0%
sshd
30.7%
Mac OS
6.9%
sunrpc
25.3%
FreeBSD
2.2%
active directory
24.8%
IRIX
2.0%
smtp
19.4%
HP-UX
1.1%
httpd
18.0%
BSD/OS
0.9%
ftpd
17.8%
Tru64 Unix
0.7%
printer
15.6%
10
Configuration distribution
Distribution is skewed
50% of hosts comprise:
All: 20%
Multiple: 15%
Top 100: 8%
11
Visualizing diversity
Qualitative view
More diversity
across operating
systems
Still a fair amount
of diversity for the
same OS
12
Searching for cores
What is the practical problem?
Determine replica sets
Our approach: find cores
Computing a core of optimal size is NP-complete
Use heuristics
Host as both client and server
Client: request cores
Server: participates in cores
Core
Host that requests it (original copy)
Replicas
13
Basic idea
Attributes (Software systems)
Configuration
{
Configuration
{
,
,
Possible cores
,
}
,
Configuration
}
{
,
,
}
or
Configuration
{
,
,
}
14
Representing advertised configurations
Container abstraction
Containers (B)
One for each operating system in A
Each container b B has a set SB(b) of sub-containers,
one for each non-OS attribute in A
A host h advertises its configuration by associating itself
with every sub-container s SB(b)
b is the container for the OS of h
s is the sub-container in SB(b) for some attribute of h
15
Container abstraction
{
,
,
{
,
,
}
}
{
,
,
{
,
,
}
}
16
Heuristics
Random
Ignore configurations
Choose randomly a number n of hosts from H
Uniform
I.
Different OS
1.
2.
3.
II.
Same OS (same b where h is placed)
1.
2.
Choose a container b randomly
Choose a sub-container sb randomly from b
Choose a host randomly from sb
Choose a sub-container sb randomly from b
Choose a host randomly from sb
Weighted: containers weighted by the number of hosts
Doubly-weighted: sub-containers also weighted
17
Simulations
Population: 2,963 general-purpose hosts
One run: Each host computes a core
Questions
How much replication is needed?
How many other hosts a particular host has to service?
How well chosen cores protect hosts?
Metrics
Average core size (core size)
Core size averaged across all the hosts
Maximum load (load)
Maximum number of other hosts that any host services
Average coverage (coverage)
Coverage: percentage of attributes covered in a core
18
A sample run
Random
Better load balance
Worse coverage
Worse core size
Load is too high for other
heuristics
Proposed modification
Limit the load of each host
Intuition: force load balance
Each host services at most L
other hosts
L = load limit or simply limit
Core
size
Coverage Load
Random
5
0.977
12
Uniform
2.56
0.9997
284
Weighted
2.64
0.9995
84
DWeighted 2.58
0.9997
91
19
Core size
Random increases
linearly with load
Intrinsic to the
heuristic
Other heuristics
Core size less than 3
For many hosts, one
single replica
20
Coverage
Lower bound on limit: 2
Dependent on the diversity
Uniform: limit at least 3 to
achieve 3 nines coverage
Weighted: achieves 3 nines
coverage for limit values at
least 2
Random: core size at least
9 to achieve same coverage
21
Uncovered hosts
Share of hosts that are
not fully covered is
small
Uniform
Limit 3: slightly over 1%
Limit > 4: around 0.5%
Weighted
Around 0.5%
Random
Core size greater than 8
to achieve similar results
22
Load variance
Downside of uniform
Worst variance
Variance is similar for
small values of limit
Load limit forces
better distribution
23
Summary of simulation results
How many replicas are needed?
Around 1.3 on average
How many other hosts a particular host has to service?
Uniform: 3 for good coverage
Weighted: 2 for good coverage
How well chosen cores protect hosts?
Uniform: coverage greater than 0.999, L 3
Weighted: coverage greater than 0.999, L 2
Uniform heuristic
Simpler
Weighted heuristics
Better load balance
24
Translating to real pathogens
Uniform, limit > 3, tolerates with high probability attacks to
a single attribute
Previous worms
One or more vulnerabilities on a single platform
Our approach tolerates
Attacks to vulnerabilities on the same software system, possibly
cross-platform
Attacks to vulnerabilities on different software systems in the same
platform
Attacks to vulnerabilities on different software systems,
cross-platform
Extensible approach
25
Exploits on k attributes
Illustrate with k=2
A variant of uniform
1. Client c chooses a host h
with different OS
2. Find a core for c using
uniform
3. Find a core for h using
uniform
4. Combine the 2 cores to
form a 2-resilient core
L
2-cov
1-cov
Core size
5
0.76
0.86
4.18
6
0.86
0.92
4.58
7
0.95
0.99
5.00
8
0.97
1.00
5.11
9
0.98
1.00
5.16
10
0.98
1.00
5.17
26
The Phoenix Recovery System
Backup data on cores
Requirement: set of
operating systems and
applications is not known
Macedon framework
Pastry DHT
Advertising configurations
Container Zone
Sub-container Sub-zone
OS hint lists
Empty zones
Doesn’t need to be accurate
27
Protocol
Client
Server
Request
Reply
Server Client
Announce
Restore
request
Data
Restore
Announce
Backup
mode
Recovery
mode
28
Security in Phoenix
Using security primitives
Security goals
Data privacy: no host other than the owner of the data can
obtain any partial information from the data stored on a server host
Data integrity: any tampering of the backup data should be
detectable by the client host
Data availability: if a client stores data in an honest server, then
it is eventually able to recover its data
Two modes
Basic: software libraries
Enhanced: requires devices such as smartcards
Cannot prevent servers from acting maliciously
Proofs of operations
29
Prototype evaluation
On PlanetLab
Total number of hosts: 63
62 PlanetLab hosts
1 UCSD host
Configurations manually set
63 randomly chosen out of the 2,963
30
Evaluation results
L
Core size
Coverage
Load var.
Imp.
Sim. Imp. Sim. Imp. Sim.
3
2.12
2.22
1
1
1.65
1.94
5
2.10
2.23
1
1
2.88
2.72
7
2.10
2.22
1
1
4.44
3.33
Imp. = implementation
Sim. = simulation
Simulated attack
Parameters
Backup file: 5MB
L=3
Interval between
announcements: 120s
Target: Windows hosts (60%)
Caused hosts to crash almost
simultaneously
All hosts recovered
For 35: avg 100s
For 3: several minutes
(transient network failures)
31
Conclusions
Informed replication
Replica sets based on attributes
Internet catastrophes: software systems
Survivable data at a low replication cost
Core size is less than 3 on average
Hosts service at most 3 other hosts
Diversity study
Approach is realistic
Side-effects of load limit scheme
Upper bounds the amount of work any host has to accomplish
Constrain damage in case of individual malicious behavior
32
Future work
Real deployment
Tune current prototype
Security features
Cope with real threats
More data sets to determine diversity
Mechanism to monitor resource usage
Informed replication
With other approaches for cooperative backup
With other types of attributes
E.g. Resource utilization
33
END
34