Cores - Systems and Networking - University of California, San Diego


Surviving Internet Catastrophes
Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith
Marzullo, and Geoffrey M. Voelker
Hot Topics in Operating Systems (HotOS’03)
USENIX Annual Technical Conference (USENIX’05)
University of California, San Diego - 2004
A typical day of an Internet worm…
 Host A runs the Widows OS
 Host B runs the Sux OS
 Worm: "I exploit a vulnerability in the Widows OS… but not in the Sux OS!"
 [Cartoon: the worm compromises host A's data; host B survives]
2
Outline
 Introduction
 System model
 Host diversity
 Searching for replica sets
   Heuristics and simulations
 The Phoenix Recovery System
   Implementations
   Security issues
   Prototype evaluation
 Conclusions
3
Setting the stage
 Past worm outbreaks
 Code Red (2001): compromised over 359,000 hosts
 Nimda (2001): multiple forms of infection
 Slammer (2003): fastest worm in history (90% of vulnerable hosts
in 10 minutes)
 Witty (2004): first to contain malicious payload
 Coping with worms
 Containment is hard [Moore03]
 Not possible if human intervention required
 Automatic detection [Singh04]
 Problem: Network evasion
 Recover from catastrophes [HotOS03]
 Goal: minimize data loss
4
Defining the problem
 How are Internet pathogens successful?
 Shared vulnerabilities
 Vulnerability: design or implementation flaw in a software system
 Survivable data
 Replicate data
 Informed replication
 Replica sets based on shared vulnerabilities
 How do we identify sets of vulnerabilities?
 Common software systems
 Leverage Internet diversity
5
Challenges
 Understand the limitations
 Appropriate settings
 Quantify diversity
 Searching for replica sets
 Scalable
 Balance load
 Small replica sets
6
System model
 A set of hosts (H)
 A host fails by losing its state
 A set of attributes (A)
 Attribute = software system
 Operating systems + Applications
 A set of configurations (C ⊆ 2^A)
 Conf : H → C
 Configuration: one operating system + applications
 [Figure: attributes (software systems) grouped into per-host configurations]
7
Cores
 A set S H is a core iff:  A'  A : a  A' : h  S : a  Conf (h)
 S is minimal
{
,
,
{
,
,
{
{
,
,
}
Configurations
 Ideally A’ = A
}
,
,
}
Cores
}
8
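Reading the definition as: S is a core when, for every attribute a in A′, some host in S does not run a (so an exploit of any single attribute leaves a surviving replica), and S is minimal. A small Python sketch of that reading, with made-up host configurations:

```python
# Minimal sketch of the core definition, reading "covers attribute a" as
# "some host in S does not run a". Host names and software names are made up.

def covers(S, conf, attrs):
    """True if, for every attribute, at least one host in S lacks it."""
    return all(any(a not in conf[h] for h in S) for a in attrs)

def is_core(S, conf, attrs):
    """S is a core iff it covers attrs and no proper subset does (minimality)."""
    if not covers(S, conf, attrs):
        return False
    return all(not covers(S - {h}, conf, attrs) for h in S)

conf = {
    "a": {"widows", "httpd"},
    "b": {"sux", "sshd"},
    "c": {"sux", "httpd"},
}
attrs = {"widows", "sux", "httpd", "sshd"}
print(is_core({"a", "b"}, conf, attrs))       # → True (covers all, minimal)
print(is_core({"a", "b", "c"}, conf, attrs))  # → False (not minimal)
```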
Host diversity
 Diversity: distribution of configurations
 Skewed: not uniform
 Study of the UCSD network
 nmap tool
 Port scans: detect open ports
 OS fingerprinting: infer the OS from responses to probes
 Total number of scanned devices: 11,963
 2,963 general-purpose hosts (port data + OS)
 Conservative assumptions
 Same open port = run the same service
 Ignore OS versions
9
Top 10 operating systems and services
OS              Share    Service            Share
Windows         54.1%    netbios-ssn        55.3%
Solaris         10.1%    epmap              50.4%
Mac OS X        10.0%    microsoft-ds       39.0%
Linux           10.0%    sshd               30.7%
Mac OS           6.9%    sunrpc             25.3%
FreeBSD          2.2%    active directory   24.8%
IRIX             2.0%    smtp               19.4%
HP-UX            1.1%    httpd              18.0%
BSD/OS           0.9%    ftpd               17.8%
Tru64 Unix       0.7%    printer            15.6%
10
Configuration distribution
 Distribution is skewed
 50% of hosts comprise:
 All: 20%
 Multiple: 15%
 Top 100: 8%
11
Visualizing diversity
 Qualitative view
 More diversity
across operating
systems
 Still a fair amount
of diversity for the
same OS
12
Searching for cores
 What is the practical problem?
 Determine replica sets
 Our approach: find cores
 Computing a core of optimal size is NP-complete
 Use heuristics
 Host as both client and server
 Client: request cores
 Server: participates in cores
 A core comprises:
   The host that requests it (original copy)
   Replicas
13
Basic idea
 [Figure: the attribute set, a client's configuration, other hosts' configurations, and two possible cores]
14
Representing advertised configurations
 Container abstraction
 Containers (B)
 One for each operating system in A
 Each container b ∈ B has a set SB(b) of sub-containers, one for each non-OS attribute in A
 A host h advertises its configuration by associating itself with every sub-container s ∈ SB(b) such that:
   b is the container for the OS of h
   s is the sub-container in SB(b) for some attribute of h
15
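The advertisement scheme above can be sketched as a nested map, assuming a configuration is one OS plus a set of non-OS attributes (all host and software names below are illustrative):

```python
# Sketch of the container abstraction: one container per OS, one sub-container
# per non-OS attribute; a host appears in every sub-container of its OS's
# container that matches one of its attributes. All names are made up.
from collections import defaultdict

def advertise(configs):
    """containers[os][attr] -> set of hosts advertising that (OS, attr) pair."""
    containers = defaultdict(lambda: defaultdict(set))
    for host, (os_name, attrs) in configs.items():
        for attr in attrs:
            containers[os_name][attr].add(host)
    return containers

configs = {
    "h1": ("widows", {"httpd", "smtp"}),
    "h2": ("sux", {"httpd"}),
    "h3": ("sux", {"sshd", "smtp"}),
}
containers = advertise(configs)
print(sorted(containers["sux"]["smtp"]))  # → ['h3']
print(sorted(containers["sux"]))          # → ['httpd', 'smtp', 'sshd']
```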
Container abstraction
[Figure: two containers, one per OS; each holds sub-containers for that OS's non-OS attributes, with hosts advertised in them]
16
Heuristics
 Random
   Ignore configurations
   Choose a number n of hosts randomly from H
 Uniform
   I. Different OS:
     1. Choose a container b randomly
     2. Choose a sub-container sb randomly from b
     3. Choose a host randomly from sb
   II. Same OS (the container b where h is placed):
     1. Choose a sub-container sb randomly from b
     2. Choose a host randomly from sb
 Weighted: containers weighted by the number of hosts
 Doubly-weighted: sub-containers also weighted
17
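A hedged sketch of the uniform heuristic: step I picks a random container with a different OS (which covers the client's OS attribute, since every host there lacks it), and step II picks hosts from the client's own container until its application attributes are covered. The stopping rule is simplified relative to the talk, and all names are illustrative:

```python
# Sketch of the "uniform" heuristic under the reading that an attribute is
# covered when some replica does not run it. Simplified stopping rule;
# host and software names are made up.
import random
from collections import defaultdict

def uniform_core(client, configs, rng):
    os_name, apps = configs[client]

    # container abstraction: containers[os][app] -> hosts
    containers = defaultdict(lambda: defaultdict(set))
    for host, (o, a_set) in configs.items():
        for a in a_set:
            containers[o][a].add(host)

    def lacks(host, attr):
        h_os, h_apps = configs[host]
        return attr != h_os and attr not in h_apps

    core = {client}

    # Step I (different OS): container, then sub-container, then host, at random
    b = rng.choice(sorted(o for o in containers if o != os_name))
    sb = rng.choice(sorted(containers[b]))
    core.add(rng.choice(sorted(containers[b][sb])))

    # Step II (same OS): add hosts from the client's container until every
    # application of the client is missing on some replica
    for app in sorted(apps):
        if any(lacks(h, app) for h in core if h != client):
            continue
        candidates = sorted({h for sub in containers[os_name].values()
                             for h in sub if h != client and lacks(h, app)})
        if candidates:
            core.add(rng.choice(candidates))
    return core

configs = {
    "h1": ("widows", {"httpd"}),
    "h2": ("sux", {"httpd"}),
    "h3": ("widows", {"sshd"}),
    "h4": ("sux", {"sshd"}),
}
core = uniform_core("h1", configs, random.Random(0))
print(sorted(core))  # h1 plus replicas covering "widows" and "httpd"
```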
Simulations
 Population: 2,963 general-purpose hosts
 One run: Each host computes a core
 Questions
 How much replication is needed?
 How many other hosts does a particular host have to service?
 How well do chosen cores protect hosts?
 Metrics
 Average core size (core size)
 Core size averaged across all the hosts
 Maximum load (load)
 Maximum number of other hosts that any host services
 Average coverage (coverage)
 Coverage: percentage of attributes covered in a core
18
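The three metrics above can be computed as follows, assuming each host has already chosen its replica set and that an attribute is covered when some replica does not run it (all data below is illustrative):

```python
# One way to compute the simulation metrics: average core size, maximum
# load, and average coverage. All configurations are made up.
def run_metrics(cores, configs):
    """cores: owner -> set of replica hosts (excluding the owner)."""
    n = len(cores)
    avg_core_size = sum(len(s) for s in cores.values()) / n

    # maximum load: how many other hosts any single host services
    load = {}
    for owner, replicas in cores.items():
        for r in replicas:
            load[r] = load.get(r, 0) + 1
    max_load = max(load.values(), default=0)

    # coverage: fraction of the owner's attributes missing on some replica
    def coverage(owner, replicas):
        attrs = {configs[owner][0]} | configs[owner][1]
        ok = sum(1 for a in attrs
                 if any(a != configs[r][0] and a not in configs[r][1]
                        for r in replicas))
        return ok / len(attrs)

    avg_cov = sum(coverage(o, s) for o, s in cores.items()) / n
    return avg_core_size, max_load, avg_cov

configs = {"h1": ("widows", {"httpd"}), "h2": ("sux", {"sshd"})}
cores = {"h1": {"h2"}, "h2": {"h1"}}
print(run_metrics(cores, configs))  # → (1.0, 1, 1.0)
```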
A sample run
 Random
 Better load balance
 Worse coverage
 Worse core size
 Load is too high for other
heuristics
 Proposed modification
 Limit the load of each host
 Intuition: force load balance
 Each host services at most L
other hosts
 L = load limit or simply limit
Heuristic   Core size   Coverage   Load
Random      5           0.977      12
Uniform     2.56        0.9997     284
Weighted    2.64        0.9995     84
DWeighted   2.58        0.9997     91
19
Core size
 Random: core size increases linearly with the limit
 Intrinsic to the
heuristic
 Other heuristics
 Core size less than 3
 For many hosts, one
single replica
20
Coverage
 Lower bound on limit: 2
 Dependent on the diversity
 Uniform: limit at least 3 to
achieve 3 nines coverage
 Weighted: achieves 3 nines
coverage for limit values at
least 2
 Random: core size at least
9 to achieve same coverage
21
Uncovered hosts
 Share of hosts that are
not fully covered is
small
 Uniform
 Limit 3: slightly over 1%
 Limit > 4: around 0.5%
 Weighted
 Around 0.5%
 Random
 Core size greater than 8
to achieve similar results
22
Load variance
 Downside of uniform
 Worst variance
 Variance is similar for
small values of limit
 Load limit forces
better distribution
23
Summary of simulation results
 How many replicas are needed?
 Around 1.3 on average
 How many other hosts does a particular host have to service?
 Uniform: 3 for good coverage
 Weighted: 2 for good coverage
 How well do chosen cores protect hosts?
 Uniform: coverage greater than 0.999 for L ≥ 3
 Weighted: coverage greater than 0.999 for L ≥ 2
 Uniform heuristic
 Simpler
 Weighted heuristics
 Better load balance
24
Translating to real pathogens
 Uniform with limit ≥ 3 tolerates, with high probability, attacks on
a single attribute
 Previous worms
 One or more vulnerabilities on a single platform
 Our approach tolerates
 Attacks to vulnerabilities on the same software system, possibly
cross-platform
 Attacks to vulnerabilities on different software systems in the same
platform
 Attacks to vulnerabilities on different software systems,
cross-platform
 Extensible approach
25
Exploits on k attributes
 Illustrate with k=2
 A variant of uniform
1. Client c chooses a host h
with different OS
2. Find a core for c using
uniform
3. Find a core for h using
uniform
4. Combine the 2 cores to
form a 2-resilient core
L    2-cov   1-cov   Core size
5    0.76    0.86    4.18
6    0.86    0.92    4.58
7    0.95    0.99    5.00
8    0.97    1.00    5.11
9    0.98    1.00    5.16
10   0.98    1.00    5.17
26
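The four steps above can be sketched as follows. A simple greedy routine stands in for the uniform heuristic so the example is self-contained; all host and software names are illustrative assumptions:

```python
# Sketch of the k=2 variant: choose a helper host with a different OS,
# build one core for the client and one for the helper, and take the union
# as the 2-resilient core. Greedy core construction stands in for "uniform".
import random

def lacks(configs, host, attr):
    os_name, apps = configs[host]
    return attr != os_name and attr not in apps

def greedy_core(owner, configs, rng):
    """Add random hosts until each of the owner's attributes is missing on
    some replica (the survivability condition)."""
    uncovered = {configs[owner][0]} | configs[owner][1]
    core = {owner}
    others = sorted(h for h in configs if h != owner)
    rng.shuffle(others)
    for h in others:
        if not uncovered:
            break
        newly = {a for a in uncovered if lacks(configs, h, a)}
        if newly:
            core.add(h)
            uncovered -= newly
    return core

def two_resilient_core(client, configs, rng):
    # 1. choose a host h with a different OS
    helper = rng.choice(sorted(h for h in configs
                               if configs[h][0] != configs[client][0]))
    # 2-4. find a core for the client, a core for h, and combine them
    return greedy_core(client, configs, rng) | greedy_core(helper, configs, rng)

configs = {
    "h1": ("widows", {"httpd"}),
    "h2": ("sux", {"httpd"}),
    "h3": ("widows", {"sshd"}),
    "h4": ("sux", {"sshd"}),
}
core = two_resilient_core("h1", configs, random.Random(1))
print(sorted(core))
```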
The Phoenix Recovery System
 Backup data on cores
 Requirement: set of
operating systems and
applications is not known
 Macedon framework
 Pastry DHT
 Advertising configurations
 Container → Zone
 Sub-container → Sub-zone
 OS hint lists
 Empty zones
 Doesn’t need to be accurate
27
Protocol
 Backup mode:
   Client → Server: Request
   Server → Client: Reply
   Client → Server: Data
 Recovery mode:
   Recovering client: Announce + Restore request
   Server → Client: Restore
 [Figure: message-sequence diagrams for the two modes]
28
Security in Phoenix
 Using security primitives
 Security goals
 Data privacy: no host other than the owner of the data can
obtain any partial information from the data stored on a server host
 Data integrity: any tampering of the backup data should be
detectable by the client host
 Data availability: if a client stores data in an honest server, then
it is eventually able to recover its data
 Two modes
 Basic: software libraries
 Enhanced: requires devices such as smartcards
 Cannot prevent servers from acting maliciously
 Proofs of operations
29
Prototype evaluation
 On PlanetLab
 Total number of hosts: 63
 62 PlanetLab hosts
 1 UCSD host
 Configurations manually set
 63 randomly chosen out of the 2,963
30
Evaluation results
L   Core size     Coverage      Load var.
    Imp.   Sim.   Imp.   Sim.   Imp.   Sim.
3   2.12   2.22   1      1      1.65   1.94
5   2.10   2.23   1      1      2.88   2.72
7   2.10   2.22   1      1      4.44   3.33
 Imp. = implementation
 Sim. = simulation
 Simulated attack
 Parameters
 Backup file: 5MB
 L=3
 Interval between
announcements: 120s
 Target: Windows hosts (60%)
 Caused hosts to crash almost
simultaneously
 All hosts recovered
 For 35: avg 100s
 For 3: several minutes
(transient network failures)
31
Conclusions
 Informed replication
 Replica sets based on attributes
 Internet catastrophes: software systems
 Survivable data at a low replication cost
 Core size is less than 3 on average
 Hosts service at most 3 other hosts
 Diversity study
 Approach is realistic
 Side effects of the load limit scheme
 Upper-bounds the amount of work any host has to accomplish
 Constrains damage in case of individual malicious behavior
32
Future work
 Real deployment
 Tune current prototype
 Security features
 Cope with real threats
 More data sets to determine diversity
 Mechanism to monitor resource usage
 Informed replication
 With other approaches for cooperative backup
 With other types of attributes
 E.g. Resource utilization
33
END
34