Transcript Slide 1
Building Defensive Architectures
Using Backdoors
Liviu Iftode
Department Of Computer Science
Rutgers University
([email protected])
2
Rebooting Not Always Acceptable
Service.com
3
Traditional (Eager) Recovery Approaches
• Checkpointing (process, virtual machine)
• Hot machine backups (state machine replication, primary-backup)
• Limitation: time-consuming and costly
– Intrusive during failure-free execution
– Require dedicated machines
– Require stable storage available after the failure
4
Lazy State Recovery?
• OS and application state still in system memory
• Can we recover or repair the state lazily (after the failure occurred)?
[Diagram labels: Operating System; CPU, Mem, NIC, Disk, TTY, …]
5
How to Access the State?
CPU and OS resources not available
6
The World Today
[Diagram labels: Internet; Attacks; Failure; local times 10:00pm EST, 3:00am GMT, 8:30am IST]
• Computer maintenance requires human intervention: slow and expensive
• Emails and phone calls do not scale
• Worse at planetary scale: different time zones and language barriers
7
Our Vision: Defensive Architectures
[Diagram labels: Internet; Attacks; Failure; local times 10:00pm EST, 3:00am GMT, 8:30am IST]
• Computer Systems perform self-defensive management cooperatively
• Access to memory possible even in presence of failures and attacks
• Operating System not involved
8
Outline
• Motivation
• Backdoor Architecture
– Components
– Remote Repair using Backdoors
– Lazy State Recovery using Backdoors
• Defensive Architectures using Backdoors
• Future Work and Conclusions
9
Backdoor
Backdoor: a hidden software or hardware mechanism, usually created for testing and troubleshooting
--American National Standard for Telecommunications
10
The Backdoor Idea
[Diagram labels: “Front door”; CPU; Mem; NIC; Backdoor; Management Infrastructure]
• Backdoor provides an alternative access to system resources
– A programmable I/O device for physical platforms
– A virtual machine for virtual platforms over a virtual machine monitor (VMM)
• Backdoors can be connected over a private network, specialized interconnect, or even a cellular link
11
Backdoor Architecture
[Diagram labels: Monitor System and Target System, each with CPU, Mem, I/O, and BD]
• Hardware
– Programmable device with processor and memory
• Software
– OS extensions for remote healing
– Firmware extensions for BD programming
Design Principles
• Nonintrusiveness
– BD operations must not involve processors of the target system
• Availability
– Failure of target OS must not impair BD
• Access control
– monitor and target systems negotiate access permissions
• Integrity
– target system cannot modify the result of a BD operation
• Responsiveness
– target system cannot block BD operation
12
Backdoor Implementation
[Diagram labels: Target Memory; Monitor Memory; CPU (both systems); BD (both systems); NIC CPU; operations MONITOR (Remote-R) and RECOVER/REPAIR (Remote-R/W)]
13
Nonintrusive Remote Healing
• Three components
– Detection, Diagnosis, and Action
• Performed nonintrusively from a remote machine
– Zero-cycle monitoring and failure detection for target system
– Remote extraction of useful state from a “hung” system for diagnosis and recovery
– In-place repair of OS state of a “damaged” system
14
Backdoor Software Architecture
• Monitoring and Failure Detection
– Sensor Box: system health indicators (sensors) provided by the target OS in its memory
– Sensor: <UniqueID, Update Deadline, Value>
• Repair of damaged OS State
– Externalized State: OS state data that the BD can read
– Remote Access Hooks: OS control data that the BD can write to perform repairing actions
• Recovery of light-weight state
– Continuation Box: fine-grain OS and application state that the BD can transfer between systems to migrate running applications
15
Outline
• Motivation
• Backdoor Architecture
– Components
– Remote Repair using Backdoors
– Lazy State Recovery using Backdoors
• Defensive Architectures using Backdoors
• Future Work and Conclusions
16
Failure Model
• Computer system freeze
– OS bug: hang, crash, deadlock, etc.
– OS damage: resource exhaustion, DoS attack
– hardware: peripheral device stops responding
• Fail-stop (no erratic behavior)
– memory not wiped out during failure
17
Monitoring and Failure Detection
• Target OS updates progress sensors in Sensor Box
• Monitor BD reads Sensor Box periodically, checks counters
– Failure = counter stalled beyond its deadline
[Diagram labels: Target OS; Sensor Box with <Timer interrupts>, <Context switches>, <NIC interrupts>, …; Backdoor; Monitor]
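The stall rule above ("counter stalled beyond its deadline") can be sketched as follows. This is a minimal illustration and not the prototype's firmware. The `Sensor` fields mirror the <UniqueID, Update Deadline, Value> tuple, while the `Monitor` class, the use of seconds for the deadline, and the in-memory list standing in for the Sensor Box are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Sensor:
    unique_id: str   # <UniqueID>
    deadline: float  # <Update Deadline>; seconds is an assumed unit
    value: int       # <Value>, e.g. a count of timer interrupts


class Monitor:
    """Hypothetical monitor-side state: last value seen and when it changed."""

    def __init__(self):
        self._last = {}  # unique_id -> (value, time the value last changed)

    def check(self, sensor_box, now):
        """Return the IDs of sensors whose counter stalled beyond its deadline."""
        stalled = []
        for s in sensor_box:
            value, changed_at = self._last.get(s.unique_id, (None, now))
            if s.value != value:
                # The target OS made progress: remember the new value.
                self._last[s.unique_id] = (s.value, now)
            elif now - changed_at > s.deadline:
                stalled.append(s.unique_id)
        return stalled
```

In this sketch the monitor only reads the sensors, so the target does no extra work beyond updating its own counters.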
18
Diagnosis and Repair
• Diagnosis
– Inspect live OS data structures in target’s memory (through the externalized state)
– Identify damaged OS state, e.g., resource exhaustion due to memory-hogging processes
• Repair
– Modify target OS memory (remote access hooks) to correct damaged state (e.g., remove a memory-hogging process by “injecting” a kill signal into its process control block)
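A toy model of the two phases may help: diagnosis is a read-only pass over externalized state, and repair is a write through a hook. In this sketch the process table is a plain Python list of dictionaries standing in for target memory. The field names (`pid`, `rss_kb`, `pending_signal`) and the threshold are hypothetical and do not reflect the FreeBSD structures used in the prototype.

```python
SIGKILL = 9  # conventional POSIX signal number for SIGKILL


def diagnose(process_table, hog_threshold_kb):
    """Read-only pass over externalized state: find memory-hogging processes."""
    return [p["pid"] for p in process_table if p["rss_kb"] > hog_threshold_kb]


def repair(process_table, hog_pids):
    """Write pass through a hook: mark a pending kill signal in each hog's PCB."""
    for pcb in process_table:
        if pcb["pid"] in hog_pids:
            pcb["pending_signal"] = SIGKILL


# Example: one well-behaved process and one memory hog.
table = [
    {"pid": 1, "rss_kb": 2_000, "pending_signal": 0},
    {"pid": 42, "rss_kb": 900_000, "pending_signal": 0},
]
hogs = diagnose(table, hog_threshold_kb=500_000)
repair(table, hogs)
```

The target OS would then deliver the signal itself once it can schedule again, which matches the idea that the monitor only corrects state and leaves the cleanup to the target.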
19
Backdoor Prototype
• Myrinet Lanai X NIC
• Modified firmware and low-level GM library
• Modified FreeBSD 4.8 kernel
– Sensor Box
– Externalized State and Remote Access Hooks
• Two resource exhaustion case studies
– Memory exhaustion
– CPU starvation
• Experimental setup
– Dell PowerEdge 2600 servers with 2.4 GHz dual Intel Xeon, 1GB RAM, 2GB swap, Myrinet Lanai X NIC
– Benchmark: a simple CPU-bound application
20
Effectiveness of Remote Repair
[Figure: execution time (s), 0 to 20, versus number of memory hog processes, 0 to 16; series: Impaired system, With remote repair]
21
Repair Timeline
[Figure: repair timeline under memory pressure; labels: Remote Repair, Local cleanup of damaged state, Detection, Diagnosis & Repair, End of repair; x-axis: Time (s), 0 to 3]
22
Outline
• Motivation
• Backdoor Architecture
– Components
– Remote Repair using Backdoors
– Lazy State Recovery using Backdoors
• Defensive Architectures using Backdoors
• Future Work and Conclusions
23
Internet Services and Servers
[Diagram labels: C1; C2; Internet; servers; Server 1]
24
Internet Services and Servers
[Diagram labels: C1; C2; Internet; servers; service; Server 1; Server 2]
25
Service Continuation (SC)
[Diagram labels: C1; C2; Streaming Server 1; Streaming Server 2; request GET “96.3FM”; SC2 = {“96.3FM”, “2nd song”}]
26
Service Continuation (SC)
[Diagram labels: C1; C2; Streaming Server 1; Streaming Server 2; SC2 = {“96.3FM”, “2nd song”}]
27
Service Continuation (SC)
[Diagram labels: C1; C2; Streaming Server 1; Streaming Server 2; SC2 = {“96.3FM”, “2nd song”}]
28
Continuation Box (CB)
• Idea
– extract “essential” state
– pass it to similar application on a healthy machine
• CB encapsulates fine-grained server state associated with a client session
– OS data, e.g., data in transit through IPC channels
– Application data
• Application may need to cooperate with the OS!
29
Lazy Extraction of Continuation Box
[Diagram labels: Victim machine (crashed) with CPU, OS, Memory holding the Continuation Box, and BD; Recovery machine (healthy) with Memory holding the Recovered State, and BD]
30
Recovery: What and How?
[Diagram labels: timeline (Time) with steps 1, 2, 3; Victim machine and Recovery machine, each with recv calls; Backdoor; open question: CB = ???]
31
Solution: Continuation Box API
[Diagram labels: timeline (Time) with steps 1, 2, 3; Victim machine with recv calls, export(), and log; Backdoor; CB; Recovery machine with import() and recv]
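One way to picture the export/extract/import sequence is the sketch below. It is an assumption-laden model and not the actual Continuation Box API: the `ContinuationBox` class, the JSON encoding, and `backdoor_extract` are hypothetical, and the import operation is spelled `import_` only because `import` is a reserved word in Python. The point it illustrates is that the server exports state during normal execution, so that after a crash the state can be copied out of the victim's memory without running any code on the victim.

```python
import json


class ContinuationBox:
    """Per-session state kept in ordinary memory (here, a bytes buffer)."""

    def __init__(self):
        # Stands in for the memory region a BD could read remotely.
        self.memory = b"{}"

    def export(self, session_id, state):
        """Called by the server during failure-free execution."""
        boxes = json.loads(self.memory)
        boxes[session_id] = state
        self.memory = json.dumps(boxes).encode()

    def import_(self, session_id):
        """Called on the recovery machine; None means a fresh session."""
        return json.loads(self.memory).get(session_id)


def backdoor_extract(victim, recovery):
    """Lazy extraction: copy the victim's CB bytes after the failure."""
    recovery.memory = bytes(victim.memory)


# The victim exports state, crashes, and the recovery machine takes over.
victim, recovery = ContinuationBox(), ContinuationBox()
victim.export("SC2", {"station": "96.3FM", "song": 2})
backdoor_extract(victim, recovery)
resumed = recovery.import_("SC2")
```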
32
Service Continuation Structure
[Diagram labels: Client 1 and Client 2 connected over TCP/IP to the Front-end server process; IPC between the Front-end and Back-end server processes; SC_APP; SC_IO; SC1; SC2]
33
Service Continuation API
• export SC_APP
• import SC_APP
• create_sc for a client session
• associate I/O channel with the SC
• open_sc from an I/O channel
34
A Server with Service Continuation
while (cid = accept()) {
    scid = create_sc(cid)
    if (import(scid, &{file_name, offset}) == NULL) {
        receive(cid, file_name)
        offset = 0
    }
    fd = open(file_name)
    seek(fd, offset)
    while (read(fd, block, size) != EOF) {
        send(cid, block, size)
        offset += size
        export(scid, {file_name, offset})
    }
}
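The loop above can be simulated end to end to see why exporting the offset after every block is enough to resume a transfer. The sketch below is only an illustration under stated assumptions: `serve`, the `sc_store` dictionary standing in for the service continuation, and the `crash_after` parameter used to simulate a freeze are all hypothetical names.

```python
def serve(content, sc_store, scid, block_size, crash_after=None):
    """Send `content` in blocks, exporting the offset after each block.

    Returns the list of blocks sent by this server instance.
    """
    state = sc_store.get(scid)                # import(): None for a new session
    offset = state["offset"] if state else 0
    sent = []
    while offset < len(content):
        if crash_after is not None and len(sent) == crash_after:
            return sent                       # simulated freeze; sc_store survives
        block = content[offset:offset + block_size]
        sent.append(block)                    # send(cid, block, size)
        offset += len(block)
        sc_store[scid] = {"offset": offset}   # export(scid, {file_name, offset})
    return sent


# Server 1 freezes after one block; Server 2 resumes from the exported offset.
store = {}
first = serve(b"abcdefghij", store, "SC2", block_size=4, crash_after=1)
second = serve(b"abcdefghij", store, "SC2", block_size=4)
```

Here the client receives every byte exactly once across the two servers, because the second server starts at the last exported offset instead of at zero.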
35
Case Study: Multi-tier Internet Service
• Front-End (FE): Apache web server
• Middle Tier (MT): JBoss app. server
• Back-End: MySQL DB server
36
Recoverable Service
37
Recoverable Service
• Experimental setup
– Dell PowerEdge 2600 servers, 2.4 GHz dual Intel Xeon, 1GB RAM, 1Gb Ethernet
– Workload modeled after TPC-W
• Fault injection in FE and MT nodes
– synthetic freeze, emulated freeze by remote OS locking, bugs inserted in network drivers
• Evaluation
– Low cost, low overhead under load
– Fast recovery
38
Low Cost
• Monitoring
– < 1% CPU @ 100 ms sampling period (100 sensors)
• Continuation Box
– API export/import: < 30 µs
– Extraction: 358 µs for a 10 KB CB
39
Low Overhead under Load
[Figure: throughput (Requests/min, 0 to 8,000) versus number of clients (20 to 1,100); series: Base, Recoverable FE, Recoverable FE+MT]
40
Recovery Timeline
[Figure: recovery timeline; events: Failure, Detection, Import CB, Recovery done; intervals: Detection Latency, Recovery latency; x-axis: Time (ms), 0 to 30]
41
Outline
• Motivation
• Backdoor Architecture
– Components
– Remote Repair using Backdoors
– Lazy State Recovery using Backdoors
• Defensive Architectures using Backdoors
• Future Work and Conclusions
42
Defensive Architectures Using Backdoors
• Autonomous Backdoors
– BDs are programmed to execute defensive activities during bootstrap, then “sealed”
– Tamper-resistant during normal execution: OS cannot alter or stop BD execution
– BDs communicate among themselves to execute certain defensive activities cooperatively
• Hierarchical Defensive Architectures
– Defensive Computer Architecture (DCA): Single computer system equipped with BD
– Defensive Network Architecture (DNA): Cluster nodes equipped with BD connected over high-speed private network
– Defensive Inter-Network Architecture (DINA): Loosely coupled DNA clusters over a Wide Area Network
43
Applications of Defensive Architectures
• Smart Watchdog for DCA
– Continuously monitor the system memory
– Identify and enforce OS invariants in the host memory or the I/O system
– Search for virus/worm signatures
• Continuous Remote Logging and Integrity Verification over DNA
– Continuously retrieve logged data from system memory
– Send it to another node in the DNA
– Cooperative OS integrity verification
• Defensive News Agency over DINA
– A global secure information network
– Critical system controllers (routers, GRID control nodes, PlanetLab peers, etc.) subscribe to it
– Multiple DNAs publish information to the system
– System propagates information of interest about the Internet, individual networks, or hosts
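As an illustration of the Smart Watchdog's signature search, the sketch below scans a snapshot of memory for known byte patterns. It is a deliberately naive example under assumptions: `scan_memory`, the signature names, and the byte patterns are hypothetical, and a real watchdog would read physical memory through the BD and would not operate on a Python `bytes` object.

```python
def scan_memory(memory, signatures):
    """Return {signature name: [offsets]} for every pattern found in `memory`."""
    hits = {}
    for name, pattern in signatures.items():
        offsets = []
        start = memory.find(pattern)
        while start != -1:
            offsets.append(start)
            # Continue one byte further so overlapping matches are also found.
            start = memory.find(pattern, start + 1)
        if offsets:
            hits[name] = offsets
    return hits


# Example with made-up patterns; only "worm-x" occurs in the snapshot.
snapshot = b"\x00AAworm\x00wormX"
found = scan_memory(snapshot, {"worm-x": b"worm", "virus-y": b"virus"})
```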
44
Future Work
• Virtual Backdoor
– Realization of the BD for virtual environments, e.g., VMMs and PlanetLab
– Enables planetary-scale system monitoring and management
• Orion: Holistic Approach to System Monitoring
– Continuous non-intrusive physical memory inspection over BD
– Identify memory modification patterns and correlate them to predict unstable system states
• BD Language
– BD can execute basic building blocks of defensive actions
– Express complex defensive actions using the basic building blocks
• Security
– Prevent malicious users from using BDs to perform remote attacks
– Authenticate and verify actions before performing them
• BD over the Phone
– Use cellular link to access the BD for system management operations
45
People Behind Backdoors
• Florin Sultan
• Aniruddha Bohra
• Pascal Gallard (INRIA/IRISA, France)
• Iulian Neamtiu (University of Maryland)
• Stephen Smaldone
• Yufei Pan
• Arati Baliga
• Tzvika Chumash
46