NSF_scenarios


RADS Conceptual Architecture
[Diagram: research areas surrounding the client-to-server data path]
• Programming Abstractions for Roll-back (Necula)
• Crash-Only Middleware & Servers, System O&C Infrastructure (Fox)
• Protocols Enabling Fast Detection & Route Recovery, Network O&C Infrastructure (Katz, Stoica)
• Benchmarks, Tools for Human Operators (Patterson)
• Online Statistical Learning Algorithms (Jordan)
• Prototype Applications: E-voting, Messaging, E-Mail, etc.
• Data path: User and Operator; Client and Server, each with SLT Services and Distributed Middleware; Edge Network with PNE on each side; Application-Specific Overlay Network; Routers; Commodity Internet & IP networks
• Reduction to practice of online SLT and observe/analyze/act infrastructure
• Reusable embeddable components
1
Apps and Science
• Messaging (Randy’s scenario)
• Voting systems
• Online medical records system
• “Volunteer coordination” for disaster response
2
What are “SLT Services”?
• “SLT clients” are client or server apps, middleware or
OS layer, machine hardware, programmable network
elements, ...
• Monitoring hooks for SLT clients
• Control hooks for SLT clients
• Database(s) for aggregating SLT client data
• Plug-ins for online and offline analysis
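A minimal sketch of how the four pieces listed above (monitoring hooks, control hooks, an aggregation database, and analysis plug-ins) might fit together. The slides do not define an API, so every name here (SLTClient, SLTService, observe, analyze_and_act, the "mail-server" client, and the 500 ms threshold) is a hypothetical choice made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical names throughout; this is not an interface from the slides.


@dataclass
class SLTClient:
    """Anything that can be observed and controlled (app, middleware, OS, PNE)."""
    name: str
    # Monitoring hooks: metric name -> function returning the current value.
    monitors: Dict[str, Callable[[], float]] = field(default_factory=dict)
    # Control hooks: action name -> function performing the action.
    controls: Dict[str, Callable[[], None]] = field(default_factory=dict)


class SLTService:
    def __init__(self) -> None:
        self.clients: Dict[str, SLTClient] = {}
        self.database: List[dict] = []  # aggregated SLT client data
        # Plug-ins map the aggregated data to (client name, control name) pairs.
        self.plugins: List[Callable[[List[dict]], List[Tuple[str, str]]]] = []

    def register(self, client: SLTClient) -> None:
        self.clients[client.name] = client

    def observe(self) -> None:
        """Call every monitoring hook and store the readings."""
        for client in self.clients.values():
            for metric, read in client.monitors.items():
                self.database.append(
                    {"client": client.name, "metric": metric, "value": read()}
                )

    def analyze_and_act(self) -> List[Tuple[str, str]]:
        """Run each plug-in and invoke the control hooks it asks for."""
        taken = []
        for plugin in self.plugins:
            for client_name, control in plugin(self.database):
                self.clients[client_name].controls[control]()
                taken.append((client_name, control))
        return taken


# Usage: a slow server is observed, and a plug-in requests a restart.
state = {"latency_ms": 900.0, "restarts": 0}


def restart() -> None:
    state["restarts"] += 1
    state["latency_ms"] = 50.0


def slow_latency_plugin(rows: List[dict]) -> List[Tuple[str, str]]:
    return [
        (r["client"], "restart")
        for r in rows
        if r["metric"] == "latency_ms" and r["value"] > 500
    ]


service = SLTService()
service.register(
    SLTClient(
        "mail-server",
        monitors={"latency_ms": lambda: state["latency_ms"]},
        controls={"restart": restart},
    )
)
service.plugins.append(slow_latency_plugin)
service.observe()
actions = service.analyze_and_act()
```

The plug-in sees only aggregated rows and names a control hook rather than calling it, which keeps analysis code separate from the clients it acts on.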
3
Macroscopic behaviors
• Application diversity
• “Fail over” to another whole infrastructure
– Completely separate app architecture (client, server,
middleware,
– Free: provisioning across different services (e.g. messaging)
• Use VM/appliance based migration for the
servers
4
Reflections from 9/11 (from Douglas Yoshida, MD, Bellevue Hosp & NYU Med Ctr)
• In a crisis, patients needing medical attention
brought to closest hospital, not most
appropriate hospital (absent better
information)
• Baseline EMS comms in NYC: no direct contact
between EDs and ambulances; sometimes
doctors would scramble to “clear out” ERs,
then wait for hours for patients to arrive
• Cell phone and landline failure impeded
communication between hospitals
– “Needed separate inter-hospital radio comms with direct
link to onsite command center”
5
More reflections
• Families flooding hospitals trying to find out
about their loved ones
– No other way to get the info out
– Creates potential security nightmare for hospital (“If
terrorists had wanted to attack hospitals, it would have
been easy”)
• Lack of info leads to frustration and “disaster
voyeurism”
– Med students and attendings flocked down to Ground Zero
because they were frustrated at not being able to help
w/in their own hospital
– Too many doctors around each stretcher; poor
allocation/distribution of resources
6
Multiple communication channels
• Closed: inter-hospital
• Semi-closed: hospital/command
site/firefighters etc
• Open/unidirectional: communication to public
about condition of victims (can be largely
unidirectional)
• Open/bidirectional: volunteer coordination
7
[Diagram: fault injection, monitoring, and analysis across the stack]
• Stack: Server app, Middleware, OS, Overlay/PNEs, Internet, Client app
• Fault injection at several layers of the stack
• Monitors: PNE monitors, OS monitors, middleware monitors, app-specific monitors, external monitors
• Recovery & policy DB
• Analysis: anomaly detection, novelty detection, clustering
• Results fusion
• Policy selection
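The last stages of the diagram (detectors, results fusion, policy selection) can be sketched as below. The slides do not say how the detectors work or how results are fused, so this example assumes a z-score anomaly detector, a range-based novelty detector, majority-vote fusion, and a plain dictionary standing in for the recovery & policy DB. Clustering is left out, and all function names and policy strings are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def anomaly_detector(history: List[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` if it lies more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold


def novelty_detector(history: List[float], latest: float) -> bool:
    """Flag `latest` if it falls outside the range of values seen so far."""
    return latest < min(history) or latest > max(history)


def fuse(votes: List[bool]) -> bool:
    """Results fusion by strict majority vote (an assumed rule)."""
    return sum(votes) * 2 > len(votes)


# Stand-in for the recovery & policy DB: layer name -> recovery policy.
POLICY_DB: Dict[str, str] = {
    "middleware": "restart-component",
    "os": "reboot-node",
}


def select_policy(layer: str, faulty: bool) -> str:
    """Pick a recovery policy for a layer, or defer to the operator if none is stored."""
    if not faulty:
        return "no-action"
    return POLICY_DB.get(layer, "notify-operator")


# Usage: a latency spike in the middleware layer.
history = [10.0, 11.0, 9.0, 10.0, 10.0]
votes = [anomaly_detector(history, 50.0), novelty_detector(history, 50.0)]
decision = select_policy("middleware", fuse(votes))
```

Falling back to "notify-operator" when no policy is stored keeps the human operator in the loop for unfamiliar layers.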
8
From JBoss to JAGR
[Diagram: J2EE application running in the JBoss application server, with JAGR components added]
• Application Server (JBoss): Http Server, Servlet/JSP Container, J2EE Application (EJBs), Persistence tier; serves Client Requests
• Stall Proxy: stalls user requests during recovery
• Recovery Agent: restarts single EJBs, redeploys apps, or restarts whole app-server
• Fault propagation map: built from observed failures
• Fault Injector: before deployment, use controlled faults to build Recovery Map
• External Monitors
– E2EMon: detects app-specific, end-to-end failures in requests (also app-generic using character histograms)
• Internal Monitors
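The app-generic check attributed to E2EMon, detecting failed requests with character histograms, might look like the sketch below. The slides give no details, so the distance measure (L1 distance between normalized character frequencies), the 0.5 threshold, and all function names are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict


def char_histogram(text: str) -> Dict[str, float]:
    """Relative frequency of each character in `text`."""
    counts = Counter(text)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()} if total else {}


def histogram_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """L1 distance between two histograms: 0.0 if identical, 2.0 if disjoint."""
    return sum(abs(a.get(ch, 0.0) - b.get(ch, 0.0)) for ch in set(a) | set(b))


def looks_failed(response: str, reference: str, threshold: float = 0.5) -> bool:
    """Flag a response whose character mix differs sharply from a known-good page."""
    distance = histogram_distance(char_histogram(response), char_histogram(reference))
    return distance > threshold


# Usage: a normal page differs slightly from the reference; a stack trace differs a lot.
reference = "<html><body>Your cart has 3 items</body></html>"
good = "<html><body>Your cart has 5 items</body></html>"
bad = "java.lang.NullPointerException at Cart.java:42"
```

Because the check looks only at character frequencies, it needs no knowledge of the application, which is what makes it app-generic. The threshold would have to be tuned per site.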
9
From JBoss to JAGR
[Same diagram as the previous slide, with two more monitors labeled]
• ExcMon: detects Java exceptions in the application & app server
• PPMon: detects “anomalous” behaviors
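The recovery options named on these slides (restart single EJBs, redeploy apps, or restart the whole app server) suggest an escalation from cheapest to most disruptive. The sketch below assumes that ordering and also shows one way a fault propagation map could be accumulated from observed failures. The slides do not specify either algorithm, and the function names, action keys, and EJB names are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple


def build_fault_map(observations: List[Tuple[str, Set[str]]]) -> Dict[str, Set[str]]:
    """Accumulate a map from a faulty EJB to the other EJBs seen failing with it.

    Each observation is (EJB where a fault occurred, EJBs then observed failing).
    """
    fmap: Dict[str, Set[str]] = {}
    for source, failed in observations:
        fmap.setdefault(source, set()).update(failed - {source})
    return fmap


# Assumed escalation order, cheapest action first.
ESCALATION = ("restart_ejb", "redeploy_app", "restart_server")


def recover(
    suspect: str,
    is_healthy: Callable[[], bool],
    actions: Dict[str, Callable[[str], None]],
) -> List[str]:
    """Apply recovery actions in order, stopping as soon as the system is healthy."""
    taken = []
    for level in ESCALATION:
        actions[level](suspect)
        taken.append(level)
        if is_healthy():
            break
    return taken


# Usage: a simulated failure that only a redeploy cures.
fmap = build_fault_map(
    [("CartEJB", {"CartEJB", "OrderEJB"}), ("CartEJB", {"BillingEJB"})]
)
state = {"healthy": False}
actions = {
    "restart_ejb": lambda ejb: None,  # no effect in this simulation
    "redeploy_app": lambda ejb: state.update(healthy=True),
    "restart_server": lambda ejb: state.update(healthy=True),
}
steps = recover("CartEJB", lambda: state["healthy"], actions)
```

Trying the narrowest restart first keeps most of the application available while a single component recovers. The full server restart is reached only if the cheaper steps fail.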
10