External Advisory Board Annual Meeting January 13, 2006

System Management in
Challenged Networks
CENS Seminar – November 17th, 2006
Martin Lukac *
Lewis Girod * †
Deborah Estrin *
* UCLA CENS - † MIT CSAIL
Outline
• Meso American Subduction Experiment
(MASE): A Challenged Network
• Data Delivery
• System Management
• The Future
Seismic Deployment Application Requirements
50 standalone Caltech sites
62 wirelessly connected UCLA sites
• Extensive: 500 Km from Acapulco through Mexico
City to Tampico
• Dense: 1 sensor every 5-10 Km
• High bandwidth: Data acquisition rate: three 24-bit
channels at 100 Hz each
• Online and Reliable: Semi real-time (on the order
of days), reliable data delivery to UCLA for
analysis
• Online system management
– Query state, change configuration, update binaries
– Cannot interfere with data delivery
• Application driven topology: application
determines sensor placement
– Infrastructure does not (Can’t rely on pre-existing cell
or power infrastructure)
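As a rough check on the bandwidth requirement above, the raw acquisition rate can be worked out directly. This is a back-of-the-envelope sketch: it assumes 3 bytes per 24-bit sample and ignores packet framing, timestamps, and any compression the digitizer may apply. The function name is illustrative, not taken from the deployment software.

```python
def raw_rate_bytes_per_sec(channels=3, bits_per_sample=24, sample_hz=100):
    """Raw sample payload rate; framing and compression are ignored (assumption)."""
    return channels * (bits_per_sample // 8) * sample_hz

rate = raw_rate_bytes_per_sec()
per_hour_mb = rate * 3600 / 1e6   # decimal megabytes
per_day_mb = rate * 86400 / 1e6
print(f"{rate} B/s, {per_hour_mb:.2f} MB/hour, {per_day_mb:.2f} MB/day per station")
```

Under these assumptions each station produces 900 B/s of raw samples, or about 78 MB per day before any compression.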
MASE: Given these
requirements, we
deployed solar powered
seismic stations equipped
with 802.11b
MASE 13 Node Cuernavaca Line
• Network topology does not reflect the mostly linear physical topology
• Routing and other services cannot use physical topology
[Figure: network topology of nodes %18 - A, %152 - B, %69 - C, %77 - D, %107 - E, %42 - F, %81 - G, %202 - H, %76 - I, %106 - J, %95 - K, %53 - L, %157 - M]
Data paths
[Figure: data paths from nodes B through M to node A. A is the sink, with a direct Internet connection.]
How challenged is the MASE network?
• Frequent unpredictable
disconnections
– Rainy season: sites flood (some 24x7),
trees grow
– Wind: misaligned antennas
– Equipment malfunction: amps burn,
voltage regulators break
• Poor and unstable links
– Connectivity secondary concern for site
selection
– Stretched links highly susceptible to
weather and environment
• Human effort is a critical resource
– Installation, maintenance, protection
Networking support needed for both
data acquisition and system management
• Data delivery – Bandwidth driven
– Bandwidth: 20-40 MB per day per station
– Latency: get the data eventually, but reliably
– Many to one routing
• System Management – Latency driven
– Bandwidth: usually less than tens of KB
– Latency: as fast as possible
– One to all routing and back
Well-known limitations of existing techniques
• Data delivery and system management
techniques designed for wired or always-on
wireless networks do not work well
– Typical tools use TCP to create and maintain an end to
end session to deliver a stream of data over multiple
hops
– These are “online applications” which expect reliable
links with low latencies
• Patterns of poor links, disconnections, and
disruptions
– Difficult to obtain and maintain end-to-end connections
– Intermittent end-to-end connections insufficient to
achieve necessary bandwidth and latency
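The point about end-to-end connections can be illustrated with a small calculation. This sketch assumes every link is up independently with the same probability, which is a simplification; real outages on a line like this are likely correlated (weather, for example). The function name and the 80% figure are hypothetical.

```python
def path_up_probability(link_up_prob, hops):
    """Chance that every link on the path is up at the same moment.

    Assumes links fail independently with equal probability (simplification).
    """
    return link_up_prob ** hops

# Hypothetical link availability of 80%.
for hops in (1, 3, 6):
    print(hops, round(path_up_probability(0.8, hops), 3))
```

With these illustrative numbers, a 6-hop path is fully connected only about a quarter of the time even though each individual link is up 80% of the time, which is why a session that needs the whole path at once struggles.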
Our Contributions
• Real world application and deployment of
Delay Tolerant Networking (DTN) techniques
for data delivery
• Disruption Tolerant Shell (DTS): a tool for
system management on challenged networks
that performs better than traditional tools
Summary
• MASE: A Challenged Network
–Poor and erratic links
–Frequent unpredictable disruptions
• Data Delivery
• System Management
• The Future
Data Delivery using DTN Techniques
• Buffer data into hour long bundles (1-3 MB)
• Deliberate one hop bundle transfer
• Path to sink determined by best ETX
• Improvement over end-to-end
– Not affected by path disconnections
– Keeps retrying on single link instead of full path
– Continual ‘progress’ being made towards sink
– More efficient use of bandwidth in face of
disconnections and bottlenecks
[Figure: end-to-end vs. hop-by-hop transfer over nodes A, B, C, F, with links marked X]
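Choosing the path to the sink by best ETX can be sketched as a shortest-path search over link ETX values. This is a minimal illustration, not the deployed routing code: it assumes symmetric links and additive path ETX, and the link values below are made up.

```python
import heapq

def best_next_hops(links, sink):
    """Return {node: (next_hop, total_etx)} for the lowest-ETX path to the sink.

    links maps (u, v) -> ETX of that link, treated as symmetric (assumption).
    """
    graph = {}
    for (u, v), etx in links.items():
        graph.setdefault(u, []).append((v, etx))
        graph.setdefault(v, []).append((u, etx))

    best = {sink: (None, 0.0)}
    heap = [(0.0, sink)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best[node][1]:
            continue  # stale heap entry, a cheaper route was already found
        for neighbor, etx in graph.get(node, []):
            new_cost = cost + etx
            if neighbor not in best or new_cost < best[neighbor][1]:
                # The search runs outward from the sink, so the neighbor's
                # next hop toward the sink is the current node.
                best[neighbor] = (node, new_cost)
                heapq.heappush(heap, (new_cost, neighbor))
    return best

# Hypothetical ETX values: a line A-B-C-F plus one poor long link B-F.
links = {("A", "B"): 1.2, ("B", "C"): 1.5, ("C", "F"): 1.1, ("B", "F"): 4.0}
routes = best_next_hops(links, sink="A")
print(routes["F"])
```

Each node only needs its own entry: it hands a bundle to its next hop and keeps retrying on that single link, so a break elsewhere on the path does not undo progress already made.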
Upcoming Features
• Currently piggyback data movement log
with actual data
– No global time stamping of log events
• Want coarse grained global time (one
second)
– Will be able to recreate ‘movie’ of file
movement for entire network
– Can help spot network problems and
bottlenecks
• Upload data to SensorBase.org
– Makes it easy to visualize and browse data
collection status
– RSS feed can provide access to anyone who
wants to monitor problems or generic status of
network
Data Acknowledgement
• Nodes keep their own bundles until ACK’ed by sink
– Many ways of doing ACK’s
• First try for ACK implementation worked
– Push bundle ID into StateSync (disseminates information to
all the nodes in the network)
– But… usage model not quite right… too many entries, too
much churn for StateSync (can explain better later)
• Second try
– Use ‘file dissemination’ feature of DTS to distribute ACK list
once a day
– Use DTS to remove list once we know all nodes have file
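The "keep until ACK'ed" rule can be sketched as a small store that drops bundles only when their IDs appear in a disseminated ACK list. The class, method names, and bundle ID format here are hypothetical; the slides do not describe the actual on-disk layout or ACK list format.

```python
class BundleStore:
    """Holds a node's own bundles until the sink acknowledges them."""

    def __init__(self):
        self.pending = {}  # bundle_id -> payload bytes

    def add(self, bundle_id, payload):
        self.pending[bundle_id] = payload

    def apply_ack_list(self, acked_ids):
        """Delete bundles named in the ACK list; return the IDs that were freed."""
        freed = [b for b in acked_ids if b in self.pending]
        for bundle_id in freed:
            del self.pending[bundle_id]
        return freed

store = BundleStore()
for hour in range(3):
    store.add(f"siteB-hour{hour}", b"...")  # placeholder payloads

# A daily ACK list may name bundles from other sites too; those are ignored.
freed = store.apply_ack_list(["siteB-hour0", "siteB-hour1", "siteC-hour0"])
print(freed, sorted(store.pending))
```

Because the list is applied idempotently, receiving the same ACK list twice, or late, is harmless, which suits a once-a-day file push.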
Summary
• MASE: A Challenged Network
– Poor and erratic links
– Frequent unpredictable disruptions
• DTN Style Data Delivery
– Resilient to path disconnections
– Efficient use of bandwidth
• System Management
• The Future
System Management
• Existing management tool:
remote shell (ssh)
• Modified management tool:
Disruption Tolerant Shell
– Asynchronous remote shell to all
nodes in network simultaneously
– Provides node management
capabilities when end-to-end
connections are unavailable or fail
– Ensures that commands will
succeed: as long as there is
eventually a connection between a
node and any other node that
already has the command
[Figure: example commands (df -h; ls /opt/dts/file_mover | wc) and their responses moving among nodes A, B, C, D, E, F]
Extra Fun Features of DTS
• Guaranteed in order execution from
source node
• Reboot and crash safe
• Implicit feed back on nodes and
links: spot bottlenecks, dead nodes
• Execute a command on individual
nodes
• Push a file to all nodes
– Distribute new script or component
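In-order execution per source node can be sketched with per-source sequence numbers: a command that arrives early is held until its predecessors have run. This is an illustrative model with hypothetical names; it only records what would be executed, and it keeps state in memory, so the reboot and crash safety mentioned above would need persistent storage that is not shown.

```python
class InOrderExecutor:
    """Runs each source's commands strictly by sequence number."""

    def __init__(self):
        self.next_seq = {}   # source -> next sequence number to run
        self.waiting = {}    # source -> {seq: command} received early
        self.executed = []   # (source, seq, command) in execution order

    def receive(self, source, seq, command):
        if seq < self.next_seq.get(source, 1):
            return  # duplicate of a command that already ran
        self.waiting.setdefault(source, {})[seq] = command
        self._drain(source)

    def _drain(self, source):
        pending = self.waiting[source]
        seq = self.next_seq.get(source, 1)
        while seq in pending:
            # Stand-in for actually running the command on the node.
            self.executed.append((source, seq, pending.pop(seq)))
            seq += 1
        self.next_seq[source] = seq

node = InOrderExecutor()
node.receive("A", 2, "ls /opt/dts/file_mover | wc")  # arrives early, held
node.receive("A", 1, "df -h")                         # unblocks both
node.receive("A", 1, "df -h")                         # duplicate, ignored
print(node.executed)
```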
Upcoming Features
• Web interface
– Command line interface is nice
for me
• Takes a bit of getting used to
– Web interface more intuitive for
asynchronous model
• Constant feeds of frequently
executed commands
– Disk space, file counts,
Q330/Guralp status, link quality
• SensorBase.org
– Accountability log: load all
commands and responses and
metadata for those
– DTS analysis and implicit network
feedback: just point and click
Reliable State Synchronization
• StateSync: reliable and efficient publish-subscribe mechanism
• Implements a broadcast dissemination protocol
– Published data is scoped
– DTS publishes commands and responses one hop
• Works well for applications that require:
– Reliable delivery
– Have a few Kbytes of data to share
– Data has lifetime that is long compared to system
latency requirements
– Suitable for DTN since it does not use end-to-end
connections
[Figure: nodes A, B, C each PUBLISH and SYNCHRONIZE their Commands and Responses tables with their neighbors]
DTS latency results
• Compare latency of DTS to
parallel ssh
• DTS is faster 90% of the time,
comparable the rest of the time
• DTS reaches 100% of nodes
– ssh requires retries from the source
node
• Latency can vary by day, but
DTS always faster or
comparable to ssh
What makes DTS better than ssh?
• StateSync data model: tables of key value
pairs
– DTS has a command table and response
table
• Each node republishes its command and
response tables one hop
• Logging mechanism
– Do not republish whole table
– Only send changes to tables: small amount of
information
– More efficient use of bandwidth in face of
disconnections
• Retransmission protocol
– Keeps retrying on individual links
– Not affected by path disconnections
– No overhead of creating and maintaining end-to-end connection
[Figure: nodes A and B each holding a table with Cmd A-1 and Resp A-1-A, Resp A-1-B, Resp A-1-C]
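The table model above can be sketched as nodes that hold a command table and a response table and exchange only the missing entries with a one-hop neighbor whenever a link happens to be up. This is a toy model, not the StateSync API: the class, the method names, and the way a response is generated are all hypothetical.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.commands = {}   # command_id -> command text
        self.responses = {}  # (command_id, node_name) -> response text

    def issue(self, command_id, text):
        self.commands[command_id] = text
        self._respond(command_id)

    def _respond(self, command_id):
        # Stand-in for running the command locally and capturing its output.
        self.responses[(command_id, self.name)] = f"done on {self.name}"

    def sync_from(self, neighbor):
        """Copy only entries this node lacks (the delta); return how many."""
        new_cmds = {k: v for k, v in neighbor.commands.items() if k not in self.commands}
        new_resps = {k: v for k, v in neighbor.responses.items() if k not in self.responses}
        self.commands.update(new_cmds)
        self.responses.update(new_resps)
        for command_id in new_cmds:
            self._respond(command_id)
        return len(new_cmds) + len(new_resps)

def link_up(a, b):
    """Exchange deltas in both directions until neither side has news."""
    moved = 0
    while True:
        n = a.sync_from(b) + b.sync_from(a)
        if n == 0:
            return moved
        moved += n

a, b, c = Node("A"), Node("B"), Node("C")
a.issue("A-1", "df -h")
link_up(a, b)   # A-B is up while B-C is down
link_up(b, c)   # later B-C comes up while A-B is down
link_up(a, b)   # A-B returns; C's response reaches A through B
print(sorted(a.responses))
```

At no point does a path from A to C exist end to end, yet A ends up with responses from all three nodes, and each exchange carries only the entries the other side is missing.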
Future of StateSync
• StateSync allows data to be published N hops
– When publishing N hops, delivery is not end to end, but the data path (the flow) is expected to
be maintained with refresh beacons
– If refreshes from the source or a node in the flow stop, StateSync will not
propagate information
– Not ideal for frequent disconnections
• DTS publishes data one hop
– Gets around the problem by republishing another node's data as its own
– StateSync only publishes one hop
• Tweaks
– Allow flows to be propagated even when no refresh from source or
node along data path
– Tunable latency parameters
– Report metrics about itself
• DTS can then publish data N hops
– Lowers RAM usage, lowers number of packets
Site Installation
Mexico Xyoli Pérez-Campos, Mario Islas Herrera, Oscar Martínez Susano, Jorge Soto, Aida
Quezada Reyes, Arturo Iglesias, Lizbeth Espejo, Luis Antonio Placencia Gómez, Luis Edgar
Rodriguez, Fernando Greene
USA Paul Davis, Allen Husker, Igor Stubailo, Richard Guy, Sam Irving, Martin Lukac,
Alma Quezada, Steve Skinner, Irving Flores
Our Contributions
• Real world application and deployment of
Delay Tolerant Networking (DTN) techniques
for data delivery
• Disruption Tolerant Shell (DTS): a tool for
system management on challenged networks
that performs better than traditional tools
Summary
• MASE: A Challenged Network
– Poor and erratic links
– Frequent unpredictable disruptions
• DTN Style Data Delivery
– Resilient to path disconnections
– Efficient use of bandwidth
• System Management
– DTS viable tool for system management for
challenged networks
• The Future
What's Next?
• Have a tool that works
– Understand conceptually why it works better
• We have a high level analysis: per link
bandwidth
• Network is being pulled out in February
Work in Progress
• Need better network characterization
– Long-Distance 802.11b Links: Performance Measurements and
Experience, K. Chebrolu, B. Raman, S. Sen, IIT Kanpur, Mobicom 2006
– Use their driver to collect per packet: received signal strength,
silence value, MAC packet type & subtype, CRC check
succeeded or not, MAC address information, MAC sequence
number information
– Is our network different than theirs? Antennas, chipsets are
the same. Our network is not always way up high… and do
not have good link quality all the time.
• Coordinated IP level dumps on entire network
– Can’t stop data flow
– Synchronize dumps between nodes
– Coordinate with driver information
– How do the long links affect the transfers?
– Huge hidden terminal problem, does rts/cts seem to help?
Vinayak analyzed received signal strength (RSS) for a single source-destination
pair in the UNAM line.
Max RSS: -46dBm (~83% of data), Min RSS: -81dBm (~10% of data)
Difference of 35dB
Max/Min for IIT-Kanpur's: -70dBm / -90dBm
Difference of 20dB
Next do this on Cuernavaca line. Maybe it will have higher variation than that
of UNAM.
High variation might be from inter-link interference since RTS-CTS is off.
See what RTS-CTS does.
If still high link variation, then Mexico network is intrinsically different from
that in India. May be our network is in between Boston's urban Roofnet
and Kanpur's rural network?
New Applications
• DTS and DTN ideas/techniques can (must?)
be applied to two new CENS applications
– GeoNet
– SHM (Structure Health Monitoring)
GeoNet: Rapidly Deployable Challenged Network
• Platform to support high data rate rapidly
deployed large-scale WSN
– Deploy 100-1000 nodes after event at a
separation of 0.5-1Km
• Software tools for rapid deployment
– Must make real time decision about sensor
location vs. network connectivity tradeoff
– Need as much feedback from network as
possible
• Power efficient platform such as LEAP
needs appropriate software architecture.
• Network time synchronization when no
GPS available
• Data delivery & system management
• Take advantage of dual radios?
[Figure: node hardware labels: AENSbox, GPS, Geophone, 802.15.4, WIFI]
SHM
• SHM framework to improve safety and reliability
of aerospace, civil and mechanical infrastructure
by detecting damage before it reaches a critical
state
• Initially targeting tall buildings
• Still a challenged network
– Building structure (walls, ceilings), people, other
networks, ‘stuff’
SHM Framework
[Figure: SHM framework diagram. Labels: Embedded Network; Structural System; Sensors; SHMBox; Network Health; Damage State; FEM Model; Engineering Demand Parameters; Fragility Curves; Probability; Monitored Zones of Interest; Event Detection; Damage Measure]
Thank you!
[email protected]
Demo!
Thanks to Igor and Derek for all the pictures and diagrams!
Teotihuacan, 2006
MASE Wireless Seismic Station
15 dBi YAGI or 24 dBi Parabolic 2.4GHz antenna
70 watt solar panel, GPS
mast and guy wires
Quanterra Q330 24-bit digitizer
sensor controller
2.4GHz amp
car battery
CDCC (CENS Data
Communication Controller)
Guralp 3T seismometer
Science!
Following slides prepared by Rob Clayton (Caltech)
and Igor Stubailo (UCLA – CENS)
The Middle America Subduction
Experiment (MASE).
Why Mexico? Slab detachment theory.
•A subduction zone is an
area on Earth where two
tectonic plates meet and
move towards one another,
with one sliding underneath
the other and moving down
into the mantle, at a speed
of several inches per year.
•Typically, an oceanic plate
slides underneath a
continental plate, and this
often creates a zone with
many volcanoes and
earthquakes.
Ferrari, 2004, Geology
Similarities of Mexico City and Los Angeles locations
•LA and Mexico City are major
centers of commerce which sit
upon compliant sedimentary
basins.
•Both are subject to damaging
earthquakes, which excite resonant
shaking in the basins.
Great potential of high station density
•Achieve 20 times better resolution than before.
•Provide visualization of the upper mantle and the subduction process, coast to
coast across Mexico.
•The data collected is very valuable to scientists in seismology, geodesy,
geochemistry, geology, computational geodynamics, geophysics, and others
Russian Event (Kamchatka) : April 20, 2006, M=7.7
First results: detect flat slab with receiver functions
Rob Clayton, Caltech, 2006