NC: A Verifiable, Executable Coordination Paradigm for

Download Report

Transcript NC: A Verifiable, Executable Coordination Paradigm for

NC: A Verifiable, Executable
Coordination Paradigm for
Real-Time Communication
Sebastian Fischmeister
[email protected]
Department of Computer and Information Science
University of Pennsylvania
1
Real-Time Systems
• Right value at the wrong time can cause an error!
 How to guarantee timeliness in the system?
No RT
Computer
simulation
Soft RT
User
interface
Internet
video, audio
Cruise
control
Hard RT
Tele
communication
Flight
control
Electronic
engine
2
Distributed RT Systems
• Multiple nodes together realize one real-time application.
 How to bound communication delays in the system?
Shared
medium
Brake pedal
Fuel tank
Tires
Brake
Board
computer
Collisions, retransmissions, delays!
 need for a coordination model for controlling access.
3
Time Division Multiple Access
• TDMA is a method for realizing collision-free access to a shared medium.
• Partition access into time slices (=slots) and rounds.
• Assigns slots to nodes for exclusive access using some algorithm.
• Example: Round robin with four nodes
Slot
N1
N2
N3
N4
t
Round
Slot #
1
2
3
4
Sender
N1
N2
N3
N4
Receiver
…
…
…
…
4
Flavors of TDMA
• Great for safety-critical systems, but too inflexible.
statically allocated
One static phase:
dynamic
Only dynamic:
Static + dynamic:
Static, dynamic, free:
static
static
dynamic
dynamic
fre
e
• Still:
– Decisions are made at the beginning of the round/phase.
– Phase order stays the same.
– Typically independent of the software.
5
Goals of Network Code
• Goals:
– Allow close interaction between the application and the medium
access control.
– Allow developers to program their own access control.
– Allow decisions during the communication round.
– Should be apt for safety-critical hard real-time systems.
6
Network Code
[ISCC05]
• Executable programmable coordination model for real-time
communication with a domain specific language.
– Split application into: program logic + communication logic
– Communication logic is programmed separately (=Network Code program)
– An interpreter (=Network Code machine) executes the program at run time
Program logic
NC program
Node1
Node2
Application
set X
Application
get X
Value space
get X
Value space
set X
NCM
NCM
Medium
tx M/ch
7
Research Results/Status
• Theory:
– Underlying concept and idea of Network Code [ISCC05]
– Verification of Network Code programs [RTAS06]
• Given a set of programs, are they safe to run concurrently?
– Metrics framework for measuring programs [EmSoft06]
• Overhead, throughput, avg. cycle length, avg. waiting time
– Composition algorithms for programs [submitted]
• Given several applications, how can they run together?
• Simulation:
– In round decisions increase throughput and performance. [EmSoft06]
– Gains are kept despite overhead introduced during composition. [submitted]
• Implementation:
– Implementation RTLinuxPro[RTAS06,TC07]+Ethernet, PIC18F+CAN , FPGA
8
How does it work?
9
Example: Transmit X in M
•
•
•
Slot length is 10t.
Node 1 transmits variable X every 20t in message M on channel ch.
Node 2 receives this message and stores it Y.
0
Sender
10
20
t
L0: create(M, X)
send(ch, M, _)
wait(20)
goto(L0)
Receiver
La: wait(10)
receive(ch, Y)
wait(10)
goto(La)
10
Complex Example
• Transmit a value with temporal replication tolerating 2 failures
– Sensor values A to E. (2n+1)
– Decide at the end of the round what the sensor value is.
– Take the majority vote (e.g., ‘4,4,3,2,4’ is 4)
Standard:
A
B
C
D
E
t
Vote decisive? Skip ahead!
Improved version:
A
B
C
D
E
t
11
Complex Example Code
A
B
C
D
E
t
Node A:
La: create(MA, A)
send(ch, MA)
wait(20)
receive(ch, B)
wait(10)
receive(ch, C)
if(voteDecisive, La)
wait(10)
receive(ch, D)
if(voteDecisive, La)
wait(10)
receive(ch, E)
goto(La)
12
System Overview
Application
layer
Network code
program
RT application
set/get
Dynamic
TDMA
layer
Queue
Variable space
Soft queue
Hard queue
Message transceiver
Queue
Network code
machine
Value det.
layer
set/get
Coordination
Access control
Communication medium
13
Data Flow (Hard Values)
Application
layer
Network code
program
RT application
set/get
Dynamic
TDMA
layer
Queue
Variable space
Soft queue
Hard queue
Message transceiver
Queue
Network code
machine
Value det.
layer
set/get
Coordination
Access control
Communication medium
14
Data Flow (Soft Values)
Application
layer
Network code
program
RT application
set/get
Dynamic
TDMA
layer
Queue
Variable space
Soft queue
Hard queue
Message transceiver
Queue
Network code
machine
Value det.
layer
set/get
Coordination
Access control
Communication medium
15
Verification & Checking
[RTAS06]
• What if I have a bug in my Network Code program?
Prgm 1:
La:
wait(20)
send(_,_,_)
if(guardX, La)
wait(10)
send(_,_,_)
goto(La)
Prgm 2:
Lb: wait(10)
send(_,_,_)
wait(30)
goto(Lb)
 a verification framework with model checking
• Use operational semantics of instructions to model system state.
• Convert program instructions to transformations of the system state.
• Check properties are reachability questions using model checking.
16
Example Properties
• Integrity checks
–
–
–
–
• Behavioral checks
All guards present
Valid jump addresses
Successor address
Consequential code
– Non-empty send
– Local message lifecycle
• Distributed checks
–
–
–
–
Collision-free communication
Schedulability
Non-empty receive
Message reception within timeout
17
Metrics Framework
[EmSoft06]
• What is the overhead? What is the throughput? What is the
cycle?
– Convert the Network Code Program into a timed tree communication
schedule.
– Compute metrics on the tree schedule using DFS & model checking.
• Timed tree communication schedule:
– Nodes encode messages releases and delays
– Transitions encode mutually excl. decisions
(exhaustive, deterministic)
– Leaves cause a reset
(X,3)
!g1
(Y,4)
g1
(Z,2)
18
Example Metrics
A
A
B
C
D
g1
E
g2
B
t
• Longest cycle length = longest branch
• Guard overhead
• End-to-end constraints for messages
g1
C
-
!g1
g2
-
D
!g2
E
• Probabilistic tree schedules:
– Average cycle length = average branch length
– Average waiting times
19
Is it better?
20
Inverted Pendulum
• Pendulum sitting on a cart with one degree of freedom.
• Measure the angle w.r.t. to straight up.
• The cart moves horizontally to balance the pendulum.
• Temporal replication to tolerate two faults:
– 5 measurement units (A to E)
– Report measurements to the controller.
– Controller updates the kart at the round.
21
Simulation Results
Standard schedule:
A
B
C
D
E
t
Network code:
Vote decisive? Skip ahead!
A
B
C
D
E
t
Network Code is better*
22
Simulation Results
Mean time to failure
With an increase in the mean time to failure, standard TDMA stays the same
Network Code utilize this increase in reliability.
23
Implementation Measurements
• Given the shortest program, what is the fastest execution rate?
– Output queue must not overflow.
– Input queue must not under-/overflow.
– Slot structure must not be violated.
• RTLinuxPro + direct link (100 MBit/s)
– 10% latency increase due to computation time
– 2.96MByte/s with a 99% guarantee of a successful transmission
– 2.65MByte/s with 99.999% guarantee
• Interrupts preempt instructions, cause jitter, cause delays.
– Send() instruction: 372ns (39.47%), 733ns (99%), 19090ns (99.999%)
=> FPGA! (10us)
24
Conclusions
• Communication behavior differs among applications;
custom protocols/schedules can:
– Significantly improve the application’s performance
– Incorporate/guarantee new properties at the communication layer
• Network Code allows programming custom TT communication behavior.
– Programs are checkable, guarantee properties in the distributed
system.
– Programs are measurable with tree schedules.
– Can run real-time and non-real-time traffic on the same network.
• Future work: multi-resource tree schedules and resource DSLs,
program generation, development methodology
25
Other/Past Research
• Logical execution model has deficits when considering communication.
– Task chains[DASC04]
– Case study[RTAS05]
• How to deterministically reconfigure RT software?
– Use virtualization and separate behavior & functionality[ECRTS05]
• Are mobile agents systems secure? (year 2000)
– No![MA01]
• How to build efficient proactive location-based systems?
– Decentralized architecture[LNAI]
– Use notification service[PERCOM03]
26
Outlook
27
DSLs & Interfaces
Current Approach
close()
Resources
open()
DSL1
send()
x++
X()
read()
malloc()
Software
Component
DSL2
DSL3
receive()
y+=5%x
write()
spawn()
fun()
Network Code
Medium
• What are good abstractions for resources, what are good DSLs?
• What makes the interface of a software component? Is the DSL the interface?
• How can we generate code for a component?
• How do we provide hardware support for DSLs? (e.g., programmable memory)
28