More on ONOS
Open Network OS by ON.LAB
Current ONOS Architecture Tiers
• subsystem components
Consistency Issues in ONOS
• Switch mastership
• Network graph
Key Performance Requirements
High throughput:
• ~500K-1M path setups / second
• ~1-2M network state ops / second
High volume:
• ~10 GB of network state data in the global network view / state shared by apps and ONOS
Difficult challenge! high throughput | low latency | consistency | high availability
ONOS Prototype II
Prototype 2: focus on improving performance, esp. event latency
• RAMCloud for the data store: Blueprints APIs directly on top
• Optimized data model
  o a table for each type of network object (switches, links, flows, …)
  o minimize # of references between elements: most updates require 1 read/write
• (In-memory) topology cache at each ONOS instance
• RAMCloud: low-latency, distributed key-value store w/ 15-30 us read/write
  o eventually consistent: updates may be received at different times & in different orders
  o atomicity under schema integrity: no reads/writes by apps in the midst of an update
• Event notifications using Hazelcast, a pub-sub system
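A minimal sketch of the event-notification idea using Hazelcast's topic API, assuming Hazelcast 3.x import paths; the topic name and the string payload are illustrative, not the actual prototype code.

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.ITopic;

public class TopologyEventBus {
    public static void main(String[] args) {
        // Join (or start) the Hazelcast cluster shared by the ONOS instances.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // "topology-events" is an illustrative topic name.
        ITopic<String> events = hz.getTopic("topology-events");

        // Every instance subscribes to hear about events observed elsewhere.
        events.addMessageListener(msg ->
                System.out.println("topology event: " + msg.getMessageObject()));

        // The instance that observed the change publishes it to all subscribers.
        events.publish("LINK_DOWN of:0000000000000001/2");
    }
}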
Current ONOS Architecture Tiers
Northbound abstraction: network graph, application intents
Core: distributed, protocol independent
Southbound abstraction: generalized OpenFlow, pluggable & extensible
Tiers (top to bottom):
• Apps
• Northbound - Application Intent Framework (policy enforcement, conflict resolution)
• Distributed Core (scalability, availability, performance, persistence)
• Southbound (discover, observe, program, configure)
• Protocols: OpenFlow, NetConf, ...
Distributed Architecture for Fault Tolerance
Layers (top to bottom):
• Apps
• NB Core API
• Distributed Core (state management, notifications, high-availability & scale-out)
• SB Core API
• Adapters
• Protocols
Modularity Objectives
● Increase architectural coherence, testability and maintainability
  o establish tiers with crisply defined boundaries and responsibilities
  o set up the code-base to follow and enforce the tiers
● Facilitate extensibility and customization by partners and users
  o unit of replacement is a module
● Avoid speciation of the ONOS code-base
  o APIs set up to encourage extensibility and community code contributions
● Preempt code entanglement, i.e. cyclic dependencies
  o reasonably small modules serve as firewalls against cycles
● Facilitate pluggable southbound
Performance Objectives
● Throughput of proactive provisioning actions
  o path flow provisioning
  o global optimization or rebalancing of existing path flows
● Latency of responses to topology changes
  o path repair in wake of link or device failures
● Throughput of distributing and aggregating state
  o batching, caching, parallelism
  o dependency reduction
● Controller vs. device responsibilities
  o defer to devices to do what they do best, e.g. low-latency reactivity, backup paths
ONOS Services & Subsystems
• Device Subsystem - Manages the inventory of infrastructure devices.
• Link Subsystem - Manages the inventory of infrastructure links.
• Host Subsystem - Manages the inventory of end-station hosts and their locations on the network.
• Topology Subsystem - Manages time-ordered snapshots of network graph views.
• PathService - Computes/finds paths between infrastructure devices or between end-station hosts using the most recent topology graph snapshot.
• FlowRule Subsystem - Manages the inventory of the match/action flow rules installed on infrastructure devices and provides flow metrics.
• Packet Subsystem - Allows applications to listen for data packets received from network devices and to emit data packets out onto the network via one or more network devices.
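A compact, hypothetical Java mirror of a few of the services listed above; the type and method names are illustrative and deliberately simplified, not the actual ONOS interfaces.

import java.util.List;
import java.util.Set;

// Minimal stand-in types so the sketch is self-contained.
final class DeviceId {
    final String uri;
    DeviceId(String uri) { this.uri = uri; }
}
final class Path {
    final List<DeviceId> hops;
    Path(List<DeviceId> hops) { this.hops = hops; }
}

// Device Subsystem: inventory of infrastructure devices.
interface DeviceService {
    Set<DeviceId> getDevices();
}

// Path Service: paths computed on the most recent topology snapshot.
interface PathService {
    Set<Path> getPaths(DeviceId src, DeviceId dst);
}

// FlowRule Subsystem: match/action rules installed on devices.
interface FlowRuleService {
    void applyRule(DeviceId device, String match, String action);
}

// An application would query PathService for a path and then install
// flow rules hop by hop through FlowRuleService.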
ONOS Core Subsystem Structure
(Diagram: apps and components issue queries & commands to each subsystem's Service and AdminService interfaces and register Listeners (add & remove) to be notified of events. Each subsystem is implemented by a Manager Component backed by a Store, and the Stores sync & persist state across ONOS instances. Below the manager, Adapter Components register & unregister with the AdapterRegistry, report sensed events through the AdapterService, receive commands from the core, and talk to devices via the Protocols layer.)
ONOS Modules
● Well-defined relationships
● Basis for customization
● Avoid cyclic dependencies
Modules include: onlab-util-misc, onlab-util-osgi, onlab-util-rest, onos-api, onos-of-api, onos-core-store, onos-of-ctl, onos-core-net, onos-of-adapter-*, ...
ONOS Southbound
● Attempt to be as generic as possible
● Enable partners/contributors to submit their own device/protocol specific providers
● Providers should be stateless; state may be maintained for optimization but should not be relied upon
ONOS Southbound
● ONOS supports multiple southbound protocols, enabling a transition to true SDN.
● Adapters provide descriptions of dataplane elements to the core; the core utilizes this information.
● Adapters hide protocol complexity from ONOS.
Descriptions
● Serve as scratch pads to pass information to the core
● Descriptions are immutable and extremely short lived
● Descriptions contain a URI for the object they are describing
● The URI also encodes the Adapter the device is linked to
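A minimal sketch of an immutable description carrying the URI of the element it describes; the class name and fields are hypothetical stand-ins, not the ONOS description classes.

import java.net.URI;

// Hypothetical immutable description handed from an adapter to the core.
// The URI names the described element and its scheme identifies the adapter
// it came from (e.g. "of:..." for an OpenFlow-discovered device).
final class DeviceDescription {
    private final URI uri;          // identity of the described element
    private final String hwVersion; // attributes sensed by the adapter
    private final String swVersion;

    DeviceDescription(URI uri, String hwVersion, String swVersion) {
        this.uri = uri;
        this.hwVersion = hwVersion;
        this.swVersion = swVersion;
    }

    URI uri()    { return uri; }
    String hw()  { return hwVersion; }
    String sw()  { return swVersion; }
}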
Adapter Patterns
1. Adapter registers with the core
   a. Core returns an AdapterService bound to the Adapter
2. Adapter uses the AdapterService to notify the core of new events (device connected, packet-in) via Descriptions (this is where the magic happens)
3. Core can use the Adapter to issue commands to elements under the Adapter's control
4. Eventually, the Adapter unregisters itself; the core will invalidate the AdapterService
(Diagram: the Adapter Component registers & unregisters with the AdapterRegistry (1, 4), reports sensed events to the Manager Component via the AdapterService (2), and receives commands from it (3), all on top of the Protocols layer.)
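A minimal sketch of the four-step lifecycle above; the interface and method names are illustrative, not the ONOS provider APIs.

// Hypothetical adapter lifecycle, mirroring steps 1-4 above.
interface AdapterService {
    void deviceConnected(String deviceUri);              // step 2: adapter -> core, via a Description
    void packetIn(String deviceUri, byte[] payload);
}

interface Adapter {
    void sendCommand(String deviceUri, String command);  // step 3: core -> adapter
}

interface AdapterRegistry {
    AdapterService register(Adapter adapter);            // step 1: returns a service bound to this adapter
    void unregister(Adapter adapter);                     // step 4: core invalidates the AdapterService
}

class OpenFlowAdapter implements Adapter {
    private AdapterService service;

    void start(AdapterRegistry registry) {
        service = registry.register(this);                 // 1
        service.deviceConnected("of:0000000000000001");    // 2: report a sensed device
    }

    @Override
    public void sendCommand(String deviceUri, String command) {
        // 3: translate the protocol-independent command into OpenFlow messages
    }

    void stop(AdapterRegistry registry) {
        registry.unregister(this);                         // 4
        service = null;
    }
}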
Distributed Core
● Distributed state management framework
  o built for high-availability and scale-out
● Different types of state require different types of synchronization
  o fully replicated
  o master / slave replicated
  o partitioned / distributed
● Novel topology replication technique
  o logical clock in each instance timestamps events observed in the underlying network
  o logical timestamps ensure state evolves in a consistent and ordered fashion
  o allows rapid convergence without complex coordination
  o applications receive notifications about topology changes
Consistency Issues in ONOS
Recall the definitions of consistency models:
Strong consistency
• Upon an update to the network state by an instance, all subsequent reads by any instance return the last updated value.
• Strong consistency adds complexity and latency to distributed data management.
Eventual consistency is a relaxed model
• allowing readers to be behind for a short period of time
• a liveness property
There are other models: e.g., serial consistency, causal consistency.
Q: what consistency model or models should ONOS adopt?
ONOS Distributed Core
● Responsible for all state management concerns
● Organized as a collection of “stores”
  o e.g.: Topology, Link Resources, Intents, etc.
● Properties of state guide the ACID vs. BASE choice
Q: What are possible categories of the various states in a (logically centralized) SDN control plane, e.g., ONOS?
ONOS Global Network View
(Diagram: applications observe and program the global network view.)
Network State:
• Topology (Switch, Port, Link, …)
• Network Events (Link down, Packet In, …)
• Flow state (Flow table, connectivity paths, ...)
Network objects: Switch, Port, Link, Host, Intent, FlowPath, FlowEntry
ONOS State and Properties
State                                          Properties
Network Topology                               Eventually consistent, low-latency access
Flow Rules, Flow Stats                         Eventually consistent, shardable, soft state
Switch-Controller mapping, Distributed Locks   Strongly consistent, slow changing
Application Intents, Resource Allocations      Strongly consistent, durable, immutable
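A minimal sketch of mapping each state category in the table to a store with the matching consistency model; the StorageService interface here is hypothetical (ONOS's real store APIs differ in detail), and the map contents are placeholders.

import java.util.Map;

// Hypothetical storage facade: one factory method per consistency flavor.
interface StorageService {
    <K, V> Map<K, V> eventuallyConsistentMap(String name); // gossip-replicated, low latency
    <K, V> Map<K, V> consistentMap(String name);           // consensus-backed, strongly consistent
}

class StoreSelection {
    void build(StorageService storage) {
        // Topology and flow state tolerate brief staleness (BASE).
        Map<String, String> topology  = storage.eventuallyConsistentMap("topology");
        Map<String, String> flowRules = storage.eventuallyConsistentMap("flow-rules");

        // Mastership, intents and resource allocations need strong consistency.
        Map<String, String> mastership = storage.consistentMap("mastership");
        Map<String, String> intents    = storage.consistentMap("intents");
    }
}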
Switch Mastership: Strong Consistency
(Timeline: the registry in the distributed Network OS initially shows "Switch A master = NONE" on all instances. A master is then elected for switch A; after the delay of locking & consensus, instances 1, 2 and 3 all show "Switch A master = ONOS 1".)
ONOS Switch Mastership & Topology Cache
(Diagram: each instance caches the topology of its own slice; "Tell me about your slice?")
ONOS Switch Mastership: ONOS Instance Crash
(Diagrams, two slides: an ONOS instance crashes and the remaining instances take over mastership of its switches.)
ONOS uses Raft to achieve consensus and strong consistency.
Switch Mastership Terms
• Switch mastership term is incremented every time mastership changes
• Switch mastership term is tracked in a strongly consistent store
Why Strong Consistency for Master Election
• Weaker consistency might mean the master elected on instance 1 is not yet visible on other instances.
• That can lead to having multiple masters for a switch.
• Multiple masters would break our semantics of control isolation.
• A strong locking semantic is needed for master election.
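A minimal sketch of why a single atomic claim settles the election: one compare-and-set on a strongly consistent store means exactly one instance wins. The store interface is hypothetical, and a local map stands in for the Raft-backed store so the example is runnable.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical strongly consistent mastership view (locally simulated here).
class MastershipStore {
    private final Map<String, String> masters = new ConcurrentHashMap<>();

    // Atomic "claim mastership if nobody holds it": the compare-and-set that
    // strong consistency makes safe across instances.
    boolean tryClaim(String switchId, String instanceId) {
        return masters.putIfAbsent(switchId, instanceId) == null;
    }

    String masterOf(String switchId) {
        return masters.getOrDefault(switchId, "NONE");
    }
}

class Election {
    public static void main(String[] args) {
        MastershipStore store = new MastershipStore();
        // Two instances race for switch A; exactly one wins.
        System.out.println("ONOS-1 wins? " + store.tryClaim("of:A", "ONOS-1")); // true
        System.out.println("ONOS-2 wins? " + store.tryClaim("of:A", "ONOS-2")); // false
        System.out.println("Master of A: " + store.masterOf("of:A"));           // ONOS-1
    }
}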
Eventual Consistency in Network Graph
(Timeline: switch A connects to ONOS instance 1, which sets "Switch A state = ACTIVE" in the DHT-backed network graph, while instances 2 and 3 still show "Switch A state = INACTIVE". After the delay of eventual consistency, all instances show "Switch A state = ACTIVE".)
What about link failure? Both ends may detect it!
Topology Events and Eventual Consistency
(Diagrams, three slides: an instance detects a topology event in its slice, updates the distributed topology store, and the per-instance topology caches converge from the store. What if instance 1 crashed before its update propagated?)
Switch Event Numbers
• Topology as a state machine
• Events: switch/port/link up or down
• Each event has a unique logical timestamp: [switch id, term number, event number]
• Events are timestamped on arrival and broadcast
• Partial ordering of topology events: “stale” events are dropped on receipt
• Anti-entropy: a peer-to-peer, lightweight gossip protocol quickly bootstraps a newly joined ONOS instance
• Key challenge: the view should be consistent with the network, not (simply) with other views!
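A minimal sketch of the per-switch logical timestamp and the "drop stale events" rule: compare the mastership term first, then the per-term event number, and apply an event only if it is newer than the last one accepted for that switch. Names are illustrative, and Java records are used for brevity.

import java.util.HashMap;
import java.util.Map;

// Per-switch logical timestamp: mastership term first, then event number.
record EventTimestamp(long term, long eventNumber) implements Comparable<EventTimestamp> {
    @Override
    public int compareTo(EventTimestamp other) {
        int byTerm = Long.compare(term, other.term);
        return byTerm != 0 ? byTerm : Long.compare(eventNumber, other.eventNumber);
    }
}

class TopologyEventStore {
    // Newest timestamp accepted per switch id; events on different switches
    // are only partially ordered with respect to each other.
    private final Map<String, EventTimestamp> latest = new HashMap<>();

    // Accept an event only if it is newer than what we already hold;
    // otherwise it is "stale" and dropped on receipt.
    synchronized boolean apply(String switchId, EventTimestamp ts) {
        EventTimestamp current = latest.get(switchId);
        if (current != null && ts.compareTo(current) <= 0) {
            return false;          // stale: an equal or newer event was already applied
        }
        latest.put(switchId, ts);  // accept and update the view
        return true;
    }
}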
Cost of Eventual Consistency
• A short delay will mean the switch A state is not yet ACTIVE on some ONOS instances in the previous example (!?)
• Applications on one instance will compute flows through switch A, while other instances will not use switch A for path computation (!?)
• Eventual consistency becomes more visible during control-plane network congestion
Q: What are possible consequences of eventual consistency of the network graph?
• Eventual consistency of the network graph -> eventual consistency of flow entries? re-do path computation/flow setup?
Why is Eventual Consistency Good Enough for Network State?
• Physical network state changes asynchronously
• Strong consistency across the data and control planes is too hard
• Control apps know how to deal with eventual consistency
  o In a distributed control plane, each router makes its own decision based on old info from other parts of the network: it works fine
  o But the current distributed control plane uses destination-based, shortest-path routing; this guarantees eventual consistency of the routing tables computed by each individual router!
• Strong consistency is more likely to lead to inaccuracy of network state, as network congestion is real
Q: How to reconcile and address this challenge?
Consistency: Lessons
• One consistency model does not fit all
• Consequences of delays need to be well understood
• More research needs to be done on various states using different consistency models
• A Network OS (logically centralized control plane) with distributed stores and consistent network state (network graph, flow entries & various network states) also poses unique challenges!
ONOS Service APIs
Application Intent Framework
● Application specifies high-level intents, not low-level rules
  o focus on what should be done, rather than how it should be done
● Intents are compiled into actionable objectives which are installed into the environment
  o e.g. a HostToHostIntent compiles into two PathIntents
● Resources required by objectives are then monitored
  o e.g. a link vanishes, capacity or a lambda becomes available
● Intent subsystem reacts by recompiling the intent and re-installing revised objectives
Will come back and discuss this later in the semester if we have time!
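A minimal sketch of the compile step described above: a host-to-host intent expanded into two unidirectional path intents. The classes are hypothetical stand-ins; ONOS's real intent classes carry much more (selectors, treatments, constraints, app id, ...).

import java.util.List;

// Hypothetical intent types.
record HostToHostIntent(String hostA, String hostB) {}
record PathIntent(String src, String dst, List<String> path) {}

class IntentCompiler {
    // Compile the high-level "connect A and B" intent into two directed
    // path intents, one per direction.
    List<PathIntent> compile(HostToHostIntent intent) {
        List<String> forward = computePath(intent.hostA(), intent.hostB());
        List<String> reverse = computePath(intent.hostB(), intent.hostA());
        return List.of(
            new PathIntent(intent.hostA(), intent.hostB(), forward),
            new PathIntent(intent.hostB(), intent.hostA(), reverse));
    }

    // Placeholder path computation; a real compiler would consult the
    // latest topology snapshot (e.g. via the Path Service).
    private List<String> computePath(String src, String dst) {
        return List.of(src, dst);
    }
}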
Reflections/Lessons Learned: Things We Got Right
Control isolation (sharding)
  Divide the network into parts and control them exclusively
  Load balancing -> we can do more
Distributed data store
  Scales with controller nodes, with HA -> though we need a low-latency distributed data store
Dynamic controller assignment to parts of the network
  Dynamically assign which part of the network is controlled by which controller instance -> we can do better with more sophisticated algorithms
Graph abstraction of network state
  Easy to visualize and correlate with topology
  Enables several standard graph algorithms
What’s Available in ONOS Today?
● ONOS with all its key features
  o high-availability, scalability*, performance*
  o northbound abstractions (application intents)
  o southbound abstractions (OpenFlow adapters)
  o modular code-base
  o GUI
● Open source
  o ONOS code-base on GitHub
  o documentation & infrastructure processes to engage the community
● Use-case demonstrations
  o SDN-IP, Packet-Optical
● Sample applications
  o reactive forwarding, mobility, proxy ARP
Please do check out the current ONOS code distribution on the ONOS project site:
https://wiki.onosproject.org