More on ONOS
Open Network OS by ON.LAB

• Current ONOS Architecture Tiers
  o subsystem components
• Consistency Issues in ONOS
  o switch mastership
  o network graph
Key Performance Requirements
(Figure: Apps sit on ONOS, which maintains the Global Network View / State.)
• High throughput:
  o ~500K-1M path setups / second
  o ~1-2M network state ops / second
• High volume:
  o ~10 GB of network state data
Difficult challenge! high throughput | low latency | consistency | high availability
ONOS Prototype II
• Prototype 2: focus on improving performance, especially event latency
• RAMCloud for the data store, with Blueprints graph APIs directly on top
  o RAMCloud: a low-latency, distributed key-value store with 15-30 us reads/writes
  o eventually consistent: updates may arrive at different instances at different times and in different orders
  o atomicity under schema integrity: no reads/writes by apps in the midst of an update
• Optimized data model (see the sketch below)
  o one table for each type of network object (switches, links, flows, ...)
  o minimize the number of references between elements: most updates require a single read/write
• (In-memory) topology cache at each ONOS instance
• Event notifications using Hazelcast, a pub-sub system
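To make the "one table per object type, minimal cross-references" idea concrete, here is a small Java sketch; plain in-memory maps stand in for RAMCloud tables, and every class and field name is an illustrative assumption rather than the actual Prototype II schema.

```java
// Illustrative sketch only: in-memory maps stand in for RAMCloud tables.
// All class and field names are assumptions, not the actual Prototype II schema.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SwitchRecord {
    final String dpid;      // key: datapath id
    final boolean active;
    SwitchRecord(String dpid, boolean active) { this.dpid = dpid; this.active = active; }
}

final class LinkRecord {
    final String id;        // key, e.g. "srcDpid/srcPort->dstDpid/dstPort"
    final String srcDpid;
    final String dstDpid;
    LinkRecord(String id, String srcDpid, String dstDpid) {
        this.id = id; this.srcDpid = srcDpid; this.dstDpid = dstDpid;
    }
}

final class NetworkStore {
    // One "table" per type of network object, as on the slide.
    private final Map<String, SwitchRecord> switches = new ConcurrentHashMap<>();
    private final Map<String, LinkRecord> links = new ConcurrentHashMap<>();

    // Records reference each other only by key, so a typical update touches a
    // single table entry, i.e., roughly one read/write against the store.
    void switchUp(String dpid)    { switches.put(dpid, new SwitchRecord(dpid, true)); }
    void addLink(LinkRecord link) { links.put(link.id, link); }
}
```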
Current ONOS Architecture Tiers
• Northbound abstraction: network graph, application intents
  o Apps
  o Northbound - Application Intent Framework (policy enforcement, conflict resolution)
• Core: distributed, protocol independent
  o Distributed Core (scalability, availability, performance, persistence)
• Southbound abstraction: generalized OpenFlow, pluggable & extensible
  o Southbound (discover, observe, program, configure)
  o OpenFlow, NetConf, ...
Distributed Architecture for Fault Tolerance
(Figure: layered stack of Apps → NB Core API → Distributed Core (state management, notifications, high-availability & scale-out) → SB Core API → Adapters → Protocols.)
Modularity Objectives
● Increase architectural coherence, testability and maintainability
  o establish tiers with crisply defined boundaries and responsibilities
  o set up the code-base to follow and enforce the tiers
● Facilitate extensibility and customization by partners and users
  o the unit of replacement is a module
● Avoid speciation of the ONOS code-base
  o APIs set up to encourage extensibility and community code contributions
● Preempt code entanglement, i.e., cyclic dependencies
  o reasonably small modules serve as firewalls against cycles
● Facilitate a pluggable southbound
Performance Objectives
● Throughput of proactive provisioning actions
  o path flow provisioning
  o global optimization or rebalancing of existing path flows
● Latency of responses to topology changes
  o path repair in the wake of link or device failures
● Throughput of distributing and aggregating state
  o batching, caching, parallelism
  o dependency reduction
● Controller vs. device responsibilities
  o defer to devices to do what they do best, e.g., low-latency reactivity, backup paths
ONOS Services & Subsystems
• Device Subsystem - Manages the inventory of infrastructure devices.
• Link Subsystem - Manages the inventory of infrastructure links.
• Host Subsystem - Manages the inventory of end-station hosts and their locations on the network.
• Topology Subsystem - Manages time-ordered snapshots of network graph views.
• PathService - Computes/finds paths between infrastructure devices or between end-station hosts using the most recent topology graph snapshot.
• FlowRule Subsystem - Manages the inventory of the match/action flow rules installed on infrastructure devices and provides flow metrics.
• Packet Subsystem - Allows applications to listen for data packets received from network devices and to emit data packets out onto the network via one or more network devices. (See the usage sketch below.)
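As a rough illustration of how an application sits on top of these subsystems, the sketch below wires the Packet and Path services together for reactive forwarding. The interfaces here are simplified, hypothetical stand-ins named after the subsystems above, not the exact ONOS API signatures.

```java
// Illustrative sketch: hypothetical, simplified versions of the service
// interfaces named on this slide; real ONOS signatures differ.
import java.util.Set;

interface Path {}
final class DeviceId {}
interface PacketContext { DeviceId srcDevice(); DeviceId dstDevice(); }
interface PacketProcessor { void process(PacketContext context); }

interface PathService {
    // Finds paths between two devices using the most recent topology snapshot.
    Set<Path> getPaths(DeviceId src, DeviceId dst);
}

interface PacketService {
    // Registers a listener for data packets punted to the controller.
    void addProcessor(PacketProcessor processor);
}

final class ReactiveForwardingApp {
    private final PathService pathService;
    private final PacketService packetService;

    ReactiveForwardingApp(PathService paths, PacketService packets) {
        this.pathService = paths;
        this.packetService = packets;
    }

    void activate() {
        // On each packet-in, look up a path between the source and destination
        // edge devices and (not shown) install flow rules along it via the
        // FlowRule subsystem.
        packetService.addProcessor(ctx ->
                pathService.getPaths(ctx.srcDevice(), ctx.dstDevice())
                        .stream()
                        .findFirst()
                        .ifPresent(this::installFlowRules));
    }

    private void installFlowRules(Path path) { /* FlowRule subsystem calls here */ }
}
```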
ONOS Core Subsystem Structure
(Figure: apps and other components issue queries & commands through a subsystem's Service and AdminService interfaces and add & remove Listeners to be notified of events; the subsystem's Manager Component implements these interfaces and syncs & persists its state through a Store; on the southbound side, Adapter Components register & unregister with the AdapterRegistry, report sensed events to the core via the AdapterService, receive commands from the core, and drive the underlying Protocols.)
ONOS Modules
● Well-defined relationships
● Basis for customization
● Avoid cyclic dependencies
Modules include: onlab-util-misc, onlab-util-osgi, onlab-util-rest, onos-api, onos-of-api, onos-core-store, onos-of-ctl, onos-core-net, onos-of-adapter-*, ...
ONOS Southbound
● Attempt to be as generic as possible
● Enable partners/contributors to submit their own device/protocol-specific providers
● Providers should be stateless; state may be maintained for optimization but should not be relied upon
ONOS Southbound
● ONOS supports multiple southbound protocols, enabling a transition to true SDN.
● Adapters provide descriptions of dataplane elements to the core; the core utilizes this information.
● Adapters hide protocol complexity from ONOS.
Descriptions
● Serve as scratch pads to pass information to the core
● Descriptions are immutable and extremely short-lived
● Each Description contains a URI for the object it is describing
● The URI also encodes the Adapter the device is linked to (see the sketch below)
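A minimal sketch of what such an immutable, URI-carrying description might look like; the class, its fields and the "of:" URI scheme below are assumptions for illustration, not the actual ONOS types.

```java
// Illustrative sketch: a hypothetical immutable description object.
// Class, fields and the "of:" URI scheme are assumptions, not the ONOS API.
import java.net.URI;

final class DeviceDescription {
    // Identifies the described device; the scheme (e.g. "of:") encodes which
    // Adapter the device is linked to.
    private final URI uri;
    private final String hwVersion;
    private final String swVersion;

    DeviceDescription(URI uri, String hwVersion, String swVersion) {
        this.uri = uri;
        this.hwVersion = hwVersion;
        this.swVersion = swVersion;
    }

    // No setters: the description is immutable and is discarded as soon as the
    // core has absorbed its contents into the relevant store.
    URI uri()          { return uri; }
    String hwVersion() { return hwVersion; }
    String swVersion() { return swVersion; }
}
```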
Adapter Patterns
1. Adapter registers with the core
   a. The core returns an AdapterService bound to the Adapter
2. The Adapter uses the AdapterService to notify the core of new events (device connected, packet-in) via Descriptions; this is where the magic happens
3. The core can use the Adapter to issue commands to elements under the Adapter's control
4. Eventually, the Adapter unregisters itself; the core will invalidate the AdapterService (see the lifecycle sketch below)
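The sketch below walks through this register → notify → command → unregister lifecycle. The interface names follow the slide (AdapterRegistry, AdapterService, Adapter), but the method signatures are hypothetical simplifications, not the actual ONOS provider API; DeviceDescription is the hypothetical class from the earlier sketch.

```java
// Illustrative sketch of the adapter lifecycle above; method signatures are
// hypothetical simplifications, not the actual ONOS provider API.

interface AdapterService {
    // Step 2: the adapter reports sensed events to the core as Descriptions.
    void deviceConnected(DeviceDescription description);
}

interface Adapter {
    // Step 3: the core issues commands to elements under this adapter's control.
    void triggerProbe(java.net.URI deviceUri);
}

interface AdapterRegistry {
    // Step 1: register and receive an AdapterService bound to this adapter.
    AdapterService register(Adapter adapter);
    // Step 4: unregister; the core invalidates the previously issued AdapterService.
    void unregister(Adapter adapter);
}

final class OpenFlowAdapter implements Adapter {
    private AdapterService service;

    void start(AdapterRegistry registry) {
        service = registry.register(this);                 // step 1
    }

    void onSwitchHandshakeComplete(DeviceDescription description) {
        service.deviceConnected(description);              // step 2: "the magic"
    }

    @Override
    public void triggerProbe(java.net.URI deviceUri) {     // step 3
        // Translate the core's command into protocol (e.g. OpenFlow) messages here.
    }

    void stop(AdapterRegistry registry) {
        registry.unregister(this);                         // step 4
        service = null;
    }
}
```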
Distributed Core
● Distributed state management framework
  o built for high availability and scale-out
● Different types of state require different types of synchronization
  o fully replicated
  o master / slave replicated
  o partitioned / distributed
● Novel topology replication technique
  o a logical clock in each instance timestamps the events observed in the underlying network
  o logical timestamps ensure state evolves in a consistent and ordered fashion
  o allows rapid convergence without complex coordination
  o applications receive notifications about topology changes
Consistency Issues in ONOS
Recall the definitions of consistency models:
• Strong consistency
  o Upon an update to the network state by one instance, all subsequent reads by any instance return the last updated value.
  o Strong consistency adds complexity and latency to distributed data management.
• Eventual consistency is a relaxed model
  o allowing readers to be behind for a short period of time
  o a liveness property
• There are other models, e.g., serial consistency, causal consistency.
Q: what consistency model or models should ONOS adopt?
ONOS Distributed Core
● Responsible for all state management concerns
● Organized as a collection of "stores"
  o e.g., Topology, Link Resources, Intents, etc.
● Properties of the state guide the ACID vs. BASE choice
Q: What are the possible categories of the various states in a (logically centralized) SDN control plane, e.g., ONOS?
ONOS Global Network View
(Figure: Applications observe and program the global network view.)
Network state:
• Topology (Switch, Port, Link, ...)
• Network events (Link down, Packet In, ...)
• Flow state (Flow table, connectivity paths, ...)
Network view objects: Switch, Port, Link, Host, Intent, FlowPath, FlowEntry
ONOS State and Properties
• Network Topology: eventually consistent, low-latency access
• Flow Rules, Flow Stats: eventually consistent, shardable, soft state
• Switch-Controller mapping, Distributed Locks: strongly consistent, slow changing
• Application Intents, Resource Allocations: strongly consistent, durable, immutable
Switch Mastership: Strong Consistency
(Timeline: the switch-to-master mapping lives in the registry of the distributed Network OS, alongside the network graph. Initially, all instances see Switch A Master = NONE. A master is then elected for switch A; after the delay of locking & consensus, Instances 1, 2 and 3 all see Switch A Master = ONOS 1.)
ONOS Switch Mastership & Topology Cache
(Figure: each ONOS instance asks the switches it masters, "Tell me about your slice?", and keeps the answers in its in-memory topology cache.)
ONOS Switch Mastership: ONOS Instance Crash
ONOS uses Raft to achieve consensus and strong consistency
Switch Mastership Terms
• The switch mastership term is incremented every time mastership changes
• The switch mastership term is tracked in a strongly consistent store
Why Strong Consistency for Master Election
• Weaker consistency might mean that a master election on instance 1 is not yet visible on the other instances.
• That can lead to multiple masters for a switch.
• Multiple masters would break our semantics of control isolation.
• A strong locking semantic is needed for master election (see the sketch below).
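A minimal sketch of why a strongly consistent primitive settles the election: a single atomic compare-and-set in the mastership store lets exactly one instance win, and bumping the term on every change lets stale masters be detected. The MastershipStore below is a hypothetical stand-in (the slides say ONOS uses a Raft-backed store for this), not the actual ONOS API.

```java
// Illustrative sketch: master election via an atomic compare-and-set.
// MastershipStore is a hypothetical stand-in for ONOS's strongly consistent
// (Raft-backed) store; local maps are used here only to show the logic.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

final class MastershipStore {
    private final ConcurrentMap<String, String> masterOf = new ConcurrentHashMap<>();
    private final ConcurrentMap<String, AtomicLong> termOf = new ConcurrentHashMap<>();

    /** Try to become master of a switch; exactly one caller can win. */
    boolean requestMastership(String switchId, String instanceId) {
        // Atomic compare-and-set: succeeds only if no master is currently recorded.
        boolean won = masterOf.putIfAbsent(switchId, instanceId) == null;
        if (won) {
            // The switch's mastership term is incremented on every change.
            termOf.computeIfAbsent(switchId, k -> new AtomicLong()).incrementAndGet();
        }
        return won;
    }

    /** Release mastership, e.g. on shutdown or after a failure is detected. */
    void relinquish(String switchId, String instanceId) {
        if (masterOf.remove(switchId, instanceId)) {
            termOf.get(switchId).incrementAndGet();        // term bumps again
        }
    }

    long term(String switchId) {
        AtomicLong term = termOf.get(switchId);
        return term == null ? 0 : term.get();
    }
}
```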
Eventual Consistency in Network Graph
(Timeline: the network graph is kept in a DHT of the distributed Network OS. Initially, all instances see Switch A State = INACTIVE. When switch A connects to ONOS, Instance 1 marks Switch A = ACTIVE while Instances 2 and 3 still see INACTIVE; after the delay of eventual consensus, all instances see Switch A State = ACTIVE.)
What about a link failure? Both ends may detect it!
Topology Events and Eventual Consistency
(Figure sequence: an instance detects a topology event → updates the distributed topology store → the update propagates to the topology caches of the other instances. What if Instance 1 crashed before its update propagated?)
Switch Event Numbers
• Topology as a state machine
• Events: switch/port/link up or down
• Each event has a unique logical timestamp: [switch id, term number, event number] (see the sketch below)
• Events are timestamped on arrival and broadcast
• Partial ordering of topology events: "stale" events are dropped on receipt
• Anti-entropy: a peer-to-peer, lightweight gossip protocol that quickly bootstraps a newly joined ONOS instance
Key challenge: the view should be consistent with the network, not (simply) with other views!
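A small sketch of how such logical timestamps make stale events easy to drop: events for a given switch are ordered by (term number, event number), and an arriving event is applied only if it is newer than the last one recorded for that switch. The class names are illustrative assumptions, not the actual ONOS implementation.

```java
// Illustrative sketch: dropping stale topology events by logical timestamp.
// Class names are assumptions for illustration, not the actual ONOS classes.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class LogicalTimestamp implements Comparable<LogicalTimestamp> {
    final String switchId;   // the switch the event concerns
    final long termNumber;   // incremented on every mastership change
    final long eventNumber;  // incremented for every event within a term

    LogicalTimestamp(String switchId, long termNumber, long eventNumber) {
        this.switchId = switchId;
        this.termNumber = termNumber;
        this.eventNumber = eventNumber;
    }

    @Override
    public int compareTo(LogicalTimestamp other) {
        // Events of the same switch are ordered first by term, then by event number.
        int byTerm = Long.compare(termNumber, other.termNumber);
        return byTerm != 0 ? byTerm : Long.compare(eventNumber, other.eventNumber);
    }
}

final class TopologyEventFilter {
    // Latest timestamp applied per switch; older ("stale") events are ignored.
    private final Map<String, LogicalTimestamp> latest = new ConcurrentHashMap<>();

    /** Returns true if the event is new and should be applied, false if stale. */
    boolean accept(LogicalTimestamp ts) {
        boolean[] applied = {false};
        latest.compute(ts.switchId, (id, prev) -> {
            if (prev == null || ts.compareTo(prev) > 0) {
                applied[0] = true;     // newer event: advance the recorded timestamp
                return ts;
            }
            return prev;               // stale event: drop on receipt
        });
        return applied[0];
    }
}
```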
Cost of Eventual Consistency
• A short delay means switch A's state is not yet ACTIVE on some ONOS instances in the previous example (!?)
• Applications on one instance will compute flows through switch A, while other instances will not use switch A for path computation (!?)
• Eventual consistency becomes more visible during control-plane network congestion
Q: What are the possible consequences of eventual consistency of the network graph?
Eventual consistency of the network graph → eventual consistency of flow entries? → re-do path computation / flow setup?
Why is Eventual Consistency Good Enough for Network State?
• Physical network state changes asynchronously
• Strong consistency across the data and control planes is too hard
• Control apps know how to deal with eventual consistency
• In a distributed control plane, each router makes its own decision based on old info from other parts of the network: it works fine
• But the current distributed control plane uses destination-based, shortest-path routing; this guarantees eventual consistency of the routing tables computed by each individual router!
• Strong consistency is more likely to lead to inaccuracy of network state, as network congestion is real
• How do we reconcile and address this challenge?
Consistency: Lessons
• One consistency model does not fit all
• The consequences of delays need to be well understood
• More research needs to be done on various states using different consistency models
A Network OS (a logically centralized control plane) with distributed stores and a consistent network state (network graph, flow entries & various other network states) also poses unique challenges!
ONOS Service APIs
Application Intent Framework
● Applications specify high-level intents, not low-level rules
  o focus on what should be done, rather than how it should be done
● Intents are compiled into actionable objectives which are installed into the environment
  o e.g., a HostToHostIntent compiles into two PathIntents (see the sketch below)
● Resources required by the objectives are then monitored
  o e.g., a link vanishes, or capacity or a lambda becomes available
● The intent subsystem reacts by recompiling the intent and re-installing the revised objectives
We will come back and discuss this later in the semester if we have time!
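As a rough illustration of the intent programming model, the sketch below submits a host-to-host intent. The names mirror the slide's terminology (IntentService, HostToHostIntent), but the simplified signatures are assumptions and not those of any particular ONOS release.

```java
// Illustrative sketch: submitting a high-level intent instead of low-level rules.
// Simplified, assumed signatures; not a specific ONOS release's API.

final class HostId {
    final String id;
    HostId(String id) { this.id = id; }
}

final class HostToHostIntent {
    final HostId one;
    final HostId two;
    HostToHostIntent(HostId one, HostId two) { this.one = one; this.two = two; }
}

interface IntentService {
    // The framework compiles the intent (e.g. into two PathIntents), installs
    // the resulting objectives, and keeps them satisfied as the topology
    // changes, recompiling and re-installing revised objectives when needed.
    void submit(HostToHostIntent intent);
    void withdraw(HostToHostIntent intent);
}

final class ConnectivityApp {
    void connect(IntentService intents, String hostA, String hostB) {
        // Say what we want (hostA and hostB connected), not how each switch
        // along the way should be programmed.
        intents.submit(new HostToHostIntent(new HostId(hostA), new HostId(hostB)));
    }
}
```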
Reflections/Lessons Learned: Things We Got Right
• Control isolation (sharding)
  o divide the network into parts and control them exclusively
  o load balancing → we can do more
• Distributed data store
  o scales with controller nodes and provides HA → though we need a low-latency distributed data store
• Dynamic controller assignment to parts of the network
  o dynamically assign which part of the network is controlled by which controller instance → we can do better with sophisticated algorithms
• Graph abstraction of network state
  o easy to visualize and correlate with topology
  o enables several standard graph algorithms
What's Available in ONOS Today?
● ONOS with all its key features
  o high-availability, scalability*, performance*
  o northbound abstractions (application intents)
  o southbound abstractions (OpenFlow adapters)
  o modular code-base
  o GUI
● Open source
  o ONOS code-base on GitHub
  o documentation & infrastructure processes to engage the community
● Use-case demonstrations
  o SDN-IP, Packet-Optical
● Sample applications
  o reactive forwarding, mobility, proxy ARP
Please do check out the current ONOS code distribution on the ONOS project site:
https://wiki.onosproject.org