CogMan: Cognitive Network
Management Architecture
- PhD Thesis Defense -
Sungsu Kim
Supervisor: Prof. James Won-Ki Hong
June 27, 2013
Distributed Processing & Network Management Lab.
Dept. of Computer Science and Engineering
POSTECH, Korea
Sungsu Kim, POSTECH
PhD Thesis Defense
1/37
Table of
Contents
01 Introduction
Network management approaches
Research motivation
Problems
Research approach
02 Related Work
Autonomic control loop
Human cognition model
03 CogMan
Conceptual representation of CogMan
Cognitive Control loop
Reasoning for the Reflective Loop
04 Validation
SDN overview
Failure recovery problems in SDN
Experiment results
05 Concluding Remarks
Summary
Contributions
Future work
Introduction
Network Management Approaches
Traditional approach
• The administrator receives monitoring data from the managed network: port up/down state, number of packets in/out, network alarms
• The administrator makes decisions and sends commands for reconfiguration
Autonomic approach
• An autonomic network management system runs a Monitor, Analyze, Execute loop using a policy repository
• It receives the same monitoring data and sends commands for reconfiguration to the managed network
Research Motivation
Previous studies have discussed various autonomic
network management technologies
Existing autonomic network management technologies are
heavily dependent on policies to fix problems
Autonomic network management systems are not widely
deployed in real networks and most networks are managed
by human administrators
In new networking architectures, such as Software Defined
Networking (SDN) and OpenFlow networks, network control
is centralized, so an autonomic network management
approach is appropriate for control and management
Problems in Autonomic Network Management
Understanding of current state of the managed
network is weak
Autonomic network management systems cannot
solve complex problems
Response time of autonomic network management
systems is not fast
Research Approach
Previous research
• Existing autonomic network management systems cannot handle complex problems
• Autonomic network management systems are not deployed in real networks
Goal: efficient management of complex problems
Proposed method
• An autonomic network management architecture based on the human cognition model
• Validation of the architecture in an SDN network
Related Work
Related Work (1/3)
IBM MAPE [IBM, ‘03]
Figure: the autonomic manager runs a Monitor, Analyze, Plan, Execute loop around shared Knowledge; sensors and effectors connect it to the managed resources
Related Work (2/3)
FOCALE control loop [Strassner, ‘07]
Figure: FOCALE control loop inside the autonomic manager
• Data and events from the managed resource pass through model-based translation
• Analyze data and events, then determine the actual state
• Current state = desired state? YES: continue monitoring the managed resource
• NO: ontological comparison, reasoning, and learning define new device configuration(s), which are translated back to the managed resource
• Context manager and policy manager: policies control application of intelligence
Related Work (3/3)
Human cognition model [Shrobe, ‘06]
Figure: three layers, each spanning perception, control, and actuation
• Reflective: perceptual goal, intellectual goals, motor goal
• Deliberative: conceptual gist, recall and attention algorithm, behavioral plan
• Reactive: sensory image, sensorimotor transformation, actions (posture, locomotion)
• Other labels in the figure: emotions, body, world
CogMan: Cognitive Network
Management Architecture
Conceptual Representation of CogMan
Figure: conceptual representation of CogMan
• Business goals enter through the user interface; the policy manager passes policy to the autonomic manager
• Observe & Normalize: vendor-specific data from the managed resource(s) (e.g., Cisco and Juniper port down alarms) are mapped to normalized data with the support of the information model
• Compare: compare state and classify problems
• Reactive loop: a single failure, backup path is prepared
• Deliberative loop: a single failure, backup path is prepared, optimal path is required
• Reflective loop: multiple failures and the backup path has failed; reasoning (correlating alarms) is required to solve complex problems
• Act: a set of actions for reconfiguration is sent to the managed resource(s) as vendor-specific commands
Cognitive Control Loop (1/2)
Original FOCALE control loop + human
cognition model
Figure: the FOCALE loop over the managed resource(s), aligned with the human cognition model (body and world)
• Perception: Observe, Normalize
• Decision making (control): Compare, Plan & Decide
• Actuation: Act
• Layers taken from the human cognition model: reactive, deliberative, reflective
Cognitive Control Loop (2/2)
Figure: three loops over the managed resource(s), spanning perception (Observe, Normalize), decision making (Compare, Plan & Decide, Reasoning), and actuation (Act)
• Reactive loop: problems can be solved fast
• Deliberative loop: problems defined by policy
• Reflective loop: complex problems; reasoning is necessary
Reasoning for the Reflective Loop
A reasoning algorithm is used to solve complex problems
Multiple failures cannot be recovered if the backup paths have also failed
We propose a Fast Flow Setup (FFS) algorithm to recover from multiple failures in SDN networks
• FFS recovers from failures quickly even if the backup paths have failed
• FFS reduces the load on the SDN controller
Validation: Fault Management in
SDN Networks
Software Defined Networking (SDN)
SDN: separation of data and control planes
Figure: traditional networks (routing inside each switch) vs. SDN networks (a controller with logically centralized control over the switches through an API to the data plane, e.g., OpenFlow)
OpenFlow Flow Table Entry
A flow table entry consists of matching fields, an action, and stats
Matching fields (+ mask what fields to match)
• L1: switch port
• L2: VLAN ID, MAC src, MAC dst, Eth type
• L3: IP src, IP dst, IP prot
• L4: TCP sport, TCP dport
Action
1. Forward packet to port(s)
2. Encapsulate and forward to controller
3. Drop packet
4. Send to normal processing pipeline
5. Modify fields
Stats
• Packet + byte counters
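The matching behavior described above can be sketched in Python. This is a minimal illustration and not the OpenFlow API: `FlowEntry`, `matches`, and `lookup` are hypothetical names, a field missing from `match` is treated as wildcarded by the mask, and sending a table miss to the controller is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FlowEntry:
    match: dict            # only the fields to match; absent fields are wildcards
    action: str            # e.g. "output:2", "controller", "drop"
    packet_count: int = 0  # stats: packet counter
    byte_count: int = 0    # stats: byte counter


def matches(entry, packet):
    """A packet matches when every non-wildcarded field is equal."""
    return all(packet.get(name) == value for name, value in entry.match.items())


def lookup(table, packet, size):
    """Return the action of the first matching entry and update its stats."""
    for entry in table:
        if matches(entry, packet):
            entry.packet_count += 1
            entry.byte_count += size
            return entry.action
    # Assumption: a table miss is forwarded to the controller.
    return "controller"
```

For example, an entry that matches only `ip_dst` forwards every packet for that destination regardless of its ingress port.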
Failure Recovery in SDN Networks
Traditional IP networks
• Distributed routing protocols reroute packets to
alternative paths
• Manual reconfiguration
• Path protection (MPLS)
SDN networks
• Protection
− Backup paths
− Fast failure recovery time (less than 50ms)
• Restoration
− Redirect affected flows one by one
− Failure recovery time is relatively long
Restoration Example
Switches send port down messages to the controller; the controller then
1. Obtains the affected flows (host1 → host2)
2. Finds an alternative path for each flow (path: <ACED>)
3. Sets up the alternative paths
Figure: Host 1 at switch A, Host 2 at switch D; working path through B, backup path through C and E
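The three restoration steps can be sketched as follows. The link list is an assumed topology inferred from the paths <ABD> and <ACED>, breadth-first search stands in for whatever routing the controller actually uses, and `find_path` and `restore` are hypothetical names.

```python
from collections import deque


def find_path(links, src, dst):
    """Breadth-first search over undirected links; returns a node list or None."""
    adjacency = {}
    for u, v in links:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


def restore(links, failed_link, flows):
    """flows maps a flow id to its current path (list of switches)."""
    failed = frozenset(failed_link)
    alive = [link for link in links if frozenset(link) != failed]
    new_paths = {}
    for flow_id, path in flows.items():
        hops = {frozenset(hop) for hop in zip(path, path[1:])}
        if failed in hops:                                            # step 1
            new_paths[flow_id] = find_path(alive, path[0], path[-1])  # step 2
    return new_paths  # step 3 would install these paths, one flow at a time
```

Because each affected flow is handled separately, the work grows with the number of affected flows, which is consistent with the relatively long recovery time of restoration.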
Protection Example
Controller: set working and backup paths
1. Switch A detects port down
2. Switch A sends packets to the backup path
Figure: Host 1 at switch A, Host 2 at switch D; working path through B, backup path through C and E
Problems in SDN Fault Management
Protection can recover a failure in 50ms
• Protection is the best solution for a single failure
Problems of the protection mechanism
• Extra packet exchanges are required during flow setup
• Protection cannot handle multiple failures that affect both
working and backup paths
• Practically, providing perfect protection to all links is
difficult
Restoration is an appropriate method for multiple
failures
• Failure recovery time of restoration is longer than 200ms
Why Does Restoration Take So Long?
Flow setup example
1. Switch A asks the controller about a packet with dst: host2
2. The controller calculates the path between host1 and host2: path = <ABD>
3. The controller adds flow entries to A, B, and D
Figure: controller above switches A to E, with Host 1 and Host 2 at the edges
Fast Flow Setup (FFS)
Original flow setup requires many packet exchanges
• We propose a Fast Flow Setup (FFS) algorithm
• FFS implants path information into a flow entry
• FFS reduces the number of packet exchanges for flow setup
                           Original flow setup   Fast Flow Setup (FFS)
Control packet exchanges   1 + n                 2
Delay                      (1 + n) * t           2 * t
Controller load            high                  low
Tasks of switches          Flow entry setup      IP header inspection, flow entry setup
n = number of switches in a path
t = latency between the controller and a switch
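The comparison can be worked through numerically. The sketch below only evaluates the two formulas given on the slide (1 + n exchanges and delay (1 + n) * t for original flow setup, 2 exchanges and delay 2 * t for FFS); the function names and the sample values are hypothetical.

```python
def original_flow_setup(n, t):
    """n switches in the path, t = controller-to-switch latency."""
    exchanges = 1 + n
    return exchanges, exchanges * t


def fast_flow_setup(n, t):
    """FFS needs two control packet exchanges regardless of path length."""
    exchanges = 2
    return exchanges, exchanges * t
```

With an assumed path of n = 3 switches and t = 5 ms, the original setup needs 4 exchanges and 20 ms, while FFS needs 2 exchanges and 10 ms; the gap widens as the path gets longer.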
Example of the FFS Algorithm
1. The controller calculates the path between host1 and host2: path = <ABD>
2. The controller implants the path <BD> into the flow table entry
Figure: switch A asks the controller about a packet with dst: host2 and receives a flow entry for dst: host2 that carries the implanted path (drawn as <DB>). The packet carries the remaining path to B and D, which set up their own flow entries for dst: host2
The Proposed SDN Fault Management
Reactive
• Predefined backup path
• Single failure
• Short duration flow
Deliberative
• Predefined backup path
• Single failure
• Long-lived flow
Reflective
• No predefined backup path
• Multiple failures
• FFS algorithm is used for recovery
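The three cases above can be read as a simple selection rule. The sketch below is one possible reading, not the thesis implementation: `select_loop` and its parameters are hypothetical, and treating "multiple failures" or "no predefined backup path" as each sufficient for the reflective loop is an assumption.

```python
from enum import Enum


class Loop(Enum):
    REACTIVE = "reactive"
    DELIBERATIVE = "deliberative"
    REFLECTIVE = "reflective"


def select_loop(failure_count, has_backup_path, long_lived_flow):
    # Assumption: either condition alone sends the problem to the reflective loop.
    if failure_count > 1 or not has_backup_path:
        return Loop.REFLECTIVE      # recovered with the FFS algorithm
    if long_lived_flow:
        return Loop.DELIBERATIVE    # single failure, backup path, long-lived flow
    return Loop.REACTIVE            # single failure, backup path, short flow
```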
System Architecture
CogMan process: function for actual fault management
• Observe: port state handler (input: port state alarm)
• Normalize: alarm clustering
• Compare: affected flow detector
• Plan & Decide: routing
• Reasoning: path encoder
• Act: flow table modifier (output: Flow_mod message)
Prototype Implementation
Figure: prototype built on the Floodlight controller
• Management module on top of the controller core: protection, FFS algorithm, and the CogMan, FOCALE, and MAPE control loops
• OpenFlow network: switches S1 to S6 connecting Host 1 to Host n
• Topology construction and fault injection are applied to the OpenFlow network
Recovery Time (Single Failure)
Number of affected flows = 10
Figure: recovery time of restoration vs. CogMan (protection)
Recovery Time (Multiple Failures)
CogMan (FFS) vs. FOCALE (restoration)
Minimum: recovery time of the first affected flow
Maximum: recovery time of the last affected flow
Packet Exchange Ratio
Number of affected flows = 50
Figure: packet exchange ratio and traffic volume of packet exchanges between the controller and switches
Packet Exchanges for Flow Setup
Figure: number of packet exchanges required to set up a flow (normal vs. protection), with the analytic and measured difference
Concluding Remarks
Summary
Autonomic network management technologies are
required to solve complex problems
Autonomic network management architecture based on
the cognition model is proposed
FFS is proposed for fast recovery of multiple failures
The algorithm and architecture are validated by
conducting experiments in an SDN network
Contributions
The problems of existing network management approaches are described
By applying a human cognition model to the FOCALE control loop, we
propose CogMan, which is able to handle complex problems
A novel failure recovery mechanism, which can be used instead of
restoration, is described for fast failure recovery in SDN networks
A complete monitoring, analysis, and recovery cycle for fault
management in SDN networks is described. Experiments in our testbed
show that the proposed methods recover from various failure cases
Future Work
Validation of the proposed methods in a large-scale
testbed
Combination of protection and FFS for recovery from
multiple failures in 50ms
Applying CogMan to other management cases
• E.g., Quality of Service (QoS) management of video streaming
services
Feasibility test for replacing the current flow setup
algorithm
Q&A
Thank you for taking the time despite your busy schedules
Publications (1/2)
International Journal/Magazine Papers (2)
• Sungsu Kim, Joon-Myung Kang, Sin-seok Seo, and James Won-Ki Hong, “A Cognitive Model based Approach for Autonomic Fault Management in OpenFlow Networks,” International Journal of Network Management (IJNM), (submitted) (SCIE).
• Taesang Choi, Tae-Ho Lee, Nodir Kodirov, Jaegi Lee, Doyeon Kim, Joon-Myung Kang, Sungsu Kim, John Strassner, and James Won-Ki Hong, “HiMang: Highly Manageable Network and Service Architecture for New Generation,” Journal of Communications and Networks, vol. 13, no. 6, pp. 547-551, Dec. 30, 2011. (SCI)
International Conference/Workshop Papers (9)
• Sungsu Kim, Sin-seok Seo, Joon-Myung Kang, Guy Pujolle, and James Won-Ki Hong, “Autonomic Resource Allocation for Video Streaming Services in Content Delivery Networks,” Global Information Infrastructure and Networking Symposium (GIIS 2012), Choroni, Venezuela, Dec. 2012.
• Sungsu Kim, Sin-seok Seo, Joon-Myung Kang, and James Won-Ki Hong, “Autonomic Fault Management based on Cognitive Control Loops,” 2012 IEEE/IFIP International Workshop on Management of the Future Internet (ManFI 2012), Maui, Hawaii, USA, April 20, 2012, pp. 1104-1110.
• Sungsu Kim, John Strassner, and James Won-Ki Hong, “Semantic Overlay Network for Peer-to-Peer Hybrid Information Search and Retrieval,” 12th IFIP/IEEE International Symposium on Integrated Network Management (IM 2011), Dublin, Ireland, May 23-27, 2011, pp. 430-437.
• Arum Kwon, Joon-Myung Kang, Sin-seok Seo, Sung-Su Kim, Jae Yoon Chung, John Strassner, and James Won-Ki Hong, “The Design of a Quality of Experience Model for Providing High Quality Multimedia Services,” Lecture Notes in Computer Science, Vol. 6473, Modelling Autonomic Communication Environments, 5th International Workshop on Modelling Autonomic Communication Environments (MACE 2010), Niagara Falls, Canada, Oct. 28, 2010, pp. 24-36.
• Sin-seok Seo, Sung-Su Kim, Nazim Agoulmine, and James Won-Ki Hong, “On Achieving Self-Organization in Mobile WiMAX Network,” the 5th IEEE/IFIP International Workshop on Broadband Convergence Networks (BcN 2010), Osaka, Japan, Apr. 19, 2010, pp. 43-50.
• Sung-Su Kim, Young J. Won, John Strassner, and James Won-Ki Hong, “Manageability of the Internet: Management with New Functionality,” the 12th IEEE/IFIP Network Operations and Management Symposium (NOMS 2010), Osaka, Japan, Apr. 19-23, 2010.
Publications (2/2)
• John Strassner, SungSu Kim, and James Won-Ki Hong, “Using Semantics to Learn About Routing Data for Improved Network Management in the Future Internet,” the 1st IEEE/IFIP International Workshop on Knowledge Management for Future Services and Networks, Osaka, Japan, Apr. 23, 2010.
• John Strassner, Sung-Su Kim, and James Won-Ki Hong, “Semantic Routing for Improved Network Management in the Future Internet,” Recent Trends in Wireless and Mobile Networks (WiMo), 2010.
• Sung-Su Kim, Young J. Won, Mi-Jung Choi, James W. Hong, and John Strassner, “Towards Management of the Future Internet,” IFIP/IEEE Workshop on Management of the Future Internet (in conjunction with IM 2009), New York, USA, June 5, 2009, pp. 1-6.
Domestic Journal / Conference Papers (6)
Appendix
Related Work
Figure: autonomic network management builds on three areas of related work
• Knowledge representation
• Management architecture
• Control loop
Knowledge Representation
Knowledge Representation
Figure labels: data, information, knowledge
Information model
• A representation of concepts and relationships, constraints, rules, and operations to specify data semantics
Feature          DEN-ng                    SID           CIM
Patterns         Many more used than SID   4             Not used
Policy model     DEN-ng v6.6.4             DEN-ng v3.5   Simple IETF model
ECA model        YES                       YES           NO
Metadata model   YES                       NO            NO
Policy Continuum
Figure: the policy "John gets a gold service" refined across the business view, the network/system view, and the device view
Labels in the figure: unique ID, subscribed SLA (gold, silver, bronze), SRC/DST IP address, DiffServ and bandwidth configuration, device configuration
Model based Translation Layer
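Figure: the model-based translation layer (MBTL) sits above the managed resources (Cisco, Juniper, Nortel; reached through CLI or SNMP) and translates raw data, through an intermediate form, into vendor-neutral commands/data based on DEN-ng
Example
• Raw data: SNMP trap name egpNeighborLoss
• Intermediate: Event { Source=IP address; Problem=egp_neighbor_loss }
• Vendor-neutral: Event ev = new Event(); with the fields ev.Type and ev.Problem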
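The translation in the example can be sketched as a table lookup. This is only an illustration of the idea, not the MBTL implementation: the `Event` class, the `TRAP_TO_PROBLEM` table, and `normalize` are hypothetical, and only the single trap shown on the slide is mapped.

```python
from dataclasses import dataclass


@dataclass
class Event:
    source: str   # IP address of the device that raised the trap
    problem: str  # vendor-neutral problem name


# Hypothetical mapping from raw SNMP trap names to vendor-neutral problems.
TRAP_TO_PROBLEM = {"egpNeighborLoss": "egp_neighbor_loss"}


def normalize(source_ip, trap_name):
    """Translate a raw trap into a vendor-neutral event."""
    if trap_name not in TRAP_TO_PROBLEM:
        raise KeyError(f"no translation rule for trap {trap_name!r}")
    return Event(source=source_ip, problem=TRAP_TO_PROBLEM[trap_name])
```

Supporting another vendor or trap then means adding a mapping entry rather than changing the code that consumes events.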
Control Loop (2/3)
OODA loop [Boyd, ‘95]
Figure: OODA loop with four stages
• Observe: unfolding circumstances, outside information, unfolding interaction with the environment
• Orient: cultural traditions, genetic heritage, analyses & synthesis, new information, previous experience
• Decide: decision (hypothesis)
• Act: action
• Observations feed Orient; Orient exerts implicit guidance and control over Observe and Act
Hierarchical Management
Architecture
Management Architecture
Client-server based architecture
• Centralized management
• Poor scalability
P2P based management architecture [14]
• Highly distributed and scalable
− Load balancing of management tasks
• Overhead for exchanging information between
management nodes
Hierarchical management architecture [13]
• Distributed and scalable
• May not be appropriate for dynamic environments, such
as virtual networks or cloud computing
− Requires algorithms for structuring management nodes
Related Projects
Comparison with related projects
                                  Knowledge Plane [20]   4WARD [21]              AutoI [22]                FAME [24]                 CogMan
Autonomic principle               No                     Yes                     Yes                       Yes                       Yes
Self-organizing                   Yes                    Yes                     Yes                       Yes                       Yes
Knowledge representation          Not defined            Data-oriented mapping   Model-based translation   Model-based translation   Model-based translation
Accommodates heterogeneous data   No                     Limited                 Yes                       Yes                       Yes
Structure of components           No                     Yes                     No                        No                        Yes
Solution Approach (3/4)
Hierarchical management architecture
Figure: hierarchy of management nodes
• Network level: ANM
• Network domain level: ANMs, each managing several AEMs
• Network device level: AEMs, each managing a managed resource
Detailed Algorithms and
Implementation
Alternative Path Setting (1/2)
Calculate alternative paths in advance
• E.g., k-shortest paths algorithm
Push the alternative path into the OpenFlow header
• Extract the part of the alternative path that differs from the old path
Figure: original path, alternative path, and their path difference over switches 1 to 5, annotated with the in/out port numbers of each switch
Alternative Path Setting (2/2)
Put the path difference into the OpenFlow header
• ofp_action_output, pad field
• Out port numbers of switches are put into the pad field
struct ofp_action_output {
    uint16_t type;
    uint16_t len;
    uint32_t port;
    uint16_t max_len;
    uint8_t pad[6];
};
Example: pad field for the path difference
         flag     -        -        Switch 3   Switch 4   Switch 1
         pad[0]   pad[1]   pad[2]   pad[3]     pad[4]     pad[5]
value    1        NULL     NULL     1          1          2
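The pad layout in the example can be reproduced with a short sketch. It assumes that pad[0] holds the flag, that out ports are right-aligned so that the detecting switch's port lands in pad[5], and that 0 stands for NULL; `encode_pad` is a hypothetical helper, not part of OpenFlow.

```python
NULL = 0      # assumption: 0 marks an unused pad slot
PAD_LEN = 6   # uint8_t pad[6] in ofp_action_output


def encode_pad(out_ports):
    """out_ports: out port of each switch on the path difference, first hop first."""
    if len(out_ports) > PAD_LEN - 1:
        raise ValueError("path difference does not fit into the pad field")
    if any(not 1 <= port <= 255 for port in out_ports):
        raise ValueError("each out port must fit into one byte")
    pad = [NULL] * PAD_LEN
    pad[0] = 1  # flag: an alternative path is present
    for i, port in enumerate(out_ports):
        pad[PAD_LEN - 1 - i] = port  # first hop in pad[5], next hop in pad[4], ...
    return pad
```

For the example above (Switch 1 out: 2, Switch 4 out: 1, Switch 3 out: 1), `encode_pad([2, 1, 1])` yields the flag followed by NULL, NULL, 1, 1, 2. One consequence of this layout is that at most five hops fit into the six pad bytes.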
Failure Recovery Procedure (1/2)
If a switch cannot send a packet as the action specifies
1: Check the flow table action again
2: If the port state is down or deleted
3:     Examine the ofp_action_output pad field
4:     If flag == 1
5:         Change the out port of the flow table action to pad[5]
6:         Set the alternative path in the IP option field of the packet
Example
                     flag   [1]    [2]    [3]            [4]            [5]
Output action path   1      NULL   NULL   1 (Switch 3)   1 (Switch 4)   2 (Switch 1)
IP options           1      NULL   NULL   NULL           1 (Switch 3)   1 (Switch 4)
Failure Recovery Procedure (2/2)
When a switch receives a packet of a new flow that is not in its flow table
1: If flag in IP option == 1
2:     Make a new output action as specified in the IP option
3:     Add the flow and action to the flow table
4:     Delete option[5] and shift right 8 bits
5:     If option[5] == NULL
6:         Remove the IP option field
Example
                 flag   [1]    [2]    [3]    [4]            [5]
IP options       1      NULL   NULL   NULL   1 (Switch 3)   1 (Switch 4)
New IP options   1      NULL   NULL   NULL   NULL           1 (Switch 3)
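The two recovery procedures can be traced with a small sketch. It is one interpretation of the slides, under these assumptions: the six bytes are modeled as a Python list, index 0 is the flag, 0 stands for NULL, and "shift right 8 bits" moves every hop one slot toward index 5. The function names are hypothetical.

```python
NULL = 0  # assumption: 0 marks an empty slot


def shift_right(fields):
    """Drop fields[5] and move the remaining hops one byte to the right."""
    return [fields[0], NULL] + fields[1:5]


def recover_at_detecting_switch(pad):
    """Procedure (1/2): returns (new out port, IP options) or None without a flag."""
    if pad[0] != 1:
        return None
    return pad[5], shift_right(pad)


def forward_at_next_switch(options):
    """Procedure (2/2): returns (out port, remaining IP options or None)."""
    if options[0] != 1:
        return None
    out_port = options[5]          # new output action for this switch
    remaining = shift_right(options)
    if remaining[5] == NULL:       # no hops left: remove the IP option field
        return out_port, None
    return out_port, remaining
```

Tracing the example: the detecting switch takes out port 2 from pad[5] and writes the remaining hops into the IP options; Switch 4 then uses out port 1 and leaves a single hop for Switch 3, which uses out port 1 and removes the option field.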
Evaluation (2/4)
Recovery time of FFS (number of running flows = 100)
Figure: interarrival time (ms) against packet count; annotation in the plot: 20ms
Evaluation (4/4)
Recovery time of FFS (number of running flows = 200)
Figure: interarrival time (ms) against packet count; annotation in the plot: 20ms
Evaluation
Use Case: Fault Management
Figure: CogMan control loop for fault management over the managed resource(s), spanning perception, decision making, and actuation
• Observe: port down messages, end-to-end connectivity failed messages
• Normalize: transform vendor-specific data to vendor-neutral data
• Reasoning: correlate network alarms using a causality graph (QoS alarm, switch-to-switch ping failed, switch port down)
• Reactive case: backup paths are prepared for the failure
• Deliberative case: redirect to the optimal path (temporary backup path)
• Reflective case: no backup path is prepared