Autonomous Configuration and Repair of Networks

Autonomous Configuration, Repair, and Monitoring of Networks Using a Collaboration Overlay and Reinforcement Learning
or: Home Networks for the Masses
Eitan Fenson, Rich Howard, Michael Littman, Liza Ogden, Alex Santoro, Eugene Seroka, Phil Straw, Barbara Weinstein
www.pnphome.com
© 2003 PnP Networks, all rights reserved
QUESTION?
• What is the Most Complex Thing in the Home?
QUESTION?
• What is the Most Complex Thing at the Office?
QUESTION?
• Put Them Together In A Typical Home or Small Business And What Do You Get?
Home Networking Today
• PC Based
• General Purpose
• Lower Protocol Layers Exposed
• Designed For The Office Environment
• Difficult Instructions/Customer Support
• Multi-vendor, Multi-OS, Multi-Application, etc.
• No single point of responsibility for performance
Goal: Stage 3 Technologies
• Stage 1 Technology: “Geeks” only
– Examples:
• Cars in 1910
• Networks in 1980
• Stage 2 Technology: popular, but extensive user knowledge needed
– Examples:
• Cars in 1950
• Networks now
• Stage 3 Technology: invisible, focus on results
– Examples:
• “Soccermobile” today
• Data networks that function like the phone network
[Screenshot: Netgear “Easy Install” access point configuration page]
Just click on each of the 16 parameters to modify, then click “OK” and you are done!
Evolution of Networking
• Most users want their tools to work, not be a challenge.
• Computers are getting to this point, but networks are far away.
• Example: Home Networks.
– Many technical solutions, few “non-technical” ones.
– Usually heterogeneous equipment, OS, software.
• Out-of-date drivers, missing patches, conflicting hardware/software.
• No single point of responsibility for problems.
– Wireless (e.g. IEEE 802.11) should be the solution.
• Quirky HW/SW, hard to install.
• RF networking far more complex than wireline networking.
• Difficult to maintain.
• RF range and coverage issues.
– Wizard mentality.
• Nobody responsible for complete functioning.
• Users must become experts (or hire them).
• No easily accessible universal knowledge base of solutions.
Autonomous Network Management
• We have MIPS to burn; use them to truly simplify the tasks.
• Network design, installation, maintenance, repair, and upgrade should be automated.
– Learning algorithms for analyzing and classifying network states.
– Collect successes (and failures) in a knowledge base.
– Partner and consult with human experts (leveraging wizards).
– Automated “single-point-of-responsibility” solution.
• Difficult research challenge (Math, CS, human factors, economics).
– Support from DARPA and strong commercial interest.
– First field trials underway.
Elements of a Solution
• Communication between nodes (“Collaboration Bus”)
– Should need physical link only
– Low overhead
• Sensors on CB
– State information for each node
• E.g. OS, NIC firmware rev., registry info., etc.
– Traffic information
• Essential for diagnosis and intrusion detection
• Compression through pattern analysis and feature detectors
• Cognitive system (“Cost Sensitive Fault Remediation”)
– Based on Reinforcement Learning
– Seeded with best information available
– Learn and update through experience
– Ability to pool experience between systems
– Ability to call for and accept help from remote expert
• Effectors on CB for troubleshooting and repair
• Security at all levels
Collaboration Bus
[Diagram: nodes, including a remote node, connected by the Collaboration Bus over several link types: IP multicast, WAN tunnel, wireless link, wired link, and “sneakernet” link.]
Messages carried on the bus:
• “Hello”
• “My parameters are:”
• “I see this traffic:”
• “Here’s what I know about other nodes:”
• “Please do this for me:”
Collaboration Bus Features
• Needs only physical link.
– Information shared even before network is working
– Sensing and repair commands transported even in fault states
• Works with heterogeneous nodes, OS, etc.
• XML message format allows for hierarchical organization
• Remote nodes linked through encrypted tunnel
• Can build up collaborative picture of network
– Hidden nodes can be observed indirectly
• Patterns of traffic and states are available
– Fault diagnosis
– Intrusion and virus detection
• Low overhead by using learned feature maps
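The slides do not show the XML schema, so the following is only a sketch of what a hierarchical “Hello” message on the Collaboration Bus might look like. The element and attribute names (cb-message, parameters, known-nodes) and the sample values are assumptions made for illustration, not the PnP format.

```python
import xml.etree.ElementTree as ET


def build_hello(node_id, params, neighbors):
    """Serialize a hypothetical 'hello' message: own parameters plus known nodes."""
    msg = ET.Element("cb-message", {"type": "hello", "from": node_id})
    state = ET.SubElement(msg, "parameters")
    for name, value in sorted(params.items()):
        ET.SubElement(state, "param", {"name": name}).text = str(value)
    known = ET.SubElement(msg, "known-nodes")
    for other in neighbors:
        ET.SubElement(known, "node", {"id": other})
    return ET.tostring(msg, encoding="unicode")


def parse_hello(text):
    """Recover (sender, parameters, known nodes) from the XML text."""
    root = ET.fromstring(text)
    params = {p.get("name"): p.text for p in root.find("parameters")}
    neighbors = [n.get("id") for n in root.find("known-nodes")]
    return root.get("from"), params, neighbors


# Hypothetical node names and parameter values.
wire = build_hello("laptop-1", {"os": "WinXP", "nic_firmware": "1.2"}, ["printer-1"])
sender, params, neighbors = parse_hello(wire)
```

Because the payload is a tree, a node can nest what it knows about other nodes under its own message, which is one way the "collaborative picture" could be assembled.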
Cognitive System
• Focus: improve performance on fault repair task.
• Minimize cost (e.g. elapsed time) to repair faults including:
– Cost of each diagnostic measurement
– Cost of each attempted repair action
– Weighted by learned statistics of fault states
• Continuous measurements
– Incomplete, inaccurate, noisy, and conflicting
• Learn from successes and failures.
• Make best use of expert input when needed.
• Challenging problem in machine learning
– Developing new varieties of Reinforcement Learning
Definitions
Cost-sensitive fault remediation:
• Set of underlying fault states
• Actions observe state attributes with costs
• Actions remediate fault states with costs; each repairs some states and fails on others
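As a reading aid, the three parts of the definition can be written down as a small data model. This is a sketch only: the class names, field names, fault names, and costs are all hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Observe:
    """Action that reveals one attribute of the hidden fault state, at a cost."""
    name: str
    cost: float
    answers: dict  # fault name -> value seen when that fault is present


@dataclass
class Repair:
    """Action that fixes some fault states and fails on the others."""
    name: str
    success_cost: float
    failure_cost: float
    fixes: frozenset

    def attempt(self, fault):
        """Return (repaired?, cost charged) for the true underlying fault."""
        ok = fault in self.fixes
        return ok, self.success_cost if ok else self.failure_cost


# Hypothetical two-fault instance.
faults = {"cable_unplugged", "stale_lease"}
link_check = Observe("link_up?", 1.0, {"cable_unplugged": False, "stale_lease": True})
renew = Repair("renew_lease", 5.0, 50.0, frozenset({"stale_lease"}))
```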
Modeling Assumptions
Hard problem; simplifying assumptions:
• Fault remediation episodes independent.
• High level actions provide information or
repair.
• Otherwise, no fault mode changes.
• Want minimum cost repair.
Cost-sensitive Fault Remediation
• Cost-sensitive diagnosis
– no state transitions, actions informational only
– episode ends when a diagnosis made
• Cost-sensitive fault remediation
– still no state change, repair is all or nothing
– episode continues until objective achieved
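Under the modeling assumptions (independent episodes, no fault changes except through repair), and with the further assumption that observables are deterministic yes/no tests, a minimum expected cost plan can be found by recursion over the set of faults still consistent with what has been seen. The sketch below is one such exhaustive search. The fault names, probabilities, tests, and costs are invented, and it is not claimed to be the authors' algorithm.

```python
from functools import lru_cache

# Hypothetical toy model: fault -> prior probability.
FAULTS = {"cable": 0.5, "dhcp": 0.3, "isp_dns": 0.2}
# test -> (cost, faults under which the test answers "yes")
TESTS = {
    "link_up": (1.0, frozenset({"dhcp", "isp_dns"})),
    "ping_gateway": (5.0, frozenset({"isp_dns"})),
}
# repair -> (cost on success, cost on failure, faults it repairs)
REPAIRS = {
    "replug": (250.0, 1000.0, frozenset({"cable"})),
    "renew_dhcp": (500.0, 2000.0, frozenset({"dhcp"})),
    "call_isp": (100000.0, 200000.0, frozenset({"isp_dns"})),
}


def mass(belief):
    """Total prior probability of the faults in a belief set."""
    return sum(FAULTS[f] for f in belief)


@lru_cache(maxsize=None)
def best(belief):
    """Return (expected cost, first action) given the faults still possible."""
    total = mass(belief)
    options = []
    for name, (s_cost, f_cost, fixes) in REPAIRS.items():
        hit, miss = belief & fixes, belief - fixes
        if not hit:
            continue  # repair cannot succeed, so never worth trying
        cost = mass(hit) / total * s_cost
        if miss:  # failure rules out the faults this repair would have fixed
            cost += mass(miss) / total * (f_cost + best(miss)[0])
        options.append((cost, name))
    for name, (t_cost, yes) in TESTS.items():
        pos, neg = belief & yes, belief - yes
        if pos and neg:  # only informative tests are considered
            cost = t_cost + sum(mass(b) / total * best(b)[0] for b in (pos, neg))
            options.append((cost, name))
    return min(options)


cost, action = best(frozenset(FAULTS))
```

With these invented numbers the search chooses the cheap `link_up` test first, since splitting the belief is cheaper than risking the failure cost of any repair. The search enumerates subsets of faults, so it is practical only for small fault sets.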
Example Observables
#   Action
1   Is the network medium physically connected?
2   Is the active interface wireless?
3   Is the active interface DHCP-enabled?
4   Can I ping my IP?
5   Can I ping localhost?
6   Can I ping my gateway?
7   Can I do a DNS lookup?
8   Does my IP setting look valid?
9   Does my netmask setting look valid?
10  Does my DNS setting look valid?
11  Does my gateway setting look valid?
12  Can I reach PnP?
13  Can PnP reach DNS?
14  Is the DHCP DNS setting valid?
15  Is the DHCP gateway setting valid?
16  Can PnP reach Yahoo.com?
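The slides do not say how “looks valid” is decided. As a sketch, observables 8, 9, and 11 could be approximated with Python's standard `ipaddress` module. The specific rules below (rejecting loopback, link-local, and unspecified addresses, and requiring the gateway to sit in the local subnet) are assumptions, not the PnP checks.

```python
import ipaddress


def ip_looks_valid(ip):
    """Observable 8 (assumed rule): a usable IPv4 host address."""
    try:
        addr = ipaddress.IPv4Address(ip)
    except ValueError:
        return False
    return not (addr.is_loopback or addr.is_link_local or addr.is_unspecified)


def netmask_looks_valid(ip, netmask):
    """Observable 9 (assumed rule): the mask parses as a proper IPv4 netmask."""
    try:
        ipaddress.IPv4Network(f"{ip}/{netmask}", strict=False)
    except ValueError:
        return False
    return True


def gateway_looks_valid(ip, netmask, gateway):
    """Observable 11 (assumed rule): gateway is another host on the local subnet."""
    try:
        net = ipaddress.IPv4Network(f"{ip}/{netmask}", strict=False)
        gw = ipaddress.IPv4Address(gateway)
    except ValueError:
        return False
    return gw in net and gw != ipaddress.IPv4Address(ip)
```

A link-local address such as 169.254.3.4 fails the first check, which is a useful hint that a DHCP lease was never obtained.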
Example Remedial Actions, Faults
#  Remedial action                   Cost: success (failure)  Faults repaired
1  Renew DHCP lease                  500 (500,000)            - Local IP is wrong (DHCP)
                                                              - Local DNS settings are wrong (DHCP)
                                                              - Route to the internet is bad (no good
                                                                route known, DHCP)
2  Plug your network cable back in   250 (100,000)            - Wired network is physically disconnected
   (or restore your wireless                                  - Wireless network is unreachable (SSID
   connection)                                                  or WEP setting may be incorrect, or
                                                                radio is out of range)
3  Check your router's physical      10,000 (10,000)          - Router is disconnected from ISP
   connection to your ISP
4  Check your router's physical      10,000 (10,000)          - Router can't be reached by local
   connection to your LAN                                       machine
5  Contact ISP and report that       100,000 (10,000,000)     - ISP DNS server is down
   their DNS Server appears to
   be down
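To see why the failure cost matters, the expected cost of a single repair attempt can be computed as p × success cost + (1 − p) × failure cost, where p is the probability that the current fault is one the action repairs. The sketch below uses the costs listed for actions 3 to 5; the fault probabilities and the short identifiers are hypothetical.

```python
# action -> (success cost, failure cost, faults repaired); costs from actions 3-5.
ACTIONS = {
    "check_router_isp_link": (10_000, 10_000, {"router_off_isp"}),
    "check_router_lan_link": (10_000, 10_000, {"router_unreachable"}),
    "contact_isp": (100_000, 10_000_000, {"isp_dns_down"}),
}

# Hypothetical current belief over the fault states (not from the slides).
BELIEF = {"router_off_isp": 0.6, "router_unreachable": 0.3, "isp_dns_down": 0.1}


def expected_cost(action):
    """Expected cost of one attempt of `action` under BELIEF."""
    success_cost, failure_cost, fixes = ACTIONS[action]
    p = sum(prob for fault, prob in BELIEF.items() if fault in fixes)
    return p * success_cost + (1 - p) * failure_cost


ranked = sorted(ACTIONS, key=expected_cost)
```

Under this belief, contacting the ISP has an expected cost of about 9,010,000 against 10,000 for either router check, so it is ranked last even though it is the only action that can fix a DNS outage.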
Practical Considerations
• CSFR is a big improvement over classification.
• Fault modes are analogous to classes, but automatically identified; autonomous.
• But how do we know if a repair succeeded?
– “Primary complaint”
– Monitors
DEMO MOVIE
Algorithmic Limitations
Optimal if there are few faults, observables are (nearly) deterministic, and the problem is easily explored.
Moving ahead:
• need smart exploration
• better scaling with many (related) faults
• handle noisy, continuous observables
• deal with some side effects
Reinforcement Learning: Future
How do we scale RL to large problems?
• Discretization: Map from observation to state.
+ uses well-understood technology
- hard to determine appropriate resolution
• Parameterized function approximation
+ no need to pre-specify mapping to state
- many negative results (theoretical and practical)
• Memory-based methods
+ good match for continuous observations
- not as well studied
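The discretization option can be illustrated with fixed bin edges. This is a sketch under stated assumptions: the two measurements (round-trip time and packet loss) and every bin edge are hypothetical, and choosing those edges is exactly the "appropriate resolution" difficulty noted above.

```python
import bisect

# Hypothetical bin edges; picking these well is the hard part.
RTT_EDGES_MS = [10.0, 50.0, 200.0]
LOSS_EDGES = [0.01, 0.1]


def discretize(value, edges):
    """Map a continuous value to a bin index in 0..len(edges)."""
    return bisect.bisect_right(edges, value)


def to_state(rtt_ms, loss_fraction):
    """Turn a continuous observation into a discrete state for a tabular learner."""
    return (discretize(rtt_ms, RTT_EDGES_MS), discretize(loss_fraction, LOSS_EDGES))
```

Two observations that fall in the same bins become the same state, so bins that are too coarse hide distinctions and bins that are too fine leave most states unvisited.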
Memory Based Methods
Idea: Save all experience. Value function computed by finding similar instances in DB.
• Tabular (Moore & Atkeson 92)
• Continuous (Santamaria, et al. 98; Moore et al. 95)
• Partially observable (McCallum 95)
• Combine! (vision, speech, continuous measurements)
Advantages:
• efficient in experience (the expensive commodity)
• no catastrophic forgetting (rare events!)
• memory is cheap
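A minimal sketch of the idea, assuming a plain k-nearest-neighbor average over stored (observation, return) pairs with Euclidean distance. This is a generic illustration, not any of the cited methods; the class name and the choice of k are hypothetical.

```python
import math


class MemoryValue:
    """Value estimate from stored experience via k-nearest-neighbor averaging."""

    def __init__(self, k=3):
        self.k = k
        self.memory = []  # every (observation vector, observed return) ever seen

    def add(self, observation, ret):
        """Store one experience; nothing is ever overwritten or forgotten."""
        self.memory.append((tuple(observation), float(ret)))

    def estimate(self, observation):
        """Average the returns of the k stored observations closest to the query."""
        if not self.memory:
            raise ValueError("no experience stored yet")
        nearest = sorted(
            self.memory, key=lambda item: math.dist(item[0], observation)
        )[: self.k]
        return sum(ret for _, ret in nearest) / len(nearest)


values = MemoryValue(k=3)
for obs, ret in [((0, 0), 10), ((0, 1), 20), ((5, 5), 100), ((1, 0), 30)]:
    values.add(obs, ret)
estimate = values.estimate((0, 0))
```

Because the rare experience at (5, 5) stays in memory, a later query near it is still answered from that instance, which is the "no catastrophic forgetting" advantage. The linear scan is for clarity; a real system would likely need a spatial index.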
Other Future Directions
• Use the Collaboration Bus more fully in
decision-making and acting
– Take advantage of multiple perspectives, cooperation
– Have the CB function as active agent in synthesizing
global perspective for otherwise limited or isolated
nodes
• Develop more context aware decision making
– System should know what it’s doing and why
– Applications include robot teams and collaboration in
search/rescue and battlefield situations
WiFi Hotspot Management
• Fast-growing application segment
– McDonald’s, Starbucks, etc.
• Excellent first application
– Exercise Collaboration Bus
– Collect data for cognitive system
– Commercial value before learning available
• Providers
– Configuration, security, liability
• Users
– Configuration assistance
– Instant file share
– Privacy protection
Collecting and Delivering Information
[Diagram: a Wi-Fi hotspot (Wi-Fi users, Wi-Fi access point, PnP Monitor gateway) connected over the Internet to PnP Central (database), alongside other company hotspots; outputs: usage trends and security alerts.]
Browser-based Reports
• Usage
• Trends
• Security
Analysis of Network Information
• Network traffic
• User demographics
• Type of user activity (web, email)
• Bandwidth use
• Average session lengths
• Usage by location
• Recurring usage (specific and multiple locations)
• Security
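Several of these report items reduce to simple aggregations over session records. The sketch below assumes a hypothetical record layout of (location, user, minutes online); the slides do not describe the actual database schema, and the sample data is invented.

```python
from collections import defaultdict

# Hypothetical session records: (location, user, minutes online).
SESSIONS = [
    ("cafe-1", "alice", 30),
    ("cafe-1", "bob", 10),
    ("cafe-2", "alice", 20),
]


def average_session_minutes(sessions):
    """Average session length across all records."""
    return sum(minutes for _, _, minutes in sessions) / len(sessions)


def usage_by_location(sessions):
    """Total minutes online per hotspot location."""
    totals = defaultdict(int)
    for location, _, minutes in sessions:
        totals[location] += minutes
    return dict(totals)


def multi_location_users(sessions):
    """Users seen at more than one location (recurring usage across hotspots)."""
    seen = defaultdict(set)
    for location, user, _ in sessions:
        seen[user].add(location)
    return sorted(user for user, places in seen.items() if len(places) > 1)
```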
Security
• Inward security for business and customers
• Outward security for liability protection
• Ability to protect users within the network
• Daily security scans
• Intra-packet analysis allows real-time logging and third-party escrowing of traffic payloads:
– Instant messages
– Email
– ftp
– telnet
– other