Learning and Inference in the Knowledge Plane

Thomas G. Dietterich
School of EECS
Oregon State University
Corvallis, Oregon 97331
http://www.eecs.oregonstate.edu/~tgd

Claim: KP applications will be driven by learned models

• Example: a configuration/traffic model captures the tripartite relationship between network configurations, traffic properties, and network performance measures.

[Diagram: network model relating Configurations, Traffic, and Performance Measures]

• Configuration X + Traffic Y ⇒ Performance level Z

Network Model Drives Configuration

[Diagram: the Configuration Engine takes the network model (traffic, configurations, performance measures), the current traffic mix, and the performance objectives as input, and outputs a proposed network configuration]

Roles for Learning

• Learn network model
  - measure:
    - configuration information
    - traffic properties (protocol mix, session lengths, error rates, …)
    - performance measures (throughput, E2E delays, errors, application-level measures)
  - fit model (a sketch follows below)
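
A minimal sketch of the "fit model" step, assuming tabular measurements: learn a mapping from configuration and traffic features to a performance measure. The file name, column names, and choice of regressor are illustrative assumptions, not part of the talk.

```python
# Hedged sketch: fit a performance model from measured configuration and
# traffic features. All file/column names below are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

data = pd.read_csv("network_measurements.csv")          # hypothetical measurement log
config_cols  = ["queue_size", "link_capacity_mbps", "ecn_enabled"]   # configuration info
traffic_cols = ["tcp_fraction", "mean_session_len", "error_rate"]    # traffic properties
target       = "e2e_delay_ms"                                        # performance measure

X = data[config_cols + traffic_cols]
y = data[target]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = GradientBoostingRegressor().fit(X_train, y_train)
print("held-out R^2:", model.score(X_test, y_test))
```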

Roles for Learning (2)

• Improving the configuration engine
  - Learn repair rules
    - Observe operator-initiated repairs
  - Learn heuristics for rapidly finding good solutions
  - Cache previous good solutions

Models for WHY

[Diagram: a Sensor Model (variables measured, error bounds, costs) sits alongside the Network Model (traffic, configurations, performance measures)]

Network and Sensor Models Drive Diagnosis and Repair

[Diagram: diagnosis loop. Triggered by an observed anomaly or a user's complaint, the Diagnosis Engine (DE), guided by the Network Model and Sensor Model, chooses a measurement or an intervention, executes it via the sensors or interventions, receives the results, and outputs a diagnosis and recommended repair.]

Example: Bayesian Network Drives Diagnosis

Semantics
• Every node stores a conditional probability distribution, for example the Power node:

    Bat State   Charge   P(Power | BS, C)
    Ok          Ok       0.99
    Ok          Bad      0.10
    Worn        Ok       0.45
    Worn        Bad      0.01
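
A minimal sketch of these semantics, reading the tabulated column as P(Power = Ok | Bat State, Charge); the dictionary encoding and function name are illustrative assumptions.

```python
# Hedged sketch of the table above, read as P(Power = Ok | Bat State, Charge).
P_POWER_OK = {
    ("Ok",   "Ok"):  0.99,
    ("Ok",   "Bad"): 0.10,
    ("Worn", "Ok"):  0.45,
    ("Worn", "Bad"): 0.01,
}

def p_power(power_ok: bool, bat_state: str, charge: str) -> float:
    """Return P(Power = power_ok | bat_state, charge) from the stored table."""
    p_ok = P_POWER_OK[(bat_state, charge)]
    return p_ok if power_ok else 1.0 - p_ok

print(p_power(True, "Worn", "Ok"))    # 0.45
print(p_power(False, "Ok", "Bad"))    # 0.90
```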

Diagnostic Process

• Interventions:
  - Observation: observe Radio
  - Repair attempt: fill gas tank
  - Observe & repair: inspect fuel pump, replace if bad
• Algorithm (a sketch follows below):
  - Compute P(component is bad | evidence)
  - Repair the component that maximizes P(bad)/cost
  - Choose the observation that maximizes the value of information

Example
• Choose SparkPlugs as the next component to repair. Repair it, update the probabilities, and repeat.
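
A hedged sketch of the greedy repair loop described above. The posteriors, costs, stopping threshold, and the stubbed belief update are assumptions for illustration; a real diagnosis engine would recompute the posteriors in the Bayesian network after each intervention.

```python
# Hedged sketch of greedy repair: fix the component with the highest
# P(bad)/cost, then update beliefs and repeat. Numbers are hypothetical.

def choose_repair(p_bad: dict, cost: dict) -> str:
    """Pick the component maximizing P(bad)/cost."""
    return max(p_bad, key=lambda c: p_bad[c] / cost[c])

p_bad = {"SparkPlugs": 0.60, "Battery": 0.25, "FuelPump": 0.15}   # P(bad | evidence)
cost  = {"SparkPlugs": 1.0,  "Battery": 2.0,  "FuelPump": 5.0}    # repair costs

while max(p_bad.values()) > 0.05:             # stop when nothing is likely bad
    component = choose_repair(p_bad, cost)    # SparkPlugs is chosen first
    print("repair", component)
    # In a real system: condition the Bayesian network on the repair outcome
    # and recompute p_bad. Here we simply mark the component as fixed.
    p_bad[component] = 0.0
```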

Role for Learning

• Learn sensor model
  - The basic sensor model is manually engineered
  - Learn error bounds and costs (e.g., time delay, traffic impact)

Anomaly Detection Models

• Monitor the network for unusual traffic, configurations, and routes

[Diagram: separate models map Traffic, Configurations, and Routes each to a measure of "normalness"]

• Anomalies are phenomena to be understood, not alarms to be raised.

Role for Learning

• Learn these models by observing traffic, configurations, and routes

Model Properties

• Spatially distributed
  - Replicated/cached
• Hierarchical
  - Multiple levels of abstraction
• Constantly maintained

Available Technology: Configuration Engine

• Existing formulations: constraint-satisfaction problems (CSPs) with an objective function
• Systematic and repair-based methods
• Some ideas for how to incorporate learning

Available Technology (2): Diagnostic Engine

• Special cases where optimal diagnosis is tractable
  - Single fault; all actions are repair attempts
  - Single fault; all actions are pure observations
• Widely used heuristic
  - One-step value of information (greedy approximation)
• Fully general approach
  - Partially observable Markov decision process (POMDP)
  - Some approximation algorithms are available

Available Technology (3): Anomaly Detection

• Unsupervised learning methods (a sketch follows below)
  - Clustering
  - Probability density estimation
  - One-class classification formulation
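
A minimal sketch of the one-class formulation, assuming each traffic window is summarized by a fixed-length feature vector; the synthetic features and the choice of a one-class SVM are assumptions for illustration.

```python
# Hedged sketch: fit a one-class model of "normal" traffic windows and
# score new windows. Features and data are synthetic placeholders.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# columns: [messages per window, incoming/outgoing ratio] -- hypothetical features
normal_windows = rng.normal(loc=[100.0, 1.0], scale=[10.0, 0.1], size=(500, 2))

detector = OneClassSVM(nu=0.05, kernel="rbf").fit(normal_windows)

new_window = np.array([[400.0, 6.0]])            # unusually heavy, lopsided window
print(detector.predict(new_window))              # -1 means "not normal"
print(detector.decision_function(new_window))    # more negative = more anomalous
```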

Research Gaps (1): Spatially-distributed, multi-level models of traffic

• What are the right variables to model?
  - Packet-level statistics (RTT, throughput, jitter, …)
  - Connection-level statistics
  - Routing statistics
  - Application statistics
• What levels reveal anomalies?
• What levels can best be related to performance goals and configuration settings?

Research Gaps (2): Learning models of configurations

• Network components
  - Switches, routers, firewalls, web servers, file servers, wireless access points, …
• LANs and WANs
• Autonomous Systems
• What are the right levels of abstraction?
• Can these things be observed?

Research Gaps (3): Relational learning

• Network models are relational
  - (traffic, configuration, performance)
  - the network structure is a graph
  - routes are paths with attached properties
• Relational modeling is a relatively young area of ML

Research Gaps (4): Distributed learning and reasoning

• Distributed model construction
  - Bottom-up summary statistics (easy)
  - Mixed (bottom-up/top-down) information flow (unexplored)
  - Essential for higher-level modeling
  - Opportunities for data sharing at lower levels
• Distributed configuration
• Distributed diagnosis
  - Opportunities for inference sharing

Research Gap (5): Openness

• Standard AI models assume a fixed set of classes/categories/faults/components
• How do we reason about the possible existence of new classes, new components, and new protocols (including intrusions/worms)?
• How do we evaluate such systems?

Application/Model Interface

• Subscription?
  - Applications subscribe to regular model updates/summaries
  - Applications specify the models they want the KP to build/maintain
• Query?
  - Applications make queries to models?

Application/KP Interface: Two possibilities

• KP provides inference services in addition to model services
  - The WHY? client sends the end-user complaint to a TP where the inference engine operates
• Inference is performed on the end-user machine?
  - The WHY? client does the inference itself, just sending queries to TPs
  - Some inference about the end-user's machine needs to happen locally. Maybe view it as a local TP?

Concluding Remarks

• KP applications will be driven by learned models
  - traffic models
  - sensor models
  - models of "normalness"
• Models are acquired by a mix of human authoring and machine learning
• The main research challenges arise from multiple levels of abstraction and world-wide distribution

KP support for learning models

• Example: HP Labs email loop detection system
  - Bronstein, Das, Duro, Friedrich, Kleyner, Mueller, Singhal, Cohen (2001)
  - Convert mail log data into four "detectors" and combine them using a Bayesian network

KP support for learning models (2)

• Variables to be measured:
  - Raw sensors: mail log (when received, from, to, size, time of delivery attempt, status of attempt)
  - Derived variables, computed over 10-minute windows (a sketch follows below):
    - IM: # incoming msgs
    - IMR: # incoming msgs / # outgoing msgs
    - PEAK: magnitude of the peak bin in the message-size histogram
    - SHARPNESS: ratio of the magnitude of the peak bin to the average size of the four neighboring non-empty bins (excluding the 2 nearest-neighbor bins on either side of the peak bin)
  - Summary statistics
    - mean and standard deviation of the derived variables, trimmed to remove outliers beyond ±3.5 σ
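
A hedged sketch of computing the derived variables for one window, assuming the incoming and outgoing message sizes for that window are available as lists; the bin count and the exact SHARPNESS tie-breaking are assumptions, not the HP Labs implementation.

```python
# Hedged sketch of the derived variables for one 10-minute window.
# `incoming_sizes` / `outgoing_sizes` are the message sizes seen in the
# window; the bin count and SHARPNESS details are assumptions.
import numpy as np

def derived_variables(incoming_sizes, outgoing_sizes, n_bins=20):
    im  = len(incoming_sizes)                     # IM: # incoming msgs
    imr = im / max(len(outgoing_sizes), 1)        # IMR: incoming / outgoing
    hist, _ = np.histogram(incoming_sizes, bins=n_bins)
    peak_idx = int(np.argmax(hist))
    peak = int(hist[peak_idx])                    # PEAK: magnitude of the peak bin

    # SHARPNESS: peak magnitude over the mean of the four nearest non-empty
    # bins, skipping the two bins immediately on either side of the peak.
    candidates = [(abs(i - peak_idx), int(h)) for i, h in enumerate(hist)
                  if h > 0 and abs(i - peak_idx) > 2]
    nearest = [h for _, h in sorted(candidates)[:4]]
    sharpness = peak / np.mean(nearest) if nearest else float("inf")
    return {"IM": im, "IMR": imr, "PEAK": peak, "SHARPNESS": sharpness}

print(derived_variables(incoming_sizes=[900, 910, 905, 5000, 120],
                        outgoing_sizes=[300, 450]))
```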

KP Services

• Provide a language so that I can define
  - raw features
  - derived features
  - summary statistics
• Provide a subscription service so that I can collect the summary statistics
  - automatic or semi-automatic support for aggregating summary statistics from multiple sources (e.g., multiple border SMTP servers)
• Provide a subscription service so that I can sample the derived features (to build a supervised training data set)

Statistical Aggregation Middleware

• Routines for aggregating statistics
  - Example: given
    - S1 = Σ_i x_{1,i} and N1
    - S2 = Σ_j x_{2,j} and N2, from two independent sources
  - I can compute
    - S3 = S1 + S2 and N3 = N1 + N2
  - From these, I can compute the mean value:
    - μ = S3/N3
• To compute the variance, I need SS1 = Σ_i x²_{1,i}
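
A minimal sketch of such an aggregation routine, assuming each source reports the triple (N, S, SS) = (count, sum, sum of squares); the class and method names are illustrative.

```python
# Hedged sketch: each source reports sufficient statistics (N, S, SS);
# the middleware merges them and recovers the global mean and variance.
from dataclasses import dataclass

@dataclass
class Stats:
    n: int = 0        # N: number of samples
    s: float = 0.0    # S: sum of x
    ss: float = 0.0   # SS: sum of x^2

    def merge(self, other: "Stats") -> "Stats":
        return Stats(self.n + other.n, self.s + other.s, self.ss + other.ss)

    def mean(self) -> float:
        return self.s / self.n

    def variance(self) -> float:
        m = self.mean()
        return self.ss / self.n - m * m   # E[x^2] - E[x]^2

# Two independent sources report their local statistics.
site1 = Stats(n=3, s=6.0, ss=14.0)     # samples 1, 2, 3
site2 = Stats(n=2, s=9.0, ss=41.0)     # samples 4, 5
total = site1.merge(site2)
print(total.mean(), total.variance())  # 3.0 2.0
```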

Model-Fitting Middleware

• Given summary statistics, compute the probabilities in a Bayesian network

[Diagram: Bayesian network with Mail Loop as the parent of IM, IMR, PEAK, and SHARPNESS]
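
A hedged sketch of one way such fitting could work for the network above: model each detector as Gaussian given the Mail Loop state, fit it from per-class (mean, std) summary statistics, and combine detectors naive-Bayes style. The numbers, priors, and the single-detector example are invented for illustration and are not the HP Labs system.

```python
# Hedged sketch: fit each detector's distribution given the Mail Loop state
# from per-class (mean, std) summary statistics, then combine detectors
# naive-Bayes style. All numbers below are hypothetical.
import math

# Per-class summary statistics for one detector (IMR): (mean, std).
summaries = {
    "loop":    {"IMR": (4.0, 1.0)},
    "no_loop": {"IMR": (1.0, 0.3)},
}
prior = {"loop": 0.01, "no_loop": 0.99}   # P(Mail Loop)

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def posterior_loop(observed):
    """P(Mail Loop = yes | observed detector values)."""
    score = {}
    for cls in prior:
        p = prior[cls]
        for var, x in observed.items():
            mean, std = summaries[cls][var]
            p *= gaussian_pdf(x, mean, std)
        score[cls] = p
    return score["loop"] / (score["loop"] + score["no_loop"])

print(posterior_loop({"IMR": 3.5}))   # close to 1: incoming far exceeds outgoing
```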

Sufficient Statistics

• Andrew Moore (CMU): for nearly every learning algorithm, there is a set of statistics sufficient to permit that algorithm to fit its models
• There is usually a scalable way of collecting and aggregating these sufficient statistics

One KP Goal

• Design the services for defining sensors, defining derived variables, defining sufficient statistics, and defining aggregation methods
• Scalable, secure, etc.

Hierarchical Modeling

• An abstract (or aggregate) model could treat the conclusions/assertions of other models as input variables
• "One model's inference is another model's raw data"
  - probably requires associated meta-data: provenance, age of the original data
  - major issue: assessing the independence of multiple data sources (we don't want to double-count evidence); this requires knowledge of the KP/network topology

Fusing Multiple Subscriptions

• Multiple KPs and/or multiple KP apps may register for the same sufficient statistics
• Fuse their subscriptions to save computation
• Keep meta-data on non-independence