Learning and Inference in the Knowledge Plane
Thomas G. Dietterich
School of EECS
Oregon State University
Corvallis, Oregon 97331
http://www.eecs.oregonstate.edu/~tgd
Claim: KP applications will be driven by learned models
Example: a configuration/traffic model captures the tripartite relationship between network configurations, traffic properties, and network performance measures.
[Figure: network model linking Traffic, Configurations, and Performance Measures]
Configuration X + Traffic Y ⇒ Performance level Z
Network Model Drives Configuration
[Figure: the traffic mix and performance objectives, together with the Network Model, drive a Configuration Engine that outputs a proposed network configuration.]
Roles for Learning
Learn the network model:
measure…
configuration information
traffic properties (protocol mix, session lengths, error rates, …)
performance measures (throughput, E2E delays, errors, application-level measures)
fit the model (a sketch of this measure-then-fit loop follows below)
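As a loose illustration of the measure-then-fit loop above, here is a minimal sketch that fits a linear model mapping configuration and traffic features to a performance measure. The feature names, the data, and the choice of ordinary least squares are all assumptions for illustration; a real KP network model would be far richer.

```python
# Minimal sketch: fit a performance model from (configuration, traffic) measurements.
# All feature names and data here are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)

# Each row: [queue_size, link_capacity_mbps, tcp_fraction, mean_session_len_s]
X = rng.uniform(low=[16, 10, 0.2, 1.0], high=[1024, 1000, 1.0, 600.0], size=(200, 4))

# Observed performance measure (e.g., end-to-end delay in ms); synthetic ground truth + noise.
y = 5.0 + 0.01 * X[:, 0] + 2000.0 / X[:, 1] + 3.0 * X[:, 2] + rng.normal(0, 0.5, size=200)

# "Fit model": ordinary least squares with an intercept term.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict the performance of a proposed configuration under an expected traffic mix.
proposed = np.array([1.0, 256, 100, 0.8, 120.0])  # [bias, queue, capacity, tcp_frac, session_len]
print("predicted delay (ms):", proposed @ coef)
```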
Roles for Learning (2)
Improving the configuration engine:
Learn repair rules
Observe operator-initiated repairs
Learn heuristics for rapidly finding good solutions
Cache previous good solutions (a sketch follows below)
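One way to read "cache previous good solutions" is as a memo table keyed by a coarse discretization of the request. The sketch below assumes a hypothetical key scheme (rounded traffic-mix fractions plus the stated objective); a real engine would need a more careful notion of configuration-problem similarity.

```python
# Sketch of a solution cache for the configuration engine (hypothetical key scheme).
from typing import Optional

class SolutionCache:
    def __init__(self, bucket: float = 0.1):
        self.bucket = bucket          # coarseness of the traffic-mix discretization
        self._cache = {}              # key -> previously found good configuration

    def _key(self, traffic_mix: dict, objective: str):
        # Round each traffic fraction so that similar mixes map to the same key.
        rounded = tuple(sorted((k, round(v / self.bucket) * self.bucket)
                               for k, v in traffic_mix.items()))
        return (rounded, objective)

    def lookup(self, traffic_mix: dict, objective: str) -> Optional[dict]:
        return self._cache.get(self._key(traffic_mix, objective))

    def store(self, traffic_mix: dict, objective: str, configuration: dict) -> None:
        self._cache[self._key(traffic_mix, objective)] = configuration

cache = SolutionCache()
cache.store({"tcp": 0.8, "udp": 0.2}, "minimize_delay", {"queue_size": 256})
print(cache.lookup({"tcp": 0.82, "udp": 0.18}, "minimize_delay"))  # reuses the cached config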
Models for WHY
Sensor Model
[Figure: the Sensor Model sits alongside the Network Model (traffic, configurations, performance measures) and records, for each sensor, the variables measured, error bounds, and costs.]
Network and Sensor Models Drive
Diagnosis and Repair
DE
DEchooses
executes
measurement
DE outputs
measurement
observed anomaly
or
User
makes
ororintervention
user’s complaint
diagnosis
interventioncomplaint
DE receives
results
Network Model
Sensor Model
Diagnosis Engine
Sensors
diagnosis and
recommended repair
Interventions
Example: Bayesian Network Drives Diagnosis
Semantics: every node stores a conditional probability distribution, e.g.:

Bat State   Charge   P(Power | BS, C)
Ok          Ok       0.99
Ok          Bad      0.10
Worn        Ok       0.45
Worn        Bad      0.01
Diagnostic Process
Interventions:
Observation: Observe Radio
Repair attempts: Fill gas tank
Observe & Repair: Inspect fuel pump, replace if bad
Algorithm:
Compute P(component is bad | evidence)
Repair component that maximizes P(bad)/cost
Choose the observation that maximizes the value of information (a sketch of this greedy policy follows the example below)
Example
Choose SparkPlugs as next component to
repair. Repair it. Update probabilities,
and repeat.
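A minimal sketch of the greedy repair step described above. The component posteriors and repair costs here are made-up stand-ins: in a real system P(component is bad | evidence) would come from inference in the Bayesian network, and the loop would re-run inference after each repair or observation.

```python
# Sketch of the greedy repair heuristic: repair the component maximizing P(bad)/cost.
# The posterior probabilities below are made-up stand-ins for Bayesian-network inference.

posteriors = {          # P(component is bad | evidence)
    "SparkPlugs": 0.40,
    "Battery":    0.25,
    "FuelPump":   0.20,
}
repair_cost = {         # cost (e.g., time or money) of repairing each component
    "SparkPlugs": 10.0,
    "Battery":    50.0,
    "FuelPump":   80.0,
}

def next_repair(posteriors, repair_cost):
    """Return the component with the best probability-of-fault-per-unit-cost ratio."""
    return max(posteriors, key=lambda c: posteriors[c] / repair_cost[c])

component = next_repair(posteriors, repair_cost)
print("repair next:", component)   # SparkPlugs: 0.40/10 beats the alternatives

# After the repair, the evidence changes; a full system would update the posteriors
# (and weigh pure observations by their value of information) before repeating.
```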
Role for Learning
Learn sensor model
Basic sensor model is manually engineered
Learn error bounds and costs (e.g., time delay, traffic impact)
Anomaly Detection Models
Monitor the network for unusual traffic, configurations, and routes.
[Figure: separate models assign a measure of "normalness" to Traffic, to Configurations, and to Routes.]
Anomalies are phenomena to be understood,
not alarms to be raised.
Role for Learning
Learn these models by observing traffic, configurations, and routes
Model Properties
Spatially Distributed
Replicated/Cached
Hierarchical
Multiple levels of abstraction
Constantly Maintained
Available Technology: Configuration Engine
Existing formulations: constraint-satisfaction problems (CSPs) with an objective function (a toy sketch follows below)
Systematic and repair-based methods
Some ideas for how to incorporate learning
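To make "CSP with an objective function" concrete, here is a toy sketch: exhaustive search over two hypothetical configuration variables, keeping only assignments that satisfy a constraint and returning the one with the best objective value. Real configuration engines would use systematic or repair-based CSP solvers rather than brute force; the variables, constraint, and objective are invented for illustration.

```python
# Toy constraint-satisfaction sketch with an objective function (hypothetical variables).
from itertools import product

queue_sizes = [64, 128, 256, 512]          # candidate values for one configuration variable
buffer_limits = [1, 2, 4, 8]               # candidate values for another (MB)

def satisfies_constraints(queue, buf):
    # Example constraint: total memory footprint must stay under a budget.
    return queue * 1.5 + buf * 1024 <= 4500

def objective(queue, buf):
    # Example objective: prefer large queues and buffers (a crude proxy for throughput).
    return 0.6 * queue + 0.4 * buf * 100

feasible = [(q, b) for q, b in product(queue_sizes, buffer_limits)
            if satisfies_constraints(q, b)]
best = max(feasible, key=lambda qb: objective(*qb))
print("best feasible configuration:", {"queue_size": best[0], "buffer_mb": best[1]})
```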
Available Technology (2): Diagnostic Engine
Special cases where optimal diagnosis is tractable:
Single fault; all actions are repair attempts
Single fault; all actions are pure observations
Widely-used heuristic:
One-step value of information (greedy approximation)
Fully general approach:
Partially observable Markov decision process (POMDP)
Some approximation algorithms are available
Available Technology (3): Anomaly Detection
Unsupervised learning methods:
Clustering
Probability density estimation
One-class classification formulation
(a sketch of density-based anomaly scoring follows below)
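A minimal sketch of the density-estimation approach listed above: fit a Gaussian to historical traffic feature vectors and flag new observations whose log-density falls below a threshold. The features, the Gaussian assumption, and the 1st-percentile threshold are all illustrative choices; clustering or a one-class classifier could be substituted.

```python
# Sketch of anomaly scoring via probability density estimation (Gaussian model).
import numpy as np

rng = np.random.default_rng(1)

# Historical "normal" traffic features, e.g., [packets/s, mean RTT ms] (synthetic).
normal = rng.normal(loc=[1000.0, 40.0], scale=[100.0, 5.0], size=(500, 2))

mean = normal.mean(axis=0)
cov = np.cov(normal, rowvar=False)
cov_inv = np.linalg.inv(cov)
log_norm_const = -0.5 * (np.log(np.linalg.det(cov)) + 2 * np.log(2 * np.pi))

def log_density(x):
    d = x - mean
    return log_norm_const - 0.5 * d @ cov_inv @ d

# Threshold chosen from the training data (here, the 1st percentile of log-density).
threshold = np.percentile([log_density(x) for x in normal], 1)

def is_anomalous(x):
    return log_density(np.asarray(x)) < threshold

print(is_anomalous([1020.0, 41.0]))   # typical traffic -> False
print(is_anomalous([5000.0, 300.0]))  # wildly unusual  -> True
```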
Research Gaps (1): Spatially-distributed, multi-level models of traffic
What are the right variables to model?
Packet-level statistics (RTT, throughput, jitter, …)
Connection-level statistics
Routing statistics
Application statistics
What levels reveal anomalies?
What levels can best be related to performance goals and configuration settings?
Research Gaps (2): Learning models of configurations
Network components: switches, routers, firewalls, web servers, file servers, wireless access points, …
LANs and WANs
Autonomous Systems
What are the right levels of abstraction?
Can these things be observed?
Research Gaps (3): Relational learning
Network models are relational (traffic, configuration, performance):
The network structure is a graph
Routes are paths with attached properties
Relational modeling is a relatively young area of ML
Research Gaps (4): Distributed learning and reasoning
Distributed model construction:
Bottom-up summary statistics (easy)
Mixed (bottom-up/top-down) information flow (unexplored)
Essential for higher-level modeling
Opportunities for data sharing at lower levels
Distributed configuration
Distributed diagnosis
Opportunities for inference sharing
Research Gap (5): Openness
Standard AI models assume a fixed set of classes/categories/faults/components
How do we reason about the possible existence of new classes, new components, and new protocols (including intrusions/worms)?
How do we evaluate such systems?
Application/Model Interface
Subscription?
Applications subscribe to regular model updates/summaries
Applications specify the models they want the KP to build/maintain
Query?
Applications make queries to models?
Application/KP Interface: Two possibilities
KP provides inference services in addition to model services
The WHY? client sends the end-user's complaint to the TP where the inference engine operates
Inference is performed on the end-user's machine?
The WHY? client does the inference and just sends queries to TPs
Some inference about the end-user's machine needs to happen locally. Maybe view it as a local TP?
Concluding Remarks
KP applications will be driven by learned models:
traffic models
sensor models
models of "normalness"
Models are acquired by a mix of human authoring and machine learning
Main research challenges arise from multiple levels of abstraction and world-wide distribution
KP support for learning models
Example: HP Labs email loop detection system
Bronstein, Das, Duro, Friedrich, Kleyner, Mueller, Singhal, Cohen (2001)
Convert mail log data into four "detectors" and combine them using a Bayesian network
KP support for learning models (2)
Variables to be measured:
Raw sensors: mail log (when received, from, to, size, time of delivery attempt, status of attempt)
Derived variables (10-minute windows):
IM: # incoming msgs
IMR: # incoming msgs / # outgoing msgs
PEAK: magnitude of the peak bin in the message-size histogram
SHARPNESS: ratio of the magnitude of the peak bin to the average size of the four neighboring non-empty bins (excluding the 2 nearest-neighbor bins on either side of the peak bin)
Summary statistics: mean and standard deviation of the derived variables, trimmed to remove outliers beyond ±3.5 σ (a sketch of the windowed feature computation follows below)
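A sketch of how two of the derived variables above (IM and IMR) might be computed over 10-minute windows, plus the trimmed summary statistics. The log format and field names are hypothetical simplifications; PEAK and SHARPNESS would follow the same windowed pattern over a message-size histogram.

```python
# Sketch: derive windowed mail-log features (IM, IMR) and trimmed summary statistics.
# The log format and field names are hypothetical simplifications.
from collections import defaultdict
import statistics

WINDOW = 600  # 10-minute windows, in seconds

# (timestamp_s, direction) records from a mail log; direction is "in" or "out".
log = [(5, "in"), (20, "out"), (130, "in"), (640, "in"), (700, "in"), (710, "out")]

windows = defaultdict(lambda: {"in": 0, "out": 0})
for ts, direction in log:
    windows[ts // WINDOW][direction] += 1

im = []   # IM:  number of incoming messages per window
imr = []  # IMR: incoming / outgoing ratio per window (guard against divide-by-zero)
for w in sorted(windows):
    counts = windows[w]
    im.append(counts["in"])
    imr.append(counts["in"] / max(counts["out"], 1))

def trimmed_mean_std(values, z=3.5):
    """Mean/std after dropping points beyond z standard deviations (as on the slide)."""
    mu, sigma = statistics.mean(values), statistics.pstdev(values)
    kept = [v for v in values if sigma == 0 or abs(v - mu) <= z * sigma]
    return statistics.mean(kept), statistics.pstdev(kept)

print("IM per window:", im, "IMR per window:", imr)
print("trimmed IM summary:", trimmed_mean_std(im))
```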
KP Services
Provide a language so that I can define:
raw features
derived features
summary statistics
Provide a subscription service so that I can collect the summary statistics
Automatic or semi-automatic support for aggregating summary statistics from multiple sources (e.g., multiple border SMTP servers)
Provide a subscription service so that I can sample the derived features (to build a supervised training data set)
(a sketch of what such definitions might look like follows below)
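One possible shape for such a definition language, sketched as plain data classes. Every class and field name here is hypothetical; the point is only that raw features, derived features, and summary statistics could be declared as data and handed to a subscription service.

```python
# Hypothetical sketch of declaring features and a subscription for the KP to maintain.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class RawFeature:
    name: str                       # e.g., "mail_log.size"
    source: str                     # which sensor/log it comes from

@dataclass
class DerivedFeature:
    name: str                       # e.g., "IM"
    inputs: List[str]               # raw features it depends on
    window_s: int                   # aggregation window, e.g., 600 seconds
    compute: Callable               # function applied to each window of inputs

@dataclass
class Subscription:
    features: List[DerivedFeature]
    statistics: List[str] = field(default_factory=lambda: ["mean", "std"])
    delivery_interval_s: int = 3600 # how often summaries are pushed to the subscriber

im = DerivedFeature(name="IM", inputs=["mail_log.direction"], window_s=600,
                    compute=lambda rows: sum(1 for r in rows if r == "in"))
sub = Subscription(features=[im])
print(sub)
```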
Statistical Aggregation Middleware
Routines for aggregating statistics
Example: given S1 = Σ_i x_{1,i} and N1 from one source, and S2 = Σ_j x_{2,j} and N2 from a second, independent source, I can compute S3 = S1 + S2 and N3 = N1 + N2.
From these, I can compute the mean value: x̄ = S3 / N3
To compute the variance, I need SS1 = Σ_i x_{1,i}² (a sketch of this aggregation follows below)
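A minimal sketch of the aggregation step described above: merge (N, S, SS) triples from independent sources and recover the mean and variance of the combined data.

```python
# Sketch: aggregate sufficient statistics (count, sum, sum of squares) from two sources.

def summarize(xs):
    """Sufficient statistics for mean/variance: N, S = sum(x), SS = sum(x^2)."""
    return len(xs), sum(xs), sum(x * x for x in xs)

def merge(a, b):
    """Combine statistics from independent sources by component-wise addition."""
    return tuple(x + y for x, y in zip(a, b))

def mean_and_variance(stats):
    n, s, ss = stats
    mean = s / n
    return mean, ss / n - mean * mean   # population variance from the aggregated stats

source1 = summarize([10.0, 12.0, 11.0])   # e.g., statistics from one border server
source2 = summarize([9.0, 13.0])          # and from another, independent one
print(mean_and_variance(merge(source1, source2)))  # same as computing over all 5 values
```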
Model-Fitting Middleware
Given summary statistics, compute the probabilities in a Bayesian network.
[Figure: Bayesian network relating a Mail Loop node to the IM, IMR, PEAK, and SHARPNESS detector variables.]
(a sketch of fitting such a model from counts follows below)
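A sketch of what "compute probabilities from summary statistics" might mean for a naive-Bayes-style structure over the Mail Loop node and its detectors: estimate each conditional probability table from aggregated counts with Laplace smoothing, then use it for inference. The counts are made up, and the structure of the actual HP system may differ.

```python
# Sketch: estimate CPTs for a naive-Bayes-style mail-loop model from aggregated counts.
# Counts are made-up placeholders; in the KP they would arrive as summary statistics.

# counts[detector][loop_state] = (windows in which the detector fired, total windows in that state)
counts = {
    "IM":        {"loop": (45, 50), "no_loop": (30, 950)},
    "IMR":       {"loop": (40, 50), "no_loop": (50, 950)},
    "PEAK":      {"loop": (35, 50), "no_loop": (80, 950)},
    "SHARPNESS": {"loop": (30, 50), "no_loop": (60, 950)},
}
prior_loop = 50 / 1000   # fraction of windows labeled as containing a mail loop

def p_fire(detector, state, alpha=1.0):
    """P(detector fires | loop state), estimated with Laplace smoothing (alpha)."""
    fired, total = counts[detector][state]
    return (fired + alpha) / (total + 2 * alpha)

def posterior_loop(fired_detectors):
    """P(loop | which detectors fired), assuming detectors independent given the state."""
    p_loop, p_no = prior_loop, 1 - prior_loop
    for d in counts:
        fired = d in fired_detectors
        p_loop *= p_fire(d, "loop") if fired else 1 - p_fire(d, "loop")
        p_no *= p_fire(d, "no_loop") if fired else 1 - p_fire(d, "no_loop")
    return p_loop / (p_loop + p_no)

print(posterior_loop({"IM", "IMR", "PEAK"}))  # high posterior when most detectors fire
```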
Sufficient Statistics
Andrew Moore (CMU): for nearly every learning algorithm, there is a set of statistics sufficient to permit that algorithm to fit its models.
There is usually a scalable way of collecting and aggregating these sufficient statistics.
One KP Goal
Design the services for defining sensors, derived variables, sufficient statistics, and aggregation methods
Scalable, secure, etc.
Hierarchical Modeling
An abstract (or aggregate) model could treat the conclusions/assertions of other models as input variables
"One model's inference is another model's raw data"
Probably requires associated metadata: provenance, age of the original data (a sketch follows below)
Major issue: assessing the independence of multiple data sources (we don't want to double-count evidence); this requires knowledge of KP/network topology
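A small sketch of the "one model's inference is another model's raw data" idea: wrap an assertion with provenance and age metadata so a higher-level model can check whether two inputs rest on the same underlying source before combining them (to avoid double-counting evidence). All class and field names are hypothetical.

```python
# Sketch: an assertion produced by one model, carried with metadata to a higher-level model.
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class Assertion:
    variable: str                 # e.g., "link_congested(AS7018, AS3356)"
    belief: float                 # probability attached by the lower-level model
    produced_by: str              # which model/KP node produced it
    age_s: float                  # how old the underlying observations are
    sources: FrozenSet[str]       # raw data sources behind this assertion

def independent(a: Assertion, b: Assertion) -> bool:
    """Crude check used before combining evidence: do the assertions share raw sources?"""
    return not (a.sources & b.sources)

a1 = Assertion("link_congested", 0.9, "edge_model_A", 30.0, frozenset({"router17"}))
a2 = Assertion("link_congested", 0.8, "edge_model_B", 45.0, frozenset({"router17", "router22"}))
print(independent(a1, a2))   # False: both rest on router17, so don't double-count
```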
Fusing Multiple Subscriptions
Multiple KPs and/or multiple KP apps may register for the same sufficient statistics
Fuse their subscriptions to save computation
Keep metadata on non-independence