KMC-20050525 - Kansas State University


First Annual LEAP-KMC Workshop
Data Mining and Machine Learning
for Predicting Energetics
Wednesday, 25 May 2005
William H. Hsu
Laboratory for Knowledge Discovery in Databases
Department of Computing and Information Sciences
Kansas State University
http://www.kddresearch.org
This presentation is:
http://www.kddresearch.org/KSU/CIS/KMC-20050525.ppt
Outline of Talk
• Research Overview
– Data Mining in LEAP-KMC: problem specification
– Design Choices for Machine Learning
• General Approaches
– Time Series
– Unsupervised Learning
– Supervised Learning: Spatial Pattern Recognition (see Methodology)
• Machine Learning Methodologies
– Traditional Inducers (Waikato Environment for Knowledge Analysis)
– Graphical Models: Bayesian Networks, DBNs
– Genetic and Evolutionary Computation (GEC)
– Artificial Neural Networks (ANNs)
• Early Results
• Data Model Requirements
• Current and Continuing Work
Data Mining in LEAP-KMC
[Diagram: data mining process — source data describing a central atom and its 1st, 2nd, and 3rd shells is mined to produce a target model (equations, parametric form). Adapted from Rahman, Kara, et al. (2004).]
Design Choices for
Machine Learning
[Flowchart of design choices:]
– Determine Type of Training Experience: "keyframed" states & intermediate trajectories; typed transitions; occupancy vector state representation
– Determine Target Function: transition prediction model; external function estimator
– Determine Representation of Learned Function: polynomial; linear combination of occupancy variables; artificial neural network
– Determine Learning Algorithm: linear regression; gradient descent
– Completed Design
Linear Time Series Models
• Linear Models
– Moving Average (MA(q)) model, aka finite impulse response (FIR) filter
• x(t) = Σ_{j=0..q} b_j · e(t − j)
• x(t) = observed stochastic process
• e(t) = external signal (e.g., white Gaussian noise)
– Autoregressive (AR(p)) model, aka infinite impulse response (IIR) filter
• x(t) = Σ_{i=1..p} a_i · x(t − i) + e(t)
• If e(t) is additive WGN: x(t) = Σ_{i=1..p} a_i · x(t − i)
– Autoregressive moving average (ARMA(p,q)) model
• x(t) = Σ_{i=1..p} a_i · x(t − i) + Σ_{j=0..q} b_j · e(t − j)
• Combines AR and MA model parameters (can express either or both)
• Order of a Linear Model: p, q (e.g., AR(1))
• Learning: Finding Hyperparameters (p, q), Parameters (a_i, b_j)
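Below is a minimal sketch of the parameter-estimation step for a fixed hyperparameter p, fitting the AR coefficients a_i by ordinary least squares. It assumes numpy; the synthetic AR(2) data and coefficient values are illustrative only, not from the talk.

import numpy as np

def fit_ar(x, p):
    """Least-squares fit of AR(p): x(t) ~ sum_i a_i * x(t - i)."""
    # Build the lagged design matrix: row t holds [x(t-1), ..., x(t-p)].
    X = np.column_stack([x[p - i:len(x) - i] for i in range(1, p + 1)])
    y = x[p:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

# Example: recover coefficients of a synthetic AR(2) process.
rng = np.random.default_rng(0)
x = np.zeros(2000)
for t in range(2, len(x)):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.normal(scale=0.1)
print(fit_ar(x, p=2))   # approximately [0.6, -0.3]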
Temporal Probabilistic Reasoning
Adapted from Murphy (2001), Guo (2002)
• Goal: Estimate P(X_t | y_1, …, y_r)
• Filtering: r = t
– Intuition: infer current state from observations
– Applications: signal identification
– Variation: Viterbi algorithm
• Prediction: r < t
– Intuition: infer future state
– Applications: prognostics
• Smoothing: r > t
– Intuition: infer past hidden state
– Applications: signal enhancement
• Applications: Modeling
– State transitions (prediction)
– Energetics (estimation)
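For concreteness, a minimal sketch of the filtering case (r = t) for a discrete hidden Markov model via the forward algorithm, assuming numpy; the transition matrix A, emission matrix B, prior, and observations below are made-up illustrative values, not from the talk.

import numpy as np

def forward_filter(A, B, prior, obs):
    """Forward algorithm: returns P(X_t | y_1..y_t) for each t (filtering, r = t)."""
    belief = prior.copy()
    history = []
    for y in obs:
        belief = belief * B[:, y]          # condition on the current observation
        belief = belief / belief.sum()     # normalize to a distribution
        history.append(belief.copy())
        belief = A.T @ belief              # propagate through the transition model
    return np.array(history)

# Two hidden states, two observation symbols (illustrative numbers only).
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])                 # A[i, j] = P(X_{t+1}=j | X_t=i)
B = np.array([[0.8, 0.2],
              [0.3, 0.7]])                 # B[i, k] = P(y=k | X=i)
prior = np.array([0.5, 0.5])
print(forward_filter(A, B, prior, obs=[0, 0, 1, 1]))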
Unsupervised Learning [1]:
Objectives
• Unsupervised Learning
– Given: data set D
• Vectors of attribute values (x1, x2, …, xn)
• No distinction between input attributes and output attributes (class label)
[Diagram: supervised learning maps input x to an estimate f̂(x) of a known target f(x); unsupervised learning maps x to a discovered descriptor y.]
– Return: (synthetic) descriptor y of each x
• Clustering: grouping points (x) into inherent regions of mutual similarity
• Vector quantization: discretizing continuous space with best labels
• Dimensionality reduction: projecting many attributes down to a few
• Feature extraction: constructing (few) new attributes from (many) old ones
• Intuitive Idea
– Want to map independent variables (x) to dependent variables (y = f(x))
– Need to discover y based on numerical criterion (e.g., distance metric)
– Don’t always know what “dependent variables” (y) are
Unsupervised Learning [2]:
Clustering
Cluster Formation and Segmentation Algorithm (Sketch)
[Diagram: dimensionality-reducing projection (x′) → clusters of similar records → Delaunay triangulation / Voronoi (nearest-neighbor) diagram (y).]
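A minimal sketch of one clustering option, Lloyd's k-means algorithm, over occupancy-style vectors; it assumes numpy, and the toy data and choice of k are illustrative, not results reported here.

import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's algorithm: group points into k regions of mutual similarity."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center (Euclidean distance).
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        # Recompute each center as the mean of its assigned points.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy 36-bit occupancy vectors (random, for illustration only).
X = np.random.default_rng(1).integers(0, 2, size=(200, 36)).astype(float)
labels, centers = kmeans(X, k=5)
print(np.bincount(labels))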
Graphical Models [1]:
Bayesian Networks
• Conditional Independence
– X is conditionally independent (CI) of Y given Z (sometimes written X ⊥ Y | Z) iff
  P(X | Y, Z) = P(X | Z) for all values of X, Y, and Z
– Example: P(Thunder | Rain, Lightning) = P(Thunder | Lightning), i.e., T ⊥ R | L
• Bayesian (Belief) Network
– Acyclic directed graph model B = (V, E, Θ) representing CI assertions over a set of random variables
– Vertices (nodes) V: denote events (each a random variable)
– Edges (arcs, links) E: denote conditional dependencies
• Markov Condition for BBNs (Chain Rule): P(X_1, X_2, …, X_n) = Π_{i=1..n} P(X_i | parents(X_i))
– Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents
From slides for Russell & Norvig 2e (2003)
Graphical Models [2]:
Inference and Learning
• General-Case BBN Structure Learning: Use Inference to Compute Scores
• Optimal Strategy: Bayesian Model Averaging
– Assumption: models h ∈ H are mutually exclusive and exhaustive
– Combine predictions of models in proportion to marginal likelihood
• Compute conditional probability of hypothesis h given observed data D
• i.e., compute expectation over unknown h for unseen cases
• Let h ≡ structure, parameters Θ ≡ CPTs




P x m 1 | D  P x 1 , x 2 ,  , x n | x 1 , x 2  ,  , x m 


P x m 1 | D, h  P h | D 

 
 


hH
Posterior Score
Marginal Likelihood
Prior over Parameters
P h | D   P D | h   P h 
 P h   P D | h, Θ   P Θ | h  dΘ

Prior over Structures
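A minimal sketch of the model-averaging computation above over a small discrete hypothesis space; the hypotheses here are coin-bias values standing in for network structures, chosen only to keep the arithmetic visible, not from the talk.

# Bayesian model averaging over a discrete hypothesis space H:
# P(x_{m+1} | D) = sum_h P(x_{m+1} | D, h) * P(h | D)
H = [0.3, 0.5, 0.7]                       # P(heads) under each hypothesis (illustrative)
prior = {h: 1.0 / len(H) for h in H}      # uniform prior over "structures"
D = [1, 1, 0, 1]                          # observed data (1 = heads)

def likelihood(h, data):
    """P(D | h) for the coin model."""
    p = 1.0
    for x in data:
        p *= h if x == 1 else (1.0 - h)
    return p

# Posterior score: P(h | D) proportional to P(D | h) * P(h).
unnorm = {h: likelihood(h, D) * prior[h] for h in H}
Z = sum(unnorm.values())
posterior = {h: v / Z for h, v in unnorm.items()}

# Model-averaged prediction that the next observation is heads.
p_next_heads = sum(h * posterior[h] for h in H)
print(posterior, p_next_heads)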
Genetic and Evolutionary Computation [1]:
Simple Genetic Algorithm (SGA)
• Algorithm Simple-Genetic-Algorithm (Fitness, Fitness-Threshold, p, r, m)
// p: population size; r: replacement rate (aka generation gap width); m: mutation rate
– P ← p random hypotheses // initialize population
– FOR each h in P DO f[h] ← Fitness(h) // evaluate Fitness: hypothesis → R
– WHILE (Max(f) < Fitness-Threshold) DO
– 1. Select: Probabilistically select (1 − r)·p members of P to add to PS, with
     P(h_i) = f(h_i) / Σ_{j=1..p} f(h_j)
– 2. Crossover:
– Probabilistically select (r · p)/2 pairs of hypotheses from P
– FOR each pair <h1, h2> DO PS += Crossover(<h1, h2>) // PS[t+1] = PS[t] + <offspring1, offspring2>
– 3. Mutate: Invert a randomly selected bit in m · p random members of PS
– 4. Update: P ← PS
– 5. Evaluate: FOR each h in P DO f[h] ← Fitness(h)
– RETURN the hypothesis h in P that has maximum fitness f[h]
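A compact, runnable sketch of the SGA above for fixed-length bit strings; the one-max fitness function and parameter settings are placeholders for illustration, not the energetics fitness used in LEAP-KMC.

import random

def sga(fitness, fitness_threshold, p=50, r=0.6, m=0.02, length=36, seed=0):
    """Simple genetic algorithm over fixed-length bit strings (cf. Mitchell, 1997)."""
    rng = random.Random(seed)
    P = [[rng.randint(0, 1) for _ in range(length)] for _ in range(p)]
    f = [fitness(h) for h in P]
    while max(f) < fitness_threshold:
        # Fitness-proportionate selection: P(h_i) = f(h_i) / sum_j f(h_j)
        weights = [fi + 1e-9 for fi in f]
        PS = [list(h) for h in rng.choices(P, weights=weights, k=int((1 - r) * p))]
        for _ in range(int(r * p / 2)):                  # single-point crossover
            h1, h2 = rng.choices(P, weights=weights, k=2)
            cut = rng.randrange(1, length)
            PS += [h1[:cut] + h2[cut:], h2[:cut] + h1[cut:]]
        for _ in range(int(m * p)):                      # mutate m*p random members of PS
            h = rng.choice(PS)
            i = rng.randrange(length)
            h[i] = 1 - h[i]
        P = PS
        f = [fitness(h) for h in P]
    return max(P, key=fitness)

# Illustrative fitness: number of 1-bits (one-max); run until near-optimal.
best = sga(fitness=sum, fitness_threshold=32)
print(sum(best))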
Genetic and Evolutionary Computation [2]:
Genetic Programming (GP)
Adapted from The Genetic Programming Notebook © 2002 Jaime J. Fernandez
http://www.geneticprogramming.com
Artificial Neural Networks [1]:
Backpropagation of Error
• Recall: Backprop Training Rule Derived from Error Gradient Formula
• Algorithm Train-by-Backprop (D, r)
– r: constant learning rate (e.g., 0.05)
– Initialize all weights wi to (small) random values
– UNTIL the termination condition is met, DO
FOR each <x, t(x)> in D, DO
Input the instance x to the network and compute the output o(x) = σ(net(x))
FOR each output unit k, DO
  δ_k ← o_k(x) · (1 − o_k(x)) · (t_k(x) − o_k(x))
FOR each hidden unit j, DO
  δ_j ← h_j(x) · (1 − h_j(x)) · Σ_{k ∈ outputs} v_{j,k} · δ_k
Update each weight w = u_{i,j} (input activation a = x_i) or w = v_{j,k} (input activation a = h_j):
  w ← w + Δw, where Δw ← r · δ_{end-layer} · a
[Diagram: three-layer feedforward ANN — input layer (x1, x2, x3), hidden layer (h1–h4) reached through weights u (e.g., u11), output layer (o1, o2) reached through weights v (e.g., v42).]
– RETURN final u, v
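A minimal numpy sketch of the update above for a 3-4-2 sigmoid network; the toy input/target pair and learning rate are illustrative only.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
u = rng.normal(scale=0.1, size=(3, 4))    # input -> hidden weights u[i, j]
v = rng.normal(scale=0.1, size=(4, 2))    # hidden -> output weights v[j, k]
r = 0.05                                  # learning rate

def backprop_step(x, t):
    global u, v
    h = sigmoid(x @ u)                    # hidden activations h_j
    o = sigmoid(h @ v)                    # output activations o_k
    delta_o = o * (1 - o) * (t - o)       # output deltas
    delta_h = h * (1 - h) * (v @ delta_o) # hidden deltas: sum_k v_{j,k} * delta_k
    v += r * np.outer(h, delta_o)         # delta_w = r * delta_end * input activation
    u += r * np.outer(x, delta_h)
    return 0.5 * np.sum((t - o) ** 2)     # squared error for monitoring

# One training pair (illustrative values); repeated updates drive the error down.
x = np.array([0.2, 0.7, 0.1])
t = np.array([1.0, 0.0])
for epoch in range(1000):
    err = backprop_step(x, t)
print(err)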
Artificial Neural Networks [2]:
Recurrence and Time Series
• Representing Time Series with ANNs
– Feedforward ANN: y(t + 1) = net(x(t))
– Need to capture temporal relationships
• Solution Approaches
– Directed cycles
– Feedback
• Output-to-input [Jordan]
• Hidden-to-input [Elman]
• Input-to-input
– Captures time-lagged relationships
• Among x(t′ ≤ t) and y(t + 1)
• Among y(t′ ≤ t) and y(t + 1)
– Learning with recurrent ANNs
• Elman, 1990; Jordan, 1987
• Principe and deVries, 1992
• Mozer, 1994; Hsu and Ray, 1998
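A minimal sketch of the Elman-style (hidden-to-input feedback) variant, assuming numpy; the weight shapes and input sequence are illustrative placeholders, not a model from the talk.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 5, 1
W_xh = rng.normal(scale=0.1, size=(n_in, n_hid))   # input -> hidden
W_ch = rng.normal(scale=0.1, size=(n_hid, n_hid))  # context (previous hidden) -> hidden
W_hy = rng.normal(scale=0.1, size=(n_hid, n_out))  # hidden -> output

def elman_forward(x_seq):
    """Run an Elman network over a sequence: context = previous hidden state."""
    context = np.zeros(n_hid)
    outputs = []
    for x in x_seq:
        hidden = sigmoid(x @ W_xh + context @ W_ch)  # hidden-to-input feedback
        outputs.append(sigmoid(hidden @ W_hy))
        context = hidden
    return np.array(outputs)

# Illustrative sequence of 4 time steps, each a 3-dimensional observation.
print(elman_forward(rng.normal(size=(4, n_in))))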
Mixture Models, Instance-Based Learning (IBL),
and Radial Basis Functions (RBFs)
• Mixture Model Construction [cf. McCullagh and Nelder, 1990]
– Update-Inducer
• Single training step for each expert module
• e.g., ANN: one backprop cycle, aka epoch
– Compute-Activation
• Depends on ME architecture
• Idea: smoothing of “winner-take-all” (“hard” max)
• Softmax activation function (Gaussian mixture model): g_l = exp(w_l · x) / Σ_{j=1..k} exp(w_j · x)
[Diagram: mixture-of-experts architecture — a gating network computes g1, g2 from input x and mixes the outputs y1, y2 of two expert networks.]
• Possible Modifications
– Batch (as opposed to online) updates: lift Update-Weights out of outer FOR loop
– Classification learning (versus concept learning): multiple yj values
– Arrange gating networks (combiner inducers) in hierarchy (HME)
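A minimal sketch of the softmax gating function above combining two expert outputs, assuming numpy; the gating weights and expert predictions are illustrative placeholders.

import numpy as np

def softmax_gate(W, x):
    """g_l = exp(w_l . x) / sum_j exp(w_j . x) -- smooth 'winner-take-all'."""
    scores = W @ x
    scores -= scores.max()              # for numerical stability
    g = np.exp(scores)
    return g / g.sum()

# Two gating weight vectors (rows of W) and two expert predictions, all illustrative.
W = np.array([[1.0, -0.5, 0.2],
              [-0.3, 0.8, 0.1]])
x = np.array([0.4, 0.9, -0.2])
expert_outputs = np.array([0.7, 0.1])   # y1, y2 from the two expert networks
g = softmax_gate(W, x)
y = g @ expert_outputs                  # mixture-of-experts combined output
print(g, y)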
Kernel Methods: Radial Basis Function (RBF)
Networks and Support Vector Machines (SVM)
• What Are RBF Networks?
– Global approximation to target function f, in terms of linear combination of local
approximations
– Typical uses: image, signal classification
– Different kind of artificial neural network (ANN)
– Closely related to distance-weighted regression, but “eager” instead of “lazy”
• Activation Function
– f(x) = w_0 + Σ_{u=1..k} w_u · K_u(d(x_u, x)), where the a_i(x) are attributes describing instance x
– Common choice for K_u: Gaussian kernel function K_u(d(x_u, x)) = exp(−d²(x_u, x) / (2σ_u²))
[Diagram: RBF network — input attributes a1(x), a2(x), …, an(x) feed kernel units whose outputs are linearly combined with weights w_u, plus a bias unit 1 with weight w_0.]
[Mitchell, 1997]
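A minimal sketch of the RBF prediction above with Gaussian kernels, assuming numpy; the centers x_u, widths σ_u, and weights are illustrative placeholders rather than fitted values.

import numpy as np

def rbf_predict(x, centers, sigmas, w0, w):
    """f(x) = w0 + sum_u w_u * exp(-d^2(x_u, x) / (2 * sigma_u^2))."""
    d2 = np.sum((centers - x) ** 2, axis=1)          # squared distances to each center
    phi = np.exp(-d2 / (2.0 * sigmas ** 2))          # Gaussian kernel activations
    return w0 + w @ phi

# Three kernel units over 2-D instances (all values illustrative).
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
sigmas = np.array([0.5, 0.5, 0.8])
w0, w = 0.1, np.array([1.2, -0.7, 0.4])
print(rbf_predict(np.array([0.2, 0.3]), centers, sigmas, w0, w))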
Results [1]:
Supervised Learning – Energy Estimation
Results for 36-bit occupancy vector, 10-fold cross-validation
Target attribute: external energy function (numeric)
Source data: Baza C500, Step16MDD
Results [2]:
Unsupervised Learning - Clustering
Cluster tree for the 36-bit occupancy vector database, produced by Cobweb (WEKA): 238 merges, 186 splits, 1106 clusters.
Visualization of energy (x) vs. cluster membership (y), using clusters found by Expectation-Maximization (EM) in WEKA: 20 clusters, log likelihood = -11.66.
Customizable visualization interface (future work): King
Results [3]:
Unsupervised and Supervised - Classification
Results for 36-bit occupancy vector, Unsupervised Discretize (scalar quantization) filter
Algorithm: J48 (cf. Quinlan’s C4.5)
3 bins:
  Correctly Classified Instances      559     78.4011 %
  Incorrectly Classified Instances    154     21.5989 %
  Mean absolute error                 0.1683
  Root mean squared error             0.3532
=== Confusion Matrix ===
    a    b    c    <-- classified as
  115   15    3 |  a = '(-inf-0.333333]'
   14  221   59 |  b = '(0.333333-0.666667]'
    2   61  223 |  c = '(0.666667-inf)'
10 bins:
  Correctly Classified Instances      362     50.7714 %
  Incorrectly Classified Instances    351     49.2286 %
  Mean absolute error                 0.1065
  Root mean squared error             0.281
=== Confusion Matrix ===
   a   b   c   d   e   f   g   h   i   j    <-- classified as
  26   2   1   2   1   0   0   0   0   0 |  a = '(-inf-0.1]'
   2  22   2   2   0   0   1   0   0   0 |  b = '(0.1-0.2]'
   0   1  32  10   2   4   1   0   2   0 |  c = '(0.2-0.3]'
   2   7  10  20   7   3   3   1   1   0 |  d = '(0.3-0.4]'
   0   0   0   7  23  15   7   4   0   0 |  e = '(0.4-0.5]'
   0   0   2   1  15  71  21   9   7   0 |  f = '(0.5-0.6]'
   0   2   1   3   7  19  60  24  13   1 |  g = '(0.6-0.7]'
   0   0   2   0   3   8  27  40  19   4 |  h = '(0.7-0.8]'
   0   0   0   0   3   7  11  20  57   6 |  i = '(0.8-0.9]'
   0   0   0   1   1   2   1   4   7  11 |  j = '(0.9-inf)'
Plan and Progress
• Progress to Date
– Supervised, unsupervised learning: baseline experiments using WEKA
– Draft design of extended data model and annotators
• Timeline
– 4th Q 2004 - 1st Q 2005: from SL-KMC to LEAP-KMC
– Understanding, redesigning KMC data models: database, trace files, etc.
– Preliminary experiments using traditional inducers
– 2nd Q 2005: extensible data models and scalable learning algorithms
– Data model for 36-bit, 200-bit geometry: transitions, in-betweening
– Refined learning algorithms: interpolation, prediction
– 2nd half 2005: dynamic models, general ANN & GP architecture; info vis
– 2005-2006: theory-guided constructive induction application
– 2006-2007: 3-D spatial representations (esp. graphical models)
– 2007 and beyond: learning equivalence classes, abstractions
Continuing Work: Change of Representation
(Parameter Tuning Wrappers)
[Diagram: genetic wrapper for change of representation and inductive bias control. Training data D is split into Dtrain (inductive learning) and Dval (inference). [1] A genetic algorithm proposes representation candidates α; [2] a representation evaluator for learning problems scores each candidate's fitness f(α) against an inference specification I, returning an optimized representation α̂.]
Hsu, Guo, Perry, Stilson (GECCO 2002);
Hsu (Information Sciences, 2005)
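A minimal sketch of the fitness evaluation [2] in the wrapper above: a candidate representation α is taken to be a bit mask over attributes, and f(α) is hold-out accuracy on Dval. The nearest-centroid learner and synthetic data are illustrative stand-ins for the actual inducers, and the mask would be evolved by a GA such as the SGA sketch earlier.

import numpy as np

def fitness(alpha, D_train, y_train, D_val, y_val):
    """f(alpha): validation accuracy of a nearest-centroid classifier using
    only the attributes selected by the candidate representation alpha."""
    cols = np.flatnonzero(alpha)
    if cols.size == 0:
        return 0.0
    Xt, Xv = D_train[:, cols], D_val[:, cols]
    classes = np.unique(y_train)
    centroids = np.array([Xt[y_train == c].mean(axis=0) for c in classes])
    d = ((Xv[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    pred = classes[np.argmin(d, axis=1)]
    return float(np.mean(pred == y_val))

# Illustrative data: 36 attributes, 2 classes; alpha selects a subset of attributes.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 36))
y = (X[:, :5].sum(axis=1) > 0).astype(int)        # only the first 5 attributes matter
D_train, y_train, D_val, y_val = X[:200], y[:200], X[200:], y[200:]
alpha = np.zeros(36, dtype=int); alpha[:5] = 1    # candidate representation
print(fitness(alpha, D_train, y_train, D_val, y_val))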
Future Work:
3-D Modeling and Graphical Models
Dynamic Bayes Net for Predicting 3-D Energetics
Continuing Work:
Speeding up Approximate Inference using Edge Deletion - J. Thornton (2005)
Bayesian Network tools in Java (BNJ) v4 - W. Hsu, J. M. Barber, J. Thornton (2005)
References:
Machine Learning, KDD, Monte Carlo Simulation
• Machine Learning, Data Mining, and Knowledge Discovery
– K-State KDD Lab: literature survey and resource catalog (2005)
http://www.kddresearch.org/Resources
– Bayesian Network tools in Java (BNJ): Hsu, Barber, Guo, King, Thornton
http://bnj.sourceforge.net
– KDD Lab Thesis Repository
http://www.kddresearch.org
• Learning in Computational Physics
– Illinois Genetic Algorithms Lab (IlliGAL)
http://www-illigal.ge.uiuc.edu
– K-State Visualization Lab
http://www.kddresearch.org/Groups/Visualization
Acknowledgements
• Kansas State University Lab for Knowledge Discovery in Databases
– Undergraduate programmers: Andrew L. King ([email protected]), Joanne
Stone ([email protected])
– Other grad students: Jeffrey M. Barber ([email protected])
– Graduate research assistants: Jason Li ([email protected]), Waleed
Al-Jandal ([email protected])
• Joint Work with
– KSU Department of Physics: Dr. Talat Rahman, Dr. Abdelkader Kara, Dr.
Ahlam Al-Rawi, Altaf Karim
– KSU Computing and Information Sciences: Dr. Virgil Wallentine, Charlie
Thornton, Arthi Ramakrishnan
• Other Research Affiliates
– University of Toledo: Dr. Jacques Amar
– Russian Academy of Sciences: Dr. Oleg Trushin
– Hong Kong University of Science & Tech: Dr. Haipeng Guo