
Pattern Evaluation and
Process Control
Wei-Min Shen
Information Sciences Institute
University of Southern California
2/5/98
UCLA Data Mining Short Course
Outline
• Intuition of Interestingness
• Principles for Measuring Interestingness
• Existing Measurement Systems
• Minimal Description Length Principle
• Methods for Process Control
Why Is a Pattern “Interesting”?
• I did not know X before
• It contradicts my thinking (surprise)
• It is supported by the majority of the data
• It is an exception to the usual cases
• Occam's Razor: simple is better
• More?
The Types of Classification Rule
• Let h be a hypothesis and e the evidence; then, with respect to any given tuple, we have:
  – Characteristic rule: h → e
  – Discriminate rule: e → h
• e and h can be interpreted as the sets of tuples satisfying e and h, respectively
A Few Definitions
• Given a discriminate rule R: e → h
  – |e| is the cover of the rule
  – |h∩e|/|e| is the confidence, reliability, or certainty factor of the rule
• R is "X% complete" if |h∩e|/|h| = X%
  (e satisfies X% of h)
• R is "Y% discriminate" if |¬h∩e|/|¬h| = (100−Y)%
  (e satisfies (100−Y)% of ¬h)
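A minimal sketch of these definitions in Python (the function name and the set-based representation are mine, not from the slides):

# Measures for a discriminate rule e -> h, where e_set and h_set are the
# sets of tuples satisfying e and h, drawn from the universe all_tuples.
def rule_measures(e_set, h_set, all_tuples):
    not_h = all_tuples - h_set
    cover = len(e_set)                                  # |e|
    confidence = len(h_set & e_set) / len(e_set)        # |h ∩ e| / |e|
    completeness = len(h_set & e_set) / len(h_set)      # |h ∩ e| / |h|, the X%
    # "Y% discriminate" means |¬h ∩ e| / |¬h| = (100 - Y)%
    discrimination = 1.0 - len(not_h & e_set) / len(not_h)
    return cover, confidence, completeness, discrimination

# Toy usage: tuples are ids; e = "has fever", h = "has flu"
all_t = set(range(10))
e, h = {0, 1, 2, 3}, {0, 1, 4, 5, 6, 7}
print(rule_measures(e, h, all_t))   # (4, 0.5, 0.333..., 0.5)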
Principles for Measuring “I”
• 1. I = 0 if h and e are statistically independent
  – e and h have no relation at all
• 2. I increases monotonically with |h∩e| when |h|, |¬h|, and |e| remain the same
  – I relates to reliability
Principles for Measuring “I”
• 3. I decreases monotonically with |h| (or |e|) when |h∩e|, |e| (or |h|), and |¬h| remain the same
  – I relates to completeness
• 4. I increases monotonically with |e| when the reliability |h∩e|/|e|, |h|, and |¬h| remain the same
  – I relates to cover when reliability is the same
Treat Discriminate and
Characteristic Rules Differently
• Principles 1, 2, 3, and 4 apply to both discriminate and characteristic rules
• 5. Treat discriminate and characteristic rules differently:

  Rule   E        H     Discrim   Complete
  A      Fever    Flu   80%       30%
  B      Sneeze   Flu   30%       80%

• As a discriminate rule, I(A) > I(B)
• As a characteristic rule, I(B) > I(A)
Existing Measurement Systems
– RI (Piatetsky-Shapiro 91)
– J (Smyth and Goodman 92)
– CE (Hong and Mao 91)
– IC++ (Kamber and Shinghal 96)
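Of these, RI is the simplest to state. A sketch of Piatetsky-Shapiro's rule-interest function, using the same set-based representation as before:

# Rule-interest function for a rule e -> h over N tuples:
#   RI = |h ∩ e| - |e||h| / N
# RI = 0 under statistical independence (principle 1) and increases with
# |h ∩ e| when |e| and |h| are held fixed (principle 2).
def rule_interest(e_set, h_set, n_tuples):
    return len(e_set & h_set) - len(e_set) * len(h_set) / n_tuples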
IC++ Measurement for
Characteristic Rules
• Given h and e, let rule d: e → h and rule c: h → e
• Nec(d) = P(¬e|h) / P(¬e|¬h)
• Suf(d) = P(e|h) / P(e|¬h)
• for h → e: C++ = if 0 ≤ Nec(d) < 1 then (1 − Nec(d))·P(h), else 0
• for h → ¬e: C+− = if 0 ≤ Suf(d) < 1 then (1 − Suf(d))·P(h), else 0
• for ¬h → e: C−+ = if 1 < Nec(d) < ∞ then (1 − 1/Nec(d))·P(¬h), else 0
• for ¬h → ¬e: C−− = if 1 < Suf(d) < ∞ then (1 − 1/Suf(d))·P(¬h), else 0
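A sketch of these formulas in Python, estimating the probabilities from a 2×2 contingency table of counts (the estimation step is an assumption; the slide defines the measures directly on probabilities):

# Counts: n_he = |h∩e|, n_hne = |h∩¬e|, n_nhe = |¬h∩e|, n_nhne = |¬h∩¬e|.
# Assumes all four marginal counts are nonzero.
def ic_measures(n_he, n_hne, n_nhe, n_nhne):
    total = n_he + n_hne + n_nhe + n_nhne
    p_h = (n_he + n_hne) / total
    nec = (n_hne / (n_he + n_hne)) / (n_nhne / (n_nhe + n_nhne))  # P(¬e|h)/P(¬e|¬h)
    suf = (n_he / (n_he + n_hne)) / (n_nhe / (n_nhe + n_nhne))    # P(e|h)/P(e|¬h)
    c_pp = (1 - nec) * p_h if nec < 1 else 0.0            # for h -> e
    c_pm = (1 - suf) * p_h if suf < 1 else 0.0            # for h -> ¬e
    c_mp = (1 - 1 / nec) * (1 - p_h) if nec > 1 else 0.0  # for ¬h -> e
    c_mm = (1 - 1 / suf) * (1 - p_h) if suf > 1 else 0.0  # for ¬h -> ¬e
    return c_pp, c_pm, c_mp, c_mm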
Minimal Description Length
Principle
• The goodness of a theory or hypothesis (H) relative to a set of data (D) is measured as:
  – the sum of
    • the length of H
    • the length of the explanation of D using H
  – assuming both use the optimal coding scheme
The Derivation of MDL
• Based on probability theory, the best hypothesis H with respect to D is:
  – the max of P(H)·P(D|H)
  – or the max of log P(H) + log P(D|H)
  – or the min of −log P(H) − log P(D|H)
• Since the optimal encoding of a set is related to the probabilities of its elements, we have MDL:
  – the min of |coding1(H)| + |coding2(D|H)|
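A toy numeric sketch of the two-part score (the probabilities below are invented to mirror the illustration on the next slide):

from math import log2

# Two-part MDL score in bits: the length of H plus the length of D
# explained using H. Under optimal (Shannon) coding, length(x) = -log2 P(x).
def mdl_score(p_h, p_d_given_h):
    return -log2(p_h) - log2(p_d_given_h)

# A more complex theory can fit the data better yet lose on total length:
one_line  = mdl_score(p_h=2**-10, p_d_given_h=2**-285)   # 295 bits
two_lines = mdl_score(p_h=2**-40, p_d_given_h=2**-259)   # 299 bits
print(one_line < two_lines)   # True: MDL prefers the simpler theory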
An Illustration of MDL
[Figure: the same set of points fit by a one-line theory and a two-line theory]
One-line theory: explanation length = 294.9
Two-line theory: explanation length = 298.7
Fit Points with Lines
– Theory = lines (#, angle, length, center)
– Explanation: for each point,
  • the line it belongs to
  • the position on the line
  • the distance to the line
– Notice that the current coding is (x, y)
– It would be different if we chose the coding (r, θ)
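A rough sketch of scoring such a theory; the per-line bit cost and the Gaussian residual code are illustrative assumptions, not the slides' exact coding scheme, and the position on the line is omitted for brevity:

import numpy as np

LINE_PARAM_BITS = 64   # assumed cost of describing one line (angle, length, center)

def description_length(points, lines, sigma=1.0):
    """points: (n, 2) array; lines: list of (anchor, unit_direction) pairs."""
    bits = LINE_PARAM_BITS * len(lines)          # length of the theory
    for p in points:
        # perpendicular distance from p to each line; explain p by the nearest
        r = min(abs((p - a)[0] * d[1] - (p - a)[1] * d[0]) for a, d in lines)
        bits += np.log2(len(lines))              # which line p belongs to
        # residual coded under a Gaussian noise model: -log2 of its density
        bits += 0.5 * np.log2(2 * np.pi * sigma**2) + r**2 / (2 * sigma**2 * np.log(2))
    return bits

Comparing description_length(points, one_line) with description_length(points, two_lines) reproduces the style of comparison on the previous slide.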
Process Control
• The goal: to predict the future from the past
• The given: the past data sequence
• The methods:
  – Adaptive control theory
  – Chaos theory
  – State machines
Chaos Theory
• The data sequence may appear chaotic
• The underlying model may be very simple
• Extremely sensitive to initial conditions
• Difficult to make long-term predictions
• Short-term prediction is possible
An Example Chaotic Sequence
[Figure: a chaotic-looking sequence s(k) between 0.0 and 1.0, plotted against time step k = 0..100]
The simple logistic map model:
s_{k+1} = a·s_k(1 − s_k), where a = 4
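A few lines suffice to generate the sequence and exhibit the sensitivity to initial conditions:

# The logistic map s_{k+1} = a * s_k * (1 - s_k) with a = 4.
def logistic(s0, a=4.0, steps=100):
    seq = [s0]
    for _ in range(steps):
        seq.append(a * seq[-1] * (1.0 - seq[-1]))
    return seq

sa = logistic(0.300000)
sb = logistic(0.300001)        # perturb the initial condition by 1e-6
print(abs(sa[50] - sb[50]))    # typically of order 1 after ~50 steps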
Steps of Using Chaos Theory
• Reconstruction of the state space:
  – x_k = [x_k, x_{k−τ}, …, x_{k−(m−1)τ}]^T
  – where τ is a time delay and m is the embedding dimension
• Takens' theorem: one can always find an embedding dimension m ≥ 2[d]+1, where [d] is the integer part of the attractor's dimension, that preserves the invariant measures
• Central task: choose m and τ
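A sketch of the reconstruction step (the function is mine; choosing m and τ, e.g. via mutual-information and false-nearest-neighbor heuristics, is the hard part the slide refers to):

import numpy as np

# Delay-coordinate embedding: x_k = [x_k, x_{k-tau}, ..., x_{k-(m-1)tau}]^T.
def delay_embed(series, m, tau):
    x = np.asarray(series)
    n = len(x) - (m - 1) * tau         # number of complete embedding vectors
    cols = [x[(m - 1 - i) * tau : (m - 1 - i) * tau + n] for i in range(m)]
    return np.stack(cols, axis=1)      # row k is the state vector at time k+(m-1)*tau

states = delay_embed(np.sin(np.linspace(0, 20, 200)), m=3, tau=4)
print(states.shape)                    # (192, 3)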
State Machine Approach
• Identify the number of states by clustering
all points in the sequence
• Construct a transition function by learning
from the sequence
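A minimal sketch of this approach, with k-means as an (assumed) clustering method and transition counting as the learning step:

import numpy as np
from sklearn.cluster import KMeans

# Cluster the points into discrete states, then estimate the transition
# function by counting consecutive state pairs along the sequence.
def learn_state_machine(points, n_states):
    labels = KMeans(n_clusters=n_states, n_init=10).fit_predict(points)
    counts = np.zeros((n_states, n_states))
    for a, b in zip(labels[:-1], labels[1:]):
        counts[a, b] += 1
    # row-normalize into transition probabilities (guarding empty rows)
    probs = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    return labels, probs

The points can be the reconstructed state vectors from the previous sketch; prediction then follows the most probable transition out of the current state.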
Construction & Synchronization
• Environment = (A, P, Q, r) where |P|<|Q|
• Model = (A, P, S, t)
– Visibly equivalent
– Perfect
– Synchronized
• The Construction problem
– when and how to construct new model states
• The Synchronization problem
– how to determine which model state is current
Learning with a Reset Button
• Two environmental states p and q (which may appear the same to the learner) are different if and only if there exists a sequence e of actions that leads from p and from q to states that are visibly different
• The interaction with the environment:
  – Membership query
  – Equivalence query: "yes" or a counterexample
Observation Table
• Model states: {row(s) : s ∈ S}
• Initial state: row(λ)
• Final states: {row(s) : s ∈ S and T(s) = 1}
• Transitions: δ(row(s), a) = row(s·a)
• Closed table: ∀s, a ∃s′ such that row(s·a) = row(s′)
• Consistent table: row(s) = row(s′) ⟹ row(s·a) = row(s′·a)
[Diagram: the observation table — rows indexed by S (actions from the initial state) and S×A (transitions), columns indexed by E (experiments), entries given by the observations T]
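A minimal sketch of the table and its two predicates (the string-based representation is an assumption; Angluin's formulation is the canonical one):

from itertools import product

# Observation table: rows indexed by S and S·A, columns by E,
# entries T(s·e) obtained from membership queries.
class ObservationTable:
    def __init__(self, actions, membership):
        self.A = list(actions)
        self.T = membership          # membership query: string -> 0 or 1
        self.S = {""}                # access strings; "" plays the role of λ
        self.E = {""}                # experiments

    def row(self, s):
        return tuple(self.T(s + e) for e in sorted(self.E))

    def is_closed(self):             # ∀s,a ∃s': row(s·a) = row(s')
        rows_S = {self.row(s) for s in self.S}
        return all(self.row(s + a) in rows_S for s in self.S for a in self.A)

    def is_consistent(self):         # row(s) = row(s') ⟹ row(s·a) = row(s'·a)
        return all(self.row(s1 + a) == self.row(s2 + a)
                   for s1, s2 in product(self.S, self.S)
                   if self.row(s1) == self.row(s2)
                   for a in self.A)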
L* Algorithm
• Initialize T for λ and each action in A
• Loop:
    use membership queries to make T complete, closed, and consistent;
    if EQ(T) = w /* a counterexample */
    then add w and all its prefixes into S
  Until EQ(T) = yes
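A compact, self-contained sketch of this loop; the equivalence query is approximated here by random testing, which is an assumption (Angluin's algorithm posits an exact oracle):

import random
from itertools import product

def lstar(actions, membership, trials=2000, max_len=10, seed=0):
    rng = random.Random(seed)
    S, E = {""}, {""}                       # "" plays the role of λ

    def row(s):
        return tuple(membership(s + e) for e in sorted(E))

    def fix_table():                        # make the table closed and consistent
        while True:
            rows_S = {row(s) for s in S}
            unclosed = next((s + a for s in S for a in actions
                             if row(s + a) not in rows_S), None)
            if unclosed is not None:
                S.add(unclosed)
                continue
            incons = next(((s1, s2, a) for s1, s2, a in product(S, S, actions)
                           if row(s1) == row(s2) and row(s1 + a) != row(s2 + a)),
                          None)
            if incons is None:
                return
            s1, s2, a = incons              # add a distinguishing experiment a·e
            e = next(e for e in sorted(E)
                     if membership(s1 + a + e) != membership(s2 + a + e))
            E.add(a + e)

    def accepts(w):                         # run the hypothesis machine on w
        reps = {row(s): s for s in S}
        state = ""
        for ch in w:
            state = reps[row(state + ch)]   # transition (row(s), a) -> row(s·a)
        return membership(state)            # accepting iff T(s) = 1

    while True:
        fix_table()
        cex = None                          # approximate EQ(T) by sampling
        for _ in range(trials):
            w = "".join(rng.choice(actions) for _ in range(rng.randint(0, max_len)))
            if accepts(w) != membership(w):
                cex = w
                break
        if cex is None:
            return S, E                     # no counterexample found: done
        for i in range(len(cex) + 1):       # add w and all its prefixes into S
            S.add(cex[:i])

# Example: learn "even number of a's" over the alphabet {a, b}
S, E = lstar(["a", "b"], lambda w: int(w.count("a") % 2 == 0))
print(sorted(S), sorted(E))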
The Little Prince Example
• A counterexample ftf for M3 (Fig 5.3): the model ends at rose, but the real observation is volcano
• An inconsistency in T4 (Tab 5.5), where row(f) = row(ft) but row(ff) ≠ row(ftf)
Homing Sequence
• L* is limited by its reliance on a reset button
• Homing sequence h: if two observation sequences from executing h are the same, then the two executions end in the same state
• Let q⟨h⟩ be the observation sequence and q·h the ending state; then h is defined by:
  for all p, q: [p⟨h⟩ = q⟨h⟩] ⟹ [p·h = q·h]
• e.g., {fwd} is a homing sequence for the Little Prince
Properties of Homing Seq
• Every FDA has a homing sequence
• One can be constructed from an FDA by appending action sequences (of length < n) that distinguish a pair of states
• The length of this construction is at most n²
• There are FDAs whose shortest h is n² long
• h can be used as a reset
• but h cannot guarantee arrival at one fixed state
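A sketch of that construction for a small FDA given as a transition table delta[state][action] and an observation obs[state] per state (the representation and search bound are assumptions; it presumes a minimized machine, in which distinct states are distinguishable, and single-character action names so h can be a plain string):

from itertools import product

def distinguishing_seq(delta, obs, p, q, actions, max_len):
    # BFS for an action sequence whose observation traces from p and q differ
    frontier = [("", p, q)]
    for _ in range(max_len):
        nxt = []
        for x, s, t in frontier:
            for a in actions:
                s2, t2 = delta[s][a], delta[t][a]
                if obs[s2] != obs[t2]:
                    return x + a
                nxt.append((x + a, s2, t2))
        frontier = nxt
    return None

def homing_sequence(states, actions, delta, obs):
    h = ""
    def run(s):                        # observations along h, and the ending state
        trace = []
        for a in h:
            s = delta[s][a]
            trace.append(obs[s])
        return tuple(trace), s
    while True:
        # find two states h still confuses: same trace, different ending states
        pair = next(((p, q) for p, q in product(states, states)
                     if run(p)[0] == run(q)[0] and run(p)[1] != run(q)[1]), None)
        if pair is None:
            return h                   # h is now a homing sequence
        p_end, q_end = run(pair[0])[1], run(pair[1])[1]
        x = distinguishing_seq(delta, obs, p_end, q_end, actions, len(states))
        assert x is not None, "distinct states must be distinguishable"
        h += x                         # append the actions that distinguish them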
L* with a Homing Sequence h
• Every time a reset is needed, repeat h until
you see the desired observation sequence
• Or for each possible observation sequence
of h, make a copy of L* (see Fig 5.6)
Learning the Homing Sequence
• If h is not a homing sequence, then we may discover that the same observation sequence σ produced by executing h leads us to two different states, p and q, for which there is a sequence of actions x such that p⟨x⟩ ≠ q⟨x⟩
• A better approximation of the homing sequence is then h·x
L* + Learning h
• Assume a homing sequence h; initially h = λ
• When h is shown to be incorrect, extend h, discard all copies of L*, and start again
• When h is incorrect, there exists x such that q·h⟨x⟩ ≠ p·h⟨x⟩ even though q⟨h⟩ = p⟨h⟩
Learning h and the Model
• Rivest and Schapire's algorithm (Fig 5.7)
• Little Prince example (notice the inconsistency produced by ff in Fig 5.10)