Information Mining with Relational and Possibilistic Graphical Models

Prof. Dr. Rudolf Kruse
University of Magdeburg, Faculty of Computer Science, Magdeburg, Germany
[email protected]

Example: Continuously Adapting Gear Shift Schedule in the VW New Beetle

Classification of the driver / driving situation by fuzzy logic.

[Block diagram: the inputs accelerator pedal position, filtered speed of the
accelerator pedal, and number of changes in pedal direction are fed through
fuzzification, an inference machine with a rule base, and defuzzification to
obtain the sport factor [t] (the previous sport factor [t-1] is fed back).
The gear shift computation then determines by interpolation the speed limits
for shifting into a higher or lower gear depending on the sport factor and
performs the gear selection.]

Continuously Adapting Gear Shift Schedule: Technical Details

• Mamdani controller with 7 rules
• Optimized program (AG4 transmission on the Digimat): 24 byte RAM, 702 byte ROM
• Runtime 80 ms, i.e. a new sport factor is assigned 12 times per second
• How to find suitable rules?

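Since the slides only name the controller type, the following is a minimal
Mamdani-style sketch in Python with hypothetical rules and membership
functions (not the actual VW rule base): fuzzification of two of the inputs
mentioned above, a small rule base, and centroid defuzzification of the
sport factor.

    # Minimal Mamdani-style fuzzy inference sketch; the rules and membership
    # functions are hypothetical, not the actual VW controller.

    def tri(x, a, b, c):
        """Triangular membership function with support [a, c] and peak at b."""
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    def sport_factor(pedal_speed, direction_changes):
        # Fuzzification of the two inputs.
        speed_low    = tri(pedal_speed, -0.5, 0.0, 0.5)
        speed_high   = tri(pedal_speed,  0.3, 1.0, 1.7)
        changes_few  = tri(direction_changes, -3, 0, 3)
        changes_many = tri(direction_changes,  2, 6, 10)

        # Toy rule base: (activation, output fuzzy set); min acts as AND.
        rules = [
            (min(speed_low,  changes_few),  lambda y: tri(y, -0.4, 0.0, 0.4)),  # calm
            (min(speed_high, changes_few),  lambda y: tri(y,  0.2, 0.5, 0.8)),  # normal
            (min(speed_high, changes_many), lambda y: tri(y,  0.6, 1.0, 1.4)),  # sporty
        ]

        # Mamdani inference: clip each output set, aggregate by max,
        # then defuzzify by the centroid over a discretized output range.
        ys  = [i / 100.0 for i in range(101)]
        agg = [max(min(act, out(y)) for act, out in rules) for y in ys]
        total = sum(agg)
        return sum(y * m for y, m in zip(ys, agg)) / total if total > 0 else 0.0

    print(round(sport_factor(pedal_speed=0.8, direction_changes=5), 3))
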
Information Mining

• Information mining is the non-trivial process of identifying valid, novel,
  potentially useful, and understandable information and patterns in
  heterogeneous information sources.
• Information sources are
  - databases,
  - expert background knowledge,
  - textual descriptions,
  - images,
  - sounds, ...

Information Mining
[Process model: phases and their generic tasks]

Problem Understanding:      Determine Problem Objectives, Assess Situation,
                            Determine Information Mining Goals,
                            Produce Project Plan
Information Understanding:  Collect Initial Information, Describe Information,
                            Explore Information, Verify Information Quality
Information Preparation:    Select Information, Clean Information,
                            Construct Information, Integrate Information,
                            Format Information
Modeling:                   Select Modeling Technique, Generate Test Design,
                            Build Model, Assess Model
Evaluation:                 Evaluate Results, Review Process,
                            Determine Next Steps
Deployment:                 Plan Deployment, Plan Monitoring and Maintenance,
                            Produce Final Results, Review Project

Example: Line Filtering

• Extraction of edge segments (Burns' operator)
• Production net:
  edges → lines → long lines → parallel lines → runways

SOMAccess V1.0
Available on CD-ROM: G. Hartmann, A. Nölle, M. Richards, and R. Leitinger
(eds.), Data Utilization Software Tools 2 (DUST-2 CD-ROM), Copernicus
Gesellschaft e.V., Katlenburg-Lindau, 2000 (ISBN 3-9804862-3-0)
Current Research Topics

• Multi-dimensional data analysis: data warehouses and OLAP
  (on-line analytical processing)
• Association, correlation, and causality analysis
• Classification: scalability and new approaches
• Clustering and outlier analysis
• Sequential patterns and time-series analysis
• Similarity analysis: curves, trends, images, texts, etc.
• Text mining, web mining, and weblog analysis
• Spatial, multimedia, and scientific data analysis
• Data preprocessing and database completion
• Data visualization and visual data mining
• Many others, e.g. collaborative filtering

Fuzzy Methods in Information Mining
Here: exploiting quantitative and qualitative information

• Fuzzy Data Analysis (projects with Siemens)
• Dependency Analysis (project with Daimler)

Analysis of Imprecise Data
Fuzzy database (example):

  no.   A        B            C
  1     large    very large   medium
  2     2.5      medium       about 7
  3     [3, 4]   small        [7, 8]

[Diagram: linguistic modeling turns the fuzzy database into fuzzy sets over
the attributes; statistics with fuzzy sets / computing with words yields,
for example, the mean of attribute A; linguistic approximation expresses the
result as "the mean w.r.t. A is approximately 5".]

Fuzzy Data Analysis
Strong law of large numbers (Ralescu, Klement, Kruse, Miyakoshi, ...):

Let {x_k | k ≥ 1} be independent and identically distributed fuzzy random
variables such that E ||supp x_1|| < ∞. Then

  d( (x_1 ⊕ x_2 ⊕ ... ⊕ x_n) / n , E(co(x_1)) ) → 0  almost surely as n → ∞.

Books: Kruse, Meyer: Statistics with Vague Data, Reidel, 1987;
Bandemer, Näther: Fuzzy Data Analysis, Kluwer, 1992;
Seising, Tanaka and Guo, Wolkenhauer, Viertl, ...

Analysis of Daimler/Chrysler Database

• Database: ~18,500 passenger cars, more than 100 attributes per car
• Analysis of dependencies between special equipment and faults
• Results used as a starting point for technical experts looking for causes

Bayesian Networks
Qualitative part: knowledge about (conditional) independence, causality, ...
  → a directed acyclic graph over the attributes (here A, B, C)

+ Quantitative part: local models on low-dimensional spaces
  (e.g. a small table of probabilities P(A, B, C) with entries such as
   0.8, 0.1, 0.1, 0.0 for the value combinations of A, B, C)

= Model: a unique joint model on the (high-dimensional) space

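As a minimal sketch of how the two parts fit together, the Python fragment
below uses a hypothetical chain A → B → C with made-up local tables for
binary attributes; the joint distribution is obtained as the product of the
local models.

    # Sketch: local (conditional) models over a hypothetical chain A -> B -> C
    # determine a unique joint distribution P(A,B,C) = P(A) * P(B|A) * P(C|B).
    from itertools import product

    P_A  = {0: 0.6, 1: 0.4}                                # P(A)
    P_BA = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}      # P(B | A)
    P_CB = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.4, 1: 0.6}}      # P(C | B)

    def joint(a, b, c):
        """P(A=a, B=b, C=c) from the decomposition of the graphical model."""
        return P_A[a] * P_BA[a][b] * P_CB[b][c]

    # Only 1 + 2 + 2 = 5 local parameters were specified, yet they define all
    # 2*2*2 = 8 joint probabilities, which sum to 1.
    print(sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3)))
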
Example: Genotype Determination of Jersey Cattle
variables: 22, state space: 6 · 10^13, parameters: 324

Graphical Model
• node  ↔ random variable
  (e.g. Phenogroup 1 (3 values), Phenogroup 2 (3 values), Genotype (6 values))
• edges ↔ conditional dependencies
• decomposition:
  P(X_1, ..., X_22) = ∏_{i=1}^{22} P(X_i | parents(X_i))
• diagnosis:
  P(· | knowledge)

Learning Graphical Models
[Diagram: data plus prior information are fed into an inducer, which outputs
a graphical model: a graph over the attributes (here A, B, C) together with
local models.]

The Learning Problem
• Known structure, complete data
  (e.g. tuples <a4, b3, c1>, <a3, b2, c4>):
  statistical parametric estimation (closed-form equations):
  statistical parameter fitting, ML estimation, Bayesian inference, ...
  (see the sketch after this list)
• Known structure, incomplete data
  (missing values, hidden variables, ...; e.g. tuples <a4, ?, c1>, <a3, b2, ?>):
  parametric optimization: EM, gradient descent, ...
• Unknown structure, complete data:
  discrete optimization over structures (discrete search):
  likelihood scores, MDL
  Problem: search complexity → heuristics
• Unknown structure, incomplete data:
  combined methods: structured EM; only few approaches
  Problems: criterion for fit? new variables? local maxima? fuzzy values?

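For the first cell (known structure, complete data) the closed-form ML
estimate reduces to counting. A minimal Python sketch, with a hypothetical
chain A → B → C and toy tuples:

    # Sketch: ML parameter fitting for a fixed structure A -> B -> C from
    # complete data; conditional probabilities become relative frequencies.
    from collections import Counter

    data = [("a1", "b1", "c1"), ("a1", "b1", "c2"),
            ("a1", "b2", "c2"), ("a2", "b2", "c2")]      # toy complete tuples

    n      = len(data)
    cnt_a  = Counter(a for a, _, _ in data)
    cnt_ab = Counter((a, b) for a, b, _ in data)
    cnt_b  = Counter(b for _, b, _ in data)
    cnt_bc = Counter((b, c) for _, b, c in data)

    P_A  = {a: cnt_a[a] / n for a in cnt_a}                   # P(A)
    P_BA = {ab: cnt_ab[ab] / cnt_a[ab[0]] for ab in cnt_ab}   # P(B | A)
    P_CB = {bc: cnt_bc[bc] / cnt_b[bc[0]] for bc in cnt_bc}   # P(C | B)

    print(P_A)    # {'a1': 0.75, 'a2': 0.25}
    print(P_BA)   # e.g. P(B=b1 | A=a1) = 2/3
    print(P_CB)   # e.g. P(C=c2 | B=b2) = 1.0
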
Information Mining
[Diagram: ~18,500 passenger cars with 130 attributes per car form an
imprecise database; linguistic modeling yields a fuzzy database; learning
graphical models produces a relational/possibilistic graphical model; rule
generation / computing with words then yields rules such as
"IF air conditioning AND electric roof top THEN more battery faults".]

A Simple Example
Example world:
• 10 simple geometric objects, 3 attributes (color, shape, size)
• one object is chosen at random and examined
• inferences are drawn about the unobserved attributes

[Relation: the ten tuples over color, shape, and size; the color and shape
values are rendered as symbols on the slide, the size values are small,
medium, small, medium, medium, large, medium, medium, medium, large.]

The Reasoning Space
Geometric interpretation: the relation is displayed in the three-dimensional
space spanned by the attributes color, shape, and size (with the size values
small, medium, large on one axis); each cube represents one tuple.

Prior Knowledge and Its Projections

[Figure: the three-dimensional relation (the prior knowledge) together with
its two-dimensional projections to subspaces of the attributes color, shape,
and size.]

Cylindrical Extensions and Their Intersection

Intersecting the cylindrical extensions of the projection to the subspace
formed by color and shape and of the projection to the subspace formed by
shape and size yields the original three-dimensional relation.

[Figure: the two cylindrical extensions and their intersection in the
color × shape × size space.]

Reasoning

• Let it be known (e.g. from an observation) that the given object is green.
  This information considerably reduces the space of possible value
  combinations.
• From the prior knowledge it follows that the given object must be
  - either a triangle or a square, and
  - either medium or large.

[Figure: the correspondingly restricted relation in the
color × shape × size space.]

Reasoning with Projections
The same result can be obtained using only the projections to the subspaces,
without reconstructing the original three-dimensional space: the observation
on color is extended to the subspace color × shape, projected to shape,
extended to the subspace shape × size, and projected to size (a sketch of
this propagation follows below).

This justifies a network representation:  color - shape - size

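The following minimal Python sketch mimics this propagation with relations
as sets of tuples; the two local relations are hypothetical but chosen so
that the observation "green" again leads to "triangle or square" and
"medium or large".

    # Sketch: relational reasoning with projections in the network
    # color - shape - size (hypothetical local relations, set-of-tuples view).

    rel_color_shape = {("green", "triangle"), ("green", "square"),
                       ("red", "circle"), ("blue", "circle")}
    rel_shape_size  = {("triangle", "medium"), ("square", "large"),
                       ("circle", "small"), ("circle", "medium")}

    def propagate(possible_colors):
        """Extend the color evidence to each subspace and project it onward."""
        shapes = {s for c, s in rel_color_shape if c in possible_colors}
        sizes  = {z for s, z in rel_shape_size if s in shapes}
        return shapes, sizes

    # Observation: the object is green.
    shapes, sizes = propagate({"green"})
    print(shapes)   # {'triangle', 'square'}
    print(sizes)    # {'medium', 'large'}
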
Interpretation of Graphical Models

• Relational graphical model
  ↔ decomposition + local models
• Example: the attributes colour, shape, size as a graph
  (chain colour - shape - size) or as a hypergraph
  (hyperedges {colour, shape} and {shape, size})
• Learning a relational graphical model
  ↔ searching for a suitable decomposition + local relations

Genotype Determination of Danish Jersey Cattle
[Diagram: assumptions about the parents (risk of misstatement) and the
reliability of the databases affect the genotypes of mother and father;
inheritance rules determine the genotype of the child (6 possible values);
blood group determination links the genotype to 4 lysis values measured by
a photometer.]

Qualitative Knowledge
[Graph over the 22 variables: parental error; dam correct; sire correct;
phenogroup 1 and phenogroup 2 of the stated dam, the stated sire, the true
dam, and the true sire; phenogroup 1 and phenogroup 2 of the offspring;
genotype of the offspring; factor 40 (F1), factor 41 (F2), factor 42 (V1),
factor 43 (V2); lysis 40, lysis 41, lysis 42, lysis 43.]

Learning Graphical Models from Data
• Test whether a distribution is decomposable w.r.t. a given graph.
  This is the most direct approach. It is not bound to a graphical
  representation, but can also be carried out w.r.t. other representations
  of the set of subspaces to be used to compute the (candidate)
  decomposition of the given distribution.
• Find an independence map by conditional independence tests.
  This approach exploits the theorems that connect conditional independence
  graphs and graphs that represent decompositions. It has the advantage that
  a single conditional independence test, if it fails, can exclude several
  candidate graphs.
• Find a suitable graph by measuring the strength of dependences.
  This is a heuristic, but often highly successful approach, which is based
  on the frequently valid assumption that in a distribution that is
  decomposable w.r.t. a graph an attribute is more strongly dependent on
  adjacent attributes than on attributes that are not directly connected
  to it.

Is Decomposition Always Possible?

[Figure: a second example relation (two positions marked 1 and 2) together
with its two-dimensional projections, illustrating that not every relation
can be reconstructed from its projections.]

Direct Test for Decomposability

[Figure: the eight candidate graphs over color, shape, and size (numbered
1 to 8, covering all subsets of the three possible edges), each shown
together with the intersection of the cylindrical extensions of the
corresponding projections; comparing these intersections with the original
relation shows which candidate decompositions are exact.]

Evaluation Measures and Search Methods

• An exhaustive search over all graphs is too expensive:
  - 2^(n(n-1)/2) possible undirected graphs for n attributes,
  - f(n) = Σ_{i=1}^{n} (-1)^(i+1) · C(n, i) · 2^(i(n-i)) · f(n-i)
    possible directed acyclic graphs (with f(0) = 1)
  (both counts are evaluated in the sketch below).
• Therefore all learning algorithms consist of an evaluation measure
  (scoring function), e.g.
  - Hartley information gain,
  - relative number of occurring value combinations,
  and a (heuristic) search method, e.g.
  - guided random search,
  - greedy search (K2 algorithm),
  - conditional independence search.

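The combinatorial explosion can be checked with a few lines of Python; the
sketch below evaluates the undirected-graph count and the recursion for
directed acyclic graphs stated above.

    # Sketch: number of candidate graph structures over n attributes.
    from functools import lru_cache
    from math import comb

    def undirected(n):
        """2^(n(n-1)/2) undirected graphs on n labelled nodes."""
        return 2 ** (n * (n - 1) // 2)

    @lru_cache(maxsize=None)
    def dags(n):
        """f(n) = sum_{i=1..n} (-1)^(i+1) C(n,i) 2^(i(n-i)) f(n-i), f(0) = 1."""
        if n == 0:
            return 1
        return sum((-1) ** (i + 1) * comb(n, i) * 2 ** (i * (n - i)) * dags(n - i)
                   for i in range(1, n + 1))

    for n in (3, 5, 10):
        print(n, undirected(n), dags(n))
    # n = 3:  8 undirected graphs and 25 DAGs;
    # n = 10: already about 3.5e13 undirected graphs and 4.2e18 DAGs.
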
Measuring the Strengths of Marginal Dependences

• Relational networks: find a set of subspaces for which the intersection of
  the cylindrical extensions of the projections to these subspaces contains
  as few additional states as possible.
• The size of this intersection depends on the sizes of the cylindrical
  extensions, which in turn depend on the sizes of the projections.
• Therefore it is plausible to use the relative number of occurring value
  combinations to assess the quality of a subspace:

  subspace                 color × shape   shape × size   size × color
  possible combinations          12              9             12
  occurring combinations          6              5              8
  relative number                50%            56%            67%

• The relational network can be obtained by interpreting the relative
  numbers as edge weights and constructing the minimal weight spanning tree
  (a sketch of this construction follows below).

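A minimal Python sketch of this construction step: rate each two-dimensional
subspace of a toy relation (hypothetical tuples, not the slide data) by its
relative number of occurring value combinations and build the minimum weight
spanning tree with Kruskal's algorithm.

    # Sketch: rate candidate subspaces by the relative number of occurring
    # value combinations and build a minimum weight spanning tree (Kruskal).
    from itertools import combinations

    attrs  = ("color", "shape", "size")
    tuples = [("red", "circle", "small"), ("red", "circle", "medium"),
              ("green", "triangle", "medium"), ("green", "square", "large"),
              ("blue", "circle", "small"), ("blue", "square", "medium")]

    def relative_number(i, j):
        occurring = {(t[i], t[j]) for t in tuples}
        possible  = len({t[i] for t in tuples}) * len({t[j] for t in tuples})
        return len(occurring) / possible

    edges = sorted((relative_number(i, j), attrs[i], attrs[j])
                   for i, j in combinations(range(len(attrs)), 2))

    component = {a: a for a in attrs}        # union-find for Kruskal
    def find(a):
        while component[a] != a:
            a = component[a]
        return a

    tree = []
    for weight, a, b in edges:               # cheapest edges first, no cycles
        if find(a) != find(b):
            component[find(a)] = find(b)
            tree.append((a, b, round(weight, 2)))
    print(tree)   # the subspaces selected for the relational network
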
Conditional Independence Tests

[Example: one attribute with 4 possible values, one with 3, and 6 occurring
value combinations.]
Hartley information needed to determine
  the coordinates:      log2 4 + log2 3 = log2 12 ≈ 3.58
  the coordinate pair:  log2 6 ≈ 2.58
  gain:                 log2 12 - log2 6 = log2 2 = 1

Definition: Let A and B be two attributes and R a discrete possibility
measure with ∃a ∈ dom(A): ∃b ∈ dom(B): R(A = a, B = b) = 1. Then

  I_gain^(Hartley)(A, B)
    = log2( Σ_{a ∈ dom(A)} R(A = a) ) + log2( Σ_{b ∈ dom(B)} R(B = b) )
      - log2( Σ_{a ∈ dom(A)} Σ_{b ∈ dom(B)} R(A = a, B = b) )
    = log2 [ ( Σ_{a ∈ dom(A)} R(A = a) ) · ( Σ_{b ∈ dom(B)} R(B = b) )
             / ( Σ_{a ∈ dom(A)} Σ_{b ∈ dom(B)} R(A = a, B = b) ) ]

is called the Hartley information gain of A and B w.r.t. R.

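A minimal Python sketch of this definition for a relation given as the set
of its possible value combinations (so R is 1 on these pairs and 0
elsewhere); the toy relation below reproduces the numbers of the small
example above.

    # Sketch: Hartley information gain of two attributes w.r.t. a relation R,
    # represented as the set of possible value combinations.
    from math import log2

    R = {(1, "x"), (1, "y"), (2, "x"), (3, "y"), (3, "z"), (4, "z")}  # toy relation

    def hartley_gain(pairs):
        n_a  = len({a for a, _ in pairs})     # sum_a R(A=a): possible A values
        n_b  = len({b for _, b in pairs})     # sum_b R(B=b): possible B values
        n_ab = len(pairs)                     # sum_{a,b} R(A=a, B=b)
        return log2(n_a) + log2(n_b) - log2(n_ab)

    print(hartley_gain(R))   # log2(4) + log2(3) - log2(6) = 1.0
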
Conditional Independence Tests (continued)

• The Hartley information gain can be used directly to test for
  (approximate) marginal independence.

  attributes     relative number of occurring     Hartley information gain
                 value combinations
  color, shape   6/(3·4) = 1/2 = 50%              log2 3 + log2 4 - log2 6 = 1
  color, size    8/(3·4) = 2/3 ≈ 67%              log2 3 + log2 4 - log2 8 ≈ 0.58
  shape, size    5/(3·3) = 5/9 ≈ 56%              log2 3 + log2 3 - log2 5 ≈ 0.85

• In order to test for (approximate) conditional independence:
  - compute the Hartley information gain for each possible instantiation of
    the conditioning attributes,
  - aggregate the result over all possible instantiations, for instance by
    simply averaging them.

Direct Test for Decomposability
Definition: Let p1 and p2 be two strictly positive probability distributions
on the same set ℰ of events. Then

  I_KLdiv(p1, p2) = Σ_{E ∈ ℰ} p1(E) · log2( p1(E) / p2(E) )

is called the Kullback-Leibler information divergence of p1 and p2.

• The Kullback-Leibler information divergence is non-negative.
• It is zero if and only if p1 ≡ p2.
• Therefore it is plausible that this measure can be used to assess the
  quality of the approximation of a given multi-dimensional distribution p1
  by the distribution p2 that is represented by a given graph:
  the smaller the value of this measure, the better the approximation
  (see the sketch below).

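A minimal Python sketch of such a test: a toy, strictly positive joint
distribution p1 over three binary attributes is compared with the
approximation p2 prescribed by a candidate chain A - B - C, i.e.
p2(a,b,c) = p1(a,b) · p1(c | b); all numbers are hypothetical.

    # Sketch: Kullback-Leibler divergence between a joint distribution and
    # its approximation from the projections of a candidate graph A - B - C.
    from itertools import product
    from math import log2

    vals = (0, 1)
    # Toy strictly positive joint distribution p1(a, b, c).
    p1 = {(a, b, c): p for (a, b, c), p in zip(
            product(vals, repeat=3),
            (0.20, 0.10, 0.05, 0.05, 0.05, 0.10, 0.15, 0.30))}

    def marg(p, keep):
        """Marginal of p on the index positions in `keep`."""
        out = {}
        for event, prob in p.items():
            key = tuple(event[i] for i in keep)
            out[key] = out.get(key, 0.0) + prob
        return out

    p_ab, p_bc, p_b = marg(p1, (0, 1)), marg(p1, (1, 2)), marg(p1, (1,))

    def p2(a, b, c):
        """Approximation induced by the chain: p(a,b) * p(c | b)."""
        return p_ab[a, b] * p_bc[b, c] / p_b[(b,)]

    kl = sum(prob * log2(prob / p2(*event)) for event, prob in p1.items())
    print(round(kl, 4))   # 0 would mean the chain decomposition is exact
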
Direct Test for Decomposability (continued)
[Figure: the eight candidate graphs over A, B, and C, each annotated with
two numbers. Upper number: the Kullback-Leibler information divergence of
the original distribution and its approximation. Lower number: the binary
logarithm of the probability of an example database (log-likelihood of the
data). The value pairs occurring in the figure are 0 / -4401, 0.111 / -4563,
0.137 / -4612, 0.402 / -4780, 0.429 / -4830, 0.540 / -4991, 0.566 / -5041,
and 0 / -4401: the smaller the divergence, the higher the log-likelihood.]

Evaluation Measures / Scoring Functions
Relational networks:
• relative number of occurring value combinations
• Hartley information gain

Probabilistic networks:
• χ²-measure
• mutual information / cross entropy / information gain
• (symmetric) information gain ratio
• (symmetric / modified) Gini index
• Bayesian measures (g-function, BDeu metric)
• other measures known from decision tree induction

A Probabilistic Evaluation Measure
Mutual Information / Cross Entropy / Information Gain

• Based on the Shannon entropy H = -Σ_{i=1}^{n} p_i log2 p_i:

  I_gain(A, B) = H_A - H_{A|B} = H_A + H_B - H_{AB}

  with
  H_A     = -Σ_{a ∈ dom(A)} P(A = a) · log2 P(A = a)
  H_B     = -Σ_{b ∈ dom(B)} P(B = b) · log2 P(B = b)
  H_AB    = -Σ_{a ∈ dom(A)} Σ_{b ∈ dom(B)} P(A = a, B = b) · log2 P(A = a, B = b)
  H_{A|B} = -Σ_{b ∈ dom(B)} P(B = b) Σ_{a ∈ dom(A)} P(A = a | B = b) · log2 P(A = a | B = b)

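A minimal Python sketch that evaluates I_gain(A, B) = H_A + H_B - H_AB for a
toy joint probability table (hypothetical numbers):

    # Sketch: mutual information (information gain) of two attributes
    # from a joint probability table P(A = a, B = b).
    from math import log2

    P = {("a1", "b1"): 0.30, ("a1", "b2"): 0.10,
         ("a2", "b1"): 0.20, ("a2", "b2"): 0.40}   # toy joint distribution

    def entropy(dist):
        return -sum(p * log2(p) for p in dist.values() if p > 0)

    def marginal(P, idx):
        out = {}
        for key, p in P.items():
            out[key[idx]] = out.get(key[idx], 0.0) + p
        return out

    H_A, H_B, H_AB = entropy(marginal(P, 0)), entropy(marginal(P, 1)), entropy(P)
    print(round(H_A + H_B - H_AB, 4))   # I_gain(A, B)
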
Possibility Theory

• A fuzzy set induces a possibility measure:
  Π(A) = sup_{ω ∈ A} π(ω)
  [Figure: example with a fuzzy set "cloudy" on a scale with the values
  50, 65, 85, 100 and an interval [55, 60].]
• Axioms:
  Π(∅) = 0
  Π(Ω) = 1
  Π(A ∪ B) = max( Π(A), Π(B) )
  Π(A ∩ B) ≤ min( Π(A), Π(B) )

Possibility Distributions and the Context Model

• Let Ω be the set of all possible states of the world,
  ω_0 the actual (but unknown) state.
• Let C = {c_1, ..., c_k} be a set of contexts (observers, frame conditions,
  etc.) and (C, 2^C, P) a finite probability space (context weights).
• Let g: C → 2^Ω be a set-valued mapping, assigning to each context the most
  specific correct set-valued specification of ω_0.
  g is called a random set (since it is a set-valued random variable);
  the sets g(c) are also called focal sets.
• The induced one-point coverage of g, or the induced possibility
  distribution, is
    π_g: Ω → [0, 1],  π_g(ω) = P( { c ∈ C | ω ∈ g(c) } ).

Database-induced Possibility Distributions
[Example: an imprecise database over attributes A, B, C, D with set-valued
entries, e.g. the tuple (a1, {b2, b3}, c3, {d1, d2}); its focal set consists
of all compatible precise tuples, here (a1, b2, c3, d1), (a1, b3, c3, d1),
(a1, b2, c3, d2), (a1, b3, c3, d2). Other tuples contain entries such as
{b1, b2}, {a1, a2, a3}, {d1, d3, d4}, or the wildcard *.]

• Each imprecise tuple, or more precisely the set of all precise tuples
  compatible with it, is interpreted as a focal set of a random set.
• In the absence of other information, equal weights are assigned to the
  contexts.
• In this way an imprecise database induces a possibility distribution
  (see the sketch below).

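A minimal Python sketch of this construction with a few hypothetical tuples:
each imprecise tuple is expanded into its focal set of compatible precise
tuples, the contexts receive equal weights, and the degree of possibility of
a precise tuple is the total weight of the contexts whose focal set contains
it.

    # Sketch: possibility distribution induced by an imprecise database.
    # Set-valued entries are expanded into focal sets; contexts (= tuples)
    # receive equal weights; pi(t) = total weight of contexts containing t.
    from itertools import product

    DOM_B = {"b1", "b2", "b3"}            # domain used to expand the wildcard '*'

    imprecise_db = [
        ("a1", {"b2", "b3"}, "c3"),
        ("a3", {"b1", "b2"}, "c2"),
        ("a1", "*",          "c3"),
    ]

    def focal_set(tup):
        """All precise tuples compatible with an imprecise tuple."""
        choices = []
        for entry in tup:
            if entry == "*":                  # wildcard: whole domain (here: B)
                choices.append(sorted(DOM_B))
            elif isinstance(entry, set):
                choices.append(sorted(entry))
            else:
                choices.append([entry])
        return set(product(*choices))

    weight = 1.0 / len(imprecise_db)          # equal context weights
    pi = {}
    for tup in imprecise_db:
        for precise in focal_set(tup):
            pi[precise] = pi.get(precise, 0.0) + weight

    for t, degree in sorted(pi.items()):
        print(t, round(degree, 2))
    # e.g. ('a1', 'b2', 'c3') occurs in two focal sets -> degree 0.67
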
Reasoning
[Figure: the three-dimensional possibility distribution over color, shape,
and size (all numbers in parts per 1000) and its update using the
information that the given object is green: the degrees of possibility of
all value combinations with other colors drop to 0, and the degrees of
possibility of the shape and size values are reduced accordingly.]

Reasoning with Projections
Again the same result can be obtained using only the projections to the
subspaces (maximal degrees of possibility): the new distribution on color is
combined with the projection to color × shape by taking the minimum, the
result is projected to shape by taking the maximum over the columns, and the
same combination and projection steps are applied to the projection to
shape × size to obtain the new distribution on size (old and new degrees of
possibility in parts per 1000; see the sketch below).

This justifies a network representation:  color - shape - size

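A minimal Python sketch of this min/max propagation over the chain
color - shape - size; the two-dimensional possibility tables are
hypothetical (degrees in parts per 1000, as on the slide), chosen so that
the observation "green" again leaves triangle/square and medium/large
possible.

    # Sketch: possibilistic evidence propagation along color - shape - size.
    # Combine evidence with a projection by min, project to the next
    # attribute by max (maximal degrees of possibility). Tables are toy data.

    pi_cs = {("green", "triangle"): 700, ("green", "square"): 400,
             ("red", "circle"): 800, ("blue", "circle"): 600}      # color x shape
    pi_ss = {("triangle", "medium"): 700, ("square", "large"): 400,
             ("circle", "small"): 800, ("circle", "medium"): 600}  # shape x size

    def propagate(evidence):
        """evidence: dict color -> degree of possibility (0..1000)."""
        shape = {}
        for (c, s), d in pi_cs.items():
            shape[s] = max(shape.get(s, 0), min(d, evidence.get(c, 0)))
        size = {}
        for (s, z), d in pi_ss.items():
            size[z] = max(size.get(z, 0), min(d, shape.get(s, 0)))
        return shape, size

    # Observation: the object is green (degree 1000), other colors impossible.
    print(propagate({"green": 1000}))
    # -> shapes {'triangle': 700, 'square': 400, 'circle': 0},
    #    sizes  {'medium': 700, 'large': 400, 'small': 0}
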
POSSINFER
Possibilistic Evaluation Measures / Scoring Functions

• Specificity gain [Gebhardt and Kruse 1996, Borgelt et al. 1996]
• (Symmetric) specificity gain ratio [Borgelt et al. 1996]
• Analog of mutual information [Borgelt and Kruse 1997]
• Analog of the χ²-measure [Borgelt and Kruse 1997]

Possibilistic Evaluation Measures
Reduction to the relational case via α-cuts:

[Figure: the possibility distribution is cut at the α-levels 0.4, 0.3, 0.2,
0.1, and 0; the Hartley information gains of the α-cut relations are
  log2 1 + log2 1 - log2 1  = 0
  log2 2 + log2 2 - log2 3  ≈ 0.42
  log2 3 + log2 2 - log2 5  ≈ 0.26
  log2 4 + log2 3 - log2 8  ≈ 0.58
  log2 4 + log2 3 - log2 12 = 0 .]

Usable relational measures:
• relative number of value combinations / Hartley information gain
• specificity gain
• number of additional value combinations in the Cartesian product of the
  marginal distributions

Specificity Gain
Definition: Let A and B be two attributes and Π a possibility measure. Then

  S_gain(A, B) = ∫_0^{sup Π} [ log2( Σ_{a ∈ dom(A)} [Π]_α(A = a) )
                             + log2( Σ_{b ∈ dom(B)} [Π]_α(B = b) )
                             - log2( Σ_{a ∈ dom(A)} Σ_{b ∈ dom(B)} [Π]_α(A = a, B = b) ) ] dα,

where [Π]_α denotes the α-cut of Π, is called the specificity gain of A and
B w.r.t. Π.

• Generalization of the Hartley information gain on the basis of the α-cut
  view of possibility distributions (see the sketch below).
• Analogous to the Shannon information gain.

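A minimal Python sketch that evaluates this definition for a finite,
hypothetical possibility table: the table is cut at its occurring degrees of
possibility, the Hartley information gain of each α-cut relation is
computed, and the gains are integrated over α.

    # Sketch: specificity gain of two attributes, computed by cutting a
    # finite possibility table at its alpha levels and integrating the
    # Hartley information gain of the resulting relations over alpha.
    from math import log2

    pi = {("a1", "b1"): 1.0, ("a1", "b2"): 0.4, ("a2", "b1"): 0.3,
          ("a2", "b2"): 0.2, ("a3", "b1"): 0.1}    # toy joint possibility table

    def hartley_gain(pairs):
        if not pairs:
            return 0.0
        n_a = len({a for a, _ in pairs})
        n_b = len({b for _, b in pairs})
        return log2(n_a) + log2(n_b) - log2(len(pairs))

    def specificity_gain(pi):
        levels = sorted(set(pi.values()) | {0.0})   # 0 = alpha_0 < ... < sup
        gain = 0.0
        for lo, hi in zip(levels, levels[1:]):      # piecewise constant integrand
            cut = {ab for ab, d in pi.items() if d >= hi}   # alpha-cut at hi
            gain += (hi - lo) * hartley_gain(cut)
        return gain

    print(round(specificity_gain(pi), 3))
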
Specificity Gain in the Example
[Figure: for each two-dimensional subspace, the projection of the
possibility distribution is compared with the minimum of the corresponding
marginal distributions (all degrees in parts per 1000); the resulting
specificity gains of the three subspaces are 0.055 bit, 0.048 bit, and
0.027 bit.]

Learning Graphical Models from Data
[Figure: the eight candidate graphs over color, shape, and size together
with the intersections of the cylindrical extensions of their projections,
as on the slide "Direct Test for Decomposability".]

Data Mining Tool Clementine
Analysis of Daimler/Chrysler Database
[Network fragment over the attributes: electrical roof top, air
conditioning, faulty battery, type of engine, faulty compressor, type of
tyres, slippage control, faulty brakes.]

Fictitious example: there are significantly more faulty batteries if both
air conditioning and an electrical roof top are built into the car.

Example Subnet
Influence of special equipment on battery faults,
(fictitious) frequency of battery faults:

                            air conditioning
  electrical sliding roof   with      without
  with                       8%         3%
  without                    3%         2%

• significant deviation from the independent distribution
• hints to possible causes and improvements
• here: a larger battery may be required if an air conditioning system and
  an electrical sliding roof are built in

(The dependencies and frequencies of this example are fictitious;
the true numbers are confidential.)

Resources
• http://fuzzy.cs.uni-magdeburg.de
  (free software tools such as NEFCLASS, ...)
• C. Borgelt and R. Kruse: Graphical Models - Methods for Data Analysis and
  Mining. Wiley, Chichester, January 2002.