Introduction to Bayesian Networks
4 / 20 / 2005
CSE634 Data Mining Prof. Anita Wasilewska
105269827 Hiroo Kusaba
Software Engineering Laboratory
References
[1] D. Heckerman: "A Tutorial on Learning with Bayesian Networks", in Learning in Graphical Models, ed. M. I. Jordan, The MIT Press, 1998.
[2] http://www.cs.huji.ac.il/~nir/Nips01-Tutorial/
[3] J. Han and M. Kamber: "Data Mining: Concepts and Techniques", Morgan Kaufmann, ISBN 1-55860-489-8.
[4] J. Whittaker: "Graphical Models in Applied Multivariate Statistics", John Wiley and Sons, 1990.
Contents
Brief introduction
Review
A little review of probability
Bayes theorem
Bayesian Classification
Steps for using a Bayesian network
Notation
Random variables: X, Y, Xi, Θ (capital letters)
Value (or state) of a variable: x, y, xi, θ (lowercase)
Set of variables: X, Y, Xi, Θ (capital bold)
Set of values: x, y, xi, θ (lowercase bold)
P(x | a): the probability that event x occurs given the condition a
What is a Bayesian Network?
A network that expresses the dependencies among the random variables X = {X1, X2, ..., Xn}.
Each node carries a conditional probability distribution that depends on its parent variables.
The whole network also expresses the joint probability distribution over all of the random variables in X.
Pa_i denotes the parent(s) of node i.
p(x) = Π_{i=1..n} p(xi | Pa_i)
How is it used?
Bayesian learning
Estimating the dependencies among the random variables from actual data
Bayesian inference
Once some of the random variables are observed, it computes the probabilities of the others
Example: with a patient's condition as random variables, it predicts the disease from the observed condition
What is so good about it?
Conditional independencies and graphical
expression capture structure of many real-world
distributions. [1]
A learned model can be used for many tasks
Supports all the features of probabilistic learning
Model selection criteria
Dealing with missing data and hidden variables
Example of Bayesian Network
Structure of the network: X → Y → Z
X, Y, Z are random variables, each taking the value 0 or 1.
Conditional probabilities: p(X), p(Y|X), p(Z|Y)

X  P(X)
0  0.5
1  0.5

X  Y  P(Y|X)
0  0  0.1
0  1  0.9
1  0  0.2
1  1  0.8

Y  Z  P(Z|Y)
0  0  0.3
0  1  0.7
1  0  0.4
1  1  0.6
Example of Bayesian Network 2
What is the joint probability P(X, Y, Z)?
P(X, Y, Z) = P(X) * P(Y|X) * P(Z|Y)

X  Y  Z  P(X,Y,Z)
0  0  0  0.015
0  0  1  0.035
0  1  0  0.180
0  1  1  0.270
1  0  0  0.030
1  0  1  0.070
1  1  0  0.160
1  1  1  0.240
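As a check, here is a minimal Python sketch (not part of the original slides) that encodes the three conditional probability tables above and recomputes the joint distribution via the factorization P(X, Y, Z) = P(X) P(Y|X) P(Z|Y):

from itertools import product

# Conditional probability tables of the X -> Y -> Z chain from the example above
p_x = {0: 0.5, 1: 0.5}
p_y_given_x = {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.2, (1, 1): 0.8}  # key: (x, y)
p_z_given_y = {(0, 0): 0.3, (0, 1): 0.7, (1, 0): 0.4, (1, 1): 0.6}  # key: (y, z)

def joint(x, y, z):
    """P(X=x, Y=y, Z=z) via the network's factorization."""
    return p_x[x] * p_y_given_x[(x, y)] * p_z_given_y[(y, z)]

for x, y, z in product((0, 1), repeat=3):
    print(f"P(X={x}, Y={y}, Z={z}) = {joint(x, y, z):.3f}")

Running it reproduces the eight values in the table, e.g. P(X=0, Y=1, Z=1) = 0.5 * 0.9 * 0.6 = 0.270.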
A little review of probability 1
Probability: How likely is it that an event will happen?
Sample Space S
Element of S: elementary event
An event A is a subset of S
P(A) ≥ 0
P(S) = 1
A little review of probability 2
Discrete probability distribution
P(A) = Σ_{s ∈ A} P(s)
Conditional probability distribution
P(A|B) = P(A, B) / P(B)
If the events A and B are independent:
P(A, B) = P(A) * P(B)
Bayes Theorem
P(A | B) = P(A) P(B | A) / P(B)
where P(B) = P(A) P(B | A) + P(not A) P(B | not A)
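For completeness (this step is not spelled out on the slide), Bayes' theorem follows directly from the definition of conditional probability, written here in LaTeX:

\[
P(A \mid B) \;=\; \frac{P(A, B)}{P(B)} \;=\; \frac{P(A)\,P(B \mid A)}{P(B)},
\qquad \text{since } P(A, B) = P(B \mid A)\,P(A).
\]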
Bayes Theorem
For a partition A_1, ..., A_n of the sample space:
P(A_i | B) = P(A_i) P(B | A_i) / P(B)
           = P(A_i) P(B | A_i) / Σ_{j=1..n} P(A_j) P(B | A_j)
Example of Bayes Theorem
You are about to be tested for a rare disease.
How worried should you be if the test result is positive?
Accuracy of the test: P(T) = 85%
Chance of infection: P(I) = 0.01%
What is P(I | not T)?
http://www.gametheory.net/Mike/applets/Bayes/Bayes.html
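A small Python sketch of the calculation behind this example (not from the slides). It treats the stated 85% accuracy as both the sensitivity P(positive | infected) and the specificity P(negative | not infected), which is an assumption, and computes the probability of infection given a positive test:

p_infected = 0.0001    # P(I) = 0.01%
sensitivity = 0.85     # P(positive | infected), assumed equal to the stated accuracy
specificity = 0.85     # P(negative | not infected), assumed equal to the stated accuracy

# Bayes' theorem: P(I | positive) = P(positive | I) P(I) / P(positive)
p_positive = sensitivity * p_infected + (1 - specificity) * (1 - p_infected)
p_infected_given_positive = sensitivity * p_infected / p_positive
print(f"P(infected | positive test) = {p_infected_given_positive:.4%}")

With these numbers the result is only about 0.06%: because the disease is so rare, a positive result from an 85%-accurate test should not worry you much.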
Bayesian Classification
Suppose that there are m classes C1, C2, ..., Cm.
Given an unknown data sample X,
the Bayesian classifier assigns X to the class Ci if and only if
P(Ci | X) > P(Cj | X) for 1 ≤ j ≤ m, j ≠ i
By Bayes' theorem, P(Ci | X) = P(Ci) P(X | Ci) / P(X)
Since P(X) is the same for every class, we only have to maximize P(X | Ci) P(Ci).
In order to reduce computation, the assumption of class-conditional independence is made:
P(X | Ci) = Π_{k=1..n} P(xk | Ci)
Example of Bayesian Classification in the textbook [3]
A customer is under 30, has "medium" income, is a student, and has a "fair" credit rating. Which category does this customer belong to: buy, or not?
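A minimal naive Bayes sketch in Python, using the class-conditional independence assumption from the previous slide. The tiny training set below is a hypothetical stand-in, not the customer table from [3]:

from collections import Counter, defaultdict

# Hypothetical training data (NOT the table from [3]).
# Attributes: (age, income, student, credit_rating) -> class ("yes" = buys, "no" = does not)
data = [
    (("<=30",   "high",   "no",  "fair"),      "no"),
    (("<=30",   "high",   "no",  "excellent"), "no"),
    (("31..40", "high",   "no",  "fair"),      "yes"),
    ((">40",    "medium", "no",  "fair"),      "yes"),
    ((">40",    "low",    "yes", "fair"),      "yes"),
    (("<=30",   "medium", "yes", "fair"),      "yes"),
    (("<=30",   "low",    "yes", "fair"),      "no"),
]

class_counts = Counter(label for _, label in data)
attr_counts = defaultdict(lambda: defaultdict(Counter))  # [class][attribute index][value] -> count
for features, label in data:
    for k, value in enumerate(features):
        attr_counts[label][k][value] += 1

def classify(features):
    """Return the class Ci maximizing P(Ci) * prod_k P(x_k | Ci)."""
    best_label, best_score = None, -1.0
    for label, n_c in class_counts.items():
        score = n_c / len(data)                              # P(Ci)
        for k, value in enumerate(features):
            score *= attr_counts[label][k][value] / n_c      # P(x_k | Ci)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# The customer from the slide: under 30, medium income, student, fair credit rating
print(classify(("<=30", "medium", "yes", "fair")))

In practice one would add Laplace smoothing so that an unseen attribute value does not force a class probability to zero.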
Bayesian Network
A network that expresses the dependencies among the random variables X = {X1, X2, ..., Xn}.
The whole network also expresses the joint probability distribution over all of the random variables in X.
Pa_i denotes the parent(s) of node i.
By the chain rule: p(x) = Π_{i=1..n} p(xi | x1, x2, ..., x_{i-1})
(Example network: X → Y → Z)
The conditional independencies encoded in the graph give p(xi | x1, x2, ..., x_{i-1}) = p(xi | Pa_i),
so p(x) = Π_{i=1..n} p(xi | Pa_i),
where Pa_i is a subset of {x1, x2, ..., x_{i-1}}.
Steps to apply a Bayesian Network
Step 1: Create a Bayesian belief network
Include all the variables that are important in your
system
Use causal knowledge to guide the connections made
in the graph
Use your prior knowledge to specify the conditional
distributions
Step 2: Calculate p(xi | Pa_i) for your goal
Example from [1]
An example of building a BN from prior knowledge
A BN to detect credit card fraud
Define the random variables:
Fraud (F): the card is being used fraudulently
Gas (G): gas was bought with the card in the last 24 hours
Jewelry (J): jewelry was bought with the card in the last 24 hours
Age (A): age of the card owner
Sex (S): gender of the card owner
Give an ordering to the random variables.
Define the dependencies, but you have to be careful: the chosen ordering affects the resulting structure.
(Two candidate network structures over the nodes F, A, G, J, S are shown; the diagrams are not reproduced here.)
For the first structure:
p(a | f) = p(a)
p(s | f, a) = p(s)
p(g | f, a, s) = p(g | f)
p(j | f, a, s, g) = p(j | f, a, s)
For the second structure:
p(a | f) = p(a)
p(s | f, a) = p(s)
p(g | f, a, s) = p(g | f)
p(j | f, a, s, g) = p(j | f)
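A rough Python sketch of Step 2 for this network (not from the slides). The structure follows the first set of assertions above (F, A, S have no parents, G depends on F, J depends on F, A, S); the CPT numbers are made up for illustration, and Age and Sex are reduced to binary variables here:

from itertools import product

# Hypothetical CPTs (made-up values) for the fraud network sketched above
p_f = {True: 0.0001, False: 0.9999}      # P(Fraud)
p_a = {True: 0.25,   False: 0.75}        # P(Age = "young"), binarized for brevity
p_s = {True: 0.5,    False: 0.5}         # P(Sex = "male")
p_g_given_f = {True: 0.2, False: 0.01}   # P(Gas bought | Fraud)
p_j_given_fas = {                        # P(Jewelry bought | Fraud, Age, Sex)
    (True,  True,  True): 0.05,   (True,  True,  False): 0.05,
    (True,  False, True): 0.05,   (True,  False, False): 0.05,
    (False, True,  True): 0.0001, (False, True,  False): 0.0004,
    (False, False, True): 0.0002, (False, False, False): 0.001,
}

def joint(f, a, s, g, j):
    """p(f, a, s, g, j) = p(f) p(a) p(s) p(g | f) p(j | f, a, s)."""
    pg = p_g_given_f[f] if g else 1 - p_g_given_f[f]
    pj = p_j_given_fas[(f, a, s)] if j else 1 - p_j_given_fas[(f, a, s)]
    return p_f[f] * p_a[a] * p_s[s] * pg * pj

# Inference by enumeration: P(Fraud | gas AND jewelry bought in the last 24 hours)
num = sum(joint(True, a, s, True, True) for a, s in product((True, False), repeat=2))
den = sum(joint(f, a, s, True, True) for f, a, s in product((True, False), repeat=3))
print(f"P(fraud | gas and jewelry) = {num / den:.3f}")

With these invented numbers the posterior on fraud comes out around 0.16, far above the 0.01% prior; this is exactly the kind of query the network is built to answer.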
Next topic
Training with Bayesian Networks
Bayesian inference
If the training data is complete
If the training data has missing values
Network Evaluation
Thank you for listening.