
Introduction to Bayesian Networks
4/20/2005
CSE634 Data Mining, Prof. Anita Wasilewska
105269827 Hiroo Kusaba
Software Engineering Laboratory
References
[1] D. Heckerman: "A Tutorial on Learning with Bayesian Networks", in "Learning in Graphical Models", ed. M.I. Jordan, The MIT Press, 1998.
[2] http://www.cs.huji.ac.il/~nir/Nips01-Tutorial/
[3] Jiawei Han: "Data Mining: Concepts and Techniques", ISBN 1-55860-489-8
[4] J. Whittaker: "Graphical Models in Applied Multivariate Statistics", John Wiley and Sons, 1990.
Contents
- Brief introduction
- Review
  - A little review of probability
  - Bayes theorem
- Bayesian Classification
- Steps of using a Bayesian Network
Notation
- Random variables: capitals (X, Y, Xi, Θ)
- A value of a variable: lowercase (x, y, xi, θ)
- A set of variables: bold capitals (X, Y, Xi, Θ)
- A set of values: bold lowercase (x, y, xi, θ)
- P(x | a): the probability that an event x occurs (or happens) under the condition a
What is a Bayesian Network?
- A network that expresses the dependencies among the random variables X = {X1, X2, ..., Xn}
- Each node holds a conditional probability that depends on its parent variables
- The whole network also expresses the joint probability distribution over all of the random variables X
- Pa_i denotes the parent(s) of node i:

  p(x) = Π_{i=1}^{n} p(x_i | Pa_i)
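The factorization above can be sketched in code. The representation below (a dict mapping each node to its parents and CPT) and the two-node example numbers are my own illustration, not from the slides:

```python
def joint_probability(network, assignment):
    """p(x) = product over nodes i of p(x_i | Pa_i)."""
    prob = 1.0
    for node, (parents, cpt) in network.items():
        # CPT keys are (node_value, parent_value_1, parent_value_2, ...)
        key = (assignment[node],) + tuple(assignment[p] for p in parents)
        prob *= cpt[key]
    return prob

# A two-node chain A -> B; all numbers are illustrative placeholders.
network = {
    "A": ((), {(0,): 0.6, (1,): 0.4}),
    "B": (("A",), {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.3, (1, 1): 0.7}),
}
print(round(joint_probability(network, {"A": 1, "B": 1}), 2))  # 0.4 * 0.7 = 0.28
```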
How is it used?
- Bayesian Learning
  - Estimating the dependencies between the random variables from the actual data
- Bayesian Inference
  - When some of the random variables are observed, it calculates the probabilities of the others
  - e.g. with a patient's condition as a random variable, it predicts the disease from the condition
What is so good about it?
- Conditional independencies and graphical expression capture the structure of many real-world distributions. [1]
- A learned model can be used for many tasks
- Supports all the features of probabilistic learning
  - Model selection criteria
  - Dealing with missing data and hidden variables
Example of a Bayesian Network
- Structure of the network (figure): X → Y → Z
- Conditional probabilities
- X, Y, Z are random variables which take either 0 or 1
- p(X), p(Y|X), p(Z|Y)

  P(X):
    X  P(X)
    0  0.5
    1  0.5

  P(Y|X):
    X  Y  P(Y|X)
    0  0  0.1
    0  1  0.9
    1  0  0.2
    1  1  0.8

  P(Z|Y):
    Y  Z  P(Z|Y)
    0  0  0.3
    0  1  0.7
    1  0  0.4
    1  1  0.6
Example of a Bayesian Network 2
- What is the joint probability P(X, Y, Z)?
- P(X, Y, Z) = P(X) * P(Y|X) * P(Z|Y)

  X  Y  Z  P(X,Y,Z)        X  Y  Z  P(X,Y,Z)
  0  0  0  0.015           1  0  0  0.030
  0  0  1  0.035           1  0  1  0.070
  0  1  0  0.180           1  1  0  0.160
  0  1  1  0.270           1  1  1  0.240
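The joint table above can be recomputed from the three CPTs of the previous slide; a minimal check in Python:

```python
# CPTs for the chain X -> Y -> Z, as given on slide 8.
p_x = {0: 0.5, 1: 0.5}
p_y_given_x = {(0, 0): 0.1, (1, 0): 0.9, (0, 1): 0.2, (1, 1): 0.8}  # keyed (y, x)
p_z_given_y = {(0, 0): 0.3, (1, 0): 0.7, (0, 1): 0.4, (1, 1): 0.6}  # keyed (z, y)

# P(X, Y, Z) = P(X) * P(Y|X) * P(Z|Y) for every assignment.
joint = {}
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            joint[(x, y, z)] = p_x[x] * p_y_given_x[(y, x)] * p_z_given_y[(z, y)]

print(round(joint[(0, 1, 1)], 3))  # 0.27, matching the 0.270 entry in the table
assert abs(sum(joint.values()) - 1.0) < 1e-9  # a valid joint distribution sums to 1
```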
A little review of probability 1
- Probability: how likely is it that an event will happen?
- Sample space S
  - An element of S is an elementary event
  - An event A is a subset of S
- P(A) ≥ 0
- P(S) = 1
A little review of probability 2
- Discrete probability distribution
  - P(A) = Σ_{s∈A} P(s)
- Conditional probability distribution
  - P(A|B) = P(A, B) / P(B)
- If the events A and B are independent
  - P(A, B) = P(A) * P(B)
- Bayes Theorem

  P(A | B) = P(A) P(B | A) / P(B)
           = P(A) P(B | A) / (P(A) P(B | A) + P(¬A) P(B | ¬A))
Bayes Theorem

  P(A_i | B) = P(A_i) P(B | A_i) / P(B)
             = P(A_i) P(B | A_i) / Σ_{j=1}^{n} P(A_j) P(B | A_j)
Example of Bayes Theorem
- You are about to be tested for a rare disease. How worried should you be if the test result is positive?
- Accuracy of the test is P(T) = 85%
- Chance of infection P(I) = 0.01%
- What is P(I | T)?
- http://www.gametheory.net/Mike/applets/Bayes/Bayes.html
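A hedged worked version of this question, assuming "accuracy 85%" means both P(T | I) = 0.85 and P(¬T | ¬I) = 0.85 (the slide does not pin this down):

```python
p_i = 0.0001            # P(I): prior chance of infection (0.01%)
p_t_given_i = 0.85      # P(T | I): assumed sensitivity
p_t_given_not_i = 0.15  # P(T | not I): assumed false-positive rate

# Bayes theorem: P(I | T) = P(I) P(T | I) / P(T),
# with P(T) expanded by the law of total probability.
p_t = p_i * p_t_given_i + (1 - p_i) * p_t_given_not_i
p_i_given_t = p_i * p_t_given_i / p_t
print(round(p_i_given_t, 5))  # roughly 0.00057: a positive test still leaves the chance well under 0.1%
```

Because the disease is so rare, almost all positives are false positives, which is the point of the exercise.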
Bayesian Classification
- Suppose that there are m classes, C1, C2, ..., Cm
- Given an unknown data sample X, the Bayesian classifier assigns X to the class Ci if and only if

  P(Ci | X) > P(Cj | X)  for 1 ≤ j ≤ m, j ≠ i
  P(Ci | X) = P(Ci) P(X | Ci) / P(X)

- We have to maximize P(X | Ci) P(Ci)
- In order to reduce computation, the assumption of class conditional independence is made:

  P(X | Ci) = Π_{k=1}^{n} P(x_k | Ci)
Example of Bayesian Classification in the textbook [3]
- A customer is under 30, income is "medium", a student, and credit rating is "fair": which category does the customer belong to? Buys or not.
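A minimal naive Bayes sketch in the spirit of the textbook example; the tiny training set below is hypothetical, not the book's table:

```python
from collections import Counter, defaultdict

train = [  # (age, income, student, credit) -> buys; hypothetical rows
    (("<=30",   "high",   "no",  "fair"),      "no"),
    (("<=30",   "medium", "yes", "fair"),      "yes"),
    (("31..40", "high",   "no",  "fair"),      "yes"),
    ((">40",    "medium", "no",  "excellent"), "no"),
    ((">40",    "low",    "yes", "fair"),      "yes"),
]

class_counts = Counter(label for _, label in train)
feature_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
for features, label in train:
    for k, value in enumerate(features):
        feature_counts[(k, label)][value] += 1

def classify(features):
    # argmax over classes Ci of P(Ci) * prod_k P(x_k | Ci)
    best, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / len(train)  # P(Ci)
        for k, value in enumerate(features):
            score *= feature_counts[(k, label)][value] / count  # P(x_k | Ci)
        if score > best_score:
            best, best_score = label, score
    return best

print(classify(("<=30", "medium", "yes", "fair")))  # "yes" on this toy data
```

A production version would add Laplace smoothing so that an unseen attribute value does not zero out a whole class.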
Bayesian Network
- A network that expresses the dependencies among the random variables X = {X1, X2, ..., Xn}
- The whole network also expresses the joint probability distribution over all of the random variables X
- Pa_i denotes the parent(s) of node i; Pa_i is a subset of {x1, ..., x_{i-1}}
- Example structure (figure): X → Y → Z

  p(x) = Π_{i=1}^{n} p(x_i | x1, x2, ..., x_{i-1})   (chain rule)

  p(x_i | x1, x2, ..., x_{i-1}) = p(x_i | Pa_i)

  p(x) = Π_{i=1}^{n} p(x_i | Pa_i)
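The condition p(x_i | x1, ..., x_{i-1}) = p(x_i | Pa_i) can be verified numerically for the X → Y → Z chain of the earlier example, where Z should depend on (X, Y) only through its parent Y:

```python
# CPTs from the earlier X -> Y -> Z example (slide 8).
p_x = {0: 0.5, 1: 0.5}
p_y_x = {(0, 0): 0.1, (1, 0): 0.9, (0, 1): 0.2, (1, 1): 0.8}  # keyed (y, x)
p_z_y = {(0, 0): 0.3, (1, 0): 0.7, (0, 1): 0.4, (1, 1): 0.6}  # keyed (z, y)

def p_z_given_xy(z, x, y):
    """p(z | x, y) computed from the joint distribution, not read off the CPT."""
    joint_xyz = p_x[x] * p_y_x[(y, x)] * p_z_y[(z, y)]
    joint_xy = sum(p_x[x] * p_y_x[(y, x)] * p_z_y[(zz, y)] for zz in (0, 1))
    return joint_xyz / joint_xy

# p(z | x, y) should equal p(z | y) for every assignment.
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            assert abs(p_z_given_xy(z, x, y) - p_z_y[(z, y)]) < 1e-9
print("p(z | x, y) == p(z | y) for all values")
```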
Steps to apply a Bayesian Network
- Step 1: Create a Bayesian belief network
  - Include all the variables that are important in your system
  - Use causal knowledge to guide the connections made in the graph
  - Use your prior knowledge to specify the conditional distributions
- Step 2: Calculate the p(xi | Pai) for your goal
Example from [1]
- An example of building a BN from prior knowledge
- A BN to detect credit card fraud
  - Define the random variables
    - Fraud (F): whether the use of the card is fraudulent
    - Gas (G): whether gas was bought in the last 24 hours
    - Jewelry (J): whether jewelry was bought in the last 24 hours
    - Age (A): age of the owner of the card
    - Sex (S): gender of the owner of the card
- Give an ordering to the random variables
- Define the dependencies, but you have to be careful
- (figure: two candidate network structures over F, A, S, G, J)

  One assessment:                        Another assessment:
    p(a | f) = p(a)                        p(a | f) = p(a)
    p(s | f, a) = p(s)                     p(s | f, a) = p(s)
    p(g | f, a, s) = p(g | f)              p(g | f, a, s) = p(g | f)
    p(j | f, a, s, g) = p(j | f, a, s)     p(j | f, a, s, g) = p(j | f)
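Under the first assessment the network factorizes as p(f, a, s, g, j) = p(f) p(a) p(s) p(g | f) p(j | f, a, s). A sketch of that factorization, with hypothetical placeholder numbers that are not from [1]:

```python
# All probabilities below are made-up placeholders for illustration only.
p_f = {True: 0.001, False: 0.999}       # p(f): prior probability of fraud
p_a = {"young": 0.3, "old": 0.7}        # p(a)
p_s = {"male": 0.5, "female": 0.5}      # p(s)
p_g = {(True, True): 0.2, (False, True): 0.8,     # p(g | f), keyed (gas, fraud)
       (True, False): 0.01, (False, False): 0.99}

def p_j(j, f, a, s):
    """p(j | f, a, s); hypothetical: jewelry is much likelier under fraud."""
    base = 0.05 if f else 0.001
    return base if j else 1.0 - base

def joint(f, a, s, g, j):
    # p(f, a, s, g, j) = p(f) p(a) p(s) p(g | f) p(j | f, a, s)
    return p_f[f] * p_a[a] * p_s[s] * p_g[(g, f)] * p_j(j, f, a, s)

# e.g. fraud, with gas and jewelry bought, young male owner
print(joint(True, "young", "male", True, True))
```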
Next topic
- Training with a Bayesian Network
  - Bayes Inference
  - If the training data is complete
  - If the training data is missing
  - Network Evaluation
Thank you for listening.