Introduction to Machine Learning


CHAPTER 3: Bayesian Decision Theory
Source: Alpaydın, with modifications by Christoph F. Eick; last changed on February 4, 2009.
Remark: Belief networks will be covered in early April 2011. Utility theory will be covered as part of reinforcement learning.
Probability and Inference

The result of tossing a coin is in {Heads, Tails}.
Random variable $X \in \{1, 0\}$
Bernoulli: $P(X = 1) = p_o$
Sample: $\mathcal{X} = \{x^t\}_{t=1}^{N}$
Estimation: $\hat{p}_o = \#\{\text{Heads}\} / \#\{\text{Tosses}\} = \sum_t x^t / N$
Prediction of next toss: Heads if $\hat{p}_o > 1/2$, Tails otherwise
In the theory of probability and statistics, a Bernoulli trial is an experiment whose
outcome is random and can be either of two possible outcomes, "success" and "failure".
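As a concrete illustration, here is a minimal Python sketch (not from the original slides; the true parameter and sample size are made-up values) that estimates $p_o$ from a sample of tosses and predicts the next toss:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate N Bernoulli trials (1 = Heads, 0 = Tails).
# p_true = 0.6 is an assumed value for illustration only.
p_true = 0.6
N = 1000
sample = rng.binomial(1, p_true, size=N)

# Maximum-likelihood estimate: p_o_hat = #{Heads} / #{Tosses}
p_hat = sample.sum() / N

# Predict the next toss: Heads if p_o_hat > 1/2, Tails otherwise
prediction = "Heads" if p_hat > 0.5 else "Tails"
print(f"p_hat = {p_hat:.3f}, predict: {prediction}")
```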
Binomial Distribution

The number of heads $X$ in $N$ independent tosses follows a binomial distribution:
$$P(X = k) = \binom{N}{k} p_o^k (1 - p_o)^{N - k}$$
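A quick numeric check of this formula (a sketch, not from the slides; $N$, $k$, and $p_o$ are arbitrary illustrative values):

```python
from math import comb

# Probability of k = 7 heads in N = 10 tosses with p_o = 0.6
N, k, p = 10, 7, 0.6
prob = comb(N, k) * p**k * (1 - p)**(N - k)
print(f"P(X={k}) = {prob:.4f}")  # ~0.2150
```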
Classification

Credit scoring: inputs are income and savings; output is low-risk vs. high-risk.
Input: $\mathbf{x} = [x_1, x_2]^T$, Output: $C \in \{0, 1\}$
Prediction:
$$\text{choose } \begin{cases} C = 1 & \text{if } P(C = 1 \mid x_1, x_2) > 0.5 \\ C = 0 & \text{otherwise} \end{cases}$$
or equivalently
$$\text{choose } \begin{cases} C = 1 & \text{if } P(C = 1 \mid x_1, x_2) > P(C = 0 \mid x_1, x_2) \\ C = 0 & \text{otherwise} \end{cases}$$
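A minimal sketch of this decision rule in Python (the posterior value passed in is a hypothetical input, not computed from real data):

```python
def classify(posterior_c1: float) -> int:
    """Choose C = 1 if P(C=1 | x) > 0.5, else C = 0."""
    return 1 if posterior_c1 > 0.5 else 0

# Hypothetical posterior for an applicant with given income and savings
print(classify(0.73))  # -> 1
```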
Bayes’ Rule
$$\underbrace{P(C \mid x)}_{\text{posterior}} = \frac{\overbrace{p(x \mid C)}^{\text{likelihood}} \; \overbrace{P(C)}^{\text{prior}}}{\underbrace{p(x)}_{\text{evidence}}}$$
$$P(C = 0) + P(C = 1) = 1$$
$$p(x) = p(x \mid C = 1)\,P(C = 1) + p(x \mid C = 0)\,P(C = 0)$$
$$P(C = 0 \mid x) + P(C = 1 \mid x) = 1$$
Bayes’ Rule: K>2 Classes
p x | Ci P Ci 
P C i | x  
p x 
p x | Ci P Ci 
 K
 p x | Ck P Ck 
k 1
P Ci   0 and
K
 P C   1
i 1
i
choose Ci if P Ci | x   max k P C k | x 
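The same computation for $K > 2$ classes, sketched in Python (the priors and likelihood values are made-up numbers):

```python
import numpy as np

priors = np.array([0.5, 0.3, 0.2])       # P(C_i), sums to 1
likelihoods = np.array([0.1, 0.4, 0.8])  # p(x | C_i) at a given x

posteriors = likelihoods * priors
posteriors /= posteriors.sum()           # divide by evidence p(x)

print(posteriors)             # normalized posteriors
print(np.argmax(posteriors))  # choose the C_i with the largest posterior
```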
Losses and Risks

Actions: $\alpha_i$
Loss of $\alpha_i$ when the state is $C_k$: $\lambda_{ik}$
Expected risk (Duda and Hart, 1973):
$$R(\alpha_i \mid x) = \sum_{k=1}^{K} \lambda_{ik}\,P(C_k \mid x)$$
$$\text{choose } \alpha_i \text{ if } R(\alpha_i \mid x) = \min_k R(\alpha_k \mid x)$$
Losses and Risks: 0/1 Loss
$$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ 1 & \text{if } i \ne k \end{cases}$$
$$R(\alpha_i \mid x) = \sum_{k=1}^{K} \lambda_{ik}\,P(C_k \mid x) = \sum_{k \ne i} P(C_k \mid x) = 1 - P(C_i \mid x)$$
For minimum risk, choose the most probable class.
Remark: this strategy is not optimal for other loss functions.
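A quick numeric check (a sketch with assumed posteriors) that under 0/1 loss, minimizing risk is the same as picking the most probable class:

```python
import numpy as np

K = 3
lam = 1.0 - np.eye(K)                  # 0/1 loss matrix
posteriors = np.array([0.2, 0.5, 0.3])

risks = lam @ posteriors               # equals 1 - posteriors under 0/1 loss
assert np.allclose(risks, 1 - posteriors)
print(np.argmin(risks) == np.argmax(posteriors))  # True
```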
Example and Homework!
$C_1$ = has cancer
$C_2$ = does not have cancer
$\lambda_{12} = 9$, $\lambda_{21} = 90$
Homework:
a) Determine the optimal decision-making strategy.
Inputs: $P(C_1 \mid x)$, $P(C_2 \mid x)$
Decision-making strategy: …
b) Now assume we also have a reject option, and the cost of making no decision is 3:
$\lambda_{\text{reject},1} = 3$, $\lambda_{\text{reject},2} = 3$
Inputs: $P(C_1 \mid x)$, $P(C_2 \mid x)$
Decision-making strategy: …
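As a starting point for part (a), a sketch only (the loss values come from the exercise; the posterior passed in is a placeholder, and zero loss is assumed for correct decisions):

```python
def risks(p_c1: float, lam12: float = 9.0, lam21: float = 90.0):
    """Expected risks of choosing C1 vs. C2, given P(C1 | x)."""
    p_c2 = 1.0 - p_c1
    r1 = lam12 * p_c2  # risk of choosing C1 when the truth may be C2
    r2 = lam21 * p_c1  # risk of choosing C2 when the truth may be C1
    return r1, r2

print(risks(0.1))  # placeholder posterior P(C1 | x) = 0.1
```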
Losses and Risks: Reject
$$\lambda_{ik} = \begin{cases} 0 & \text{if } i = k \\ \lambda & \text{if } i = K + 1, \quad 0 < \lambda < 1 \\ 1 & \text{otherwise} \end{cases}$$
Risk for reject:
$$R(\alpha_{K+1} \mid x) = \sum_{k=1}^{K} \lambda\,P(C_k \mid x) = \lambda$$
$$R(\alpha_i \mid x) = \sum_{k \ne i} P(C_k \mid x) = 1 - P(C_i \mid x)$$
$$\text{choose } C_i \text{ if } P(C_i \mid x) > P(C_k \mid x) \;\; \forall k \ne i \text{ and } P(C_i \mid x) > 1 - \lambda; \quad \text{reject otherwise}$$
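A sketch of the reject rule in Python ($\lambda$ and the posteriors are assumed values for illustration):

```python
import numpy as np

def decide(posteriors: np.ndarray, lam: float):
    """Pick the max-posterior class unless its posterior <= 1 - lam."""
    i = int(np.argmax(posteriors))
    return i if posteriors[i] > 1 - lam else "reject"

posteriors = np.array([0.45, 0.40, 0.15])
print(decide(posteriors, lam=0.1))  # 'reject': 0.45 <= 0.9
print(decide(posteriors, lam=0.7))  # 0: 0.45 > 0.3
```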
Discriminant Functions
Set up discriminant functions $g_i(x)$, $i = 1, \ldots, K$, and
$$\text{choose } C_i \text{ if } g_i(x) = \max_k g_k(x)$$
Possible choices:
$$g_i(x) = \begin{cases} -R(\alpha_i \mid x) \\ P(C_i \mid x) \\ p(x \mid C_i)\,P(C_i) \end{cases}$$
This yields $K$ decision regions $\mathcal{R}_1, \ldots, \mathcal{R}_K$ with
$$\mathcal{R}_i = \{x \mid g_i(x) = \max_k g_k(x)\}$$
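A sketch using $g_i(x) = p(x \mid C_i)\,P(C_i)$ with assumed Gaussian class-conditionals (the means, shared variance, and priors are all illustrative, not from the slides):

```python
import numpy as np
from scipy.stats import norm

means = np.array([-1.0, 0.5, 2.0])  # assumed class-conditional means
sigma = 1.0                         # assumed shared standard deviation
priors = np.array([0.5, 0.3, 0.2])

def g(x: float) -> np.ndarray:
    """Discriminants g_i(x) = p(x | C_i) P(C_i)."""
    return norm.pdf(x, loc=means, scale=sigma) * priors

x = 1.2
print(np.argmax(g(x)))  # index of the chosen class / decision region
```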
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)