
Institute of Empirical Research in Economics (IEW)
Laboratory for Social & Neural Systems Research (SNS)

PATTERN RECOGNITION AND MACHINE LEARNING

Computational Neuroeconomics and Neuroscience
22-09-2010
Course schedule

Date        Topic                                    Presenters                                        Chapter
13-10-2010  Density Estimation, Bayesian Inference   Adrian Etter, Marco Piccirelli, Giuseppe Ugazio   2
20-10-2010  Linear Models for Regression             Susanne Leiberg, Grit Hein                        3
27-10-2010  Linear Models for Classification         Friederike Meyer, Chaohui Guo                     4
03-11-2010  Kernel Methods I: Gaussian Processes     Kate Lomakina                                     6
10-11-2010  Kernel Methods II: SVM and RVM           Christoph Mathys, Morteza Moazami                 7
17-11-2010  Probabilistic Graphical Models           Justin Chumbley                                   8
Course schedule

Date        Topic                                                                   Presenters                      Chapter
24-11-2010  Mixture Models and EM                                                   Bastiaan Oud, Tony Williams     9
01-12-2010  Approximate Inference I: Deterministic Approximations                   Falk Lieder                     10
08-12-2010  Approximate Inference II: Stochastic Approximations                     Kay Brodersen                   11
15-12-2010  Inference on Continuous Latent Variables: PCA, Probabilistic PCA, ICA   Lars Kasper                     12
22-12-2010  Sequential Data: Hidden Markov Models, Linear Dynamical Systems         Chris Burke, Yosuke Morishima   13
CHAPTER 1: PROBABILITY, DECISION, AND INFORMATION THEORY
Sandra Iglesias
Institute of Empirical Research in Economics (IEW)
Laboratory for Social & Neural Systems Research (SNS)
Outline
- Introduction
- Probability Theory
  - Probability Rules
  - Bayes' Theorem
  - Gaussian Distribution
- Decision Theory
- Information Theory
Pattern recognition
Computer algorithms for
→ the automatic discovery of regularities in data
→ the use of these regularities to take actions, such as classifying the data into different categories
Data (patterns) are classified based either on
- a priori knowledge, or
- statistical information extracted from the patterns
Machine learning
"How can we program systems to automatically learn and to improve with experience?"
The machine is programmed to learn from an incomplete set of examples (the training set).
The core objective of a learner is to generalize from its experience.
Polynomial Curve Fitting
Training data: inputs x_n with noisy observations t_n of the underlying function sin(2πx).
Model: y(x, w) = w_0 + w_1 x + … + w_M x^M = Σ_{j=0}^{M} w_j x^j
Sum-of-Squares Error Function
E(w) = (1/2) Σ_{n=1}^{N} { y(x_n, w) − t_n }²
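As an illustration (not from the slides), here is a minimal pure-Python sketch of polynomial curve fitting: it minimizes the sum-of-squares error E(w) in closed form via the normal equations and reports the RMS error for several model orders M. The helper names (`fit_poly`, `sum_sq_error`) are hypothetical.

```python
import math, random

def fit_poly(xs, ts, M):
    """Least-squares fit of an M-th order polynomial: minimizes
    E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2 via the normal equations."""
    n = M + 1
    A = [[x ** j for j in range(n)] for x in xs]          # design matrix
    AtA = [[sum(A[r][i] * A[r][j] for r in range(len(xs))) for j in range(n)] for i in range(n)]
    Att = [sum(A[r][i] * ts[r] for r in range(len(xs))) for i in range(n)]
    # Solve (A^T A) w = A^T t by Gaussian elimination with partial pivoting.
    G = [AtA[i] + [Att[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(G[r][c]))
        G[c], G[p] = G[p], G[c]
        for r in range(c + 1, n):
            f = G[r][c] / G[c][c]
            for k in range(c, n + 1):
                G[r][k] -= f * G[c][k]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (G[i][n] - sum(G[i][j] * w[j] for j in range(i + 1, n))) / G[i][i]
    return w

def predict(w, x):
    return sum(wj * x ** j for j, wj in enumerate(w))

def sum_sq_error(w, xs, ts):
    """E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2"""
    return 0.5 * sum((predict(w, x) - t) ** 2 for x, t in zip(xs, ts))

if __name__ == "__main__":
    random.seed(0)
    xs = [i / 9 for i in range(10)]                       # N = 10 inputs in [0, 1]
    ts = [math.sin(2 * math.pi * x) + random.gauss(0, 0.3) for x in xs]
    for M in (0, 1, 3, 9):
        w = fit_poly(xs, ts, M)
        e_rms = math.sqrt(2 * sum_sq_error(w, xs, ts) / len(xs))
        print(f"M={M}: training E_RMS = {e_rms:.4f}")
```

The training error shrinks as M grows (at M = 9 the polynomial can interpolate all ten points), which is exactly the over-fitting behaviour discussed on the next slides.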
Plots of polynomials
[Figure: polynomial fits of increasing order M to the training data]
Over-fitting
Root-Mean-Square (RMS) error: E_RMS = √( 2 E(w*) / N )
Regularization
Penalize large coefficient values:
Ẽ(w) = (1/2) Σ_{n=1}^{N} { y(x_n, w) − t_n }² + (λ/2) ‖w‖²
[Figure: two M = 9 fits with different regularization strengths λ]
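A hedged sketch of this penalized fit (again pure Python, hypothetical helper name `fit_poly_ridge`): adding the quadratic penalty (λ/2)‖w‖² simply adds λ to the diagonal of the normal equations, (AᵀA + λI) w = Aᵀt, shrinking the coefficients as λ grows.

```python
import math

def fit_poly_ridge(xs, ts, M, lam):
    """Minimizes E~(w) = 1/2 sum_n (y(x_n, w) - t_n)^2 + lam/2 * ||w||^2
    in closed form via (A^T A + lam * I) w = A^T t."""
    n = M + 1
    A = [[x ** j for j in range(n)] for x in xs]
    AtA = [[sum(A[r][i] * A[r][j] for r in range(len(xs))) + (lam if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    Att = [sum(A[r][i] * ts[r] for r in range(len(xs))) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    G = [AtA[i] + [Att[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(G[r][c]))
        G[c], G[p] = G[p], G[c]
        for r in range(c + 1, n):
            f = G[r][c] / G[c][c]
            for k in range(c, n + 1):
                G[r][k] -= f * G[c][k]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (G[i][n] - sum(G[i][j] * w[j] for j in range(i + 1, n))) / G[i][i]
    return w

xs = [i / 9 for i in range(10)]
ts = [math.sin(2 * math.pi * x) for x in xs]
for lam in (1e-6, 1e-2, 10.0):
    w = fit_poly_ridge(xs, ts, 9, lam)
    print(f"lambda={lam}: ||w||^2 = {sum(wi * wi for wi in w):.3g}")
```

Larger λ trades training fit for smaller coefficients, which is what tames the wild M = 9 solution on the slide.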
Regularization (M = 9)
[Figure: comparison of fits for different values of λ]
Outline
- Introduction
- Probability Theory
- Decision Theory
- Information Theory
Probability Theory
Uncertainty arises from noise on measurements and from the finite size of data sets.
Probability theory
→ a consistent framework for the quantification and manipulation of uncertainty
Probability Theory
[Figure: joint distribution over two variables, illustrating marginal, joint, and conditional probability]
Probability Theory
Two random variables: X takes the values x_i, i = 1, …, M; Y takes the values y_j, j = 1, …, L.
n_ij: number of trials in which X = x_i and Y = y_j
c_i: number of trials in which X = x_i irrespective of the value of Y
r_j: number of trials in which Y = y_j irrespective of the value of X
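The counting argument above can be sketched in a few lines of Python (an illustration, not from the slides): with N trials, p(X = x_i, Y = y_j) ≈ n_ij / N, p(X = x_i) ≈ c_i / N, and p(Y = y_j | X = x_i) ≈ n_ij / c_i. The distribution used here is a made-up example.

```python
import random
from collections import Counter

random.seed(1)
# A made-up joint distribution over (X, Y); we simulate N trials from it.
joint_true = {("a", 0): 0.1, ("a", 1): 0.3, ("b", 0): 0.4, ("b", 1): 0.2}
outcomes = list(joint_true)
N = 50_000
draws = random.choices(outcomes, weights=[joint_true[o] for o in outcomes], k=N)

n = Counter(draws)                     # n_ij: count of (X = x_i, Y = y_j)
c = Counter(x for x, _ in draws)       # c_i:  count of X = x_i, irrespective of Y

p_joint = {o: n[o] / N for o in outcomes}            # p(x_i, y_j) = n_ij / N
p_x = {x: c[x] / N for x in ("a", "b")}              # p(x_i)      = c_i / N
p_y_given_x = {o: n[o] / c[o[0]] for o in outcomes}  # p(y_j|x_i)  = n_ij / c_i

print(p_joint)
print(p_x)
```

As N grows, the count-based estimates approach the true probabilities.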
Probability Theory
Sum Rule: p(X) = Σ_Y p(X, Y)
Probability Theory
Product Rule: p(X, Y) = p(Y|X) p(X)
The Rules of Probability
Sum Rule: p(X) = Σ_Y p(X, Y)
Product Rule: p(X, Y) = p(Y|X) p(X)
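Both rules can be checked mechanically on any small joint table. A minimal sketch (the weather distribution is invented for illustration):

```python
# Verify the sum and product rules on a small joint distribution p(X, Y).
p_joint = {
    ("sun", "warm"): 0.4, ("sun", "cold"): 0.1,
    ("rain", "warm"): 0.1, ("rain", "cold"): 0.4,
}
xs = {"sun", "rain"}
ys = {"warm", "cold"}

# Sum rule: p(X) = sum_Y p(X, Y)
p_x = {x: sum(p_joint[(x, y)] for y in ys) for x in xs}

# Product rule: p(X, Y) = p(Y | X) p(X)
p_y_given_x = {(x, y): p_joint[(x, y)] / p_x[x] for x in xs for y in ys}
for (x, y), p in p_joint.items():
    assert abs(p - p_y_given_x[(x, y)] * p_x[x]) < 1e-12

print(sorted(p_x.items()))  # [('rain', 0.5), ('sun', 0.5)]
```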
Bayes' Theorem
From the product rule and the symmetry p(X, Y) = p(Y, X):
p(Y|X) = p(X|Y) p(Y) / p(X), with p(X) = Σ_Y p(X|Y) p(Y)
T. Bayes (1702-1761)
P.-S. Laplace (1749-1827)
Bayes' Theorem
For the polynomial curve fitting problem:
p(w|D) = p(D|w) p(w) / p(D)
posterior ∝ likelihood × prior
Probability Densities
For a continuous variable x, p(x) is a probability density: p(x ∈ (a, b)) = ∫_a^b p(x) dx, with p(x) ≥ 0 and ∫ p(x) dx = 1.
Expectations
The expectation of f(x) is the average value of the function f(x) under a probability distribution p(x).
Expectation for a discrete distribution: E[f] = Σ_x p(x) f(x)
Expectation for a continuous distribution: E[f] = ∫ p(x) f(x) dx
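Both forms are easy to compute or approximate. A short sketch (the distribution and f are invented for illustration): the discrete sum is evaluated exactly, and the continuous integral is approximated by sampling, E[f] ≈ (1/N) Σ_n f(x_n) with x_n drawn from p(x).

```python
import random

f = lambda x: x * x

# Discrete expectation: E[f] = sum_x p(x) f(x)
p = {1: 0.2, 2: 0.5, 3: 0.3}
E_discrete = sum(px * f(x) for x, px in p.items())   # 0.2*1 + 0.5*4 + 0.3*9 = 4.9
print(E_discrete)

# Continuous expectation E[f] = ∫ p(x) f(x) dx, approximated by sampling
# from p(x) (here uniform on [0, 1]): E[f] ≈ (1/N) sum_n f(x_n).
random.seed(0)
samples = [random.uniform(0.0, 1.0) for _ in range(100_000)]
E_mc = sum(f(x) for x in samples) / len(samples)
print(E_mc)  # close to ∫_0^1 x^2 dx = 1/3
```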
The Gaussian Distribution
N(x | μ, σ²) = (1 / √(2πσ²)) exp( −(x − μ)² / (2σ²) )
Gaussian Parameter Estimation
Likelihood function for i.i.d. observations x = (x_1, …, x_N):
p(x | μ, σ²) = Π_{n=1}^{N} N(x_n | μ, σ²)
Maximum (Log) Likelihood
ln p(x | μ, σ²) = −(1/(2σ²)) Σ_{n=1}^{N} (x_n − μ)² − (N/2) ln σ² − (N/2) ln 2π
Maximizing with respect to μ and σ² gives:
μ_ML = (1/N) Σ_{n=1}^{N} x_n
σ²_ML = (1/N) Σ_{n=1}^{N} (x_n − μ_ML)²
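These maximum-likelihood formulas can be sketched directly (illustration only; function names are hypothetical): μ_ML is the sample mean and σ²_ML the (biased) sample variance, and nudging either away from its ML value lowers the log likelihood.

```python
import math, random

def gauss_pdf(x, mu, var):
    """N(x | mu, sigma^2) = 1/sqrt(2*pi*sigma^2) * exp(-(x - mu)^2 / (2*sigma^2))"""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def ml_estimates(data):
    """Maximum-likelihood estimates: sample mean and biased sample variance."""
    N = len(data)
    mu = sum(data) / N
    var = sum((x - mu) ** 2 for x in data) / N
    return mu, var

def log_likelihood(data, mu, var):
    return sum(math.log(gauss_pdf(x, mu, var)) for x in data)

random.seed(0)
data = [random.gauss(2.0, 0.5) for _ in range(10_000)]
mu_ml, var_ml = ml_estimates(data)
print(mu_ml, var_ml)   # close to mu = 2.0, sigma^2 = 0.25

# The ML estimates maximize the log likelihood: perturbing them lowers it.
assert log_likelihood(data, mu_ml, var_ml) >= log_likelihood(data, mu_ml + 0.1, var_ml)
assert log_likelihood(data, mu_ml, var_ml) >= log_likelihood(data, mu_ml, var_ml * 1.2)
```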
Curve Fitting Re-visited
Treat the target as the polynomial prediction plus Gaussian noise:
p(t | x, w, β) = N( t | y(x, w), β⁻¹ ), where β is the noise precision (inverse variance).
Maximum Likelihood
Determine w_ML by minimizing the sum-of-squares error, E(w).
Outline
- Introduction
- Probability Theory
- Decision Theory
- Information Theory
Decision Theory
• Used with probability theory to make optimal decisions
• Input vector x, target vector t
• Regression: t is continuous
• Classification: t consists of class labels
• The summary of the associated uncertainty is given by p(x, t)
• Inference problem: obtain p(x, t) from data
• Decision problem: make a specific prediction for the value of t, and take specific actions based on it

Inference step: determine either p(t|x) or p(x, t).
Decision step: for a given x, determine the optimal t.
Medical Diagnosis Problem
• X-ray image of a patient
• Decide whether the patient has cancer or not
• Input vector x: set of pixel intensities
• Output variable t: whether cancer or not
• C1 = cancer; C2 = no cancer
• The general inference problem is to determine p(x, C_k), which gives the most complete description of the situation
• In the end we need to decide whether to give treatment or not
→ decision theory helps us do this
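A worked number for this setup (all probabilities below are hypothetical, chosen only to illustrate Bayes' theorem, not taken from the slides): even a fairly accurate test yields a modest posterior when the prior probability of cancer is low.

```python
# Hypothetical numbers for illustration:
p_cancer = 0.01                  # prior p(C1)
p_pos_given_cancer = 0.9         # likelihood p(x = positive | C1)
p_pos_given_healthy = 0.05       # false-positive rate p(x = positive | C2)

# Sum rule gives the evidence p(x = positive):
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_healthy * (1 - p_cancer)

# Bayes' theorem: p(C1 | x) = p(x | C1) p(C1) / p(x)
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
print(f"p(cancer | positive test) = {p_cancer_given_pos:.3f}")
```

Despite the 90% sensitivity, the posterior is only about 15%, because the 1% prior dominates; this is why the decision step must weigh these posteriors against the costs of each action.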
Bayes' Decision
• How do probabilities play a role in making a decision?
• Given input x and classes C_k, using Bayes' theorem:
p(C_k | x) = p(x | C_k) p(C_k) / p(x)
• All quantities in Bayes' theorem can be obtained from the joint distribution p(x, C_k), either by marginalizing or by conditioning with respect to the appropriate variable
Minimum Expected Loss
Example: classify medical images as 'cancer' or 'normal'
[Loss matrix: rows = truth, columns = decision]
• Mistakes are of unequal importance
• The loss (or cost) function is given by the loss matrix
• Utility is the negative of loss
• Minimize the average loss: E[L] = Σ_k Σ_j ∫_{R_j} L_kj p(x, C_k) dx
• The decision regions R_j are chosen to minimize E[L]
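For a single input x this reduces to picking the decision j that minimizes Σ_k L_kj p(C_k | x). A minimal sketch, with hypothetical loss values (a missed cancer penalized far more heavily than a false alarm):

```python
# Choosing the decision that minimizes expected loss, given posteriors.
# The loss-matrix values are hypothetical, chosen for illustration.
classes = ["cancer", "normal"]                 # possible true classes C_k
loss = {("cancer", "normal"): 1000.0,          # L_kj: truth "cancer", decide "normal"
        ("normal", "cancer"): 1.0,             # truth "normal", decide "cancer"
        ("cancer", "cancer"): 0.0,
        ("normal", "normal"): 0.0}

def best_decision(posterior):
    """Pick the decision j minimizing sum_k L_kj * p(C_k | x)."""
    def expected_loss(decision):
        return sum(loss[(k, decision)] * posterior[k] for k in classes)
    return min(classes, key=expected_loss)

# Even with only a 1% posterior probability of cancer, the asymmetric
# loss makes "cancer" (i.e. treat / investigate further) optimal:
print(best_decision({"cancer": 0.01, "normal": 0.99}))      # cancer
print(best_decision({"cancer": 0.0001, "normal": 0.9999}))  # normal
```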
Why Separate Inference and Decision?
The classification problem is broken into two separate stages:
– Inference stage: training data is used to learn a model for the posterior
p(C_k | x) = p(x | C_k) p(C_k) / p(x)
– Decision stage: the posterior probabilities are used to make optimal class assignments
Three distinct approaches to solving decision problems:
1. Generative models
2. Discriminative models
3. Discriminant functions
Generative models
1. Solve the inference problem of determining the class-conditional densities p(x | C_k) for each class separately, and the prior probabilities p(C_k); equivalently, model the joint distribution p(x, C_k) directly.
2. Use Bayes' theorem to determine the posterior probabilities:
p(C_k | x) = p(x | C_k) p(C_k) / p(x)
3. Use decision theory to determine class membership.
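The three steps above can be sketched for a toy one-dimensional problem (all priors and class-conditional parameters below are hypothetical): model p(x | C_k) as Gaussians, combine with priors via Bayes' theorem, then assign the most probable class.

```python
import math

# Step 1 (inference): class priors p(C_k) and Gaussian class-conditional
# densities p(x | C_k); the numbers are hypothetical.
priors = {"C1": 0.3, "C2": 0.7}
params = {"C1": (0.0, 1.0), "C2": (2.0, 1.0)}   # (mean, variance) per class

def gauss(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior(x):
    """Step 2: Bayes' theorem, p(C_k | x) = p(x | C_k) p(C_k) / p(x)."""
    joint = {k: gauss(x, *params[k]) * priors[k] for k in priors}  # p(x, C_k)
    evidence = sum(joint.values())                                 # p(x), sum rule
    return {k: v / evidence for k, v in joint.items()}

def classify(x):
    """Step 3 (decision): assign the most probable class."""
    post = posterior(x)
    return max(post, key=post.get)

print(posterior(1.0))
print(classify(-1.0), classify(3.0))  # C1 C2
```

A discriminative model would instead fit p(C_k | x) directly, and a discriminant function would map x straight to a label with no probabilities at all.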
Discriminative models
1. Solve the inference problem of determining the posterior class probabilities p(C_k | x) directly.
2. Use decision theory to determine class membership.
Discriminant functions
1. Find a function f(x) that maps each input x directly to a class label.
e.g. two-class problem: f(·) is binary valued;
f = 0 represents C1, f = 1 represents C2
→ probabilities play no role
Decision Theory for Regression
Inference step: determine p(x, t).
Decision step: for a given x, make an optimal prediction, y(x), for t.
Loss function: L(t, y(x)); expected loss E[L] = ∫∫ L(t, y(x)) p(x, t) dx dt.
For the squared loss L = { y(x) − t }², the optimal prediction is the conditional mean, y(x) = E[t | x].
Outline
- Introduction
- Probability Theory
- Decision Theory
- Information Theory
Information theory
• Quantification of information
• Degree of surprise:
highly improbable event → a lot of information
highly probable event → less information
certain event → no information
• Based on probability theory
• Most important quantity: entropy
Entropy
Entropy is the average amount of information expected, weighted by the probability of the random variable:
H[x] = −Σ_x p(x) ln p(x)
→ it quantifies the uncertainty involved when we encounter this random variable.
[Figure: entropy H[x] as a function of p(x)]
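A minimal sketch of the entropy formula (illustration only; using base-2 logs so the result is in bits):

```python
import math

def entropy(p, base=2.0):
    """H[x] = -sum_x p(x) log p(x); terms with p(x) = 0 contribute nothing."""
    return -sum(px * math.log(px, base) for px in p if px > 0)

print(entropy([0.5, 0.5]))    # 1 bit: maximal uncertainty for two outcomes
print(entropy([1.0, 0.0]))    # 0 bits: a certain event carries no information
print(entropy([0.25] * 4))    # 2 bits: uniform over four outcomes
```

The uniform distribution maximizes entropy, matching the slide's point that the more surprising (improbable) an outcome, the more information it carries.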
The Kullback-Leibler Divergence
• A non-symmetric measure of the difference between two probability distributions p(x) and q(x):
KL(p‖q) = −∫ p(x) ln{ q(x) / p(x) } dx
• Also called relative entropy
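For discrete distributions the integral becomes a sum. A short sketch (the two distributions are invented) showing the key properties: KL(p‖q) ≥ 0, it vanishes iff p = q, and it is not symmetric.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_x p(x) ln( p(x) / q(x) )  (natural log, in nats)."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.4, 0.1]
q = [1 / 3, 1 / 3, 1 / 3]
print(kl_divergence(p, q))                          # positive
print(kl_divergence(p, p))                          # 0: equal distributions
print(kl_divergence(p, q) == kl_divergence(q, p))   # False: not symmetric
```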
Mutual Information
Two sets of variables: x and y.
If independent: p(x, y) = p(x) p(y)
If not independent, their dependence is measured by the mutual information:
I[x, y] = KL( p(x, y) ‖ p(x) p(y) )
Mutual Information
I[x, y] = H[x] − H[x|y] = H[y] − H[y|x]
→ mutual dependence
→ shared information
→ related to the conditional entropy
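The KL form of the mutual information can be sketched directly from a joint table (the two example distributions are invented): it is zero for independent variables and equals H[x] = ln 2 nats when y is a deterministic copy of a fair binary x.

```python
import math

def mutual_information(p_joint):
    """I[x, y] = KL( p(x,y) || p(x) p(y) ): zero iff x and y are independent."""
    xs = {x for x, _ in p_joint}
    ys = {y for _, y in p_joint}
    p_x = {x: sum(p_joint.get((x, y), 0.0) for y in ys) for x in xs}
    p_y = {y: sum(p_joint.get((x, y), 0.0) for x in xs) for y in ys}
    return sum(p * math.log(p / (p_x[x] * p_y[y]))
               for (x, y), p in p_joint.items() if p > 0)

# Independent joint: p(x, y) = p(x) p(y)  ->  I[x, y] = 0
indep = {(x, y): px * py
         for x, px in [(0, 0.5), (1, 0.5)]
         for y, py in [(0, 0.2), (1, 0.8)]}
# Fully dependent (y = x, fair coin)  ->  I[x, y] = H[x] = ln 2 nats
dep = {(0, 0): 0.5, (1, 1): 0.5}

print(mutual_information(indep))
print(mutual_information(dep))   # ln 2 ≈ 0.693
```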
Course schedule

Date        Topic                                                                   Chapter
22-09-2010  Probability, Decision, and Information Theory                          1
13-10-2010  Density Estimation, Bayesian Inference                                 2
20-10-2010  Linear Models for Regression                                           3
27-10-2010  Linear Models for Classification                                       4
03-11-2010  Kernel Methods I: Gaussian Processes                                   6
10-11-2010  Kernel Methods II: SVM and RVM                                         7
17-11-2010  Probabilistic Graphical Models                                         8
24-11-2010  Mixture Models and EM                                                  9
01-12-2010  Approximate Inference I: Deterministic Approximations                  10
08-12-2010  Approximate Inference II: Stochastic Approximations                    11
15-12-2010  Inference on Continuous Latent Variables: PCA, Probabilistic PCA, ICA  12
22-12-2010  Sequential Data: Hidden Markov Models, Linear Dynamical Systems        13