Gaussian Process


Kernel Methods – Gaussian Processes
Presented by Shankar Bhargav
Arizona State University
DMML
Gaussian Processes
• Extending the role of kernels to probabilistic discriminative models leads to the framework of Gaussian processes
• Linear regression model
– Evaluate the posterior distribution over w
• Gaussian processes: define a probability distribution over functions directly
Linear regression
Model: y(x) = wᵀφ(x)
x – input vector
w – M-dimensional weight vector
The prior distribution over w is given by the Gaussian form p(w) = N(w | 0, α⁻¹ I).
This prior distribution over w induces a probability distribution over the function y(x).
Linear regression
y = Φw is a linear combination of Gaussian distributed variables given by the elements of w, where Φ is the design matrix with elements Φ_nk = φ_k(x_n).
Hence y is itself Gaussian, and we need only its mean and covariance to find the joint distribution of y:
E[y] = Φ E[w] = 0
cov[y] = E[y yᵀ] = (1/α) Φ Φᵀ = K
where K is the Gram matrix with elements K_nm = k(x_n, x_m) = (1/α) φ(x_n)ᵀφ(x_m).
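As a check of this weight-space view, here is a minimal sketch (the Gaussian basis functions, their centres, and the value of α are illustrative assumptions, not from the slides): draw many weight vectors from p(w) = N(w | 0, α⁻¹ I), form y = Φw, and compare the empirical covariance of y with the Gram matrix K = (1/α) Φ Φᵀ.

```python
import numpy as np

# Gaussian basis functions phi_k(x) centred on a grid (an illustrative choice)
centres = np.linspace(-1.0, 1.0, 12)             # M = 12 basis functions
def design_matrix(x, width=0.3):
    """Design matrix Phi with elements Phi[n, k] = phi_k(x_n)."""
    return np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2.0 * width ** 2))

alpha = 2.0                                      # prior precision of w
x = np.linspace(-1.0, 1.0, 5)                    # a small set of input points
Phi = design_matrix(x)                           # N x M
K = Phi @ Phi.T / alpha                          # Gram matrix, K_nm = (1/alpha) phi(x_n)^T phi(x_m)

# draw weight vectors w ~ N(0, alpha^{-1} I) and form y = Phi w
rng = np.random.default_rng(0)
W = rng.normal(scale=1.0 / np.sqrt(alpha), size=(100_000, len(centres)))
Y = W @ Phi.T                                    # each row is one sampled function evaluated at x

print(np.cov(Y, rowvar=False).round(3))          # empirical covariance of y
print(K.round(3))                                # should agree with K up to sampling noise
```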
Gaussian Processes
• Definition: a Gaussian process is a probability distribution over functions y(x) such that the set of values of y(x) evaluated at an arbitrary set of points jointly has a Gaussian distribution.
– The mean is assumed to be zero.
– The covariance of y(x) evaluated at any two values of x is given by the kernel function: E[y(x_n) y(x_m)] = k(x_n, x_m).
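A minimal sketch of this function-space view, assuming a squared-exponential kernel and an arbitrary grid of evaluation points: build the covariance matrix from the kernel, then draw sample functions from the zero-mean joint Gaussian.

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=0.5):
    """Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 l^2))."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-d2 / (2.0 * length_scale ** 2))

x = np.linspace(-3.0, 3.0, 100)                  # arbitrary set of evaluation points
K = rbf_kernel(x, x) + 1e-8 * np.eye(len(x))     # small jitter keeps the covariance numerically positive definite

# y evaluated at these points is jointly Gaussian: y ~ N(0, K)
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mean=np.zeros(len(x)), cov=K, size=3)
print(samples.shape)                             # 3 sample functions, each evaluated at 100 points
```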
Gaussian Processes for regression
To apply Gaussian process models to regression we need to take account of noise on the observed target values: t_n = y_n + ε_n.
Consider noise processes with a Gaussian distribution, p(t_n | y_n) = N(t_n | y_n, β⁻¹), so that p(t | y) = N(t | y, β⁻¹ I_N).
To find the marginal distribution over t we integrate over y:
p(t) = ∫ p(t | y) p(y) dy = N(t | 0, C)
where the covariance matrix C has elements C(x_n, x_m) = k(x_n, x_m) + β⁻¹ δ_nm.
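The effect of the noise term can be checked numerically; the sketch below, with an illustrative kernel and noise precision β, draws y ~ N(0, K), adds Gaussian noise, and compares the empirical covariance of t with C = K + β⁻¹ I.

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=0.5):
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-d2 / (2.0 * length_scale ** 2))

beta = 4.0                                       # noise precision (illustrative value)
x = np.linspace(-1.0, 1.0, 4)
K = rbf_kernel(x, x)
C = K + np.eye(len(x)) / beta                    # C_nm = k(x_n, x_m) + (1/beta) delta_nm

rng = np.random.default_rng(0)
Y = rng.multivariate_normal(np.zeros(len(x)), K, size=200_000)  # y ~ N(0, K)
T = Y + rng.normal(scale=1.0 / np.sqrt(beta), size=Y.shape)     # t_n = y_n + eps_n
print(np.cov(T, rowvar=False).round(3))          # empirical covariance of t, should approach C
print(C.round(3))
```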
Gaussian Processes for regression
The joint distribution over t_{N+1} = (t_1, …, t_{N+1})ᵀ is given by
p(t_{N+1}) = N(t_{N+1} | 0, C_{N+1}),  where  C_{N+1} = [ C_N  k ; kᵀ  c ]
Here C_N is the N×N covariance matrix defined above, k is the vector with elements k(x_n, x_{N+1}), and c = k(x_{N+1}, x_{N+1}) + β⁻¹.
The conditional distribution p(t_{N+1} | t) is a Gaussian distribution with mean and covariance given by
m(x_{N+1}) = kᵀ C_N⁻¹ t
σ²(x_{N+1}) = c − kᵀ C_N⁻¹ k
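A minimal sketch of these predictive equations, assuming a squared-exponential kernel, a chosen noise precision β, and toy data drawn from a sine function (all illustrative, not from the slides):

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=0.5):
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-d2 / (2.0 * length_scale ** 2))

# toy training data (illustrative): noisy observations of sin(x)
rng = np.random.default_rng(0)
beta = 25.0                                      # noise precision
x_train = rng.uniform(-3.0, 3.0, 20)
t_train = np.sin(x_train) + rng.normal(scale=1.0 / np.sqrt(beta), size=20)

C_N = rbf_kernel(x_train, x_train) + np.eye(20) / beta   # N x N covariance matrix

def predict(x_new):
    """Predictive mean and variance at a single new input x_{N+1}."""
    k = rbf_kernel(x_train, np.array([x_new]))[:, 0]     # k_n = k(x_n, x_{N+1})
    c = 1.0 + 1.0 / beta                                  # k(x_new, x_new) + 1/beta for this kernel
    mean = k @ np.linalg.solve(C_N, t_train)              # k^T C_N^{-1} t
    var = c - k @ np.linalg.solve(C_N, k)                 # c - k^T C_N^{-1} k
    return mean, var

print(predict(0.5))                              # close to sin(0.5), with a small variance
```

In practice one would factorise C_N once (e.g. with a Cholesky decomposition) and reuse it across test inputs rather than solving the linear system repeatedly.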
Learning the hyperparameters
• Rather than fixing the covariance function, we can use a parametric family of functions and then infer the parameter values from the data.
• This requires evaluation of the likelihood function p(t | θ), where θ denotes the hyperparameters of the Gaussian process model.
• The simplest approach is to make a point estimate of θ by maximizing the log likelihood function
ln p(t | θ) = −(1/2) ln |C_N| − (1/2) tᵀ C_N⁻¹ t − (N/2) ln(2π)
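A minimal sketch of this point-estimate approach, assuming θ consists of an RBF length scale and the noise precision β, with toy data and optimisation in log space via scipy (all illustrative choices):

```python
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X1, X2, length_scale):
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-d2 / (2.0 * length_scale ** 2))

# toy data (illustrative)
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, 30)
t = np.sin(x) + rng.normal(scale=0.2, size=30)

def neg_log_likelihood(log_theta):
    """Negative of ln p(t | theta) = -1/2 ln|C_N| - 1/2 t^T C_N^{-1} t - N/2 ln(2 pi)."""
    length_scale, beta = np.exp(log_theta)       # optimise in log space so both stay positive
    C = rbf_kernel(x, x, length_scale) + np.eye(len(x)) / beta
    _, logdet = np.linalg.slogdet(C)
    quad = t @ np.linalg.solve(C, t)
    return 0.5 * (logdet + quad + len(x) * np.log(2.0 * np.pi))

res = minimize(neg_log_likelihood, x0=np.log([1.0, 1.0]), method="L-BFGS-B")
print("length scale, noise precision:", np.exp(res.x))
```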
Gaussian Process for classification
• We can adapt Gaussian processes to classification problems by transforming the output using an appropriate nonlinear activation function.
– Define a Gaussian process over a function a(x) and transform it using the logistic sigmoid y = σ(a); we obtain a non-Gaussian stochastic process over functions y(x) ∈ (0, 1).
The left plot shows a sample from the Gaussian process prior over functions a(x); the right plot shows the result of transforming this sample using a logistic sigmoid function (one-dimensional input space).
The probability distribution over the target variable t is then given by the Bernoulli distribution p(t | a) = σ(a)ᵗ (1 − σ(a))^(1−t).
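The construction behind this figure can be sketched in a few lines (the kernel, length scale, and input grid are illustrative assumptions): draw a(x) from the Gaussian process prior on a one-dimensional grid and pass the sample through the logistic sigmoid to obtain values in (0, 1).

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=0.5):
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-d2 / (2.0 * length_scale ** 2))

x = np.linspace(-1.0, 1.0, 200)                  # one-dimensional input space
K = rbf_kernel(x, x) + 1e-8 * np.eye(len(x))

rng = np.random.default_rng(1)
a = rng.multivariate_normal(np.zeros(len(x)), K) # sample a(x) from the GP prior (left plot)
y = 1.0 / (1.0 + np.exp(-a))                     # logistic sigmoid of the sample (right plot)
print(a.min(), a.max(), y.min(), y.max())        # y lies strictly in (0, 1)
```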
Gaussian Process for classification
• To determine the predictive distribution p(t_{N+1} = 1 | t_N), we introduce a Gaussian process prior over the vector a_{N+1} = (a_1, …, a_{N+1})ᵀ. The Gaussian prior takes the form
p(a_{N+1}) = N(a_{N+1} | 0, C_{N+1})
where the covariance matrix has elements C(x_n, x_m) = k(x_n, x_m) + ν δ_nm.
• The predictive distribution is then given by
p(t_{N+1} = 1 | t_N) = ∫ σ(a_{N+1}) p(a_{N+1} | t_N) da_{N+1}
Gaussian Process for classification
• This integral is analytically intractable, so it may be approximated using sampling methods.
• Alternatively, techniques based on analytical approximation can be used:
– Variational inference
– Expectation propagation
– Laplace approximation (a minimal sketch follows below)
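As noted above, here is a minimal sketch of the Laplace approximation for the Gaussian process classifier (the RBF kernel, the value of ν, and the toy labels are illustrative assumptions). A Newton iteration finds the mode of p(a_N | t_N) using the update a_new = C_N (I + W_N C_N)⁻¹ (t_N − σ_N + W_N a), which follows from the Gaussian prior and Bernoulli likelihood; the predictive mean of a at a new input is then kᵀ(t_N − σ_N). The remaining one-dimensional integral over a_{N+1} is not evaluated here.

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0):
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-d2 / (2.0 * length_scale ** 2))

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def laplace_mode(C, t, n_iter=25):
    """Newton iteration for the mode of p(a_N | t_N):
    a_new = C (I + W C)^{-1} (t - sigma(a) + W a), with W = diag(sigma(a) (1 - sigma(a)))."""
    N = len(t)
    a = np.zeros(N)
    for _ in range(n_iter):
        s = sigmoid(a)
        W = np.diag(s * (1.0 - s))
        a = C @ np.linalg.solve(np.eye(N) + W @ C, t - s + W @ a)
    return a

# toy one-dimensional data (illustrative): class 1 for x > 0
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, 30)
t = (x > 0).astype(float)

nu = 1e-3                                        # the nu * delta_nm term on the diagonal
C_N = rbf_kernel(x, x) + nu * np.eye(len(x))     # prior covariance over a_N

a_mode = laplace_mode(C_N, t)

# predictive mean of a at a new input: E[a_{N+1} | t_N] ≈ k^T (t_N - sigma(a_mode))
x_new = 0.5
k = rbf_kernel(x, np.array([x_new]))[:, 0]
mean_a = k @ (t - sigmoid(a_mode))
print("sigmoid of the predictive mean at x = 0.5:", sigmoid(mean_a))
```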
Illustration of Gaussian process for classification
The optimal decision boundary is shown in green; the decision boundary obtained from the Gaussian process classifier is shown in black.
Connection to Neural Networks
• For a broad class of prior distributions over w, the distribution of functions generated by a neural network will tend to a Gaussian process as the number of hidden units M → ∞.
• In this Gaussian process limit, the output variables of the neural network become independent.
Thank you