
Lecture 3: Math Primer II
Machine Learning
Andrew Rosenberg
Today
• Wrap up of probability
• Vectors, Matrices
• Calculus
• Differentiation with respect to a vector
Properties of probability density
functions
Sum Rule
Product Rule
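The equations for these rules are not preserved in this transcript; the standard forms, in LaTeX notation, are:

    p(X) = \sum_Y p(X, Y)                 (sum rule)
    p(X, Y) = p(Y \mid X) \, p(X)         (product rule)

For continuous densities the sum is replaced by an integral over Y.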
Expected Values
• Given a random variable with a distribution p(X), what is the expected value of X?
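The defining equation is not shown in this transcript; the standard definitions are:

    E[X] = \sum_x x \, p(x)           (discrete)
    E[X] = \int x \, p(x) \, dx       (continuous)

More generally, the expected value of a function f is E[f] = \sum_x f(x) \, p(x).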
Multinomial Distribution
• If a variable, x, can take 1-of-K states, we
represent the distribution of this variable
as a multinomial distribution.
• The probability of x being in state k is μk
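The equations are not preserved here; in the standard 1-of-K coding (x_k \in \{0, 1\}, \sum_k x_k = 1) the distribution is written:

    p(x \mid \mu) = \prod_{k=1}^{K} \mu_k^{x_k},   with \mu_k \ge 0 and \sum_k \mu_k = 1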
Expected Value of a Multinomial
• The expected value is the vector of mean parameters, E[x] = μ.
Gaussian Distribution
• One Dimension
• D-Dimensions
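The density formulas are not reproduced in this transcript; the standard forms are:

    N(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
    N(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)

where \mu is the mean (a vector in D dimensions) and \sigma^2, or the covariance matrix \Sigma, controls the spread.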
Gaussians
How machine learning uses
statistical modeling
• Expectation
– The expected value of a function is the
hypothesis
• Variance
– The variance is the confidence in that
hypothesis
Variance
• The variance of a random variable describes how much variability there is around the expected value.
• It is calculated as the expected squared deviation from the expected value.
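In equation form:

    Var[X] = E[(X - E[X])^2] = E[X^2] - E[X]^2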
Covariance
• The covariance of two random variables
expresses how they vary together.
• If two variables are independent, their
covariance equals zero.
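In equation form:

    Cov[X, Y] = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X] \, E[Y]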
Linear Algebra
• Vectors
– A one-dimensional array.
– If not specified, assume x is a column vector.
• Matrices
– A two-dimensional array.
– Typically denoted with capital letters.
– n rows by m columns.
Transposition
• Transposing a matrix swaps columns and
rows.
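The example matrices on the slide are not preserved; in index notation (A^T)_{ij} = A_{ji}, so, for instance:

    \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}^T = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}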
Addition
• Two matrices can be added together iff they have the same dimensions.
– A and B are both n-by-m matrices.
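Addition is element-wise:

    (A + B)_{ij} = A_{ij} + B_{ij}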
Multiplication
• To multiply two matrices, the inner dimensions must
be the same.
– An n-by-m matrix can be multiplied by an m-by-k matrix
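The result is an n-by-k matrix with entries:

    (AB)_{ij} = \sum_{l=1}^{m} A_{il} B_{lj}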
Inversion
• The inverse of a square (n-by-n) matrix A, when it exists, is denoted A^-1 and has the following property.
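In equation form:

    A A^{-1} = A^{-1} A = I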
• Here I, the identity matrix, is an n-by-n matrix with ones along the diagonal.
– I_ij = 1 iff i = j, 0 otherwise
Identity Matrix
• Matrices are invariant under multiplication
by the identity matrix.
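That is, for any matrix A and an appropriately sized identity matrix I:

    A I = I A = A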
Helpful matrix inversion properties
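The identities themselves are not preserved in this transcript; commonly listed properties of this kind (for invertible A and B) are:

    (AB)^{-1} = B^{-1} A^{-1}
    (A^T)^{-1} = (A^{-1})^T
    (A^{-1})^{-1} = A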
Norm
• The norm of a vector x represents the Euclidean length of the vector.
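In equation form:

    \|x\| = \sqrt{x^T x} = \sqrt{\sum_i x_i^2}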
Positive Definite-ness
• Quadratic form
– Scalar
– Vector
• Positive Definite matrix M
• Positive Semi-definite
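The equations are not preserved here; the standard definitions are:

    quadratic form (scalar x):   q(x) = m x^2
    quadratic form (vector x):   q(x) = x^T M x
    M is positive definite       iff x^T M x > 0 for all x \ne 0
    M is positive semi-definite  iff x^T M x \ge 0 for all x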
Calculus
• Derivatives and Integrals
• Optimization
Derivatives
• The derivative of a function gives the slope of the function at a point x.
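Formally:

    f'(x) = \frac{df}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}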
Derivative Example
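The worked example from the slide is not preserved; as an illustration:

    f(x) = 3x^2 + 2x + 1
    f'(x) = 6x + 2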
Integrals
• Integration is the inverse operation of differentiation (up to an additive constant).
• Graphically, a definite integral can be considered the area under the curve defined by f(x).
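In equation form:

    \int f'(x) \, dx = f(x) + C
    \int_a^b f(x) \, dx = \text{area under } f(x) \text{ between } x = a \text{ and } x = b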
Integration Example
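The worked example from the slide is not preserved; as an illustration:

    \int_0^2 3x^2 \, dx = \left[ x^3 \right]_0^2 = 8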
Vector Calculus
• Differentiation with respect to a matrix or vector
• Gradient
• Change of Variables with a Vector
Derivative w.r.t. a vector
• Given a vector x, and a function f(x), how
can we find f’(x)?
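For a scalar-valued function f of a vector x, the derivative collects the partial derivatives into a vector:

    \frac{\partial f}{\partial x} = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)^T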
Example Derivation
Also referred to as the gradient of a function.
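The derivation on the slide is not preserved; a standard illustration:

    f(x) = x^T x = \sum_i x_i^2
    \frac{\partial f}{\partial x_j} = 2 x_j   \Rightarrow   \nabla f(x) = 2x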
Useful Vector Calculus identities
• Scalar Multiplication
• Product Rule
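The identities themselves are not shown in this transcript; for scalar functions f and g of a vector x, the corresponding rules are:

    \nabla (c \, f(x)) = c \, \nabla f(x)                               (scalar multiplication)
    \nabla (f(x) \, g(x)) = g(x) \, \nabla f(x) + f(x) \, \nabla g(x)   (product rule)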
Useful Vector Calculus identities
• Derivative of an inverse
• Change of Variable
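The equations are not shown in this transcript; commonly cited forms of these identities are:

    \frac{\partial A^{-1}}{\partial x} = -A^{-1} \frac{\partial A}{\partial x} A^{-1}      (derivative of an inverse)
    p_y(y) = p_x(x) \left| \det \frac{\partial x}{\partial y} \right|, where y = f(x)      (change of variables for a density)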
Optimization
• We have an objective function, f(x), that we would like to maximize or minimize.
• Set the first derivative to zero and solve for x.
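A minimal illustration (not taken from the slides): to maximize f(x) = -(x - 3)^2,

    f'(x) = -2(x - 3) = 0   \Rightarrow   x = 3

and f''(x) = -2 < 0 confirms this is a maximum.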
Optimization with constraints
• What if I want to constrain the parameters of the model?
– For example, the mean is less than 10.
• Find the best likelihood, subject to a
constraint.
• Two functions:
– An objective function to maximize
– An inequality that must be satisfied
Lagrange Multipliers
• Find maxima of f(x,y) subject to a constraint.
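The figure and equations are not preserved; the key idea is that at a constrained optimum the gradient of the objective is parallel to the gradient of the constraint g(x, y) = 0:

    \nabla f(x, y) + \lambda \nabla g(x, y) = 0   for some \lambda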
General form
• Maximizing:
• Subject to:
• Introduce a new variable, λ, and find a maximum.
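The equations are not preserved; the standard statement is: maximize f(x) subject to g(x) = 0 by introducing the multiplier \lambda and finding stationary points of the Lagrangian:

    L(x, \lambda) = f(x) + \lambda \, g(x)
    \nabla_x L = 0   and   \partial L / \partial \lambda = g(x) = 0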
Example
• Maximizing:
• Subject to:
• Introduce a new variable, λ, and find a maximum.
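The example on the slide is not preserved; as an illustration of this setup, consider:

    maximize f(x_1, x_2) = 1 - x_1^2 - x_2^2   subject to   g(x_1, x_2) = x_1 + x_2 - 1 = 0
    L(x_1, x_2, \lambda) = 1 - x_1^2 - x_2^2 + \lambda (x_1 + x_2 - 1)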
Example
Now have 3 equations with 3 unknowns.
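Continuing the illustrative example above, setting the partial derivatives of L to zero gives:

    \partial L / \partial x_1 = -2 x_1 + \lambda = 0
    \partial L / \partial x_2 = -2 x_2 + \lambda = 0
    \partial L / \partial \lambda = x_1 + x_2 - 1 = 0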
Example
Eliminate Lambda
Substitute and Solve
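For the illustrative example, the first two equations give x_1 = x_2; substituting into the constraint yields:

    x_1 = x_2 = 1/2,   \lambda = 1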
Why does Machine Learning need
these tools?
• Calculus
– We need to identify the maximum likelihood or minimum risk solution; this is optimization.
– Integration allows the marginalization of continuous probability density functions.
• Linear Algebra
– Many features lead to high-dimensional spaces.
– Vectors and matrices allow us to compactly describe and manipulate high-dimensional feature spaces.
Why does Machine Learning need
these tools?
• Vector Calculus
– All of the optimization needs to be performed in high-dimensional spaces.
– Optimization of multiple variables simultaneously, e.g., gradient descent (see the sketch below).
– We want to take marginals over high-dimensional distributions like Gaussians.
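A minimal sketch of gradient descent in Python with NumPy; the objective, starting point, and step size are illustrative assumptions, not taken from the slides.

    import numpy as np

    def f(x):
        # Illustrative objective: a quadratic bowl with its minimum at (1, -2).
        return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

    def grad_f(x):
        # Analytic gradient of the objective above.
        return np.array([2.0 * (x[0] - 1.0), 2.0 * (x[1] + 2.0)])

    x = np.zeros(2)    # starting point
    eta = 0.1          # step size (learning rate)
    for _ in range(100):
        x = x - eta * grad_f(x)    # step against the gradient
    print(x)           # approaches [1, -2], the minimizer of f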
Next Time
• Linear Regression
– Then Regularization