Introduction - ODU Computer Science

Short Resume of Statistical Terms
Fall 2013
By Yaohang Li, Ph.D.
Review
• Last Class
– Introduction to Monte Carlo
• This Class
– Important Statistics Terms
• Random Events
– Independence of Random Events
– Axioms on Random Events
• Random Variables
– Independence of Random Variables
• CDF
• PDF
• Expectation
– Characteristics of Expectation
• Moments of a Distribution
– rth moment
– rth central moment
• Mean
• Variance
• Standard Deviation
• Covariance
– Characteristics of covariance
• Review of Statistics and Probability Terms
• Important Distributions
• Central Limit Theorem
• Estimand and Estimator
• Next Class
– Monte Carlo for Integration
Random Events and Probability
• Random Event
– An event which has a chance of happening
• Probability
– A numerical measure of that chance
– Lying between 0 and 1, both inclusive
• Terminology
– P(A)
• The probability that an event A occurs
– P(A+B+…)
• The probability that at least one of the events A, B, … occurs
– P(AB…)
• The probability that all the events A, B, … occur
– P(A|B)
• The probability that the event A occurs when it is known that the event B
occurs
• Conditional probability of A given B
Axioms in Probability
• P(A+B+…) ≤ P(A)+P(B)+…
– If at most one of the events A, B, … can occur, they are called
exclusive, and the equality holds
– If at least one of the events A, B, … must occur, they are called
exhaustive, and P(A+B+…)=1
• P(AB)=P(A|B)P(B)
– If P(A|B)=P(A), A and B are independent
• The chance of A occurring is uninfluenced by the
occurrence of B
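The rules above can be checked by brute-force enumeration. The sketch below assumes a toy sample space of two fair dice; the events A and B are illustrative choices, not taken from the slides.

```python
from fractions import Fraction
from itertools import product

# Assumed example: the 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate on a single outcome."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

def A(o):
    """Event A (illustrative): the first die shows an even number."""
    return o[0] % 2 == 0

def B(o):
    """Event B (illustrative): the second die shows 5 or 6."""
    return o[1] >= 5

p_a = prob(A)
p_b = prob(B)
p_ab = prob(lambda o: A(o) and B(o))      # P(AB): both events occur
p_a_or_b = prob(lambda o: A(o) or B(o))   # P(A+B): at least one occurs
p_a_given_b = p_ab / p_b                  # P(A|B), from P(AB) = P(A|B) P(B)

print("P(A+B) <= P(A) + P(B):", p_a_or_b <= p_a + p_b)
print("P(A|B) == P(A), so A and B are independent:", p_a_given_b == p_a)
```

Because A and B here are not exclusive, the inequality is strict; two exclusive events (for example "first die shows 1" and "first die shows 2") would give equality.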
Random Variables and Distributions
• Random variable (η)
– A number that characterizes each of a set of exclusive and exhaustive
events
• Cumulative Distribution Function (CDF)
– F(y)=P(η ≤ y)
– The probability that the event which occurs has a value η not
exceeding a prescribed y
– F(+∞)=1 and F(−∞)=0
– F(y) is a non-decreasing function of y
Expectation
• If g(η) is a function of η, the expectation (or mean value) of g is
denoted and defined by
$E g(\eta) = \int g(y)\, dF(y)$
– Stieltjes integral
– The integral is taken over all values of y
• Explanation
– Continuous random events
• F(y) is differentiable and f(y) is its derivative
$E g(\eta) = \int g(y) f(y)\, dy$
– Discrete random events
• F(y) is a step function with steps of height f_i at the points y_i
$E g(\eta) = \sum_i g(y_i) f_i$
• Probability Density Function (pdf)
– f(y) (continuous case) and f_i (discrete case) are the probability density functions
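Both forms of the expectation can be evaluated directly. The sketch below is illustrative only: the fair die, the uniform density on [0, 1], the function g(y) = y² and the helper names are all assumptions chosen for the example.

```python
from fractions import Fraction

def expectation_discrete(g, points, heights):
    """Discrete case: sum of g(y_i) * f_i over the steps of F."""
    return sum(g(y) * f for y, f in zip(points, heights))

def expectation_continuous(g, f, lo, hi, steps=100_000):
    """Continuous case: integral of g(y) f(y) dy, approximated by the midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        total += g(y) * f(y)
    return total * h

def square(y):
    return y * y

# Discrete example: a fair die, with steps of height 1/6 at y = 1, ..., 6.
die_points = list(range(1, 7))
die_heights = [Fraction(1, 6)] * 6
e_die = expectation_discrete(square, die_points, die_heights)

# Continuous example: the uniform density f(y) = 1 on [0, 1].
e_unif = expectation_continuous(square, lambda y: 1.0, 0.0, 1.0)

# A constant function has expectation equal to that constant,
# because the heights f_i sum to 1.
e_const = expectation_discrete(lambda y: 3, die_points, die_heights)

print(e_die, e_unif, e_const)
```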
More on Expectation
• The statistical physicist uses another notation for
expectation
– Suppose p_i is the probability density function
• How about if g(x) is a constant function?
Linear Combination of the Expectation Values
Multi-dimensional Distribution
• Multi-dimensional Random Variable
– Represented using a vector η
• Multi-dimensional CDF
– F(y)=P(η ≤ y)
• η ≤ y means that each coordinate of η is not greater than
the corresponding coordinate of y
• Expectation
$E g(\eta) = \int g(y)\, dF(y)$
– Continuous multidimensional events
$E g(\eta) = \int g(y) f(y)\, dy$
• where
$f(y) = f(y_1, y_2, \ldots, y_k) = \frac{\partial^k F(y_1, y_2, \ldots, y_k)}{\partial y_1\, \partial y_2 \cdots \partial y_k}$
Independence of Random Variables
• Consider a set of exhaustive and exclusive events, each
characterized by a pair of numbers η and ζ, for which
F(y,z) is the joint distribution. G(y) is the CDF of η and H(z)
is the CDF of ζ.
– F(y,z) = P(η ≤ y, ζ ≤ z)
– G(y) = P(η ≤ y)
– H(z) = P(ζ ≤ z)
• If it so happens that
– F(y,z)=G(y)H(z) for all y and z
– then the random variables η and ζ are called independent
Characteristics of Expectations
$E \sum_i g_i(\eta_i) = \sum_i E g_i(\eta_i)$
• Holds whether or not the random variables η_i are independent
$E \prod_i g_i(\eta_i) = \prod_i E g_i(\eta_i)$
• Holds if the η_i are mutually independent; it can fail otherwise
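A small exact check of the two rules, assuming two fair dice x and y as the random variables (an illustrative choice). The "dependent" case uses the same die twice, the most extreme form of dependence.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 36)  # probability of each ordered pair (x, y) of two fair dice

def E(g):
    """Exact expectation of g(x, y), where x and y are independent fair dice."""
    return sum(g(x, y) * p for x, y in product(faces, faces))

e_x = E(lambda x, y: x)
e_y = E(lambda x, y: y)

# Sum rule: holds for independent variables (x and y) ...
sum_indep = E(lambda x, y: x + y)
# ... and also for fully dependent ones (x and x itself).
sum_dep = E(lambda x, y: x + x)

# Product rule: holds for the independent pair (x, y) ...
prod_indep = E(lambda x, y: x * y)
# ... but fails for the dependent pair (x, x): E(x*x) differs from E(x)*E(x).
prod_dep = E(lambda x, y: x * x)

print("sum rule:", sum_indep == e_x + e_y, sum_dep == e_x + e_x)
print("product rule:", prod_indep == e_x * e_y, prod_dep == e_x * e_x)
```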
Moments of Distribution
• rth moment of a distribution
– E(η^r)
• Principal moment
– μ = E(η)
• rth central moment
– μ_r = E{(η − μ)^r}
• Most important moments
– μ = E(η), known as the mean of η
• Measure of location of a random variable
– μ_2, known as the variance of η (usually abbreviated "var")
• Measure of dispersion about the mean
– standard deviation
$\sigma = \sqrt{\mu_2}$
– coefficient of variation
• σ/μ
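As a worked instance of these definitions, the sketch below computes the moments of a fair die exactly. The die and the function names are assumptions made for illustration.

```python
import math
from fractions import Fraction

# Assumed example distribution: a fair die.
points = list(range(1, 7))
heights = [Fraction(1, 6)] * 6

def moment(r):
    """rth moment: the expectation of the rth power of the variable."""
    return sum(Fraction(y) ** r * f for y, f in zip(points, heights))

def central_moment(r):
    """rth central moment: the expectation of the rth power of (variable - mean)."""
    mu = moment(1)
    return sum((y - mu) ** r * f for y, f in zip(points, heights))

mean = moment(1)                    # location
variance = central_moment(2)        # dispersion about the mean
std_dev = math.sqrt(variance)       # square root of the variance
coeff_var = std_dev / float(mean)   # standard deviation divided by the mean

print(mean, variance, std_dev, coeff_var)
```

The first central moment is always zero, which is why the variance (the second central moment) is the first one that carries information about spread.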
Covariance
• Definition of covariance (usually abbreviated "cov")
– If η and ζ are random variables with means μ and ν,
respectively, the quantity E{(η − μ)(ζ − ν)} is called the
covariance of η and ζ
– If η and ζ are independent, the covariance is 0
• Why?
– Also, cov(η, η)=var(η)
• Why?
Important Formula of Covariance
$\mathrm{var}\left(\sum_{i=1}^{k} \eta_i\right) = \sum_{i=1}^{k} \sum_{j=1}^{k} \mathrm{cov}(\eta_i, \eta_j)$
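The variance-of-a-sum formula can be verified numerically. The sketch below assumes three simulated variables (one of them deliberately correlated with another) and uses population-style covariances, so the identity holds up to rounding; all names and distributions are illustrative.

```python
import random

def cov(xs, ys):
    """Covariance of two equally long data columns (dividing by n)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

random.seed(0)
n = 1000
a = [random.random() for _ in range(n)]
b = [x + random.gauss(0.0, 1.0) for x in a]      # built from a, so correlated with it
c = [random.expovariate(1.0) for _ in range(n)]  # generated independently of a and b
variables = [a, b, c]

# Left-hand side: variance of the sum, using cov(x, x) = var(x).
total = [sum(values) for values in zip(*variables)]
lhs = cov(total, total)

# Right-hand side: double sum of all pairwise covariances (including i == j).
rhs = sum(cov(u, v) for u in variables for v in variables)

print(lhs, rhs)
```

When the variables are independent, the off-diagonal covariances vanish in expectation and the formula reduces to "the variance of a sum is the sum of the variances".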
Correlation Coefficient
• Definition
$\rho = \mathrm{cov}(\eta, \zeta) / \sqrt{\mathrm{var}\,\eta \;\mathrm{var}\,\zeta}$
– Always between +1 and −1
– If ρ=0, they are not correlated
– If ρ<0, they are negatively correlated
– If ρ>0, they are positively correlated
Important Distributions
• Uniform Distribution
• Exponential Distribution
• Binomial Distribution
• Poisson Distribution
• Normal Distribution
Uniform Distribution
• Uniform Distribution (Rectangle Distribution)
– A distribution with constant probability density over an interval
– Mean?
– Variance?
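One way to explore the two questions is by simulation. For a uniform distribution on an interval [a, b], the standard results are mean (a + b)/2 and variance (b − a)²/12; the sketch below compares them with sample values. The interval [2, 5], the seed and the sample size are arbitrary assumptions.

```python
import random

random.seed(1)
a, b = 2.0, 5.0        # assumed interval, chosen only for illustration
n = 200_000
samples = [random.uniform(a, b) for _ in range(n)]

sample_mean = sum(samples) / n
sample_var = sum((x - sample_mean) ** 2 for x in samples) / n

exact_mean = (a + b) / 2          # standard result for Uniform(a, b)
exact_var = (b - a) ** 2 / 12     # standard result for Uniform(a, b)

print(f"mean: sample {sample_mean:.4f}, exact {exact_mean:.4f}")
print(f"variance: sample {sample_var:.4f}, exact {exact_var:.4f}")
```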
Exponential Distribution
• Exponential Distribution
– mean 1/λ
– variance 1/λ^2
Binomial Distribution
• Binomial Distribution
– Discrete probability distribution P_p(n|N) of obtaining exactly n
successes out of N Bernoulli trials
– Each Bernoulli trial succeeds with probability p and fails with
probability q=1−p
$P_p(n|N) = \binom{N}{n} p^n q^{N-n} = \frac{N!}{n!\,(N-n)!}\, p^n (1-p)^{N-n}$
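A minimal sketch of the binomial probabilities, using the standard formula (number of ways to choose n of N trials, times pⁿ qᴺ⁻ⁿ). The values N = 10 and p = 0.3 and the function name are assumptions for illustration.

```python
from math import comb

def binomial_pmf(n, N, p):
    """Probability of exactly n successes in N Bernoulli trials with success probability p."""
    q = 1.0 - p
    return comb(N, n) * p ** n * q ** (N - n)

N, p = 10, 0.3   # illustrative values
pmf = [binomial_pmf(n, N, p) for n in range(N + 1)]

total = sum(pmf)                                                 # should be 1
mean = sum(n * w for n, w in enumerate(pmf))                     # equals N * p
variance = sum((n - mean) ** 2 * w for n, w in enumerate(pmf))   # equals N * p * q

print(total, mean, variance)
```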
Poisson Distribution
• Poisson Distribution
– The limit of the Binomial Distribution as N → ∞ with ν = Np held fixed
– Mean is ν
– Variance is ν
$P_\nu(n) = \lim_{N \to \infty} P_B(n) = \frac{\nu^n e^{-\nu}}{n!}$
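The limit can be observed numerically. The sketch below assumes a fixed mean of 2 and sets the binomial success probability to mean/N, then measures how far the binomial probabilities are from the Poisson ones as N grows. The function names and the chosen values are illustrative.

```python
from math import comb, exp, factorial

def binomial_pmf(n, N, p):
    """Probability of exactly n successes in N Bernoulli trials."""
    return comb(N, n) * p ** n * (1.0 - p) ** (N - n)

def poisson_pmf(n, nu):
    """Poisson probability of the count n when the mean is nu."""
    return nu ** n * exp(-nu) / factorial(n)

nu = 2.0   # assumed mean, held fixed while N grows

def max_gap(N):
    """Largest gap between Binomial(N, nu/N) and Poisson(nu) over n = 0..10."""
    return max(abs(binomial_pmf(n, N, nu / N) - poisson_pmf(n, nu))
               for n in range(11))

sizes = (10, 100, 1000, 10000)
gaps = [max_gap(N) for N in sizes]

for N, gap in zip(sizes, gaps):
    print(f"N = {N:>5}: largest difference {gap:.2e}")
```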
Normal Distribution
• Normal Distribution (Gaussian Distribution)
– Bell curve
– De Moivre developed the normal distribution as an
approximation to the binomial distribution
Normal Distribution in Data Analysis
• 68.27% of the data will be found within one SD
on either side of the mean (±1SD)
• 95.45% of the data will be found within two SD
on either side of the mean (±2SD)
• 99.73% of the data will be found within three SD
on either side of the mean (±3SD)
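The coverage figures for a normal distribution can be recomputed from the error function: the probability of falling within k standard deviations of the mean is erf(k/√2). A short sketch using only the standard library (the function name is an assumption):

```python
from math import erf, sqrt

def coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return erf(k / sqrt(2.0))

for k in (1, 2, 3):
    print(f"within ±{k} SD: {100 * coverage(k):.2f}%")
```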
Central Limit Theorem
• Central Limit Theorem
– The sum of n independent random variables has an
approximately normal distribution when n is large
• The random variables may follow an arbitrary distribution (with finite variance)
Central Limit Theorem in Practice
• In practice
– n = 10 is a reasonably large number
– n = 25 is rather large (effectively infinite)
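A quick simulation gives a feel for these rules of thumb. The sketch below assumes Uniform(0, 1) summands and checks how often the sum of n of them falls within one standard deviation of its mean; for a normal distribution that share is about 0.683. Seed, trial count and function name are arbitrary assumptions.

```python
import random
from math import sqrt

random.seed(42)

def fraction_within_one_sd(n, trials=20_000):
    """Share of sums of n Uniform(0, 1) variables lying within one SD of the mean."""
    mean = n * 0.5            # mean of the sum
    sd = sqrt(n / 12.0)       # SD of the sum (each term has variance 1/12)
    hits = 0
    for _ in range(trials):
        s = sum(random.random() for _ in range(n))
        if abs(s - mean) <= sd:
            hits += 1
    return hits / trials

results = {n: fraction_within_one_sd(n) for n in (1, 10, 25)}

for n, share in results.items():
    print(f"n = {n:>2}: {share:.3f} of sums within one SD (normal value is about 0.683)")
```

With a single uniform variable the share is clearly different from the normal value, while for n = 10 and n = 25 it is already close.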
Estimation
• Monte Carlo Computation
– Goal: estimating the unknown numerical value of some parameter of some
distribution
• The parameter is called an estimand
• Sample
– The available data (may consist of a number of observed random
variables)
– The number of observations in the sample is called the sample size
• Estimators (example: the estimand is the mean)
– sample mean
• (η_1+η_2+…+η_n)/n
– weighted average
• (w_1η_1+w_2η_2+…+w_nη_n)/(w_1+w_2+…+w_n)
• May be a better estimator
• Connection between the sample and the estimand
– The estimand is a parameter of the distribution of the random variables
constituting the sample
Sampling Distribution
• Parent Distribution
– We can represent the sample by a vector η with coordinates η_1, η_2, η_3, …,
η_n
– The distribution of η_1, η_2, η_3, …, η_n is called the Parent Distribution
– To estimate the estimand θ (a parameter of the Parent Distribution), we use
some function t(η)
• t is an estimator
• Sampling Distribution
– η is a random variable, so is t(η)
• if we repeated the experiment, we should expect to get a different value
of η
– Since η varies from experiment to experiment, t(η) has a distribution, called the sampling
distribution
– If t(η) is to be close to θ, then the sampling distribution ought to be closely
concentrated around θ
Measuring Sampling Distribution
• The bias of t
– The difference between the average value of t(η) and θ
– β = E{t(η) − θ}
– t is an unbiased estimator if β=0
• The sampling variance of t
– σ_t^2 = var{t(η)} = E{[t(η) − E t(η)]^2} = E{[t − θ − β]^2}
• If β and σ_t^2 are small, t is a good estimator
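Bias and sampling variance can be estimated by repeating an experiment many times. The sketch below is an illustration under assumptions: the parent distribution is Uniform(0, 1), the estimand is its variance 1/12, and two estimators are compared (dividing the sum of squared deviations by n or by n − 1). None of these choices come from the slides.

```python
import random

random.seed(7)
theta = 1.0 / 12.0       # estimand: the true variance of Uniform(0, 1)
n = 5                    # sample size of each experiment
experiments = 100_000    # number of repeated experiments

def t_divide_by_n(sample):
    """Estimator that divides the sum of squared deviations by n."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)

def t_divide_by_n_minus_1(sample):
    """Estimator that divides the sum of squared deviations by n - 1."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)

def bias_and_sampling_variance(t):
    """Approximate the bias and the sampling variance of t by repeated sampling."""
    values = [t([random.random() for _ in range(n)]) for _ in range(experiments)]
    mean_t = sum(values) / experiments
    bias = mean_t - theta
    sampling_var = sum((v - mean_t) ** 2 for v in values) / experiments
    return bias, sampling_var

bias_n, var_n = bias_and_sampling_variance(t_divide_by_n)
bias_n1, var_n1 = bias_and_sampling_variance(t_divide_by_n_minus_1)

print(f"divide by n:     bias {bias_n:+.5f}, sampling variance {var_n:.5f}")
print(f"divide by n - 1: bias {bias_n1:+.5f}, sampling variance {var_n1:.5f}")
```

The divide-by-n estimator shows a clear negative bias (about −θ/n) but a smaller sampling variance, which illustrates why both quantities are needed to judge an estimator.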
Important Estimators
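• Mean of the parent distribution
$\bar{\eta} = (\eta_1 + \eta_2 + \cdots + \eta_n) / n$
– standard error
$\sigma_{\bar{\eta}} = \sigma / \sqrt{n}$
• Variance of the parent distribution
$s^2 = (\eta_1^2 + \eta_2^2 + \cdots + \eta_n^2 - n \bar{\eta}^2) / (n - 1)$
– standard error
$\sigma_{s^2} \approx \sigma^2 / \sqrt{0.5\, n}$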
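A minimal sketch of these estimators on simulated data. The parent distribution (normal with mean 10 and SD 2), the sample size and the seed are assumptions; since the true σ is unknown in practice, the sketch plugs the estimate s in its place when computing the standard errors.

```python
import random
import statistics
from math import sqrt

random.seed(3)
# Assumed parent distribution for illustration: normal with mean 10 and SD 2.
sample = [random.gauss(10.0, 2.0) for _ in range(400)]
n = len(sample)

# Estimator of the mean: the sample average.
mean_hat = sum(sample) / n

# Estimator of the variance: (sum of squares - n * mean^2) / (n - 1).
s2 = (sum(x * x for x in sample) - n * mean_hat ** 2) / (n - 1)

# Standard errors, with s (or s^2) standing in for the unknown sigma (or sigma^2).
se_mean = sqrt(s2 / n)
se_s2 = s2 / sqrt(0.5 * n)

# Cross-check against the standard library's sample variance.
s2_check = statistics.variance(sample)

print(f"mean estimate {mean_hat:.3f} with standard error {se_mean:.3f}")
print(f"variance estimate {s2:.3f} with standard error {se_s2:.3f}")
```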
Efficiency
• Goal of Monte Carlo Work
– Obtain a respectably small standard error in the final result
– More random samples can lead to better accuracy
• Not very rewarding: the standard error of a mean falls only as 1/√n
– Variance Reduction Method
Summary
• Important Statistics Terms
– Random Events
• Independence of Random Events
• Axioms on Random Events
– Random Variables
• Independence of Random Variables
– CDF
– PDF
– Expectation
• Characteristics of Expectation
– Moments of a Distribution
• rth moment
• rth central moment
– Mean
– Variance
– Standard Deviation
– Covariance
• Characteristics of covariance
– Correlation Coefficient
Summary (Cont.)
• Important Distributions
– Uniform Distribution
– Exponential Distribution
– Binomial Distribution
– Poisson Distribution
– Normal Distribution
• Estimation
– Sample
– Estimand
– Parent Distribution
– Sampling Distribution
– Estimator
• Important estimators
– Buffon’s Needle
What Do I Want You to Do?
• Review Slides
• Review basic probability/statistics concepts
• Work on your Assignment 1