Review of Probability
Important Topics
1 Random Variables and Probability Distributions
2 Expected Values, Mean, and Variance
3 Two Random Variables
4 The Normal, Chi-Squared, F_{m,n}, and t Distributions
5 Random Sampling and the Sampling Distribution
6 Large-Sample Approximations to Sampling Distributions
Definitions
Outcomes: the mutually exclusive potential results of a random
process.
Probability: the proportion of the time that the outcome occurs in the long run.
Sample space: the set of all possible outcomes.
Event: A subset of the sample space.
Random variables: a random variable is a numerical summary of
a random outcome.
Probability distribution: discrete variable
List of all possible [x, p(x)] pairs
x = value of random variable (outcome)
p(x) = probability associated with value
Mutually exclusive (no overlap)
Collectively exhaustive (nothing left out)
0 ≤ p(x) ≤ 1 for all x
Σ_x p(x) = 1
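As a quick check of these properties, a minimal Python sketch (the particular p(x) values are hypothetical):

    # Hypothetical discrete distribution: value x -> probability p(x)
    p = {0: 0.50, 1: 0.30, 2: 0.15, 3: 0.05}

    # Each probability lies between 0 and 1 ...
    assert all(0.0 <= px <= 1.0 for px in p.values())
    # ... and the probabilities sum to 1 (mutually exclusive, collectively exhaustive).
    assert abs(sum(p.values()) - 1.0) < 1e-12

    # The probability of an event, e.g. {x >= 2}, is the sum of p(x) over the event.
    print(sum(px for x, px in p.items() if x >= 2))   # ~0.20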
Probability distribution: discrete variable
Probabilities of events.
Cumulative probability distribution.
Example: Bernoulli distribution.
Let G be the gender of the next new person you meet, where
G=0 indicates that the person is male and G=1 indicates that
she is female.
The outcomes of G and their probabilities are
G = 1 with probability p
G = 0 with probability 1 − p
Probability distribution: continuous variable
1. Described by a mathematical formula
2. Shows all values x and their densities f(x)
   • f(x) is not a probability
3. Properties:
   ∫ f(x) dx = 1 over all x (the area under the curve is 1)
   f(x) ≥ 0 for a ≤ x ≤ b
(Figure: the density f(x) plotted against the value x.)
Probability density function (p.d.f.).
Probability distribution: Continuous Variable
Cumulative probability distribution.
Uniform Distribution
1. Equally likely outcomes on the interval [c, d]
2. Probability density function:
   f(x) = 1 / (d − c) for c ≤ x ≤ d
3. Mean and standard deviation:
   μ = (c + d) / 2
   σ = (d − c) / √12
(Figure: the uniform density is flat at height 1/(d − c) between x = c and x = d.)
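A short simulation sketch of the uniform distribution (the endpoints c = 2 and d = 6 are hypothetical), checking the area under the density and the mean and standard deviation formulas:

    import numpy as np

    c, d = 2.0, 6.0                           # hypothetical interval endpoints
    rng = np.random.default_rng(0)
    y = rng.uniform(c, d, size=1_000_000)

    # Total area under the flat density = height * width = 1.
    print((1.0 / (d - c)) * (d - c))          # 1.0

    # Simulated mean and sd versus (c + d)/2 and (d - c)/sqrt(12).
    print(y.mean(), (c + d) / 2)              # both close to 4.0
    print(y.std(), (d - c) / np.sqrt(12))     # both close to 1.1547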
Expected Values, Mean, and Variance
Expected value of a Bernoulli random variable:
E(G) = 1 × p + 0 × (1 − p) = p
Expected value of a continuous random variable:
Let f(y) be the p.d.f. of the random variable Y; then the expected value of Y is
E(Y) = ∫ y f(y) dy
Variance, Standard Deviation, and Moments
Variance of a Bernoulli random variable
The mean of the Bernoulli random variable G is μ_G = p, so its variance is
var(G) = σ_G² = (0 − p)² × (1 − p) + (1 − p)² × p = p(1 − p)
The standard deviation is σ_G = √(p(1 − p))
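A minimal simulation check of these Bernoulli formulas (p = 0.3 is a hypothetical value):

    import numpy as np

    p = 0.3                                     # hypothetical success probability
    rng = np.random.default_rng(0)
    g = rng.binomial(1, p, size=1_000_000)      # draws of the Bernoulli variable G

    print(g.mean(), p)                          # E(G) = p
    print(g.var(), p * (1 - p))                 # var(G) = p(1 - p)
    print(g.std(), np.sqrt(p * (1 - p)))        # sd(G) = sqrt(p(1 - p))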
Moments
The expected value of Y^r is called the r-th moment of the
random variable Y.
That is, the r-th moment of Y is E(Y^r).
The mean of Y , E(Y), is also called the first moment of Y .
Moments, ctd.
skewness = E[(Y − μ_Y)³] / σ_Y³
= measure of asymmetry of a distribution
skewness = 0: distribution is symmetric
skewness > (<) 0: distribution has long right (left) tail
Moments, ctd.
kurtosis = E[(Y − μ_Y)⁴] / σ_Y⁴
= measure of mass in tails
= measure of probability of large values
kurtosis = 3: normal distribution
kurtosis > 3: heavy tails (“leptokurtotic”)
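A sketch that computes sample skewness and kurtosis straight from these definitions; the normal draws should give skewness near 0 and kurtosis near 3, while the exponential draws (a hypothetical choice of a skewed, heavier-tailed distribution) give skewness near 2 and kurtosis near 9:

    import numpy as np

    def skewness(v):
        return np.mean((v - v.mean())**3) / v.std()**3

    def kurtosis(v):
        return np.mean((v - v.mean())**4) / v.std()**4

    rng = np.random.default_rng(0)
    normal_draws = rng.normal(size=1_000_000)
    expo_draws = rng.exponential(size=1_000_000)    # long right tail

    print(skewness(normal_draws), kurtosis(normal_draws))   # ~0 and ~3
    print(skewness(expo_draws), kurtosis(expo_draws))       # ~2 and ~9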
Mean and Variance of a Linear Function of a
Random Variable
Suppose X is a random variable with mean μ_X and variance σ_X², and let Y = a + bX, where a and b are constants.
Then the mean and variance of Y are
μ_Y = a + b μ_X and σ_Y² = b² σ_X²,
and the standard deviation of Y is σ_Y = |b| σ_X.
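A simulation check of these formulas, with hypothetical constants a = 2, b = −3 and a hypothetical normal X:

    import numpy as np

    a, b = 2.0, -3.0                    # hypothetical constants
    mu_x, sigma_x = 1.0, 2.0            # hypothetical mean and sd of X

    rng = np.random.default_rng(0)
    x = rng.normal(mu_x, sigma_x, size=1_000_000)
    y = a + b * x                       # linear function of X

    print(y.mean(), a + b * mu_x)       # mu_Y = a + b * mu_X
    print(y.var(), b**2 * sigma_x**2)   # var(Y) = b^2 * sigma_X^2
    print(y.std(), abs(b) * sigma_x)    # sd(Y) = |b| * sigma_X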
Two Random Variables
Joint and Marginal Distributions
The joint probability distribution of two discrete random
variables, say X and Y , is the probability that the random
variables simultaneously take on certain values, say x and y.
The joint probability distribution can be written as the function Pr(X = x, Y = y).
The marginal probability distribution of a random variable Y
is just another name for its probability distribution.
Example: E(Y) = 0 × (0.15 + 0.15) + 1 × (0.07 + 0.63) = 0.70, where each sum in parentheses is a marginal probability of Y obtained by summing the joint probabilities over the values of X.
The conditional distribution of Y given X = x is Pr(Y = y | X = x) = Pr(X = x, Y = y) / Pr(X = x).
The conditional expectation of Y given X = x is E(Y | X = x) = Σ_y y Pr(Y = y | X = x).
Example (Y takes the values 0–4 and A is binary with Pr(A = 0) = Pr(A = 1) = 0.5):
E(Y) = 0 × (0.35 + 0.45) + 1 × (0.065 + 0.035) + 2 × (0.05 + 0.01)
     + 3 × (0.025 + 0.005) + 4 × (0.01 + 0.00)
     = 0.35
E(Y) = E(Y | A = 0) Pr(A = 0) + E(Y | A = 1) Pr(A = 1)
     = (0 × 0.70 + 1 × 0.13 + 2 × 0.10 + 3 × 0.05 + 4 × 0.02) × 0.5
       + (0 × 0.90 + 1 × 0.07 + 2 × 0.02 + 3 × 0.01 + 4 × 0.00) × 0.5
     = 0.56 × 0.5 + 0.14 × 0.5 = 0.35
The mean of Y is the weighted average of the conditional
expectation of Y given X, weighted by the probability
distribution of X.
Stated differently, the expectation of Y is the expectation of the
conditional expectation of Y given X, that is, E(Y) = E[E(Y | X)],
where the inner expectation is computed using the conditional
distribution of Y given X and the outer expectation is computed
using the marginal distribution of X.
This is known as the law of iterated expectations.
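The sketch below reproduces the calculation above: it stores the joint distribution of Y (taking values 0 to 4) and the binary A from the example, then checks that E(Y) computed from the marginal distribution of Y equals E[E(Y | A)].

    import numpy as np

    y_vals = np.array([0, 1, 2, 3, 4])
    # Joint probabilities Pr(Y = y, A = a) from the example; row 0 is A = 0, row 1 is A = 1.
    joint = np.array([[0.35, 0.065, 0.05, 0.025, 0.01],
                      [0.45, 0.035, 0.01, 0.005, 0.00]])

    # E(Y) from the marginal distribution of Y (sum the joint probabilities over a).
    marginal_y = joint.sum(axis=0)
    print(marginal_y @ y_vals)               # 0.35

    # E(Y) from the law of iterated expectations: E[E(Y | A)].
    pr_a = joint.sum(axis=1)                 # Pr(A = 0) = Pr(A = 1) = 0.5
    cond_exp = (joint @ y_vals) / pr_a       # E(Y | A = 0) = 0.56, E(Y | A = 1) = 0.14
    print(cond_exp @ pr_a)                   # 0.35 again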
Conditional variance
The variance of Y conditional on X is the variance of the conditional distribution of Y given X:
var(Y | X = x) = Σ_y [y − E(Y | X = x)]² Pr(Y = y | X = x).
Var(Y | A = 0) = (0 − 0.56)² × 0.70 + (1 − 0.56)² × 0.13
               + (2 − 0.56)² × 0.10 + (3 − 0.56)² × 0.05
               + (4 − 0.56)² × 0.02 ≈ 0.99
Var(Y | A = 1) = (0 − 0.14)² × 0.90 + (1 − 0.14)² × 0.07
               + (2 − 0.14)² × 0.02 + (3 − 0.14)² × 0.01
               + (4 − 0.14)² × 0.00 ≈ 0.22
Independence
Two random variables X and Y are independently distributed,
or independent, if knowing the value of one of the variables
provides no information about the other.
That is, X and Y are independent if, for all values of x and y,
Pr(Y = y | X = x) = Pr(Y = y).
Stated differently, X and Y are independent if Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y).
That is, the joint distribution of two independent random
variables is the product of their marginal distributions.
Covariance and Correlation
Covariance
One measure of the extent to which two random variables move together is their covariance:
cov(X, Y) = σ_XY = E[(X − μ_X)(Y − μ_Y)].
Correlation
The correlation is an alternative measure of dependence between X and Y that solves the “unit” problem of covariance:
corr(X, Y) = cov(X, Y) / (σ_X σ_Y) = σ_XY / (σ_X σ_Y).
The random variables X and Y are said to be uncorrelated if
Corr(X, Y) = 0.
The correlation is always between -1 and 1.
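A sketch computing both measures from their definitions for simulated draws (the data-generating process below is a hypothetical one in which Y depends on X):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=1_000_000)
    y = 0.5 * x + rng.normal(size=1_000_000)      # Y moves with X

    # Covariance: E[(X - mu_X)(Y - mu_Y)]; its units are (units of X) x (units of Y).
    cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

    # Correlation: covariance divided by the two standard deviations; unit-free, in [-1, 1].
    corr_xy = cov_xy / (x.std() * y.std())

    print(cov_xy)     # close to 0.5
    print(corr_xy)    # close to 0.5 / sqrt(1.25) ≈ 0.447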
The Mean and Variance of Sums of Random
Variables
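The standard results here are E(X + Y) = μ_X + μ_Y and var(X + Y) = σ_X² + σ_Y² + 2σ_XY, where σ_XY is the covariance of X and Y. A quick simulation check (the distributions below are hypothetical choices):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(1.0, 2.0, size=1_000_000)
    y = 0.3 * x + rng.normal(0.0, 1.0, size=1_000_000)   # correlated with X

    cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

    print((x + y).mean(), x.mean() + y.mean())            # E(X + Y) = mu_X + mu_Y
    print((x + y).var(), x.var() + y.var() + 2 * cov_xy)  # var(X + Y) includes 2*cov term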
Normal, Chi-Squared, F_{m,n}, and t Distributions
The Normal Distribution
The probability density function of a normally distributed random variable (the normal p.d.f.) is
f(y) = (1 / (σ √(2π))) exp( −(y − μ)² / (2σ²) ),
where exp(x) is the exponential function of x.
The factor 1 / (σ √(2π)) ensures that the density integrates to 1, i.e. ∫ f(y) dy = 1.
The normal distribution with mean μ and
variance σ2 is expressed as N(μ, σ2 ).
A normally distributed random variable lies within these distances of its mean:
90%: ± 1.64 standard deviations
95%: ± 1.96 standard deviations
99%: ± 2.58 standard deviations
The empirical rule for the normal distribution: about 68%, 95%, and 99.7% of the probability lies within 1, 2, and 3 standard deviations of the mean, respectively.
The standard normal distribution is the normal distribution
with mean μ= 0 and variance σ2 = 1 and is denoted N(0, 1).
The standard normal distribution is often denoted by Z and its cumulative distribution function is denoted by Φ. Accordingly, Pr(Z ≤ c) = Φ(c), where c is a constant.
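Φ can be evaluated numerically rather than read from a table; a minimal Python sketch using the standard library error function, applied to the cutoff values quoted above:

    from math import erf, sqrt

    def phi(c):
        """Standard normal c.d.f.: Pr(Z <= c)."""
        return 0.5 * (1.0 + erf(c / sqrt(2.0)))

    print(phi(1.96))                  # ~0.975, so Pr(-1.96 <= Z <= 1.96) ~ 0.95
    print(phi(1.64) - phi(-1.64))     # ~0.90
    print(phi(2.58) - phi(-2.58))     # ~0.99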
(Key Concept 2.4 figure omitted.)
The bivariate normal distribution
The bivariate normal p.d.f. for the two random variables X and Y is
g(x, y) = [1 / (2π σ_X σ_Y √(1 − ρ²))] exp{ −[ ((x − μ_X)/σ_X)² − 2ρ ((x − μ_X)/σ_X)((y − μ_Y)/σ_Y) + ((y − μ_Y)/σ_Y)² ] / (2(1 − ρ²)) },
where ρ is the correlation between X and Y.
Important properties of the normal distribution.
1. If X and Y have a bivariate normal distribution with covariance σ_XY, and if a and b are two constants, then
aX + bY ~ N(a μ_X + b μ_Y, a² σ_X² + b² σ_Y² + 2ab σ_XY).
2. The marginal distribution of each of the two variables is normal.
This follows by setting a = 1; b = 0 in 1.
3. If the covariance σ_XY = 0 (equivalently, the correlation is 0), then X and Y are independent.
4. Any linear combination of random draws from normal
distributions also has a normal distribution.
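A simulation sketch of property 1, with hypothetical parameter values: draw (X, Y) from a bivariate normal and check that aX + bY has the implied mean and variance.

    import numpy as np

    mu = np.array([1.0, 2.0])                  # hypothetical means (mu_X, mu_Y)
    sigma_x, sigma_y, rho = 1.0, 2.0, 0.5      # hypothetical sds and correlation
    cov_xy = rho * sigma_x * sigma_y
    cov = np.array([[sigma_x**2, cov_xy],
                    [cov_xy, sigma_y**2]])

    rng = np.random.default_rng(0)
    draws = rng.multivariate_normal(mu, cov, size=1_000_000)
    a, b = 3.0, -1.0                           # hypothetical constants
    lin = a * draws[:, 0] + b * draws[:, 1]

    print(lin.mean(), a * mu[0] + b * mu[1])
    print(lin.var(), a**2 * sigma_x**2 + b**2 * sigma_y**2 + 2 * a * b * cov_xy)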
The Chi-squared distribution
The Chi-squared distribution is the distribution of the sum of
m squared independent standard normal random variables.
The distribution depends on m, which is called the degrees of
freedom of the chi-squared distribution.
A chi-squared distribution with m degrees of freedom is denoted χ²_m.
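A simulation sketch of this definition: sums of m squared independent standard normals have mean m and variance 2m, the mean and variance of the χ²_m distribution.

    import numpy as np

    m = 5                                        # degrees of freedom
    rng = np.random.default_rng(0)
    z = rng.normal(size=(1_000_000, m))          # independent standard normals
    w = (z**2).sum(axis=1)                       # chi-squared draws with m df

    print(w.mean(), m)        # E(chi^2_m) = m
    print(w.var(), 2 * m)     # var(chi^2_m) = 2m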
The F_{m,n} distribution
The F_{m,n} distribution is the distribution of (W/m) / (V/n), where W ~ χ²_m and V ~ χ²_n are independent.
When n is ∞, F_{m,n} becomes the F_{m,∞} distribution.
The F_{m,∞} distribution is the distribution of a random variable with a chi-squared distribution with m degrees of freedom, divided by m.
Equivalently, the F_{m,∞} distribution is the distribution of the average of m squared standard normal random variables.
The Student t Distribution
The Student t distribution with m degrees of freedom is
defined to be the distribution of the ratio of a standard normal
random variable, divided by the square root of an independently
distributed chi-squared random variable with m degrees of
freedom divided by m.
That is, let Z be a standard normal random variable, let W be a
random variable with a chi-squared distribution with m degrees
of freedom, and let Z and W be independently distributed. Then the random variable
t = Z / √(W/m)
has a Student t distribution with m degrees of freedom.
When m is 30 or more, the Student t distribution is well approximated by the standard normal distribution, and the t_∞ distribution equals the standard normal distribution.
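A sketch that builds Student t draws directly from this definition and compares tail behavior for small and large m (the cutoff 1.96 and the sample sizes are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    def t_draws(m, size=200_000):
        z = rng.normal(size=size)                         # standard normal numerator
        w = (rng.normal(size=(size, m))**2).sum(axis=1)   # chi-squared with m df
        return z / np.sqrt(w / m)

    # Pr(|t| > 1.96): noticeably above 5% for m = 4, close to 5% for m = 60.
    print((np.abs(t_draws(4)) > 1.96).mean())
    print((np.abs(t_draws(60)) > 1.96).mean())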
Random Sampling
Simple random sampling is the simplest sampling scheme in
which n objects are selected at random from a population and each
member of the population is equally likely to be included in the
sample.
Since the members of the population included in the sample are
selected at random, the values of the observations Y1, … , Yn are
themselves random.
i.i.d. draws.
Because individuals #1 and #2 are selected at random, the value
of Y1 provides no information about the value of Y2. Thus:
Y1 and Y2 are independently distributed
Y1 and Y2 come from the same distribution, that is, Y1, Y2 are
identically distributed
That is, under simple random sampling, Y1 and Y2 are
independently and identically distributed (i.i.d.).
More generally, under simple random sampling, {Yi}, i =
1,…, n, are i.i.d.
This framework allows rigorous statistical inferences about
moments of population distributions using a sample of data
from that population …
Sampling Distribution of the Sample Average
The sample average of the n observations Y1, … , Yn is
Ȳ = (1/n)(Y1 + Y2 + … + Yn) = (1/n) Σ_{i=1}^{n} Yi.
Because Y1, … , Yn are random, their average is random and has a probability distribution. The distribution of Ȳ is called the sampling distribution of Ȳ.
Mean and Variance of Ȳ
Suppose Y1, … , Yn are i.i.d. and let μ_Y and σ_Y² denote the mean and variance of Yi. Then
E(Ȳ) = μ_Y and var(Ȳ) = σ_Y² / n.
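A simulation sketch of these two results: many samples of size n are drawn from a hypothetical (non-normal) population, and the mean and variance of the resulting sample averages are compared with μ_Y and σ_Y²/n.

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 50, 200_000
    mu_y, var_y = 1.0, 1.0                       # exponential(1) population: mean 1, variance 1

    samples = rng.exponential(1.0, size=(reps, n))
    ybar = samples.mean(axis=1)                  # one sample average per sample

    print(ybar.mean(), mu_y)                     # E(Ybar) = mu_Y
    print(ybar.var(), var_y / n)                 # var(Ybar) = sigma_Y^2 / n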
Things we want to know about the sampling
distribution:
What is the mean of Ȳ?
If E(Ȳ) = the true μ_Y = 0.78, then Ȳ is an unbiased estimator of μ_Y.
What is the variance of Ȳ?
How does var(Ȳ) depend on n? (the famous 1/n formula)
Does Ȳ become close to μ_Y when n is large?
Law of large numbers: Ȳ is a consistent estimator of μ_Y.
Ȳ − μ_Y appears bell shaped for n large… is this generally true?
In fact, Ȳ − μ_Y is approximately normally distributed for n large (Central Limit Theorem).
Find the sampling distribution of Ȳ when Y is normally distributed.
A linear combination of normally distributed random variables is also normally distributed (equation 2.42).
For a normal distribution, we need only the mean and variance to determine the distribution.
E(Ȳ) = μ_Y and var(Ȳ) = σ_Y² / n.
Therefore, Ȳ ~ N(μ_Y, σ_Y² / n).
Large-Sample Approximations to Sampling
Distributions
Two approaches to characterizing sampling distributions:
Exact distribution (finite-sample distribution), which can be derived when the distribution of Y is known.
Asymptotic distribution: large-sample approximation to the
sampling distribution.
Law of Large Numbers
The law of large numbers states that, under general conditions, Ȳ will be near μ_Y with very high probability when n is large.
The property that Ȳ is near μ_Y with increasing probability as n increases is called convergence in probability, or consistency.
The law of large numbers states that, under certain conditions, Ȳ converges in probability to μ_Y, or equivalently, Ȳ is consistent for μ_Y.
(Key Concept 2.6 figure omitted.)
The conditions for the law of large numbers are
Yi , i=1, …, n, are i.i.d.
The variance of Yi, σ_Y², is finite.
Formal definitions of consistency and law of large
numbers.
Consistency and convergence in probability.
Let S1 , S2 , … , Sn , … be a sequence of random variables. For
example, Sn could be the sample average of a sample of n
observations of the random variable Y .
The sequence of random variables {Sn} is said to converge in
probability to a limit, μ, if the probability that Sn is within ±δ of
μ tends to one as n → ∞, as long as the constant δ is positive.
That is, Sn →p μ if and only if Pr[|Sn − μ| ≥ δ] → 0 as n → ∞ for every δ > 0.
If Sn →p μ, then Sn is said to be a consistent estimator of μ.
The law of large numbers.
If Y1, … , Yn are i.i.d., E(Yi) = μ_Y, and var(Yi) < ∞, then Ȳ →p μ_Y.
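A sketch of the law of large numbers in action: the running sample average of i.i.d. Bernoulli draws (p = 0.78, echoing the number used earlier) settles down near μ_Y as n grows.

    import numpy as np

    p = 0.78                                      # population mean of the Bernoulli draws
    rng = np.random.default_rng(0)
    y = rng.binomial(1, p, size=100_000)

    running_avg = np.cumsum(y) / np.arange(1, y.size + 1)
    for n in (10, 100, 1_000, 10_000, 100_000):
        print(n, running_avg[n - 1])              # approaches 0.78 as n increases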
The Central Limit Theorem
The central limit theorem says that, under general conditions, the distribution of Ȳ is well approximated by a normal distribution when n is large.
Since the mean of Ȳ is μ_Y and its variance is σ_Y²/n, when n is large the distribution of Ȳ is approximately N(μ_Y, σ_Y²/n).
Accordingly, (Ȳ − μ_Y) / σ_Ȳ, where σ_Ȳ = σ_Y/√n, is well approximated by the standard normal distribution N(0, 1).
The sampling distribution of Ȳ when n is large
For small sample sizes, the distribution of Ȳ is complicated, but if n is large, the sampling distribution is simple!
As n increases, the distribution of Ȳ becomes more tightly centered around μ_Y (the Law of Large Numbers).
Moreover, the distribution of Ȳ − μ_Y becomes normal (the Central Limit Theorem).
Convergence in distribution.
Let F1, … , Fn , … be a sequence of cumulative distribution
functions corresponding to a sequence of random variables, S1,
… , Sn , … .
Then the sequence of random variables {Sn} is said to converge in distribution to S (denoted Sn →d S) if the distribution functions {Fn} converge to F, the distribution function of S.
That is, Sn →d S if and only if lim_{n→∞} Fn(t) = F(t), where the limit holds at all points t at which the limiting distribution F is continuous.
The distribution F is called the asymptotic distribution of Sn .
The central limit theorem
If Y1, … , Yn are i.i.d. and 0 < σ_Y² < ∞, then
√n (Ȳ − μ_Y) / σ_Y →d N(0, 1).
In other words, the asymptotic distribution of (Ȳ − μ_Y) / σ_Ȳ is N(0, 1).
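A numerical sketch of the CLT: standardized sample averages of skewed exponential draws (a hypothetical population) should satisfy Pr(standardized average ≤ 1.96) ≈ Φ(1.96) = 0.975 once n is large.

    import numpy as np

    rng = np.random.default_rng(0)
    reps = 50_000
    mu_y, sigma_y = 1.0, 1.0                      # exponential(1) population

    for n in (5, 50, 500):
        ybar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
        z = (ybar - mu_y) / (sigma_y / np.sqrt(n))    # standardized sample average
        print(n, (z <= 1.96).mean())                  # approaches 0.975 as n grows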
Slutsky’s theorem
Slutsky’s theorem combines consistency and convergence in
distribution.
Suppose that a_n →p a, where a is a constant, and S_n →d S. Then
a_n + S_n →d a + S,  a_n S_n →d aS,  and S_n / a_n →d S / a if a ≠ 0.
Continuous mapping theorem
If g is a continuous function, then (i) if S_n →p a, then g(S_n) →p g(a); and (ii) if S_n →d S, then g(S_n) →d g(S).
But how large an n is “large enough”?
The answer is: it depends on the distribution of the underlying Yi
that make up the average.
At one extreme, if the Yi are themselves normally distributed,
then Ȳ is exactly normally distributed for all n.
In contrast, when Yi is far from normally distributed, this
approximation can require n = 30 or even more.
Example: A skewed distribution.
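For instance, the following sketch (using squared standard normals, i.e. χ²₁ draws, as a stand-in for a strongly skewed population) shows the skewness of the sampling distribution of the sample average shrinking toward 0, the normal value, as n grows:

    import numpy as np

    rng = np.random.default_rng(0)
    reps = 100_000

    def skewness(v):
        return np.mean((v - v.mean())**3) / v.std()**3

    # Skewed population: squared standard normals (chi-squared with 1 df, skewness ~2.83).
    for n in (1, 5, 25, 100):
        ybar = (rng.normal(size=(reps, n))**2).mean(axis=1)
        print(n, skewness(ybar))                  # shrinks toward 0 as n grows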