Lecture 2 Review


Lecture 2
Review
Probabilities
Probability Distributions
Normal probability distributions
Sampling distributions and estimation
Probabilities
The probability of an event occurring is the proportion
of its occurrence in the long run.
Some properties of probabilities:
• Usually expressed as a number between 0 and 1.
• Sometimes expressed as percentages, and referred to as
the chance or likelihood of an outcome (e.g., there is a
30% probability of precipitation).
The sum of the probabilities of all possible outcomes
will always be 1.00, or 100% (something has to
happen).
Therefore, the probability that an event will not occur
is the complement of the probability that it will occur,
that is, 1 minus that probability.
Probabilities are sometimes converted to odds. The
odds of an event happening are the ratio of the
probability of its occurrence to the probability of its
non-occurrence.
The odds of variable Y having an outcome of y are
equal to the probability of outcome y, divided by
the probability of an outcome other than y:
$$\text{odds}(y) = \frac{P(y)}{1 - P(y)}$$
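As a small illustration of the odds formula, the sketch below converts a probability into odds. The function name `odds` is hypothetical, and the 30% figure simply reuses the precipitation example from earlier in the lecture.

```python
def odds(p: float) -> float:
    """Convert a probability p (0 <= p < 1) into odds, p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


# A 30% probability of precipitation gives odds of 0.3 / 0.7.
rain_odds = odds(0.30)
print(round(rain_odds, 3))  # 0.429
```

Odds below 1 mean the event is less likely to happen than not; a probability of 0.5 gives odds of exactly 1.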
Probability Distributions

Probability Distribution: A list of the possible
outcomes for a variable, along with their probabilities.

Probabilities for continuous variables can be plotted
as curves. The area under the curve over an interval is
the probability that the variable will take a value in
that interval.

The mean of the probability distribution is also
known as the expected value of the variable.
For discrete distributions, the expected value
(μ) is the sum of the possible values, each multiplied
by its probability of occurrence:
$$\mu = \sum y \, P(y)$$
where y is a value of the variable Y, and
P(y) is the probability of that value.
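To make the expected-value sum concrete, here is a minimal sketch using a fair six-sided die. The die is an assumed example that does not appear in the lecture, and `expected_value` is a hypothetical helper name.

```python
from fractions import Fraction


def expected_value(distribution):
    """Sum of y * P(y) over a discrete distribution given as {y: P(y)}."""
    if sum(distribution.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(y * p for y, p in distribution.items())


# Assumed example: a fair die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
mu = expected_value(die)
print(mu)  # 7/2
```

The expected value of 3.5 is not itself a possible outcome; it is the long-run average over many rolls.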
The Normal Curve
Characteristics of the normal curve:
• Bell-shaped and unimodal
• Asymptotic: the tails approach the horizontal axis
but never touch it
• Mean, median, and mode are equal
• The areas under the curve fall in constant proportions,
set by distance from the mean in standard deviations
[Figure: Normal Distribution, with about 68%, 95%, and 99.7% of the area under the curve lying within 1, 2, and 3 standard deviations of the mean.]
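The constant proportions of the normal curve can be checked numerically. The sketch below uses the error function from Python's standard library; `area_within` is a hypothetical helper name.

```python
import math


def area_within(k: float) -> float:
    """Area under a normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

These are the values usually rounded to 68%, 95%, and 99.7%.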


Z-Scores
A z-score expresses a particular value of a
variable as the number of standard deviations it
lies above or below the mean of that variable:
$$z = \frac{Y - \mu}{\sigma}$$
where μ is the mean of the variable and σ is its
standard deviation.
The Standard Normal Curve
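[Figure: Standard Normal Distribution (μ = 0, σ = 1). The horizontal axis runs from z = −3 to z = 3; about 68%, 95%, and 99.7% of the area lies within 1, 2, and 3 standard deviations of the mean.]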
Z-Scores
An intelligence test has scores which are
normally distributed, with a mean of 100 and a
s.d. of 15. Convert a test score of 127 to a z-score.
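One way to work this exercise is to subtract the mean from the score and divide by the standard deviation. A minimal sketch, with `z_score` as a hypothetical helper name:

```python
def z_score(y: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that y lies from the mean mu."""
    return (y - mu) / sigma


z = z_score(127, mu=100, sigma=15)  # (127 - 100) / 15
print(z)  # 1.8
```

A score of 127 therefore sits 1.8 standard deviations above the mean.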
If income is approximately normally
distributed (it isn’t), with a mean of $15,000
and a standard deviation of $2100, what is
the approximate probability of an individual
having an income of $25,000 or more?
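A sketch of one way to approach this, reading the question as the probability of an income of $25,000 or more. That reading is an assumption: for a continuous variable, the probability of any single exact value is zero. The helper `upper_tail` is hypothetical and uses the complementary error function from the standard library.

```python
import math


def upper_tail(z: float) -> float:
    """P(Z >= z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


z = (25_000 - 15_000) / 2_100   # about 4.76 standard deviations above the mean
p = upper_tail(z)
print(f"z = {z:.2f}, P(income >= $25,000) = {p:.1e}")
```

Because z is well beyond 3, the probability is far smaller than the 0.15% that lies above 3 standard deviations, so the approximate answer is "almost zero".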
Sampling distributions

A sampling distribution is a probability
distribution that represents the long-run
distribution of the sample statistic, if repeated
samples of size n are taken.

For particular statistics (such as means,
proportions, or differences of means), we can
assume that sampling distributions will have
certain properties.
Sampling Distribution of Means

The sampling distribution of means is the theoretical
distribution of means if many random samples of the
same size (n) are taken.

For random samples, the individual sample means
will fluctuate around the actual population parameter
(μ). In the long run, the mean of the sample means
(the mean of the sampling distribution) will equal
the population mean (μ).
Standard Error
The standard deviation of the sampling
distribution is the standard error.
For the sampling distribution of means, the
standard error (or the standard error of the
mean) is:
$$\sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}}$$
where σ is the population standard deviation and n is the
sample size.
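The standard error formula can be checked by simulation. The sketch below draws many samples from an assumed normal population (μ = 50, σ = 10, n = 25, all illustrative values that are not from the lecture) and compares the standard deviation of the sample means with σ/√n.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumed, illustrative population and sample settings.
MU, SIGMA, N, DRAWS = 50.0, 10.0, 25, 20_000

# Mean of each of DRAWS random samples of size N.
sample_means = [
    statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(DRAWS)
]

theoretical_se = SIGMA / N ** 0.5                # 10 / 5 = 2.0
empirical_se = statistics.stdev(sample_means)    # s.d. of the sample means
mean_of_means = statistics.fmean(sample_means)

print(f"theoretical SE = {theoretical_se:.3f}")
print(f"empirical SE   = {empirical_se:.3f}")
print(f"mean of means  = {mean_of_means:.3f}")
```

With this many samples, the empirical standard error should land close to 2.0 and the mean of the sample means close to 50.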
Central Limit Theorem
Central Limit Theorem: if repeated random samples
are drawn from a population, then as the sample size n
grows, the sampling distribution of sample means
approaches a normal distribution.
Law of Large Numbers (Bernoulli): as more samples
are taken from a population with mean μ, the
mean of the sample means gets closer to the population
mean μ.
Together, these two results mean that
the sampling distribution of means, if n is
reasonably large, will be approximately normally
distributed and centred on the population mean.
Importantly, this holds no matter what the
shape of the sample or the population
distribution. It is the sampling distribution
that is approximately normal.
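A rough simulation can illustrate this. The sketch below assumes a strongly skewed population (exponential with mean 1, an illustrative choice that is not from the lecture) and checks whether the sample means behave like a normal distribution centred on the population mean, using the share of means within one standard error as a simple yardstick.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

N, DRAWS = 50, 10_000
POP_MEAN = POP_SD = 1.0   # an exponential population with rate 1 has mean 1, s.d. 1

# Mean of each of DRAWS random samples of size N from the skewed population.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(N))
    for _ in range(DRAWS)
]

se = POP_SD / N ** 0.5
centre = statistics.fmean(sample_means)
within_1_se = sum(abs(m - POP_MEAN) <= se for m in sample_means) / DRAWS

print(f"mean of sample means = {centre:.3f}")
print(f"share within 1 SE    = {within_1_se:.3f}")
```

If the sampling distribution is close to normal, roughly 68% of the sample means should fall within one standard error of the population mean, even though the population itself is far from bell-shaped.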
Sampling Distribution of Means
[Figure: the means of individual samples scattered around the long-run mean of the sample means.]
Central Limit Theorem
We therefore have 3 distributions:
1) The sample distribution, which describes the actual sample
collected, with sample mean Ȳ, standard deviation s, and
sample size n.
2) The population distribution, from which the sample is
drawn, with mean μ, standard deviation σ, and
population size N.
3) The sampling distribution of a statistic, which is a theoretical
probability distribution of the statistic: the variability in the
statistic among samples of a certain size (n).
Examples

If a sample of size n = 100 is drawn from a
population of size N = 100,000 with a
standard deviation of σ = 12.4, what is the
standard error of the mean?

What is the standard error of the mean for a sample
of n = 500 drawn from the same population?
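Both examples can be worked by dividing the population standard deviation by the square root of the sample size. A minimal sketch, with `standard_error` as a hypothetical helper name:

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)


se_100 = standard_error(12.4, 100)   # 12.4 / 10
se_500 = standard_error(12.4, 500)   # 12.4 / sqrt(500)
print(round(se_100, 2), round(se_500, 3))  # 1.24 0.555
```

The population size N = 100,000 does not enter this formula. A sample five times as large shrinks the standard error by a factor of √5, not 5.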
Next:



Review of Point and Interval Estimators
Statistical Significance
Hypothesis Testing