Probability and Sampling Distributions
Basic Statistics
Probability and Sampling
Distributions
STRUCTURE OF STATISTICS
[Diagram: STATISTICS divides into DESCRIPTIVE (tabular, graphical, numerical) and INFERENTIAL (estimation, tests of hypothesis).]
Inferential Statistics
Inferential statistics describes a population of data using
the information contained in a sample
[Diagram: Sampling takes a sample from the population. Sample statistics (example: the sample mean $\bar{X}$) are used to estimate or predict (infer) the unknown population parameters.]
A sample is a portion, or part, of the population of interest
Everyone makes inferences, why
Statistical Inference?
The difference between a fortune teller and a statistician
is a “Statement of Goodness”
“I predict rain today!!!”
A statement of goodness is a statement
indicating the chance or probability
that an inference is wrong.
Probability is a statement of one’s
belief that an event will happen.
* What are the chances of a head
appearing when you toss a coin?
* What are the chances of selecting an
Ace of Diamonds from a deck of cards?
Probability Has a Basis in
Mathematics
Consider a coin-toss experiment:
$$\text{Prob}(\text{Head}) = \lim_{N \to \infty} \frac{k}{N}$$
where k = number of heads and N = number of tosses.
[Figure: relative frequency of heads (vertical axis, 0.0 to 1.0, with 1/2 marked) plotted against Number of Tosses (horizontal axis, 1 to 18).]
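The limiting behavior of k/N can be illustrated with a short simulation. This is only a sketch: the function name `relative_frequency_of_heads` and the fixed seed are assumptions made for the illustration, and a pseudo-random generator stands in for a real coin.

```python
import random

def relative_frequency_of_heads(num_tosses, seed=0):
    """Toss a simulated fair coin num_tosses times and return k / N."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    heads = sum(1 for _ in range(num_tosses) if rng.random() < 0.5)
    return heads / num_tosses

# The relative frequency should drift toward 1/2 as N grows.
for n in (10, 100, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

With only a few tosses the relative frequency can sit far from 1/2; with many tosses it should land close to it.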
Relative Frequency Probability
• P(A) = Number of Events of Interest (A) divided
by the Total Number of Events.
– Probability of a Head on the flip of an honest coin =
1/2 (as we saw on the last slide).
– Probability of drawing the Ace of Diamonds = 1/52
(since there are 52 cards in the deck and only one of
them is the Ace of Diamonds).
• There are other elements of probability
discussed in the text but we will not be using
those portions for this class.
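The two relative frequency examples above can be checked by counting. The sketch below assumes every outcome is equally likely; the helper name `relative_frequency_probability` and the way the deck is built are illustrative choices, not part of the slides.

```python
from fractions import Fraction
from itertools import product

def relative_frequency_probability(events_of_interest, all_events):
    """P(A) = number of events of interest / total number of events."""
    return Fraction(len(events_of_interest), len(all_events))

# Honest coin: two equally likely outcomes, one of them a head.
coin = ["Head", "Tail"]
p_head = relative_frequency_probability([s for s in coin if s == "Head"], coin)

# Standard deck: 13 ranks x 4 suits = 52 cards, one Ace of Diamonds.
ranks = ["Ace", "2", "3", "4", "5", "6", "7", "8", "9", "10",
         "Jack", "Queen", "King"]
suits = ["Clubs", "Diamonds", "Hearts", "Spades"]
deck = list(product(ranks, suits))
target = [card for card in deck if card == ("Ace", "Diamonds")]
p_ace_of_diamonds = relative_frequency_probability(target, deck)

print(p_head, p_ace_of_diamonds)  # 1/2 1/52
```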
We can put probability in the context of a relative frequency
distribution. Recall the example of 10 students who took a
5-point quiz with the following results.
It turns out that the area under the curve also represents the
probability of an event. For example, what is the probability
that a student picked at random scored 4 on the quiz?
[Figure: relative frequency (rf) histogram of quiz scores X = 1 to 5, with rf on the vertical axis (.0 to .4), built from ten blocks of area .1, one per student.]
The probability would be .2, the same as the area
under the curve!
This idea transfers directly to a normal distribution.
What is the probability that a randomly selected person
scores at least 650 on the Verbal portion of the GRE?
First, we must compute the person’s z-score:
$$z_{GRE} = \frac{X_{GRE} - \bar{X}}{S} = \frac{650 - 500}{100} = 1.5$$
From Table A, Column C, we find that .0668 of the
area is in the tail.
Thus, the probability is about .07.
[Figure: normal curve on the z axis, marked at 0 and 1.5.]
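The table lookup can be cross-checked numerically. This sketch replaces Table A with the standard normal tail area computed from the complementary error function; the function name `upper_tail_probability` is an assumption for the illustration.

```python
import math

def upper_tail_probability(z):
    """P(Z >= z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

score, mean, sd = 650, 500, 100   # values from the GRE example
z = (score - mean) / sd           # z-score of the cutoff
p = upper_tail_probability(z)     # area in the tail beyond z

print(z, round(p, 4))  # 1.5 0.0668
```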
Some Notation
(Necessary to Distinguish Between Sample and Population)
Characteristic        Sample       Population
Mean                  $\bar{X}$    $\mu$
Standard Deviation    $S$          $\sigma$
Size                  $n$          $N$
In descriptive statistics, the distinction is not important, as the
sample and population numerical measures are the same.
Sampling
• A key to statistical inference is the assumption
that the sample is representative of the
population from which it was drawn.
• Random Sampling ensures that each possible
sample has an equal chance of being selected
and all members of the population have an
equal chance of being selected into the sample.
• We will assume that all samples are selected in
a random fashion. Please note that a bigger sample
is not better unless it is representative.
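Simple random sampling can be sketched with the standard library. The population of 100 numbered members, the sample size of 10, and the seed are hypothetical values chosen only for illustration.

```python
import random

# Hypothetical population: 100 members identified by the numbers 1..100.
population = list(range(1, 101))

rng = random.Random(42)  # fixed seed (an assumption) so the draw is repeatable

# rng.sample draws without replacement; every member has the same chance
# of selection, and every possible sample of size 10 is equally likely.
sample = rng.sample(population, k=10)

print(sorted(sample))
```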
Introduction to Inferential
Statistics
How do we get from a sample to a
prediction about a population?
In statistics, we use a sampling
distribution to infer the
characteristics of the population.
Some Definitions
• Sampling Distributions: A distribution of
statistics obtained by selecting all possible
samples of a specific size from a population.
• Sampling Error: The discrepancy between
the statistic obtained from the sample and the
parameter for the population.
• Standard Error: Measures how much error, on
average, should exist between the statistic and
the parameter. It is a measure of chance and is
the standard deviation of the sampling distribution.
What is the shape of the sampling
distribution and can we describe it
in terms of mean and standard
deviation (standard error)?
YES!
The answer is the Central Limit
Theorem (CLT).
Central Limit Theorem
Applied to Means
For any population with mean $\mu$ and
standard deviation $\sigma$, the distribution of
sample means $\bar{X}$ for sample size n (n > 30)
will have a mean of $\mu$ and a standard
deviation (standard error) of $\sigma / \sqrt{n}$, and will
approach a normal distribution as n
approaches infinity.
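A simulation can make the theorem concrete. The sketch below is an illustration under assumptions not found in the slides: the population is exponential with mean 1 and standard deviation 1 (a clearly non-normal shape), the sample size is n = 36, and 10,000 samples are drawn with a fixed seed.

```python
import math
import random
import statistics

rng = random.Random(1)   # fixed seed (an assumption) for repeatability
n = 36                   # sample size, chosen to exceed 30
num_samples = 10_000     # number of samples drawn from the population
mu, sigma = 1.0, 1.0     # an exponential(rate 1) population has mean 1 and SD 1

# Draw many samples of size n and record each sample mean.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
standard_error = sigma / math.sqrt(n)   # value predicted by the theorem

print("mean of sample means:", round(mean_of_means, 3), "predicted:", mu)
print("SD of sample means:  ", round(sd_of_means, 3),
      "predicted:", round(standard_error, 3))
```

The simulated values should land near the predicted ones, though not exactly on them, because only a finite number of samples is drawn.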
The Central Limit Theorem
Recap
• Regardless of the mean of the population, the
mean of the distribution of sample means
(sampling distribution) will equal the population mean.
• Regardless of the SD of the population, the SD of
the sampling distribution will equal the population SD
divided by the square root of the sample size.
• Regardless of the shape of the population, the
shape of the sampling distribution will be
approximately normal when the sample size is large.
An Example of a Sampling
Distribution of the Means
• First, assume that we have a Population
consisting of only four numbers (N = 4):
2, 4, 6, 8.
• Next, we will take all possible samples
of size n = 2 from this population,
sampling with replacement.
• We will calculate the mean $\bar{X}$ of each
sample that we obtain.
• Finally, we will plot the means in a
frequency histogram.
Our Population Parameters
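$$\mu = \frac{\sum X}{N} = \frac{20}{4} = 5$$
$$\sigma = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}} = \sqrt{\frac{120 - \frac{20^2}{4}}{4}} = 2.236$$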
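The arithmetic behind these two parameters can be reproduced directly. This is a small sketch that follows the computational formula for the population standard deviation and cross-checks it against `statistics.pstdev`; the variable names are illustrative.

```python
import math
import statistics

population = [2, 4, 6, 8]
N = len(population)

sum_x = sum(population)                    # sum of X   = 20
sum_x_sq = sum(x ** 2 for x in population) # sum of X^2 = 120

mu = sum_x / N                                       # population mean
sigma = math.sqrt((sum_x_sq - sum_x ** 2 / N) / N)   # population SD

print(mu, round(sigma, 3))  # 5.0 2.236

# Cross-check against the standard library's population SD.
assert math.isclose(sigma, statistics.pstdev(population))
```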
All Possible Samples (n = 2)
from our Population
First Pick   Second Pick   Mean      First Pick   Second Pick   Mean
    2            2           2           6            2           4
    2            4           3           6            4           5
    2            6           4           6            6           6
    2            8           5           6            8           7
    4            2           3           8            2           5
    4            4           4           8            4           6
    4            6           5           8            6           7
    4            8           6           8            8           8
These are all possible samples (16) of size n = 2 and the
means of those samples that can be taken from our
population of N = 4 objects.
Frequency Histogram of Means
[Figure: frequency histogram of all 16 means plotted; horizontal axis: Means (2 to 8); vertical axis: frequency (0 to 4).]
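The 16 samples and the frequency of each mean can be generated by enumeration. The sketch assumes ordered picks with replacement, which is what the table of samples shows; the text-based histogram is only a stand-in for the plotted one.

```python
from collections import Counter
from itertools import product

population = [2, 4, 6, 8]

# Every ordered pair (first pick, second pick), with replacement: 4 * 4 = 16.
samples = list(product(population, repeat=2))
means = [(first + second) / 2 for first, second in samples]

# Count how often each mean occurs and print a simple text histogram.
frequency = Counter(means)
for mean in sorted(frequency):
    print(mean, "*" * frequency[mean])
```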
Calculate the Mean of the Means
and Standard Deviation of the
Means
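$$\bar{X}_{\bar{X}} = \frac{\sum \bar{X}}{N} = \frac{80}{16} = 5$$
$$S_{\bar{X}} = \sqrt{\frac{\sum \bar{X}^2 - \frac{(\sum \bar{X})^2}{N}}{N}} = \sqrt{\frac{440 - \frac{80^2}{16}}{16}} = 1.58$$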
The Central Limit Theorem
Recall that the Central Limit Theorem states that the mean of
the sampling distribution of the means would be equal to $\mu$.
From the previous slide we calculated the mean of the
sampling distribution of our 16 means to be 5, which is the
population mean we calculated earlier.
Also, recall that the Central Limit Theorem states that the
standard deviation (standard error) would be equal to $\sigma / \sqrt{n}$.
If we take the standard deviation we calculated for our
population ($\sigma$ = 2.236) and divide it by the square root of our
sample size (n = 2), we obtain 2.236 / 1.4142 = 1.58,
which is exactly the standard deviation we calculated
from our 16 sample means.
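Both claims can be verified in a few lines for this small population. The sketch enumerates all 16 samples (ordered picks with replacement, as in the table of samples) and compares the standard deviation of their means with the population standard deviation divided by the square root of n; the variable names are illustrative.

```python
import math
import statistics
from itertools import product

population = [2, 4, 6, 8]
n = 2  # sample size

# Means of all 16 possible samples of size n.
means = [sum(sample) / n for sample in product(population, repeat=n)]

mean_of_means = statistics.fmean(means)   # mean of the sampling distribution
sd_of_means = statistics.pstdev(means)    # SD of the sampling distribution
standard_error = statistics.pstdev(population) / math.sqrt(n)  # sigma / sqrt(n)

print(mean_of_means, round(sd_of_means, 2), round(standard_error, 2))
# 5.0 1.58 1.58
```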
Summary of Central Limit
Theorem
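Population:
$$\mu = \frac{\sum X}{N} = \frac{20}{4} = 5$$
$$\sigma = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}} = 2.236$$
Standard error from the Central Limit Theorem:
$$\frac{\sigma}{\sqrt{n}} = \frac{2.236}{\sqrt{2}} = 1.58$$
Sampling distribution of the 16 means:
$$\bar{X}_{\bar{X}} = \frac{\sum \bar{X}}{N} = \frac{80}{16} = 5$$
$$S_{\bar{X}} = \sqrt{\frac{\sum \bar{X}^2 - \frac{(\sum \bar{X})^2}{N}}{N}} = \sqrt{\frac{440 - \frac{80^2}{16}}{16}} = 1.58$$
Note the symbols used to denote the mean of the means
and the standard error of the mean: $\bar{X}_{\bar{X}}$ and $S_{\bar{X}}$.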
Inferential Statistics
• We will use our knowledge of the sampling
distribution of the means (mean $\mu$, standard
error $\sigma / \sqrt{n}$) given by the Central Limit
Theorem as the basis for inferential statistics.
• We also will use our ability to locate a
single score in a distribution using z
scores in hypothesis testing.