Chapter 6-7-8 Sampling Distributions and Hypothesis Testing

• When we have a frequency distribution, or histogram, we can determine probabilities. Look at the M&M example.
• What is one of the most common shapes of frequency distributions? The normal distribution.
Again, all normal distributions are characterized by the mean and the standard deviation. There are an infinite number of normal distributions.
• But some are very special to us, like the Standard Normal Distribution (mean = 0, standard deviation = 1).

– ALL normal distributions can be standardized.
– All scores are put in terms of standard deviation units from the mean: z = (X - μ)/σ.
– SO, we know the proportions, and hence the probabilities, associated with scores that fall in a normal distribution. We just did that in Chapter 5.
100% of our observations appear in the normal distribution.
Proportions and probabilities are the same.
• What proportion of scores fall above a z-score of 1?
• What is the probability that a randomly chosen z-score will be 1 or higher?
• What is the probability that a randomly chosen z-score will fall between 0 and .5?
• What z-score has only a .05 probability (a 5% chance) of being equaled or exceeded?
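The four questions above can all be answered from the standard normal curve. A minimal sketch using Python's built-in `statistics.NormalDist` (Python 3.8+) in place of a printed z table:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # the standard normal distribution

# Proportion (= probability) of z-scores at or above 1
p_above_1 = 1 - std_normal.cdf(1)

# Probability that a z-score falls between 0 and .5
p_0_to_half = std_normal.cdf(0.5) - std_normal.cdf(0)

# The z-score that is equaled or exceeded with only a .05 probability
z_top_5pct = std_normal.inv_cdf(1 - 0.05)

print(round(p_above_1, 4))    # 0.1587
print(round(p_0_to_half, 4))  # 0.1915
print(round(z_top_5pct, 3))   # 1.645
```

Because the normal curve is continuous, "above 1" and "1 or higher" give the same proportion.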

More

We can also look at specific scores (X), convert them into z-scores, and find the probability of getting a score that high or higher, lower than that score, and so on.
– Given sigma (σ) = 100 and the mean (μ) = 500, what is the probability of getting a 600 or higher?
– 1) Convert to z: (600 - 500)/100 = 1.
– 2) What proportion of the distribution falls at or above a z-score of 1?
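The two steps above can be sketched as a small function. This is an illustration only; `prob_at_or_above` is a hypothetical helper name, and the normal curve comes from Python's built-in `statistics.NormalDist`:

```python
from statistics import NormalDist

def prob_at_or_above(x, mu, sigma):
    """P(X >= x) for a normally distributed score, found via its z-score."""
    z = (x - mu) / sigma               # 1) convert the raw score to z
    return z, 1 - NormalDist().cdf(z)  # 2) proportion at or above that z

z, p = prob_at_or_above(600, mu=500, sigma=100)
print(z, round(p, 4))  # 1.0 0.1587
```

Equivalently, `1 - NormalDist(500, 100).cdf(600)` skips the explicit z step and gives the same answer.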
The past
What we have been doing is descriptive statistics.
• We have come up with distributions, measures of central tendency, and measures of variability, all of which describe a population or a sample.
• We can use these, as we have found out, to find the probability of a score, a range of scores, etc.
• But statistics, z-scores, probabilities, etc., can be used for more interesting purposes.

The future

Inferential statistics – Estimate population parameters from a sample, or determine whether two samples are different.
– Hypothesis testing – Is the population parameter equal to some specific value?
– Ex. This class (a random sample) takes a study skills course: seating, classroom tips, study habits.
– G.P.A. – Is the G.P.A. of this class now different from that of MSU students generally (the population)?
Well, let’s think about this.




Of course, if we were to randomly sample 50 MSU students and get their mean GPA, it would be a little different from the actual population mean GPA.
There will always be a little error: the sample mean will probably not equal the population mean unless every member of the population is in our sample.
The quantification of this discrepancy is called Sampling Error: the discrepancy, or amount of error, between a sample statistic and its corresponding parameter.
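To make the definition concrete, here is a minimal simulation. The population is hypothetical (10,000 made-up GPAs centered near 2.74 and clipped to the 0.0 to 4.0 scale), not real MSU data; the point is only that the sample mean misses μ by some amount, and that amount is the sampling error.

```python
import random
from statistics import fmean

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 GPAs (made-up, clipped to the 0.0-4.0 scale)
population = [min(4.0, max(0.0, random.gauss(2.74, 0.5))) for _ in range(10_000)]
mu = fmean(population)  # the population parameter

sample = random.sample(population, 50)  # random sample of 50 students, without replacement
m = fmean(sample)                       # the sample statistic

sampling_error = m - mu  # discrepancy between the statistic and its parameter
print(f"mu = {mu:.3f}, sample mean = {m:.3f}, sampling error = {sampling_error:+.3f}")
```

Changing the seed gives a different sample and therefore a different sampling error, which is exactly the problem the next slides address.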

Also, we can take numerous samples. For example, the next day I could get the GPAs of 40 different students. The mean GPA for this sample will also be a little different from the true population mean. ALSO, this second sample will have a mean that is slightly different from our first sample mean.
– In fact, we could take a huge number of samples and get a huge number of sample means.

So, how do we use a given sample to estimate the population if every sample will be a little different?
Sampling Distribution


To answer this, we have to create a sampling distribution of a statistic (mean, median).
In particular, we will use a Sampling Distribution of Sample Means:
– This is the collection of sample means for all the possible random samples of a particular size (n) that could be obtained from a population.

OR
– The distribution of a statistic (the mean) over repeated sampling from a specified population.
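The "repeated sampling" definition can be approximated by simulation. A minimal sketch, assuming a hypothetical GPA population (made-up values, not MSU data) and using 2,000 random samples as a stand-in for "all possible samples", which are far too many to list:

```python
import random
from statistics import fmean, pstdev

random.seed(2)

# Hypothetical population of GPAs (made-up, clipped to 0.0-4.0)
population = [min(4.0, max(0.0, random.gauss(2.74, 0.5))) for _ in range(10_000)]
mu = fmean(population)

n = 50               # size of every sample
num_samples = 2_000  # stands in for "all possible samples"

# Each element is the mean of one random sample of size n
sample_means = [fmean(random.sample(population, n)) for _ in range(num_samples)]

print(f"population mean:          {mu:.3f}")
print(f"mean of the sample means: {fmean(sample_means):.3f}")
print(f"SD of the sample means:   {pstdev(sample_means):.3f}")
print(f"SD of the population:     {pstdev(population):.3f}")
```

The list `sample_means` is the (approximate) sampling distribution: its center sits very close to μ, and its spread is much smaller than the spread of individual scores.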


Sampling distribution of sample means (the most common): for G.P.A., say the MSU population mean is 2.74; this is the distribution of the means of an infinite number of random samples.
We have been looking at distributions of SCORES; now we are going to look at distributions of all possible SAMPLE MEANS.
• We are dealing with a particular type of sampling distribution: a distribution of statistics (e.g., the mean) obtained by selecting all the possible samples of a specific size from a population.

DRAW SAMPLING DISTRIBUTION OF MEANS: N = 50
• Distribution of means if we sample 50 students and assume the population mean is 2.74:
• Sample 1: 2.77
• Sample 2: 2.91
• Sample 3: 2.55
• Sample 4: 3.77
NOTE: This is similar to what we were doing with z-scores. We were looking at where a z-score falls in a distribution of scores. Now we are looking at where a sample statistic (in this case the mean) falls in a distribution of sample means.
• If the sample mean is close to the middle of the distribution, we retain the null hypothesis (no difference).
• If it is far from the middle, the sample is unlikely under the null hypothesis, so we reject the null hypothesis.

• Sampling Error: the variability of a statistic from sample to sample, due to chance.
• Standard Error: the standard deviation of the sampling distribution of sample means drawn from the population, σ/√n (sigma / sqrt n).
As usual, n = sample size, which must be taken into account when calculating the standard error.
• Obviously, the larger the sample, the closer the sample means will tend to be to the population mean (i.e., less error). So, we have to take sample size into account.
• Law of large numbers: the larger the sample size, the more probable it is that the sample mean will be close to the population mean.
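The law of large numbers can be checked by simulation. A minimal sketch, assuming a hypothetical normal population with μ = 2.74 and σ = 0.5; `avg_distance_from_mu` is an illustrative helper, not a standard function. Any single sample can still land far from μ, so the sketch averages the distance over many samples of each size:

```python
import random
from statistics import fmean

random.seed(3)
mu, sigma = 2.74, 0.5  # hypothetical population parameters (normal population assumed)

def avg_distance_from_mu(n, reps=500):
    """Average |sample mean - mu| over `reps` random samples of size n."""
    distances = []
    for _ in range(reps):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        distances.append(abs(fmean(sample) - mu))
    return fmean(distances)

results = {n: avg_distance_from_mu(n) for n in (5, 50, 500)}
for n, d in results.items():
    print(f"n = {n:>3}: average distance from mu = {d:.4f}")
```

The average distance shrinks as n grows, which is the "less error with larger samples" idea in numbers.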


When n = 1, the standard error equals the standard deviation (σ/√1 = σ).
As n increases, the standard error decreases, because σ is divided by √n. The equation takes this into account.
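A minimal sketch of the standard error formula, reusing σ = 100 from the earlier score example; `standard_error` is an illustrative helper name:

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 100  # population SD from the earlier example (mu = 500, sigma = 100)
for n in (1, 4, 25, 100):
    print(n, standard_error(sigma, n))
# 1 100.0
# 4 50.0
# 25 20.0
# 100 10.0
```

With n = 1 the standard error is σ itself; quadrupling n halves it, because of the square root.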
• There is a great mathematical theorem that tells us the general properties of our sampling distribution as our sample size gets larger and larger.

Central Limit Theorem:


From the book: For any population with a mean (μ) and a standard deviation (σ), the distribution of sample means for sample size n will have a mean of μ and a standard deviation of σ/√n, and will approach a normal distribution as n approaches infinity.
– So what is this saying?

As n increases, sample means and standard deviations approach those of the population.
– As a rule of thumb, with a sample size of 30+, the distribution of sample means is practically normal.
– So, we have a clue about the mean of the sampling distribution, its standard deviation, and its shape (normal). What can we do with this information?
This allows us to know the distribution of sample
means for any population, regardless of the mean
and SD, and even if the population distribution is
not normal.
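A minimal simulation of the Central Limit Theorem, assuming a hypothetical, strongly right-skewed population (exponential with mean 2, whose SD is also 2). The mean and SD of the sample means are compared with the theorem's predictions, μ and σ/√n, and skewness (near 0 for a normal shape) shows the sample means becoming much more symmetric than the raw scores:

```python
import math
import random
from statistics import fmean, pstdev

random.seed(4)

def skewness(xs):
    """Population skewness: near 0 for a symmetric distribution such as the normal."""
    m, s = fmean(xs), pstdev(xs)
    return fmean([((x - m) / s) ** 3 for x in xs])

# Hypothetical, right-skewed population: exponential with mean 2 (its SD is also 2)
mu = sigma = 2.0
n = 30
num_samples = 5_000

raw_scores = [random.expovariate(1 / mu) for _ in range(num_samples)]
sample_means = [fmean([random.expovariate(1 / mu) for _ in range(n)]) for _ in range(num_samples)]

print(f"mean of sample means: {fmean(sample_means):.3f}  (CLT predicts {mu})")
print(f"SD of sample means:   {pstdev(sample_means):.3f}  (CLT predicts {sigma / math.sqrt(n):.3f})")
print(f"skewness of raw scores:   {skewness(raw_scores):.2f}")
print(f"skewness of sample means: {skewness(sample_means):.2f}")
```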
Back to our example:
• MSU Mean: 2.53
• Class Mean: 3.02
• There may be no relationship between this class (the intervention) and G.P.A.
Goal: Determine whether this difference is due to chance (sampling error).
• We can determine, with probabilities, how likely or unlikely it is that this difference is due to chance.
• If this class is different, then we can classify it as a different population with different population parameters (a higher mean).


A statistical test will answer this question for us:
HYPOTHESIS TESTING!
A hypothesis test = a statistical procedure that uses sample data to evaluate hypotheses about a population parameter.
• General steps:

– 1) Generate a hypothesis about the population mean.
– 2) So, we hypothesize that our sample mean will be close to this guess regarding the population mean.
– 3) Obtain a sample and the sample mean.
– 4) Compare the sample and population means.
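The four steps can be sketched as a one-sample z test. The hypothesized population mean (2.53) and the class mean (3.02) come from the example above; the population SD (σ = 0.6) and the class size (n = 30) are hypothetical values chosen only for illustration, since the slides do not give them.

```python
import math
from statistics import NormalDist

mu_0 = 2.53         # 1) hypothesized population mean (MSU)
sample_mean = 3.02  # 3) observed class mean
sigma = 0.6         # ASSUMPTION: hypothetical population SD
n = 30              # ASSUMPTION: hypothetical class size

# 2) Under the hypothesis, sample means cluster around mu_0 with spread sigma / sqrt(n)
standard_error = sigma / math.sqrt(n)

# 4) Compare: how many standard errors is the class mean from mu_0?
z = (sample_mean - mu_0) / standard_error

# Probability of a sample mean at least this far from mu_0 (either direction) by chance alone
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, two-tailed probability = {p_two_tailed:.6f}")
```

With these made-up values the class mean sits about 4.5 standard errors above 2.53, a result that would be very unlikely if it were only sampling error.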
1. Set up the Null Hypothesis (H0):

The null hypothesis always says the opposite of that in which we are interested. In other words, it says there is no difference between our groups, or, if we are only interested in whether our group is better, that our group is equal to or worse than the other.
We can never prove that something is true; we can only show that something is unlikely to be true, given our data, and reject it.
We are usually working to reject the null hypothesis.
Note: Assuming the null is true, we create our sampling distribution; in this case, the sampling distribution of means.
H0: μclass = 2.53
2. Set up the “Alternative Hypothesis” (what we want to find):

H1: μclass ≠ 2.53
We do this before we collect our data. The mean could be higher or lower: maybe our class hurts people’s G.P.A.
3. Set a criterion level for our decision:
• How far away does the mean have to be for us to reasonably doubt that this sample came from the same population?
• When are we going to say this sample is the same as the population (just sampling error), and when are we going to say this sample is different from the population?
Significance level (alpha, α): a predetermined probability that marks a sample result as so rare or unusual that it casts doubt on the accuracy of H0.
– The probability with which we are willing to reject H0 when it is correct.
– Rejection region: the set of outcomes from an experiment that will lead to a rejection of H0.

Typically:
– Choose alpha = .05 (5%).
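With alpha = .05 and a two-tailed alternative such as H1: μclass ≠ 2.53, the rejection region is the outer 2.5% in each tail of the standard normal curve. A minimal sketch, assuming the test statistic is a z-score (as in a one-sample z test with known σ); `decide` is a hypothetical helper, and the z values passed to it are made up:

```python
from statistics import NormalDist

alpha = 0.05
# Two-tailed test: split alpha equally between the two tails
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

def decide(z, z_crit):
    """Reject H0 if z lands in the rejection region (|z| >= z_crit); otherwise retain it."""
    return "reject H0" if abs(z) >= z_crit else "retain H0"

print(round(z_crit, 2))       # 1.96
print(decide(1.20, z_crit))   # retain H0
print(decide(-2.50, z_crit))  # reject H0
```

For a one-tailed test (only "our group is better"), all of alpha goes in one tail, and the cutoff would instead be `NormalDist().inv_cdf(1 - alpha)`, about 1.645.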