Inferential Statistics

Download Report

Transcript Inferential Statistics

Inferential Statistics
Sampling, Probability, and
Hypothesis Testing
Review of Sampling
• Population – group of people, communities, or
organizations studied. Includes all possible objects of
study.
• Sampling frame list of people/organizations etc. in the
population who can be chosen for participation in the
study. Most sampling frames do not include all people in
the population (example – phone book)
• Sample – part of the population. Reduced down to
manageable size. Ideally we would want to draw a sample
that is representative of the population in terms of certain
key characteristics (for example, gender and age).
Important information about
samples
• For qualitative research, we are looking at specific
situations. It may not be important to have a
representative sample. We often use
nonprobability sampling with qualitative methods
(snowball, purposive, or convenience samples).
• For most types of quantitative research we do
want a sample that is representative of the
population. We will want to generalize our
findings from the sample to the population.
To generalize means that we can
say that we would expect to have
the same findings if we studied
everyone in the population as we
did when we looked at the
sample (within a certain degree
of probability)
In studies in which we will generalize
from the sample to the population
• We must have a sample that is similar or the same on
specific dimensions as the population.
• We will want to use inferential statistics to analyze our data
so that we can infer that findings from a sample are the
same as those we would get from the population.
• Theoretically, we must have a normal distribution in order
to use inferential statistics.
• We will use sampling methods in which every respondent
has a known probability of selection (probability sampling)
• The best type of sampling method to use with inferential
statistics is that in which each participant has an equal
probability of selection (random sampling).
Exceptions to this Rule
• The population under study is small enough that
everyone can be selected for participation (this
still allows you to use inferential statistics)
• Certain types of applied research using
quantitative methods such as community needs
assessments and some types of surveys in which it
is simply important to have as many people
respond as possible. However, we will not be able
to generalize our findings to the population.
We can choose random samples by
assigning a code number to each
respondent and:
• Pulling numbers out of a hat.
• Using a table of random numbers from a
statistics book.
• Generating a table of random numbers on a
computer.
Important Definitions
• Probability – the mathematical likelihood
that a certain event will occur. Probabilities
can range from 0 to 1.00
• Parameters describe the characteristics of a
population. (Variables such as age, gender,
income, etc.).
• Statistics describe the characteristics of a
sample on the same types of variables.
We apply some of the ideas of central limit
theorem to determining the probability that an
event in research will occur
• The Normal Curve can be viewed as a theoretical
frequency as well as a probability distribution for
normally distributed ratio and interval data.
• The area under the curve is equal to 100% or a
total probability of 1.00
• Probability looks at how many chances out of l00
something will occur.
• Odds are chances against an event occurring.
Concepts related to Sampling Error
• Sampling Error: The degree to which a sample differs on a key
variable from the population.
• Confidence Level:
The number of times out of 100 that the true value will fall within the
confidence interval.
• Confidence Interval:
A calculated range for the true value, based on the relative sizes of the
sample and the population.
• Why is Confidence Level Important? Confidence levels, which
indicate the level of error we are willing to accept, are based on the
concept of the normal curve and probabilities. Generally, we set this
level of confidence at either 90%, 95% or 99%. At a 95% confidence
level, 95 times out of 100 the true value will fall within the confidence
interval.
The term used to describe the
difference between sample
statistics and population
parameters is sampling error.
Confidence Level Example
Confidence level would fall between 2 points
We can theoretically draw
numerous samples from a
population that examine the value
of one variable. The more
samples we draw from the
population, the more likely it is
that the frequency distribution of
that variable will resemble a
normal distribution
Important concepts about sampling
distributions:
• If a sample is representative of the population, the mean
(on a variable of interest) for the sample and the population
should be the same.
• However, there will be some variation in the value of
sample means due to random or sampling error. This refers
to things you can’t necessarily control in a study or when
you collect a sample.
• The amount of variation that exists among sample means
from a population is called the standard error of the mean.
• Standard error decreases as sample size increases.
Two primary types of
hypotheses:
• Research hypothesis. Specifies expected
relationship between two or more variables.
Also called alternative hypothesis. May be
symbolized by H1 or Ha.
• Null hypothesis. Statement that says there is
no real relationship between the variables
described in the research hypothesis.
Both null and research
hypotheses can be:
• Nondirectional: The direction of the relationship
between the two variables is not specified. Usually
relationships are positive or negative. A
nondirectional relationship is also called a “twotailed relationship:
• Directional or one-tailed. This means that the
hypothesis explicitly states whether the
relationship is positive or negative.
Example: Types of Relationships
Positive
Income
($)
Negative
Education
(yrs)
Income
($)
No Relationship
Education
(yrs)
Income
($)
Education
(yrs)
20,000
10 20,000
18 20,000
14
30,000
12 30,000
16 30,000
18
40,000
14 40,000
14 40,000
10
50,000
16 50,000
12 50,000
12
75,000
18 75,000
10 75,000
16
Income
Income and Education
80000
60000
40000
20000
0
Income
Education
1
2
3
Education
4
5
In inferential statistics, the
hypothesis that is actually tested
in the null hypothesis. Therefore
what we must do is disprove that a
relationship between the variables
does not exist.
Examples of research hypotheses
• There is an association between gender and scores
on the self-esteem scale.
• Women will have higher levels of low self-esteem
than men.
• There is a positive association between education
and income.
• Scores on the MAST are negatively associated
with successful treatment for alcoholism.
(Are these hypotheses directional or nondirectional?
What is the null hypotheses for each statement?)
Steps in the Hypothesis Testing
Process
• State the research hypothesis
• State the null hypothesis
• Choose a level of statistical significance
(alpha level)
• Select and compute the test statistic
• Make a decision regarding whether to
accept or reject the null hypothesis.
Significance level (also called
confidence level or alpha):
• How big a risk do we want to take in rejecting the null hypothesis
when it actually is true.
• Significance levels are generally less than or equal to .10 (making an
error 10 times out of 100) but can be less than or equal to .05 or .01.
Larger data sets permit the researcher to set the significance level
lower.
• The use of the confidence level allows us to create a decision rule. If
the alpha (also called probability level or p) is lower than our
confidence level, we will reject the null hypothesis and accept the
research hypothesis.
• What we are really asking is whether the apparent relationship between
our two variables really exists or is it due to random chance or error.
• SPSS print outs will contain information on the test statistic as well as
the p or probability value.
For example:
• We are looking at the correlation between
income and education. A correlation (r) is a
measure of nondirectional association
between two variables.
• The information on our computer print out
is r = .75, p. = .04. Our confidence level is
.05. Would we accept or reject the null
hypothesis?
Another example:
• A t-test is a measure that indicates whether there is
a large difference between mean scores on one
variable among members of two groups. We are
looking at the mean scores on the self-esteem
scale for men and women.
• T = 1.8, p. = .12. Confidence level = .10.
Would we accept or reject the null hypothesis?
When we say that something is
statistically significant, it means
that the probability of something
happening by chance is less than
our confidence or significance
level.
We can make two types of errors
in hypothesis testing:
In the population,
Ho actually is:
Not reject Ho
Reject Ho
True
Correct decision made
Type 1 error
Researcher thinks there is
an actual relationship
between the variables when
there is not
False
Type II error
Correct decision made
There is an actual
relationship between
variables although
researcher has accepted null
hypothesis