Inferential Statistics - Gail Johnson's Research Demystified

Inferential Statistics
Research Methods for Public Administrators
Dr. Gail Johnson
Dr. G. Johnson,
www.researchdemystified.org
Welcome to Inferential Statistics
 This is a companion to Sampling Demystified
 It could be argued that this should follow that chapter
 If the results are not statistically significant, no further analysis is warranted
 But some people find inferential statistics overwhelming, so I saved it for last
 There is much that can be done with descriptive
data analysis but it gets overshadowed by the
fancier statistics of regression and inference.
Welcome to Inferential Statistics
 Used when working with data from random
samples
 Used when researchers want to infer
conclusions about a population based on
results from a randomly selected sample
from that population
 Hence the term “Inferential”
 Jargon term: generalizability
Inferential Statistics: A Powerful
Analytical Tool
 Enables researchers to:
 Estimate population proportions
 Estimate population mean
 Estimate sampling error
 Estimate confidence intervals
 Test for statistical significance
Confidence Revisited
 Estimate the population mean or proportion based
on the sample survey
 Confidence level: the social science standard is 95%
 We are 95% certain that our population estimate is correct within a specified range
 That range is the precision of the estimate
 A 90% confidence level is the lowest level that should be used
 In some cases, the researchers might want to raise the bar to 99%, to be very, very certain
Confidence Revisited
 Confidence interval: this is the range within which the true mean is expected to fall
 Social science standard for the confidence interval is plus or minus 5%
 Sampling error is the analogous term when working with proportions, as with survey data
 Sometimes called the margin of error
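As a rough illustration of how a confidence interval around a sample mean can be built, the sketch below uses only Python's standard library. The hours-worked numbers and the function name are made up for the example, and it relies on the normal (z) multiplier, which is only a reasonable approximation for larger samples.

```python
import math
import statistics

def mean_confidence_interval(values, confidence=0.95):
    """Return (sample mean, margin) using a normal approximation."""
    n = len(values)
    mean = statistics.fmean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    # z multiplier: about 1.96 for 95%, about 2.58 for 99%
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, z * std_err

# Hypothetical sample: weekly hours worked by 40 respondents
hours = [38, 42, 45, 40, 39, 41, 44, 43, 37, 46] * 4
mean, margin = mean_confidence_interval(hours)
print(f"mean = {mean:.1f}, 95% CI = {mean - margin:.1f} to {mean + margin:.1f}")
```

Raising the confidence level to 99% widens the interval: more certainty costs precision.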
Sampling Error: Revisited
 Most familiar in polling data:
 Big national surveys typically use a 95 percent confidence level with a margin of error of +/- 3%
 That means that if the researchers had surveyed everyone, they are 95% certain that the results would be within +/- 3% of the results from the survey.
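To see where a figure like +/- 3% comes from, here is a minimal sketch of the usual margin-of-error formula for a proportion from a simple random sample. The function name and the sample sizes are illustrative assumptions, not numbers from any particular poll.

```python
import math
from statistics import NormalDist

def margin_of_error(p, n, confidence=0.95):
    """Margin of error for a sample proportion p from a simple random sample of size n."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(p * (1 - p) / n)

# p = 0.5 is the worst case (widest margin); sample sizes are illustrative
for n in (400, 1000, 1067, 2400):
    print(n, f"+/- {margin_of_error(0.5, n) * 100:.1f} points")
```

Under these assumptions, a sample of roughly 1,000 people is enough for a margin of about 3 points, and about 400 people gives a margin of about 5 points.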
Sampling Error: Revisited
 11/15/09 Poll: Views on Cap and Trade.

There's a proposed system called "cap and trade." The government
would issue permits limiting the amount of greenhouse gases
companies can put out. Companies that did not use all their permits
could sell them to other companies. The idea is that many
companies would find ways to put out less greenhouse gases,
because that would be cheaper than buying permits. Would you
support or oppose this system?
 Results:
Support: 53%
Oppose: 42%
Sampling Error: Revisited
 The sampling error is plus or minus 3 percent
 If they had surveyed everyone:
 The real percentage supporting cap and trade would be between 50% and 56%
 The real percentage opposing cap and trade would be between 39% and 45%
Sampling Error: Revisited
 Sampling error provides a likely range for
the true proportion in the population
 If the ranges set by the sampling errors overlap, then there is no discernible difference in the views: “too close to call”
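The overlap check can be written out in a few lines. This sketch applies it to the cap and trade poll (53% support, 42% oppose, plus or minus 3) and to a second, hypothetical result that is much closer; the function names are invented for the example.

```python
def interval(pct, error):
    """Range of plausible population values, in percentage points."""
    return (pct - error, pct + error)

def overlaps(a, b):
    """True when two (low, high) ranges share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

support = interval(53, 3)   # (50, 56)
oppose = interval(42, 3)    # (39, 45)
print(overlaps(support, oppose))  # False: the gap is larger than the sampling error

# Hypothetical closer result: 49% vs 47% with the same +/- 3
print(overlaps(interval(49, 3), interval(47, 3)))  # True: too close to call
```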
Statistical Significance
 When working with random sample data,
the big question is:
 How likely is it that these results are a fairly accurate reflection of the larger population from which the sample was taken?
 Put another way: are these results just a quirk of chance?
Statistical Significance
 Statisticians have provided researchers with
analytical techniques to estimate how likely
it is that the researchers have gotten the
results they see in their analysis of sample
data by chance.
 These techniques are called tests of
statistical significance.
Statistical Significance
 We do not need to understand calculus in order to
understand how to interpret tests of statistical
significance
 We just have to have faith that the statisticians
have figured out the correct theories and that
computers have been programmed to give correct
results.
 I Believe, I Believe!
Statistical Significance
 The logic will seem familiar.
 Researchers set a standard for determining how much
risk they are willing to take that the observed results are
due to random chance
 The social science standard or convention is to set the alpha level (the cutoff for the p value) at .05 or less.
 They run the statistical significance test.
 If the test comes in at .05 or less, the researchers conclude that there is little probability (5 percent or less) that the results are due to chance.
Another Way To Understand
Statistical Significance
 If there were really no difference in the population and I took 100 random samples from it, only about 5 out of 100 would show results like the ones I have gotten.
 It is unlikely, therefore, that my results are just a quirk of chance.
 I am willing to take a risk that my sample results fairly accurately capture what is true in the larger population from which the sample was selected.
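One way to see this logic in action is to let a computer draw the repeated samples. The sketch below assumes a population split 50/50 (the "no difference" case) and a hypothetical survey of 400 people in which 224 said yes. These numbers and the function name are invented for illustration.

```python
import random

def share_as_extreme(observed_yes, n=400, null_p=0.5, draws=2000, seed=1):
    """Fraction of simulated samples at least as far from the null as the observed one."""
    rng = random.Random(seed)
    expected = n * null_p
    gap = abs(observed_yes - expected)
    hits = 0
    for _ in range(draws):
        # Draw one random sample of n people from a population where null_p say yes
        yes = sum(rng.random() < null_p for _ in range(n))
        if abs(yes - expected) >= gap:
            hits += 1
    return hits / draws

# Hypothetical survey: 224 of 400 respondents (56%) said yes
p = share_as_extreme(224)
print(f"{p:.3f} of the simulated samples were this unusual")
```

The returned fraction plays the role of the p value: when it is .05 or less, a result like the observed one would be rare if the population really were split evenly.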
How much risk?
 It All Depends!
 The standard is .05 or less, meaning there is a 95% chance of being reasonably accurate (i.e., within sampling error)
 I could raise the bar and set the standard at .01 or less, meaning there is a 99% chance of being accurate
 I could lower the bar and set the standard at .10, meaning there is a 90% chance of being accurate
Statistical Significance: The Logic of
Hypothesis Testing
 Research Hypothesis:
 Women and men earn different salaries.
 Null Hypothesis:
 There is no difference between women’s and men’s salaries.
 Remember: the null hypothesis is always one of
“no difference”
Steps In The Process
 Collect salary data from a random sample of men
and women across the U.S.
 Analyze the data
 There is a $5,000 difference
 Because I am working with random sample data, I have to determine whether this $5,000 difference is the result of chance
 In the jargon: is this difference statistically significant?
Testing for Statistical
Significance:
 Testing against the Null Hypothesis:
 What is the probability of getting a $5,000 difference in my sample results if there really is no difference in the population from which the sample was drawn?
 I set the alpha level (the cutoff for the p value) at .05.
 I run the test for statistical significance.
Testing for Statistical
Significance:
 If the test result (the p value) is .05 or less, I reject the null hypothesis
 This means that the probability of getting the $5,000
difference when there really is no difference in the
population is 5% or less. I am willing to take the risk
and therefore I reject the null hypothesis.
 I conclude that there is a $5,000 difference in
salaries between men and women, and that
difference is statistically significant.
Testing for Statistical
Significance:
 If the test is more than .05, there is too great a
chance that the results do not reflect the
population.
 This $5,000 difference might be due to random chance.
 I would conclude that this salary difference is not
statistically significant.
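The reject / fail-to-reject decision can be sketched with a permutation test, which is one simple stand-in for the significance tests named in these slides (it is not the t-test itself). It asks how often a gap as large as the observed one appears when the "men" and "women" labels are shuffled at random, which is what "no difference" would look like. The salary data are randomly generated and purely hypothetical, as are the function and variable names.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, shuffles=2000, seed=0):
    """Two-sided p value: share of random relabelings with a mean gap at least as large."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(shuffles):
        rng.shuffle(pooled)
        a, b = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(statistics.fmean(a) - statistics.fmean(b)) >= observed:
            hits += 1
    return hits / shuffles

# Hypothetical salaries, generated only for illustration
rng = random.Random(42)
men = [rng.gauss(38_000, 8_000) for _ in range(60)]
women = [rng.gauss(33_000, 8_000) for _ in range(60)]

p = permutation_p_value(men, women)
alpha = 0.05
print("reject the null" if p <= alpha else "fail to reject the null", p)
```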
Remember:
 A statistical significance test is nothing
more than a determination of the probability
of getting the results the researchers got by
chance.
Common Tests for Statistical
Significance
 Chi Square: nominal and ordinal data
 T-tests: DV (dependent variable): interval/ratio data; IV (independent variable): nominal/ordinal with 2 categories
 ANOVA: DV: interval/ratio data; IV: nominal/ordinal with 3+ categories
 F-tests: interval/ratio data
Statistical Significance
 There are 100+ kinds of tests for statistical significance.
 Good news! They all get interpreted the same way.
 If researchers set the probability level at .05:
 Then anything that is .05 or less is statistically
significant.
 And anything that is more than .05 is not statistically
significant.
Test for Statistical Significance:
Chi Square
 Use with crosstabs
 Chi Square is based on a mathematical formula that looks at the differences between the actual data and how the data would have looked if there were no difference.
 The more difference there is, the more likely that
the results will be statistically significant.
Chi Square
 If there were no difference in attitudes based on gender (which is our null hypothesis), we would expect our crosstab to show results similar to this:
        For  Against
Men      50       50
Women    50       50
Chi Square
 But what if our respondents actually
reported this way:
        For  Against
Men      75       25
Women    25       75
 Clearly, there is a difference in attitudes
based on gender.
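The comparison of actual counts against "no difference" counts can be worked through by hand. The sketch below computes the chi-square statistic for the 75/25 crosstab and turns it into a p value using a shortcut that holds only for 2x2 tables (one degree of freedom). The function names are invented for the example, and no continuity correction is applied.

```python
import math

def chi_square(observed):
    """Chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count we would expect in this cell if there were no difference
            expected = row_totals[i] * col_totals[j] / total
            stat += (count - expected) ** 2 / expected
    return stat

def p_value_2x2(stat):
    """Upper-tail probability for 1 degree of freedom (2x2 tables only)."""
    return math.erfc(math.sqrt(stat / 2))

table = [[75, 25],   # men: for, against
         [25, 75]]   # women: for, against
stat = chi_square(table)
print(stat, p_value_2x2(stat))
```

For the 50/50/50/50 table the statistic is zero, because the actual counts match the expected counts exactly.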
Example: Gender and Gun Law
 Are views on gun permit laws different
based on gender?
 Results: it appears that women are somewhat more likely (89%) than men (77%) to favor a gun permit law.
 But are these results statistically significant?
 The computer calculates a p value of .001
 Conclusion?
Example: Gender and Abortion
Attitudes
 Are views on abortion for any reason
different based on gender?
 48 percent of men favor abortion for any
reason as compared to 49 percent of
women.
 But are these results statistically significant?
 The computer calculates a p value of .78
 Conclusion?
Statistical Significance: T-Tests
 Used with means, comparison of means
 Single Mean:
 Interval/ratio data where you are comparing to a known population mean
 Paired Means:
 before and after design
 Independent Means:
 comparing 2 means
 For t-tests: the dependent variable must be
interval or ratio level data.
Testing a Hypothesis about a
Single Mean:
 Research hypothesis: There is a difference
in average hours worked as compared to
“40.”
 Null: not different from 40
 Results: Average number of hours = 42.
 T-test (p value) = .000 (the software's rounded display of a very small p value)
 Interpretation?
Interpretation Process
 In this case, you are comparing the actual result
against the assumption that the norm is 40 hours.

 How likely is it to get 42 hours if the real average in the population is 40?
 The p value is less than .05
 It is very unlikely you would have gotten these results
by chance alone, so you reject the null hypothesis.
 Conclusion: the average number of hours worked
is 42 and these results are statistically significant.
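A single-mean test can be sketched as follows. The hours data are hypothetical (constructed to average 42), and the p value uses a normal approximation in place of the t distribution, which is an assumption that is only reasonable for larger samples; statistical software would use the t distribution directly.

```python
import math
import statistics
from statistics import NormalDist

def one_sample_test(values, null_mean):
    """Return (t statistic, approximate two-tailed p value)."""
    n = len(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    t = (statistics.fmean(values) - null_mean) / std_err
    # Normal approximation to the t distribution; reasonable only for larger n
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

# Hypothetical weekly hours for 100 respondents, averaging 42
hours = [36, 39, 41, 42, 42, 43, 44, 45, 46, 42] * 10
t, p = one_sample_test(hours, null_mean=40)
print(f"t = {t:.2f}, p = {p:.4f}")
```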
Independent T-Test:
Gender and Income
 Is there a difference in men’s and women’s
income?
 The research hypothesis is that there is a
difference in salaries.
 The null hypothesis is that there is no
difference:

Technically: The groups are independent or there is
no difference in the population means for these 2
groups.
Independent T-Test:
Gender and Income
 We collect the data and compare means
 We run an independent t-test
 Note: this test can only be used with a nominal
independent variable with two values like gender, and
an interval/ratio level dependent variable
 Results:
Mean for men: $38,000
Mean for women: $33,000
T-test (p value) = .001
 Interpretation?
F-Tests with Analysis of Variance
 Used when researchers have an independent
variable with more than 2 categories
 Examples:
 Religion (Christian, Jewish, Muslim, Buddhist, None)
 Marital status (single, married, divorced)
 Education (HS, College, Graduate Degree)
Example: Working The Statistical
Significance Logic
 Is there a difference in income based on whether one has a high school degree or less, some college or a completed bachelor’s degree, or a graduate degree?
 Your Research Hypothesis is?
 Your Null Hypothesis is?
Results: Education and Income
 HS or less: $29,225
 College: $46,764
 Graduate: $62,275
 But are these results statistically significant?
 F-test (p value) = .001
 Your Conclusion?
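For readers curious about what sits behind the F-test, here is a minimal sketch of the F statistic for a one-way analysis of variance: the variation between group means divided by the variation within groups. The incomes are hypothetical, not the data behind the slide, and converting F into a p value requires the F distribution, which statistical software supplies.

```python
import statistics

def f_statistic(groups):
    """One-way ANOVA F: between-group variance over within-group variance."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.fmean(all_values)
    k, n = len(groups), len(all_values)
    between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    within = sum((v - statistics.fmean(g)) ** 2 for g in groups for v in g)
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical incomes (in $1,000s) for three education groups
hs = [25, 28, 30, 31, 32]
college = [42, 45, 47, 48, 52]
graduate = [58, 60, 62, 64, 67]
print(f_statistic([hs, college, graduate]))
```

A large F means the group means are far apart relative to the spread inside each group, which is what makes a small p value likely.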
But There Is Potential For Error
 Type I and Type II Errors
 Type I Error:
 This occurs when the null hypothesis is rejected even though it is actually true.
 “There really is no difference in salaries in the population but we concluded that there was a statistically significant difference.”
 In very large samples, small differences will be
found to be statistically significant.
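The point about very large samples can be checked with a quick calculation. This sketch runs a standard two-proportion z-test on the same small gap at three sample sizes. The percentages, group sizes, and function name are all hypothetical, and equal group sizes are assumed for simplicity.

```python
import math
from statistics import NormalDist

def two_proportion_p(p1, p2, n_per_group):
    """Two-tailed p value for a difference in proportions, equal group sizes."""
    pooled = (p1 + p2) / 2
    std_err = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = abs(p1 - p2) / std_err
    return 2 * (1 - NormalDist().cdf(z))

# Same hypothetical 2-point gap (52% vs 50%) at growing sample sizes
for n in (500, 10_000, 100_000):
    p = two_proportion_p(0.52, 0.50, n)
    print(n, round(p, 4), "significant" if p <= 0.05 else "not significant")
```

The gap itself never changes; only the sample size does, and with it the verdict.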
But There Is Potential For Error: at Least a 5% Chance
 Type II Error:
 This occurs when researchers fail to reject the null hypothesis even though it is false.
 “There really is a difference in salaries in the population but we concluded there was no statistically significant difference in salaries between men and women.”
No Way To Avoid Error When
Working With Random Sample Data
 To avoid a Type I error, the researchers may want
to make it harder to reject the null hypothesis
 So they will raise the bar and set the alpha level at .01 rather than .05
 But by doing so, they have increased the
likelihood of making a Type II error
No Way To Avoid Error When
Working With Random Sample Data
 To avoid a Type II error, the researchers may want
to make it easier to reject the null hypothesis
 So they will lower the bar and set the alpha level at .10 rather than .05
 Or they will increase sample size
 But by making it easier to reject the null
hypothesis, they will increase the likelihood of
making a Type I error.
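The trade-off between the two errors can be put into numbers. The sketch below assumes a two-tailed z-test and a hypothetical real effect that is 2.5 standard errors in size; both are illustrative choices, as is the function name. The Type I risk is simply the alpha level, while the Type II risk is calculated from the normal curve.

```python
from statistics import NormalDist

def type_ii_rate(alpha, effect_in_std_errors):
    """Chance a two-tailed z-test misses a real effect of the given size."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    power = (1 - norm.cdf(z_crit - effect_in_std_errors)
             + norm.cdf(-z_crit - effect_in_std_errors))
    return 1 - power

# Hypothetical true effect equal to 2.5 standard errors
for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f}: Type I risk = {alpha:.0%}, "
          f"Type II risk = {type_ii_rate(alpha, 2.5):.0%}")
```

A larger sample shrinks the standard error, so the same real effect spans more standard errors and the Type II risk drops without any change to alpha.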
Which Error Is Worse?
It Depends
 Generally, social scientists feel that it is worse to
make a Type I error than a Type II error.
 It is more problematic to conclude there is a
difference or an impact when there really isn’t
any.
 For example, concluding that a drug has a
statistically significant positive impact when
the results are just a Type I error is a problem.
Which One Is Worse?
Type I and Type II
 As a program manager, you may feel that it is worse to make a Type II error.
 In this case, the null hypothesis of “No difference” would not be rejected.
 The risk is that “No statistically significant differences were found” might turn into a conclusion that the program did not work.
 But technically, all that should be concluded is that the researchers “failed to reject the null hypothesis.”
 The program may actually make a difference that the researchers failed to detect.
More Statistical Significance
Concepts
 One-tailed test: used whenever the hypothesis specifies a direction.
 Men will earn more than women
 We are concerned with only one tail of the normal curve.
 Easier to reject a null hypothesis.
More Statistical Significance
Concepts
 Two-tailed test: used when the research hypothesis does not specify a direction.
 The salaries of men and women are different
 Generally the default on statistical software
packages.
 Generally the more “conservative” measure:
harder to reject a null hypothesis.
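Why the two-tailed test is the more conservative one can be seen by computing both p values for the same result. The sketch assumes a z statistic of 1.80 in the predicted direction, a hypothetical value picked because it falls on opposite sides of the .05 line under the two tests.

```python
from statistics import NormalDist

def p_values(z):
    """One-tailed (upper) and two-tailed p values for a z statistic."""
    upper_tail = 1 - NormalDist().cdf(z)
    two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return upper_tail, two_tailed

# Hypothetical test statistic in the predicted direction
one, two = p_values(1.80)
print(f"one-tailed p = {one:.3f}, two-tailed p = {two:.3f}")
```

When the result lies in the predicted direction, the two-tailed p value is double the one-tailed value, so the same data can pass the one-tailed test and fail the two-tailed one.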
Statistical Significance Does Not
Mean Meaningful Or Important
 Researchers surveyed 3,000 people, selected randomly across the U.S.
 87% with a private physician reported being
satisfied
 85% of those with an HMO physician reported
being satisfied.
 These results were statistically significant.
 Are they meaningfully different?
Statistical Significance Does Not
Mean Meaningful Or Important
 Statistical Significance has a narrow meaning and is based on mathematics
 Although the researchers do decide on the alpha level they will set as the criterion for whether the results are statistically significant
 “Meaningful” or “important” is a judgment call.
 But remember: “significance” is a word owned by statisticians, so only use it when you are talking about tests for statistical significance.
Statistical Significance
Does Not Mean
 The results are meaningful or important.
 The relationship is strong or weak.
 That design errors have been eliminated.
 That a test result of .001 is stronger or better than one of .049 in any sense other than a lower probability that the results are due to random chance.
Statistical Significance
Does Not Mean
 That non-sampling errors have been
eliminated.
 Poorly worded survey questions, error-prone data entry, low response rates, systematic bias in respondents, etc. have to be acknowledged as limitations of the study even if the results are reported as statistically significant.
Over-attachment To Statistical
Significance Tests
 “Unfortunately, researchers often place
undue emphasis on significance
tests….Perhaps it is because they have spent
so much time in courses learning to use
significance tests, that many researchers
give the tests an undue emphasis in their
research.” --Phillip Shively, p. 172
Key Points
 Tests for statistical significance assume that
the study was designed properly using a
random sample with valid and reliable
measures.
 No amount of statistical wizardry will
correct design flaws.
Key Points
 When working with random sample data,
error is always a possibility.
 Whether Type I or Type II: absolute certainty is an illusion.
 It is useful to provide readers with “point estimates” but these should be provided with the context of the confidence interval.
 We are 95% certain that the true mean in the population is within this range.
Key Points
 The emphasis on finding statistical significance
can diminish the importance of not finding
statistically significant results
 Results that are not statistically significant can be
important
 They can provide evidence that something
thought to be a problem may not be
 They can provide other researchers with
information about what has been tried—so they
can try something else
Key Points
 Final Word:
 When working with random sample data, be aware that the results might not be as solid as one hopes.
 Be mindful of premature certainty.
 It helps if researchers pull in other similar research to support their findings
 If there is a pattern from other studies, then we can have more faith that the results are solid, meaning they fairly accurately reflect the larger population.
Creative Commons
 This powerpoint is meant to be used and
shared with attribution
 Please provide feedback
 If you make changes, please share freely
and send me a copy of changes:
 [email protected]
 Visit www.creativecommons.org for more
information