Inferential Statistics (K-19)


class 7: 10/21/13
intro to statistical methods cont.
• Being wrong in science is fine, and even necessary—as
long as scientists recognize that they blew it, report
their mistake openly instead of disguising it as a success,
and then move on to the next thing—until they come up
with the very occasional breakthrough. But as long as
careers remain contingent on producing a stream of
research that’s dressed up to seem more right than it is,
scientists will keep delivering exactly that.
Science is a noble endeavor, but it is also a low-yield
endeavor. I’m not sure that more than a very small
percentage of medical research is ever likely to lead to
major improvements in clinical outcomes and quality of
life. We should be very comfortable with that fact. (p.
86)
Freedman, David H. (2010, November). Lies, damned lies, and
medical science. The Atlantic, 306(4), 76-86.
all researchers must learn the trick and
avoid the mistake
• trick: begin with the question and then
figure out the best way to answer that
question
• mistake: begin with the method and fit
the question to the method
more on models
• models should meet three criteria:
– generality, precision, accuracy
• models can usually satisfy any two, at the cost of
sacrificing the third.
– climatology settles for generality & accuracy
– ecologists focusing on particular species settle for
precision & accuracy
– rigorous history & ethnography often give up generality
for precision & accuracy—results can still be important
Kitcher, Philip. (2012, May 24). The trouble with scientism: Why
history and the humanities are also a form of knowledge. The New
Republic, 243, 20-25.
• research using
– measurement description
– statistical analysis
is critical for answering certain kinds of
important questions
strengths of measurement description
• precise descriptions
• often efficient: one can make confident
predictions based on relatively small samples, if
the samples are good
• increasingly sophisticated ways of analyzing
measurement data
• powerful stat packages now available for desktop
computers, e.g., Systat, SPSS, SAS
cautions
• measure only what can be measured
– “to replace the unmeasurable with the
unmeaningful is not progress” (Achen, 1977, p.
806)
• value precision but realize a precise description
may not be an accurate one
• scientific method (drawing inferences from
observations) comprises many research methods—
strength not from any one specific method
my personal recommendations
• whatever your Ph.D. Research Specialization, take
at least one stat course, preferably 2 or 3
• whatever your methodological expertise, find
people with similar interests but different
methodological expertise and work with them—
the best research often uses many approaches
• the statistician knows, for example, that in nature
there never was a normal distribution, there
never was a straight line, yet with normal and
linear assumptions, known to be false, he can
often derive results which match, to a useful
approximation, those found in the real world.
(Box, p. 792)
• all models are false, but some are useful.
Box, George E. P. (1976). Science and statistics. Journal of
the American Statistical Association, 71, 791-799.
further caution
• Statistics today is in a conceptual and theoretical
mess. The discipline is divided into two rival
camps, the frequentists and the Bayesians, and
neither camp offers the tools that science needs
for objectively representing and interpreting
statistical data as evidence. (pp. 127-128)
Royall, Richard. (2004). The likelihood paradigm for statistical
evidence. In M. L. Taper & S. R. Lele (Eds.), The nature of
scientific evidence: Statistical, philosophical, and empirical
considerations (pp. 119-152). Chicago, IL: University of
Chicago Press.
K ch 19: inferential statistics
• inferential statistics allow one to infer the
characteristics of a population from a
representative sample
– estimate characteristics of population within a
determined range with a given probability
– determine (in general) with a given probability
whether effect beyond sampling and chance
error exists
• parameters: refer to population
• statistics: refer to sample
• sampling distribution: distribution of a descriptive
statistic calculated from repeated samples
• confidence intervals: range that includes the
population value with a given probability
confidence level:
• the probability that the interval will contain the
population value: conventionally 68%, 95%, and
99% (2 to 1, 19 to 1, 99 to 1 respectively)
• the wider the interval the more likely it contains
the population value (and the less valuable the
information)
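
The confidence-level bullets above can be illustrated with a small simulation. This is only a sketch: the population mean and SD (100 and 15), the sample size, and the function names are made up for illustration, and the interval uses a normal approximation rather than any method specific to K. It draws repeated samples, builds an interval around each sample mean, and counts how often the interval contains the population value.

```python
import random
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for a population mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / n ** 0.5
    # z is the standard-normal quantile leaving (1 - level) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * std_error, mean + z * std_error


def coverage(level, trials=2000, n=50, seed=0):
    """Share of repeated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    true_mean, true_sd = 100.0, 15.0  # hypothetical population parameters
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        low, high = mean_confidence_interval(sample, level)
        hits += low <= true_mean <= high
    return hits / trials


for level in (0.68, 0.95, 0.99):
    print(level, round(coverage(level), 3))
```

The observed coverage should land close to each nominal level, and the 99% intervals are visibly wider than the 68% ones: more likely to contain the population value, but less informative.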
• hypothesis testing (traditionally takes form of
rejecting the null hypothesis, i.e., that there is no
effect beyond sampling and chance error)
• alpha level: the risk the researcher accepts of
rejecting the null hypothesis when it is true; set by
the researcher in advance, traditionally .10, .05, .01,
.001 (N.B., no good reason for these and not others)
• p-level: the probability, if the null hypothesis is
true, of a result at least as extreme as the one found;
it is then compared to the alpha level
two-tailed test:
• non-directional, splits the alpha level between both
ends. used when one does not expect results in one
particular direction
one-tailed test:
• directional, puts the whole alpha level at one end
(determined by researcher). increases probability
of finding statistically significant result in the
predicted direction
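
A minimal sketch of the difference between the two tests, assuming a z statistic and a researcher who predicted a positive effect. The function name and the value 1.80 are hypothetical.

```python
from statistics import NormalDist


def p_values(z):
    """Return (one_tailed, two_tailed) p-levels for a z statistic.

    The one-tailed value assumes the predicted direction is positive.
    """
    norm = NormalDist()
    one_tailed = 1 - norm.cdf(z)
    two_tailed = 2 * (1 - norm.cdf(abs(z)))
    return one_tailed, two_tailed


alpha = 0.05
one, two = p_values(1.80)  # hypothetical test statistic
print("one-tailed:", round(one, 4), "significant:", one < alpha)
print("two-tailed:", round(two, 4), "significant:", two < alpha)
```

With the same data, the one-tailed p-level is half the two-tailed one when the result falls in the predicted direction, which is why the choice of test has to be made before looking at the results.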
common statistical tests
t test of difference between means
• common, simple test for differences
between means of two groups
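
As a rough sketch of what the t test computes, the following uses the equal-variance (pooled) form of the statistic. The scores and the function name are invented for illustration; turning t into a p-level additionally requires the t distribution, which is left to a stat package.

```python
import statistics


def pooled_t(group_a, group_b):
    """t statistic and degrees of freedom for two independent groups,
    assuming equal population variances."""
    n_a, n_b = len(group_a), len(group_b)
    mean_diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    std_error = (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    return mean_diff / std_error, n_a + n_b - 2


treatment = [78, 85, 90, 72, 88, 81]  # hypothetical scores
control = [70, 75, 83, 68, 79, 74]    # hypothetical scores
t, df = pooled_t(treatment, control)
print("t =", round(t, 2), "df =", df)
```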
chi-square
• common test for categorical data and
frequencies
– are cell values different from what would
be expected
chi-square examples
Jefferson, Madison Combined: Years in Kindergarten by SES
            2-year   1-year   total
poor*           24        6      30
not poor        34       65      99
total           58       71     129
chi-square: 19.4 (1 df) p < .0001
* eligible for free or reduced lunch
Jefferson, Madison Combined: Years in Kindergarten by race
            2-year   1-year   total
non-white       10        9      19
white           48       62     110
total           58       71     129
chi-square: .530 (1 df) p < .5
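
The two chi-square values above can be checked by hand. The sketch below computes the Pearson chi-square without a continuity correction (an assumption, since the slides do not say which form was used): each expected cell value is row total times column total divided by N. The function name is illustrative.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square and p-level for a 2x2 table of counts (1 df)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    # With 1 df, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_level = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_level


by_ses = [[24, 6], [34, 65]]    # rows: poor, not poor; columns: 2-year, 1-year
by_race = [[10, 9], [48, 62]]   # rows: non-white, white; columns: 2-year, 1-year
for name, table in (("SES", by_ses), ("race", by_race)):
    chi2, p = chi_square_2x2(table)
    print(name, "chi-square =", round(chi2, 3), "p =", p)
```

Under that assumption the results agree with the reported 19.4 and .530: the cell counts by SES are far from what the margins alone would predict, while the counts by race are close to expected.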
ANOVA (analysis of variance)
• experimental designs where two or more groups or
multiple conditions are being compared (common in
psychology and ed psych, and in educational
research in general)
• powerful:
– accurate measure of error variance
– tests significance of each variable as well as
combined effect
– avoids the inflated probability of a chance finding
that comes from running many separate tests
(not in K)
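
The inflation of probabilities that ANOVA avoids can be shown with a little arithmetic. The sketch assumes the separate tests are independent, which pairwise comparisons on the same groups are not, so the numbers are only indicative. The function name is illustrative.

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one chance 'finding' across n_tests
    independent tests, each run at the given alpha level."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 3, 6, 10):
    print(n_tests, "tests:", round(familywise_error(0.05, n_tests), 3))
```

Comparing four groups pairwise already takes six tests, so the overall risk of a spurious result is well above the nominal .05.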
regression analysis
• explains (predicts) variability of a dependent
variable using information about one or more
independent variables.
• predicts expected change in dependent variable
given specific changes in the independent variable
• not used in educational research as much as
ANOVA, but more useful for policy purposes
regression example
achievement*= 77.5 - .80 SES**
*combined math & reading scores, ITBS
** percent of low income students
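
To read the regression example as a prediction rule, here is a small sketch using the two coefficients from the equation above. The function name and the input values are illustrative only.

```python
INTERCEPT = 77.5  # predicted achievement when percent low income is 0
SLOPE = -0.80     # change in achievement per one-point rise in percent low income


def predicted_achievement(pct_low_income):
    """Predicted combined ITBS score for a given percent of low income students."""
    return INTERCEPT + SLOPE * pct_low_income


for pct in (0, 25, 50):
    print(pct, "% low income ->", predicted_achievement(pct))

# Expected change for a 10-point rise in percent low income.
print(predicted_achievement(35) - predicted_achievement(25))
```

The slope is the policy-relevant number: each additional 10 percentage points of low income students goes with a predicted drop of 8 points in achievement.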
errors of inference
• type I error (alpha error): a concern when theory
testing (K, “when validating a finding”)
• type II error (beta error): a concern when
theory building (K: “when exploring”)
• decreasing the probability of one type increases
the probability of the other, other things (e.g.,
sample size) being equal
• pointless to talk about Type I or II error absent
discussion of what is at stake
cost of type I error in theory testing
• dominant theory not challenged
• knowledge production stopped
cost of type II error in theory building
• possibly important explanations etc. ignored
• knowledge production stopped
one of the many challenges the late and great Lee
Cronbach (1916-2001) made to the accepted
wisdom of the day
statistical power: 1-beta
• increasing statistical power:
– increase size of effect (stronger treatment)
– increase sample size
– reduce variability
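
The three ways of increasing power can be tried out numerically. This is a sketch under stated assumptions: a two-tailed comparison of two equal-sized groups, a normal (z) approximation, and an effect size expressed as the mean difference divided by the common SD, so that reducing variability shows up as a larger effect size. The function name is hypothetical.

```python
from statistics import NormalDist


def power_two_group(effect_size, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-tailed, two-group z test."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the test statistic lands in either rejection region.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


print("baseline:        ", round(power_two_group(0.3, 50), 3))
print("stronger effect: ", round(power_two_group(0.5, 50), 3))
print("larger sample:   ", round(power_two_group(0.3, 200), 3))
# Halving the SD doubles the standardized effect size.
print("less variability:", round(power_two_group(0.6, 50), 3))
```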
statistical & practical significance
• statistical: confidence at a given probability
that result is not due to chance
• practical: is the result important enough, big
enough, feasible, affordable—all value judgments
– if one apple a day keeps the doctor away, but
it takes three grapefruit, then…?
• no statistic or statistical test can make a
practical decision
• whether one risks being wrong incautiously (Type
I) or wrong cautiously (Type II) cannot be
decided absent cost and risk, needs, what’s at
stake etc.
• no statistical analysis better than numbers
(descriptions) fed into it: garbage in, garbage
out
statistical significance refers only to samples
from population
• it does not refer to size of effect: ceteris
paribus larger effects are more likely to be
statistically significant, but with large samples
very small effects will be too
• if you have the population, any effects are real,
no matter the size
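
A quick sketch of the point about sample size, again assuming a two-group z approximation and a made-up effect of 0.02 standard deviations, which is negligible for most practical purposes. The function name is illustrative.

```python
from statistics import NormalDist


def two_tailed_p(effect_size, n_per_group):
    """Two-tailed p-level for a standardized mean difference between
    two equal-sized groups (normal approximation)."""
    z = effect_size * (n_per_group / 2) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


for n in (100, 10_000, 100_000):
    print(n, "per group: p =", two_tailed_p(0.02, n))
```

The effect is identical in every row; only the sample size changes, and with it the verdict on statistical significance.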
no proof in science:
• a statistically significant result (assuming
appropriate analysis etc.) does not prove that the
hypothesis is true, only that it has escaped
disconfirmation
• the more often an hypothesis passes the test and
the more demanding the tests it passes, the more
certain we can be that we know something—the
more we have reduced uncertainty
other terms
• parametric: assumes random sampling, from
distribution with known parameters, often normal
distribution
• nonparametric: when data do not come from
known distribution—often with nominal or ordinal
data
• robust test: accurate even when assumptions
violated
• effect size: too long and too often ignored—
journals now requiring estimates of effect size
thinking
simple statistical way to find out what people may not be
willing to admit
• ask people to flip coin
– if head, answer “head: no answer”
– if tail and have done X, answer “head: no answer”
– if tail and have not done X, answer “no”
• thus, the number of “no’s” is an estimate of half of those
who have not done X
• thus, N minus twice the number of “no’s” gives an estimate of
those who have done X
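
The coin-flip procedure can be simulated to see that the estimate works even though no individual answer reveals anything. The respondent counts, the seed, and the function names below are made up for illustration.

```python
import random


def estimate_done(n_respondents, n_no):
    """N minus twice the number of "no" answers."""
    return n_respondents - 2 * n_no


def simulate(n_respondents, n_done, seed=0):
    """Run the coin-flip survey on a population in which exactly n_done
    respondents have done X, and return the resulting estimate."""
    rng = random.Random(seed)
    n_no = 0
    for i in range(n_respondents):
        has_done_x = i < n_done
        tails = rng.random() < 0.5
        # Only respondents who flip tails and have not done X answer "no".
        if tails and not has_done_x:
            n_no += 1
    return estimate_done(n_respondents, n_no)


print("true count: 30000, estimated:", simulate(100_000, 30_000))
```

Because half the answers are discarded by the coin, the estimate is noisier than a direct question would be, so the method needs a fairly large N.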
Vogt
• regression toward the mean
• reliability
• self-selection bias
• sleeper effect
• sociogram
• spurious relation (or correlation)
• suppressor variable
Sieber & Tolich: 8: Degrees of nondisclosure
1. double-blind study (neither researcher nor subject
knows)
2. researcher knows; subject does not
3. researcher knows; subject told she will not know
everything at the beginning, study has been
judged ethical, will be debriefed at the end.
• Hawthorne effect: short-lived increase in
performance due to an intervention.
• Pygmalion effect: expectation of the researcher
produces expected effects. (Rosenthal &
Jacobson [1968] strongly criticized for methods)
• I find the discussion on the second degree (pp.
143-147) hard to follow.
• third degree: list on pp. 147-148
• 4 defensible justifications for deception:
– data unobtainable if subjects knew real purpose
– to achieve stimulus control or random assignment
– to study responses to low frequency events
– to avoid serious risk
• dehoaxing: explaining procedure, carefully
• desensitizing: returning subject to frame of mind
at least as positive and constructive as when
subject entered study
Lit review
• review section
– review lit, follow explicit and logical scheme.
– 3-5 sections, with subsections, if useful
– end sections with a discussion
• discussion section
– synthesize the review (discussion of
discussions)
• conclusion section (< 1 p)
– address original question(s)
• personal reflections section (1 p)
– discuss briefly what you learned in the process
of doing the lit review
• references
– make sure all citations are in the references
– make sure all references are cited
APA
• use first person to talk about yourself, not third,
e.g., “The researcher . . .”
• use we (us, our, etc.) only to refer to you and your
co-authors (69-70)
• do not italicize Latin abbreviations—e.g., et al.,
etc. and so on (only use within parens)
• for seriation see 63-65
• single quotation marks only within double
• periods & commas always inside quotation marks
• italicize new, technical, or key terms or labels the
first time, e.g., “The term peer response . . .”
(104-106)
• do not separate compound verbs with comma:
“She walked down the block past her house and
then turned into the driveway.”
• avoid beginning sentences with “however”
• avoid “throat-clearings” to begin sentences, e.g.,
furthermore, therefore, also, additionally
colon (80-81)
• between a grammatically complete intro clause
and a final clause that illustrates, extends, or
amplifies the first. If second clause a complete
sentence, capitalize.
– Kelly presented two findings: Teachers
preferred . . .
• do not use a colon after intro that is not a
complete sentence
– The students were Ben, Akiko, Mustafa. . . .
this week free and cheap
• under construction