Inferential Statistics (K-19) - University of Illinois Urbana


class 7: 10/22/12
intro to statistical methods
cont.
• Being wrong in science is fine, and even
necessary—as long as scientists recognize that
they blew it, report their mistake openly instead
of disguising it as a success, and then move on to
the next thing—until they come up with the very
occasional breakthrough. But as long as careers
remain contingent on producing a stream of
research that’s dressed up to seem more right
than it is, scientists will keep delivering exactly
that.
Science is a noble endeavor, but it is also a
low-yield endeavor. I’m not sure that more than a
very small percentage of medical research is
ever likely to lead to major improvements in
clinical outcomes and quality of life. We should
be very comfortable with that fact. (p. 86)
Freedman, David H. (2010, November). Lies, damned lies, and
medical science. The Atlantic, 306(4), 76-86
all researchers must learn the trick and
avoid the mistake
• trick: begin with the question and
then figure out the best way(s) to
answer that question
• mistake: begin with a specific method
and fit the question to that method
more on models
• models should meet three criteria:
– generality, precision, accuracy
• can generally satisfy any two, at the cost of
sacrificing the third.
– climatology settles for generality & accuracy
– ecologists focusing on particular species
settle for precision & accuracy
– rigorous history & ethnography often give up
generality for precision & accuracy—results
can still be important
Kitcher, Philip. (2012, May 24). The trouble with scientism: Why
history and the humanities are also a form of knowledge. The New
Republic, 243, 20-25.
• research using
– measurement description
– statistical analysis
is critical for answering certain kinds of
important questions
strengths of measurement description
• precise descriptions
• often efficient: one can make confident
predictions based on relatively small
samples, if the samples are good
• increasingly sophisticated ways of
analyzing measurement data
• powerful stat packages now available for
desktop computers, e.g., Systat, SPSS,
SAS
cautions
• measure only what can be measured
– “to replace the unmeasureable with the
unmeaningful is not progress” (Achen,
1977, p. 806)
• value precision but realize that a precise
description may not be an accurate one
• scientific method (drawing inferences
from observations) comprises many
research methods—its strength does not
come from any one specific method
my personal recommendations
• whatever your Ph.D. research
specialization, take at least one stat
course, preferably 2 or 3
• whatever your methodological expertise,
find people with similar interests but
different methodological expertise and
work with them—the best research
often uses many approaches
• The statistician knows, for example, that in
nature there never was a normal
distribution, there never was a straight
line, yet with normal and linear assumptions,
known to be false, he can often derive
results which match, to a useful
approximation, those found in the real
world. (Box, p. 792)
• All models are false, but some are useful.
– Box, George E. P. (1976). Science and statistics. Journal
of the American Statistical Association, 71, 791-799.
a caution
• Statistics today is in a conceptual and
theoretical mess. The discipline is divided
into two rival camps, the frequentists and
the Bayesians, and neither camp offers
the tools that science needs for
objectively representing and interpreting
statistical data as evidence. (Royall, pp.
127-128)
– Royall, Richard. (2004). The likelihood paradigm for statistical evidence.
In M. L. Taper & S. R. Lele (Eds.), The nature of scientific evidence:
Statistical, philosophical, and empirical considerations (pp. 119-152).
Chicago: University of Chicago Press.
• It is possible to spend a lifetime analysing
data without realising that there are two
very different fundamental approaches to
statistics: Bayesianism and Frequentism.
• Bayesians address the question everyone
is interested in by using assumptions no one believes
• Frequentists use impeccable logic to deal
with an issue of no interest to anyone
(Louis Lyons, 2007)
K ch 19: inferential statistics
• inferential statistics allow one to infer the
characteristics of a population from a
representative sample
– estimate characteristics of population
within a determined range with a given
probability
– determine (in general) with a given
probability whether effect beyond
sampling and chance error exists
• parameters: refer to population
• statistics: refer to sample
• sampling distribution: the distribution of a
descriptive statistic (e.g., the mean)
calculated from repeated samples
• confidence intervals: range that includes
the population value with a given
probability
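A sampling distribution is easiest to see by simulation. Below is a minimal Python sketch, assuming a hypothetical normal population with mean 50 and SD 10 (made-up numbers) and samples of 25; the function name is illustrative only. It draws many samples, keeps each sample mean, and shows that the means cluster around the population parameter with a spread close to σ/√n, the standard error.

```python
import random
import statistics

def sampling_distribution_of_mean(pop_mean, pop_sd, n, reps, seed=1):
    """Draw `reps` samples of size `n` from a hypothetical normal population
    and return the sample means (the simulated sampling distribution)."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

# Hypothetical population: mean 50, SD 10; samples of 25.
means = sampling_distribution_of_mean(50, 10, n=25, reps=5000)
print(round(statistics.fmean(means), 2))  # close to the parameter, 50
print(round(statistics.stdev(means), 2))  # close to 10 / sqrt(25) = 2.0
```

The spread of the sample means, not the spread of individual scores, is what confidence intervals and significance tests are built on.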
confidence level:
• the probability that the interval will
contain the population value:
conventionally 68%, 95%, and 99% (odds of
about 2 to 1, 19 to 1, and 99 to 1, respectively)
• the wider the interval the more likely it
contains the population value (and the less
valuable the information)
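The three conventional levels correspond to roughly 1, 1.96, and 2.576 standard errors on either side of the sample mean. Here is a sketch, assuming a normal approximation (for small samples a t critical value would replace z) and a hypothetical sample with mean 72, SD 12, and n = 36; the helper names are illustrative. It also prints the odds behind each level and shows the interval widening as the level rises.

```python
from statistics import NormalDist

def z_critical(level):
    """Two-sided critical value: leaves (1 - level) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

def confidence_interval(mean, sd, n, level):
    """Normal-approximation confidence interval for a population mean."""
    se = sd / n ** 0.5  # standard error of the mean
    margin = z_critical(level) * se
    return mean - margin, mean + margin

# Hypothetical sample: mean 72, SD 12, n = 36 (so the standard error is 2).
for level in (0.68, 0.95, 0.99):
    low, high = confidence_interval(72, 12, 36, level)
    odds = level / (1 - level)
    print(f"{level:.0%}: z = {z_critical(level):.3f}, "
          f"CI = ({low:.2f}, {high:.2f}), odds about {odds:.0f} to 1")
```

The 99% interval is the widest: more confidence that it covers the population value, less precision about where that value is.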
• hypothesis testing (traditionally takes
form of rejecting the null hypothesis, i.e.,
that there is no effect beyond sampling
and chance error)
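• alpha level: the risk of rejecting the null
hypothesis when the result is actually due to
chance; set by the researcher in advance,
traditionally .10, .05, .01, .001 (N.B., no
good reason for these and not others)
• p-level: the actual probability level found,
i.e., the probability of a result at least as
extreme as the observed one if the null
hypothesis were true; then compared to the
alpha level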
two-tailed test:
• non-directional, splits the alpha level
between both tails. Used when one does not
predict the direction of the results
one-tailed test:
• directional, puts the whole alpha level in
one tail (chosen by the researcher in advance).
Increases the probability of finding a
statistically significant result in the
predicted direction
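The difference between the two kinds of test shows up directly in the p-value. A minimal sketch, assuming a z statistic from a normal-theory test and a hypothetical observed value of z = 1.80; the function names are illustrative. The two-tailed p doubles the tail area, while the one-tailed p counts only the predicted direction.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1

def p_two_tailed(z):
    """Alpha split across both tails: extreme results in either direction count."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def p_one_tailed(z):
    """Alpha all in the upper tail: only results in the predicted (positive) direction count."""
    return 1 - STD_NORMAL.cdf(z)

alpha = 0.05
z = 1.80  # hypothetical test statistic
for name, p in (("two-tailed", p_two_tailed(z)), ("one-tailed", p_one_tailed(z))):
    verdict = "reject" if p < alpha else "do not reject"
    print(f"{name}: p = {p:.4f} -> {verdict} the null at alpha = {alpha}")
```

With z = 1.80 the two-tailed p is about .072 and the one-tailed p about .036, so the same data are significant at .05 only under the one-tailed test. This is why the direction has to be chosen before looking at the results.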
common statistical tests
t test of difference between means
• common and simple test for differences
between means of two groups
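The t statistic is the difference between the two group means divided by its standard error. Below is a sketch of the classic equal-variance (pooled) version for two independent groups, using hypothetical scores and an illustrative function name. It returns t and its degrees of freedom; the p-value would come from a t table. If SciPy is available, `scipy.stats.ttest_ind` with its default `equal_var=True` should give the same t.

```python
import math
import statistics

def pooled_t_test(a, b):
    """Student's t for two independent groups, assuming equal population
    variances. Returns (t, degrees of freedom)."""
    n1, n2 = len(a), len(b)
    mean_diff = statistics.fmean(a) - statistics.fmean(b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * statistics.variance(a) +
                  (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return mean_diff / se, n1 + n2 - 2

# Hypothetical scores for two groups of five students.
group_a = [85, 90, 78, 92, 88]
group_b = [75, 80, 72, 84, 79]
t, df = pooled_t_test(group_a, group_b)
print(f"t = {t:.2f}, df = {df}")  # compare |t| with a t-table critical value
```

Here t is about 2.68 with 8 df, which exceeds the two-tailed .05 critical value of about 2.306.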
chi-square
• common test for categorical data and
frequencies
– are the observed cell frequencies different
from what would be expected by chance?
chi-square examples
Jefferson & Madison Combined
Years in Kindergarten by SES

              2-year   1-year   total
poor*             24        6      30
not poor          34       65      99
total             58       71     129

chi-square: 19.4 (1 df), p < .0001
* eligible for free or reduced lunch

Jefferson and Madison Combined
Years in Kindergarten by race

              2-year   1-year   total
non-white         10        9      19
white             48       62     110
total             58       71     129

chi-square: .530 (1 df), p < .5
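The reported statistics can be recomputed from the cell counts. Each expected count is row total × column total / N, and the statistic sums (observed − expected)² / expected over the four cells. A minimal Python sketch, assuming no continuity correction (that choice reproduces the reported values); the function name is illustrative. For 1 df the p-value uses the identity P(χ² > x) = erfc(√(x/2)).

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table of counts.
    Returns (statistic, df, p)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected if unrelated
            stat += (observed - expected) ** 2 / expected
    p = math.erfc(math.sqrt(stat / 2))  # upper-tail probability for 1 df
    return stat, 1, p

# Counts from the two tables above: rows are groups; columns are 2-year, 1-year.
ses = [[24, 6], [34, 65]]    # poor, not poor
race = [[10, 9], [48, 62]]   # non-white, white
for label, table in (("SES", ses), ("race", race)):
    stat, df, p = chi_square_2x2(table)
    print(f"{label}: chi-square = {stat:.3f} ({df} df), p = {p:.5f}")
```

The SES table gives about 19.39 (p well below .0001). The race table gives about 0.530 with p near .47, not significant. Note that SciPy's `scipy.stats.chi2_contingency` applies Yates's continuity correction to 2x2 tables by default, so it reports a smaller statistic unless `correction=False` is passed.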
ANOVA (analysis of variance)
• experimental designs where two or more
groups or multiple conditions are being
compared (common in psychology and ed
psych, and in educational research in
general)
• powerful:
– accurate measure of error variance
– tests significance of each variable as well
as their combined effect
– avoids the inflated Type I error rate that
comes from running many separate t tests
(not in K)
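Both points can be sketched in a few lines. A one-way ANOVA compares variance between group means with error variance within groups. Running many t tests instead raises the chance of at least one false positive. The sketch below uses Python, hypothetical scores for three conditions, and illustrative function names. The familywise formula treats the tests as independent, which pairwise comparisons among the same groups are not, so read it as a rough illustration.

```python
import statistics

def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_scores)
    # Between-groups sum of squares: how far group means sit from the grand mean.
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups (error) sum of squares: spread of scores around their group mean.
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

def familywise_error(alpha, k):
    """Chance of at least one Type I error across k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k

# Hypothetical scores for three conditions.
f, df_b, df_w = one_way_anova([4, 5, 6], [6, 7, 8], [9, 10, 11])
print(f"F({df_b}, {df_w}) = {f:.2f}")
# Three separate pairwise t tests at .05 instead of one F test:
print(f"familywise error for 3 tests: {familywise_error(0.05, 3):.3f}")
```

With three groups, three pairwise tests at .05 carry roughly a 14% chance of at least one spurious "significant" result. The single F test keeps that risk at the chosen alpha.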
regression analysis
• explains (predicts) variability of a
dependent variable using information
about one or more independent variables.
• predicts expected change in dependent
variable given specific changes in the
independent variable
• not used in educational research as much
as ANOVA, but more useful for policy
purposes
regression example
achievement* = 77.5 - 0.8 × SES**
* combined math & reading scores, ITBS
** percent of low-income students
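Two things are shown here: how a least-squares line is estimated, and what the reported equation predicts. A minimal Python sketch, assuming SES is entered as a percentage from 0 to 100. `fit_line` is an illustrative helper checked on toy points, not the data behind the slide. `predicted_achievement` simply applies the reported coefficients.

```python
def fit_line(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def predicted_achievement(pct_low_income, intercept=77.5, slope=-0.8):
    """Apply the reported equation: achievement = 77.5 - 0.8 * SES."""
    return intercept + slope * pct_low_income

# Toy check of the fitting routine on points lying exactly on y = 1 + 2x.
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)

# Using the reported coefficients:
for pct in (0, 25, 50):
    print(pct, predicted_achievement(pct))
```

The slope is the "expected change" the regression bullets describe: each additional percentage point of low-income students predicts a combined score 0.8 points lower. A 10-point difference predicts an 8-point gap.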
errors of inference
• type I error (alpha error): rejecting a true
null hypothesis; a concern when theory testing
(K: “when validating a finding”)
• type II error (beta error): failing to reject a
false null hypothesis; a concern when theory
building (K: “when exploring”)
• decreasing the probability of one type
increases the probability of the other
• pointless to talk about Type I or II error
absent discussion of what is at stake
cost of type I error in theory testing
• dominant theory not challenged
• knowledge production stopped
cost of type II error in theory building
• possibly important explanations etc.
ignored
• knowledge production stopped
(one of the many challenges the late and
great Lee Cronbach (1916-2001) made to
the accepted wisdom of the day)
statistical power: 1-beta
• increasing statistical power:
– increase size of effect (stronger
treatment)
– increase sample size
– reduce variability
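Each lever can be checked numerically. Below is a sketch, assuming a two-sided one-sample z test with a known population SD, hypothetical numbers, and an illustrative function name. Power is the probability that the test statistic lands beyond a critical value when the true mean really is shifted by `effect`. The last line also shows the alpha/beta trade-off: tightening alpha lowers power, which means raising beta.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def power_z_test(effect, sd, n, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided one-sample z test
    for a true mean shift of `effect`, population SD `sd`, sample size `n`."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = effect / (sd / n ** 0.5)  # true effect in standard-error units
    # Probability the test statistic falls beyond either critical value.
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)

baseline = power_z_test(effect=2, sd=10, n=50)
print(f"baseline:         {baseline:.2f}")
print(f"stronger effect:  {power_z_test(effect=4, sd=10, n=50):.2f}")
print(f"larger sample:    {power_z_test(effect=2, sd=10, n=200):.2f}")
print(f"less variability: {power_z_test(effect=2, sd=5, n=50):.2f}")
print(f"stricter alpha:   {power_z_test(effect=2, sd=10, n=50, alpha=0.01):.2f}")
```

In this example the three levers raise power from about .29 to about .81. Moving alpha from .05 to .01 drops it to about .12.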
statistical & practical significance
• statistical: confidence at a given
probability that result is not due to
chance
• practical: is the result important enough,
big enough, feasible, affordable—all value
judgments
– if one apple a day keeps the doctor
away, but it takes three grapefruit,
then…?
• no statistic or statistical test can make a
practical decision
• whether one risks being wrong cautiously
(type I) or wrong incautiously (type II)
cannot be decided absent cost and risk,
needs, what’s at stake, etc.
• no statistical analysis better than
numbers (descriptions) fed into it:
garbage in, garbage out
statistical significance refers only to
samples drawn from a population
• it does not refer to the size of the effect:
ceteris paribus, larger effects are more
likely to be statistically significant, but
with large samples even very small effects
will be
• if you have the population, any effects are
real, no matter the size
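To see how sample size alone can make a trivial effect "significant", hold the effect fixed and vary n. A minimal sketch, assuming a one-sample z test and a hypothetical effect of 0.05 standard deviations; the function name is illustrative.

```python
from statistics import NormalDist

def two_sided_p_for_effect(d, n):
    """Two-sided p-value of a one-sample z test when the observed mean
    differs from the null value by d standard deviations, with sample size n."""
    z = d * n ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

d = 0.05  # a very small effect: one-twentieth of a standard deviation
for n in (100, 1_600, 10_000):
    print(f"n = {n:>6}: p = {two_sided_p_for_effect(d, n):.4f}")
```

The effect is identical in all three rows. Only n changes, and p moves from about .62 to about .046 to well below .001.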
no proof in science:
• a statistically significant result (assuming
appropriate analysis etc) does not prove
that the hypothesis is true, only that it
has escaped disconfirmation
• the more often an hypothesis passes the
test and the more demanding the tests it
passes, the more certain we can be that
we know something—the more we have
reduced uncertainty
other terms
• parametric: assumes random sampling
from a distribution of known form, often
the normal distribution
• nonparametric: used when data do not come
from a distribution of known form; often
with nominal or ordinal data
• robust test: accurate even when
assumptions violated
• effect size: ignored too often and for too
long; many journals now require
estimates of effect size
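One common effect-size measure for two groups is Cohen's d: the mean difference expressed in pooled standard-deviation units. Below is a sketch computing it from summary statistics. It assumes hypothetical group means, SDs, and sizes, and the function names are illustrative. Cohen offered 0.2, 0.5, and 0.8 as rough conventions for small, medium, and large effects. The "negligible" label below 0.2 is this sketch's own choice.

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference (Cohen's d) from summary statistics,
    using the pooled standard deviation."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

def cohen_label(d):
    """Rough benchmarks: 0.2 small, 0.5 medium, 0.8 large."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"

# Hypothetical summary statistics for a treatment and a comparison group.
d = cohens_d(mean1=105, sd1=15, n1=40, mean2=100, sd2=15, n2=40)
print(f"d = {d:.2f} ({cohen_label(d)})")  # d = 0.33 (small)
```

Unlike a p-value, d does not grow with sample size. That independence from n is why journals ask for it alongside significance tests.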
thinking
simple statistical way to find out what
people may not be willing to admit
• ask people to flip a coin
– if heads, answer “head: no answer”
– if tails and have done X, answer “head:
no answer”
– if tails and have not done X, answer “no”
• thus, the number of “no” answers estimates
half of those who have not done X
• thus, N minus twice the number of “no”
answers estimates the number who have done X
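The estimate can be checked by simulating the coin-flip protocol. A minimal Python sketch, assuming a fair coin and a hypothetical true rate of 30%; the function names are illustrative. Only respondents who flip tails and have not done X answer "no". Doubling the count of "no" answers therefore estimates everyone who has not done X, and the rest estimate those who have.

```python
import random

def estimate_prevalence(n_respondents, n_no):
    """'No' answers estimate half of those who have not done X, so
    N - 2 * (number of 'no') estimates those who have done X."""
    not_done = 2 * n_no
    return (n_respondents - not_done) / n_respondents

def simulate_survey(n, true_rate, seed=0):
    """Simulate the protocol; returns the number of 'no' answers."""
    rng = random.Random(seed)
    n_no = 0
    for _ in range(n):
        did_x = rng.random() < true_rate
        tails = rng.random() < 0.5
        if tails and not did_x:
            n_no += 1  # the only route to a "no"
        # everyone else gives the uninformative "head: no answer"
    return n_no

n = 10_000
n_no = simulate_survey(n, true_rate=0.30)  # hypothetical true rate
print(f"estimated share who have done X: {estimate_prevalence(n, n_no):.3f}")
```

No individual answer reveals anything, yet the group estimate lands near the true 30%. The price is extra sampling noise, so the method needs larger samples than a direct question would.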
Monty Hall Problem
1. behind player’s door: Lamborghini
Monty reveals either goat
switching loses
2. behind player’s door: goat A
Monty must reveal goat B
switching wins
3. behind player’s door: goat B
Monty must reveal goat A
switching wins
• initially the player has a 1/3 chance of selecting the
car, goat A, or goat B. Switching results in a
win 2/3 of the time
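The three-case enumeration above translates directly into code, and a simulation can confirm it. A minimal Python sketch, assuming the standard rules: Monty always opens a goat door the player did not pick, and switching takes the remaining closed door. The door labels and function name are illustrative.

```python
import random
from fractions import Fraction

DOORS = ("car", "goat A", "goat B")

def switching_wins(player_pick):
    """Monty opens a goat door the player did not pick; switching takes the
    remaining closed door. Returns True if that door hides the car."""
    goats_left = [d for d in DOORS if d != player_pick and d != "car"]
    opened = goats_left[0]  # if two goats are available, which one does not matter
    remaining = [d for d in DOORS if d not in (player_pick, opened)][0]
    return remaining == "car"

# Exact: each initial pick has probability 1/3.
p_switch = sum(Fraction(1, 3) for pick in DOORS if switching_wins(pick))
print(p_switch)  # 2/3

# Simulation check.
rng = random.Random(42)
trials = 100_000
wins = sum(switching_wins(rng.choice(DOORS)) for _ in range(trials))
print(wins / trials)  # close to 0.667
```

Switching loses only in case 1, the one case in three where the first pick was the car.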
Vogt
• nominal scale
• operational definition
• outlier
• parsimony
• path diagram
• practical significance
• Pygmalion effect
• regression toward the mean
• reliability
• sample space
• sampling frame
• scatter plot
• self-selection bias
• sleeper effect
• sociogram
• spurious relation (or correlation)
• suppressor variable
Sieber ch 6: Strategies for Assuring
Confidentiality
6.1 Confidentiality refers to agreements
with people about what can be done with
data
• states the steps that will be taken to ensure privacy
• states legal limitations to assurances of
confidentiality
6.2 why an issue (be able to discuss the
cases)
6.3 confidentiality or anonymity
6.4 procedural approaches to assuring
confidentiality
6.4.1 cross-sectional research
– anonymity
– temporarily identified responses
– separately identified responses
6.4.2 longitudinal data (requires links)
– aliases
6.4.3 interfile linkage
6.5 statistical strategies for assuring
confidentiality (coin flip example)
6.6 certificates of confidentiality
– researchers do NOT have testimonial
privilege unless they have a certificate of
confidentiality from the Dept. of Health and
Human Services
6.7 confidentiality and consent:
– consent statement must specify which
promises of confidentiality the researcher
cannot make; be aware of state
reporting laws, e.g., on child abuse
6.8 data sharing
– when data are shared publicly, all
identifiers must be removed and the
researcher must ensure there is no way to
deduce identity
– techniques
lit review
• review section
– review lit, follow explicit and logical
scheme.
– 3-5 sections, with subsections if useful
– end sections and subsections with a
discussion
• discussion section
– synthesize the review (discussion of
discussions)
• conclusion section (< 1 p)
– address original question(s)
• personal reflections section (1 p)
– discuss briefly what you learned in the
process of doing the lit review
• references
– make sure all citations in references
– make sure all references cited
• additional references (optional)
– references not cited but which you want
to record
APA
• use first person to talk about yourself, not
third, e.g., “The researcher . . .”
• use we (us, our, etc.) only to refer to you
and your co-authors (69-70)
• do not italicize Latin abbreviations (e.g.,
et al., etc.)
• for seriation see 63-65
• single quotation marks only within double
• periods and commas always inside
quotation marks
• italicize new, technical, or key terms or
labels the first time, e.g., “The term peer
response . . .” (104-106)
• do not separate compound verbs with
comma: “She walked down the block past
her house and then turned into the
driveway.”
• avoid beginning sentences with “however”
• avoid “throat-clearings” to begin sentences,
e.g., furthermore, therefore, also,
additionally
colon (80-81)
• between a grammatically complete intro
clause and a final clause that illustrates,
extends, or amplifies the first. If the second
clause is a complete sentence, capitalize it.
– Kelly presented two findings: Teachers
preferred . . .
• do not use a colon after intro that is not
a complete sentence
– The students were Ben, Akiko, and Mustafa.
...
this week free and cheap
• wed: 9 Billion People and 1 Earth. Andrew
Revkin, Pace University. 4pm, Alice
Campbell Alumni Center, 601 S. Lincoln.
free, reception to follow
• thurs: Krannert Uncorked, 5pm, free.
• thurs: Creating Community through African
Art. Krannert Art Museum, Gallery, free.
• thurs: When China Met Africa. film. 7pm,
Urbana Free Library, free.
• sat: Brahms Instrumental Music with Piano,
Ian Hobson. 7:30, Smith Hall. $5-10.