Lecture 11c - Statistics I


Going from data to analysis
Dr. Nancy Mayo
Getting it right
Research is about getting the right
answer, not just an answer
An answer is easy
The right answer is hard to find
Types of Questions
About hypotheses
Is treatment A better than treatment B?
Answer: Yes or No
About parameters
What is the extent to which treatment A improves
outcome in comparison to treatment B?
Answer: A number / value (parameter)
© Nancy E. Mayo
Research is about relationships
Links one variable or factor to another
One is thought or supposed
(hypothesized) to be the “cause” of the
second variable
What’s in a name?
Discipline | Cause | Effect
Epidemiology | Exposure | Outcome
Medical/clinical | Risk factor | Disease
Psychology | Independent | Dependent
Statistical | Stimulus | Response
Mathematical | X | y
Why do I need statistics?
Reduce data
Define relationships
Make inferences from your sample to
the population
61103120112311111211111112121111222
62102231222221222221211122233333333
63203229112221122111111111111121111
64103241111111133111111111121122233
65203220111331332312211112111121212
66214141122321321221221211221122232
67103241111111111111111111122911123
68103220111211111111111111111111111
69203220121321324421113412342244213
70102241122211232111121111222222333
71202431111111133311111111111111111
72103141111311122211111111133332232
73113120111321111111111111111113312
74203441133421422212233313441244443
75104341111211112211121211311113223
76202441111111111211111111131114224
77202141112421311213411211131111113
78103220111111122111112111221111222
79112240221221211211111112221111121
80113241111411244121111111211111234
81112120211111111111111111133323334
82101120111111111111191111111111111
83102320211221122111111212132333942
[Figure: scatter plots of Y (outcome, dependent variable) against X (exposure, independent variable); panels: Linear, None]
Only linear relationships can be examined by correlation
Inference from Sample to Population
Need stats
Population
Target
Available
Sample
©Nancy E. Mayo 2004
What kind of statistics do I
need?
Depends on your DATA
Measured
Counted
Only 2 kinds of data
Measured = Continuous
– can take on any value the precision of which depends
upon the calibration of your measurement device
– Distribution is expected to be normal
Counted = Categorical (values are fixed)
– Binary (dichotomous) or polychotomous
– Ordinal
   ranked (need for assistance)
   interval (categories are equally spaced: falls)
   ratio (there is a natural 0)
– Nominal – named values, no order (diagnosis)
Your Job
When reading an article (later doing your
own research)
IDENTIFY THESE VARIABLES
IDENTIFY WHAT SCALE THEY ARE
MEASURED ON
MATCH DATA TO ANALYSIS
Quantitative Research
The answer to the question is
found in the tables
What tables should I find in an article?
Table 1 – basic characteristics of the sample
Table 2 – outcomes / exposures
Table 3 – answer to the main question
– Relationship between exposure and outcome
Table 4 – interesting subgroup
What tables should I find in an article?
Table 1 – characteristics of the sample on features relating to target and available population
Table 2 – distribution of the sample on exposure and outcome variables
Table 3 – relationship between the exposure and outcome
Table 4 – interesting sub-groups
What kind of statistics should
I find in these Tables?
What kind of statistics are
there?
Depends on your DATA
Depends on your QUESTION
Uses | Continuous data | Categorical data
Reduce data (descriptive) | Means (SD); medians (percentiles, range) | Proportions
Define relationships | Scatter plot; linear (Pearson correlation) | Histogram; correlation (Spearman ranked); relative risk
Make inferences: simple univariate (bivariate) | Independent t-test; paired t-test; ANOVA | Chi-square test; McNemar’s test
Make inferences: multivariate | Multiple linear regression | Logistic regression
Standard Normal Distribution
Showing the proportion of the population that
lies within 1, 2 and 3 SD (Wikipedia)
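The proportions shown in that figure can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `proportion_within` is an illustrative assumption, not part of the lecture.

```python
from statistics import NormalDist

def proportion_within(k: float) -> float:
    """Proportion of a normal population lying within k SD of the mean."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    # Area between -k and +k under the standard normal curve.
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.4f}")
```

The printed values are close to the familiar 68%, 95% and 99.7%.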
Questions
Question | Test or parameter | Significance
HYPOTHESIS: question is answered by YES or NO | Value of the test has no meaning (t-test, F test) | P-value (probability that what you observed occurred by chance alone)
PARAMETER: question demands a numeric response | Difference between two means, a rate or a risk | 95% confidence interval (with studies of this nature, 95% of the time the true mean will lie within the interval)
Let’s look at Table 1
Uses | Continuous data | Categorical data
Reduce data (descriptive) | Means (SD); medians (percentiles, range) | Proportions
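As an illustration of what a Table 1 summarises, here is a small sketch that reduces a continuous and a categorical variable to the descriptive statistics listed above. The data values and the function names are hypothetical, invented only for this example.

```python
from statistics import mean, median, stdev

# Hypothetical sample, invented for illustration only.
ages = [62, 70, 58, 81, 66, 74, 69]          # continuous (measured)
sex = ["F", "M", "F", "F", "M", "F", "M"]    # categorical (counted)

def describe_continuous(values):
    """Mean (SD), median and range for a measured variable."""
    return {
        "mean": mean(values),
        "sd": stdev(values),          # sample standard deviation
        "median": median(values),
        "range": (min(values), max(values)),
    }

def describe_categorical(values):
    """Proportion of the sample in each category."""
    n = len(values)
    return {level: values.count(level) / n for level in sorted(set(values))}

print(describe_continuous(ages))
print(describe_categorical(sex))
```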
Uses | Continuous data | Categorical data
Define relationships | Scatter plot; linear (Pearson correlation) | Histogram; correlation (Spearman ranked); relative risk
Go to internet: scatter plot
Go to internet: histogram
Probability
Degree of likelihood that something will happen.
Statistical probabilities are expressed as decimals
between 0 and 1, such as 0.5, 0.25 or 0.75.
For example, a probability of 0 means that
something can never happen; a probability of 1
means that something will always happen.
The probability of an event is calculated as
follows:
– n favourable outcomes / n of all possible outcomes
The probability of getting heads in one toss is:
p(heads) = 1/(1 + 1) = 1⁄2.
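The counting rule above can be written as a short sketch. The function name `probability` and the die example are illustrative assumptions; the rule only applies when all outcomes are equally likely.

```python
from fractions import Fraction

def probability(n_favourable: int, n_possible: int) -> Fraction:
    """Favourable outcomes divided by all possible (equally likely) outcomes."""
    if n_possible <= 0 or not 0 <= n_favourable <= n_possible:
        raise ValueError("need 0 <= n_favourable <= n_possible and n_possible > 0")
    return Fraction(n_favourable, n_possible)

p_heads = probability(1, 2)   # one favourable face out of two
p_six = probability(1, 6)     # hypothetical extra example: a six on a fair die
print(p_heads, float(p_heads))
print(p_six, float(p_six))
```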
Statistical probability
Probability that what you observed could
have occurred by chance
We want this to be a very small number
By convention: p < 0.05 is considered very
unlikely to have occurred by chance
This means that in studies like this, an
observation this extreme or more extreme
would occur by chance alone in fewer than 5 of
100 studies
Remember: one study is only a sample
Unlikely to have occurred by chance: the assumption is that it occurred because of something done in the study
Likely to have occurred by chance; unlikely to be because of anything that was done in the study
When you start a study, there
are risks
Probability that you are one of the yellow studies
You conclude that there was an effect when
there was not
Type I or alpha error
By convention, we set this risk at 5 chances out
of 100 or p=0.05
Any finding that has a p value associated with it
of <0.05 is considered statistically significant
(unlikely to have occurred by chance alone)
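The 5-in-100 risk can be illustrated by simulation. The sketch below runs many hypothetical studies in which the two groups are drawn from the same population, so any "significant" result is a Type I error. As a simplifying assumption it uses a z-test with a known SD of 1 rather than a t-test, and the function names, sample sizes and seed are all invented for illustration.

```python
import random
from statistics import NormalDist, mean

def two_sided_p(group_a, group_b, sd=1.0):
    """Two-sided p-value for a difference in means, assuming a known SD (z-test)."""
    se = sd * (1 / len(group_a) + 1 / len(group_b)) ** 0.5
    z = (mean(group_a) - mean(group_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(n_studies=2000, n_per_group=30, alpha=0.05, seed=1):
    """Share of no-effect studies that still reach p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Both groups come from the same population: there is no true effect.
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if two_sided_p(a, b) < alpha:
            hits += 1
    return hits / n_studies

print(f"Proportion of no-effect studies with p < 0.05: {false_positive_rate():.3f}")
```

The proportion should land near 0.05, which is the alpha risk accepted before the study starts.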
Correlation
>0.8: strong
0.5 to 0.8: moderate
<0.5: weak
Correlation
What proportion of outcome is explained
by the exposure?
ANSWER: r²
r = 0.5 (moderate): r² = 0.25 (not much)
r = 0.9 (strong): r² = 0.81 (still a lot)
r = 0.3 (weak): r² = 0.09 (almost nothing)
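A minimal sketch of how r and r² are obtained from paired data follows. The data are hypothetical, the function names are illustrative, and applying the strength labels to the absolute value of r (so that negative correlations are labelled too) is an assumption not stated on the slide.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for paired observations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def strength(r):
    """Label from the lecture's cut-offs, applied to |r| (an assumption)."""
    size = abs(r)
    if size > 0.8:
        return "strong"
    if size >= 0.5:
        return "moderate"
    return "weak"

# Hypothetical exposure (x) and outcome (y) values.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]
r = pearson_r(x, y)
print(f"r = {r:.2f} ({strength(r)}), r^2 = {r * r:.2f}")
```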
Measuring Effects
Effect
Post-only: Groups similar at baseline, so the effect of the intervention (I) will be observed at t = post. Assumes pre value unimportant; event data (e.g. falls)
Change pre to post: Assumes pre value unimportant; reduces variability as a change value can occur in different ways; analyses based on explaining variability
Change pre to follow-up: Often addresses maintenance of effects
Growth: Longitudinal change; good for interventions over long term or with multiple measurements (4 or more ideal); pre-value is considered
© Nancy E. Mayo (Nov 2005)
RCTs are Longitudinal Designs
Analyses of post only or change are cross-sectional
Time may be important
Effect of intervention may depend on time
Estimating Effects
Time: pre / post
Time effect = impact of time averaged over
group
Group: Intervention Control
At baseline, groups are equal
Group effect= effect of group averaged over
time, as baseline is equal, group effect can only
be due to post-score
Group * Time: does the effect of group depend
on time
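These three effects can be sketched from the four cell means of a two-group, pre/post design. The numbers and the function name `estimate_effects` are hypothetical, and the sketch assumes equal group sizes so that simple averages of cell means are appropriate.

```python
def estimate_effects(means):
    """means[group][time] holds the mean outcome for each cell of the 2 x 2 design."""
    i_pre, i_post = means["intervention"]["pre"], means["intervention"]["post"]
    c_pre, c_post = means["control"]["pre"], means["control"]["post"]
    # Time effect: post minus pre, averaged over the two groups.
    time_effect = (i_post + c_post) / 2 - (i_pre + c_pre) / 2
    # Group effect: intervention minus control, averaged over the two times.
    group_effect = (i_pre + i_post) / 2 - (c_pre + c_post) / 2
    # Group * Time: does the change over time differ between the groups?
    interaction = (i_post - i_pre) - (c_post - c_pre)
    return {"time": time_effect, "group": group_effect, "group_x_time": interaction}

# Hypothetical cell means: groups equal at baseline, intervention gains more.
cell_means = {
    "intervention": {"pre": 50.0, "post": 60.0},
    "control": {"pre": 50.0, "post": 54.0},
}
print(estimate_effects(cell_means))
```

Because the baseline means are equal in this example, the group effect comes entirely from the difference in post scores, as the slide notes.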
Main Effect of Group
[Figure: effect plotted against time for each group; the bracket marks the group effect (averaged over time)]
Main Effect of Time
[Figure: effect plotted against time for each group, showing the time effect (averaged over group)]
Group*Time Effect
The effect of group depended on the
time: same at baseline but increasingly
different over time
[Figure: effect plotted against time for each group; brackets mark the widening gap between groups]
95% CI
Mean ± 1.96 × SE
SE = SD / √N (N = number of subjects)
1.96 is the z-value of a standard normal (mean of 0 and SD 1)
distribution that bounds the central 95% of the area under the curve;
2.5% of the area lies beyond it in each tail
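The calculation can be sketched as follows. The change scores are hypothetical and the function name `ci95` is illustrative. The sketch follows the slide in using the normal multiplier; with small samples a t-distribution multiplier is often used instead.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def ci95(values):
    """95% confidence interval for a mean: mean ± z × SE, with SE = SD / sqrt(N)."""
    n = len(values)
    sample_mean = mean(values)
    se = stdev(values) / sqrt(n)
    z = NormalDist().inv_cdf(0.975)   # about 1.96: leaves 2.5% in each tail
    return sample_mean - z * se, sample_mean + z * se

# Hypothetical change scores, invented for illustration only.
changes = [4, 6, 8, 5, 7, 6, 6, 6]
low, high = ci95(changes)
print(f"mean = {mean(changes):.1f}, 95% CI = ({low:.2f}, {high:.2f})")
```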
Interpretation of 95% CI
With 100 studies like this one
The mean change in PPT will lie
Between the 95% confidence bounds
95 times out of 100
Likely that a gain will be between 4 and 8
units of change
Linking Data to Statistics
Outcome
Exposure1
Exposure2
Exposure3