Theories - the Department of Psychology at Illinois State
Using Statistics in Research
Psych 231: Research Methods in
Psychology
Announcements
I will be helping with statistical analyses of
group project data during this week’s labs.
– Enter data into SPSS datafile and e-mail it to me
– Bring raw data in organized fashion for easy entry
into SPSS
– Think about what the appropriate statistical test
should be IN ADVANCE of seeing me
Statistics
Why do we use them?
– Descriptive statistics
• Used to describe, simplify, & organize data sets
– Inferential statistics
• Used to test claims about the population, based on data
gathered from samples
• Take sampling error into account: are the results above
and beyond what you’d expect by random chance?
Distributions
Recall that a variable is a characteristic that
can take different values.
The distribution of a variable is a summary of
all the different values of a variable
– both type (each value) and token (each instance)
Distribution
Example: Distribution of scores on an exam
– A frequency histogram
[Frequency histogram of exam scores: x-axis bins 50–54, 55–59, 60–64, 65–69, 70–74, 75–79, 80–84, 85–89, 90–94, 95–100; y-axis frequency from 0 to 20.]
Distribution
Properties of a distribution
– Shape
• Symmetric v. asymmetric (skew)
• Unimodal v. multimodal
– Center
• Where most of the data in the distribution are
– Spread (variability)
• How similar/dissimilar are the scores in the distribution?
Distributions
A picture of the distribution is usually helpful
– Gives a good sense of the properties of the distribution
Many different ways to display distribution
– Graphs
• Continuous variable:
– histogram, line graph (frequency polygons)
• Categorical variable:
– pie chart, bar chart
– Table
• Frequency distribution table
• Stem and leaf plot
Graphs for continuous variables
Histogram
[Histogram of EXAM2 scores: x-axis from 50.0 to 100.0 in 5-point steps; y-axis frequency from 0 to 20.]
Line graph
Graphs for categorical variables
Bar chart
Pie chart
[Bar chart and pie chart showing category frequencies for Cutting, Doe, Smith, and Missing.]
Frequency distribution table
VAR00003
Value    Frequency   Percent   Valid Percent   Cumulative Percent
1.00         2          7.7         7.7                7.7
2.00         3         11.5        11.5               19.2
3.00         3         11.5        11.5               30.8
4.00         5         19.2        19.2               50.0
5.00         4         15.4        15.4               65.4
6.00         2          7.7         7.7               73.1
7.00         4         15.4        15.4               88.5
8.00         2          7.7         7.7               96.2
9.00         1          3.8         3.8              100.0
Total       26        100.0       100.0
The first column lists the values (types); the remaining columns give the counts and percentages.
Descriptive statistics
In addition to pictures of the distribution, numerical
summaries are also presented.
Numeric Descriptive Statistics
– Shape:
• Skew (symmetry) & Kurtosis (flatness)
– Measures of Center:
• Mean
• Median
• Mode
– Measures of Variability (Spread)
• Standard deviation (variance)
• Range
Shape
Symmetric
Asymmetric (skewed)
– Positive skew: the tail extends toward the high end of the scale
– Negative skew: the tail extends toward the low end
Shape
Unimodal (one mode)
Multimodal
– Bimodal examples
Center
There are three main measures of center
– Mean (M): the arithmetic average
• Add up all of the scores and divide by the total number
• Most used measure of center
– Median (Mdn): the middle score in terms of location
• The score that cuts off the top 50% of the distribution from the bottom 50%
• Good for skewed distributions (e.g. net worth)
– Mode: the most frequent score
• Good for nominal scales (e.g. eye color)
• A must for multi-modal distributions
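The three measures of center can be computed directly. A minimal sketch using Python’s statistics module; the exam scores below are made up for illustration.

```python
import statistics

# Hypothetical exam scores (invented for this example)
scores = [55, 62, 71, 71, 74, 78, 81, 85, 93]

mean = statistics.mean(scores)      # arithmetic average: sum / count
median = statistics.median(scores)  # middle score by location (5th of 9 here)
mode = statistics.mode(scores)      # most frequent score (71 appears twice)

print(mean, median, mode)
```

Note that the mean (about 74.4) and median (74) nearly agree here because the distribution is roughly symmetric; in a skewed distribution (e.g. net worth) they can differ sharply.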
Spread (Variability)
How similar are the scores?
– Range: the maximum value - minimum value
• Only takes two scores from the distribution into account
• Influenced by extreme values (outliers)
– Standard deviation (SD): (essentially) the average
amount that the scores in the distribution deviate
from the mean
• Takes all of the scores into account
• Also influenced by extreme values (but not as much as
the range)
– Variance: standard deviation squared
Variability
Low variability
– The scores are fairly similar (clustered near the mean)
High variability
– The scores are fairly dissimilar (spread widely around the mean)
Relationships between variables
Suppose that you notice that the more you study for an exam,
the better your score typically is. This suggests that there is a
relationship between study time and test performance.
Computation of the Correlation Coefficient (and regression) - a
numerical description of the relationship between two variables
May be used for
– Prediction
– Validity
– Reliability
– Theory verification
Correlation
For relationship between two continuous variables we
use Pearson’s r
(Pearson product-moment correlation)
It basically tells us how much our two variables vary
together
– As X goes up, what does Y typically do?
• X goes up, Y goes up
• X goes up, Y goes down
• X goes up, Y stays the same
Correlation
Properties of a correlation
– Form
• Linear
• Non-linear
– Direction
• Negative
• Positive
– Strength
• Ranges from -1 to +1, 0 means no relationship
Scatterplot
Plots one variable against the other
Useful for “seeing” the relationship
– Form, Direction, and Strength
Each point corresponds to a different
individual
Imagine a line through the data points
Scatterplot
[Scatterplot: six individuals’ (X, Y) scores plotted on axes running from 1 to 6, with a line imagined through the points.]
Form
Linear
Non-linear
Direction
– Positive
• As X goes up, Y goes up
• X & Y vary in the same direction
• positive r
– Negative
• As X goes up, Y goes down
• X & Y vary in opposite directions
• negative r
Strength
Zero means “no relationship”.
– The farther the r is from zero, the stronger the
relationship
The strength of the relationship
– Spread around the line (note the axis scales)
r² is sometimes reported instead
– the % of variance in Y accounted for by X
Strength
– r = -1.0: “perfect negative corr.” (r² = 100%)
– r = 0.0: “no relationship” (r² = 0%)
– r = +1.0: “perfect positive corr.” (r² = 100%)
The farther from zero, the stronger the relationship
Strength
– Rel A: r = -0.8 (r² = 64%)
– Rel B: r = +0.5 (r² = 25%)
Which relationship is stronger?
– Rel A: -0.8 is farther from zero than +0.5
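Pearson’s r can be computed straight from its definition (how much X and Y co-vary, scaled by how much each varies on its own). A sketch with invented study-time data:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two lists of scores."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # How much X and Y vary together
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # How much each varies on its own
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

study_hours = [1, 2, 3, 4, 5]    # hypothetical data
exam_score = [60, 65, 70, 80, 85]

r = pearson_r(study_hours, exam_score)
print(r, r ** 2)  # r near +1: a strong positive relationship
```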
Regression
Compute the equation for the line that best
fits the data points
Y = (X)(slope) + (intercept)
[Scatterplot with the best-fitting line drawn through the data; the slope is the change in Y divided by the change in X, here 0.5.]
Regression
Can make specific predictions about Y
based on X
– X = 5, Y = ?
– Y = (X)(.5) + (2.0)
– Y = (5)(.5) + (2.0)
– Y = 2.5 + 2 = 4.5
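The slope and intercept of the best-fitting line come from least squares. A minimal sketch; the data points are invented so that they fall exactly on the line Y = 0.5X + 2 from the slide.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for Y = slope*X + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-variation
    den = sum((x - mx) ** 2 for x in xs)                    # variation in X
    slope = num / den
    intercept = my - slope * mx  # line passes through (mean X, mean Y)
    return slope, intercept

xs = [1, 2, 3, 4, 5, 6]                 # hypothetical points on Y = 0.5X + 2
ys = [2.5, 3.0, 3.5, 4.0, 4.5, 5.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)        # 0.5 and 2.0
print(slope * 5 + intercept)   # prediction for X = 5: 4.5, as on the slide
```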
Regression
Also need a measure of error
Y = X(.5) + (2.0) + error
• Same line, but different relationships (strength difference)
[Two scatterplots sharing the same best-fitting line: in one the points cluster tightly around the line (strong relationship); in the other they are widely scattered (weak relationship).]
Multiple regression
You want to look at how more than one
variable may be related to Y
The regression equation gets more complex
– X, Z, & W variables are used to predict Y
– e.g., Y = b1X + b2Z + b3W + b0 + error
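The multiple-regression equation can be sketched in Python. The coefficients b1, b2, b3, and b0 below are made-up values for illustration; in a real analysis they would be estimated from data by least squares.

```python
def predict_y(x, z, w, b1=0.5, b2=1.2, b3=-0.3, b0=2.0):
    """Multiple-regression prediction: Y = b1*X + b2*Z + b3*W + b0.

    Coefficients here are hypothetical, not estimated from data.
    A real prediction also carries an error term, omitted here.
    """
    return b1 * x + b2 * z + b3 * w + b0

print(predict_y(4, 2, 1))  # 0.5*4 + 1.2*2 - 0.3*1 + 2.0 = 6.1
```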
Cautions with correlation and
regression
Don’t make causal claims
Don’t extrapolate
Extreme scores can strongly influence the
calculated relationship
Inferential Statistics
Why?
– Purpose: To make claims about populations based on data
collected from samples
What’s the big deal?
– Example Experiment:
• Group A - gets treatment to improve memory
• Group B - control, gets no treatment
• After treatment period test both groups for memory
• Results: Group A’s average memory score is 80%, while Group B’s is 76%
• Is the 4% difference a “real” difference or is it just sampling error?
Testing Hypotheses
Step 1: State your hypotheses
– Null hypothesis (H0)
• There are no differences (effects)
• This is the hypothesis that you are testing
– Alternative hypothesis (or hypotheses)
• Generally, not all groups are equal
• You aren’t out to prove the alternative hypothesis (although it
feels like this is what you want to do)
• If you reject the null hypothesis, then you’re left with support for
the alternative(s) (NOT proof!)
Hypotheses
In our memory example experiment
– H0: mean of Group A = mean of Group B
– HA: mean of Group A ≠ mean of Group B
• (Or more precisely: Group A > Group B)
– It seems like our theory is that the treatment
should improve memory.
– That’s the alternative hypothesis. That’s NOT the
one we’ll test with inferential statistics.
– Instead, we test the H0
Testing Hypotheses
Step 2: Set your decision criteria
– Your alpha level will be your guide for when to reject or fail to reject
the null hypothesis
Step 3: Collect your data from your sample(s)
Step 4: Compute your test statistics
– Descriptive statistics (means, standard deviations, etc.)
– Inferential statistics (t-tests, ANOVAs, etc.)
Step 5: Make a decision about your null hypothesis
– Reject H0
– Fail to reject H0
Statistical significance
“Statistically significant difference”
– When you reject your null hypothesis
– Essentially this means that the observed difference is above
what you’d expect by chance
– “Chance” is determined by estimating how much sampling
error there is
– Factors affecting “chance”
• Sample size
• Population variability
Sampling error
[Three figures showing a population distribution and samples of size N = 1, N = 2, and N = 10. Each marks the population mean, the sample mean, and the sampling error (population mean - sample mean); as N grows, the sample mean falls closer to the population mean.]
Generally, as the sample size increases, the
sampling error decreases
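The claim that sampling error shrinks as sample size grows can be checked by simulation. A sketch; the population below is hypothetical (normal, mean 100, SD 15).

```python
import random
import statistics

random.seed(0)  # make the simulation repeatable

POP_MEAN, POP_SD = 100, 15  # hypothetical population parameters

def avg_sampling_error(n, reps=2000):
    """Average |population mean - sample mean| over many samples of size n."""
    errs = []
    for _ in range(reps):
        sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]
        errs.append(abs(POP_MEAN - statistics.mean(sample)))
    return statistics.mean(errs)

for n in (1, 2, 10, 100):
    print(n, round(avg_sampling_error(n), 2))
# Larger samples give a smaller average sampling error
```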
Sampling error
Typically the narrower the population distribution, the
narrower the range of possible samples, and the smaller the
“chance”
Small population variability
Large population variability
Sampling distribution
The sampling distribution is a distribution of all
possible sample means of a particular sample size
that can be drawn from the population
[Figure: many samples of size n are drawn from the population; their means (XA, XB, XC, XD, …) form the distribution of sample means. The spread of that distribution, the average sampling error, is what “chance” refers to.]
Error types
Based on the outcomes of the statistical tests
researchers will either:
– Reject the null hypothesis
– Fail to reject the null hypothesis
This could be correct conclusion or the incorrect
conclusion
– Two ways to go wrong
• Type I error: saying that there is a difference when there really
isn’t one
• Type II error: saying that there is not a difference when there
really is one
Error types
Real world (‘truth’) vs. the experimenter’s conclusions:
– H0 is correct, and you reject H0 → Type I error
– H0 is correct, and you fail to reject H0 → correct decision
– H0 is wrong, and you reject H0 → correct decision
– H0 is wrong, and you fail to reject H0 → Type II error
Error types: Courtroom analogy
Real world (‘truth’) vs. the jury’s decision:
– Defendant is innocent, jury finds guilty → Type I error
– Defendant is guilty, jury finds not guilty → Type II error
Error types
Type I error: concluding that there is an effect (a difference
between groups) when there really isn’t.
– Sometimes called the “significance level”
– We try to minimize this (keep it low)
– Pick a low level of alpha
– Psychology: 0.05 and 0.01 most common
Type II error: concluding that there isn’t an effect, when there
really is.
– Related to the Statistical Power of a test
– How likely you are to detect a difference if it is there
Significance
“A statistically significant difference” means:
– the researcher is concluding that there is a difference above
and beyond chance
– with the probability of making a type I error at 5% (assuming
an alpha level = 0.05)
Note “statistical significance” is not the same thing as
theoretical significance.
– Only means that there is a statistical difference
– Doesn’t mean that it is an important difference
Non-Significance
Failing to reject the null hypothesis
– Generally, not interested in “accepting the null hypothesis”
(remember we can’t prove things, only disprove them)
– Usually check to see if you made a Type II error (failed to
detect a difference that is really there)
• Check the statistical power of your test
– Sample size is too small
– Effects that you’re looking for are really small
• Check your controls, maybe too much variability
Inferential Statistical Tests
Different statistical tests
– “Generic test”
– T-test
– Analysis of Variance (ANOVA)
“Generic” statistical test
Tests the question:
– Are there differences between groups due to a treatment?
H0 is true (no treatment effect)
– [XA and XB come from the same distribution; they differ only by sampling error]
H0 is false (there is a treatment effect)
– [XA and XB come from different distributions]
“Generic” statistical test
Why might the samples be different?
(What is the source of the variability between
groups)?
– ER: Random sampling error
– ID: Individual differences (if between subjects
factor)
– TR: The effect of a treatment
“Generic” statistical test
The generic test statistic:
test statistic = observed difference / difference expected by chance = (TR + ID + ER) / (ID + ER)
“Generic” statistical test
The generic test statistic distribution
– To reject the H0, you want a computed test statistic that is large
– A large value reflects a large Treatment Effect (TR)
[Distribution of the test statistic, with a “fail to reject H0” region in the middle and a “reject H0” region in the tail(s).]
1 tailed or 2 tailed
2-tailed tests “look” for any difference
1-tailed tests “look” for a difference in a specific direction (e.g.
“an increase”, “an impairment” …)
– Statistically more powerful
[Figures: in the 2-tailed test the “reject H0” region is split between both tails; in the 1-tailed test the entire “reject H0” region sits in one tail.]
T-tests
Three types
– One sample
– 2-independent samples
– Repeated measures samples
T-distribution
– Centered on zero, negative and positive values
– Degrees of freedom
• Based on number of subjects in sample(s)
• Tell you what t-distribution to look at
Independent samples t-test
Design
– 2 separate groups of participants (e.g. control and treatment)
Degrees of freedom
– df = n1 + n2 - 2
Formula:
t = (Xtreat - Xcontrol) / (diff expected by chance)
– the denominator is based on the variability and size of the samples
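The pooled-variance version of this t (the one matching df = n1 + n2 - 2) can be sketched directly; the memory scores below are invented.

```python
import math
import statistics

def independent_t(group1, group2):
    """Pooled-variance independent-samples t; df = n1 + n2 - 2."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    # Pool the two sample variances, weighting by degrees of freedom
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))  # "diff expected by chance"
    return (m1 - m2) / se, n1 + n2 - 2

treatment = [82, 79, 88, 85, 81]  # hypothetical memory scores
control = [75, 78, 74, 80, 73]

t, df = independent_t(treatment, control)
print(round(t, 2), df)
```

The resulting t is compared against the t-distribution with the stated df to get the p-value.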
Independent samples t-test
Reporting your results
– The observed difference
– Kind of t-test
– Computed t-statistic
– Degrees of freedom for the test
– The “p-value” of the test
“The mean of the treatment group was 12 points higher than the
control group. An independent samples t-test yielded a
significant difference, t(25) = 5.67, p < 0.05.”
Repeated measures t-test
Design
– 1 group of participants tested twice (e.g. pre-test and post-test)
Degrees of freedom
– df = n - 1 (where n = number of difference scores)
Formula:
t = (Xpost - Xpre) / (diff expected by chance)
– the numerator is the mean of the difference scores
– the denominator is based on the variability and size of the sample of difference scores
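As a sketch, the repeated-measures t works on each participant’s difference score; the pre/post scores below are invented.

```python
import math
import statistics

def repeated_t(pre, post):
    """Repeated-measures t on difference scores; df = n - 1."""
    diffs = [b - a for a, b in zip(pre, post)]  # one difference per person
    n = len(diffs)
    md = statistics.mean(diffs)                 # mean difference score
    sd = statistics.stdev(diffs)                # variability of differences
    return md / (sd / math.sqrt(n)), n - 1

pre = [70, 68, 74, 71, 69]   # hypothetical pre-test scores
post = [78, 75, 80, 77, 74]  # hypothetical post-test scores

t, df = repeated_t(pre, post)
print(round(t, 2), df)
```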
Repeated measures t-test
Reporting your results
– The observed difference
– Kind of t-test
– Computed t-statistic
– Degrees of freedom for the test
– The “p-value” of the test
“The mean score of the post-test was 12 points higher than the
pre-test. A repeated measures t-test demonstrated that this
difference was significant, t(25) = 5.67, p < 0.05.”
Analysis of Variance
Designs
– More than two groups
• 1 Factor ANOVA, Factorial ANOVA
• Both Within and Between Groups Factors
Test statistic is an F-ratio
Degrees of freedom
– Several to keep track of
– Vary depending on the design
Analysis of Variance
More than two groups
– Now we can’t just compute a simple difference score, since there is more than one difference
– So we use variance instead of simply the difference
• Variance is essentially an average difference
F-ratio = observed variance / variance expected by chance
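The F-ratio realizes “observed variance / variance from chance” as between-group variance over within-group variance. A sketch of the standard one-way ANOVA computation; the three groups of scores are invented.

```python
import statistics

def one_way_anova(*groups):
    """One-way ANOVA: F = between-group variance / within-group variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = statistics.mean([x for g in groups for x in g])
    # Between-groups: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    # Within-groups: how far scores sit from their own group mean ("chance")
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

a = [12, 14, 11, 13]  # hypothetical scores for Groups A, B, and C
b = [25, 27, 24, 24]
c = [27, 29, 26, 30]

f, df1, df2 = one_way_anova(a, b, c)
print(round(f, 1), df1, df2)  # a large F: group means differ beyond chance
```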
1 factor ANOVA
1 Factor, with more than two levels
– Now we can’t just compute a simple difference score, since there is more than one difference
• A - B, B - C, & A - C
1 factor ANOVA
Null hypothesis:
The ANOVA
tests this one!!
H0: all the groups are equal
XA = XB = XC
Alternative hypotheses
HA: not all the groups are equal
XA ≠ XB ≠ XC
XA = XB ≠ XC
XA ≠ XB = XC
XA = XC ≠ XB
1 factor ANOVA
Planned contrasts and post-hoc tests:
– Further tests used to rule out the different alternative
hypotheses (XA ≠ XB ≠ XC; XA = XB ≠ XC; XA ≠ XB = XC; XA = XC ≠ XB)
• Test 1: A ≠ B
• Test 2: A ≠ C
• Test 3: B = C
1 factor ANOVA
Reporting your results
– The observed difference
– Kind of test
– Computed F-ratio
– Degrees of freedom for the test
– The “p-value” of the test
– Any post-hoc or planned comparison results
“The mean score of Group A was 12, Group B was 25, and
Group C was 27. A 1-way ANOVA was conducted and the
results yielded a significant difference, F(2,25) = 5.67, p < 0.05.
Post hoc tests revealed that the differences between groups A
and B and A and C were statistically reliable (respectively t(1) =
5.67, p < 0.05 & t(1) = 6.02, p < 0.05). Groups B and C did not
differ significantly from one another.”
Factorial ANOVAs
We covered much of this in our experimental design lecture
More than one factor
– Factors may be within or between
– Overall design may be entirely within, entirely between, or mixed
Many F-ratios may be computed
– An F-ratio is computed to test the main effect of each factor
– An F-ratio is computed to test each of the potential interactions
between the factors
Factorial ANOVA
Reporting your results
– The observed differences
• Because there may be a lot of these, may present them in a table
instead of directly in the text
– Kind of design
• e.g. “2 x 2 completely between factorial design”
– Computed F-ratios
• May see separate paragraphs for each factor, and for interactions
– Degrees of freedom for the test
• Each F-ratio will have its own set of df’s
– The “p-value” of the test
• May want to just say “all tests were tested with an alpha level of
0.05”
– Any post-hoc or planned comparison results
• Typically only the theoretically interesting comparisons are
presented