Theories - the Department of Psychology at Illinois State


Using Statistics in Research
Psych 231: Research Methods in Psychology
Announcements
I will be helping with statistical analyses of
group project data during this week’s labs.
– Enter data into SPSS datafile and e-mail it to me
– Bring raw data in an organized fashion for easy entry into SPSS
– Think about what the appropriate statistical test
should be IN ADVANCE of seeing me
Statistics
Why do we use them?
– Descriptive statistics
• Used to describe, simplify, & organize data sets
– Inferential statistics
• Used to test claims about the population, based on data
gathered from samples
• Takes sampling error into account: are the results above and beyond what you’d expect by random chance?
Distributions
Recall that a variable is a characteristic that can take different values.
The distribution of a variable is a summary of all the different values of a variable
– both type (each value) and token (each instance)
Distribution
Example: Distribution of scores on an exam
– A frequency histogram
[Figure: frequency histogram of exam scores; x-axis: score ranges 50-54 through 95-100; y-axis: frequency, 0-20]
Distribution
Properties of a distribution
– Shape
• Symmetric v. asymmetric (skew)
• Unimodal v. multimodal
– Center
• Where most of the data in the distribution are
– Spread (variability)
• How similar/dissimilar are the scores in the distribution?
Distributions
A picture of the distribution is usually helpful
– Gives a good sense of the properties of the distribution
Many different ways to display a distribution
– Graphs
• Continuous variable:
– histogram, line graph (frequency polygons)
• Categorical variable:
– pie chart, bar chart
– Table
• Frequency distribution table
• Stem and leaf plot
Graphs for continuous variables
[Figures: Histogram and Line graph of EXAM2 scores; x-axis: EXAM2, 50.0-100.0; y-axis: frequency, 0-20]
Graphs for categorical variables
[Figures: Bar chart and Pie chart of a categorical variable with the categories Cutting, Doe, Smith, and Missing]
Frequency distribution table
VAR00003
Value   Frequency   Percent   Valid Percent   Cumulative Percent
1.00        2          7.7         7.7              7.7
2.00        3         11.5        11.5             19.2
3.00        3         11.5        11.5             30.8
4.00        5         19.2        19.2             50.0
5.00        4         15.4        15.4             65.4
6.00        2          7.7         7.7             73.1
7.00        4         15.4        15.4             88.5
8.00        2          7.7         7.7             96.2
9.00        1          3.8         3.8            100.0
Total      26        100.0       100.0
(Value column = the values (types); Frequency column = counts; Percent columns = percentages)
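The counts and percentages in a table like this can be produced from raw scores. Below is a minimal Python sketch (the function name `frequency_table` is hypothetical, not an SPSS feature) that rebuilds the raw scores from the frequencies shown above and recomputes the Percent and Cumulative Percent columns; with no missing data, Valid Percent equals Percent.

```python
from collections import Counter

def frequency_table(scores):
    """Return rows of (value, count, percent, cumulative percent)."""
    counts = Counter(scores)
    n = len(scores)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]
        rows.append((value,
                     counts[value],
                     round(100 * counts[value] / n, 1),   # Percent
                     round(100 * cumulative / n, 1)))     # Cumulative Percent
    return rows

# Raw scores rebuilt from the Frequency column above (two 1s, three 2s, ...)
freqs = {1: 2, 2: 3, 3: 3, 4: 5, 5: 4, 6: 2, 7: 4, 8: 2, 9: 1}
scores = [value for value, count in freqs.items() for _ in range(count)]

for value, count, pct, cum_pct in frequency_table(scores):
    print(f"{value:>5} {count:>5} {pct:>6} {cum_pct:>6}")
```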
Descriptive statistics
In addition to pictures of the distribution, numerical
summaries are also presented.
Numeric Descriptive Statistics
– Shape:
• Skew (symmetry) & Kurtosis (flatness)
– Measures of Center:
• Mean
• Median
• Mode
– Measures of Variability (Spread)
• Standard deviation (variance)
• Range
Shape
Symmetric
Asymmetric
– Positive skew: the tail points to the right (toward the high scores)
– Negative skew: the tail points to the left (toward the low scores)
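Skew can also be summarized numerically. The sketch below assumes one common moment-based definition (g1 = m3 / m2^1.5); statistics packages such as SPSS may apply a small-sample adjustment, so their reported values can differ slightly. The data sets are hypothetical.

```python
import statistics

def skewness(scores):
    """Moment-based skewness g1 = m3 / m2**1.5 (one common definition)."""
    n = len(scores)
    mean = statistics.fmean(scores)
    m2 = sum((x - mean) ** 2 for x in scores) / n   # average squared deviation
    m3 = sum((x - mean) ** 3 for x in scores) / n   # average cubed deviation keeps the sign
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5, 6, 7]
positive = [1, 1, 2, 2, 3, 10]     # long tail toward high scores
negative = [-x for x in positive]  # mirror image: long tail toward low scores

for name, data in [("symmetric", symmetric), ("positive", positive), ("negative", negative)]:
    print(name, round(skewness(data), 3))
```

A value near zero indicates symmetry; the sign tells you which way the tail points.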
Shape
Unimodal (one mode)
Multimodal
– Bimodal examples
Center
There are three main measures of center
– Mean (M): the arithmetic average
• Add up all of the scores and divide by the total number
• Most used measure of center
– Median (Mdn): the middle score in terms of location
• The score that separates the top 50% of the scores from the bottom 50%
• Good for skewed distributions (e.g. net worth)
– Mode: the most frequent score
• Good for nominal scales (e.g. eye color)
• A must for multi-modal distributions
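A minimal Python sketch of the three measures of center, using the standard-library `statistics` module on hypothetical scores. The single extreme score (40) pulls the mean above the median, which is why the median is preferred for skewed distributions; `multimode` returns every mode, so it also handles multimodal data.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10, 40]   # hypothetical scores with one extreme value

mean = statistics.mean(scores)        # arithmetic average: 70 / 7
median = statistics.median(scores)    # middle score in terms of location
modes = statistics.multimode(scores)  # most frequent score(s)

bimodal = [1, 1, 2, 3, 3]             # hypothetical bimodal data
bimodal_modes = statistics.multimode(bimodal)

print(mean, median, modes, bimodal_modes)
```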
Spread (Variability)
How similar are the scores?
– Range: the maximum value - minimum value
• Only takes two scores from the distribution into account
• Influenced by extreme values (outliers)
– Standard deviation (SD): (essentially) the average
amount that the scores in the distribution deviate
from the mean
• Takes all of the scores into account
• Also influenced by extreme values (but not as much as
the range)
– Variance: standard deviation squared
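The measures of spread can be computed the same way. This sketch uses hypothetical scores. The standard deviation can be computed by dividing by N (population) or by N - 1 (sample); most statistics packages, SPSS included, report the N - 1 version.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores, mean = 5

score_range = max(scores) - min(scores)   # uses only the two extreme scores
pop_sd = statistics.pstdev(scores)        # SD dividing by N
pop_var = statistics.pvariance(scores)    # variance = SD squared
sample_sd = statistics.stdev(scores)      # SD dividing by N - 1

print(score_range, pop_sd, pop_var, round(sample_sd, 3))
```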
Variability
Low variability
– The scores are fairly similar
[Figure: scores clustered tightly around the mean]
High variability
– The scores are fairly dissimilar
[Figure: scores spread widely around the mean]
Relationships between variables
Suppose that you notice that the more you study for an exam, the better your score typically is. This suggests that there is a relationship between study time and test performance.
Computation of the Correlation Coefficient (and regression) - a numerical description of the relationship between two variables
May be used for
– Prediction
– Validity
– Reliability
– Theory verification
Correlation
For the relationship between two continuous variables we use Pearson’s r
(Pearson product-moment correlation)
It basically tells us how much our two variables vary together
– As X goes up, what does Y typically do?
• X goes up, Y goes up
• X goes up, Y goes down
• X goes up, Y does not systematically change
Correlation
Properties of a correlation
– Form
• Linear
• Non-linear
– Direction
• Negative
• Positive
– Strength
• Ranges from -1 to +1, 0 means no relationship
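Pearson’s r can be computed from how each variable deviates from its own mean. A minimal Python sketch, assuming two equal-length lists of hypothetical scores (`pearson_r` is an illustrative name). The last data set shows a clear but curved relationship that still gives r = 0, because r only measures the linear form.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products: do X and Y deviate from their means together?
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

study_hours = [1, 2, 3, 4, 5, 6]            # hypothetical data
exam_scores = [60, 65, 70, 75, 80, 85]
print(pearson_r(study_hours, exam_scores))        # perfect positive
print(pearson_r(study_hours, exam_scores[::-1]))  # perfect negative

curved_x = [1, 2, 3, 4, 5]
curved_y = [2, 4, 5, 4, 2]                  # inverted U: non-linear form
print(pearson_r(curved_x, curved_y))
```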
Scatterplot
Plots one variable against the other
Useful for “seeing” the relationship
– Form, Direction, and Strength
Each point corresponds to a different individual
Imagine a line through the data points
Scatterplot
[Figure: example scatterplot of Y against X, both axes running 1-6]
Form
– Linear
– Non-linear
Direction
– Negative
• As X goes up, Y goes down
• X & Y vary in opposite directions
• negative r
– Positive
• As X goes up, Y goes up
• X & Y vary in the same direction
• positive r
Strength
Zero means “no relationship”.
– The farther r is from zero, the stronger the relationship
The strength of the relationship
– Spread around the line (note the axis scales)
r² is sometimes reported instead
– The proportion (%) of variance in Y accounted for by X
Strength
– r = -1.0: “perfect negative correlation”, r² = 100%
– r = 0.0: “no relationship”, r² = 0%
– r = +1.0: “perfect positive correlation”, r² = 100%
[Figure: scale running from -1.0 through 0.0 to +1.0]
The farther from zero, the stronger the relationship
Strength
Rel A: r = -0.8, r² = 64%
Rel B: r = 0.5, r² = 25%
[Figure: both values marked on the scale from -1.0 through 0.0 to +1.0]
Which relationship is stronger?
Rel A: -0.8 is stronger than +0.5, because it is farther from zero
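Because strength ignores the sign, comparing two correlations means comparing their absolute values (or, equivalently, their r² values). A small sketch using the values from the example above; the function names are illustrative.

```python
def strength(r):
    """Strength ignores direction, so compare |r|."""
    return abs(r)

def variance_explained(r):
    """r squared: proportion of variance in Y accounted for by X."""
    return r ** 2

rel_a, rel_b = -0.8, 0.5
stronger = "Rel A" if strength(rel_a) > strength(rel_b) else "Rel B"

print(stronger, f"{variance_explained(rel_a):.0%}", f"{variance_explained(rel_b):.0%}")
```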
Regression
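Compute the equation for the line that best fits the data points
– Y = (X)(slope) + (intercept)
– slope = (change in Y) / (change in X) = 0.5
– intercept = 2.0 (the value of Y where the line crosses X = 0)
[Figure: scatterplot, both axes 1-6, with the best-fitting line drawn through the points]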
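The best-fitting line is usually found by ordinary least squares, which picks the slope and intercept that minimize the squared vertical distances from the points to the line. The sketch below assumes that method and uses hypothetical points scattered around Y = (X)(.5) + (2.0), chosen so that the fitted values match the example.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sp / ss_x                    # change in Y per unit change in X
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return slope, intercept

xs = [1, 2, 3, 4, 5, 6]
ys = [3.0, 2.5, 3.5, 4.0, 4.0, 5.5]   # hypothetical points around Y = 0.5X + 2

slope, intercept = least_squares_line(xs, ys)
print(slope, intercept)
```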
Regression
Can make specific predictions about Y based on X
– X = 5, Y = ?
Y = (X)(.5) + (2.0)
Y = (5)(.5) + (2.0)
Y = 2.5 + 2 = 4.5
[Figure: the regression line, with X = 5 mapped to the predicted value Y = 4.5]
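The prediction is just the line’s equation evaluated at a given X. A trivial Python sketch; the constants come from the example line, and `predict` is a hypothetical helper.

```python
SLOPE = 0.5      # from the example line Y = (X)(.5) + (2.0)
INTERCEPT = 2.0

def predict(x):
    """Predicted Y for a given X on the example regression line."""
    return x * SLOPE + INTERCEPT

print(predict(5))   # 4.5
```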
Regression
Also need a measure of error
– Y = X(.5) + (2.0) + error
• Same line, but different relationships (strength difference)
[Figures: two scatterplots, both axes 1-6, with the same line Y = X(.5) + (2.0); in one the points lie close to the line, in the other they are more spread out]
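One way to quantify the error term is the typical distance of the points from the line. This sketch assumes the example line Y = X(.5) + (2.0) and two hypothetical data sets that have that same best-fitting line but different amounts of scatter. It divides by N for simplicity; the standard error of the estimate reported by regression software divides by N - 2.

```python
import math

def residual_sd(xs, ys, slope=0.5, intercept=2.0):
    """Root-mean-square distance of the points from the line (the 'error')."""
    residuals = [y - (x * slope + intercept) for x, y in zip(xs, ys)]
    return math.sqrt(sum(e ** 2 for e in residuals) / len(residuals))

xs = [1, 2, 3, 4, 5, 6]
tight = [3.0, 2.5, 3.5, 4.0, 4.0, 5.5]   # hypothetical: close to the line
loose = [3.5, 2.0, 3.5, 4.0, 3.5, 6.0]   # hypothetical: same line, more scatter

print(round(residual_sd(xs, tight), 3), round(residual_sd(xs, loose), 3))
```

The larger error for the second data set corresponds to the weaker relationship.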
Multiple regression

You want to look at how more than one
variable may be related to Y
 The regression equation gets more complex
– X, Z, & W variables are used to predict Y
– e.g., Y = b1X + b2Z + b3W + b0 + error
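With several predictors the weights are still found by least squares, now by solving a small system of linear equations (the normal equations). The bare-bones sketch below uses only the Python standard library; `multiple_regression` and `solve` are hypothetical helpers, and in practice SPSS or a statistics library would do this for you. The data are made up so that Y is exactly 2X + 3Z - W + 1, which lets you check the recovered weights.

```python
def solve(a, b):
    """Solve a @ coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in reversed(range(n)):  # back-substitution
        s = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - s) / m[r][r]
    return coef

def multiple_regression(rows, ys):
    """Least-squares weights [b0, b1, b2, ...] for Y = b1*X + b2*Z + ... + b0."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 gives the intercept b0
    k = len(design[0])
    # Normal equations: (D^T D) b = D^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical predictors (X, Z, W); Y built exactly as 2X + 3Z - W + 1
rows = [(1, 2, 0), (2, 1, 1), (3, 4, 2), (4, 3, 5), (5, 6, 1), (6, 5, 3), (2, 2, 2)]
ys = [2 * x + 3 * z - w + 1 for x, z, w in rows]

b0, b1, b2, b3 = multiple_regression(rows, ys)
print(round(b1, 6), round(b2, 6), round(b3, 6), round(b0, 6))
```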
Cautions with correlation and regression
Don’t make causal claims
Don’t extrapolate beyond the range of the data you collected
Extreme scores can strongly influence the calculated relationship
Inferential Statistics
Why?
– Purpose: To make claims about populations based on data collected from samples
What’s the big deal?
– Example Experiment:
• Group A - gets treatment to improve memory
• Group B - control, gets no treatment
• After the treatment period, test both groups for memory
• Results: Group A’s average memory score is 80%, while group B’s is 76%
• Is the 4% difference a “real” difference or is it just sampling error?
Testing Hypotheses
Step 1: State your hypotheses
– Null hypothesis (H0)
• There are no differences (effects)
• This is the hypothesis that you are testing
– Alternative hypothesis(ses)
• Generally, not all groups are equal
• You aren’t out to prove the alternative hypothesis (although it
feels like this is what you want to do)
• If you reject the null hypothesis, then you’re left with support for
the alternative(s) (NOT proof!)
Hypotheses
In our memory example experiment
– H0: mean of Group A = mean of Group B
– HA: mean of Group A ≠ mean of Group B
• (Or more precisely: Group A > Group B)
– It seems like our theory is that the treatment
should improve memory.
– That’s the alternative hypothesis. That’s NOT the one that we’ll test with inferential statistics.
– Instead, we test the H0
Testing Hypotheses
Step 2: Set your decision criteria
– Your alpha level will be your guide for when to reject or fail to reject the null hypothesis
Step 3: Collect your data from your sample(s)
Step 4: Compute your test statistics
– Descriptive statistics (means, standard deviations, etc.)
– Inferential statistics (t-tests, ANOVAs, etc.)
Step 5: Make a decision about your null hypothesis
– Reject H0
– Fail to reject H0
Statistical significance
“Statistically significant difference”
– When you reject your null hypothesis
– Essentially this means that the observed difference is above
what you’d expect by chance
– “Chance” is determined by estimating how much sampling
error there is
– Factors affecting “chance”
• Sample size
• Population variability
Sampling error
[Figures: a population distribution with the population mean marked, and samples of N = 1, N = 2, and N = 10 drawn from it (each x is one sampled score); sampling error = population mean - sample mean]
Generally, as the sample size increases, the sampling error decreases
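A quick simulation makes this concrete. The sketch below assumes a hypothetical normal population (mean 100, SD 15), draws many random samples of each size, and averages the absolute sampling error |population mean - sample mean|. The seed only makes the run repeatable.

```python
import random
import statistics

def average_sampling_error(sample_size, reps=2000, seed=1):
    """Average |population mean - sample mean| over many random samples."""
    rng = random.Random(seed)
    pop_mean, pop_sd = 100.0, 15.0   # hypothetical normal population
    errors = []
    for _ in range(reps):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(sample_size)]
        errors.append(abs(pop_mean - statistics.fmean(sample)))
    return statistics.fmean(errors)

for n in (1, 2, 10, 100):
    print(n, round(average_sampling_error(n), 2))
```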
Sampling error
Typically the narrower the population distribution, the narrower the range of possible samples, and the smaller the “chance”
[Figures: small population variability vs. large population variability]
Sampling distribution
The sampling distribution is a distribution of all possible sample means of a particular sample size that can be drawn from the population
[Figure: samples of size n drawn from the population give sample means XA, XB, XC, XD, ...; together they form the distribution of sample means, whose spread (the average sampling error) reflects “chance”]
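The spread of the sampling distribution can also be simulated. In this sketch (hypothetical normal populations, samples of n = 25) the standard deviation of the sample means comes out close to σ/√n, so it grows with population variability and shrinks with sample size, the two factors listed earlier as affecting “chance”.

```python
import random
import statistics

def sampling_distribution(pop_sd, n, reps=5000, seed=2):
    """Means of `reps` random samples of size n from a normal population (mean 0)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(0.0, pop_sd) for _ in range(n))
            for _ in range(reps)]

for pop_sd in (5.0, 20.0):   # small vs. large population variability
    means = sampling_distribution(pop_sd, n=25)
    # observed SD of the sample means vs. the theoretical sigma / sqrt(n)
    print(pop_sd, round(statistics.stdev(means), 2), pop_sd / 25 ** 0.5)
```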
Error types
Based on the outcomes of the statistical tests
researchers will either:
– Reject the null hypothesis
– Fail to reject the null hypothesis
This could be the correct conclusion or an incorrect conclusion
– Two ways to go wrong
• Type I error: saying that there is a difference when there really
isn’t one
• Type II error: saying that there is not a difference when there
really is one
Error types
                               Real world (‘truth’)
Experimenter’s conclusions     H0 is correct       H0 is wrong
Reject H0                      Type I error        Correct decision
Fail to reject H0              Correct decision    Type II error
Error types: Courtroom analogy
                        Real world (‘truth’)
Jury’s decision         Defendant is innocent    Defendant is guilty
Find guilty             Type I error             Correct decision
Find not guilty         Correct decision         Type II error
Error types
Type I error: concluding that there is an effect (a difference between groups) when there really isn’t.
– The probability of making a Type I error (alpha) is sometimes called the “significance level”
– We try to minimize this (keep it low)
– Pick a low level of alpha
– Psychology: 0.05 and 0.01 most common
Type II error: concluding that there isn’t an effect, when there really is.
– Related to the Statistical Power of a test
– How likely you are to detect a difference if it is there
– Power = 1 - β, where β is the probability of a Type II error
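The meaning of alpha as the Type I error rate can be checked by simulation: if both groups really come from the same population (H0 is true), a test at alpha = 0.05 should reject H0 in about 5% of experiments. To stay within the Python standard library, this sketch swaps in a two-sample z-test with a known population SD of 1 instead of a t-test; the population, group size, and function name are assumptions.

```python
import random
from statistics import NormalDist, fmean

def type_i_error_rate(alpha=0.05, n=20, reps=4000, seed=3):
    """Simulate experiments where H0 is TRUE and return how often H0 is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    se = (2 / n) ** 0.5                            # SE of the difference (population SD = 1, known)
    rejections = 0
    for _ in range(reps):
        group_a = [rng.gauss(0, 1) for _ in range(n)]  # both groups from the SAME population
        group_b = [rng.gauss(0, 1) for _ in range(n)]
        z = (fmean(group_a) - fmean(group_b)) / se
        if abs(z) > z_crit:
            rejections += 1                            # a Type I error
    return rejections / reps

print(type_i_error_rate(0.05), type_i_error_rate(0.01))
```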
Significance
“A statistically significant difference” means:
– the researcher is concluding that there is a difference above
and beyond chance
– with the probability of making a type I error at 5% (assuming
an alpha level = 0.05)
Note “statistical significance” is not the same thing as
theoretical significance.
– Only means that there is a statistical difference
– Doesn’t mean that it is an important difference
Non-Significance
Failing to reject the null hypothesis
– Generally, not interested in “accepting the null hypothesis” (remember, we can’t prove things, only disprove them)
– Usually check to see if you made a Type II error (failed to
detect a difference that is really there)
• Check the statistical power of your test
– Sample size is too small
– Effects that you’re looking for are really small
• Check your controls, maybe too much variability
Inferential Statistical Tests
Different statistical tests
– “Generic test”
– T-test
– Analysis of Variance (ANOVA)
“Generic” statistical test
Tests the question:
– Are there differences between groups due to a treatment?
– H0 is true (no treatment effect)
– H0 is false (there is a treatment effect)
[Figures: sample means XA and XB under each case]
“Generic” statistical test
Why might the samples be different? (What is the source of the variability between groups?)
– ER: Random sampling error
– ID: Individual differences (if between subjects
factor)
– TR: The effect of a treatment
“Generic” statistical test
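The generic test statistic
Test statistic = Observed difference / Difference from chance = (TR + ID + ER) / (ID + ER)
– If there is no treatment effect (TR = 0), the ratio should be close to 1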
“Generic” statistical test
The generic test statistic distribution
– To reject the H0, you want a computed test statistic that is large
– This large difference reflects a large Treatment Effect (TR)
[Figure: distribution of the test statistic, with a “Reject H0” region in the extreme tail and a “Fail to reject H0” region covering the rest]
1 tailed or 2 tailed
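2-tailed tests “look” for any difference
1-tailed tests “look” for a difference in a specific direction (e.g. “an increase”, “an impairment” …)
– Statistically more powerful (when the effect really is in the predicted direction)
[Figures: a 2-tailed test with “Reject H0” regions in both tails, and a 1-tailed test with a single “Reject H0” region in one tail; “Fail to reject H0” in between]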
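The extra power of a 1-tailed test comes from its smaller critical value: all of alpha sits in one tail instead of being split between two. A sketch using the standard normal distribution from Python’s `statistics` module, as a large-sample stand-in for the t distribution:

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()   # standard normal (mean 0, SD 1)

two_tailed_crit = z.inv_cdf(1 - alpha / 2)  # alpha split across both tails, about 1.96
one_tailed_crit = z.inv_cdf(1 - alpha)      # all of alpha in one tail, about 1.645

print(round(two_tailed_crit, 3), round(one_tailed_crit, 3))
```

For example, an observed z of 1.80 in the predicted direction would fall in the rejection region of the 1-tailed test but not of the 2-tailed test.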
T-tests
Three types
– One sample
– 2-independent samples
– Repeated measures samples
T-distribution
– Centered on zero, negative and positive values
– Degrees of freedom
• Based on number of subjects in sample(s)
• Tell you what t-distribution to look at
Independent samples t-test
Design
– 2 separate groups of participants (e.g. control and treatment)
Degrees of freedom
– df = n1 + n2 - 2
Formula:
T = (Xtreat - Xcontrol) / Diff by chance
– Diff by chance is based on the variability and size of the samples
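A minimal sketch of the independent samples t-test, assuming the common pooled-variance version (equal population variances) and hypothetical scores; `independent_t` is an illustrative name. Here “diff by chance” is the standard error of the difference between the two means.

```python
import math
import statistics

def independent_t(treat, control):
    """Pooled-variance independent samples t statistic and its df."""
    n1, n2 = len(treat), len(control)
    df = n1 + n2 - 2
    # Pool the two sample variances, weighting each by its own df
    pooled_var = ((n1 - 1) * statistics.variance(treat) +
                  (n2 - 1) * statistics.variance(control)) / df
    diff_by_chance = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # SE of the difference
    t = (statistics.fmean(treat) - statistics.fmean(control)) / diff_by_chance
    return t, df

treat = [12, 14, 11, 15, 13]    # hypothetical scores
control = [10, 9, 11, 8, 12]

t, df = independent_t(treat, control)
print(f"t({df}) = {t:.2f}")
```

These data give t(8) = 3.00. With df = 8, the two-tailed critical value at alpha = 0.05 is about 2.31, so this difference would be significant; SPSS reports the p-value directly.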
Independent samples t-test
Reporting your results
– The observed difference
– Kind of t-test
– Computed T-statistic
– Degrees of freedom for the test
– The “p-value” of the test
“The mean of the treatment group was 12 points higher than the control group. An independent samples t-test yielded a significant difference, t(25) = 5.67, p < 0.05.”
Repeated measures t-test
Design
– 1 group of participants tested twice (e.g. pre-test and post-test)
Degrees of freedom
– df = n - 1 (where n = number of difference scores)
Formula:
T = (Xpost - Xpre) / Diff by chance
– Diff by chance is based on the variability and size of the sample of difference scores
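The repeated measures version works entirely on the difference scores. A minimal sketch with hypothetical pre-test and post-test scores; `repeated_measures_t` is an illustrative name.

```python
import math
import statistics

def repeated_measures_t(pre, post):
    """Paired t statistic computed on the difference scores (post - pre)."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    # Diff by chance: standard error of the mean difference score
    diff_by_chance = statistics.stdev(diffs) / math.sqrt(n)
    t = statistics.fmean(diffs) / diff_by_chance
    return t, n - 1

pre = [10, 12, 9, 14, 11]     # hypothetical scores
post = [12, 15, 10, 17, 13]

t, df = repeated_measures_t(pre, post)
print(f"t({df}) = {t:.2f}")
```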
Repeated measures t-test
Reporting your results
– The observed difference
– Kind of t-test
– Computed T-statistic
– Degrees of freedom for the test
– The “p-value” of the test
“The mean score of the post-test was 12 points higher than the pre-test. A repeated measures t-test demonstrated that this difference was significant, t(25) = 5.67, p < 0.05.”
Analysis of Variance
Designs
– More than two groups
• 1 Factor ANOVA, Factorial ANOVA
• Both Within and Between Groups Factors
Test statistic is an F-ratio
Degrees of freedom
– Several to keep track of
– Vary depending on the design
Analysis of Variance
More than two groups
– Now we can’t just compute a simple difference score since there is more than one difference
– So we use variance instead of simply the difference
• Variance is essentially an average difference
F-ratio = Observed variance / Variance from chance
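For a 1 factor between-subjects ANOVA, the F-ratio compares how much the group means vary around the grand mean (observed variance) with how much the scores vary within each group (variance from chance). A minimal sketch with hypothetical scores for three groups; `one_way_anova` is an illustrative name.

```python
import statistics

def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-factor between-subjects ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = statistics.fmean([x for g in groups for x in g])
    # Observed variance: spread of the group means around the grand mean
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Variance from chance: spread of scores around their own group mean
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

group_a = [4, 5, 6]   # hypothetical scores
group_b = [7, 8, 9]
group_c = [7, 9, 8]

f_ratio, df_b, df_w = one_way_anova(group_a, group_b, group_c)
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```

These data give F(2, 6) = 9.00; as with the t-test, SPSS supplies the matching p-value.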
1 factor ANOVA
1 Factor, with more than two levels
– Now we can’t just compute a simple difference score since there is more than one difference
• A - B, B - C, & A - C
1 factor ANOVA
Null hypothesis:
H0: all the groups are equal
XA = XB = XC
The ANOVA tests this one!!
Alternative hypotheses
HA: not all the groups are equal
– XA ≠ XB ≠ XC
– XA = XB ≠ XC
– XA ≠ XB = XC
– XA = XC ≠ XB
1 factor ANOVA
Planned contrasts and post-hoc tests:
– Further tests used to rule out the different alternative hypotheses
XA ≠ XB ≠ XC
XA = XB ≠ XC
XA ≠ XB = XC
XA = XC ≠ XB
• Test 1: A ≠ B
• Test 2: A ≠ C
• Test 3: B = C
• Together these results rule out every alternative except XA ≠ XB = XC
1 factor ANOVA
Reporting your results
– The observed difference
– Kind of test
– Computed F-ratio
– Degrees of freedom for the test
– The “p-value” of the test
– Any post-hoc or planned comparison results
“The mean score of Group A was 12, Group B was 25, and Group C was 27. A 1-way ANOVA was conducted and the results yielded a significant difference, F(2,25) = 5.67, p < 0.05. Post hoc tests revealed that the differences between groups A and B and between groups A and C were statistically reliable (respectively t(1) = 5.67, p < 0.05 & t(1) = 6.02, p < 0.05). Groups B and C did not differ significantly from one another.”
Factorial ANOVAs
We covered much of this in our experimental design lecture
More than one factor
– Factors may be within or between
– Overall design may be entirely within, entirely between, or mixed
Many F-ratios may be computed
– An F-ratio is computed to test the main effect of each factor
– An F-ratio is computed to test each of the potential interactions
between the factors
Factorial ANOVA
Reporting your results
– The observed differences
• Because there may be a lot of these, may present them in a table
instead of directly in the text
– Kind of design
• e.g. “2 x 2 completely between factorial design”
– Computed F-ratios
• May see separate paragraphs for each factor, and for interactions
– Degrees of freedom for the test
• Each F-ratio will have its own set of df’s
– The “p-value” of the test
• May want to just say “all tests were conducted with an alpha level of 0.05”
– Any post-hoc or planned comparison results
• Typically only the theoretically interesting comparisons are
presented