Longitudinal Studies - Ball State University


Psychological Science 342
Advanced Statistics
Review of Basic Concepts
Basic Terminology
• Descriptive statistics
  – Central tendency, variability
  – Displaying data
• Inferential statistics
  – Populations and samples
  – Hypothesis testing
  – t tests, ANOVA, regression
Measurement Basics
Variables
• Definition of a variable
  – A property of an object or event that can take on different values
• Discrete variable
  – A variable that can take on only a limited set of distinct values
• Continuous variable
  – A variable that can take on any value within its range
Variables--cont.
• Independent variables
  – Those variables controlled by the experimenter
• Dependent variables
  – Those variables being measured
  – The data or score
Random Assignment
• Definition
  – Each participant (P) has an equal chance of being assigned to any condition
  – Equates groups, on average, before treatment
  – Defines an experimental (vs. correlational) procedure
  – Applies to the independent/predictor variable
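A minimal sketch of random assignment, assuming a simple "shuffle, then deal" scheme. The function name `randomly_assign`, the participant labels, and the condition names are hypothetical. Shuffling the whole list uniformly means chance alone decides which condition each participant joins. Dealing the shuffled list in turn keeps the group sizes within one of each other.

```python
import random

def randomly_assign(participants, conditions, seed=None):
    """Randomly assign participants to conditions.

    The participant list is shuffled uniformly at random and then dealt
    into the conditions in turn, so group sizes differ by at most one and
    which group a participant joins is determined by chance alone.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for position, participant in enumerate(shuffled):
        groups[conditions[position % len(conditions)]].append(participant)
    return groups

participants = [f"P{i}" for i in range(1, 13)]   # 12 hypothetical participants
groups = randomly_assign(participants, ["quiet", "mozart", "anthrax"], seed=1)
for condition, members in groups.items():
    print(condition, members)
```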
Random Sampling
• Definition
  – Each member of a population has an equal chance of being included in the sample
  – Allows generalizability to the population
  – Do psychologists use random sampling?
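Random sampling differs from random assignment: it decides *who* is studied, not which condition they receive. The sketch below is a minimal illustration using Python's `random.Random.sample`, which draws without replacement so that every member is equally likely to be included. The population of 1,000 member IDs and the name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; each member is equally likely to be included."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)

population = list(range(1, 1001))   # hypothetical population of 1,000 member IDs
sample = simple_random_sample(population, 25, seed=42)
print(sorted(sample))
```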
Scales of Measurement
• Definition
• Nominal scales
• Ordinal scales
• Interval scales
• Ratio scales
Sample Problems
• For each of the following, identify the IV(s), the DV(s), whether each variable is categorical or continuous, and its level of measurement
Sample Problems
• 1. People will read a paragraph more quickly if it has a title than if it doesn’t have a title.
• 2. People from collectivist cultures have lower self-esteem than people from individualist cultures, and the difference is larger for males than for females.
Sample Problems
• 3. The right hemisphere is more specialized (i.e., faster) than the left hemisphere for negative emotion words, and the left hemisphere is more specialized than the right hemisphere for positive emotion words.
Sample Problems
• 4. When taking an exam, increasing levels of noise are associated with better performance for extraverts than for introverts.
• 5. People will retain more information if a text is written in an ugly font than if it is written in a non-ugly font.
Sample Problems
• 6. People appear to be more outgoing on Facebook than in real life.
• 7. Reported well-being increases as a function of temperature (up to 80°F) and whether or not it is sunny.
Deciding on a Procedure
• Decision tree
• What types of variables?
• How many groups or variables?
Choosing a Procedure
Type of Dependent Variable
• Categorical
  – One categorical variable → Goodness-of-fit chi-square
  – Two categorical variables → Contingency-table chi-square
• Continuous
  – See next slide
Choosing a Procedure
Continuous DV
• Continuous IV
  – One predictor
    – Degree of relationship → Correlation
    – Form of relationship → Regression
  – Two or more predictors → Multiple regression
• Categorical IV
  – See next slide
Choosing a Procedure
Categorical IV
• Two groups
  – Independent → Two-sample t
  – Dependent → Related-samples t
• Multiple groups
  – Independent groups → ANOVA
  – Dependent measures → Repeated-measures ANOVA
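The decision tree on these three slides can be written as a chain of questions. This is a sketch only: the function `choose_procedure` and its parameter names are hypothetical. It treats "two or more" categorical variables as the contingency-table case, which goes one step beyond what the slides show.

```python
def choose_procedure(dv, iv=None, *, n_vars=1, question="degree",
                     n_groups=2, independent=True):
    """Walk the 'Choosing a Procedure' decision tree.

    dv, iv      -- "categorical" or "continuous"
    n_vars      -- number of categorical variables (categorical DV)
                   or number of predictors (continuous IV)
    question    -- "degree" or "form" of the relationship (one continuous predictor)
    n_groups    -- number of groups (categorical IV)
    independent -- False for related samples / repeated measures
    """
    if dv == "categorical":
        if n_vars == 1:
            return "goodness-of-fit chi-square"
        return "contingency-table chi-square"
    if dv != "continuous":
        raise ValueError("dv must be 'categorical' or 'continuous'")
    if iv == "continuous":
        if n_vars >= 2:
            return "multiple regression"
        return "correlation" if question == "degree" else "regression"
    if iv == "categorical":
        if n_groups == 2:
            return "two-sample t" if independent else "related-samples t"
        return "ANOVA" if independent else "repeated-measures ANOVA"
    raise ValueError("iv must be 'categorical' or 'continuous' for a continuous dv")

print(choose_procedure("continuous", "categorical", n_groups=3, independent=False))
```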
Notation
• Variable names
  – $X$ and $Y$
• Individual values
  – $X_i$
• $X$ versus $X_i$
• Summation notation
  – $\sum X$
  – $\sum X^2$
  – $(\sum X)^2$
  – $\sum XY$
  – $\sum X \sum Y$
  – Constants
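The summation expressions are easy to confuse, especially $\sum X^2$ versus $(\sum X)^2$ and $\sum XY$ versus $\sum X \sum Y$. The sketch below computes each one for small hypothetical data. It also checks the standard rules for constants: $\sum cX = c\sum X$, $\sum (X + c) = \sum X + Nc$, and $\sum c = Nc$.

```python
X = [1, 2, 3]     # hypothetical scores
Y = [4, 5, 6]
c = 10            # a constant
N = len(X)

sum_x = sum(X)                               # ΣX     = 6
sum_x_sq = sum(x ** 2 for x in X)            # ΣX²    = 14
sq_sum_x = sum(X) ** 2                       # (ΣX)²  = 36
sum_xy = sum(x * y for x, y in zip(X, Y))    # ΣXY    = 32
sum_x_sum_y = sum(X) * sum(Y)                # ΣX ΣY  = 90

# Rules for constants
assert sum(c * x for x in X) == c * sum_x        # Σ(cX) = cΣX
assert sum(x + c for x in X) == sum_x + N * c    # Σ(X + c) = ΣX + Nc
assert sum(c for _ in X) == N * c                # Σc = Nc
print(sum_x, sum_x_sq, sq_sum_x, sum_xy, sum_x_sum_y)
```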
Hypothetical data on family size by decade of 20th century

Decade (X)   Family size (Y)   X²    Y²      X – Y   XY
3            5.2               9     27.04   -2.2    15.6
4            4.8               16    23.04   -0.8    19.2
5            3.5               25    12.25   1.5     17.5
6            2.5               36    6.25    3.5     15.0
7            2.3               49    5.29    4.7     16.1

Column sums: $\sum X = 25$, $\sum Y = 18.3$, $\sum X^2 = 135$, $\sum Y^2 = 73.87$, $\sum (X - Y) = 6.7$, $\sum XY = 83.4$
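The column sums in the family-size table can be checked directly. The sketch below rebuilds each derived column from $X$ and $Y$ and sums it. It rounds to two decimals only to hide floating-point noise. The dictionary keys are just labels chosen for this illustration.

```python
X = [3, 4, 5, 6, 7]               # decade
Y = [5.2, 4.8, 3.5, 2.5, 2.3]     # family size

columns = {
    "X": X,
    "Y": Y,
    "X^2": [x ** 2 for x in X],
    "Y^2": [y ** 2 for y in Y],
    "X-Y": [x - y for x, y in zip(X, Y)],
    "XY": [x * y for x, y in zip(X, Y)],
}
# Column sums, rounded to 2 decimals to hide floating-point noise
totals = {name: round(sum(values), 2) for name, values in columns.items()}
print(totals)
```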
Displaying Data
The Sternberg Example
• One to five digits displayed (the comparison set)
• Followed by a single test digit
• Was the single digit in the first set?
• Predictions of sequential processing
• Predictions of parallel processing
The following is a simple demonstration.
Digits shown one at a time: 4 7 3 6 9, then the single digit 4
Was the single digit in the comparison set?
Plotting Data
• Histograms
  – Values of the dependent variable on the X axis, grouped into intervals (“bins”)
  – Frequency on the Y axis
  – Histogram of Sternberg’s data
Histogram of Reaction Time
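A histogram is built by grouping scores into equal-width bins and counting how many fall in each. The sketch below shows the counting step and prints a crude text histogram with frequency as bar length. The reaction times, the bin width of 10, and the function name `bin_counts` are all hypothetical. Here each bin includes its lower edge and excludes its upper edge, which is one common convention.

```python
def bin_counts(values, start, width, n_bins):
    """Count values falling in equal-width bins [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)   # index of the bin that contains v
        if 0 <= k < n_bins:             # values outside the bins are ignored
            counts[k] += 1
    return counts

# Hypothetical reaction times, binned in intervals of width 10 starting at 30
rts = [36, 42, 47, 51, 55, 58, 63, 64, 71, 88]
counts = bin_counts(rts, start=30, width=10, n_bins=6)
for k, c in enumerate(counts):
    low = 30 + k * 10
    print(f"{low:>3}-{low + 9:<3} {'*' * c}")
```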
Stem-and-Leaf Display
Stem-and-leaf of RxTime   N = 300
Leaf Unit = 1.0

  7    3  6788999
 27    4  00001112223333344444
 62    4  55555566666666666777777777888899999
103    5  00000111111111111222222222233333333444444
150    5  55555556666666666777777788888888888899999999999
150    6  000000000000111111111112222222222222233333333334444444
 96    6  555555556666666677777777777777889999999
 57    7  0111122222222333444444
 35    7  5566667788899
 22    8  000112333
 13    8  5678
  9    9  044
  6    9  558
  3   10  44
  1   10
  1   11
  1   11
  1   12
  1   12  5
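The display above splits each stem into two rows, with leaves 0–4 on the first row and leaves 5–9 on the second. The first column gives depths, counted in from the nearer end. The sketch below builds the stem and leaf rows for non-negative integer data with a leaf unit of 1 and omits the depth column. The function name `stem_and_leaf` and the sample data are hypothetical. As in the RxTime display, empty rows in the middle are kept and empty rows at either end are dropped.

```python
from collections import defaultdict

def stem_and_leaf(values, split=True):
    """Return (stem, leaves) rows for non-negative integer data, leaf unit = 1.

    stem = tens digit(s), leaf = ones digit. With split=True each stem gets
    two rows, leaves 0-4 then 5-9, as in the RxTime display.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(int(v), 10)
        half = int(leaf >= 5) if split else 0
        rows[(stem, half)].append(str(leaf))
    halves = (0, 1) if split else (0,)
    stems = [s for s, _ in rows]
    out = [(s, "".join(rows.get((s, h), [])))
           for s in range(min(stems), max(stems) + 1) for h in halves]
    # Drop empty rows at either end (empty rows in the middle are kept)
    while out and not out[0][1]:
        out.pop(0)
    while out and not out[-1][1]:
        out.pop()
    return out

data = [36, 37, 38, 41, 44, 45, 52, 58, 58, 63]   # hypothetical scores
for stem, leaves in stem_and_leaf(data):
    print(f"{stem:>3} | {leaves}")
```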
Scatterplots
• Plot two variables against each other
• Each point represents a pair of values, one coordinate on each axis
• Dependent variable on the Y axis
• See next slide for example
Scatterplot of Solar Radiation and Cancer
Describing Distributions
• Symmetry
• Modality
  – Bimodal
  – Unimodal
• Skewness
  – Positively skewed
  – Negatively skewed
Figure 3.9 [histograms of Score for four distributions, N = 200 each]
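Skewness can also be expressed as a number whose sign matches the direction of the long tail. The sketch below uses the moment coefficient $g_1 = m_3 / m_2^{3/2}$, where $m_2$ and $m_3$ are the average squared and cubed deviations from the mean. This is one common definition and is an assumption here, not something the slides specify. Statistical packages often apply a small-sample adjustment, so their reported values will differ somewhat. The function name `skewness` is hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments).

    Positive -> long right tail (positively skewed); negative -> long left
    tail (negatively skewed); 0 for perfectly symmetric data.
    Undefined (division by zero) if all values are equal.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

print(skewness([1, 1, 1, 2, 10]))        # positive: one large score in the right tail
print(skewness([-1, -1, -1, -2, -10]))   # negative: mirror image
```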
Measures of Central Tendency
Mode
• The most common value
• There may be several
• A bimodal distribution has two distinct modes
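Because a distribution can have several modes, a mode function should return every value tied for the highest frequency. The sketch below does this with `collections.Counter` and compares the result with the standard library's `statistics.multimode` (Python 3.8+). The function name `modes` and the scores are hypothetical.

```python
from collections import Counter
from statistics import multimode

def modes(values):
    """Return every value tied for the highest frequency, in sorted order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

scores = [2, 3, 3, 5, 7, 7, 9]     # hypothetical scores
print(modes(scores))               # [3, 7]: two modes
print(multimode(scores))           # standard-library equivalent
```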
Median
• Center value in an ordered series
  – Average of the two center values for an even number of points
• Median location
  – Location of the central value
  – Defined as $(N + 1)/2$
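The median location $(N + 1)/2$ is a 1-based position in the ordered scores. When $N$ is even, it falls halfway between two positions, and the median is the average of the two center values. The sketch below follows that recipe literally. The name `median_by_location` is hypothetical, and the result is checked against `statistics.median`.

```python
import math

def median_by_location(values):
    """Median found via the median location (N + 1) / 2 in the ordered series."""
    ordered = sorted(values)
    location = (len(ordered) + 1) / 2            # 1-based; ends in .5 when N is even
    lower = ordered[math.floor(location) - 1]
    upper = ordered[math.ceil(location) - 1]
    return (lower + upper) / 2                   # same value twice when N is odd

print(median_by_location([7, 1, 5]))             # location 2 -> 5.0
print(median_by_location([2, 4, 5, 8, 7, 4]))    # location 3.5 -> (4 + 5) / 2 = 4.5
```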
Mean
• What we normally call the “average”
• Denoted $\bar{X}$ (“X-bar”)
• Calculated as
$\bar{X} = \frac{\sum X}{N}$
• This will be our most common statistic
Advantages & Disadvantages
• Mean
  – Most common statistic
  – Easily manipulated algebraically
  – Good statistical properties
  – Easily influenced by extreme scores
• Median
  – Slightly less desirable statistical properties than the mean
  – Ignores extreme values, which may not always be desirable
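To see how differently the two statistics respond to an extreme score, add one hypothetical outlier to a small set of scores. In this example the mean nearly triples, while the median barely moves.

```python
from statistics import mean, median

scores = [4, 5, 5, 6, 7]          # hypothetical scores
with_extreme = scores + [60]      # the same scores plus one extreme value

print(mean(scores), median(scores))               # 5.4 5
print(mean(with_extreme), median(with_extreme))   # 14.5 5.5
```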
Measures of Variability
The General Problem
• Central tendency deals only with the center
• Dispersion
  – Variability of the data around some point, usually the mean
  – The spread of the points
• Example: Mice and Music
Mice and Music
• Study by David Merrell
• Raised some mice in a quiet environment
• Raised some mice listening to Mozart
• Raised other mice listening to Anthrax
• Dependent variable is the time to run a straight-alley maze after 4 weeks
Results
• Anthrax mice took much longer to run
• Much greater variability in the Anthrax group
  – See the following graphs for Anthrax and Mozart
  – Both X axes are 500 units wide
• Greater variability often accompanies a larger mean
Mozart Group [histogram of WEEK4: Std. Dev = 36.10, Mean = 114.6, N = 24]
Anthrax Group [histogram of WEEK4: Std. Dev = 103.14, Mean = 1825.9, N = 24]
Range and Related Statistics
• The range
  – Distance from the lowest to the highest score
  – Too heavily influenced by extremes
• The interquartile range (IQR)
  – Delete the lowest and highest 25% of scores
  – The IQR is the range of what remains
  – May be too little influenced by extremes
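The sketch below follows the slide's deletion recipe for the IQR and compares it with the range on hypothetical data that contain one extreme score. It removes `int(0.25 * N)` scores from each end, which is a simplifying assumption. Statistical software computes quartiles with various interpolation rules, so packaged IQR values can differ slightly. The function names are hypothetical.

```python
def score_range(values):
    """Distance from the lowest to the highest score."""
    return max(values) - min(values)

def iqr_by_trimming(values):
    """Delete the lowest and highest 25% of scores; return the range of what remains."""
    ordered = sorted(values)
    k = int(0.25 * len(ordered))          # scores removed from EACH end
    middle = ordered[k:len(ordered) - k]
    return max(middle) - min(middle)

data = [1, 3, 4, 5, 6, 7, 9, 40]          # hypothetical scores with one extreme value
print(score_range(data), iqr_by_trimming(data))   # 39 3
```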
Trimmed Samples
• Delete a fixed (usually small) percentage of extreme scores
• Trimmed statistics are statistics computed on trimmed samples
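A trimmed mean is the most common trimmed statistic. The sketch below removes a fixed proportion of scores from *each* end before averaging. It assumes 10% trimming and rounds the number removed down. The function name `trimmed_mean` and the data are hypothetical. SciPy offers a comparable `scipy.stats.trim_mean`.

```python
def trimmed_mean(values, proportion=0.10):
    """Mean after deleting `proportion` of the scores from each end."""
    ordered = sorted(values)
    k = int(proportion * len(ordered))    # scores removed from EACH end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

data = [2, 4, 5, 5, 6, 6, 7, 7, 8, 95]    # hypothetical scores with one extreme value
print(sum(data) / len(data))              # 14.5: ordinary mean, pulled up by 95
print(trimmed_mean(data, 0.10))           # 6.0: 2 and 95 removed
```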
Deviation Scores
• Definition
  – Distance between a score and a measure of central tendency
  – Usually deviation around the mean: $(X - \bar{X})$
• Importance
Variance
• Definitional formula
$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$
• Example
  – See next slide
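Calculation
$X$:                    2    4    5    8    7    4     $\sum X = 30$, so $\bar{X} = 30/6 = 5$
$X - \bar{X}$:         -3   -1    0    3    2   -1     $\sum (X - \bar{X}) = 0$
$(X - \bar{X})^2$:      9    1    0    9    4    1     $\sum (X - \bar{X})^2 = 24$

$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{24}{5} = 4.80$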
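The hand calculation above translates directly into code. The sketch below computes the deviations, confirms that they sum to zero, and divides the sum of squared deviations by $N - 1$. The function name `variance_definitional` is hypothetical. The result matches `statistics.variance`, which uses the same $N - 1$ denominator.

```python
def variance_definitional(values):
    """Sample variance: sum of squared deviations from the mean, divided by N - 1."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]      # these always sum to zero
    return sum(d ** 2 for d in deviations) / (n - 1)

X = [2, 4, 5, 8, 7, 4]
print(sum(x - sum(X) / len(X) for x in X))       # 0.0
print(variance_definitional(X))                  # 4.8
```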
Standard Deviation
• Definitional formula
  – The square root of the variance
$s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$
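Computational Formula
$s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1} = \frac{(2^2 + 4^2 + 5^2 + 8^2 + 7^2 + 4^2) - \frac{30^2}{6}}{5} = \frac{174 - 150}{5} = 4.80$
$s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}} = \sqrt{4.80} = 2.19$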
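The computational formula needs only $\sum X$, $\sum X^2$, and $N$, with no separate pass to compute deviations. The sketch below reproduces the worked example and takes the square root to get $s$. The function name `variance_computational` is hypothetical.

```python
import math

def variance_computational(values):
    """Sample variance via (ΣX² - (ΣX)²/N) / (N - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

X = [2, 4, 5, 8, 7, 4]
s2 = variance_computational(X)    # (174 - 900/6) / 5
s = math.sqrt(s2)
print(round(s2, 2), round(s, 2))  # 4.8 2.19
```

The formula is convenient by hand, but when scores are large and close together, subtracting two big, nearly equal numbers loses precision. For that reason, software usually works with deviations from the mean instead.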
Estimators
• Mean
  – Unbiased estimate of the population mean ($\mu$)
• Definition of unbiased
  – The long-run average of the statistic equals the parameter being estimated
• Variance
$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$
  – Unbiased estimate of $\sigma^2$
Estimators--cont.
• Using $s^2 = \frac{\sum (X - \bar{X})^2}{N}$ gives a biased estimate of $\sigma^2$ (too small, on average)
• Standard deviation
  – Use the square root of the unbiased variance estimate
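"Unbiased" is a statement about the long-run average of a statistic, so a simulation can illustrate it. The sketch below draws many small samples from a normal population with $\sigma^2 = 4$. It then averages the two variance estimates. Dividing by $N$ should average about $\sigma^2 (N - 1)/N = 3.2$, and dividing by $N - 1$ should average about 4. The population, sample size, number of replications, and seed are arbitrary choices for illustration, and `average_estimates` is a hypothetical name.

```python
import random

def average_estimates(sigma=2.0, n=5, reps=20000, seed=342):
    """Average the N-divisor and (N - 1)-divisor variance estimates over many samples."""
    rng = random.Random(seed)
    biased_total = unbiased_total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        m = sum(sample) / n
        ss = sum((x - m) ** 2 for x in sample)   # sum of squared deviations
        biased_total += ss / n
        unbiased_total += ss / (n - 1)
    return biased_total / reps, unbiased_total / reps

biased, unbiased = average_estimates()
print(f"divide by N:     {biased:.2f}   (sigma^2 = 4.00)")
print(f"divide by N - 1: {unbiased:.2f}")
```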
Merrell’s Music Study SPSS Printout
WEEK4
Treatment   Mean        N    Std. Deviation
Quiet        307.2319   23    71.8267
Mozart       114.5833   24    36.1017
Anthrax     1825.8889   24   103.1392
Total        755.4601   71   777.9646
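The Total mean in the printout is the mean of all 71 scores. It can be recovered from the group means by weighting each mean by its group size. The sketch below shows that the weighted mean reproduces 755.46. It also shows that a simple average of the three group means does not, because the Quiet group has 23 mice rather than 24. The values come from the printout above.

```python
groups = {
    # treatment: (mean, n) from the SPSS printout
    "Quiet": (307.2319, 23),
    "Mozart": (114.5833, 24),
    "Anthrax": (1825.8889, 24),
}
total_n = sum(n for _, n in groups.values())
grand_mean = sum(m * n for m, n in groups.values()) / total_n   # weighted by group size
unweighted = sum(m for m, _ in groups.values()) / len(groups)

print(total_n, round(grand_mean, 2))   # 71 755.46
print(round(unweighted, 2))            # differs: ignores unequal group sizes
```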
Boxplots
• The general problem
  – A display that shows dispersion in both the center and the tails of a distribution
• Calculation steps (simple solution)
  – Find the median
  – Find the points that cut off the top and bottom 25% (quartiles)
  – Find the points that cut off the top and bottom 2.5% (fences)
  – Draw boxes to the quartiles and whiskers to the fences, with remaining points plotted as outliers
• Boxplots for comparing groups
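The "simple solution" steps can be computed with the standard library's `statistics.quantiles`. The sketch below finds the median, the quartiles, the 2.5% and 97.5% cut points used as fences, and the points beyond the fences. It relies on that function's default "exclusive" interpolation, and other software may interpolate slightly differently. The function name `boxplot_summary` and the data are hypothetical.

```python
import statistics

def boxplot_summary(values):
    """Summary for a boxplot built with the simple recipe from the slide."""
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4)   # 25%, 50%, 75% cut points
    cuts = statistics.quantiles(data, n=40)       # 39 cut points: 2.5%, 5%, ..., 97.5%
    lower_fence, upper_fence = cuts[0], cuts[-1]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "median": statistics.median(data),
        "q1": q1,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }

scores = list(range(1, 40)) + [500]   # hypothetical scores with one extreme value
print(boxplot_summary(scores))
```

With this rule, roughly the most extreme 5% of points are always drawn individually. Tukey's widely used alternative places the fences at 1.5 × IQR beyond the quartiles.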
Combined Merrell Data [boxplot of WEEK4, N = 71]
Merrell Data by Group [boxplots of WEEK4 by treatment condition: Quiet (N = 23), Mozart (N = 24), Anthrax (N = 24)]