Data and Descriptives


Index Cards
1. Name
2. Major
3. Favorite Class Ever, & Why
4. Areas of Interest in Psychology
5. Unique/Bizarre/Little-Known fact about you.
6. Most exciting event over vacation
7. Favorite TV show ever
8. Stupidest thing you’ve ever done
1
Small Group Questions
Name, where you’re from
Best class ever & why
Stupidest thing you’ve ever done
Bizarre facts/tricks you can do
2
Pop Quiz #1
1. Your instructor is from…
a. Nevada
b. New York
c. Nebraska
d. Minnesota
e. East-central Tibet
3
Pop Quiz #1
2. Your instructor has taught statistics
a. Never
b. About 10 times
c. About 20 times
d. About 30 times
e. About 40 times
f. Way, way too many times
4
Pop Quiz #1
3. Your instructor was once bitten by a …
a. Rattle Snake
b. Polar Bear
c. South American Malting Meek Mouse
d. Snapping Turtle
e. An oversized freshman
f. His wife, after refusing to mow the lawn
5
Pop Quiz #1
4. Your instructor can’t get enough…
a. Chocolate
b. Schlitz Malt Liquor
c. Diet Pepsi
d. Diet Coke
e. Diet Schlitz Malt Liquor
f. Prune Juice
g. Red Bull
6
Pop Quiz #1
5. Your instructor’s 2nd favorite TV show is …
a. Married with Children
b. The Simpsons
c. Survivor #5: Downtown Rockhill
d. The Daily Show
e. Space Ghost
f. Seinfeld
g. NOVA – Deadly Snapping Turtles
7
Stats Basics: 1st Week Overview
Course Tips
Types of Data
Graphing Distributions
The Normal Curve
Graphing Sample Means
Practicing with SPSS
8
Secret Course Tips
Bulldog Tactics
Syllabus
Office hours
Engagement &
Attendance
Quizzes
Request for leniency
Notebook
Course Packs
Organization
Homework, Labs, &
Reading
Class time
Set-up first
Please avoid surfing
Note-taking
• Write & Process
Ask questions!
Slow me down!
Homework
** Studying **
• Often
• Active
• Self-Explanation
Practicing SPSS
Laugh at my
jokes!!
9
Make Friends Quickly!!
Option A: Solo
Every Penguin For Herself!
Keep the competition down.
Option B: Teamwork!!!
Ask questions of peers
Answer questions
Form study groups
Practice explaining
10
Terminology: Samples vs. Populations
Samples & Populations
Statistics: refer to characteristics of samples
• e.g., xbar or M
• always Roman (regular alphabet) letters
Parameters: refer to characteristics of populations
• e.g., μ
• always Greek letters
Self-check:
height of several students in class to represent class
height of class to represent height of typical
undergraduates
11
Qualitative vs. Quantitative Data
Quantitative: can be ranked
• shoe size, height, self-esteem score on scale, airplane lift
Qualitative: can’t be ranked
• gender, political affiliation, major, car maker
Check
Gender
region
weight
depression
steps
Social Security Number
Letter Grade: A, B, C, D
12
Scales of measurement
Nominal – classify data into categories (religion)
Ordinal – classify and rank (Olympic Medals)
Interval – classify and rank with equal intervals (Celsius)
Ratio – classify, rank with equal intervals, true zero (Kelvin)
• your residence hall
• batting average
• your rank on mom’s love list
• height
• IQ
• weight
• Self-esteem (7 point Likert Scale)
• SAT score
• Grade: A, B, C, D
• distance
• gender
• gpa
• number of close friends
• social security number
• region of country
• level of depression
13
Experimental terms
Empirical Method: Experimental Method
Question: Why do airplanes fly?
Theory: Wings create lift
Operational Definitions
IV: Wing position: (straight, bent up) ‘levels’
DV: Lift
Gathering data
Careful observation; quantification
Level of measurement – use highest possible
Controlling Extraneous Variables
Drawing Conclusions
14
Experimental terms (2)
Experimental Terminology
Independent Variable: (e.g., Wing Position)
• Variable you manipulate;
• variable you think will impact DV
Dependent Variable: (e.g., Change in Vertical Position)
• Variable that might be affected by IV;
• variable you measure
Extraneous Variable: (e.g., drafts, throwing style)
• Any factor other than the IV that affects the DV
• Sources of “error” – we want to STANDARDIZE conditions to
minimize the amount of error
Quasi Experimental Design
No manipulation of IV
15
Experimental terms (3)
Practice
Can fat people eat more bacon than skinny people?
Does B.O. significantly decrease attractiveness?
Do kids who get “hooked on phonics” have more
problems with addiction later in life?
Do people who study more do better on tests?
16
Frequency Distributions
Definitions
The values taken on by a given variable
All the actual data points you obtained for a given variable
Most basic ways to look at study outcomes
Quantitative Examples:
• The SAT scores for all Winthrop students
• The reaction times for all study participants
• Grades on the first test: #’s of As, Bs, Cs, & Ds
• The starting salaries of graduates
Qualitative Examples:
• Favorite TV shows of students in this class
• Residence halls occupied by students in this class
Grade   Freq
A       4
B       7
C       3
D       2
[Bar graph: Test 1 grade frequencies; x-axis: grade (A, B, C, D); y-axis: frequency (0 to 7)]
17
Representing Frequency Distributions
Table:
List possible values,
and indicate the
number of times each
value occurred.
Graphs
X-axis: possible
values
Y-axis: # of times that
value occurred
R. Time   Freq.
0-10      4
11-20     8
21-30     12
31-40     6
41-50     4
51-60     3
61-70     1
[Histogram: Reac Time; x-axis: reaction-time bins 0-10 through 61-70; y-axis: frequency (0 to 12)]
18
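A frequency table of this kind can be tallied in a few lines of Python. This is only a sketch, not part of the original slides: the raw grade list is made up so that it reproduces the Test 1 counts (4 As, 7 Bs, 3 Cs, 2 Ds), and the variable names are illustrative.

```python
from collections import Counter

# Hypothetical raw grades, chosen to reproduce the Test 1 counts on the slide.
grades = ["A"] * 4 + ["B"] * 7 + ["C"] * 3 + ["D"] * 2

# Counter maps each value to the number of times it occurred.
freq = Counter(grades)

# Table: one row per possible value, plus a crude text "bar" for the graph.
for grade in sorted(freq):
    print(f"{grade}  {freq[grade]:2d}  {'#' * freq[grade]}")
```

The same two steps (list the possible values, count each one) apply to binned data such as the reaction times, once each score has been assigned to its bin.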
Graphing Distributions
Quantitative Data
Line graphs or Histograms (columns touching)
Qualitative Data
Pie charts & Bar graphs (columns not touching)
See SPSS Guide for examples
Also, you can practice with these datasets on the
website…
city sprawl
bogus winthrop data
employee data
19
A Graph of the Normal Curve
Hypothetical Frequency Distribution (Line Graph)
Shows distribution of infinitely large sample (theoretical)
Symmetrical
Shows common and uncommon (extreme) scores
Basis for testing hypotheses
Percentiles
SAT Scores
μ = 500
20
Normal Curve (with raw and standard scores)
[Figure: normal curve centered at μ, with few extreme scores in each tail]
Stand. Normal Curve:  -3     -2     -1     0      +1     +2     +3
SAT:                  200    300    400    500    600    700    800
Female Height:        4’4”   4’8”   5’0”   5’4”   5’8”   6’0”   6’4”
Anxiety:              20     30     40     50     60     70     80
21
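The link between the raw-score axes and the standard-score axis can be sketched in Python. This is an illustration, not part of the slides: the standard deviations (100 for SAT, 10 for Anxiety) are assumptions read off the tick spacing in the figure, and `z_score` is a made-up helper name.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies above (+) or below (-) mu."""
    return (x - mu) / sigma

# SAT axis: mu = 500 (from the slide); sigma = 100 is assumed from the tick spacing.
sat_z = [z_score(x, 500, 100) for x in (200, 300, 400, 500, 600, 700, 800)]

# Anxiety axis: mu = 50 and sigma = 10, both assumed from the tick spacing.
anxiety_z = [z_score(x, 50, 10) for x in (20, 30, 40, 50, 60, 70, 80)]

print(sat_z)
print(anxiety_z)
```

Both raw-score scales land on the same -3 to +3 axis, which is why one standard normal curve can stand in for all of them.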
Deviations from Normality
Ways in which distribution can be non-normal
Skew
Positive Skew
Negative Skew
Kurtosis
Platykurtic
Mesokurtic
Leptokurtic
Modality
Unimodal
Bimodal (etc.)
22
Graphing Sample Means
One IV: Typically use bar-graph
Two IV: Typically use line-graph
[Bar graph: mean Damage ($0 to $400) by object: Rock, Anvil, Tomato]
23
Math Review
Preparation for Calculating Standard Deviation
Learn the differences between…
Σx
Σx²
(Σx)²
24
Problem #1
x    x²
2    4
3    9
2    4
Σx = ??   Σx² = ??   (Σx)² = ??
“Sum of x squared”: ??
“Sum of x-quantity squared”: ??
25
Problem #1 Answer
x    x²
2    4
3    9
2    4
Σx = 7   Σx² = 17   (Σx)² = 49
“Sum of x squared”: Σx²
“Sum of x-quantity squared”: (Σx)²
26
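The difference between the three sums is mostly a matter of when you square. A small Python sketch (not from the slides; variable names are illustrative) using the Problem #1 scores:

```python
x = [2, 3, 2]

sum_x = sum(x)                           # Σx: just add the scores
sum_x_squared = sum(v ** 2 for v in x)   # Σx²: square each score, then add
sum_x_quantity_squared = sum(x) ** 2     # (Σx)²: add the scores, then square the total

print(sum_x, sum_x_squared, sum_x_quantity_squared)
```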
Problem #2
x    x²
1    ??
2    ??
2    ??
Σx = ??   Σx² = ??   (Σx)² = ??
\[ \hat{s}_x = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} \]
27
Problem #2: Answer-a
x    x²
1    1
2    4
2    4
Σx = 5   Σx² = 9   (Σx)² = 25
\[ \hat{s}_x = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} \]
\[ \hat{s}_x = \sqrt{\frac{9 - \frac{25}{3}}{3 - 1}} \]
28
Problem #2: Answer-b
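x    x²
1    1
2    4
2    4
Σx = 5   Σx² = 9   (Σx)² = 25
\[ \hat{s}_x = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} \]
\[ \hat{s}_x = \sqrt{\frac{9 - \frac{25}{3}}{3 - 1}} = \sqrt{\frac{.6667}{2}} \]
\[ \hat{s}_x = \sqrt{.3333} = .5774 \]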
29
Problem #3
x
2
3
5
\[ \hat{s}_x = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} \]
30
Problem #3: Answer
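x    x²
2    4
3    9
5    25
Σx = 10   Σx² = 38   (Σx)² = 100
\[ \hat{s}_x = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} \]
\[ \hat{s}_x = \sqrt{\frac{38 - \frac{100}{3}}{3 - 1}} = \sqrt{\frac{4.6667}{3 - 1}} \]
\[ \hat{s}_x = \sqrt{2.3333} = 1.5275 \]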
31
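To check hand calculations for Problems #2 and #3, the computational formula can be sketched in Python. The function name `s_hat` is made up for this illustration; the cross-check uses `statistics.stdev` from the standard library, which also divides by n - 1.

```python
import math
import statistics

def s_hat(scores):
    """Sample SD (population estimate) via the computational formula."""
    n = len(scores)
    sum_x = sum(scores)                    # Σx
    sum_x2 = sum(v ** 2 for v in scores)   # Σx²
    ss = sum_x2 - sum_x ** 2 / n           # Σx² - (Σx)²/n
    return math.sqrt(ss / (n - 1))

problem_2 = s_hat([1, 2, 2])
problem_3 = s_hat([2, 3, 5])
print(round(problem_2, 4), round(problem_3, 4))

# Cross-check against the standard library's n - 1 version.
print(round(statistics.stdev([2, 3, 5]), 4))
```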
Descriptive Statistics
Measures of Central Tendency
Where does the center of the distribution fall?
Where are most of the scores?
Measures of Variability
How spread out is the distribution?
How dispersed are the scores?
Importance:
To determine whether IV affects DV, we consider:
• The difference between the means
• The amount of variability
32
Imaginary Study with 2 Outcomes
Purpose: See why variability is important
Research Question:
Imagine a business where customers are routinely
offended:
• comments about their mothers
• misc. name calling
Does social skills training for clerks improve customer
satisfaction scores?
IV: Social Skills training (training, no training)
DV: Customer Satisfaction
Imagine two worlds where we get two different
outcomes
33
Training Study Outcomes
          Version 1                  Version 2
Control     Experimental     Control     Experimental
2           5                1           1
5           4                6           6
4           4                4           7
3           2                4           4
2           4                1           3
2           4                2           3
M = 3       M = 4            M = 3       M = 4
SD = 1.26   SD = 1.10        SD = 2.10   SD = 2.19
34
Measures of Central Tendency
Data: # of close friends
1, 2, 3, 3, 3, 4, 4, 5, 5, 6, 12
Note: Use “frequencies” in SPSS
Mean
arithmetic mean: sum of all scores divided by “n”
Sample: xbar or M
Population: μ (“mu”)
most arithmetically sophisticated
best predictor if no other info available
used in deviation score calculation
M = 4.36
Median
Score at 50th percentile – middle score
less influenced by skew
Md = 4
Mode
most frequent score
used with qualitative data
Mo = 3
35
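Outside SPSS, the three measures for the close-friends data can be checked with Python's standard `statistics` module. This is a sketch for verification only, not the course's SPSS procedure.

```python
import statistics

# Number of close friends, from the slide.
friends = [1, 2, 3, 3, 3, 4, 4, 5, 5, 6, 12]

mean = statistics.mean(friends)      # sum of scores divided by n
median = statistics.median(friends)  # middle score (50th percentile)
mode = statistics.mode(friends)      # most frequent score

print(round(mean, 2), median, mode)
```

Note how the single score of 12 pulls the mean above the median.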
Choosing Measures of Central Tendency
Data: # of close friends
A: 2, 3, 4, 2, 3, 4, 5, 3, 2, 3, 1
B: 2, 3, 4, 2, 1, 4, 6, 2, 27, 3, 5
C: 3, 4, 3, 1, 2, 12, 15, 12, 14, 16, 10
What’s best for A?
What’s best for B?
What’s best for C?
36
SPSS – Setting up Frequencies Analysis
37
SPSS Frequencies Output (partial)
Note: Need to select mean, median, & mode
Statistics
             A       B       C
N  Valid     11      11      11
   Missing   0       0       0
Mean         2.91    5.36    8.36
Median       3.00    3.00    10.00
Mode         3       2       3a
a. Multiple modes exist. The smallest value is shown
38
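The SPSS table can be reproduced as a cross-check with a short Python sketch (an illustration, not the course's procedure). `statistics.multimode` needs Python 3.8 or later and returns every mode, so the tie in C shows up directly.

```python
import statistics

# The three "close friends" datasets from the previous slide.
data = {
    "A": [2, 3, 4, 2, 3, 4, 5, 3, 2, 3, 1],
    "B": [2, 3, 4, 2, 1, 4, 6, 2, 27, 3, 5],
    "C": [3, 4, 3, 1, 2, 12, 15, 12, 14, 16, 10],
}

summary = {}
for name, scores in data.items():
    summary[name] = {
        "mean": round(statistics.mean(scores), 2),
        "median": statistics.median(scores),
        "modes": sorted(statistics.multimode(scores)),  # all modes, smallest first
    }
    print(name, summary[name])
```

B's mean is dragged upward by the outlier 27 while its median stays at 3, and C has two modes, which hints that it is bimodal.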
Measures of Variability
What is Variability?
dispersion; spread; distance between scores
“Some people did really well, some did really poorly”
“My tips are always about the same, between $30 and
$35”
“Some students study only a few minutes a day, some
put in 30 hours per week.”
Range
simplest measure
High Score – Low Score
Problems:
• only uses two scores – not good for summarizing the entire
distribution
• unduly affected by extreme scores
39
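A quick sketch of why the range is unduly affected by extreme scores, reusing the close-friends data from the central tendency slide. The helper name `score_range` is made up for this illustration.

```python
def score_range(scores):
    """Range = high score minus low score."""
    return max(scores) - min(scores)

friends = [1, 2, 3, 3, 3, 4, 4, 5, 5, 6, 12]

full_range = score_range(friends)          # includes the extreme score of 12
trimmed_range = score_range(friends[:-1])  # same data without the 12

print(full_range, trimmed_range)
```

Dropping one score cuts the range by more than half, even though the other ten scores did not change.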
The Big Daddy: Standard Deviation
Standard Deviation
The typical deviation of a score from the mean of the
distribution
In a normal distribution, most scores (about 68%) fall between –1 and +1 SD of the mean.
Four Steps to Standard Deviation
1. Deviation Score
2. Sum of Squares
3. Variance
4. Standard Deviation
\[ \hat{s}_x = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} \]
40
1. Deviation Scores
idea:
consider deviation of every score and add up
distance from mean of a given score: x – xbar
• positive/negative deviation scores fall to the ____ of the mean
problem
why can’t we just add up the deviation scores?
consider distribution of: 1, 2, 3
x    xbar   x – xbar
1    2      -1
2    2      0
3    2      +1
Sum of deviation scores: 0
41
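The problem with adding up deviation scores can be seen in a short Python sketch (illustrative only; variable names are made up):

```python
scores = [1, 2, 3]

xbar = sum(scores) / len(scores)         # mean of the distribution
deviations = [x - xbar for x in scores]  # x - xbar for every score

print(deviations)
print(sum(deviations))
```

The negative and positive deviations cancel, so the total is zero no matter how spread out the scores are. That is why the next step squares them first.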
2. Sum of Squares (SS)
Means “Sum of the Squared Deviation Scores”
Square each deviation score, then add up
Conceptual Formula (how we think about it)
\[ SS = \sum (x - \bar{x})^2 \]
Computational Formula (how we calculate by hand)
\[ SS = \sum x^2 - \frac{(\sum x)^2}{n} \]
• Σx²: Sum of x Squared
• (Σx)²: Sum of x Quantity Squared
•Problem
–Biased by sample size – bigger samples have bigger SS
42
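A sketch showing that the conceptual and computational formulas agree, using the scores from Problem #3 (2, 3, 5). The function names are made up for this illustration.

```python
import math

def ss_conceptual(scores):
    """Sum of squared deviations from the mean."""
    xbar = sum(scores) / len(scores)
    return sum((x - xbar) ** 2 for x in scores)

def ss_computational(scores):
    """Sum of x squared, minus sum of x quantity squared over n."""
    n = len(scores)
    return sum(x ** 2 for x in scores) - sum(scores) ** 2 / n

scores = [2, 3, 5]
print(round(ss_conceptual(scores), 4), round(ss_computational(scores), 4))

# Same scores collected twice: the spread is unchanged, but SS doubles.
print(round(ss_computational(scores * 2), 4))
```

The second print illustrates the problem noted on the slide: SS grows with sample size even when the scores are no more spread out.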
3. Variance
Sum of Squares – no control for size of sample
Think of relation between sum and average – divide
sum by n
… with sum of sq. and variance – divide sum of sq. by
n
Variance: Average of the Squared Deviation Scores
\[ \sigma_x^2 = \frac{SS}{n} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n} \]
43
4. Standard Deviation
Want measure in metric of raw scores
Remember?? We used Sum of the SQUARED
Deviations
So…we take the square root of the variance
Note, subscript “x” is optional
\[ \sigma_x = \sqrt{\frac{SS}{n}} = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n}} \]
note that σ is no longer squared
44
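Steps 2 through 4 can be chained together in a short Python sketch, using the 1, 2, 3 distribution from the deviation-score slide. This divides by n, so it matches `statistics.pvariance` and `statistics.pstdev` (the population versions) in the standard library; the variable names are illustrative.

```python
import math
import statistics

scores = [1, 2, 3]
n = len(scores)

ss = sum(x ** 2 for x in scores) - sum(scores) ** 2 / n  # step 2: Sum of Squares
variance = ss / n                                        # step 3: average squared deviation
sd = math.sqrt(variance)                                 # step 4: back to raw-score units

print(ss, round(variance, 4), round(sd, 4))
```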
SD: Bridge Building Example
How high should the
bridge be?
Truck Height:
7,6,8,5,6,5,6,7
average: 6.25
Can we build it 6.25?
•Calculation Tip:
–Think anal retentive!!
45
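SD: Bridge Building II
\[ \sigma_x = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n}} \]
\[ \sigma_x = \sqrt{\frac{320 - \frac{2500}{8}}{8}} = \sqrt{\frac{7.5000}{8}} = \sqrt{.9375} \]
\[ \sigma_x = .9682 \]
So we’d expect the truck
height to range between
about 6.25 ± .9682
Roughly 5.25 to 7.25.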
But…
What if we missed
some extremely tall
trucks???
Should actually
calculate ŝ –
Standard Deviation
as a population
estimate
46
SD: Typical Formula
Standard Deviation as a Population Parameter
SD as a Population Parameter Estimate
corrects for bias of smaller samples – missing of
extreme scores
\[ \hat{s}_x = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} \]
47
SD: Different Forms
48
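SD: Bridge Building Revisited
\[ \hat{s}_x = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} \]
\[ \hat{s}_x = \sqrt{\frac{320 - \frac{2500}{8}}{8 - 1}} = \sqrt{\frac{7.5000}{8 - 1}} = \sqrt{1.0714} \]
\[ \hat{s}_x = 1.0351 \]
So…
σ = 0.9682
ŝ = 1.0351
SD calculated as
estimate will always be
larger.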
49
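Both versions of the bridge calculation can be checked with Python's standard library, which offers the two forms as separate functions: `statistics.pstdev` divides by n (σ) and `statistics.stdev` divides by n - 1 (ŝ). A sketch using the truck heights from the bridge example:

```python
import statistics

trucks = [7, 6, 8, 5, 6, 5, 6, 7]  # truck heights from the bridge example

sigma = statistics.pstdev(trucks)  # SD of these scores only (divide by n)
s_hat = statistics.stdev(trucks)   # SD as a population estimate (divide by n - 1)

print(round(sigma, 4), round(s_hat, 4))
```

Because the only difference is the smaller denominator, the estimate comes out larger than σ whenever the scores vary at all.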
What type of Standard Deviation?
• A manager wants to know the variability in shift productivity for
planning future projects.
• A teacher calculates the variability of reading scores for just her
class of 25 students, and only applies it to her sample.
• The Educational Testing Service calculates the variability among
SAT scores for all the students that took the SAT.
• A researcher determines the variability in reaction time in a
perception study.
• Your statistics professor calculates test score variability with 25
students to know how much variability to expect on that sort of
test.
• A researcher on anxiety collects data from 1000 participants in
order to develop norms for a new anxiety instrument.
50
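Practice
Problem: Calculate σ for 4,2,3
\[ \sigma_x = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n}} \]
\[ \sigma_x = \sqrt{\frac{29 - \frac{81}{3}}{3}} = \sqrt{\frac{2}{3}} = \sqrt{.6667} \]
\[ \sigma_x = .8165 \]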
51
$$ Practice Calculations I
52
$$ Practice Calculations II
53
Confidence Intervals
Combines mean with standard deviation
68% CI = M ± 1SD
If scores are normally distributed, we can be about 68% certain that a given
score will fall between one SD below the mean and one SD above it.
Example:
Bob took the history test after the rest of the class. The class
scored 70 on average (μ =70) with a standard deviation of 10 (σ).
What score do you expect Bob to get?
68% CI = M ± 1SD
68% CI = 70 ± 10
68% CI = 60, 80
That is, we’d expect Bob to get between 60 and 80.
We’ll be right about 68% of the time.
54
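Bob's interval, and the 68% figure behind it, can be sketched in Python. This assumes the test scores are normally distributed, which the slide implies but does not state; `NormalDist` is in the standard `statistics` module (Python 3.8 or later).

```python
from statistics import NormalDist

mu, sigma = 70, 10  # class mean and SD from the example

# 68% interval: one SD below the mean to one SD above.
low, high = mu - sigma, mu + sigma

# Assumption: scores follow a normal curve with this mean and SD.
dist = NormalDist(mu, sigma)
coverage = dist.cdf(high) - dist.cdf(low)  # proportion of scores inside the interval

print(low, high, round(coverage, 2))
```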
Error Bars
Graph in SPSS
Shows mean ± 1 SD.
55