Handout - Personal Web Pages


Introduction to Descriptive Statistics
“Data have a story to tell. Statistical analysis is detective work in
which we apply our intelligence and our tools to discover parts of
that story.”
-Hamilton (1990)
Objectives:
1. Explain the general role of statistics in assessment & evaluation
2. Explain three methods for describing a data set: shape, center,
and spread
3. Explain the relationship between the standard deviation and
the normal curve
Levels of Measurement

Nominal
 Ordinal
 Interval
 Ratio
 Determining what statistics are
appropriate
Nominal

Naming things.

Creating groups that are
qualitatively different or unique…

But not necessarily quantitatively
different.
Nominal

Placing individuals or objects
into categories.
 Making mutually exclusive
categories.

Numbers assigned to
categories are arbitrary.
Nominal

Sample variables:
– Gender
– Race
– Ethnicity
– Geographic location
– Hair or eye color
Ordinal

Rank ordering things.

Creating groups or categories
when only rank order is known.

Numbers imply order but not
exact quantity of anything.
Ordinal

The difference between
individuals with adjacent
ranks, on relevant quantitative
variables, is not necessarily the
same across the distribution.
Ordinal

Sample variables:
– Class Rank
– Place of finish in a race (1st, 2nd, etc.)
– Judges' ratings
– Responses to Likert scale items (for
example – SD, D, N, A, SA)
Interval

Orders observations according to
the quantity of some attribute.
 Arbitrary origin.
 Equal intervals.
 Equal differences expressed as
equal distances.
Interval

Sample variables:
– Test Scores
• SAT
• GRE
• IQ tests
– Temperature
• Celsius
• Fahrenheit
Ratio

Quantitative measurement.
 Equal intervals.
 True zero point.
 Ratios between values are useful.
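A minimal sketch of why ratios are only meaningful on a scale with a true zero. The two temperatures are hypothetical values chosen for illustration, and the helper names are not from the handout. The same pair of temperatures gives a different ratio on each scale with an arbitrary origin, while equal intervals are preserved when converting between Celsius and Fahrenheit.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def celsius_to_kelvin(c):
    return c + 273.15

# Hypothetical temperatures, used only for illustration.
warm_c, cool_c = 20.0, 10.0

# Ratios depend on where each scale puts its zero point.
ratio_celsius = warm_c / cool_c
ratio_fahrenheit = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# Differences stay consistent: a 10-degree Celsius step is always
# an 18-degree Fahrenheit step, wherever it starts.
step_low_f = celsius_to_fahrenheit(20.0) - celsius_to_fahrenheit(10.0)
step_high_f = celsius_to_fahrenheit(30.0) - celsius_to_fahrenheit(20.0)

print(ratio_celsius, round(ratio_fahrenheit, 2), round(ratio_kelvin, 3))
print(step_low_f, step_high_f)
```

Only the Kelvin ratio reflects a physical quantity, because Kelvin starts at a true zero. "Twice as hot" is not a meaningful statement in Celsius or Fahrenheit.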
Ratio

Sample variables:
– Financial variables
– Finish times in a race
– Number of units sold
– Test scores scaled as percent correct
or number correct
Levels of Measurement Review

What level of measurement?
– Today is a fall day.
– Today is the third hottest day of the
month.
– The high today was 70° Fahrenheit.
– The high today was 20° Celsius.
– The high today was 294 Kelvin.
Levels of Measurement Review

What level of measurement?
– Student #1256 is:
– a male
– from Lawrenceville, GA.
– He came in third place in the race today.
– He scored 550 on the SAT verbal section.
– He has turned in 8 out of the 10
homework assignments.
Levels of Measurement Review

What level of measurement?
– Student #3654 is:
– in the third reading group.
– Nominal?
– Ordinal?
– Interval?
– Ratio?
Descriptive Statistics
Used to describe the basic features of a batch of
data. Uses graphical displays and descriptive
quantitative indicators.
The purpose of descriptive statistics is to organize
and summarize data so that the data is more
readily comprehended. That is, descriptive
statistics describes distributions with numbers.
Five Descriptive Questions
What is the middle of the set of scores?
 How spread out are the scores?
 Where do specific scores fall in the
distribution of scores?
 What is the shape of the distribution?
 How do different variables relate to each
other?

Five Descriptive Questions





Middle
Spread
Rank or Relative Position
Shape
Correlation
Middle

Mean
 Median
 Mode
Examples of these measures

Mean of: 2, 3, 6, 7, 3, 5, 10
(2 + 3 + 6 + 7 + 3 + 5 + 10)/ 7 = 36/ 7 = 5.14

Mode of: 2, 3, 6, 7, 3, 5, 10 is 3

Median of: 2, 3, 6, 7, 3, 5, 10
First the data are ordered: 2, 3, 3, 5, 6, 7, 10.
The middle value is 5; therefore that is the
median.
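The three calculations above can be checked with a short sketch that uses Python's standard `statistics` module on the same seven scores. The variable names are illustrative.

```python
import statistics

scores = [2, 3, 6, 7, 3, 5, 10]

mean = statistics.mean(scores)      # sum of scores divided by their count: 36 / 7
median = statistics.median(scores)  # middle value once the scores are sorted
mode = statistics.mode(scores)      # most frequently occurring value

print(round(mean, 2), median, mode)  # prints: 5.14 5 3
```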
Some Important Points

Mode is the only measure of center appropriate for
nominal data.
 The median is resistant: it is largely unaffected by
extreme observations.
 The mean (average) is affected by extremely small
or large values. We say that it is sensitive, or
nonresistant, to the influence of extreme
observations. The mean is the balance point of
the distribution.
 In symmetric distributions the mean and
median are close together.
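To illustrate resistance, the sketch below adds one extreme score to the earlier example data. The value 100 is hypothetical, chosen only to show the effect.

```python
import statistics

scores = [2, 3, 6, 7, 3, 5, 10]
with_outlier = scores + [100]  # hypothetical extreme observation

mean_before = statistics.mean(scores)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(scores)
median_after = statistics.median(with_outlier)

print(round(mean_before, 2), "->", round(mean_after, 2))  # mean: 5.14 -> 17.0
print(median_before, "->", median_after)                  # median: 5 -> 5.5
```

A single extreme value more than triples the mean, while the median moves by only half a point.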
More important points

In skewed data the mean is pulled toward the tail
of the distribution.
 The median is not necessarily preferred over the
mean just because it is resistant. However, if the data
are known to be strongly skewed, the
median is preferable.
 Finally, the mean is usually the measure of
central tendency of choice because it tends to
vary less from sample to sample.
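The stability claim can be explored with a small simulation. This is a sketch under stated assumptions: a roughly normal population with mean 50 and standard deviation 10, a sample size of 25, and 2000 repeated samples, all of them illustrative choices. For strongly skewed or heavy-tailed populations the comparison can come out differently.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed population and sampling settings, for illustration only.
population_mean, population_sd = 50, 10
sample_size, n_samples = 25, 2000

sample_means = []
sample_medians = []
for _ in range(n_samples):
    sample = [rng.gauss(population_mean, population_sd) for _ in range(sample_size)]
    sample_means.append(statistics.mean(sample))
    sample_medians.append(statistics.median(sample))

# How much each statistic varies from one sample to the next.
sd_of_means = statistics.stdev(sample_means)
sd_of_medians = statistics.stdev(sample_medians)

print(f"spread of sample means:   {sd_of_means:.2f}")
print(f"spread of sample medians: {sd_of_medians:.2f}")
```

Under these assumptions the sample means cluster more tightly around 50 than the sample medians do, which is what "stable" means here.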
Spread




Standard Deviation
Variance
Range
IQR
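A minimal sketch of the four measures of spread, computed on the example scores used earlier for the mean, median, and mode. It assumes the sample versions of the variance and standard deviation (dividing by n - 1) and the default quartile method of `statistics.quantiles`. Other software may use slightly different quartile conventions.

```python
import statistics

scores = [2, 3, 6, 7, 3, 5, 10]

variance = statistics.variance(scores)   # sample variance, divides by n - 1
std_dev = statistics.stdev(scores)       # square root of the sample variance
value_range = max(scores) - min(scores)  # largest score minus smallest score

# Quartiles split the ordered scores into four equal-sized groups.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1                            # spread of the middle half of the data

print(round(variance, 2), round(std_dev, 2), value_range, iqr)
```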
Describing Data: Center & Spread
How do measures of variability differ when
distributions are spread out?
X = Mean
S = Standard Deviation
Large S: X = 50 (S = 20)
Average or normal S: X = 50 (S = 10)
Small S: X = 50 (S = 5)
Rank or Relative Position

Five number summary
 Min, 25th, 50th, 75th, Max
 Identifying specific values that have
interpretive meaning
 Identifying where they fall in the set of
scores
 Box plots
 Outliers
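The five-number summary and the position of a single score can be sketched as follows. Both function names are hypothetical helpers, not from the handout. The percentile rank here is defined as the percentage of scores at or below a given value, which is one of several conventions in use.

```python
import statistics

def five_number_summary(values):
    """Return (min, 25th, 50th, 75th percentile, max) for a list of numbers."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)

def percentile_rank(values, score):
    """Percentage of values that are less than or equal to score."""
    at_or_below = sum(1 for v in values if v <= score)
    return 100 * at_or_below / len(values)

scores = [2, 3, 6, 7, 3, 5, 10]

summary = five_number_summary(scores)
rank_of_7 = percentile_rank(scores, 7)

print(summary)
print(round(rank_of_7, 1))
```

A box plot draws the same five values: the box runs from the 25th to the 75th percentile with a line at the median.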
Shape




Positive Skewness
Negative Skewness
Normality
Histograms
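The direction of skew can also be summarized with a single number. The sketch below uses the moment coefficient of skewness (the third central moment divided by the 1.5 power of the second). That formula is an assumption brought in for illustration, since the handout describes skewness only graphically. The three data sets are hypothetical.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5, using population moments."""
    n = len(values)
    mean = statistics.mean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data sets, for illustration only.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 12]   # long tail toward high values
left_tailed = [-x for x in right_tailed]   # mirror image: tail toward low values
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed) > 0)  # positive skewness
print(skewness(left_tailed) < 0)   # negative skewness
print(skewness(symmetric))         # zero for a perfectly symmetric set
```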
Shape - Normality
[Figure: histogram and box plot of Scanning. Mean = 38.0, Std. Dev = 4.84, N = 344.]
Shape - Positive Skewness
[Figure: histogram and box plot of Total for IIP. Mean = 2.10, Std. Dev = .56, N = 344.]
Shape - Negative Skewness
[Figure: histogram and box plot of PREACT. Mean = 3.32, Std. Dev = .42, N = 154.]
Describing Data: Center & Spread
Relating the Standard Deviation (S) to the normal distribution.
“68-95-99.7% Rule”
When a distribution of data
resembles a normal distribution
(or normal curve):
– about 68% of the data lie within ±1 standard deviation of the mean
– about 95% of the data lie within ±2 standard deviations of the mean
– about 99.7% of the data lie within ±3 standard deviations of the mean
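The three percentages in the rule can be reproduced from the normal curve itself. This sketch uses `statistics.NormalDist` from the Python standard library. The example with a mean of 50 and a standard deviation of 10 is an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution lying within k standard deviations of the mean.
shares = {}
for k in (1, 2, 3):
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within ±{k} SD: {shares[k]:.1%}")

# The same rule on a scale with mean 50 and standard deviation 10:
# about 68% of scores would fall between 40 and 60.
scores = NormalDist(mu=50, sigma=10)
share_40_to_60 = scores.cdf(60) - scores.cdf(40)
```

The computed shares are roughly 68.3%, 95.4%, and 99.7%, which the rule rounds to 68, 95, and 99.7.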
Outliers
[Figure: box plot of BDI Total, N = 344, with outlying cases labeled by case number.]
Outliers
[Figure: histogram of BDI Total, with frequency on the vertical axis. Mean = 7.1, Std. Dev = 7.10, N = 344.]
Outliers
Statistics: BDI Total
N (Valid)         344
N (Missing)       0
Mean              7.12
Median            5.00
Mode              0
Std. Deviation    7.101
Variance          50.426
Minimum           0
Maximum           40
Percentile 25     2.00
Percentile 50     5.00
Percentile 75     10.00
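The reported BDI Total statistics are enough to check for outliers without the raw data. The sketch below applies the 1.5 × IQR rule that box plots commonly use to flag outliers. That rule is an assumption here, since the handout does not state which criterion was used. All numbers are taken from the statistics listed above.

```python
# Summary values reported for BDI Total.
q1, median, q3 = 2.00, 5.00, 10.00
minimum, maximum = 0, 40
mean, std_dev, variance = 7.12, 7.101, 50.426

iqr = q3 - q1                 # interquartile range: 8.0
lower_fence = q1 - 1.5 * iqr  # -10.0
upper_fence = q3 + 1.5 * iqr  # 22.0

# Assumed rule: scores beyond a fence are flagged as outliers.
has_high_outliers = maximum > upper_fence  # 40 lies above 22
has_low_outliers = minimum < lower_fence   # 0 does not lie below -10

# A mean above the median suggests the high scores pull the mean upward.
mean_above_median = mean > median

# The variance is the square of the standard deviation, up to rounding.
variance_matches = abs(std_dev ** 2 - variance) < 0.01

print(iqr, lower_fence, upper_fence)
print(has_high_outliers, has_low_outliers, mean_above_median, variance_matches)
```

Under this rule the outliers are all on the high side, which fits a positively skewed distribution with a mean (7.12) above the median (5.00).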