lecture12_methods
An Introduction to Statistics
Two Branches of Statistical Methods
Descriptive statistics
Techniques for describing data in abbreviated,
symbolic fashion
Inferential statistics
Drawing inferences based on data. Using
statistics to draw conclusions about the
population from which the sample was taken.
Populations and Samples
A parameter is a characteristic of a population
e.g., the average height of all Americans.
A statistic is a characteristic of a sample
e.g., the average height of a sample of
Americans.
Inferential statistics infer population
parameters from sample statistics
e.g., we use the average height of the sample
to estimate the average height of the
population
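A minimal Python sketch of the parameter/statistic distinction. The population values, the sample size of 50, and the random seed are all made up for illustration; only the idea (the sample mean estimates the population mean) comes from the slide.

```python
import random
from statistics import mean

# Hypothetical population of heights in inches (invented numbers).
population = [60 + (i % 20) for i in range(1000)]

# Parameter: a characteristic of the whole population.
population_mean = mean(population)

# Statistic: the same characteristic computed on a sample.
rng = random.Random(0)               # fixed seed so the sketch is repeatable
sample = rng.sample(population, 50)
sample_mean = mean(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean.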
Descriptive Statistics
Numerical Data Properties
Shape: Skewness, Kurtosis
Central Tendency: Mean, Median, Mode
Variation: Range, Interquartile Range, Standard Deviation, Variance
Ordering the Data: Frequency Tables
Frequency table (distribution)
A listing in order of magnitude of each score
achieved and the number of times the score
occurred.
Grouped frequency table (distribution)
Range of scores in each of several equally sized
intervals
Why Frequency Tables?
Gives some order to a set of data
Can examine data for outliers
Is an introduction to distributions
Frequency Tables
HEIGHT
Value    Frequency   Percent   Valid Percent   Cumulative Percent
67.00        1          1.8         1.8               1.8
69.00        1          1.8         1.8               3.6
70.00        3          5.4         5.4               8.9
71.00        3          5.4         5.4              14.3
72.00        7         12.5        12.5              26.8
73.00        7         12.5        12.5              39.3
74.00       11         19.6        19.6              58.9
75.00       11         19.6        19.6              78.6
76.00        7         12.5        12.5              91.1
77.00        4          7.1         7.1              98.2
78.00        1          1.8         1.8             100.0
Total       56        100.0       100.0
Grouped Frequency Tables
Range     Number   Percent   Cumulative Percent
30-39        1       2.08           2.08
40-49        3       6.25           8.33
50-59        4       8.33          16.67
60-69       12      25.00          41.67
70-79       19      39.58          81.25
80-89        7      14.58          95.83
90-100       2       4.17         100.00
Total       48     100
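A grouped frequency table like the one above could be built along these lines. This is a sketch under assumptions: the scores are invented, the function name `grouped_frequency_table` is hypothetical, and every score is assumed to be an integer that falls inside one of the equal-width intervals.

```python
def grouped_frequency_table(values, low, width, n_bins):
    """Count values in n_bins equal-width intervals starting at low.

    Returns rows of (label, count, percent, cumulative percent).
    Assumes every value lies in [low, low + width * n_bins).
    """
    counts = [0] * n_bins
    for v in values:
        counts[(v - low) // width] += 1
    total = len(values)
    rows, running = [], 0
    for i, count in enumerate(counts):
        running += count                      # running total for the cumulative column
        start = low + i * width
        label = f"{start}-{start + width - 1}"
        rows.append((label, count, 100 * count / total, 100 * running / total))
    return rows

# Hypothetical exam scores, invented for illustration.
scores = [35, 42, 47, 55, 58, 61, 64, 66, 72, 75]
rows = grouped_frequency_table(scores, low=30, width=10, n_bins=5)
for label, count, pct, cum in rows:
    print(f"{label:<8}{count:>4}{pct:>8.2f}{cum:>8.2f}")
```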
Making a Frequency Table
1) List each possible value, from highest to lowest
2) Go one by one through the scores, making a mark for each score next to its value on the list
3) Make a table showing how many times each value on your list was used
4) Calculate the percentage of scores for each value
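The four steps above can be sketched in Python. The scores are invented, and the function name `frequency_table` is hypothetical. One simplification: the sketch lists only values that actually occur, whereas step 1 lists every possible value (including those with a frequency of zero).

```python
from collections import Counter

# Hypothetical scores, invented for illustration.
scores = [7, 9, 8, 7, 10, 8, 7, 6, 9, 7]

def frequency_table(values):
    """Return rows of (value, frequency, percent), highest value first."""
    counts = Counter(values)                      # steps 2-3: tally each value
    total = len(values)
    rows = []
    for value in sorted(counts, reverse=True):    # step 1: highest to lowest
        freq = counts[value]
        rows.append((value, freq, 100 * freq / total))  # step 4: percent
    return rows

table = frequency_table(scores)
for value, freq, pct in table:
    print(f"{value:>5} {freq:>9} {pct:>7.1f}")
```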
Making a Stem-and-Leaf Plot
Each data point is broken down into a “stem”
and a “leaf.” Select one or more leading digits
for the stem values. The trailing digit(s)
become the leaves
First, “stems” are aligned in a column.
Record the leaf for every observation beside
the corresponding stem value
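A small sketch of the procedure, assuming non-negative integer data with the tens digit (and above) as the stem and the ones digit as the leaf. The data and the function name `stem_and_leaf` are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens and above) to its sorted leaves (ones digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)    # split the number into stem and leaf
        plot[stem].append(leaf)
    return dict(plot)

data = [21, 32, 33, 38, 43, 44, 52]   # hypothetical non-negative integers
plot = stem_and_leaf(data)

# Print every stem in order, including stems that have no leaves.
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```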
Stem and Leaf Plot
Stem-and-leaf of Shoes   N = 139   Leaf Unit = 1.0
Depth  Stem  Leaves
  12    0    223334444444
  63    0    555555555555566666666677777778888888888888999999999
 (33)   1    000000000000011112222233333333444
  43    1    555555556667777888
  25    2    0000000000023
  12    2    5557
   8    3    0023
   4    3
   4    4    00
   2    4
   2    5    0
   1    5
   1    6
   1    6
   1    7
   1    7    5
Stem and Leaf / Histogram
By rotating the stem-and-leaf plot, we can see the shape of the distribution of scores.
[Figure: a stem-and-leaf plot with stems 2 to 5, shown upright and then rotated so that the leaves stack up like histogram bars]
Histograms
Histograms
Depicts information from a frequency table or a
grouped frequency table as a bar graph
[Figure: histogram of EXAM 1 scores; bins .59-.66, .66-.72, .72-.78, .78-.84, .84-.91, .91-.97; frequency axis 0 to 7; Mean = .82, Std. Dev = .09, N = 17]
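A histogram is a frequency table drawn as bars, so a rough text version needs only the table rows. The sketch below uses invented intervals and frequencies; the function name `text_histogram` is hypothetical.

```python
def text_histogram(rows):
    """Render (label, frequency) rows as horizontal bars of asterisks."""
    width = max(len(label) for label, _ in rows)
    return [f"{label:<{width}} | {'*' * freq}" for label, freq in rows]

# Hypothetical grouped frequencies, invented for illustration.
rows = [("50-59", 2), ("60-69", 5), ("70-79", 3)]
lines = text_histogram(rows)
print("\n".join(lines))
```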
Frequency Polygons
Frequency Polygons
Depicts information from a frequency table or a
grouped frequency table as a line graph
Shapes of Frequency Distributions
Frequency tables, histograms & polygons describe how the frequencies are distributed
Distributions are a fundamental concept in statistics
Unimodal: one peak
Bimodal: two peaks
[Figure: example histograms and curves of a unimodal and a bimodal distribution]
Symmetrical vs. Skewed
Frequency Distributions
Symmetrical distribution
Approximately equal numbers of observations
above and below the middle
Skewed distribution
One side is more spread out than the other,
like a tail
Direction of the skew
Positive or negative (right or left)
Side with the fewer scores
Side that looks like a tail
Symmetrical vs. Skewed
[Figure: example histograms in three panels: Symmetric, Skewed Right, Skewed Left]
Positively Skewed
Positively skewed distribution
Scores cluster towards the low end of the variable
Skewed Frequency Distributions
Positively skewed
AKA Skewed right
Tail trails to the right
[Figure: positively skewed distribution; y-axis Proportion of Population (0.00 to 0.25), x-axis Annual Income * $10,000 (1 to 19)]
Negatively Skewed
Negatively skewed distribution
Scores cluster towards the high end of the variable
Skewed Frequency Distributions
Negatively skewed
Skewed left
Tail trails to the left
[Figure: negatively skewed distribution; y-axis Proportion of Scores (0.00 to 0.25), x-axis Test Scores (max = 20), ticks 1 to 19]
Kurtosis
How peaked or flat the curve is
Leptokurtic: high and thin
Mesokurtic: normal shape
Platykurtic: flat and spread out
[Figure: three curves labeled Leptokurtic, Mesokurtic, and Platykurtic]
Comparing the Kurtosis of Three Curves
Curve A: Mesokurtic (Intermediate)
Curve B: Leptokurtic (High & Peaked)
Curve C: Platykurtic (Broad & Flat)
The Normal Curve
Seen often in the social sciences and in
nature generally
Characteristics
Bell-shaped
Unimodal
Symmetrical
Average tails
Central Tendency
Give information concerning the average or
typical score of a number of scores
mean
median
mode
Central Tendency: The Mean
The Mean is a measure of central tendency
What most people mean by “average”
Sum of a set of numbers divided by the
number of numbers in the set
$\frac{1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10}{10} = \frac{55}{10} = 5.5$
Central Tendency: The Mean
$M = \frac{\sum X}{N}$
so if
$X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]$
$N$ = the number of numbers in X (10 for this example)
then
$\sum X / N = 55 / 10 = 5.5$
Central Tendency: The Mean
Important conceptual point:
The mean is the balance point of the data: if we subtract the
mean from each individual score (X), some of the deviations are
positive and some are negative, and they add up to zero.
$X - M = [-4.5, -3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5, 4.5]$
$\sum (X - M) = 0$
Also, the sum of the absolute values of the negative
numbers is equal to the sum of the absolute values of
the positive numbers
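The balance-point property can be checked numerically. This short Python sketch reuses the slide's data; the variable names `below` and `above` are only illustrative.

```python
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

M = sum(X) / len(X)                  # mean: sum of the scores divided by N
deviations = [x - M for x in X]      # X - M for every score

below = sum(-d for d in deviations if d < 0)  # total distance below the mean
above = sum(d for d in deviations if d > 0)   # total distance above the mean

print(M)                 # 5.5
print(sum(deviations))   # 0.0
print(below, above)      # 12.5 12.5
```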
Central Tendency: The Median
Middlemost or most central item in the set of
ordered numbers; it separates the distribution into
two equal halves
If odd n, middle value of sequence
if X = [1,2,4,6,9,10,12,14,17]
then 9 is the median
If even n, average of 2 middle values
if X = [1,2,4,6,9,10,11,12,14,17]
then 9.5 is the median; i.e., (9+10)/2
Median is not affected by extreme values
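The odd and even cases can be written as one small function. This is a sketch using the slide's two data sets; the third call replaces the largest score with a made-up extreme value to show that the median does not move.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 2, 4, 6, 9, 10, 12, 14, 17]))      # 9
print(median([1, 2, 4, 6, 9, 10, 11, 12, 14, 17]))  # 9.5
print(median([1, 2, 4, 6, 9, 10, 12, 14, 1700]))    # 9 (extreme value ignored)
```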
Central Tendency: The Mode
The mode is the most frequently occurring
number in a distribution
if X = [1,2,4,7,7,7,8,10,12,14,17]
then 7 is the mode
Mode is not affected by extreme values
There may be no mode or several modes
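One way to find the mode in Python is `statistics.multimode` (available from Python 3.8), which returns every value tied for the highest frequency. The first list is the slide's example; the other two are invented. Note that when no value repeats, `multimode` returns all values, which corresponds to the "no mode" case above.

```python
from statistics import multimode

one_mode = multimode([1, 2, 4, 7, 7, 7, 8, 10, 12, 14, 17])
two_modes = multimode([1, 1, 2, 3, 3])   # hypothetical data with a tie
no_repeat = multimode([1, 2, 3])         # hypothetical data, nothing repeats

print(one_mode)    # [7]
print(two_modes)   # [1, 3]
print(no_repeat)   # [1, 2, 3]
```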
Mean, Median, Mode
[Figure: positions of the mean, median, and mode in three distributions: Negatively Skewed (mean < median < mode), Symmetric / Not Skewed (mean = median = mode), Positively Skewed (mode < median < mean)]
When to Use What
The mean is a great measure, but there are times
when its usage is inappropriate or impossible.
Nominal data: Mode
The distribution is bimodal: Mode
You have ordinal data: Median or mode
There are a few extreme scores: Median
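The last rule is easy to see with numbers. In this sketch the income values are invented; adding a single extreme score pulls the mean far from the typical value while the median barely changes.

```python
from statistics import mean, median

incomes = [30, 32, 35, 38, 40]     # hypothetical values, in $1,000s
with_outlier = incomes + [500]     # one extreme score added

mean_before, median_before = mean(incomes), median(incomes)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(mean_before, median_before)  # 35 35
print(mean_after, median_after)    # 112.5 36.5
```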
Measures of Central Tendency
Overview
Central Tendency
Mean: $M = \frac{\sum X}{N}$
Median: midpoint of ranked values
Mode: most frequently observed value
Variability
Variability
How tightly clustered or how widely dispersed
the values are in a data set.
Example
Data set 1: [0,25,50,75,100]
Data set 2: [48,49,50,51,52]
Both have a mean of 50, but data set 1 clearly
has greater variability than data set 2.
Variability: The Range
The Range is one measure of variability
The range is the difference between the maximum
and minimum values in a set
Example
Data set 1: [0,25,50,75,100]; R: 100-0 = 100
Data set 2: [48,49,50,51,52]; R: 52-48 = 4
The range ignores how data are distributed and
only takes the extreme scores into account
$\text{Range} = X_{\text{Largest}} - X_{\text{Smallest}}$
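As a quick check of the two examples, the range is a one-line computation in Python. The function name `value_range` is hypothetical.

```python
def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

print(value_range([0, 25, 50, 75, 100]))   # 100
print(value_range([48, 49, 50, 51, 52]))   # 4
```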
Quartiles
Split Ordered Data into 4 Quarters
25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
Q1 = first quartile
Q2 = second quartile = Median
Q3 = third quartile
Variability: Interquartile Range
Difference between third & first quartiles
Interquartile Range = Q3 - Q1
Spread in middle 50%
Not affected by extreme values
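A sketch of the interquartile range using `statistics.quantiles` from the Python standard library, applied to the odd-n data set from the median slide. Textbooks and software use several slightly different conventions for locating quartiles, so the values of Q1 and Q3 here reflect Python's default ("exclusive") method and may differ from a hand calculation done another way.

```python
from statistics import quantiles

X = [1, 2, 4, 6, 9, 10, 12, 14, 17]

q1, q2, q3 = quantiles(X, n=4)   # three cut points that split the data into quarters
iqr = q3 - q1                    # spread of the middle 50%

print(q1, q2, q3)   # 3.0 9.0 13.0
print(iqr)          # 10.0
```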
Variability: Standard Deviation
“The Standard Deviation tells us approximately
how far the scores vary from the mean on
average”
$SD = \sqrt{\frac{\sum (X - M)^2}{N}}$
The typical deviation in a given distribution
Variability: Standard Deviation
Standard Deviation can be calculated as the square root of the
sum of squares (SS) divided by N
$SD = \sqrt{\frac{SS}{N}} = \sqrt{\frac{\sum (X - M)^2}{N}}$
Variability: Standard Deviation
let X = [3, 4, 5, 6, 7]
M = 5
$(X - M) = [-2, -1, 0, 1, 2]$
subtract M from each number in X
$(X - M)^2 = [4, 1, 0, 1, 4]$
squared deviations from the mean
$\sum (X - M)^2 = 10$
sum of squared deviations from the mean (SS)
$\sum (X - M)^2 / N = 10/5 = 2$
average squared deviation from the mean
$SD = \sqrt{\sum (X - M)^2 / N} = \sqrt{2} = 1.41$
square root of averaged squared deviation
Variability: Standard Deviation
let X = [1, 3, 5, 7, 9]
M = 5
$(X - M) = [-4, -2, 0, 2, 4]$
subtract M from each number in X
$(X - M)^2 = [16, 4, 0, 4, 16]$
squared deviations from the mean
$\sum (X - M)^2 = 40$
sum of squared deviations from the mean (SS)
$\sum (X - M)^2 / N = 40/5 = 8$
average squared deviation from the mean
$SD = \sqrt{\sum (X - M)^2 / N} = \sqrt{8} = 2.83$
square root of averaged squared deviation
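Both worked examples can be reproduced with a few lines of Python. The function name `population_sd` is hypothetical. Like the slides, it divides by N; `statistics.pstdev` in the standard library does the same, whereas a sample standard deviation that divides by N - 1 would give larger values.

```python
import math

def population_sd(values):
    """Square root of the average squared deviation from the mean (divides by N)."""
    n = len(values)
    m = sum(values) / n
    ss = sum((x - m) ** 2 for x in values)   # sum of squares (SS)
    variance = ss / n                        # average squared deviation
    return math.sqrt(variance)

print(round(population_sd([3, 4, 5, 6, 7]), 2))   # 1.41
print(round(population_sd([1, 3, 5, 7, 9]), 2))   # 2.83
```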
Variability: Standard Deviation
The square of the standard deviation is called
the variance
Standard Deviation: $SD = \sqrt{\frac{\sum (X - M)^2}{N}}$
Variance: $SD^2 = \frac{\sum (X - M)^2}{N}$
Standard Deviation & Standard Scores
Z scores are expressed in the following way
$z = \frac{X - M}{SD}$
Z scores express how far a particular score is
from the mean in units of standard deviation
if (X - M) = SD then (X - M)/SD = 1, and X is
said to be one standard deviation above the
mean
Standard Deviation & Standard Scores
Z scores provide a common scale to express
deviations from a group mean
$z = \frac{X - M}{SD}$
$X = (z)(SD) + M$
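The two conversions are inverses of each other, which a short Python sketch can confirm. The function names `z_score` and `raw_score` and the example score of 130 are hypothetical.

```python
def z_score(x, m, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - m) / sd

def raw_score(z, m, sd):
    """Invert the z score: X = (z)(SD) + M."""
    return z * sd + m

z = z_score(130, 100, 15)      # hypothetical score of 130 with M = 100, SD = 15
print(z)                       # 2.0
print(raw_score(z, 100, 15))   # 130.0
```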
Standard Deviation and Standard Scores
Let’s say someone has an IQ of 145 and is 52
inches tall
IQ in a population has a mean of 100 and a
standard deviation of 15
Height in a population has a mean of 64” with
a standard deviation of 4
How many standard deviations is this person
away from the average IQ?
How many standard deviations is this person
away from the average height?
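One way to work the two questions is to apply the z score formula to each measurement, using only the means and standard deviations given above:

```latex
z_{\text{IQ}} = \frac{X - M}{SD} = \frac{145 - 100}{15} = 3
\qquad
z_{\text{height}} = \frac{X - M}{SD} = \frac{52 - 64}{4} = -3
```

With these numbers the person is three standard deviations above the mean IQ and three standard deviations below the mean height, so the two scores are equally extreme even though they are measured in different units.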