lecture12_methods

An Introduction to Statistics
Two Branches of Statistical Methods
 Descriptive statistics

Techniques for describing data in abbreviated,
symbolic fashion
 Inferential statistics

Drawing inferences based on data. Using
statistics to draw conclusions about the
population from which the sample was taken.
Populations and Samples
 A parameter is a characteristic of a population

e.g., the average height of all Americans.
 A statistic is a characteristic of a sample

e.g., the average height of a sample of
Americans.
 Inferential statistics infer population
parameters from sample statistics

e.g., we use the average height of the sample
to estimate the average height of the
population
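A minimal Python sketch of this idea, assuming a simulated population (the heights, the population size of 100,000, and the sample size of 200 are all made up for illustration). The sample mean is a statistic that lands close to, but not exactly on, the population mean it estimates.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: simulated heights in inches (assumed values).
population = [random.gauss(67, 4) for _ in range(100_000)]
parameter = statistics.mean(population)   # characteristic of the population

sample = random.sample(population, 200)   # random sample of 200 people
statistic = statistics.mean(sample)       # characteristic of the sample

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```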
Descriptive Statistics
Numerical Data Properties
 Shape: Skewness, Kurtosis
 Central Tendency: Mean, Median, Mode
 Variation: Range, Interquartile Range, Standard Deviation, Variance
Ordering the Data: Frequency Tables
 Frequency table (distribution)

A listing in order of magnitude of each score
achieved and the number of times the score
occurred.
 Grouped frequency table (distribution)

Range of scores in each of several equally sized
intervals
 Why Frequency Tables?



Gives some order to a set of data
Can examine data for outliers
Is an introduction to distributions
Frequency Tables
HEIGHT
Value    Frequency    Percent    Valid Percent    Cumulative Percent
67.00            1        1.8              1.8                   1.8
69.00            1        1.8              1.8                   3.6
70.00            3        5.4              5.4                   8.9
71.00            3        5.4              5.4                  14.3
72.00            7       12.5             12.5                  26.8
73.00            7       12.5             12.5                  39.3
74.00           11       19.6             19.6                  58.9
75.00           11       19.6             19.6                  78.6
76.00            7       12.5             12.5                  91.1
77.00            4        7.1              7.1                  98.2
78.00            1        1.8              1.8                 100.0
Total           56      100.0            100.0
Grouped Frequency Tables
Range     Number    Percent    Cumulative
30-39          1       2.08          2.08
40-49          3       6.25          8.33
50-59          4       8.33         16.67
60-69         12      25.00         41.67
70-79         19      39.58         81.25
80-89          7      14.58         95.83
90-100         2       4.17        100.00
Total         48     100
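The Percent and Cumulative columns follow directly from the Number column. A short Python sketch that recomputes them from the counts in the table above (the variable names are illustrative only):

```python
from itertools import accumulate

intervals = ["30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90-100"]
counts = [1, 3, 4, 12, 19, 7, 2]

total = sum(counts)
# Percent: each interval's share of all scores.
percents = [100 * c / total for c in counts]
# Cumulative: share of scores at or below the top of each interval.
cumulative = [100 * running / total for running in accumulate(counts)]

for label, c, p, cp in zip(intervals, counts, percents, cumulative):
    print(f"{label:>6}  {c:>3}  {p:6.2f}  {cp:7.2f}")
print(f"{'Total':>6}  {total:>3}")
```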
Making a Frequency Table
1) List each possible value, from highest to lowest
2) Go one by one through the scores, making a mark for each score next to its value on the list
3) Make a table showing how many times each value on your list was used
4) Calculate the percentage of scores for each value
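The four steps can be sketched in Python as follows. The scores are hypothetical, and the sketch assumes whole-number scores so that "each possible value" means every integer between the lowest and highest score.

```python
from collections import Counter

scores = [7, 9, 7, 8, 10, 7, 9, 8, 7, 10]  # hypothetical data

# Step 1: list each possible value, from highest to lowest.
values = range(max(scores), min(scores) - 1, -1)
# Steps 2-3: tally how many times each value occurs.
tally = Counter(scores)
# Step 4: percentage of scores for each value.
n = len(scores)
table = [(v, tally[v], 100 * tally[v] / n) for v in values]

print("Value  Frequency  Percent")
for value, freq, pct in table:
    print(f"{value:>5}  {freq:>9}  {pct:7.1f}")
```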
Making a Stem-and-Leaf Plot
 Each data point is broken down into a “stem”
and a “leaf.” Select one or more leading digits
for the stem values. The trailing digit(s)
become the leaves
 First, “stems” are aligned in a column.
 Record the leaf for every observation beside
the corresponding stem value
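A rough Python sketch of the procedure, assuming non-negative whole-number scores where the last digit is the leaf and the remaining leading digits are the stem. The data and the function name are made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return {stem: leaves in ascending order}; the last digit is the leaf."""
    plot = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        plot[stem].append(leaf)
    return dict(plot)

scores = [21, 32, 43, 52, 23, 34, 42, 43, 46, 58, 58, 65]  # hypothetical
plot = stem_and_leaf(scores)

# Stems go in a column; leaves are recorded beside their stem.
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(d) for d in plot.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```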
Stem and Leaf Plot
Stem-and-leaf of Shoes   N = 139   Leaf Unit = 1.0

  12   0  223334444444
  63   0  555555555555566666666677777778888888888888999999999
 (33)  1  000000000000011112222233333333444
  43   1  555555556667777888
  25   2  0000000000023
  12   2  5557
   8   3  0023
   4   3
   4   4  00
   2   4
   2   5  0
   1   5
   1   6
   1   6
   1   7
   1   7  5
Stem and Leaf / Histogram
By rotating the stem-and-leaf plot, we can see
the shape of the distribution of scores.
[Figure: a stem-and-leaf plot shown upright and rotated onto its side]
Histograms
 Histograms

Depict information from a frequency table or a
grouped frequency table as a bar graph
[Figure: histogram of EXAM 1 scores in six bins from .59 to .97; Mean = .82, Std. Dev = .09, N = 17]
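To make the link between a grouped frequency table and a histogram concrete, here is a small Python sketch that counts scores per interval and prints one bar per interval. The scores are hypothetical, and the function assumes every score lies between `low` and the top of the last interval.

```python
def bin_counts(scores, low, width, n_bins):
    """Count how many scores fall in each interval of the given width."""
    counts = [0] * n_bins
    for x in scores:
        # The top score (e.g. 100) joins the last interval, as in 90-100.
        index = min((x - low) // width, n_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical exam scores; intervals 30-39, 40-49, ..., 90-100.
scores = [35, 42, 47, 55, 61, 64, 68, 72, 75, 75, 78, 83, 88, 95, 100]
counts = bin_counts(scores, low=30, width=10, n_bins=7)

for i, count in enumerate(counts):
    print(f"{30 + 10 * i:>3}+ | {'#' * count}")
```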
Frequency Polygons
 Frequency Polygons

Depict information from a frequency table or a
grouped frequency table as a line graph
Shapes of Frequency Distributions
 Frequency tables, histograms & polygons describe how the
frequencies are distributed
 Distributions are a fundamental concept in statistics
 Unimodal: one peak
 Bimodal: two peaks
[Figure: example histograms of a unimodal and a bimodal distribution]
Symmetrical vs. Skewed
Frequency Distributions
 Symmetrical distribution

Approximately equal numbers of observations
above and below the middle
 Skewed distribution


One side is more spread out than the other,
like a tail
Direction of the skew
Positive or negative (right or left)
Side with the fewer scores
Side that looks like a tail
Symmetrical vs. Skewed
[Figure: three example histograms labeled Symmetric, Skewed Right, and Skewed Left]
Positively Skewed
Positively skewed distribution
Scores cluster toward the low end of the variable
Skewed Frequency Distributions
 Positively skewed

AKA Skewed right
Tail trails to the right
[Figure: positively skewed distribution; x-axis: Annual Income * $10,000 (1 to 19); y-axis: Proportion of Population (0.00 to 0.25)]
Negatively Skewed
Negatively skewed distribution
Scores cluster toward the high end of the variable
Skewed Frequency Distributions
 Negatively skewed

Skewed left
Tail trails to the left
[Figure: negatively skewed distribution; x-axis: Test Scores (max = 20); y-axis: Proportion of Scores (0.00 to 0.25)]
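The direction of skew can also be put into a number. The slides give no formula, so the sketch below assumes one common choice, the moment-based coefficient (third central moment divided by the 1.5 power of the second). It is positive for a right tail, negative for a left tail, and zero for a symmetric set. The data are made up.

```python
def skewness(data):
    """Moment-based skewness: m3 / m2**1.5 (an assumed, common definition)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

skewed_right = [1, 1, 2, 2, 3, 10]           # tail trails to the right
skewed_left = [-x for x in skewed_right]     # mirror image: tail to the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(skewed_right), skewness(skewed_left), skewness(symmetric))
```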
Kurtosis
 How peaked or flat the curve is
 Leptokurtic: high and thin
 Mesokurtic: normal shape
 Platykurtic: flat and spread out
Comparing the Kurtosis of Three Curves
Curve A: Mesokurtic (Intermediate)
Curve B: Leptokurtic (High & Peaked)
Curve C: Platykurtic (Broad & Flat)
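As a hedged illustration, kurtosis can be quantified with the moment-based excess kurtosis (fourth central moment over the squared second, minus 3). This definition is an assumption here, since the slides describe kurtosis only by shape. Under it, a normal curve scores about 0, peaked heavy-tailed data score above 0, and flat data score below 0. Both data sets are hypothetical.

```python
def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (an assumed definition)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - m) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

peaked = [0, 5, 5, 5, 5, 5, 5, 5, 10]  # most scores at the center, two far out
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]     # scores spread evenly

print(excess_kurtosis(peaked))  # positive: leptokurtic
print(excess_kurtosis(flat))    # negative: platykurtic
```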
The Normal Curve
 Seen often in the social sciences and in
nature generally
 Characteristics




Bell-shaped
Unimodal
Symmetrical
Average tails
Central Tendency
 Measures of central tendency give information concerning the average or
typical score in a set of scores



mean
median
mode
Central Tendency: The Mean
 The Mean is a measure of central tendency


What most people mean by “average”
Sum of a set of numbers divided by the
number of numbers in the set
$\frac{1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10}{10} = \frac{55}{10} = 5.5$
Central Tendency: The Mean
$M = \frac{\sum X}{N}$
if $X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]$
so $N$ = the number of numbers in X (10 for this example)
then $\sum X / N = 5.5$
Central Tendency: The Mean


Important conceptual point:
The mean is the balance point of the data: if we subtract the
mean from each individual score (X), some deviations are positive
and some are negative, and together they add up to zero.
$X - M = [-4.5, -3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5, 4.5]$
$\sum (X - M) = 0$
Also, the sum of the absolute values of the negative
deviations is equal to the sum of the absolute values of
the positive deviations
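The balance-point property is easy to check numerically. A short Python sketch using the same ten scores (variable names are illustrative):

```python
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
N = len(X)
M = sum(X) / N                       # the mean, 5.5

deviations = [x - M for x in X]      # X - M for every score
total = sum(deviations)              # deviations cancel out

below = sum(-d for d in deviations if d < 0)  # absolute size of negatives
above = sum(d for d in deviations if d > 0)   # size of positives

print(M, total, below, above)
```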
Central Tendency: The Median
 Middlemost or most central item in the set of
ordered numbers; it separates the distribution into
two equal halves
 If odd n, middle value of sequence


if X = [1,2,4,6,9,10,12,14,17]
then 9 is the median
 If even n, average of 2 middle values
 if X = [1,2,4,6,9,10,11,12,14,17]
 then 9.5 is the median; i.e., (9+10)/2
 Median is not affected by extreme values
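The odd and even cases can be sketched in Python as below, using the two lists from this slide. The last line replaces the largest score with an extreme value (a made-up 1700) to show that the median does not move.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd = [1, 2, 4, 6, 9, 10, 12, 14, 17]
even = [1, 2, 4, 6, 9, 10, 11, 12, 14, 17]

print(median(odd))                 # odd n: the middle value
print(median(even))                # even n: (9 + 10) / 2
print(median(odd[:-1] + [1700]))   # extreme value, same median
```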
Central Tendency: The Mode
 The mode is the most frequently occurring
number in a distribution


if X = [1,2,4,7,7,7,8,10,12,14,17]
then 7 is the mode
 Mode is not affected by extreme values
 There may be no mode or several modes
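A small Python sketch that returns every mode, so the "no mode" and "several modes" cases are visible. Treating a data set in which every value occurs exactly once as having no mode is a convention assumed here. The function expects a non-empty list.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values; empty list if nothing repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # assumed convention: no value repeats, so no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 2, 4, 7, 7, 7, 8, 10, 12, 14, 17]))  # one mode
print(modes([1, 2, 3]))                               # no mode
print(modes([1, 1, 2, 2, 3]))                         # two modes
```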
Mean, Median, Mode
Negatively Skewed: typically Mean < Median < Mode
Symmetric (Not Skewed): Mean = Median = Mode
Positively Skewed: typically Mode < Median < Mean
When to Use What
 The mean is a great measure, but there are times
when its use is inappropriate or impossible.
Nominal data: Mode
The distribution is bimodal: Mode
You have ordinal data: Median or mode
There are a few extreme scores: Median
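To see why a few extreme scores favor the median, consider this Python sketch with hypothetical incomes in thousands of dollars, where a single very large value drags the mean far from the typical score.

```python
import statistics

# Hypothetical incomes (in $1,000s); the last one is an extreme score.
incomes = [30, 32, 35, 38, 40, 1000]

mean_income = statistics.mean(incomes)      # pulled up by the extreme score
median_income = statistics.median(incomes)  # stays with the bulk of the data

print(f"mean = {mean_income:.1f}, median = {median_income}")
```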
Measures of Central Tendency
Overview
Central Tendency
Mean: $M = \frac{\sum X}{N}$
Median: midpoint of ranked values
Mode: most frequently observed value
Variability
 Variability

How tightly clustered or how widely dispersed
the values are in a data set.
 Example



Data set 1: [0,25,50,75,100]
Data set 2: [48,49,50,51,52]
Both have a mean of 50, but data set 1 clearly
has greater variability than data set 2.
Variability: The Range
 The Range is one measure of variability
 The range is the difference between the maximum
and minimum values in a set
 Example
 Data set 1: [0,25,50,75,100]; R: 100-0 = 100
 Data set 2: [48,49,50,51,52]; R: 52-48 = 4
 The range ignores how data are distributed and
only takes the extreme scores into account
$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$
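In Python the range of the two example data sets can be computed as follows (the function name is illustrative; it is called `value_range` to avoid clashing with the built-in `range`):

```python
def value_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)

data_set_1 = [0, 25, 50, 75, 100]
data_set_2 = [48, 49, 50, 51, 52]

print(value_range(data_set_1), value_range(data_set_2))
```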
Quartiles
 Split ordered data into 4 quarters
25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
 Q1 = first quartile
 Q2 = second quartile = median
 Q3 = third quartile
Variability: Interquartile Range


Difference between third & first quartiles
Interquartile Range = Q3 - Q1

Spread in middle 50%

Not affected by extreme values
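A hedged Python sketch of quartiles and the interquartile range. Textbooks and software use several slightly different quartile conventions; this one uses `statistics.quantiles` with `method="inclusive"` from the standard library (Python 3.8 or later), which may differ from the convention used in the lecture. The data are hypothetical.

```python
import statistics

def iqr(data):
    """Interquartile range: Q3 - Q1 (inclusive quartile method)."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_extreme = [1, 2, 3, 4, 5, 6, 7, 8, 900]  # largest score made extreme

# The range reacts to the extreme value; the IQR does not.
print(max(scores) - min(scores), iqr(scores))
print(max(with_extreme) - min(with_extreme), iqr(with_extreme))
```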
Variability: Standard Deviation
 “The Standard Deviation tells us approximately
how far the scores vary from the mean on
average”
$SD = \sqrt{\frac{\sum (X - M)^2}{N}}$
The typical deviation in a given distribution
Variability: Standard Deviation
 The standard deviation can be calculated as the square root of the
sum of squares (SS) divided by N
$SD = \sqrt{\frac{SS}{N}} = \sqrt{\frac{\sum (X - M)^2}{N}}$
Variability: Standard Deviation
$SD = \sqrt{\frac{\sum (X - M)^2}{N}}$
let X = [3, 4, 5, 6, 7]
M = 5
(X - M) = [-2, -1, 0, 1, 2]
 subtract M from each number in X
$(X - M)^2$ = [4, 1, 0, 1, 4]
 squared deviations from the mean
$\sum (X - M)^2 = 10$
 sum of squared deviations from the mean (SS)
$\sum (X - M)^2 / N = 10/5 = 2$
 average squared deviation from the mean
$\sqrt{\sum (X - M)^2 / N} = \sqrt{2} = 1.41$
 square root of averaged squared deviation
Variability: Standard Deviation
$SD = \sqrt{\frac{\sum (X - M)^2}{N}}$
let X = [1, 3, 5, 7, 9]
M = 5
(X - M) = [-4, -2, 0, 2, 4]
 subtract M from each number in X
$(X - M)^2$ = [16, 4, 0, 4, 16]
 squared deviations from the mean
$\sum (X - M)^2 = 40$
 sum of squared deviations from the mean (SS)
$\sum (X - M)^2 / N = 40/5 = 8$
 average squared deviation from the mean
$\sqrt{\sum (X - M)^2 / N} = \sqrt{8} = 2.83$
 square root of averaged squared deviation
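The two worked examples can be reproduced with a short Python sketch that follows the same steps: deviations, squared deviations, SS, average, square root. It divides by N, as the slides do.

```python
import math

def standard_deviation(X):
    """Square root of SS / N, following the steps on the slides."""
    N = len(X)
    M = sum(X) / N
    SS = sum((x - M) ** 2 for x in X)  # sum of squared deviations
    variance = SS / N                  # average squared deviation
    return math.sqrt(variance)

sd_a = standard_deviation([3, 4, 5, 6, 7])
sd_b = standard_deviation([1, 3, 5, 7, 9])
print(round(sd_a, 2), round(sd_b, 2))
```

The standard library's `statistics.pstdev` uses the same divide-by-N definition, whereas `statistics.stdev` divides by N - 1 and would give slightly larger values.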
Variability: Standard Deviation
 The square of the standard deviation is called
the variance
Standard Deviation: $SD = \sqrt{\frac{\sum (X - M)^2}{N}}$
Variance: $SD^2 = \frac{\sum (X - M)^2}{N}$
Standard Deviation & Standard Scores
 Z scores are expressed in the following way
$z = \frac{X - M}{SD}$
 Z scores express how far a particular score is
from the mean in units of standard deviation
 if (X - M) = SD then (X - M)/SD = 1, and X is
said to be one standard deviation above the
mean
Standard Deviation & Standard Scores
 Z scores provide a common scale to express
deviations from a group mean
$z = \frac{X - M}{SD}$
$X = (z)(SD) + M$
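Both directions of the conversion can be sketched in Python. The function names and the example numbers (a raw score of 130 on a scale with mean 100 and SD 15) are illustrative assumptions.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies from the mean."""
    return (x - mean) / sd

def raw_score(z, mean, sd):
    """Convert a z score back to the original scale."""
    return z * sd + mean

z = z_score(130, mean=100, sd=15)
x = raw_score(z, mean=100, sd=15)
print(z, x)
```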
Standard Deviation and Standard Scores
 Let’s say someone has an IQ of 145 and is 52
inches tall


IQ in a population has a mean of 100 and a
standard deviation of 15
Height in a population has a mean of 64” with
a standard deviation of 4”
 How many standard deviations is this person
away from the average IQ?
 How many standard deviations is this person
away from the average height?
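Working the two questions through with the z-score formula, z = (X - M) / SD:

```latex
z_{\text{IQ}} = \frac{145 - 100}{15} = 3
\qquad
z_{\text{height}} = \frac{52 - 64}{4} = -3
```

So this person is 3 standard deviations above the average IQ and 3 standard deviations below the average height.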