Descriptive Statistics

Edpsy 511
Exploratory Data Analysis
Homework 1: Due 9/19
Shapes of Distributions
► Normal distribution
► Positive skew
  • Or right skewed
► Negative skew
  • Or left skewed
How is this variable distributed?
[Histogram: Frequency (0 to 3) by score (1 to 8); Mean = 4.3, Std. Dev. = 1.494, N = 10]
How is this variable distributed?
[Histogram: Frequency (0 to 3) by score (0 to 7); Mean = 2.80, Std. Dev. = 1.75119, N = 10; labeled: right skewed]
How is this variable distributed?
[Histogram: Frequency (0 to 3) by score (2 to 8); Mean = 5.40, Std. Dev. = 1.42984, N = 10; labeled: left skewed]
Descriptive Statistics
Statistics vs. Parameters
► A parameter is a characteristic of a population.
  • It is a numerical or graphic way to summarize data obtained from the population.
► A statistic is a characteristic of a sample.
  • It is a numerical or graphic way to summarize data obtained from a sample.
Types of Numerical Data
► There are two fundamental types of numerical data:
  1) Categorical data: obtained by determining the frequency of occurrences in each of several categories
  2) Quantitative data: obtained by determining placement on a scale that indicates amount or degree
Techniques for Summarizing Quantitative Data
► Frequency distributions
► Histograms
► Stem-and-leaf plots
► Distribution curves
► Averages
► Variability
Summary Measures
► Central Tendency: Arithmetic Mean, Median, Mode
► Quartile
► Variation: Range, Variance, Standard Deviation
Measures of Central Tendency
► Average (Mean)
  • Sample: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
  • Population: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
► Median
► Mode
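Mean (Arithmetic Mean)
► Mean (arithmetic mean) of data values
  • Sample mean, where n is the sample size:
    $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
  • Population mean, where N is the population size:
    $\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$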
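Both formulas do the same arithmetic: add the scores and divide by how many there are. Only the data differ (a sample of size n versus a whole population of size N). A minimal Python sketch, assuming the scores sit in a plain list; the function name `arithmetic_mean` and the sample values are hypothetical:

```python
def arithmetic_mean(scores):
    """Sum the scores and divide by the count.

    Gives the sample mean (X-bar, divide by n) or the population
    mean (mu, divide by N); the arithmetic is identical.
    """
    if not scores:
        raise ValueError("the mean of an empty list is undefined")
    return sum(scores) / len(scores)


# Hypothetical sample of n = 5 scores
sample = [2, 4, 4, 5, 10]
print(arithmetic_mean(sample))  # 5.0
```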
Mean
► The most common measure of central tendency
► Affected by extreme values (outliers)
[Figure: dot plot on a 0 to 10 scale, Mean = 5; dot plot on a scale extending to 14, Mean = 6]
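The sensitivity of the mean to an extreme value can be checked directly. This sketch uses two hypothetical lists that differ only in their largest score; the helper name `mean_of` is illustrative:

```python
def mean_of(scores):
    """Arithmetic mean: sum of scores divided by the count."""
    return sum(scores) / len(scores)


typical = [1, 2, 3, 4, 5]        # hypothetical scores
with_outlier = [1, 2, 3, 4, 50]  # same scores, top value replaced by an outlier

print(mean_of(typical))       # 3.0
print(mean_of(with_outlier))  # 12.0
```

One extreme score pulls the mean from 3 to 12, even though four of the five values are unchanged.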
Mean of Grouped Frequency
X       f       fX
10      1
9       3
8       2
7       4
6       6
5       5
Total   N = 21  ΣfX
$\bar{X} = \frac{\sum fX}{N}$
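In a grouped frequency table, each score X counts f times, so the numerator is the sum of the fX products rather than a plain sum of the X values. A short Python check using the X and f values from the table; the variable names are illustrative:

```python
# Score values (X) and their frequencies (f) from the grouped frequency table
scores = [10, 9, 8, 7, 6, 5]
freqs = [1, 3, 2, 4, 6, 5]

n_total = sum(freqs)                                # N
sum_fx = sum(x * f for x, f in zip(scores, freqs))  # sum of the fX column
grouped_mean = sum_fx / n_total                     # X-bar = sum(fX) / N

print(n_total, sum_fx, round(grouped_mean, 2))  # 21 142 6.76
```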
Weighted Mean
A form of mean obtained from groups of data in which the different sizes of the groups are accounted for, or weighted.
$\bar{x}_w = \frac{\sum f(\bar{x})}{N_{total}}$
where f is the size (N) of each group.

Group   x̄     N     f(x̄)
1       30    10
2       25    15
3       40    25
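Weighting matters because a simple average of the three group means treats a group of 10 the same as a group of 25. This sketch computes the weighted mean for the example groups and, for contrast, the unweighted average; the variable names are illustrative:

```python
# Group means (x-bar) and group sizes (N, used as the weight f)
group_means = [30, 25, 40]
group_sizes = [10, 15, 25]

n_total = sum(group_sizes)
weighted_sum = sum(f * xbar for xbar, f in zip(group_means, group_sizes))  # sum of f(x-bar)
weighted_mean = weighted_sum / n_total

# Ignores group sizes, so every group counts equally
unweighted_mean = sum(group_means) / len(group_means)

print(n_total, weighted_sum)      # 50 1675
print(weighted_mean)              # 33.5
print(round(unweighted_mean, 2))  # 31.67
```

The largest group (N = 25) has the highest mean, so the weighted mean sits above the unweighted one.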
Median
► Robust measure of central tendency
► Not affected by extreme values
[Figure: dot plot on a 0 to 10 scale, Median = 5; dot plot on a scale extending to 14, Median = 5]
► In an ordered array, the median is the “middle” number
  • If n or N is odd, the median is the middle number
  • If n or N is even, the median is the average of the two middle numbers
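The odd/even rule translates directly into code: sort the scores, then take the single middle value or average the two middle values. A minimal sketch with a hypothetical `median` function and made-up data (Python's own `statistics.median` follows the same rule):

```python
def median(scores):
    """Median of a list of numbers using the ordered-array rule."""
    if not scores:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value
        return ordered[mid]
    # Even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5, 3, 9]))   # 5
print(median([1, 3, 5, 7]))      # 4.0
print(median([1, 2, 3, 4, 50]))  # 3
```

The last call shows the robustness claim: an extreme top score leaves the middle value unchanged.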
Mode
► A measure of central tendency
► Value that occurs most often
► Not affected by extreme values
► Used for either numerical or categorical data
► There may be no mode
► There may be several modes
[Figure: dot plot on a 0 to 14 scale, Mode = 9; dot plot on a 0 to 6 scale, No Mode]
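Because there may be zero, one, or several modes, a function that returns a list handles all three cases. This sketch assumes the convention that "no mode" means no value occurs more than once; the `modes` helper and the data are hypothetical:

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency.

    Returns [] when no value repeats (no mode), and more than one
    value when several values are tied (several modes).
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    return [v for v, c in counts.items() if c == top]


print(modes([1, 3, 9, 9, 9, 12]))                      # [9]
print(modes([0, 1, 2, 3, 4, 5, 6]))                    # []
print(modes(["red", "blue", "red", "blue", "green"]))  # ['red', 'blue']
```

The last call uses categorical data, which the mode handles just as well as numbers.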
The Normal Curve
Different Distributions Compared
Variability
► Variability refers to the extent to which the scores on a quantitative variable in a distribution are spread out.
► The range represents the difference between the highest and lowest scores in a distribution.
► A five-number summary reports the lowest score, the first quartile, the median, the third quartile, and the highest score.
  • Five-number summaries are often portrayed graphically by the use of box plots.
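The range and the five-number summary can be computed with Python's standard `statistics` module. Quartile conventions differ between textbooks and software, so this sketch assumes the `method="inclusive"` option of `statistics.quantiles`; other conventions may give slightly different Q1 and Q3. The helper names are hypothetical, and the scores are the raw scores from the standard deviation worked example in these notes:

```python
from statistics import median, quantiles


def score_range(scores):
    """Highest score minus lowest score."""
    return max(scores) - min(scores)


def five_number_summary(scores):
    """(lowest, Q1, median, Q3, highest).

    Q1 and Q3 come from statistics.quantiles(method="inclusive");
    other quartile conventions can give slightly different values.
    """
    ordered = sorted(scores)
    q1, _, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median(ordered), q3, ordered[-1]


scores = [85, 80, 70, 60, 55, 50, 45, 40, 30, 25]
print(score_range(scores))          # 60
print(five_number_summary(scores))  # (25, 41.25, 52.5, 67.5, 85)
```

These five numbers are exactly what a box plot draws: the box spans Q1 to Q3, a line marks the median, and the whiskers reach toward the lowest and highest scores.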
Variance
► The variance, s², represents the amount of variability of the data relative to their mean.
► As shown below, the variance is the “average” of the squared deviations of the observations about their mean:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
► The variance s² is the sample variance, and is used to estimate the actual population variance, σ²:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
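The two formulas differ only in the denominator: n − 1 for the sample variance s² and N for the population variance σ². This sketch implements both by hand and compares them with `statistics.variance` and `statistics.pvariance` from Python's standard library; the function names and the data are hypothetical:

```python
from statistics import pvariance, variance


def sample_variance(xs):
    """s^2: squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def population_variance(xs):
    """sigma^2: squared deviations from mu, divided by N."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5
print(sample_variance(data), variance(data))       # both about 4.571 (32 / 7)
print(population_variance(data), pvariance(data))  # both equal 4
```

Dividing by n − 1 makes s² slightly larger than the divide-by-N value, which is why it is used when estimating σ² from a sample.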
Standard Deviation
► Considered the most useful index of variability.
► It is a single number that represents the spread of a distribution.
► If a distribution is normal, then the mean plus or minus 3 SD will encompass about 99.7% of all scores in the distribution.
Calculation of the Variance and Standard Deviation of a Distribution
Raw Score   Mean   X - X̄   (X - X̄)²
85          54      31      961
80          54      26      676
70          54      16      256
60          54       6       36
55          54       1        1
50          54      -4       16
45          54      -9       81
40          54     -14      196
30          54     -24      576
25          54     -29      841
                   Σ(X - X̄)² = 3640

$\text{Variance } (SD^2) = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{3640}{9} = 404.44$
$\text{Standard deviation } (SD) = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} = \sqrt{404.44} \approx 20.11$
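The table's arithmetic can be reproduced in a few lines as a check. This sketch follows the columns one by one (deviation, squared deviation, sum) and then divides by N − 1 as the slide does; the variable names are illustrative:

```python
import math

# Raw scores from the worked example
scores = [85, 80, 70, 60, 55, 50, 45, 40, 30, 25]

n = len(scores)
mean = sum(scores) / n                   # 54.0
deviations = [x - mean for x in scores]  # X - X-bar column
squared = [d ** 2 for d in deviations]   # (X - X-bar)^2 column
ss = sum(squared)                        # sum of squared deviations

variance = ss / (n - 1)                  # divide by N - 1 = 9
sd = math.sqrt(variance)                 # standard deviation

print(mean, ss)                          # 54.0 3640.0
print(round(variance, 2), round(sd, 2))  # 404.44 20.11
```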
Comparing Standard Deviations
[Figure: three data sets plotted on the same 11 to 21 scale, all with Mean = 15.5]
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.9258
Data C: Mean = 15.5, S = 4.57
Facts about the Normal Distribution
► 50% of all the observations fall on each side of the mean.
► About 68% of scores fall within 1 SD of the mean in a normal distribution.
► About 95% of scores fall within 2 SD of the mean, so about 27% of the observations fall between 1 and 2 SD from the mean (both sides combined).
► About 99.7% of all scores fall within 3 SD of the mean.
► This is often referred to as the 68-95-99.7 rule.
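These percentages are areas under the standard normal curve, so they can be checked with `statistics.NormalDist` from Python's standard library. The proportion within k SD of the mean is the cumulative probability at +k minus the cumulative probability at −k:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# Proportion of scores within k SD of the mean
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, p in within.items():
    print(k, round(p, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973

# Between 1 and 2 SD from the mean, both sides combined
between_1_and_2 = 2 * (std_normal.cdf(2) - std_normal.cdf(1))
print(round(between_1_and_2, 4))  # 0.2718

print(std_normal.cdf(0))  # 0.5: half of all scores fall below the mean
```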
Fifty Percent of All Scores in a Normal Curve
Fall on Each Side of the Mean
Probabilities Under the Normal Curve
Standard Scores
► Standard scores use a common scale to indicate how an individual compares to other individuals in a group.
► The simplest form of a standard score is a Z score.
► A Z score expresses how far a raw score is from the mean in standard deviation units: $z = \frac{X - \bar{X}}{SD}$
► Standard scores provide a better basis for comparing performance on different measures than do raw scores.
► A probability is a percentage stated in decimal form and refers to the likelihood of an event occurring.
► T scores are z scores expressed in a different form: $T = 10z + 50$.
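Converting raw scores to z and T scores takes two short functions. This sketch assumes the SD is the sample SD (N − 1 denominator, as in the worked variance example); some texts use the population SD instead, which changes the z values slightly. The function names are hypothetical, and the scores are the worked-example raw scores:

```python
from statistics import mean, stdev


def z_scores(scores):
    """z = (X - mean) / SD, using the sample SD (N - 1 denominator)."""
    m = mean(scores)
    sd = stdev(scores)
    return [(x - m) / sd for x in scores]


def t_scores(scores):
    """T = 10 * z + 50."""
    return [10 * z + 50 for z in z_scores(scores)]


scores = [85, 80, 70, 60, 55, 50, 45, 40, 30, 25]  # mean 54, SD about 20.11
z = z_scores(scores)
t = t_scores(scores)

# Raw score 85 sits about 1.54 SD above the mean
print(round(z[0], 2), round(t[0], 1))  # 1.54 65.4
```

By construction, z scores have mean 0 and SD 1, while T scores have mean 50 and SD 10, so T scores avoid negative values and decimals for most scores.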
Probability Areas Between the Mean and
Different Z Scores
Examples of Standard Scores