Stats_lecture_3 (Statistics lecture on bell

Statistics lecture 3
Bell-Shaped Curves and Other Shapes
Goals for lecture 3
• Realize that many measurements in nature follow a bell-shaped (“normal”) curve
• Understand and learn to compute a standardized score
• Learn to find the proportion of the population that falls into a given range
• Memorize the Empirical Rule
Histogram
Bell-Shaped “Normal” Curve
Remember?
• Mean (average): Sum of the values divided by the number of values
• Standard deviation: A measure of how spread out the values are. Think of it as the “average distance” of all values from the mean.
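As a small worked example of the two definitions above, here is a minimal Python sketch. The five heights are made-up illustrative values, and the standard deviation uses the population form (dividing by the number of values); some textbooks divide by one less than that for a sample.

```python
import math

# Hypothetical heights in cm (illustrative data, not from the lecture).
heights = [160, 165, 170, 175, 180]

# Mean: sum of the values divided by the number of values.
mean = sum(heights) / len(heights)

# Standard deviation (population form): square root of the average
# squared distance of the values from the mean.
variance = sum((x - mean) ** 2 for x in heights) / len(heights)
std_dev = math.sqrt(variance)

print(f"mean = {mean:.1f} cm, standard deviation = {std_dev:.2f} cm")
```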
Some Characteristics of a Normal Distribution
• Symmetrical (not skewed)
• One peak in the middle, at the mean
• The wider the curve, the greater the standard deviation
• Area under the curve is 1 (or 100%)
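These characteristics can be checked numerically. The sketch below uses `NormalDist` from Python's standard library; the mean of 170 cm and standard deviation of 10 cm are assumed example values, and the area is only approximated by adding up thin rectangles.

```python
from statistics import NormalDist

# Assumed example curve: mean 170 cm, standard deviation 10 cm.
curve = NormalDist(mu=170, sigma=10)

# Symmetrical: the curve is equally high 15 cm below and 15 cm above the mean.
left_height = curve.pdf(155)
right_height = curve.pdf(185)

# One peak at the mean: the curve is higher at 170 cm than just beside it.
peak_at_mean = curve.pdf(170) > max(curve.pdf(169), curve.pdf(171))

# Wider curve, greater standard deviation: doubling sigma halves the peak height.
wider_curve = NormalDist(mu=170, sigma=20)

# Area under the curve: add up thin rectangles from 100 cm to 240 cm
# (7 standard deviations either side of the mean).
step = 0.1
area = sum(curve.pdf(100 + (i + 0.5) * step) * step for i in range(1400))

print(f"area under the curve = {area:.4f}")
```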
Why it looks like that
With many things in nature, most
individuals fall near the average. The
farther you move above or below the
average, the fewer individuals there are
with those extreme values.
Examples: Height, weight, IQ, pulse rate
Bell-shaped wear
Not all curves are “normal”
Normal Curve
If you know these two things:
• The mean
• The standard deviation
...you can figure out these things:
• The proportion of individuals who fall into any range of values
• The percentile of any given value
• The value of any given percentile
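A minimal sketch of those three calculations, using `NormalDist` from Python's standard library. The mean (170 cm) and standard deviation (10 cm) are assumed example values, and the results hold only to the extent the data really follow a normal curve.

```python
from statistics import NormalDist

# Assumed values: heights with mean 170 cm and standard deviation 10 cm.
heights = NormalDist(mu=170, sigma=10)

# 1. Proportion of individuals who fall between 160 cm and 180 cm.
proportion = heights.cdf(180) - heights.cdf(160)

# 2. Percentile of a given value (185 cm): the share of the population below it.
percentile_of_185 = heights.cdf(185) * 100

# 3. Value of a given percentile (the 90th).
value_at_90th = heights.inv_cdf(0.90)

print(f"between 160 and 180 cm: {proportion:.1%}")
print(f"185 cm is at about the {percentile_of_185:.0f}th percentile")
print(f"the 90th percentile is about {value_at_90th:.1f} cm")
```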
Percentiles
Your percentile for a particular measure
(like height or IQ) is the percentage of
the population that falls below you. In
one of my recent classes:
• My height (183 cm): 89th percentile
• My weight (104 kg): 99th percentile
• My age (62): 99th percentile
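The percentile definition above can be sketched directly from raw data. The function name `percentile_of` and the ten class heights are hypothetical, chosen only for illustration.

```python
def percentile_of(value, population):
    """Percentage of the population whose measurement falls below `value`."""
    below = sum(1 for x in population if x < value)
    return 100 * below / len(population)

# Hypothetical class heights in cm (illustrative only).
class_heights = [158, 162, 165, 168, 170, 171, 174, 176, 180, 183]

# Seven of the ten students are shorter than 176 cm.
print(percentile_of(176, class_heights))  # 70.0
```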
Standardized Scores
A standardized score (also called the
z-score) is simply the number of
standard deviations a particular value is
either above or below the mean.
The standardized score is:
• Positive if above the mean
• Negative if below the mean
Standardized Score Examples
Class height: Mean 170 cm, StdDev 10 cm.
What is the z-score of someone whose height is:
• 160 cm
• 180 cm
• 175 cm
• 150 cm
• 170 cm
• 145 cm
Calculate z-score for a Particular Value
z-score = (Value - mean) / StdDev
185 cm: (185 - 170) / 10 = 15 / 10 = +1.5
165 cm: (165 - 170) / 10 = -5 / 10 = -0.5
180 cm: (180 - 170) / 10 = 10 / 10 = +1.0
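The same formula as a short Python sketch, applied to the six heights from the class-height example (mean 170 cm, StdDev 10 cm). The function name `z_score` is just an illustrative choice.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations `value` lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev

CLASS_MEAN = 170    # cm, from the class-height example
CLASS_STD_DEV = 10  # cm

for height in [160, 180, 175, 150, 170, 145]:
    z = z_score(height, CLASS_MEAN, CLASS_STD_DEV)
    print(f"{height} cm: z = {z:+.1f}")
```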
What’s the Point?
• With a z-score or percentile, you can compare unlike things. For instance, I am heavier (99th percentile) than I am tall (89th percentile).
• With a z-score, you can look up the percentile in a table or an online calculator.
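In place of a table lookup, the conversion from z-score to percentile can be sketched with the standard normal curve in Python's standard library. The helper name `percentile_from_z` is hypothetical, and the result assumes the underlying measurements are normally distributed.

```python
from statistics import NormalDist

def percentile_from_z(z):
    """Percentage of a normal population that falls below a given z-score."""
    return NormalDist().cdf(z) * 100  # NormalDist() is the standard normal curve

for z in (-0.5, 1.0, 1.5):
    print(f"z = {z:+.1f}: about the {percentile_from_z(z):.0f}th percentile")
```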
The Empirical Rule
For any normal curve, approximately:
• 68% of values fall within one StdDev of the mean
• 95% of values fall within two StdDevs of the mean
• 99.7% of values fall within three StdDevs of the mean
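The three percentages can be checked against the standard normal curve. This is a minimal sketch using Python's standard library; the helper name `share_within` is an illustrative choice.

```python
from statistics import NormalDist

standard = NormalDist()  # standard normal curve: mean 0, standard deviation 1

def share_within(k):
    """Proportion of values within k standard deviations of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} StdDev(s): {share_within(k):.1%}")
```

The exact figures are slightly different from the rounded ones in the rule (for example, two standard deviations cover a little more than 95%), which is why the rule says "approximately".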
Outlier

A value that is more than three
standard deviations above or below the
mean.
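That definition translates into a one-line check. The sketch below reuses the class-height figures (mean 170 cm, StdDev 10 cm); the function name `is_outlier` and the test heights are illustrative.

```python
def is_outlier(value, mean, std_dev):
    """True if value is more than three standard deviations from the mean."""
    return abs(value - mean) > 3 * std_dev

# Class height: mean 170 cm, StdDev 10 cm, so the cutoffs are 140 cm and 200 cm.
for height in (135, 195, 200, 205):
    print(height, is_outlier(height, 170, 10))
```

A height of exactly 200 cm is not flagged, because the definition says "more than" three standard deviations.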
Apply Empirical Rule to Class Height
Class height: Mean 170 cm, StdDev 10 cm.
• About 68% of class is between what heights?
  160 cm and 180 cm (+/- 10 cm)
• About 95% of class is between what heights?
  150 cm and 190 cm (+/- 20 cm)
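The arithmetic behind those ranges is just the mean plus or minus a whole number of standard deviations. A small sketch, with `empirical_range` as a hypothetical helper name:

```python
def empirical_range(mean, std_dev, k):
    """Range covering values within k standard deviations of the mean."""
    return (mean - k * std_dev, mean + k * std_dev)

# Class height: mean 170 cm, StdDev 10 cm.
for k, share in ((1, "68%"), (2, "95%"), (3, "99.7%")):
    low, high = empirical_range(170, 10, k)
    print(f"about {share} of the class: {low} cm to {high} cm")
```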
Data visualization goals
• See different ways of graphically displaying data.
• Learn the features of a good statistical picture.
• Be able to identify common problems with graphs and plots.
• Learn to read graphs comprehensively.
Why do we turn data into graphics?
• Easier to understand
• Easier to see the trends
• A good graphic will convey the same message you would get if you really studied the data
“Graphics reveal data.”
-- Edward Tufte
Two kinds of variables
• Categorical: Data that can be counted in categories, such as gender or race
• Measurement: Data that can be recorded as a number and then put into order, such as IQ, weight, cigarettes smoked per day, etc.
Pictures of Categorical Data
Three common types of graphics for categorical data:
• Pie charts
• Bar graphs
• Pictograms
Pie Charts
[Pie chart: Women 37%, Men 63%]
• Good for showing one categorical variable, like gender
• Show the percentage that falls into each category
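The percentages a pie chart shows come from counting each category and dividing by the total. A minimal sketch, assuming a hypothetical class list of 37 women and 63 men chosen to match the percentages on the slide:

```python
from collections import Counter

# Hypothetical class list (assumed counts, chosen to match the slide's percentages).
students = ["Women"] * 37 + ["Men"] * 63

counts = Counter(students)
total = sum(counts.values())

# Each slice of the pie is the category's share of the total.
shares = {category: 100 * n / total for category, n in counts.items()}

for category, share in shares.items():
    print(f"{category}: {share:.0f}%")
```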
Bar Graphs
Can show two or more categorical variables simultaneously (for example, height and gender)
[Bar graph: number of students (0 to 12) at each height from 48 to 82 inches, with separate bars for F and M]
Pictograms
[Pictogram: grades A, B, C, D, F]
Height of pictures is used like bars
Pictograms can be misleading
• We tend to focus on the area, rather than just the height
• To be fair, you should keep the width of pictograms the same
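The reason area misleads can be shown with simple arithmetic: if a picture is enlarged proportionally to double its height, its width doubles too, so its area quadruples. A small sketch, with `scaled_area` as a hypothetical helper and a 1-by-1 picture as the assumed starting size:

```python
def scaled_area(scale, keep_width_fixed):
    """Area of a 1-by-1 picture after its height is multiplied by `scale`."""
    # Proportional scaling widens the picture as well as making it taller.
    width = 1.0 if keep_width_fixed else scale
    height = scale
    return width * height

# A value twice as large, drawn two ways:
print(scaled_area(2, keep_width_fixed=False))  # 4: looks four times as big
print(scaled_area(2, keep_width_fixed=True))   # 2.0: looks twice as big
```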
Pictures of Measurement Data
Lots of ways to illustrate measurement variables:
• Stemplots and histograms (lecture 2)
• Line graphs (also called fever charts)
• Scatter plots
• Others: Area, radar, doughnut, high-low-close, surface plots, maps, et al.
Stemplots
[Stemplot: stems 19 through 25]
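For reference, a stemplot can be built by splitting each value into a stem (all digits but the last) and a leaf (the last digit). The sketch below is a minimal illustration for non-negative whole numbers; the function name `stemplot` and the eight data values are hypothetical and are not the data from the slide.

```python
from collections import defaultdict

def stemplot(values):
    """Return the lines of a stemplot: stem = value // 10, leaf = last digit."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lowest, highest = min(leaves), max(leaves)
    lines = []
    for stem in range(lowest, highest + 1):
        # Stems with no values still get a row, so gaps in the data stay visible.
        digits = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem} | {digits}")
    return lines

# Hypothetical measurements (illustrative only).
data = [195, 199, 201, 201, 212, 210, 225, 220]
for line in stemplot(data):
    print(line)
```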
Line Graph (Fever Chart)
Scatter Plot
Good for displaying the relationship between two measurement variables
[Scatter plots: height vs. weight, with height in inches (60 to 80) and weight in pounds (100 to 350); one point is labeled “Doig”]
Difficulties and Disasters
Most common problems:
• No labeling on one or more axes
• Not starting at zero
• Changes in labeling on axes
• Misleading units
• Graphs based on poor information
Checklist for Statistical Pictures
1. Does the message clearly stand out?
2. Is the purpose or title evident?
3. Is a source given for the data?
4. Did the data come from a reliable, believable source?
5. Is everything labeled clearly and unambiguously?
6. Do the axes start at zero?
7. Do the axes maintain a constant scale?
8. Are there breaks in the numbers on the axes that may be easy to miss?
9. Have financial numbers been adjusted for inflation?
10. Is there extraneous information cluttering the picture or misleading the eye?
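To see why the checklist asks whether the axes start at zero, compare how tall two bars are drawn under different axis starting points. The values 52 and 50 and the helper name `drawn_height_ratio` are hypothetical, used only to illustrate the effect.

```python
def drawn_height_ratio(a, b, axis_start):
    """Ratio of the drawn bar heights for values a and b when the axis starts at axis_start."""
    return (a - axis_start) / (b - axis_start)

# Two hypothetical values that differ by only 4%.
honest = drawn_height_ratio(52, 50, axis_start=0)      # bars look nearly equal
truncated = drawn_height_ratio(52, 50, axis_start=48)  # one bar looks twice as tall

print(f"axis from 0:  first bar is {honest:.2f}x the second")
print(f"axis from 48: first bar is {truncated:.2f}x the second")
```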
Questions?