Section 5.1x


January 24, 2012

So far, we have used histograms to represent the overall shape of a distribution. Smooth curves can also be used:

If the curve is symmetric, single peaked, and
bell-shaped, it is called a normal curve.


Plot the data: usually a histogram or a stem plot.
Look for the overall pattern:
◦ Shape
◦ Center
◦ Spread
◦ Outliers


Choose either the five-number summary or the mean and standard deviation to describe the center and spread of the data.
◦ Use the five-number summary when the graph is skewed or has outliers; the center is the median.
◦ Use the mean and standard deviation when the graph is symmetric with no outliers; the center is the mean.
Now, if the overall pattern of a large number of observations is very regular, it can be described by a smooth curve, such as a normal curve.



The tails of normal curves fall off quickly.
There are no outliers.
The mean and median are the same number, located at the center (peak) of the graph.

Most histograms show the “counts” of
observations in each class by the heights of
their bars and therefore by the area of the
bars.
◦ (12 = Type A)

Curves show the “proportion” of observations in each region by the area under the curve. The total area under the curve equals 1. Such a curve is called a density curve.
◦ (0.45 = Type A)
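The statement that the total area under a density curve equals 1 can be checked numerically. This is a minimal sketch, assuming the standard normal curve (mean 0, standard deviation 1) as the example density and a midpoint-rule approximation of the area; the helper names `normal_pdf` and `area_under` are illustrative, not from the slides.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of a normal density curve with mean mu and standard deviation sigma at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def area_under(f, lo, hi, n=100_000):
    """Approximate the area under f from lo to hi with the midpoint rule (n thin rectangles)."""
    width = (hi - lo) / n
    return width * sum(f(lo + (i + 0.5) * width) for i in range(n))

# Virtually all of the area lies within 10 SDs of the mean, so this approximates the total area.
print(round(area_under(normal_pdf, -10, 10), 6))  # 1.0
# Area to the left of the peak: half the total, because the curve is symmetric.
print(round(area_under(normal_pdf, -10, 0), 6))   # 0.5
```

Where a histogram's bar heights count observations, the area here is already a proportion, which is why the two printed areas read directly as "all of the data" and "half of the data."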





Median: “Equal-areas” point – half area is to the
right, half area is to the left.
Mean: The balance point at which the curve
would balance if made of a solid material (see
next slide).
Area: ¼ of the area under the curve is to the left of Quartile 1, and ¾ of the area is to the left of Quartile 3. (Density curves use areas “to the left.”)
Symmetric: the mean and median are equal.
Skewed: See next slide.
A curve balances at its median only if its median and mean are the same.
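The "area to the left" definitions of the median and quartiles can be made concrete with Python's standard `statistics.NormalDist`. This sketch assumes a hypothetical normal curve with mean 500 and standard deviation 100; each quartile is found as the point with that fraction of the area to its left (`inv_cdf`).

```python
from statistics import NormalDist

# Hypothetical example curve: normal, mean 500, standard deviation 100.
curve = NormalDist(mu=500, sigma=100)

# Each quartile is the value with that fraction of the area to its LEFT.
q1 = curve.inv_cdf(0.25)
median = curve.inv_cdf(0.50)
q3 = curve.inv_cdf(0.75)

print(round(q1, 1), round(median, 1), round(q3, 1))  # 432.6 500.0 567.4
print(round(curve.cdf(q1), 4))                       # 0.25 of the area lies left of Q1

# The curve is symmetric, so its mean (balance point) equals its median (equal-areas point).
print(curve.mean == median)                          # True
```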

The mean of a skewed distribution is pulled
along the long tail (away from the median).
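A quick way to see the mean being pulled toward the long tail is to compare mean and median on right-skewed values. The data below are hypothetical, chosen only so that one value sits far out in the right tail.

```python
from statistics import mean, median

# Hypothetical right-skewed data: most values are small, one lies far out in the right tail.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]

print(mean(values), median(values))  # 4.7 3.0
```

The single large value moves the mean from near the bulk of the data toward the tail, while the median (the middle value) stays at 3.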


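If the curve is a normal curve, the standard deviation can be located by sight: it is the distance from the mean to either inflection point, where the curve changes from bending downward (near the peak) to bending upward (in the tails).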
A small standard deviation shows a graph that is less spread out and more sharply peaked…
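Why the inflection points sit exactly one standard deviation from the mean follows from the second derivative of the normal density curve f(x) with mean μ and standard deviation σ (a standard calculus result, included here for reference):

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}

f'(x) = -\frac{x-\mu}{\sigma^2}\, f(x)

f''(x) = \frac{1}{\sigma^2}\left[\left(\frac{x-\mu}{\sigma}\right)^2 - 1\right] f(x)

f''(x) = 0 \iff (x-\mu)^2 = \sigma^2 \iff x = \mu \pm \sigma
```

Since f(x) > 0 everywhere, f''(x) is negative between μ - σ and μ + σ (the curve bends downward) and positive outside that interval (the curve bends upward).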


Carl Gauss used standard deviations to
describe small errors by astronomers and
surveyors in repeated careful measurements.
A normal curve showing the standard
deviations was once referred to as an “error
curve”.
The 68-95-99.7 Rule gives the area under a normal curve within 1, 2, and 3 standard deviations to the left and right of the center of the curve. This is more accurate than estimating by sight.

In a normal distribution, approximately…
◦ 68% of all data fall within 1 standard deviation of the
mean.
◦ 95% of all data fall within 2 standard deviations of the
mean.
◦ 99.7% of all data fall within 3 standard deviations of the mean.
◦ Remember, the amount of data affects how closely a real data set matches these percentages.
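The rule's percentages are rounded versions of exact normal-curve areas. This sketch uses Python's standard `statistics.NormalDist` to compute the exact area within k standard deviations of the mean; `area_within` is an illustrative helper name.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # standard normal curve: mean 0, SD 1

def area_within(k):
    """Exact area under a normal curve within k standard deviations of the mean."""
    return STANDARD.cdf(k) - STANDARD.cdf(-k)

for k, rule in [(1, 68), (2, 95), (3, 99.7)]:
    print(f"within {k} SD: rule says about {rule}%, exact area is {100 * area_within(k):.2f}%")
# exact areas: 68.27%, 95.45%, 99.73%
```

Because these areas do not depend on the mean or standard deviation, the standard normal curve is enough to check the rule for every normal curve.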
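[Figure: normal curve centered at 500 with a standard deviation of 100. Moving out from the mean on each side, the regions hold 34%, 13.5%, 2.35%, and 0.15% of the area.]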
1. How much of the data is greater than 500?
Answer: 50%
2. How much of the data falls between 400 and 500?
Answer: 34%
3. How much of the data falls between 200 and 400?
Answer: 50% - 34% - 0.15% = 15.85%
4. How much of the data falls between 300 and 700?
Answer: 95%
5. How much of the data is less than 300?
Answer: 100% - 50% - 34% - 13.5% = 2.5%
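The five answers can be reproduced by adding up the rule's band percentages. This is a minimal sketch assuming a mean of 500 and a standard deviation of 100 (the values implied by the answers: 400 to 500 holds 34%, one standard deviation); `rule_percent` is a hypothetical helper that only accepts bounds a whole number of standard deviations from the mean.

```python
import math

# Cumulative percent of area to the LEFT of each whole-number z value,
# built from the 68-95-99.7 Rule band percentages (34, 13.5, 2.35, 0.15).
CUMULATIVE_LEFT = {
    -math.inf: 0.0, -3: 0.15, -2: 2.5, -1: 16.0, 0: 50.0,
    1: 84.0, 2: 97.5, 3: 99.85, math.inf: 100.0,
}

def rule_percent(lo, hi, mu, sigma):
    """Percent of data between lo and hi according to the 68-95-99.7 Rule.

    Each bound must lie exactly 0, 1, 2, or 3 SDs from the mean, or be +/- infinity.
    """
    def z_of(x):
        if math.isinf(x):
            return x
        z = (x - mu) / sigma
        if z != round(z) or abs(z) > 3:
            raise ValueError("bound must be a whole number of SDs (at most 3) from the mean")
        return int(round(z))
    return round(CUMULATIVE_LEFT[z_of(hi)] - CUMULATIVE_LEFT[z_of(lo)], 2)

MU, SIGMA = 500, 100  # assumed from the figure and answers
questions = [
    ("greater than 500", 500, math.inf),
    ("between 400 and 500", 400, 500),
    ("between 200 and 400", 200, 400),
    ("between 300 and 700", 300, 700),
    ("less than 300", -math.inf, 300),
]
for label, lo, hi in questions:
    print(f"{label}: {rule_percent(lo, hi, MU, SIGMA)}%")
```

Storing cumulative "area to the left" values means every question becomes one subtraction, the same way the slide's answers to questions 3 and 5 are built.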


What kind of curve is being used if the area under the curve represents a proportion (a value between 0 and 1)?
A DENSITY CURVE
To use the 68-95-99.7 Rule, it must also be a normal curve.






Symmetric
Bell shaped
Single peak
Tails fall off
No outliers
Mean and median are same number


You can use Chebychev’s Theorem if the distribution is not bell shaped, or if its shape is not known.
The portion of any data set lying within k standard deviations (k > 1) of the mean is at least:
$1 - \frac{1}{k^2}$

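If you wanted to know how much data fell within 3 standard deviations of the mean of a data set, and it wasn’t a bell-shaped curve or you didn’t know the shape:
$1 - \frac{1}{3^2} = 1 - \frac{1}{9} = \frac{8}{9} \approx 88.9\%$
So at least 88.9% of the data lie within 3 standard deviations of the mean.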
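Chebychev's bound is easy to tabulate and compare with the normal-curve areas. This sketch uses exact fractions so that the k = 3 case prints as 8/9; `chebychev_bound` is an illustrative helper name.

```python
from fractions import Fraction
from statistics import NormalDist

def chebychev_bound(k):
    """Smallest possible fraction of ANY data set within k SDs of the mean (k > 1)."""
    k = Fraction(k)
    if k <= 1:
        raise ValueError("Chebychev's Theorem applies only for k > 1")
    return 1 - 1 / k**2

normal = NormalDist()  # standard normal curve, for comparison
for k in (2, 3):
    bound = chebychev_bound(k)
    normal_area = normal.cdf(k) - normal.cdf(-k)
    print(f"k = {k}: at least {bound} = {float(bound):.1%} of any data set; "
          f"about {normal_area:.1%} if the data are normal")
```

The bound is much weaker than the 68-95-99.7 Rule (88.9% versus about 99.7% at k = 3) because it must hold for every possible shape, not just bell-shaped ones.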

The z-score is used to standardize numbers that may be on different scales:
$z = \frac{x - \mu}{\sigma}$
x = non-standardized value
z = standardized value (z-score)
μ = mean
σ = standard deviation


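Standard normal tables usually list z-scores from -3.49 to +3.49; a z-score outside this range is possible but very unusual for normally distributed data.
Z-scores represent the number of standard deviations a value lies above or below the mean.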
◦ If z=1, the value is 1 standard deviation above the
mean.
◦ If z=-2, the value is 2 standard deviations below
the mean.




During the 2003 regular season, the Kansas City Chiefs (NFL) scored 63 touchdowns. During the same season, the Tampa Bay Storm (Arena Football) scored 119 touchdowns. The mean number of touchdowns for NFL teams was 37.4, with a standard deviation of 9.3. The mean number of touchdowns for Arena Football teams was 111.7, with a standard deviation of 17.3.
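Find the z-score for each.
KC: z = (63 - 37.4) / 9.3 ≈ 2.75
TB: z = (119 - 111.7) / 17.3 ≈ 0.42
Kansas City had the better touchdown season relative to its mean: its total was much farther above the mean (2.75 standard deviations versus 0.42).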
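The comparison can be reproduced in a few lines. This is a minimal sketch; `z_score` is an illustrative helper, and the inputs are the touchdown totals, means, and standard deviations stated in the example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

kc = z_score(63, 37.4, 9.3)     # Kansas City Chiefs: 63 touchdowns, mean 37.4, SD 9.3
tb = z_score(119, 111.7, 17.3)  # Tampa Bay Storm: 119 touchdowns, mean 111.7, SD 17.3

print(round(kc, 2), round(tb, 2))  # 2.75 0.42
print("Kansas City" if kc > tb else "Tampa Bay", "is farther above its mean")
```

Dividing by each standard deviation puts both totals on the same "number of SDs" scale, which is what makes the two leagues comparable.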

The Cth percentile of a distribution is a value such that C percent of the observations lie below it and the rest lie above it.
◦ 80th percentile = 80% below, 20% above = top 20%
◦ 90th percentile = 90% below, 10% above = top 10%
◦ 99th percentile = 99% below, 1% above = top 1%
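The definition can be applied directly: count the share of observations below a value, or, for a normal curve, find the value with C percent of the area to its left. This is a minimal sketch; `percent_below` and `normal_percentile` are illustrative names, and both the small data list and the normal curve with mean 500 and standard deviation 100 are hypothetical examples.

```python
from statistics import NormalDist

def percent_below(data, value):
    """Percent of observations that lie strictly below value."""
    return 100 * sum(1 for x in data if x < value) / len(data)

def normal_percentile(c, mu, sigma):
    """Value with c percent of the normal curve's area to its left."""
    return NormalDist(mu, sigma).inv_cdf(c / 100)

# Hypothetical data: 8 of the 10 values lie below 9, so 9 sits at the 80th percentile.
print(percent_below([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 9))  # 80.0

# Hypothetical normal curve (mean 500, SD 100): the cutoff for the top 10%.
print(round(normal_percentile(90, 500, 100), 1))  # 628.2
```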





Please note, you cannot use this identical activity
for your project.
Everyone in class is to measure their height in
centimeters (nearest whole number) and record
the result on the white board.
Arrange the values from smallest to largest.
Find the mean and standard deviation (both to the nearest tenth); on your calculator, these are x̄ and s(x).
Calculate the borders for the 68-95-99.7 regions. How many students fall into each category? How many should, based on the 68-95-99.7 Rule?
◦ 1. Find the mean and standard deviation, to the nearest tenth.
◦ 2. Construct a 68-95-99.7 curve to determine the number of presidents who fall into each category, and determine whether the curve was accurate.
◦ 3. What is the minimum age a person can be to be president?
◦ 4. Who will win the Super Bowl?
◦ Data:
57, 68, 56, 54, 60, 64, 61, 51, 46, 42, 61, 46, 57, 49, 54,
51, 43, 54, 57, 64, 49, 56, 55, 48, 58, 50, 51, 55, 56, 57,
48, 47, 51, 61, 61, 65, 55, 54, 52, 54, 52, 55, 51, 69
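Steps 1 and 2 of the activity can be checked with Python's standard `statistics` module. This sketch assumes the sample standard deviation (`stdev`, divisor n - 1), which is what the calculator's s(x) reports, rounds both statistics to the nearest tenth as the activity asks, and counts values on or inside each border.

```python
from statistics import mean, stdev

# The data values listed in the activity above.
data = [57, 68, 56, 54, 60, 64, 61, 51, 46, 42, 61, 46, 57, 49, 54,
        51, 43, 54, 57, 64, 49, 56, 55, 48, 58, 50, 51, 55, 56, 57,
        48, 47, 51, 61, 61, 65, 55, 54, 52, 54, 52, 55, 51, 69]

x_bar = round(mean(data), 1)  # sample mean (x-bar), to the nearest tenth
s = round(stdev(data), 1)     # sample standard deviation s(x), to the nearest tenth
print(f"mean = {x_bar}, standard deviation = {s}")

n = len(data)
rule = {1: 68, 2: 95, 3: 99.7}
counts = []
for k in (1, 2, 3):
    lo, hi = x_bar - k * s, x_bar + k * s  # borders of the k-SD region
    count = sum(1 for x in data if lo <= x <= hi)
    counts.append(count)
    print(f"within {k} SD ({lo:.1f} to {hi:.1f}): {count} of {n} "
          f"= {100 * count / n:.1f}% (rule: about {rule[k]}%)")
```

Comparing each printed percentage with the rule's value answers the "was the curve accurate?" part of step 2; with only 44 values, some difference from 68-95-99.7 is expected.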