Modeling Continuous Variables
Lecture 19
Section 6.1 - 6.3.1
Fri, Oct 6, 2006
Models
Mathematical model – An abstraction and,
therefore, a simplification of a real situation, one
that retains the essential features.
Real situations are usually much too complicated
to deal with in all their details.
Example
The “bell curve” is a model (an abstraction) of
many populations.
Real populations have all sorts of bumps and twists
and irregularities.
The bell curve is smooth and perfectly symmetric.
In statistics, the bell curve is called the normal
curve, or normal distribution.
Models
Our models will be models of distributions,
presented either as histograms or as continuous
density curves.
Histograms and Area
In a histogram, frequency is represented by
area.
Consider the following distribution of test
scores.
Grade     Frequency
60 – 69   3
70 – 79   8
80 – 89   9
90 – 99   5
[Figure: frequency histogram of the test scores; x-axis Grade (60 to 100), y-axis Frequency (0 to 10)]
What is the total area of this histogram?
Each bar is 10 points wide, so the total area is
10 × (3 + 8 + 9 + 5) = 10 × 25 = 250.
We will rescale the vertical scale so that the total
area equals 1, representing 100%.
To achieve this, we divide each frequency by the
original total area, 250, to get the density.
Grade     Frequency   Density
60 – 69   3           0.012
70 – 79   8           0.032
80 – 89   9           0.036
90 – 99   5           0.020
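As a check on the arithmetic, here is a minimal Python sketch of the rescaling. It assumes, as the table implies, that each grade class is 10 points wide (60 up to 70, 70 up to 80, and so on); the variable names are illustrative.

```python
# Test-score classes as (lower edge, upper edge, frequency).
# Assumption: each class is 10 points wide, e.g. "60 - 69" covers 60 up to 70.
classes = [(60, 70, 3), (70, 80, 8), (80, 90, 9), (90, 100, 5)]

# Original total area of the frequency histogram: sum of width * height.
total_area = sum((hi - lo) * freq for lo, hi, freq in classes)

# Density = frequency / original total area, so the rescaled bars have total area 1.
densities = [freq / total_area for _, _, freq in classes]

print(total_area)                        # 250
print([round(d, 3) for d in densities])  # [0.012, 0.032, 0.036, 0.02]
```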
[Figure: the same histogram on the density scale; x-axis Grade (60 to 100), y-axis Density (0 to 0.040); total area = 1]
This histogram has the special property that the
proportion of grades in any interval can be found by
computing the area of the corresponding rectangles.
For example, what proportion of the grades are
less than 80?
Compute: (10 × 0.012) + (10 × 0.032)
= 0.12 + 0.32 = 0.44 = 44%.
(Check: 3 + 8 = 11 of the 25 grades are below 80, and 11/25 = 0.44.)
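The same area computation can be written as a small function. This is a sketch; `proportion_between` is a hypothetical helper, and it treats each bar as flat, which is exactly what the density histogram assumes.

```python
# Density histogram from the table: (lower edge, upper edge, density).
bars = [(60, 70, 0.012), (70, 80, 0.032), (80, 90, 0.036), (90, 100, 0.020)]

def proportion_between(a, b, bars):
    """Area of the density histogram between a and b.

    Each bar is flat, so a bar contributes (overlapping width) * density.
    """
    total = 0.0
    for lo, hi, density in bars:
        overlap = min(hi, b) - max(lo, a)
        if overlap > 0:
            total += overlap * density
    return total

print(round(proportion_between(60, 80, bars), 2))  # 0.44
```

Because the bars are flat, the same function also handles intervals that cut through a bar, such as grades between 75 and 85, under the histogram's assumption that values are spread evenly within each class.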
Density Functions
This is the fundamental property that connects
the graph of a continuous model to the
population that it represents, namely:
The area under the graph between two numbers a and b
on the x-axis represents the proportion of the population
that lies between a and b.
AREA = PROPORTION
Now consider an arbitrary continuous distribution, described by a density curve.
The area under the curve between a and b is the
proportion of the values of x that lie between a
and b.
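[Figure: a density curve over the x-axis, with the area under it between a and b shaded; Area = Proportion]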
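For a smooth curve, the area between a and b can be approximated by many thin rectangles, which ties density curves back to histograms. The sketch below uses the midpoint rule; the density f(x) = 2x on [0, 1] is a hypothetical example chosen because its areas are easy to check by hand.

```python
def area_under(f, a, b, n=10_000):
    """Approximate the area under f from a to b with the midpoint rule:
    n rectangles of equal width, each as tall as f at its midpoint."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


def f(x):
    """Hypothetical density for illustration: 2x on [0, 1], 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0


# Proportion of values between 0.5 and 1; exactly 1 - 0.5**2 = 0.75 for this f.
print(round(area_under(f, 0.5, 1.0), 4))  # 0.75
```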
Again, the total area under the curve must be 1,
representing a proportion of 100%.
[Figure: the same density curve, with the entire area under it shaded; total area = 100%]
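A candidate curve can serve as a density only if it never goes below the x-axis and encloses a total area of 1. Here is a rough numerical check, assuming the curve is zero outside a known interval; `is_density` and the triangular curve are hypothetical illustrations.

```python
def is_density(f, lo, hi, n=10_000, tol=1e-6):
    """Rough numerical check that f can serve as a density on [lo, hi].

    Assumes f is 0 outside [lo, hi]. Samples f at n midpoints, rejects any
    negative value, and checks that the midpoint-rule total area is 1 within tol.
    """
    width = (hi - lo) / n
    values = [f(lo + (i + 0.5) * width) for i in range(n)]
    if min(values) < 0:
        return False
    return abs(sum(values) * width - 1) < tol


def triangle(x):
    """Hypothetical triangular curve on [0, 2] with its peak at x = 1."""
    return x if x <= 1 else 2 - x


print(is_density(triangle, 0, 2))     # True
print(is_density(lambda x: x, 0, 2))  # False: total area is 2, not 1
```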
The Normal Distribution
Normal distribution – The statistician’s name for
the bell curve.
It is a density function in the shape of a “bell.”
Symmetric.
Unimodal.
Extends over the entire real line (no endpoints).
“Main part” lies within 3σ (3 standard deviations) of the mean.
The curve has a bell shape, with infinitely long
tails in both directions.
The mean is located in the center, at the peak.
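These shape properties can be checked numerically from the standard formula for the normal density, f(x) = exp(−(x − μ)²/(2σ²)) / (σ√(2π)), which the slides do not write out. The sketch uses N(30, 5), the example used later in this lecture; `normal_pdf` is an illustrative helper, not a library function.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the N(mu, sigma) density curve at x (sigma is the standard deviation)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 30, 5

# Symmetric: equal heights at equal distances on either side of the mean.
print(math.isclose(normal_pdf(mu - 7, mu, sigma), normal_pdf(mu + 7, mu, sigma)))  # True

# Unimodal with the peak at the mean: the height drops as we move away from mu.
print(normal_pdf(mu, mu, sigma) > normal_pdf(mu + 1, mu, sigma) > normal_pdf(mu + 2, mu, sigma))  # True

# Infinitely long tails: far from the mean the height is tiny but still positive.
print(normal_pdf(mu + 6 * sigma, mu, sigma) > 0)  # True
```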
The “main” part of the curve is 6 standard deviations
wide: 3 standard deviations on each side of the mean,
from μ − 3σ to μ + 3σ.
The area under the entire curve is 1.
(The area more than 3 standard deviations from the
mean is approximately 0.0027.)
[Figure: the normal curve, with μ − 3σ and μ + 3σ marked; Area = 1]
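The 0.0027 figure can be reproduced with Python's standard `math.erfc`. Because every normal curve is a shifted and rescaled copy of the standard one, the area more than k standard deviations from the mean depends only on k. A minimal sketch:

```python
import math

def area_outside(k):
    """Area under any normal curve more than k standard deviations from the mean.

    By symmetry this is 2 * P(Z > k) for a standard normal Z,
    which equals erfc(k / sqrt(2)).
    """
    return math.erfc(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(area_outside(k), 4))
# 1 0.3173
# 2 0.0455
# 3 0.0027
```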
The normal distribution with mean μ and
standard deviation σ is denoted N(μ, σ).
For example, if X is a variable whose
distribution is normal with mean 30 and
standard deviation 5, then we say that “X is
N(30, 5).”
If X is N(30, 5), then the distribution of X
looks like this:
[Figure: the N(30, 5) curve, with 15, 30, and 45 marked on the x-axis, i.e., μ − 3σ, μ, and μ + 3σ]
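For X that is N(30, 5), areas under the curve can be computed from the normal cumulative distribution function, written here with `math.erf`. The helper names are illustrative assumptions, not part of the lecture.

```python
import math

def normal_cdf(x, mu, sigma):
    """Proportion of an N(mu, sigma) population at or below x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def normal_between(a, b, mu, sigma):
    """Area under the N(mu, sigma) curve between a and b."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

mu, sigma = 30, 5
print(mu - 3 * sigma, mu + 3 * sigma)               # 15 45
print(round(normal_between(15, 45, mu, sigma), 4))  # 0.9973
print(normal_cdf(30, mu, sigma))                    # 0.5
```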
Some Normal Distributions
[Figure: the curves N(3, 1), N(5, 1), N(2, ½), and N(3½, 1½) plotted on a common x-axis from 0 to 8]
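The four curves differ in two ways: μ sets where the peak sits, and σ sets how spread out the curve is. Since the total area must stay 1, a smaller σ forces a taller peak; the peak height is 1/(σ√(2π)). A short sketch that tabulates this for the curves shown:

```python
import math

def peak_height(sigma):
    """Height of an N(mu, sigma) curve at its peak x = mu: 1 / (sigma * sqrt(2*pi))."""
    return 1 / (sigma * math.sqrt(2 * math.pi))

# The four curves from the slide, as (mu, sigma).
curves = {"N(3, 1)": (3, 1), "N(5, 1)": (5, 1), "N(2, 1/2)": (2, 0.5), "N(3.5, 1.5)": (3.5, 1.5)}

for name, (mu, sigma) in curves.items():
    print(f"{name}: peak at x = {mu}, height {peak_height(sigma):.3f}, "
          f"main part {mu - 3 * sigma} to {mu + 3 * sigma}")
```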
Bag A vs. Bag B
Suppose we have two bags, Bag A and Bag B.
Each bag contains millions of vouchers.
In Bag A, the values of the vouchers have
distribution N(50, 10).
Normal with μ = $50 and σ = $10.
In Bag B, the values of the vouchers have
distribution N(80, 15).
Normal with μ = $80 and σ = $15.
[Figure: density curves for H0: Bag A, N(50, 10), and H1: Bag B, N(80, 15), on a common x-axis from 30 to 110]
We are presented with one of the bags.
We select one voucher at random from that bag.
If its value is less than or equal to $65, then we
will decide that it was from Bag A; otherwise, we
will decide that it was from Bag B.
[Figure: the H0 and H1 curves with a cutoff at 65; the acceptance region (values of $65 or less) lies to the left of the cutoff and the rejection region (values above $65) to the right]
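The decision rule can be tried out by simulation. This sketch draws many hypothetical vouchers from each bag with `random.gauss` and counts how often the $65 rule picks the wrong bag; the function names, sample size, and seed are arbitrary choices for illustration.

```python
import random

CUTOFF = 65  # decide "Bag A" when the voucher is worth $65 or less

def decide(value):
    """The rule above: accept H0 (Bag A) if value <= CUTOFF, else decide Bag B."""
    return "A" if value <= CUTOFF else "B"

def wrong_decision_rate(mu, sigma, wrong_bag, n=100_000, seed=1):
    """Fraction of n simulated vouchers from N(mu, sigma) that the rule assigns to wrong_bag."""
    rng = random.Random(seed)
    wrong = sum(decide(rng.gauss(mu, sigma)) == wrong_bag for _ in range(n))
    return wrong / n

alpha_hat = wrong_decision_rate(50, 10, wrong_bag="B")  # really Bag A, decided Bag B
beta_hat = wrong_decision_rate(80, 15, wrong_bag="A")   # really Bag B, decided Bag A
print(round(alpha_hat, 3), round(beta_hat, 3))
```

The two printed rates estimate the error probabilities usually written α and β; with this many draws they should land close to the exact areas under the two normal curves on either side of 65.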
What is α, the probability of deciding that the voucher
came from Bag B when it actually came from Bag A
(a Type I error)?
What is β, the probability of deciding that it came from
Bag A when it actually came from Bag B (a Type II error)?
[Figure: the H0 and H1 curves with the cutoff at 65]
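Both error probabilities are areas under normal curves. α is the area under the Bag A curve to the right of 65 (z = (65 − 50)/10 = 1.5), and β is the area under the Bag B curve to the left of 65 (z = (65 − 80)/15 = −1). A minimal Python sketch, building the normal CDF from `math.erf`:

```python
import math

def normal_cdf(x, mu, sigma):
    """Area under the N(mu, sigma) curve to the left of x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

cutoff = 65
# alpha: area under the Bag A curve, N(50, 10), in the rejection region (x > 65).
alpha = 1 - normal_cdf(cutoff, 50, 10)
# beta: area under the Bag B curve, N(80, 15), in the acceptance region (x <= 65).
beta = normal_cdf(cutoff, 80, 15)
print(round(alpha, 4), round(beta, 4))  # 0.0668 0.1587
```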
If the distributions are very close together (very similar),
then α and β will be large: no cutoff can make both of
them small.
[Figure: H0 and H1 curves drawn close together, with the cutoff at 65]
Conversely, if the distributions are far apart, then
α and β will both be very small (with the cutoff placed
between the two means).
[Figure: H0 and H1 curves drawn far apart, with the cutoff at 65]
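The effect of separation can be seen by moving the H1 curve. This sketch is a hypothetical setup, not the bags above: both curves have σ = 10, the H0 mean is fixed at 50, and the cutoff is placed halfway between the two means.

```python
import math

def normal_cdf(x, mu, sigma):
    """Area under the N(mu, sigma) curve to the left of x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def error_probs(mu0, mu1, sigma, cutoff):
    """alpha = P(X > cutoff | H0) and beta = P(X <= cutoff | H1),
    with X ~ N(mu0, sigma) under H0 and X ~ N(mu1, sigma) under H1."""
    alpha = 1 - normal_cdf(cutoff, mu0, sigma)
    beta = normal_cdf(cutoff, mu1, sigma)
    return alpha, beta

# Hypothetical sweep: move the H1 mean farther from the H0 mean of 50.
for mu1 in (55, 70, 90, 110):
    a, b = error_probs(50, mu1, 10, cutoff=(50 + mu1) / 2)
    print(f"H1 mean {mu1}: alpha = {a:.4f}, beta = {b:.4f}")
```

As the H1 mean moves from 55 to 110, α and β (equal here by symmetry) fall from about 0.40 to about 0.001.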