StatInference

Chapter 7: Normal Probability Distributions
March 16
In Chapter 7:
7.1 Normal Distributions
7.2 Determining Normal Probabilities
7.3 Finding Values That Correspond to Normal Probabilities
7.4 Assessing Departures from Normality
§7.1: Normal Distributions
• Normal random variables are the most common type of continuous random variable
• First described by de Moivre in 1733
• Laplace elaborated the mathematics in 1812
• They describe some (not all) natural phenomena
• More importantly, they describe the behavior of sample means
Normal Probability Density Function
• Recall that continuous random variables are described with smooth probability density functions (pdfs) – Ch 5
• Normal pdfs are recognized by their familiar bell shape
This is the age distribution of a pediatric population. The overlying curve represents its Normal pdf model.
Area Under the Curve
• The darker bars of the histogram correspond to ages less than or equal to 9 (~40% of observations)
• The corresponding darker area under the curve also covers ages less than or equal to 9 (~40% of the total area)
Normal pdf:
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} $$
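The pdf formula can be evaluated directly. A minimal Python sketch, assuming Python 3.8+ for `statistics.NormalDist`; the helper name `normal_pdf` is hypothetical, and the standard library's `NormalDist.pdf` serves only as a cross-check.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Evaluate the Normal density f(x) for mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Cross-check against the standard library's implementation of the same density
for x in (-1.0, 0.0, 0.5, 2.0):
    assert math.isclose(normal_pdf(x, 0.0, 1.0), NormalDist(0.0, 1.0).pdf(x))

print(round(normal_pdf(0.0, 0.0, 1.0), 4))  # peak height of the standard Normal curve: 0.3989
```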
Parameters μ and σ
• Normal pdfs are a family of distributions
• Family members are identified by the parameters μ (mean) and σ (standard deviation)
μ controls location
σ controls spread
Mean and Standard Deviation of Normal Density
[Figure: Normal density curve with μ and σ labeled]
Standard Deviation σ
• Points of inflection (where the curve switches from bending downward to bending upward and its slope begins to level off) occur one σ below and one σ above μ
• Practice sketching Normal curves to get a feel for the inflection points
• Practice labeling the horizontal axis of curves with standard deviation markers
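To check the inflection-point claim numerically, the sketch below approximates the second derivative (curvature) of a Normal density with a central finite difference. It assumes Python 3.8+ (`statistics.NormalDist`); `second_derivative` and `curvature_profile` are hypothetical helpers, and N(100, 15) is only an illustrative choice. A negative value means the curve bends downward, a positive value means it bends upward, and a value near zero marks an inflection point.

```python
from statistics import NormalDist

def second_derivative(f, x: float, h: float) -> float:
    """Central-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def curvature_profile(mu: float, sigma: float) -> dict:
    """Approximate f'' of the N(mu, sigma) density at mu, mu ± sigma, and mu ± 2 sigma."""
    pdf = NormalDist(mu, sigma).pdf
    h = sigma * 1e-3  # step scaled to the spread of the curve
    points = {
        "mu": mu,
        "mu - sigma": mu - sigma,
        "mu + sigma": mu + sigma,
        "mu - 2 sigma": mu - 2 * sigma,
        "mu + 2 sigma": mu + 2 * sigma,
    }
    return {name: second_derivative(pdf, x, h) for name, x in points.items()}

for name, value in curvature_profile(100, 15).items():
    # negative = bending downward, positive = bending upward, ~0 = inflection point
    print(f"{name:>12}: {value:+.2e}")
```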
68-95-99.7 Rule for Normal Distributions
• 68% of the AUC (area under the curve) falls within ±1σ of μ
• 95% of the AUC falls within ±2σ of μ
• 99.7% of the AUC falls within ±3σ of μ
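The rule's percentages are rounded. A quick check with the standard Normal cdf (Python 3.8+ `statistics.NormalDist`; `area_within` is a hypothetical helper) gives the unrounded areas. Because every Normal curve has the same shape once measured in σ-units, the standard Normal is enough.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal: mu = 0, sigma = 1

def area_within(k: float) -> float:
    """Area under the Normal curve within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(k, f"{area_within(k):.4f}")  # 1 0.6827 / 2 0.9545 / 3 0.9973
```

The exact areas are about 68.27%, 95.45%, and 99.73%; the interval holding exactly 95% is μ ± 1.96σ, the value 1.96 that appears later in this chapter.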
Example: 68-95-99.7 Rule
Wechsler adult intelligence scores are Normally distributed with μ = 100 and σ = 15; X ~ N(100, 15). Using the 68-95-99.7 rule:
• 68% of scores fall in μ ± σ = 100 ± 15 = 85 to 115
• 95% of scores fall in μ ± 2σ = 100 ± (2)(15) = 70 to 130
• 99.7% of scores fall in μ ± 3σ = 100 ± (3)(15) = 55 to 145
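The interval arithmetic above is simple enough to script. A sketch assuming Python 3.9+ (for the `tuple[float, float]` hint); `rule_interval` is a hypothetical helper:

```python
def rule_interval(mu: float, sigma: float, k: int) -> tuple[float, float]:
    """Return the interval mu ± k*sigma used by the 68-95-99.7 rule."""
    return (mu - k * sigma, mu + k * sigma)

coverage = {1: "68%", 2: "95%", 3: "99.7%"}
for k, pct in coverage.items():
    low, high = rule_interval(100, 15, k)  # Wechsler scores: mu = 100, sigma = 15
    print(f"{pct} of scores fall in {low:g} to {high:g}")
```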
Symmetry in the Tails
Because the Normal curve is symmetrical and the total AUC adds to 1, we can determine the AUC in the tails. For example, because 95% of the curve is in μ ± 2σ, 2.5% is in each tail beyond μ ± 2σ.
Example: Male Height
• Male height is approximately Normal with μ = 70.0″ and σ = 2.8″
• By the 68-95-99.7 rule, 68% of the population is in the range 70.0″ ± 2.8″ = 67.2″ to 72.8″
• Because the total AUC adds to 100%, 32% are in the tails below 67.2″ and above 72.8″
• Because of symmetry, half of this 32% (i.e., 16%) is below 67.2″ and 16% is above 72.8″
[Figure: Normal curve of male height centered at 70″, with 68% of the area between 67.2″ and 72.8″ and 16% in each tail]
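The symmetry argument can be written out step by step and then compared with exact Normal areas. A sketch assuming Python 3.8+ `statistics.NormalDist`; the rounded 68% comes from the rule, while the exact tail areas come from the cdf:

```python
from statistics import NormalDist

height = NormalDist(mu=70.0, sigma=2.8)  # male height in inches, per the example

center = 0.68             # 68-95-99.7 rule: area within mu ± 1 sigma
tails = 1.0 - center      # total area outside, because the AUC adds to 1
each_tail = tails / 2     # symmetry splits it evenly
print(f"rule of thumb: {each_tail:.0%} below 67.2 and {each_tail:.0%} above 72.8")

# Exact areas from the Normal cdf, for comparison with the rounded rule
below = height.cdf(67.2)
above = 1.0 - height.cdf(72.8)
print(f"exact: {below:.4f} below, {above:.4f} above")
```

The exact tails are about 15.87% each, slightly under the rule-of-thumb 16% because the rule rounds 68.27% down to 68%.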
Reexpression of Non-Normal Variables
• Many biostatistical variables are not Normal
• We can reexpress non-Normal variables with a mathematical transformation to make them more Normal
• Examples of mathematical transformations include logarithms, exponents, square roots, and so on
• Let us review the logarithmic transformation
Logarithms
• Logarithms are exponents of their base
• There are two main logarithmic bases
– common log10 (base 10)
– natural ln (base e)
Landmarks:
• log10(1) = 0 (because 10⁰ = 1)
• log10(10) = 1 (because 10¹ = 10)
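Python's `math` module provides both bases, and the antilog is just exponentiation. A small sketch (`antilog` is a hypothetical helper):

```python
import math

def antilog(value: float, base: float = math.e) -> float:
    """Undo a logarithm: return base raised to the power value."""
    return base ** value

print(math.log10(1), math.log10(10))        # landmarks: 0.0 1.0
print(math.log(math.e))                     # natural log of e
print(antilog(math.log10(250.0), base=10))  # round trip back to ~250 (floating-point rounding)
```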
Example: Logarithmic Re-expression
• Prostate-specific antigen (PSA) is not Normal in 60-year-olds, but ln(PSA) is approximately Normal with μ = −0.3 and σ = 0.8
• 95% of ln(PSA) falls in μ ± 2σ = −0.3 ± (2)(0.8) = −1.9 to 1.3
• Thus, 2.5% are above ln(PSA) = 1.3; take the antilog of 1.3: e^1.3 = 3.67
Since only 2.5% of the population has PSA values greater than 3.67, use this as the cut-point for suspiciously high results.
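The same calculation as a sketch, assuming Python 3.8+ `statistics.NormalDist` and the μ = −0.3, σ = 0.8 model stated above:

```python
import math
from statistics import NormalDist

ln_psa = NormalDist(mu=-0.3, sigma=0.8)  # model for ln(PSA) from the example

# 68-95-99.7 rule: about 95% of ln(PSA) lies within mu ± 2 sigma
low = ln_psa.mean - 2 * ln_psa.stdev
high = ln_psa.mean + 2 * ln_psa.stdev
print(f"ln(PSA) range: {low:.1f} to {high:.1f}")  # -1.9 to 1.3

# Back-transform the upper limit to the original PSA scale (antilog)
cut_point = math.exp(high)
print(f"PSA cut-point: {cut_point:.2f}")  # 3.67
```

Because the rule uses μ ± 2σ rather than μ ± 1.96σ, the area above this cut-point is about 2.3% rather than exactly 2.5%.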
§7.2: Determining Normal Probabilities
To determine a Normal probability when the
value does not fall directly on a ±1σ, ±2σ,
or ±3σ landmark, follow this procedure:
1. State the problem
2. Standardize the value (z score)
3. Sketch and shade the curve
4. Use Table B to determine the probability
Example: Normal Probability
Step 1. Statement of Problem
• We want to determine the percentage of human
gestations that are less than 40 weeks in length
• We know that uncomplicated human pregnancy
from conception to birth is approximately
Normally distributed with μ = 39 weeks and σ =
2 weeks. [Note: clinicians measure gestation from last
menstrual period to birth, which adds 2 weeks to the μ.]
• Let X represent human gestation: X ~ N(39, 2)
• Statement of the problem: Pr(X ≤ 40) = ?
Standard Normal (Z) Variable
• Standard Normal variable ≡ a Normal random variable with μ = 0 and σ = 1
• Called “Z variables”
• Notation: Z ~ N(0,1)
• Use Table B to look up cumulative
probabilities
• Part of Table B shown on next slide…
Example: A Standard Normal (Z) variable with a value of 1.96 has a cumulative probability of .9750.
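Software can stand in for Table B. Assuming Python 3.8+, `statistics.NormalDist().cdf` returns the same left-tail cumulative probability the table lists:

```python
from statistics import NormalDist

Z = NormalDist()    # standard Normal: mu = 0, sigma = 1
p = Z.cdf(1.96)     # cumulative probability Pr(Z <= 1.96)
print(f"{p:.4f}")   # 0.9750, matching Table B
```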
Normal Probability
Step 2. Standardize
To standardize, subtract μ and divide by σ:
$$ z = \frac{x - \mu}{\sigma} $$
The z-score tells you the number of σ-units the value falls above or below μ.
For example, the value 40 from X ~ N(39, 2) has
$$ z = \frac{40 - 39}{2} = 0.5 $$
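The standardization formula as a one-line function (the name `z_score` is hypothetical):

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of sigma-units that x lies above (+) or below (-) mu."""
    return (x - mu) / sigma

print(z_score(40, 39, 2))  # 0.5: half a sigma above the mean
print(z_score(35, 39, 2))  # -2.0: two sigmas below the mean
```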
Steps 3 & 4. Sketch and Use Table B
3. Sketch and label the axes
4. Use Table B to look up Pr(Z ≤ 0.5) = 0.6915
Therefore Pr(X ≤ 40) = 0.6915: about 69% of gestations are 40 weeks or less.
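A sketch that reaches the same answer two ways, assuming Python 3.8+ `statistics.NormalDist`: standardizing first and using the standard Normal cdf (the Table B route), or asking N(39, 2) for the cdf directly.

```python
from statistics import NormalDist

gestation = NormalDist(mu=39, sigma=2)  # X ~ N(39, 2), in weeks

# Route 1: standardize, then use the standard Normal cdf (what Table B tabulates)
z = (40 - gestation.mean) / gestation.stdev
p_table = NormalDist().cdf(z)

# Route 2: ask N(39, 2) directly; equivalent to standardizing first
p_direct = gestation.cdf(40)

print(f"{p_table:.4f} {p_direct:.4f}")  # both 0.6915
```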
Probabilities Between Two Points
Let a represent the lower boundary and b represent the upper boundary of a range:
$$ \Pr(a \le Z \le b) = \Pr(Z \le b) - \Pr(Z \le a) $$
Use of this concept will be demonstrated in class and in HW exercises.
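A sketch of the between-two-points formula, assuming Python 3.8+ `statistics.NormalDist`. The helper `prob_between` is hypothetical, and the boundaries used below are illustrative, not taken from the slides:

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal

def prob_between(a: float, b: float, dist: NormalDist = Z) -> float:
    """Pr(a <= X <= b) = Pr(X <= b) - Pr(X <= a)."""
    if a > b:
        raise ValueError("lower boundary a must not exceed upper boundary b")
    return dist.cdf(b) - dist.cdf(a)

print(f"{prob_between(-1.96, 1.96):.4f}")               # 0.9500
print(f"{prob_between(37, 41, NormalDist(39, 2)):.4f}")  # gestations within 1 sigma of 39 weeks
```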
§7.3 Finding Values Corresponding to Normal Probabilities
1. State the problem.
2. Use Table B to look up the z percentile value.
3. Sketch.
4. Unstandardize with this formula:
$$ x = \mu + z_{p}\,\sigma $$
Looking up the z percentile value
• Use Table B to look up the z percentile value, i.e., the z score for the probability in question
• Look inside the table for the entry closest to the associated cumulative probability
• Then trace the z score to the row and column labels
Suppose you wanted the 97.5th percentile z score. Look inside the table for .9750. Then trace the z score to the margins.
Notation: Let z_p represent the z score with cumulative probability p, e.g., z_.975 = 1.96
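The "look inside the table for the closest entry" step can be mimicked by scanning a grid of z values. This sketch assumes a Table B that runs from −3.49 to 3.49 in steps of .01 (a common layout, but an assumption here); `table_lookup_z` is hypothetical. `NormalDist.inv_cdf` (Python 3.8+) gives the exact inverse for comparison.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal

def table_lookup_z(p: float) -> float:
    """Mimic Table B in reverse: scan z = -3.49 ... 3.49 in steps of .01 and
    return the z whose cumulative probability is closest to p."""
    grid = [round(-3.49 + 0.01 * i, 2) for i in range(699)]
    return min(grid, key=lambda z: abs(Z.cdf(z) - p))

print(table_lookup_z(0.975))       # 1.96, as found in the table
print(round(Z.inv_cdf(0.975), 4))  # 1.96 from the exact inverse cdf
```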
Finding Normal Values Example
Suppose we want to know what gestational length is shorter than 97.5% of all gestations.
Step 1. State the problem!
Let X represent gestation length
The prior problem established X ~ N(39, 2)
We want the gestation length that is shorter than .975 of all gestations. This is equivalent to the gestation length that is longer than .025 of gestations.
Example, cont.
Step 2. Use Table B to look up the z value.
Table B lists only “left tails”.
“Shorter than 97.5% of gestations” (97.5% in the right tail) = “longer than 2.5% of gestations” (2.5% in the left tail).
The entry .0250 sits in row –1.9, column .06 of the table, so z_.025 = −1.96:
z      .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
–1.9   .0287  .0281  .0274  .0268  .0262  .0256  .0250  .0244  .0239  .0233
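The tabled row can be regenerated from the standard Normal cdf. A sketch assuming Python 3.9+ and the usual left-tail layout in which, for a negative row label, column .0j adds j hundredths to the magnitude (row −1.9, column .06 means z = −1.96); `table_b_row` is hypothetical:

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal

def table_b_row(row: float) -> list[str]:
    """Recreate one row of a left-tail Normal table: Pr(Z <= z) for columns .00 ... .09.
    For a negative row label, each column moves z further left."""
    step = -0.01 if row < 0 else 0.01
    return [f"{Z.cdf(row + step * j):.4f}".lstrip("0") for j in range(10)]

print(" ".join(table_b_row(-1.9)))  # .0287 .0281 ... .0233
```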
3. Sketch
4. Unstandardize
$$ x = \mu + z_{p}\,\sigma = 39 + (-1.96)(2) = 35.08 \approx 35 $$
“The 2.5th percentile gestation is 35 weeks.”
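The unstandardizing step as code, assuming Python 3.8+ `statistics.NormalDist`. `unstandardize` is a hypothetical helper; `inv_cdf` skips the table and gives the exact percentile, which agrees to two decimals here:

```python
from statistics import NormalDist

def unstandardize(z_p: float, mu: float, sigma: float) -> float:
    """x = mu + z_p * sigma: convert a z percentile back to the original scale."""
    return mu + z_p * sigma

x_table = unstandardize(-1.96, 39, 2)       # using the Table B value z_.025 = -1.96
x_exact = NormalDist(39, 2).inv_cdf(0.025)  # exact inverse cdf, no table rounding
print(f"{x_table:.2f} {x_exact:.2f}")       # 35.08 35.08
```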
§7.4: Assessing Departures from Normality
The best way to assess Normality is graphically.
[Figure: Approximately Normal histogram, and a Normal “Q-Q” plot of the same distribution]
A Normal distribution will adhere to a diagonal line on the Q-Q plot.
Negative Skew
A negative skew will show an upward curve on the Q-Q plot.
Positive Skew
A positive skew will show a downward curve on the Q-Q plot.
[Figure: Same data as the previous slide, but with a logarithmic transform]
A mathematical transform can Normalize a skewed distribution.
Leptokurtic
A leptokurtic distribution (tall, narrow peak with heavy tails) will show an S-shape on the Q-Q plot.