Transcript Chapter 6
Chapter 6: Probability
Distributions
Section 6.1: How Can We Summarize
Possible Outcomes and Their
Probabilities?
1
Learning Objectives
1. Random variable
2. Probability distributions for discrete random
variables
3. Mean of a probability distribution
4. Summarizing the spread of a probability
distribution
5. Probability distribution for continuous
random variables
2
Learning Objective 1:
Randomness
The numerical values that a variable
assumes are the result of some random
phenomenon:
Selecting a random sample from a
population
or
Performing a randomized experiment
3
Learning Objective 1:
Random Variable
A random variable is a numerical
measurement of the outcome of a random
phenomenon.
4
Learning Objective 1:
Random Variable
Use letters near the end of the alphabet, such as x, to
symbolize
Variables
A particular value of the random variable
Use a capital letter, such as X, to refer to the random
variable itself.
Example: Flip a coin three times
X=number of heads in the 3 flips; defines the
random variable
x=2; represents a possible value of the random
variable
5
Learning Objective 2:
Probability Distribution
The probability distribution of a random
variable specifies its possible values and
their probabilities.
Note: It is the randomness of the variable
that allows us to specify probabilities for
the outcomes
6
Learning Objective 2:
Probability Distribution of a Discrete Random
Variable
A discrete random variable X has separate values
(such as 0,1,2,…) as its possible outcomes
Its probability distribution assigns a probability P(x) to
each possible value x:
For each x, the probability P(x) falls between 0
and 1
The sum of the probabilities for all the possible x
values equals 1
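The two conditions above can be checked mechanically. The sketch below is illustrative only: the helper name is_valid_distribution is hypothetical, and the example distribution (number of heads in 3 flips) assumes a fair coin, which the slides do not state.

```python
from fractions import Fraction


def is_valid_distribution(dist):
    """Return True if every P(x) lies in [0, 1] and the P(x) values sum to 1."""
    probs = list(dist.values())
    return all(0 <= p <= 1 for p in probs) and sum(probs) == 1


# X = number of heads in 3 flips of a coin (fairness is an assumption).
# Fractions keep the sum exact, so the comparison with 1 is safe.
heads_in_three_flips = {
    0: Fraction(1, 8),
    1: Fraction(3, 8),
    2: Fraction(3, 8),
    3: Fraction(1, 8),
}

print(is_valid_distribution(heads_in_three_flips))
```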
7
Learning Objective 2:
Example
What is the estimated probability of at least three
home runs?
P(3)+P(4)+P(5)=0.13+0.03+0.01=0.17
8
Learning Objective 3:
The Mean of a Discrete Probability Distribution
The mean of a probability distribution for a
discrete random variable is
µ = Σ x P(x)
where the sum is taken over all possible values
of x.
The mean of a probability distribution is denoted
by the parameter, µ.
The mean is a weighted average; values of x that
are more likely receive greater weight P(x)
9
Learning Objective 3:
Expected Value of X
The mean of a probability distribution of a
random variable X is also called the expected
value of X.
The expected value reflects not what we’ll
observe in a single observation, but rather what
we expect for the average in a long run of
observations.
It is not unusual for the expected value of a
random variable to equal a number that is NOT a
possible outcome.
10
Learning Objective 3:
Example
Find the mean of this probability distribution.
The mean:
µ = Σ x P(x)
= 0(0.23) + 1(0.38) + 2(0.22) + 3(0.13) +
4(0.03) + 5(0.01) = 1.38
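The weighted-average calculation can be reproduced in a few lines. This is a minimal sketch: the probabilities are read off the terms of the sum above, and the variable names are illustrative.

```python
import math

# P(x) for x = 0..5, taken from the terms of the sum above.
dist = {0: 0.23, 1: 0.38, 2: 0.22, 3: 0.13, 4: 0.03, 5: 0.01}

# The probabilities must sum to 1 (allowing for floating-point rounding).
assert math.isclose(sum(dist.values()), 1.0)

# Mean: each value x weighted by its probability P(x).
mean = sum(x * p for x, p in dist.items())

# Probability of at least three: add P(x) over x = 3, 4, 5.
p_at_least_3 = sum(p for x, p in dist.items() if x >= 3)

print(round(mean, 2))          # 1.38
print(round(p_at_least_3, 2))  # 0.17
```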
11
Learning Objective 4:
The Standard Deviation of a Probability
Distribution
The standard deviation of a probability
distribution, denoted by the parameter, σ,
measures its spread.
Larger values of σ correspond to greater
spread.
Roughly, σ describes how far the random
variable falls, on the average, from the mean
of its distribution
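The slides describe σ only informally. As a sketch, assuming the standard definition σ = √(Σ (x - µ)² P(x)) (a formula not given in this transcript), the spread of the distribution from the mean example can be computed as follows.

```python
import math

# Same distribution as in the mean example (x = 0..5).
dist = {0: 0.23, 1: 0.38, 2: 0.22, 3: 0.13, 4: 0.03, 5: 0.01}

mu = sum(x * p for x, p in dist.items())

# Variance: probability-weighted average of squared distances from the mean.
variance = sum((x - mu) ** 2 * p for x, p in dist.items())
sigma = math.sqrt(variance)

print(f"mean = {mu:.2f}, standard deviation = {sigma:.2f}")
```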
12
Learning Objective 5:
Continuous Random Variable
A continuous random variable has an
infinite continuum of possible values in an
interval.
Examples are: time, age and size
measures such as height and weight.
Continuous variables are measured in a
discrete manner because of rounding.
13
Learning Objective 5:
Probability Distribution of a Continuous Random
Variable
A continuous random variable has possible
values that form an interval.
Its probability distribution is specified by a curve.
Each interval has probability between 0 and 1.
The interval containing all possible values has
probability equal to 1.
14
Chapter 6: Probability
Distributions
Section 6.2: How Can We Find
Probabilities for Bell-Shaped
Distributions?
15
Learning Objectives
1. Normal Distribution
2. 68-95-99.7 Rule for normal distributions
3. Z-Scores and the Standard Normal
Distribution
4. The Standard Normal Table: Finding
Probabilities
5. Using the TI-calculator: find probabilities
16
Learning Objectives
6. Using the Standard Normal Table in Reverse
7. Using the TI-calculator: find z-scores
8. Probabilities for Normally Distributed
Random Variables
9. Percentiles for Normally Distributed Random
Variables
10. Using Z-scores to Compare Distributions
17
Learning Objective 1:
Normal Distribution
The normal distribution is symmetric, bell-shaped and characterized by its mean µ and
standard deviation σ.
The normal distribution is the most important
distribution in statistics
Many variables have approximately normal
distributions
Approximates many discrete distributions well
when there are a large number of possible
outcomes
Many statistical methods use it even when the
data are not bell shaped
18
Learning Objective 1:
Normal Distribution
Normal distributions are
Bell shaped
Symmetric around the mean
The mean (µ) and the standard deviation (σ)
completely describe the density curve
Increasing/decreasing µ moves the curve
along the horizontal axis
Increasing/decreasing σ controls the spread of
the curve
19
Learning Objective 1:
Normal Distribution
Within what interval do almost all of the men’s
heights fall? Women’s height?
20
Learning Objective 2:
68-95-99.7 Rule for Any Normal Curve
68% of the observations fall within one standard deviation of the
mean
95% of the observations fall within two standard deviations of
the mean
99.7% of the observations fall within three standard deviations
of the mean
21
Learning Objective 2:
Example : 68-95-99.7% Rule
Heights of adult women
can be approximated by a normal distribution
µ = 65 inches; σ = 3.5 inches
68-95-99.7 Rule for women’s heights
68% are between 61.5 and 68.5 inches
[ µ ± σ = 65 ± 3.5 ]
95% are between 58 and 72 inches
[ µ ± 2σ = 65 ± 2(3.5) = 65 ± 7 ]
99.7% are between 54.5 and 75.5 inches
[ µ ± 3σ = 65 ± 3(3.5) = 65 ± 10.5 ]
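The three intervals follow directly from µ and σ. A short sketch, with a hypothetical helper name, that reproduces the numbers for women's heights:

```python
def empirical_rule_intervals(mu, sigma):
    """Return {k: (mu - k*sigma, mu + k*sigma)} for k = 1, 2, 3."""
    return {k: (mu - k * sigma, mu + k * sigma) for k in (1, 2, 3)}


# Women's heights in inches, from the example above.
intervals = empirical_rule_intervals(mu=65.0, sigma=3.5)

for k, share in zip((1, 2, 3), ("68%", "95%", "99.7%")):
    low, high = intervals[k]
    print(f"{share} of heights fall between {low} and {high} inches")
```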
22
Learning Objective 2:
Example : 68-95-99.7% Rule
What proportion of women are less than 69
inches tall?
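(by 68-95-99.7 Rule)
68% of heights lie within one standard deviation of the mean (-1 to +1, that is, 61.5 to 68.5 inches), leaving 16% in each tail
So the proportion below 68.5 inches (about 69) is 68% + 16% = 84%
[Figure: normal curve over height values, marked at 65 and 68.5]
23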
Learning Objective 3:
Z-Scores and the Standard Normal Distribution
The z-score for a value x of a random variable is
the number of standard deviations that x falls from
the mean
z = (x - µ) / σ
A negative (positive) z-score indicates that the
value is below (above) the mean
z-scores can be used to calculate the probabilities
of a normal random variable using the normal
tables in the back of the book
24
Learning Objective 3:
Z-Scores and the Standard Normal Distribution
A standard normal distribution has mean µ=0
and standard deviation σ=1
When a random variable has a normal
distribution and its values are converted to z-scores by subtracting the mean and dividing
by the standard deviation, the z-scores have
the standard normal distribution.
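A quick simulation can make this claim concrete. The sketch below draws heights from a normal distribution with µ = 65 and σ = 3.5 (the values from the earlier heights example), converts them to z-scores, and checks that the z-scores have mean near 0 and standard deviation near 1. The sample size and seed are arbitrary choices.

```python
from statistics import NormalDist, mean, stdev

mu, sigma = 65.0, 3.5

# Simulated heights; the seed only makes the run repeatable.
heights = NormalDist(mu, sigma).samples(10_000, seed=1)

# Standardize: subtract the mean, divide by the standard deviation.
z_scores = [(x - mu) / sigma for x in heights]

z_mean = mean(z_scores)
z_sd = stdev(z_scores)
print(f"z-score mean = {z_mean:.3f}, z-score sd = {z_sd:.3f}")
```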
25
Learning Objective 4:
Table A: Standard Normal Probabilities
Table A enables us to find normal probabilities
It tabulates the normal cumulative probabilities
falling below the point +z
To use the table:
Find the corresponding z-score
Look up the closest standardized score (z) in
the table.
First column gives z to the first decimal place
First row gives the second decimal place of z
The corresponding probability found in the
body of the table gives the probability of falling
below the z-score
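To see how the table layout maps to a z-score, the sketch below rebuilds one row of a cumulative table like Table A: the row label is z to one decimal place, and the ten entries correspond to second decimals .00 through .09. It assumes Python's statistics.NormalDist as a stand-in for the printed table; the helper name is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def table_a_row(z_first_decimal):
    """Cumulative probabilities for z_first_decimal + 0.00, 0.01, ..., 0.09."""
    return [
        round(standard_normal.cdf(round(z_first_decimal + k / 100, 2)), 4)
        for k in range(10)
    ]


row = table_a_row(1.4)
print(row[3])  # entry for z = 1.43 (row 1.4, column .03)
```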
26
Learning Objective 4:
Example: Using Table A
Find the probability that a normal random variable
takes a value less than 1.43 standard deviations
above µ; P(z<1.43)=.9236
TI Calculator: normalcdf(-1E99,1.43,0,1) = .9236
27
Learning Objective 4:
Example: Using Table A
Find the probability that a normal random variable takes a
value greater than 1.43 standard deviations above µ:
P(z>1.43)=1-.9236=.0764
TI Calculator: normalcdf(1.43,1E99,0,1) = .0764
28
Learning Objective 4:
Example:
Find the probability that a normal random variable
assumes a value within 1.43 standard deviations of µ
Probability below 1.43σ = .9236
Probability below -1.43σ = .0764 (1-.9236)
P(-1.43<z<1.43) =.9236-.0764=.8472
TI Calculator: normalcdf(-1.43,1.43,0,1) = .8473 (the table-based .8472 differs only by rounding)
29
Learning Objective 5:
Using the TI Calculator
To calculate the cumulative probability
2nd DISTR; 2:normalcdf(lower bound, upper
bound,mean,sd)
Use –1E99 for negative infinity and 1E99 for
positive infinity
30
Learning Objective 5:
Find Probabilities Using TI Calculator
Find probability to the left of -1.64
P(z<-1.64)=normalcdf(-1E99,-1.64,0,1)=.0505
Find probability to the right of 1.56
P(z>1.56)=normalcdf(1.56,1E99,0,1)=.0594
Find probability between -.50 and 2.25
P(-.5<z<2.25)=normalcdf(-.5,2.25,0,1)=.6792
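For readers without a TI calculator, the same three probabilities can be sketched in Python. The helper normal_cdf_between is a hypothetical name that mirrors the calculator's argument order (lower bound, upper bound, mean, sd); it relies on statistics.NormalDist from the standard library and uses ±1E99 for the infinite bounds, as the slides do.

```python
from statistics import NormalDist


def normal_cdf_between(lower, upper, mean=0.0, sd=1.0):
    """P(lower < X < upper) for a normal X with the given mean and sd."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)


left_of = normal_cdf_between(-1e99, -1.64)   # area to the left of -1.64
right_of = normal_cdf_between(1.56, 1e99)    # area to the right of 1.56
between = normal_cdf_between(-0.5, 2.25)     # area between -0.50 and 2.25

print(round(left_of, 4), round(right_of, 4), round(between, 4))
```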
31
Learning Objective 6:
How Can We Find the Value of z for a Certain
Cumulative Probability?
To solve some of our problems, we will need
to find the value of z that corresponds to a
certain normal cumulative probability
To do so, we use Table A in reverse
Rather than finding z using the first column
(value of z up to one decimal) and the first row
(second decimal of z), find the probability in the
body of the table
The z-score is given by the corresponding values
in the first column and row
32
Learning Objective 6:
How Can We Find the Value of z for a Certain
Cumulative Probability?
Example: Find the value of z for a cumulative
probability of 0.025.
Look up the cumulative probability of 0.025 in the
body of Table A.
A cumulative probability of 0.025 corresponds to z
= -1.96.
Thus, the probability that a normal
random variable falls at least 1.96
standard deviations below the
mean is 0.025.
33
Learning Objective 6:
How Can We Find the Value of z for a Certain
Cumulative Probability?
Example: Find the value of z for a cumulative
probability of 0.975.
Look up the cumulative probability of 0.975 in the
body of Table A.
A cumulative probability of 0.975 corresponds to z
= 1.96.
Thus, the probability that a normal
random variable takes a value no more
than 1.96 standard deviations above
the mean is 0.975.
34
Learning Objective 7:
Using the TI Calculator to Find Z-Scores for a
Given Probability
2nd DISTR 3:invNorm; Enter
invNorm(percentile,mean,sd)
Percentile is the probability under the curve
from negative infinity to the z-score
Enter
35
Learning Objective 7:
Examples
The probability that a standard normal random
variable assumes a value that is ≤ z is 0.975. What
is z? invNorm(.975,0,1)=1.96
The probability that a standard normal random
variable assumes a value that is > z is 0.025.
What is z? invNorm(1-.025,0,1)=invNorm(.975,0,1)=1.96
The probability that a standard normal random
variable assumes a value that is ≥ z is 0.881.
What is z? invNorm(1-.881,0,1)=-1.18
The probability that a standard normal random
variable assumes a value that is < z is 0.119.
What is z? invNorm(.119,0,1)= -1.18
36
Learning Objective 7:
Example
Find the z-score z such that the probability
within z standard deviations of the mean is
0.50.
invNorm(.75,0,1)= .67
invNorm(.25,0,1)= -.67
Probability = P(-.67<Z<.67)=.5
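The invNorm lookups can be sketched with the inverse cumulative distribution function in Python's standard library. The wrapper name inv_norm is hypothetical. Its first argument is the area to the left, so a right-tail area must be subtracted from 1 first, just as on the calculator.

```python
from statistics import NormalDist


def inv_norm(area_left, mean=0.0, sd=1.0):
    """Value whose cumulative probability (area to its left) is area_left."""
    return NormalDist(mean, sd).inv_cdf(area_left)


z_a = inv_norm(0.975)      # P(Z <= z) = 0.975
z_b = inv_norm(1 - 0.881)  # P(Z >= z) = 0.881, so the area to the left is 0.119
z_c = inv_norm(0.75)       # the middle 50% lies between -z_c and +z_c

print(round(z_a, 2), round(z_b, 2), round(z_c, 2))
```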
37
Learning Objective 8:
Finding Probabilities for Normally Distributed
Random Variables
1. State the problem in terms of the observed
random variable X, i.e., P(X<x)
2. Standardize X to restate the problem in
terms of a standard normal variable Z
P(X < x) = P(Z < (x - µ)/σ) = P(Z < z)
3. Draw a picture to show the desired
probability under the standard normal curve
4. Find the area under the standard normal
curve using Table A
38
Learning Objective 8:
P(X<x)
Adult systolic blood pressure is normally
distributed with µ = 120 and σ = 20. What
percentage of adults have systolic blood pressure
less than 100?
P(X<100) = P(Z < (100 - 120)/20) = P(Z < -1.00) = .1587
normalcdf(-1E99,100,120,20) = .1587
15.9% of adults have systolic blood pressure less
than 100
39
Learning Objective 8:
P(X>x)
Adult systolic blood pressure is normally distributed
with µ = 120 and σ = 20. What percentage of adults have
systolic blood pressure greater than 100?
P(X>100) = 1 – P(X<100)
P(X<100) = P(Z < (100 - 120)/20) = P(Z < -1.00) = .1587
P(X>100) = 1 - .1587 = .8413
normalcdf(100,1E99,120,20) = .8413
84.1% of adults have systolic blood pressure greater than
100
40
Learning Objective 8:
P(X>x)
Adult systolic blood pressure is normally distributed
with µ = 120 and σ = 20. What percentage of adults have
systolic blood pressure greater than 133?
P(X>133) = 1 – P(X<133)
P(X<133) = P(Z < (133 - 120)/20) = P(Z < .65) = .7422
P(X>133) = 1 - .7422 = .2578
normalcdf(133,1E99,120,20) = .2578
25.8% of adults have systolic blood pressure greater than
133
41
Learning Objective 8:
P(a<X<b)
Adult systolic blood pressure is normally distributed
with µ = 120 and σ = 20. What percentage of adults have
systolic blood pressure between 100 and 133?
P(100<X<133) = P(X<133)-P(X<100)
= P(Z < (133 - 120)/20) - P(Z < (100 - 120)/20)
= P(Z < .65) - P(Z < -1.00) = .7422 - .1587 = .5835
normalcdf(100,133,120,20) = .5835
58% of adults have systolic blood pressure between 100
and 133
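The standardize-then-look-up procedure used in the blood pressure examples can be sketched as follows. Here statistics.NormalDist() stands in for Table A, and the function name prob_below is illustrative.

```python
from statistics import NormalDist

mu, sigma = 120.0, 20.0          # systolic blood pressure, from the examples
standard_normal = NormalDist()   # mean 0, standard deviation 1


def prob_below(x):
    """P(X < x): standardize x, then use the standard normal curve."""
    z = (x - mu) / sigma
    return standard_normal.cdf(z)


p_below_100 = prob_below(100)                   # P(X < 100)
p_above_133 = 1 - prob_below(133)               # P(X > 133)
p_between = prob_below(133) - prob_below(100)   # P(100 < X < 133)

print(round(p_below_100, 4), round(p_above_133, 4), round(p_between, 4))
```

Standardizing first gives the same answers as working with the original mean and standard deviation directly, which is why the calculator results match the table results.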
42
Learning Objective 9:
Find X Value Given Area to Left
Adult systolic blood pressure is normally distributed
with µ = 120 and σ = 20. What is the 1st quartile?
P(X<x)=.25, find x:
Look up .25 in the body of Table A to find z= -0.67
Solve equation to find x:
x = µ + zσ = 120 + (-0.67)(20) = 106.6
Check:
P(X<106.6) = P(Z<-0.67) = 0.25
TI Calculator: invNorm(.25,120,20) = 106.5 (the calculator uses the unrounded z = -0.6745)
43
Learning Objective 9:
Find X Value Given Area to Right
Adult systolic blood pressure is normally distributed
with µ = 120 and σ = 20. 10% of adults have systolic blood
pressure above what level?
P(X>x)=.10, find x.
P(X>x)=1-P(X<x)
Look up 1-0.1=0.9 in the body of Table A to find z=1.28
Solve equation to find x:
x = µ + zσ = 120 + (1.28)(20) = 145.6
Check:
P(X>145.6) = P(Z>1.28) = 0.10
TI Calculator: invNorm(.9,120,20) = 145.6
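Both percentile problems follow the same recipe: find z for the area to the left, then convert back with x = µ + zσ. A minimal sketch, with a hypothetical helper name, and using the unrounded z rather than the two-decimal table value:

```python
from statistics import NormalDist

mu, sigma = 120.0, 20.0  # systolic blood pressure, from the examples


def value_at_area_left(area_left):
    """x such that P(X < x) = area_left, via x = mu + z * sigma."""
    z = NormalDist().inv_cdf(area_left)
    return mu + z * sigma


first_quartile = value_at_area_left(0.25)      # 25% of adults fall below this
top_10_cutoff = value_at_area_left(1 - 0.10)   # 10% of adults fall above this

print(round(first_quartile, 1), round(top_10_cutoff, 1))
```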
44
Learning Objective 10:
Using Z-scores to Compare Distributions
Z-scores can be used to compare observations from
different normal distributions
Example:
You score 650 on the SAT, which has µ=500 and
σ=100, and 30 on the ACT, which has µ=21.0 and
σ=4.7. On which test did you perform better?
Compare z-scores
SAT: z = (650 - 500)/100 = 1.5
ACT: z = (30 - 21)/4.7 = 1.91
Since your z-score is greater for the ACT, you
performed better on this exam
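The comparison amounts to two z-score calculations. A small sketch using the SAT and ACT figures from the example; the function name is illustrative.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x falls from the mean."""
    return (x - mu) / sigma


sat_z = z_score(650, mu=500, sigma=100)
act_z = z_score(30, mu=21.0, sigma=4.7)

better = "ACT" if act_z > sat_z else "SAT"
print(f"SAT z = {sat_z:.2f}, ACT z = {act_z:.2f}, better relative score: {better}")
```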
45
Chapter 6: Probability
Distributions
Section 6.3: How Can We Find
Probabilities When Each Observation Has
Two Possible Outcomes?
46
Learning Objectives
1. The Binomial Distribution
2. Conditions for a Binomial Distribution
3. Probabilities for a Binomial Distribution
4. Factorials
5. Examples using Binomial Distribution
6. Do the Binomial Conditions Apply?
7. Mean and Standard Deviation of the Binomial
Distribution
8. Normal Approximation to the Binomial
47
Learning Objective 1:
The Binomial Distribution
Each observation is binary: it has one of two
possible outcomes.
Examples:
Accept, or decline an offer from a bank for a credit
card.
Have, or do not have, health insurance.
Vote yes or no on a referendum.
48
Learning Objective 2:
Conditions for the Binomial Distribution
Each of n trials has two possible outcomes:
“success” or “failure”.
Each trial has the same probability of success,
denoted by p.
The n trials are independent.
The binomial random variable X is the number of
successes in the n trials.
49
Learning Objective 3:
Probabilities for a Binomial Distribution
Denote the probability of success on a trial by
p.
For n independent trials, the probability of x
successes equals:
P(x) = [n! / (x!(n - x)!)] p^x (1 - p)^(n-x), x = 0, 1, 2, ..., n
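The formula translates almost directly into code. In this sketch, math.comb(n, x) computes n!/(x!(n - x)!), and the function name binomial_pmf is illustrative. The coin example (3 flips, p = 0.5) is an assumed illustration, not taken from this slide.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Number of heads in 3 flips of a fair coin (assumed example).
coin = [binomial_pmf(x, 3, 0.5) for x in range(4)]
print(coin)

# Over x = 0, 1, ..., n the probabilities should sum to 1.
total = sum(binomial_pmf(x, 10, 0.3) for x in range(11))
print(round(total, 10))
```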
50
Learning Objective 4:
Factorials
Rules for factorials:
n!=n*(n-1)*(n-2)…2*1
1!=1
0!=1
For example,
4!=4*3*2*1=24
51
Learning Objective 5:
Example: Finding Binomial Probabilities
John Doe claims to possess ESP.
An experiment is conducted:
A person in one room picks one of the integers 1,
2, 3, 4, 5 at random.
In another room, John Doe identifies the number
he believes was picked.
Three trials are performed for the experiment.
Doe got the correct answer twice.
52
Learning Objective 5:
Example 1
If John Doe does not actually have ESP and is
actually guessing the number, what is the
probability that he’d make a correct guess on two
of the three trials?
The three ways John Doe could make two correct
guesses in three trials are: SSF, SFS, and FSS.
Each of these has probability: (0.2)^2(0.8)=0.032.
The total probability of two correct guesses is
3(0.032)=0.096.
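The counting argument can be verified by brute force: list every sequence of three trials, keep those with exactly two successes, and add their probabilities. This sketch assumes independent guesses with success probability 0.2, as in the example.

```python
from itertools import product

p_success = 0.2  # chance of guessing one of five integers correctly


def sequence_probability(seq):
    """Probability of one specific sequence such as 'SSF' under independence."""
    prob = 1.0
    for outcome in seq:
        prob *= p_success if outcome == "S" else 1 - p_success
    return prob


# All 2^3 sequences of S (success) and F (failure), filtered to two successes.
two_correct = ["".join(s) for s in product("SF", repeat=3) if s.count("S") == 2]
total = sum(sequence_probability(s) for s in two_correct)

print(two_correct, round(total, 3))
```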
53
Learning Objective 5:
Example 1
The probability of exactly 2 correct guesses is the binomial
probability with n = 3 trials, x = 2 correct guesses and p = 0.2
probability of a correct guess.
P(2) = [3! / (2!1!)] (0.2)^2 (0.8)^1 = 3(0.04)(0.8) = 0.096
2nd VARS
0:binompdf(n,p,x)
binompdf(3,.2,2)=0.096
54
Learning Objective 5:
Binomial Example 2
1000 employees, 50% Female
None of the 10 employees chosen for management
training were female.
The probability that no females are chosen is:
P(0) = [10! / (0!10!)] (0.50)^0 (0.50)^10 = 0.001
binompdf(10,.5,0)=9.765625E-4
It is very unlikely (one chance in a thousand) that none of the
10 selected for management training would be female if the
employees were chosen randomly
55
Learning Objective 6:
Do the Binomial Conditions Apply?
Before using the binomial distribution,
check that its three conditions apply:
Binary data (success or failure).
The same probability of success for each
trial (denoted by p).
Independent trials.
56
Learning Objective 6:
Do the Binomial Conditions Apply to Example 2?
The data are binary (male, female).
If employees are selected randomly, the
probability of selecting a female on a given trial
is 0.50.
With random sampling of 10 employees from a
large population, the outcome of one trial does
not depend on the outcome of another trial
57
Learning Objective 7:
Binomial Mean and Standard Deviation
The binomial probability distribution for n trials
with probability p of success on each trial has
mean µ and standard deviation σ given by:
µ = np, σ = √(np(1 - p))
58
Learning Objective 7:
Example: Racial Profiling?
Data:
262 police car stops in Philadelphia in 1997.
207 of the drivers stopped were African-American.
In 1997, Philadelphia’s population was 42.2%
African-American.
Does the number of African-Americans
stopped suggest possible bias, being
higher than we would expect (other things
being equal, such as the rate of violating
traffic laws)?
59
Learning Objective 7:
Example: Racial Profiling?
Assume:
262 car stops represent n = 262 trials.
Successive police car stops are
independent.
P(driver is African-American) is p = 0.422.
Calculate the mean and standard deviation
of this binomial distribution:
µ = np = 262(0.422) ≈ 111
σ = √(np(1 - p)) = √(262(0.422)(0.578)) ≈ 8
60
Learning Objective 7:
Example: Racial Profiling?
Recall: Empirical Rule
When a distribution is bell-shaped, close to
100% of the observations fall within 3
standard deviations of the mean.
µ - 3σ = 111 - 3(8) = 87
µ + 3σ = 111 + 3(8) = 135
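The mean, standard deviation, and three-standard-deviation range for this example can be recomputed without intermediate rounding. The sketch below uses the figures from the slides (n = 262, p = 0.422, 207 observed); the z-score line is an added illustration.

```python
import math

n, p = 262, 0.422   # number of stops, assumed probability per stop
observed = 207      # African-American drivers actually stopped

mu = n * p
sigma = math.sqrt(n * p * (1 - p))

# Range within three standard deviations of the mean.
low, high = mu - 3 * sigma, mu + 3 * sigma

# How many standard deviations the observed count lies above the mean.
z = (observed - mu) / sigma

print(f"mean = {mu:.1f}, sd = {sigma:.1f}, range = {low:.0f} to {high:.0f}, z = {z:.1f}")
```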
61
Learning Objective 7:
Example: Racial Profiling?
If there is no racial profiling, we would not be
surprised if between about 87 and 135 of the 262
drivers stopped were African-American.
The actual number stopped (207) is well above these
values.
The number of African-Americans stopped is too high,
even taking into account random variation.
Limitation of the analysis:
Different people do different amounts of
driving, so we don’t really know that 42.2%
of the potential stops were African-American.
62
Learning Objective 8:
Approximating the Binomial Distribution with
the Normal Distribution
The binomial distribution can be well
approximated by the normal distribution
when the expected number of successes,
np, and the expected number of failures,
n(1-p) are both at least 15.
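The rule of thumb is easy to encode. A sketch with a hypothetical helper name, applied to the three binomial examples from this section:

```python
def normal_approximation_ok(n, p, minimum=15):
    """True when both np and n(1 - p) are at least `minimum`."""
    return n * p >= minimum and n * (1 - p) >= minimum


# Police stops: np is about 111 and n(1 - p) is about 151.
stops_ok = normal_approximation_ok(262, 0.422)

# Management training: np = 5, too few expected successes.
training_ok = normal_approximation_ok(10, 0.5)

# ESP experiment: np = 0.6.
esp_ok = normal_approximation_ok(3, 0.2)

print(stops_ok, training_ok, esp_ok)
```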
63