Chapter 7--prob dist

Random Variables and Probability Distribution
How can a sampling be used to determine critical
information relating to the population when
considering all subgroups of samples?
Random Variables 7-1
What type of random variables exist?
• A random variable is a numeric variable whose value
depends on the outcome of a chance experiment
• Discrete: the possible values are isolated points on a number line
(often integer values)
• Continuous: the possible values form an entire interval on the number line (real numbers)
Examples:
Determine whether each of the following is discrete or continuous:
• The number of defective tires on a car
• Someone’s body temperature
• The number of pages in a book
• The lifetime of a lightbulb
Probability Distributions for Discrete Random Variables 7-2
What is the probability distribution for discrete random variables?
• Probability distribution—gives the probability associated with each value
of x
• Where x represents each possible result of the experiment
• Example
• What is the sample space for flipping a coin 4 times? The probability
distribution can be based on the number of heads or the number of tails; in this
case we will count the number of tails.
Note the pattern used to help get all the possible outcomes (2^4 = 16):
1st column: 8 H’s then 8 T’s
2nd column: 4 H’s then 4 T’s, then repeat
3rd column: groups of 2
4th column: groups of 1

Outcome     Tail count
H H H H     0
H H H T     1
H H T H     1
H H T T     2
H T H H     1
H T H T     2
H T T H     2
H T T T     3
T H H H     1
T H H T     2
T H T H     2
T H T T     3
T T H H     2
T T H T     3
T T T H     3
T T T T     4

Prob of each event: .5·.5·.5·.5 = .0625

# of tails   # of outcomes
0            1
1            4
2            6
3            4
4            1

# of tails   # of occurrences   * probability   = overall probability
0            1                  .0625           .0625
1            4                  .0625           .25
2            6                  .0625           .375
3            4                  .0625           .25
4            1                  .0625           .0625
Total                                           1
∑ p(x) =1 for all distributions.
What is the probability of getting exactly one tail?
What is the probability of getting at most two tails?
What is the probability of getting at least two tails?
What is the probability of getting more than two tails?
What is the probability of getting at least one head?
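The five questions above can be checked by listing the sample space and adding up p(x). This is only a sketch: it assumes a fair coin, and the names `dist` and `prob` are made up for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# List all 2^4 equally likely outcomes of four fair coin flips (assumed fair).
outcomes = list(product("HT", repeat=4))

# Count how many outcomes have each number of tails.
counts = Counter(outcome.count("T") for outcome in outcomes)

# p(x) = (# of outcomes with x tails) / 16
dist = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

def prob(condition):
    """Add up p(x) over every tail count x that satisfies the condition."""
    return sum(p for x, p in dist.items() if condition(x))

exactly_one_tail = prob(lambda x: x == 1)
at_most_two_tails = prob(lambda x: x <= 2)
at_least_two_tails = prob(lambda x: x >= 2)
more_than_two_tails = prob(lambda x: x > 2)
at_least_one_head = prob(lambda x: x <= 3)   # at least one head = at most 3 tails

for x, p in dist.items():
    print(x, float(p))
```

Note that "at least one head" is rewritten as "at most 3 tails" because the distribution counts tails.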
• Of all the airline flight requests received by a certain discount
ticket broker, 70% are for domestic flights (D). The rest
are for international flights (I). Determine the probability
distribution of the number of domestic flights among the next three flights.

Flight 1   Flight 2   Flight 3   Dom flights
D          D          D          3
D          D          I          2
D          I          D          2
D          I          I          1
I          D          D          2
I          D          I          1
I          I          D          1
I          I          I          0

x       #    Prob per occurrence       P(X)
0       1    (.3)(.3)(.3) = .027       .027
1       3    (.7)(.3)(.3) = .063       .189
2       3    (.7)(.7)(.3) = .147       .441
3       1    (.7)(.7)(.7) = .343       .343
total                                  1

• Pg 356 1, 5, 7
• Pg 361 9, 11, 12, 15, 17, 19
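The flight table can be rebuilt by listing every D/I sequence and multiplying probabilities. This sketch assumes the three requests are independent; `domestic_distribution` is a hypothetical helper name, not something from the text.

```python
from itertools import product

P_DOMESTIC = 0.7                               # from the example
P = {"D": P_DOMESTIC, "I": 1 - P_DOMESTIC}

def domestic_distribution(n_flights=3):
    """Return {x: P(x domestic flights)} by listing every D/I sequence."""
    dist = {}
    for seq in product("DI", repeat=n_flights):
        # Assumption: requests are independent, so the probabilities multiply.
        prob = 1.0
        for flight in seq:
            prob *= P[flight]
        x = seq.count("D")
        dist[x] = dist.get(x, 0.0) + prob
    return dist

flight_dist = domestic_distribution()
for x in sorted(flight_dist):
    print(x, round(flight_dist[x], 3))
```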
Probability Distributions for Continuous Random Variables 7.3
What is the probability distribution for continuous random variables?
The Mean Value and Standard Deviation of a Random Variable 7.4
What is the mean and the standard deviation of a random variable?
The probability distribution for a continuous random variable is given by a density
function f(x), drawn as a smooth curve called the density curve.
f(x) ≥ 0, so the curve does not go below the x-axis, and the total area
under the density curve is equal to 1.
Uniform distribution: the density is constant over an interval (a
“flat” density curve), so probabilities are areas of rectangles:
Area = b * h (base times height)
Probability for a continuous random variable is often calculated using cumulative area
P(a< x <b) = (cumulative area left of b) – (cumulative area left of a)
Example:
A particular professor never dismisses class early. Let x denote the amount of time past
the hour (minutes) that elapses before the professor dismisses class. Suppose that x has
a uniform distribution on the interval from 0 to 10, as in the density “curve” below.
[Figure: flat density curve over the interval from 0 to 10]
A) What is the probability that at most 5 min. elapsed?
B) What is the probability that between 3 and 5 min elapsed before dismissal?
C) What is the probability that between 3 and 5 min inclusive elapsed before dismissal?
[Figure: density curve with axis ticks from 0 to 7, for shading]
Shade the area that represents the probability that:
1) between 2 and 6 min elapsed before dismissal
2) between 3 and 5 min inclusive elapsed before dismissal
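Parts A to C of the professor example come down to rectangle areas. This is only a sketch; `uniform_prob` and `uniform_cdf` are hypothetical helper names, and the defaults a = 0, b = 10 come from the example.

```python
def uniform_prob(lo, hi, a=0.0, b=10.0):
    """P(lo < x < hi) for x uniform on [a, b]: base * height of a rectangle."""
    height = 1.0 / (b - a)               # total area under the curve must be 1
    lo, hi = max(lo, a), min(hi, b)      # clip to where the density is positive
    return max(hi - lo, 0.0) * height

def uniform_cdf(x, a=0.0, b=10.0):
    """Cumulative area to the left of x."""
    return uniform_prob(a, x, a, b)

part_a = uniform_prob(0, 5)                 # at most 5 minutes
part_b = uniform_prob(3, 5)                 # between 3 and 5 minutes
part_c = uniform_prob(3, 5)                 # inclusive: the endpoints add no area
by_cdf = uniform_cdf(5) - uniform_cdf(3)    # part B again, via cumulative areas

print(part_a, part_b, part_c)
```

Parts B and C agree because a single point has zero area under a density curve, so including the endpoints does not change the probability.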
Pg 366-67 20, 22, 23, 24, 26
Mean value of a random variable: describes where the probability distribution
is centered (sometimes called the expected value)
$\mu_x = \sum x \, p(x)$
$\mu_x = E(x)$
Examples:
Individuals applying for a certain license are allowed up to four
attempts to pass the licensing exam. Let x denote the number of attempts
made by a randomly selected applicant. The probability distribution is as
follows.
x      1    2    3    4
P(x)   .1   .2   .3   .4
Calculate the expected value (average number of times it takes)
E(x) = µ = 1·.1 + 2·.2 + 3·.3 + 4· .4 = 3
What does this mean?
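The expected value calculation can be written as a short sum. This is a sketch: `license_dist` and `expected_value` are illustrative names, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction

# Distribution from the licensing-exam example: attempts -> probability
license_dist = {
    1: Fraction(1, 10),
    2: Fraction(2, 10),
    3: Fraction(3, 10),
    4: Fraction(4, 10),
}

def expected_value(dist):
    """mu_x = sum of x * p(x) over every possible value x."""
    return sum(x * p for x, p in dist.items())

mu = expected_value(license_dist)
print(float(mu))
```

The result is a long-run average: over many randomly selected applicants, the mean number of attempts approaches this value, even though no single applicant has to make exactly that many.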
Standard deviation of a random variable describes the variability in a probability
distribution.
$\sigma_x = \sqrt{\sum (x - \mu_x)^2 \, p(x)}$
The more observations you have, the closer the sample mean x̄ gets to µx and the sample standard deviation s gets to σx
Example:
Use the data and the answer from the last example to calculate the standard deviation.
x      1    2    3    4
P(x)   .1   .2   .3   .4
$\sigma_x = \sqrt{(1-3)^2(.1) + (2-3)^2(.2) + (3-3)^2(.3) + (4-3)^2(.4)} = 1$
Since µx = 3 and σx = 1:
What is the probability that x exceeds the mean value?
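A sketch of the standard deviation calculation and the "exceeds the mean" question for the same licensing-exam distribution. The helper names are made up for illustration; exact fractions keep the comparison x > µ free of rounding error.

```python
from fractions import Fraction
from math import sqrt

license_dist = {
    1: Fraction(1, 10),
    2: Fraction(2, 10),
    3: Fraction(3, 10),
    4: Fraction(4, 10),
}

def mean(dist):
    """mu_x = sum of x * p(x)."""
    return sum(x * p for x, p in dist.items())

def std_dev(dist):
    """sigma_x = sqrt( sum of (x - mu_x)^2 * p(x) )."""
    mu = mean(dist)
    return sqrt(sum((x - mu) ** 2 * p for x, p in dist.items()))

mu = mean(license_dist)
sigma = std_dev(license_dist)

# P(x > mu): add p(x) for every value strictly above the mean.
p_exceeds_mean = sum(p for x, p in license_dist.items() if x > mu)

print(float(mu), sigma, float(p_exceeds_mean))
```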
Pg 378-379 27, 29, 30, 33, 34, 35, 37
The Binomial and Geometric distributions 7.5
What is the difference between a binomial and a
geometric distribution?
Binomial Distribution:
•Has a predetermined number of independent trials, each dichotomous (2 possible results),
with the same probability of success on each
trial (that is, you sample with replacement or from a large enough population)
•The label of success is arbitrary and based on the question
•For example, in counting the number of female births:
n = the number of trials
P(s) = the probability that a female is born
x = the number of successes
$p(x) = \frac{n!}{x!(n-x)!} \, p^x (1-p)^{n-x}$
where p = the prob of success on any one trial
You may have seen this as
${}_nC_x \, p^x (1-p)^{n-x}$
or
$\binom{n}{x} p^x (1-p)^{n-x}$
Ex. 60% of watches sold by a large discount store have digital faces
the remaining 40% are analog. Twelve watches are sold today.
A) what is the probability that exactly 4 are digital?
n = 12, x = 4, p = .6
$P(4) = {}_{12}C_4 \, (.6)^4 (.4)^8$
B) what is the probability that between 4 and 7 inclusive were
digital?
Calculator version:
2nd, Vars (dist)
binomcdf (returns Prob ≤ x)
binompdf (returns Prob for x)
Trials: (how many trials)
p: (prob of success)
x value: (# of successful outcomes)
Enter, Enter
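For checking answers off the calculator, the two functions can be mirrored in a few lines. This is a sketch: the Python functions below simply borrow the calculator names binompdf and binomcdf and are not a library API.

```python
from math import comb

def binompdf(n, p, x):
    """P(exactly x successes in n trials): nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomcdf(n, p, x):
    """P(at most x successes in n trials)."""
    return sum(binompdf(n, p, k) for k in range(x + 1))

n, p = 12, 0.6                                          # watch example
p_exactly_4 = binompdf(n, p, 4)                         # part A
p_4_through_7 = binomcdf(n, p, 7) - binomcdf(n, p, 3)   # part B: P(4 <= x <= 7)

print(round(p_exactly_4, 4), round(p_4_through_7, 4))
```

Part B subtracts the cumulative probability at 3, not at 4, because "between 4 and 7 inclusive" must keep x = 4.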
NOTE: there is a table on p 776 that assists by giving pre-calculated
values based on the values of n and π (π is the probability of success, written p in these notes)
•When sampling without replacement, the events are no longer independent
•However, if n/N ≤ .05,
where n = number sampled and N = number in the population,
then the differences are so small that they can be, and often are, ignored, since the exact calculations
would be difficult to make
$\mu_x = np$
and
$\sigma_x = \sqrt{np(1-p)}$
Example:
It has been reported that one-third of all credit card users pay their bills in full each month. This
figure is, of course, an average across different cards and issuers. Suppose that 30% of all individuals
holding Visa cards issued by a certain bank pay in full each month. A random sample of n = 25 cardholders is
to be selected. The bank is interested in the variable x = number in the sample who pay in full each month. Even
though the sampling is done without replacement, the sample size of 25 is most likely very small compared
to the total number of credit cardholders, so we can approximate the probability distribution of x using a
binomial distribution with n = 25 and p = .3. We have defined paid in full as a success. Find the mean and
standard deviation.
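The mean and standard deviation for the credit card example follow directly from the binomial formulas. A minimal sketch, with `binomial_mean_sd` as a made-up helper name:

```python
from math import sqrt

def binomial_mean_sd(n, p):
    """Return (mu_x, sigma_x) = (n*p, sqrt(n*p*(1 - p))) for a binomial x."""
    return n * p, sqrt(n * p * (1 - p))

mu, sigma = binomial_mean_sd(25, 0.3)   # n = 25 cardholders, p = .3
print(round(mu, 2), round(sigma, 3))
```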
Geometric distributions do not have a specific number of
trials. Instead, they terminate when we have a successful
outcome.
$p(x) = (1-p)^{x-1} \, p$
x = number of trials to achieve a success
p = probability of success
Calculator version:
2nd, Vars (dist)
geometcdf (returns Prob ≤ x)
geometpdf (returns Prob for x)
p: (prob of success)
x value: (# of trials until a success)
Enter, Enter
Example:
You will ask people if they have jumper cables until someone
says yes. If they have them, they will let you use them. Assume
that p = .4 since 40% of those driving have jumper cables in
the car. Let x = the number of students who must be asked
until you find someone who has jumper cables. What is the
probability that x will be at most 3?
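The jumper-cable question is a cumulative geometric probability. In this sketch the Python functions borrow the calculator names geometpdf and geometcdf; they are illustrative helpers, not a library API.

```python
def geometpdf(p, x):
    """P(first success occurs on trial x) = (1 - p)^(x - 1) * p."""
    return (1 - p) ** (x - 1) * p

def geometcdf(p, x):
    """P(first success occurs on or before trial x)."""
    return sum(geometpdf(p, k) for k in range(1, x + 1))

p_at_most_3 = geometcdf(0.4, 3)      # P(x <= 3) with p = .4
shortcut = 1 - (1 - 0.4) ** 3        # complement: not all of the first 3 say no

print(round(p_at_most_3, 3), round(shortcut, 3))
```

The shortcut works because "at most 3" is the complement of "the first three people all say no".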
Homework pg 390-392 44, 48, 49, 52, 54, 56, 58, 60, 61
The Normal Distribution 7.6
What is the normal distribution?
Characteristics:
• The smaller the σ, the taller and narrower the distribution
• The points of inflection (where the concavity changes) lie 1σ on either side of the mean,
that is, at µ ± 1σ
A standard normal curve has µ = 0 and σ = 1;
this is usually called the z-curve (it relates to z scores)
• The words “largest x%” or “smallest x%” refer to an area on one side of the curve
• “The most extreme x%” refers to x/2% at each end
Any normal distribution can be converted to the standard normal curve using the z-score formula:
P(a < x < b) = P(a* < Z < b*)
where a and b are the given values and a* and b* are the converted values
(called standardizing)
Z-score formula
$z = \frac{x - \mu}{\sigma}$
Z values are inside the back cover or on your formula sheet
Please note that the value given in the chart is the percentage of
the data that falls BELOW the z value
Given that µ = 10 and σ = 2:
1. What % of the data would fall below 8?
2. What % of the data would fall above 11?
3. What % of the data would fall between 8 and 11?
4. What value separates the bottom 86%?
1. What % falls to the left of a z score of 2.18?
2. Determine the z* value that separates the largest 4% of the population
(the largest 4% means the cumulative area from .96 to 1)
3. Determine the z* values that separate the most extreme 4% of the population
(the most extreme 4% means the cumulative area from 0 to .02 and from .98 to 1)
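The table lookups above can be cross-checked with Python's standard library. This is a sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the variable names are illustrative. `cdf` gives the area BELOW a value, just like the z table.

```python
from statistics import NormalDist

x_dist = NormalDist(mu=10, sigma=2)   # µ = 10, σ = 2 from the notes
z_dist = NormalDist()                 # standard normal: µ = 0, σ = 1

# First set of questions (µ = 10, σ = 2)
below_8 = x_dist.cdf(8)
above_11 = 1 - x_dist.cdf(11)
between_8_and_11 = x_dist.cdf(11) - x_dist.cdf(8)
bottom_86_cutoff = x_dist.inv_cdf(0.86)

# Second set of questions (z-curve)
left_of_2_18 = z_dist.cdf(2.18)
largest_4_cutoff = z_dist.inv_cdf(0.96)
extreme_4_cutoffs = (z_dist.inv_cdf(0.02), z_dist.inv_cdf(0.98))

print(round(below_8, 4), round(above_11, 4), round(between_8_and_11, 4))
print(round(bottom_86_cutoff, 2), round(left_of_2_18, 4))
print(round(largest_4_cutoff, 2), [round(z, 2) for z in extreme_4_cutoffs])
```

Answers read from a printed table may differ in the last digit because the table rounds z to two decimal places.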
Homework Pg 406-409 64, 66, 68, 70, 72, 74, 75
Checking for Normality and Normalizing Transformations 7-7
How do you check for normality and apply normalizing
transformations?
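Used when you have univariate data that you want to check for normality (and normalize if needed).
1) The normal probability plot is a scatterplot of
(z-score, observed score) pairs
2) Normality is plausible if
-after placing the observed data in L2, you run 1-var stats on L2 and
then calculate the z-scores in L1 = (L2 - x̄)/σ
-you plot L1 vs L2 and get a linear pattern
-you turn on diagnostics and calculate the regression line
-r turns out to be greater than or equal to the critical r from the
table below, based on “n” (the number of items in the data set)
Note: z-scores computed directly as (L2 - x̄)/σ are a linear function of L2, so that plot is
always a perfect line. For the check to be informative, L1 must hold the normal scores (the
z-values expected for each rank in a normal sample of size n).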
n values:          5     10    15    20    25    30    40    50    60    75
Critical r values: .832  .880  .911  .929  .941  .949  .960  .966  .971  .976
r < critical r casts doubt on the plausibility of normality
3) Many tests do not go to the trouble of the above; they simply plot
a histogram and determine if it appears approximately normal. If
so, no transformation is needed. If not, transform L2 by taking
the square root, cube root, or the log, etc., and store the result in L3
-plot the histogram of L3 to determine if it is approximately
normal, using a window setting of .25 to 3.5 with x-scale .25
-if it is normal, do one-variable stats on L3 to get µx and σx
-let L4 = (L3 - x̄)/σ
-plot (L4, L2)
-get the linreg with r and compare it to the critical r
Together pg 374 #80
The following observations are DDT concentrations in the blood of 20
people.
24 26 30 35 35 38 39 40 40 41
42 52 56 58 61 75 79 88 102 42
Assuming no transformation is needed, calculate the normal scores,
find your r value, and check it against critical r.
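One way to work this example off the calculator is sketched below. It is an assumption-laden sketch: the normal scores are approximated with Blom's formula, (i - 0.375)/(n + 0.25), which is a swapped-in choice and may differ slightly from the normal scores tabulated in the textbook; `normal_scores` and `pearson_r` are hypothetical helper names.

```python
from math import sqrt
from statistics import NormalDist, mean

# DDT concentrations from the example (20 people)
ddt = [24, 26, 30, 35, 35, 38, 39, 40, 40, 41,
       42, 52, 56, 58, 61, 75, 79, 88, 102, 42]

def normal_scores(n):
    """Approximate normal scores for ranks 1..n using Blom's formula (assumed)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def pearson_r(xs, ys):
    """Correlation coefficient r between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

observed = sorted(ddt)                     # smallest value pairs with smallest score
scores = normal_scores(len(observed))
r = pearson_r(scores, observed)

CRITICAL_R_N20 = 0.929                     # from the critical r table, n = 20
print(round(r, 3), r >= CRITICAL_R_N20)
```

The data must be sorted before pairing, since each normal score belongs to a rank, not to the order in which the observations were recorded.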
Homework pg 415-418 84, show the work!
REVIEW
Chapter 7 review
What formulas are needed for chapter 7?
$\mu_x = \sum x \, p(x)$
$\sigma_x = \sqrt{\sum (x - \mu_x)^2 \, p(x)}$
If $y = a + bx$, then
$\mu_y = a + b\mu_x$
$\sigma_y^2 = b^2 \sigma_x^2$
$\sigma_y = |b| \, \sigma_x$
Discrete random variables: set up your chart and
calculate p(x)
Continuous random variables: many convert using z scores
to determine amounts in a given range
Binomial distribution (only 2 outcomes, set number of trials)
$p(x) = {}_nC_x \, p^x (1-p)^{n-x}$
$\mu_x = np$ and $\sigma_x = \sqrt{np(1-p)}$
Geometric distributions (experiment goes on until there is a success; outcomes are
success or failure)
$p(x) = (1-p)^{x-1} \, p$
Hypergeometric distribution (no replacement)
if $n/N \le .05$ (ie 5% or less sampled), µx, σx are the same as above
Z-scores
$z = \frac{x - \mu}{\sigma}$
normal plot (z-score, obs value)
Review Problems
• Pg 426-429: 106, 107, 108, 110, 114, 119, 121, 124
• SHOW ALL WORK: just answers is not acceptable
• TEST
• 15 mult. choice
• 4 computational
• Total about 60 points