Transcript biostat 4

Random Variables…
• A random variable is a symbol that represents the
outcome of an experiment.
• Alternatively, the value of a random variable can be
either numerical [one we will concentrate on] or
categorical data
• “the number of heads when flipping a coin 10 times”
• “the time it takes a doctor to complete an operation”
• “the number of infections last week at a hospital”
Two Types of Random Variables…
• Discrete Random Variable
– one that takes on a countable number of values
– E.g. the sum on a roll of two dice: 2, 3, 4, …, 12
[normally “count” type data]
• Continuous Random Variable
– one whose values are not discrete, not countable
– E.g. time (30.1 minutes? 30.10000001 minutes?)
[normally measurement type data]
• Analogy: Integers are Discrete, while Real Numbers are
Continuous
Probability Distributions…
• A probability distribution is a table, formula, or graph
that describes the values of a random variable and the
probability associated with these values.
• Since we’re describing a random variable (which can
be discrete or continuous) we have two types of
probability distributions:
– Discrete Probability Distribution, and
– Continuous Probability Distribution
Probability Notation…
• An upper-case letter will represent the name of the
random variable, usually X.
• Its lower-case counterpart will represent the value of the
random variable.
• The probability that the random variable X will equal x is
written P(X = x), for example:
P(X = 6) = 0.57
• or more simply P(x), for example:
P(6) = 0.57
Discrete Probability
Distributions…
• The probabilities of the values of a discrete random
variable may be derived by means of probability tools
such as tree diagrams or by applying one of the
definitions of probability, so long as these two conditions
apply:
1. 0 ≤ P(x) ≤ 1 for every value x
2. The probabilities P(x) sum to 1 over all values of x
• Example: X = # Classes Missed

X    P(X)    Cumulative
0    0.40    0.40
1    0.30    0.70
2    0.20    0.90
3    0.05    0.95
4    0.05    1.00
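The check below is a minimal sketch, not part of the original slides. It takes the # Classes Missed distribution, tests the two standard conditions for a discrete probability distribution (each probability between 0 and 1, and all probabilities summing to 1), and rebuilds the Cumulative column as a running total. The variable names are illustrative assumptions.

```python
from itertools import accumulate

# Distribution of X = number of classes missed (values taken from the slide).
values = [0, 1, 2, 3, 4]
probs = [0.40, 0.30, 0.20, 0.05, 0.05]

# Condition 1: every probability lies between 0 and 1.
all_in_range = all(0.0 <= p <= 1.0 for p in probs)

# Condition 2: the probabilities sum to 1 (allowing for floating-point rounding).
sums_to_one = abs(sum(probs) - 1.0) < 1e-9

# Cumulative column: running total of P(X), i.e. P(X <= x), rounded to 2 places.
cumulative = [round(c, 2) for c in accumulate(probs)]

print(all_in_range, sums_to_one)
print(dict(zip(values, cumulative)))
```

The running total is exactly how the Cumulative column is read: the entry for x = 2 is the probability of missing 2 or fewer classes.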
Population Mean (Expected Value)
• The population mean is the weighted average of all of its values.
The weights are the probabilities.
• This parameter is also called the expected value of X and is
represented by E(X).
X    P(X)    X*P(X)
0    0.40    0.00
1    0.30    0.30
2    0.20    0.40
3    0.05    0.15
4    0.05    0.20
μ = sum of X*P(X) = 1.05
Population Variance…
• The population variance is calculated similarly. It
is the weighted average of the squared
deviations from the mean.
X    P(X)    X*P(X)    (X - μ)²    (X - μ)² * P(X)
0    0.40    0.00      1.1025      0.4410
1    0.30    0.30      0.0025      0.0008
2    0.20    0.40      0.9025      0.1805
3    0.05    0.15      3.8025      0.1901
4    0.05    0.20      8.7025      0.4351
μ = sum of X*P(X) = 1.05
σ² = sum of (X - μ)² * P(X) = 1.2475
σ = SQRT(1.2475) = 1.1169
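As a rough cross-check of the table arithmetic, the sketch below recomputes the mean, variance, and standard deviation of the # Classes Missed distribution straight from the weighted-average definitions. It is an illustration only; the variable names are assumptions.

```python
import math

# X = number of classes missed and its probabilities (from the slide).
values = [0, 1, 2, 3, 4]
probs = [0.40, 0.30, 0.20, 0.05, 0.05]

# Population mean: probability-weighted average of the values.
mu = sum(x * p for x, p in zip(values, probs))

# Population variance: probability-weighted average of squared deviations.
variance = sum((x - mu) ** 2 * p for x, p in zip(values, probs))

# Standard deviation: square root of the variance.
sigma = math.sqrt(variance)

print(round(mu, 4), round(variance, 4), round(sigma, 4))
```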
Laws of Expected Value…NOT IN TEXT
1. E(c) = c
• The expected value of a constant (c) is just the value
of the constant.
2. E(X + c) = E(X) + c
3. E(cX) = cE(X)
4. E(c1X1 + c2X2 + c3X3 + c4X4 + c5X5)
= c1E(X1) + c2E(X2) + c3E(X3) + c4E(X4) + c5E(X5)
• Example: what is the expected mean weight of a
surgical pack containing 5 components [maybe we
could weigh the pack to determine if one of the
components is missing].
• Law 4 holds whether or not the random variables are
independent. It is the matching law of variance that
requires independence.
Laws of Variance…NOT IN TEXT
1. V(c) = 0
• The variance of a constant (c) is zero.
2. V(X + c) = V(X)
• The variance of a random variable plus a constant is
just the variance of the random variable (per 1 above).
3. V(cX) = c²V(X)
• The variance of a random variable times a constant
coefficient is the coefficient squared times the variance
of the random variable.
4. V(c1X1 + c2X2 + c3X3 + c4X4 + c5X5)
= c1²V(X1) + c2²V(X2) + c3²V(X3) + c4²V(X4) + c5²V(X5)
• Law 4 is true when the random variables are independent.
Example: Convert Celsius to Fahrenheit
• Patient temperature (Celsius) data collected over
the last 5 years resulted in
• μc = 100 and σc² = 0.4
• Your boss wants these numbers in Fahrenheit
• F = (9/5) * C + 32
• μF = (9/5) * μc + 32 = {(9/5) * 100} + 32 = 180 + 32 = 212
• σF² = (9/5)² * σc² = 3.24 * (0.4) = 1.296
• σF = SQRT(1.296) = 1.138
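The conversion can be sketched in a few lines using the laws E(aX + b) = aE(X) + b and V(aX + b) = a²V(X). The helper name linear_transform_stats is hypothetical, introduced only for this illustration. Note that the coefficient 9/5 is squared in the variance, so the variance scales by 3.24 and the added 32 drops out.

```python
import math

def linear_transform_stats(mean_x, var_x, a, b):
    """Return (mean, variance) of Y = a*X + b.

    Uses E(aX + b) = a*E(X) + b and V(aX + b) = a**2 * V(X).
    """
    return a * mean_x + b, a ** 2 * var_x

# Celsius summary values from the slide: mean 100, variance 0.4.
mean_f, var_f = linear_transform_stats(100, 0.4, 9 / 5, 32)
sd_f = math.sqrt(var_f)

print(round(mean_f, 3), round(var_f, 3), round(sd_f, 3))
```

The same helper applies to any change of units or fixed offset, for example correcting every reading by a constant amount (a = 1, b = the offset).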
HOMEWORK
Exercises:
4.2.1
4.2.3
Extra Problem: A study of the weights of 12-year-old
children resulted in a mean weight of 112 lbs. and a
variance of 16 lbs². After the study was finished,
someone noticed that the scales were not zeroed out
and all the data showed a child’s weight 2 lbs. heavier
than they actually weighed [100 lbs. should have been
98 lbs.]. What are the mean and variance of the actual
weights of the children?
HW Problem: Function of Random Variables
• Administrators at a local hospital decided it would be
more efficient to assign a given task to one nurse rather
than have this task performed by several nurses.
Studies have shown that the average time to complete
this task is 15 minutes with a standard deviation of 2
minutes.
• The head nurse calculated the number of tasks a nurse
could expect to complete in an 8-hour shift to be 32. [15
min/task = 4 tasks/hour = 32 tasks/8 hr shift]. Since
nurses have no break time, the head nurse assigned
Nurse Wilson 30 of these tasks for tomorrow.
• *What is the expected time to complete all 30 tasks?
• *What is the std. dev. of the time to complete all 30
tasks? Note: will have to calculate variance first.
• *Do you think Nurse Wilson will be able to complete all 30
tasks in her 8 hr shift?
Flip a coin 5 times!
• Your Professor told you that if he flipped a
coin 5 times and observed any tails, the
class could go home early. However, if he
observed all 5 heads, the class had to stay
15 minutes after class.
• Describe this experiment [tell me
everything you know about this game]
Binomial Distribution…
• The binomial distribution is the probability distribution
that results from doing a “binomial experiment”.
“A NATURAL DISTRIBUTION”
• Binomial experiments have the following properties:
1. Fixed number of trials, represented as n.
2. Each trial has two possible outcomes, a “success” and
a “failure”.
3. P(success)=p (and thus: P(failure)=1–p), for all trials.
4. The trials are independent, which means that the
outcome of one trial does not affect the outcomes of
any other trials.
Binomial Random Variable…
• Experiment: flip a fair coin 5 times…
• Random Variable: X = # heads
• Want to know: P(# heads = 5) = P(5) = ???
– 1) Fixed number of trials: n = 5
– 2) Each trial has two possible outcomes: {heads (success), tails
(failure)}
– 3) P(head) = 0.50; P(tail) = 1 – 0.50 = 0.50
– 4) The trials are independent (i.e. the outcome of heads on the first
flip will have no impact on subsequent coin flips).
• Hence flipping a coin 5 times is a binomial experiment since all conditions
were met.
Binomial Random Variable…
• The binomial random variable counts the number of successes in n
trials of the binomial experiment. It can take on values from 0, 1, 2,
…, n. Thus, it is a discrete random variable.
• In the old days we had to use the binomial formula (or binomial
tables) below, but now we can calculate using Excel statistical
functions.
• The Binomial Distribution [formula]:
P(x) = [n! / (x!(n – x)!)] * p^x * (1 – p)^(n – x)
for x = 0, 1, 2, …, n
• Coin flip problem: X = # heads, n = 5, p = 0.5, (1 – p) = 0.5
• P(You stay late) = P(X = 5) = ?
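For readers who want to evaluate the binomial formula directly, here is a minimal sketch using only the Python standard library. The function name binomial_pmf is an assumption made for this example. It implements the standard formula P(x) = C(n, x) * p^x * (1 – p)^(n – x) and applies it to the coin flip problem.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and P(success) = p."""
    # comb(n, x) is the number of ways to choose which x of the n trials succeed.
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Coin flip problem: X = # heads, n = 5, p = 0.5.
p_stay_late = binomial_pmf(5, 5, 0.5)

# All six probabilities P(X = 0), ..., P(X = 5); they should sum to 1.
all_probs = [binomial_pmf(x, 5, 0.5) for x in range(6)]

print(p_stay_late)
print(all_probs, sum(all_probs))
```

With five fair flips the only way to get five heads is HHHHH, so the result is just 0.5 multiplied by itself five times.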
Binomial Tables
• Tables in Text
TABLE B Cumulative Binomial Probability Distribution (n = 5)
x\p    …    0.5       …
0      …    0.0312    …
1      …    0.1875    …
2      …    0.5000    …
3      …    0.8125    …
4      …    0.9688    …
5      …    1.0000    …
• How do you get individual probabilities?
Individual Probabilities (n = 5):
x\p    …    0.5       …
0      …    0.0312    …
1      …    0.1563    …
2      …    0.3125    …
3      …    0.3125    …
4      …    0.1563    …
5      …    0.0312    …
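One way to get individual probabilities from a cumulative table is to subtract neighboring entries: P(X = x) = P(X ≤ x) – P(X ≤ x – 1), with P(X = 0) equal to the first cumulative entry. The sketch below applies this to the n = 5, p = 0.5 column of Table B. The variable names are illustrative assumptions.

```python
# Cumulative probabilities P(X <= x) for n = 5, p = 0.5 (Table B column).
cumulative = [0.0312, 0.1875, 0.5000, 0.8125, 0.9688, 1.0000]

# P(X = 0) is the first cumulative entry; every later entry is the
# difference between consecutive cumulative values, rounded to 4 places.
individual = [cumulative[0]] + [
    round(cumulative[x] - cumulative[x - 1], 4)
    for x in range(1, len(cumulative))
]

for x, prob in enumerate(individual):
    print(x, prob)
```

Because the table entries are rounded to four places, individual values recovered this way can differ from the exact formula in the last digit.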
=BINOMDIST() Excel Function…
• There is a binomial distribution function in Excel that can also be used to calculate these
probabilities. For example:
• What is the probability that you get two answers correct guessing on a 10-question
multiple-choice test with 5 options for each question?
=BINOMDIST(# successes, # trials, P(success), cumulative)
– # successes = 2
– # trials = 10
– P(success) = 1/5 = 0.2
– cumulative (i.e. P(X≤x)?) = FALSE
P(X=2)=.3020
Binomial Distribution…
• Statisticians have developed general formulas for the mean,
variance, and standard deviation of a binomial random variable.
They are:
μ = n*p
σ² = n*p*(1-p)
σ = SQRT[n*p*(1-p)]
• These are the true parameter values for this random variable.
• For our coin flipping example:
μ = n*p = 5*(0.5) = 2.5
σ² = n*p*(1-p) = 5*(0.5)*(0.5) = 1.25
σ = SQRT[n*p*(1-p)] = SQRT[5*(0.5)*(0.5)] = SQRT[1.25] = 1.118
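The shortcut formulas can be checked against the general weighted-average definitions of mean and variance. The sketch below does this for the coin flipping example (n = 5, p = 0.5). It is an illustration under that assumption, not a proof, and the variable names are made up for the example.

```python
from math import comb, sqrt

n, p = 5, 0.5

# Full binomial distribution: P(X = x) for x = 0, 1, ..., n.
pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

# Mean and variance from the general definitions for a discrete variable.
mean_def = sum(x * px for x, px in enumerate(pmf))
var_def = sum((x - mean_def) ** 2 * px for x, px in enumerate(pmf))

# Mean, variance, and std. dev. from the binomial shortcut formulas.
mean_formula = n * p
var_formula = n * p * (1 - p)
sd_formula = sqrt(var_formula)

print(mean_def, mean_formula)
print(var_def, var_formula, round(sd_formula, 3))
```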
WORK IN CLASS: Binomial
• Given a binomial random variable with n = 15 and p = .25, find the
following probabilities
– P(X = 5) =
– P(X < 5) =
– P(3 < X < 5) =
HW: Binomial
• Exercise: 4.3.1(also calculate the variance
and standard deviation for exercise 4.3.1),
4.3.2, Review Question 15
Poisson Distribution…
• Named for Siméon Poisson, the Poisson distribution is a discrete
probability distribution and refers to the number of events (a.k.a. “things that
occur”) within a specific time period or region of space [“a sample unit”]. For
example:
• The number of cars arriving at a service station in 1 hour. (The interval
of time is 1 hour.)
• The number of flaws in one square foot of cloth. (The specific region
is one square foot of cloth.)
• The number of accidents in 1 day on a particular stretch of highway.
(The interval is defined by both time, 1 day, and space, the particular
stretch of highway.)
• The number of infections at a hospital in one week.
• The number of critters in a bottle of Coke.
NOTE: these random variables MAY or MAY NOT be Poisson random
variables. We have ways to test the data to see if Poisson would be an
appropriate distribution to use for that example.
The Poisson Experiment…
• Like a binomial experiment, a Poisson experiment
has four defining characteristic properties:
1. The number of successes that occur in any interval
[sample unit] is independent of the number of
successes that occur in any other interval [sample unit].
2. The probability of a success in an interval is the same
for all equal-size intervals
3. The probability of a success is proportional to the size
of the interval.
4. The probability of more than one success in an interval
approaches 0 as the interval becomes smaller.
Poisson Probability Distribution…
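• The probability that a Poisson random variable assumes a value of x is
given by:
P(x) = e^(–μ) * μ^x / x!   for x = 0, 1, 2, …
• where μ is the mean number of successes in the interval [sample unit]
and e is the natural logarithm base.
• This text will use λ instead of μ, or
P(x) = e^(–λ) * λ^x / x!   for x = 0, 1, 2, …
• FYI: μ = σ² = λ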
Use Poisson to Approximate Binomial Probabilities
• We often work problems which are “actually” binomial
but we use the Poisson.
• Condition: If the binomial sample size is very large and
the probability of success is very small, it is easier to use
the Poisson [normally n*p < 5].
• Example: The probability of an infection in a hospital is
known to be 0.0012. You sample the last 1000 patients
that stayed in your hospital and wish to calculate the
probability that fewer than 6 will have an infection [mean
of binomial: μ = n*p = 1000*(.0012) = 1.2]
• Try to work this problem using the binomial!
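With software, the exact binomial calculation is no longer hard, which makes it possible to see how close the Poisson approximation comes. The sketch below computes P(X < 6) = P(X ≤ 5) both ways for the infection example (n = 1000, p = 0.0012, so λ = n*p = 1.2). The variable names are assumptions for this illustration.

```python
from math import comb, exp, factorial

n, p = 1000, 0.0012
lam = n * p  # mean of the binomial, used as the Poisson parameter

# Exact binomial: sum P(X = x) for x = 0, ..., 5 ("fewer than 6").
binom_cdf_5 = sum(
    comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(6)
)

# Poisson approximation with the same mean.
poisson_cdf_5 = sum(
    exp(-lam) * lam ** x / factorial(x) for x in range(6)
)

print(round(binom_cdf_5, 4), round(poisson_cdf_5, 4))
print(abs(binom_cdf_5 - poisson_cdf_5))
```

The two answers agree closely here because n is large and p is small, which is the condition the slide gives for using the approximation.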
Using Excel to Calculate Poisson Probabilities
• Example: x = 5, λ = 6, cumulative = TRUE [e.g. =POISSON(5, 6, TRUE)]:
P(X ≤ 5) = .4457
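The value .4457 is the cumulative probability P(X ≤ 5) for λ = 6. The probability of exactly 5 events is smaller. The sketch below computes both from the Poisson formula. The function names poisson_pmf and poisson_cdf are hypothetical helpers written for this example, not Excel or library functions.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam ** x / factorial(x)

def poisson_cdf(x, lam):
    """P(X <= x): sum of the individual probabilities for 0, 1, ..., x."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

exactly_5 = poisson_pmf(5, 6)   # individual probability
at_most_5 = poisson_cdf(5, 6)   # cumulative probability

print(round(exactly_5, 4))  # 0.1606
print(round(at_most_5, 4))  # 0.4457
```

The cumulative value is the one that matches a cumulative Poisson table entry for x = 5, λ = 6.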
Cumulative Poisson Distribution Tables
• You can find Poisson tables in the back of
your text for select values of λ.
• Verify Excel answer using Poisson Tables
TABLE C Cumulative Poisson Probability Distribution
x\λ         …    6        …
0           …    0.002    …
1           …    0.017    …
2           …    0.062    …
3           …    0.151    …
4           …    0.285    …
5           …    0.446    …
6           …    0.606    …
…           …    …        …
infinity    …    1.000    …
Back to the Poisson Approximation to the Binomial.
• Example: The probability of an infection in a hospital is
known to be 0.0012. You sample the last 1000 patients
that stayed in your hospital and wish to calculate the
probability that fewer than 6 will have an infection [mean
of binomial: μ = n*p = 1000*(.0012) = 1.2]
• Use Poisson Tables to approximate this probability (λ = 1.2)
• Answer: P(X < 6) = P(X ≤ 5) = ???
• If you actually had 6 infections for the last 1000 patients who
stayed in your hospital, what would you tell the chief hospital
administrator?
• If you only sampled 500 patients, what would λ now be equal
to?
Students Work in Class: Poisson
• The number of infections [X] in a hospital each week has
been shown to follow a Poisson distribution with a mean of
3.0 infections per week. Calculate the following
probabilities.
• P(X = 0) =
• P(X < 8) =
• P(X > 9) =
• If you found 9 infections next week, what would you
say??
Homework for Students: Poisson
Extra Problem:
• With infections running wild in many hospitals, the chief administrator
of Local Hospital decided to find out how Local Hospital stacks up
against the national norm [national norm states that the average
number of bacteria per square yard of surface area should be no more
than 9 bacteria/square yard]. The number of bacteria per square yard
is assumed to be a Poisson random variable.
–*If you go into the hospital, randomly sample one square yard of surface area, and
count the number of bacteria found, calculate the probability of finding 19 or fewer
bacteria.
–*If you actually found 15 bacteria, what would you conclude about the state of the
hospital?
–*In order to continuously monitor the state of the hospital, it was decided to randomly
sample one square foot of surface area each day to ensure that the hospital is being
cleaned properly [takes too much time to sample 1 square yard]. If you do this,
what would the mean of the Poisson be in this case?
TEXTBOOK EXERCISES: 4.4.3 and 4.4.5, Review Question 17
Continuous Probability Distributions
• A function f(x) is called a continuous probability
distribution (over the range a ≤ x ≤ b) if it meets the
following requirements:
1) f(x) ≥ 0 for all x between a and b, and
2) The total area under the curve between a and b is 1.0
[Figure: curve f(x) plotted against x from a to b, with area = 1 under the curve]
The Normal Distribution…
• The normal distribution is the most important of all
probability distributions. The probability density function
of a normal random variable is given by:
f(x) = [1 / (σ * SQRT(2π))] * e^(–(1/2) * ((x – μ)/σ)²),   –∞ < x < ∞
• It looks like this:
– Bell shaped,
– Symmetrical around the mean μ
– Two Parameters: μ = mean and σ = std. dev.
Some Facts About The Normal Distribution…
• The area (probability) within ± 1σ of the mean is ~ .68 (68%)
• The area (probability) within ± 2σ of the mean is ~ .95 (95%)
• The area (probability) within ± 3σ of the mean is ~ .997 (99.7%)
• The area (probability) to the right or left of the mean is
exactly .5 (50%)
• These facts allow us to use one set of Normal Tables to
calculate all normal probabilities, provided we know how
many standard deviations a given value of the random
variable is away from the mean. This is called a Z-Score:
Z = (X – μ) / σ
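The three "within k standard deviations" areas can be reproduced with the standard library's statistics.NormalDist class (available in Python 3.8 and later). This is a small sketch using the standard normal distribution, which is enough because the areas depend only on how many standard deviations from the mean you go. The variable names are assumptions.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, std. dev. 1.
z = NormalDist()

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

# Area to the left of the mean.
left_of_mean = z.cdf(0)

for k, area in within.items():
    print(k, round(area, 4))
print(left_of_mean)
```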
The Standard Normal Distribution
• This Z-Score is also a random variable, and its
distribution is called “the standard normal distribution”, whose
mean (μ) is equal to 0 and standard deviation (σ) is
equal to 1. From this standard normal distribution we
can calculate any normal probability, even when the mean
and std. dev. are something other than 0 and 1.
Calculating Normal Probabilities…
• P(45 < X < 60) ?
…mean of 50 minutes and a
standard deviation of 10 minutes…
Calculating Normal Probabilities…
• Standard Normal Distribution (Table D)
• P(Z < 1.3) = 0.9032
• Use “Table D: Normal Curve Areas” in text
• NOTE: True for any normal distribution when the Z-Score is 1.3
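A table lookup can be mimicked in code: convert each value to a Z-Score, then read the area to its left from the standard normal distribution. The sketch below uses statistics.NormalDist from the Python standard library (3.8 and later). It checks P(Z < 1.3) and works P(45 < X < 60) for the example with a mean of 50 minutes and a standard deviation of 10 minutes. The helper name z_score is an assumption for this illustration.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

std_normal = NormalDist()  # mean 0, std. dev. 1

# Area to the left of Z = 1.3.
p_z = std_normal.cdf(1.3)

# P(45 < X < 60) for a normal X with mean 50 and std. dev. 10:
# area left of the upper Z-Score minus area left of the lower Z-Score.
mu, sigma = 50, 10
z_low = z_score(45, mu, sigma)    # -0.5
z_high = z_score(60, mu, sigma)   # 1.0
p_between = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(round(p_z, 4))
print(z_low, z_high, round(p_between, 4))
```

Subtracting the two left-hand areas is the same step used with a printed table of normal curve areas.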
Calculating Normal Probabilities…
• Work following EXERCISES in class:
• 4.6.1, 4.6.2, 4.6.11, 4.6.13, 4.7.1
• HW for students to work:
4.6.3, 4.6.5, 4.6.7, 4.6.9, 4.7.3, 4.7.5,
REVIEW QUESTION 23