Transcript 02/04/2008

Random Variables &
Probability Distributions
Chapter 6 of the textbook
Pages 167-208
Lecture Overview
Schedule
Clarification from Friday
Discrete Random Variables
Schedule
Today:
– Discrete random variables
– Homework #4 will be posted this afternoon
Wednesday:
– Continuous random variables & bivariate random variables
– Homework #3 due
Friday:
– Homework #4 help & Excel show and tell
Next Monday:
– Any remaining chapter 6 slides
– Exam #1 review
– Homework #4 due
Next Wednesday:
– Exam #1 (you’re allowed 1 sheet of paper (front & back) for notes & equations)
– Test questions will be very reminiscent of homework problems
Next Friday:
– Go over exam #1 questions
– Intro to S-Plus
Clarification From Friday
On HW3, question #15
P(A) = .3, P(B) = .5, P(B|A) = .4
What is P(A|B)?
What is P(A∩B)?
Using the multiplication theorem
P(B∩A) = P(B|A)*P(A) = 0.4 * 0.3 = .12
Using the definition of conditional probability
P(A|B) = P(A∩B) / P(B) = .12 / .5 = .24
Clarification From Friday
Why doesn’t P(A ∩ B) = P(A) * P(B)?
Answer: because A and B aren’t statistically independent
Recall that statistical independence is defined as:
– P(A|B) = P(A)
– OR
– P(B|A) = P(B)
This is not true for this problem
If A and B are statistically independent, the multiplication theorem
becomes: P(A ∩ B) = P(A) * P(B) since we can just replace P(A|B)
with P(A)
Definitions
Random Sample (from Ch. 1)
Variable (from Ch. 1)
Random Variable
– “any numerically valued function that is defined over a sample
space”
– For the household example in the book - “the variable is random
not because the household makes a random decision to include a
certain number of people, but because our sample experiment
selects a household randomly”
Example
Imagine randomly sampling students in the
union and asking them how many books
they are carrying
Example Data
Elementary Outcomes
– Student 1: 3 books
– Student 2: 2 books
– Student 3: 0 books
– Student 4: 1 book
– Student 5: 2 books
– Student 6: 1 book
– Student 7: 0 books
– Student 8: 1 book
– Student 9: 4 books
– Student 10: 1 book
Sample Space : {0,1,2,3,4}
Random Variable (X)
(Function: X = # of books)
– X(Student 1) = 3
– X(Student 2) = 2
– X(Student 3) = 0
– Etc.
Probability of Random Variables
– So X can be any value from the
full set of possible #s of books
– x can = any number in the
sample space (0,1,2,3,4)
– P(x) is the probability of getting
an x in a random sample
– Example: P(0 books) = 2/10 = .2
– Example: P(3) = 1/10 = .1
Clarification: X and x
X is the random variable
– Can be any of the possible values and their associated probabilities
– In other words, this can equal any element in a sample space, each
with a probability of occurring
x is one possible outcome of the random variable (i.e., an
event)
– For example, x can be 0 books, 1 book, 2 books, etc.
Why does this matter?
– When we figure out probabilities we are usually concerned with
P(xi) since P(X) = 1
– When we figure out expected values (E) or variances (V) we are
concerned with X because we want to know the expected values
with respect to all possibilities
Definition
Probability Distribution or Function
– “a table, graph, or mathematical function that
describes the potential values of a random
variable X and their corresponding
probabilities”
Example Continued
Probability Distribution or Function
Table Form:
xi    P(xi)
0     2/10 = 0.2
1     4/10 = 0.4
2     2/10 = 0.2
3     1/10 = 0.1
4     1/10 = 0.1
Graph Form: [bar chart of Probability P(x) (y-axis, 0 to 0.45) vs. Number of Books (x-axis, 0 to 4)]
Question: What do these remind you of from past chapters?
Key Concept
Discrete Random Variables
– “The set of possible values (i.e., the sample space) is finite or
countably infinite.”
Continuous Random Variables
– The set of possible values can be any real number in the range of
possible values (i.e., infinite possible values)
Questions:
– What type of random variable is the student / book example?
– Can you come up with examples of each?
Probability Mass Function
Specifies the probability distribution for discrete variables
The tables and graphs are examples of the probability mass function
(i.e., the probability is “massed” at the discrete possible values)
The probability mass function preserves the provision:
P(X) = Σ_{i=1}^{k} P(xi) = 1
where k = the number of different values of the discrete variable and i = 1, 2, …, k
This is identical to the rule from last chapter
Example Continued
Σ_{i=1}^{k} P(xi) = 1
Σ_{i=1}^{k} P(xi) = P(0) + P(1) + P(2) + P(3) + P(4) = 1
Σ_{i=1}^{k} P(xi) = 0.2 + 0.4 + 0.2 + 0.1 + 0.1 = 1
Expected Values for Discrete
Random Variables
“E” is the term for expected values
E(X) = the expected value of a discrete random variable
To calculate E we need the probability distribution (e.g.,
the probability distribution table)
E(X) = Σ_{i=1}^{k} P(xi) · xi
Example Continued
For our students & books example
E(X) = Σ_{i=1}^{k} P(xi) · xi
E(X) = P(x0)·x0 + P(x1)·x1 + …
E(X) = 0.2·0 + 0.4·1 + 0.2·2 + 0.1·3 + 0.1·4
E(X) = 0 + 0.4 + 0.4 + 0.3 + 0.4 = 1.5
So if we randomly selected a student we would expect
them to have 1.5 books with them
Question: does this remind you of any other statistic?
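As a quick check, the expected-value sum can be sketched in a few lines of Python (not from the lecture; the dictionary is just the probability table from the example):

```python
# Probability distribution from the students & books example
dist = {0: 0.2, 1: 0.4, 2: 0.2, 3: 0.1, 4: 0.1}

# E(X) = sum of P(xi) * xi over all possible values
E_X = sum(p * x for x, p in dist.items())
print(round(E_X, 2))  # 1.5
```
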
Variance Values for Discrete
Random Variables
“V” is the term for the variance of a probability distribution
V(X) = the variance value of a discrete random variable
To calculate V we need E and the probability distribution (e.g., the
probability distribution table)
V(X) = Σ_{i=1}^{k} [P(xi) · xi²] − [E(X)]²
Note: I wrote the equation a little differently than the book to make it clear that you do the sum first and then subtract the E(X)² term
Example Continued
For our students & books example
V(X) = Σ_{i=1}^{k} [P(xi) · xi²] − [E(X)]²
V(X) = [P(x0)·x0² + P(x1)·x1² + …] − [E(X)]²
V(X) = [0.2·0² + 0.4·1² + 0.2·2² + 0.1·3² + 0.1·4²] − [1.5²]
V(X) = [3.7] − [2.25] = 1.45
As with univariate statistics, the standard
deviation is the square root of the variance
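The same sum-first-then-subtract order can be seen in a short Python sketch (my illustration, reusing the books distribution):

```python
# Probability distribution from the students & books example
dist = {0: 0.2, 1: 0.4, 2: 0.2, 3: 0.1, 4: 0.1}

E_X = sum(p * x for x, p in dist.items())  # expected value, 1.5
# V(X): sum the P(xi) * xi^2 terms first, then subtract E(X)^2
V_X = sum(p * x**2 for x, p in dist.items()) - E_X**2
sd_X = V_X ** 0.5  # standard deviation = square root of the variance

print(round(V_X, 2))  # 1.45
```
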
Discrete Probability Models
If we have a census, the probability
distribution (table, graph, etc.) is
complete/finished/appropriate/accurate
If we have a sample, we try to match our
sample probability distribution to a known
probability distribution
Discrete Probability Models
Common probability models for discrete random
variables include
– Uniform distribution
– Binomial distribution
– Poisson distribution
The benefit of using these models is that they have
known properties and corresponding equations
already made
Discrete Uniform Distribution
The probability of every possible value of the random variable
is equal
This equates to a rectangular graph (i.e., flat on
top)
Discrete Uniform Distribution Equations
Probability: P(x) = 1/k
Expected: E(X) = (1/k) · Σ_{x=1}^{k} x
Variance: V(X) = (k² − 1) / 12
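A fair six-sided die makes a handy sketch of these formulas (k = 6 is my example choice, not from the slides):

```python
k = 6  # number of equally likely values, e.g. a fair die with faces 1..k

P_x = 1 / k                         # probability of any single value
E_X = sum(range(1, k + 1)) / k      # (1/k) * sum of x from 1 to k
V_X = (k**2 - 1) / 12               # variance of the discrete uniform

print(E_X, round(V_X, 4))  # 3.5 2.9167
```
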
Discrete Binomial Distribution
There are 2 and only 2 possible outcomes of
a statistical experiment (e.g., flipping a
coin)
Discrete Binomial Distribution Equations
Probability: P(x) = nCx · π^x · (1 − π)^(n−x)
Expected: E(X) = nπ
Variance: V(X) = nπ(1 − π)
π = the probability of one outcome (a “success”)
Example (from book)
Quiz with 10 multiple choice questions
Each question has 5 possible answers
P(guessing correctly) = 1/5 = 0.2
P(guessing incorrectly) = 4/5 = 0.8
What is the probability of guessing 5 correct
answers?
Answering this question with what
we learned last chapter
Imagine we only have 2 questions
Since the questions are independent, P(A∩B) =
P(A) * P(B)
This result is in the upper left box
Answering this question with what
we learned last chapter
Now imagine we add a third question
Since the questions are independent
P(A∩B∩C) = P(A) * P(B) *P(C) is the upper left
box
Answering this question with what
we learned last chapter
So 3 out of 3 right answers = P(RRR) = 0.2 * 0.2 * 0.2 = .008
Follow this out to 5 correct & 5 incorrect
– = 0.2*0.2*0.2*0.2*0.2*0.8*0.8*0.8*0.8*0.8 (i.e., 0.2^5 · 0.8^5)
– = 0.000104858 for one option of 5 R, 5 W
– Another option would be 4 R, 5 W, 1 R (i.e., questions 1–4 & 10 correct)
How many combinations of 5 Right & 5 Wrong are there?
Use combinations rule: C(10,5) = 252
Answer = 252 * 0.000104858 = 0.026
Answer Question Using the New
Equation
P( x)  C  (1   )
n
x
x
n x
105
P( x)  C 0.2 (1  0.2)
10
5
5
 0.026
This answer can also be found in a
table in the back of the book on P. 605
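The binomial formula can be sketched directly in Python (my illustration; `math.comb` supplies the nCx term):

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): number of arrangements (nCx) times p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Quiz example: n = 10 questions, P(correct guess) = 0.2, exactly 5 right
print(round(binomial_pmf(5, 10, 0.2), 3))  # 0.026

E_X = 10 * 0.2              # n*pi = 2 correct answers expected
V_X = 10 * 0.2 * (1 - 0.2)  # n*pi*(1-pi)
```
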
Poisson Discrete Distribution
Poisson distributions are often used to determine
the probability of a number of events (x) occurring
in a fixed space or over a fixed period of time
But they can be used to determine probabilities for
other variables as well (the book used the example
of lengths of rope)
Poisson distributions are also called the
“distribution of rare events”
Poisson Discrete Distribution
Requirements for using a Poisson distribution
– Mutually exclusive events are independent
– The probability of an event occurring is small and
proportional to the size of the area (or to the length of
the interval)
– The probability of 2 or more events occurring in a small
area or interval is near zero
Rules 2 and 3 are where the phrase “distribution of
rare events” comes from
Poisson Discrete Distribution
The parts of a Poisson distribution equation:
– λ - The average occurrence of an event in time or space
• 8 houses per block
• 1 hiccup per minute
• 1 lightning strike per square mile per decade
• The “answers” to questions will be in the same units as λ (e.g., “per
minute”)
– e - base of the natural logarithm (e = 2.71828...)
– X – the Poisson random variable
• Just like the random variable for the other distributions
• This is the value for which we determine E and V
– x – the values from X for which we find probabilities etc.
• E.g., what is the probability of x if x = 2 hiccups per minute?
• x can be any non-negative integer
Poisson Discrete Distribution
Probability: P(x) = (e^(−λ) · λ^x) / x!
Expected: E(X) = Σ_{x=0}^{∞} x · (e^(−λ) · λ^x) / x!
Variance: V(X) = Σ_{x=0}^{∞} [x − E(X)]² · (e^(−λ) · λ^x) / x!
Poisson Discrete Distribution
The Poisson discrete distribution is actually
a family of distributions
– Each member of the family corresponds to one value of λ
For example:
– 1 hiccup per minute uses one member of the family
– 2 hiccups per minute uses another
Poisson Discrete Distribution
1 Hiccup Per Minute (i.e., λ = 1)
What is the probability of hiccupping 4 times
per minute (i.e., x = 4)?
P(x) = (e^(−λ) · λ^x) / x!
P(4) = (2.71828^(−1) · 1^4) / 4!
P(4) = 0.36788 / 24
P(4) = 0.01533
This answer can also be found in a
table in the back of the book on P. 606
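A minimal Python sketch of the Poisson formula (my illustration, reproducing the hiccup example):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x! for a Poisson rate lam."""
    return exp(-lam) * lam**x / factorial(x)

# Hiccup example: lam = 1 per minute, probability of exactly 4 hiccups
print(round(poisson_pmf(4, 1), 5))  # 0.01533
```
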
Continuous Random Variables
Review:
– Continuous Random Variables: The set of
possible values can be any real number in the
range of possible values (i.e., infinite possible
values)
For continuous random variables we use
probability density functions rather than
probability mass functions
Probability Density Functions
Specify the probability distribution for continuous
variables
Unlike probability mass functions we used with discrete
variables where all the P(xi) added up to 1, with
probability density function the area under the curve = 1
Also unlike probability mass functions we aren’t
concerned with P(xi) because each xi is a vertical line with
an area equal to zero
Instead we are concerned with probabilities such as P(x >
some amount A) or P(x between values B and C)
Probability Density Functions
The probability density function of a random continuous
variable X is denoted as f(X)
The “function” part (i.e., the “f”) relates to the equation
that produces a curve (i.e., it is used to graph the line)
Conditions satisfied by probability density functions
(assume the min and max values are a and b respectively):
– f(x) ≥ 0 for a ≤ x ≤ b
– The area under f(x) from x = a to x = b = 1
Because we are ultimately concerned with areas under
portions of the curve, what type of math do we need?
Continuous Probability Distribution Models
Probability Distribution Models
– As with discrete random variables we usually
have a sample rather than a census
– To calculate probabilities from a sample we
assume the data conform to some known
distribution for which we have handy tables
– This is how we avoid having to do calculus
Continuous Probability Models
Common probability models for continuous
random variables include
– Uniform distribution
• Rectangular distribution
– Normal distribution (a.k.a. Gaussian)
• The “bell shaped curve”
Uniform Continuous Distribution
Probability: P(c to d) = (d − c) / (b − a), given a ≤ c ≤ d ≤ b
Distribution Function: f(x) = 1 / (b − a), for a ≤ x ≤ b
Expected: E(X) = (b + a) / 2
Variance: V(X) = (b − a)² / 12
Normal Probability Distribution
Probability: convert to z-scores first (explained in a few slides)
Distribution Function: f(x) = (1 / (σ·√(2π))) · e^(−(1/2)·((x − μ)/σ)²)
Expected: E(X) = μ
Variance: V(X) = σ²
Notes: π = 3.14…
Features of the Normal Probability
Distribution
The mean, median, and mode values are all
equal to the peak of the distribution
The distribution is symmetrical
½ of the area under the curve is above the mean and vice versa
Z scores
To avoid having to calculate probabilities for curves with
varying μ, σ, and shapes we can convert any normally
distributed random variable to a standard form for which
we have tables
To do this we use the z transformation:
z_x = (x − μ) / σ
This conversion changes our variable measured in units x
(e.g., meters, miles, pounds) to units of z (i.e., standard
deviation units)
Example
If we have a normally distributed dataset of bowling
scores with μ = 150, σ = 10, what is the z-score of 175?
z_175 = (x − μ) / σ = (175 − 150) / 10 = 25 / 10 = 2.5
– What does a z-score of 2.5 mean?
– Answer: the value of 175 is 2.5 standard deviations above the
mean for this particular normal probability distribution
Probability z-scores
Remember that one of the reasons we calculate z-scores is to ask questions about probability
For example what is the probability of bowling
over 175 given our previous example?
To answer this question it is easiest to use a z-table
like the one on page 207 of your book
Using Standard Normal
Probabilities (i.e., a z-table)
The table in our book is atypical of what you usually see,
but more user friendly thanks to the pictures
For our bowling example, find the z-score (2.5) in the
column on the left
Now choose the column of interest, in this case column #3:
P(Z > z)
The probability value we get from the table is 0.006
– This means that the probability of bowling over 175, for our
fictional dataset, is 0.006 (i.e., 0.6%)
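The z-table lookup can be reproduced in Python (my illustration; `math.erf` gives the same standard-normal tail probability the table does):

```python
from math import erf, sqrt

def z_score(x, mu, sigma):
    """Convert a raw score to standard-deviation units."""
    return (x - mu) / sigma

def p_z_greater(z):
    """P(Z > z) for the standard normal, via the error function."""
    return 0.5 * (1 - erf(z / sqrt(2)))

z = z_score(175, 150, 10)
print(z)                         # 2.5
print(round(p_z_greater(z), 3))  # 0.006
```
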
More Conventional Z tables
Normally we see z-tables with the following
characteristics:
– 2 digits of precision in the far left column
– 1 additional digit of precision in each of the 10 other
columns to the right
– A value indicating a one-directional probability (i.e., the
total probability of values less than a z-score)
• This is equivalent to the 5th column in the z-table in our book
Bivariate Random Variables
Now we turn our attention to the relationship
between two variables (hence the name
“bivariate”)
The random variables can be discrete or
continuous
Most of the following slide & equations should
look very familiar to those from chapter 5
Bivariate Probability Functions
Conditions:
– 0 ≤ P(x,y) ≤ 1
– Σ_x Σ_y P(x,y) = 1
For 2 discrete random variables (x & y) it is useful to set
up a contingency table
These contingency tables are just like those for 2 events
and they may contain actual counts or probabilities
Example (from book)
100 households sampled and asked how many people are in the
household (x) and how many cars are owned by members of the
household (y)
Our book did us the disservice of switching the x and y axes
The data can be summarized in the following table:
                              Cars (y)
                           0     1     2     3
Household Size (x)   2    10     8     3     2
                     3     7    10     6     3
                     4     4     5    12     6
                     5     1     2     6    15
Marginal Totals
Marginal Probabilities are the sums of the rows and columns
The marginal totals for household size (x) are in red
The marginal totals for cars (y) are in blue
The total number of households sampled is in green
                              Cars (y)
                           0     1     2     3   | Row totals
Household Size (x)   2    10     8     3     2   |   23
                     3     7    10     6     3   |   26
                     4     4     5    12     6   |   27
                     5     1     2     6    15   |   24
Column totals             22    25    27    26   |  100
Marginal Probabilities
All probabilities are just the totals from each box (see last slide) divided by
the total number of households (100)
The marginal probabilities for household size (x) are in red
The marginal probabilities for cars (y) are in blue
The sum of each set of marginal probabilities is in green
                              Cars (y)
                           0     1     2     3   | Row totals
Household Size (x)   2   .10   .08   .03   .02   |  .23
                     3   .07   .10   .06   .03   |  .26
                     4   .04   .05   .12   .06   |  .27
                     5   .01   .02   .06   .15   |  .24
Column totals            .22   .25   .27   .26   | 1.0
Conditional Probabilities
Equation: P(x|y) = P(x,y) / P(y)
For Example: what is the probability of having a
household size of 4 (x=4) given the household has
3 cars (y=3)?
P(4|3) = 0.06 / 0.26 = 0.231
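Marginal and conditional probabilities fall out of the joint table in a few lines of Python (my illustration, using the household example's joint probabilities):

```python
# Joint probabilities P(x, y) from the household example
# (x = household size, y = number of cars)
P = {
    (2, 0): .10, (2, 1): .08, (2, 2): .03, (2, 3): .02,
    (3, 0): .07, (3, 1): .10, (3, 2): .06, (3, 3): .03,
    (4, 0): .04, (4, 1): .05, (4, 2): .12, (4, 3): .06,
    (5, 0): .01, (5, 1): .02, (5, 2): .06, (5, 3): .15,
}

# Marginal P(y = 3): sum the y = 3 column
P_y3 = sum(p for (x, y), p in P.items() if y == 3)

# Conditional P(x = 4 | y = 3) = P(4, 3) / P(y = 3)
print(round(P[(4, 3)] / P_y3, 3))  # 0.231
```
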
Covariance
“Covariance is a direct statistical measure of the degree to
which two random variables X and Y tend to vary
together”
Covariance is positive when X and Y tend to increase together
(and likewise decrease together)
– Ex. the amount of ice cream you eat and the temperature outside
Covariance is negative when X and Y are inversely related
– Ex. the number of layers of clothes you tend to wear and the
temperature outside
When there is no pattern the covariance is close to zero
Covariance
Covariance Equation:
C(X,Y) = Σ_x Σ_y P(x,y) · [x − E(X)] · [y − E(Y)]
Note 1: There is another option for calculating covariance
in your book
Note 2: there are also nice tables showing how you would
go about calculating these values on page 202 & 203
Covariance
Covariance Equation:
C(X,Y) = Σ_x Σ_y P(x,y) · [x − E(X)] · [y − E(Y)]
Parts of this equation:
– The P(x,y) values come from the joint probability table
formula (from about 40 slides ago):
E(X) = Σ_{i=1}^{k} P(xi) · xi
Covariance Example (from book)
First we need to calculate the expected (E)
values:
E(X) = Σ_{i=1}^{k} P(xi) · xi
We can use the marginal probabilities for this
For E(X) multiply the xi by the row totals
– (i.e., orange # * red #)
E(X) = 2·0.23 + 3·0.26 + 4·0.27 + 5·0.24
E(X) = 0.46 + 0.78 + 1.08 + 1.20 = 3.52
For E(Y) multiply the yi by the column totals
– (i.e., purple # * blue #)
E(Y) = Σ_{i=1}^{k} P(yi) · yi
E(Y) = 0·0.22 + 1·0.25 + 2·0.27 + 3·0.26
E(Y) = 0 + 0.25 + 0.54 + 0.78 = 1.57
Covariance Example (book is wrong again!)
(x,y)   P(x,y)   x−E(X)   y−E(Y)   [x−E(X)][y−E(Y)]   P(x,y)·[x−E(X)][y−E(Y)]
(2,0)    .10     −1.52    −1.57         2.3864              0.238640
(2,1)    .08     −1.52    −0.57         0.8664              0.069312
(2,2)    .03     −1.52     0.43        −0.6536             −0.019608
(2,3)    .02     −1.52     1.43        −2.1736             −0.043472
(3,0)    .07     −0.52    −1.57         0.8164              0.057148
(3,1)    .10     −0.52    −0.57         0.2964              0.029640
(3,2)    .06     −0.52     0.43        −0.2236             −0.013416
(3,3)    .03     −0.52     1.43        −0.7436             −0.022308
(4,0)    .04      0.48    −1.57        −0.7536             −0.030144
(4,1)    .05      0.48    −0.57        −0.2736             −0.013680
(4,2)    .12      0.48     0.43         0.2064              0.024768
(4,3)    .06      0.48     1.43         0.6864              0.041184
(5,0)    .01      1.48    −1.57        −2.3236             −0.023236
(5,1)    .02      1.48    −0.57        −0.8436             −0.016872
(5,2)    .06      1.48     0.43         0.6364              0.038184
(5,3)    .15      1.48     1.43         2.1164              0.317460
Sum (= C(X,Y))                                              0.633600
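The whole table collapses to one Python sum (my illustration, reusing the household joint probabilities):

```python
# Joint probabilities P(x, y) from the household example
P = {
    (2, 0): .10, (2, 1): .08, (2, 2): .03, (2, 3): .02,
    (3, 0): .07, (3, 1): .10, (3, 2): .06, (3, 3): .03,
    (4, 0): .04, (4, 1): .05, (4, 2): .12, (4, 3): .06,
    (5, 0): .01, (5, 1): .02, (5, 2): .06, (5, 3): .15,
}

E_X = sum(p * x for (x, y), p in P.items())  # 3.52
E_Y = sum(p * y for (x, y), p in P.items())  # 1.57
# Covariance: probability-weighted products of deviations from the means
C_XY = sum(p * (x - E_X) * (y - E_Y) for (x, y), p in P.items())
print(round(C_XY, 4))  # 0.6336
```
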
Independence
As with events (e.g., A and B) from last chapter, x
and y are independent if P(x,y) = P(x)P(y) for all
values of x and y
Independence and covariance are closely related,
but not the same
– Independent variables will have a covariance of 0
– But random variables with a covariance of 0 may not be
independent
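The independence check P(x,y) = P(x)·P(y) can be run mechanically over the household example (my illustration; it fails, confirming the two variables are not independent):

```python
# Joint probabilities P(x, y) from the household example
P = {
    (2, 0): .10, (2, 1): .08, (2, 2): .03, (2, 3): .02,
    (3, 0): .07, (3, 1): .10, (3, 2): .06, (3, 3): .03,
    (4, 0): .04, (4, 1): .05, (4, 2): .12, (4, 3): .06,
    (5, 0): .01, (5, 1): .02, (5, 2): .06, (5, 3): .15,
}
# Marginal distributions of x and y
Px = {x: sum(p for (xi, y), p in P.items() if xi == x) for x in {x for x, y in P}}
Py = {y: sum(p for (x, yi), p in P.items() if yi == y) for y in {y for x, y in P}}

# Independent only if P(x,y) = P(x) * P(y) for EVERY cell
independent = all(abs(P[(x, y)] - Px[x] * Py[y]) < 1e-9 for (x, y) in P)
print(independent)  # False
```
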
Problems With Covariance
The sign (+ or -) of the calculated covariance is
meaningful, but not the magnitude
This is because the covariance is dependent on the
scale of the input data
Therefore if we multiplied x or y by 10 and
recalculated the covariance it will have changed
even though the relationship between x and y,
strictly speaking, is the same
Correlation Coefficient
The correlation coefficient is a standardized
statistic that measures the relationship between
random variables
Correlation coefficients range from -1 to 1
– 1 is a positive relationship (both ↑ or ↓ together)
– -1 is an inverse relationship (one ↑ while the other ↓)
– 0 suggests, but doesn’t guarantee independence
Unlike covariance the scale of the data does not
matter
Correlation Coefficient
In chapter 6 the book introduces this
statistic with the assumption that the
population covariance (C) & standard
deviation (σ) are known or can be
calculated
The correlation coefficient for a sample is
discussed in chapter 12
Correlation Coefficient
Equation: ρ_xy = C(X,Y) / (σ_x · σ_y)
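Continuing the household example, the correlation coefficient can be sketched in Python (my extension, not from the slides: the variances come from the discrete-variance formula applied to the marginal distributions, and the covariance is the 0.6336 computed earlier):

```python
from math import sqrt

# Marginal distributions from the household example
Px = {2: .23, 3: .26, 4: .27, 5: .24}
Py = {0: .22, 1: .25, 2: .27, 3: .26}

def expected(dist):
    """E = sum of P(v) * v."""
    return sum(p * v for v, p in dist.items())

def variance(dist):
    """V = sum of P(v) * v^2, minus E^2."""
    e = expected(dist)
    return sum(p * v**2 for v, p in dist.items()) - e**2

C_XY = 0.6336  # covariance computed on the previous slides

# rho = C(X,Y) / (sigma_x * sigma_y)
rho = C_XY / (sqrt(variance(Px)) * sqrt(variance(Py)))
print(round(rho, 3))  # 0.529
```
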