Chapter 4 Slides


Random Variables and Probability
Distributions
• Random Variables - Random outcomes corresponding
to subjects randomly selected from a population.
• Probability Distributions - A listing of the possible
outcomes and their probabilities (discrete r.v.s) or their
densities (continuous r.v.s)
• Normal Distribution - Bell-shaped continuous
distribution widely used in statistical inference
• Sampling Distributions - Distributions corresponding
to sample statistics (such as mean and proportion)
computed from random samples
Discrete Probability Distributions
• Discrete RV - Random variable that can
take on a finite (or countably infinite) set of
discontinuous possible outcomes (Y)
• Discrete Probability Distribution - Listing
of outcomes and their corresponding
probabilities (y , P(y))
$0 \le P(y) \le 1$
$\sum_{\text{all } y} P(y) = 1$
Example - Supreme Court Vacancies
• Supreme Court Vacancies by Year 1837-1975
• Y = # Vacancies in randomly selected year

# Vacancies (y)   Frequency (# of Years)   Proportion (P(y))
0                 81                       81/139 = .5827
1                 43                       43/139 = .3094
2                 14                       14/139 = .1007
3                 1                        1/139 = .0072
>3                0                        0/139 = .0000
Total             139                      1.0000
Source: R.J. Morrison (1977), “FDR and the Supreme Court: An Example of the Use of Probability Theory in Political History”,
History and Theory, Vol. 16, pp 137-146
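The proportions in the table can be recomputed from the raw frequencies. The following is a minimal Python sketch (the variable names are illustrative and not part of the slides) that converts the counts to probabilities and checks the two conditions a discrete probability distribution must satisfy:

```python
# Frequencies from the slide: number of years (1837-1975) with y vacancies.
frequencies = {0: 81, 1: 43, 2: 14, 3: 1}

total_years = sum(frequencies.values())

# The relative frequency of each outcome serves as P(y).
probabilities = {y: count / total_years for y, count in frequencies.items()}

# Conditions for a valid discrete distribution:
# every P(y) lies in [0, 1] and the probabilities sum to 1.
assert all(0 <= p <= 1 for p in probabilities.values())
assert abs(sum(probabilities.values()) - 1.0) < 1e-12

for y, p in probabilities.items():
    print(f"P({y}) = {p:.4f}")
```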
Parameters of a P.D.
• Mean (aka Expected Value) - Long run
average outcome
$\mu = E(Y) = \sum y P(y)$
• Standard Deviation - Measure of the “typical”
distance of an outcome from the mean
$\sigma = \sqrt{E[(Y - \mu)^2]} = \sqrt{\sum (y - \mu)^2 P(y)} = \sqrt{\sum y^2 P(y) - \mu^2}$
Example - Supreme Court Vacancies
y       P(y)     yP(y)    y^2 P(y)
0       .5827    .0000    .0000
1       .3094    .3094    .3094
2       .1007    .2014    .4028
3       .0072    .0216    .0648
Total   1.0000   .5324    .7770

$\mu = \sum y P(y) = .5324$
$\sigma = \sqrt{\sum y^2 P(y) - \mu^2} = \sqrt{.7770 - (.5324)^2} = \sqrt{.4936} = .7025$

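The mean and standard deviation for the vacancies example can be checked numerically. This sketch (names are illustrative) uses the rounded probabilities from the table and computes the standard deviation two ways, from the definition and from the shortcut form, to show that they agree:

```python
import math

# P(y) from the slide's table (rounded to four decimals).
dist = {0: 0.5827, 1: 0.3094, 2: 0.1007, 3: 0.0072}

mean = sum(y * p for y, p in dist.items())               # E(Y) = sum of y P(y)
second_moment = sum(y**2 * p for y, p in dist.items())   # sum of y^2 P(y)

# Shortcut form: variance = sum of y^2 P(y) minus mu^2.
sd_shortcut = math.sqrt(second_moment - mean**2)

# Definitional form: variance = sum of (y - mu)^2 P(y).
sd_definition = math.sqrt(sum((y - mean) ** 2 * p for y, p in dist.items()))

print(f"mean = {mean:.4f}")
print(f"sd (shortcut)   = {sd_shortcut:.4f}")
print(f"sd (definition) = {sd_definition:.4f}")
```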
Normal Distribution
• Bell-shaped, symmetric family of distributions
• Classified by 2 parameters: Mean ($\mu$) and standard
deviation ($\sigma$). These represent location and spread
• Random variables that are approximately normal have
the following properties wrt individual measurements:
– Approximately half (50%) fall above (and below) mean
– Approximately 68% fall within 1 standard deviation of mean
– Approximately 95% fall within 2 standard deviations of mean
– Virtually all fall within 3 standard deviations of mean
• Notation when Y is normally distributed with mean $\mu$
and standard deviation $\sigma$:
$Y \sim N(\mu, \sigma)$
Normal Distribution
$P(Y \ge \mu) = 0.50$    $P(\mu - \sigma \le Y \le \mu + \sigma) \approx 0.68$    $P(\mu - 2\sigma \le Y \le \mu + 2\sigma) \approx 0.95$
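The 68% / 95% / "virtually all" figures can be reproduced with the standard library. This is a small sketch using Python's `statistics.NormalDist`; the helper name `within` is an illustrative choice. Because the result depends only on the number of standard deviations, the standard normal is enough:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def within(k):
    """P(mu - k*sigma <= Y <= mu + k*sigma) for any normal Y."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {within(k):.4f}")
```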
Example - Heights of U.S. Adults
• Female and Male adult heights are well approximated by
normal distributions: $Y_F \sim N(63.7, 2.5)$, $Y_M \sim N(69.1, 2.6)$
[Figure: histograms of adult heights in inches. Females (INCHESF, cases weighted by PCTF): Mean = 63.7, Std. Dev = 2.48. Males (INCHESM, cases weighted by PCTM): Mean = 69.1, Std. Dev = 2.61.]
Source: Statistical Abstract of the U.S. (1992)
Standard Normal (Z) Distribution
• Problem: Unlimited number of possible normal
distributions ($-\infty < \mu < \infty$, $\sigma > 0$)
• Solution: Standardize the random variable to have
mean 0 and standard deviation 1
$Y \sim N(\mu, \sigma) \Rightarrow Z = \frac{Y - \mu}{\sigma} \sim N(0, 1)$
• Probabilities of certain ranges of values and specific
percentiles of interest can be obtained through the
standard normal (Z) distribution
Standard Normal (Z) Distribution
• Standard Normal Distribution Characteristics:
– $P(Z \ge 0) = P(Y \ge \mu) = 0.5000$
– $P(-1 \le Z \le 1) = P(\mu - \sigma \le Y \le \mu + \sigma) = 0.6826$
– $P(-2 \le Z \le 2) = P(\mu - 2\sigma \le Y \le \mu + 2\sigma) = 0.9544$
– $P(Z \ge z_a) = P(Z \le -z_a) = a$ (using Z-table)

a       z_a
0.500   0.000
0.100   1.282
0.050   1.645
0.025   1.960
0.010   2.326
0.005   2.576
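The tail-area table can be regenerated without a printed Z-table. A sketch, assuming Python's `statistics.NormalDist` is available (Python 3.8+); `z_upper` is an illustrative helper name. Since `inv_cdf` works with lower-tail probabilities, the upper-tail area a is converted to 1 - a:

```python
from statistics import NormalDist

std_normal = NormalDist(0, 1)

def z_upper(a):
    """Value z_a whose upper-tail area is a, i.e. P(Z >= z_a) = a."""
    return std_normal.inv_cdf(1 - a)

tail_areas = [0.500, 0.100, 0.050, 0.025, 0.010, 0.005]
z_table = {a: round(z_upper(a), 3) for a in tail_areas}

for a, z in z_table.items():
    print(f"a = {a:.3f}   z_a = {z:.3f}")
```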
Finding Probabilities of Specific Ranges
• Step 1 - Identify the normal distribution of interest (e.g.
its mean ($\mu$) and standard deviation ($\sigma$))
• Step 2 - Identify the range of values that you wish to
determine the probability of observing ($Y_L$ , $Y_U$), where
often the upper or lower bounds are $\infty$ or $-\infty$
• Step 3 - Transform $Y_L$ and $Y_U$ into Z-values:
$Z_L = \frac{Y_L - \mu}{\sigma}$    $Z_U = \frac{Y_U - \mu}{\sigma}$
• Step 4 - Obtain $P(Z_L \le Z \le Z_U)$ from Z-table
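The four steps translate directly into a short function. This is a sketch rather than a definitive implementation: `range_probability` is a hypothetical name, and the standard normal CDF from `statistics.NormalDist` stands in for the Z-table lookup in Step 4. Unbounded ranges are expressed with infinite defaults:

```python
import math
from statistics import NormalDist

def range_probability(mu, sigma, y_lower=-math.inf, y_upper=math.inf):
    """P(y_lower <= Y <= y_upper) for Y ~ N(mu, sigma)."""
    std_normal = NormalDist()
    # Step 3: transform the bounds into z-values.
    z_lower = (y_lower - mu) / sigma
    z_upper = (y_upper - mu) / sigma
    # Step 4: area under the standard normal curve between the z-values.
    return std_normal.cdf(z_upper) - std_normal.cdf(z_lower)

# Usage on the standard normal itself: about 68% lies between -1 and 1.
print(round(range_probability(0, 1, -1, 1), 4))
```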
Example - Adult Female Heights
• What is the probability a randomly selected female is
5’10” or taller (70 inches)?
• Step 1 - Y ~ N(63.7 , 2.5)
• Step 2 - $Y_L = 70.0$, $Y_U = \infty$
• Step 3 - $Z_L = \frac{70.0 - 63.7}{2.5} = 2.52$, $Z_U = \infty$
• Step 4 - $P(Y \ge 70) = P(Z \ge 2.52) = .0059$ ($\approx 1/170$)

z     .00     .01     .02     .03
2.4   .0082   .0080   .0078   .0075
2.5   .0062   .0060   .0059   .0057
2.6   .0047   .0045   .0044   .0043
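The female height calculation can be verified in a few lines. A sketch using `statistics.NormalDist` in place of the Z-table (variable names are illustrative):

```python
from statistics import NormalDist

mu, sigma = 63.7, 2.5          # female heights, from Step 1

# Step 3: z-value of the lower bound (the upper bound is infinite).
z_lower = (70.0 - mu) / sigma

# Step 4: upper-tail area P(Z >= z_lower).
p_taller = 1 - NormalDist().cdf(z_lower)

print(f"z = {z_lower:.2f}, P(Y >= 70) = {p_taller:.4f}")
print(f"roughly 1 in {1 / p_taller:.0f}")
```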
Finding Percentiles of a Distribution
• Step 1 - Identify the normal distribution of interest
(e.g. its mean ($\mu$) and standard deviation ($\sigma$))
• Step 2 - Determine the percentile of interest
100p% (e.g. the 90th percentile is the cut-off where only
90% of scores are below and 10% are above)
• Step 3 - Turn the percentile of interest into a tail
probability a and corresponding z-value ($z_p$):
– If $100p \ge 50$ then $a = 1 - p$ and $z_p = z_a$
– If $100p < 50$ then $a = p$ and $z_p = -z_a$
• Step 4 - Transform $z_p$ back to original units:
$Y_p = \mu + z_p \sigma$
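The percentile steps can be sketched as a function that mirrors the two cases in Step 3. The name `normal_percentile` and the usage values (mean 100, standard deviation 15) are hypothetical and chosen only for illustration; `inv_cdf` replaces the Z-table lookup of $z_a$:

```python
from statistics import NormalDist

def normal_percentile(mu, sigma, p):
    """100p-th percentile of Y ~ N(mu, sigma), for 0 < p < 1."""
    std_normal = NormalDist()
    # Step 3: tail probability a and the matching z-value z_p.
    if p >= 0.5:
        a = 1 - p
        z_p = std_normal.inv_cdf(1 - a)    # z_p = z_a
    else:
        a = p
        z_p = -std_normal.inv_cdf(1 - a)   # z_p = -z_a
    # Step 4: transform back to the original units.
    return mu + z_p * sigma

# Hypothetical usage: 90th and 10th percentiles of N(100, 15).
print(round(normal_percentile(100, 15, 0.90), 1))
print(round(normal_percentile(100, 15, 0.10), 1))
```

The explicit branch is redundant in code, since `inv_cdf(p)` already returns a negative value when p is below 0.5, but it keeps the sketch aligned with the table-based procedure.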
Example - Adult Male Heights
• Above what height do the tallest 5% of males lie?
• Step 1 - Y ~ N(69.1 , 2.6)
• Step 2 - Want to determine 95th percentile (p = .95)
• Step 3 - Since 100p > 50, a = 1-p = 0.05
$z_p = z_a = z_{.05} = 1.645$
• Step 4 - $Y_{.95} = 69.1 + (1.645)(2.6) = 73.4$ inches
z     .03     .04     .05     .06
1.5   .0630   .0618   .0606   .0594
1.6   .0516   .0505   .0495   .0485
1.7   .0418   .0409   .0401   .0392
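The table excerpt has no entry with a tail area of exactly .0500; the value falls between z = 1.64 (.0505) and z = 1.65 (.0495). One common way to arrive at 1.645 is linear interpolation between those two entries. The sketch below assumes that approach (the slides do not say how the value was obtained), and the variable names are illustrative:

```python
# Upper-tail areas P(Z >= z) copied from the table excerpt.
tail_area = {1.64: 0.0505, 1.65: 0.0495}

target = 0.05
z_lo, z_hi = 1.64, 1.65

# Fraction of the way from z_lo to z_hi at which the area reaches the target.
fraction = (tail_area[z_lo] - target) / (tail_area[z_lo] - tail_area[z_hi])
z_05 = z_lo + fraction * (z_hi - z_lo)

# Step 4: back to inches for male heights, Y ~ N(69.1, 2.6).
height_95 = 69.1 + z_05 * 2.6

print(f"z_.05 = {z_05:.3f}, 95th percentile = {height_95:.1f} inches")
```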
Statistical Models
• When making statistical inference it is useful to
write random variables in terms of model
parameters and random errors
$Y = \mu + (Y - \mu) = \mu + \varepsilon$, where $\varepsilon = Y - \mu$
• Here $\mu$ is a fixed constant and $\varepsilon$ is a random variable
• In practice $\mu$ will be unknown, and we will use sample data to
estimate or make statements regarding its value
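A small simulation can make the model concrete. This sketch rests on assumptions that go beyond the slide: the errors are drawn from a normal distribution with mean 0, and the "true" values 63.7 and 2.5 are borrowed from the female heights example. It generates observations as a fixed constant plus a random error, then estimates the constant from the sample:

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

mu, sigma = 63.7, 2.5   # assumed "true" values, unknown in practice
n = 10_000

# Random errors with mean 0 (normality is an assumption of this sketch).
errors = [random.gauss(0, sigma) for _ in range(n)]

# Each observation is the fixed constant plus its random error.
observations = [mu + e for e in errors]

# The sample mean serves as the estimate of mu.
estimate = mean(observations)
print(f"estimate of mu: {estimate:.2f}")
```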
Sampling Distributions and the Central
Limit Theorem
• Sample statistics based on random samples are also
random variables and have sampling distributions that
are probability distributions for the statistic (outcomes
that would vary across samples)
• When samples are large and measurements independent,
then many estimators have normal sampling
distributions (CLT):
– Sample Mean: $\bar{Y} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
– Sample Proportion: $\hat{\pi} \sim N\left(\pi, \sqrt{\frac{\pi(1 - \pi)}{n}}\right)$
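The second parameter of each sampling distribution is the standard deviation of the statistic, often called its standard error. A minimal sketch of both formulas follows; the function names and the usage values are hypothetical:

```python
import math

def se_mean(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def se_proportion(pi, n):
    """Standard deviation of the sample proportion: sqrt(pi (1 - pi) / n)."""
    return math.sqrt(pi * (1 - pi) / n)

# Hypothetical usage: sigma = 2.5 with n = 100, and pi = 0.5 with n = 100.
print(se_mean(2.5, 100))
print(se_proportion(0.5, 100))
```

Both quantities shrink in proportion to the square root of n, so quadrupling the sample size halves the spread of the statistic.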
Example - Adult Female Heights
• Random samples of n = 100 females to be selected
• For each sample, the sample mean is computed
• Sampling distribution:
$\bar{Y} \sim N\left(63.5, \frac{2.5}{\sqrt{100}}\right) = N(63.5, 0.25)$
• Note that approximately 95% of all possible random
samples of 100 females will have sample means between
63.0 and 64.0 inches
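The 95% statement can be checked by simulation. The sketch below assumes individual heights are exactly normal with the mean and standard deviation used on this slide. It draws many random samples of n = 100, computes each sample mean, and reports the share falling between 63.0 and 64.0 inches:

```python
import random
from statistics import fmean

random.seed(1)  # fixed seed so the sketch is reproducible

mu, sigma, n = 63.5, 2.5, 100   # values as used on this slide
reps = 2000                     # number of simulated samples (arbitrary choice)

# One sample mean per simulated sample of n heights.
sample_means = [
    fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(reps)
]

# Share of sample means within 2 standard errors (2 * 0.25) of the mean.
share_inside = sum(63.0 <= m <= 64.0 for m in sample_means) / reps
print(f"share of sample means in [63.0, 64.0]: {share_inside:.3f}")
```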