Risk and Sensitivity Analyses


Dr. Héctor Allende
Review of Probability and Statistics
A Review of Probability and Statistics
• Descriptive statistics
• Probability
• Random variables
• Sampling distributions
• Estimation and confidence intervals
• Test of Hypothesis
–For mean, variances, and proportions
–Goodness of fit
1
Key Concepts
• Population -- "parameters"
–Finite
–Infinite
• Sample -- "statistics"
• Random samples - Your MOST important decision!
2
Data
• Deterministic vs. Probabilistic (Stochastic)
• Discrete or Continuous:
– Whether a variable is continuous (measured) or
discrete (counted) is a property of the data, not of the
measuring device: weight is a continuous variable,
even if your scale can only measure values to the
pound.
• Data description:
– Category frequency
– Category relative frequency
3
Data Types
• Qualitative (Categorical)
–Nominal -- IE = 1 ; EE = 2 ; CE = 3
–Ordinal -- poor = 1 ; fair = 2 ; good = 3 ; excellent = 4
• Quantitative (Numerical)
–Interval -- temperature, viscosity
–Ratio -- weight, height
• The type of statistics you can calculate depends on the data type. Average, median, and variance make no sense if the data is categorical (proportions do).
4
Data Presentation for Qualitative Data
• Rules:
– Each observation MUST fall in one and only one category.
– All observations must be accounted for.
• Table -- Provides greater detail
• Bar graphs -- Consider Pareto presentation!
• Pie charts (do not need to be round)
5
Data Presentation for Quantitative Data
• Consider a Stem-and-Leaf Display
• Use 5 to 20 classes (intervals, groups).
–Cell width, boundaries, limits, and midpoint
• Histograms
–Discrete
–Continuous (frequency polygon - plot at
class mark)
• Cumulative frequency distribution (Ogive - plot
at upper boundary)
6
Statistics
• Measures of Central Tendency
– Arithmetic Mean
– Median
– Mode
– Weighted mean
• Measures of Variation
– Range
– Variance
– Standard Deviation
• Coefficient of Variation
• The Empirical Rule
7
Arithmetic Mean and Variance -- Raw Data
• Mean
$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$
• Variance
$S^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n-1} = \frac{n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}{n(n-1)}$
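The two forms of the variance formula should give the same value. A minimal sketch checking this on a small made-up data set (the data and function names are illustrative assumptions, not from the slides):

```python
def sample_mean(ys):
    """Arithmetic mean: sum of the observations divided by n."""
    return sum(ys) / len(ys)


def variance_definition(ys):
    """S^2 as the sum of squared deviations from the mean over (n - 1)."""
    m = sample_mean(ys)
    return sum((y - m) ** 2 for y in ys) / (len(ys) - 1)


def variance_shortcut(ys):
    """S^2 via the computational form [n*sum(y^2) - (sum y)^2] / [n(n - 1)]."""
    n = len(ys)
    return (n * sum(y * y for y in ys) - sum(ys) ** 2) / (n * (n - 1))


data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # hypothetical raw data
print(sample_mean(data))
print(variance_definition(data))
print(variance_shortcut(data))
```

The shortcut form avoids a second pass over the data, but it can lose precision when the mean is large relative to the spread.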
8
Arithmetic Mean and Variance -- Grouped Data
• Mean
$\bar{y} = \frac{1}{n}\sum_{i} f_i y_i$
• Variance
$S^2 = \frac{\sum_{i} f_i (y_i - \bar{y})^2}{n-1} = \frac{n\sum_{i} f_i y_i^2 - \left(\sum_{i} f_i y_i\right)^2}{n(n-1)}$
where $n = \sum_{i} f_i$ and $y_i$ = class midpoint
9
Percentiles and Box-Plots
• 100pth percentile: value such that 100p% of the area under the relative frequency distribution lies below it.
– Q1: lower quartile (25th percentile)
– Q3: upper quartile (75th percentile)
• Box-Plots: the box is bounded by the lower and upper quartiles
– IQR = Q3 - Q1 (interquartile range)
– Whiskers mark the lowest and highest values within 1.5*IQR of Q1 or Q3
– Outliers: beyond 1.5*IQR from Q1 or Q3 (mark with *)
– z-scores: deviation from the mean in units of standard deviation. Outlier: absolute value of z-score > 3
10
Probability: Basic Concepts
• Experiment: A process of OBSERVATION
• Simple event - An OUTCOME of an
experiment that can not be decomposed
– “Mutually exclusive”
– “Equally likely”
• Sample Space - The set of all possible
outcomes
• Event “A” - The set of all possible simple
events that result in the outcome “A”
11
Probability
• A measure of uncertainty of an estimate
– The reliability of an inference
• Theoretical approach - “A Priori”
– Pr (Ai) = n/N
• n = number of possible ways “Ai” can be observed
• N = total number of possible outcomes
• Historical (empirical) approach - “A Posteriori”
– Pr (Ai) = n/N
• n = number of times “Ai” was observed
• N = total number of observations
• Subjective approach
– An “Expert Opinion”
12
Probability Rules
$0 \le \Pr(A_i) \le 1$
$\sum_{i} \Pr(A_i) = 1$
• Multiplication Rule:
– Number of ways to draw one element from set 1 which
contains n1 elements, then an element from set 2, ....,
and finally an element from set k (ORDER IS
IMPORTANT!):
n1* n2* ......* nk
13
Permutations and Combinations
• Permutations:
– Number of ways to draw r out of n elements WHEN ORDER IS IMPORTANT:
$P^n_r = \frac{n!}{(n-r)!}$
• Combinations:
– Number of ways to select r out of n items when order is NOT important:
$C^n_r = \frac{n!}{r!\,(n-r)!}$
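As a quick check of the two counting formulas, the sketch below evaluates them from factorials and compares the results with Python's standard-library helpers `math.perm` and `math.comb` (the values n = 5, r = 3 are an arbitrary illustration):

```python
import math

n, r = 5, 3  # hypothetical example: choose 3 items out of 5

# Permutations: order matters, n! / (n - r)!
permutations = math.factorial(n) // math.factorial(n - r)

# Combinations: order does not matter, n! / (r! (n - r)!)
combinations = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(permutations, math.perm(n, r))
print(combinations, math.comb(n, r))
```

Each combination corresponds to r! different orderings, which is why the two counts differ by exactly that factor.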
14
Compound Events
Union
$(A \cup B) = \{x \mid x \in A \text{ or } B \text{ or both}\}$
Intersection
$(A \cap B) = \{x \mid x \in A \text{ and } B\}$
Complement
$(A') = \{x \mid x \notin A\}$
15
Conditional Probability
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ provided $P(B) \ne 0$
Multiplicative Rule:
$P(A \cap B) = P(A \mid B)\,P(B)$ provided $P(B) \ne 0$
16
Other Probability Rules
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
• Mutually Exclusive Events:
$A \cap B = \{\}$, so $P(A \cap B) = 0$
• Independence:
– A and B are said to be statistically INDEPENDENT if and only if:
$P(A \cap B) = P(A)\,P(B)$
17
Bayes’ Rule
$P(A_i \mid E) = \frac{P(A_i)\,P(E \mid A_i)}{\sum_{j} P(A_j)\,P(E \mid A_j)}$
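A minimal sketch of Bayes' rule for a set of mutually exclusive, exhaustive events A_1, ..., A_k. The priors and likelihoods below are invented numbers used only to illustrate the arithmetic:

```python
def bayes_posterior(priors, likelihoods):
    """Return P(A_i | E) for each i, given P(A_i) and P(E | A_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(A_i) P(E | A_i)
    total = sum(joint)                                    # denominator: P(E)
    return [j / total for j in joint]


# Hypothetical example: three suppliers and their defect rates.
priors = [0.5, 0.3, 0.2]          # P(A_i): share of parts from each supplier
likelihoods = [0.01, 0.02, 0.05]  # P(E | A_i): defect rate of each supplier

posteriors = bayes_posterior(priors, likelihoods)
print(posteriors)
```

Here the third supplier provides the fewest parts but becomes the most likely source once a defect is observed, because its defect rate is much higher.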
18
Random Variables
• Random variable: A function that maps every
possible outcome of an experiment into a
numerical value.
• Discrete random variable: The function can assume a finite (or countably infinite) number of values
• Continuous random variable: The function
can assume any value between two limits.
19
Probability Distribution for a Discrete
Random Variable
• Function that assigns a value to the
probability p(y) associated to each possible
value of the random variable y.
$0 \le p(y) \le 1$
$\sum_{\text{all } y} p(y) = 1$
20
Poisson Process
• Events occur over time (or in a given area,
volume, weight, distance, ...)
• Probability of observing an event in a given
unit of time is constant
• Able to define a unit of time small enough so
that we can’t observe two or more events
simultaneously.
• Tables usually give CUMULATIVE values!
21
The Poisson Distribution
$p(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \dots$
x is the number of events observed over T
$\lambda$ is the expected number of events over T
e is the base of natural logs (2.71828)
$\mu = \sigma^2 = \lambda$
22
Poisson Approximation to the Binomial
• In a binomial situation where n is very large (n > 25) and p is very small (p < 0.30, and np < 15), we can approximate b(y, n, p) by a Poisson with parameter $\lambda = np$:
$b(y, n, p) = \binom{n}{y} p^{y} (1-p)^{n-y} \approx P(y, \lambda = np) = \frac{e^{-np}(np)^{y}}{y!}$
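To see how close the approximation is, the sketch below compares the binomial and Poisson probabilities term by term. The values n = 100 and p = 0.02 are an arbitrary illustration that satisfies the conditions stated above:

```python
import math


def binomial_pmf(y, n, p):
    """b(y, n, p) = C(n, y) p^y (1 - p)^(n - y)."""
    return math.comb(n, y) * p**y * (1 - p) ** (n - y)


def poisson_pmf(y, lam):
    """P(y, lambda) = e^(-lambda) lambda^y / y!."""
    return math.exp(-lam) * lam**y / math.factorial(y)


n, p = 100, 0.02  # hypothetical values: n > 25, p < 0.30, np = 2 < 15
lam = n * p

for y in range(6):
    print(y, round(binomial_pmf(y, n, p), 4), round(poisson_pmf(y, lam), 4))

# Largest absolute difference between the two distributions over y = 0..n.
max_gap = max(abs(binomial_pmf(y, n, p) - poisson_pmf(y, lam)) for y in range(n + 1))
print("largest gap:", max_gap)
```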
23
Probability Distribution for a
Continuous Random Variable
• $F(y_0)$ is a cumulative distribution function that assigns a value to the probability of observing a value less than or equal to $y_0$:
$F(y_0) = P(y \le y_0) = \int_{-\infty}^{y_0} f(y)\,dy$
Property: $F(y)$ is continuous over $-\infty < y < \infty$
24
Probability Calculations
$P(a \le y \le b) = \int_{a}^{b} f(y)\,dy$
where $f(y)$ is the density function of y:
$f(y) = \frac{d[F(y)]}{dy}$
$F(y)$ is the (probability) distribution function of y
$f(y) \ge 0 \quad \forall y$
$\int_{-\infty}^{\infty} f(y)\,dy = 1$
$F(y)$ is continuous
$P(y = a) = 0$ for all continuous r.v. (a = constant)
25
Expectations
$\mu = E(y) = \sum_{\text{all } y} y\,p(y)$ (discrete)
$\mu = E(y) = \int_{-\infty}^{\infty} y\,f(y)\,dy$ (continuous)
$E[g(y)] = \int_{-\infty}^{\infty} g(y)\,f(y)\,dy$
Variance $\sigma^2 = E[(y-\mu)^2] = E(y^2) - \mu^2$
Standard deviation $\sigma = \sqrt{\sigma^2}$
Properties of Expectations
$E(c) = c$
$E(cy) = c\,E(y)$
$E[g_1(y) + g_2(y) + \dots + g_k(y)] = E[g_1(y)] + E[g_2(y)] + \dots + E[g_k(y)]$
$\sigma^2(c) = 0$
$\sigma^2(cy) = c^2\,\sigma^2(y)$
26
The Uniform Distribution
$\mu = \frac{a+b}{2}$
$\sigma^2 = \frac{(b-a)^2}{12}$
A frequently used model when no data are available.
27
The Triangular Distribution
A good model to use when no data are available. Just ask an expert
to estimate the minimum, maximum, and most likely values.
28
The Normal Distribution
$z = \frac{y - \mu}{\sigma}$, the standard normal variable
Tables provide cumulative values for the Standard Normal Distribution $N(\mu = 0, \sigma = 1)$
29
The Lognormal Distribution
Consider this model when 80 percent of the data values
lie in the first 20 % of the variable’s range.
30
The Gamma Distribution
Properties:
$\mu = \alpha\beta$
$\sigma^2 = \alpha\beta^2$
31
The Erlang Distribution
A special case of the Gamma Distribution when $\alpha = k$ (an integer)
A Poisson process where we are interested in the time to observe k events
32
The Exponential Distribution
A special case of the Gamma Distribution when $\alpha = 1$
33
The Weibull Distribution
A good model for failure time distributions of manufactured items.
It has a closed expression for F ( y ).
34
The Beta Distribution
A good model for proportions. You can fit almost any data.
However, the data set MUST be bounded!
35
Bivariate Data (Pairs of Random Variables)
• Covariance: measures strength of linear relationship
$\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]\,E[Y]$
• Correlation: a standardized version of the covariance
$\rho = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$
• Autocorrelation: For a single time series: Relationship between an observation and those immediately preceding it. Does current value ($X_t$) relate to itself lagged one period ($X_{t-1}$)?
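The sketch below computes sample versions of these three quantities. Two choices here are assumptions rather than something the slides specify: the sample covariance divides by n - 1, and the lag-one autocorrelation is estimated simply as the correlation between the series and itself shifted by one period. The data are made up.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance (divisor n - 1, an assumption of this sketch)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Covariance standardized by the two standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


def lag1_autocorrelation(series):
    """Correlation between X_t and X_(t-1), using the overlapping pairs."""
    return correlation(series[1:], series[:-1])


xs = [1.0, 2.0, 3.0, 4.0, 5.0]            # hypothetical data
ys = [2.0, 4.0, 6.0, 8.0, 10.0]           # exactly linear in xs
series = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0]   # hypothetical time series

print(covariance(xs, ys))
print(correlation(xs, ys))
print(lag1_autocorrelation(series))
```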
36
Sampling Distributions
The population has PARAMETERS: $\mu$, $\sigma^2$
A sample yields STATISTICS: $\bar{X}$, $S^2$
A statistic is calculated based on the values observed in a sample. Those values are random variables. Therefore, a statistic is a RANDOM VARIABLE.
The sampling distribution of a statistic is its probability distribution.
The STANDARD ERROR of a statistic is the standard deviation of its sampling distribution.
See slides 8 and 9 for formulas to calculate sample means and variances (raw data and grouped data, respectively).
37
The Sampling Distribution of the
Mean (Central Limit Theorem)
The CENTRAL LIMIT THEOREM: If random samples of size n are taken from a population having ANY distribution with mean $\mu$ and standard deviation $\sigma$, then, when n is large enough, the sampling distribution of the mean can be approximated by a normal density with mean $\mu_{\bar{Y}} = \mu$ and standard deviation $\sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}}$
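A small simulation can make the theorem concrete. The sketch below draws repeated samples from an exponential population (chosen here only because it is clearly non-normal; the sample size, number of repetitions, and seed are arbitrary assumptions) and compares the spread of the sample means with the predicted standard error.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 40       # size of each random sample
reps = 5000  # number of samples drawn

# Population: exponential with rate 1, so mu = 1 and sigma = 1.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

observed_mean = statistics.fmean(sample_means)
observed_sd = statistics.stdev(sample_means)
predicted_sd = 1.0 / math.sqrt(n)  # sigma / sqrt(n)

print(f"mean of sample means: {observed_mean:.3f} (theory: 1.000)")
print(f"sd of sample means:   {observed_sd:.3f} (theory: {predicted_sd:.3f})")
```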
38
The Sampling Distribution of Sums
Let $L = a_1 y_1 + a_2 y_2 + \dots + a_k y_k$
Assume $E(y_i) = \mu_i$, $\mathrm{Var}(y_i) = \sigma_i^2$, $\mathrm{Cov}(y_i, y_j) = \sigma_{ij}$
Then L has the mean and variance below (and possesses a normal density when the $y_i$ are jointly normal):
$E(L) = a_1\mu_1 + a_2\mu_2 + \dots + a_k\mu_k$
$\mathrm{Var}(L) = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \dots + a_k^2\sigma_k^2 + 2a_1a_2\sigma_{12} + 2a_1a_3\sigma_{13} + \dots + 2a_{k-1}a_k\sigma_{k-1,k}$
39
Distributions Related to Variances
For a sample with standard deviation S, the statistic
$\chi^2 = \frac{(n-1)S^2}{\sigma^2}$
follows a Chi-square distr. with $\nu = n - 1$.
For two independent samples, the statistic
$F = \frac{\chi_1^2/\nu_1}{\chi_2^2/\nu_2}$
follows an F-distribution with parameters $\nu_1$ in the numerator and $\nu_2$ in the denominator.
The sum of two independent chi-squares follows a chi-square distribution with $\nu = \nu_1 + \nu_2$
40
The t Distribution
Let z be a standard normal variable and $\chi^2$ be a chi-square random variable with $\nu$ degrees of freedom. If z and $\chi^2$ are independent, then
$t = \frac{z}{\sqrt{\chi^2/\nu}}$
is said to possess a Student's distribution ("t-distribution") with $\nu$ df.
COROLLARY: For a random sample taken from a normal population,
$t = \frac{\bar{y} - \mu}{S/\sqrt{n}}$
follows a t-distribution with $\nu = n - 1$ df.
41
Estimation
• Point and Interval Estimators
• Properties of Point Estimators
– Unbiased: E (estimator) = estimated parameter
Note: $S^2$ is unbiased if $E(\bar{Y}) = \mu$
– MVUE: Minimum Variance Unbiased Estimators
• Most frequently used method to estimate
parameters: MLE - Maximum Likelihood
Estimators.
42
Interval Estimators -- Large
sample CI for mean
From the Central Limit Theorem:
$\text{Prob}\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = 1 - \alpha$
After some algebraic manipulation we get:
$\text{Prob}\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
The $(1 - \alpha) \cdot 100\%$ Confidence Interval for $\mu$
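A minimal sketch of the large-sample interval using only the Python standard library. Two assumptions are made here: the unknown sigma is replaced by the sample standard deviation S, which is the usual practice for large n, and the data set is invented for illustration.

```python
import math
from statistics import NormalDist, fmean, stdev


def large_sample_ci(sample, confidence=0.95):
    """Return (lower, upper) limits of the large-sample CI for the mean."""
    n = len(sample)
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_(alpha/2)
    half_width = z * stdev(sample) / math.sqrt(n)  # S used in place of sigma
    center = fmean(sample)
    return center - half_width, center + half_width


# Hypothetical sample of 49 observations taking the values 10..16.
sample = [10 + (i % 7) for i in range(49)]
lower, upper = large_sample_ci(sample)
print(lower, upper)
```

Raising the confidence level widens the interval; increasing n narrows it in proportion to the square root of n.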
43
Interval Estimators -- Small
sample CI for mean
For small samples (n < 30):
$\text{Prob}\left(-t_{\alpha/2} \le \frac{\bar{X} - \mu}{S/\sqrt{n}} \le t_{\alpha/2}\right) = 1 - \alpha$
After some algebraic manipulation we get:
$\text{Prob}\left(\bar{X} - t_{\alpha/2}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2}\frac{S}{\sqrt{n}}\right) = 1 - \alpha$
The $(1 - \alpha) \cdot 100\%$ Confidence Interval for $\mu$ (small samples)
44
Sample Size
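Based on CI for the mean:
$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 \approx \left(\frac{z_{\alpha/2}\,S}{E}\right)^2$
where E (the symbol used here) is the desired half-width of the interval.
Recommendation:
Sample approximately 30
Estimate $\sigma^2$ using $S^2$
Estimate n
Take more observations as needed.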
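The recommendation can be sketched as a small calculation: take a pilot sample, estimate the standard deviation, and solve for n. The function name, the pilot standard deviation, and the target half-width below are illustrative assumptions.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(s, half_width, confidence=0.95):
    """Smallest n whose CI half-width z * s / sqrt(n) is at most half_width."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * s / half_width) ** 2)


# Hypothetical pilot study: S = 12 from about 30 observations;
# we want the mean to within +/- 2 units at 95% confidence.
required_n = sample_size_for_mean(s=12.0, half_width=2.0)
print(required_n)
```

If the required n exceeds the pilot sample size, the remaining observations are collected and S can be re-estimated from the full sample.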
45
CI for proportions (large samples)
The distribution of a proportion is approximately normal with mean $\mu = p$ and variance $\sigma^2 = \frac{p(1-p)}{n}$
Then, the C. I. for the population proportion is:
$p = \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
where $\hat{p} = \frac{y}{n}$ is the observed proportion of successes
Assumption: The interval does not contain 0 or 1.
46
Sample Size (proportions)
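Based on CI for a proportion:
$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 \hat{p}\,(1-\hat{p})$
where E (the symbol used here) is the desired half-width of the interval.
Recommendation:
Sample approximately 30
Estimate $\hat{p}$
Estimate n
Take more observations as needed.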
47
CI for the variance
The statistic:
$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2$, a Chi-Square distr. with $\nu = n - 1$
After some algebraic manipulation:
$\text{Prob}\left[\frac{(n-1)S^2}{\chi^2_{\alpha/2,\,\nu}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{(1-\alpha/2),\,\nu}}\right] = 1 - \alpha$
Assumption: Population is approximately normal.
48
CI for the Difference of Two Means
-- large samples --
The difference of two means follows a normal density with:
$E(\bar{Y}_1 - \bar{Y}_2) = \mu_1 - \mu_2$ and $\mathrm{Var}(\bar{Y}_1 - \bar{Y}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
C.I. for $\mu_1 - \mu_2$:
$(\bar{Y}_1 - \bar{Y}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx (\bar{Y}_1 - \bar{Y}_2) \pm z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$
Assumptions: Independent samples with more than 30 observations each.
49
CI for (p1 - p2) --- (large samples)
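For large samples ($n_1$ and $n_2 \ge 30$):
$(\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\,\sigma_{(\hat{p}_1 - \hat{p}_2)} \le (p_1 - p_2) \le (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\,\sigma_{(\hat{p}_1 - \hat{p}_2)}$
where $\sigma_{(\hat{p}_1 - \hat{p}_2)} \approx \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$ and $\hat{q}_i = 1 - \hat{p}_i$
Approximation is good as long as neither interval includes 0 or 1.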
50
CI for the Difference of Two Means
-- small samples, same variance --
C.I. for $\mu_1 - \mu_2$ = $(\bar{Y}_1 - \bar{Y}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\; S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
where $S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$ ("pooled variance")
Assumptions:
1. Independent samples taken from normal populations.
2. Variances are unknown but equal ($\sigma_1^2 = \sigma_2^2 = \sigma^2$)
51
CI for the Difference of Two Means
-- small samples, different variances --
C.I. for $\mu_1 - \mu_2$ = $(\bar{Y}_1 - \bar{Y}_2) \pm t_{\alpha/2,\,\nu}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$
and $\nu = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{\left(S_1^2/n_1\right)^2}{n_1-1} + \frac{\left(S_2^2/n_2\right)^2}{n_2-1}}$ (round down)
Assumptions: Independent samples taken from normal populations.
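The degrees-of-freedom formula is easy to get wrong by hand, so a short sketch may help. The function name and the sample variances and sizes below are illustrative assumptions.

```python
import math


def approx_df(s1_sq, n1, s2_sq, n2):
    """Approximate degrees of freedom for unequal variances, rounded down."""
    a = s1_sq / n1  # S1^2 / n1
    b = s2_sq / n2  # S2^2 / n2
    nu = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return math.floor(nu)


# Hypothetical samples: S1^2 = 4 with n1 = 10, S2^2 = 9 with n2 = 12.
nu = approx_df(4.0, 10, 9.0, 12)
standard_error = math.sqrt(4.0 / 10 + 9.0 / 12)
print(nu, standard_error)
```

The resulting value of nu is then used to look up the t critical value in a table; it never exceeds n1 + n2 - 2, the value used when the variances are assumed equal.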
52
CI for the Difference of Two Means
-- matched pairs --
We have PAIRS of observations related through some common factor ($Y_{1i}$, $Y_{2i}$):
Let $d_i = Y_{1i} - Y_{2i}$ be the observed difference for pair i
C.I. for $\mu_d$ = $\bar{d} \pm t_{\alpha/2,\,n-1}\frac{S_d}{\sqrt{n}}$
where $\bar{d}$ and $S_d$ are the mean and the standard deviation of the n sample differences.
Assumptions: Random observations; the population of paired differences is normally distributed.
53
CI for two variances
Recall:
$F = \frac{\chi_1^2/\nu_1}{\chi_2^2/\nu_2} = \frac{\left[(n_1-1)S_1^2/\sigma_1^2\right]/(n_1-1)}{\left[(n_2-1)S_2^2/\sigma_2^2\right]/(n_2-1)} = \frac{S_1^2}{S_2^2}\cdot\frac{\sigma_2^2}{\sigma_1^2} \sim F_{n_1-1,\,n_2-1}$
After some algebraic manipulation:
$\text{Prob}\left[\frac{S_1^2}{S_2^2}\cdot\frac{1}{F_{n_1-1,\,n_2-1,\,\alpha/2}} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{S_1^2}{S_2^2}\cdot\frac{1}{F_{n_1-1,\,n_2-1,\,(1-\alpha/2)}}\right] = 1 - \alpha$
Assumption: Independent samples from normal populations.
54
Prediction Intervals
Consider the prediction of the value for the NEXT observation (not the mean value but its actual value), e.g., we want a "confidence interval" for $y_{n+1}$.
Consider the difference between this observation and the sample mean:
$E(y_{n+1} - \bar{y}) = E(y_{n+1}) - E(\bar{y}) = \mu - \mu = 0$
$\sigma^2(y_{n+1} - \bar{y}) = \sigma^2(y_{n+1}) + \sigma^2(\bar{y}) = \sigma^2 + \frac{\sigma^2}{n} = \sigma^2\left(1 + \frac{1}{n}\right)$
If the distribution of y is approximately normal, this difference will also be normal.
This yields the following "prediction interval" for the next observation, $y_{n+1}$:
$\Pr\left[\bar{y} - t_{\alpha/2,\,n-1}\,S\sqrt{1 + \frac{1}{n}} \le y_{n+1} \le \bar{y} + t_{\alpha/2,\,n-1}\,S\sqrt{1 + \frac{1}{n}}\right] = 1 - \alpha$
55
Hypothesis Testing
• Elements of a Statistical Test. Focus on decisions
made when comparing the observed sample to a
claim (hypotheses). How do we decide whether the
sample disagrees with the hypothesis?
• Null Hypothesis, H0: A claim about one or more population parameters. What we want to REJECT.
• Alternative Hypothesis, Ha: What we test against. Provides criteria for rejection of H0.
• Test Statistic: computed from sample data.
• Rejection (Critical) Region, indicates values of
the test statistic for which we will reject H0.
56
Errors in Decision Making
                True State of Nature
Decision        H0: Dishonest client    Ha: Honest client
Do not lend     Correct decision        Type II error
Lend            Type I error            Correct decision
57
Statistical Errors
Type I error ($\alpha$): Rejecting a true Null Hypothesis (producer's risk)
Type II error ($\beta$): Rejecting a true Alternative Hypothesis, i.e., failing to reject a false Null Hypothesis (consumer's risk)
Power of a statistical test, $(1 - \beta)$, is the probability of rejecting the null hypothesis H0 when, in fact, H0 is false.
58
Statistical Tests
One-tailed tests:
H0: $\mu = \mu_0$
Ha: $\mu > \mu_0$ (or $\mu < \mu_0$)
Rejection region: $z > z_\alpha$ (or $z < -z_\alpha$)
Two-tailed tests:
H0: $\mu = \mu_0$
Ha: $\mu \ne \mu_0$
Rejection region: $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
where $z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ and $P(z > z_\alpha) = \alpha$
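The rejection regions above can be sketched as a small function. The function name, its `tail` argument, and the numbers in the example are assumptions made for illustration; critical values come from the standard library's `statistics.NormalDist` rather than a printed table.

```python
import math
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """Return (z, reject) for H0: mu = mu0.

    tail: "upper" for Ha: mu > mu0, "lower" for Ha: mu < mu0,
          "two" for Ha: mu != mu0.
    """
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()
    if tail == "upper":
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif tail == "lower":
        reject = z < -std_normal.inv_cdf(1 - alpha)
    else:
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    return z, reject


# Hypothetical data: sample mean 52 from n = 36, testing mu0 = 50 with sigma = 6.
z, reject = z_test(sample_mean=52.0, mu0=50.0, sigma=6.0, n=36)
print(z, reject)
```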
59
The Critical Value
The sample size for specified $\alpha$ and $\beta$ when testing H0: $\mu = \mu_0$ versus Ha: $\mu = \mu_a$ is given by
$n = \frac{(z_\alpha + z_\beta)^2\,\sigma^2}{(\mu_0 - \mu_a)^2}$
Assumption: $\sigma$ is the same under both hypotheses.
60
The observed significance level for a test
It is standard in industry to use $\alpha = 0.05$.
Some researchers prefer to report the observed "p-value". This is the probability (under H0) of observing a value of the test statistic at least as extreme as the one actually observed. This allows the reader to make his (her) own decision about accepting or rejecting H0.
Most computer packages report the significance as (for example) Prob > |T|
61
Testing proportions (large samples)
H0: $p = p_0$
test statistic: $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$
where $\hat{p} = \frac{y}{n}$ is the observed proportion of successes
Rejection region (example): $z > z_\alpha$ (Ha: $p > p_0$)
Assumption: The interval $\hat{p} \pm 2\sqrt{\hat{p}(1-\hat{p})/n}$ does not contain 0 or 1.
62
Testing a Normal Mean
Select $\alpha$. Set your test as one-tailed or two-tailed.
Calculate test statistic: $z = \frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}} \approx \frac{\bar{y} - \mu_0}{S/\sqrt{n}}$
Compare to the critical value (from book's table).
If sample is small (n < 30):
Calculate test statistic: $t = \frac{\bar{y} - \mu_0}{S/\sqrt{n}}$
(assumes an approximately normal population)
63
Testing a variance
H0: $\sigma^2 = \sigma_0^2$
test statistic: $\chi^2 = \frac{(n-1)S^2}{\sigma_0^2}$
Rejection region:
$\chi^2 > \chi^2_{\alpha}$ for Ha: $\sigma^2 > \sigma_0^2$
$\chi^2 < \chi^2_{1-\alpha}$ for Ha: $\sigma^2 < \sigma_0^2$
$\chi^2 < \chi^2_{1-\alpha/2}$ or $\chi^2 > \chi^2_{\alpha/2}$ for Ha: $\sigma^2 \ne \sigma_0^2$
Assumption: Population is approximately normal.
64
Testing Differences of Two Means
-- large samples --
H0: $\mu_1 - \mu_2 = D_0$
test statistic: $z = \frac{(\bar{Y}_1 - \bar{Y}_2) - D_0}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$
Rejection region:
$z > z_\alpha$ if Ha: $\mu_1 - \mu_2 > D_0$
$z < -z_\alpha$ if Ha: $\mu_1 - \mu_2 < D_0$
$z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$ if Ha: $\mu_1 - \mu_2 \ne D_0$
Assumptions: Independent samples with more than 30 observations each.
65
Testing Differences of Two Means
-- small samples, same variance --
H0: $\mu_1 - \mu_2 = D_0$
test statistic: $t = \frac{(\bar{Y}_1 - \bar{Y}_2) - D_0}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
Rejection region (example): $t > t_{\alpha,\,n_1+n_2-2}$ (Ha: $\mu_1 - \mu_2 > D_0$)
where $S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$ ("pooled variance")
Assumptions: 1. Indep. samples from normal populations.
2. Variances are unknown but equal ($\sigma_1^2 = \sigma_2^2 = \sigma^2$)
66
Testing Differences of Two Means
-- small samples, different variances --
H0: $\mu_1 - \mu_2 = D_0$
test statistic: $t = \frac{(\bar{Y}_1 - \bar{Y}_2) - D_0}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$
Rejection region (example): $t > t_{\alpha,\,\nu}$ (Ha: $\mu_1 - \mu_2 > D_0$)
where $\nu = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{\left(S_1^2/n_1\right)^2}{n_1-1} + \frac{\left(S_2^2/n_2\right)^2}{n_2-1}}$ (round down)
Assumptions: Independent samples taken from approximately normal populations.
67
Testing Difference of Two Means
-- matched pairs --
We have PAIRS of observations related through some common factor ($Y_{1i}$, $Y_{2i}$):
Let $d_i = Y_{1i} - Y_{2i}$ be the observed difference for pair i
H0: $\mu_{diff} = \mu_1 - \mu_2 = D_0$
test statistic: $t = \frac{\bar{d} - D_0}{S_d/\sqrt{n}}$
Rejection region: $t > t_{\alpha,\,n-1}$ for Ha: $\mu_{diff} = \mu_1 - \mu_2 > D_0$
where $\bar{d}$ and $S_d$ are the mean and the standard deviation of the n sample differences.
Assumptions: Random observations; the population of paired differences is normally distributed.
68
Testing a ratio of two variances
H0: $\frac{\sigma_1^2}{\sigma_2^2} = 1$ (e.g., $\sigma_1^2 = \sigma_2^2$)
test statistic: $F = \frac{\text{larger sample variance}}{\text{smaller sample variance}}$
Rejection region:
$F > F_{\alpha}$ for Ha: $\sigma_1^2 > \sigma_2^2$
$F > F_{\alpha/2}$ for Ha: $\sigma_1^2 \ne \sigma_2^2$
Assumption: Independent samples from normal populations.
Note: Make sure the df in the numerator are those of the sample with larger variance!
69
Testing (p1 - p2) --- (large samples)
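For large samples ($n_1$ and $n_2 \ge 30$):
H0: $p_1 - p_2 = D_0$
test statistic: $z = \frac{(\hat{p}_1 - \hat{p}_2) - D_0}{\sigma_{(\hat{p}_1 - \hat{p}_2)}}$
when $D_0 = 0$: $\sigma_{(\hat{p}_1 - \hat{p}_2)} \approx \sqrt{\hat{p}\,\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$ and $\hat{p} = \frac{y_1 + y_2}{n_1 + n_2}$
when $D_0 \ne 0$: $\sigma_{(\hat{p}_1 - \hat{p}_2)} \approx \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$
Approximation is good as long as no interval includes 0 or 1.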
70
Categorical Data
One-way Table: Categories and their frequencies:
Categ.    1     2     ..    k     Total
Freq.     n1    n2    ..    nk    n
Large sample conf. int. for $p_i$: $\hat{p}_i \pm z_{\alpha/2}\sqrt{\frac{1}{n}\,\hat{p}_i(1-\hat{p}_i)}$
Example:
Categ.    EE    ME    Others    Total
Freq.     17    11    9         37
Then $p_{EE} = \frac{17}{37} \pm 1.96\sqrt{\frac{1}{37}\left(\frac{17}{37}\right)\left(\frac{20}{37}\right)} = 0.46 \pm 0.16$
$0.30 \le p_{EE} \le 0.62$
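The interval for the EE category can be reproduced with a few lines of code. The counts come from the example; the function and variable names are illustrative, and the critical value is computed with `statistics.NormalDist` instead of being read from a table.

```python
import math
from statistics import NormalDist

counts = {"EE": 17, "ME": 11, "Others": 9}  # frequencies from the example
n = sum(counts.values())                    # 37
z = NormalDist().inv_cdf(0.975)             # z_(alpha/2) for a 95% interval


def proportion_ci(count):
    """Large-sample CI for one category proportion in a one-way table."""
    p_hat = count / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


lower_ee, upper_ee = proportion_ci(counts["EE"])
print(round(lower_ee, 2), round(upper_ee, 2))
```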
71
One-way Tables (Cont.)
Large sample $(1 - \alpha)\,100\%$ Conf. Int. for $p_i - p_j$:
$(p_i - p_j) = (\hat{p}_i - \hat{p}_j) \pm z_{\alpha/2}\sqrt{\frac{1}{n}\left[\hat{p}_i(1-\hat{p}_i) + \hat{p}_j(1-\hat{p}_j) + 2\,\hat{p}_i\hat{p}_j\right]}$
In the example:
$p_{EE} - p_{ME} = \left(\frac{17}{37} - \frac{11}{37}\right) \pm 1.96\sqrt{\frac{1}{37}\left[\left(\frac{17}{37}\right)\left(\frac{20}{37}\right) + \left(\frac{11}{37}\right)\left(\frac{26}{37}\right) + 2\left(\frac{17}{37}\right)\left(\frac{11}{37}\right)\right]} = 0.162 \pm 0.275$
$-0.113 \le p_{EE} - p_{ME} \le 0.437$
NOT significant!
NOTE: $-0.045 \le p_{EE} - p_{Others} \le 0.477$
again, difference is NOT significant!
72
Categorical Data Analysis
General r x c Contingency Table
          1         2         ..    c         Totals
1         n(1,1)    n(1,2)    ..    n(1,c)    r(1)
2         n(2,1)    n(2,2)    ..    n(2,c)    r(2)
..        ..        ..        ..    ..        ..
r         n(r,1)    n(r,2)    ..    n(r,c)    r(r)
Totals    c(1)      c(2)      ..    c(c)      n
73
Example of a Contingency Table
STA 3032 - Summer 1994
Grade     Q2    Q4    Q6    Total
0-2       13    0     2     15
2.1-4     6     1     1     8
4.1-6     8     5     11    24
6.1-8     4     7     9     20
8.1-10    2     16    6     24
Total     33    29    29    91
74
Testing for Independence
H0: Variables are independent
Ha: They are not
Test statistic: $\chi^2 = \sum_{j=1}^{c}\sum_{i=1}^{r}\frac{\left[n_{ij} - E(n_{ij})\right]^2}{E(n_{ij})} = n\left[\sum_{j=1}^{c}\sum_{i=1}^{r}\frac{n_{ij}^2}{r_i c_j} - 1\right]$
where $E(n_{ij}) = \frac{r_i c_j}{n}$
Rejection region: $\chi^2 > \chi^2_{0.05,\,(r-1)(c-1)}$
Note: regroup rows (columns) as needed for $E(n_{ij}) \ge 5 \;\; \forall\, i, j$.
In the example: $\chi^2 = 91\left[\frac{19^2}{23 \cdot 33} + \frac{1^2}{23 \cdot 29} + \dots + \frac{6^2}{24 \cdot 29} - 1\right] = 41.33$
Note regrouping! Compare to $\chi^2_{0.05,\,6} = 12.5916$ (from Table)
Conclusion: Variables are NOT independent.
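The example statistic can be checked numerically. In the sketch below the first two grade rows of the example table (0-2 and 2.1-4) are merged into one row, which is the regrouping the calculation implies so that every expected count is at least 5. The function name is an illustrative assumption.

```python
def chi_square_independence(table):
    """Return (chi2, df, expected) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected


# Columns: Q2, Q4, Q6. Rows: grades 0-4 (merged), 4.1-6, 6.1-8, 8.1-10.
table = [
    [19, 1, 3],
    [8, 5, 11],
    [4, 7, 9],
    [2, 16, 6],
]

chi2, df, expected = chi_square_independence(table)
print(round(chi2, 2), df)
print(min(min(row) for row in expected))  # smallest expected count
```

The statistic far exceeds the tabled critical value of 12.5916 for 6 degrees of freedom, which matches the conclusion on the slide.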
75
Distributions: Model Fitting Steps
1. Collect data. Make sure you have a random sample. You will need at least 30 valid cases
2. Plot data. Look for familiar patterns
3. Hypothesize several models for distribution
4. Using part of the data, estimate model parameters
5. Using the rest of the data, analyze the model's accuracy
6. Select the "best" model and implement it
7. Keep track of model accuracy over time. If warranted, go back to 6 (or to 3, if data (population?) behavior keeps changing)
76
Chi-Square Test of Goodness of Fit
H0: $p_1 = p_{10}$; $p_2 = p_{20}$; .... ; $p_k = p_{k0}$ with $\sum_i p_i = \sum_i p_{i0} = 1$
Ha: At least one $p_i \ne p_{i0}$
Let n = sample size and $\hat{p}_i = \frac{n_i}{n}$ the observed relative frequency in cell i, where $n_i$ is the observed count in cell i
Make sure that $e_i = n p_{i0} \ge 5 \;\; \forall\, i$ (if not, regroup cells as needed).
Test Statistic: $\chi^2 = \sum_{i=1}^{k}\frac{(n_i - e_i)^2}{e_i} = \sum_{i=1}^{k}\frac{(n_i - n p_{i0})^2}{n p_{i0}}$
Rejection Region: $\chi^2 > \chi^2_{\alpha,\,\nu}$
where: $\nu = k - r - 1$
k = number of cells after regrouping
r = number of parameters estimated from data to calculate $p_{i0}$
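A minimal sketch of the goodness-of-fit statistic and its degrees of freedom. The die-rolling data are invented for illustration, and the function name and the `params_estimated` argument are assumptions of this sketch; the critical value itself would still come from a chi-square table.

```python
def chi_square_gof(observed, hypothesized_probs, params_estimated=0):
    """Return (chi2, df) for observed cell counts against hypothesized p_i0."""
    n = sum(observed)
    expected = [n * p for p in hypothesized_probs]  # e_i = n * p_i0
    if min(expected) < 5:
        raise ValueError("regroup cells so that every expected count is >= 5")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - params_estimated - 1       # nu = k - r - 1
    return chi2, df


# Hypothetical example: 120 rolls of a die, testing whether it is fair.
observed = [25, 17, 15, 23, 24, 16]
fair = [1 / 6] * 6

chi2, df = chi_square_gof(observed, fair)
print(chi2, df)
```

No parameters were estimated from the data here, so r = 0 and the statistic is compared with a chi-square critical value on 5 degrees of freedom.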
77
Kolmogorov-Smirnov Test of Goodness of Fit
Compares the empirical distribution function $F_n(y)$ with a hypothesized theoretical distribution function $F(y)$.
Empirical: $F_n(y)$ = fraction of the sample less than or equal to y
$= \frac{i}{n}$ for the ith ranked observation $y_{(i)}$
Let
$D^+ = \max_i\left[\frac{i}{n} - F(y_{(i)})\right]$
$D^- = \max_i\left[F(y_{(i)}) - \frac{i-1}{n}\right]$
Then $D = \max\left|F_n(y) - F(y)\right| = \max(D^+, D^-)$
Critical values given in tables
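The statistic D can be computed directly from the ranked sample. The sketch below tests a tiny made-up sample against a Uniform(0, 1) model; both the data and the choice of model are illustrative assumptions, and the critical value would still be read from a table.

```python
def ks_statistic(sample, cdf):
    """Return D = max(D+, D-) for a sample against a hypothesized CDF."""
    ys = sorted(sample)
    n = len(ys)
    # enumerate() starts at 0, so rank i corresponds to index i - 1.
    d_plus = max((idx + 1) / n - cdf(y) for idx, y in enumerate(ys))
    d_minus = max(cdf(y) - idx / n for idx, y in enumerate(ys))
    return max(d_plus, d_minus)


def uniform_cdf(y):
    """CDF of the Uniform(0, 1) distribution."""
    return min(max(y, 0.0), 1.0)


sample = [0.7, 0.1, 0.4]  # hypothetical observations
d = ks_statistic(sample, uniform_cdf)
print(d)
```

Both D+ and D- are needed because the empirical distribution function jumps at each observation, so the largest gap can occur just before or just after a jump.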
78
A Review of Probability and Statistics
• Descriptive statistics
• Probability
• Random variables
• Sampling distributions
• Estimation and confidence intervals
• Test of Hypothesis
–For mean, variances, and proportions
–Goodness of fit
79