Validity and application of some continuous distributions
Dr. Md. Monsur Rahman
Professor
Department of Statistics
University of Rajshahi
Rajshahi – 6205
E-mail: [email protected]
Normal distribution
The first discoverer of the normal probability function
was Abraham De Moivre (1667-1754), who, in 1733,
derived the distribution as the limiting form of the
binomial distribution. The same formula was later derived
by Karl Friedrich Gauss (1777-1855) in connection with
his work on evaluating errors of observation in astronomy.
This is why the normal distribution is often referred to as
the Gaussian distribution.
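X: Normal variate
Density:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right], \quad -\infty < x < \infty,\ -\infty < \mu < \infty,\ \sigma > 0$
$E(X) = \mu, \quad \mathrm{Var}(X) = \sigma^2$
Standard normal variate:
$Z = \frac{X-\mu}{\sigma}, \quad X = \mu + \sigma Z$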
[Figure: Normal distribution curve]
Properties of Normal distribution
• The normal probability curve is symmetrical about the ordinate at $x = \mu$.
• Mean, median and mode of the distribution are equal, and each of these is $\mu$.
• The curve has its points of inflection at $x = \mu \pm \sigma$. By a point of inflection, we mean a point at which the concavity changes.
• All odd-order moments of the distribution about the mean vanish.
• The values of $\beta_1$ and $\beta_2$ are 0 and 3, respectively.
• $\mu \pm \sigma$ includes about 68.27% of the population
• $\mu \pm 2\sigma$ includes about 95.45% of the population
• $\mu \pm 3\sigma$ includes about 99.73% of the population
Application:
Many biological characteristics conform to a Normal
distribution - for example, heights of adult men and
women, blood pressures in a healthy population,
RBS levels in blood etc.
Validity of Normal Distribution for a set of data
Many statistical methods can only be used if the
observations follow a Normal Distribution. There
are several ways of investigating whether observations
follow a Normal distribution. With a large sample we
can inspect a histogram to see whether it looks like
a Normal distribution curve. This does not work well
with a small sample, and a more reliable method is
the normal plot which is described below.
CDF of X: $F(x)$
CDF of Z: $\Phi(z)$, with $\Phi(-z) = 1 - \Phi(z)$
p-quantile of X: $X_p$, the solution of $F(X_p) = p$
p-quantile of Z: $Z_p$, the solution of $\Phi(Z_p) = p$
$Z_p = \frac{X_p - \mu}{\sigma}, \quad X_p = \mu + \sigma Z_p$
Dataset: $x_1, x_2, \ldots, x_n$
• Find the empirical CDF values
• Arrange the data in ascending order as $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$
• The empirical CDF values are as follows:
$F(x_{(i)}) = \frac{i - 0.5}{n}, \quad i = 1, 2, \ldots, n$
• Using the normal table, obtain the $z_{(i)}$ values corresponding to $F(x_{(i)})$
• If the given set of observations follows a normal distribution, the plot of (x, z) should be roughly a straight line; the line $z = \frac{x-\mu}{\sigma}$ passes through the point $(\mu, 0)$ and has slope $1/\sigma$.
• Graphical estimates of $\mu$ and $\sigma$ may be obtained from the line.
• If the data do not come from a normal distribution, we will get a curve of some sort.
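The normal plot construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `normal_plot_points` and `fit_line` are hypothetical, the straight line is fitted by ordinary least squares (the slides do not say how the line is drawn), and the data are the 20 RBS values listed in Table 1.

```python
from statistics import NormalDist, mean

def normal_plot_points(data):
    """Return ordered data x_(i) and the z_(i) matching F(x_(i)) = (i - 0.5)/n."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    zs = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return xs, zs

def fit_line(xs, zs):
    """Least-squares line z = a + b*x (an assumed way of drawing the line)."""
    x_bar, z_bar = mean(xs), mean(zs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxz = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
    b = sxz / sxx
    return z_bar - b * x_bar, b

# RBS levels (mmol/L) from Table 1
rbs = [2.2, 3.3, 3.3, 3.4, 3.6, 3.6, 3.7, 3.8, 3.8, 3.8,
       3.9, 4.0, 4.1, 4.1, 4.2, 4.4, 4.7, 4.7, 4.8, 5.0]

xs, zs = normal_plot_points(rbs)
a, b = fit_line(xs, zs)
sigma_graph = 1 / b    # the slope of the line is 1/sigma
mu_graph = -a / b      # the line crosses z = 0 at x = mu
print(f"graphical estimates: mu = {mu_graph:.3f}, sigma = {sigma_graph:.3f}")
```

Plotting `zs` against `xs` with any plotting tool gives the normal plot itself; a roughly straight pattern supports normality.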
Table 1: RBS levels (mmol/L) measured in the blood
of 20 medical students. Data of Bland (1995), p. 66
2.2 3.3 3.3 3.4 3.6 3.6 3.7 3.8 3.8 3.8
3.9 4.0 4.1 4.1 4.2 4.4 4.7 4.7 4.8 5.0
Bland, M. (1995): An Introduction to Medical Statistics,
Second edition, ELBS with Oxford University Press.
MLE
$\hat{\mu} = 3.92$ mmol/L
$\hat{\sigma} = 0.642$ mmol/L
• Goodness-of-fit test
• We use the Kolmogorov-Smirnov (KS) test for the given data
• KS statistic = max |CDF_FIT − CDF_EMP|
• For the RBS level data, the calculated KS statistic is KS(cal) = 0.07827
• 5% tabulated value = 0.294
• Conclusion: since KS(cal) is smaller than the tabulated value, the normal distribution fits the given data well
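A minimal sketch of the KS calculation for the RBS data follows. It assumes the empirical CDF convention (i − 0.5)/n used for the normal plot, and it estimates σ with the n − 1 divisor, which is the version that reproduces the reported 0.642. The slides do not state which empirical CDF convention was used for the test, so the statistic computed here may differ from the reported KS(cal).

```python
from statistics import NormalDist, mean, stdev

# RBS levels (mmol/L) from Table 1
rbs = [2.2, 3.3, 3.3, 3.4, 3.6, 3.6, 3.7, 3.8, 3.8, 3.8,
       3.9, 4.0, 4.1, 4.1, 4.2, 4.4, 4.7, 4.7, 4.8, 5.0]

mu_hat = mean(rbs)
sigma_hat = stdev(rbs)  # n - 1 divisor (assumption: matches the reported 0.642)
fitted = NormalDist(mu_hat, sigma_hat)

xs = sorted(rbs)
n = len(xs)
# max |CDF_FIT - CDF_EMP| with CDF_EMP = (i - 0.5)/n (assumed convention)
ks_stat = max(abs(fitted.cdf(x) - (i - 0.5) / n)
              for i, x in enumerate(xs, start=1))

critical_5pct = 0.294  # tabulated 5% value quoted in the slides for n = 20
fit_is_acceptable = ks_stat < critical_5pct
print(f"mu = {mu_hat:.2f}, sigma = {sigma_hat:.3f}, KS = {ks_stat:.4f}")
```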
Results
• The estimated proportion of the population with RBS within the normal
range (3.9 – 7.8 mmol/L) is about 51%
• The estimated proportion of the population with RBS below the normal
range is about 49%
• The estimated proportion of the population with RBS above the normal
range is approximately 0%
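These percentages follow directly from the fitted normal CDF. The short sketch below plugs in the reported estimates (3.92 and 0.642 mmol/L) and the normal range quoted above; the variable names are illustrative only.

```python
from statistics import NormalDist

# Fitted normal distribution using the reported estimates
fitted = NormalDist(mu=3.92, sigma=0.642)
lower, upper = 3.9, 7.8  # normal RBS range (mmol/L) quoted in the slides

below = fitted.cdf(lower)        # P(X < 3.9)
above = 1 - fitted.cdf(upper)    # P(X > 7.8)
within = 1 - below - above       # P(3.9 <= X <= 7.8)

print(f"within: {within:.1%}, below: {below:.1%}, above: {above:.1%}")
```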
• Two-sample case
$X_{11}, X_{12}, \ldots, X_{1m}$
$X_{21}, X_{22}, \ldots, X_{2n}$
• The empirical CDF values of the ordered first sample $X_{(11)}, X_{(12)}, \ldots, X_{(1m)}$ are
as follows:
$F(X_{(1i)}) = \frac{i - 0.5}{m}, \quad i = 1, 2, \ldots, m$
• Obtain the $Z_{(1i)}$ values corresponding to $F(X_{(1i)})$
• Similarly, the $Z_{(2i)}$ values are obtained corresponding to $F(X_{(2i)})$
• If the first set of data comes from a normal distribution
with mean $\mu_1$ and variance $\sigma_1^2$, then the plot of $(X_1, Z_1)$
will be roughly linear and pass through the point $(\mu_1, 0)$ with slope $1/\sigma_1$.
• If the second set of data comes from a normal distribution
with mean $\mu_2$ and variance $\sigma_2^2$, then the plot of $(X_2, Z_2)$
will be roughly linear and pass through the point $(\mu_2, 0)$ with slope $1/\sigma_2$.
• Parallel lines indicate different means
but equal variances
• Coincident lines indicate equal means and
equal variances
• Lines that pass through the same point on the
X-axis indicate equal means but different
variances
Table 2: Burning times (rounded to the nearest tenth
of a minute) of two kinds of emergency flares.
Data due to Freund and Walpole (1987), p. 530
Brand A: 14.9, 11.3, 13.2, 16.6, 17.0, 14.1, 15.4, 13.0, 16.9
Brand B: 15.2, 19.8, 14.7, 18.3, 16.2, 21.2, 18.9, 12.2, 15.3, 19.4
Freund, J.E. and Walpole, R.E. (1987): Mathematical Statistics, Fourth edition,
Prentice-Hall Inc.
The above plot indicates that both samples come
from normal populations with unequal means
and variances
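As a rough numerical companion to the two-sample plot, the sketch below computes the graphical estimates for each brand in Table 2. The helper `graphical_estimates` is a hypothetical name, and fitting each line by least squares is an assumption; comparing the two (mean, standard deviation) pairs mirrors comparing the intercepts and slopes of the two lines.

```python
from statistics import NormalDist, mean

def graphical_estimates(data):
    """Fit z = a + b*x to the normal plot points; return (mu, sigma) = (-a/b, 1/b)."""
    xs = sorted(data)
    n = len(xs)
    zs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    x_bar, z_bar = mean(xs), mean(zs)
    slope = (sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = z_bar - slope * x_bar
    return -intercept / slope, 1 / slope

# Burning times from Table 2
brand_a = [14.9, 11.3, 13.2, 16.6, 17.0, 14.1, 15.4, 13.0, 16.9]
brand_b = [15.2, 19.8, 14.7, 18.3, 16.2, 21.2, 18.9, 12.2, 15.3, 19.4]

mu_a, sigma_a = graphical_estimates(brand_a)
mu_b, sigma_b = graphical_estimates(brand_b)
print(f"Brand A: mu = {mu_a:.2f}, sigma = {sigma_a:.2f}")
print(f"Brand B: mu = {mu_b:.2f}, sigma = {sigma_b:.2f}")
```

Different crossing points on the X-axis correspond to different means, and different slopes to different variances.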
Log-normal distribution
In probability theory, a log-normal distribution is a
probability distribution of a random variable whose
logarithm is normally distributed. If X is a random
variable with a normal distribution, then Y = exp(X) has
a log-normal distribution; likewise, if Y is log-normally
distributed, then X = log(Y) is normally distributed.
It is occasionally referred to as the Galton distribution.
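Density:
$f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{\log x - \mu}{\sigma}\right)^2\right], \quad 0 < x < \infty,\ -\infty < \mu < \infty,\ \sigma > 0$
Mean = $\exp(\mu + \sigma^2/2)$
Variance = $[\exp(\sigma^2) - 1]\exp(2\mu + \sigma^2)$
Median = $\exp(\mu)$
Mode = $\exp(\mu - \sigma^2)$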
[Figure: Log-normal density function, f(x) plotted against x]
Application
Certain physiological measurements, such as blood
pressure of adult humans (after separation into
male/female subpopulations), vitamin D level in blood
etc. follow a log-normal distribution.
Consequently, reference ranges for measurements in
healthy individuals are more accurately estimated by
assuming a log-normal distribution than by assuming
a symmetric distribution about the mean.
Table 3: Vitamin D levels (ng/ml) measured in the
blood of 26 healthy men.
Data due to Bland (1995), p. 113
14 17 20 21 22 24 25 26 26 26
27 30 31 31 32 35 42 43 46 48
52 54 54 63 67 83
Bland, M. (1995): An Introduction to Medical Statistics, Second edition,
ELBS with Oxford University Press.
• MLE (parameters of the log of the vitamin D level, with levels in ng/ml)
$\hat{\mu} = 3.509$
$\hat{\sigma} = 0.449$
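For a log-normal model the estimates are simply the mean and standard deviation of the logged data. The sketch below computes them for Table 3. Both divisors are shown because the slides do not say which was used: the n divisor is the strict MLE, while the n − 1 divisor appears to be the one closest to the reported 0.449.

```python
from math import log
from statistics import mean, pstdev, stdev

# Vitamin D levels (ng/ml) from Table 3
vit_d = [14, 17, 20, 21, 22, 24, 25, 26, 26, 26,
         27, 30, 31, 31, 32, 35, 42, 43, 46, 48,
         52, 54, 54, 63, 67, 83]

logs = [log(v) for v in vit_d]
mu_hat = mean(logs)          # estimate of mu on the log scale
sigma_mle = pstdev(logs)     # divisor n (strict MLE)
sigma_n1 = stdev(logs)       # divisor n - 1

print(f"mu = {mu_hat:.3f}, sigma (n) = {sigma_mle:.3f}, sigma (n-1) = {sigma_n1:.3f}")
```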
•Goodness-of-fit test
• KS statistic=max |CDF_FIT- CDF_EMP|
• For the vitamin D level data we calculate KS statistic
KS(cal)=0.0967
• 5% tabulated value=0.274
• Conclusion: Lognormal distribution fit is good for the
given vitamin D data
Results
• The estimated proportion of the population with vitamin D level within
the normal range (30 – 74 ng/ml) is about 56%
• The estimated proportion of the population with vitamin D level below
the normal range is about 40%
• The estimated proportion of the population with vitamin D level above
the normal range is about 4%
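Because log(X) is normal under the log-normal model, these proportions can be checked with the normal CDF evaluated at the logged limits. The sketch below uses the reported estimates (3.509 and 0.449); the variable names are illustrative.

```python
from math import log
from statistics import NormalDist

# Distribution of log(vitamin D) using the reported estimates
log_scale = NormalDist(mu=3.509, sigma=0.449)
lower, upper = 30, 74  # normal vitamin D range (ng/ml) quoted in the slides

below = log_scale.cdf(log(lower))        # P(X < 30)
above = 1 - log_scale.cdf(log(upper))    # P(X > 74)
within = 1 - below - above               # P(30 <= X <= 74)

print(f"within: {within:.1%}, below: {below:.1%}, above: {above:.1%}")
```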
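Weibull Distribution
The Weibull distribution is used to analyze lifetime data.
T: Lifetime variable
• Density function:
$f(t) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta-1} \exp\left[-\left(\frac{t}{\alpha}\right)^{\beta}\right], \quad 0 < t < \infty,\ \alpha, \beta > 0$
• $\alpha$: Scale parameter (0.632 quantile)
• $\beta$: Shape parameter (<1, =1 or >1)
• CDF:
$F(t) = 1 - \exp\left[-\left(\frac{t}{\alpha}\right)^{\beta}\right]$
• Reliability (or survival) function:
$R(t) = \exp\left[-\left(\frac{t}{\alpha}\right)^{\beta}\right]$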
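• Hazard function:
$h(t) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta-1}$
• Increasing hazard rate: $h(t)$ increases with $t$ for $\beta > 1$
• Decreasing hazard rate: $h(t)$ decreases with $t$ for $\beta < 1$
• Constant hazard rate: $h(t) = 1/\alpha$ for $\beta = 1$
$E(T) = \alpha\,\Gamma(1 + 1/\beta)$
$V(T) = \alpha^2\left[\Gamma(1 + 2/\beta) - \{\Gamma(1 + 1/\beta)\}^2\right]$
$t_p$: p-quantile, which is the solution of $F(t_p) = p$
• Accordingly, $t_p = \alpha[-\log(1-p)]^{1/\beta}$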
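The Weibull quantities above translate directly into code. The following is a small sketch with hypothetical function names, writing `alpha` for the scale and `beta` for the shape parameter (the symbol names are an assumption, since they do not survive in the transcript).

```python
from math import exp, gamma, log

def weibull_hazard(t, alpha, beta):
    """h(t) = (beta/alpha) * (t/alpha)**(beta - 1)."""
    return (beta / alpha) * (t / alpha) ** (beta - 1)

def weibull_mean(alpha, beta):
    """E(T) = alpha * Gamma(1 + 1/beta)."""
    return alpha * gamma(1 + 1 / beta)

def weibull_variance(alpha, beta):
    """V(T) = alpha^2 * [Gamma(1 + 2/beta) - Gamma(1 + 1/beta)^2]."""
    return alpha ** 2 * (gamma(1 + 2 / beta) - gamma(1 + 1 / beta) ** 2)

def weibull_quantile(p, alpha, beta):
    """t_p = alpha * [-log(1 - p)]**(1/beta)."""
    return alpha * (-log(1 - p)) ** (1 / beta)

# With beta = 1 the model is exponential: mean = alpha, variance = alpha^2
print(weibull_mean(2.0, 1.0), weibull_variance(2.0, 1.0))
# The scale parameter is the (1 - 1/e), i.e. roughly 0.632, quantile for any shape
print(weibull_quantile(1 - exp(-1), 2.0, 1.5))
```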
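Exponential distribution
• The Weibull distribution reduces to the exponential distribution
when $\beta = 1$
• Density function:
$f(t) = \frac{1}{\alpha}\exp\left(-\frac{t}{\alpha}\right), \quad 0 < t < \infty,\ \alpha > 0$
• $\alpha$: Scale parameter (0.632 quantile)
• CDF:
$F(t) = 1 - \exp\left(-\frac{t}{\alpha}\right)$
• Reliability (or survival) function:
$R(t) = \exp\left(-\frac{t}{\alpha}\right)$
• Hazard function: $h(t) = \frac{1}{\alpha}$
$E(T) = \alpha, \quad \mathrm{Var}(T) = \alpha^2$
$t_p$: p-quantile, which is the solution of $F(t_p) = p$
• Accordingly, $t_p = \alpha[-\log(1-p)]$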
[Figure: Density and hazard function plots. The red curve is the exponential density; the red line is the exponential hazard function.]
Validity of Weibull distribution for a set of data
From the Weibull CDF we get
$\log[-\log(1 - F(t))] = \beta\log(t) - \beta\log(\alpha)$
i.e. $Y = A + \beta X$,
where
• $Y = \log[-\log(1 - F(t))]$, $X = \log(t)$, $A = -\beta\log(\alpha)$
• Ordered lifetimes are: $t_{(1)}, t_{(2)}, \ldots, t_{(n)}$
• $Y_{(i)}$ values are obtained through the empirical
CDF values as given below:
$F(t_{(i)}) = \frac{i - 0.5}{n}, \quad i = 1, 2, \ldots, n$
• If the data follow a Weibull distribution with scale
parameter $\alpha$ and shape parameter $\beta$,
the plot of (X, Y) will be roughly linear with slope $\beta$
and pass through the point $(\log(\alpha), 0)$.
• Accordingly, graphical estimates of $\alpha$
and $\beta$ may be obtained.
Table 4: Specimen lives (in hours) of an electrical
insulation at 200 °C.
Data due to Nelson (1990), p. 154
2520, 2856, 3192, 3192, 3528
Nelson, W. (1990): Accelerated Testing: Statistical Models, Test Plans,
and Data Analyses, John Wiley and Sons.
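The Weibull plot for the Table 4 data can be sketched as follows. The code writes `alpha` for scale and `beta` for shape (assumed symbol names), uses the empirical CDF (i − 0.5)/n from the slides, and fits the line by least squares, which is an assumption about how the graphical estimates are read off.

```python
from math import exp, log
from statistics import mean

# Specimen lives (hours) at 200 deg C from Table 4
lives = [2520, 2856, 3192, 3192, 3528]

ts = sorted(lives)
n = len(ts)
xs = [log(t) for t in ts]                                    # X = log(t)
ys = [log(-log(1 - (i - 0.5) / n)) for i in range(1, n + 1)]  # Y = log[-log(1 - F)]

# Least-squares line Y = A + beta * X
x_bar, y_bar = mean(xs), mean(ys)
beta_graph = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
              / sum((x - x_bar) ** 2 for x in xs))
a_graph = y_bar - beta_graph * x_bar
alpha_graph = exp(-a_graph / beta_graph)  # because A = -beta * log(alpha)

print(f"graphical estimates: alpha = {alpha_graph:.0f} hours, beta = {beta_graph:.2f}")
```

These graphical values are the kind of starting point a numerical likelihood maximization could use.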
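• MLE of $\alpha$ and $\beta$
• Log-likelihood function of $\alpha$ and $\beta$ based
on observed data $t_1, t_2, \ldots, t_n$:
$\log L = n\log\left(\frac{\beta}{\alpha}\right) + (\beta - 1)\sum_{i}\log\left(\frac{t_i}{\alpha}\right) - \sum_{i}\left(\frac{t_i}{\alpha}\right)^{\beta}$
• The MLEs of $\alpha$ and $\beta$ are obtained by maximizing the
log-likelihood with respect to $\alpha$ and $\beta$
using a numerical method.
• Graphical estimates may be used as the starting
values required by the numerical method.
• The MLEs of $\alpha$ and $\beta$ are denoted by
$\hat{\alpha}$ and $\hat{\beta}$ respectively.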
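One possible numerical method is sketched below; the slides do not say which method was used, so this is an assumption. Setting the derivative with respect to the scale to zero gives $\alpha^{\beta} = \frac{1}{n}\sum t_i^{\beta}$, and substituting this into the derivative with respect to the shape leaves a single equation in $\beta$, which is solved here by bisection. The function name `weibull_mle` and the default bracket [0.01, 100] are hypothetical choices.

```python
from math import log

def weibull_mle(times, lo=0.01, hi=100.0, tol=1e-10):
    """Weibull MLE (alpha, beta) by bisection on the profile score equation in beta."""
    n = len(times)
    scale = sum(times) / n
    s = [t / scale for t in times]   # rescaled data keeps s**beta well behaved
    mean_log = sum(log(v) for v in s) / n

    def score(beta):
        # 1/beta + mean(log s) - sum(s^beta * log s) / sum(s^beta); decreasing in beta
        w = [v ** beta for v in s]
        return 1 / beta + mean_log - sum(wi * log(v) for wi, v in zip(w, s)) / sum(w)

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    beta = (lo + hi) / 2
    alpha = scale * (sum(v ** beta for v in s) / n) ** (1 / beta)
    return alpha, beta

# Specimen lives (hours) at 200 deg C from Table 4
alpha_hat, beta_hat = weibull_mle([2520, 2856, 3192, 3192, 3528])
print(f"alpha = {alpha_hat:.2f} hours, beta = {beta_hat:.2f}")
```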
For the insulation data given in Table 4 the
following results (based on MLEs) are obtained:
$\hat{\alpha} = 3208.49$ hours
$\hat{\beta} = 10.61$
$S.E.(\hat{\alpha}) = 142.56$ hours
$S.E.(\hat{\beta}) = 3.78$
Estimated median life = 3099.548 hours
• ML estimate of R(t): $\hat{R}(t) = \exp\left[-\left(\frac{t}{\hat{\alpha}}\right)^{\hat{\beta}}\right]$
Time (hours): 3000 3500 3700 4000
Reliability: .6124 .0807 .0107 .0000311
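The median life and the reliability values follow from plugging the reported MLEs into the Weibull quantile and reliability formulas. A short sketch, with illustrative names:

```python
from math import exp, log

# Reported MLEs for the Table 4 data (scale in hours, shape dimensionless)
alpha_hat, beta_hat = 3208.49, 10.61

def reliability(t):
    """ML estimate of R(t) = exp[-(t/alpha)^beta]."""
    return exp(-((t / alpha_hat) ** beta_hat))

# Median life is the 0.5 quantile: alpha * [-log(1 - 0.5)]^(1/beta)
median_life = alpha_hat * log(2) ** (1 / beta_hat)
table = {t: reliability(t) for t in (3000, 3500, 3700, 4000)}

print(f"median life = {median_life:.3f} hours")
for t, r in table.items():
    print(f"R({t}) = {r:.4g}")
```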
Weibull versus Exponential Model
• Suppose we want to test whether the exponential
or the Weibull model should be accepted for a given set of
data
• This is equivalent to testing whether the
shape parameter of the Weibull distribution is unity
or not, i.e.
$H_0: \beta = 1$ vs $H_1: \beta \neq 1$
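• Test procedure (LR test)
• Under $H_0$ the log-likelihood function is
$l_0 = -n\log(\alpha) - \frac{1}{\alpha}\sum t_i$
which yields $\hat{\alpha} = \frac{1}{n}\sum t_i$, the MLE of $\alpha$.
• The maximum of $l_0$ is given by
$\hat{l}_0 = -n\log(\hat{\alpha}) - \frac{1}{\hat{\alpha}}\sum t_i$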
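• Similarly, under $H_1$ the maximum of the
log-likelihood is given by
$\hat{l}_1 = n\log\left(\frac{\hat{\hat{\beta}}}{\hat{\hat{\alpha}}}\right) + (\hat{\hat{\beta}} - 1)\sum\log\left(\frac{t_i}{\hat{\hat{\alpha}}}\right) - \sum\left(\frac{t_i}{\hat{\hat{\alpha}}}\right)^{\hat{\hat{\beta}}}$
where $\hat{\hat{\alpha}}$ and $\hat{\hat{\beta}}$ are the MLEs of $\alpha$ and $\beta$
under $H_1$.
• The LR test implies that, under $H_0$, $2(\hat{l}_1 - \hat{l}_0)$ approximately follows a chi-square
distribution with 1 df.
• If $2(\hat{l}_1 - \hat{l}_0) < \chi^2_{(1-\delta,\,1)}$, where $\delta$ is the level of significance, accept (use) the
exponential model
• If $2(\hat{l}_1 - \hat{l}_0) > \chi^2_{(1-\delta,\,1)}$, accept (use) the Weibull
model
• For the insulation data given in Table 4
$2(\hat{l}_1 - \hat{l}_0) = 2(-36.1877 + 45.1293) = 17.88 > \chi^2_{(.95,\,1)} = 3.84$
Conclusion: the Weibull model may be accepted at the
5% level of significance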
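The likelihood ratio comparison for the Table 4 data can be reproduced approximately with the sketch below. It plugs the reported Weibull MLEs into the log-likelihood, uses the sample mean as the exponential MLE, and obtains the 0.95 chi-square quantile with 1 df from the standard normal quantile (a chi-square variable with 1 df is the square of a standard normal). Variable names are illustrative, and small differences from the reported log-likelihood values are to be expected from rounding.

```python
from math import log
from statistics import NormalDist

# Specimen lives (hours) at 200 deg C from Table 4
lives = [2520, 2856, 3192, 3192, 3528]
n = len(lives)

# Exponential model (H0): the MLE of the scale is the sample mean
alpha0 = sum(lives) / n
l0 = -n * log(alpha0) - sum(lives) / alpha0

# Weibull model (H1): reported MLEs plugged into the log-likelihood
alpha1, beta1 = 3208.49, 10.61
l1 = (n * log(beta1 / alpha1)
      + (beta1 - 1) * sum(log(t / alpha1) for t in lives)
      - sum((t / alpha1) ** beta1 for t in lives))

lr_stat = 2 * (l1 - l0)
# 0.95 quantile of chi-square with 1 df equals (0.975 normal quantile)^2, about 3.84
critical = NormalDist().inv_cdf(0.975) ** 2
use_weibull = lr_stat > critical

print(f"l0 = {l0:.4f}, l1 = {l1:.4f}, LR = {lr_stat:.2f}, critical = {critical:.3f}")
```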
Accelerated Life Testing (ALT)
for Weibull Distribution
• Stress: temperature, voltage, load, etc.
• Under the operating (use) stress level, it takes
a lot of time to get a sufficient number of failures
• Lifetimes are therefore obtained under high stress levels
• Aim: (i) To estimate the lifetime distribution
under the use stress level, say, $S_0$
(ii) To estimate reliability for a specified
time under $S_0$
(iii) To estimate quantiles under $S_0$
Sampling scheme (under constant
stress testing)
• Divide n components into k groups with numbers
of components $n_1, n_2, \ldots, n_k$
respectively, where $n = \sum_{i=1}^{k} n_i$
• $n_i$ components are exposed to stress level $S_i$
• $T_{ij}$: j-th lifetime corresponding to $S_i$
• Obtain the equation for the lifetime corresponding
to the i-th group
• The equation for the lifetime corresponding to the i-th group:
$\log[-\log(1 - F(t_{ij}))] = \beta_i\log(t_{ij}) - \beta_i\log(\alpha_i)$
i.e. $Y_{ij} = A_i + \beta_i X_{ij}$
• If the data corresponding to the i-th group follow
$W(\alpha_i, \beta_i)$, the plot of $(X_i, Y_i)$ will be roughly linear
with slope $\beta_i$ and pass through the point $(\log(\alpha_i), 0)$
• If the k plots are linear and parallel, then the lifetimes
under different stress levels are Weibull
with common slope $\beta$ and different scales $\alpha_i$,
which implies that $\alpha_i$ depends on the stress
level $S_i$
• If the k plots are linear but not parallel, the
lifetimes under different stress levels are Weibull
with different slopes $\beta_i$ and different scales $\alpha_i$,
which implies that both $\alpha_i$ and $\beta_i$ depend on
the stress levels $S_i$. In this case modeling is
difficult.
• For the first case, the relationship between the life
and the stress has to be identified
• Plot log(.632 quantile) against the stress levels
• If the plot yields a straight line, then the life-stress
relationship is
$\log(\alpha) = \gamma_0 + \gamma_1 S$
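• Estimation of $\gamma_0$, $\gamma_1$ and $\beta$
using the ML method
• Likelihood function under stress level $S_i$:
$L_i = \prod_{j=1}^{n_i}\left[\frac{\beta}{\alpha_i}\left(\frac{t_{ij}}{\alpha_i}\right)^{\beta-1}\exp\left\{-\left(\frac{t_{ij}}{\alpha_i}\right)^{\beta}\right\}\right]$,
where $\alpha_i = \exp(\gamma_0 + \gamma_1 S_i)$
• Total log-likelihood: $\log L(\gamma_0, \gamma_1, \beta) = \sum_{i=1}^{k}\log(L_i)$
• Using a numerical method, the MLEs of $\gamma_0$, $\gamma_1$ and $\beta$ may be
obtained
• The MLE of $\alpha$ at $S_0$, say $\hat{\alpha}_0$, is obtained through the
relationship
$\hat{\alpha}_0 = \exp(\hat{\gamma}_0 + \hat{\gamma}_1 S_0)$
• Hence the ML estimate of the Weibull density under the use
stress level $S_0$ is obtained. Accordingly, estimates of the
reliability for a specified time, the median life and other
desired percentiles may also be obtained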
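The total log-likelihood described above can be written as a single function of the three parameters. The sketch below returns its negative, which is the form a general-purpose numerical minimizer would expect. The names `alt_neg_loglik`, `gamma0`, `gamma1` and `beta`, and the layout of `groups` as (stress, lifetimes) pairs, are assumptions made for illustration.

```python
from math import exp, log

def alt_neg_loglik(gamma0, gamma1, beta, groups):
    """Negative total log-likelihood for the constant-stress Weibull ALT model.

    groups: list of (stress, lifetimes) pairs, one per stress level S_i.
    The scale at each level is alpha_i = exp(gamma0 + gamma1 * S_i);
    the shape beta is common to all levels.
    """
    total = 0.0
    for stress, lifetimes in groups:
        alpha_i = exp(gamma0 + gamma1 * stress)
        for t in lifetimes:
            total += (log(beta / alpha_i)
                      + (beta - 1) * log(t / alpha_i)
                      - (t / alpha_i) ** beta)
    return -total

# Tiny hypothetical check: alpha = 1 and beta = 1 give log L = -(sum of lifetimes)
toy_groups = [(0.0, [1.0, 2.0])]
print(alt_neg_loglik(0.0, 0.0, 1.0, toy_groups))
```

For a transformed stress, such as the reciprocal of absolute temperature, the transformed value would be passed as the `stress` entry.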
Table 5: Specimen lives (in hours) of an electrical
insulation at three temperatures.
Data of Nelson (1990), p. 154
200 °C: 2520, 2856, 3192, 3192, 3528
225 °C: 816, 912, 1296, 1392, 1488
250 °C: 300, 324, 372, 372, 444
Nelson, W. (1990): Accelerated Testing: Statistical Models, Test Plans, and Data
Analyses, John Wiley and Sons.
The above three plots of the data given in Table 5 are
roughly linear and parallel, so the lifetimes under the three
stress levels are Weibull with a common slope and
different scale parameters, which implies that the scale
parameters depend on the stress levels
• Arrhenius life-stress relationship (temperature stress):
$\log(\alpha) = \gamma_0 + \gamma_1(1/W)$,
where W is the temperature in kelvin
• Temperature in kelvin = temperature in degrees
centigrade plus 273.16
• Results based on MLEs for the data given in Table 5
with respect to the Arrhenius-Weibull model
$\hat{\gamma}_0 = -13.39707$
$\hat{\gamma}_1 = 10596.9961$
$\hat{\beta} = .68566$
$-2\log\hat{L} = 269.9008$
At the use stress (180 °C) the following
results are obtained:
$\hat{\alpha}_0 = 21754.98$ hours and
$\hat{\beta} = .68566$
Estimated median lifetime = 12747.08 hours
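The use-stress results can be checked from the reported coefficients alone. The sketch below assumes the intercept is negative (the sign that makes the reported scale at 180 °C consistent with the Arrhenius relationship) and uses the 273.16 offset stated in the slides; the variable names are illustrative.

```python
from math import exp, log

# Reported Arrhenius-Weibull estimates (intercept sign assumed negative)
gamma0_hat = -13.39707
gamma1_hat = 10596.9961
beta_hat = 0.68566

# Use stress: 180 deg C converted with the offset given in the slides
use_temp_kelvin = 180 + 273.16

# Scale at use stress from log(alpha) = gamma0 + gamma1 / W
alpha0_hat = exp(gamma0_hat + gamma1_hat / use_temp_kelvin)
# Median life is the 0.5 quantile: alpha * [log 2]^(1/beta)
median_life = alpha0_hat * log(2) ** (1 / beta_hat)

print(f"alpha at use stress = {alpha0_hat:.2f} hours")
print(f"median life = {median_life:.2f} hours")
```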
• Results based on MLEs for the data given in Table 5
with respect to the Arrhenius-Exponential model
$\hat{\gamma}_0 = -13.17807$
$\hat{\gamma}_1 = 10596.98923$
$-2\log\hat{L} = 273.9037$
At the use stress (180 °C) the following
results are obtained:
$\hat{\alpha}_0 = 27080.87$ hours
Estimated median lifetime = 18771.03 hours
• Weibull versus Exponential Model for ALT
$H_0: \beta = 1$ vs $H_1: \beta \neq 1$
$-2\log\hat{L}_0 = 273.9037$ (for the exponential model)
$-2\log\hat{L}_1 = 269.9008$ (for the Weibull model)
$2(\log\hat{L}_1 - \log\hat{L}_0) = 4.0029 > \chi^2_{(.95,\,1)} = 3.84$
Conclusion: Accept the Weibull model at the 5% level of
significance