
Monitoring High-yield processes
MONITORING HIGH-YIELD PROCESSES
Cesar Acosta-Mejia
June 2011
EDUCATION
– B.S., Catholic University of Peru
– M.A., Monterrey Tech, Mexico
– Ph.D., Texas A&M University
RESEARCH
– Quality Engineering: SPC, process monitoring
– Applied Probability and Statistics: sequential analysis
– Probability modeling: change-point detection, process surveillance
MOTIVATION
– High-yield processes
– Monitor the fraction of nonconforming units p
– Very small p (parts per million, ppm)
– Detect increases or decreases in p
– This requires a very sensitive procedure
ASSUMPTIONS
• The process is observed continuously
• The process can be characterized by Bernoulli trials
• The fraction of nonconforming units p is constant, but may change at an unknown point in time
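Hypothesis Testing
For two-sided tests of level $\alpha$,
the rejection region R is made up of two subregions $R_1$ and $R_2$
with limits L and U such that
$P[X \le L] = \alpha/2$
$P[X \ge U] = \alpha/2$
(for a discrete X exact equality is usually impossible, so L is the largest and U the smallest value whose tail probability does not exceed $\alpha/2$)
[Figure: distribution of X with the lower region $R_1 = \{X \le L\}$ and the upper region $R_2 = \{X \ge U\}$]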
Hypothesis Testing
Consider testing the proportion p
Hypothesis Testing
The test may be based on different random variables:
• Binomial (n, p)
• Geometric (p)
• Negative binomial (r, p)
• Binomial of order k (n, p)
• Geometric of order k (p)
• Negative binomial of order k (r, p)
Binomial tests
when p is very small
Test 1
• Proportion: $p_0 = 0.025$ (25000 ppm)
• Test $H_0: p = 0.025$ against $H_1: p \ne 0.025$
• X = number of nonconforming units in 500 items
• Level $\alpha = 0.0027$
Test 1
Let $X \sim \mathrm{Binomial}(500, p)$.
To test the hypothesis
$H_0: p = 0.025$ against $H_1: p \ne 0.025$,
the rejection region is
$R = \{X \le 2\} \cup \{X \ge 25\}$
since
$P[X \le 2] = 0.000305 < 0.00135 = \alpha/2$
$P[X \ge 25] = 0.001018 < 0.00135 = \alpha/2$
while $P[X \le 3]$ and $P[X \ge 24]$ both exceed $\alpha/2$.
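The limits can be found by scanning the two binomial tails. The sketch below is a minimal Python illustration (the helper names `binom_pmf`, `lower_tail`, `upper_tail` and `two_sided_limits` are hypothetical); it takes L as the largest value and U as the smallest value whose tail probability does not exceed $\alpha/2$, and returns `None` for a side where no such value exists.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P[X = k] for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def lower_tail(x: int, n: int, p: float) -> float:
    """P[X <= x]."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))


def upper_tail(u: int, n: int, p: float) -> float:
    """P[X >= u], summed directly to avoid cancellation in 1 - P[X <= u - 1]."""
    return sum(binom_pmf(k, n, p) for k in range(u, n + 1))


def two_sided_limits(n: int, p0: float, alpha: float):
    """Largest L with P[X <= L] <= alpha/2 and smallest U with P[X >= U] <= alpha/2."""
    half = alpha / 2
    lower = None
    for x in range(n + 1):
        if lower_tail(x, n, p0) > half:
            break
        lower = x
    upper = None
    for u in range(n + 1):
        if upper_tail(u, n, p0) <= half:
            upper = u
            break
    return lower, upper


L, U = two_sided_limits(n=500, p0=0.025, alpha=0.0027)
print(L, U)                                   # 2 25
print(f"{lower_tail(L, 500, 0.025):.6f}")     # 0.000305
print(f"{upper_tail(U, 500, 0.025):.6f}")
```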
Test 1
A plot of P[rejecting $H_0$] vs. p:
[Figure: probability of rejecting $H_0$ (vertical axis, 0 to 0.012, with the level 0.0027 marked) vs. p in parts per million (horizontal axis, 5000 to 45000)]
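The curve can be evaluated directly from the binomial probabilities. A minimal sketch, assuming the Test 1 region $\{X \le 2\} \cup \{X \ge 25\}$ with n = 500; `prob_reject` is a hypothetical helper name, and the grid of p values is an arbitrary choice.

```python
from math import comb

N = 500
L, U = 2, 25   # Test 1 rejection region R = {X <= 2} U {X >= 25}


def binom_pmf(k: int, n: int, p: float) -> float:
    """P[X = k] for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_reject(p: float) -> float:
    """P[X in R] for X ~ Binomial(500, p): lower tail plus upper tail."""
    lower = sum(binom_pmf(k, N, p) for k in range(L + 1))
    upper = sum(binom_pmf(k, N, p) for k in range(U, N + 1))
    return lower + upper


for ppm in range(5000, 45001, 5000):
    print(f"{ppm:>5} ppm   P[reject H0] = {prob_reject(ppm / 1e6):.5f}")
```

At $p = p_0$ the rejection probability is the attained size of the test, which is below the nominal 0.0027 because X is discrete.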
Hypothesis Testing
Now consider testing
$p_0 = 0.0001$ (100 ppm)
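Test 1
Let $X \sim \mathrm{Binomial}(n = 500, p)$.
To test the hypothesis
$H_0: p = 0.0001$
against $H_1: p \ne 0.0001$,
the rejection region is
$R = \{X \ge 2\}$
since
$P[X \ge 2] = 0.0012$
No lower region is available, because even $P[X = 0] \approx 0.951$ far exceeds $\alpha/2$.
For n = 500 there is no two-sided test for p = 0.0001.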
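A quick numeric check shows why only an upper region exists here. This minimal sketch recomputes the three binomial probabilities involved for n = 500 and $p_0 = 0.0001$; the variable names are illustrative.

```python
from math import comb

n, p0, alpha = 500, 0.0001, 0.0027


def binom_pmf(k: int) -> float:
    """P[X = k] for X ~ Binomial(500, 0.0001)."""
    return comb(n, k) * p0**k * (1 - p0) ** (n - k)


p_zero = binom_pmf(0)                      # P[X = 0]: the smallest possible lower tail
p_ge_1 = 1 - binom_pmf(0)                  # P[X >= 1]
p_ge_2 = 1 - binom_pmf(0) - binom_pmf(1)   # P[X >= 2]

print(f"P[X = 0]  = {p_zero:.4f}")         # 0.9512
print(f"P[X >= 1] = {p_ge_1:.4f}")         # 0.0488
print(f"P[X >= 2] = {p_ge_2:.4f}")         # 0.0012
# Even {X <= 0} has probability far above alpha/2, so no lower region can be formed.
print("lower region possible:", p_zero <= alpha / 2)
```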
Test 1
[Figure: Binomial(n = 500, p = 0.025) and Binomial(n = 500, p = 0.0001)]
For this test, a plot of P[rejecting $H_0$] vs. p:
[Figure: P[rejecting $H_0$] (vertical axis, 0 to 0.009, with the level 0.0027 marked) vs. p in parts per million (horizontal axis, 0 to 260)]
Consider a geometric test for p
when p is very small
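Test 2
Let $X \sim \mathrm{Geo}(p)$.
To test the hypothesis ($\alpha = 0.0027$)
$H_0: p = 0.0001$ against $H_1: p \ne 0.0001$,
the rejection region is
$R = \{X \le 13\} \cup \{X \ge 66075\}$
since
$P[X \le 13] = 0.0013$
$P[X \ge 66075] = 0.00135$
An observation in $\{X \le 13\}$ leads to the conclusion that $p > 0.0001$; an observation in $\{X \ge 66075\}$ leads to the conclusion that $p < 0.0001$.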
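For the geometric distribution both tails have closed forms, $P[X \le x] = 1 - q^x$ and $P[X \ge u] = q^{u-1}$, so the limits can be solved for with logarithms instead of a search. A minimal sketch, assuming X counts the items up to and including the first nonconforming one (support 1, 2, …); `geometric_two_sided_limits` is a hypothetical name.

```python
import math


def geometric_two_sided_limits(p0: float, alpha: float):
    """(L, U) for X ~ Geo(p0): L is the largest x with P[X <= x] <= alpha/2,
    U is the smallest u with P[X >= u] = q**(u - 1) <= alpha/2."""
    log_q = math.log1p(-p0)          # ln(1 - p0) < 0, so dividing flips inequalities
    half = alpha / 2
    # 1 - q**x <= half  <=>  x <= ln(1 - half) / ln(q)
    lower = math.floor(math.log1p(-half) / log_q)
    # q**(u - 1) <= half  <=>  u - 1 >= ln(half) / ln(q)
    upper = math.ceil(math.log(half) / log_q) + 1
    return lower, upper


L, U = geometric_two_sided_limits(0.0001, 0.0027)
q = 1 - 0.0001
print(L, U)                                  # 13 66075
print(f"P[X <= {L}] = {1 - q**L:.4f}")       # 0.0013
print(f"P[X >= {U}] = {q**(U - 1):.5f}")     # 0.00135
```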
Test 2
For this test, a plot of P[rejecting $H_0$] vs. p:
[Figure: P[rejecting $H_0$] (vertical axis, 0 to 0.012, with the level 0.0027 marked) vs. p in ppm (horizontal axis, 50 to 300)]
Another performance measure
of a sequential testing procedure
Hypothesis Testing
Let $X_1, X_2, \ldots \sim \mathrm{Geo}(p)$ be iid, and
let T be the number of observations until $H_0$ is rejected.
Consider the random variables, for j = 1, 2, …,
$A_j = 1$ if $X_j \in R$, and $A_j = 0$ otherwise,
so that $P[A_j = 1] = P_R$.
Since the $A_j$ are independent, the probability function of T is
$P[T = t] = P[A_1 = 0]\,P[A_2 = 0] \cdots P[A_{t-1} = 0]\,P[A_t = 1] = P_R\,(1 - P_R)^{t-1}$
Hypothesis Testing
Therefore
$T \sim \mathrm{Geo}(P_R)$.
Let us consider $E[T] = 1/P_R$, the mean number of tests until $H_0$ is rejected, as a performance measure.
When $p = p_0$, $P_R$ is the level of the test, so
$E[T] = 1/\alpha$.
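A small simulation illustrates that T is geometric with mean $1/P_R$. This is a sketch: each test is modeled as an independent Bernoulli($P_R$) rejection indicator $A_j$, and the value $P_R = 0.05$ is an arbitrary illustrative choice that keeps the simulation fast; `run_until_rejection` is a hypothetical name.

```python
import random


def run_until_rejection(p_reject: float, rng: random.Random) -> int:
    """Number of tests T until the first rejection, each test rejecting
    independently with probability p_reject (A_t = 1 with probability p_reject)."""
    t = 1
    while rng.random() >= p_reject:   # A_t = 0 with probability 1 - p_reject
        t += 1
    return t


rng = random.Random(2011)             # fixed seed for reproducibility
p_reject = 0.05                       # illustrative P_R, not a value from the talk
samples = [run_until_rejection(p_reject, rng) for _ in range(20_000)]
mean_t = sum(samples) / len(samples)
print(f"simulated E[T] = {mean_t:.2f}, 1/P_R = {1 / p_reject:.2f}")
```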
Test 2
Let $X \sim \mathrm{Geo}(p)$ and $q = 1 - p$, so that
$P[X \le x] = 1 - q^x$.
Let the rejection region be
$R = \{X < L\} \cup \{X > U\}$.
Then
$P_A = P[\text{not rejecting } H_0] = P[L \le X \le U] = (1 - q^U) - (1 - q^{L-1}) = q^{L-1} - q^U$
$P_R = 1 - P_A = 1 - (1 - p)^{L-1} + (1 - p)^U$
Test 2
Let $X \sim \mathrm{Geo}(p)$.
To test the hypothesis ($\alpha = 0.0027$)
$H_0: p = 0.0001$ against $H_1: p \ne 0.0001$,
the rejection region is
$R = \{X < 14\} \cup \{X > 66074\}$.
Then P[rejecting $H_0$] and the mean number of tests until rejection are
$P_R = 1 - (1 - p)^{13} + (1 - p)^{66074}$
$E[T] = 1/P_R$
When $p = p_0$,
$E[T] = 1/\alpha = 370.4$
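The formula for $P_R$ can be evaluated at several values of p. In this minimal sketch (`prob_reject` is a hypothetical name) the exact rejection probability at $p_0$ comes out near 0.00265, a little under the nominal $\alpha$ because X is discrete, so $E[T]$ at $p_0$ is about 377 rather than $1/\alpha = 370.4$. More importantly, for p slightly above $p_0$ (around 150 ppm) $E[T]$ climbs to roughly 500: the test is slower to signal a small increase than to raise a false alarm.

```python
def prob_reject(p: float, lower: int = 14, upper: int = 66074) -> float:
    """P_R for R = {X < lower} U {X > upper}, X ~ Geo(p) on {1, 2, ...}:
    P_R = 1 - (1 - p)**(lower - 1) + (1 - p)**upper."""
    q = 1 - p
    return 1 - q ** (lower - 1) + q**upper


for ppm in (100, 150, 200, 300, 500):
    pr = prob_reject(ppm / 1e6)
    print(f"p = {ppm:>3} ppm   P_R = {pr:.6f}   E[T] = 1/P_R = {1 / pr:6.1f}")
```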
Test 2
How can we improve upon this test?
We want $E[T] < 370.4$ when $p > 0.0001$.
Run sum procedure
Geometric chart
A sequence of tests of hypotheses
THE RUN SUM – for the mean
THE GEOMETRIC RUN SUM
THE GEOMETRIC RUN SUM - DEFINITION
• Let us denote the following cumulative sums:
$SU_t = SU_{t-1} + q_t$ if $X_t$ falls above the center line; $SU_t = 0$ otherwise
$SL_t = SL_{t-1} - q_t$ if $X_t$ falls below the center line; $SL_t = 0$ otherwise
where $q_t$ is the score assigned to the region in which $X_t$ falls
• The run sum statistic is defined, for t = 1, 2, …, by
$S_t = \max\{SU_t, -SL_t\}$
with $SU_0 = 0$, $SL_0 = 0$,
and limit sum L
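The recursion can be written as a single update step. A minimal sketch: the side of the center line and the region score of each observation are taken as given, while the observation sequence and the limit sum `LIMIT_SUM = 5` are hypothetical values chosen for illustration; `run_sum_update` is an illustrative name.

```python
LIMIT_SUM = 5   # hypothetical limit sum L for this illustration


def run_sum_update(su, sl, above: bool, score):
    """One step of the run sum recursion; returns (SU_t, SL_t, S_t)."""
    if above:                      # X_t above the center line
        su, sl = su + score, 0     # SU accumulates the score, SL resets
    else:                          # X_t below the center line
        su, sl = 0, sl - score     # SL accumulates minus the score, SU resets
    return su, sl, max(su, -sl)


# Hypothetical (above?, region score) pairs, for illustration only.
observations = [(True, 1), (True, 0), (True, 2), (False, 1), (False, 3), (False, 2)]

su = sl = 0                        # SU_0 = SL_0 = 0
history = []
for t, (above, score) in enumerate(observations, start=1):
    su, sl, s = run_sum_update(su, sl, above, score)
    history.append(s)
    print(f"t={t}  SU={su}  SL={sl}  S={s}")
    if s >= LIMIT_SUM:
        print(f"S_t reached the limit sum at t={t}")
        break
```

The switch from the upper to the lower side at t = 4 resets $SU_t$, which is what keeps isolated points from accumulating into a false alarm.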
THE GEOMETRIC RUN SUM - DESIGN
• We need to define
– region limits ($l_1, l_2, l_3$ and $l_5, l_6, l_7$), with the center line at $l_4$
– region scores ($q_1, q_2, q_3$ and $q_4$)
– the limit sum L
• Region limits above and below the center line are not symmetric around the center line.
• To define the region limits we use the cumulative probabilities of the distribution of $X \sim \mathrm{Geo}(p_0)$.
• These probabilities were chosen to be the same as those of a run sum for the mean with the same scores.
THE GEOMETRIC RUN SUM - EXAMPLE
• If $X \sim \mathrm{Geo}(p_0 = 0.0001)$,
the region limits are given by
$P[X \le l_1] = 0.00123$: $l_1 = 13$
$P[X \le l_2] = 0.02175$: $l_2 = 220$
$P[X \le l_3] = 0.15638$: $l_3 = 1701$
$P[X \le l_4] = 0.50000$: $l_4 = 6932$
$P[X \le l_5] = 0.84362$: $l_5 = 18554$
$P[X \le l_6] = 0.97825$: $l_6 = 38280$
$P[X \le l_7] = 0.99877$: $l_7 = 67007$
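The limits are quantiles of $\mathrm{Geo}(p_0)$. This sketch takes each $l_i$ as the smallest integer with $P[X \le l_i]$ at least the tabulated probability; under that assumption it reproduces $l_1$ through $l_5$ exactly, gives $l_6 = 38280$ (the value consistent with $P[X \le l_6] = 0.97825$) and $l_7 = 67005$, which is within the rounding of 0.99877 of the tabulated 67007. The helper `classify` and the score assignment (0 next to the center line, 3 beyond $l_1$ or $l_7$, following the (0,1,2,3) run sum) are assumptions for illustration.

```python
import math


def geometric_quantile(prob: float, p0: float) -> int:
    """Smallest integer x >= 1 with P[X <= x] = 1 - (1 - p0)**x >= prob."""
    return max(1, math.ceil(math.log1p(-prob) / math.log1p(-p0)))


P0 = 0.0001
CUM_PROBS = [0.00123, 0.02175, 0.15638, 0.50000, 0.84362, 0.97825, 0.99877]
limits = [geometric_quantile(c, P0) for c in CUM_PROBS]
print(limits)

SCORES = [3, 2, 1, 0, 0, 1, 2, 3]   # assumed: score grows with distance from l4


def classify(x: int, limits: list[int]):
    """Hypothetical helper: (above_center_line, score) for an observation x."""
    region = sum(x > l for l in limits)   # 0..7 = number of limits strictly below x
    above = region >= 4                   # x > l4 counts as above the center line
    return above, SCORES[region]


print(classify(5, limits), classify(10_000, limits), classify(70_000, limits))
```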
• Conclude $H_1: p \ne p_0$ when $S_t \ge L$
• Let T be the number of samples until $H_0$ is rejected
• What is the distribution of T?
• What are its mean and standard deviation?
RUN SUM (0,1,2,3) L = 5 - MODELING
• Markov chain
• States defined by the values that $S_t$ can assume (positive values track $SU_t$, negative values track $SL_t$)
• State space
$\Omega = \{-4, -3, -2, -1, 0, 1, 2, 3, 4, C\}$
where
$C = \{n \in \mathbb{Z} \mid n = \ldots, -6, -5, 5, 6, \ldots\}$
is an absorbing state
• Transition probabilities
• Let
$p_1 = P[X \le l_1]$
$p_2 = P[l_1 < X \le l_2]$
$p_3 = P[l_2 < X \le l_3]$
$p_4 = P[l_3 < X \le l_4]$
$p_5 = P[l_4 < X \le l_5]$
$p_6 = P[l_5 < X \le l_6]$
$p_7 = P[l_6 < X \le l_7]$
$p_8 = P[X > l_7]$
where $X \sim \mathrm{Geo}(p_0)$
[Figure: transitions from $S_t = 0$]
[Figure: transitions from $S_t = 1$]
[Figure: transitions from $S_t = 2$]
• Let T be the first passage time to state C,
i.e., the number of observations until the run sum rejects $H_0$
• Let Q be the submatrix of transition probabilities among the transient states; then
$P[T \le t] = e\,(I - Q^t)\,J$
$G(s) = E[s^T] = s\,e\,(I - sQ)^{-1}(I - Q)\,J$
$E[T] = e\,(I - Q)^{-1} J$
where e is a row vector defining the initial state $S_0$ and J is a column vector of ones.
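The ARL formula $E[T] = e\,(I - Q)^{-1} J$ can be evaluated numerically. A minimal sketch under stated assumptions: the signed state tracks $SU_t$ when positive and $SL_t$ when negative; scores are 0, 1, 2, 3 moving outward from the center line $l_4$; an observation equal to $l_4$ counts as below the line; the limits are recomputed from the tabulated probabilities; and $(I - Q)\,m = J$ is solved with a small hand-written Gauss-Jordan routine so that only the standard library is needed. All function names are hypothetical, and the printed values are this sketch's output, not figures from the talk.

```python
import math

P0 = 0.0001
CUM_PROBS = [0.00123, 0.02175, 0.15638, 0.50000, 0.84362, 0.97825, 0.99877]
SCORES = [3, 2, 1, 0, 0, 1, 2, 3]   # assumed: 0 next to l4, 3 beyond l1 or l7
LIMIT_SUM = 5
STATES = list(range(-(LIMIT_SUM - 1), LIMIT_SUM))   # transient states -4..4

# l_1..l_7: smallest integers with P[X <= l_i] >= target under Geo(P0)
LIMITS = [math.ceil(math.log1p(-c) / math.log1p(-P0)) for c in CUM_PROBS]


def region_probs(p: float) -> list[float]:
    """p_1..p_8: probabilities of the eight regions when X ~ Geo(p)."""
    cdf = [1 - (1 - p) ** l for l in LIMITS]
    edges = [0.0] + cdf + [1.0]
    return [edges[i + 1] - edges[i] for i in range(8)]


def next_state(s: int, region: int) -> int:
    """Signed run-sum state after one observation (s > 0: SU, s < 0: SL)."""
    score = SCORES[region]
    if region >= 4:                          # X > l4: above the center line
        return (s if s > 0 else 0) + score   # SL resets, SU accumulates
    return (s if s < 0 else 0) - score       # SU resets, SL accumulates


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def arl(p: float, start: int = 0) -> float:
    """E[T] = e (I - Q)^{-1} J with the chain started in state S_0 = start."""
    idx = {s: i for i, s in enumerate(STATES)}
    n = len(STATES)
    i_minus_q = [[float(i == j) for j in range(n)] for i in range(n)]
    for s in STATES:
        for region, prob in enumerate(region_probs(p)):
            t = next_state(s, region)
            if abs(t) < LIMIT_SUM:           # still transient; otherwise absorbed in C
                i_minus_q[idx[s]][idx[t]] -= prob
    return solve(i_minus_q, [1.0] * n)[idx[start]]


for ppm in (50, 100, 150, 200, 500):
    print(f"p = {ppm:>3} ppm   ARL = {arl(ppm / 1e6):8.1f}")
```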
Geometric Run sum
For this chart, a plot of E[T] vs. p:
[Figure: average run length (vertical axis, 0 to 600) vs. p in ppm (horizontal axis, 0 to 180)]
Geometric Run sum
A comparison with Test 2
[Figure: average run length (vertical axis, 0 to 600, with 370.47 marked) vs. p in ppm (horizontal axis, 0 to 180)]
RUN SUM – FURTHER IMPROVEMENT
• Consider a geometric run sum with
– no regions
– center line equal to $l_4$
– scores equal to X
– design: limit sum L
NEW GEOMETRIC RUN SUM - DEFINITION
• Let us denote the following cumulative sums:
$SU_t = SU_{t-1} + X_t$ if $X_t$ falls above the center line; $SU_t = 0$ otherwise
$SL_t = SL_{t-1} - X_t$ if $X_t$ falls below the center line; $SL_t = 0$ otherwise
• The run sum statistic is defined, for t = 1, 2, …, by
$S_t = \max\{SU_t, -SL_t\}$
with $SU_0 = 0$, $SL_0 = 0$,
and limit sum L
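The new statistic changes only the scores: each observation contributes its own value instead of a region score. A sketch, assuming an observation equal to the center line counts as below it; the data and the limit sum are hypothetical, the center line is taken as $l_4 = 6932$ from the example, and `new_run_sum` is an illustrative name.

```python
def new_run_sum(xs, center: float, limit_sum: float) -> list[float]:
    """S_1, ..., S_t up to and including the first t with S_t >= limit_sum,
    where SU adds X_t above the center line, SL subtracts X_t below it,
    and the sum on the other side resets to 0."""
    su = sl = 0.0                      # SU_0 = SL_0 = 0
    path = []
    for x in xs:
        if x > center:                 # above the center line
            su, sl = su + x, 0.0
        else:                          # on or below the center line (assumption)
            su, sl = 0.0, sl - x
        s = max(su, -sl)
        path.append(s)
        if s >= limit_sum:
            break
    return path


# Hypothetical data and limit sum, for illustration only.
print(new_run_sum([9000, 12000, 300, 150, 40000, 30000], center=6932, limit_sum=60000))
```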
NEW GEOMETRIC RUN SUM - MODELING
• A Markov chain model is not possible
– huge number of states
• Need to derive the distribution of T directly
• It can be shown that
CONCLUSIONS
• The run sum is an effective procedure for two-sided monitoring
• For monitoring very small p, it is more effective than a sequence of geometric tests
• With a limited number of regions, it can be modeled by a Markov chain
TOPICS OF INTEREST
• Estimate the change point (the time at which p changes)
• Bayesian tests
• Lack of independence (chain-dependent Bernoulli trials)
• The run sum can be applied to other settings
– monitoring an arrival process