Process Monitoring of Bivariate Poisson Data
A Problem Oriented Solution
Sotiris Bersimis, Department of Statistics and Insurance Science,
University of Piraeus, Piraeus, Greece, and
Petros E. Maravelakis, Department of Statistics and Actuarial-Financial Mathematics, University of the Aegean, Samos, Greece.
• Greek industry is composed of 23 sectors and the most
important of them is the food and drink sector.
• This sector represents about 21% of Greek manufacturing
industry, includes more than 1,300 enterprises and creates
70,000 jobs.
• In 2002, Greece’s food sector was second in the European
Union (out of 15 countries) in terms of growth, reaching a
growth rate of 3.3% (in that period Spain held first place).
• The first place in the food sector is taken by dairy products,
which hold 24%.
• The 5 main sectors of the Greek dairy industry are: milk,
yogurt, cheese, ice cream, cream and butter.
The Problem is related to
Food Industry
Contribution of Each Food Sector in
the Greek Food Industry
• The dairy industry has shown great signs of improvement in the
last 10 years, mainly because of the high nutritional value of
dairy products and their close relationship with the Greek diet
(although it is now also caught up in the economic crisis).
• From the preceding discussion it is clear that the dairy industry
is of great importance for the Greek economy, while milk is of
great importance for the diet of Greeks.
• Among the different categories, Greeks prefer fresh milk,
which holds 47.4% of total share.
• At the same time, all companies invest considerable sums in
research and development and in the installation of units to
gather and process fresh milk of high quality and safety.
In the dairy industry and
Especially in Milk Production
• In fresh milk production, as in many food processing operations,
product safety is controlled by checking only the final product
with microbiological and chemical methods (Tokatli et al., 2005).
• A major drawback of this approach is the time delay.
Collecting and examining the samples to determine the safety
of the product takes too long (the results of the
microbiological analysis become available only after the product
is released to the market).
• Another drawback is that it can be a high-cost solution if any
contamination is reported after production is completed.
Furthermore, recalling the defective product and
collecting it from retail outlets add significant extra cost.
A Closer Look at the Problem
• Thus, it is clear that new process monitoring techniques
aimed at this type of problem are needed.
• The significance of such techniques arises from the fact that
these cases are related to public health, since many
diseases are associated with low quality milk (or similar food
products):
• Leptospirosis
• Cowpox
• Tuberculosis
• Brucellosis
• Listeria
• Johne's Disease
A Closer Look at the Problem
• A milk pasteurization plant. A continuous pasteurization line.
• Our focus was concentrated on the time interval after
pasteurization is completed and before the product is released
to the consumer, since any pathogenic biological factor
contained in the raw material is removed by the
pasteurization process.
• What happens if a pathogenic biological factor appears in the part
of production after the pasteurization of the product?
• As already noted, microbiological methods are applied to the
final product to ensure that the milk is safe for consumption.
• However, as also noted, there is a time delay, and usually
the exact results of the microbiological analysis are obtained
after the product is released to the market.
The Exact Problem
• Thus we need a monitoring procedure!
• But what are we going to monitor?
• Usually in such cases quality control departments monitor the
percentage of non-conforming products.
• There are also a few cases in which quality control departments
monitor the number of microorganisms of a specific type
found in a sample (microorganisms per milliliter in a
suspension created from a sample from the production line)…
• In that case we monitor a Poisson distributed test statistic
with an appropriate control chart…
• Note here that milk (like almost all food) contains
microorganisms that cannot affect human health as long as they
do not exceed a threshold (in some cases they are even useful).
The Exact Problem
• But this needs time…
• The Poisson based control chart is fed with measurements only
after the product is in the hands of the consumer…
• The quality managers are instructed to wait a
certain amount of time before proceeding to count the
microorganisms in the plate.
• We assume that if a contamination factor exists,
it affects new products in an increasing way (the effects are a
function of time).
• In that case, a better solution is to use a CUSUM or an EWMA
control chart.
The Exact Problem
• But the use of these types of charts does not solve the problem,
because the time delay remains, and an extreme
event will be identified only when it is too late…
• A better solution is to measure the number of microorganisms
(of a specific type) that develop in a test plate (created
from a sample from the production line) at many time points,
from time zero to the final time point (and not only at the end
of the time period given in the microbiological guidelines).
• In that way we may be able to observe how fast the
number of microorganisms is growing.
• The idea is that if a contaminating factor exists in the
production line after the pasteurization process is completed,
then the number of microorganisms will grow faster.
The Exact Problem
• Also, if a contaminating factor makes its appearance in the
production line, then it evolves itself (since it is a biological
factor), causing more and more contamination over time.
• Thus, the proposed sampling procedure is the following:
• Take one sample from the production line every k time units
(for example, every 8 hours).
• Define the number l of measurements to be made on the microbiological
system (for example, 6), usually by Optical Density.
• If the guidelines instruct that the number of microorganisms
of a specific type must be measured at r hours (say 48 hours),
then perform the 1st test at the r/l (8th) hour, the 2nd test at
the 2r/l (16th) hour, …, and finally the lth test at the r (48th)
hour.
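The reading schedule described above can be sketched in a few lines of Python. This is only an illustration: the function names `reading_times` and `time_point` are hypothetical, and the time-point calculation assumes that k equals r/l (as in the 8-hour example), so that readings of different samples fall on a common grid.

```python
def reading_times(r, l):
    """Hours at which one plate is read: r/l, 2r/l, ..., r."""
    if r <= 0 or l <= 0:
        raise ValueError("r and l must be positive")
    return [i * r / l for i in range(1, l + 1)]


def time_point(sample_j, reading_i):
    """Common time point u at which reading i of sample j becomes available.

    Assumption (not stated in general in the source): k = r/l, so sample j
    is taken at time (j-1)*k and its i-th reading falls at time (j-1+i)*k,
    i.e. at time point u = j + i - 1.
    """
    return sample_j + reading_i - 1


# Guideline time r = 48 hours, l = 6 readings per plate
schedule = reading_times(48, 6)
```

With r = 48 and l = 6 the readings fall every 8 hours, and reading 2 of sample 3 lands on the same time point as reading 1 of sample 4.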
The Exact Problem
x(i,j): count at plate reading i (interval i) for the sample taken at sampling point j = 1, 2, …, 12, …

Plate  Interval  Entries by sampling point
1      (0,8]     X(1,1) X(1,2) X(1,3) X(1,4) X(1,5) X(1,6) X(1,7) X(1,8) X(1,9) …
2      (8,16]    X(2,1) X(2,2) X(2,3) X(2,4) X(2,5) X(2,6) X(2,7) X(2,8) X(2,9) …
3      (16,24]   X(3,1) X(3,2) X(3,3) X(3,4) X(3,5) X(3,6) X(3,7) X(3,8) X(3,9) …
4      (24,32]   X(4,1) X(4,2) X(4,3) X(4,4) X(4,5) X(4,6) X(4,7) X(4,8) X(4,9) …
5      (32,40]   X(5,1) X(5,2) X(5,3) X(5,4) X(5,5) X(5,6) X(5,7) X(5,8) …
6      (40,48]   X(6,1) X(6,2) X(6,3) X(6,4) X(6,5) X(6,6) X(6,7) …
• The null hypothesis is that the process is in control, that there is no time dependence, and
that each of the components x(i,j), i=1,2,…,l (here l=6) and j=1,2,…, follows a Poisson
distribution with parameter λ_l.
• Thus, at each time point u, the sums of the form y(u)=x(1,u)+x(2,u-1)+…+x(l,u-l+1) are
also Poisson random variables (assuming independent components), with parameter l∙λ_l.
Sampling Scheme
• Thus, in case we are interested in only one type of
bacterium, we may apply a univariate Shewhart type
control chart on the statistic
y(u)=x(1,u)+x(2,u-1)+…+x(l,u-l+1) for u=1,2,…,+∞.
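A minimal sketch of this statistic and of a probability-based upper limit for it follows. The function names, the 0-based list layout `x[i][j]`, and the false alarm probability `alpha` are assumptions made for illustration; the in-control mean (l∙λ_l under the null hypothesis) is taken as known.

```python
import math


def diagonal_sums(x):
    """Compute y(u) = x(1,u) + x(2,u-1) + ... + x(l,u-l+1).

    x[i][j] holds the count at reading i+1 of sample j+1 (0-based lists).
    Only time points u >= l, where all l terms exist, are returned.
    """
    l = len(x)
    n = len(x[0])
    return [sum(x[i][u - i] for i in range(l)) for u in range(l - 1, n)]


def poisson_upper_limit(mean, alpha=0.0027):
    """Smallest c with Pr(Y > c) <= alpha for Y ~ Poisson(mean).

    alpha is an assumed false alarm probability, not a value from the source.
    """
    pmf = math.exp(-mean)  # Pr(Y = 0); underflows for very large means
    cdf = pmf
    c = 0
    while 1.0 - cdf > alpha:
        c += 1
        pmf *= mean / c    # Pr(Y = c) from Pr(Y = c - 1)
        cdf += pmf
    return c


def signals(x, lam, alpha=0.0027):
    """Flag each y(u) that exceeds the upper limit for mean l * lam."""
    ucl = poisson_upper_limit(len(x) * lam, alpha)
    return [y > ucl for y in diagonal_sums(x)]
```

Only an upper limit is used here because the concern in this setting is an increase in the count.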
Univariate Control Chart
• But what happens when we have more than one type
of bacterium? Say, for example, 2.
• In that case, we may apply the same technique to both
types of bacterium.
• Thus, we end up with two sums of the same form, one per type t=1,2, where x_t(i,j) denotes the count for type t:
• y(1,u)=x_1(1,u)+x_1(2,u-1)+…+x_1(l,u-l+1) for u=1,2,…,+∞
• y(2,u)=x_2(1,u)+x_2(2,u-1)+…+x_2(l,u-l+1) for u=1,2,…,+∞
• The two variables will in most cases be dependent, since
the presence of a contaminating factor will trigger a chain
reaction in the evolution of these types of bacterium.
• In that case, we define the two dimensional random variable
y=(y1,y2), which follows a two dimensional (bivariate) Poisson distribution
with parameters λ1, λ2, and λ3.
The Bivariate Case
• The two dimensional random variable y=(y1,y2) has the
following probability function
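$$
\Pr(Y_1 = y_1, Y_2 = y_2) = \exp\{-(\lambda_1+\lambda_2+\lambda_3)\}\,\frac{\lambda_1^{y_1}}{y_1!}\,\frac{\lambda_2^{y_2}}{y_2!}\sum_{k=0}^{\min(y_1,y_2)}\binom{y_1}{k}\binom{y_2}{k}\,k!\left(\frac{\lambda_3}{\lambda_1\lambda_2}\right)^{k}
$$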
• This bivariate setting is actually based on the joint
distribution of the variables Y1, Y2 where in general
Y1=Z1+Z3 and Y2=Z2+Z3
and Z1, Z2, Z3 are mutually independent Poisson random
variables with means λ1, λ2 and λ3, respectively.
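As an illustration of this construction, the joint probability function of (Y1, Y2) can be evaluated directly from the representation Y1=Z1+Z3, Y2=Z2+Z3. The sketch below uses the standard closed form of the bivariate Poisson probability function; the function name is hypothetical and λ1, λ2 are assumed to be strictly positive.

```python
import math


def bivariate_poisson_pmf(y1, y2, lam1, lam2, lam3):
    """Pr(Y1 = y1, Y2 = y2) for Y1 = Z1 + Z3, Y2 = Z2 + Z3.

    Z1, Z2, Z3 are independent Poisson variables with means
    lam1, lam2, lam3. Requires lam1 > 0 and lam2 > 0.
    """
    base = math.exp(-(lam1 + lam2 + lam3))
    base *= lam1 ** y1 / math.factorial(y1)
    base *= lam2 ** y2 / math.factorial(y2)
    ratio = lam3 / (lam1 * lam2)
    # Sum over the possible values k of the shared component Z3
    total = sum(
        math.comb(y1, k) * math.comb(y2, k) * math.factorial(k) * ratio ** k
        for k in range(min(y1, y2) + 1)
    )
    return base * total
```

Under this construction the covariance of Y1 and Y2 equals λ3, so λ3 = 0 corresponds to two independent Poisson counts.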
The Bivariate Case
• The next step in our methodology is to identify the variable
that will be used for monitoring the bivariate process.
• A fact that motivates the selection of this
variable is that the number of bacteria can only increase.
Therefore, we are interested in a variable that is able to
detect this possible increase quickly.
• A straightforward selection is the sum of the two dependent
Poisson random variables Y1 and Y2, say Y.
• This random variable identifies an increase in the mean of
either Y1 or Y2.
• The random variable Y follows a Hermite distribution (see
Johnson, Kotz and Kemp (1992), pages 357-364) with
probability function
$$
\Pr(Y = y) = \exp\{-(\alpha_1+\alpha_2)\}\sum_{j=0}^{[y/2]}\frac{\alpha_1^{\,y-2j}\,\alpha_2^{\,j}}{(y-2j)!\,j!},\qquad \alpha_1=\lambda_1+\lambda_2,\quad \alpha_2=\lambda_3
$$
The Bivariate Case
• Consequently, for the identification of an out of control
situation we may construct a Shewhart type control chart with
limits calculated using the Hermite distribution (see
Montgomery (2008)).
• This chart detects a possible increase in the mean of any of the
two variables.
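A minimal sketch of how such a limit could be computed is given below. It uses the fact that Y=Y1+Y2=Z1+Z2+2Z3 has a Hermite distribution with α1=λ1+λ2 and α2=λ3. The function names and the false alarm probability `alpha` are assumptions for illustration, and the in-control parameters are taken as known.

```python
import math


def hermite_pmf(y, a1, a2):
    """Pr(Y = y) for the Hermite distribution with parameters a1, a2."""
    total = sum(
        a1 ** (y - 2 * j) * a2 ** j
        / (math.factorial(y - 2 * j) * math.factorial(j))
        for j in range(y // 2 + 1)
    )
    return math.exp(-(a1 + a2)) * total


def hermite_upper_limit(lam1, lam2, lam3, alpha=0.0027):
    """Smallest c with Pr(Y > c) <= alpha for Y = Y1 + Y2.

    alpha is an assumed false alarm probability, not a value from the source.
    """
    a1, a2 = lam1 + lam2, lam3
    c = 0
    cdf = hermite_pmf(0, a1, a2)
    while 1.0 - cdf > alpha:
        c += 1
        cdf += hermite_pmf(c, a1, a2)
    return c
```

A signal would be raised whenever the observed sum exceeds the returned limit. When λ3 = 0 the distribution reduces to a Poisson with mean λ1+λ2, which gives a simple sanity check.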
Variable   Shift   ATS Difference
1          25%     82.4
1          75%     67.1
1          125%    22.4
2          25%     88.2
2          75%     65.1
2          125%    24.1
1,2        25%     52.2
1,2        75%     35.2
1,2        125%    20.1
Based on 1000 repetitions.
The Bivariate Case
• The next step required by the nature of the problem is to
see what happens after an out-of-control signal is given.
• A method to identify the responsible variable is needed.
• In order to identify the responsible variable after a signal
we have to properly select a random variable that will
help us in this direction.
• Such a random variable is the difference of the two
random variables Y1 and Y2, say Y’.
• From the definition of the bivariate Poisson distribution
we deduce that Y’=Y1-Y2=Z1-Z2 is the difference of two
independent Poisson random variables.
The Bivariate Case
• Since we use Y’ after a signal is issued, we expect to see one of
the following results
• a positive value of Y’ meaning that we have an increase in Z1.
• a negative value of Y’ meaning that we have an increase in Z2.
• a value of Y’ close to zero meaning that both Z1 and Z2 have
shifted.
• Therefore, the use of Y’ assures us that we will be able to
identify the responsible variable in most cases. The
probability distribution of Y’ is known; it is given in Johnson,
Kotz and Kemp (1992), pages 190-192, and it is of the form
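$$
\Pr(Y' = y) = \exp\{-(\lambda_1+\lambda_2)\}\left(\frac{\lambda_1}{\lambda_2}\right)^{y/2} I_{|y|}\!\left(2\sqrt{\lambda_1\lambda_2}\right),\qquad I_r(x)=\left(\frac{x}{2}\right)^{r}\sum_{k=0}^{\infty}\frac{(x^2/4)^k}{k!\,\Gamma(r+k+1)}
$$
The Bivariate Case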
• Thus, we may use the distribution of Y’ in order to define
a formal procedure for identifying the out-of-control
variable.
• Specifically, if the value of Y’ is above the 95%
percentage point of its theoretical distribution, then the
responsible variable is Y1; if the value of Y’ is below
the 5% percentage point of its theoretical distribution,
then Y2 is the responsible variable; and if the value of Y’
is between the 5% and 95% percentage points of its
theoretical distribution, then both variables have shifted.
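The decision rule above can be sketched as follows. The distribution of Y’=Z1-Z2 is evaluated with the modified Bessel function of the first kind, computed here from its power series. All function names are hypothetical, the in-control means λ1 and λ2 are assumed known and strictly positive, and the percentage points are found by scanning a wide but finite range of values.

```python
import math


def bessel_i(r, x, tol=1e-16):
    """Modified Bessel function I_r(x), integer r >= 0, x >= 0 (power series)."""
    if x == 0.0:
        return 1.0 if r == 0 else 0.0
    # k = 0 term (x/2)^r / r!, built in log space to avoid overflow
    term = math.exp(r * math.log(x / 2.0) - math.lgamma(r + 1))
    total, k = term, 0
    while term > tol * total:
        k += 1
        term *= (x * x / 4.0) / (k * (r + k))
        total += term
    return total


def difference_pmf(y, lam1, lam2):
    """Pr(Z1 - Z2 = y) for independent Poisson Z1, Z2 with means lam1, lam2 > 0."""
    return (math.exp(-(lam1 + lam2)) * (lam1 / lam2) ** (y / 2.0)
            * bessel_i(abs(y), 2.0 * math.sqrt(lam1 * lam2)))


def difference_quantile(p, lam1, lam2):
    """Smallest y with Pr(Y' <= y) >= p, scanning a wide finite range."""
    bound = int(lam1 + lam2 + 12.0 * math.sqrt(lam1 + lam2)) + 20
    cdf = 0.0
    for y in range(-bound, bound + 1):
        cdf += difference_pmf(y, lam1, lam2)
        if cdf >= p:
            return y
    return bound


def responsible_variable(y_diff, lam1, lam2, lower=0.05, upper=0.95):
    """Apply the 5% / 95% percentage point rule to an observed Y' = Y1 - Y2."""
    if y_diff > difference_quantile(upper, lam1, lam2):
        return "Y1"
    if y_diff < difference_quantile(lower, lam1, lam2):
        return "Y2"
    return "both"
```

The function would be called only after the chart on the sum has signalled, with the observed difference at the signalling time point as `y_diff`.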
The Bivariate Case
Shift Variable   Shift Size (%)   Correct Identification
1                25%              35.2%
1                50%              76.4%
1                75%              93.2%
1                100%             95.2%
1                150%             99.8%
2                25%              34.2%
2                50%              77.2%
2                75%              93.2%
2                100%             94.6%
2                150%             99.2%
Based on 1000 repetitions.
Correct Identification Rates
• Tokatli, F., Cinar, A. and Schlesser, J.E. (2005).
HACCP with multivariate process monitoring and fault diagnosis
techniques: application to a food pasteurization process, Food
Control, 16, 411–422.
• Johnson, N.L., Kotz, S. and Kemp, A.W. (1992). Univariate Discrete
Distributions, Wiley, New York.
• Montgomery, D.C. (2008). Introduction to Statistical Quality
Control, Wiley, New York.
References