An example of a data mining problem
An example of a data mining project
Problem
• Detect and explain faults in a continuous pulp digester
– Faults: drops in the output quality of the digester
Solution
• A report consisting of
– a description of the analyzed data,
– analysis methods,
– results,
– conclusions, and
– process improvement recommendations.
Problem understanding
• Several sources of information:
– description of the process instrumentation,
– documentation of the digester control system,
– ISO 9000 documents,
– interviews with operating personnel, process engineers, researchers, and automation system vendor engineers.
Data acquisition
• About 200 on-line measurements
• Sampling rate 1 sample/10 minutes
• Data stored in an SQL database at the mill
Data acquisition
• Data acquisition procedure:
– a shell script run on the SQL host twice a month
– FTP transfer of the data to HUT through the firewall, performed by a mill computer operator
– appending the new data files to the existing ones at HUT using shell scripts
Data acquisition
Data file format:
value1 checkbits1 timelabel1
value2 checkbits2 timelabel2
...
valueN checkbitsN timelabelN
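Reading one such file can be sketched in Python as follows. This is a minimal sketch, assuming whitespace-separated fields and time labels without internal spaces; the real encoding of the checkbits and time labels is not described here, so both are kept as raw strings, and the function name is hypothetical.

```python
from typing import List, Tuple

def parse_channel_file(text: str) -> List[Tuple[float, str, str]]:
    """Parse 'value checkbits timelabel' lines into (value, checkbits, timelabel).

    Assumption: fields are whitespace-separated and the time label contains
    no spaces; checkbits and time labels are kept as raw strings.
    """
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
        value, checkbits, timelabel = fields
        records.append((float(value), checkbits, timelabel))
    return records


print(parse_channel_file("1.5 0 t1\n2.0 1 t2\n"))
# [(1.5, '0', 't1'), (2.0, '1', 't2')]
```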
Basic data preparation
• For each measurement channel:
– check that the measurements are valid using the checkbits
– use the time labels to check whether samples are missing; if so, fill the gaps with NaNs
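The two preparation steps above can be sketched as one function that builds an evenly spaced series. It assumes the time labels have already been converted to minutes and, hypothetically, that a checkbit value of 0 marks a valid sample; the actual checkbit convention used at the mill is not given.

```python
import math
from typing import List, Tuple

SAMPLE_INTERVAL_MIN = 10  # sampling rate stated earlier: 1 sample / 10 minutes
VALID_CHECKBITS = 0       # hypothetical: the real checkbit encoding is not documented here

def prepare_channel(records: List[Tuple[float, int, int]]) -> List[float]:
    """Build an evenly spaced series from (value, checkbits, minutes) records.

    Samples whose checkbits differ from VALID_CHECKBITS, and time slots with
    no sample at all, are set to NaN. Records must be sorted by time.
    """
    if not records:
        return []
    start, end = records[0][2], records[-1][2]
    series = [math.nan] * ((end - start) // SAMPLE_INTERVAL_MIN + 1)
    for value, checkbits, minutes in records:
        if checkbits == VALID_CHECKBITS:
            series[(minutes - start) // SAMPLE_INTERVAL_MIN] = value
    return series


# The 10-minute sample is invalid and the 20-minute sample is missing.
print(prepare_channel([(1.0, 0, 0), (2.0, 1, 10), (4.0, 0, 30)]))
# [1.0, nan, nan, 4.0]
```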
Data survey
• Visual data inspection (time series plots) revealed some problems:
– some measurements did not work at all,
– some measurements worked properly, but not all the time,
– changes in production speed were visible in most measurements, and
– process tuning altered the behavior of some measurements.
Data survey
• Computing material balances provides a way to roughly estimate the reliability of some sensors
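As a simplified illustration of the material-balance check, the sketch below compares two flow measurements that should balance. It assumes both signals measure the same material in the same units at steady state; a real digester balance involves more streams than two, and the function name is hypothetical.

```python
import math
from typing import Sequence

def balance_residual(inflow: Sequence[float], outflow: Sequence[float]) -> float:
    """Mean relative imbalance (in - out) / in over paired samples.

    Simplifying assumption: both signals measure the same material in the
    same units and the process is at steady state (no accumulation). A
    residual that stays far from zero hints that one sensor is unreliable.
    Pairs with NaN or zero inflow are skipped.
    """
    residuals = [
        (i - o) / i
        for i, o in zip(inflow, outflow)
        if not (math.isnan(i) or math.isnan(o)) and i != 0
    ]
    return sum(residuals) / len(residuals) if residuals else math.nan


# Outflow reads about 10 % low: one of the two sensors is suspect.
print(balance_residual([100.0, 200.0, math.nan], [90.0, 180.0, 50.0]))
# 0.1
```

A signed mean is used so that a systematic bias shows up; random noise in opposite directions averages out.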
• The process delay from the input to the output of the digester is about three hours
• Delays between measurements in different parts of the process had to be compensated
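Delay compensation can be sketched as shifting the downstream series so that its samples line up with the upstream ones. This assumes the delay is known and constant; with 10-minute sampling, the roughly three-hour input-to-output delay corresponds to 18 samples. The function name is hypothetical.

```python
import math
from typing import List

SAMPLE_INTERVAL_MIN = 10  # 1 sample / 10 minutes

def compensate_delay(downstream: List[float], delay_min: int) -> List[float]:
    """Shift a downstream series earlier by delay_min minutes.

    After the shift, index k lines up with index k of an upstream series.
    The tail that has no counterpart is padded with NaN, so the length of
    the series is preserved.
    """
    lag = delay_min // SAMPLE_INTERVAL_MIN
    if lag <= 0:
        return list(downstream)
    return downstream[lag:] + [math.nan] * min(lag, len(downstream))


# A three-hour delay corresponds to 180 / 10 = 18 samples.
out = compensate_delay([float(k) for k in range(20)], 180)
print(out[:2])  # [18.0, 19.0]
```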
Data survey
• In order to get reliable results, only periods with constant production speed should be analyzed
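Selecting the constant-speed periods can be sketched as a single pass over the production-speed signal. The tolerance and minimum period length are hypothetical parameters; the criterion actually used in the project is not stated.

```python
import math
from typing import List, Tuple

def constant_speed_periods(speed: List[float], tolerance: float,
                           min_length: int) -> List[Tuple[int, int]]:
    """Return (start, end) index ranges, end exclusive, of constant speed.

    A period stays open while each sample is within +/- tolerance of the
    period's first sample; NaN samples always close it. Only periods of
    at least min_length samples are returned.
    """
    periods = []
    start = None  # index where the current candidate period began
    for k, v in enumerate(speed + [math.nan]):  # NaN sentinel closes the last period
        if start is not None and not math.isnan(v) and abs(v - speed[start]) <= tolerance:
            continue
        if start is not None and k - start >= min_length:
            periods.append((start, k))
        start = None if math.isnan(v) else k
    return periods


speed = [10.0, 10.1, 9.9, 12.0, 12.0, 12.0, math.nan, 12.0]
print(constant_speed_periods(speed, tolerance=0.5, min_length=3))
# [(0, 3), (3, 6)]
```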
Data modeling
• First, only temperature measurements on the digester sides were used
• Basic idea: to estimate the movement of the chips using correlations between neighboring measurements
• Result: this approach failed
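The correlation idea can be sketched as finding, for two neighboring temperature sensors, the time lag that maximizes their correlation; that lag indicates how long the chips take to travel from one sensor to the next. This is a hypothetical reconstruction of the idea, not the project's code, and the slide reports that the approach failed for this digester.

```python
import math
from typing import List, Optional, Sequence

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; NaN if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)

def best_lag(upstream: List[float], downstream: List[float],
             max_lag: int) -> Optional[int]:
    """Lag (in samples) at which downstream correlates best with upstream.

    Pairs containing NaN are dropped before correlating; lags leaving
    fewer than 3 valid pairs are skipped. Returns None if no lag works.
    """
    best, best_r = None, -math.inf
    for lag in range(max_lag + 1):
        pairs = [(u, d) for u, d in zip(upstream, downstream[lag:])
                 if not (math.isnan(u) or math.isnan(d))]
        if len(pairs) < 3:
            continue
        xs, ys = zip(*pairs)
        r = pearson(xs, ys)
        if not math.isnan(r) and r > best_r:
            best, best_r = lag, r
    return best


up = [0.0, 0.0, 1.0, 5.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0]
down = [0.0, 0.0] + up[:-2]  # the same temperature pulse, two samples later
print(best_lag(up, down, max_lag=4))  # 2
```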
Data modeling
• Next, all available measurements were used
• The measurements were reduced to those that best depict the state of the digester
• The reduction was carried out using
– process knowledge,
– data visualization, and
– correlation analysis.
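The correlation-analysis part of the reduction can be sketched as a greedy filter that drops any measurement strongly correlated with one already kept. The dictionary order stands in for a priority that process knowledge would set, and the 0.95 threshold and channel names are hypothetical; process knowledge and visualization are not modeled here.

```python
import math
from typing import Dict, List, Sequence

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation over pairs where neither value is NaN."""
    pairs = [(x, y) for x, y in zip(xs, ys) if not (math.isnan(x) or math.isnan(y))]
    if len(pairs) < 2:
        return math.nan
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)

def reduce_by_correlation(channels: Dict[str, List[float]],
                          threshold: float = 0.95) -> List[str]:
    """Keep channels in dict order, dropping any channel whose |r| with an
    already kept channel exceeds threshold (treated as redundant)."""
    kept: List[str] = []
    for name, values in channels.items():
        r_values = [pearson(values, channels[k]) for k in kept]
        if not any(abs(r) > threshold for r in r_values if not math.isnan(r)):
            kept.append(name)
    return kept


channels = {"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, 4.0, 6.0, 8.0], "c": [1.0, 0.0, 1.0, 0.0]}
print(reduce_by_correlation(channels))  # ['a', 'c']
```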
Data modeling
• During the project, a digester modeling
expert was consulted
• A model depicting the fault sensitivity of the digester was created