Industrial Diagnosis by
Hyper Space Data Mining
Presented at
AAAI 99 Spring Symposium on Equipment Diagnosis
Stanford University
March 23, 1999
Dr. Dongping (Daniel) Zhu
Zaptron Systems, Inc.
Mountain View, CA 94043
Tel: 650-966-8700, Fax: 650-966-8780
E-mail: [email protected]
http://www.zaptron.com
Zaptron, 1999
1
OUTLINE
Diagnosis overview: applications &
technologies
Hyperspace data mining
Diagnostic examples
product quality control (steel making)
resolve bottleneck (gasoline production)
improve yield (chemical plant)
Conclusions
MasterMiner™ demo
2
Diagnosis & Trouble-Shooting
Cost of support to products/services
Customer satisfaction
Key Issues
how to best approach the same problem next time
how to use history information - data mining
how to update KB
Solutions
on-line help
web-based, remote diagnostics
knowledge management tools
data mining (history data are available)
3
A Web-based Diagnostic System
Call Centers, Service Teams, Support Teams
Data Collecting Mechanisms
Standardization, Data Management, D Mining, KD(D+K), K Updating
Product Delivery Mechanisms
Training tools, Web-based diagnosis, On-line Help SW, Remote Repairs, Factor Analysis, KB manage
4
Rule-based Diagnostic Process
[Diagram labels: History Database; Fix Fault; Fault Physics; Primary Cases; Analysis; Diagnose; Rule Base; Diagnostic Matrix; Cause; Self Learning; Query; New data & Cases; Update Database]
5
Expert System Architecture
[Diagram labels: Interviewer (fi, hj); K Collector (aijl, bikl); Analyzer, Visualizer; Web Users; Data Base {a, b}; Web GUI; KB Builder (Mijk); KB; Problem Solver (Search Engine) {Mij}; Self Learner rijk]
6
Evolution of Diagnostic Techniques
• Equipment and Processes
• Sensors
• Data
• Databases
• Data Models
• Data Patterns (behavior in space)
• Data Fusion, sensor fusion
• Data Mining
• Data ……
7
Data Mining: Techniques
• Correlation/association analysis
• Factor analysis
• Trend prediction & forecasting
• Neural networks
• Genetic algorithms
• Fuzzy logic, expert systems
• Uncertainty reasoning (DS, rough sets)
• Bayesian Networks
• Hyper space data mining
  • find data pattern first
  • no model assumption
  • provide solutions to failure isolation/recognition
8
Hyper Space Data Mining
Introduction
Diagnosis - An optimization problem
A Hyper Space Technology
Application Examples
SW: MasterMiner™
9
A General Issue
• For any system - find a model to describe
[Diagram: operating data record, in situ sensor report, raw materials composition, design/operating process parameters -> Relationships (nonlinear, high noise, M-variant, no model) -> failure & fault, bottleneck, energy use, cost/risk, quality, yield/returns, reliability, productivity]
10
A Catch-22 Problem
Data Pattern <--?--> Data Model
Questions:
what type of data to collect
which data to use in modeling
Solution:
Hyperspace data mining
11
To Start - A Real Case
Aluminum Production Problem
Target: to Optimize the Leaching Rate of Al2O3
Factors:
a1 - Fe/Al in the ore
a2 - Sodium Na/(Al2O3+Fe2O3)
a3 - leaching temperature
a4 - lime (CaO)/(SiO2-TiO2)
2 Solutions:
Principal Component Analysis (PCA) by SAS JMP or RS/1 - bad
Hyperspace data mining by Zaptron MasterMiner™ - good result
12
Can you see the pattern?
If not, do data mining
to separate into subspaces
13
A Real Case - PCA Result: no separation
14
A Real Case - MasterMiner: good separation
15
MasterMiner 2nd step: complete separation
16
A Real Case - MasterMiner: build a model
17
Steps in Data Mining
History Data
Separability Test
Pretreatment: local view, delete outliers
Data Mining: linearity, topological type, correlation, association, best matching point, NN points
Feature Selection: feature reduction (entropy, voting)
Modeling (PH, MREC, ANN, GA): inequality, equations, PLS, sensitivity, advisory
• Map description of cross-sections
• Extrapolation to optimal zone for max yield
• Equations as criteria for optimal control
• State diagnosis of normal op zone & failure zones
• Propose an optimal operating condition by using current operation data or new materials
18
Clustering - Data Separation
PCA: projection in the max separable direction
Fisher: line projection with max distance between clusters
MREC: projective geometry, better than either
[Diagram: Data Base -> Data Mining -> Data Patterns: One-sided (voting), Inclusive (entropy), Exclusive, Sandwich]
19
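The difference between a PCA projection and a Fisher projection can be illustrated with a small sketch. This is not MasterMiner code and does not cover MREC, which the slides do not specify. The synthetic data, the function names and the single-threshold "separability" score are all assumptions made for illustration.

```python
import numpy as np

def pca_direction(X):
    """Unit vector of maximum variance; class labels are ignored."""
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def fisher_direction(X, y):
    """Unit vector proportional to Sw^-1 (m1 - m0) for classes labeled 0 and 1."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: spread of each class around its own mean, summed.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

def separability(X, y, w):
    """Best accuracy of a single threshold on the 1-D projection X @ w."""
    z = X @ w
    best = 0.0
    for t in np.unique(z):
        acc = np.mean((z >= t).astype(int) == y)
        # 1 - acc covers the case where the direction points the other way.
        best = max(best, acc, 1.0 - acc)
    return best

# Hypothetical data: large shared spread along axis 0,
# small class offset along axis 1.
rng = np.random.default_rng(0)
n = 200
class0 = rng.normal([0.0, 0.0], [5.0, 0.3], size=(n, 2))
class1 = rng.normal([0.0, 2.0], [5.0, 0.3], size=(n, 2))
X = np.vstack([class0, class1])
y = np.array([0] * n + [1] * n)

sep_pca = separability(X, y, pca_direction(X))
sep_fisher = separability(X, y, fisher_direction(X, y))
print(f"PCA: {sep_pca:.2f}  Fisher: {sep_fisher:.2f}")
```

In this setup the direction of largest variance carries almost no class information, so the PCA projection separates poorly while the Fisher projection separates well.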
Software Architecture
GUI
DataBase
Pattern Recognition
KnowBase
Artificial Neural Nets
Genetic Algorithm
20
MasterMiner™ Functions
21
MasterMiner™ Tools
• Data loading, editing, sorting, calculation
• Preprocessing: statistics, Feature selection, folding
• Factor analysis
target-factor analysis
factor-factor analysis
• Projections
Fisher, LMAP, PCA, PLS, MREC
• Modeling
envelope, auto-box, Sphere, KL, ANN (train, estimation, sensitivity)
• Extrapolation
PLS vector (linear), Simplex, appending,
22
Virtual Mining Tools for
Convex and concave space
Virtual mining in hyper space
• Hidden projection - tunnel model
• Envelope - generate a convex polyhedron
• Use “auto-box” for concave polyhedrons of samples
• Interchange of data classes
• Folding transform (to change data pattern in space)
Virtual mining of data samples
• divide into multiple segments
• convert concave polyhedron into convex ones
• build the model for each subspace
• separability went from 31% to 96% in one case
23
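The idea of dividing a concave sample region into segments and modeling each subspace separately can be sketched as follows. The slides do not define "auto-box", so this sketch assumes the simplest reading: an axis-aligned bounding box around the good samples. The L-shaped data, the known split into two arms and all function names are hypothetical.

```python
import numpy as np

def fit_box(points):
    """Axis-aligned bounding box (lower corner, upper corner) of the samples."""
    return points.min(axis=0), points.max(axis=0)

def in_box(points, box):
    """Boolean mask of the points lying inside the box."""
    lo, hi = box
    return np.all((points >= lo) & (points <= hi), axis=1)

def excluded_fraction(bad, boxes):
    """Fraction of bad samples that fall outside every box."""
    inside_any = np.zeros(len(bad), dtype=bool)
    for box in boxes:
        inside_any |= in_box(bad, box)
    return 1.0 - inside_any.mean()

rng = np.random.default_rng(0)
# Good samples fill an L-shaped (concave) region made of two rectangles.
arm_a = rng.uniform([0.0, 0.0], [2.0, 1.0], size=(200, 2))
arm_b = rng.uniform([0.0, 1.0], [1.0, 2.0], size=(200, 2))
good = np.vstack([arm_a, arm_b])
# Bad samples sit in the notch of the L.
bad = rng.uniform([1.1, 1.1], [1.9, 1.9], size=(100, 2))

# One box around all good samples also swallows the notch.
single = excluded_fraction(bad, [fit_box(good)])
# One box per segment keeps the notch outside the model.
split = excluded_fraction(bad, [fit_box(arm_a), fit_box(arm_b)])
print(f"one box: {single:.0%} excluded, two boxes: {split:.0%} excluded")
```

Here the split into two arms is given in advance. Finding the segments automatically is the hard part and is not attempted in this sketch.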
Virtual Mining Methods
(a) Tunnel model to separate data samples in hyper space
(b) The Envelope-Boxing method
(c) Generate convex polyhedrons from a concave one
24
Iterative Feature Selection/Reduction
Data pattern classified into 2 topological classes
“one-sided class”
“inclusive class”
Hidden projections applied
Projected factors are orthogonal in hyper space
Feature selection method (highly effective):
Entropy method is used for inclusive pattern
Voting method is used for one-sided pattern
Reduce features to reduce noise & complexity
e.g., good result based on 5 features out of 500
Reduced feature set needs to pass Separation test
25
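An entropy-based feature ranking for an inclusive pattern might look like the sketch below. The slides do not say how the entropy method is computed, so this sketch substitutes a standard information-gain score on quantile-binned features. The bin count, the synthetic data and the function names are assumptions.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels, bins=4):
    """Entropy of the labels minus their entropy after binning one feature."""
    # Interior quantiles give bins with roughly equal sample counts.
    edges = np.quantile(feature, np.linspace(0, 1, bins + 1)[1:-1])
    binned = np.digitize(feature, edges)
    h_cond = 0.0
    for b in np.unique(binned):
        mask = binned == b
        h_cond += mask.mean() * entropy(labels[mask])
    return entropy(labels) - h_cond

def rank_features(X, y, bins=4):
    """Feature indices sorted by decreasing information gain, plus the gains."""
    gains = np.array([information_gain(X[:, j], y, bins) for j in range(X.shape[1])])
    return np.argsort(gains)[::-1], gains

# Hypothetical "inclusive" pattern: good samples (label 1) lie inside a band
# of feature 0 and are surrounded by bad samples; the other features are noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(300, 20))
y = (np.abs(X[:, 0]) < 0.5).astype(int)

order, gains = rank_features(X, y)
print("top feature:", order[0], "gain:", round(float(gains[order[0]]), 2))
```

A linear score such as correlation would miss feature 0 here, because the good band sits in the middle of its range. That is one reason a pattern-specific selection method is plausible for inclusive data.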
MREC - Map Recognition Method
MREC - Projection in the best direction,
complete separation in 2 steps
PCA:
No separation
26
We have Improved the Quality of
alloy steels
carbon fiber reinforced, resin-based composite materials
Bi2O3-containing High Tc superconductors
rare earth containing phosphor
electrode materials of Ni/H batteries
VPTC ceramic semi-conductor
high temperature, SiC-based structural ceramics
high-polymers: PVC, synthetic fiber & rubber, polyethylene, ...
high energy materials
semi-conductor devices
MOCVD method of III-V compound film
27
We have applied MasterMiner™ to
Industrial Optimization & Diagnosis
Petrochemical industry
• distillation
• hydro-cracking
• vapor recovery
• platinum reforming
• delayed coking
• de-waxing
• vinyl acetate
• polypropylene
• jet fuel (Union Oil recipe, yield 87% -> 94%, +6,000 ton/yr)
• increase life of catalyst in polyvinyl plant (catalyst cost $1.2MM)
• etc.
28
We have applied MasterMiner™ to
Industrial Optimization & Diagnosis
Metallurgical Industry
• blast furnace
• casting
• alloy steels quality improving (60% -> 80%)
• energy saving in aluminum production
Automobile Industry
• electro-plating
• heat treatment
Chemical Industry
• PVC, polyformaldehyde
• butadiene rubber
29
Application Areas
[Diagram labels: Data Mining; Process Optimization; Equipment and Process Diagnosis; Materials Design; Petrochemical Industry; Metallurgical Industry; Semiconductor Industry]
GOAL: Optimal control of complex processes involving
Heat transfer
Mass transfer
Fluid flow
Chemical reactions
30
Pattern Recognition Methods
• Linear Regression (LS) - “forced fitting”
LS fitting coefficients as model parameters, the “best wish”
• PCA - principal component analysis
projection in “best” direction, select two directions, LS
• LMAP - linear mapping
origin at cluster center, covered with an ellipsoid, PCA
• NN - neural nets
blind learning, over-fitting, forced fitting
• MREC - map recognition (non linear)
polyhedrons, hidden projections, separation, back-mapping
• NNREC - neural nets + MREC
31
Comparison of Various Methods
                        CONDITION                                    METHOD TO USE
1. (in some cases)      Mechanism known                              Rule-based expert systems
2. (in 20% of cases)    Linear w/o noise                             Linear regression, statistical method
3. (in most cases)      Highly noisy, multi-variant, non-Gaussian    Hyper-space data mining
32
Why not Principal Component Analysis (PCA)?
Principal Component Analysis (PCA)    Data Mining by MasterMiner
Linear                                Nonlinear, hierarchical
Gaussian                              Non-Gaussian
Low noise                             High noise
Use all data in modeling              Use subset of data in modeling
20 projections                        2 projections
No separation                         Good separation
33
Why not Least Square Only ?
PLS applies when PRESS < 0.3 (1/4 of cases in our practice)
PROJECT                            PRESS (Error)
synthetic rubber                   0.2052 (can use PLS)
steel plate for ship building      0.6419 (cannot use PLS)
rare earth phosphor                0.3067
Baoshan Iron & Steel               0.3441
Ni/H battery                       0.7389
Ni/H materials                     0.1932
propylene recovery (noisy data)    0.7755
propylene recovery                 0.3752
solvent oil                        0.3975
VPTC                               0.1330
hydro-cracking plant               0.2055
methanol production                0.8255
casting for car                    0.9157
34
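The PRESS < 0.3 rule can be tried on toy data with the sketch below. It makes two loud assumptions: ordinary least squares is swapped in for PLS to keep the example short, and PRESS is normalized by the total sum of squares so that it falls on a 0 to 1 scale like the values in the table. The slides do not state which normalization was used. All names and data are hypothetical.

```python
import numpy as np

def normalized_press(X, y):
    """Leave-one-out PRESS of an OLS fit, divided by the total sum of squares."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # intercept column plus factors
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        # Fit without sample i, then predict sample i.
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += (y[i] - A[i] @ coef) ** 2
    return press / np.sum((y - y.mean()) ** 2)

def linear_model_applies(X, y, threshold=0.3):
    """The slide's rule of thumb: a linear model is usable when PRESS < 0.3."""
    return normalized_press(X, y) < threshold

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(60, 2))
# A nearly linear process and a curved one built from the same two factors.
y_linear = 2.0 * X[:, 0] - X[:, 1] + rng.normal(0.0, 0.1, size=60)
y_curved = X[:, 0] ** 2 + X[:, 1] ** 2 + rng.normal(0.0, 0.1, size=60)

press_linear = normalized_press(X, y_linear)
press_curved = normalized_press(X, y_curved)
print(f"linear: {press_linear:.3f}  curved: {press_curved:.3f}")
```

On the slide's own table, 4 of the 13 listed projects fall below 0.3, which is roughly in line with the stated 1/4 of cases.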
Why not Neural Networks (GA) Only ?
Over-fitting problem by NN (GA)
Industrial records are not complete
e.g. Leaching rate problem at an aluminum Co.
Leaching rate = f(a, b, c, T)
A cross-section of the optimal zone:
• by ANN: too large
• by our Yield Mater™: smaller
[Figure: cross-section in the b-c plane showing the wrong zone by ANN and the zone by MasterMiner]
35
Applications in Diagnosis
• Equipment setup
• steel making (roller distance)
• oil refinery (bottleneck in gasoline production)
• chemical plants (cooling pipe length, inlet position)
• Process optimization
• drug fermentation
• environmental emission controls
• materials manufacturing
36
E.g. 1 Steel Making
Blast furnace -> Steel making -> Casting -> Hot rolling -> Cold rolling
• German equipment, yield 10,000 tons/yr
• ST14 steel plate for auto body
• Problem - “deep pressing” property
• 100 = 5x20 factors in 5 stages
2 major factors:
• N2 - Nitrogen content should be reduced
• d1/d2 - distance ratio of cold rollers should be increased
• Benefit - wasted steel reduced by a factor of 5
37
2nd issue: QC in ST14 Steel Plate Making
Feed of Scrap, CaO, MgO, Iron Ore
O2 blower
Ladle
38
Problem Background
• After each batch, samples were taken in a 3-min test for QC
• Need to control the amount of O2 blown and scrap added
• Japanese case-based reasoning SW --> 65% separability
• Problem: ST14 quality is off-spec
• We used MasterMiner to build a model for QC
• Target: FC (C content in steels, 17-30% by customer spec)
• 13 Factors
• Model built and used to control product quality
• Result: 100% separability, products are on-spec
39
Feature Selection
Feature selected    Property
LY                  age of O2 gun (years)
PLH                 height of O2 gun
DYSLT               O2 amount (m3) before sampling
DYCD                C content at sampling time (10^-2 %)
DYTEMP              liquid iron temperature when sampling (°C)
PCAO                amount of CaO used
PMGO                amount of MgO added
PORE                amount of iron ore added
WCH                 total charge of the converter in ton
TOIRON              total liquid iron
SCAPT               amount of scrap
LDLIFE              life of ladle used to transport liquid iron
QO2                 amount of O2 blown after sampling
40
114 Sample Data
41
Target-Feature Maps
42
Data Separation by MasterMiner: 100%
43
Data Separation by PCA: 30%
44
Feature Selection (1)
- Principal component regression
45
Feature Selection (2)
- PLS (partial least squares)
46
Feature Selection (3)
- KW method (linear)
47
Tunnel Models: 32 Inequalities
48
Quality Control Issue
• Solve the set of 32 inequalities
• or use the “appending” operation:
  • assign values to uncontrollable factors
  • add N random samples
  • project them onto the N-dimensional space
  • select those falling into the optimal space
• Results:
The C content of ST14 products is on-spec
49
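The “appending” steps listed on this slide can be sketched as a random search over the controllable factors. The three inequalities below are made up for illustration and are not the 32 tunnel-model inequalities from the ST14 case. The sketch also checks the inequalities directly instead of projecting the samples first. All names are hypothetical.

```python
import numpy as np

def append_and_select(A, b, fixed, ranges, n_samples, rng):
    """Draw random operating points and keep those with A @ x <= b.

    fixed  maps a factor index to the value of an uncontrollable factor.
    ranges maps a factor index to the (low, high) range of a controllable factor.
    """
    n_factors = A.shape[1]
    if set(fixed) | set(ranges) != set(range(n_factors)):
        raise ValueError("every factor needs either a fixed value or a range")
    samples = np.empty((n_samples, n_factors))
    for j, value in fixed.items():
        samples[:, j] = value
    for j, (lo, hi) in ranges.items():
        samples[:, j] = rng.uniform(lo, hi, size=n_samples)
    # A sample is kept only if it satisfies every inequality.
    inside = np.all(samples @ A.T <= b, axis=1)
    return samples[inside]

# Hypothetical model with three factors:
#   x0 + x1 <= 1.5,   x0 >= 0.2,   x1 - x2 <= 0.5
A = np.array([[1.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 1.0, -1.0]])
b = np.array([1.5, -0.2, 0.5])

rng = np.random.default_rng(2)
candidates = append_and_select(
    A, b,
    fixed={2: 0.3},                           # uncontrollable factor, measured value
    ranges={0: (0.0, 1.0), 1: (0.0, 1.0)},    # controllable factors
    n_samples=1000,
    rng=rng,
)
print(len(candidates), "of 1000 random samples fall into the optimal space")
```

Any surviving sample is a candidate operating point. Choosing among them (for example, the one closest to the current settings) is a separate decision that this sketch leaves open.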
Add Random Samples (green)
50
E.g.2 Bottleneck in Gasoline Production
[Diagram: distillation tower with crude oil inlet, heat and cooling coil; outputs: jet fuel, gasoline, diesel, naphtha, heavy oil, asphalt]
Problem: gasoline yield low
• diagnose thermal cracking setup
• data mining method
• identify major factors
Diagnostic result: the length of cooling coils is too short
Benefit: gasoline increased by 10,000 tons/yr
51
E.g. 3 Ethylbenzene Synthesis
A Platinum Reforming Workshop
[Diagram: naphtha inlet, heat, reactor with platinum catalyst, fractionation tower, ethylbenzene]
52
Ethylbenzene Synthesis
Problem: yield low
Data Mining
Diagnostic result:
position of inlet is wrong
Action:
move from layer 99 to 111
Benefit: yield raised by 35%
53
E.g. 4 Predictive Control of Chaotic Process
• Answer: No
• Reason: Chaotic noises (Dr. Leon Chua of UC-Berkeley)
• An historical story:
a butterfly in Thailand caused a hurricane in Florida!
• Chaotic noises in chemical reactions: A -> B, C -> D
[Diagram labels: Materials A, C; Products B, D; Atomic collision]
54
E.g. 4 Predictive Control of Chaotic Process
• A Real Case: quality control in PTC ceramic production
• Problem: inconsistent (average) particle size (good rate: 60%)
• Material used: ultra-fine Al2O3 powder
• Chemical reaction: NaAlO2 + 2 H2O --> Al(OH)3 + NaOH
• Process:
• add acid or base to control the above induction process
• or change the cooling rate
• heated Al(OH)3 powder formed
• distribution of the particle size - near Gaussian
• Al2O3 powder formed
55
E.g. 4 Predictive Control of Chaotic Process
• Discovery: under violet light, the transparency varies from batch to batch
[Figure: violet light (2800 Å) through Al2O3 with a transparency measure; plot of violet transparency versus time, curves 1, 2, 3]
56
E.g. 4 Predictive Control of Chaotic Process
• Analysis: chaotic noises do have patterns by DataMaster™
• Practical Solution:
• measure the resistance r curve of an Al2O3 block being formed
• predict the product quality 30 min before finishing
• change the cooling rate to control the final r at 60 min
• Result: quality increased from 60% to 100% in 500 experiments
[Figure: resistance r (curves r1, r2, r3) and temperature (°C, up to 1350) versus time (min), from 0 to 60 with t = 30 marked]
57
Conclusion
If Linear (near linear)
must have “one-sided” pattern
use LS - “the best wish”
extrapolate by accurate model-based prediction
If Nonlinear
if one-sided pattern
use Fisher method
extrapolate by principal components
if inclusive pattern
use MREC
extrapolation by Simplex
58
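The decision rules on this slide can be written down as a small lookup. The function name and the string labels are made up for illustration; the mapping itself follows the slide.

```python
def recommend_method(linear, pattern):
    """Return (method, extrapolation) for a data set, following the slide.

    linear  is True for linear or near-linear data.
    pattern is "one-sided" or "inclusive".
    """
    if pattern not in ("one-sided", "inclusive"):
        raise ValueError("pattern must be 'one-sided' or 'inclusive'")
    if linear:
        # The slide states that linear data must have a one-sided pattern.
        if pattern != "one-sided":
            raise ValueError("linear data is expected to have a one-sided pattern")
        return ("LS", "model-based prediction")
    if pattern == "one-sided":
        return ("Fisher", "principal components")
    return ("MREC", "Simplex")

for case in [(True, "one-sided"), (False, "one-sided"), (False, "inclusive")]:
    print(case, "->", recommend_method(*case))
```

The linear and inclusive combination raises an error here because the slide rules it out. A real tool might instead fall back to the nonlinear branch.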
Conclusion: Integrated Solution
1997, L. Zadeh: “What is important about soft
computing is that FL, NN, GA & PCA are
synergistic rather than competitive.”
In agreement with our experience
Data do have patterns
Different patterns need different methods
Several methods need to be integrated
New data mining technologies developed
59
Economic Benefit Generated
Factory                 Application                                                        Benefit (USD)
A Petroleum Co.         yield increased: jet fuel, gas solvent, oil, propylene, xylene     3.5 million/2 years
A Petrochem Refinery    yield increased: gasoline, wax products                            1.2 million/year
An Iron & Steel Co.     yield increased: alloy steels for ships                            3 million/year
Total profit                                                                               7.5 million/year
Ratio of cost to profit in 5 years: 1:100
60
MasterMiner™ Software
• Desktop application software
• Runs on Windows 95/NT
• Software demo download
http://www.zaptron.com/masterminer
• Examples:
61
4-D Maps for Control
62
Test Samples Added
63
Announcement
2nd International Conference on
Information Fusion -- FUSION’99
July 6-8, 1999
Sunnyvale Hilton
Silicon Valley, California, USA
abstract due: Feb 1, 1999
http://www.inforfusion.org/fusion99
Sponsored by
International Society of Information Fusion
NASA, ARO
IEEE Signal Processing Society
IEEE Robotics and Automation Society
IEEE Control Systems Society
Special Session on Diagnostic Information Fusion
64
Thank You !
VIP
Zaptron
65