Data envelopment analysis (DEA)


DATA ENVELOPMENT ANALYSIS
A Tool for Data Mining and Analytics
Joe Zhu
School of Business
Worcester Polytechnic Institute
Worcester, MA 01609
www.deafrontier.net
What is DEA?
When DEA was developed and published in 1978:
• It was a non-parametric approach to estimating production functions
• Thus, we have multiple inputs and multiple outputs (of a production function)
• DEA tries to identify the efficient units
What is DEA exactly?
More than a production efficiency estimate
It is a form of balanced benchmarking (Sherman and Zhu, 2013) that enables
companies to benchmark and locate best practices that are not visible
through other commonly used management methodologies
It helps executives study the top-performing units, identify the best
practices, and transfer the valuable knowledge throughout the organization
to enhance performance; it also helps them test assumptions that might be
counterproductive
A tool for benchmarking
If one benchmarks the performance of computers, it is natural
to consider different features (screen size and resolution,
memory size, processor speed, hard disk size, and others). One
would then have to classify these features into “inputs” and
“outputs” in order to apply a proper DEA analysis. However,
these features may not actually represent inputs and outputs
at all in the standard notion of production.
DEA - revisit
A rule for classifying metrics:
Multiple inputs: the smaller the better
Multiple outputs: the larger the better
DMU


The definition of a DMU (decision-making unit) is generic and flexible
Numerous applications are found in areas such as finance, marketing,
transportation, sports, accounting, energy, sustainability, fishery,
insurance, and others
(Relative) Efficiency

The term ‘efficiency’ here represents best practice
Under general benchmarking, it does not necessarily mean
‘production efficiency’
We may refer to the DEA score as a form of ‘overall performance’ of
an organization
An example: measuring the quality of care in the case of
treating heart-attack patients
Some measures can be used in DEA to yield a composite
measure of quality indicators, e.g. Patients Given Aspirin at Arrival,
Patients Given Beta Blocker at Discharge, etc.
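Mathematical Model
\[
\begin{aligned}
\max\ z = {} & \sum_{r=1}^{s} \mu_r y_{ro} \\
\text{subject to}\quad & \sum_{r=1}^{s} \mu_r y_{rj} - \sum_{i=1}^{m} \nu_i x_{ij} \le 0, \quad j = 1, 2, \ldots, n; \\
& \sum_{i=1}^{m} \nu_i x_{io} = 1; \\
& \mu_r, \nu_i \ge 0.
\end{aligned}
\]
Dual
\[
\begin{aligned}
\theta^{*} = {} & \min \theta \\
\text{subject to}\quad & \sum_{j=1}^{n} x_{ij} \lambda_j \le \theta x_{io}, \quad i = 1, 2, \ldots, m; \\
& \sum_{j=1}^{n} y_{rj} \lambda_j \ge y_{ro}, \quad r = 1, 2, \ldots, s; \\
& \lambda_j \ge 0, \quad j = 1, 2, \ldots, n.
\end{aligned}
\]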
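As a minimal sketch of what this model computes, consider the special case of one input and one output (m = s = 1). In that case the optimal score of a DMU reduces to its output/input ratio divided by the best ratio observed among all DMUs, so no linear-programming solver is needed. The function name, DMU names and data below are hypothetical; the general multi-input, multi-output case would require an LP solver, which this sketch does not cover.

```python
def ccr_efficiency_1x1(inputs, outputs):
    """Relative efficiency of each DMU with one input and one output.

    inputs, outputs: dicts mapping DMU name -> positive number.
    With m = s = 1 the optimal score is (y_o / x_o) / max_j (y_j / x_j).
    """
    ratios = {dmu: outputs[dmu] / inputs[dmu] for dmu in inputs}
    best = max(ratios.values())
    return {dmu: ratio / best for dmu, ratio in ratios.items()}


# Hypothetical data: input = staff hours, output = units produced.
staff_hours = {"A": 10.0, "B": 20.0, "C": 40.0}
units = {"A": 20.0, "B": 30.0, "C": 40.0}

scores = ccr_efficiency_1x1(staff_hours, units)
for dmu, score in scores.items():
    print(dmu, round(score, 3))
```

A score of 1 marks a best-practice DMU; a score below 1 is the proportion to which the input could shrink while still producing the same output at the best observed ratio.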
Business Analytics by
Data Envelopment Analysis (DEA)




Descriptive Analytics: Gain insight from historical
data
Predictive Analytics: Forecasting
Prescriptive Analytics: Recommend decisions using
optimization, simulation, etc.
Decisive Analytics: supports human decisions with
visual analytics
DATA ENVELOPMENT ANALYSIS



DEA is a DATA ANALYSIS tool
Data Mining and Knowledge Discovery by DEA
More than Relative Efficiency
Sample Size



DEA is not a form of regression model
It is meaningless to apply a sample size requirement to DEA
If there are too many performance metrics given the number of DMUs,
it is likely that a significant portion of DMUs will be
benchmarked as best practice with a ratio of 1
One can use certain DEA approaches to reduce the number of best-practice DMUs
[Figure labels: Regression analysis; Data Envelopment Analysis]
Numerous Models/Approaches
One modification to DEA is called stratification.
• Stratification results in many efficiency frontiers.
The first represents all DMUs with the highest
efficiency, and so on down each stratified level until
all DMUs have been included.
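The peeling idea can be sketched as follows: find the best-practice DMUs, remove them, and repeat on the remaining DMUs until none are left. This sketch assumes the one-input, one-output case, where the DEA score reduces to an output/input ratio; in general the DEA model would be re-solved at each level. The function name, tolerance and data are hypothetical.

```python
def stratify(inputs, outputs, tol=1e-9):
    """Return a list of levels; level 0 is the first best-practice frontier.

    inputs, outputs: dicts mapping DMU name -> positive number
    (one input and one output per DMU).
    """
    remaining = set(inputs)
    levels = []
    while remaining:
        ratios = {d: outputs[d] / inputs[d] for d in remaining}
        best = max(ratios.values())
        # DMUs whose ratio matches the best ratio form the current frontier.
        frontier = sorted(d for d in remaining if ratios[d] >= best * (1 - tol))
        levels.append(frontier)
        remaining -= set(frontier)
    return levels


# Hypothetical data for five DMUs.
x = {"A": 10.0, "B": 20.0, "C": 40.0, "D": 5.0, "E": 8.0}
y = {"A": 20.0, "B": 30.0, "C": 40.0, "D": 10.0, "E": 12.0}

levels = stratify(x, y)
for number, members in enumerate(levels, start=1):
    print("Level", number, members)
```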
Network Structure
Ship Block Manufacturing Process
Performance Evaluation
Shipbuilding process
Business & Service Computing Laboratory
The main processes of shipbuilding consist of several work stages
For effective ship construction:
• A ship is divided into properly sized blocks in the design stage
• All blocks are manufactured (or assembled) into the body of a ship
Work stages: Design, Cutting & Forming, Assembly, Pre-Outfitting & Painting, Pre-Erection, Erection, Quay
Management of block manufacturing process (BMP)
Many blocks are assembled into a ship, and each block has a complex manufacturing process
• A large ship usually needs more than 250 different blocks, each manufactured through a different
process according to the ship’s type and size
Thus, effective block manufacturing process (BMP) management has been regarded as one of
the most important issues in the shipbuilding industry
• Effective and efficient BMP performance enables a reduction of the overall shipbuilding
period and thereby the cost
• For example, if any one block includes unnecessary work stages, the related inefficient resource
assignment or long queuing times in the storage yard will have a negative effect on the overall
shipbuilding period and productivity
For effective management of BMP performance, a practical and accurate performance evaluation
method that considers various factors reflecting real manufacturing processes and situations is crucial
Practical difficulties in evaluating BMP performance
For effective BMP management, shipbuilding companies have implemented
production information systems (e.g. BAMS (Block Assembly Monitoring System) or RPMS
(Real-time Progress Management System)…)
But these systems focus only on work scheduling, process monitoring and work automation
There are at least two practical difficulties in evaluating BMP performance
1) There are many block assembly types (e.g. Sub-assembly, Unit-assembly, and Grand-assembly...),
and each assembly type is in turn classified into one of three form types (e.g. Small, Curved, and Large…)
2) There are discrepancies between actual and planned work in the form of time gaps due to various
problems (e.g. work delay, urgent work, and the convergence of blocks at the end of the process…)
Generally, there is a 5 to 9 day delay between planned work and performed work
Goal of this research
This research proposes an integrated, systematic approach to evaluating the performance
of BMPs in the shipbuilding industry by combining process mining (PM) and DEA
• This research addresses the above two practical difficulties in evaluating BMP performance
[Figure: overview of the proposed approach. Labels: database in shipbuilding company; data extraction; data preprocessing; generation of block manufacturing processes; process mining (PM); performance evaluation of BMP; evaluation; data envelopment analysis (DEA); guideline for improving the performance of underperforming BMPs]
Proposed method
Clustering
A BMP is generated as a form of operations flow from the extracted log data
Extract sample log data from the database based on the defined attributes

Defined attributes
Attribute        Data
Identification   Block ID
Activity         Operations
Time             Start and end time of operation
Schedule         Planned working times
Material         Welding amount

Sample log data
Identification   Activity      Time
(Block ID)       (Operation)   (End time of unit task)
101              C1            2012/05/24 11:00
101              G9            2012/06/07 12:00
101              S6            2012/05/24 14:00
102              C1            2012/05/25 11:00
102              G9            2012/06/08 12:00
102              S6            2012/05/25 14:00
104              C1            2012/05/29 10:00
104              G9            2012/06/08 16:00
104              H2            2012/05/22 12:00
104              S6            2012/05/29 17:00
105              C1            2012/06/01 11:00
105              G9            2012/06/13 11:00
105              H2            2012/05/30 11:00
Consider block ID 101
• It includes three operations: C1, G9 and S6
• We arrange these operations by end time in ascending order
• The sequence of operations, C1 → S6 → G9, is the BMP of block ID 101

Generation of BMPs
Block ID   Sequence of operations
101        C1 → S6 → G9
102        C1 → S6 → G9
104        H2 → C1 → S6 → G9
105        H2 → C1 → G9
The generated BMPs are then subjected to performance evaluation
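The generation step can be sketched in a few lines: group the log rows by block ID, sort each group by end time, and read off the operations. The rows below pair the block IDs, operations and end times of the sample log in the order they are listed on the slide; the function name and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Sample log rows: (block ID, operation, end time of unit task).
LOG = [
    (101, "C1", "2012/05/24 11:00"),
    (101, "G9", "2012/06/07 12:00"),
    (101, "S6", "2012/05/24 14:00"),
    (102, "C1", "2012/05/25 11:00"),
    (102, "G9", "2012/06/08 12:00"),
    (102, "S6", "2012/05/25 14:00"),
    (104, "C1", "2012/05/29 10:00"),
    (104, "G9", "2012/06/08 16:00"),
    (104, "H2", "2012/05/22 12:00"),
    (104, "S6", "2012/05/29 17:00"),
    (105, "C1", "2012/06/01 11:00"),
    (105, "G9", "2012/06/13 11:00"),
    (105, "H2", "2012/05/30 11:00"),
]


def generate_bmps(log):
    """Map each block ID to its operations ordered by ascending end time."""
    events = defaultdict(list)
    for block_id, operation, end_time in log:
        stamp = datetime.strptime(end_time, "%Y/%m/%d %H:%M")
        events[block_id].append((stamp, operation))
    return {block: [op for _, op in sorted(rows)] for block, rows in events.items()}


bmps = generate_bmps(LOG)
for block_id, sequence in sorted(bmps.items()):
    print(block_id, " -> ".join(sequence))
```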
Proposed method
Block clustering
Generated BMPs are heterogeneous since there are many kinds of BMPs
For a more accurate performance evaluation, our intention is to evaluate homogeneous BMPs
Therefore, we classify BMPs into several peer groups by their similarity
The similarity of BMPs is measured by a similarity index, which is calculated from two vectors:
• Task vector: based on the presence or absence of the same operations in two BMPs
• Transition vector: based on the sequential relationship of the operations in two BMPs
The task vector and transition vector take values from 0 to 1, with values closer to 1 indicating
that two BMPs are more similar
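The slides do not give the exact formula for the similarity index, so the following is only one possible sketch. It assumes a Jaccard overlap for the task part (shared operations) and for the transition part (shared direct successions), combined with an equal-weight average. The function names, the weighting and the choice of Jaccard overlap are all assumptions, not the authors' definition.

```python
def jaccard(a, b):
    """Overlap of two sets, from 0 (disjoint) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def transitions(bmp):
    """Set of direct successions (operation, next operation) in a BMP."""
    return set(zip(bmp, bmp[1:]))


def similarity(bmp_a, bmp_b, task_weight=0.5):
    """Assumed similarity index: weighted mean of task and transition overlap."""
    task_sim = jaccard(set(bmp_a), set(bmp_b))
    transition_sim = jaccard(transitions(bmp_a), transitions(bmp_b))
    return task_weight * task_sim + (1 - task_weight) * transition_sim


# BMPs of blocks 101, 104 and 105 from the sample log.
bmp_101 = ["C1", "S6", "G9"]
bmp_104 = ["H2", "C1", "S6", "G9"]
bmp_105 = ["H2", "C1", "G9"]

print(round(similarity(bmp_101, bmp_104), 3))
print(round(similarity(bmp_101, bmp_105), 3))
```

Under these assumptions block 101 is closer to block 104 than to block 105, because it shares both of its direct successions with block 104 and none with block 105.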
Performance evaluation
Due to the nature of our performance metrics, we use a DEA model, recently developed by
Lim & Zhu (2013), in which some performance metrics have target levels
Each BMP is regarded as a DMU, and only BMPs in the same group are considered for performance
evaluation
In our case, the performance metrics are selected based on the extracted log data.
We conducted a questionnaire survey of 30 shipbuilding operations experts to identify
which factors are most critical to BMP performance
Case study from a Korean shipbuilding company
Experimental conditions
Event logs of two projects, exported from a Block Assembly Monitoring System (BAMS), were used.
Eighty-six blocks are generated from the log data and are then classified into six clusters
In general, production planners assign the work resources and establish the production schedule
based on the block types defined by the empirical knowledge of shipbuilding operations experts. We
refer to these defined block types in deciding the number of clusters
Case study
Clustering results, including the number of blocks and the process characteristics of each cluster
Cluster name   # of blocks   Process characteristics of cluster
C1             12            Block assembly work in work shop #5.
C2             9             Grand assembly processes in work shop #5 after Unit assembly in work shop #4.
C3             7             Component work in work shop ‘C’.
C4             9             Unit assembly and Grand assembly in work shop #3 after Component and Plate works in work shop ‘C’ and ‘P’.
C5             19            Grand assembly in work shop #2 after Component work in work shop ‘C’.
C6             30            Grand assembly or Special Ship assembly in work shop #1 and 2.
We aggregate all BMPs in cluster C5 to show a concrete instance of the clustering result
The aggregated model of all BMPs in C5 represents BMPs performed in work shop #2
Case study
The performance metrics are calculated, and their descriptive statistics are listed below

Total execution time (Hour)
Group        Min      Max        Avg
All BMPs     247.0    1,910.7    320.0
BMPs in C1   247.0    1,732.6    307.0
BMPs in C2   276.0    1,910.7    357.0
BMPs in C3   250.0    1,040.5    231.0
BMPs in C4   261.0    1,213.5    269.0
BMPs in C5   257.0    1,802.1    315.0
BMPs in C6   251.0    1,910.7    330.4

Waiting time (Hour)
Group        Min      Max        Avg
All BMPs     28.0     1,434.0    95.0
BMPs in C1   37.0     1,433.0    108.9
BMPs in C2   52.0     933.0      146.8
BMPs in C3   32.0     809.0      126.2
BMPs in C4   28.0     1,434.0    80.5
BMPs in C5   61.0     1,023.0    104.0
BMPs in C6   45.0     1,434.0    110.7

Gap between planned and actual working (Day)
Group        Min      Max        Avg
All BMPs     -10.0    5.0        -2.1
BMPs in C1   -7.0     5.0        -1.0
BMPs in C2   -3.0     3.0        -1.5
BMPs in C3   -8.0     4.0        -1.0
BMPs in C4   -8.0     2.0        -3.2
BMPs in C5   -4.0     2.0        -1.0
BMPs in C6   -10.0    5.0        -4.5

Number of unit tasks
Group        Min      Max        Avg
All BMPs     5.0      25.0       14.0
BMPs in C1   6.0      15.0       9.0
BMPs in C2   5.0      14.0       8.0
BMPs in C3   6.0      15.0       9.0
BMPs in C4   8.0      21.0       14.0
BMPs in C5   9.0      24.0       15.0
BMPs in C6   5.0      25.0       15.0

Material amount (m)
Group        Min      Max        Avg
All BMPs     112.8    719.2      481.8
BMPs in C1   84.3     410.1      234.6
BMPs in C2   72.8     510.4      281.1
BMPs in C3   74.3     489.0      293.7
BMPs in C4   105.3    607.0      323.1
BMPs in C5   139.2    689.7      497.1
BMPs in C6   123.5    719.2      498.5
Case study
The evaluation results are summarized

Average performance scores of BMPs
Group        Average performance
All BMPs     0.60
BMPs in C1   0.61
BMPs in C2   0.69
BMPs in C3   0.43
BMPs in C4   0.54
BMPs in C5   0.70
BMPs in C6   0.62

Performance scores of BMPs in C5
Blocks       Score
1XXX_622     1
2XXX_509     1
2XXX_622     1
2XXX_631     1
2XXX_642     1
1XXX_632     0.84
1XXX_653     0.81
2XXX_652     0.78
2XXX_632     0.73
1XXX_643     0.69
2XXX_653     0.67
1XXX_642     0.64
1XXX_652     0.61
2XXX_643     0.58
1XXX_631     0.55
2XXX_621     0.54
1XXX_621     0.46
1XXX_110     0.23
2XXX_110     0.18
Five blocks (1XXX_622, 2XXX_509, 2XXX_622, 2XXX_631, 2XXX_642) are identified
as best practice, whereas the remaining 14 blocks are underperforming
In particular, 1XXX_110 and 2XXX_110 are the most underperforming blocks.
Most of the best-practice blocks have the same BMP: Comp 101-‘C’ → Grand 201-‘P’ →
Grand 202-‘3’ → Grand 203-‘3’ → Grand 301-‘3’
Case study
We analyze the underperforming BMPs (blocks 2XXX_110 and 1XXX_110) from the
operations execution and resource utilization perspectives
For the analysis of an underperforming block from the operations execution perspective:
• We compare the planned operations flow, which is managed by production
schedulers, with the actual operations flow of block 2XXX_110
• The actual operations flows for all best-practice blocks are the same as the planned operations flow
• On the other hand, the actual operations flows for the underperforming BMPs are different from
the planned operations flows
Grand 201-‘P’ and Grand 201-‘3’ have very similar operation characteristics, but the work shop
and items for these are different
• Grand 201-‘3’ was chosen at the worker's discretion for its similar operation characteristics
• As a result, block 2XXX_110 might have incurred a longer waiting time and execution time
Conclusion
We proposed an integrated approach to BMP performance evaluation in the shipbuilding
industry by using process mining (PM) and DEA
Through application of the proposed approach, we verified its effectiveness and
practicality
Shipbuilding operations experts, moreover, agreed that the provided guidelines can be
valuable in establishing additional strategies for improving the performance and
productivity of block manufacturing
It can be said that this research makes a constructive contribution to practical block
performance evaluation in the shipbuilding industry
United Network for Organ Sharing
(UNOS)
Many variables and observations related to lung
and heart transplants.
Need for fair and accurate predictions of
survival time and quality of life.
Ability for medical professionals to accurately
predict best donor/recipient pairings may be
flawed/biased.
Variables contributing towards accurate
predictions may be many, complex, and have
poorly understood relationships.
Reduction of large datasets is important.
Dataset
Data concerning donors/recipients for lung/heart transplants.
• Over 400 variables and 100,000+ observations → BIG DATA ANALYTICS
• 24 variables chosen by Oztekin et al. [2]
• Can be reduced to 12,744 observations by cleaning.
Variables       Explanation             Variable type
Donor Age       Years                   Cont.
Recipient Age   Years                   Cont.
ABO_MAT         ABO match level         Ordinal
EINT            Ethnicity match level   Binary
GINT            Gender match level      Binary
GTIME           Graft survival time     Cont.
Etc…
DEANN Methodology
Metrics are chosen according to importance, with no
need to be few in number.
Preprocessing with DEA allows better training of
the ANN (artificial neural network).
ANN is applicable for “fuzzy” situations.
Variables are chosen according to contribution → Data is preprocessed using DEA → ANN is trained → Predictions
DEANN Methodology
[Flowchart (12,744 records): START the DEANN methodology; preprocess the dataset using DEA; determine the efficient DMUs; conduct the initial training of ANN; perform the first prediction via ANN; measure the overall multi-class accuracy; test: is accuracy satisfactory? Yes: STOP the DEANN methodology. No: update the dataset being analyzed.]
DEA Results
Stratification yielded 12 efficiency levels.
Individual levels yielded a higher correlation
between the recipient functional status and the
input variables than many (or all) levels
considered together.
The ANN is trained on one or more of these
levels using ten-fold cross-validation.
DEA allows efficient observations to be utilized so
that outlying transplants do not result in poor
training of the ANN.
DEANN allows the ANN to be trained on efficient
data, which will result in accurate predictions and faster
training time.
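As a rough illustration of the ten-fold cross-validation step only, the sketch below splits the records of one efficiency level into ten folds and yields train/test index lists. The ANN training itself is omitted, and the function name, the record count of 25 and the fixed seed are hypothetical.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical: 25 records taken from one efficiency level.
splits = list(k_fold_indices(25, k=10))
for train, test in splits:
    # A model would be trained on `train` and scored on `test` here.
    print(len(train), len(test))
```

Each record appears in exactly one test fold, so every observation in the chosen level is used once for validation.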