Pattern mining in system logs: opportunities for process
Dolev Mezebovsky, Pnina Soffer, and Ilan Shimshoni
BPMDS, Amsterdam, June 2009
The implementation of enterprise systems is often
a driver for business process change.
◦ System implementation as an opportunity for redesigning
business processes
◦ Changes motivated by the need to adapt the enterprise to
the system rather than the other way around
“Vanilla” implementations:
◦ Implement basic functionality without modifications and
make improvements afterwards
◦ Cases of partial support for existing processes – people are
forced to use workarounds and work inefficiently for the
process to achieve its goal.
Process: change a student’s study program
Before implementation
1. The secretary reports the change.
2. Acquired credits are automatically
transformed to the new program.
Total time: 1-2 minutes
Error free
After implementation
1. The secretary reports the change.
2. She prints a report of acquired
credits.
3. For every course, she detaches it
from the old program and attaches it
to the new one.
Total time: up to 20 minutes
Error prone
Many such cases may exist in an organization
At first: all users complain
With time, some users may get used to the
inefficient way of working
The question: How to identify the inefficient
processes and prioritize their improvement?
The cases we are looking for include some
repetition of a set of operations, as part of one
“logical” task
These situations should be reflected in the
event log of the system
Solution approach: mine for recurrent patterns
of operations
Row Num | Operation     | Date     | Time     | User Name | Student Name | Course Name        | Program Name
1       | Attach Course | 15.06.08 | 13:45:52 | YPRESS    | Fredrick     | Linear Algebra     | MIS Major
2       | Attach Course | 15.06.08 | 13:46:26 | YPRESS    | Fredrick     | Algorithms         | MIS Major
3       | Attach Course | 15.06.08 | 13:47:44 | YPRESS    | Fredrick     | Data Structures    | MIS Major
4       | Detach Course | 15.06.08 | 13:49:18 | YPRESS    | Fredrick     | Linear Algebra     | CS Minor
5       | Detach Course | 15.06.08 | 13:49:24 | YPRESS    | Fredrick     | Algorithms         | CS Minor
6       | Detach Course | 15.06.08 | 13:49:31 | YPRESS    | Fredrick     | Data Structures    | CS Minor
7       | Attach Course | 15.06.08 | 13:54:19 | YPRESS    | Fredrick     | Introduction to IT | MIS Major
8       | Detach Course | 15.06.08 | 13:58:20 | YPRESS    | Fredrick     |                    |
9       | Attach Course | 15.06.08 | 13:59:35 | YPRESS    | Fredrick     |                    |
10      | Detach Course | 15.06.08 | 14:01:29 | YPRESS    | Fredrick     |                    |
Rows 8-10 involve the courses Business Intelligence and Programming and the programs MIS Minor and MIS Major.
[Figure: timeline (13:45 to 13:51) of log entries 1-6, all by user YPRESS for student Fredrick. Element roles: Operation (Event), User (Event Creator), Student (Event’s Object), Program (Target Object), Course (Event Property). Entries 1-3 attach Linear Algebra, Algorithms, and Data Structures to MIS Major; entries 4-6 detach them from CS Minor.]
Log entry=<User, Timestamp, Operation, ORSO>
◦ ORSO: an ordered set of operands
◦ Example: YPRESS, 13:50, Detach course, Fredrick,
Linear Algebra, CS Minor.
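As a minimal sketch, a log entry of this shape could be represented as follows. The class name `LogEntry`, the helper `parse_entry`, and the comma-separated input format are assumptions made for illustration; the slides do not specify how the log is stored.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogEntry:
    user: str
    timestamp: datetime
    operation: str
    orso: tuple  # ordered set of operands, e.g. (student, course, program)


def parse_entry(line: str) -> LogEntry:
    """Parse 'user, dd.mm.yy HH:MM:SS, operation, operand, ...' (assumed format)."""
    user, stamp, operation, *operands = [part.strip() for part in line.split(",")]
    timestamp = datetime.strptime(stamp, "%d.%m.%y %H:%M:%S")
    return LogEntry(user, timestamp, operation, tuple(operands))


# Row 4 of the example log.
entry = parse_entry(
    "YPRESS, 15.06.08 13:49:18, Detach Course, Fredrick, Linear Algebra, CS Minor"
)
```

Keeping the operands in an ordered tuple preserves the "ordered set" reading of ORSO while leaving the number of operands open.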
For two entries in a log:
◦ Invariant set: set of entry elements whose values are
equal for the two entries
◦ Variant set: set of entry elements whose values are
different for the two entries
Two entries are potentially in the same pattern
if:
◦ User ∈ {Invariant}
◦ Timestamp ∈ {Variant}; |TS(1)-TS(2)| < Timeframe
◦ {Operation, ORSO} ∩ {Invariant} ≠ ∅
◦ {Operation, ORSO} ∩ {Variant} ≠ ∅
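The invariant and variant sets and the pairing test can be sketched as below. This assumes each entry is a dictionary with the hypothetical keys `user`, `timestamp`, `operation`, `student`, `course`, and `program`, and it reads the last two conditions as "at least one of the operation and operands is equal, and at least one differs". The function names are illustrative, not taken from the slides.

```python
from datetime import datetime, timedelta


def split_sets(a: dict, b: dict):
    """Return (invariant, variant) sets of element names for two entries."""
    invariant = {key for key in a if a[key] == b[key]}
    variant = set(a) - invariant
    return invariant, variant


def potentially_same_pattern(a: dict, b: dict, timeframe: timedelta) -> bool:
    invariant, variant = split_sets(a, b)
    content = set(a) - {"user", "timestamp"}  # operation plus operands (ORSO)
    return (
        "user" in invariant
        and "timestamp" in variant
        and abs(a["timestamp"] - b["timestamp"]) < timeframe
        and bool(content & invariant)
        and bool(content & variant)
    )


# Rows 1 and 2 of the example log.
row1 = {"user": "YPRESS", "timestamp": datetime(2008, 6, 15, 13, 45, 52),
        "operation": "Attach Course", "student": "Fredrick",
        "course": "Linear Algebra", "program": "MIS Major"}
row2 = {"user": "YPRESS", "timestamp": datetime(2008, 6, 15, 13, 46, 26),
        "operation": "Attach Course", "student": "Fredrick",
        "course": "Algorithms", "program": "MIS Major"}

same = potentially_same_pattern(row1, row2, timedelta(minutes=5))
```

The five-minute timeframe is an arbitrary value chosen for the example.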
Potential pattern entry: <User, TimeRange,
Operations, ORSOs>
The algorithm dynamically aggregates entries
into potential pattern entries, seeking the
largest possible patterns.
First iteration:
[(1), (2)] = [(1): <YPRESS, 13:45:52, Attach course, Fredrick,
Linear Algebra, MIS Major>, (2): <YPRESS, 13:46:26, Attach
course, Fredrick, Algorithms, MIS Major>]
◦ (1, 2): <YPRESS, (13:45:52, 13:46:26), Attach course,
Fredrick, (Linear Algebra, Algorithms), MIS Major>
Second iteration:
[(1, 2), (3)] = [(1, 2): <YPRESS, (13:45:52, 13:46:26), Attach
course, Fredrick, (Linear Algebra, Algorithms), MIS Major>, (3):
<YPRESS, 13:47:44, Attach course, Fredrick, Data Structures,
MIS Major>]
◦ (1, 2, 3): <YPRESS, (13:45:52, 13:47:44), Attach course,
Fredrick, (Linear Algebra, Algorithms, Data Structures), MIS
Major>
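The iterations above can be imitated with a simplified greedy pass over the log. This is only a sketch under stated assumptions, not the authors' algorithm: it merges consecutive entries only, compares each new entry with the last one in the current group, and requires the invariant element set to stay the same throughout a group. The dictionary keys, `FIELDS`, and all function names are hypothetical.

```python
from datetime import datetime, timedelta

FIELDS = ("operation", "student", "course", "program")


def invariant_fields(a, b):
    """Names of the operation/operand elements that are equal in both entries."""
    return frozenset(f for f in FIELDS if a[f] == b[f])


def joinable(last, entry, inv, timeframe):
    new_inv = invariant_fields(last, entry)
    gap = entry["timestamp"] - last["timestamp"]
    return (
        entry["user"] == last["user"]
        and timedelta(0) < gap < timeframe
        and 0 < len(new_inv) < len(FIELDS)  # something equal, something different
        and (inv is None or new_inv == inv)  # same invariant set as the group so far
    )


def aggregate(entries, timeframe):
    """Greedily group consecutive entries; unmatched entries form groups of size 1."""
    patterns, current, inv = [], [], None
    for entry in sorted(entries, key=lambda e: e["timestamp"]):
        if current and joinable(current[-1], entry, inv, timeframe):
            inv = invariant_fields(current[-1], entry)
            current.append(entry)
        else:
            if current:
                patterns.append(current)
            current, inv = [entry], None
    if current:
        patterns.append(current)
    return patterns


def row(time, operation, course, program):
    return {"user": "YPRESS", "timestamp": datetime(2008, 6, 15, *time),
            "operation": operation, "student": "Fredrick",
            "course": course, "program": program}


# Rows 1-6 of the example log.
log = [
    row((13, 45, 52), "Attach Course", "Linear Algebra", "MIS Major"),
    row((13, 46, 26), "Attach Course", "Algorithms", "MIS Major"),
    row((13, 47, 44), "Attach Course", "Data Structures", "MIS Major"),
    row((13, 49, 18), "Detach Course", "Linear Algebra", "CS Minor"),
    row((13, 49, 24), "Detach Course", "Algorithms", "CS Minor"),
    row((13, 49, 31), "Detach Course", "Data Structures", "CS Minor"),
]
patterns = aggregate(log, timedelta(minutes=5))
```

On rows 1-6 this groups the three Attach entries and the three Detach entries separately, because the set of invariant elements changes between rows 3 and 4.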
Pattern type definition: <I, V>.
I: a set of invariant element types (Operation,
operand type)
V: a set of variant element types (Operation,
operand type)
Example:
◦ I = {Operation, Student, Program}
◦ V = {Course}
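A pattern type could be derived from the entries of one pattern along these lines. This is a sketch that assumes entries are dictionaries with the hypothetical keys shown; `pattern_type` is an illustrative name.

```python
def pattern_type(entries, fields=("operation", "student", "course", "program")):
    """Return (I, V): element types that stay constant vs. vary across the entries."""
    invariant = frozenset(f for f in fields if len({e[f] for e in entries}) == 1)
    variant = frozenset(fields) - invariant
    return invariant, variant


# Rows 1-3 of the example log, reduced to operation and operands.
attach_pattern = [
    {"operation": "Attach Course", "student": "Fredrick",
     "course": course, "program": "MIS Major"}
    for course in ("Linear Algebra", "Algorithms", "Data Structures")
]
I, V = pattern_type(attach_pattern)
```

For these rows the result matches the example: the operation, student, and program are invariant, and only the course varies.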
The count C_P of a pattern type P: the number of
patterns of this type in the log file.
The average size AS_P of a pattern type P: the average
number of entries in patterns of type P. Let P occur C_P
times in a log file, where occurrence i includes n_i entries.
Then:
\[ AS_P = \frac{1}{C_P} \sum_{i=1}^{C_P} n_i \]
The average time AT_P of a pattern type P: the average
time range (difference between the maximal and
minimal timestamps) in patterns of type P.
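The three metrics can be computed as in the sketch below. It assumes each detected pattern is supplied as a pair of its pattern type and the list of its entry timestamps; the function name `pattern_metrics` and this input shape are illustrative choices.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def pattern_metrics(patterns):
    """Map each pattern type to (count, average size, average time range)."""
    grouped = defaultdict(list)
    for ptype, stamps in patterns:
        grouped[ptype].append(stamps)
    metrics = {}
    for ptype, occurrences in grouped.items():
        count = len(occurrences)
        avg_size = sum(len(s) for s in occurrences) / count
        total_range = sum((max(s) - min(s) for s in occurrences), timedelta())
        metrics[ptype] = (count, avg_size, total_range / count)
    return metrics


# The Attach (rows 1-3) and Detach (rows 4-6) patterns share one pattern type:
# I = {operation, student, program}, V = {course}.
P = (frozenset({"operation", "student", "program"}), frozenset({"course"}))
day = (2008, 6, 15)
found = [
    (P, [datetime(*day, 13, 45, 52), datetime(*day, 13, 46, 26), datetime(*day, 13, 47, 44)]),
    (P, [datetime(*day, 13, 49, 18), datetime(*day, 13, 49, 24), datetime(*day, 13, 49, 31)]),
]
count, avg_size, avg_time = pattern_metrics(found)[P]
```

For these two occurrences the count is 2, the average size is 3 entries, and the average time range is (112 s + 13 s) / 2 = 62.5 s.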
Find out which of the identified patterns
reflect inefficient processes
◦ By interviewing users
Prioritize patterns to be automated
◦ By size-weighted count: SC_P = AS_P × C_P
◦ By time-weighted count: TC_P = AT_P × C_P
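A ranking by either weighted count might look like the following sketch. The pattern names and metric values are invented purely to show that the two orderings can differ; they are not results from any log.

```python
def prioritize(metrics, by="size"):
    """Rank pattern types by size-weighted or time-weighted count, highest first.

    metrics maps a pattern name to (count, average size, average time in seconds).
    """
    def score(item):
        count, avg_size, avg_time = item[1]
        return (avg_size if by == "size" else avg_time) * count

    return [name for name, _ in sorted(metrics.items(), key=score, reverse=True)]


# Illustrative, made-up values.
example = {
    "A": (10, 3.0, 60.0),   # size-weighted 30, time-weighted 600
    "B": (4, 12.0, 100.0),  # size-weighted 48, time-weighted 400
}
by_size = prioritize(example, by="size")
by_time = prioritize(example, by="time")
```

Here pattern B ranks first by size-weighted count and pattern A ranks first by time-weighted count, so the choice of metric affects which automation candidate comes first.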
We address a situation where technology
drives processes in an undesirable way
We utilize mining technology to identify and
prioritize requirements for automating
inefficient processes.
Our solution identifies recurrent patterns in
the system log and provides metrics for
prioritization.
Finalize the overall algorithm
Experiment with the university log to evaluate
the proposed method
◦ Is it capable of identifying patterns that are known
a priori?
◦ Ratio of real problems identified vs. patterns that
reflect “normal” processes
◦ Sensitivity to the timeframe parameter
Experiment with logs from other domains