
Visualizing Results of Data Mining Source Code
by Mike McCallie
Thoughts

I want to combine Data Mining tools +
Visualization tools

I am motivated to use information in various
forms to make informed decisions

I believe inherent software structure (compilable
source code) has an advantage over free-form
text from a data mining perspective

I wish to “mine” data from source code and
“build” visual models of code representation that
are useful from a software engineer’s perspective
Exploring the Moose tools for data exploration
Exploring “Code City” for visual representation
CodeCity is programmed in VisualWorks Smalltalk on top of
the Moose platform and uses OpenGL for rendering.
Classes are represented as buildings in the
city. Packages are depicted as the districts in
which the buildings reside.
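
The city metaphor above can be sketched as a small data structure. This is a minimal Python illustration, not CodeCity's actual Smalltalk implementation: the names `ClassMetrics`, `Building`, and `build_city` are hypothetical, the sample data is made up, and the mapping of method count to building height and attribute count to base size is an assumption that the slides do not state.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ClassMetrics:
    """Metrics for one class, as a mining step might report them (hypothetical)."""
    name: str
    package: str
    num_methods: int
    num_attributes: int


@dataclass
class Building:
    """One building in the city; height and base are in arbitrary units."""
    name: str
    height: int
    base: int


def build_city(classes):
    """Group classes into districts (packages) and turn each into a building."""
    districts = defaultdict(list)
    for c in classes:
        # Assumed mapping: methods -> height, attributes -> base size.
        # Clamp to 1 so a class with no methods or attributes stays visible.
        building = Building(c.name, max(1, c.num_methods), max(1, c.num_attributes))
        districts[c.package].append(building)
    return dict(districts)


# Made-up sample data, for illustration only.
sample = [
    ClassMetrics("Alpha", "core", 12, 3),
    ClassMetrics("Beta", "core", 8, 2),
    ClassMetrics("Gamma", "tools", 5, 0),
]
city = build_city(sample)
for district, buildings in city.items():
    print(district, [(b.name, b.height, b.base) for b in buildings])
```

Keeping the metric-to-dimension mapping in one function makes it easy to try other mappings without touching the grouping logic.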
Conceptual Model
[Diagram: Source Code and “Mining” Algorithms feed the Data Mining “Engine”, which produces Data Output; the Visualization “Engine” turns that output into Visual Results.]
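
The conceptual model can be read as a two-stage pipeline. The sketch below is a toy stand-in under stated assumptions: it mines Python source with the standard `ast` module rather than using Moose, the "mining algorithm" is just a per-class method count, and the function names `mine` and `visualize` plus the sample source are hypothetical.

```python
import ast


def mine(source):
    """Data mining "engine": count the methods defined directly in each class."""
    tree = ast.parse(source)
    data = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            methods = [n for n in node.body
                       if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
            data[node.name] = len(methods)
    return data


def visualize(data):
    """Visualization "engine": render the mined data as a text bar chart."""
    width = max((len(name) for name in data), default=0)
    lines = []
    for name in sorted(data):
        lines.append(f"{name.ljust(width)} | {'#' * data[name]}")
    return "\n".join(lines)


# Made-up source code to mine, for illustration only.
SOURCE = '''
class Shape:
    def area(self): ...
    def draw(self): ...
    def move(self, dx, dy): ...

class Point:
    def move(self, dx, dy): ...
'''

data_output = mine(SOURCE)              # Source Code -> Data Output
visual_results = visualize(data_output)  # Data Output -> Visual Results
print(visual_results)
```

The intermediate `data_output` dictionary plays the role of the "Data Output" box: either engine could be swapped out as long as that hand-off format stays the same.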
Thesis Approach – Part i

Theoretical Discussion
◦ Data mining and visualization investigation
◦ 1980s and 1990s focus on program comprehension
  ▪ What worked
  ▪ What were dead-ends (as important as what worked, IMHO)
◦ Literature review on program comprehension
  ▪ Gestalt principles were explored in a previous class
◦ Results of past empirical studies
Thesis Approach – Part 1

Motivating Scenario
◦ Problem that is not too big, but not too small
◦ “Bob the programmer was given the assignment to
add enhancement X to legacy system Y.”
◦ Bob has the ability to mine data from source code and
visualize results
◦ Question: What information is MOST relevant for
Bob to succeed? (bound problem)
Thesis Approach – Part 2

Implementation
◦ Moose tools for software analysis
◦ Code City for software visualization
◦ Source Code Analysis:
  ▪ Public domain: Analyzing JHotDraw
  ▪ Private domain: Analyzing 20+ year old legacy system at present employer
JHotDraw Framework
Classes Model
Design Patterns
Role-Model-Enhanced Class Model
Thesis Approach – Part 3
• Empirical Study – Compare resultant artifacts
[Diagram: JHotDraw Source Code and Legacy System Source Code each pass through the Data Mining “Engine” + Visualization “Engine”. The resulting JHotDraw Artifacts are compared to existing JHotDraw artifacts; the resulting Legacy System Artifacts are compared to existing Legacy System “expertise”.]
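
One simple way the comparison step might be framed is as set overlap between what the tools produce and what is already known. This is an assumption for illustration only, not the study's stated method: the function `compare_artifacts` and the placeholder labels are hypothetical, and no real JHotDraw or legacy-system results are shown.

```python
def compare_artifacts(mined, existing):
    """Split artifacts into those found by both sources and by only one."""
    mined, existing = set(mined), set(existing)
    return {
        "matched": sorted(mined & existing),     # in both
        "missed": sorted(existing - mined),      # known, but not mined
        "unexpected": sorted(mined - existing),  # mined, but not documented
    }


# Placeholder labels, not real findings.
mined_artifacts = ["pattern-A", "pattern-B", "pattern-D"]
existing_artifacts = ["pattern-A", "pattern-B", "pattern-C"]

result = compare_artifacts(mined_artifacts, existing_artifacts)
for category, items in result.items():
    print(category, items)
```

The same function would apply to the legacy system if the existing “expertise” were first written down as a list of labels.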
Thesis Approach – Part 4
• Results and Conclusions…
“Rule of Thumb”
Mathematical Model
“I am very curious how close I can come to a workable mathematical model based
on the findings of my empirical study”
A big thank you to my thesis committee

Dr. Parvathi Chundi
Dr. Bill Mahoney
Dr. Harvey Siy

Questions
Comments
Concerns
Observations
Puns
Jokes
Limericks
etc.
And thank you for your time as well…