SUBDUE GUI interface

Download Report

Transcript SUBDUE GUI interface

GUI implementation for Supervised and Unsupervised
SUBDUE System
Introduction
 Data Mining is a very vast field, which is rapidly
developing today. It is the use of artificial intelligence to
find common, interesting, or previously unknown
patterns in large databases.
 One method for discovering knowledge in structural data
is the identification of common substructures (concepts
represented as graphs) within the data.
2
Introduction (Contd..)
 The SUBDUE system, developed by Cook and Holder
[Cook and Holder, 1999] performs data mining on
databases represented as graphs, i.e. it discovers
interesting substructures in structural data.
 This project deals with the conversion of a textual
representation of the data into a graphical visualization
of the data The input data given to the program is the
outputs provided by the existing SUBDUE.
3
The Subdue System
 The SUBDUE system is a data-mining tool that discovers
interesting substructures in structural data
 By compressing previously discovered substructures in the
data multiple passes of SUBDUE produce a hierarchical
description of the structural regularities in the data.
4
Unsupervised Subdue
Example of the input graph to the unsupervised version of the subdue
and the discovered substructure.
5
Unsupervised (Contd..)
The substructure discovery algorithm used by
SUBDUE is beam search. A substructure consists of a
definition and a set of instances. The substructure
definition is a connected set of vertices and edges that
define a SUBGRAPH within G. An instance of a
substructure is a SUBGRAPH of G that matches, graph
theoretically, to the definition. The algorithm for the
discovery is given below.
6
Unsupervised Contd.)
 SUBDUE (Graph, BeamWidth, MaxBest, MaxSubSize, Limit)
 ParentList={}
 ChildList={}
 BestList={}
 ProcessedSubs=0
 Create a substructure from each unique vertex label and its single-vertex instances; insert the resulting
substructures in ParentList
 While ProcessedSubs < = Limit and ParentList is not empty do
 While ParentList is not empty do
 Parent = RemoveHead (ParentList)
 Extend each instance of Parent in all possible ways
 Group the extended instances into Child substructures
 For each Child do
 If SizeOf (Child) <= MaxSubSize then
 Evaluate the Child
7
Unsupervised Contd.)
Insert Child in ChildList in order by value
If Length (ChildList)>BeamWidth then
Destroy the substructure at the end of the ChildList
ProcessedSubs = ProcessedSubs + 1
Insert Parent in BestList in order by value
If Length (BestList) > MaxBest then
Destroy the substructure at the end of the BestList
Switch ParentList and ChildList
return BestList
8
Supervised
 SUBDUE performs supervised graph based relational
concept learning. The SUBDUE concept learner accepts
both a positive graph and a negative graph , and
evaluates substructures based on their compression of
the positive graph and lack of compression on the
negative graph.
9
Supervised Contd.
10
Supervised (Contd.)
 In the figure above each object has a shape and is
related to other objects using the binary relation on and
shape. The discovered substructure is as shown in the
figure. As seen above the SUBDUE discovered
substructure is from the positive example and not from
the negative example. For this example the best
substructure, which gives the maximum compression, is
triangle on a square.
11
Subdue - Chemical
Compounds
A DNA sequence can be represented as a very simple
linear graph, and higher-level relationships between
different parts of a sequence can be mapped to
additional edges in graph.
SUBDUE system discovers patterns in the input graph
in polynomial time.
 SUBDUE system is capable of discovering known
patterns in the DNA sequence of yeast,as well as
patterns in yeast DNA that are known to be important in
other organism, but which have not yet been shown to
play a role in yeast.
12
Subdue - Chemical
Compounds
A DNA sequence can be represented as a very simple
linear graph, and higher-level relationships between
different parts of a sequence can be mapped to
additional edges in graph.
SUBDUE system discovers patterns in the input graph
in polynomial time.
 SUBDUE system is capable of discovering known
patterns in the DNA sequence of yeast,as well as
patterns in yeast DNA that are known to be important in
other organism, but which have not yet been shown to
play a role in yeast.
13
Subdue - Chemical
Compounds
The figure shown above is the backbone representation
which gives more meaningful graphs then the linear
representation.
This representation separated the base names (A, C, T,
G) from the vertices representing themselves.
The backbone representation mimics the actual
chemical structure of the DNA molecule, in which the
DNA bases are connected by deoxyribose sugars to a
linear phosphate backbone.
14
GUI Design
Requirements
 The requirements of the GUI are as follows.
 File Dialog Boxes should be added for better user
access to the input files.
 The entire visual representation of the graphs needs to
be shown on the screen.
Sometimes these
representations exceed the length of the screen. To
accommodate these large graphs, scrollbars need to be
incorporated into the design.
15
GUI Design
 User interfaces must be provided so that the user can
interact with the GUI for displaying the results of each
and every iteration. So a button called “ Next Iteration”,
which activates the display of substructures on the
screen, needs to be incorporated.
 A Button called the “Compress” button should be
provided to the GUI. This button enables the user to see
the compressed graph.
16
GUI Design
 For the supervised version of SUBDUE, both negative
and positive graphs need to be displayed.
 The vertices of the graph should display their labels
inside the vertices.
 Since directed edges are used, the arrows with
appropriate directions should be displayed.
 The language to be used for implementation
should be portable and be able to run from a
browser. It should also have good GUI
components. So JDK 1.2 was used to implement
the program.
17
Implemenation
Unsupervised SUBDUE
 Unsupervised SUBDUE GUI requires two input files. A
position file which determines the graph position and
the output file from the SUBDUE which is parsed as per
the requirements of the GUI by the conversion program.
 The driver class of the applet is the unsupervised class.
This class initializes the applet. When user clicks the
next iteration button canvas2 class is invoked and best
substructures found in that iteration is displayed. These
substructures are arranged by their MDL value.
18
Implemenation
Unsupervised SUBDUE
 When user clicks the compress button then canvas1
class is invoked. This class compresses the input graph
by replacing the instances of the best substructure of
the iteration by single vertices. The compressed graph
will be further compressed when the using the results of
that iteration.when-compressed button is clicked. The
flow diagrams are as shown below.
19
Implemenation
 Supervised GUI Implemenation
 Supervised SUBDUE GUI requires three input files.Two
position files which determines the positive graph
position and the negative graph position and the output
file from the SUBDUE which is parsed as per the
requirements of the GUI by the conversion program.
 The driver class of the applet is the supervised class.
This class initializes the applet. When user clicks the
next iteration button canvas2 class is invoked and best
substructures found in that iteration is displayed. These
substructures are arranged by their MDL value.
20
Implemenation
Supervised GUI Implemenation
 When user clicks the compress button then canvas1
class is invoked. This class compresses the input graph
by replacing the instances of the best substructure of
the iteration by single vertices. The compressed graph
will be further compressed when the using the results of
that iteration.when-compressed button is clicked. The
flow diagrams are as shown below.
21
Implemenation
Parser for the GUI input.
 The Conversion program parses the output from the
subdue to make it compatible to the GUI program. This
can parse both supervised and unsupervised GUI.The
conversion program takes in longer phrases and
replaces it by shorter ones It eliminates blank lines and
arranges the output in the format required for the GUI
input.
 Each line starts with a small phrase for E.g. It(iteration
number ), C(for compression),
v(vertices),e(edges),Val(Value), It(instances).
22
Implementation
Parser for the GUI input.
The phrase “Best substructures” indicates the
starting of the iteration and the phrase
“Graph
is
compressed
using
best
substructure.” Indicates a compressed graph
information
23
Demo Of the Project
24