Towards Using Grid Services for Mining Fuzzy Association Rules


Towards Using Grid Services for
Mining Fuzzy Association Rules
Mihai Gabroveanu, Ion Iancu, Mirel Cosulschi, Nicolae Constantinescu
Faculty of Mathematics and Computer Science,
University of Craiova, ROMANIA
{mihaiug, mirelc,nikyc}@central.ucv.ro,i [email protected]
Introduction
• In this paper we show how the Knowledge
Grid infrastructure can be used to
implement a distributed algorithm for
mining fuzzy association rules from
distributed databases over a Grid network.
[Figure: data mining + fuzzy logic, over a Grid network]
Outline
• Knowledge Grid services
• Distributed fuzzy association rules
mining
• Distributed problem definition
• The distributed algorithm
• Rules mining implementation over the
Grid
• Conclusion
Knowledge Grid Services-1
• The Knowledge Grid ([4], [5], [6]) defines an
integrating architecture for distributed data
mining and knowledge discovery.
• It uses basic grid services to build specific
knowledge services, organized in two layers:
• the Core K-grid layer offers services implemented
directly on top of generic grid services;
• the High-level K-grid layer is used to describe,
develop and execute distributed knowledge discovery
computations.
Knowledge Grid Services-2
• Knowledge directory service (KDS). This service extends the basic
Globus MDS service and it is responsible for maintaining a description
of all the data and tools used in the Knowledge Grid.
• The metadata information is stored in a Knowledge Metadata
Repository (KMR).
• Resource allocation and execution management service (RAEMS).
These services are used to find the best mapping between an execution
plan and available resources, with the goal of satisfying the
application requirements.
• Another important repository is the Knowledge Execution Plan
Repository (KEPR). It is used to maintain the execution plans of
data mining processes.
• The Knowledge Base Repository (KBR) is used to store the
discovered knowledge.
Knowledge Grid Services-2
• Data Access Service (DAS). It is responsible for the search and
selection (data search services), and for the extraction,
transformation and delivery (data extraction service) of data to be
mined.
• Tools and algorithms access service (TAAS). This service is
responsible for the search, selection, and downloading of data mining
tools and algorithms.
• Execution plan management service (EPMS). A semi-automatic tool
that takes the data and programs selected by the user and generates a
set of different, possible execution plans that meet user, data and
algorithm requirements and constraints.
• Results presentation service (RPS). This service specifies how to
generate, present and visualize the models extracted.
Distributed fuzzy association
rules mining-1
• DB = {t1, ..., tn} is the database of transactions.
• I = {i1, ..., im} is the set of items (attributes).
• Ex: I = {Age, Income, Weight}
Distributed fuzzy association
rules mining-2
For example, for the attribute Weight we can take into
consideration the following three fuzzy sets:
”thin”, ”middle” and ”fat”.
FWeight = { thin, middle, fat }
Distributed fuzzy association
rules mining-3
Ex. of a fuzzy itemset: 〈X, FX〉 = 〈{Age, Income}, {young, high}〉
Distributed fuzzy association
rules mining-4
“If Age is middle and Income is high then Weight is fat”
X = {Age, Income}, Y = {Weight}, FX = {middle, high}, FY = {fat}
〈X, FX〉 ⇒ 〈Y, FY〉
〈{Age, Income}, {middle, high}〉 ⇒ 〈{Weight}, {fat}〉
Distributed fuzzy association
rules mining-4
Membership degrees in transaction T1: 〈{Age, Income}, {middle, high}〉 = 〈{Age, Income}, {0.5, 1}〉
Membership degrees in transaction T2: 〈{Age, Income}, {middle, high}〉 = 〈{Age, Income}, {1, 1}〉
The fuzzy support value of the itemset 〈X, FX〉 = 〈{Age, Income}, {middle, high}〉 is
(0.5 * 1 + 1 * 1) / 2 = 1.5 / 2 = 0.75
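The support computation above can be sketched in a few lines of Python. This is only an illustration: the function name fuzzy_support and the dictionary layout (membership degrees keyed by an (attribute, fuzzy set) pair) are assumptions, not part of the presented work. The per-transaction contribution is the product of the membership degrees, as in the example.

```python
from math import prod

def fuzzy_support(transactions, fuzzy_itemset):
    """Mean, over all transactions, of the product of membership degrees.

    transactions: list of dicts mapping (attribute, fuzzy_set) -> degree
    fuzzy_itemset: list of (attribute, fuzzy_set) pairs
    (Both layouts are hypothetical, chosen for this sketch.)
    """
    total = sum(prod(t[item] for item in fuzzy_itemset) for t in transactions)
    return total / len(transactions)

# Membership degrees of the two transactions from the example.
T1 = {("Age", "middle"): 0.5, ("Income", "high"): 1.0}
T2 = {("Age", "middle"): 1.0, ("Income", "high"): 1.0}
itemset = [("Age", "middle"), ("Income", "high")]

support = fuzzy_support([T1, T2], itemset)
print(support)  # 0.75
```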
Distributed fuzzy association
rules mining-5
An association rule is considered interesting if its
support is at least the minimum support threshold (minsup)
and its confidence is at least the minimum confidence
threshold (minconf). Such a rule is called a strong rule.
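A minimal sketch of the strong-rule test follows. The slides do not spell out the confidence formula, so the sketch assumes the usual definition, confidence = support of the whole rule itemset divided by support of the antecedent. The function names and the numeric values are hypothetical.

```python
def rule_confidence(support_xy, support_x):
    """Confidence of <X,FX> => <Y,FY>, assuming the usual ratio
    support(X and Y together) / support(X)."""
    return support_xy / support_x if support_x > 0 else 0.0

def is_strong(support_xy, support_x, minsup, minconf):
    """A rule is strong when both thresholds are met."""
    return (support_xy >= minsup
            and rule_confidence(support_xy, support_x) >= minconf)

# Hypothetical supports: antecedent 0.75, whole rule 0.375.
print(rule_confidence(0.375, 0.75))        # 0.5
print(is_strong(0.375, 0.75, 0.25, 0.5))   # True
print(is_strong(0.375, 0.75, 0.25, 0.6))   # False
```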
Distributed fuzzy association
rules mining-6
• The problem of sequential mining of
fuzzy association rules can be
decomposed into two subproblems:
1. find all large fuzzy itemsets.
2. generate the fuzzy association rules from
the large fuzzy itemsets found.
Example
Transactions:
      age   weight
t1    15    40
t2    30    70
Membership degrees:
      young   old   thin   fat
t1    1       0     0.5    0
t2    0       0.5   0.5    1
Support count (itemsets with a count > Minsup are large fuzzy itemsets):
〈{Age, Weight}, {young, thin}〉 => 1*0.5 + 0*0.5 = 0.5
〈{Age, Weight}, {young, fat}〉 => 1*0 + 0*1 = 0
〈{Age, Weight}, {old, thin}〉 => 0*0.5 + 0.5*0.5 = 0.25
〈{Age, Weight}, {old, fat}〉 => 0*0 + 0.5*1 = 0.5
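The support counts of this example can be reproduced with a short Python sketch. The membership degrees are the factors that appear in the products of the example; the nested-dictionary layout and the function name support_counts are assumptions made for illustration.

```python
from itertools import product

# One dict per transaction: attribute -> {fuzzy set -> membership degree}.
memberships = [
    {"Age": {"young": 1.0, "old": 0.0}, "Weight": {"thin": 0.5, "fat": 0.0}},
    {"Age": {"young": 0.0, "old": 0.5}, "Weight": {"thin": 0.5, "fat": 1.0}},
]

def support_counts(rows, attr_a, attr_b):
    """Support count of every pair of fuzzy sets on two attributes:
    the sum over transactions of the product of the two degrees."""
    counts = {}
    for fa, fb in product(rows[0][attr_a], rows[0][attr_b]):
        counts[(fa, fb)] = sum(r[attr_a][fa] * r[attr_b][fb] for r in rows)
    return counts

counts = support_counts(memberships, "Age", "Weight")
for pair, value in counts.items():
    print(pair, value)
```

Comparing each value against Minsup then yields the large fuzzy itemsets.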
Distributed problem definition-1
• Let DB = { DB1,DB2, . . . ,DBn } be a
distributed database over n sites S1,
S2, . . . , Sn.
[Figure: the partitions DB1, DB2, ..., DBn, one per site]
Distributed problem definition-5
Distributed Mining Fuzzy Association Rules
Given the set of items I, the distributed database DB =
{DB1, DB2, ..., DBn}, the fuzzy sets associated with the attributes from I,
the minimum support threshold (minsup) and the minimum
confidence threshold (minconf), extract all global fuzzy association
rules.
As in the sequential case, the problem decomposes into two subproblems:
1. find all global large fuzzy itemsets.
2. generate the global fuzzy association rules from the
global large fuzzy itemsets found.
Fuzzy Count Distribution
Algorithm
• First, the global candidate fuzzy 1-itemsets CA(1) are generated,
and from them the globally large fuzzy 1-itemsets L(1).
• The candidates of pass k are generated as
CA(k) = Fuzzy_Apriori_Gen(L(k−1)).
[Figure: each site computes its local large fuzzy 1-itemsets]
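The count-distribution idea can be sketched in Python as follows. This is a simplified, single-process illustration under stated assumptions, not the authors' algorithm: each "site" is a list of transactions mapping an (attribute, fuzzy set) pair to a membership degree, local counts are summed into global counts, and fuzzy_apriori_gen is a plain stand-in for Fuzzy_Apriori_Gen that assumes at most one fuzzy set per attribute in an itemset. Pruning based on local large itemsets is not modeled.

```python
from itertools import combinations
from math import prod

def local_counts(db, candidates):
    """Fuzzy support counts of the candidates on one site's partition."""
    return {c: sum(prod(t.get(i, 0.0) for i in c) for t in db)
            for c in candidates}

def fuzzy_apriori_gen(large_prev, k):
    """Stand-in candidate generator: k-itemsets whose (k-1)-subsets are
    all large and that use each attribute at most once (assumption)."""
    items = sorted({i for s in large_prev for i in s})
    out = set()
    for combo in combinations(items, k):
        if len({attr for attr, _ in combo}) < k:
            continue
        if all(frozenset(sub) in large_prev
               for sub in combinations(combo, k - 1)):
            out.add(frozenset(combo))
    return out

def fuzzy_count_distribution(sites, minsup):
    """Return {itemset: global support} for all globally large itemsets."""
    n = sum(len(db) for db in sites)
    candidates = {frozenset([i]) for db in sites for t in db for i in t}
    large, k = {}, 1
    while candidates:
        global_counts = {c: 0.0 for c in candidates}
        for db in sites:  # on a Grid, each site would count locally
            for c, v in local_counts(db, candidates).items():
                global_counts[c] += v
        level = {c: v / n for c, v in global_counts.items()
                 if v / n >= minsup}
        large.update(level)
        k += 1
        candidates = fuzzy_apriori_gen(set(level), k)
    return large

# Two hypothetical sites holding one transaction each.
site1 = [{("Age", "young"): 1.0, ("Weight", "thin"): 0.5}]
site2 = [{("Age", "old"): 0.5, ("Weight", "thin"): 0.5,
          ("Weight", "fat"): 1.0}]
large = fuzzy_count_distribution([site1, site2], minsup=0.25)
for itemset, sup in large.items():
    print(sorted(itemset), sup)
```

Only the counts travel between sites in this scheme; the partitions themselves stay where they are.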
Rules mining implementation
over the Grid-1
Distributed Rules Mining Scenario
Rules mining implementation
over the Grid-2
In order to present the implementation of this
process in a Grid network we shall consider that:
• the database DB is stored on K-grid node NodeA.
• the tools needed for mining association rules (the
partitioner P, the frequent itemset mining tool and the
association rules extractor) are available as multiplatform
executables on K-grid node NodeS.
• the results will be stored into the Knowledge Base
Repository (KBR) on NodeU.
Rules mining implementation
over the Grid-3
• Suppose that a Grid User (GU) needs to
extract all association rules from database DB
using the tools available on K-grid node NodeS.
• Step 1. The GU starts the search for
computational resources for executing the data
mining process from his K-grid node NodeU. The
KDS (Knowledge Directory Service) is used to
locate the computational resources needed to
execute the mining process.
Rules mining implementation
over the Grid-4
• Step 2. The GU builds an execution plan for the data
mining task, specifying strategies for tools and data
movements. The execution plan is constructed by using
the EPMS (Execution Plan Management Service). This
plan is stored in the local KEPR (Knowledge
Execution Plan Repository).
• Step 3. The GU sends the execution plan to the RAEMS
(Resource Allocation and Execution Management Service),
which starts the application.
• Step 4. The GU visualizes and evaluates the result of the
computation stored in the KBR by means of the RPS (Results
Presentation Service) tools.
Conclusion
• In this article we propose an implementation
of a distributed algorithm for mining fuzzy
association rules from distributed databases in
a Knowledge Grid environment.
• The proposed algorithm relies heavily on
properties of global large fuzzy itemsets and
local large fuzzy itemsets to reduce the
amount of computation.