Databases & Data Mining Joint
Specialization Project
„Data Mining Classification Tool”
By
Mateusz Żochowski
&
Jakub Strzemżalski
Agenda
General description of the problem
Functionality
Data Mining aspects
Algorithm and optimization
Database aspects
General entity schema
2
General Description
Universal Tool
Different kinds of objects (e.g.
preprocessed photos, hospital patients
data)
Finding similar objects
Decision problems
3
Functionality
Independent system – user operated
Using sets of data already provided or
uploading new types
User influence on the way data is processed
Possible usage in bigger systems as a
processing engine
Additional module used as a helping tool in
more complex systems
4
General Use Case
5
Data Mining
General Ideas
K-NN algorithm
Description of an object
Definition of a distance
Brief explanation of the algorithm
Optimization
Problem of comparing a large number of objects
Optimized solution – using the grouping idea
6
Definitions
Objects
7
K-NN
K – Nearest Neighbors
The idea behind k-NN
Aim - finding the k objects most similar to
the one being analyzed and then assigning
it an appropriate decision
Method - calculating the distance from the
analyzed object to the others in our
database and finding the closest ones
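The aim and method above can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the function names `euclidean` and `knn_decide`, the majority-vote rule for picking the decision, and the toy data are all assumptions, and plain Euclidean distance stands in for the tool's own distance definition.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Plain Euclidean distance; a stand-in for the tool's own distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_decide(query, objects, decisions, k=3, distance=euclidean):
    """Return the most common decision among the k objects closest to query."""
    # Rank the indices of all stored objects by distance to the query.
    ranked = sorted(range(len(objects)),
                    key=lambda i: distance(query, objects[i]))
    nearest = ranked[:k]
    # Majority vote over the decisions of the k nearest objects (assumed rule).
    votes = Counter(decisions[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Made-up objects with two attributes each, and their known decisions.
objects = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9)]
decisions = ["healthy", "healthy", "healthy", "ill", "ill"]
result = knn_decide((1.1, 1.0), objects, decisions, k=3)
```

Note that every call compares the query with all stored objects, which is the cost the optimization part of the talk addresses.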
8
K-NN Graphical representation
9
Definitions
Distance
Calculations in multidimensional space
Coefficients
Alpha (α)
w_i – weights – underlining the importance of
particular attributes
n – number of all the attributes
$$D(O_1, O_2) = \sum_{i=1}^{n} w_i \cdot \left| a_i(O_1) - a_i(O_2) \right|^{\alpha}$$
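A possible reading of this distance in code, assuming it is a sum over all n attributes of the weight w_i times the absolute attribute difference, with alpha used as an exponent on that difference. The placement of alpha is an assumption, since the slide does not say how the coefficient is applied, and the name `weighted_distance` is hypothetical.

```python
def weighted_distance(o1, o2, weights, alpha=1.0):
    """Weighted attribute-wise distance between two objects.

    o1, o2  : sequences of numeric attribute values a_i
    weights : one weight w_i per attribute
    alpha   : exponent on each absolute difference (assumed role of alpha)
    """
    if not (len(o1) == len(o2) == len(weights)):
        raise ValueError("objects and weights must have the same length n")
    return sum(w * abs(a - b) ** alpha
               for w, a, b in zip(weights, o1, o2))


# Three attributes; the third one is identical, so it contributes nothing.
d = weighted_distance((1.0, 2.0, 3.0), (2.0, 4.0, 3.0),
                      weights=(0.5, 1.0, 2.0), alpha=2.0)
# 0.5 * 1**2 + 1.0 * 2**2 + 2.0 * 0**2 = 4.5
```

Raising a weight makes differences in that attribute count more, which is how the weights express the importance of particular attributes.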
10
Optimization
The reason – cost of computing the
multidimensional distance from one object
to all the others
Solution – improved k-NN
Result – better efficiency: fewer distance
computations, because the set of possibly
similar objects is narrowed
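A minimal sketch of the grouping idea, under the assumption that each group is represented by a centre point and that only the group with the closest centre is searched. The data layout (a list of `(centre, members)` pairs) and the names `dist` and `grouped_nearest` are hypothetical, and the authors' improved k-NN may differ in detail.

```python
import math


def dist(a, b):
    """Euclidean distance, used here only as a placeholder."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def grouped_nearest(query, groups, k=1):
    """Find the k nearest members, searching a single group only.

    groups: list of (centre, members) pairs prepared in advance.
    """
    # One distance computation per group instead of one per object.
    _centre, members = min(groups, key=lambda g: dist(query, g[0]))
    # Full comparison only inside the selected group.
    return sorted(members, key=lambda m: dist(query, m))[:k]


# Two made-up groups in a two-dimensional attribute space.
groups = [
    ((1.0, 1.0), [(0.8, 1.1), (1.2, 0.9), (1.0, 1.6)]),
    ((6.0, 6.0), [(5.9, 6.2), (6.3, 5.8)]),
]
found = grouped_nearest((1.1, 1.0), groups, k=2)
```

Here the query is compared with two centres and three members rather than with all five objects. The shortcut is approximate: a true neighbour lying just across a group boundary would be missed unless adjacent groups are also checked.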
11
Step 1 - Group-oriented plane
division
12
Step 2 – a new object appears
13
Step 3
14
Step 4
15
Step 5
16
Grouping problem
The problem – assigning objects to
appropriate groups according to the chosen
distance definition
Solution – a clustering algorithm
Brief example – k-means algorithm
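For reference, a compact k-means sketch. It assumes Euclidean distance and caller-supplied starting centres so that the run is deterministic. The names `k_means`, `mean` and `dist` and the sample points are illustrative only; the tool would use its own chosen distance definition.

```python
import math


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def k_means(points, centres, iterations=10):
    """Alternate assignment and centre update until the centres stop moving."""
    centres = list(centres)
    clusters = [[] for _ in centres]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centre.
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)),
                          key=lambda i: dist(p, centres[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its cluster;
        # an empty cluster keeps its previous centre.
        new_centres = [mean(c) if c else centres[i]
                       for i, c in enumerate(clusters)]
        if new_centres == centres:
            break
        centres = new_centres
    return centres, clusters


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centres, clusters = k_means(points, centres=[(0.0, 0.0), (10.0, 10.0)])
```

The resulting centres and clusters are the kind of precomputed groups that a grouped nearest-neighbour search could then use.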
17
Database – entities
18
Database
The general structure of the database
follows from the optimization requirements
Due to the universal purpose of the system,
the database may contain many different
tables of objects
Need for system tables to define
experiments
Group Member as a temporary table?
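To make the open question about Group Member concrete, here is a purely hypothetical sketch using Python's built-in `sqlite3` module. Every table and column name (`experiment`, `object_group`, `group_member`, `object_table`, `k`) is invented for illustration and is not taken from the project's entity schema. It shows system tables describing an experiment and its groups, with group membership held in a temporary table that exists only for the current connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- System table: one row per experiment (hypothetical columns).
CREATE TABLE experiment (
    id           INTEGER PRIMARY KEY,
    object_table TEXT NOT NULL,     -- which table of objects is analysed
    k            INTEGER NOT NULL   -- number of neighbours to look for
);
-- System table: groups created for an experiment.
CREATE TABLE object_group (
    id            INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id)
);
-- Membership kept in a temporary table, dropped when the connection closes.
CREATE TEMP TABLE group_member (
    group_id  INTEGER NOT NULL,
    object_id INTEGER NOT NULL,
    PRIMARY KEY (group_id, object_id)
);
""")

conn.execute(
    "INSERT INTO experiment (id, object_table, k) VALUES (1, 'patients', 3)")
conn.execute("INSERT INTO object_group (id, experiment_id) VALUES (10, 1)")
conn.executemany("INSERT INTO group_member VALUES (?, ?)",
                 [(10, 101), (10, 102)])

member_ids = [row[0] for row in conn.execute(
    "SELECT object_id FROM group_member WHERE group_id = ? ORDER BY object_id",
    (10,))]
```

A temporary table suits membership that is recomputed for each experiment. A permanent table would be the alternative if groupings are expensive to build and meant to be reused.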
19
Summary
There is still a lot of work to do...
20