Mining favorable facets


國立雲林科技大學
National Yunlin University of Science and Technology
Mining Favorable Facets
Presenter: Wei-Hao Huang
Authors: Raymond Chi-Wing Wong, Jian Pei, Ada Wai-Chee Fu, Ke Wang
SIGKDD, 2008
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.
Outline
Motivation
Objectives
Methodology
Experiments
Conclusions
Comments
Motivation
Dominance and skyline analysis are important in multi-criteria decision-making applications.
Fixed order vs. individual preferences: different customers may have different preferences on nominal attributes.
Task: finding favorable facets.
Objectives
Propose the minimal disqualifying condition (MDC), which summarizes favorable facets and is meaningful to the user.
Develop two algorithms:
─ Computing MDC On-the-fly (MDC-O)
─ A Materialization Method (MDC-M)
Use real and synthetic data sets to verify effectiveness and efficiency.
Methodology
Skyline analysis
Naïve Method
Minimal Disqualifying Conditions (MDC)
MDC On-the-fly (MDC-O)
A Materialization Method (MDC-M)
Skyline analysis
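As a reminder of the standard definitions: a point dominates another if it is no worse on every attribute and strictly better on at least one, and the skyline is the set of points that no other point dominates. The sketch below is a minimal illustration, not the authors' code. The package data, the attribute names, and the functions `dominates` and `skyline` are hypothetical; lower price and higher hotel class are assumed to be better, and a preference on the nominal attribute is given as a set of (better, worse) pairs.

```python
# Hypothetical vacation packages: name -> (price, hotel_class, hotel_group).
# Assumption: lower price and higher hotel class are better.
PACKAGES = {
    "a": (1600, 4, "T"), "b": (2400, 1, "T"),
    "c": (3000, 5, "H"), "d": (3600, 4, "H"),
    "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}


def dominates(q, p, prefers):
    """True if q dominates p under `prefers`, a set of (better, worse) group pairs.

    `prefers` is assumed to be transitively closed already.
    """
    q_price, q_class, q_group = q
    p_price, p_class, p_group = p
    group_better = (q_group, p_group) in prefers
    no_worse = (q_price <= p_price and q_class >= p_class
                and (q_group == p_group or group_better))
    strictly_better = q_price < p_price or q_class > p_class or group_better
    return no_worse and strictly_better


def skyline(points, prefers=frozenset()):
    """Names of the points that no other point dominates."""
    return {
        name for name, p in points.items()
        if not any(dominates(q, p, prefers)
                   for other, q in points.items() if other != name)
    }


print(sorted(skyline(PACKAGES)))                # ['a', 'c', 'e', 'f']
print(sorted(skyline(PACKAGES, {("T", "M")})))  # ['a', 'c']
```

With no preference on the hotel group, points with different groups are incomparable; adding the preference "T over M" lets a dominate e and f, so the skyline shrinks.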
Naïve Method: Lattice Search
Minimal Disqualifying Conditions

MDCs are used to summarize favorable facets effectively.
Example, where (T, M) means T is preferred to M:
R' = {(T, M)}
R'' = {(H, M)}
MDC(f) = {(T, M), (H, M)}
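One way to read the example: a point stays in the skyline under a user's preference as long as that preference contains none of the point's MDCs. The sketch below illustrates that check for MDC(f). The function name `is_disqualified` and the representation (each condition as a set of (better, worse) pairs, preferences assumed transitively closed) are assumptions made for illustration.

```python
# MDC(f) from the example: each condition is a set of (better, worse) pairs.
MDC_F = [frozenset({("T", "M")}), frozenset({("H", "M")})]


def is_disqualified(preference, mdcs):
    """True if the preference contains at least one disqualifying condition."""
    preference = set(preference)
    return any(condition <= preference for condition in mdcs)


# A customer who prefers T to M disqualifies f.
print(is_disqualified({("T", "M")}, MDC_F))              # True
# A customer who prefers M to both T and H keeps f in the skyline.
print(is_disqualified({("M", "T"), ("M", "H")}, MDC_F))  # False
```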
MDC-O: Computing MDC On-the-fly
Input: point P, data set D, template R
Process
Output: MDC(P)
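The input and output above can be illustrated with a simplified on-the-fly computation. This is a sketch under stated assumptions, not the authors' MDC-O algorithm: numeric attributes are stored so that smaller is better, preferences are assumed to be transitively closed partial orders that contain the template, and the names `Point`, `disqualifying_condition` and `mdc_on_the_fly`, as well as the data, are hypothetical. For each point q that is no worse than p on the numeric attributes, it collects the nominal pairs a preference must add for q to dominate p, then keeps only the minimal sets.

```python
from collections import namedtuple

# num: totally ordered attributes, smaller is better; nom: nominal attributes.
Point = namedtuple("Point", ["name", "num", "nom"])


def quasi_dominates(q, p):
    """q is no worse than p on every totally ordered attribute."""
    return all(a <= b for a, b in zip(q.num, p.num))


def disqualifying_condition(q, p, template):
    """Pairs (attr, better, worse) a preference must add so that q dominates p.

    Returns None when q can never dominate p.
    """
    if not quasi_dominates(q, p):
        return None
    needed = set()
    for i, (qv, pv) in enumerate(zip(q.nom, p.nom)):
        if qv == pv or (i, qv, pv) in template:
            continue
        if (i, pv, qv) in template:  # template already ranks p's value first
            return None
        needed.add((i, qv, pv))
    if not needed and q.num == p.num and q.nom == p.nom:
        return None  # identical points do not dominate each other
    return frozenset(needed)


def mdc_on_the_fly(p, data, template=frozenset()):
    """Minimal disqualifying conditions of p; {frozenset()} means always dominated."""
    conditions = set()
    for q in data:
        if q.name == p.name:
            continue
        cond = disqualifying_condition(q, p, template)
        if cond is not None:
            conditions.add(cond)
    # Keep only conditions that have no proper subset among the others.
    return {c for c in conditions if not any(o < c for o in conditions)}


# Hypothetical packages: (price, negated hotel class), hotel group.
DATA = [
    Point("a", (1600, -4), ("T",)), Point("b", (2400, -1), ("T",)),
    Point("c", (3000, -5), ("H",)), Point("d", (3600, -4), ("H",)),
    Point("e", (2400, -2), ("M",)), Point("f", (3000, -3), ("M",)),
]
BY_NAME = {pt.name: pt for pt in DATA}
print(mdc_on_the_fly(BY_NAME["f"], DATA))
```

On this data the result for f holds two single-pair conditions, (T over M) and (H over M), which matches the shape of the MDC(f) example; an empty result means the point is in the skyline under every preference.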
MDC-M: A Materialization Method
Input: data set D, template R
Process
Output: SKY(R), MDC
Indexing for Speed-up
Use an R-tree index structure.
An R-tree can be built on the totally ordered attributes T.
To find the points that quasi-dominate p, a range search is conducted on the R-tree.
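The range search can be pictured as a box query: every point that quasi-dominates p lies in the axis-aligned box between the lower bounds of the totally ordered attributes and p itself. The sketch below uses a linear scan as a stand-in for the R-tree, so it shows only the query region, not the index. The function names, the data, and the lower bounds are hypothetical, and numeric attributes are assumed to be stored so that smaller is better.

```python
def quasi_dominator_box(p_num, lower_bounds):
    """Query box [lower_bounds, p_num] that contains all quasi-dominators of p."""
    return tuple(lower_bounds), tuple(p_num)


def range_search(points, box):
    """Linear-scan stand-in for an R-tree range query over (name, num) entries."""
    lo, hi = box
    return [name for name, num in points
            if all(l <= x <= h for l, x, h in zip(lo, num, hi))]


def quasi_dominators(p_name, points, lower_bounds):
    """Names of the points, other than p, that fall inside p's query box."""
    p_num = dict(points)[p_name]
    box = quasi_dominator_box(p_num, lower_bounds)
    return [name for name in range_search(points, box) if name != p_name]


# Hypothetical entries: (price, negated hotel class), smaller is better.
ENTRIES = [
    ("a", (1600, -4)), ("b", (2400, -1)), ("c", (3000, -5)),
    ("d", (3600, -4)), ("e", (2400, -2)), ("f", (3000, -3)),
]
print(quasi_dominators("f", ENTRIES, lower_bounds=(0, -5)))  # ['a', 'c']
```

Only the candidates returned by the box query need their nominal values compared with p, which is where an index saves work over scanning the whole data set.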
Experiments
Synthetic Data Set
─ Dimension (numeric attributes, nominal attributes)
─ Tuples
─ Template Size
─ Cardinality of Nominal Attributes
─ Zipfian Parameter
Real Data Set
─ Nursery
─ Automobile
Synthetic Data Set-Dimension (numeric attributes)
Numeric   3 3 3 3
Nominal   1 2 3 4
Synthetic Data Set-Dimension (nominal attributes)
Numeric   2 3 4 5
Nominal   1 1 1 1
Synthetic Data Set-Tuples
500k -> 1000k
Synthetic Data Set-Template Size
Synthetic Data Set-Cardinality of Nominal Attributes
Real Data Set
Nursery
─ Data set: 12,960 instances and 8 attributes.
─ Performance results are similar to those on the synthetic data sets.
Automobile
─ Data set: car brands Honda, Mitsubishi and Toyota.
─ Computation times were negligibly small.
Car brand     MDC
Honda         Toyota < Honda
Mitsubishi    Honda < Mitsubishi or Toyota < Mitsubishi
Toyota        none
Conclusions
MDC is effective in summarizing the favorable facets.
The experimental results show that the proposed methods are effective and efficient.
Future work: handling dynamic data and dynamic orderings is an interesting topic.
Comments
Advantages
─ Addresses finding favorable facets, a problem that had not been studied before.
─ The mining is both effective and efficient.
Applications
─ Information retrieval