Efficient Evaluation of Queries with Mining Predicates by Chaudhuri
Efficient Evaluation of Queries
with Mining Predicates
by Chaudhuri, Narasayya, and Sarawagi
CSci 8701 – Group G07
Charles Braxmeier
Problem Statement
Find more efficient ways to execute
queries where one or more of the
predicates are the results of data mining
decisions
Example Query: Find fans who went to a
Minnesota hockey game last year who
may be football fans as well
Contributions of the Paper
Great detail about different types of mining
models (clustering, decision trees, etc.)
Discussion regarding the different ways
mining predicate(s) can be joined within a
query
Analysis of the experiments done to test
theories regarding query optimization
based on the structure of the mining model
Key Concepts
Upper Envelope Predicate
Tightness of the Query’s Predicates
Mining Model
Decision Tree
Naïve Bayes Classifiers
Bottom-up
Top-Down
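As a rough illustration of the upper envelope idea for the decision tree case: the predicate "model predicts class c" can be replaced by an OR over the root-to-leaf paths that end in c, which an optimizer can then use like any ordinary predicate. The sketch below is a minimal reading of that idea, not the paper's implementation. The Leaf and Node classes, the column names (age, income), and the class labels are hypothetical. For a decision tree this predicate is exact; for other model types the envelope is generally a superset of the matching rows.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    label: str  # class predicted at this leaf


@dataclass
class Node:
    column: str       # column tested at this node (hypothetical names below)
    threshold: float  # split point
    left: "Tree"      # subtree for rows with column <= threshold
    right: "Tree"     # subtree for rows with column > threshold


Tree = Union[Leaf, Node]
Path = Tuple[str, ...]


def upper_envelope(tree: Tree, label: str, path: Path = ()) -> List[Path]:
    """Collect the conditions along every root-to-leaf path ending in `label`."""
    if isinstance(tree, Leaf):
        return [path] if tree.label == label else []
    left_cond = f"{tree.column} <= {tree.threshold}"
    right_cond = f"{tree.column} > {tree.threshold}"
    return (upper_envelope(tree.left, label, path + (left_cond,))
            + upper_envelope(tree.right, label, path + (right_cond,)))


def to_sql(paths: List[Path]) -> str:
    """Render the paths as a disjunction of conjunctions."""
    if not paths:
        return "FALSE"  # no leaf predicts this class
    return " OR ".join(
        "(" + " AND ".join(p) + ")" if p else "TRUE" for p in paths
    )


# Hypothetical tree: who among hockey attendees might also be a football fan?
tree = Node("age", 30,
            Leaf("hockey_only"),
            Node("income", 50000, Leaf("football_fan"), Leaf("hockey_only")))

envelope = to_sql(upper_envelope(tree, "football_fan"))
print(envelope)
```

The resulting string could be added to the WHERE clause alongside the mining predicate, letting the optimizer consider indexes on the underlying columns.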
Key Concepts (cont’d.)
Mining Model (continued)
Clustering
Centroid-based
Model-based
Boundary-based
Validation Methodology
Experimentation based on the theories
posed regarding query reorganization
Twenty (20) different data sets used. Data
sets vary based on:
Data set size
Number of dimensions in data set
Size of data set used to train the mining
model
Validation Methodology (cont’d.)
Analysis of Experiment Results
65% of query access paths affected by rearranging the query based on the upper
envelope predicate
Average run-time decreased by 65% by rearranging the query based on the upper
envelope predicate
More variance in run-time decrease than access
paths affected
Assumptions
Clustering can be evaluated via Bayes
classifiers
Therefore, not much background info on clustering or on how its
experiments differed from the Bayes experiments
Continuous data sets are split into discrete data
sets to assist in mining predictions
Not necessarily realistic
Example: latitude / longitude
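One possible reading of the latitude / longitude concern, sketched with equal-width binning. The bin count and the sample coordinates are assumptions chosen for illustration, and the paper may discretize differently. The point is that two locations a few kilometres apart can land in different buckets, while locations thousands of kilometres apart share one.

```python
from bisect import bisect_right
from typing import List


def equal_width_edges(lo: float, hi: float, bins: int) -> List[float]:
    """Interior cut points that split [lo, hi] into `bins` equal-width buckets."""
    width = (hi - lo) / bins
    return [lo + i * width for i in range(1, bins)]


def discretize(value: float, edges: List[float]) -> int:
    """Index of the bucket containing `value` (0-based)."""
    return bisect_right(edges, value)


# Four latitude buckets over [-90, 90]: cut points at -45, 0, 45 (assumed binning).
edges = equal_width_edges(-90.0, 90.0, 4)

south_of_cut = discretize(44.98, edges)  # just below the 45-degree cut
north_of_cut = discretize(45.02, edges)  # just above it, a few km away
far_south = discretize(1.0, edges)       # thousands of km away, same bucket as 44.98

print(south_of_cut, north_of_cut, far_south)
```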
Possible Revisions to Paper
Spend more time on analysis of
experiments and results, rather than the
background info
Background information took up
approximately 60% of the paper
Questions?