Transcript P 2

Research and Practice at
University of Queensland
Wei Lu (卢卫)
2/19/2009
Seminar Plan
• Tour in Australia
• Join UQ
• How to write a good paper(by Xiaofang
and Xuemin)
• Research Interests
Seminar Plan
• Tour in Australia
• Join UQ
• How to write a good paper (by Xiaofang
and Xuemin)
• Research Interests
Tour in Australia
Seminar Plan
• Tour in Australia
• Join UQ
• How to write a good paper (by Xiaofang
and Xuemin)
• Research Interests
Join UQ
UQ
Introduction to UQ
• founded in 1910
• Outstanding Majors
– Business
– Biology
– Medical
• School of ITEE
–
–
–
–
–
–
–
–
–
–
Biomedical Engineering
Cognitive Systems Engineering
Complex & Intelligent Systems
Data & Knowledge Engineering (DKE)
eResearch
Microwave & Optical Communications
Power & Energy Systems
Security & Surveillance
Systems & Software Engineering
Ubiquitous Computing
Academic Staff in DKE Group
Prof. Xiaofang Zhou
He is the Head of the Data and Knowledge Engineering Research Group (DKE). He is
also the Convenor of ARC Research Network in Enterprise Information Infrastructure
(EII), and a Chief Investigator of ARC Centre of Excellence in Bioinformatics. Professor
Zhou received his BSc and MSc degrees in Computer Science from Nanjing University
in 1984 and 1987 respectively, and PhD in Computer Science from the University of
Queensland in 1994. From 1994 to 1999, he worked as a Senior Research Scientist in
CSIRO, leading its Spatial Information Systems group. His research focuses on finding
effective and efficient solutions to managing, integrating and analysing very large
amount of complex data for business and scientific applications. His research
interests include spatial and multimedia databases, data quality, high performance
query processing, Web information systems and bioinformatics.
Associate Professors
Dr. Shazia Sadiq
Her research interests are innovative solutions for
Business Information Systems that span several areas
including business process management, governance,
risk and compliance, data quality management,
workflow systems, and service oriented computing.
Dr. Xue Li
His research interests and expertise include: Data
Mining, Multimedia Data Security, Database Systems,
and Intelligent Web Information Systems.
Senior Lecture
Dr. Hengtao Shen
His Research interests:
Media (Video)/Web Search
Multimedia/Web/Spatial/Genome Database Management
Nonlinear/Local Dimensionality Reduction
Indexing and Query Processing
P2P Database Management
Research Staff
Ken Deng:
1. Data Quality
2. Spatial Database
Helen Huang:
1.Video retrieval
2. knowledge discovery
Gabriel:
1. Data Mining and
Knowledge Discovery
2. Time Series Mining
and Forecasting
3. Skyline Query
Processing
Stella:
1. Video Search &
Retrieval,
2. Web Data Extraction &
Analysis,
3. Recipe Data Modeling
Seminar Plan
• Tour in Australia
• Join UQ
• How to write a good paper (by Xiaofang
and Xuemin)
• Research Interests
Seminar Plan
• Guide of writing a good paper (by
Xiaofang and Xuemin)
• Research Interests:
– Skyline Query Processing (From)
– Data Quality—Record Linkage (To)
Guide of writing a good paper
(by Xiaofang and Xuemin)
• Motivation
– Interesting
– Reasonable
– Pure
• Solution
– Smart
– Sharp
• Conclusion
– Experiment: time and space complexity
Cont.
• Tools
– Word?
– Latex: winEdt/eclipse+GNUPlot+Illustrator/smartdraw
• Format: Jian Pei’s papers
– Ranking Queries on Uncertain Data: A
Probabilistic Threshold Approach
– Efficiently Answering Top-k Typicality Queries on
Large Databases
– ……
Seminar Plan
• Tour in Australia
• Join UQ
• How to write a good paper
• Research Interests
Research Interests
• Skyline Query Processing (2007.12 ~
2008.7)
• Data Quality—Record Linkage (2008.7~)
Skyline Query Processing
• Motivations
• How to do experiments
• Given a dataset of ddimensional points
appearance
Motivations---skyline
height
• Given a dataset of ddimensional points
– a dominates b iff a
outperforms b at least
one dimension and not
worse at other
dimensions
appearance
Motivations---skyline
a
b
height
• Given a dataset of ddimensional points
appearance
Motivations---skyline
a
b
– a dominates b iff a
outperforms b at least
one dimension and not
worse at other
dimensions
height
– S contains points not
dominated by others • Example
– Dataset of girls
– Prefer good-looking
tall
• Given a dataset of ddimensional points
appearance
Motivations---skyline
– a dominates b iff a
outperforms b at least
one dimension and not
Skyline points
worse at other
dimensions
height
– S contains points not
dominated by others • Example
– Dataset of girls
– Prefer good-looking
tall
• Extension of skyline queries
– Given a query point q
– a dominates b iff a
outperforms b at least
one dimension and not
worse at other
dimensions
– S contains points not
dominated by others
appearance
Dynamic Skyline
Query point q
height
• Example
– User defines “ideal”
girlfriend
• Extension of skyline queries
– Given a query point q
– a dominates b iff a
outperforms b at least
one dimension and not
worse at other
dimensions
– S contains points not
dominated by others
appearance
Dynamic Skyline
Query point q
height
• Example
– User defines “ideal”
girlfriend
• Extension of skyline queries
– Given a query point q
– a dominates b iff a
outperforms b at least
one dimension and not
worse at other
dimensions
– S contains points not
dominated by others
appearance
Dynamic Skyline
Query point q
height
• Example
– User defines “ideal”
girlfriend
• Extension of skyline
queries
– Given an area and a
query point q
appearance
Window Skyline
height
• Example
– User defines “ideal”
girlfriend
• Extension of skyline
queries
– A set of query points (red)
– Which points (white)
make q as its skyline
point
appearance
Reverse Skyline
Query point q
height
Skyline Cube
dist
• A real estate example
P3
price (100K)
dist
age
…
P1
3
3
5
…
P2
5
1
1
…
P3
1
4
4
…
P4
4
5
2
…
P5
2
2
3
…
Properties and Values
P4
P1
Skyline on
price & dist
P5
P2
price
age
P1
P3
Skyline on
price & age
P5
P4
P2
price
Variation of Skyline Queries
• Multi-Sources
– Multiple query points
• P2P Skyline Computation
– Each peer runs skyline computation
• ……
An Open Question
• Group Skyline
– 10 NBA Players (score, rebound, assist),
choose 3 Player among them as a team to
maximize the scores, rebounds, and assists
Fatal Defects of Skyline Queries
• The cardinality of result is huge
– Given a set of d-dimensional points with the
cardinality n, the expected number of skyline points is
O(lnd−1n/(d−1)!). (all the dimensions are independent )
• Unpractical
• How to improve this problem?
Reduce the cardinality of skyline
• Selecting Star
• K-Dominant Skyline
• Core Skyline
Selecting Stars
• Skyline {p2,p4,p6}
• 1 Representative skyline point
– p6
• 2 Representative skyline points
– {p2,p6}
Selecting Stars
• Skyline {p2,p4,p6}
Skyline Point
Dominant set
p2
p4
p1
p3
p6
p3, p5, p7
• 1 Representative skyline point
– p6
Selecting Stars
• Skyline {p2,p4,p6}
Skyline Point
Dominant set
p2 , p4
p2 , p6
p1 , p3
p1 , p3, p5, p7
p4, p6
p3, p5, p7
• 2 Representative skyline points
– {p2,p6}
• Challenge: NP-Complete when d > 2
K-Dominant Skyline
k-Dominant Skyline (cont.)
• k-Dominate
– If A is not worse than B on k dimensions, and
better on at least one of the k dimensions, we
say A k-dominate B.
k-Dominant Skyline (cont.)
• k-Dominant Skyline
– k-dominant skyline contains all the points that
cannot be k-dominated by any other point
• Problems:
– The result can be null
– Some good points may be pruned
Core Skyline
Skylines on Uncertain Data
• Consider game-by-game statistics
• Conventional methods compute the skyline on
– Separate game records
– Aggregate: mean or median
• Limitations
– Biased by outliers
– Lose data distributions
• Probabilistic skylines
– An instance has a probability
to represent the object
– An object has a probability
to be in the skyline
The 33rd International Conference on Very Large Data Bases (VLDB), Vienna, Austria, September 23-28 2007
P(A) = 1
P(B) = 6/8
P(C) = 0
Possible
worlds
A
B
C
1
a1
b1
c1
2
a1
b1
c2
3
a1
b2
c1
4
a1
b2
c2
5
a2
b1
c1
6
a2
b1
c2
7
a2
b2
c1
8
a2
b2
c2
• Top-K query
– Given a threshold p, trying to identify all the players
with skyline probability >= p;
KNN probabilistic skyline
• Given an object O, find K objects whose
skyline probabilities are nearest to O.
• Applications:
– Given an NBA player (singer star), try to find K
NBA players (singer stars), whose
performances are most similar to him/her.
Experiment
• Dataset
– Real dataset: NBA
– Synthetic dataset: anti-related, correlated,
independent
• Parameters
– Dimension
– Cardinality of dataset
• Efficiency
– Time
– Memory
Thanks!