Readings in Data Management Spring 2006
Readings in Data Management
Spring 2008
Computer Science Department
Rutgers University
Seminar Information
Web page:
http://www.cs.rutgers.edu/~amelie/courses/dbseminar.html
Meets Thursday 1-2:30pm in
CoRE A
Organization
Weekly presentation on a DB topic (30 minutes)
Discussion on the paper
We will select 2-3 topics to focus on over the course of
the semester
For each topic
First week: overview paper (survey, influential
work)
Subsequent weeks: more complex papers on the
subject
Possibly a few external presentations such as:
Students preparing for DB conference talks or quals
Invited speakers
Topics
First Topic: Probabilistic Databases
We will select the next topics from (non-exhaustive list):
Question answering
Web Search
Personal Information Spaces
Query Optimization
Data Cleaning
Data Integration
Data Mining
Query Processing Techniques
Adaptive, Automatic, Autonomic Systems
OLAP
Stream Aggregation
Storage, Indexing, and System Architecture
XML Processing
Preference functions
Spatial and High-Dimensional Data
Recovery
Privacy in DBMS
…
What I expect from you
1-2 presentations over the course of the semester
First-year students will be given “overview”
presentation assignments at the beginning of each
topic
More senior students will present more research-focused papers
Number of presentations depends on the number of
students in the seminar
Everyone should read the paper in advance and
prepare 1-2 questions/discussion topics
Participation in discussion
There are no “stupid” questions! If you did not
understand something, chances are others did not
either
Presentations
I will select a list of papers to present for
each topic
Start with an introductory paper
Then papers that go deeper into one or more
aspects of the problem
You are welcome to suggest some papers
on the topic, as long as they are related (so
that we can have more meaningful
discussions)
Papers that I have overlooked
Papers on a different aspect of the topic that
you would like to focus on
First topic: Probabilistic Databases
Uncertainty/Imprecision in data
Query Semantics
Probabilistic Data Representation
Next few slides from Dan Suciu’s tutorial, more at
Databases Today are Deterministic
An item either is in the database or
is not
A tuple either is in the query
answer or is not
This applies to all varieties of data
models:
Relational, E/R, NF2, hierarchical, XML,
…
What is a Probabilistic Database?
“An item belongs to the database” is
a probabilistic event
“A tuple is an answer to the query”
is a probabilistic event
Can be extended to all data models;
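The two statements above can be made concrete with a small sketch. It assumes a tuple-independent table (each tuple is present with its own probability, independently of the others) and computes answer probabilities by enumerating every possible world. The independence assumption, the `sightings` data, and all function names are hypothetical choices for illustration and do not come from the tutorial; enumeration is exponential in the number of tuples, so this only shows the semantics, not a practical evaluation strategy.

```python
from itertools import product

# Hypothetical table: each tuple carries the probability that it is present.
# Tuples are assumed independent (an assumption of this sketch).
sightings = [
    (("alice", "office"), 0.9),
    (("bob", "office"), 0.5),
    (("bob", "lab"), 0.4),
]

def possible_worlds(table):
    """Yield (world, probability) for every subset of the tuples."""
    for included in product([True, False], repeat=len(table)):
        world = []
        prob = 1.0
        for (row, p), keep in zip(table, included):
            # Multiply in p if the tuple is present, 1 - p if it is absent.
            prob *= p if keep else 1.0 - p
            if keep:
                world.append(row)
        yield world, prob

def answer_probabilities(table, query):
    """Sum the probability of the worlds in which each answer appears."""
    result = {}
    for world, prob in possible_worlds(table):
        for answer in set(query(world)):
            result[answer] = result.get(answer, 0.0) + prob
    return result

def people_seen(world):
    """Example query: project each tuple onto the person column."""
    return [person for person, _place in world]

probs = answer_probabilities(sightings, people_seen)
```

Here "bob" is an answer in every world that contains at least one of his two tuples, so his probability is 1 - (1 - 0.5)(1 - 0.4) = 0.7, while "alice" is an answer with probability 0.9.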
Two Types of Probabilistic Data
Database is deterministic; query answers are probabilistic
Database is probabilistic; query answers are probabilistic
Long History
Probabilistic relational databases have been
studied from the late 80’s until today:
Cavallo & Pittarelli: 1987
Barbara, Garcia-Molina & Porter: 1992
Lakshmanan, Leone, Ross & Subrahmanian: 1997
Fuhr & Roellke: 1997
Dalvi & Suciu: 2004
Widom: 2005
So, Why Now?
Application pull:
The need to manage imprecisions in
data
Technology push:
Advances in query processing
techniques
Application Pull
Need to manage imprecisions in data
Many types: non-matching data values,
imprecise queries, inconsistent data,
misaligned schemas, etc.
The quest to manage imprecisions = major
driving force in the database community
Ultimate cause for many research areas:
data mining, semistructured data, schema
matching, nearest neighbor
Technology Push
Processing probabilistic data is
fundamentally more complex than processing
data in other data models
Some previous approaches sidestepped
complexity
There exists a rich collection of powerful,
non-trivial techniques and results, some
old, some very recent, that could lead to
practical management techniques for
probabilistic databases.
Suggested Papers to discuss
Nilesh Dalvi, Dan Suciu: Efficient Query
Evaluation on Probabilistic Databases. (VLDB
2004)
Minos Garofalakis et al.: Probabilistic Data
Management for Pervasive Computing: The Data
Furnace Project. IEEE Data Eng. Bull.
29(1) (2006)
Omar Benjelloun, Anish Das Sarma, Chris
Hayworth, Jennifer Widom: An Introduction to
ULDBs and the Trio System. IEEE Data Eng. Bull.
29(1) (2006)
Prithviraj Sen, Amol Deshpande: Representing
and Querying Correlated Tuples in Probabilistic
Databases. (ICDE 2007)