Database Research:
The Past, The Present, and The Future
Yi-Shin Chen
Department of Computer Science
National Tsing Hua University
[email protected]
http://www.cs.nthu.edu.tw/~yishin/
Outline
Motivation
The Past
Evolution of Data Management [Gray 1996]
The Lowell Database Research Self-Assessment Report
Where did it come from?
What does it say?
The Present
The Future
Motivation
Database research is driven by new applications, technology trends, new
synergies with related fields, and innovation within the field itself.
[Diagram: "New Stuff" flowing into "The Database Community"]
Evolution of Data Management
Manual Record Managers (until ~1900)
Punched-Card Record Managers (1900-1955)
• 1950: Univac had developed magnetic tape
• 1951: Univac I delivered to the US Census Bureau
Programmed Record Managers (1955-)
• Birth of high-level programming languages
• Batch processing
• Cons:
  • Transaction errors cannot be detected in time
  • The business does not know its current state
On-line Network Databases (1965-1980)
• Indexed sequential records
• Data independence
• Concurrent access
• Cons:
  • Navigational programming interfaces are too low-level
  • Need to use very primitive, procedural database operations
Evolution of Data Management (Contd.)
1970: E.F. Codd outlined the relational model
• Gives database users high-level, set-oriented data access operations
• Uniform representation
Relational Databases & Client-Server Computing (1980-1995)
• 1985: SQL first standardized
• Oracle, Informix, Ingres
• Unexpected benefits:
  • Client-Server: because of SQL, ODBC
  • Parallel processing: relational operators naturally support pipeline and partition parallelism
  • Graphical User Interface: easy to render a relation
Multimedia Databases (1995-)
• Richer data types
• OO databases: unifying procedures and data (Universal Server)
• Projects that push the limits, e.g., the NASA EOS/DIS project
Research Self Assessment
A group of senior database researchers gathers every few years to assess the state of database research and point out potential research problems
Laguna Beach, Calif. in 1989
Palo Alto, Calif. in 1990 and 1995
Cambridge, Mass. in 1996
Asilomar, Calif. in 1998
Lowell, Mass. in 2003
The sixth ad-hoc meeting (Lowell)
Lasted two days
25 senior database researchers
Output: the Lowell database research self-assessment report
More information: http://research.microsoft.com/~gray/lowell/
Attendees
Serge Abiteboul, Martin Kersten, Rakesh Agrawal, Michael Pazzani, Phil Bernstein, Mike Lesk, Mike Carey, David Maier, Stefano Ceri, Jeff Naughton, Bruce Croft, Hans Schek, David DeWitt, Timos Sellis, Mike Franklin, Avi Silberschatz, Hector Garcia-Molina, Rick Snodgrass, Dieter Gawlick, Mike Stonebraker, Jim Gray, Jeff Ullman, Laura Haas, Gerhard Weikum, Alon Halevy, Jennifer Widom, Joe Hellerstein, Stan Zdonik, Yannis Ioannidis
Photos captured from http://www.research.microsoft.com/~gray/lowell/Photos.htm
The Main Driving Forces
The focus of database research
Information storage, organization, management, and access
The main driving forces
Internet
Particularly by enabling “cross enterprise” applications
Require stronger facilities for security and information integration
Sciences
Generate large and complex data sets
Need support for information integration, managing the pipeline of data products produced by data analysis, storing and querying "ordered" data, and integrating with the world-wide data grid
The Main Driving Forces (Contd.)
Traditional DBMS topics
Technology keeps changing the rules → reassessment
E.g.: the ratios of capacity to bandwidth change → reassess storage management and query-processing algorithms
Maturation of related technologies, for example:
Data-mining technology → a DB component
Information retrieval → integrate with DB search techniques
Reasoning with uncertainty → fuzzy data
Natural-language processing (NLP) → NL querying
Next Generation Infrastructure
Discuss the various infrastructure components that
require new solutions or are novel in some other way
1. Integration of Text, Data, Code and Streams
2. Information Fusion
3. Sensor Data and Sensor Networks
4. Multimedia Queries
5. Reasoning about Uncertain Data
6. Personalization
7. Data Mining
8. Self Adaptation
9. Privacy
10. Trustworthy Systems
11. New User Interfaces
12. One-Hundred-Year Storage
13. Query Optimization
Integration of Text, Data, Code and Streams
Rethink basic DBMS architecture supporting:
Structured data → traditional DBMS
Text → information retrieval
Space and time → spatial and temporal DB
Image and multimedia data → image retrieval/multimedia DB
Procedural data → user-defined functions
Triggers → make facilities scalable
Data streams and queues → data stream management
Start with a clean sheet of paper?
SQL, XML Schema, and XQuery are too complex
Vendors will pursue the extend-XML/SQL strategies; the research community should explore a reconceptualization (a small single-engine sketch follows)
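As a small, present-day illustration of putting text and structured data under one declarative engine, the sketch below uses Python's built-in sqlite3 together with SQLite's FTS5 full-text extension (assumed to be compiled into the local SQLite build, as it is in most Python distributions); the tables and data are invented for the example.

```python
# Sketch: structured data and full-text search handled by one engine.
# Assumes the local SQLite build includes the FTS5 extension.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE papers (id INTEGER PRIMARY KEY, year INTEGER, venue TEXT);
    CREATE VIRTUAL TABLE abstracts USING fts5(paper_id, body);
""")
con.execute("INSERT INTO papers VALUES (1, 1996, 'IEEE Computer')")
con.execute("INSERT INTO papers VALUES (2, 2003, 'SIGMOD Record')")
con.execute("INSERT INTO abstracts VALUES ('1', 'evolution of data management systems')")
con.execute("INSERT INTO abstracts VALUES ('2', 'self-assessment of database research')")

# One declarative query mixes a structured predicate (year) with a text predicate (MATCH).
rows = con.execute("""
    SELECT p.id, p.venue
    FROM papers p JOIN abstracts ON abstracts.paper_id = CAST(p.id AS TEXT)
    WHERE p.year >= 2000 AND abstracts MATCH 'database'
""").fetchall()
print(rows)   # -> [(2, 'SIGMOD Record')]
```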
Information Fusion
The typical approach
• Extract-transform-load (ETL) tools
• Data warehouse
Because of the Internet
• Millions of information sources
• Some data can only be accessed at query time → perform information integration on the fly
• Work with the "Semantic Web" people
Other challenges
• Need a semantic-heterogeneity solution
• Security policy: information in each database is not free
• Probabilistic world of evidence accumulation
• Web scale
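To make the contrast between ETL and query-time integration concrete, here is a minimal mediator sketch that fetches from two hypothetical sources with different schemas at query time and maps them onto one mediated schema; the source names, fields, and mappings are all invented.

```python
# Sketch: a tiny mediator that integrates two heterogeneous sources at query time
# instead of loading them into a warehouse first. Sources and schemas are hypothetical.

# Source A exposes records as (name, affiliation)
SOURCE_A = [{"name": "J. Gray", "affiliation": "Microsoft Research"}]
# Source B uses a different schema: (author, org)
SOURCE_B = [{"author": "S. Abiteboul", "org": "INRIA"}]

# Per-source wrappers map local schemas onto one mediated schema (researcher, institution).
def wrap_a(rec):
    return {"researcher": rec["name"], "institution": rec["affiliation"]}

def wrap_b(rec):
    return {"researcher": rec["author"], "institution": rec["org"]}

def mediated_query(predicate):
    """Evaluate a predicate over the union of both sources, fetched at query time."""
    for rec in SOURCE_A:
        unified = wrap_a(rec)
        if predicate(unified):
            yield unified
    for rec in SOURCE_B:
        unified = wrap_b(rec)
        if predicate(unified):
            yield unified

print(list(mediated_query(lambda r: "Research" in r["institution"])))
```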
Sensor Data and Sensor Networks
Characteristics
Draw more power when communicating than when computing (hence the in-network aggregation sketch below)
Rapidly changing configurations
Might not be completely calibrated
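Since the radio dominates a node's power budget, a common response is to aggregate inside the network so each node transmits only a small partial result. A toy sketch of that idea; the routing tree and readings are made up.

```python
# Sketch: in-network aggregation -- each node combines its own reading with its
# children's partial results, so only one small record per node is transmitted.
# The routing tree and readings below are made-up example data.

tree = {                      # parent -> children
    "root": ["n1", "n2"],
    "n1": ["n3", "n4"],
    "n2": [],
    "n3": [], "n4": [],
}
readings = {"root": 21.0, "n1": 22.5, "n2": 19.0, "n3": 23.1, "n4": 22.9}

def partial_avg(node):
    """Return (sum, count) for the subtree rooted at node -- the only data sent uplink."""
    s, c = readings[node], 1
    for child in tree.get(node, []):
        cs, cc = partial_avg(child)     # in a real network this arrives over the radio
        s, c = s + cs, c + cc
    return s, c

total, count = partial_avg("root")
print(f"network-wide average temperature: {total / count:.2f}")
```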
Multimedia Queries
Challenges
Create easy ways to analyze, summarize, search, and view multimedia
Require better facilities for managing multimedia information
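One concrete form the "search" challenge often takes is similarity search: multimedia objects are reduced to feature vectors and a query returns the nearest neighbours. A minimal sketch with invented 3-dimensional feature vectors; real feature extraction is assumed to happen elsewhere.

```python
# Sketch: nearest-neighbour search over multimedia feature vectors.
# The feature vectors are invented for illustration.
import math

features = {
    "sunset.jpg": (0.9, 0.3, 0.1),
    "beach.jpg":  (0.8, 0.4, 0.2),
    "forest.jpg": (0.1, 0.8, 0.2),
}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similar_to(query_vec, k=2):
    """Return the k images whose feature vectors lie closest to the query vector."""
    return sorted(features, key=lambda name: euclidean(features[name], query_vec))[:k]

print(similar_to((0.88, 0.32, 0.12)))   # -> ['sunset.jpg', 'beach.jpg']
```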
Reasoning about Uncertain Data
Traditional DBMSs have no facilities for either approximate data or imprecise queries
(Almost) all data are uncertain or imprecise
DBMSs need built-in support for data imprecision
The "lineage" of the data must be tracked
Query processing must move to a probabilistic (stochastic) model
The query answers will get better
The system should characterize the accuracy offered
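A small sketch of what built-in imprecision support could look like: each tuple carries a probability and a lineage note, and a query returns its answer together with the confidence the system can characterize. The independence assumption and all data below are invented for illustration.

```python
# Sketch: querying uncertain tuples. Each tuple carries a probability and its lineage;
# a query returns answers annotated with the confidence the system can characterize.
# Tuples are assumed independent; the data is invented.

sightings = [
    # (species, location, probability, lineage)
    ("owl",  "sector-7", 0.9, "camera-3 classifier"),
    ("owl",  "sector-7", 0.6, "volunteer report"),
    ("hawk", "sector-2", 0.8, "camera-1 classifier"),
]

def prob_exists(species, location):
    """P(at least one true sighting) = 1 - prod(1 - p_i) under independence."""
    evidence = [t for t in sightings if t[0] == species and t[1] == location]
    p = 1.0
    for _, _, prob, _ in evidence:
        p *= (1.0 - prob)
    return 1.0 - p, [t[3] for t in evidence]   # answer probability + its lineage

confidence, lineage = prob_exists("owl", "sector-7")
print(f"owl in sector-7 with confidence {confidence:.2f}; derived from {lineage}")
```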
Personalization
Query answers should depend on the user
Relevance feedback should also depend on the person and the context
A framework for including and exploiting appropriate metadata for personalization is needed
Need to verify that the information system is producing a "correct" answer
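One simple way to make answers depend on the user is to re-rank a base result list against per-user profile metadata, e.g., interests accumulated from relevance feedback. The profile schema and scoring rule below are invented.

```python
# Sketch: personalizing query answers by re-ranking them against user-profile metadata.
# The profile schema and the scoring heuristic are invented for illustration.

results = [
    {"title": "Parallel query processing", "topics": {"parallelism", "query"}},
    {"title": "XML schema design",         "topics": {"xml", "schema"}},
    {"title": "Sensor data streams",       "topics": {"streams", "sensors"}},
]

user_profile = {"interests": {"streams", "parallelism"}}   # e.g., built from feedback

def personalized_rank(results, profile):
    """Order results by overlap between their topics and the user's interests."""
    return sorted(results,
                  key=lambda r: len(r["topics"] & profile["interests"]),
                  reverse=True)

for r in personalized_rank(results, user_profile):
    print(r["title"])
```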
Data Mining
Focus on efficient ways to discover models of existing data sets
Developed algorithms include classification, clustering, association-rule discovery, summarization, etc.
Challenges:
Data-mining research should develop algorithms for seeking unexpected "pearls of wisdom"
Integrate data mining with querying, optimization, and other database facilities such as triggers
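As a concrete instance of one of the listed algorithms, the sketch below performs the support-counting core of association-rule discovery (Apriori-style) over a few invented transactions and keeps the item pairs that clear a minimum support threshold.

```python
# Sketch: the support-counting core of association-rule discovery (Apriori-style).
# Transactions and the support threshold are invented example data.
from collections import Counter
from itertools import combinations

transactions = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "eggs"},
]
min_support = 2   # a pair must appear in at least 2 transactions

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

frequent_pairs = {p: c for p, c in pair_counts.items() if c >= min_support}
print(frequent_pairs)   # every pair here appears in exactly 2 of the 4 baskets
```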
Self Adaptation
Modern DBMSs are more complex
Administrators must understand disk partitioning, parallel query execution, thread pools, and user-defined data types
Shortage of competent database administrators
Goals
Perform tuning using a combination of a rule-based system, a database of knob settings, and configuration data (a toy version is sketched below)
No knobs: all tuning decisions are made automatically
Requires knowledge of user behavior and workloads
Recognize internal malfunctions, identify data corruption, detect application failures, and do something about them
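A hedged sketch of the "rule-based system plus a database of knob settings" goal: observed workload statistics are matched against simple rules that propose configuration changes. The knob names, thresholds, and statistics are hypothetical, not those of any real DBMS.

```python
# Sketch: a tiny rule-based auto-tuner. The knobs, thresholds, and workload statistics
# below are hypothetical -- real systems expose very different counters and settings.

workload_stats = {
    "buffer_hit_ratio": 0.72,      # fraction of page requests served from memory
    "avg_concurrent_queries": 40,
    "seq_scan_fraction": 0.65,     # share of scans that are full sequential scans
}

def recommend_knobs(stats):
    """Apply simple rules to observed statistics and return proposed knob settings."""
    knobs = {}
    if stats["buffer_hit_ratio"] < 0.90:
        knobs["buffer_pool_mb"] = 4096        # rule: low hit ratio -> grow the buffer pool
    if stats["avg_concurrent_queries"] > 32:
        knobs["worker_threads"] = 64          # rule: high concurrency -> more workers
    if stats["seq_scan_fraction"] > 0.5:
        knobs["run_index_advisor"] = True     # rule: many sequential scans -> check indexes
    return knobs

print(recommend_knobs(workload_stats))
```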
Privacy
Security systems: revitalize data-oriented security research
Specify the purpose of the data request
Access decisions should be based on:
Who is requesting the data
To what use it will be put
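A minimal sketch of an access decision that depends on both who is asking and the declared purpose of the request; the roles, purposes, and policy table are invented.

```python
# Sketch: purpose-based access control -- the decision depends on the requester's role
# AND the declared purpose of the request. Roles, purposes, and policy are invented.

policy = {
    # (role, purpose) -> set of attributes that may be disclosed
    ("physician", "treatment"): {"name", "diagnosis", "medication"},
    ("researcher", "study"):    {"diagnosis"},          # de-identified use only
    ("marketing", "campaign"):  set(),                  # nothing may be disclosed
}

def allowed_attributes(role, purpose, requested):
    """Return only the requested attributes that the policy permits for this role/purpose."""
    permitted = policy.get((role, purpose), set())
    return set(requested) & permitted

print(allowed_attributes("researcher", "study", ["name", "diagnosis"]))   # -> {'diagnosis'}
print(allowed_attributes("marketing", "campaign", ["name"]))              # -> set()
```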
Trustworthy Systems
Trustworthy systems
Safely store data
Protect data from unauthorized disclosure
Protect data from loss
Make it always available to authorized users
Ensure the correctness of query results and data-intensive computations
Digital rights management
Protect intellectual property rights
Allow private conversation
New User Interfaces
How best to render data visually?
During the 1980s we got QBE and VisiCalc
Since then, nothing…
Need new, better ideas in this area
Query languages
SQL and XQuery are not for end users
Possible choices?
Keyword-based queries (Information-Retrieval community)
Browsing (increasingly popular)
Ontology + speech on natural language (Semantic Web + NLP)
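To illustrate the keyword-based direction, here is a deliberately naive sketch that turns free keywords into LIKE predicates over the text columns of a hypothetical table, so the end user never writes SQL or XQuery. The table, columns, and translation rule are invented.

```python
# Sketch: a naive keyword-to-SQL translation -- each keyword becomes a LIKE predicate
# over the text columns of a (hypothetical) table.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE talks (speaker TEXT, title TEXT)")
con.executemany("INSERT INTO talks VALUES (?, ?)", [
    ("Gray", "Evolution of Data Management"),
    ("Codd", "A Relational Model of Data"),
])

def keyword_query(keywords, columns=("speaker", "title")):
    """Translate free keywords into a conjunction of per-keyword LIKE disjunctions."""
    clauses, params = [], []
    for kw in keywords:
        ors = " OR ".join(f"{col} LIKE ?" for col in columns)
        clauses.append(f"({ors})")
        params.extend([f"%{kw}%"] * len(columns))
    sql = "SELECT speaker, title FROM talks WHERE " + " AND ".join(clauses)
    return con.execute(sql, params).fetchall()

print(keyword_query(["relational", "Codd"]))   # -> [('Codd', 'A Relational Model of Data')]
```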
One-Hundred-Year Storage
Archived information is disappearing:
Captured on a deteriorating medium
Captured on a medium requiring obsolete devices
The application that can interpret the information no longer works
A DBMS can:
Keep content accessible in a useful form
Automate the process of migrating content between formats
Maintain the hardware and software that each document needs
Manage the metadata along with the stored document
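A sketch of the "automate migration between formats" point: each archived document carries its metadata, and a migration step rewrites the payload into a newer format while the metadata travels with it. The formats and the converter registry are invented.

```python
# Sketch: migrating archived content to a newer format while keeping its metadata.
# The formats and the converter registry are invented for illustration.

archive = [
    {"id": 1, "format": "wordstar", "payload": b"...",
     "metadata": {"created": "1986-03-01", "author": "census office"}},
]

# Registry of converters: old format -> (new format, conversion function).
converters = {
    "wordstar": ("pdf-a", lambda payload: b"%PDF-A " + payload),   # stand-in conversion
}

def migrate(doc):
    """Rewrite the payload into a current format; metadata travels with the document."""
    old_format = doc["format"]
    if old_format in converters:
        new_format, convert = converters[old_format]
        doc = {**doc,
               "format": new_format,
               "payload": convert(doc["payload"]),
               "metadata": {**doc["metadata"], "migrated_from": old_format}}
    return doc

archive = [migrate(d) for d in archive]
print(archive[0]["format"], archive[0]["metadata"])
```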
Query Optimization
Optimization of information integrators
For semi-structured query languages, e.g., XQuery
For stream processors
For sensor networks
Inter-query optimization involving large numbers of queries
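For the inter-query optimization point, the sketch below groups a hypothetical batch of similar aggregate queries by their shared predicate so the common scan and aggregation are done once, which is the basic intuition behind multi-query or shared evaluation.

```python
# Sketch: inter-query optimization over many similar queries -- evaluate the shared
# scan/aggregate once and fan the result out to the individual queries. Data is invented.
from collections import defaultdict

rows = [{"region": "east", "sales": 120}, {"region": "west", "sales": 80},
        {"region": "east", "sales": 50}]

# Many dashboard queries of the form "SUM(sales) WHERE region = X"; several share a region.
queries = [("q1", "east"), ("q2", "west"), ("q3", "east")]

# Group queries by their shared predicate so each region is scanned/aggregated once.
by_region = defaultdict(list)
for qid, region in queries:
    by_region[region].append(qid)

answers = {}
for region, qids in by_region.items():
    total = sum(r["sales"] for r in rows if r["region"] == region)   # done once per group
    for qid in qids:
        answers[qid] = total

print(answers)   # -> {'q1': 170, 'q3': 170, 'q2': 80}
```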
Next Steps
A test bed for information-integration research
Revisit the solved problems → sea changes
Avoid drawing too narrow a box around what we do → explore opportunities for combining database and related technologies
Department of Computer Science, National Tsing Hua University
Thank You.
Any Questions?
Reference
Jim Gray. "Evolution of Data Management." IEEE Computer, v29 n10 (October 1996): 38-46.
The Lowell Database Research Self-Assessment Report: http://www.research.microsoft.com/~gray/lowell/