Performance Tuning

Download Report

Transcript Performance Tuning

Advanced Database Techniques
Martin.Kersten @ cwi.nl
Stefan [email protected]
Sandor Heman @ cwi.nl
Jennie Zhang @ cwi.nl
Romulo Goncalves @cwi.nl
Administrative details
• The website evolves as during the course
• Exam material is marked explicitly
• Lab work deadlines are strict
• Email is the preferred way to communicate
• Tomorrow the assistants will be available in
person between 11:00-12:00, room REC-P.123
Relational systems
• A database system should simplify the
organization, validation, sharing, and bookkeeping
of information
• Prerequisite knowledge
– Relational data model and algebra
– Data structures (B-tree, hash)
– Operating system concepts
– Using a SQL database system
• What is your practical experience?
[Ruby on Rails expertise needed]
Applications
• Bread-and-butter applications?
– Web-shop
– Banking systems
– Inventory systems
– Production systems
– Shopping systems
– Government systems
– Health systems
– Multimedia systems
– Science systems …
Advanced Applications
• Bread-and-butter applications ???
– Banking systems
• What happens if you install a stock trading system
which should handle >100K transactions/minute
• How to derive trading advice using compute
intensive applications
• How to warn thousands of users about their trading
opportunity
– …. Need for parallel, distributed main-memory
database technology…
Advanced application requirements
• Bread-and-butter applications
– Inventory applications
• How to install a battlefield inventory systems
• How to deliver goods just in time?
• How to keep track of moving objects/persons ?
• … need for sensor-based database support and
RFID tags … need for a new DBMS ?…
Advanced Applications
• Production systems
– How to interact with component suppliers
– How to manage the production workflow
– How to avoid bad production steps
– How to maintain a database with 12000 tables
(SAP)
• … need for interoperability between autonomous
systems… datamining and knowledge discovery…
Advanced Applications
• Health information systems
– How to monitor your health over 30 years
– How to enable quick response to a heart attack
• …need for interoperable database systems …
The Ambient Home
HELP
The Ambient Home
HELP
911
called
MonetDB DataCell
MonetDB DataCell
A Shared Tuple Space
using an SQL DBMS
nucleus
911 called
Advanced applications
• Government systems
– Security
• Biometric data management issues, finger/image
matching
– Public safety
• Forensics, manipulate complex objects using
proprietary algorithms
• …need for extensible database technology…need
to support unstructured data…
Advanced Applications
• Science systems
– The new accelerator in CERN
• how to handle >1PTByte files
– The Sloan Digital Skyserver schema is 200
pages and the catalogued data 2.5Tb
• How to query this efficiently
– ..need for P2P and … a novel way to organize
data…
LOFAR central processor specs
• Streaming Data
– Input: 320 Gbit/s
– Internally within correlator: 20 Tbit/s
– Into storage: 25 Gbit/s = 250 TByte/day
– Final products: 1-3 TByte/day
• High Performance Computing
– Correlation: 15 Tflops
– Pre processing and filtering: 5 Tflops
– Off-line processing (calibration, analysis): 5-10 Tflops
– Visualisation, control, scheduling etc: 2 Tflops
• Storage
– On-line temporal storage: 500 TByte
– Archive: PByte range of data stored in Grid
Technological challenges
• Data is often not structured as tables
– XML and XQuery
• Data does not always fit on one system
– Distributed and parallel databases
• Querying is more like world-wide searching
– Continuous and streaming queries
• A database tells more than facts
– Datamining and knowledge discovery
Code bases
• Database management systems are BIG software
systems
– Oracle, SQL-server, DB2 >1 M lines
– PostgreSQL 300K lines
– MySQL 500 K lines
– MonetDB 200-800 K lines
– SQLite 40K lines
• Programmer teams for DBMS kernels range from
a few to a few hundred
Performance components
•
•
•
•
•
Hardware platform
Data structures
Algebraic optimizer
SQL parser
Application code
– What is the total cost of execution ?
– How many tasks can be performed/minute ?
– How good is the optimizer?
– What is the overhead of the datastructures ?
Not all are equal
Gaining insight
• Study the code base (inspection + profiling)
– Often not accessible outside development lab
• Study individual techniques (data structures +
simulation)
– Focus of most PhD research in DBMS
• Detailed knowledge becomes available, but ignores
the total cost of execution.
• Study as a functional black box
– Analyse a small application framework
The Jack The Ripper Project
• Study the snippet of the database technology and
design an XQuery and SQL application
• What is the schema?
• What are the queries?
• What are unorthodox solutions?
Learning points
• My poor knowledge on relational database? Read
the chapters on SQL and relational algebra.
Knowledge on data structures comes in handy.
• Database systems are much more than
administrative bookkeeping systems
Learning points
– Advanced application challenge the technology
provided by a DBMS
– Many techniques do not easily scale in size,
complexity, functionality
– Effectiveness of a DBMS is determined by
many tightly interlocked components