Open Source Database Rises to the Challenge
Britt Johnston
CTO, NuSphere

Agenda
Rise of Open Source Databases
Traditional and Open Source Licensing
Properties of Successful Community
MySQL Gemini Project
Future Trends
© 2001, NuSphere

Terminology
OSDB – Open Source Database
OSI – Open Source Initiative
OSD – Open Source Definition
FSF – Free Software Foundation
GPL – General Public License

On Relational Databases…
“Relational databases can handle no more
than one hundred megabytes of data and
maybe ten users.”
– Database Product Manager, Digital Equipment Corporation, 1985

Change is Constant
Relational databases
– Early debate on viability
– Clumsy positioning with existing products
– Tidal wave of acceptance

OSDB is in a similar position
– Reluctantly admit usefulness, but
– “Commercial databases are required for backend functions.”

Rapid Evolution!
Top 3 Selling Database Books
1. SQL for Dummies
2. MySQL
3. Oracle 8i Reference
Source: B&N top sellers, 10/28/00

References On the Web (Google searches, millions)
Database      10/28/00   01/31/01   Change
Oracle        3.0        3.4        +13%
MySQL         2.3        2.9        +26%
PostgreSQL    0.7        0.8        +14%
SQL Server    0.6        0.6
DB2           0.5        0.5
Interbase     0.1        0.1

2000 –
First Boxed Commercial Distributions
– September 2000

First Open Source Database Summit
– October 2000

Four companies dedicated to Open Source Database products were launched

2001 –
140,000 MySQL Books sell in 12 months

Oracle: MySQL to Oracle Migration Kit

MySQL Track at Major Conferences

NuSphere Delivers Gemini Beta

Agenda
Rise of Open Source Databases
Open Source Licensing
Properties of Successful Community
MySQL Gemini Project
Future Trends

The Open Source License
Nine Major Requirements:
1. Free redistribution – cannot require royalty
2. Source code – must allow source as well as binary distribution
3. Derived works – allow distribution of changes
4. Integrity of author’s code – may require changes to ship as patch files
5. No discrimination against persons or groups
6. No discrimination against fields of endeavor
7. License distributed – no new license required
8. License not product-specific – extracted code ok
9. No contamination of other software on same medium
Source: Open Source Definition (Bruce Perens), www.opensource.org

MySQL License
Prior to June 2000
– Not an open source license

June 2000 onward
– All releases under GPL

On Open Source…
“We recommend that products near the end of
their life go open source.”
– Gartner Group Analyst, October 2000

Agenda
Rise of Open Source Databases
Traditional and Open Source Licensing
Properties of Successful Community
MySQL Gemini Project
Future Trends

It’s Not Only About Technology
Feature Wars
– Oracle vs. Microsoft: competition not based on user need
– Feature bloat will have long-term impact

Most important: fast, easy to use, integrated, with a clean programming model

Signs of a Healthy Community
Mix of Church and State membership
World-wide contributor community
Active development process
Rich collection of interfaces
Support from other products
Full service offerings
Clear license terms

Modular Architecture
Key to a scalable community
Drives rapid evolution
Allows large contributions from multiple sources

MySQL Modular Architecture
Table Handler – specialized storage for individual tables
– MyISAM – High-speed, read-mostly
– Heap – In-memory tables
– Gemini – Large-scale OLTP: row-level locking, transactions, recovery
– More under development
Diagram: MySQL server with MyISAM, Heap, and Gemini table handlers
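The table handler idea can be sketched as a small plugin interface. This is a minimal Python illustration under stated assumptions: `TableHandler`, `InMemoryHandler`, `AppendOnlyHandler`, `Server`, and the engine names `MEMORY` and `APPEND` are all hypothetical and are not MySQL's actual C++ handler API. They only show how a SQL layer can stay unchanged while the storage engine is chosen per table.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Set, Tuple

Row = Tuple[object, ...]  # a row is just a tuple of column values in this sketch


class TableHandler(ABC):
    """Hypothetical storage-engine interface; the SQL layer only calls these."""

    @abstractmethod
    def write_row(self, row: Row) -> int:
        """Store a row and return its row id."""

    @abstractmethod
    def delete_row(self, row_id: int) -> None:
        """Remove the row with the given id."""

    @abstractmethod
    def scan(self) -> Iterator[Tuple[int, Row]]:
        """Yield (row_id, row) pairs in row-id order."""


class InMemoryHandler(TableHandler):
    """Heap-like engine (hypothetical): rows live in a dict and vanish at shutdown."""

    def __init__(self) -> None:
        self._rows: Dict[int, Row] = {}
        self._next_id = 0

    def write_row(self, row: Row) -> int:
        row_id = self._next_id
        self._rows[row_id] = row
        self._next_id += 1
        return row_id

    def delete_row(self, row_id: int) -> None:
        del self._rows[row_id]

    def scan(self) -> Iterator[Tuple[int, Row]]:
        yield from sorted(self._rows.items())


class AppendOnlyHandler(TableHandler):
    """Read-mostly engine (hypothetical): appends rows, marks deletes with tombstones."""

    def __init__(self) -> None:
        self._rows: List[Row] = []
        self._deleted: Set[int] = set()

    def write_row(self, row: Row) -> int:
        self._rows.append(row)
        return len(self._rows) - 1

    def delete_row(self, row_id: int) -> None:
        self._deleted.add(row_id)

    def scan(self) -> Iterator[Tuple[int, Row]]:
        for row_id, row in enumerate(self._rows):
            if row_id not in self._deleted:
                yield row_id, row


class Server:
    """Toy SQL layer: picks an engine per table, then talks only to the interface."""

    ENGINES = {"MEMORY": InMemoryHandler, "APPEND": AppendOnlyHandler}

    def __init__(self) -> None:
        self.tables: Dict[str, TableHandler] = {}

    def create_table(self, name: str, engine: str) -> None:
        self.tables[name] = self.ENGINES[engine.upper()]()

    def insert(self, name: str, row: Row) -> int:
        return self.tables[name].write_row(row)

    def delete(self, name: str, row_id: int) -> None:
        self.tables[name].delete_row(row_id)

    def select_all(self, name: str) -> List[Row]:
        return [row for _, row in self.tables[name].scan()]
```

Adding an engine means writing one more subclass and registering it in `ENGINES`; nothing in `Server` changes, which is the property the modular architecture slides point to.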
Agenda
Rise of Open Source Databases
Traditional and Open Source Licensing
Properties of Successful Community
MySQL Gemini Project
Future Trends

Gemini – Row-Level Locking
Gemini is NuSphere’s contribution to the MySQL project. Design targets:
Multi-threaded engine
Supports 10,000 concurrent transactions
Sustains 1 billion transactions per day (tpd) on a single server
Small footprint for PC-class hardware
SMP support for large systems
Concurrent operations on parallel threads
Familiar standard SQL programming model used by commercial applications today
– Table and row-level locking
– Standard isolation levels
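A quick sanity check on the design targets: a daily rate converts to an average per-second rate by dividing by 86,400 seconds, and Little's law (L = λW) relates the concurrency target to how long each transaction may stay in flight. This sketch assumes a steady, evenly spread load and that both targets are met at the same time; the slides state neither. The function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: float) -> float:
    """Average rate per second implied by a per-day total (assumes an even spread)."""
    return per_day / SECONDS_PER_DAY


def mean_time_in_system(concurrency: float, rate_per_second: float) -> float:
    """Little's law, W = L / lambda: average time each transaction spends in flight."""
    return concurrency / rate_per_second


tps = per_second(1_000_000_000)          # design target: 1 billion tpd
w = mean_time_in_system(10_000, tps)     # design target: 10,000 concurrent transactions
print(f"{tps:,.0f} tps on average")      # 11,574 tps on average
print(f"{w:.2f} s per transaction")      # 0.86 s per transaction
```

Under those assumptions, 1 billion tpd averages about 11,574 transactions per second, and keeping 10,000 transactions in flight at that rate means each one spends about 0.86 s in the system on average. Real traffic is bursty, so peak rates would sit above this average.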
Gemini – Historical Roots
Progress RDBMS                   MySQL
Language Processor and Server    Language Processor and Server
Storage Engine                   Gemini Engine

Gemini – Historical Roots
Progress RDBMS: source of the technology
– #6 Relational DB Worldwide
– #1 Embedded DB Worldwide
– Source: IDC Worldwide Database Market, May 2000
– Proven Performance and Reliability
– Technology is the base for Gemini: Recovery, Locking, B-Tree, Concurrency, Cache, I/O and Transaction mechanisms

Important Design Factors
Gemini is designed to be:
– Database Schema Independent
– Record Format Independent
– Index Key Format Independent
– Server Architecture Independent
The Gemini API closely matches the MySQL Table Handler API

Gemini Properties
Targeted squarely at OLTP model
– Heavy concurrent update
– Online maintenance operations

Expands open source web platform
– Backend database for e-commerce sites
– Reliability with 10+ years of “experience”
– Proven enterprise-class technology

Gemini Properties
Multi-threaded storage manager
– Concurrent read and write operations
– Concurrent commit support
– Fine-grained locking of internal structures
– Online Recovery from failed threads
Scalable database cache
– Dynamic data and index cache size
– 128 GB capacity (RAM limited)
– LRU mechanism with index page priority
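"LRU mechanism with index page priority" is not spelled out further on the slide. One plausible reading, sketched here as an assumption rather than Gemini's actual policy, keeps separate LRU lists for data and index pages and evicts data pages first, so hot index pages stay cached. `IndexPriorityLRU` and its methods are hypothetical names.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class IndexPriorityLRU:
    """Hypothetical page cache: LRU order within each page class, and on overflow
    the least recently used *data* page is evicted before any index page."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.data: "OrderedDict[Hashable, bytes]" = OrderedDict()   # oldest first
        self.index: "OrderedDict[Hashable, bytes]" = OrderedDict()  # oldest first

    def _pool(self, is_index: bool) -> "OrderedDict[Hashable, bytes]":
        return self.index if is_index else self.data

    def get(self, page_id: Hashable, is_index: bool) -> Optional[bytes]:
        pool = self._pool(is_index)
        if page_id not in pool:
            return None                 # miss: the caller would read the page from disk
        pool.move_to_end(page_id)       # mark as most recently used
        return pool[page_id]

    def put(self, page_id: Hashable, page: bytes, is_index: bool) -> None:
        pool = self._pool(is_index)
        pool[page_id] = page
        pool.move_to_end(page_id)
        while len(self.data) + len(self.index) > self.capacity:
            oldest_data = next(iter(self.data)) if self.data else None
            just_inserted = (not is_index) and oldest_data == page_id
            # Prefer evicting a data page, but never the data page just inserted.
            if self.data and not just_inserted:
                self.data.popitem(last=False)
            else:
                self.index.popitem(last=False)
```

Favoring index pages is attractive because one index page is consulted on the way to many data pages; a real cache might weight the two classes rather than apply this strict preference.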
Gemini Properties – Transactions
Support for ACID Transactions
– Atomicity – All or nothing
– Consistency – Data moves from one consistent state to another
– Isolation – Concurrent transactions run independently
– Durability – Committed effects persist

Gemini Properties – Transactions
Support for 4 standard isolation levels
– Read uncommitted
– Read committed
– Repeatable read
– Serializable
Table and row lock support
– Six-mode lock manager (intent lock support)
– Two-phase locking with automatic lock acquisition
– Delegated delete locks
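The slide names a six-mode lock manager with intent locks and two-phase locking but does not list the modes. The sketch below uses the classic five multiple-granularity modes (IS, IX, S, SIX, X) from Gray et al. as a stand-in; Gemini's sixth mode is not identified in the source and is left out. `LockManager` and `LockConflict` are hypothetical: the manager raises instead of making callers wait, and holds every lock until `release_all`, which is strict two-phase locking.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set

# Classic multiple-granularity compatibility matrix (Gray et al.), not Gemini's own:
# COMPATIBLE[held][requested] is True if another transaction may be granted
# `requested` while `held` is in place.
MODES = ("IS", "IX", "S", "SIX", "X")
_MATRIX = {
    "IS":  (True,  True,  True,  True,  False),
    "IX":  (True,  True,  False, False, False),
    "S":   (True,  False, True,  False, False),
    "SIX": (True,  False, False, False, False),
    "X":   (False, False, False, False, False),
}
COMPATIBLE = {held: dict(zip(MODES, row)) for held, row in _MATRIX.items()}


class LockConflict(Exception):
    """Raised instead of waiting; a real lock manager would block the caller."""


class LockManager:
    """Hypothetical strict two-phase lock manager over a table -> row hierarchy."""

    def __init__(self) -> None:
        # resource -> {transaction id -> modes it holds on that resource}
        self.held: Dict[Hashable, Dict[str, Set[str]]] = defaultdict(dict)

    def _acquire(self, txn: str, resource: Hashable, mode: str) -> None:
        for other, modes in self.held[resource].items():
            if other != txn and not all(COMPATIBLE[m][mode] for m in modes):
                raise LockConflict(f"{txn}: {mode} on {resource} blocked by {other}")
        self.held[resource].setdefault(txn, set()).add(mode)

    def lock_table(self, txn: str, table: str, mode: str) -> None:
        self._acquire(txn, ("table", table), mode)

    def lock_row(self, txn: str, table: str, row: int, exclusive: bool) -> None:
        # Automatic acquisition: the matching table-level intent lock is taken first.
        self._acquire(txn, ("table", table), "IX" if exclusive else "IS")
        self._acquire(txn, ("row", table, row), "X" if exclusive else "S")

    def release_all(self, txn: str) -> None:
        # Shrinking phase, done all at once at commit or rollback (strict 2PL).
        for holders in self.held.values():
            holders.pop(txn, None)
```

Because a row lock first takes an intent lock on its table, two writers on different rows coexist (IX with IX), while a request for a shared lock on the whole table conflicts with them at the table level without examining any rows.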
Transaction Isolation Levels

Levels are described in terms of the anomalies they permit:
– Dirty read – reading data written by a concurrent, uncommitted transaction
– Non-repeatable read – re-reading previously read data and finding it modified by another committed transaction
– Phantom read – re-running the same query and seeing additional rows inserted by another committed transaction

Transaction Isolation Levels
Level              Dirty Read   Non-Repeatable Read   Phantom Read
Read uncommitted   Yes          Yes                   Yes
Read committed     No           Yes                   Yes
Repeatable read    No           No                    Yes
Serializable       No           No                    No
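To make the isolation-level table concrete, the sketch below replays a dirty read and a non-repeatable read against a toy key-value store. It models only what each level lets a reader observe for single-key reads; it does not model phantoms (which need range queries) or Serializable, and it provides the Repeatable Read guarantee by remembering first-read values rather than by Gemini's locking. `ToyStore` and its methods are hypothetical.

```python
from typing import Dict, Optional

LEVELS = ("READ UNCOMMITTED", "READ COMMITTED", "REPEATABLE READ")


class ToyStore:
    """Hypothetical store showing what a reader may observe at each level.

    Models the guarantees, not Gemini's mechanism: a locking engine would make
    the conflicting writer (or reader) wait instead."""

    def __init__(self, initial: Dict[str, int]) -> None:
        self.committed: Dict[str, int] = dict(initial)
        self.level: Dict[str, str] = {}                       # txn -> isolation level
        self.pending: Dict[str, Dict[str, int]] = {}          # txn -> uncommitted writes
        self.seen: Dict[str, Dict[str, Optional[int]]] = {}   # txn -> first values read

    def begin(self, txn: str, level: str = "READ COMMITTED") -> None:
        if level not in LEVELS:
            raise ValueError(f"unsupported level: {level}")
        self.level[txn], self.pending[txn], self.seen[txn] = level, {}, {}

    def write(self, txn: str, key: str, value: int) -> None:
        self.pending[txn][key] = value

    def commit(self, txn: str) -> None:
        self.committed.update(self.pending.pop(txn))
        del self.level[txn], self.seen[txn]

    def read(self, txn: str, key: str) -> Optional[int]:
        if key in self.pending[txn]:              # a transaction sees its own writes
            return self.pending[txn][key]
        level = self.level[txn]
        if level == "REPEATABLE READ" and key in self.seen[txn]:
            return self.seen[txn][key]            # re-reads return the first value
        value = self.committed.get(key)
        if level == "READ UNCOMMITTED":           # dirty reads allowed
            for other, writes in self.pending.items():
                if other != txn and key in writes:
                    value = writes[key]
        self.seen[txn].setdefault(key, value)
        return value


store = ToyStore({"balance": 100})
store.begin("W")
store.begin("RU", "READ UNCOMMITTED")
store.begin("RC", "READ COMMITTED")
store.begin("RR", "REPEATABLE READ")
first = {t: store.read(t, "balance") for t in ("RU", "RC", "RR")}  # all 100
store.write("W", "balance", 50)
dirty = store.read("RU", "balance")      # 50: sees the uncommitted write
clean = store.read("RC", "balance")      # 100: committed data only
store.commit("W")
again_rc = store.read("RC", "balance")   # 50: non-repeatable read
again_rr = store.read("RR", "balance")   # 100: first value repeated
```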
Durable Transactions
Diagram: Recovery Log (• Previous Record • Transaction Notes • Update Notes); DB on Disk; Memory Copy (• Updated Record)

Rollback and Recovery
Diagram: Recovery Log (• Previous Record • Transaction Notes • Update Notes); DB on Disk; Memory Copy (• Updated Record)
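The durability and rollback diagrams can be read as an undo/redo log: for every update the log keeps the previous record (undo information) and the new value (redo information), plus transaction notes such as commits. The sketch below is a deliberately simplified scheme under stated assumptions, not Gemini's recovery algorithm and not ARIES: the log is assumed to survive a crash, and strict two-phase locking is assumed, so no two active transactions update the same record. `MiniEngine` and `LogRecord` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LogRecord:
    txn: str
    kind: str                     # "update", "commit" or "abort"
    key: Optional[str] = None
    before: Optional[str] = None  # previous record (undo info); None = row did not exist
    after: Optional[str] = None   # updated record (redo info)


def _restore(target: Dict[str, str], key: str, value: Optional[str]) -> None:
    """Put back a previous record, deleting the row if it did not exist before."""
    if value is None:
        target.pop(key, None)
    else:
        target[key] = value


class MiniEngine:
    """Hypothetical engine with a DB on disk, a memory copy, and a recovery log."""

    def __init__(self) -> None:
        self.disk: Dict[str, str] = {}     # "DB on disk"
        self.memory: Dict[str, str] = {}   # "memory copy": every live row, maybe newer than disk
        self.log: List[LogRecord] = []     # "recovery log", assumed to survive a crash

    def update(self, txn: str, key: str, value: str) -> None:
        # Write-ahead rule: log the previous record before changing the memory copy.
        self.log.append(LogRecord(txn, "update", key, self.memory.get(key), value))
        self.memory[key] = value

    def commit(self, txn: str) -> None:
        # The commit note in the log is what makes the transaction durable.
        self.log.append(LogRecord(txn, "commit"))

    def rollback(self, txn: str) -> None:
        # Undo this transaction's updates newest-first from the previous records.
        for rec in reversed(self.log):
            if rec.txn == txn and rec.kind == "update":
                _restore(self.memory, rec.key, rec.before)
        self.log.append(LogRecord(txn, "abort"))

    def checkpoint(self) -> None:
        # Write the memory copy to disk; it may contain uncommitted changes.
        self.disk = dict(self.memory)

    def crash_and_recover(self) -> None:
        self.memory = {}  # a crash loses the memory copy
        committed = {r.txn for r in self.log if r.kind == "commit"}
        updates = [r for r in self.log if r.kind == "update"]
        for rec in reversed(updates):   # undo unfinished or aborted work, newest first
            if rec.txn not in committed:
                _restore(self.disk, rec.key, rec.before)
        for rec in updates:             # then redo committed work, oldest first
            if rec.txn in committed:
                self.disk[rec.key] = rec.after
        self.memory = dict(self.disk)


engine = MiniEngine()
engine.update("T1", "acct", "100")
engine.commit("T1")
engine.update("T2", "acct", "50")   # T2 never commits
engine.checkpoint()                 # the uncommitted "50" reaches disk
engine.crash_and_recover()
print(engine.disk)                  # {'acct': '100'}
```

In the example, an uncommitted update reaches disk through a checkpoint and the server then crashes; recovery undoes it from the previous record and redoes the committed work, leaving only committed data.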
Scalability

What is scalability, and why does it matter?
– Increase workload or hardware:
• More work gets done (concurrency)
• Work gets done faster (response time)
– Improve throughput with additional resources
Bottlenecks can be removed
– Minimal impact on response time when scaling

Scalability
Architectural limits raised as high as possible
– Goal: storage engine does not impose limits.
– Available hardware and OS are limiting factors.
– Example – Concurrent Users:
• Demonstration: 5,000 users
• Published limit: 10,000
• Tested limit: 32,000
• Architecture limit: 4 billion

Scalability
A system may be limited by:
– Number of disks and controllers
– Number of open files, OS kernel
– Memory
Not by the underlying database

Multi-Processor Support
Several flavors of spin locks
Can directly use 32-CPU SMP hardware
Alpha, IA32, IA64, PA-RISC, Power, Sparc
Work with hardware vendors to tune chip-specific resource locking primitives
– Non-blocking, no unneeded system calls
– Account for CPU cache characteristics
– Instruction and data fence requirements

Gemini Crash Recovery
Automated recovery and logging
– Log is created at system startup
– Space is reused automatically
– Log is optionally removed at shutdown

Asynchronous checkpoint support
– Stable performance under load
– Self-tuning multi-threaded I/O subsystem

24 X 7 Availability
Powerful High Availability solution:
– Automatic Crash Recovery
• Online in 15 to 30 seconds
– Failover Clusters
• Eliminate single points of failure
– Flexible Backup Solutions
• Online Backup – non-blocking
• Split-Mirror, Zero-Impact Backup
– Table and Site Replication
• Support Server Farm Model

Gemini – Ease of Use
Familiar storage model used by MySQL
MySQL native record format
Specify the table type to use Gemini tables:
CREATE TABLE … TYPE = GEMINI;
Programming model
– Table and row-level locking
– Statement atomicity
– Multi-statement ACID transactions
– Standard isolation levels
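The slides do not show Gemini's transaction statements beyond the table-type line. As a stand-in, this sketch uses Python's built-in `sqlite3` module (SQLite, not MySQL or Gemini) to show what multi-statement ACID transactions buy an application: either every statement in the unit takes effect or none does. The `accounts` table and the `transfer` function are hypothetical.

```python
import sqlite3

# Stand-in: SQLite via Python's sqlite3, not MySQL/Gemini; names are hypothetical.


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move money between accounts as one atomic, multi-statement transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # also undoes the debit above
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])

transfer(conn, "alice", "bob", 30)        # succeeds: both updates commit together
try:
    transfer(conn, "alice", "bob", 500)   # fails: the debit is rolled back too
except ValueError:
    pass
print(dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name")))
# {'alice': 70, 'bob': 30}
```

Without the transaction, the failed transfer would leave Alice debited and Bob not credited; grouping the statements is what keeps the two balances consistent.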
Gemini – Current Status
Beginning Formal Beta
Check-in via MySQL community process
Basic functions complete (insert, update, delete, select) for all types
Active reliability testing
– The same tests can run against multiple table types

Benchmark work started
– Target: High concurrency, heavy update
– Goal: Fastest engine for transaction processing

Agenda
Rise of Open Source Databases
Traditional and Open Source Licensing
Properties of Successful Community
MySQL Gemini Project
Future Trends

It’s All In The Community!
A traditional software product rarely has a sustaining community outside the company’s own employees.

Look for existing communities and create or integrate missing technology.

Contributors using community process?

Corporate IT Is Catching On
The response to OSDB is no longer to challenge its viability.

IT managers know they can get support and packaged distributions.

OSDB is a proven solution for a wide range of web infrastructure.

Gemini changes the rules for OLTP systems built with open source software.

Corporate IT Is Catching On
New projects are looking at OSDB for significant aspects of a solution.

Initial acquisition costs are no longer a factor; IT can choose the service level per application, based on business needs.

IT will increasingly go with an OSDB solution.

Questions…
www.nusphere.com