World’s Largest OLTP Systems

World’s Largest Databases
Howard Fosdick
(630)-279-4286
(C) 2004 FCI
Who Am I?
Hands-on DBA (and SA) for …
• Oracle, DB2, SQL Server
• Unix, Linux, Windows
• Founder IDUG, MWDUG, CAMP
• Author, Speaker
Independent Contractor
(630)-279-4286
Outline
1. What’s a “Big Database”
2. DSS
3. OLTP
4. Observations
Statistics Sources
1. Winter Corp.
-- Database Top Ten
-- Yearly survey
-- Vendor neutral
-- Free at: www.wintercorp.com
2. Survey.com
-- High-End BI/DW Competitive Analysis
-- Survey of 150 companies w/ big warehouses
-- Free at: www.survey.com
“Thank You” to both sources
Classifying Large Databases
DSS
Decision Support Systems (DSS)
Online Analytical Processing (OLAP)
Data Warehouses (DW)
Multi-dimensional Databases (MDD)
+ Query oriented, mainly Read-only
OLTP
Online Transaction Processing (OLTP)
+ Update with short transactions
(transaction = small CPU & data resources)
Commercial IT vs. Scientific/Research databases
What’s a Large Database ?
Database Size
- User data
- User data plus metadata & indexes
- DASD farm
VLDB = Very Large Database
Users
- Concurrent users
- Total user population
Load
- Concurrent queries
- Queries / day or hour
(simple vs complex queries)
Good definitions and measurements are key to success
II. World’s Biggest DSS Systems
Data Warehouses VS. Data Marts
DW
• Application neutral
• Service multiple organizational needs
DM
• Application specific
• Organizationally focused
Largest systems are usually data warehouses
What’s Driving the Growth of Large Data Warehouses ?
Web Sites -- Clickstream data
Retail -- Transaction Level Detail (TLD)

!!!!! Super Big Groceries !!!!!
Preferred Customer Card #283736
Hello, I’m Scot94
03/04/04 02:38 3284 03 2918 33
Store 493 Loc 229
PRETTY-LADY HAIRCLR    1    5.99
AARP MAGAZINE          1    4.95
DIAPERS                2   10.00
BEER SIX-PACK          1    3.45
Tax                         2.40
BAL                        36.79
Cash                       40.00
Change                      3.21
Save this Receipt –
Get $2.00 off on Prozac
When You Buy Super-Baby Food !

Understanding customer behavior means $$$ !
What’s Driving the Growth of Large Data Warehouses ?
Necessary Preconditions --
• Cheap Hardware
• Higher reliability / availability
(based on dynamic hardware swapping)
• Better Software
• Lax privacy laws in USA
• EU curtails cross-usage of data
• EU has stronger privacy laws
World’s Largest DSS Systems
© 2003 Winter Corp.
• Way bigger than just 3 years ago
• All Unix “mainframes”
• All use SANs (Storage Area Networks) (aka ESS)
• No IBM Mainframes
• No Windows or Wintel
• No SQL Server
• No Linux or Open Source databases
• NCR/Teradata niche market at 2.7% (Gartner 05/28/03)
• Goodbye Informix!
Database Size = disk storage for user tables, indices, aggregates
Large DSS Systems
Diagram: Query Users, Unix “mainframe”, Storage Area Network
Unix “mainframe”: Sun E12/15K, HP Superdome, IBM Regatta
Storage Area Network: EMC, Hitachi, HP, LSI
Unix “mainframes” –
+ Dynamically add/drop CPUs, RAM
(Sun calls it partitioning)
+ High reliability
(as good as clusters or Mainframes)
+ Capacity on Demand
SANs –
+ Flash (“snap”) backup
(OS-level backup)
+ Large Cache
+ Intelligent data placement/movement
Example Evolution – Scaling a Unix “Mainframe”
Concurrent users: 12, then 25, then 35
Configuration: 8 CPUs @ 16 Gig RAM, then 32 CPUs @ 64 Gig RAM, then 64 CPUs @ 64 Gig RAM
Other upgrades:
Oracle 8i -> 9i
Sun E10K -> E12K
World’s Largest DSS Systems -- Windows
© 2003 Winter Corp.
• Way smaller than Unix systems
• Way bigger than just 3 years ago
• Oracle vs SQL Server (like market share battle for Windows DBMSs)
• Also use SANs (Storage Area Networks)
• No IBM DB2 UDB
• No Teradata
World’s Largest DSS Systems -- By Peak Workload
© 2003 Winter Corp.
Where did IBM Mainframes Go ?
1994: Big Iron
2004: Big Silicon
Poof!
-- Goodbye…
-- Largest databases
-- Smaller mainframes (VM, VSE)
-- Reliability advantage eroded
-- High cost per CPU
+ Hello Linux !
+ Good for --
+ Consolidation platform
+ Legacy systems
+ Virtualization
(multi-OS platform)
Oracle Rising
• Joined the Top Ten list 3 to 5 years ago
• 8i added essential DSS technologies ...
+ Partitions
+ New ROW ID (for bigger databases)
+ Thorough Parallelism (DML, DDL, utilities)
+ Index improvements
(bit mapped IXs, function-based, desc, others)
+ Resource Manager (proactive)
+ Materialized Views
+ Large memory mgmt
+ Optimizer is Partition-aware
+ Online DDL operations and Utilities
Example Oracle Warehouses
© 2003 Winter Corp.

                    Amazon              Best Buy                          Colgate               Telecom Italia Mobile
System              HP Superdome        Sun 15K                           IBM p690 Regatta      HP AlphaServer
Architecture        SMP                 SMP                               SMP                   Cluster
Storage             EMC                 EMC                               IBM                   EMC
Processors          64                  24                                24                    2 node cluster
Oracle Version      9i                  8i                                9i                    8i
DB Size             13 T                6.3 T                             3.8 T                 16 T
Number of Tables    600                 4025                              27,000                1,200
Detail Data         Clickstream data    Sales Transaction data            Varied detail data    Call detail records
User Population     800                 16,000                            6,200                 400
Concurrent Users    55-60               600-700                           600-700               55
DBAs                2                   2                                 n/a                   3
Peak Workload       4300 queries / day  150,000 queries / 4 hour period   14,200 steps / day    700 M records loaded / day
Why Not Oracle Clustering ?
+ Great for non-disruptive scaling of existing systems
. . . But the biggest systems tend not to use it
-- Unix “mainframe” no longer requires clustering
for reliability, availability or easy scalability
-- Clustering means complexity in minimizing the…
-- Locking issues
9i improved this via Cache Fusion –
but SMP Unix “mainframe” will still be favored
Where’s SQL Server 2000 ?
• Big in OLTP but lacks essential DSS technologies ...
-- Parallelism restricted to SELECTs
-- Needs it for other DML, DDL, utilities
-- Partitions
-- Wintel restriction
Yukon ?
-- Many new features. . . ready for “Top Ten” DSS ?
(Features = partitioning, database mirroring, mirrored backups, online Indexing & Restore, fast recovery,
ANSI 1999 T-SQL, CLR support, native XML, XML Query, better .NET support,
Reporting Services, Service Broker (async messaging), extensible data types…)
Where’s Open Source ?
Linux
+ 2.6 kernel now out
+ More CPUs (to 16)
+ More RAM (> 4+ Gig)
+ Better threading, file system support
MySQL and PostgreSQL
-- Top out at 500,000 page views per day (EWeek 2003)
(or 15 per second)
+ Improving rapidly
Prediction – open source will support big databases
but not “Top Ten” list sites
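
As a quick check on the EWeek figure above, the sketch below converts a daily page-view count into an average per-second rate. It assumes views are spread evenly over 24 hours, which real sites are not; the function names are illustrative only. Under that assumption 500,000 views per day averages a little under 6 per second, so the 15 per second quoted on the slide presumably describes a peak rather than the daily average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate_per_second(views_per_day: float) -> float:
    """Average per-second rate, ASSUMING an even spread over the whole day."""
    return views_per_day / SECONDS_PER_DAY


def peak_to_average_ratio(views_per_day: float, peak_per_second: float) -> float:
    """How many times higher a quoted peak rate is than the daily average."""
    return peak_per_second / average_rate_per_second(views_per_day)


if __name__ == "__main__":
    average = average_rate_per_second(500_000)
    ratio = peak_to_average_ratio(500_000, 15)
    print(f"average: {average:.2f} page views / second")
    print(f"quoted 15 / second is {ratio:.2f}x the average")
```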
Risks of Large DWs
• 40% of IT projects fail due to … Management (time & budget issues)
• “Large warehouses are unforgiving” -- Survey.com
• Design issues critical
• Database Design
• Query design (and EXPLAINs)
• ETL design and scheduling
• Pre-program wherever possible
(control users and the resources they use)
• Monitoring and alerts
• Scale gradually (staggered loads on a schedule…)
• Benchmarks (after each Scaling Point)
Risks of Large DWs
• Partitioning data properly is critical
• For better physical management (utilities)
• Optimizers use this info
• Parallelism via multiple partitions
• How to partition
• Depends on data usage
• Examples: geographical, hash, unique id, ranges…
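
Two of the partitioning schemes named above, hash and range, can be sketched in a few lines. This is a minimal illustration of how a row might be assigned to a partition, not how any particular DBMS implements it; the function names, the CRC32 hash, and the quarterly date bounds are all assumptions made for the example.

```python
import bisect
import zlib
from typing import Sequence


def hash_partition(key: str, num_partitions: int) -> int:
    """Hash partitioning: a stable hash of the key, modulo the partition count.

    CRC32 is used only because it is deterministic across runs; it is an
    illustrative choice, not what a real DBMS uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def range_partition(value: int, upper_bounds: Sequence[int]) -> int:
    """Range partitioning: partition i holds values below upper_bounds[i]
    (and at or above the previous bound). Values at or above the last bound
    fall into one extra overflow partition, so there are
    len(upper_bounds) + 1 partitions in total. Bounds must be sorted.
    """
    return bisect.bisect_right(upper_bounds, value)


if __name__ == "__main__":
    # Hypothetical data: a customer card number and a sale date as YYYYMMDD.
    print("hash partition:", hash_partition("283736", 8))
    quarterly_bounds = [20040401, 20040701, 20041001]  # assumed boundaries
    print("range partition:", range_partition(20040304, quarterly_bounds))
```

Hash partitioning spreads rows evenly, which helps parallelism; range partitioning keeps related rows together, which helps utilities and partition-aware optimizers. Which one fits depends on data usage, as the slide says.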
III. World’s Biggest OLTP Systems
World’s Largest OLTP Systems
© 2003 Winter Corp.
• Wintel “mainframes” arrive !
• SQL Server arrives
• Use SANs
• CA can do the job (but has tiny overall database market share)
• Oracle has big systems -- but not in the top ten
World’s Largest OLTP Systems
-- Unix
-- Windows
© 2003 Winter Corp.
World’s Largest OLTP Systems
-- By Number of Rows
© 2003 Winter Corp.
OLTP Observations
• Wintel “mainframes” w/ SQL Server displace MVS/CICS
• SQL Server dominates Wintel OLTP
• Great for pre-programmed, resource-limited txns
• Oracle dominates Unix OLTP
IV. Observations
Architectures
Shared-disk Clusters
Shared-nothing (Massively Parallel Processing or MPP)
Large SMP “mainframe”
The “architectural debate” means far less than it used to !
Vendor Architectures
Product:            Architecture:                    Implementation:
DB2 UDB for z/OS    Shared-disk clustering           DB2 Data Sharing on Sysplex
DB2 UDB for LUW     Shared nothing                   DB2 UDB ESE partitioning feature
Oracle              Shared-disk clustering or SMP    Real Application Clusters (RAC) -- previously known as Oracle Parallel Server (OPS)
SQL Server 2000     Shared nothing or SMP            Customer-developed partitioning based on SQL Server features
Teradata            Shared nothing                   Teradata on NCR MPP
DBMS Licensing Costs
+ Low-cost SQL Server supports the biggest OLTP systems
-- Pressure on Teradata to keep its niche
+ Open Source DBMSs have a role but it’s not “Top Ten” databases
Cost scale (from the diagram), $$$$$ (high) to $ (low):
$$$$$  Teradata
       Oracle, DB2 UDB                    -- Biggest DSS Systems
       SQL Server 2000                    -- Biggest OLTP Systems
$      Open Source (MySQL, PostgreSQL)
Database pricing varies by the options selected and by the deal an IT organization cuts with the vendor.
TCO ? Your mileage may vary!
DW Labor Costs
© 2002 Survey.com
Like TCO, Labor Costs may be an un-measurable …
• Figures applicable across sites ?
• Every vendor claims lowest labor costs
• “Terabytes per DBA” may be non-linear!
• 1 or 2 DBAs for a 24/7 site ?
• Development staff will be larger than Maintenance staff
• Your mileage will vary
Multi-Machine Mixed Systems
Sabre / Travelocity (EWeek, 2/23/04)
45 Linux w/ MySQL servers (Fare look-up and routing)
17 Himalaya Non-stop w/ Master database (Transactional updates)
Multi-Machine Mixed Systems
Omaha Steaks (EWeek 2003)
* 50,000 to 68,000 daily sessions
* 1 year in Production / 8 Million sessions
17 Linux w/ MySQL servers (Shopping cart)
iSeries DB2 (Transactional updates)
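
The pattern in both examples, many inexpensive servers for read-mostly work in front of one master for transactional updates, can be sketched as a simple router. The slides do not say how either site actually routes requests or keeps the servers in sync, so everything here is an assumption for illustration: the class name, the round-robin policy, and the server names are hypothetical.

```python
import itertools
from typing import Sequence


class ReadWriteRouter:
    """Send reads round-robin to a pool of read servers; send writes to the master.

    Illustrative only: real deployments also need replication, health checks
    and failover, none of which is modelled here.
    """

    def __init__(self, master: str, read_servers: Sequence[str]) -> None:
        if not read_servers:
            raise ValueError("at least one read server is required")
        self.master = master
        self._read_cycle = itertools.cycle(read_servers)

    def route(self, operation: str) -> str:
        """Return the name of the server that should handle the operation."""
        if operation == "read":
            return next(self._read_cycle)
        if operation == "write":
            return self.master
        raise ValueError(f"unknown operation: {operation!r}")


if __name__ == "__main__":
    # Hypothetical names modelled on the 17-server shopping-cart tier above.
    pool = [f"mysql-{n}" for n in range(1, 18)]
    router = ReadWriteRouter(master="master-db", read_servers=pool)
    print(router.route("read"))   # first server in the read pool
    print(router.route("write"))  # always the master
```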
Conclusions
• Databases are growing exponentially
• IT is closing in on Scientific/Research databases
• “Multiple machine” mixed systems are becoming popular
(Monolithic central databases are no longer the only game in town)
• “Mixed use” databases are becoming more common
• Multiple applications
• Read and update
• Open Source supports large systems -- but not “Top Ten”
• VLDBs are instructive – but unique in some ways
Questions ?