Here> Oracle Database 11g for Data Warehousing
Download
Report
Transcript Here> Oracle Database 11g for Data Warehousing
<Insert Picture Here>
Oracle Database 11g for Data Warehousing
Presenter’s Name
Presenter’s Title
Agenda
• Technology
• Monitoring
• Information Life-cycle Management (ILM)
• Oracle Optimized Warehouse Initiative
• Market
<Insert Picture Here>
<Insert Picture Here>
Technology
Parallel Execution
select c.cust_last_name
, sum(s.amount_sold)
from customers c, sales s
where c.cust_id = s.cust_id
group by c.cust_last_name ;
Data on Disk
Parallel Servers
scan
join
aggregate
scan
join
aggregate
scan
join
aggregate
Scanners
Joiners
Aggregators
Coordinator
Partitioning – Benefits
Large Table
Partition
Composite Partition
Difficult to Manage
Divide and Conquer
Better Performance
Easier to Manage
More flexibility to match
business needs
Improve Performance
Transparent to applications
Partitioning in Oracle Database 11g
Interval Partitioning
• Partitions are created automatically as data arrives
Partitioning in Oracle Database 11g
Complete Composite Partitioning
•
•
•
•
Range – range
List – list
List – hash
List – range
RANGE-RANGE
Order Date by
Order Value
LIST-RANGE
Region by
Order Value
LIST-LIST
Region by
Customer Type
Partitioning in Oracle Database 11g
Reference Partitioning
• Inherit partitioning strategy
Partition
ORDERS
by Date
Partitioning in Oracle Database 11g
Virtual Column-Based Partitioning
ORDERS
ORDER_ID
---------9834-US-14
8300-EU-97
3886-EU-02
2566-US-94
3699-US-63
ORDER_DATE CUSTOMER_ID...
----------- ----------- -12-JAN-2007
65920
14-FEB-2007
39654
16-JAN-2007
4529
19-JAN-2007
15327
02-FEB-2007
18733
• REGION
REGION AS (SUBSTR(ORDER_ID,6,2))
-----US
EU
EU
US
US
requires no storage
• Partition by ORDER_DATE, REGION
Compression
• Tables and indexes can be compressed
• Can be specified on a per-partition basis
• Typical compression ratio 3:1
• Requires more CPU to load data
• Decompression hardly costs resources
• Compress for all DML operations
• Less data on disk
• Requires less time to read
• Completely transparent
Up To
3X
Compression
SQL Query Result Cache
• Store query results in cache
• Repetitive executions can use cached result
• Data Warehouse queries
• Long-running, IO-intensive
• Expensive computations
• Return few rows
• Excellent opportunity for SQL Query Result Cache
-----------------------------------------------------------------| Id | Operation
| Name
|
-----------------------------------------------------------------|
0 | SELECT STATEMENT
|
|
|
1 | RESULT CACHE
| fz6cm4jbpcwh48wcyk60m7qypu |
|
2 |
SORT GROUP BY ROLLUP
|
|
|* 3 |
HASH JOIN
|
|
etc.
SQL Query Result Cache
Opportunity
• Retail customer data (~50 GB)
• Concurrent users submitting queries randomly
• Executive dashboard with 12 heavy analytical queries
• Cache results only at in-line view level
• 12 queries run in random, different order – 4 queries cached
• Measure average, total response time for all users
# Users
No cache
Cache
Improvement
2
186 s
141 s
24%
4
267 s
201 s
25%
8
447 s
334 s
25%
Other Performance Features
Transparent to Your Application
• Materialized Views
• Transparent rewrites of expensive queries
• Including rewrites on remote objects
• Incremental automatic refresh
• Bitmap Indexes
• Optimal storage
• Ideal for star or star look-a-like schemas
• SQL Access Advisor – based on workload
• Materialized view advice
• Index advice
• Partition advice
Bring Algorithms to the Data
Not Data to the Algorithms
• Analytic computations
done in the database
SQL analytics
OLAP
•
•
•
•
SQL Analytics
OLAP
Data Mining
Statistics
• Scalability
• Security
Data Mining
Statistics
• Backup & Recovery
• Simplicity
Native Support for Pivot and Unpivot
SALESREP
Q1
Q2
Q3
Q4
---------- ----- ----- ----- ----100
230
240
260
300
101
200
220
250
260
102
260
280
265
310
SALESREP
---------100
100
100
100
101
101
101
101
102
102
102
102
QU
REVENUE
-- ---------Q1
230
Q2
240
Q3
260
Q4
300
Q1
200
Q2
220
Q3
250
Q4
260
Q1
260
Q2
280
Q3
265
Q4
310
Native Support for Pivot and Unpivot
QUARTERLY_SALES
SALESREP
Q1
Q2
Q3
Q4
---------- ----- ----- ----- ----100
230
240
260
300
101
200
220
250
260
102
260
280
265
310
SALESREP
---------100
100
100
100
101
101
101
101
102
102
102
102
QU
REVENUE
-- ---------Q1
230
Q2
240
Q3
260
Q4
300
Q1
200
Q2
220
Q3
250
Q4
260
Q1
260
Q2
280
Q3
265
Q4
310
select * from quarterly_sales
unpivot include nulls
(revenue for quarter in (q1,q2,q3,q4))
order by salesrep, quarter ;
Native Support for Pivot and Unpivot
SALES_BY_QUARTER
SALESREP 'Q1' 'Q2' 'Q3' 'Q4'
---------- ----- ----- ----- ----100
230
240
260
300
101
200
220
250
260
102
260
280
265
310
SALESREP
---------100
100
100
100
100
100
100
101
101
101
101
102
QU
REVENUE
-- ---------Q1
230
Q2
240
Q3
160
Q4
90
Q3
100
Q4
140
Q4
70
Q1
200
Q2
220
Q3
250
Q4
260
Q1
260
select * from sales_by_quarter
pivot (sum(revenue)
for quarter in ('Q1','Q2','Q3','Q4'))
order by salesrep ;
Transform Data Where Data Resides
In-database ETL technology
Extract
Load
Transform
Insert
Data Pump
Transportable Tablespaces
Partition Exchange Loading
Change Data Capture
Distributed Queries
SQL*Loader
External Tables
Table Functions
Multi-Table Insert
MERGE
DML error logging
Asynchronous Change Data Capture
Capture
PMOPs
Log
files
Time-based subscription
windows
Oracle Database 11g
Change
Data Read-consistent
subscription
Log Miner
and
Streams
DW
Tables
Transform
SQL, PL/SQL,
Java
• Capture changes from [redo | archive] logs
• No changes to source applications
• Minimal performance impact on source applications
OLTP
DB
• Store changes in change tables
• Provide (bulk) SQL interface to change data
RAC – Scale Incrementally
W 300%
o
r 200%
k
l 100%
o
d
3
6
9
12 15
Months
18
21 24
Automatic Storage Management
• Storage pool for database files
• Load-balanced across disks
• Capacity on demand
• Add/remove storage on-line
• Automatic IO load balancing
• Fault tolerant, high performance
• Automatically mirrors and stripes
• Low cost
• No IO tuning required
• No volume manager or file system needed
Mixed Workloads
report
Budget table
update
update
accurate
report
• Concurrent small data loads
and queries
• Looks like... OLTP
• Oracle's read consistency
• Readers never block writers
Rollback
• Writers never block readers
Segment
Before • Queries are always consistent
and auditable
Image
• No deadlocks
• Introduced in Oracle V4 (1982)
– major improvements in V6
(1988)
Database Resource Manager
• Protect the system pro-actively
• Maximum number of concurrent operations
• Priority-dependent maximum Degree Of Parallelism (DOP)
Sales Analysis
High Priority
Ad Hoc Reports Medium Priority
ETL Jobs
Low Priority
20 users
(DOP 10)
200 users
(DOP 4)
200 users
(DOP 4)
Oracle Database Security
Authenticate
Protect
data in
transit
Authorize
Marketing
Finance
Sales
Identity Management
Access
Control
Protect
stored
data
Audit
Feature Usage for Large-Scale Data
Warehouses
Partitioning
Parallel Exec
Compression
MV Use
VPD
Read Only
ASM
RMAN
DB Res Mgr
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Partitioning, parallelism, and compression are the
foundation for large-scale data warehousing
Source: TB Club Report: A survey of 30 multi-TB Oracle DW’s – data July 2006
<Insert Picture Here>
Monitoring
I/O Monitoring
Database Control
I/O Monitoring
Database Control
Parallel Execution Monitoring
Database Control
Near Real-Time SQL Monitoring
Coming in Grid Control
Parallel SQL Monitoring
Coming in Grid Control
<Insert Picture Here>
Information Life-cycle
Management (ILM)
Information Lifecycle Management
“The policies, processes, practices, and tools used to align the
business value of information with the most appropriate and
cost effective IT infrastructure from the time information is
conceived through its final disposition.”
Storage Networking Industry Association (SNIA) Data Management Forum
Active
Data
Less Active
Data
Historical
Data
Information Lifecycle Management
Orders
Q1
Orders
Active
High Performance
Storage Tier
Q2
Orders
Q3
Orders
Less Active
Low Cost
Storage Tier
Q4
Orders
Older
Orders
Historical
Online Archive
Storage Tier
Traditional Storage Approach
All data resides on a single storage tier
High Performance
Storage Tier
= $72 per Gb
Active
Active
All data on active
= $972,000!
Partitioning is the Foundation for ILM
Partition data onto appropriate storage tier
High Performance
Storage Tier
= $72 per Gb
Active
Low cost
Storage Tier
= $14 per Gb
Less Active
Read only
Storage Tier
= $7 per Gb
Historical
Partitioning is the Foundation for ILM
Move data onto appropriate storage tier
High Performance
Storage Tier
= $72 per Gb
5% Active
Low cost
Storage Tier
= $14 per Gb
35% Less Active
Read only
Storage Tier
= $7 per Gb
60% Historical
Partitioning is the Foundation for ILM
Reduce storage costs accordingly
High Performance
Storage Tier
= $72 per Gb
5% Active
$49,800
Low cost
Storage Tier
= $14 per Gb
35% Less Active
$67,700
Read only
Storage Tier
= $7 per Gb
60% Historical
$58,000
Introduce Compression
Reduce storage costs across all tiers
5% Active
$49,800
35% Less Active
$67,700
60% Historical
$58,000
Lets use compression factor of 3
$16,600
$22,600
$19,400
Cost Savings by Storage Tier
<Insert Picture Here>
Oracle Optimized
Warehouse Initiative
Oracle Optimized Warehouse Initiative
Goals for Oracle data warehouse solutions:
• Provide superior system performance
• Provide a superior customer experience
Full Range of DW Solution Options
Custom
• Flexibility for the most
demanding data warehouse
• Benefits:
Optimized
Warehouse
Reference
Configuration
• Documented best-practice
configurations for data
warehousing
• Benefits:
High performance
• Scalable systems preinstalled and preconfigured: ready to run
out-of-the-box
• Benefits:
High performance
Unlimited scalability
Simple to scale;
modular building blocks
High performance
Completely
customizable
Industry-leading
database and hardware
Industry-leading
database and hardware
Fast to implement
Easy to maintain
Available today with HP,
IBM, Sun, EMC/Dell
• Database
Options
• Management
Packs
Simple to buy
• Database
Options
• Management
Packs
Competitively priced
• Partitioning
• RAC
Pre-configured, Pre-installed, Validated
Flexibility
<Insert Picture Here>
Market
Data Warehouse Market
Oracle is the Data Warehousing DBMS Market Leader
11.4%
Oracle
10.1%
39.8%
IBM
Microsoft
Teradata
16.0%
Other
22.7%
Source: IDC, 2006 - Worldwide Data Warehousing Tools 2005 Vendor Shares
Leading Scalability
Wintercorp VLDB Survey
1998 Survey
Sears
HCIA
Wal-Mart
Tele Danmark
Citicorp
MCI
NDC Health
Sprint
Ford
Acxiom
Teradata
Informix
Teradata
DB2
DB2
Informix
Oracle
Teradata
Oracle
Oracle
2003 Survey
4.63
4.50
4.42
2.84
2.47
1.88
1.85
1.30
1.20
1.13
France Telecom
AT&T
SBC
Anonymous
Amazon.com
Kmart
Claria
HIRA
FedEx
Vodafone Gmbh
Source: http://www.wintercorp.com
Oracle
Proprietary
Teradata
DB2
Oracle
Teradata
Oracle
Sybase IQ
Teradata
Teradata
2005 Survey
29.23
26.27
24.81
16.19
13.00
12.59
12.10
11.94
9.98
9.91
Yahoo!
AT&T
KT-IT Group
AT&T
LGR - Cingular
Amazon.com
Anonymous
UPSS
Amazon.com
Nielsen Media
Oracle
100.39
Daytona
93.88
DB2
49.40
Daytona
26.71
Oracle
25.20
Oracle
24.77
DB2
19.65
Microsoft 19.47
Oracle
18.56
Sybase IQ 17.69
Oracle DW 10+TB Customers (3/2006)
Various Platforms and Architectures
•
•
•
•
•
•
•
•
•
•
•
Acxiom
Allstate
Amazon
Cellcom
CenturyTel
Chase
Choicepoint
Claria
Experian
KTF
Cingular
16 TB
15 TB
61 TB
14 TB
10 TB
30 TB
14 TB
38 TB
14 TB
14 TB
25 TB
HP
Sun (RAC)
HP (RAC)
HP
HP
IBM (RAC)
Sun
Sun
Sun
HP
HP
•
•
•
•
•
•
•
•
•
•
•
Mastercard
NASDAQ
NexTel
NYSE Group
Reliance Ltd
Starwood
TIM (Italy)
Turkcell
UBS AG
UPS
Yahoo!
20 TB
35 TB
28 TB
15 TB
13 TB
12 TB
12 TB
14 TB
15 TB
10 TB
130 TB
Hundreds of Terabyte+ DW Customers!
IBM (RAC)
Sun
HP
HP (RAC)
Sun
HP
HP (RAC)
Sun (RAC)
Sun
HP
Fujitsu
Summary
• Technology
• Monitoring
• Information Life-cycle Management (ILM)
• Oracle Optimized Warehouse Initiative
• Market
<Insert Picture Here>
For More Information
http://search.oracle.com
BI & Data Warehousing
or
oracle.com