Session id: #36568
High Performance Communication for Oracle using InfiniBand
Ross Schibler, CTO, Topspin Communications, Inc.
Peter Ogilvie, Principal Member of Technical Staff, Oracle Corporation
Session Topics
Why the Interest in InfiniBand Clusters
InfiniBand Technical Primer
Performance
Oracle 10g InfiniBand Support
Implementation details
Why the Interest in InfiniBand
InfiniBand support is a key new feature in Oracle 10g
Enhances price/performance and scalability; simplifies systems
InfiniBand fits the broad movement toward lower costs
Horizontal scalability, converged networks, system virtualization...grid
Initial DB performance & scalability data is superb
Network tests are done; application-level benchmarks are now in progress
InfiniBand is a widely supported standard, available today
Oracle, Dell, HP, IBM, Network Appliance, Sun, and ~100 others are involved
Tight alliance between Oracle and Topspin enables IB for 10g
Integrated & tested; delivers the complete Oracle “wish list” for high speed interconnects
System Transition Presents Opportunity
[Chart: Server revenue mix (share of revenues, 0-18%) by price band, from $0-2.9K through $3M+, for 1996, 2001, and 2002, segmented into Entry, Mid, and High-End systems. Source: IDC Server Tracker, 12/2002]
Major shift to standard systems; blade impact not even factored in yet
Customer benefits from scaling horizontally across standard systems
Lower up-front costs, granular scalability, high availability
The Near Future
[Chart: Projected server revenue mix (share of revenues) by price band, from $0-2.9K through $3M+. Scale-out workloads (web services, enterprise apps, database clusters & grids) concentrate in the lower price bands, while legacy & big iron apps remain in the scale-up, high-end bands]
Market Splits around Scale-Up vs. Scale-Out
Database grids provide the foundation for scale-out
InfiniBand switched computing interconnects are a critical enabler
Traditional RAC Cluster
[Diagram: application servers connect to the Oracle RAC nodes over Gigabit Ethernet; the RAC nodes connect to shared storage over Fibre Channel]
Three Pain Points
[Diagram: the same topology, with the three bottlenecks called out]
Scalability within the database tier is limited by interconnect latency, bandwidth, and overhead (Gigabit Ethernet cluster interconnect)
Throughput between the application tier and the database tier is limited by interconnect bandwidth and overhead
I/O requirements are driven by the number of servers instead of application performance requirements (Fibre Channel to shared storage)
Clustering with Topspin InfiniBand
Removes all Three Bottlenecks
[Diagram: application servers, Oracle RAC nodes, and shared storage all attached to the InfiniBand fabric]
InfiniBand provides a 10 Gigabit, low latency interconnect for the cluster
The application tier can run over InfiniBand, benefiting from the same high throughput and low latency as the cluster
Central server-to-storage I/O scalability through the InfiniBand switch removes I/O bottlenecks to storage and provides smoother scalability
Example Cluster with Converged I/O
Fibre Channel to InfiniBand gateway for storage access
Two 2Gbps Fibre Channel ports per gateway
Creates a 10Gbps virtual storage pipe to each server
Ethernet to InfiniBand gateway for LAN access
Four Gigabit Ethernet ports per gateway
Creates a virtual Ethernet pipe to each server
InfiniBand switches for the cluster interconnect
Twelve 10Gbps InfiniBand ports per switch card
Up to 72 total ports with optional modules
Single fat pipe to each server for all network traffic
[Diagram: industry standard servers attached to the InfiniBand switches, with gateways out to the industry standard network and industry standard storage]
Topspin InfiniBand Cluster Solution
Cluster Interconnect with Gateways for I/O Virtualization
Family of switches
Ethernet or Fibre Channel gateway modules
Host Channel Adapter with upper layer protocols
Protocols: uDAPL, SDP, SRP, IPoIB
Platform support
Linux: Red Hat, Red Hat AS, SuSE
Solaris: S10
Windows: Win2k & 2003
Processors: Xeon, Itanium, Opteron
Integrated system and subnet management
InfiniBand Primer
InfiniBand is a new technology used to interconnect servers, storage and networks together within the datacenter
Runs over copper cables (<17m) or fiber optics (<10km)
Scalable interconnect:
1X = 2.5Gb/s
4X = 10Gb/s
12X = 30Gb/s
[Diagram: each server contains CPUs, a memory controller, system memory and an HCA; servers attach over IB links to a switched fabric of hosts, with a subnet manager (SM) managing the fabric]
InfiniBand Nomenclature
HCA – Host Channel Adapter
TCA – Target Channel Adapter
SM – Subnet Manager
[Diagram: a server (CPU, memory controller, system memory) connects through its HCA over an IB link to a Topspin 360/90 switch running the SM; TCAs in the switch bridge to an Ethernet link (network) and an FC link (storage)]
Kernel Bypass
Kernel Bypass Model
[Diagram: in the traditional path, the application in user space goes through the sockets layer, the TCP/IP transport, and the driver in the kernel before reaching the hardware; with kernel bypass, the application uses uDAPL or async sockets/SDP to post work directly to the hardware]
Copy on Receive
[Diagram: the NIC DMAs incoming data into an OS buffer in system memory, and a CPU then copies it into the application buffer]
Data traverses the bus 3 times
With RDMA and OS Bypass
[Diagram: the HCA DMAs incoming data directly into the application buffer in system memory]
Data traverses the bus once, saving CPU and memory cycles (see the sketch below)
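To make the data-path difference concrete, the following is a minimal C sketch that merely simulates the two receive paths with memcpy; the buffer names and the 16KB block size are invented for illustration, and no real NIC, HCA, Topspin, or Oracle interface is involved.

/* Toy simulation of the two receive paths described above; it models only the
 * data movement, not a real NIC or HCA. */
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE (16 * 1024)

static char wire[BLOCK_SIZE];      /* data arriving from the interconnect */
static char os_buf[BLOCK_SIZE];    /* kernel-owned receive buffer         */
static char app_buf[BLOCK_SIZE];   /* Oracle's buffer (e.g. buffer cache) */

/* Copy on receive: the block lands in the OS buffer first, then the CPU
 * copies it into the application buffer, adding extra bus and CPU work. */
static void copy_on_receive(void)
{
    memcpy(os_buf, wire, BLOCK_SIZE);     /* placement into the OS buffer */
    memcpy(app_buf, os_buf, BLOCK_SIZE);  /* CPU copy into the app buffer */
}

/* RDMA with OS bypass: the block is placed straight into the registered
 * application buffer; no intermediate buffer, no CPU copy. */
static void rdma_receive(void)
{
    memcpy(app_buf, wire, BLOCK_SIZE);    /* single placement into the app buffer */
}

int main(void)
{
    memset(wire, 0xAB, sizeof wire);
    copy_on_receive();
    rdma_receive();
    printf("copy path: two copies; RDMA path: one placement\n");
    return 0;
}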
APIs and Performance
[Diagram: API stacks over the fabric. The application can use BSD Sockets over TCP/IP, BSD Sockets over SDP, the async I/O sockets extension over SDP, or uDAPL over RDMA; IPoIB carries IP traffic over InfiniBand. Measured throughput ranges from 0.8 Gb/s (TCP/IP over 1GE) and 1.2 Gb/s (IPoIB) up to 3.2 Gb/s (SDP) and 6.4 Gb/s (async sockets over SDP, and uDAPL) on 10G IB]
Why SDP for OracleNet & uDAPL for RAC?
RAC IPC
Message based
Latency sensitive
Mixture of previous APIs
→ use of uDAPL
OracleNet
Streams based
Bandwidth intensive
Previously written to sockets
→ use of the Sockets Direct Protocol (SDP) API (see the sketch below)
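The following sketch shows the kind of plain BSD-sockets code Oracle Net was already written to, and why SDP is attractive: because SDP preserves sockets semantics, such code can ride over InfiniBand with little or no change. The AF_INET_SDP constant, the address, and the port are illustrative assumptions about how an SDP-capable stack might be wired in, not Oracle's or Topspin's actual configuration.

/* Plain BSD-sockets client. With SDP, code like this can run over IB either
 * via a preload shim that redirects AF_INET, or by swapping in an SDP address
 * family. AF_INET_SDP below is an assumed value, not a Topspin constant. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef AF_INET_SDP
#define AF_INET_SDP 27   /* assumption: one common convention for SDP sockets */
#endif

int main(void)
{
    /* Swap AF_INET_SDP for AF_INET and the rest of the code is untouched. */
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(1521);                        /* Oracle listener port  */
    inet_pton(AF_INET, "192.168.0.10", &addr.sin_addr);   /* example address only  */

    if (connect(s, (struct sockaddr *)&addr, sizeof addr) == 0)
        write(s, "hello", 5);                             /* ordinary stream write */
    close(s);
    return 0;
}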
InfiniBand Cluster Performance Benefits
Network Level Cluster Performance for Oracle RAC
[Chart: 16KB block transfers/sec, 0-30,000, for 2-node and 4-node clusters, InfiniBand vs. GigE. Source: Oracle Corporation and Topspin on dual Xeon processor nodes]
InfiniBand delivers 2-3X higher block transfers/sec compared to GigE
InfiniBand Application to Database Performance Benefits
[Chart: CPU utilization and throughput, in percent (0-250), for InfiniBand vs. GigE. Source: Oracle Corporation and Topspin]
InfiniBand delivers 30-40% lower CPU utilization and 100% higher throughput compared to Gigabit Ethernet
Broad Scope of InfiniBand Benefits
[Diagram: a RAC cluster on an InfiniBand fabric, with OracleNet over SDP over IB from the application servers, IPC over uDAPL over IB within the RAC, an FC gateway (host/LUN mapping) to the SAN, an Ethernet gateway to the network, DAFS over IB to NAS, and a sniffer server attached for monitoring/analysis]
Application tier: 20% improvement in throughput; 2x improvement in throughput and 45% less CPU
Oracle RAC: 3-4x improvement in block updates/sec
Shared storage: 30% improvement in DB performance
uDAPL Optimization Timeline
Software stack: workload → database → Cache Fusion LM → skgxp → uDAPL → CM → IB HW/FW
April-August 2003: gathering OAST and industry standard workload performance metrics; fine tuning and optimization at the skgxp, uDAPL and IB layers
Feb 2003: cache block updates show a fourfold performance improvement in a 4-node RAC
Jan 2003: added Topspin CM for improved scaling of the number of connections and reduced setup times
Dec 2002: Oracle interconnect performance released, showing improvements in bandwidth (3x), latency (10x) and CPU reduction (3x)
Sept 2002: uDAPL functional with 6Gb/s throughput
RAC Cluster Communication
High speed communication is key
It must be faster to fetch a block from a remote cache than to read the block from disk
Scalability is a function of communication CPU overhead
Two primary Oracle consumers
Lock manager / Oracle buffer cache
Inter-instance parallel query communication
SKGXP, Oracle's IPC driver interface (see the sketch after this list)
Oracle is coded to skgxp
skgxp is coded to vendor high performance interfaces
IB support is delivered as a shared library, libskgxp10.so
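Since IB support ships as a shared library, the sketch below shows the generic runtime-loading mechanism with dlopen()/dlsym(). The mechanism is standard; the skgxp_init symbol name is a hypothetical placeholder, as the real entry points inside libskgxp10.so are Oracle-internal and not documented in this talk.

/* Sketch of picking up an interconnect-specific IPC driver at runtime. */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    void *drv = dlopen("libskgxp10.so", RTLD_NOW);
    if (!drv) {
        fprintf(stderr, "falling back to the default IPC driver: %s\n", dlerror());
        return 1;
    }

    /* Hypothetical entry point; the real skgxp symbols are Oracle-internal. */
    int (*ipc_init)(void) = (int (*)(void))dlsym(drv, "skgxp_init");
    if (ipc_init)
        ipc_init();

    dlclose(drv);
    return 0;
}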
Cache Fusion Communication
[Diagram: a shadow process sends a lock request to the LMS process on the remote instance; the LMS uses RDMA to move the block from its cache directly into the requester's cache, and the shadow process returns the result to the client]
Parallel Query Communication
[Diagram: PX servers on each instance exchange message data directly with one another and return results to the client]
Cluster Interconnect Wish List
OS bypass (user mode communication)
Protocol offload
Efficient asynchronous communication model
RDMA with high bandwidth and low latency
Huge memory registrations for Oracle buffer caches
Support for a large number of processes in an instance
Commodity hardware
Software interfaces based on open standards
Cross platform availability
InfiniBand is the first interconnect to meet all of these requirements
Asynchronous Communication
Benefits
Reduces the impact of latency
Improves robustness by avoiding communication deadlock
Increases bandwidth utilization
Drawback
Historically costly, as synchronous operations are broken into separate submit and reap operations (see the sketch below)
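As an analogy for the submit/reap split (using POSIX AIO on an ordinary file, not the InfiniBand path itself), the sketch below submits a write, overlaps other work, and reaps the completion later; the file name and payload are arbitrary.

/* Submit/reap overlap illustrated with POSIX AIO; Oracle's skgxp/uDAPL path
 * posts to the HCA instead, but the programming pattern is the same. */
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buf[] = "block payload\n";
    int fd = open("/tmp/aio_demo", O_CREAT | O_WRONLY | O_TRUNC, 0600);
    if (fd < 0) { perror("open"); return 1; }

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof buf - 1;

    if (aio_write(&cb) != 0) { perror("aio_write"); close(fd); return 1; }  /* submit */

    /* ... overlap: do useful computation while the transfer is in flight ... */

    while (aio_error(&cb) == EINPROGRESS)   /* reap: poll for completion       */
        ;                                   /* (or block in aio_suspend())     */

    printf("completed, %zd bytes\n", aio_return(&cb));
    close(fd);
    return 0;
}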
Protocol Offload & OS Bypass
Bypass makes submit cheap
Requests are queued directly to the hardware from Oracle
Offload
Completions move from the hardware to Oracle's memory
Oracle can overlap communication and computation without a trap to the OS or a context switch (see the completion-queue sketch below)
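A minimal sketch of the idea that completions land directly in Oracle's memory: the process polls a completion queue that the offload engine would normally fill by DMA, with no system call or context switch. The structure layout, queue depth, and the hand-marked entry that stands in for the HCA's write are all invented for illustration.

/* Hypothetical completion queue in user memory -- not Topspin's real format. */
#include <stdint.h>
#include <stdio.h>

struct completion {
    volatile uint32_t valid;   /* set by the offload engine when a request finishes */
    uint32_t          status;
    uint64_t          work_id;
};

#define CQ_DEPTH 256
static struct completion cq[CQ_DEPTH];   /* would be registered with the HCA at startup */

/* Reap the next completion: a plain memory poll, no system call. A real
 * implementation would also handle wraparound and arm an interrupt before
 * sleeping when the queue stays empty. */
static struct completion *poll_cq(unsigned *head)
{
    struct completion *c = &cq[*head % CQ_DEPTH];
    if (!c->valid)
        return NULL;           /* nothing finished yet: keep computing */
    c->valid = 0;
    (*head)++;
    return c;
}

int main(void)
{
    unsigned head = 0;
    cq[0].work_id = 42;
    cq[0].valid   = 1;         /* stand-in for the HCA's DMA write */

    struct completion *c;
    while ((c = poll_cq(&head)) == NULL)
        ;                      /* overlap computation here */
    printf("work %llu complete\n", (unsigned long long)c->work_id);
    return 0;
}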
InfiniBand Benefits by Stress Area
Stress Area       Benefit
Cluster Network   Extremely low latency; 10 Gig throughput
Compute           CPU & kernel offload removes TCP overhead; frees CPU cycles
Server I/O        Single converged 10 Gig network for cluster, storage, LAN; central I/O scalability
Stress level varies over time with each query
InfiniBand provides substantial benefits in all three areas
Benefits for Different Workloads
High bandwidth and low latency benefits for Decision Support (DSS)
Should enable serious DSS workloads on RAC clusters
Low latency benefits for scaling Online Transaction Processing (OLTP)
Our estimate: one IB link replaces 6-8 Gigabit Ethernet links
Commodity Hardware
Higher capabilities and lower cost than proprietary interconnects
InfiniBand's large bandwidth capability means that a single link can replace multiple GigE and FC interconnects
Memory Requirements
The Oracle buffer cache can consume 80% of a host's physical memory
64-bit addressing and decreasing memory prices mean ever larger buffer caches
InfiniBand provides:
Zero copy RDMA between very large buffer caches
Large shared registrations that move memory registration out of the performance path (see the sketch below)
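A sketch of why one large, shared registration matters: the expensive pin-and-map step happens once at instance startup instead of on every block transfer. register_with_hca() and its stub body are hypothetical stand-ins for the real registration call, and the region size is a small placeholder.

/* Hypothetical registration sketch -- not the Topspin or uDAPL API. */
#include <stdio.h>
#include <stdlib.h>

struct hca { int unused; };                 /* opaque adapter handle (assumed) */

/* Stand-in for the real registration call: in reality this pins the pages and
 * gives the HCA a translation so it can RDMA into the region directly. */
static void *register_with_hca(struct hca *h, void *buf, size_t len)
{
    (void)h; (void)len;
    return buf;                             /* stub: pretend registration succeeded */
}

#define BUFFER_CACHE_BYTES (256UL * 1024 * 1024)  /* placeholder; real caches are far larger */

static void *cache;          /* stands in for the SGA buffer cache  */
static void *cache_region;   /* handle covering the whole cache     */

/* Done once at instance startup; every process in the instance shares it. */
static void startup(struct hca *h)
{
    cache        = malloc(BUFFER_CACHE_BYTES);
    cache_region = register_with_hca(h, cache, BUFFER_CACHE_BYTES);
}

/* Per block transfer there is no registration work at all: any block in the
 * cache is already RDMA-able, so registration stays out of the fast path. */
int main(void)
{
    struct hca adapter = {0};
    startup(&adapter);
    printf("buffer cache registered once: %p\n", cache_region);
    free(cache);
    return 0;
}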
Two Efforts Coming Together
RAC/Cache Fusion and Oracle Net
Two Oracle engineering teams working at the cluster and application tiers
10g incorporates both efforts
Oracle Net benefits from many of the same capabilities as Cache Fusion
OS kernel bypass
CPU offload
New transport protocol (SDP) support
Efficient asynchronous communication model
RDMA with high bandwidth and low latency
Commodity hardware
Working on external and internal deployments
Open Standard Software APIs
uDAPL and Async Sockets/SDP
Each new communication driver is a large investment for Oracle
One stack that works across multiple platforms means improved robustness
Oracle grows closer to the interfaces over time
Ready today for emerging technologies
Ubiquity and robustness of IP for high speed communication
Summary
Oracle and major system & storage vendors are supporting InfiniBand
InfiniBand presents a superb opportunity for enhanced horizontal scalability and lower cost
Oracle Net's InfiniBand support significantly improves performance for both the app server and the database in Oracle 10g
InfiniBand provides the performance to move applications to low cost Linux RAC databases
QUESTIONS
ANSWERS
Next Steps….
See InfiniBand demos first hand on the show floor
Dell, Intel, Netapp, Sun, Topspin (booth #620)
Includes clustering, app tier and storage over InfiniBand
InfiniBand whitepapers on both the Oracle and Topspin websites
www.topspin.com
www.oracle.com