
The Teradata Database
Explained, Illustrated and Demystified
Accenture and Teradata
Teradata Architecture
Key Database Features
Teradata Architecture:
Table of Contents
Introduction
Platform Architecture: MPP and SMP
Teradata Architecture: MPP and SMP
Key Differentiators
Introduction:
Purpose and Intended Audience
• The purpose of this deck is to familiarize Accenture practitioners with
the Teradata Relational Database Management System (RDBMS)
• We tend to focus on Teradata’s unique architecture and features,
occasionally contrasting them with Oracle, a much more familiar
reference for most readers
• The reader need not be deeply technical to benefit from this deck: we
have attempted to provide high-level overview material rather than deep (and
sometimes boring) details
• Finally, illustrations are provided to add clarity to certain concepts
where it makes sense
Introduction:
Unique Teradata Attributes
• Teradata is unique among commercial RDBMSs in a number of ways.
(If that weren’t true, there probably would be no need for this deck)
• Key Teradata differentiators:
– It is implemented on Massively Parallel Processing (MPP) hardware
architecture – and always has been
– It was originally implemented on proprietary hardware with portions of the database
embedded in the hardware/firmware (although this is no longer the case)
– The database software is unconditionally parallel
– It is linearly scalable, with hundreds of reference sites exceeding a
terabyte (1,000 gigabytes) in size
– It virtually “owns” the Very Large Database (VLDB) market space – and
has for twelve years
• Some of the above points are discussed in the “Key Differentiators”
section
Teradata Platform Architecture:
Uni-processors, SMPs and MPPs
• Computers can be broadly categorized into one of three hardware
architectures:
– Uni-processor
• The desktop PC is the example
• Generally applied to client, not server, applications
• Not discussed further in this paper
– Symmetric Multi Processing (SMP)
• A single computing system with multiple processing units, often
microprocessors
– Massively Parallel Processing (MPP)
• A collection of computing systems – usually SMPs – that are interconnected
and that collaborate to solve a common task(s)
• While there are significant differences between these architectures, the
application programming model is essentially unchanged among them:
the platform software deals with the hardware differences
Teradata Platform Architecture:
A Closer Look at SMP Hardware
• Typical SMP hardware architectures have:
– Two to eight processors, sometimes as many as 64
• Smaller SMPs often have Compaq or Intel motherboards and run MS Windows
• Larger SMPs are typically RISC machines running UNIX
– Sun, H-P and IBM dominate this space
– All of the processors run from a common, shared memory and they all
access that memory via a common, shared memory bus
– All of the processors share the I/O slots, channels and associated
peripheral devices, notably disk storage subsystems
• SMP examples:
– Low-end:
• Compaq ProLiant DL series (2-4 CPUs, desk side)
• NCR’s Model 4455 or similar (1-4 CPUs, desk side)
– Midrange: HP’s NetServer 6000 series (4-6 CPUs, rack mount)
– High-end: Sun’s Enterprise 10000 (16-64 CPUs, free standing)
– Nearly all IBM and compatible mainframes
Teradata Platform Architecture:
SMP Hardware
Figure: SMP hardware with 4 CPUs sharing a common memory (via the memory bus), an I/O bus and peripheral devices
Teradata Platform Architecture:
SMP Hardware Scalability
• Scalability options for SMPs include:
– Larger memories
– Faster CPUs
– More CPUs
– More I/O (slots and busses)
– More peripherals (usually disk arrays)
• Scalability limitations for SMPs:
– Every shared hardware subsystem is a potential bottleneck for an SMP
– The most common limiter to SMP scalability is the memory subsystem
• Each CPU must access the single memory via a common bus
• As the number of CPUs increases, there is added contention for memory
accesses – CPUs begin waiting on the memory subsystem
• Eventually, a point of diminishing returns is reached, where the added expense
of additional CPUs fails to provide a commensurate increase in performance
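To make the "diminishing returns" point concrete, here is a minimal, illustrative Python sketch. The 5% bus-contention figure is an assumption chosen for illustration, not a Teradata or vendor measurement:

```python
# Illustrative only: a toy Amdahl-style model of SMP scaling, in which a
# fraction of each query's work serializes on the shared memory bus.
# The contention fraction (0.05) is a made-up number, not a benchmark.

def smp_speedup(cpus: int, bus_serial_fraction: float = 0.05) -> float:
    """Speedup over 1 CPU when `bus_serial_fraction` of the work
    must queue on the single shared memory bus."""
    return 1.0 / (bus_serial_fraction + (1.0 - bus_serial_fraction) / cpus)

for n in (1, 2, 4, 8, 16, 32, 64):
    print(f"{n:3d} CPUs -> {smp_speedup(n):5.1f}x speedup")

# The curve flattens quickly: 64 CPUs deliver only about 15x, nowhere near
# 64x - the "point of diminishing returns" described above.
```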
Teradata Platform Architecture:
An Introduction to MPP Hardware
• Massively Parallel Processing (MPP) hardware systems consist of two
to perhaps hundreds of SMP systems called “nodes”
– Just like a stand alone SMP, each node has its own memory and I/O
subsystems as well as its own copy of the operating system and
application(s)
– The nodes are interconnected via a dedicated, very high-speed, often
proprietary interconnect network
– Most MPP systems run under UNIX, though a few MPP Teradata
installations run under Windows 2000
• MPP examples:
– IBM’s pSeries (formerly RS/6000) and IBM’s “Deep Blue,” their chess-playing machine that defeated Grandmaster Garry Kasparov in May 1997
– The NCR 5250 or 5255 (among other NCR servers), which has never
played chess and probably never will
Platform Architecture:
A 2-node MPP System
Figure: MPP hardware showing 2 nodes (Node 0 and Node 1), each with its own disk array (Disk Array 0 and 1), connected by the interconnect network
Teradata Platform Architecture:
More on MPP Hardware
• MPP hardware architectures are often called “shared nothing” or
“loosely-coupled” systems, since the nodes – the basic MPP building
blocks – share no computing hardware
• The network that interconnects the nodes enables them to
communicate and cooperate to solve a problem
– Exactly how the interconnect is used depends entirely on the application(s)
running in the system
• So, why bother with the complexity of MPP hardware?
– One word: scalability, the ability to add processing nodes without “hitting
the wall” before reaching a desired level of performance
Platform Architecture:
Teradata’s MPP Hardware
• For Teradata’s MPP hardware, each node is:
– Made by Solectron to NCR’s design and specifications
– Powered by a 4-CPU Intel Xeon board
– Connected to all the other nodes via NCR’s BYNET interconnect
– Connected via SCSI to its own disk array(s)
– Optionally connected to the disk array(s) of another node in the complex
for fault tolerance purposes (more on this later)
Platform Architecture:
Teradata’s MPP Hardware
Figure: NCR MPP hardware showing 4 nodes (Node 0–3), each with its own disk array (Disk Array 0–3), connected by dual BYNET interconnects
Platform Architecture:
Teradata’s BYNET™ Interconnect
• NCR’s node interconnect subsystem is called the BYNET
• The BYNET is fully scalable:
– When you add a node, you add bandwidth with it, so that the total
bandwidth available scales as the MPP complex grows
– Early Teradata machines did not have a scalable interconnect (YNET)
• The network architecture is “Folded Banyan”
– All nodes are directly connected to all other nodes
• There are always two BYNETs for redundancy purposes
• The BYNET hardware is an ordinary PCI card designed by NCR
• The BYNET is fast:
– 120 megabytes per second per node per BYNET in each direction
• It’s patented by and proprietary to NCR
Platform Architecture:
BYNET Node-to-Node Connections
• Every node has a dedicated bi-directional channel to every other node
• This architecture is duplicated – there are really 2 channels (one shown)
Figure: the BYNET supports both point-to-point messaging and broadcast messaging between nodes
Platform Architecture:
Teradata “Cliques”
• A Teradata clique provides high availability, and is a configuration option
• A clique is a group of nodes – 4 are shown below – that can access a common
chunk of disk array storage
• Cliques eliminate any single point of failure
Figure: a clique of four nodes on the BYNET interconnect, sharing disk storage via shared SCSI connections
Platform Architecture:
Why have Cliques?
• Cliques add high availability via automatic failure detection and
software re-configuration in the event of a hardware failure, as the sketch below illustrates
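As a rough illustration only (a toy Python model with hypothetical node and AMP names, not Teradata code), re-configuration amounts to restarting a failed node's AMP VPROCs on the surviving nodes of the clique, all of which can reach the same shared disk arrays:

```python
# Toy model of clique re-configuration: when a node fails, its AMP VPROCs
# are restarted on the surviving nodes, which can see the same shared disks.

clique = {               # node -> AMP VPROCs it currently hosts
    "node0": ["amp0", "amp1"],
    "node1": ["amp2", "amp3"],
    "node2": ["amp4", "amp5"],
    "node3": ["amp6", "amp7"],
}

def fail_node(clique: dict, failed: str) -> dict:
    """Redistribute the failed node's AMPs round-robin across survivors."""
    survivors = [n for n in clique if n != failed]
    orphaned = clique[failed]
    new_map = {n: list(amps) for n, amps in clique.items() if n != failed}
    for i, amp in enumerate(orphaned):
        new_map[survivors[i % len(survivors)]].append(amp)
    return new_map

print(fail_node(clique, "node2"))
# All 8 AMPs keep running (each still owns its data on the shared arrays);
# the surviving 3 nodes simply carry a heavier share of the work.
```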
Platform Architecture:
MPP Hardware Illustration
• Below is a medium-size Teradata MPP system:
– 16 nodes, each with its own busses, memory and backplane
– 8 cliques of 2 nodes
– 8 disk arrays, one for each clique
– 2 BYNETs, because there are always two BYNETs
– Total BYNET bandwidth is (2 BYNETs x 2 directions x 16 nodes x 120 MB/sec) = 7.68 GB/sec!
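The arithmetic behind that figure, spelled out with the numbers quoted above:

```python
# Reproducing the bandwidth calculation from the bullet above.
bynets = 2                               # there are always two BYNETs
directions = 2                           # each channel is bi-directional
nodes = 16
mb_per_sec_per_node_per_bynet = 120

total_mb_per_sec = bynets * directions * nodes * mb_per_sec_per_node_per_bynet
print(total_mb_per_sec / 1000, "GB/sec")  # -> 7.68 GB/sec
```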
Platform Architecture:
MPP Operating System Software
• Operating System software
– For MPP Teradata, the choices are the same as for SMP:
• NCR’s version of UNIX: MP-RAS, or
• Windows 2000
– For both OS options:
• The BYNET device driver is an ordinary (UNIX or Windows) one
• Teradata doesn’t use the native file system, for performance reasons; all the
Teradata database structures are managed by Teradata on raw disk
Teradata Architecture:
Software “Units of Parallelism”
• Teradata software components are known as “Virtual Processors” or
VPROCs
– VPROCs are software threads or processes
• There are two kinds of VPROCs:
– Access Module Processors (AMPs)
• An AMP reads, writes and manipulates all database rows in the partition that
the AMP “owns”
– Parsing Engines (PEs)
• PEs parse SQL statements, reducing them to their component executable steps
• The number of VPROCs is configurable
• VPROCs are in every Teradata node
• VPROCs can migrate around the complex, as in the case of a failed
node
• VPROCs provide parallelism within a node, as the sketch below illustrates
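A purely conceptual Python sketch of this division of labor (the classes and method names are hypothetical, not the Teradata API): a PE reduces a request to steps, and every AMP executes each step against only the rows it owns, in parallel:

```python
# Conceptual sketch only: a PE breaks a request into steps; every AMP runs
# each step against the rows in its own partition, in parallel.

from concurrent.futures import ThreadPoolExecutor

class AMP:
    def __init__(self, amp_id):
        self.amp_id = amp_id
        self.rows = []                      # the partition this AMP "owns"

    def scan(self, predicate):
        return [r for r in self.rows if predicate(r)]

class PE:
    def __init__(self, amps):
        self.amps = amps

    def select(self, predicate):
        # "Parse" the request into a single scan step and ship it to every AMP.
        with ThreadPoolExecutor(max_workers=len(self.amps)) as pool:
            partial_results = list(pool.map(lambda a: a.scan(predicate), self.amps))
        return [row for part in partial_results for row in part]

amps = [AMP(i) for i in range(4)]
amps[0].rows = [{"id": 1, "amt": 50}]       # in reality rows arrive via hashing
amps[3].rows = [{"id": 9, "amt": 200}]
pe = PE(amps)
print(pe.select(lambda r: r["amt"] > 100))  # -> [{'id': 9, 'amt': 200}]
```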
Teradata Architecture:
MPP Platform with AMPs and PEs
• Four-node MPP system showing Virtual Processors – AMPs and PEs –
in each node
Figure: four nodes (Node 0–3) connected by dual BYNET interconnects; each node runs its own set of AMP and PE VPROCs, and each AMP owns its own data partition
Teradata Architecture:
Data Partitioning Explained
• Data is automatically distributed to all AMPs – and thus to all disks –
via a proprietary hashing algorithm
– No partitioning or re-partitioning ever required
• Teradata’s file system architecture is fundamentally different from that of conventional RDBMSs
– Rows stored in blocks
– Space allocation is entirely dynamic
• Absolutely minimal DBA effort required
– No reorgs, repartitioning, space management, index rebuilds
– Minimal monitoring required
Teradata Architecture:
Data Partitioning Illustrated
• The rows of each table are automatically and unconditionally
distributed to all AMPs (and all available disk storage)
– This enables Teradata’s automatic and unconditional parallelism
Figure: every table (SYSTEM TABLES, CUSTOMER, ORDERS, LINEITEM, PART, SUPPLIERS) is spread across the disks of all four AMPs (AMP1–AMP4)
Teradata Architecture:
Data Partitioning Explained
• Let’s take a simple case:
– A four-node, eight AMP Teradata MPP system
– A single database table of 100,000 rows
• The system will configure itself with two AMPs in each node
• Then, via hashing the Unique Primary Index, it will distribute all rows to
all AMPs – giving each AMP about 12,500 rows – and each node
25,000 rows
• This is the ideal “flat” distribution across the whole system, and will occur if
the primary index value is essentially random – like an SSN
• In all processing, each node has to deal with only 1/4 of the total
database
– The name of the game is simple: “Divide and conquer”
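A minimal Python sketch of this effect. Python's built-in hash() stands in for Teradata's proprietary hashing algorithm, and the 9-digit "SSN" values are generated at random, but the roughly flat spread is the point:

```python
# Illustrative only: hash() stands in for Teradata's proprietary hashing
# algorithm. With an essentially random primary index (fake 9-digit "SSNs"),
# each of the 8 AMPs ends up with roughly 12,500 of the 100,000 rows.

import random
from collections import Counter

random.seed(42)
AMPS, NODES = 8, 4
rows_per_amp = Counter()

for _ in range(100_000):
    ssn = random.randrange(100_000_000, 1_000_000_000)   # random primary index
    rows_per_amp[hash(ssn) % AMPS] += 1

for amp in range(AMPS):
    print(f"AMP {amp} (node {amp // (AMPS // NODES)}): {rows_per_amp[amp]:,} rows")

# Each AMP gets ~12,500 rows, so each node holds ~25,000 rows - about 1/4
# of the table, which is exactly the "divide and conquer" effect above.
```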
Teradata Architecture:
SMP Hardware
• In an SMP architecture, Teradata looks much the same as an ordinary
database such as Oracle:
– A single SMP system does it all
– A single software image can access the entire database
RDBMS Architecture:
Teradata on SMP Hardware
• On SMP hardware architecture, Teradata runs on:
– Windows 2000
– Intel microprocessors
– The above combination is often called “Wintel”
– Almost all Wintel boxes use Compaq or Intel processor boards, typically
populated with Pentium III or Pentium 4 CPUs
– NCR’s SMP machines on either Windows or UNIX (MP-RAS)
• The latter configuration – SMP/UNIX – is often used as a low-cost test platform
for a production MPP system under MP-RAS
• Examples of Teradata SMP platforms:
– IBM
– HP
– Compaq
– Dell
– NCR (but only rarely, probably due to cost or client standards)
Teradata Architecture:
SMP Hardware and Disk Array
Figure: an SMP box with 4 CPUs connected to a disk array via a SCSI interconnect (dual paths)
Key Differentiators
• Ubiquitous, persistent parallelism
• Unrelenting partitioning
• A really, really mature query optimizer
• The above yield the ability to handle very complex queries, large
complex databases and lots of concurrent users doing lots of
different stuff
• Truly linear scalability
• Mainframe connection via direct FIPS-60 channel connect
– ESCON or “Bus and Tag” media
Scalable, Parallel, High Availability
MPP Hardware
• A group of 1-4 nodes with connections to each other’s storage keeps applications
running when node(s) fail
• All critical components have redundant backups
• Nodes have (optional) LAN/WAN/Mainframe connectivity
Figure: SMP processing nodes (each with multiple CPUs, memory and data cache) connected by the BYNET MPP interconnect, with point-to-point SCSI or FibreChannel links to DA controllers (with cache) fronting LSI Logic or EMC2 disk arrays, plus a server management subsystem
Shared Nothing Software
Architecture
• Basis of Teradata parallelism and scalability
– Divide the work evenly among many processing units
– No single point of vulnerability or chokepoint for any operation
Teradata Data Distribution
• Automatic, Always On
• Rows are distributed evenly by hash partitioning
– Define the row, we’ll do the rest
– Regardless of queries or demographics
• Shared nothing software
Figure: the primary index of each row in Table A, B or C passes through the Teradata parallel hash function, which assigns the row to one of the AMPs (VAMP1 … VAMPn), each with its own processor, memory and disk
Key Data Warehousing Capabilities
• Technology
– Fully automatic space management
– Automatic data distribution
– Always-On, Automatic, Integral, Multi-Level Parallelism
– Continually Improved Cost-Based Optimizer
– Full ANSI SQL functionality, complex query optimization
Hash Distribution
• Data automatically distributed to AMPs via hashing
• Even distribution results in scalable performance
• Hash map defined and maintained by the system
– 2**32 hash codes, 64K buckets distributed to AMPs
• Primary Index (PI) column(s) are hashed
• The hash is always the same for the same values
• No partitioning or repartitioning required
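A rough Python sketch of the two-level mapping just described. The real hash function and the way buckets are carved out of the 32-bit hash code are Teradata-internal; CRC32 and the high-order 16 bits are stand-ins chosen purely for illustration:

```python
# Sketch of the two-level mapping described above (illustrative stand-ins,
# not Teradata's actual hash function or bucket layout).

import zlib

NUM_AMPS = 10
NUM_BUCKETS = 65_536                       # "64K buckets"

# The system builds and maintains the hash map: bucket -> AMP.
hash_map = [bucket % NUM_AMPS for bucket in range(NUM_BUCKETS)]

def amp_for(primary_index_value: str) -> int:
    row_hash = zlib.crc32(primary_index_value.encode())   # 32-bit hash code
    bucket = row_hash >> 16                                # pick one of 64K buckets
    return hash_map[bucket]

# The same PI value always hashes the same way, so it always lands on the
# same AMP - which is how a row is found again without any partitioning.
print(amp_for("customer-4711"), amp_for("customer-4711"))  # identical
print(amp_for("customer-4712"))                            # likely a different AMP
```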
Figure: primary-index values hash to buckets, and the system hash map distributes the corresponding rows evenly across the AMP & PE VPROCs
Shared Nothing Software
• Delivers linear scalability
– Maximizes utilization of SMP resources
– To any size configuration
– Allows flexible configurations
– Incremental upgrades
Figure: many identically structured groups of VPROCs (AMPs), one set per node, whatever the size of the configuration
A Shared Nothing Database Architecture Enables
Expansion with Balance
• Amount of parallelism grows at the same rate as the system expands
• Each parallel unit does an equal amount of work
Figure: work accomplished grows in step with hardware scalability (units of hardware power) and software scalability (units of parallelism)
Optimizer - Parallelization
• Cost-based optimizer
– Parallel aware
• Rewrites built-in and cost-based
• Parallelism is automatic
• Parallelism is unconditional
• Each query step fully parallelized
Shared Everything vs. Nothing
Shared Everything Database Architecture:
• A single database buffer used by all UoPs
• A single logical data store accessed by all UoPs
• Scalability limited due to control bottlenecks and the scalability of a single SMP platform

Shared Nothing Database Architecture:
• Each UoP is assigned a data portion
• Query Controller ships functions to the UoPs that own the data
• Locks, buffers, etc., not shared
• Highly scalable data volumes

Figure: in shared everything, all UoPs go through common buffers, locks and control blocks to a single data store; in shared nothing, each UoP owns its own data partition, the unit of parallelism
Q/A
Thank You