Trafodion Technical Deck

Download Report

Transcript Trafodion Technical Deck

Trafodion
Enterprise-Class
Transactional SQL-on-HBase
HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Trafodion – Introduction (Welsh for transactions)
Joint HP Labs & HP-IT project for transactional SQL database capabilities on
Hadoop
20+ years of database investment open sourced by HP on June
10th 2014!
Transactional
Hadoo
SQL
+
p
Complete: Full-function SQL
Scalable: Elastic scale for high concurrency
Reuse existing SQL skills and improve developer
productivity
Provides elastic scalability as number of users / data grows
Protected: Distributed ACID transactions
Highly Available: For enterprise
applications
Data consistency across multiple rows, tables, SQL
statements
Leverages HBase / Hadoop replication
Efficient: Low-latency R/W transactions
Optimized for real-time transaction processing applications
Interoperable: Standard ODBC/JDBC
access
Works with existing tools and applications
Data federation: Trafodion/HBase/Hive
2
Open: Hadoop and Linux distribution
neutral
Easy to add to existing infrastructure with no vendor lock-in
Eco-system: Leverages large Hadoop ecosystem
Can use any tool or database accessing Hadoop
HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Hadoop workload profiles
Operational
Interactive
• Real-time
analytics
• Parameterized
reports
• Drilldown
visualization
• Exploration
• Transactiona
l SQL =
OLTP +
interactions
Sub-second
• Incremental batch
processing
• Dashboards,
scorecards
Batch
• Operational batch
processing
• Enterprise
reports
• Data mining
Current Market Focus: Data Warehousing and Analytics
Response Time
Exposes Hadoop limitationsTransaction
Support
3
Noninteractive
• Data preparation
Data
Integrity
Hours
Real-time
Performanc
e
HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Operational
Optimizations
Workload
Management
The Case for Operational SQL-on-Hadoop
•
Sector Road Map: SQL-on-Hadoop platforms in 2013 – Joseph Turian,
March 20, 2013
– An operational database offers write access, not just read access, to data. However, there are
other key features for an operational database: concurrency, interactive write speed, and
distributed transactional support (guarantees about data consistency). Currently no existing SQLon-Hadoop solution satisfies these requirements. If a strong player or two emerges in the category,
it will completely shake up the big data and database landscape.
•
5 Reasons Hadoop is Kicking Can and Taking Names – Mike Gualtieri, October
22, 2013
– #5 The future of Hadoop is real-time and transactional. The key commercial vendors are focusing
on fast SQL access, real-time streaming, and manageability features that enterprises demand. The
groundwork is being laid for an eruption in data management technologies as Hadoop sneaks its
way into the transactional database market.
•
•
4
The Future of Hadoop: What Happened & What's Possible? – Doug Cutting,
Oct 30 2013
© Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
–HP So
I think the prediction we can make here is that it is inevitable that we will see just about every
Operational SQL on Hadoop – Use cases
•
into open file structures
Free at
last!
Open
distributed
HDFS
structures
Accessible for reporting
HBase
Hive
& analytics&with
no
Structured
Semi- structured
Item id Description Cost Price …
TV
Type Display Size Resolution Brand Model 3D
Boo
k
ISBN Author Publish Date Format Dept
…
•
Integration of structured, semistructured, and unstructured
support
Integration of operational,
historical, & external (Big) data
along common master data for
better insights
Capture data directly
Unstructured
Image
…
Review …
SELECT all TVs WHERE Price >
2000 and Type = ‘Plasma’ and
Display Size > ‘50’ and customer
sentiment is very positive
latency
5
HP PRIVATE © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
…
…
Snapfish – Web-based photo sharing and photo printing. Members can upload files for free with
unlimited photo storage. They can share photo albums, individual photos, and various Snapfish
products via email, link URL, and other web services such as Facebook and Blogger. They can buy
personalized photo products such as prints, photo books, cards and mugs. Supports retail pickup
at Meijer, Walgreens and Walmart.
Asset Management
• Create album
• Upload / Import pictures into album
Trafodio
n
Shopping
• Print Calendars, Cards,
…
• Order prints, mugs, linen,
jewelry, cases, covers,
• Share album / project with family / friends
cards, teddy bears, …
• Create a project / photo book
6
OLTP on
Hadoop
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Versus RDBMS &
NoSQL
• High concurrency
low latency
workloads
• Limitless elastic
scale
• Very low TCO
Share pictures
Autonomy could be used to
analyze the pictures in HDFS
to automatically create tags
to be stored in HBase
PIC_ATTR
Tag pictures
INSERT into Trafodion table REL
(cust_id, rel_with_cust_id, rel-type, …)
BEGIN WORK
INSERT list of pictures shared into
Trafodion table SHARED_PIC
(pic_id, rel_with_cust_id)
END WORK
BEGIN WORK
INSERT custom tags for each tagged
picture into
HBase table PIC_ATTR as col-value pairs
END WORK
Upload pictures
Transactio
n
Versus RDBMS & NoSQL
• Rich ANSI SQL RDBMS
features
• Full ACID transactional
support
• Integration of structured, semistructured, & unstructured
data
7
Trafodio
n
Pictures loaded into HDFS by app
BEGIN WORK
INSERT list of pictures uploaded into
Trafodion table PIC
(cust_id, album_id, pic_id, pic_date,
…)
INSERT picture attributes from camera
into
HBase table PIC_ATTR as col-value
pairs
for each of the pictures using pic_id
END WORK
Order photo mug & jewelry
BEGIN WORK
INSERT into ORDER
(cust_id, order_no, order_date, order_total,
…)
INSERT into ORDER_DETAIL
all items that are part of the order
(cust_id, order_no, item_id, pic_id, qty, amt,
…)
END WORK
Search for pictures
Create album
INSERT into Trafodion table
ALBUM (cust_id, album_id,
album_name, …)
SELECT pictures taken with my “Sony
DSC-RX100M2” camera in the last 6
months from my “Travel” album with a tag
“Emma” on it.
Backend operational workloads
Order tracking, supply chain, inventory
control,
…
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject
to change
without notice.
We
b
app
Rohit, consider a blanket for
your granddaughter at 50%
off with her image imprinted
on it
Using model &
customer score /
attributes, and
recent purchase
history make
recommendation
s
Versus RDBMS & NoSQL
• Data captured in an open file system with open APIs
• Is available with no latency for reporting & analysis
• Via a huge open source & proprietary Hadoop ecosystem
8
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
• Items bought together
– market basket
analysis
• Promotion success
customer classification
•…
Vertic
a
Trafodio
n
50%
Analytics
Analytics in
Vertica to
generate
recommendation
model
BI reporting
Reporting &
Analytics via
Vertica
• Sales growth by
product, region,
demo
• Growth in
customers, pictures,
storage, …
• Growth in sharing
•…
Use case examples
Generates Revenue
Touches the Customer
Finance
Government
Telecom
• Online
financial
management
• 911
Emergency
System
• Billing
systems
• Provisioning
systems
9
Low Latency,
High
Concurrency
Manufacturing
• RFID
tracking
Energy
• Smart
Metering
ACID
Protection,
Data Integrity
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Helps Run the Business
Healthcare
• Authorization
and claims
processing
Transportation Consumer &
Retail
• Reservation
systems
• Online
shopping
Multi-Structured
Data
Comprehensive DDL, DML, TCL, and utility
•support
ANSI Core SQL 99 complaint + other SQL 99 and SQL 2003 support with Trafodion
extensions
• Full featured DDL - CREATE/DROP/ALTER statements for tables, views, indexes,
constraints
–
–
Comprehensive data type support - numeric, character, varchar, date, time, interval
Unicode encoding including UTF8, UCS2, and ISO8859-1for user data; UTF8 for metadata
• Full featured DML – SELECT, INSERT, UPDATE, DELETE, UPSERT and MERGE
statements
–
–
–
–
JOIN (INNER, LEFT/RIGHT/FULL OUTER), UNION, WHERE, GROUP BY, HAVING, ORDER BY,
SAMPLING, etc.
Correlated and nested sub-queries
Cursor support (non-holdable)
Extensive SQL function support - aggregate, date/time, character, mathematical, OLAP, sequence,
etc.
• Utilities – Update Statistics, Explain, Control Query Shape, Command Line Interface
10 HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
• Transaction Control – BEGIN WORK, COMMIT WORK, ROLLBACK WORK, SET
Leveraging HBase for scalability and
Table
Regions
Region
availability
Logical
Physical
Server Layer
F
A … …
1
F
A … …
1
F
B … …
1
F
B … …
1
F
C … …
C … …
2
F
A … …
F
C … …
1
F
2
F
A … …
2
2
F
C … …
3
F
C … …
3
F
C … …
4
F
A … …
4
F
A … …
4
F
B … …
4
F
B … …
5
F
A … …
5
F
A … …
5
F
B … …
5
F
B … …
5
F
C … …
5
F
C … …
6
F
A … …
6
F
A … …
7
F
A … …
7
F
A … …
7
F
B … …
7
F
B … …
7
F
C … …
7
F
C … …
8
F
B … …
8
F
B … …
HDFS
F
A … …
9
F
A … …
9
F
B … …
9
F
B … …
9
F
C … …
9
F
C … …
HP © Copyright 2014 Hewlett-Packard Development Company, L.P.
B C
… ….
RK Row Key
Column
CF
Family
Column
CN
Name
TS Timestamp
“F”
A, B, C
Data in different
Column Families
are stored
separately
One
version
CV Cell Value
• Regions store contiguous ranges of table
rows
HDFS
Region Server
HDFS
HDFS
Region Server
• Regions dynamically split by HBase when
they reach a configured limit i.e.
“autosharding”
• Region servers are elastically scalable
• HDFS and HBase replication provide
enhanced data availability and protection
Allows
• Fine-grained load balancing with dynamic
movement based on load
• Fast data recovery when a server or disk
fails
or is decommissioned
The information contained herein is subject to change without
notice.
HDFS
11
Client
Region Server
Region Server
9
RK A
… …
Region Server
HBase
Layout
1
Trafodion
View
Clustering
key
RK CF CN TS CV
HBase vs. Trafodion comparison
HBase
Trafodion + HBase
Data abstraction
Key and value pair
Relational schema
Physical Layout
Column family store where
row data is stored together by
cells
Same except there is a single column
family with space-saving column
encoding
Column values
Uninterpreted array of bytes
Explicitly defined and enforced data
types
ACID Guarantee
Single row atomicity
Multi- SQL statements, tables, and
rows defined as part of transaction
Language API
Get/put/delete
SQL (Trafodion invokes native HBase
API)
Row Key Index
Single (string) row key
Composite (multi-column) row key
Secondary
Indexes Not supported
Arbitrary secondary key columns
HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
12
Salting of row keys
How it works
• HBase table gets created, pre-split with one
region per salt value
• A hash value column, “_SALT_”, is added as a
prefix to the row key
• Salting is transparent to SQL statements
–
–
–
Automatically computed during insert/update
statements
Predicates automatically generated where feasible
Minimal overhead for direct lookup by key value
Benefits
• Even data distributions across HBase regions
• Avoids region hotspots caused by insertion of
data in row key order
13
INSERT(s)
SELECT(s)
HBase
HBase
HBase
HBase
Region
Region
Region
Region
PART 1
PART 2
PART 3
HDFS
HDFS
HDFS
PART 4
HDFS
CREATE TABLE t(a integer not null primary key, b
integer) SALT USING 4 PARTITIONS;
HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Trafodion – Software architecture (3 layers)
Client
User and ISV
Operational Applications
Driver
JDBC ODBC
Database Connectivity
CMP
SQL
Master
Compiler and Optimizer
Distributed
Transaction
Management
Storage
Engine
Workload Management
*Executor
....
*ESP
DTM
HBase
Native HBase
Tables KVS,
Columnar via
HBase API +
coprocessors
14
WMS
Future
HBase
Trafodion
Tables
Relational
Schema
HDFS
ESP
SQL Parallelism
Hive
Direct HDFS access
to Hive tables using
HCatalog
Server
HP © Copyright 2014Process
Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Data Store
Integration
Optimized for varying operational workloads
Client Application
• Optimized performance and efficiency
–
–
–
–
–
OLT optimization for directed keyed
access
ESPs for parallel SQL operations
Multi-layered ESP for complex plans
Data Flow pipeline parallel architecture
Reusable Masters and ESPs for
efficiency
• Cached SQL plans eliminates
recompilation
• Sophisticated Optimizer
Type 2 and Type
4 ODBC/JDBC
driver
Ethernet
Node 1
Node 2
Master
Node n
Master
ESP
ESP
ESP
ESP
ESP
ESP
ESP
ESP
ESP
ESP
–
Leverages Equal Height Histograms
– Pushes down
• Filters e.g. row selection (start-stop key)
• Coprocessors e.g. aggregates
• Multi-Dimensional Access (MDAM)
–
Secondary index access
HBase
HDFS
Filters
Coprocesso
rs
HDFS
• HPService
persistence
(via Zookeeper)
© Copyright 2014
Hewlett-Packard Development
Company, L.P. The information contained herein is subject to change without notice.
and automatic query resubmission
15
HBase
HDFS
HDFS
HBase
HDFS
In Summary
Trafodion: an Enterprise class operational SQL-on-Hadoop DBMS
HP DBMS
• Lower cost – inexpensive storage &
servers
• Elastic scalability
• Open and distributed file system (HDFS)
• Semi-structured & unstructured support
• Schema flexibility
• Automatic data repartitioning
• High availability via replication (k-safety)
• Disaster Recovery1
• Column level access control
• Column level encryption
• Space quotas1
• Vast open source & proprietary ecosystem
• Versioning  snapshot support &
incremental data replication
• Cloud deployable
• Industry push for Hadoop Data Lake
1Select
distributions
16
© Copyright 2013 Hewlett-Packard Development Company, L.P.
K/V &
document
stores
Unstructured
analytics
Structured
Relational
DBMS
Trafodio
n
Operational
SQL on
Hadoop
Able to join Trafodion, HBase,
Hive tables in a single
statement
The information contained herein is subject to change without notice.
• Innovative database engines for OLTP,
ODS, and EDW (20+ years investment)
• Comprehensive ANSI SQL support
• Structured data support (schema)
• ACID transactional protection for multiple
rows, tables, statements, region updates
• Support for nested loop, merge, hash
joins
• Optimized execution plans via
incremental equal height histograms
• Efficient data flow architecture
• Grant/Revoke Security support
• UDFs for Complex Event processing
• Workload Management
• Enterprise class monitoring &
manageability
• Compound primary/secondary keys
• Encoding column names for
compaction
• Salting to eliminate I/O hotspots
See for yourself…
Come discover and develop on Trafodion
www.trafodion.org
17
HP PRIVATE © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Thank You
© Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.