Transcript Trafodion
Trafodion
Transactional SQL-on-HBase
www.trafodion.org
HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Trafodion - Introduction
Open source project to develop transactional SQL-on-HBase database engine
HBase
+
Rides the unstoppable Hadoop wave!
Transforms how companies store, process, and share big
data
Affordable performance, elastic scalability, availability
Leverages large Hadoop eco-system
Open source – Linux / Hadoop dist.
neutral
Full-function ANSI SQL with JDBC/ODBC
access
Leverages existing SQL skills, tools, & apps for productivity
Distributed ACID transaction protection
Data consistency across multiple rows, tables, SQL
statements
Targeted for operational workloads!
Apache 2.0 software license
Optimized for sub-second high concurrency workloads
Eliminates vendor lock-in and licensing fees
Data federation: Trafodion/HBase/Hive
tables
Leverages community development resources and speed
2
Transactional
SQL
HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Enables multiple data model deployment with schema
flexibility
Trafodion heritage timeline
20+ years of database investment open sourced by HP on June 10th 2014!
1980 - 1989
1990 - 1999
2000 - 2009
1984
Tandem
NonStop SQL
R&D
formation
3
1989
NonStop
SQL/MP support
for large query
parallelism
2010 - 2019
2006
2015
Compaq
Neoview SQL
DBMS (EDW)
Trafodion 1.0
2014
Trafodion Open
Source project
released
1987
1999
2012
NonStop
SQL/MP (OLTP)
NonStop SQL/MX
Cascades optimizer and
data flow executor
(OLTP/ODS)
HP Internal
SeaQuest SQL on
Linux (EDW)
HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Hadoop workload profiles
Operational
Interactive
• Real-time
analytics
• Parameterized
reports
• Drilldown
visualization
• Exploration
• Transactiona
l SQL =
OLTP +
interactions
Sub-second
• Incremental batch
processing
• Dashboards,
scorecards
Batch
• Operational batch
processing
• Enterprise
reports
• Data mining
Current Market Focus: Data Warehousing and Analytics
Response Time
Exposes Hadoop limitationsTransaction
Support
4
Noninteractive
• Data preparation
Data
Integrity
Hours
Real-time
Performanc
e
HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Operational
Optimizations
Workload
Management
The Case for Transactional SQL-on-HBase
•
Sector Road Map: SQL-on-Hadoop platforms in 2013 – Joseph Turian,
March 20, 2013
• An operational database offers write access, not just read access, to data. However, there are
other key features for an operational database: concurrency, interactive write speed, and
distributed transactional support (guarantees about data consistency). Currently no existing SQLon-Hadoop solution satisfies these requirements. If a strong player or two emerges in the category,
it will completely shake up the big data and database landscape.
•
5 Reasons Hadoop is Kicking Can and Taking Names – Mike Gualtieri, October
22, 2013
• #5 The future of Hadoop is real-time and transactional. The key commercial vendors are focusing
on fast SQL access, real-time streaming, and manageability features that enterprises demand. The
groundwork is being laid for an eruption in data management technologies as Hadoop sneaks its
way into the transactional database market.
•
•
5
The Future of Hadoop: What Happened & What's Possible? – Doug Cutting,
Oct 30 2013
•HP So
I think
the prediction
canL.P.make
here
is that
itsubject
is inevitable
© Copyright
2014 Hewlett-Packard
Developmentwe
Company,
The information
contained
herein is
to change without that
notice. we will see just about every
kind of workload being moved to this platform – even Online Transaction Processing.
Operational SQL on Hadoop – Use cases
•
•
Structured
Semi- structured
Item id Description Cost Price …
TV
Type Display Size Resolution Brand Model 3D
Boo
k
ISBN Author Publish Date Format Dept
…
Integration of structured, semistructured, and unstructured
support
Integration of operational,
historical, & external (Big) data
along common master data for
better insights
Capture data directly
Unstructured
Image
…
Review …
into open file structures
Free at
last!
Open
distributed
HDFS
structures
Accessible for reporting
HBase
Hive
& analytics&with
no
SELECT all TVs WHERE Price >
2000 and Type = ‘Plasma’ and
Display Size > ‘50’ and customer
sentiment is very positive
latency
6
HP PRIVATE © Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
…
…
Snapfish – Web-based photo sharing and photo printing. Members can upload files for free with
unlimited photo storage. They can share photo albums, individual photos, and various Snapfish
products via email, link URL, and other web services such as Facebook and Blogger. They can buy
personalized photo products such as prints, photo books, cards and mugs. Supports retail pickup
at Meijer, Walgreens and Walmart.
Asset Management
• Create album
• Upload / Import pictures into album
Trafodio
n
Shopping
• Print Calendars, Cards,
…
• Order prints, mugs, linen,
jewelry, cases, covers,
• Share album / project with family / friends
cards, teddy bears, …
• Create a project / photo book
7
OLTP on
Hadoop
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Versus RDBMS &
NoSQL
• High concurrency
low latency
workloads
• Limitless elastic
scale
• Very low TCO
Share pictures
Autonomy could be used to
analyze the pictures in HDFS
to automatically create tags
to be stored in HBase
PIC_ATTR
Tag pictures
INSERT into Trafodion table REL
(cust_id, rel_with_cust_id, rel-type, …)
BEGIN WORK
INSERT list of pictures shared into
Trafodion table SHARED_PIC
(pic_id, rel_with_cust_id)
END WORK
BEGIN WORK
INSERT custom tags for each tagged
picture into
HBase table PIC_ATTR as col-value pairs
END WORK
Upload pictures
Transactio
n
Versus RDBMS & NoSQL
• Rich ANSI SQL RDBMS
features
• Full ACID transactional
support
• Integration of structured, semistructured, & unstructured
data
8
Trafodio
n
Pictures loaded into HDFS by app
BEGIN WORK
INSERT list of pictures uploaded into
Trafodion table PIC
(cust_id, album_id, pic_id, pic_date,
…)
INSERT picture attributes from camera
into
HBase table PIC_ATTR as col-value
pairs
for each of the pictures using pic_id
END WORK
Order photo mug & jewelry
BEGIN WORK
INSERT into ORDER
(cust_id, order_no, order_date, order_total,
…)
INSERT into ORDER_DETAIL
all items that are part of the order
(cust_id, order_no, item_id, pic_id, qty, amt,
…)
END WORK
Search for pictures
Create album
INSERT into Trafodion table
ALBUM (cust_id, album_id,
album_name, …)
SELECT pictures taken with my “Sony
DSC-RX100M2” camera in the last 6
months from my “Travel” album with a tag
“Emma” on it.
Backend operational workloads
Order tracking, supply chain, inventory
control,
…
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject
to change
without notice.
We
b
app
Rohit, consider a blanket for
your granddaughter at 50%
off with her image imprinted
on it
Using model &
customer score /
attributes, and
recent purchase
history make
recommendation
s
Versus RDBMS & NoSQL
• Data captured in an open file system with open APIs
• Is available with no latency for reporting & analysis
• Via a huge open source & proprietary Hadoop ecosystem
9
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
• Items bought together
– market basket
analysis
• Promotion success
customer classification
•…
Vertic
a
Trafodio
n
50%
Analytics
Analytics in
Vertica to
generate
recommendation
model
BI reporting
Reporting &
Analytics via
Vertica
• Sales growth by
product, region,
demo
• Growth in
customers, pictures,
storage, …
• Growth in sharing
•…
Good fit for Trafodion
Generates Revenue
Touches the Customer
Finance
Government
Telecom
• Online
financial
management
• 911
Emergency
System
• Billing
systems
• Provisioning
systems
10
Low Latency,
High
Concurrency
Manufacturing
• RFID
tracking
Energy
• Smart
Metering
ACID
Protection,
Data Integrity
Helps Run the Business
Healthcare
• Authorization
and claims
processing
HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Transportation Consumer &
Retail
• Reservation
systems
• Online
shopping
Multi-Structured
Data
Comprehensive DDL, DML, TCL, security, and
•utilities
DDL: CREATE/DROP/ALTER statements for tables, views, indexes, constraints
–
•
•
•
•
11
•
Comprehensive data type support - numeric, character, varchar, date, time, interval, etc.
– Constraint enforcement - primary key, not null, check, unique, and foreign key (referential integrity)
– Java stored procedures, User Defines Functions
DML: SELECT, INSERT, UPDATE, DELETE, UPSERT and MERGE statements
– JOIN (INNER, LEFT/RIGHT/FULL OUTER), UNION, WHERE, GROUP BY, HAVING, ORDER BY,
SAMPLING, etc.
– Correlated and nested sub-queries
– Extensive SQL function and UDF support - aggregate, date/time, character, mathematical, OLAP,
etc.
Transaction Control – BEGIN WORK, COMMIT WORK, ROLLBACK WORK, SET TRANSACTION
Security – LDAP authentication/authorization, GRANT/REVOKE, roles, ANSI schemas, component
privileges
Utilities – Update Statistics, Explain, Control Query Default, Command Line Interface, Bulk Load and
Unload
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Technology Preview: JDBC T2 driver, multi-temperate data, Backup/Restore, Oracle compatibility
Leveraging HBase for scalability and
Table
Regions
Region
availability
Logical
Physical
Server Layer
F
A … …
1
F
A … …
1
F
B … …
1
F
B … …
1
F
C … …
C … …
2
F
A … …
F
C … …
1
F
2
F
A … …
2
2
F
C … …
3
F
C … …
3
F
C … …
4
F
A … …
4
F
A … …
4
F
B … …
4
F
B … …
5
F
A … …
5
F
A … …
5
F
B … …
5
F
B … …
5
F
C … …
5
F
C … …
6
F
A … …
6
F
A … …
7
F
A … …
7
F
A … …
7
F
B … …
7
F
B … …
7
F
C … …
7
F
C … …
8
F
B … …
8
F
B … …
HDFS
F
A … …
9
F
A … …
9
F
B … …
9
F
B … …
9
F
C … …
9
F
C … …
HP © Copyright 2014 Hewlett-Packard Development Company, L.P.
B C
… ….
RK Row Key
Column
CF
Family
Column
CN
Name
TS Timestamp
“F”
A, B, C
Data in different
Column Families
are stored
separately
One
version
CV Cell Value
• Regions store contiguous ranges of table
rows
HDFS
Region Server
HDFS
HDFS
Region Server
• Regions dynamically split by HBase when
they reach a configured limit i.e.
“autosharding”
• Region servers are elastically scalable
• HDFS and HBase replication provide
enhanced data availability and protection
Allows
• Fine-grained load balancing with dynamic
movement based on load
• Fast data recovery when a server or disk
fails
or is decommissioned
The information contained herein is subject to change without
notice.
HDFS
12
Client
Region Server
Region Server
9
RK A
… …
Region Server
HBase
Layout
1
Trafodion
View
Clustering
key
RK CF CN TS CV
HBase vs. Trafodion comparison
HBase
Trafodion + HBase
Data abstraction
Key and value pair
Relational schema
Physical Layout
Column family store where
row data is stored together by
cells
Same except there is a single column
family with space-saving column
encoding
Column values
Uninterpreted array of bytes
Explicitly defined and enforced data
types
ACID Guarantee
Single row atomicity
Multi- SQL statements, tables, and
rows defined as part of transaction
Language API
Get/put/delete
SQL (Trafodion invokes native HBase
API)
Row Key Index
Single (string) row key
Composite (multi-column) row key
Secondary
Indexes Not supported
Arbitrary secondary key columns
HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
13
Salting of row keys
How it works
• HBase table gets created, pre-split with one
region per salt value
• A hash value column, “_SALT_”, is added as a
prefix to the row key
• Salting is transparent to SQL statements
–
–
–
Automatically computed during insert/update
statements
Predicates automatically generated where feasible
Minimal overhead for direct lookup by key value
Benefits
• Even data distributions across HBase regions
• Avoids region hotspots caused by insertion of
data in row key order
14
INSERT(s)
SELECT(s)
HBase
HBase
HBase
HBase
Region
Region
Region
Region
PART 1
PART 2
PART 3
HDFS
HDFS
HDFS
PART 4
HDFS
CREATE TABLE t(a integer not null primary key, b
integer) SALT USING 4 PARTITIONS;
HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Trafodion – Software architecture (3 layers)
Client
User and ISV
Operational Applications
Driver
JDBC ODBC
Database Connectivity
CMP
SQL
Master
Compiler and Optimizer
Distributed
Transaction
Management
Storage
Engine
Workload Management
*Executor
....
*ESP
DTM
HBase
Native HBase
Tables KVS,
Columnar via
HBase API +
coprocessors
15
WMS
Future
HBase
Trafodion
Tables
Relational
Schema
HDFS
ESP
SQL Parallelism
Hive
Direct HDFS access
to Hive tables using
HCatalog
Server
HP © Copyright 2014Process
Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Data Store
Integration
Optimized for varying operational workloads
Client Application
• Optimized performance and efficiency
–
–
–
–
–
OLT optimization for directed keyed
access
ESPs for parallel SQL operations
Multi-layered ESP for complex plans
Data Flow pipeline parallel architecture
Reusable Masters and ESPs for
efficiency
• Cached SQL plans eliminates
recompilation
• Sophisticated Optimizer
Type 2 and Type
4 ODBC/JDBC
driver
Ethernet
Node 1
Node 2
Master
Node n
Master
ESP
ESP
ESP
ESP
ESP
ESP
ESP
ESP
ESP
ESP
–
Leverages Equal Height Histograms
– Pushes down
• Filters e.g. row selection (start-stop key)
• Coprocessors e.g. aggregates
• Multi-Dimensional Access (MDAM)
–
Secondary index access
HBase
HDFS
Filters
Coprocesso
rs
HDFS
• HPService
persistence
(via Zookeeper)
© Copyright 2014
Hewlett-Packard Development
Company, L.P. The information contained herein is subject to change without notice.
and automatic query resubmission
16
HBase
HDFS
HDFS
HBase
HDFS
In Summary
Trafodion: an Enterprise class operational SQL-on-Hadoop DBMS
HP DBMS
• Lower cost – inexpensive storage &
servers
• Elastic scalability
• Open and distributed file system (HDFS)
• Semi-structured & unstructured support
• Schema flexibility
• Automatic data repartitioning
• High availability via replication (k-safety)
• Disaster Recovery1
• Column level access control
• Column level encryption
• Space quotas1
• Vast open source & proprietary ecosystem
• Versioning snapshot support &
incremental data replication
• Cloud deployable
• Industry push for Hadoop Data Lake
1Select
distributions
17
© Copyright 2013 Hewlett-Packard Development Company, L.P.
K/V &
document
stores
Unstructured
analytics
Structured
Relational
DBMS
Trafodio
n
Operational
SQL on
Hadoop
Able to join Trafodion, HBase,
Hive tables in a single
statement
The information contained herein is subject to change without notice.
• Innovative database engines for OLTP,
ODS, and EDW (20+ years investment)
• Comprehensive ANSI SQL support
• Structured data support (schema)
• ACID transactional protection for multiple
rows, tables, statements, region updates
• Support for nested loop, merge, hash
joins
• Optimized execution plans via
incremental equal height histograms
• Efficient data flow architecture
• Grant/Revoke Security support
• UDFs for Complex Event processing
• Workload Management
• Enterprise class monitoring &
manageability
• Compound primary/secondary keys
• Encoding column names for
compaction
• Salting to eliminate I/O hotspots
Trafodion release momentum
This is a rolling Roadmap and is subject to change without notice.
2014
2015
October (v 0.9)
April (Private
Beta)
July (v 0.8) Open
Source
• HBase 0.98
January (v1.0)
• Performance
• Performance speedup
• Grant/Revoke
• Manageability –
repository, RMS, bulk
unload
• ANSI SQL DDL/DML • DTM recovery
• Bulk load
• DTM (ACID
protection)
• Stored
procedures
• ODBC/JDBC T4
• Constraints
• Intra-query
parallelism
• Hive & native HBase
integration
• LDAP
TP – Technology Preview
18
• UDF
April (v1.1)
• Transaction recovery
• ANSI schemas
• Privilege checking –
utilities, SPJ, UDF
• LOB data types
• Oracle compatibility
features
TP
TP
•
•
Backup/Restore
Sequence and
IDENTITY support
HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
TP
TP
Building an Open Source Community
Recruiting project
contributors
Share your expertise:
Developing, fixing defects, testing,
writing, translating and more!
Modern open source
environment
Source Code Management
GitHub
Build, Test, Code Review
Gerrit, zuul, jenkins
Simple installation
Meritocrac
y
www.trafodion.org
19
HP © Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Defect Tracking
Launchpad
Documentation
MediaWiki
Q&A
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Thank you
• www.Trafodion.org