Transcript Slide 1

the NewSQL database you’ll never outgrow
Taming the Big Data Fire Hose
John Hugg
Sr. Software Engineer, VoltDB
Big Data Defined
• Velocity
+ Moves at very high rates (think sensor-driven systems)
+ Valuable in its temporal, high velocity state
• Volume
+ Fast-moving data creates massive historical archives
+ Valuable for mining patterns, trends and relationships
• Variety
+ Structured (logs, business transactions)
+ Semi-structured and unstructured
VoltDB
Example Big Data Use Cases
Data source | High-frequency operations | Lower-frequency operations
Capital markets | Write/index all trades, store tick data | Show consolidated risk across traders
Call initiation request | Real-time authorization | Fraud detection/analysis
Inbound HTTP requests | Visitor logging, analysis, alerting | Traffic pattern analytics
Online game | Rank scores (defined intervals, player “bests”) | Leaderboard lookups
Real-time ad trading systems | Match form factor, placement criteria, bid/ask | Report ad performance from exhaust stream
Mobile device location sensor | Location updates, QoS, transactions | Analytics on transactions
Big Data and You
• Incoming data streams are different than traditional business apps
+ You need to write data quickly and reliably, but …
• It’s not just about high speed writes
+ You need to validate in real-time
+ You need to count and aggregate
+ You need to analyze in real-time
+ You need to scale on demand
+ You may need to transact
Big Data Management Infrastructure
[Diagram: data sources (online gaming, ad serving, sensor data, financial trade, Internet commerce, SaaS/Web 2.0, mobile platforms) arranged between a High Velocity tier and a High Volume tier (Analytic Datastore). Diagram labels: NewSQL, NoSQL, VoltDB, other OLAP data stores.]
NewSQL
+ Structured data
+ ACID guarantees
+ Relational/SQL
+ Real-time analytics
NoSQL
+ Unstructured data
+ Eventual consistency
+ Schemaless
+ KV, document
High Velocity Data Management
High Velocity DBMS Requirements
• Ingest at very high speeds and rates
• Scale easily to meet growth and demand peaks
• Support integrated fault tolerance
• Support a wide range of real-time (or “near-time”) analytics
• Integrate easily with high volume analytic datastores
High Speed Data Ingestion
• Support millions of write operations per second at scale
• Read and write latencies below 50 milliseconds
• Provide ACID-level consistency guarantees (maybe)
• Support one or more well-known application interfaces
+ SQL
+ Key/Value
+ Document
Scale to Meet Growth and Demand
• Scale-out on commodity hardware
• Built-in database partitioning
+ Manual sharding and/or add-on solutions are brittle, require apps to do “heavy lifting”, and can be an operational nightmare
• Database must automatically implement defined partitioning strategy
+ Application should “see” a single database instance
• Database should encourage scalability best practices
+ For example, replication of reference data minimizes need for multi-partition operations
A Look Inside Partitioning
select count(*) from orders where customer_id = 5 (single-partition)
select count(*) from orders where product_id = 3 (multi-partition)
insert into orders (customer_id, order_id, product_id) values (3,303,2) (single-partition)
update products set product_name = 'spork' where product_id = 3 (multi-partition)

table orders (partitioned): customer_id (partition key), order_id, product_id
table products (replicated): product_id, product_name

Partition 1
orders: (1, 101, 1), (1, 101, 1), (4, 401, 2)
products: (1, knife), (2, spoon), (3, fork)
Partition 2
orders: (2, 201, 2), (5, 501, 3), (5, 502, 2)
products: (1, knife), (2, spoon), (3, fork)
Partition 3
orders: (3, 201, 1), (6, 601, 3), (6, 601, 2)
products: (1, knife), (2, spoon), (3, fork)
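The single-partition versus multi-partition labels on this slide can be reproduced with a small routing sketch. This is only an illustration under stated assumptions: `partition_for` and `route` are hypothetical names, and the modulo placement rule is assumed because it matches the slide's layout (customers 1 and 4 on partition 1, 2 and 5 on partition 2, 3 and 6 on partition 3). A real engine may place rows differently.

```python
NUM_PARTITIONS = 3  # the slide shows three partitions


def partition_for(customer_id: int) -> int:
    """Map a partition-key value to a 1-based partition number.

    Assumption: simple modulo placement, picked only because it
    reproduces the layout drawn on the slide.
    """
    return (customer_id - 1) % NUM_PARTITIONS + 1


def route(table: str, is_write: bool, columns: dict) -> list:
    """Return the partitions a statement has to touch.

    `columns` holds the equality predicates (or inserted values) by column.
    """
    all_partitions = list(range(1, NUM_PARTITIONS + 1))
    if table == "products":
        # Replicated table: one copy can answer a read,
        # but a write has to update every copy.
        return all_partitions if is_write else [all_partitions[0]]
    if table == "orders" and "customer_id" in columns:
        # The partition key is pinned, so one partition is enough.
        return [partition_for(columns["customer_id"])]
    # Partitioned table without the partition key: ask every partition.
    return all_partitions


# The four statements from the slide.
print(route("orders", False, {"customer_id": 5}))
print(route("orders", False, {"product_id": 3}))
print(route("orders", True, {"customer_id": 3, "order_id": 303, "product_id": 2}))
print(route("products", True, {"product_id": 3}))
```

The point of the sketch is the design choice behind the slide: statements that pin the partition key stay on one partition, while anything else (including writes to replicated reference data) has to be coordinated across all of them.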
Integrated Fault Tolerance
• Database should transparently support built-in “Tandem-style” HA
+ Users should be able to easily increase/decrease fault tolerance levels
• Database should be easily and quickly recoverable in the event of severe hardware failures
• Database should be able to automatically detect and manage a variety of partition fault conditions
• Downed nodes should be “rejoinable” without the need for service windows
Partition Detection & Recovery
Network fault protection
• Detects partition event
• Determines which side of fault to disable
• Snapshots and disables orphaned node(s)
Live node rejoin
• Allows “downed” nodes to rejoin live cluster
• Automatically re-synchs all node data
• Coordinates transactions during re-synch
[Diagrams: three-node cluster with Server A, Server B and Server C]
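The step "determines which side of fault to disable" can be sketched as follows. The slides do not state the decision rule, so the policy here is an assumption: the side with a strict majority of nodes survives, and an even split is broken by the lexicographically smallest node name so that both sides reach the same verdict independently. The function names are hypothetical.

```python
def side_to_keep(cluster: set, side_a: set, side_b: set) -> set:
    """Pick the surviving side after a network fault splits the cluster.

    Assumed policy (not taken from the slides): strict majority wins;
    on a tie, the side holding the smallest node name wins.
    """
    if side_a | side_b != cluster or side_a & side_b:
        raise ValueError("sides must split the cluster into two disjoint groups")
    if len(side_a) != len(side_b):
        return side_a if len(side_a) > len(side_b) else side_b
    tiebreaker = min(cluster)
    return side_a if tiebreaker in side_a else side_b


def handle_partition(cluster: set, side_a: set, side_b: set) -> tuple:
    """Return (surviving nodes, orphaned nodes).

    The orphaned nodes are the ones the slide says to snapshot and disable.
    """
    keep = side_to_keep(cluster, side_a, side_b)
    return keep, cluster - keep


cluster = {"Server A", "Server B", "Server C"}
keep, orphaned = handle_partition(cluster, {"Server A", "Server C"}, {"Server B"})
print(sorted(keep), sorted(orphaned))
```

A deterministic tiebreak matters because the two sides cannot talk to each other during the fault; each must be able to work out on its own whether it is the side that should shut down.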
Real-time Analytics
• Database should support a wide variety of high performance reads
+ High-frequency single-partition
+ Lower-frequency multi-partition
• Common analytic queries should be optimized in the database
+ Multi-partition aggregations, limits, etc.
• Database should accommodate a flexible range of relational data operations
+ Particularly relevant to structured data
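One way to picture "multi-partition aggregations, limits" is a scatter-gather pattern: each partition computes a small partial result, and a coordinator merges the partials. The sketch below assumes that pattern; the sample rows and function names are invented for illustration and do not come from the slides.

```python
import heapq

# Hypothetical per-partition rows of (customer_id, order_id, product_id).
PARTITIONS = {
    1: [(1, 101, 1), (4, 401, 2)],
    2: [(2, 201, 2), (5, 501, 3)],
    3: [(3, 301, 1), (6, 601, 3)],
}


def count_where_product(product_id: int) -> int:
    """count(*) ... where product_id = ?: sum one partial count per partition."""
    partials = [
        sum(1 for _, _, p in rows if p == product_id)
        for rows in PARTITIONS.values()
    ]
    return sum(partials)


def top_orders(limit: int) -> list:
    """order by order_id desc limit N.

    Each partition returns at most N rows, so the coordinator merges
    N rows per partition instead of scanning every row.
    """
    local = [
        heapq.nlargest(limit, rows, key=lambda r: r[1])
        for rows in PARTITIONS.values()
    ]
    merged = (row for part in local for row in part)
    return heapq.nlargest(limit, merged, key=lambda r: r[1])


print(count_where_product(3))
print(top_orders(2))
```

Pushing the aggregate or the limit down to each partition keeps the amount of data shipped to the coordinator small, which is what makes these lower-frequency multi-partition reads affordable.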
Integration with Analytic Datastores
• Database should offer high performance, transactional export
• Export should allow a wide variety of common data enrichment operations
+ Normalize and de-normalize
+ De-duplicate
+ Aggregate
• Architecture should support loosely-coupled integrations
+ Impedance mismatches
+ Durability
VoltDB Export Data Flow
[Diagram label: High Velocity Database Cluster]
• Loosely-coupled, asynchronous
• Queue must be durable
• Bi-directional durability
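The bullets above describe a durable queue sitting between the database and the export target. The sketch below is one possible reading, not VoltDB's export implementation: the producer appends to a log and syncs it before returning, and the consumer records an acknowledgement only after it has safely stored the records downstream. The class name, file names and method names are all hypothetical.

```python
import json
import os


class DurableQueue:
    """Append-only, file-backed queue (illustrative sketch only)."""

    def __init__(self, directory: str):
        os.makedirs(directory, exist_ok=True)
        self._log_path = os.path.join(directory, "export.log")
        self._ack_path = os.path.join(directory, "export.ack")
        open(self._log_path, "a").close()  # make sure the log file exists

    def append(self, record: dict) -> None:
        """Producer side: the record is on disk before this returns."""
        with open(self._log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def _acked(self) -> int:
        """Number of records the consumer has already confirmed."""
        try:
            with open(self._ack_path, encoding="utf-8") as f:
                return int(f.read().strip() or 0)
        except FileNotFoundError:
            return 0

    def pending(self) -> list:
        """Consumer side: every record not yet acknowledged."""
        with open(self._log_path, encoding="utf-8") as log:
            records = [json.loads(line) for line in log if line.strip()]
        return records[self._acked():]

    def ack(self, count: int) -> None:
        """Consumer confirms `count` more records are stored downstream."""
        new_total = self._acked() + count
        tmp_path = self._ack_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            f.write(str(new_total))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self._ack_path)  # atomic swap of the ack marker
```

Because both the log and the acknowledgement marker survive a restart, either side can crash and resume without losing records, and the producer never has to wait for the consumer, which is the loosely-coupled, asynchronous property the slide asks for. The trade-off in this sketch is that a consumer crash between storing and acknowledging leads to redelivery, so the downstream side would need to tolerate duplicates.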
Summary
• Big Data infrastructures will usually require more than one engine
+ High velocity engine for “fast” data
+ Analytic engine for “deep” data
• Data characteristics will often determine which high velocity engine to use
+ NewSQL is often well-suited to structured data
+ NoSQL is often a good fit for unstructured data
• Choose solutions that suit your needs and are designed for interoperability