Tolerating Byzantine Faults in Transaction Processing
Download
Report
Transcript Tolerating Byzantine Faults in Transaction Processing
Tolerating Byzantine Faults
in Database Systems
using Commit Barrier Scheduling
Ben Vandiver,
Hari Balakrishnan, Barbara Liskov,
and Sam Madden
CSAIL, MIT
Sponsors: Quanta Computer Inc, NSF
Non-crash faults in Databases
Over 50% of reported bugs were non-
crash faults
Bug Category
DB2
2/03-8/06
Oracle
7/06-11/06
MySQL
8/06-11/06
DBMS Crash
120
21
60
Non-Crash Faults
131
28
64
Incorrect answers, data or index
corruption, etc.
Previous focus on fail-stop faults
Better model: Byzantine faults
Failure Independence
Heterogeneous replicas
Different implementations / versions
Easiest with non-invasive solution
Requires standard interface
SQL is moderately standard
Client Interaction
Organized into Transactions
Query, Query, …, Commit / Rollback
Interactive
Strong consistency
Single-copy serializable
Database Functionality
Each Database provides
Serializable isolation
Strict (rigorous) 2-phase locking
Databases don’t execute in issue-order
Issue
S1
S2
Replica 1 S1
executes
S2
Replica 2 S2
executes
S1
Limited control over execution order
Replica Coordination
BFT well known solution
3f+1 replicas
Globally order client requests
Replicas execute in order
Exhibits no concurrency
Goal: mechanism to extract
concurrency in database context
Architecture
Client
Client
Client
Shepherd
SQL
DB1
SQL
DB2
SQL
DB3
Architecture
Client
Client
Client
Result
SQL
Vote
Shepherd
Result
SQL
Result
SQL
?
SQL
DB1
DB2
DB3
Need
f+1
matching
votes
How to extract concurrency?
Just issue statements to replicas
Likely to get stuck
Solution: pre-determine which
statements conflict
Inspecting SQL is very hard
Commit Barrier Scheduling
Primary / Secondary Scheme
Run transactions first on the primary
Duplicate primary’s ordering on the
secondaries
Works best when primary is Sufficiently
Blocking
Required for performance, not correctness
Commit Barrier Scheduling
Client
Client
DB
Primary
DB
?
SQL
SQL
Shepherd
Result
Result
SQL
Client
DB
Correct Execution
Statement Ordering Rule
Execute statements of transaction in order
Commit Ordering Rule
All replicas commit transactions in the
same order
Order determined by Shepherd
Execution Trace on Primary
T1
T2
SX
SY
C
SZ C
Time
Extracting Conflict Info
T1
T2
SX
C
SY
Don’t Conflict!
SZ C
Avoiding Conflicts
T1
T2
SX
SY
C
SZ C
Might Conflict!
Transaction-Ordering Rule: A query from transaction T2
that was executed by the primary after the COMMIT of
transaction T1 can be sent to a secondary only after it
has processed all queries of T1.
Commit Barrier Scheduling
Maintain barrier for each replica
Mark statements and transactions with
barriers
Issue statements and commits when
replica’s barrier reaches appropriate
value
Simple to implement
Analysis of CBS:
Non-faulty primary
Full concurrency on the Primary
Deadlocks detected and resolved locally
Ample concurrency on Secondaries
allows many statements to run in parallel
Secondaries hardly ever block
Latency increase
Early Return
Client
Client
Client
Next SQL Stmt
Shepherd
SQL
DB
Primary
SQL
Result
Pipelined
Execution!
DB
DB
Early Return Analysis
Cut latency in half
Must vote at Commit
Sent wrong answer, abort the transaction
Correctness Condition
Clients receive correct answers for all
transactions that commit
Masking Faults
Faulty Secondary not a problem
Voting resolves wrong answers
Faulty Primary is a problem
Generates invalid schedule
Goal: correct execution
Faulty Primary Scenario
T1 , T2 – Increment A by 1, return A
A initially 0, should end up 2
Faulty Primary Replica R1 Replica R2
T1: A = 1
T1: A = 1
T1: waiting
T2: A = 1
T2: waiting T2: A = 1
f+1 matching votes for both answers!
Other Issues
Mechanics
Replica Repair
Shepherd crashes
Heterogeneity & SQL
Implementation
Prototype called HRDB
Implemented in Java
About 3500 semicolon-lines of code
JDBC interface to clients and databases
Works with MySQL, DB2, Derby, and
SQLServer
Performance
Transactions per second
Overhead test using TPC-C
300
MySQL
PassThrough
CBS
Serial
250
200
17%
150
100
50
0
0
10
20
30
40
Clients
50
60
70
80
Heterogeneous Replication
Ran 2f+1=3 replica system,
heterogeneous vendors
MySQL, DB2, Commerical DB X
Sufficiently Blocking holds in practice
System runs at slowest of f+1 fastest
replicas, or primary
Fail-Stop Faults
2,500
Transactions Completed
Whole HRDB System
Faulty Replica
2,000
1,500
1,000
Replica Recovered
500
Replica Stops
Replica Restarts
0
50
70
90
110
Time in Seconds
130
150
Bugs and HRDB
Successfully masked bugs
Heterogeneous vendors & heterogeneous
versions
Found a new bug in MySQL
While running TPC-C
Present since October 2001
Patched in recent release
Starting to look for bugs actively with
HRDB
Conclusion
First practical Byzantine Fault Tolerant
Database
Failure independence by supporting
heterogeneous replicas
Novel concurrency extraction scheme
Tool for finding new bugs in databases
Backup Slides
Snapshot Isolation
Allows read-after-write hazards
Converts fail-stop to Byzantine faults
Need write-sets to implement
Scheme called Snapshot Barrier
Scheduling
Implement with Barriers
B=0
T1
SW
T2
T3
B=1
B=3
C
SX
SJ
B=2
SY
SK
SZ
C
C
Primary
• S – Annotate with current barrier upon completion
• C – Increment barrier before issue
Secondary
• S – Issue when replica barrier is at least the value of the annotation
• C – Increment replica barrier after completion
Heterogeneity Issues
Non-determinism in answers
Result set ordering
Non-deterministic functions in queries
Database-assigned row IDs
Query Rewriting
SQL incompatibility
Translation Engine
SQL hiding – Views and Stored Procedures
Future Work
Replicating the Shepherd
Efficient Replica Repair
Finding Bugs
Replica Recovery
Replicas
Fail-stop crashes – Shepherd replays missing
transactions
Uses transaction log table in database to discover
which transactions to replay
Byzantine faults – Shepherd repairs faulty
state, then replays
Efficient repair mechanism under development
Shepherd
Fail-stop crashes - Maintains a write-ahead log
Faulty Primary
Wrong answers result in transaction abort
Concurrency Faults
Can result in secondaries being unable to make
progress
System is back to “Correct but Slow” solution
Same case as when primary is not sufficiently
blocking
Can be hard to tell if primary is faulty
Replace primary by doing a view change