Tolerating Byzantine Faults in Transaction Processing

Download Report

Transcript Tolerating Byzantine Faults in Transaction Processing

Tolerating Byzantine Faults
in Database Systems
using Commit Barrier Scheduling
Ben Vandiver,
Hari Balakrishnan, Barbara Liskov,
and Sam Madden
CSAIL, MIT
Sponsors: Quanta Computer Inc, NSF
Non-crash faults in Databases
 Over 50% of reported bugs were non-
crash faults
Bug Category
DB2
2/03-8/06
Oracle
7/06-11/06
MySQL
8/06-11/06
DBMS Crash
120
21
60
Non-Crash Faults
131
28
64
 Incorrect answers, data or index
corruption, etc.
 Previous focus on fail-stop faults
 Better model: Byzantine faults
Failure Independence
 Heterogeneous replicas
 Different implementations / versions
 Easiest with non-invasive solution
 Requires standard interface
 SQL is moderately standard
Client Interaction
 Organized into Transactions
 Query, Query, …, Commit / Rollback
 Interactive
 Strong consistency
 Single-copy serializable
Database Functionality
 Each Database provides
 Serializable isolation
 Strict (rigorous) 2-phase locking
 Databases don’t execute in issue-order
Issue
S1
S2
Replica 1 S1
executes
S2
Replica 2 S2
executes
S1
 Limited control over execution order
Replica Coordination
 BFT well known solution
 3f+1 replicas
 Globally order client requests
 Replicas execute in order
 Exhibits no concurrency
 Goal: mechanism to extract
concurrency in database context
Architecture
Client
Client
Client
Shepherd
SQL
DB1
SQL
DB2
SQL
DB3
Architecture
Client
Client
Client
Result
SQL
Vote
Shepherd
Result
SQL
Result
SQL
?
SQL
DB1
DB2
DB3
Need
f+1
matching
votes
How to extract concurrency?
 Just issue statements to replicas
 Likely to get stuck
 Solution: pre-determine which
statements conflict
 Inspecting SQL is very hard
Commit Barrier Scheduling
 Primary / Secondary Scheme
 Run transactions first on the primary
 Duplicate primary’s ordering on the
secondaries
 Works best when primary is Sufficiently
Blocking
 Required for performance, not correctness
Commit Barrier Scheduling
Client
Client
DB
Primary
DB
?
SQL
SQL
Shepherd
Result
Result
SQL
Client
DB
Correct Execution
 Statement Ordering Rule
 Execute statements of transaction in order
 Commit Ordering Rule
 All replicas commit transactions in the
same order
 Order determined by Shepherd
Execution Trace on Primary
T1
T2
SX
SY
C
SZ C
Time
Extracting Conflict Info
T1
T2
SX
C
SY
Don’t Conflict!
SZ C
Avoiding Conflicts
T1
T2
SX
SY
C
SZ C
Might Conflict!
Transaction-Ordering Rule: A query from transaction T2
that was executed by the primary after the COMMIT of
transaction T1 can be sent to a secondary only after it
has processed all queries of T1.
Commit Barrier Scheduling
 Maintain barrier for each replica
 Mark statements and transactions with
barriers
 Issue statements and commits when
replica’s barrier reaches appropriate
value
 Simple to implement
Analysis of CBS:
Non-faulty primary
 Full concurrency on the Primary
 Deadlocks detected and resolved locally
 Ample concurrency on Secondaries
 allows many statements to run in parallel
 Secondaries hardly ever block
 Latency increase
Early Return
Client
Client
Client
Next SQL Stmt
Shepherd
SQL
DB
Primary
SQL
Result
Pipelined
Execution!
DB
DB
Early Return Analysis
 Cut latency in half
 Must vote at Commit
 Sent wrong answer, abort the transaction
 Correctness Condition
 Clients receive correct answers for all
transactions that commit
Masking Faults
 Faulty Secondary not a problem
 Voting resolves wrong answers
 Faulty Primary is a problem
 Generates invalid schedule
 Goal: correct execution
Faulty Primary Scenario
T1 , T2 – Increment A by 1, return A
A initially 0, should end up 2
Faulty Primary Replica R1 Replica R2
T1: A = 1
T1: A = 1
T1: waiting
T2: A = 1
T2: waiting T2: A = 1
f+1 matching votes for both answers!
Other Issues
 Mechanics
 Replica Repair
 Shepherd crashes
 Heterogeneity & SQL
Implementation
 Prototype called HRDB
 Implemented in Java
 About 3500 semicolon-lines of code
 JDBC interface to clients and databases
 Works with MySQL, DB2, Derby, and
SQLServer
Performance
Transactions per second
Overhead test using TPC-C
300
MySQL
PassThrough
CBS
Serial
250
200
17%
150
100
50
0
0
10
20
30
40
Clients
50
60
70
80
Heterogeneous Replication
 Ran 2f+1=3 replica system,
heterogeneous vendors
 MySQL, DB2, Commerical DB X
 Sufficiently Blocking holds in practice
 System runs at slowest of f+1 fastest
replicas, or primary
Fail-Stop Faults
2,500
Transactions Completed
Whole HRDB System
Faulty Replica
2,000
1,500
1,000
Replica Recovered
500
Replica Stops
Replica Restarts
0
50
70
90
110
Time in Seconds
130
150
Bugs and HRDB
 Successfully masked bugs
 Heterogeneous vendors & heterogeneous
versions
 Found a new bug in MySQL
 While running TPC-C
 Present since October 2001
 Patched in recent release
 Starting to look for bugs actively with
HRDB
Conclusion
 First practical Byzantine Fault Tolerant
Database
 Failure independence by supporting
heterogeneous replicas
 Novel concurrency extraction scheme
 Tool for finding new bugs in databases
Backup Slides
Snapshot Isolation
 Allows read-after-write hazards
 Converts fail-stop to Byzantine faults
 Need write-sets to implement
 Scheme called Snapshot Barrier
Scheduling
Implement with Barriers
B=0
T1
SW
T2
T3
B=1
B=3
C
SX
SJ
B=2
SY
SK
SZ
C
C
Primary
• S – Annotate with current barrier upon completion
• C – Increment barrier before issue
Secondary
• S – Issue when replica barrier is at least the value of the annotation
• C – Increment replica barrier after completion
Heterogeneity Issues
 Non-determinism in answers
 Result set ordering
 Non-deterministic functions in queries
 Database-assigned row IDs
 Query Rewriting
 SQL incompatibility
 Translation Engine
 SQL hiding – Views and Stored Procedures
Future Work
 Replicating the Shepherd
 Efficient Replica Repair
 Finding Bugs
Replica Recovery
 Replicas
 Fail-stop crashes – Shepherd replays missing
transactions
 Uses transaction log table in database to discover
which transactions to replay
 Byzantine faults – Shepherd repairs faulty
state, then replays
 Efficient repair mechanism under development
 Shepherd
 Fail-stop crashes - Maintains a write-ahead log
Faulty Primary
 Wrong answers result in transaction abort
 Concurrency Faults
 Can result in secondaries being unable to make
progress
 System is back to “Correct but Slow” solution
 Same case as when primary is not sufficiently
blocking
 Can be hard to tell if primary is faulty
 Replace primary by doing a view change