www4.ncsu.edu

Download Report

Transcript www4.ncsu.edu

Computer Science
Integrity Assurance for Outsourced Databases
without DBMS Modification
DBSec 2014
Wei Wei, Ting Yu
1
Overview
 Motivation
o Database outsourcing is a cost-effective solution
o Integrity for Outsourced Databases
 It has been an active research area in decades
 Existing solutions requires modifying DBMSs
 No existing cloud database services support integrity checking
 Our Focus
o Provide integrity assurance for outsourced databases without DBMS
modification
 Basic Idea
o Build a Merkle Hash Tree based Authenticated Data Structure per table
o De-serialize authentication data into tables with well-designed format
 Support highly efficient authentication data retrieval
Computer Science
2
Database Outsourcing Model
•Data Owner
id
col1
…
coln
0
Alice
…
1000
10
Ben
…
2000
…
…
…
…
70
Smith
…
4500
•Update Data and
Authentication Data
•Clients
•Send Data Queries and
Authentication Data Queries
•Upload Data and
Authentication Data
•Database Service
Provider (DSP)
•Query Results Including Data
and Authentication Data
Computer Science
3
System Model
 Assumptions
o DSPs are not fully trusted by data owners and clients
o The data owner has a public/private key pair, and public key is known
to all
o The data owner is the only party who can update data
o Public communications are through a secure channel
 Attacks from a DSP
o Return incorrect data by tampering some data
o Return incomplete data result by discarding some data
o Report that data doesn’t exist or return old data
Computer Science
4
System Model cont’d
 Goal
o Provide integrity assurance for outsourced databases without DBMS
modification
 Design Goals
o Security (Integrity)
 Correctness, completeness, freshness
o Practicability
 Simplicity, flexibility, efficiency
Computer Science
5
Running Example
id
col1
col2
col3
col4
…
coln
0
Alice
F
20
NC
…
1000
10
Ben
M
30
NY
…
2000
20
Cary
F
42
CA
…
1500
30
Lisa
F
15
CA
…
3000
40
Kate
F
18
NY
…
2300
50
Mike
M
24
SC
…
4000
60
Nancy
F
36
VA
…
2300
70
Smith
M
12
TA
…
4500
•A Relational Data Table
Computer Science
6
System Design




Authentication Data Structure
Identify Authentication Data
Store Authentication Data
Extract Authentication Data
Computer Science
7
Authentication Data Structure
 Signature Aggregation based ADS
 Merkle Hash Tree based ADS
•…
id
col1
…
coln
0
Alice
…
1000
10
Ben
…
2000
20
Cary
…
1500
30
Lisa
…
3000
40
Kate
…
2300
50
Mike
…
4000
60
Nancy
…
2300
70
Smith
…
4500
•ki
•pi
•hi=H(h1|…|hf)
•20
•10
•0
•…
•40
•30
•10
•20
•50
•30
40
•60
50
60
70
•Merkle B-tree
•Data Table
Computer Science
8
Identify Authentication Data
 Existing Approaches
o Adjacency list
 Multiple steps to find ancestor, no order of pointers or records in a node
o Path enumeration
 No order of pointers or records in a node, inefficient string operation
o Nested set
 Require joining two tables to find parent node, hard to find siblings
o Closure table
 Consume more storage, no order of pointers or records in a node
 Our Approach
o Radix-Path Identifier
 Combine Radix-based labeling and Dewey labeling
Computer Science
9
Identify Authentication Data
0
1
2
20
00
01
10
10
0
000
11
20
30
10
010
40
20
100
50
30
110
21
40
200
22
60
50
210
60
220
70
221
•Radix-Path Identifier
Computer Science
10
Identify Authentication Data
0
1
20
0
1
4
10
0
0
40
5
8
9
30
10
4
2
20
16
50
30
20
40
32
10
60
50
36
60
40
70
41
•Radix-Path Identifier (rb = 4)
Computer Science
11
Identify Authentication Data
 Radix-Path Identifier Properties
1.
2.
3.
4.
Identifiers in a node are continuous, but not continuous between
sibling nodes
=
Min =
Max =
Easy to find the index in a node, which is
Computer Science
12
Store Authentication Data
 SAT: Single Authentication Table
data_auth (max level - 2)
id
rpid
hash
level
id
rpid
hash
level
-1
0
TvJtus
2
60
10
Udew
1
20
1
asdwS
2
0
0
nudg
0
40
2
DFsQ
2
10
4
Q9ej
0
-1
0
Kjdaw
1
20
16
wVi2
0
10
1
Ujrw
1
30
20
kidDs
0
-1
4
JHds
1
40
32
Kdie*
0
30
5
iueDs
1
50
36
8dFes
0
-1
8
Jdiw.
1
60
40
Iurw
0
50
9
.dkaw
1
70
41
KJdw
0
Computer Science
13
Store Authentication Data
 SAT: Single Authentication Table
o Pros
 Simple and straightforward
o Cons
 Index is built based on all records (inefficient queries)
 Updates could be inefficient
 Concurrent updates may cost more resources
Computer Science
14
Store Authentication Data
 LBAT: Level-based Authentication Table
•data_auth0 (Level 2) •data_auth1 (Level 1)
•data (Level 0)
id
rpid
hash
id
rpid
hash
id
col1
…
coln
rpid
hash
-1
0
TvJtus
-1
0
Kjdaw
0
Alice
…
1000
0
nudg
20
1
asdwS
10
1
Ujrw
10
Ben
…
2000
4
Q9ej
40
2
DFsQ
-1
4
JHds
20
Cary
…
1500
16
wVi2
30
5
iueDs
30
Lisa
…
3000
20
kidDs
-1
8
Jdiw.
40
Kate
…
2300
32
Kdie*
50
9
.dkaw
50
Mike
…
4000
36
8dFes
60
10
Udew
60
Nancy
…
2300
40
Iurw
70
Smith
…
4500
41
KJdw
•data_mapping
level
table
2
data_auth0
1
data_auth1
0
data
Computer Science
15
Store Authentication Data
 LBAT: Level-based Authentication Table
o Pros
 Indexes are more efficient
 Updates could be more efficient (root split)
 Enable concurrent updates with table level lock
o Cons
 Multiple authentication tables
 Relatively complicated for authentication data generation
Computer Science
16
Extract Authentication Data
•Retrieval of Authentication Data
20
10
0
40
30
10
20
50
30
40
60
50
60
70
•Authentication Data for 50
Computer Science
17
Extract Authentication Data
 SingleJoin
•select l1.rpid,l1.hash from data t0
data l1 on l1.rpid/4
= t0.rpid/(4)
•left join data_auth0
data_auth1
l1 on l1.rpid/4
= t0.rpid/(4*4*4)
t0.rpid/(4*4)
•where t0.id=50;
•data_auth0 (Level 0) •data_auth1 (Level 1)
•data (Level 2)
id
rpid
hash
id
rpid
hash
id
col1
…
coln
rpid
hash
-1
0
TvJtus
-1
0
Kjdaw
0
Alice
…
1000
0
nudg
20
1
asdwS
10
1
Ujrw
10
Ben
…
2000
4
Q9ej
40
2
DFsQ
-1
4
JHds
20
Cary
…
1500
16
wVi2
30
5
iueDs
30
Lisa
…
3000
20
kidDs
-1
8
Jdiw.
40
Kate
…
2300
32
Kdie*
50
9
.dkaw
50
Mike
…
4000
36
8dFes
60
10
Udew
60
Nancy
…
2300
40
Iurw
70
Smith
…
4500
41
KJdw
Computer Science
18
Extract Authentication Data
 RangeCondition
•-- find the rpid of the data record with the id 50
•declare @rowrpid AS int;
•set @rowrpid=(select top 1 rpid from data where id=50);
•-- level 2, 1, 0 (from leaf level to root level)
•select rpid,hash from data
•where rpid>=(@rowrpid/(4))*4 and rpid<(@rowrpid/(4))*4+4;
•select rpid,hash from data_auth1
•where rpid>=(@rowrpid/(4*4))*4 and rpid<(@rowrpid/(4*4))*4+4;
•select rpid,hash from data_auth0
•where rpid>=(@rowrpid/(4*4*4))*4 and rpid<(@rowrpid/(4*4*4))*4+4;
Computer Science
19
Extract Authentication Data
 SingleJoin
 RangeCondition
Computer Science
20
Data Operations
 Select
o Unique Select
o Range Select
 Update
o Single Record Update
o Batch Update and Optimization
 Insert
o Single Record Insert
o Batch Insert and Optimization
Computer Science
21
Range Select
 Steps
o Find two boundaries
o Retrieve authentication data for two boundaries
20
10
40
30
10
20
•Left Boundary
50
30
•Query Range
40
60
50
•Right Boundary
•Retrieve Authentication Data for Range Select from 15 to 45
Computer Science
22
Range Select
 Steps
o Find two boundaries
o Retrieve authentication data for two boundaries
•Retrieve Authentication Data for Range Select from 15 to 45
Computer Science
23
Single Record Update
 Steps
20
o Retrieve authentication data
o Generate update statements
40
20
40
o h updates, (h – tree height)
30
o Execute updates in one transaction
30
20
VO for 20
20
VO update for 20
•Update 20
Computer Science
24
Batch Update and Optimization
 Update x records
o x * h update statements
o Some of them update the same authentication data record
20
10
0
40
30
10
20
50
30
40
60
50
60
70
•# of updates is 4 * 3 = 12
•Actually, 8 updates are necessary
Computer Science
25
Batch Update and Optimization
 Optimization – MergeUpdate
o Track all updates to each table
o Find the set of updates to one authentication data record
o Keep the latest one and remove others
Computer Science
26
Experimental Evaluation
 System Implementation
o
o
o
o
o
Implementation based on .NET and SQL Server
Merkle B-tree based on .NET
MultiJoin, SingleJoin, ZeroJoin and RangeCondition
Query rewrite algorithm
A tool to generate authentication data
 Experiment Setup
o
o
o
o
A synthetic database containing one table with 100,000 records
Each record is about 1KB
.NET3.5, SQL Server 2008 R2, Windows OS
Client network with 30Mbps download and 4Mbps upload
Computer Science
27
Experiments – Unique Select
 Setup
o Run a select statement based on the primary key to retrieve one data
record
o Compute the overhead of our scheme when using different
authentication retrieval methods
 Results
1.
2.
RangeCondition is much more efficient
than others
The overhead of our scheme could be as
low as 5%
Computer Science
28
Experiments – Update
 Setup
o Execute batch updates for different number of rows based on different
cases
 D – Direct Update, C – Cached Update, RC – RangeCondition, MU – MergeUpdate
o Compute the overhead of our scheme
 Results
1. The overhead of Direct Update could go
up to 200%
2. The overhead of C-RC ranges from 3%
to 30%
3. The overhead of C-RC-MU is about 3%
4. The MergeUpdate effectively reduce the
number of update statements
Computer Science
29
Experiments – Scalability
 Setup
o Run a range query to select 512 rows as # of rows changes in the data
table
o Record time spent to complete the select query
 Results
1. The overhead of our scheme is about 2 ~
3% in all cases
Computer Science
30
Experiments – Comparison
 Setup
o Run queries to retrieve authentication data as # of rows in table
increases for three schemes: our scheme, OPEN-XML and DT-XML
o Record time spent to retrieve authentication data
 Results
1. Our scheme takes about 250ms in all
cases
2. Our scheme is about 100 times faster
than two XML-based schemes
Computer Science
31
Related Work
 Authentication Data Structure
o Verifiable B-tree [Pang et al. ICDE 2004]
o Embedded Merkle B-tree [Hakan et al. SIGMOD 2006]
o Signature aggregation and chaining [Narasimha et al. DASFFA 2006]
 Others
o
o
o
o
Protect data privacy [Hakan et al. SIGMOD 2002]
Only handle read-only queries [Sion et al.VLDB 2005]
Authenticated join processing [Yin et al.SIGMOD 2009]
Partially materialized digest scheme [Mouratidis et al.VLDB 2009]
 Similar to our work
o Probabilistic approaches [Xie et al.VLDB 2007]
o Tamper-Evident Database System [Gerome et al. ASIAN 2005]
o Authenticated Relational Tables [Giuseppe et al. DBSec 2007]
Computer Science
32
Conclusion
 Contributions
o Proposed a novel approach called Radix-Path Identifier to serialize/deserialize MHT-based authentication data
o Explored the efficiency of different methods to retrieve authentication
data, and optimize the update processing
o Build a proof-of-concept prototype and conduct extensive experiments
to evaluate the performance overhead and efficiency
Computer Science
33
•Questions?
•Thank you
Computer Science
34