slides17 - University of California, Irvine

Download Report

Transcript slides17 - University of California, Irvine

ICS 214B: Transaction Processing and Distributed Data
Management
Lecture 17: Providing Database as a Service
Professor Chen Li
Based on slides developed by
Hakan Hacigumus, Bala Iyer, and Sharad Mehrotra
ICDE 2002, San Jose, CA, USA
Talk Outline

Software as a Service

Database as a Service


Challenges for Database as a Service



User Interface Issues
Performance Issues
Data Privacy Issues


NetDB2 System
Data Encryption in DBMSs for Data Privacy
Conclusion
ICS214B
Notes 17
2
Software as a Service

Get …



Pay …


what you need
when you need
what you use
Don’t worry …

how to deploy, implement, maintain, upgrade
ICS214B
Notes 17
3
Software as a Service

Driving forces to paradigm shift

Faster, cheaper, more accessible networks
Rise of distributed architectures
Virtualization in server and storage technologies
Established e-business infrastructures

Hardware/Software is not the largest in total cost of ownership







User Operations
Technical Support
Capital Cost (HW/SW)
46%
24%
21%
(Source: Gartner Group)
Hardware, software, network costs have been decreasing more sharply
than personnel cost
ICS214B
Notes 17
4
Software as a Service

Already in the market as


storage services, disaster recovery services, e-mail services, renta-spreadsheet services etc.
Sun ONE, Oracle Online Services, Microsoft .NET My Services etc.
Why not Database as a Service ?
ICS214B
Notes 17
5
Database as a Service - Why?


Organizations need data management
DBMSs are complex systems to deploy, setup, maintain

requires highly skilled people (DBAs etc.) with high cost
Most Significant DB Execution Problems
Ease of Administration
58%
Qualified Administrators
57%
Compatibility
51%
Qualified Programmers
51%
Platform Independence
40%
0
ICS214B
10
20
30
40
50
60
70
% of respondents (Source: InfoWeek Research)
Notes 17
6
Database as a Service - Offerings

Inherits all advantages of software as a service, plus …

Service provider allows mechanisms to


DB management transferred to service provider for


create, store, access databases
backup, administration, restoration, space management, upgrades
Clients use the services providers HW, SW, personnel instead of
their own
ICS214B
Notes 17
7
NetDB2 - Database Service Provision


Developed in collaboration with University of California, Irvine
and IBM
Deployed on the Internet over a year ago


Been used by 15 universities and more than 2500 students to help
teaching database classes
Currently offered through IBM Scholars Program
ICS214B
Notes 17
8
NetDB2 System Architecture
Internet
User
HTTP Server Servlet Engine
(Web Browser)
Database
(User Data)
Warm Standby
Standby System





Backup/Recovery
Three tier architecture
Client - as thin as possible - just a browser
Java based implementation
Backed by fail-over solutions
Allows expansions and user driven integration for application development
ICS214B
Notes 17
9
Database as a Service - Issues
Issues to address:

User Interface
Performance

Data Privacy

ICS214B
Notes 17
10
User Interface
2

Simple yet powerful

1
4

Consistent


3
Notes 17
Region-based
composition
Expansion/Integration

ICS214B
supports SQL queries,
scripts, UDFs, stored
procedures, metadata,
data upload
User defined interfaces
11
Performance


Interaction in a different medium - network
Performance should -at least- match what we have already
Experimented with TPC-H database and queries
Performance DB2 vs NetDB2
Performance Ratio

1.4
1.2
1
0.8
0.6
0.4
0.2
0
1.2
1.01
DB2
NetDB2
0.1
ICS214B
1.08
1
Scale Factor
Notes 17
10
12
Data Privacy

Users give control of their data to service provider
Attacks on stored data is a well known problem
So, they need data security in place

Security of data over the network is well studied




SSL,TSL
Establish security for stored data

even it is stolen should not make sense
ICS214B
Notes 17
 Encryption !
13
Encryption Alternatives

myEncryption{
asdas dadsasd asdd;
asdas dad;
asfsfffsddas dadsasd asdd;
asdas dad asdd;
asdas dadsasd asdd;
asdadsasd asdd;
}
Implementation Level

Software v.s. Hardware encryption
?

Granularity of Data



Field (Attribute) level
Row (Record) level
(Disk) Page level
ID
ID
NAME
NAME
DEPTID
DEPTID
SALARY
SALARY
$Sfsdf@_))#$dw?~$@33<?.%*##!@<<&&=+
20 Fg4$$xX@<+John White
White
40000
20
John
22
40000
<?.%d(*##!@%&*((@
<<&&=+sFDdw?~$@33
41 %25>LWe?#@
Linda Cone
Cone
90000
%&*((@sFDdw?~$@33<?.%d(*##!@<<&&=+
41
Linda
33
90000
iiiiiiiiiiiiiiiiiiiiiiiii
0000000000
iiiiiiiiiiiiiiiiiiiiiiiii
0000000000
iiiiiiiiiiiiiiiiiiiiiiiii
0000000000
iiiiiiiiiiiiiiiiiiiiiiiii
0000000000
iiiiiiiiiiiiiiiiiiii
00000000
ICS214B
?~$<&&=+@33<?.%
%&*((@sFDdwd(*##!@<
43 2We??#@$&&
Bob Drake
Drake
85000
43
Bob
33
85000
%&*((@sFDdw?
@<<&&=+~$@33<?
((@sFD
50 Dadsf$&%!Aq
Sarah Brown
Brown
95000
50
Sarah
77
95000
Notes 17
14
Encryption Alternatives (2)

Field level encryption

Pros:




Easier to implement and integrate
Flexible
Allows selective encryption, reduces number of bytes to
encrypt/decrypt
Cons:



ICS214B
Increases encryption overhead significantly due to invocation cost
Data size expansion (for block cipher algorithms)
Current optimization technologies do not handle foreign functions well
Notes 17
15
Encryption Alternatives (3)

Row level encryption

Pros:




Reduces the data size expansion problem
Reduces invocation cost
Better security because of total encryption
Cons:


ICS214B
Does not allow selective encryption, increases the number of bytes to
encrypt/decrypt
Implementation and integration can be hard when row functions are
not supported
Notes 17
16
Encryption Alternatives (4)

Page level encryption

Pros:




Significantly reduces encryption/decryption overhead due to reduced
invocation cost
Eliminates data size expansion problem (for block ciphers)
Better security because of total encryption
Cons:



ICS214B
Implementation and integration is not straightforward
Increases the number of bytes to encrypt/decrypt each time
Higher update/delete cost, requires re-encryption of all affected pages
Notes 17
17
Encryption Alternatives Experiments

Experimented with TPC-H database and queries
Data Granularity
Implementation
Field Level
Row Level
Page Level
Software Encryption
V
×
×
Hardware Encryption
×
V
V
Encryption scheme alternatives (V: evaluated, ×: not evaluated)
ICS214B
Notes 17
18
Software - Field Level Encryption




Block Cipher Algorithm - Blowfish
Implemented as foreign function (UDF)
Sample insert
insert into lineitem (discount) values
(encrypt(10,key));
Sample select
select decrypt(discount,key) from lineitem
where custid = 300;
ICS214B
Notes 17
19
Software - Field Level Encryption (2)

Creator supplies the key

Unauthorized person can not get hold of the key


protection even from the service provider at some level
User can easily implement different encryption algorithm and
check that into the system

different encryption algorithm/key can be used for different fields
ICS214B
Notes 17
20
Software - Field Level Encryption (3)
Performance Ratio
NetDB2 vs NetDB2* with encryption
3.5
3
2.5
2
1.5
1
0.5
0
NetDB2
NetDB2* with
encryption
0.1
1
10
Scale Factor

TPC-H queries, except Q#1
* Only one field (l_discount of lineitem table) encrypted

Introduced very large overhead

ICS214B
Notes 17
21
TPC-H Query # 1

Problem: Multiple decryption on same field
select
l_returnflag, l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from tpcd.lineitem
where l_shipdate <= date ('1998-12-01') - 90 day
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;
ICS214B
Notes 17
22
Query Rewrite to Improve
Performance


Problem: Multiple decryption on same field (e.g., TPC-H Q#1)
CSE based algorithm to eliminate redundant decryptions
Use temporary view
Improvement due to rewrite
Response Time
Improvement Ratio

4
3.5
3
2.5
2
1.5
1
0.5
0
0.1
ICS214B
1
10
Scale Factor
Notes 17
23
Hardware - Row Level Encryption


Specialized hardware IBM S/390 Cryptographic Coprocessor
under IBM OS/390
“editproc” facility
 invoked for “whole row”

upon read/write request, encrypt/decrypt is invoked from
hardware for the row
ICS214B
Notes 17
24
SW Field Level v.s. HW Row Level
Query Response Time
Software vs Hardware Encryption
30
00
00
75
00
0
18
00
0
44
00
12
00
40
0
10
0
SW
HW
Number of Rows



Experimented on TPC-H Q#1
Software Field Level: Only one field is encrypted
Hardware Row Level: All fields are encrypted
ICS214B
Notes 17
25
Hardware - Page Level Encryption
Encryption Alternatives
Relative CPU Time
5
4.3
4
3
2
1.3
1
0
No
Row Level Page Level
Encryption


Page level encryption is simulated
It gives significant improvement due to reduction in start-up cost
ICS214B
Notes 17
26
Conclusion

Database as a Service is a new model to alleviates the need to



hire professionals
purchase expensive hardware/software
deal with administrative and maintenance tasks

It is viable model and can emerge as a successful offering

Encryption is a solution for privacy -the most important issue


Hardware encryption has a clear superiority over software
Hardware makes encryption practical for databases
There are trade-offs for granularity of data
ICS214B
Notes 17
27