Relational Semantic Hiding Databases (RSHDB)


Protecting data privacy and integrity in clouds
By Jyh-haw Yeh
Computer Science
Boise State University
Cloud Computing
• The cloud computing paradigm offers a new model of IT management.
• Businesses purchase IT services from clouds:
  - cost saving
  - unlimited computing power
  - charged by usage
  - more secure?
  - better resource utilization, and thus green computing
Cloud Computing
• Cloud computing also has some known problems:
  - trust issues
  - data privacy and integrity
  - non-transparency of data locations
  - liability issues
Outsourcing Databases
• Database-as-a-service is an emerging service that is starting to appear in the cloud industry.
• Clients have the flexibility to design an application as a database suited to their business.
• Clients outsource the database to the cloud.
• The cloud executes queries over the database at the client's request.
• The cloud (which may not be trusted) has total control of the data.
• Data privacy and integrity are therefore a big concern.
Encrypted Databases
• An extreme approach to protecting data privacy:
  - encrypt the whole database and then outsource the encrypted database to the cloud.
• This approach would work if a practical fully homomorphic encryption (FHE) algorithm existed.
  - FHE: arithmetic and relational comparisons can be applied directly to ciphertexts.
• No practical, efficient FHE scheme exists.
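FHE would let the cloud compute on ciphertexts directly. Partially homomorphic schemes already do this for a single operation; this sketch uses the Paillier cryptosystem, which is not part of RSHDB and is swapped in here only to illustrate what "applying arithmetic directly to ciphertexts" means. It assumes tiny hard-coded primes (insecure) and Python 3.8+ for `pow(x, -1, n)`; all names are hypothetical.

```python
import math
import secrets

# Toy key material: these primes are far too small for real use.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)  # inverse of LAM mod N; valid because we use g = N + 1


def encrypt(m: int) -> int:
    """Encrypt 0 <= m < N. Randomized: equal plaintexts give different ciphertexts."""
    if not 0 <= m < N:
        raise ValueError("plaintext out of range")
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    # With g = N + 1, g^m mod N^2 simplifies to 1 + m*N.
    return (1 + m * N) * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    u = pow(c, LAM, N2)
    return (u - 1) // N * MU % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying two ciphertexts adds the underlying plaintexts (mod N)."""
    return c1 * c2 % N2
```

Paillier supports addition, and multiplication by a known constant via `pow(c, k, N2)`, but neither ciphertext-by-ciphertext multiplication nor comparison, which is why a scheme like this falls short of the FHE the slide calls for.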
RSHDB
• RSHDB (relational semantic hiding databases) is a proposed database system that hides data semantics from DBAs.
• It lets a business outsource its applications to the cloud as an RSHDB instance.
• It enables the DBAs or the DBMS in the cloud to operate on RSHDB databases without learning private business information.
RSHDB: Idea of Hiding Semantics
• Idea of semantic hiding in RSHDB:
  - Plain: An XYZ company has a PAYROLL database, in which a record in table EMPLOYEE shows that John Smith's SALARY is 63,000.
  - Hidden: A ? company has a ? database, in which a record in table ? shows that ?'s ? is 63,000.
RSHDB: Basic Operations
• Basic database operations:
  - Arithmetic: add or multiply numeric data.
  - Equality test: test whether two data items are equal.
  - Relational comparison: decide whether A > B or A < B.
  - Substring matching: decide whether a string A is a substring of another string B.
• Other database operations (sorting, searching, aggregate functions, set operations) are extensions or combinations of the basic operations.
RSHDB: Data Types
• Data types:
  - NC-type: numeric, with comparison only.
  - NCA-type: numeric, with both comparison and arithmetic.
  - SC-type: string, with comparison only.
  - SCS-type: string, with both comparison and substring matching.
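A semantic-hiding-aware DBMS has to know which basic operations each column type admits, since it can no longer infer this from the (encrypted) data itself. A minimal sketch, assuming "comparison" covers both equality and order tests; `Op`, `SUPPORTED` and `check` are hypothetical names, not part of any published RSHDB interface.

```python
from enum import Enum


class Op(Enum):
    EQUALITY = "equality test"
    ORDER = "relational comparison"  # A > B or A < B
    ARITHMETIC = "arithmetic"
    SUBSTRING = "substring matching"


# Assumption: "comparison" on the slide covers both equality and order tests.
SUPPORTED = {
    "NC": {Op.EQUALITY, Op.ORDER},
    "NCA": {Op.EQUALITY, Op.ORDER, Op.ARITHMETIC},
    "SC": {Op.EQUALITY, Op.ORDER},
    "SCS": {Op.EQUALITY, Op.ORDER, Op.SUBSTRING},
}


def check(column_type: str, op: Op) -> None:
    """Reject an operation that the column's RSHDB type cannot support."""
    if op not in SUPPORTED[column_type]:
        raise TypeError(f"{op.value} is not supported on {column_type}-type data")
```

For example, `check("NC", Op.ARITHMETIC)` raises `TypeError`, which is how such a DBMS might refuse a `SUM` over an order-preserving-encrypted date column.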
RSHDB: Design Goal
• Partially encrypt the database so that the cloud can execute queries over encrypted data.
• Encrypt enough information (but not all of it) to hide semantics from data operators.
• Minimize the impact on the DBMS, the SQL language, the hosting clouds, and the clients.
RSHDB: Encryption Strategy
• Use secure deterministic encryption for all semantics-revealing information: database, table, and attribute names.
• String data also reveal semantics, so they are always encrypted:
  - SC-type: order-preserving encryption (less secure).
  - SCS-type: either
      - char-by-char order-preserving encryption (less secure), or
      - word-by-word order-preserving encryption.
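Deterministic encryption maps equal plaintexts to equal ciphertexts, which is what lets the cloud evaluate equality predicates and joins on encrypted names. Python's standard library has no block cipher, so this toy sketch builds a synthetic-IV construction from HMAC-SHA256 (IV = MAC of the plaintext, body = plaintext XOR an HMAC-based keystream); a real system would use a vetted scheme such as AES-SIV (RFC 5297). The word-by-word helpers encrypt each word separately with the same scheme, so the server can test whether an encrypted word sequence occurs in an encrypted string. Note that this swaps in deterministic encryption for the order-preserving encryption the slide proposes for SCS-type data, so it supports word-level matching but not ordering. All function names are hypothetical.

```python
from __future__ import annotations

import hmac

IV_LEN = 16  # bytes of the HMAC tag used as the synthetic IV


def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode: a pseudorandom pad derived from (key, iv)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.digest(key, iv + counter.to_bytes(4, "big"), "sha256")
        counter += 1
    return bytes(out[:length])


def det_encrypt(mac_key: bytes, enc_key: bytes, plaintext: str) -> bytes:
    """Deterministic: the IV is a MAC of the plaintext, so equal inputs give equal outputs."""
    data = plaintext.encode("utf-8")
    iv = hmac.digest(mac_key, data, "sha256")[:IV_LEN]
    pad = _keystream(enc_key, iv, len(data))
    return iv + bytes(a ^ b for a, b in zip(data, pad))


def det_decrypt(mac_key: bytes, enc_key: bytes, ciphertext: bytes) -> str:
    iv, body = ciphertext[:IV_LEN], ciphertext[IV_LEN:]
    pad = _keystream(enc_key, iv, len(body))
    data = bytes(a ^ b for a, b in zip(body, pad))
    # Recomputing the synthetic IV also authenticates the ciphertext.
    if not hmac.compare_digest(iv, hmac.digest(mac_key, data, "sha256")[:IV_LEN]):
        raise ValueError("invalid ciphertext")
    return data.decode("utf-8")


def encrypt_words(mac_key: bytes, enc_key: bytes, text: str) -> list[bytes]:
    """Word-by-word: each whitespace-separated word becomes one deterministic token."""
    return [det_encrypt(mac_key, enc_key, w) for w in text.split()]


def contains_words(haystack: list[bytes], needle: list[bytes]) -> bool:
    """Server-side test: does the token sequence `needle` occur contiguously in `haystack`?"""
    k = len(needle)
    return any(haystack[i:i + k] == needle for i in range(len(haystack) - k + 1))
```

The server only ever compares opaque byte strings; the price is that equal words (and equal names) are visibly equal, which is the leakage any deterministic scheme accepts.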
RSHDB: Encryption Strategy
• Numeric data by themselves reveal less semantics.
• NC-type: order-preserving encryption.
  - Example: BDATE data
• NCA-type: no practical homomorphic encryption is available for this type of data. Options:
  - Leave the data in the clear.
  - Use a homomorphic encoding (which adds little security).
  - Example: SALARY data
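For NC-type data such as BDATE, order-preserving encryption (OPE) lets the cloud evaluate range predicates and ORDER BY directly on ciphertexts. This toy sketch assumes a bounded domain (dates from 1900 to 2099) and builds a secret, strictly increasing table by summing keyed pseudorandom positive gaps, with HMAC-SHA256 as the pseudorandom function. It is illustrative only: besides order, it leaks approximate distances between values, and it is not a published OPE scheme. The class name `ToyOPE` is hypothetical.

```python
import bisect
import hmac
from datetime import date
from itertools import accumulate


class ToyOPE:
    """Toy order-preserving encryption for dates (illustration only, not secure).

    Each day in [lo, hi] maps to a running sum of keyed pseudorandom gaps,
    so the mapping is strictly increasing and equal dates get equal ciphertexts.
    """

    def __init__(self, key: bytes, lo: date = date(1900, 1, 1),
                 hi: date = date(2099, 12, 31), max_gap: int = 1000) -> None:
        self.lo = lo.toordinal()
        size = hi.toordinal() - self.lo + 1
        # Gap for day i is 1 + PRF(key, i) mod max_gap, so every gap is >= 1.
        gaps = (
            1 + int.from_bytes(hmac.digest(key, i.to_bytes(4, "big"), "sha256")[:4], "big") % max_gap
            for i in range(size)
        )
        self.table = list(accumulate(gaps))

    def encrypt(self, d: date) -> int:
        i = d.toordinal() - self.lo
        if not 0 <= i < len(self.table):
            raise ValueError("date outside the supported domain")
        return self.table[i]

    def decrypt(self, c: int) -> date:
        # The table is sorted, so a binary search recovers the day index.
        i = bisect.bisect_left(self.table, c)
        if i == len(self.table) or self.table[i] != c:
            raise ValueError("not a valid ciphertext")
        return date.fromordinal(self.lo + i)
```

Under these assumptions the client keeps the key, encrypts a query constant with `encrypt`, and the cloud compares plain integers without learning any dates.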
Impacts
• The DBMS: must be semantic-hiding aware.
• The SQL: new data types in the DDL.
• The hosting clouds:
  - more storage space for encrypted data
  - must install a semantic-hiding-aware DBMS
• The clients: install a query API that
  - performs encryption,
  - converts SQL queries into semantic hiding queries,
  - performs decryption, and
  - returns the result to the client.
Example Database

EMPLOYEE
NAME          SSN        DEPT_NO  JOB_TYPE  BDATE       SALARY
John Smith    123456789  1        Manager   1966-05-04  83,000
Frank Wong    333445555  3        Staff     1985-07-26  48,000
Joey English  453453453  2        Engineer  1978-10-03  72,000
Joe Johnson   999887777  2        Engineer  1982-03-29  70,500

DEPARTMENT
DEPT_NAME    DEPT_NO  LOCATION
Headquarter  1        Houston
Research     2        Boise
Finance      3        Houston
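Example Database (encrypted, as stored in the cloud)
T and R are the encrypted EMPLOYEE and DEPARTMENT tables; A1 to A6 and B1 to B3 are their encrypted attribute names.

T
A1   A2   A3      A4   A5             A6
X11  X12  25,300  X14  2,418,241,992  83,000
X21  X22  75,900  X24  2,441,639,298  48,000
X31  X32  50,600  X34  2,437,900,467  72,000
X41  X42  50,600  X44  2,433,063,369  70,500

R
B1   B2      B3
Y11  25,300  Y13
Y21  50,600  Y23
Y31  75,900  Y33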
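In the encrypted tables, DEPT_NO appears as 25,300, 50,600 and 75,900 for departments 1, 2 and 3, values consistent with multiplying each number by 25,300; the slides do not say how these values were produced. A secret positive scaling factor is about the simplest "homomorphic encoding": it preserves equality (so the join `T.A3 = R.B2` still works), order, and addition, but it adds little security, since one known plaintext/ciphertext pair reveals the factor. A minimal sketch; `ScalingEncoding` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingEncoding:
    """Hypothetical linear encoding E(x) = factor * x with a secret positive factor."""

    factor: int

    def encode(self, x: int) -> int:
        return self.factor * x

    def decode(self, c: int) -> int:
        q, r = divmod(c, self.factor)
        if r:
            raise ValueError("not a valid encoding")
        return q


enc = ScalingEncoding(25_300)
# Equality and order survive encoding, so joins such as T.A3 = R.B2 and range
# predicates can run on encoded values. Addition is also preserved:
# enc.encode(x) + enc.encode(y) == enc.encode(x + y).
# Weakness: one known pair (x, enc.encode(x)) reveals the factor.
```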
Semantic Hiding Query (SHQ)
• In an SHQ, sensitive identifiers and data values are encrypted.
• To query an RSHDB, a SQL query must be expressed as an SHQ.
• Example:
  - Retrieve the name and salary of each employee in the 'Research' department whose salary is more than $50,000, and sort the report in ascending order of name.
SHQ Example
select EMPLOYEE.NAME, EMPLOYEE.SALARY
from EMPLOYEE, DEPARTMENT
where EMPLOYEE.DEPT_NO = DEPARTMENT.DEPT_NO
  AND DEPT_NAME = 'Research'
  AND EMPLOYEE.SALARY > 50000
order by EMPLOYEE.NAME asc;

The query API converts it into the following SHQ:

select T.A1, T.A6
from T, R
where T.A3 = R.B2 AND R.B1 = Y21 AND T.A6 > 50000
order by T.A1 asc;
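The conversion step of the query API can be sketched as a token rewrite over a client-side catalog. This minimal sketch assumes the catalog simply hard-codes the encrypted names from the example (T, A1, ..., Y21); a real query API would derive them with deterministic encryption and would encrypt each literal according to its column type. Qualified names are rewritten per table, an unqualified column is resolved only when exactly one table owns it, and the salary constant stays in the clear because SALARY is stored unencrypted in the example. It is a regex-based toy that expects lowercase SQL keywords, not a SQL parser; `to_shq` and the catalog names are hypothetical.

```python
import re

# Hypothetical client-side catalog copied from the example database.
TABLES = {"EMPLOYEE": "T", "DEPARTMENT": "R"}
COLUMNS = {
    "EMPLOYEE": {"NAME": "A1", "SSN": "A2", "DEPT_NO": "A3",
                 "JOB_TYPE": "A4", "BDATE": "A5", "SALARY": "A6"},
    "DEPARTMENT": {"DEPT_NAME": "B1", "DEPT_NO": "B2", "LOCATION": "B3"},
}
LITERALS = {"Research": "Y21"}  # stands in for deterministic encryption of the constant

# A quoted literal, a TABLE.COLUMN reference, or a bare uppercase identifier.
TOKEN = re.compile(r"'([^']*)'|\b([A-Z_]+)\.([A-Z_]+)\b|\b([A-Z_]+)\b")


def to_shq(sql: str) -> str:
    def rewrite(m: re.Match) -> str:
        literal, table, column, word = m.groups()
        if literal is not None:
            return LITERALS[literal]
        if table is not None:
            return f"{TABLES[table]}.{COLUMNS[table][column]}"
        if word in TABLES:
            return TABLES[word]
        owners = [t for t, cols in COLUMNS.items() if word in cols]
        if len(owners) == 1:
            return f"{TABLES[owners[0]]}.{COLUMNS[owners[0]][word]}"
        return word  # keywords such as AND, or ambiguous names, pass through unchanged

    return TOKEN.sub(rewrite, sql)
```

Applied to the plaintext query above (written on one line), this produces the SHQ shown on the slide.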
SHQ Result
T.A1  T.A6
X41   70,500
X31   72,000

The query API decrypts the result and returns it to the client:
EMPLOYEE.NAME  EMPLOYEE.SALARY
Joe Johnson    70,500
Joey English   72,000
Research Issues
• Storage requirements.
• Is order-preserving encryption secure enough?
  - More secure encryption plus order-preserving hashing?
• Semantics may be guessed from the range and format of NCA-type data left in the clear.
  - Adding noise?
• RSHDB's DBMS enforces domain constraints more weakly:
  - all encrypted data are stored as bit strings.
Research Issues
• Char-by-char versus word-by-word encryption for SCS-type data:
  - trade-offs in flexibility, security, and space.
• Who should develop the query API?
• Performance degradation:
  - implementation and simulation
  - real-world databases and queries
Future Work
• Design algorithms to protect the integrity of outsourced databases:
  - completeness
  - non-forgery
  - freshness
• Adding data integrity protection to RSHDB is challenging.