Relational Semantic Hiding Databases (RSHDB)
Protecting data privacy and integrity in clouds
By Jyh-haw Yeh
Computer Science
Boise State University
Cloud Computing
The cloud computing paradigm provides a new concept of
IT management.
Businesses purchase IT services from clouds
Cost saving
Unlimited computing power
Charged by usage
More secure?
Better resource utilization, thus green computing
Cloud Computing
Cloud computing also has some known problems
Trust issues
Data privacy and integrity
Non-transparency of data locations
Liability issue
Outsourcing Databases
Database-as-a-service is an emerging service that is
starting to appear in the cloud industry.
Clients have the flexibility to design an application as a
database that is suitable for their business.
Outsource the database to clouds.
Clouds are able to execute queries over the database upon
clients’ requests.
Clouds (which may not be trusted) have total control of
the data.
Data privacy/integrity is a big concern.
Encrypted Databases
An extreme approach to protect data privacy:
Encrypt the whole database and then outsource the
encrypted database to clouds.
This approach works if a practical fully homomorphic
encryption (FHE) algorithm exists.
FHE: arithmetic and relational comparisons can be applied
directly to ciphertexts.
No practical and efficient FHE exists.
RSHDB
RSHDB (relational semantic hiding databases) is a
proposed database system that is able to hide
semantics from DBAs.
Suitable for businesses that want to outsource their
business applications to clouds as an RSHDB instance.
Enables the DBAs or DBMS in clouds to operate on
RSHDB databases without knowing private business
information.
RSHDB: Idea of Hiding Semantics
Idea of semantic hiding in RSHDB:
An XYZ company has a PAYROLL database, in which a
record in a table EMPLOYEE shows that John Smith's
SALARY is 63,000.
An ? company has a ? database, in which a record in a
table ? shows that ? ? is 63,000.
RSHDB: Basic Operations
Basic database operations:
Arithmetic: add or multiply numeric data.
Equality test: test the equality of two data items.
Relational comparison: decide whether A > B or A < B.
Substring matching: decide whether a string A is a
substring of another string B.
Other database operations (sorting, searching,
aggregate functions, set operations) are
extensions or combinations of the basic operations.
RSHDB: Data Types
Data types:
NC-type: Numeric with Comparison only.
NCA-type: Numeric with both Comparison and
Arithmetic.
SC-type: String with Comparison only.
SCS-type: String with both Comparison and Substring
matching.
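The four data types can be read as a small capability table. The sketch below is one hypothetical way to encode it; the names `Op`, `ALLOWED_OPS` and `supports` are illustrative and not part of RSHDB, and treating the equality test as available on every type is an assumption.

```python
from enum import Enum, auto


class Op(Enum):
    EQUALITY = auto()
    COMPARISON = auto()
    ARITHMETIC = auto()
    SUBSTRING = auto()


# Operations each data type is meant to support, following the list above.
# Assumption: the equality test is available wherever comparison is.
ALLOWED_OPS = {
    "NC": {Op.EQUALITY, Op.COMPARISON},
    "NCA": {Op.EQUALITY, Op.COMPARISON, Op.ARITHMETIC},
    "SC": {Op.EQUALITY, Op.COMPARISON},
    "SCS": {Op.EQUALITY, Op.COMPARISON, Op.SUBSTRING},
}


def supports(data_type: str, op: Op) -> bool:
    """Return True if the given data type allows the given operation."""
    return op in ALLOWED_OPS[data_type]
```

A semantic-hiding-aware DBMS could use such a table to reject, for example, arithmetic on an NC-type attribute.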
RSHDB: Design Goal
Partially encrypts the database so that the cloud is able
to execute queries over encrypted data.
Encrypt enough information (but not all) to hide
semantics from data operators.
Minimize the impact on the DBMS, the SQL, the
hosting clouds, and the clients.
RSHDB: Encryption Strategy
Use a secure deterministic encryption scheme for all
semantic-telling information: database, table, and
attribute names.
String-type data is also semantic-telling: always
encrypted.
SC-type: order-preserved encryption (less secure)
SCS-type:
char-by-char (less secure) order-preserved encryption.
word-by-word order-preserved encryption.
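Why determinism matters for names can be shown with a minimal sketch. HMAC-SHA256 is used here only as a stand-in for a deterministic cipher: it is a keyed one-way function, so the client keeps a lookup table to map tokens back to names. The key, the `name_token` helper and the `N` prefix are all assumptions made for illustration.

```python
import hashlib
import hmac


def name_token(key: bytes, name: str) -> str:
    """Map a semantic-telling name to an opaque, repeatable token."""
    digest = hmac.new(key, name.encode("utf-8"), hashlib.sha256).hexdigest()
    # Prefix with a letter so the token can serve as an SQL identifier.
    return "N" + digest[:16]


key = b"hypothetical-client-secret"  # assumed to be held only by the client
names = ["PAYROLL", "EMPLOYEE", "SALARY"]

# Same key and same name always give the same token, so the cloud can still
# match names for equality without learning what they mean.
token_of = {n: name_token(key, n) for n in names}

# Client-side reverse lookup, needed because HMAC cannot be decrypted.
name_of = {t: n for n, t in token_of.items()}
```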
RSHDB: Encryption Strategy
Numeric data itself reveals less semantics.
NC-type: order-preserved encryption.
Example: bdate data
NCA-type: no practical homomorphic encryption
available for this type of data.
Leave the data in clear
Homomorphic encoding (not much help for security)
Example: salary data
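The order-preserving property for NC-type data can be illustrated with a toy encoding. The affine map below is an assumption chosen for readability and is far too weak for real use; `SCALE`, `OFFSET`, `ope_encode` and `ope_decode` are hypothetical names. The multiplier 25,300 happens to reproduce the DEPT_NO ciphers shown in the example database, which is not evidence that RSHDB uses this scheme.

```python
from datetime import date

SCALE = 25_300  # hypothetical secret multiplier (must be positive)
OFFSET = 0      # hypothetical secret offset


def ope_encode(x: int) -> int:
    """Toy order-preserving encoding: a strictly increasing affine map."""
    return SCALE * x + OFFSET


def ope_decode(c: int) -> int:
    """Inverse of ope_encode, performed by the client."""
    return (c - OFFSET) // SCALE


dept_nos = [1, 3, 2, 2]
codes = [ope_encode(d) for d in dept_nos]

# Equality and relational comparison give the same answer on codes as on
# the original values, so the cloud can sort and filter without decoding.
order_kept = all(
    (a < b) == (ope_encode(a) < ope_encode(b))
    and (a == b) == (ope_encode(a) == ope_encode(b))
    for a in dept_nos
    for b in dept_nos
)

# Dates (bdate) can be turned into integers first, then encoded the same way.
older = ope_encode(date(1966, 5, 4).toordinal())
younger = ope_encode(date(1985, 7, 26).toordinal())
```

Because the map is strictly increasing, `older < younger` holds exactly when the first birth date precedes the second.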
Impacts
The DBMS: Needs to be semantic-hiding aware
The SQL: New data types for DDL
The hosting clouds:
More storage space for encrypted data.
Install a semantic-hiding-aware DBMS
The clients: Install a query API:
Perform encryption
Convert SQL query to semantic hiding query
Perform decryption
Return the result to the clients
Example Database
EMPLOYEE
NAME          SSN        DEPT_NO  JOB_TYPE  BDATE       SALARY
John Smith    123456789  1        Manager   1966-05-04  83,000
Frank Wong    333445555  3        Staff     1985-07-26  48,000
Joey English  453453453  2        Engineer  1978-10-03  72,000
Joe Johnson   999887777  2        Engineer  1982-03-29  70,500

DEPARTMENT
DEPT_NAME    DEPT_NO  LOCATION
Headquarter  1        Houston
Research     2        Boise
Finance      3        Houston
Example Database
T
A1   A2   A3      A4   A5             A6
X11  X12  25,300  X14  2,418,241,992  83,000
X21  X22  75,900  X24  2,441,639,298  48,000
X31  X32  50,600  X34  2,437,900,467  72,000
X41  X42  50,600  X44  2,433,063,369  70,500

R
B1   B2      B3
Y11  25,300  Y13
Y21  50,600  Y23
Y31  75,900  Y33
Semantic Hiding Query (SHQ)
The sensitive information or data is encrypted in an SHQ.
To make a query to an RSHDB, the SQL query must be an
SHQ.
Example
Retrieve the name and salary of each employee in the
‘Research’ department whose salary is more than
$50,000, and sort the report in ascending order of names.
SHQ Example
select EMPLOYEE.NAME, EMPLOYEE.SALARY
from EMPLOYEE, DEPARTMENT
where EMPLOYEE.DEPT_NO = DEPARTMENT.DEPT_NO
AND DEPT_NAME = 'Research' AND
EMPLOYEE.SALARY > 50000
order by EMPLOYEE.NAME asc;
--------------------------------------------------------------------------
select T.A1, T.A6
from T, R
where T.A3 = R.B2 AND R.B1 = 'Y21' AND T.A6 > 50000
order by T.A1 asc;
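The conversion from an SQL query to an SHQ can be sketched as a client-side substitution of names and string literals. This is a minimal illustration, not the RSHDB query API: the lookup tables stand in for real encryption, the function `to_shq` is hypothetical, and column names are written fully qualified to keep the lookup simple.

```python
import re

# Hypothetical client-side lookup tables taken from the example database.
TABLE_MAP = {"EMPLOYEE": "T", "DEPARTMENT": "R"}
COLUMN_MAP = {
    "EMPLOYEE.NAME": "T.A1",
    "EMPLOYEE.DEPT_NO": "T.A3",
    "EMPLOYEE.SALARY": "T.A6",
    "DEPARTMENT.DEPT_NAME": "R.B1",
    "DEPARTMENT.DEPT_NO": "R.B2",
}
LITERAL_MAP = {"Research": "Y21"}  # string data is always encrypted

# A quoted literal, a qualified column, or a bare upper-case word.
TOKEN = re.compile(r"'([^']*)'|\b([A-Z_]+\.[A-Z_]+)\b|\b([A-Z_]+)\b")


def to_shq(sql: str) -> str:
    """Replace semantic-telling names and string literals in a query."""
    def swap(match: re.Match) -> str:
        literal, qualified, bare = match.groups()
        if literal is not None:
            return "'" + LITERAL_MAP[literal] + "'"
        if qualified is not None:
            return COLUMN_MAP[qualified]
        # Bare words that are not table names (e.g. AND) pass through.
        return TABLE_MAP.get(bare, bare)

    return TOKEN.sub(swap, sql)


sql = (
    "select EMPLOYEE.NAME, EMPLOYEE.SALARY from EMPLOYEE, DEPARTMENT "
    "where EMPLOYEE.DEPT_NO = DEPARTMENT.DEPT_NO "
    "AND DEPARTMENT.DEPT_NAME = 'Research' "
    "AND EMPLOYEE.SALARY > 50000 order by EMPLOYEE.NAME asc;"
)
shq = to_shq(sql)
```

The numeric constant 50000 is left untouched because SALARY is NCA-type data kept in the clear.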
SHQ Result
T.A1  T.A6
X41   70,500
X31   72,000

The query API decrypts the result and returns it to the clients:

EMPLOYEE.NAME  EMPLOYEE.SALARY
Joe Johnson    70,500
Joey English   72,000
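The decryption step of the query API can be sketched in the same spirit. The dictionaries below stand in for real decryption and hold only the values from this example; `decrypt_result` and the table names are assumptions for illustration.

```python
# Hypothetical client-side tables; a real query API would decrypt instead.
NAME_OF = {"T.A1": "EMPLOYEE.NAME", "T.A6": "EMPLOYEE.SALARY"}
PLAINTEXT_OF = {"X41": "Joe Johnson", "X31": "Joey English"}
ENCRYPTED_COLUMNS = {"T.A1"}  # T.A6 (salary) is NCA-type, stored in clear


def decrypt_result(header, rows):
    """Translate an SHQ result back into the client's own terms."""
    clear_header = [NAME_OF[col] for col in header]
    clear_rows = [
        tuple(
            PLAINTEXT_OF[value] if col in ENCRYPTED_COLUMNS else value
            for col, value in zip(header, row)
        )
        for row in rows
    ]
    return clear_header, clear_rows


header, rows = decrypt_result(
    ["T.A1", "T.A6"],
    [("X41", 70500), ("X31", 72000)],
)
```

Row order is preserved, so the sorting done by the cloud on the encrypted names carries over to the decrypted report.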
Research Issues
Storage requirement.
Is order-preserved encryption secure enough?
More secure encryption + order-preserved hashing?
Guessing the semantics from the range and format of
NCA-type data in clear.
Adding noise?
RSHDB’s DBMS has weaker domain-constraint
enforcement.
All encrypted data are of bit-string type
Research Issues
Char-by-char versus word-by-word encryption for
SCS-type data.
Flexibility, security and space.
Who should develop the query API?
Performance degradation:
Implementation and simulation
Real world databases and queries
Future Work
Designing algorithms for data integrity protection for
outsourced databases.
Completeness
Non-forgery
Freshness
Adding data integrity protection to RSHDB is
challenging.