Management of Encrypted Databases
--Homomorphic Encryption and Order-Preserving Indexing
Dongxi Liu
CSIRO ICT Centre
[email protected]
Outline
• Overview of Encrypted Databases
• Encrypted databases in our system
• Our techniques
• Homomorphic encryption scheme
• Order-preserving indexing
• Translation of SQL queries
• Related works
• The advantages of our method
Databases – an appealing target to attack
• Databases are widely used in information systems.
• E.g., Web applications
• Information in organizations is usually gathered in their
databases
• An attacker could get a large amount of information if a
database is compromised
• An example in Australia:
• http://thenextweb.com/au/2012/08/15/hackers-grab500000-credit-card-details-australian-retailer-disasterwaiting-happen/
Database Security by Encryption
• Obviously, encryption can be used to improve database
security
• even if the databases are outsourced to untrusted servers, or
• the database servers are compromised.
• A fragment of an encrypted table in our system
A plain table
The table after encryption
Management of Encrypted Databases
• However, encryption should not hamper the normal
functionality of databases (e.g., performing queries)
• Decrypting databases for answering queries is not acceptable
• Ideally, queries should be executed directly over
encrypted databases
• Our method for encrypted database management is designed
for this purpose
The Architecture of Managing Encrypted
Databases
• The query proxy mediates communication
between applications and encrypted
databases
• Our threat model
• databases might be deployed on an untrusted
or compromised machine.
• the proxy and applications are in the trusted
domain.
SQL Queries and Cryptographic Techniques
• Databases interact with applications mainly through
SQL queries
• Different types of queries
• Equality query
• “select a staff whose id is 10”
• Secure hash or deterministic encryption
• Range query
• “select staff whose incomes are between 1000 and 2000”
• Order-preserving encryption or indexing
• Aggregate query (e.g., SUM and AVG)
• “select the sum of staff incomes”
• Homomorphic encryption
• Their combinations
• “select the average income of staff who joined the company
between year 2000 and year 2010”
Features of SQL Operations
• Addition on a large number of records
• Multiplication on a small, fixed number of table attributes
• The bound of an aggregate query result is hard to
determine for a long-standing database
• A query example
• Select SUM(Rate*Hours) from Staff
Name    Address          Rate     Hours
Peter   1 Vimiera Rd     70.0     36.32
Tom     2 Pembroke Rd    40.12    53.2
…
Homomorphic Encryption – Overview (1)
• Described in the Australian Provisional Patent
2012902653
• Let K(n) be the key and v a value to encrypt
• Enc(K(n), v) = (c1, …, cn)
• Dec(K(n), (c1, …, cn)) = v
• Additively homomorphic
• Enc(K(n), v) = (c1, …, cn)
• Enc(K(n), v’) = (c1’, …, cn’)
• Dec(K(n), (c1+c1’, …, cn+cn’)) = v + v’
• Used when calculating sum
Homomorphic Encryption – Overview (2)
• Multiplicatively homomorphic
• Enc(K(n), v) = (c1, …, cn)
• Dec(K(n), (h*c1, …, h*cn)) = h*v, where h can be a real number
• Used when calculating average
• Enc(K(m), v’) = (c1’, …, cm’)
• K(n) and K(m) can be two different keys
• The multiplication of ciphertexts is their outer product
(c1*c1’, …, cn*c1’
c1*c2’, …, cn*c2’
…
c1*cm’, …, cn*cm’)
• Steps of Dec(K(n), K(m), (c1*c1’, …, cn*cm’))
• For each row i (1 ≤ i ≤ m), Dec(K(n), (c1*ci’, …, cn*ci’)) obtains v*ci’
• Dec(K(m), (v*c1’, …, v*cm’)) returns v*v’, since (c1’, …, cm’) was encrypted under K(m).
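The two-step decryption of a product can be checked numerically. The sketch below is only an illustration under stated assumptions: `enc` and `dec` follow the Instance 1 construction as it can be inferred from the correctness proof later in the talk (the slide formulas themselves are not in this transcript), exact rationals (`Fraction`) stand in for real numbers, and all helper names (`rnd`, `keygen`, `multiply`, `dec_product`) are hypothetical.

```python
import random
from fractions import Fraction

def rnd(digits):
    # Random positive rational with one decimal place (illustrative format).
    return Fraction(random.randint(1, 10**digits - 1), 10)

def keygen(n):
    # K(n) = [(k_i, s_i, t_i)]; positive components satisfy the non-zero
    # conditions on k_i, t_i, k_n + s_n + t_n (and L, an assumed condition).
    assert n >= 3
    return [(rnd(4), rnd(8), rnd(4)) for _ in range(n)]

def enc(key, v):
    # Assumed form, inferred from the correctness proof of Instance 1.
    n = len(key)
    r = [rnd(6) for _ in range(n - 1)]            # noises r_1 .. r_{n-1}
    c = [k * t * v + s * r[-1] + t * r[i]         # c_i for 1 <= i <= n-2
         for i, (k, s, t) in enumerate(key[:n - 2])]
    k, s, t = key[n - 2]
    c.append(s * r[-1] + k * t * sum(r[:n - 2]))  # c_{n-1}
    c.append(sum(key[n - 1]) * r[-1])             # c_n = (k_n+s_n+t_n) * r_{n-1}
    return c

def dec(key, c):
    n = len(key)
    L = sum(k for k, _, _ in key[:n - 2])
    S = c[n - 1] / sum(key[n - 1])                # recovers r_{n-1}
    k, s, t = key[n - 2]
    I = c[n - 2] - S * s
    body = sum((c[i] - S * key[i][1]) / (L * key[i][2]) for i in range(n - 2))
    return body - I / (L * k * t)

def multiply(c, c2):
    # Outer product: row i is (c_1*c'_i, ..., c_n*c'_i).
    return [[x * y for x in c] for y in c2]

def dec_product(key_n, key_m, product):
    # Step 1: decrypt every row with K(n), giving v*c'_i.
    scaled = [dec(key_n, row) for row in product]
    # Step 2: decrypt (v*c'_1, ..., v*c'_m) with K(m), giving v*v'.
    return dec(key_m, scaled)

key_n, key_m = keygen(4), keygen(3)
v, v2 = Fraction("70.0"), Fraction("36.32")
product = multiply(enc(key_n, v), enc(key_m, v2))
result = dec_product(key_n, key_m, product)
```

Because both decryption steps are linear, product matrices can also be added entry by entry before decrypting, which is the shape needed for a query such as SUM(Rate*Hours).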
Homomorphic Encryption – Instance 1
• Key Generation
• the key K(n) is a list of tuples of real numbers, [(k1, s1, t1), …, (kn, sn, tn)],
where
• n ≥ 3, ki ≠ 0 and ti ≠ 0 (1 ≤ i ≤ n-1),
and kn+sn+ tn ≠ 0.
• Encryption
• Decryption
• Variant: more s components in subciphertexts.
Homomorphic Encryption - Correctness 1
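• Simple proof, where $L = k_1 + \dots + k_{n-2}$
• $S = c_n/(k_n+t_n+s_n) = r_{n-1}$
• $I = c_{n-1} - S s_{n-1} = c_{n-1} - r_{n-1} s_{n-1} = k_{n-1} t_{n-1} (r_1 + \dots + r_{n-2})$
• For $i$ from $1$ to $n-2$, we have
• $c_i - S s_i = c_i - r_{n-1} s_i = k_i t_i v + t_i r_i$
• $(c_i - S s_i)/(L t_i) = (k_i t_i v + t_i r_i)/(L t_i) = (k_i v + r_i)/L$
• The sum of $(k_i v + r_i)/L$ for $i$ from $1$ to $n-2$ is
• $((k_1 + \dots + k_{n-2}) v + r_1 + \dots + r_{n-2})/L = (L v + r_1 + \dots + r_{n-2})/L$
• $I/(L k_{n-1} t_{n-1}) = k_{n-1} t_{n-1} (r_1 + \dots + r_{n-2})/(L k_{n-1} t_{n-1}) = (r_1 + \dots + r_{n-2})/L$
• Finally, $(L v + r_1 + \dots + r_{n-2})/L - (r_1 + \dots + r_{n-2})/L = v$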
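The proof can be turned into a small executable check. This is a sketch, not the author's implementation: the encryption formula is read backwards from the proof (c_n = (k_n+s_n+t_n)*r_{n-1}, c_{n-1} = s_{n-1}*r_{n-1} + k_{n-1}*t_{n-1}*(r_1+…+r_{n-2}), and c_i = k_i*t_i*v + s_i*r_{n-1} + t_i*r_i otherwise), keys and noises are drawn as positive rationals so that L ≠ 0 (an assumed condition), and `Fraction` is used so that equality is exact.

```python
import random
from fractions import Fraction

def rnd(digits):
    # Random positive rational with one decimal place (illustrative format).
    return Fraction(random.randint(1, 10**digits - 1), 10)

def keygen(n=4):
    assert n >= 3
    return [(rnd(4), rnd(8), rnd(4)) for _ in range(n)]   # (k_i, s_i, t_i)

def enc(key, v):
    n = len(key)
    r = [rnd(6) for _ in range(n - 1)]            # noises r_1 .. r_{n-1}
    c = [k * t * v + s * r[-1] + t * r[i]         # c_i for 1 <= i <= n-2
         for i, (k, s, t) in enumerate(key[:n - 2])]
    k, s, t = key[n - 2]
    c.append(s * r[-1] + k * t * sum(r[:n - 2]))  # c_{n-1}
    c.append(sum(key[n - 1]) * r[-1])             # c_n
    return c

def dec(key, c):
    n = len(key)
    L = sum(k for k, _, _ in key[:n - 2])
    S = c[n - 1] / sum(key[n - 1])                # S = r_{n-1}
    k, s, t = key[n - 2]
    I = c[n - 2] - S * s                          # I = k_{n-1} t_{n-1} (r_1+...+r_{n-2})
    body = sum((c[i] - S * key[i][1]) / (L * key[i][2]) for i in range(n - 2))
    return body - I / (L * k * t)

key = keygen(4)
a, b = Fraction("7985746234523.12"), Fraction("36.32")
ca, cb = enc(key, a), enc(key, b)

# Additive homomorphism: add subciphertexts position by position.
total = dec(key, [x + y for x, y in zip(ca, cb)])
# Scalar multiplication by h = 1/2 turns the encrypted sum into an average.
average = dec(key, [Fraction(1, 2) * (x + y) for x, y in zip(ca, cb)])
```

With floating-point keys instead of `Fraction`, the same code would recover the value only up to rounding error.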
Homomorphic Encryption - Instance 2
• Key Generation
• the key K(n) is a list of tuples of real numbers, [(k1, s1, t1), …, (kn, sn,
tn)], where
• n ≥ 3, ki ≠ 0 and ti ≠ 0 (1 ≤ i ≤ n-1),
, and kn+sn+ tn ≠ 0.
• Encryption
• Decryption
• Variant: more s components in subciphertexts.
Homomorphic Encryption - Correctness 2
• Simple proof, where $T = t_1 + \dots + t_{n-1}$
• $S = c_n/(k_n+t_n+s_n) = r_n$
• $c_1 - S s_1 = c_1 - r_n s_1 = k_1 t_1 v + k_1 (r_1 - r_{n-1})$
• $(c_1 - S s_1)/(T k_1) = (t_1 v + (r_1 - r_{n-1}))/T$, denoted by $N_1$
• For $i$ from $2$ to $n-1$, we have
• $c_i - S s_i = c_i - r_n s_i = k_i t_i v + k_i (r_i - r_{i-1})$
• $(c_i - S s_i)/(T k_i) = (k_i t_i v + k_i (r_i - r_{i-1}))/(T k_i) = (t_i v + (r_i - r_{i-1}))/T$,
denoted by $N_i$
• The sum of $N_i$ for $i$ from $1$ to $n-1$ is
• $((t_1 + \dots + t_{n-1}) v + (r_1 - r_{n-1}) + (r_2 - r_1) + \dots + (r_{n-1} - r_{n-2}))/T = (T v)/T = v$
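The telescoping argument can be exercised in code. As with Instance 1, this is a sketch under assumptions: the encryption is inferred from the proof (c_n = (k_n+s_n+t_n)*r_n, c_1 = k_1*t_1*v + s_1*r_n + k_1*(r_1 - r_{n-1}), and c_i = k_i*t_i*v + s_i*r_n + k_i*(r_i - r_{i-1}) for 2 ≤ i ≤ n-1), keys are positive rationals so that T ≠ 0 (an assumed condition), and the names `enc2` and `dec2` are hypothetical.

```python
import random
from fractions import Fraction

def rnd(digits):
    # Random positive rational with one decimal place (illustrative format).
    return Fraction(random.randint(1, 10**digits - 1), 10)

def keygen(n=4):
    assert n >= 3
    return [(rnd(4), rnd(8), rnd(4)) for _ in range(n)]   # (k_i, s_i, t_i)

def enc2(key, v):
    n = len(key)
    r = [rnd(6) for _ in range(n)]                 # noises r_1 .. r_n
    c = []
    for i in range(n - 1):                         # builds c_1 .. c_{n-1}
        k, s, t = key[i]
        prev = r[n - 2] if i == 0 else r[i - 1]    # c_1 pairs r_1 with r_{n-1}
        c.append(k * t * v + s * r[n - 1] + k * (r[i] - prev))
    c.append(sum(key[n - 1]) * r[n - 1])           # c_n = (k_n+s_n+t_n) * r_n
    return c

def dec2(key, c):
    n = len(key)
    T = sum(t for _, _, t in key[:n - 1])
    S = c[n - 1] / sum(key[n - 1])                 # S = r_n
    # Sum of N_i; the noise differences telescope to zero.
    return sum((c[i] - S * key[i][1]) / (T * key[i][0]) for i in range(n - 1))

key = keygen(4)
a, b = Fraction("40.12"), Fraction("53.2")
ca, cb = enc2(key, a), enc2(key, b)
total = dec2(key, [x + y for x, y in zip(ca, cb)])
```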
Security Analysis – IND-CPA
Instance 1
• $c_n$ and $c_{n-1}$ are indistinguishable for every pair of values
• Since they do not depend on plaintexts
• For $c_i$ ($1 \le i \le n-2$), we require $t_i k_i < s_i$, $v < r_{n-1}$ and $k_i v < r_i$
• $c_i$ is dominated by the noise
• Larger noise means a lower probability of distinguishing two
ciphertexts
• The hardness of the LWE problem might also be usable
to prove the security.
Conversion to a Public Key Scheme
• Based on the subset sum problem, our scheme can be
converted into a public key scheme.
• The private key is the same K(n).
• The public key is two sets of ciphertexts
• The encryptions of zero
• The encryptions of one
• To encrypt v:
• Step 1: Randomly choose an encryption of one
• Step 2: Multiply each subciphertext in this encryption by v
• Step 3: Randomly choose a subset of the encryptions of zero
• Step 4: Add all the zero encryptions in this subset
• Step 5: Add the ciphertexts from Step 2 and Step 4
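The five steps can be sketched as follows. This is an illustration only: the private-key scheme is the Instance 1 construction as inferred from its correctness proof, the sizes of the two public sets (8 and 4) are arbitrary, and `make_public_key` and `pk_enc` are hypothetical names. The sketch shows correctness of the conversion, not its security.

```python
import random
from fractions import Fraction

def rnd(digits):
    # Random positive rational with one decimal place (illustrative format).
    return Fraction(random.randint(1, 10**digits - 1), 10)

def keygen(n=4):
    return [(rnd(4), rnd(8), rnd(4)) for _ in range(n)]   # private key K(n)

def enc(key, v):
    # Assumed Instance 1 encryption, inferred from the correctness proof.
    n = len(key)
    r = [rnd(6) for _ in range(n - 1)]
    c = [k * t * v + s * r[-1] + t * r[i]
         for i, (k, s, t) in enumerate(key[:n - 2])]
    k, s, t = key[n - 2]
    c.append(s * r[-1] + k * t * sum(r[:n - 2]))
    c.append(sum(key[n - 1]) * r[-1])
    return c

def dec(key, c):
    n = len(key)
    L = sum(k for k, _, _ in key[:n - 2])
    S = c[n - 1] / sum(key[n - 1])
    k, s, t = key[n - 2]
    I = c[n - 2] - S * s
    body = sum((c[i] - S * key[i][1]) / (L * key[i][2]) for i in range(n - 2))
    return body - I / (L * k * t)

def make_public_key(key, zeros=8, ones=4):
    # Public key: a set of encryptions of zero and a set of encryptions of one.
    return ([enc(key, 0) for _ in range(zeros)],
            [enc(key, 1) for _ in range(ones)])

def pk_enc(public_key, v):
    zero_encs, one_encs = public_key
    one = random.choice(one_encs)                            # Step 1
    scaled = [v * x for x in one]                            # Step 2
    size = random.randint(1, len(zero_encs))
    subset = random.sample(zero_encs, size)                  # Step 3
    mask = [sum(col) for col in zip(*subset)]                # Step 4
    return [x + y for x, y in zip(scaled, mask)]             # Step 5

key = keygen(4)
public_key = make_public_key(key)
value = Fraction("12.5")
recovered = dec(key, pk_enc(public_key, value))
```

Decryption works because the scaled encryption of one decrypts to v and every encryption of zero contributes 0 to the sum.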
Performance of Instance 1
• Encrypt the value 7985746234523.12 and decrypt the
resulting ciphertext 200,000 times.
• Key and noise configuration:
• 4 subciphertexts
• Each ki or ti has 4 digits (ddd.d)
• Each si has 8 digits (ddddddd.d)
• Each ri has 6 digits (ddddd.d)
• Key space size
• The product of the space sizes of L, S, s1, t1, s2, t2, s3, t3, k3.
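• $10^4 \times 10^8 \times (10^8 \times 10^4 \times 10^8 \times 10^4 \times 10^8 \times 10^4 \times 10^4) = 10^{52}$, bigger than the key space
size of AES-128 ($2^{128} \approx 3.4 \times 10^{38}$).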
• Time on a Dell Latitude E4310 laptop in Eclipse
• 422ms (HomoEnc: Java code uses data type double)
• 8359ms (HomoEnc: Java code uses BigDecimal to represent
value)
• 859ms (AES-128 in SunJCE with ECB/PKCS5Padding)
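The key space arithmetic above can be reproduced in a few lines. The digit counts are taken from the slide's product (4 digits for L, t1, t2, t3 and k3; 8 digits for S, s1, s2 and s3); the variable names are illustrative.

```python
import math

# Number of decimal digits of each key component listed on the slide.
digits = {"L": 4, "S": 8, "s1": 8, "t1": 4, "s2": 8,
          "t2": 4, "s3": 8, "t3": 4, "k3": 4}

key_space = 1
for d in digits.values():
    key_space *= 10 ** d          # each component contributes 10^d choices

aes128_space = 2 ** 128
bits = math.log2(key_space)       # equivalent key length in bits
```

By this count the key space corresponds to roughly 172 bits, compared with 128 bits for AES-128. This compares key space sizes only and says nothing about resistance to attacks other than exhaustive key search.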
Order-Preserving Encryption (OPE)
• If v1< v2, then OPE(k,v1) < OPE(k,v2)
• Recall that OPE is used to process range queries
• R. Agrawal et al. “Order Preserving Encryption for
Numeric Data” (SIGMOD 2004)
• Alexandra Boldyreva, et al. “ Order-Preserving
Symmetric Encryption”. (EUROCRYPT 2009)
Our Work: Order-Preserving Indexing (OPI)
• If v1< v2, then OPI(k,v1) < OPI(k,v2)
• Unlike OPE, it is not required to recover plaintexts from
indexes.
• Used together with the existing encryption schemes (e.g., AES
or Homomorphic schemes) to manage encrypted databases
• benefiting from advances in existing encryption schemes.
• Sensitivity of plaintexts
• The minimum absolute difference between two consecutive
plaintext values.
• For example, integer values can have sensitivity 1; a
salary of the form “ddd.dd” (d is a digit from 0 to 9) has
sensitivity 0.01.
• Configured in the query proxy.
The OPI Scheme: basic form
• a*f(x)*x+b+noise
• a, b and parameters in f are kept secret
• a >0
• noise is sampled from the range [0, a*f(x+sens)*(x+sens) - a*f(x)*x), where sens is the sensitivity of plaintexts;
• f(x) > 0 for x ≠ 0;
• f(x1) ≥ f(x2) for x1 > x2 ≥ 0 or x1 < x2 ≤ 0.
• Denoted by nindexsens[a,b,f](x)
The OPI Scheme: instances of f(x)
• f(x) = |x|;
• $f(x) = \log_c(d + e \cdot |x|)$,
• where c > 1, d > 1 and e > 0.
• $f(x) = c \cdot \lfloor |x|/\pi \rfloor + d \cdot \cos(|x| \,\%\, \pi + \pi) + e$,
• where d > 0, c ≥ 2 * d, e ≥ d, and $\lfloor\ \rfloor$ and % are the floor and
modulo operators, respectively.
• Composition of f(x):
• $\log_c(d + e \cdot |\, g \cdot \lfloor |x|/\pi \rfloor + h \cdot \cos(|x| \,\%\, \pi + \pi) + i \,|)$,
• where c > 1, d > 1, e > 0, h > 0, g ≥ 2 * h, i ≥ h.
The OPI Scheme: An example
• Input values: -10 to 10 with the sensitivity 1
• the indexing expression: $1600 \cdot \log_7(10 + 18 \cdot |x|) \cdot x + 317 + \text{noise}$
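A minimal sketch of this example, assuming the basic form a*f(x)*x + b + noise with a = 1600, b = 317, f(x) = log_7(10 + 18*|x|) and noise drawn uniformly from [0, a*f(x+sens)*(x+sens) - a*f(x)*x). The function names are hypothetical, and Python's `random` module is used only for illustration; a real deployment would need a cryptographically suitable noise source.

```python
import math
import random

A, B, SENS = 1600, 317, 1      # secret parameters a, b and the sensitivity

def f(x):
    # f(x) = log_7(10 + 18*|x|): positive and non-decreasing in |x|.
    return math.log(10 + 18 * abs(x), 7)

def core(x):
    # Noise-free part of the index without the offset: a * f(x) * x.
    return A * f(x) * x

def opi_index(x):
    # The noise stays below the gap to the next plaintext, so order is kept.
    gap = core(x + SENS) - core(x)
    return core(x) + B + random.random() * gap

plaintexts = list(range(-10, 11))
indexes = [opi_index(x) for x in plaintexts]
```

Each index falls in the half-open interval between the noise-free index of x and that of x + sens, so indexes of distinct plaintexts never overlap, while repeated indexing of the same plaintext gives different values.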
The Indexing Scheme: programmability
• Different plaintexts can be indexed with different
expressions
• Separate the distribution of plaintexts and ciphertexts
• Make the indexes more robust
• More secrets
• The indexing program is not necessarily public, though the syntax is public
The Indexing Scheme: a program
The Indexing Scheme: a program
Distribution of Ciphertexts
Distribution of plaintext values
Security of OPI
• Like OPE, OPI is also vulnerable to Chosen-Plaintext
Attacks (CPA)
• The order among ciphertexts reveals information about plaintexts, and
the attack can be done by binary search
• In our system, the proxy is placed in the trusted domain to
prevent CPAs.
• It is still unclear what the probability of obtaining
plaintexts from other indexes is if an attacker happens to
know some pairs of plaintexts and their indexes.
• But complex indexing programs can make such attacks harder to
perform
Encrypted Database Management
• Different table structures
• Designed by application developers
• E.g., a Staff table, with a Salary attribute
• Created by the Query Proxy on the database services
• E.g., a table with its name (Staff) hashed, and its attributes with
hashed names
• If an attribute does not support aggregate operations,
its values can be encrypted with other schemes, like
AES.
• If an attribute does not support range queries, its values
do not need to be indexed such as a Boolean attribute.
Creation of Databases and Tables
• Statements from applications
• Statements from the proxy
Data Insertion
• Statements from applications
• Statements from the proxy
• where Enc(K(n),v) = (c1,…,cn)
SQL Query (1)
• Statements from applications
• Statements from the proxy
• cond’ is the translation of cond
• Translation of conditions:
• col_name < c  →  Hash(k, col_name+“RngIdx”) < Index(c,0).
• col_name = c  →  Hash(k, col_name+“EqIdx”) = Hash(k, c).
• col_name > c  →  Hash(k, col_name+“RngIdx”) >= Index(c+sens,0).
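The three translation rules can be sketched as a small rewriting function. Everything concrete here is an assumption made for illustration: Hash(k, ·) is modeled as a truncated HMAC-SHA256, Index(c, 0) is read as the index of c with zero noise, the index expression is the one from the OPI example slide, and the names `hash_str`, `index`, `noisy_index` and `translate` are hypothetical.

```python
import hashlib
import hmac
import math
import random

A, B, SENS = 1600, 317, 1          # parameters from the OPI example slide

def f(x):
    return math.log(10 + 18 * abs(x), 7)

def index(x, noise):
    # Index(x, noise) = a*f(x)*x + b + noise
    return A * f(x) * x + B + noise

def noisy_index(x):
    # Index stored with a row: noise below the gap to the next plaintext.
    gap = index(x + SENS, 0) - index(x, 0)
    return index(x, random.random() * gap)

def hash_str(key, text):
    # Assumption: Hash(k, .) modeled as HMAC-SHA256, truncated and prefixed
    # with a letter so that it can serve as an SQL identifier.
    return "h" + hmac.new(key, text.encode(), hashlib.sha256).hexdigest()[:16]

def translate(key, col, op, c):
    # Rewrite a condition on a plaintext column into one on stored columns.
    if op == "=":
        return f"{hash_str(key, col + 'EqIdx')} = '{hash_str(key, str(c))}'"
    rng_col = hash_str(key, col + "RngIdx")
    if op == "<":
        return f"{rng_col} < {index(c, 0)!r}"
    if op == ">":
        return f"{rng_col} >= {index(c + SENS, 0)!r}"
    raise ValueError(f"unsupported operator: {op}")

KEY = b"demo-key"
less_than_5 = translate(KEY, "income", "<", 5)
greater_than_5 = translate(KEY, "income", ">", 5)
equals_5 = translate(KEY, "income", "=", 5)
```

The bound for `>` is the zero-noise index of c + sens because every value larger than c has an index of at least that size, while the noisy index of c itself always stays below it.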
SQL Query (2)
• Statements from applications
• Statements from the proxy
SQL Query (3)
• Statements from applications
• Statements from the proxy
• One query might be translated into several ones
• Another example: “select staff whose salaries are greater than the average”
Unsupported Queries
• Conditions that operate on several columns are not
supported directly
• E.g., the condition colnm1*colnm2+ colnm3 > colnm4*colnm5
• Our solution: by designing or adjusting table
structures, almost all types of queries can be
processed
• E.g., we can add two columns in a table storing the OPI indexes of
colnm1*colnm2+ colnm3 and colnm4*colnm5, respectively.
• More work is needed
Performance of Querying Encrypted DB
• The key configuration
• 4 subciphertexts
• Each ki or ti has 4 digits (ddd.d)
• Each si has 8 digits (ddddddd.d)
• Each ri has 6 digits (ddddd.d)
• The indexing expression
• The table
person(id int, name varchar(64), gender varchar(8), birthdate
bigint, income numeric(10,2))
Performance of Insertion
Performance of Query
• select * from person where income > min and income <
max
Performance of Aggregate Query
• select SUM(income) from person where income > min
and income < max
Related Works
• Comparisons from the following perspectives
• The native operations in DBMSs, such as SUM and AVG,
should be used to support the operations on encrypted data.
• The existing DBMSs should be used.
• The database servers should not necessarily own the
encryption keys.
• The maximum sum of values in one table column should not be
predetermined.
• The maximum number of values should not be required.
Related Works – Use of Native DBMS
Operations
• R. A. Popa, et al. “CryptDB: protecting confidentiality
with encrypted query processing” . In ACM SOSP ’11.
• From MIT
• To calculate the sum of values, this work multiplies
the encrypted data, because it uses
Paillier’s homomorphic encryption system
(EUROCRYPT 1999)
• Multiplication is implemented as user-defined functions in particular
DBMSs.
• Multiplication is slower than addition.
• Multiplication generates bigger values than addition.
Related Works – Existing DBMSs (1)
• T. Ge and S. Zdonik. “Answering aggregation queries in
a secure system model”. VLDB 2007.
• From Brown University
• The values in multiple records are encrypted into one
ciphertext
• E.g., the salaries of four people are encrypted together.
• Their encrypted databases are not in the relational model.
• DBMSs need to be changed to process queries.
• Databases are hard to update.
Related Works – Existing DBMSs (2)
• Craig Gentry: Fully homomorphic encryption using
ideal lattices. ACM STOC 2009.
• From Stanford University and IBM Watson
• Homomorphic operations must be mixed with a
“ciphertext refresh step” to reduce noises in the
ciphertexts
• Since the somewhat homomorphic encryption scheme can only support
a limited number of additions and multiplications
• If Gentry’s idea is used, the DBMSs must
be changed to take the “ciphertext refresh” step into account
when executing operations like SUM
Related Works – Location of keys
• In Oracle Database, AES can be used to encrypt some
columns
• Encryptions are transparent to users
• That is, keys must be accessed by database server.
• If the server is compromised, the key might be stolen, too.
• AES is not a homomorphic encryption scheme.
Related Works – Maximum Sum of Values
• Zvika Brakerski, et al. “Fully Homomorphic Encryption
from Ring-LWE and Security for Key Dependent
Messages”. Crypto 2011.
• From Weizmann Institute of Science
• Craig Gentry: Fully homomorphic encryption using
ideal lattices. ACM STOC 2009.
• From Stanford University and IBM Watson
• Other homomorphic encryption schemes
• Their correctness requires that the sum of one table column
not exceed the modulus (for addition)
• That is, the maximum sum of plaintexts needs to be
predetermined to use these schemes.
• Big modulus leads to big ciphertexts.
• The maximum sum is hard to determine for long-standing
databases.
Related Works – Maximum number of
Values
• R. Agrawal, et al. “Order preserving encryption for
numeric data”. ACM SIGMOD 2004.
• From IBM Research
• Their OPE scheme needs to know the total number of
plaintexts (or the values in a table column)
• This number might change for a table column after a long period of time
Some Ideas for Collaboration
• Security Analysis
• Both homomorphic encryption and order-preserving
indexing schemes
• Application of homomorphic encryption in big data
processing platform
• Dealing with scalability of multiplication
• Applying homomorphic encryption to secure outsourced computing
to clouds (e.g., data mining algorithm)
Conclusion
• Our techniques offer better usability for practical
applications.
• Homomorphic encryption scheme
• Order-preserving index scheme
• Translation of SQL queries
• A demo system is publicly available
• http://150.229.2.229/familySys/home
Thanks!