Recent Advances in Homomorphic Encryption

Download Report

Transcript Recent Advances in Homomorphic Encryption

Recent Advances in
Homomorphic Encryption
Shai Halevi – IBM Research
February 13, 2012
Computing on Encrypted Data
• Wouldn’t it be nice to be able to…
o
o
o
Encrypt my data before sending to the cloud
While still allowing the cloud to
search/sort/edit/… this data on my behalf
Keeping the data in encrypted form
• Without shipping it back and forth to be decrypted
Computing on Encrypted Data
• Wouldn’t it be nice to be able to…
o
o
o
Encrypt my queries to the cloud
While still allowing the cloud to process them
Cloud returns encrypted answers
• that I can decrypt
Computing on Encrypted Data
$skj#hS28ksytA@ …
Computing on Encrypted Data
$kjh9*mslt@na0
&maXxjq02bflx
m^00a2nm5,A4.
pE.abxp3m58bsa
(3saM%w,snanba
nq~mD=3akm2,A
Z,ltnhde83|3mz{n
dewiunb4]gnbTa*
kjew^bwJ^mdns0
Privacy Homomorphisms
• Rivest-Adelman-Dertouzos 1978
Plaintext
space
x1 P x2
ci  Enc(xi)
*
y
Ciphertext space C
c1
c2
#
y  Dec(d)
d
• Example: RSA-encrypt(e,N)(x) = xe mod N
o
x1e  x2e = (x1  x2) e mod N
• “Somewhat Homomorphic” (SWHE):
can
compute some functions on encrypted data
“Fully Homomorphic” Encryption (FHE)
• Encryption for which we can compute
arbitrary functions on the encrypted data
Enc(x)
Eval
f
Size independent
of f ’s complexity
Enc(f(x))
• Enough to do Enc(x1x2), Enc(x1+x2)
o
Every function can be expressed as a polynomial
How To Do It?
• Open for >30 years
• First plausible construction in [Gentry’09]
o
o
SWHE with security based on hard problems in
(ideal) integer lattices
A transformation SWHE  FHE
• Requires SWHE that can evaluate its own
decryption circuit
• Several other constructions since then
A Taste of Gentry’s SWHE Scheme
• Public key is two integers p,r
• Secret key is one integer w
• Encp,r(𝑏 ∈ {0,1}):
o
Implicit
encoding of
matix, vecot
choose high-degree polynomial with small
coefficients (call it Q), output c=2Q(r)+b mod p
• Decw(c): output (c∙w mod p) mod 2
• Addition, multiplication of ctxts modulo p
(main catch: integers are huge…)
Performance
• A little slow…
• First working implementation in [GH11]
• ½ -hour to compute a single  gate
o
o
Because of homomorphic decryption
13-14 orders of magnitude slowdown
vs. computing on non-encrypted data
• The underlying SWHE is faster
o
o
Can evaluate polynomials of degree upto ~200
About ½-second for a single  gate
Performance
• A little slow…
• Butler Lampson: “I don’t think we’ll see
anyone using Gentry’s solution in our
lifetimes […]”
– Forbes, Dec-19, 2011
Rest of this talk: reasons to believe otherwise
Why is [G’09] So Slow?
implicit in
p,r,w
• Ciphertext of SWHE is large
o
Ciphertext has dimension D, bitsize N
• Ciphertext size |c| ~ D∙N
o
Ciphertext contains n-bit “noise”
• D > N∙l to get security
• N > n to be able to decrypt
o
initial noise, say
n0 = log(l)
Mult(c1,c2) adds the noise bits in the ci’s
• e.g., from n bits each to 2n-bits in the product
o
Degree-k polynomial has ≥kn0-bit noise
• Need N>kn0, D>kn0∙l, |c|>W(lk2)
Why is [G’09] So Slow?
• Ciphertext of SWHE has size |c|≥ W(lk2)
• SWHEFHE relies on homomorphic
evaluation of the decryption function
Of degree W(l), so |c|≥W(l3)
Decryption takes ≥W(l3) operations
Homomorphic evaluation of each operation on
two ciphertexts takes time ≥W(l3)
o
Total complexity ≥W(l6)
o
Can be improved to 𝑂(𝜆3.5 )
Faster Homomorphic Evaluation
• Brakerski-Vaikuntanathan’11 and
Brakerski-Gentry-Vaikuntanathan’12
o
From 𝑂 𝜆3.5 to 𝑂(𝜆)
• Smart-Vercateran‘11
o
SIMD operations, batching Ω 𝐷 ops at a time
• Gentry-Halevi-Smart’12
o
o
From SIMD to general-purpose computing
Evaluating t-gate circuit in time t∙polylog(l)
• Assuming average width Ω 𝜆
A Taste of [BV11, BGV12]
• Secret key is a vector s
• Ciphertext is a vector c
• Decs(c): output (c•s mod q) mod 2
o
q is a parameter
• Addition is vector-addition modulo q
• Multiplication is tensor-product mod q,
•
followed by “dimension reduction”
Parameter q evolves during computation
The [BV11, BGV12] Techniques
• A better multiplication technique
o
o
Decrease bitsize rather than increase noise
c1,c2, have bitsize N, noise bitsize = n
 Mult(c1,c2) has bitsize N-n, noise bitsize = n
• Need N0>(k+1)n to handle depth k
• This gives degree 2k
• Decryption has depth O(log l)
o
N0 = polylog(l), D = 𝑂(𝜆), |c|=𝑂(𝜆)
Exploiting Parallelism
• SIMD: Working on Data Arrays
Array of length ℓ
ℓ-ADD
2
1
9
2
4
4
5
0
7
3
6
…
1
2
8
2
0
6
1
5
9
3
8
0
1
…
4
4
10 3
9
8
5
9
14 3
15 3
7
…
5
6
Exploiting Parallelism
• SIMD: Working on Data Arrays
Array of length ℓ
ℓ-MULT
2
1
9
2
4
4
5
0
7
3
6
…
1
2
8
2
0
6
1
5
9
3
8
0
1
…
4
4
16 2
0
12 4
20
45 0
56 0
6
…
4
8
• What’s the point?
o
Efficiency: ℓ-fold parallelism
Plaintext Algebra
• Some FHE variants use polynomial rings
• Native plaintext space is 𝑅2 = 𝑍2 𝑋 /Φ𝑚 (𝑋)
o
Binary polynomials modulo Φ𝑚 (𝑋)
• Φ𝑚 (𝑋) is m’th cyclotomic polynomial
o
Dimension is 𝐷 = 𝜙 𝑚 ≈ 𝑚
• Φ𝑚 (𝑋) irreducible over Z, but not mod 2
o
Φ𝑚 𝑋 = ∏ℓ𝑗=1 𝐹𝑗 𝑋 (mod 2)
• The Fj’s are irreducible, all have the same degree
o
𝐷
)
log 𝐷
For some m’s we can get ℓ = Ω(
Plaintext Slots
• Plaintext element 𝑎 ∈ 𝑅2 encodes ℓ values
ℓ
o
𝑎 ≅ 𝛼𝑗
o
Polynomial Chinese Remainders
𝑗=1
, 𝛼𝑗 = (𝑎 𝑚𝑜𝑑 𝐹𝑗 )
• Each 𝛼𝑗 can be a bit
• Ops ,+ work independently on the slots
o
𝑎 × 𝑎′ ≅ 𝛼𝑗 × 𝛼𝑗′
𝑗
, 𝑎 + 𝑎′ ≅ 𝛼𝑗 + 𝛼𝑗′
𝑗
Homomorphic SIMD [SV’11]
• Computing same function on ℓ inputs at
•
the price of one computation
Pack the inputs into the slots
o
Bit-slice, inputs to j’th instance go in j‘th slots
• Compute the function once
• After decryption, decode the ℓ output bits
from the output plaintext polynomial
Aside: an ℓ-SELECT Operation
x
x
x1
x2
x3
x4
x5
x6
x7
1
0
0
1
0
1
0
x8
x9
x10
x11
x12
x13
0
1
1
0
1
0
=
x14 =
x1
0
0
x4
0
x6
0
+
0
x9
x10
0
x12
0
x14
x1
x9
x10
x4
x12
x6
x14
1
• We will use this later
Low-Overhead HE [GHS’12]
• Start from the [BGV’12] cryptosystem
o
Overhead Ω(𝜆)
• Apply the [SV’11] SIMD trick
o
Overhead polylog(l) when computing the same
function f on Ω(𝜆) inputs
• Extend to computing a single instance of f
o
o
As long as the circuit of f has width Ω(𝜆)
Use internal parallelism inside this one circuit
+
+
+
+
+
+
+
+
+
So you want to compute some function…
×
+
×
+
×
×
+
+
×
+
×
+
×
×
+
+
×
+
×
×
1
x1 x2 x3 x4 x5
0
x7 x8x9 x10 x11 x12
1
x14 x15 x16 x17 x18 x19
1
+
+
+
+
1
x21 x22 x23 x24 x25 x26
ADD and MULT are a complete set of operations.
Input
bits
+
+
+
+
+
+
+
+
+
So you want to compute some function…
Using SIMD…
×
+
×
+
×
×
+
+
×
+
×
+
×
×
+
+
×
+
×
×
1
x1 x2 x3 x4 x5
x1 x2 x3 x4 x5
0
x7 x8x9 x10 x11 x12
x7 x8 x9 x10 x11 x12
1
x14 x15 x16 x17 x18 x19
x14 x15 x16 x17 x18 x19
1
+
+
+
+
1
x21 x22 x23 x24 x25 x26
Input
bits
x21 x22 x23 x24 x25 x26
ℓ-ADD and ℓ-MULT are not a complete set of operations!!!
… unless, of course, we use ℓ=1… 
Routing Values Between Levels
• We need to map this
x1
x2
x3
x4
x5
0
x7
x8
x9
x10
x11
x12
x15
x16
x17
x18
x19
1
x21
x22
x23
x24
x25
x26
1
+
+
+
+
+
+
+
+
+
+
+
+
+
• Into that … so we can use ℓ-add
x1
x3
x5
x7
x9
x11
1
x15
x17
x19
x21
x23
x25
x2
x4
0
x8
x10
x12
x14
x16
x18
1
x22
x24
x26
x14
+
+
+
ℓ-ADD, ℓ-MULT, ℓ-PERMUTE:
a complete set of SIMD ops
x1 x2 x3 x4 x5 0
x1 x2 x3 x4 x5
x7
x1 x2 x3 x4 x5
x7
ℓ-PERMUTE(π)
x1 x3 x5 * * * *
x2 x4 * * * * *
ℓ-MULT
1 1 1 0 0 0 0
1 1 0 0 0 0 0
x1 x3 x5 0 0 0 0
x2 x4 0 0 0 0 0
+
+
+
ℓ-ADD, ℓ-MULT, ℓ-PERMUTE:
a complete set of SIMD ops
x1 x2 x3 x4 x5 0
ℓ-ADD
x1 x3 x5 0 0 0 0
+ +
xx21 xx43 x5 0 0 0 0
x2 x4 0 0 0 0 0
+
+
+
+
+
+
+
+
+
ℓ-ADD, ℓ-MULT, ℓ-PERMUTE:
a complete set of SIMD ops
×
×
×
×
×
×
×
×
×
×
+
+
+
+
+
+
+
+
+
+
×
1
+
+
+
1
x1 x2 x3 x4 x5 0 x7 x8 x9 x10 x11 x12 1 x14 x15 x16 x17 x18 x19 1 x21 x22 x23 x24 x25 x26
Use ℓ-PERMUTE for routing between circuit levels
• Not quite obvious
Input
bits
Routing Values Between Levels
1. How to implement ℓ-permute?
o
o
𝑎 ∈ 𝑅2 encodes an ℓ-array using polynomial-CRT
We are given an encryption of 𝑎
2. Fan-out: need to clone values from high
o
o
Implemented using ℓ-permute on ℓ-arrays
Even when 𝑊 ≫ ℓ
not today
fan-out gates before routing to next level
3. Big permutation: For a width-W level,
we need a permutation over 2W values
Implementing ℓ-Permute
• Recall: native plaintext is binary polynomial
modulo Φ𝑚 𝑋 , 𝑎 ∈ 𝑅2 = 𝑍2 [𝑋]/Φ𝑚 (𝑋)
o
o
o
𝑎 ≅ 𝛼1 , … , 𝛼ℓ , 𝛼𝑗 = (𝑎 𝑚𝑜𝑑 𝐹𝑗 )
𝑎 + 𝑎′ ≅ 𝛼1 + 𝛼1′ , … , 𝛼ℓ + 𝛼ℓ′
𝑎 × 𝑎′ ≅ 𝛼1 × 𝛼1′ , … , 𝛼ℓ × 𝛼ℓ′
• Is there a natural operation on polynomials
that moves values between slots?
Consequences of Galois Theory
Assuming that slots contain single bits:
• For every 𝑗 ∈ 𝑍𝑚∗ the mapping
𝜅𝑗 𝑎 𝑋 = 𝑎 𝑋𝑗 𝑚𝑜𝑑 Φ𝑚 𝑋
induces a permutation on the slots
o
o
o
permutation=identity iff j is power of two
The 𝜅𝑗 ’s form a group under composition
For every pair (i1, i2), exists j such that 𝜅𝑗 sends
the contents of slot i1 to slot i2
An Illustrative Special Case
• For some Φ𝑚 ’s, the 𝜅𝑗 ’s induce circular
shifts on the ℓ-array of slots:
o
If 𝑎 ≅ 𝛼1 , 𝛼2 … , 𝛼ℓ then 𝜅𝑗 (𝑎) ≅ 𝛼2 , … , 𝛼ℓ , 𝛼1
∗
• for some 𝑗 ∈ 𝑍𝑚
o
It mean also that for every 𝑗 ′ = ℎ ⋅ 𝑗 𝑚𝑜𝑑 𝑚,
𝜅𝑗′ (𝑎) ≅ 𝛼ℎ+1 , … , 𝛼ℓ , 𝛼1 , … , 𝛼ℎ
• For these Φ𝑚 ’s, all the 𝜅𝑗 ’s induce rotations
Applying the 𝜅𝑗 ’s homomorphically
• Roughly, applying 𝜅𝑗 to the cipehrtext
c=Enc(a) yields an encryption of 𝜅𝑗 (𝑎)
o
With respect to a different secret key
• 𝜅𝑗 (𝑠) rather than s
o
But this can be fixed
• So we can implement circular shifts
• But we need arbitrary permutations
o
In order to do intra-circuit routing
From Shifts to Arbitrary Permutations
• A naïve solution:
o
p is the permutation that we need to implement
For every i, let j(i) be an index such that the
permutation of 𝜅𝑗(𝑖) sends i to p(i)
o
Let 𝑎𝑖 = 𝜅𝑗
o
Use a big SELECT on all the 𝑎𝑖 ’s
o
𝑖
𝑎
• Pick the slot p(i) from 𝑎𝑖
• Implements p, but takes Θ
o
ℓ ops
Inefficient, we might as well not use SIMD
From Shifts to Arbitrary Permutations
Using Beneš/Walksman Permutation Networks:
• Two back-to-back butterflies
o
o
Every exchange
is controlled by a bit
Values sent on either
straight edges
or cross edges
• Every permutation can be realized by
appropriate setting of the control bits
Realizing Permutation Networks
Claim: every level of the Benes network can
be realized by two shifts and two SELECTs
Example:
0
1
2
3
4
5
6
7
2
1
0
3
6
7
4
5
Control bits: 1 0 1 1
Realizing Permutation Networks
a1
0
1
2
3
4
5
6
7
input
a2
0
1
2
3
4
5
6
7
shift(-2)
a3
0
1
2
3
4
5
6
7
shift(2)
2
1
0
3
6
7
4
5
output
Realizing Permutation Networks
a1
0
1
2
3
4
5
6
7
input
a2
2
3
4
5
6
7
0
1
shift(-2)
a3
6
7
0
1
2
3
4
5
shift(2)
a4
2
1
4
3
6
7
0
1
SELECT(a1,a2)
a5
2
1
0
3
6
7
4
5
SELECT(a3,a4)
2
1
0
3
6
7
4
5
output
Realizing Permutation Networks
Claim: every level of the Benes network can
be realized by two shifts and two SELECTs
Proof : In every level, all the exchanges are
between nodes at the same distance
o
Distance 2i for some i
Can implement all these exchanges using
shift(2i), shift(-2i), and two SELECTs
Realizing Permutation Networks
• Every level takes 2 shifts and 2 SELECTs
• There are 2log(ℓ) levels
Any permutation on ℓ-arrays can be
realized using 4log(ℓ) shifts and 4log(ℓ)
SELECTs
• Some more complications when ℓ is not a
power of two
o
But still only O(log ℓ) operations
Routing Values Between Levels
• Implementing ℓ-permute
o
o
o
Using 𝑋 ↦ 𝑋𝑗 to get simple shifts
Benes network to get arbitrary permutation
Takes O(log ℓ) operations
• Cloning values from high fan-out gates
• Permutations over 𝑊 ≫ ℓ elements
o
Both can be done in O(log W) operations
 Intra-level routing takes O(log W) work
o
For a width-W level
Low Overhead HE
• Pack inputs into ℓ-arrays
o
o
o
ℓ can made as large as Ω(𝜆)
Route values to their place at the input to next level
Takes O(W polylog W) work
• Apply SIMD operation to implement level
• Total work for size-t width-Ω(𝜆) circuit is
only t∙polylog(𝜆)
So How Fast Is It?
• Not quite as slow as before
o
Not blazing fast either…
• Should be able to evaluate AES in time
between 1 hour and 1 day
o
But can evaluate 20-100 AES blocks in this time,
so amortized per-block time is better
• Should be usable in niche applications
within a year or two
Questions?
Handling Large Permutations
•
•
Can we arbitrarily permute m×ℓ items, given in m
arrays of size ℓ, using ℓ-ADD, ℓ-MULT, ℓ-PERMUTE?
Theorem (Lev, Pippenger, Valiant ‘84): A permutation
π over m×ℓ addresses (viewed as a rectangle) can be
decomposed as π = π3◦ π2◦ π1, where:
o
o
•
•
o
π1 only permutes within the columns
π2 only permutes within the rows
π3 only permutes within the columns
Within rows: Use ℓ-PERMUTE on each row (array).
Within columns: We swap elements with same array
index. Can do this using only ℓ-ADD and ℓ-MULT.
Decomposing Permutations
13
18
14
16
12
15
3
4
5
6
7
8
9
20
19
2
17
1
11
10
?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
Decomposing Permutations
13
18
14
16
12
15
3
4
5
6
7
8
9
20
19
2
17
1
11
10
13
17
14
5
6
15
3
4
16
12
7
8
9
11
10
2
18
1
20
19
?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
Decomposing Permutations
13
18
14
16
12
1
2
3
4
5
15
3
4
5
6
6
7
8
9
10
7
8
9
20
19
11
12
13
14
15
2
17
1
11
10
16
17
18
19
20
13
17
14
5
6
6
17
13
14
5
15
3
4
16
12
16
12
3
4
15
7
8
9
11
10
11
7
8
9
10
2
18
1
20
19
1
2
18
19
20
?
Decomposing Permutations
13
18
14
16
12
1
2
3
4
5
15
3
4
5
6
6
7
8
9
10
7
8
9
20
19
11
12
13
14
15
2
17
1
11
10
16
17
18
19
20
13
17
14
5
6
6
17
13
14
5
15
3
4
16
12
16
12
3
4
15
7
8
9
11
10
11
7
8
9
10
2
18
1
20
19
1
2
18
19
20
?
Decomposing Permutations
13
18
14
16
12
1
2
3
4
5
15
3
4
5
6
6
7
8
9
10
7
8
9
20
19
11
12
13
14
15
2
17
1
11
10
16
17
18
19
20
13
17
14
5
6
6
17
13
14
5
15
3
4
16
12
16
12
3
4
15
7
8
9
11
10
11
7
8
9
10
2
18
1
20
19
1
2
18
19
20
?
Decomposing Permutations
13
18
14
16
12
1
2
3
4
5
15
3
4
5
6
6
7
8
9
10
7
8
9
20
19
11
12
13
14
15
2
17
1
11
10
16
17
18
19
20
13
17
14
5
6
6
17
13
14
5
15
3
4
16
12
16
12
3
4
15
7
8
9
11
10
11
7
8
9
10
2
18
1
20
19
1
2
18
19
20
?