Transcript H - KIAS

Universal Hash Families
Universal hash families
 Family of hash functions
 Finite multiset H of string-valued functions, each h ∈ H having the same
nonempty domain A ⊆{0,1}* and range B ⊆{0,1}*, for some constant b
 [Definition] -almost universal2 (-AU2) & -almost XOR
universal2 (-AXU2)
 A family of hash functions H ={h: A {0,1}b} is -almost universal2 ,
written -AU2, if, for all distinct x, x’∈A, Prh∈ H [h(x)=h(x’)] .
 The family of hash functions H is -almost XOR universal2 , written AXU2, if, for all distinct x, x’∈A, and for all c∈{0,1}b, Prh∈ H
[h(x)h(x’)=c] .
  =maxxx’{Prh [h(x)=h(x’)]} : collision probability
 Principle measure of an AU2 : How small its collision probability is
and how fast one can compute its functions
1
Composition of universal hash families
 Make the domain of a hash family bigger
 H ={h: {0,1}a{0,1}b}
 Hm ={h: {0,1}am{0,1}bm}
 its elements are the same as in H where h(x1 x2… xm), for | xi |=a, is defined
by h(x1)|| h(x2)||…|| h(xm)
 [Proposition] If H is -AU2, then Hm is -AU2
 Make the collision probability smaller
 H1 ={h: A {0,1}b1}, H2 ={h: A {0,1}b2}
 H1 & H2 ={h: A {0,1}b1+ b2}
 its elements are pairs of functions (h1, h2)∈H1H2 and where (h1, h2)(x) is
defined as h1(x)||h2(x)
 [Proposition] If H1 is 1-AU2 and H2 is 2-AU2, then H1 & H2 is 12-AU2
2
Composition of universal hash families(cont.)
 Make the image of a hash function shorter
 H1 ={h: {0,1}a{0,1}b}, H2 ={h: {0,1}b{0,1}c}
 H2 ○ H1 ={h: {0,1}a{0,1}c}
 its elements are pairs of functions (h1, h2)∈H1H2 and where (h1, h2)(x) is
defined as h2(h1(x))
 [Proposition] If H1 is 1-AU2 and H2 is 2-AU2, then H2 ○ H1 is (1+2)AU2
 Turn an AU2 family H1 and AXU2 family H2 into an AXU2 family
H2 ○ H1
 [Proposition] Suppose H1 ={h: AB} is 1-AU2 and H2 ={h: BC} is
2-AXU2. Then H2 ○ H1 ={h: AC} is (1+2)-AXU2
3
Related researches
 Carter-Wegman(1979, 1981)
 Efficient authentication code under strongly universal hash functions
 Key observations
 Long messages can be authenticated efficiently using short keys if the
number of bits in the authentication tag is increased slightly compared to
‘perfect’ schemes
 If a message is hashed to a short authentication tag, weaker properties are
sufficient for the first stage of the compression
 Under certain conditions, the hash function can remain the same for many
plaintexts, provided that hash results is encrypted using a one-time pad
 Stinson(1994)
 Improves the works by Wegman-Carter and establishes an explicit link
between authentication codes and strongly universal hash functions
 Johansson-Kabatianskii-Smeets(1996)
 Establish a relation between authentication codes and codes correcting
independent errors
4
Related researches(cont.)
 Krawczyk(1994, 1995)
 Propose universal hash functions that are linear with respect to bitwise
XOR
 Makes it easier to reuse the authentication code
 Encrypt the m-bit hash result for each new message using a one-time pad
 Simple and efficient constructions based on polynomials and LFSR
 Shoup(1996)
 Propose and analyze the constructions based on polynomials over finite
fields
 Rogaway : Bucket hashing (1995)
 Halevi-Krawczyk : MMH (1997)
 Make optimal used of the multiply and accumulate instruction of the
Pentium MMX processor
 Black-Halevi-Krawczyk-Krovertz-Rogaway : UMAC (1999)
 Further improved the performance on high end processors
5
Constructions
 Bucket hashing
 is an -AU introduced by Rogaway
 Defining the Bucket Hash Family B
 word size w(1), parameters n(1), N(3)
 domain D={0,1}wn, range R={0,1}wN
 Let h B and let X=X1  Xn be the string we want to hash, where each
|Xi|=w. Then h(X) is defined by the following algorithm. First, for each
j{1,, N}, initialize Yj to 0w. Then, for each i{1,, n} and k  hi,
replace Yk by Yk  Xi. When done, set h(X) = Y1||Y2||||YN.
 Pseudocode
for j 1 to N do Yj 0w
for i  1 to n do
Yhi1  Yhi1  Xi
Yhi2  Yhi2  Xi
Yhi3  Yhi3  Xi
return Y1||Y2||||YN
6
Constructions(cont.)
 Bucket Hashing with Small Key Size
 N=2s/L
 Each hash function hB’[w,M,N] is specified by a list of length M
 each entry contains L integers in the interval [0, N-1]
 L arrays are introduced, each containing N buckets
 Next, each array is compressed to s/L words, using a fixed primitive
element GF(2s/L)
 The hash result is equal to the concatenation of the L compressed arrays,
each containing s/L words
7
Constructions(cont.)
 Hash Family Based on Fast Polynomial Evaluation
 is based on polynomial evaluation over a finite field
 q = 2r, Q = 2m = 2r+s, n = 1+2s,  : a linear mapping from GF(Q) onto
GF(q)
 Q = q0m, q = q0r , q0 : a prime power
 fa(x) =a0 + a1x +  + an-1xn-1
 x, y, a0, a1, , an-1  GF(Q), z  GF(q)
 H = {hx,y,z : hx,y,z(a) = hx,y,z(a0, a1, , an-1) =  (y fa(x)) + z}
8
Constructions(cont.)
 Hash Family Using Toeplitz Matrices
 Toeplitz matrices are matrices with constant values on the left-to-right
diagonals
 A Toeplitz matrix of dimension n  m can be used to hash messages of
length m to hash results of length n by vector-matrix multiplication
 The Toeplitz construction uses matrices generated by sequences of
length n + m - 1 drawn from -biased distributions
 -biased distributions are a tool for replacing truly random sequences by
more compact and easier to generate sequences
 The lower , the more random the sequence is
 Krawczyk proves that the family of hash functions associated with a
family of Toeplitz-matrices corresponding to sequences selected from a
-biased distribution is  -AXU with  = 2-n + 
9
Constructions(cont.)
 Evaluation Hash Function
 is one of the variants analyzed by Shoup
 The input (of length  tn) : viewed as a polynomial M(x) of degree < t
over GF(2n)
 The key : a random element   GF(2n)
 the hash result : equal to M()  GF(2n)
 This family of hash functions is -AXU with  = t/2n
10
Constructions(cont.)
 Division Hash Function
 represents the input as a polynomial M(x) of degree less than tn over
GF(2)
 The hash key : a random irreducible polynomial p(x) of degree n over
GF(2)
 The hash result : m(x) xn mod p(x)
 This family of hash functions is -AXU with  = tn/2n
 The total number of irreducible polynomials of degree n is roughly equal
to 2n/n
11
Constructions(cont.)
 MMH(Multilinear Modular Hashing) hashing
 consists of a (modified) inner product between message and key
modulo a prime p (close to 2w, with w the word length; below w = 32)
 is an -AXU2, but with xor replaced by subtraction modulo p
 The core hash function maps 32 32-bit message words and 32 32-bit key
words to a 32-bit result
 The key size is 1024 bits and  = 1.5/ 230
 For larger messages, a tree construction can be used
 the value of  and the key length have to be multiplied by the height of the
tree
 This algorithm is very fast on the Pentium Pro, which has a multiply
and accumulate instruction
 On a 32-bit machine, MMH requires only 2 instructions per byte for a 32bit result
12
Comparing the Hash Functions
13
Comparing the Hash Functions(cont.)
 Scheme A
 the input : divided into 32 blocks of 8 Kbyte
 each block is hashed using the same bucket hash function with N = 160
 results in an intermediate string of 20480 bytes
 Scheme B
 the input : divided into 64 blocks of 4 Kbyte
 each block is hashed using the same bucket hash function with short
key(s=42, L=6, N=128)
 results in an intermediate string of 10752 bytes
 Scheme C
 the input is divided into 64 blocks of 4 Kbyte
 each block is hashed using a 331024 Toeplitz matrix, based on a -biased
sequence of length 1056 generated using an 88-bit LFSR
 The length of the intermediate string is 8448 bytes
14
Comparing the Hash Functions(cont.)
 Scheme D
 the input : hashed twice using the polynomial evaluation hash function
with  = 2-15 resulting in a combined value of 2-30
W = 5
 The performance is slightly key dependent. Therefore an average over a
number of keys has been computed.
 Scheme E
 this is simply the evaluation hash function with t = 32768
 the resulting value of  is too small
 However, choosing a smaller value of n that is not a multiple of 32 induces
a performance penalty
 Scheme F
 the input : divided into 2048 blocks of 128 bytes
 each block is hashed twice using MMH
 the length of the intermediate string is 16384 bytes
– It is not possible to obtain a value of closer to 2-32 in an efficient way
15
Universal Hashing MAC
Message authentication based on Universal hashing
 Message authentication based on Universal hashing
 Wegman-Carter approach
 The parties share a secret key k=(h,P)
– P : infinite random string
– h : function drawn randomly from a strongly universal2 family of hash
functions H
» H is strongly universal2 if, for all xx’, the random variable
h1(x)||h2(x), for h ∈ H , is uniformly distributed
 To authenticate a message x, the sender transmits h(x) xored with the next
piece of the pad P
 Standard cryptographic technique
 use of a pseudorandom function family, F
MAC( h ,a ) ( xi )  (i, Fa (i )  h( xi ))
 [Theorem] Assume H is -AXU2, and that F is replaced by the truly
random function family R of functions. In this case, if an adversary makes
q1 queries to the authentication algorithm S and q2 queries to the
verification algorithm V, the probability of forging a MAC is at most q2
17
Universal hashing MAC
 Why Universal hashing MAC?
 The speed of a universal hashing MAC depends on the speed of the
hashing step and encrypting step
 The encryption does not take long
 hash function compresses messages => the encrypting message is short
 The combinatorial properties of the universal hash function family is
mathematically proven
 needs no “over-design” or “safe margin” the way a cryptographic primitive
would
 Universal hashing MAC makes for desirable security properties
 can select a cryptographically conservative design for the encrypting
step
 can pay with only a minor impact on speed
 the cryptographic primitive is applied only to the much shorter hashed
image of the message
 security and efficiency are not conflicting requirements
18
UMAC
 The UMAC algorithm
 species how the message, key, and nonce determine an authentication
tag
 The sender
 will need to provide the receiver with the message, nonce, and tag
 The receiver
 can then compute what “should be” the tag for this particular message and
nonce, and see if it matches the received tag
 employs a subkey generation process in which the shared
(convenient-length) key is mapped into UMAC's internal keys
 subkey generation is done just once, at the beginning of a
communication session during which the key does not change, and so
subkey-generation is usually not performance-critical
 UMAC depends on a few different parameters
19
UMAC(cont.)
 An illustrative special case of UMAC
 Subkey generation:
 Using a PRG, map Key to K = K1K2  K1024 and to A
– each Ki : a 32-bit word, |A| = 512
 Hashing the message Msg to HM = NHXKey(Msg):
 Let Len be |Msg| mod 4096, encoded as a 2-byte string
 Append to Msg the minimum number of 0 bits to make |Msg| divisible by
8
 Let Msg = Msg1 || Msg2 ||  || Msgt where each Msgi is 1024 words
except for Msgt, which has between 2 and 1024 words
 Let HM = NHK(Msg1) || NHK(Msg2) ||  || NHK(Msgt) || Len
 Computing the authentication tag:
 The tag is Tag = HMAC-SHA1A(HM || Nonce)
20
UMAC(cont.)
 Definition of NH
 blocksize n  2, wordsize w  1
 domain : A = {0, 1}2w {0, 1}2w   {0, 1}nw
 range : B = {0, 1}2w
 a random function in NH[n,w] is given by a random nw-bit string K
 Uw : {0,  , 2w-1}, U2w : {0,  , 22w -1}
 for integers x, y let (x +w y) denote (x + y) mod 2w
 M  A and M = M1  Ml , |M1| =  = |Ml| = w
 K = {0, 1}nw and K = K1  Kn , |K1| =  = |Kn| = w
21
UMAC(cont.)
 where mi Uw is the number that Mi represents (as an unsigned integer)
 where ki Uw is the number that Ki represents (as an unsigned integer)
 the right-hand side of the above equation is understood to name the
(unique) 2w-bit string which represents (as an unsigned integer) the
U2w-valued integer result
22
Compression Function of MD4
23
Compression Function of MD5
24
Compression Function of RIPEMD-160
25
Compression Function of SHA-1
26
Conclusions