Coding Theory and Cryptography: Putting Mathematics to Work!

Download Report

Transcript Coding Theory and Cryptography: Putting Mathematics to Work!

Secret Codes, CD players, and Missions to Mars:
An Introduction to Cryptology and Coding Theory
Sarah Spence Adams
Associate Professor of Mathematics
Franklin W. Olin College of Engineering
AKA
• Sit back, relax, and learn something new while
your combinatorics knowledge marinates for
the test
Cryptology

Cryptography
 Inventing
cipher systems
Alice
Bob
Eve

Cryptanalysis
 Breaking
cipher systems
Hidden Messages Through the Ages

440 BC – The Histories of Herodotus
 messages
concealed beneath wax on wooden tablets
 tattoos on a slave's head concealed by regrown hair

WWII – microdots

Modern day – hidden information within digitial
pictures
The Scytale of Ancient Greece
Used by the
Spartan military
5th Century B.C.
Caesar’s Substitution Cipher
1st Century B.C.
Example
 Plaintext:
OLINCOLLEGE
= 3
Shift forward by 3
 Ciphertext: ROLQFROOHJH
 Decryption: Shift backwards by 3 (or forwards by 23)
 Encryption:
Modular Arithmetic



15 mod 12 = 3
27 mod 26 = 1
90 mod 10 = 0

a mod m is the remainder obtained upon
dividing a by m

Can be obtained by subtracting off multiples of
m from a until the result is between 0 and m-1
Caesar Using Modular Arithmetic

Map A,B,C,…Z to 0,1,2,…,25

View cipher as function e(pi) = pi + k (mod 26)

Example with key = k = 15

Plaintext: OLINCOLLEGE=14,11,8,13,2,14,11,11,4,6,4

Encryption: e(pi)

Ciphertext:
3,0,23,2,17,3,0,0,19,21,19 = DAXCRDAATVT

Decryption:
= pi + 15 (mod 26)
d(ci) = ci -15 (mod 26) or
d(ci) = ci +11 (mod 26)
p
Kerckhoff’s Principle

Must assume the
enemy knows the
system

Also known as
Shannon’s Maxim
Cryptanalysis of Substitution Ciphers

Caesar’s 26 shifts to test

Generally 26! permutations to test

If parsed, guess at common words or
cribs

Frequency analysis
Frequency Analysis (Al-Kindi ~850A.D.)
www.wikipedia.com
The Letter E
Vigenère Cipher (1586)

Polyalphabetic cipher

Involves multiple Caesar shifts

Example
 Plaintext:
 Key:
 Encryption:
O L I N C O L L E G E
S UN S U N S U N SU
e(pi) = pi + ki (mod 26)
 Decryption: d(ci) = ci - ki (mod 26)
One-Time Pads:
The Ultimate Substitution Cipher

Plaintext:
MATHISUSEFULANDFUN

Key:
NGUJKAMOCTLNYBCIAZ

Encryption: e(pi) = pi + ki (mod 26)

Ciphertext: BGO…..

Decryption: d(ci) = ci - ki (mod 26)
One-Time Pads

Unconditionally secure

Problem: Exchanging the key
Public-Key Cryptography



Diffie & Hellman (1976)
Known at GCHQ years before
Uses one-way (asymmetric) functions, public
keys, and private keys
Public Key Algorithms

Based on two hard problems (traditionally)
 Factoring
large integers (RSA)
 The discrete logarithm problem (ElGamal)
WWII: The WeatherBeaten Enigma

3 x 10114 ciphering
possibilities; polyalphabetic

Destroyed frequency counts

Cracked thanks to traitors,
captured machines,
mathematicians, and human
error
Not only do you want secrecy…

…but you also want reliability!

Enter coding theory…..
Communication System
Digital Source
Digital Sink
Source
Encoding
Encryption
Source
Decoding
Decryption
Error Control
Encoding
Error Control
Decoding
Modulation
Channel
Demodulation
What is Coding Theory?

Coding theory is the study of error-control
codes, which are used to detect and
correct errors that occur when data are
transferred or stored
What IS Coding Theory?

A mix of mathematics, computer science,
electrical engineering, telecommunications
 Linear
algebra
 Abstract algebra (groups, rings, fields)
 Probability&Statistics
 Signals&Systems
 Implementation issues
 Optimization issues
 Performance issues
General Problem

We want to send data from one place to another
over some channel
 telephone
lines, internet cables, fiber-optic lines,
microwave radio channels, cell phone channels, etc.

or we want to write and later retrieve data…
 channels:
hard drives, disks, CD-ROMs, DVDs, solid
state memory, etc.

BUT the data may be corrupted
 hardware
malfunction, atmospheric disturbances,
attenuation, interference, jamming, etc.
General Solution

Introduce controlled redundancy to the
message to improve the chances of
recovering the original message

Trivial example: The telephone game
Introductory Example

We want to communicate YES or NO

Message 1 represents YES

Message 0 represents NO
If I send my message and there
is an error, then you will decode
incorrectly
Repetition Code of Length 2
Encode message 1 as codeword 11
 Encode message 0 as codeword 00


If one error occurs, you can detect that
something went wrong
Repetition Code of Length 5
Encode message 1 as codeword 11111
 Encode message 0 as codeword 00000


If up to two errors occur, you can correct
them using a majority vote
Evaluating and Comparing Codes

Important Code Parameters
 Minimum

distance
Determines error-control capability
 Code
rate
Ratio of information bits to codeword bits
 Measure of efficiency

 Length
of codewords
 Number of codewords
Complicated Problem

Want
 Large
minimum distance for reliability
 Large number of codewords
 High rate for efficiency

Conflicting goals
 Require
trade-offs
 Inspire more sophisticated mathematical
solutions
Inherent Trade-offs

What are the ideal trade-offs between rate,
error-correcting capability, and number of
codewords?

What is the biggest distance you can get given
a fixed rate or fixed number of codewords?

What is the best rate you can get given a fixed
distance or fixed number of codewords?
The ISBN-10 Code

x1 x2… x10

x10 is a check digit chosen so that
S = x1 + 2x2 + … + 9x9 + 10x10 = 0 mod 11

If check digit should be 10, use X instead

Can detect all single and all transposition
errors
ISBN-10 Example

Cryptology by Thomas Barr: 0-13-088976-?

Want 1(0) + 2(1) + 3(3) + 4(0) + 5(8) + 6(8) + 7(9)
+ 8(7) + 9(6) + 10(?) = multiple of 11

Compute 1(0) + 2(1) + 3(3) + 4(0) + 5(8) + 6(8) +
7(9) + 8(7) + 9(6) = 272

Ponder 272 + 10(?) = multiple of 11

The check digit must be 8
New ISBN-13

x1 x2… x13

x13 = 10-((1x1 + 3x2 + … + 1x11 + 3x12) mod 10)

If check digit should be 10, use 0 instead

Convert ISBN-10 to ISBN-13 using 978 prefix
and new check digit
Universal Product Code (UPC)
Universal Product Code (UPC)

First number: Type of product
 0,
6, 7: Standard
 2: Random-weight items (fruits, meats, etc)
 4: In-store
 5: Coupons



Next chunk: Manufacturer identification
Next chunk: Item identification
Last number: Check digit
UPC Check Digit

x1 x2… x12

x12 is a check digit chosen so that
S = 3x1 + 1x2 + … + 3x11 + 1x12 = 0 mod
10

Can detect all single and most
transposition errors

Which transposition errors go undetected?
Hadamard Code

Used in NASA Mariner Missions

Pictures divided into pixels

Pixels are assigned a level a darkness
on a scale of 0 to 63
Binary Representation of Messages

Express 64 levels of darkness (our
messages) using binary strings
 000000
 1  000001
 2  000010
 3  000011
 ….
 63  111111
0
Hadamard Matrices

Map length 6 messages to length 32
codewords obtained from rows of
Hadamard matrices

Hadamard matrices (Sylvester, 1867)
have special properties that give these
codewords a minimum distance of 16
HHT = n I
Compare with Length 5 Repetition Code

Send a message of length 6

Probability of bit error p= .01

Length 5 Repetition Code
 Sending
6 info bits requires 30 coded bits: Rate = 6/30
 P(decode incorrectly) = .00006

Hadamard Code
 Sending
6 info bits requires 32 coded bits: Rate = 6/32
 P(decode incorrectly) = .0000000008
Voyager Missions (1980’s-90’s)

Reed-Solomon codes use abstract algebra to get
even better results

Ideals of polynomial rings (Dedekind,1876)

Same codes used to protect CDs from scratches!
Summary

Cryptology and coding theory are all around us

From Caesar to RSA…. from Repetition to Reed-Solomon Codes….

More sophisticated mathematics  better ciphers/codes

New uses for old mathematics, motivation for new mathematics

Cryptology has existed for thousands of years… what ciphers and codes
will be next?
Jefferson Disk - Bazeries Cylinder

Used by US Army from early 1920’s to early 1940’s

Alice and Bob agree on order of disks – the key

Alice

rotates wheels to spell
message in one row
 sends any other row of text

Bob

spells out the ciphertext on wheel
 looks around the rows until he sees the (coherent)
message
The Code Talkers

Refers primarily to Navajo speakers in WWII

Only unbroken wartime cipher

Fast and efficient

Interesting connections between cracking secret
ciphers and cracking ancient languages
Major Questions

What are the ideal trade-offs between rate,
error-correcting capability, and number of
codewords?

What is the biggest distance you can get
given a fixed rate or fixed number of
codewords?

What is the best rate you can get given a
fixed distance or fixed number of
codewords?
A Parity Check Code

Suppose we want to send 4 messages 00, 01, 10, 11

Form codewords by appending a parity check bit to the end
of each message
 00  000
 01  011
 10  101
 11  110

Compare with length 2 repetition code



Both detect all single errors using minimum distance 2
Now rate 2/3 compared with rate 1/2
Now 4 codewords compared with 2 codewords