Lecture 0: Objectives, Scope, and Organization of the Course

Kris Gaj
Research and teaching interests:
• cryptography
• computer arithmetic
• VLSI design and testing
Contact:
Engineering Bldg., room 3225
[email protected]
(703) 993-1575
Office hours: Monday, 7:30-8:30 PM
Tuesday, 6:00-7:00 PM,
and by appointment
ECE 645
Part of:
MS in CpE
Digital Systems Design – pre-approved course
Other concentration areas – elective course
MS in EE
Certificate in VLSI Design/Manufacturing
PhD in ECE
PhD in IT
DIGITAL SYSTEMS DESIGN
1. ECE 545 Digital System Design with VHDL
– K. Gaj, project, FPGA design with VHDL, Aldec/Synplicity/Xilinx/Altera
2. ECE 645 Computer Arithmetic
– K. Gaj, project, FPGA design with VHDL or Verilog,
Aldec/Synplicity/Xilinx/Altera
3. ECE 586 Digital Integrated Circuits
– D. Ioannou
4. ECE 681 VLSI Design for ASICs
– N. Klimavicz, project/lab, front-end and back-end ASIC design with
Synopsys tools
5. ECE 682 VLSI Test Concepts
– T. Storey, homework
Prerequisites
ECE 545 Digital System Design with VHDL
or
Permission of the instructor,
granted assuming that you know
• VHDL or Verilog
• a high-level programming language (preferably C)
Prerequisite knowledge
• This class assumes proficiency with the FPGA
CAD tools from ECE 545
• You are expected to be proficient with:
– Synthesizable VHDL coding
– Advanced VHDL testbenches, including file
input/output
– Xilinx FPGA synthesis and post-synthesis simulation
– Xilinx FPGA place-and-route and post-place and route
simulation
– Reading and interpreting all synthesis and
implementation reports
Course web page
ECE web page → Courses → Course web pages → ECE 645
http://ece.gmu.edu/coursewebpages/ECE/ECE645/S10/
Computer Arithmetic
Lecture
• Homework: 10%
• Midterm exam (in class): 15%
• Final exam (in class): 25%
Project
• Project 1: 20%
• Project 2: 30%
Advanced digital circuit design course covering
Efficient
• addition and subtraction
• multiplication
• division and modular reduction
• exponentiation
Integers
• unsigned and signed
Real numbers
• fixed point
• single and double precision floating point
Elements of the Galois field GF(2^n)
• polynomial base
Course Objectives
At the end of this course you should be able to:
• Understand mathematical and gate-level algorithms for computer
addition, subtraction, multiplication, division, and exponentiation
• Understand tradeoffs involved with different arithmetic
architectures between performance, area, latency, scalability, etc.
• Synthesize and implement computer arithmetic blocks on FPGAs
• Be comfortable with different number systems, and have familiarity
with floating-point and Galois field arithmetic for future study
• Understand sources of error in computer arithmetic and basics
of error analysis
This knowledge will come about through homework, projects
and practice exams.
Lecture topics (1)
INTRODUCTION
1. Applications of computer arithmetic algorithms
2. Number representation
• Unsigned Integers
• Signed Integers
• Fixed-point real numbers
• Floating-point real numbers
• Elements of the Galois Field GF(2^n)
ADDITION AND SUBTRACTION
1. Basic addition, subtraction, and counting
2. Carry-lookahead, carry-select, and hybrid adders
3. Adders based on Parallel Prefix Networks
MULTIOPERAND ADDITION
1. Carry-save adders
2. Wallace and Dadda Trees
3. Adding multiple unsigned and signed numbers
TECHNOLOGY
1. Internal Structure of Xilinx and Altera FPGAs
2. ASIC standard cell libraries and
synthesis tools for ASICs
3. Two-operand and multi-operand addition
in FPGAs
MULTIPLICATION
1. Tree and array multipliers
2. Sequential multipliers
3. Multiplication of signed numbers and squaring
TECHNOLOGY
1. Pipelining
2. Multi-cycle paths
3. Multiplication in Xilinx and Altera FPGAs
- using distributed logic
- using embedded multipliers
- using DSP blocks
LONG INTEGER ARITHMETIC
1. Modular Exponentiation
2. Montgomery Multipliers and Exponentiation Units
DIVISION
1. Basic restoring and non-restoring
sequential dividers
2. SRT and high-radix dividers
3. Array dividers
FLOATING POINT AND GALOIS FIELD ARITHMETIC
1. Floating-point units
2. Galois Field GF(2^n) units
Literature (1)
Required textbook:
Behrooz Parhami,
Computer Arithmetic: Algorithms and Hardware Design,
2nd edition, Oxford University Press, 2010.
Literature (2)
Recommended textbooks:
Jean-Pierre Deschamps, Gery Jean Antoine Bioul,
Gustavo D. Sutter,
Synthesis of Arithmetic Circuits: FPGA, ASIC and
Embedded Systems,
Wiley-Interscience, 2006.
Milos D. Ercegovac and Tomas Lang
Digital Arithmetic, Morgan Kaufmann Publishers, 2004.
Israel Koren, Computer Arithmetic Algorithms, 2nd edition,
A. K. Peters, Natick, MA, 2002.
Literature (2)
VHDL books:
1. Pong P. Chu, RTL Hardware Design Using VHDL:
Coding for Efficiency, Portability, and Scalability,
Wiley-IEEE Press, 2006.
2. Volnei A. Pedroni, Circuit Design with VHDL,
The MIT Press, 2004.
3. Sundar Rajan, Essential VHDL: RTL Synthesis Done Right,
S & G Publishing, 1998.
Literature (3)
Supplementary books:
1. E. E. Swartzlander, Jr., Computer Arithmetic,
vols. I and II, IEEE Computer Society Press, 1990.
2. Alfred J. Menezes, Paul C. van Oorschot,
and Scott A. Vanstone,
Handbook of Applied Cryptography,
Chapter 14, Efficient Implementation,
CRC Press, Inc., 1998.
Literature (3)
Proceedings of conferences
ARITH - International Symposium on Computer Arithmetic
ASIL - Asilomar Conference on Signals, Systems, and Computers
ICCD - International Conference on Computer Design
CHES - Workshop on Cryptographic Hardware and
Embedded Systems
Journals and periodicals
IEEE Transactions on Computers,
in particular special issues on computer arithmetic:
8/70, 6/73, 7/77, 4/83, 8/90, 8/92, 8/94, 7/00, 3/05.
IEEE Transactions on Circuits and Systems
IEEE Transactions on Very Large Scale Integration
IEE Proceedings: Computer and Digital Techniques
Journal of VLSI Signal Processing
Homework
• reading assignments
• design of small hardware units using VHDL
• analysis of computer arithmetic algorithms
and implementations
Midterm exams
Midterm Exam - 2 hrs 30 minutes, in class
multiple choice + short problems
Final Exam – 2 hrs 45 minutes
comprehensive
conceptual questions,
analysis and design of arithmetic units
Practice exams on the web
Tentative days of exams:
Midterm Exam - Monday, March 23
Final Exam - Tuesday, May 11, 7:30-10:15 PM
Project (1)
Project I (individual, 20% of grade)
Comprehensive analysis of basic operations of
SHA-3 candidates
Optimization criteria:
• minimum latency
• minimum area
• minimum product latency · area
• use of embedded FPGA resources
(BRAMs, embedded multipliers, DSP units)
Different for all students
Done individually
Final report due
Tuesday, March 16
Limitations of the Current Approach
• Time and effort
• Accuracy of comparison
One designer = too much time needed to implement all
candidates
Multiple designers = significant inaccuracies
associated with different skills and coding styles
Problem
How to predict ranking and relative performance
of candidate algorithms without the actual
time-consuming hardware implementation
at the Register Transfer Level (RTL)?
Applications:
• Ranking of candidate algorithms submitted to the
contests (large number of candidates, time limit)
• Ranking of candidate algorithms during the design
process by designers themselves
(no experience in hardware design, short response time
needed)
Features of our Problem to Exploit
• No need to obtain the functioning netlist or HDL
description (performance numbers sufficient)
• Limited accuracy required (differences in performance
of less than 20% considered insignificant)
• Limited number of basic operations
• Limited number of architectures used in practice
The proposed approach
Steps of Our Methodology (1)
1. Determine the minimum set of basic operations
required to implement a given class of cryptographic
transformations
2. Determine the required range of parameters of these
operations (e.g., operand sizes in arithmetic operations)
3. Implement basic operations in RTL VHDL (or
Verilog) in a parametric fashion (using constants and
generics)
4. Characterize all operations, for all required parameter
values, using Xilinx and/or Altera development
environments
- Area and latency
- Low-cost FPGAs and high-performance FPGAs
Major operations of AES finalists
[Figure: major operations grouped by AES finalist (Serpent, Twofish, Rijndael, RC6, Mars): S-boxes, multiplication in GF(2^m), variable rotation, integer multiplication]
Auxiliary operations of AES finalists
[Figure: auxiliary operations grouped by AES finalist (Serpent, Twofish, Rijndael, RC6, Mars): Boolean, fixed rotation, addition/subtraction, permutation]
Major cipher operations (1) - S-box
S-box n x m: n-bit input, m-bit output
Software
C:   WORD S[1<<n] = { 0x23, 0x34, 0x56, .............. }
ASM: S DW 23H, 34H, 56H, …..
Hardware
ROM: n-bit address, m-bit output, 2^n words (2^n x m bits)
direct logic: inputs x1, x2, ..., xn; outputs y1, y2, ..., ym
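The software view of an S-box as a table of 2^n words can be sketched in C as follows. This is a minimal illustration only: the table is an arbitrary 4x4 permutation, not the S-box of any particular cipher, and the names SBOX4, sbox4, and sbox4_apply32 are hypothetical.

```c
#include <stdint.h>

/* Illustrative 4x4 S-box: 2^4 = 16 entries of 4 bits each.
   The values are an arbitrary permutation of 0..15, not taken from any cipher. */
static const uint8_t SBOX4[1 << 4] = {
    0x6, 0xB, 0x0, 0x4, 0xD, 0x1, 0xF, 0x8,
    0x3, 0xA, 0xE, 0x7, 0x2, 0xC, 0x9, 0x5
};

/* Look up one 4-bit input; in hardware this corresponds to a
   16 x 4-bit ROM or to direct logic with 4 inputs and 4 outputs. */
uint8_t sbox4(uint8_t x)
{
    return SBOX4[x & 0xF];
}

/* Apply the same S-box to each of the eight nibbles of a 32-bit word. */
uint32_t sbox4_apply32(uint32_t w)
{
    uint32_t out = 0;
    for (int i = 0; i < 8; i++) {
        uint32_t nibble = (w >> (4 * i)) & 0xF;
        out |= (uint32_t)sbox4((uint8_t)nibble) << (4 * i);
    }
    return out;
}
```

In software the eight lookups run one after another, whereas in hardware eight copies of the table can operate in parallel at the cost of area.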
Major cipher operations (2) – Variable Rotation
Variable rotation A <<< B (ROL32): 32-bit A rotated left by B
Software
C:   C = (A << B) | (A >> (32-B));
ASM: ROL A, B
Hardware
Mux-based rotation: multiplexer stages controlled by B[4], B[3], B[2], B[1], B[0]
(the first stage selects between A<<<0 and A<<<16)
High-speed clock: A rotated using a fast clock CLK’, min (B, 32-B) CLK’ cycles
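A hedged C sketch of the two views of variable rotation follows. The function names rol32 and rol32_mux are hypothetical. The first function masks the shift counts so that B = 0 is well defined (the one-line expression on the slide shifts by 32 in that case, which C leaves undefined); the second is a software model of the mux-based rotator, assuming one 2-to-1 multiplexer stage per bit of B.

```c
#include <stdint.h>

/* Rotate left by b, with the counts reduced modulo 32 so that b = 0 is safe. */
uint32_t rol32(uint32_t a, unsigned b)
{
    b &= 31;
    return (a << b) | (a >> ((32 - b) & 31));
}

/* Software model of a mux-based rotator: five stages, where stage i
   rotates by 2^i if bit B[i] is set and passes the value through otherwise. */
uint32_t rol32_mux(uint32_t a, unsigned b)
{
    for (unsigned i = 0; i < 5; i++) {
        unsigned amount = 1u << i;                 /* 1, 2, 4, 8, 16 */
        uint32_t rotated = (a << amount) | (a >> (32 - amount));
        a = ((b >> i) & 1u) ? rotated : a;         /* 2-to-1 multiplexer */
    }
    return a;
}
```

The model applies the stages starting from B[0]; because fixed rotations commute, the result does not depend on the order of the stages.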
Auxiliary cipher operations (1) - Permutation
Permutation P: n-bit input, n-bit output
Software
C:   complex sequence of instructions (<<, |, &)
ASM: complex sequence of instructions (ROL, OR, AND)
Hardware
order of wires: inputs x1, x2, x3, ..., xn-1, xn connected to outputs y1, y2, y3, ..., yn-1, yn
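To illustrate why a permutation is costly in software but nearly free in hardware, here is a minimal C sketch of a generic 32-bit bit permutation. The names permute32 and make_bit_reversal, and the choice of bit reversal as the example, are assumptions made for illustration.

```c
#include <stdint.h>

/* Bit permutation of a 32-bit word: output bit i is input bit perm[i].
   In hardware this is only a reordering of wires; in software each bit
   has to be extracted, moved, and merged with shifts, ANDs, and ORs. */
uint32_t permute32(uint32_t x, const uint8_t perm[32])
{
    uint32_t y = 0;
    for (int i = 0; i < 32; i++) {
        uint32_t bit = (x >> perm[i]) & 1u;
        y |= bit << i;
    }
    return y;
}

/* Hypothetical example permutation: reverse the bit order. */
void make_bit_reversal(uint8_t perm[32])
{
    for (int i = 0; i < 32; i++)
        perm[i] = (uint8_t)(31 - i);
}
```

The loop performs 32 shift, mask, and OR steps for a single permutation, which is the "complex sequence of instructions" referred to above.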
Auxiliary cipher operations (4)
Addition/subtraction
C = A+B mod 2^n, n = 32, 16
Software
C:   unsigned long A, B, C;
     C = A+B;
ASM: ADD
Hardware
Adder/subtractor (ADD): n-bit inputs A and B, n-bit output C
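A short C sketch of addition modulo 2^n. It uses the fixed-width type uint32_t rather than unsigned long, because unsigned long is wider than 32 bits on some platforms; the function names are hypothetical.

```c
#include <stdint.h>

/* n = 32: unsigned arithmetic in C wraps around, so it is already mod 2^32. */
uint32_t add32(uint32_t a, uint32_t b) { return a + b; }
uint32_t sub32(uint32_t a, uint32_t b) { return a - b; }

/* General n (1 <= n <= 32): add, then keep only the n low-order bits. */
uint32_t add_mod_2n(uint32_t a, uint32_t b, unsigned n)
{
    uint32_t mask = (n >= 32) ? 0xFFFFFFFFu : ((1u << n) - 1u);
    return (a + b) & mask;
}
```

For n = 16 the mask discards the carry out of bit 15, which corresponds to ignoring the carry output of a 16-bit hardware adder.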
Multiple designs for hardware adders
[Figure: adder designs placed on a delay versus area plot]
Ripple carry adder (RC)
Carry-Skip adder (CS)
Carry-LookAhead adder (CLA)
Carry-Select adder
Parallel-Prefix Network adder
(Kogge-Stone, Brent-Kung)
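The difference between the two ends of this design space can be sketched with bit-level software models in C. These are illustrative models only, not synthesizable designs, and the function names are hypothetical: the ripple-carry model passes the carry through 32 positions in sequence, while the Kogge-Stone model computes all carries in log2(32) = 5 prefix levels.

```c
#include <stdint.h>

/* Ripple-carry model: the carry passes through all 32 bit positions in turn. */
uint32_t add_ripple(uint32_t a, uint32_t b)
{
    uint32_t sum = 0, carry = 0;
    for (int i = 0; i < 32; i++) {
        uint32_t ai = (a >> i) & 1u, bi = (b >> i) & 1u;
        sum |= (ai ^ bi ^ carry) << i;              /* full-adder sum   */
        carry = (ai & bi) | (carry & (ai ^ bi));    /* full-adder carry */
    }
    return sum;
}

/* Kogge-Stone model: five prefix levels combine (generate, propagate)
   pairs over spans of 1, 2, 4, 8, and 16 bits. */
uint32_t add_kogge_stone(uint32_t a, uint32_t b)
{
    uint32_t p0 = a ^ b;        /* bitwise propagate */
    uint32_t g = a & b;         /* bitwise generate  */
    uint32_t p = p0;
    for (unsigned d = 1; d < 32; d <<= 1) {
        g |= p & (g << d);      /* group generate  */
        p &= p << d;            /* group propagate */
    }
    return p0 ^ (g << 1);       /* sum bit i = p_i XOR carry into bit i */
}
```

Each loop iteration of the Kogge-Stone model corresponds to one level of the prefix network, so its logic depth grows with log2(n), at the cost of many more gates and wires than the ripple-carry chain.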
Basic operations
Delay and area in HARDWARE
[Figure: delay versus area of basic operations in hardware: modular multiplication, addition (RC), GF(2^n) multiplication, Boolean, permutation, fixed rotation, modular inverse, variable rotation, addition (CLA), S-box 4x4, S-box 8x8, S-box 9x32]
Basic operations
Delay and area in SOFTWARE
[Figure: delay versus memory of basic operations in software: modular inverse, permutation, GF(2^n) multiplication, variable rotation, fixed rotation, multiplication, addition, Boolean, S-box 4x4, S-box 8x8, S-box 9x32]
Steps of Our Methodology (2)
5. Develop a simple and human-friendly notation to describe
cryptographic algorithms (or their repetitive parts [rounds]),
which reveals the parallelism present in the algorithm
– Graphical representation more human friendly
– Textual representation easier to process by computer
programs
Possible Approach:
• start from a textual description
• adopt one of the existing graphical editors
Steps of Our Methodology (2)
6. Develop a tool capable of estimating algorithm performance in
terms of area and throughput using
– High-level description
– Library of basic components
– Choice of architecture
– Optimization criteria (minimum area, maximum
throughput, maximum throughput to area ratio, etc.)
– Other constraints, such as required clock frequency, etc.
7. Calibrate the developed tools using existing RTL designs
for a limited subset of the algorithms
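The core of such an estimator can be sketched in C under strong simplifying assumptions: operations are chained one after another, areas add up, delays add up along a single combinational path, and routing delays and cross-operation optimizations are ignored. The types Component and Estimate and the function estimate_chain are hypothetical, and the units are arbitrary; real library entries would come from the characterization in step 4.

```c
#include <stddef.h>

/* One entry of the component library: characterized area and delay.
   Units are arbitrary here; real values would come from synthesis reports. */
typedef struct {
    const char *name;
    double area;
    double delay;
} Component;

typedef struct {
    double area;
    double delay;
    double area_delay;   /* area x delay product */
} Estimate;

/* Estimate for operations chained one after another within a round:
   areas add up, and delays add up along the single combinational path. */
Estimate estimate_chain(const Component *ops, size_t count)
{
    Estimate e = { 0.0, 0.0, 0.0 };
    for (size_t i = 0; i < count; i++) {
        e.area += ops[i].area;
        e.delay += ops[i].delay;
    }
    e.area_delay = e.area * e.delay;
    return e;
}
```

A fuller tool would also need to take the maximum delay over parallel branches rather than the sum, and to apply the calibration from step 7 to correct for effects this simple model leaves out.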
Possible Problems
• Routing (interconnect) delays
• Optimizations on the boundary between two
operations
• Combining multiple operations into one (e.g., using
look-up table approach)
• Inter-round optimizations
• Resource sharing techniques, in particular resource
sharing between encryption and decryption circuits
• Dependence of results on selected FPGA devices
• Others…
Summary
Main project goals:
• Provide cryptographic community and in particular
standardization organizations/groups with a reliable
and fast way of comparing large number of
candidates for a cryptographic standard
• Save designers of cryptographic algorithms from
design blunders (such as that of IBM team in case of
MARS)
• Project in progress…
• Feedback and collaboration is very welcome
MARS – IBM team
Delay and area in SOFTWARE
[Figure: delay versus memory of basic operations in software, annotated for MARS: modular inverse, permutation, GF(2^n) multiplication, variable rotation, fixed rotation, multiplication, addition, Boolean, S-box 4x4, S-box 8x8, S-box 9x32]
MARS – IBM team
Delay and area in HARDWARE
[Figure: delay versus area of basic operations in hardware, annotated for MARS: modular multiplication, addition (RC), GF(2^n) multiplication, Boolean, permutation, fixed rotation, modular inverse, variable rotation, addition (CLA), S-box 4x4, S-box 8x8, S-box 9x32]
Project (2)
Project II (30% of grade)
New Design in the area of Public Key Cryptography,
Cryptanalysis,
Digital Signal Processing, etc.
• Real life application
• Requirements derived from the analysis of an application
• Software implementation (typically public domain)
used as a source of test vectors and to determine
HW/SW speed ratio
• Several project topics proposed on the web
You can also suggest a project topic yourself
Project II (rules)
• Can be done in a group of 1-3 students
• Every team works on a slightly different problem
• Project topics should be more complex for larger teams
• Cooperation (but not exchange of code)
between teams is encouraged
Oral presentation and written report: Tuesday, May 4
Degrees of freedom and possible trade-offs
• speed, area: ECE 645
• power: ECE 586, 681
• testability: ECE 682
Degrees of freedom and possible trade-offs
• speed: latency, throughput
• area
Primary applications (1)
Execution units of general purpose microprocessors
• Integer units: integers (8, 16, 32, 64 bits)
• Floating point units: real numbers (32, 64 bits)
Primary applications (2)
Digital signal and digital image processing
e.g., digital filters
Discrete Fourier Transform
Discrete Hilbert Transform
General purpose
DSP processors
Specialized circuits
Real or complex numbers
(fixed-point or floating point)
Primary applications (3)
Coding
Error detection codes
Error correcting codes
Elements of the Galois fields GF(2^n) (4-64 bits)
Secret-key (Symmetric) Cryptosystems
Alice (Encryption, key of Alice and Bob - K_AB) → Network → Bob (Decryption, key of Alice and Bob - K_AB)
Hash Function
message m (arbitrary length) → hash function h → hash value h(m) (fixed length)
It is computationally infeasible to find m and m’, m ≠ m’, such that h(m)=h(m’)
Primary applications (4)
Cryptography
IDEA, RC6, Mars, Twofish, Rijndael, SHA-3 candidates
Integers (16, 32 bits)
Elements of the Galois field GF(2^n) (4, 8 bits)
Cipher     Main operations                  Auxiliary operations
RC6        2 x SQR32, 2 x ROL32             XOR, ADD/SUB32
MARS       MUL32, 2 x ROL32, S-box 9x32     XOR, ADD/SUB32
Twofish    96 S-box 4x4, 24 MUL GF(2^8)     XOR, ADD32
Rijndael   16 S-box 8x8, 24 MUL GF(2^8)     XOR
Serpent    8 x 32 S-box 4x4                 XOR
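As an illustration of the "MUL GF(2^8)" operation listed above, here is a minimal shift-and-add sketch in C for polynomial-basis multiplication in GF(2^8). The function name gf256_mul is hypothetical. The reduction polynomial is passed as a parameter; 0x11B (x^8 + x^4 + x^3 + x + 1) is the one used by Rijndael, and other ciphers use other polynomials.

```c
#include <stdint.h>

/* Multiply a and b as polynomials over GF(2), reducing modulo poly.
   poly must be a degree-8 polynomial, i.e. have bit 8 set (e.g. 0x11B). */
uint8_t gf256_mul(uint8_t a, uint8_t b, uint16_t poly)
{
    uint16_t acc = 0;       /* running product; addition in GF(2^n) is XOR */
    uint16_t aa = a;        /* a times successive powers of x, reduced     */
    for (int i = 0; i < 8; i++) {
        if (b & 1u)
            acc ^= aa;
        b >>= 1;
        aa <<= 1;           /* multiply by x */
        if (aa & 0x100u)
            aa ^= poly;     /* reduce when the degree reaches 8 */
    }
    return (uint8_t)acc;
}
```

With the Rijndael polynomial, gf256_mul(0x57, 0x83, 0x11B) gives 0xC1, the worked example from the AES specification. In hardware the same operation reduces to a fixed network of XOR gates when one operand is a constant.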
Public Key (Asymmetric) Cryptosystems
Alice (Encryption, public key of Bob - K_B) → Network → Bob (Decryption, private key of Bob - k_B)
RSA as a trap-door one-way function
PUBLIC KEY:  C = f(M) = M^e mod N
PRIVATE KEY: M = f^-1(C) = C^d mod N
N = P · Q
P, Q - large prime numbers
e · d ≡ 1 mod ((P-1)(Q-1))
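The two RSA operations are both modular exponentiations. A minimal square-and-multiply sketch in C follows; it is a toy model that assumes the modulus fits in 32 bits so that intermediate products fit in uint64_t, whereas practical RSA needs long-integer arithmetic. The function name mod_exp and the small textbook parameters in the note below are illustrative assumptions.

```c
#include <stdint.h>

/* Right-to-left binary (square-and-multiply) modular exponentiation.
   Assumes 0 < mod < 2^32 so that products of residues fit in 64 bits. */
uint64_t mod_exp(uint64_t base, uint64_t exp, uint64_t mod)
{
    uint64_t result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1u)
            result = (result * base) % mod;   /* multiply step */
        base = (base * base) % mod;           /* square step   */
        exp >>= 1;
    }
    return result;
}
```

With the toy parameters P = 61, Q = 53, N = 3233, e = 17, and d = 2753 (17 · 2753 = 46801 = 15 · 3120 + 1), encrypting M = 65 gives mod_exp(65, 17, 3233) = 2790, and mod_exp(2790, 2753, 3233) recovers 65.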
RSA keys
PUBLIC KEY: { e, N }
PRIVATE KEY: { d, P, Q }
N = P · Q
P, Q - large prime numbers
e · d ≡ 1 mod ((P-1)(Q-1))
Primary applications (5)
Cryptography
Public key cryptography
RSA, DSA, Diffie-Hellman: long integers (1000-16,000 bits)
Elliptic Curve Cryptosystems: elements of the Galois field GF(2^n) (150-500 bits)
Primary applications (5)
Cipher Breaking
Public key cryptography
RSA PUBLIC KEY: { e, N }
RSA PRIVATE KEY: { d, P, Q }
N = P · Q
P, Q
e · d ≡ 1 mod ((P-1)(Q-1))
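The link between factoring and breaking RSA can be sketched in C for a toy modulus: once N is factored into P and Q, the private exponent d follows from e by a modular inversion. Trial division is used here purely for illustration and is hopeless for real key sizes; the function names are hypothetical, and the code assumes N is a product of two distinct primes small enough for 64-bit arithmetic.

```c
#include <stdint.h>

/* Trial division: returns the smallest prime factor of n (n >= 2). */
uint64_t smallest_factor(uint64_t n)
{
    for (uint64_t f = 2; f * f <= n; f++)
        if (n % f == 0)
            return f;
    return n;
}

/* Modular inverse of e modulo phi by the extended Euclidean algorithm;
   returns 0 if gcd(e, phi) != 1. */
uint64_t mod_inverse(uint64_t e, uint64_t phi)
{
    int64_t r0 = (int64_t)phi, r1 = (int64_t)(e % phi);
    int64_t t0 = 0, t1 = 1;
    while (r1 != 0) {
        int64_t q = r0 / r1;
        int64_t r2 = r0 - q * r1;  r0 = r1;  r1 = r2;
        int64_t t2 = t0 - q * t1;  t0 = t1;  t1 = t2;
    }
    if (r0 != 1)
        return 0;
    return (uint64_t)(t0 < 0 ? t0 + (int64_t)phi : t0);
}

/* Recover d from the public key { e, N } by factoring N = P * Q. */
uint64_t recover_private_exponent(uint64_t e, uint64_t n)
{
    uint64_t p = smallest_factor(n);
    uint64_t q = n / p;
    return mod_inverse(e, (p - 1) * (q - 1));
}
```

For the toy public key { e = 17, N = 3233 }, trial division finds P = 53 and Q = 61, and the inversion yields d = 2753.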