
Intel® Itanium™ Architecture
-- Satya P. Vedula
Intel – Itanium Architecture
Agenda
1. History
2. Introduction
3. Block Diagram
4. Pipeline
5. Register Set
6. Instruction Set
7. EPIC
8. x86 Compatibility
9. Database on Itanium
10. Security & Itanium
11. Itanium and Java
12. Itanium and Win64
History
Generation | Processor | Year(s) | Transistors | FPU | Cache
1 | 8086/8088 | 1978-81 | 29k | 8087 | -
2 | 80286 | 1984 | 134k | 80287 | -
3 | 80386 DX/SX | 1987-88 | 275k | 80387 | -
4 | 80486 SX/DX | 1990-92 | 1.2M | None (SX) / built-in (DX) | 8k L1
5 | Pentium | 1993-95 | 3.1M | built-in | 16k L1
5+ | Pentium MMX | 1997 | 4.5M | built-in | 32k L1
History contd..
Generation | Processor | Year(s) | Transistors | Cache
6 | Pentium Pro, Pentium II | 1995, 1997 | 5.5M-7.5M | 16k L1, 512k L2
6+ | Mobile Pentium, Pentium III | 1997, 1999 | 9.3M, 27.4M | 32k L1
7 | Pentium 4 | 2001 | 42M | -
8 | Itanium | 2001 | 25M | 32k L1, 96k L2, 4M L3
Introduction - Itanium
The Intel® Itanium™ processor is the first in a family of processors
based on the new Itanium architecture.
Product Highlights

- Explicitly Parallel Instruction Computing (EPIC) technology enables up to 20 operations per clock.
- Three levels of cache reduce memory latency: 2MB or 4MB Level 3 cache, 96K Level 2 cache, and 32K Level 1 cache.
- Operating frequencies of 733MHz and 800MHz.
- 266MHz data bus enables fast system bus transactions with 2.1 GB/sec bandwidth.
- Advanced error detection, correction and containment provided by Machine Check Architecture (MCA), comprehensive error logging, and Error Correcting Code (ECC) on caches and the system bus.
- IA-32 instruction binary compatibility in hardware.
- 6.4 GFLOPS peak performance.
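The 2.1 GB/sec and 6.4 GFLOPS figures above follow from simple products. The sketch below assumes a 64-bit (8-byte) data bus running at 266 million transfers per second, and 8 single-precision floating-point operations per clock (2 FMAC units, each doing a 2-wide SIMD multiply-add). These per-clock figures are assumptions used for illustration; the function names are hypothetical.

```python
def bus_bandwidth_gb_per_s(transfers_per_s: float, bytes_per_transfer: int) -> float:
    """Peak bus bandwidth in GB/s, using 1 GB = 10**9 bytes."""
    return transfers_per_s * bytes_per_transfer / 1e9


def peak_gflops(clock_hz: float, flops_per_clock: int) -> float:
    """Peak floating-point rate in GFLOPS."""
    return clock_hz * flops_per_clock / 1e9


# Assumption: 266 million transfers/s on a 64-bit (8-byte) data bus.
bandwidth = bus_bandwidth_gb_per_s(266e6, 8)

# Assumption: 2 FMAC units x 2 single-precision SIMD lanes x 2 operations
# (multiply + add) = 8 floating-point operations per clock.
flops_per_clock = 2 * 2 * 2
peak = peak_gflops(800e6, flops_per_clock)

print(f"bus bandwidth: {bandwidth:.2f} GB/s")  # bus bandwidth: 2.13 GB/s
print(f"peak rate: {peak:.1f} GFLOPS")         # peak rate: 6.4 GFLOPS
```

Under these assumptions the bus works out to about 2.13 GB/s, which the slide rounds to 2.1 GB/sec, and 800MHz x 8 operations per clock gives the 6.4 GFLOPS peak.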
Block Diagram
[Figures: simple block diagram; complex block diagram]
Pipeline
Comparison with others:
- Itanium: 10 stages
- Pentium III: 12 stages
- Alpha 21264: 8 stages
- Pentium 4: 20 stages
- Athlon: 10 stages
Itanium uses a 10-stage in-order pipeline.
Register Set
Each task can have its own set of registers:
- 128 general-purpose integer registers (64 bits wide each)
- 128 floating-point registers (82 bits wide each)
- 64 one-bit predicate registers
- 8 branch registers
Instruction Set
- Instructions are 41 bits long.
- It takes 7 bits to specify one of the 128 general-purpose registers, so two source-operand fields and a destination field take 21 bits.
- Predication takes 6 bits (to select one of the 64 predicate registers).
- Instructions are issued in 128-bit bundles: three 41-bit instructions (123 bits) plus one 5-bit template.
- There are 4 instruction categories: integer, load/store, floating-point, and branch operations.
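The bundle arithmetic above (3 x 41 + 5 = 128 bits, and 6 + 3 x 7 = 27 bits for predicate and register fields) can be checked with a small bit-packing sketch. It assumes the template sits in the low 5 bits followed by slots 0, 1 and 2, and that the predicate and the three register fields occupy the low 27 bits of an instruction, as in the common IA-64 ALU format as we understand it; treat these positions as assumptions. `encode_operands` is a hypothetical helper that treats the upper 14 bits (opcode and other fields) as opaque.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template: int, slots: list[int]) -> int:
    """Pack a 5-bit template and three 41-bit instruction slots into a 128-bit integer.

    Layout assumed here: template in bits 0-4, slot 0 in bits 5-45,
    slot 1 in bits 46-86, slot 2 in bits 87-127.
    """
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three instructions")
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("the template must fit in 5 bits")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each instruction must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle: int) -> tuple[int, list[int]]:
    """Inverse of pack_bundle: return (template, [slot0, slot1, slot2])."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK for i in range(3)]
    return template, slots


def encode_operands(qp: int, r1: int, r2: int, r3: int, upper: int = 0) -> int:
    """Hypothetical helper: 6-bit qualifying predicate plus three 7-bit register
    fields (21 bits) in the low 27 bits; the upper 14 bits are opaque."""
    if not 0 <= qp < 64:
        raise ValueError("predicate number must be 0-63")
    if not all(0 <= r < 128 for r in (r1, r2, r3)):
        raise ValueError("register numbers must be 0-127")
    if not 0 <= upper < (1 << 14):
        raise ValueError("upper field must fit in 14 bits")
    return qp | (r1 << 6) | (r2 << 13) | (r3 << 20) | (upper << 27)


insn = encode_operands(qp=1, r1=8, r2=9, r3=10)
bundle = pack_bundle(0x10, [insn, 0, 0])  # arbitrary template value for illustration
assert unpack_bundle(bundle) == (0x10, [insn, 0, 0])
assert 3 * SLOT_BITS + TEMPLATE_BITS == 128
```

Because the three slots plus the template exactly fill 128 bits, a bundle with every field set to all ones is simply 2^128 - 1; there are no spare bits.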
EPIC
EPIC: Explicitly Parallel Instruction Computing
It combines features from RISC and VLIW
Advantages
-Conditional (predicated) execution
-Hinted and speculative loads
(ld.a: advanced load, which uses a special buffer, the ALAT or Advanced Load Address Table)
-64 free-form predicate bits
(earlier chips have Z (zero), V (overflow), S (sign), and N (negative) flags)
-One conditional branch with 64 predicate bits
-VLIW features
-Groups of independent instructions
-Simple hardware
-Exploit Instruction Level Parallelism (ILP) with Compiler
Disadvantages
-Large increase in code size
-Blocking caches
EPIC – Power to Compilers
C source code:
if (x == 4) z = 9;
else z = 0;
Compiled on Pentium (32-bit compiled code):
1. Compare x to 4
2. If not equal, go to line 5
3. z = 9
4. Go to line 6
5. z = 0
6. // Program continues from here
Compiled on Itanium (64-bit compiled code):
1. Compare x to 4 and store the result in a predicate bit (we'll call it A)
2. If A == 1: z = 9
3. If A == 0: z = 0
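The two compilations of the `if`/`else` above can be modeled in a few lines. This is a behavioral sketch in Python, not Itanium code: `compiled_with_predicates` mimics one compare that writes a predicate and its complement, followed by two predicated assignments that are both issued, with only the one whose predicate is true allowed to update `z`. The function names are hypothetical.

```python
def compiled_with_branches(x: int) -> int:
    """Pentium-style: compare, then jump around one of the two assignments."""
    if x != 4:          # "If not equal, go to line 5"
        z = 0
    else:
        z = 9           # then "Go to line 6"
    return z


def compiled_with_predicates(x: int) -> int:
    """Itanium-style sketch: one compare writes predicate A and its complement;
    both predicated assignments are issued, but only the one whose predicate
    is true is allowed to write z."""
    p_a = x == 4        # predicate A
    p_not_a = not p_a   # complementary predicate
    z = None
    # Each "instruction" is (qualifying predicate, value to write).
    for qualifying_predicate, value in ((p_a, 9), (p_not_a, 0)):
        if qualifying_predicate:   # models the predicate gating the write
            z = value
    return z


for x in (3, 4, 5):
    assert compiled_with_branches(x) == compiled_with_predicates(x)
```

The predicated version has no jump to mispredict: both assignments flow through the pipeline, and the predicate simply decides which result is kept.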
EPIC Features
Data Speculation
A sequence of instructions that consists of an advanced load, zero or more instructions dependent on the value of that load, and a check instruction.
Code Speculation
A compiler concept: an instruction or a sequence of instructions is executed before it is known that the dynamic control flow of the program will actually reach the point in the program where the sequence of instructions is needed.
Prediction
Branch prediction hints can now be supplied by the programmer (compiler), in addition to dynamic runtime branch prediction.
Preprocessing
1) Register use, 2) loop optimization, 3) instruction execution order, and 4) logical program layout
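The data-speculation sequence described above (advanced load, dependent instructions, check) can be sketched with a toy model of the ALAT. `ToyALAT`, `speculative_sequence` and the register names are hypothetical; real hardware uses `ld.a` followed by a `chk.a` or `ld.c` check, and the ALAT is a small hardware table rather than a dictionary.

```python
class ToyALAT:
    """Toy model of the Advanced Load Address Table: register -> load address."""

    def __init__(self) -> None:
        self.entries: dict[str, int] = {}

    def advanced_load(self, reg: str, addr: int) -> None:
        self.entries[reg] = addr            # ld.a records the address

    def store(self, addr: int) -> None:
        # A store to a recorded address invalidates the matching entries.
        self.entries = {r: a for r, a in self.entries.items() if a != addr}

    def check(self, reg: str) -> bool:
        return reg in self.entries          # entry survived: speculation was valid


def speculative_sequence(memory: list[int], load_addr: int,
                         store_addr: int, store_value: int) -> int:
    """Load hoisted above a store, one dependent add, then a check with recovery."""
    alat = ToyALAT()
    r1 = memory[load_addr]                  # advanced load, moved above the store
    alat.advanced_load("r1", load_addr)
    r2 = r1 + 1                             # dependent instruction, executed early

    memory[store_addr] = store_value        # the store the load was moved above
    alat.store(store_addr)

    if not alat.check("r1"):                # check failed: the store aliased the load
        r1 = memory[load_addr]              # recovery: reload and redo dependents
        r2 = r1 + 1
    return r2
```

When the store hits a different address, the early result is kept; when it hits the same address, the entry is gone and the check triggers the reload, so the answer is always the same as in the unspeculated order.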
EPIC Features contd..
Compiler advantages
-Complexity shifts to compilers
-Methods to express compile time information
-Optimized FPUs for multimedia applications
-Reliability and performance – server side
x86 compatibility
- Supports all x86 instructions, including MMX and SSE (not SSE2), as well as
Protected, Virtual 8086, and Real mode features
- Run an entire OS in x86 mode, or run x86 applications under
a new IA-64 OS.
- x86-compatible registers: AR24 through AR31
- JMPE: instruction that switches from x86 mode to the new IA-64 mode
[Figure: x86 register compatibility]
What does it look like?
Transistors: 325 million
Processor chip: 25 million (including L1 and L2 caches)
Each of the four L3 cache chips: 75 million
Pentium III: 24 million
Pentium 4: 42 million
Itanium code size: about 2x Pentium (estimated), 30% more than other RISC processors
Itanium - anatomy
Other 64-bit processors
[Photographs: IBM Power4 module; MIPS 20K processor; Alpha 21264 Slot B module; UltraSPARC-III chips]
Overview of the processors
It’s just beginning
Itanium code names: Deerfield, Madison, McKinley, Merced
Databases
A quantum leap
Databases – Storage needs Contd..
[Figure: The Coming Content "Big Bang". A timeline of information milestones (cave paintings and bone tools, 40,000 BCE; writing, 3500; paper, 105 C.E.; printing, 1450; electricity and telephone, 1870; transistor, 1947; computing, 1950; Internet (DARPA), late 1960s; the web, 1993), followed by data volume in gigabytes: 3B (2000), 6B (2001), 12B (2002), 24B (2003).]
Source: IBM Informix Conference, 2001 Las Vegas
Databases – Storage – Requirements
Data Explosion!
• We are in the midst of a data explosion
– “The Big Bang”!
• Terabytes of data
– Common corporate expression
– Petabytes (10^15) and Exabytes (10^18) are fast approaching
• 2-3 Exabytes = total volume of all information
generated worldwide annually
• Storage capacities are growing
– 72 GB Hard Drive (HD) becoming industry standard
– 180 GB High Density HD – in production
Source: IBM Informix Conference,
2001 Las Vegas
Databases contd..
The Need for Speed
• Memory access speeds desired – long term
– Memory latency averaging 235-360 nanoseconds
– Max = 256 GB of RAM
– 64-bit => about 18 Exabytes (2^64 bytes) of addressing capability
• Disk access speeds are the reality – near term
– Disk latency averaging 3-4 milliseconds
– 4 “orders of magnitude slower”
• DW tables contain billions of rows
• Light table scan – 100-byte rows @ 1 GB/s
– ~ 9 million rows/sec
– ~ 540 million rows/minute
– 5.4 billion rows (500GB) ~ 10 minutes
Source: IBM Informix Conference,
2001 Las Vegas
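The back-of-the-envelope numbers on this slide can be reproduced directly. The sketch takes the slide's inputs as given (100-byte rows, 1 GB/s scan rate, ~9 million rows/s, 5.4 billion rows, the quoted latency ranges). Two observations come out of it: 1 GB/s over 100-byte rows is an ideal 10 million rows/s, so the slide's ~9 million is somewhat below that ideal; and 5.4 billion 100-byte rows is about 503 binary gigabytes (GiB), which matches the slide's "500GB". Variable names are illustrative.

```python
ROW_BYTES = 100
SCAN_RATE = 1e9                       # bytes per second (1 GB/s, decimal)

ideal_rows_per_s = SCAN_RATE / ROW_BYTES          # 10 million rows/s
slide_rows_per_s = 9e6                            # the slide's "~9 million rows/sec"
rows_per_min = slide_rows_per_s * 60              # 540 million rows/minute
table_rows = 5.4e9
scan_minutes = table_rows / rows_per_min          # 10 minutes
table_gib = table_rows * ROW_BYTES / 2**30        # ~503 GiB, the slide's "500GB"

# Disk vs. memory latency, using the ranges quoted on the slide.
min_ratio = 3e-3 / 360e-9                         # ~8.3 thousand
max_ratio = 4e-3 / 235e-9                         # ~17 thousand

# 64-bit byte addressing.
addressable_eb = 2**64 / 1e18                     # ~18.4 exabytes (exactly 16 EiB)

print(f"{ideal_rows_per_s:.0f} rows/s ideal, {scan_minutes:.0f} min for 5.4B rows")
print(f"table ~ {table_gib:.0f} GiB; latency ratio {min_ratio:.0f}-{max_ratio:.0f}x")
print(f"2**64 bytes ~ {addressable_eb:.1f} EB")
```

The latency ratio lands between roughly 8,000x and 17,000x, consistent with the "4 orders of magnitude" claim.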
Databases – Itanium advantages
64-bit addressing
Tens of gigabytes to thousands of terabytes can be held in main memory with nanosecond access times, eliminating millisecond disk accesses and thus improving application response time.
Large number of registers and innovative register model
Keeping data and intermediate calculations in on-chip registers reduces repeated loads and stores of intermediate values, improving the response time of an application's database requests.
Instruction set parallelism
The ability to execute instructions in parallel allows data from multiple rows and columns of one or more large in-memory database tables to be accessed and manipulated simultaneously.
Predication
Predication allows instructions to be executed conditionally before it is known whether their execution is needed. More code can run in parallel, the performance penalty of branch-dependent code is reduced, and applications with heavy branching speed up.
Databases – Itanium advantages contd..
Control/Data Speculation
Control speculation allows certain load instructions to be scheduled before conditional branch
instructions rather than after them. Data speculation is similar but allows loads to be scheduled
above stores. Both reduce the CPU wait states caused by branch-intensive code with high-latency
RAM accesses, speeding up applications.
Instruction/Data Prefetch
Instruction prefetches can be signaled on branch instructions. Data can be prefetched with
explicit prefetch instructions. Both prefetches speed application performance by reducing wait
states.
Advantages
Big databases like,
-Data warehousing
-Decision Support
-Web-Enabled ERP
Security
Security
-Common encryption algorithms run 3-5 times faster
-EPIC parallelism with register rotation makes algorithms even faster
-Performance boost to CAD/CAE applications due to the increased number of floating-point registers
-Performance boost to 3D applications
-82-bit floating-point unit offers high precision
-RSA computations use operands 512 to 1024 bits long
-A new multiply-add instruction helps
-Parallelism helps (two 128-bit computations are performed in parallel)
-Predication eliminates branches (if statements) from RSA computations
-RSA, AES, and SHA-1 benefit because they use only counted loops, which can exploit register rotation
-Vast number of registers
-Large physical memory for security caching: directory services can be stored in memory
-Network traffic can be encrypted
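To see why a multiply-add instruction matters for 512- to 1024-bit RSA operands, here is a sketch of schoolbook multi-precision multiplication built only from a 64 x 64 + 64 multiply-add that returns both halves of the 128-bit result. On Itanium, to our understanding, this kind of operation is provided by the `xma` family of instructions (separate forms for the low and high halves); here it is modeled by the hypothetical `mul_add_64`, and Python's big integers are used only to check the answer.

```python
MASK64 = (1 << 64) - 1


def to_limbs(n: int, count: int) -> list[int]:
    """Split a non-negative integer into `count` little-endian 64-bit limbs."""
    return [(n >> (64 * i)) & MASK64 for i in range(count)]


def from_limbs(limbs: list[int]) -> int:
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


def mul_add_64(a: int, b: int, c: int) -> tuple[int, int]:
    """Model of a 64 x 64 + 64 multiply-add: returns (low 64 bits, high 64 bits)."""
    full = a * b + c
    return full & MASK64, full >> 64


def mp_multiply(x: list[int], y: list[int]) -> list[int]:
    """Schoolbook multi-precision multiply built only from 64-bit multiply-adds."""
    result = [0] * (len(x) + len(y))
    for i, xi in enumerate(x):
        carry = 0
        for j, yj in enumerate(y):
            lo, hi = mul_add_64(xi, yj, result[i + j])
            total = lo + carry                  # may overflow 64 bits by one
            result[i + j] = total & MASK64
            carry = hi + (total >> 64)          # still fits in 64 bits
        result[i + len(y)] = carry
    return result


# A 512-bit RSA-sized operand is 8 limbs of 64 bits.
a = (1 << 511) | 0x1234_5678_9ABC_DEF0
b = (1 << 509) - 12345
product = from_limbs(mp_multiply(to_limbs(a, 8), to_limbs(b, 8)))
assert product == a * b
```

A 512-bit by 512-bit product needs 8 x 8 = 64 multiply-adds, and the inner loop is a fixed-count loop, the kind of counted loop the slide says benefits from register rotation.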
Security contd..
Performance statistics – Encryption algorithms
[Table: operations used by each encryption algorithm (RSA, ECC, AES, DES, RC6, SHA): multi-precision arithmetic, multi-precision logical operations, fixed data rotate, variable data rotate, integer multiplication, S-box lookup, and logical operations]
Java
Java
Common Java Limitations (J2SE 1.3)
-Garbage Collection
-Object-oriented programming (OOP)
-Byte code vs. native machine code
-Variability of performance because of interpretation
-Multithreaded applications
-Java Native Interface vs. Native Method Interface
-Network Performance
-Limitations with current architectures
-EJB involves frequent invocation of method calls
-Java needs dynamic bounds checking, null checking, exception handling
-Java has a 64 bit integer data type – long
-Java object handles (ObjId) are 64-bit
Java Contd..
Advantages using IBM Java2
-Streamlined Garbage Collection reduces pause time
-OOP: IBM Java uses Thread Local Heaps, allowing variable-sized thread-local heaps
-Just-In-Time compiler translates to optimized native code
-Mixed Mode Interpreter does selective compilation
-Multithreading now has lightweight and full-power modes
-JNI enhanced and NMI removed in Java 2
-Network performance: Java socket API overhead removed
Java Contd..
Advantages using Itanium
-Predication: branching caused by Java's bounds checking benefits from predication
-Speculation: multiway branching allows the addresses and data needed for Java's bounds
and null checks to be prefetched, increasing performance
-Instruction parallelism: multiple execution units run instructions concurrently, increasing
performance
-Register set: smaller methods need not contend for registers, as more registers are available
Win64
Win64
Win64 data types
Type Name | What it is
LONG32, INT32 | 32-bit signed
ULONG32, UINT32, DWORD32 | 32-bit unsigned
LONG64, INT64 | 64-bit signed
ULONG64, UINT64, DWORD64 | 64-bit unsigned
INT_PTR, LONG_PTR | Signed int, pointer precision
UINT_PTR, ULONG_PTR, DWORD_PTR | Unsigned int, pointer precision
SIZE_T | Unsigned count, pointer precision
SSIZE_T | Signed count, pointer precision
Win64 Contd..
Win64 Issues
- LLP64 issues
  - Porting issues (32-bit to 64-bit)
  - Polymorphic data usage
  - Pointer/length combinations
- RPC and COM
  - Supports RPC between IA-32 and IA-64
  - Supports LocalServer-style (out-of-proc) COM between IA-32 and IA-64 processes
  - An IA-32 DLL cannot be loaded into a 64-bit process
  - An IA-64 DLL cannot be loaded into a 32-bit process
  - Use out-of-proc COM (solves the previous two problems)
  - PnP should be RPC-enabled
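The LLP64 and pointer/length issues listed above come down to one fact: on Win64, `int`, `LONG` and `DWORD` stay 32 bits wide while pointers grow to 64 bits (unlike LP64 Unix systems, where C `long` also becomes 64 bits). The sketch below models type widths as plain tables; the table, the helper names and the sample address are hypothetical. It shows what happens when a pointer value is stored in a 32-bit type versus a pointer-precision type.

```python
# Bit widths of selected Windows types on the two Windows data models.
WINDOWS_TYPE_BITS = {
    "Win32 (ILP32)": {"int": 32, "LONG": 32, "DWORD": 32, "INT64": 64,
                      "INT_PTR": 32, "LONG_PTR": 32, "SIZE_T": 32, "pointer": 32},
    "Win64 (LLP64)": {"int": 32, "LONG": 32, "DWORD": 32, "INT64": 64,
                      "INT_PTR": 64, "LONG_PTR": 64, "SIZE_T": 64, "pointer": 64},
}


def holds_pointer(type_name: str, model: str) -> bool:
    """True if a value of `type_name` is wide enough to hold a pointer."""
    bits = WINDOWS_TYPE_BITS[model]
    return bits[type_name] >= bits["pointer"]


def store_in(value: int, type_name: str, model: str) -> int:
    """Simulate storing a value in `type_name`: bits above its width are lost
    (only the bit pattern is modeled; signedness is ignored)."""
    width = WINDOWS_TYPE_BITS[model][type_name]
    return value & ((1 << width) - 1)


address = 0x0000_0001_0000_1234   # a made-up pointer value above 4 GB
for t in ("LONG", "DWORD", "LONG_PTR", "SIZE_T"):
    print(t, holds_pointer(t, "Win64 (LLP64)"), hex(store_in(address, t, "Win64 (LLP64)")))
```

On the Win64 model, LONG and DWORD silently drop the upper half of the address (leaving 0x1234), while LONG_PTR and SIZE_T keep it intact. Code that was correct on Win32, where every one of these types is 32 bits, therefore has to switch to the pointer-precision types when porting.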
Questions?