Introduction to SSE or How to put your algorithms on steroids!
Christian Kerl
11.05.2012
Outline
• What is SSE?
• Basic Operations
• Example: Image Pyramid
• Summary
• Further Resources
What is SSE?
• SSE = Streaming SIMD Extensions
• SIMD = Single Instruction, Multiple Data
• Developed by Intel in 1999
• Further extensions: SSE2, SSE3 (SSSE3, SSE4)
• Allows parallel processing of multiple integer or floating-point values
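
To make "single instruction, multiple data" concrete, here is a minimal sketch that adds two float arrays four elements at a time. The helper name `add_arrays` and the restriction to lengths divisible by 4 are assumptions made for this illustration only.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_add_ps, ...
#include <cassert>
#include <cstddef>

// Element-wise out[i] = a[i] + b[i], four floats per instruction.
// add_arrays is a hypothetical helper name used only in this sketch.
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    assert(n % 4 == 0);  // sketch only: no scalar loop for leftover elements
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   // load 4 floats, any alignment
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vsum = _mm_add_ps(va, vb);  // one instruction, four additions
        _mm_storeu_ps(out + i, vsum);      // write 4 results back
    }
}
```

A scalar loop would issue one addition per element; here each loop iteration handles four.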
What is SSE?
• 8 XMM registers (special CPU registers; 16 in 64-bit mode)
• Each XMM register is 128 bits wide
– 2 int64 / doubles
– 4 int32 / floats
– 8 int16
– 16 int8
What is SSE?
• Special instructions working on XMM registers
– SSE: 70, SSE2: 144, SSE3: 13 instructions
• Different instructions for each data type
• Usable in
– Assembly
– C/C++ through SSE “intrinsics”
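
As an illustration of "different instructions for each data type", the sketch below performs the same addition on floats, doubles and 16-bit integers. Each needs its own intrinsic and processes a different number of elements per register. The function names are hypothetical, and the double and integer versions assume SSE2 is available.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (also pulls in the SSE header)
#include <cassert>
#include <cstddef>
#include <cstdint>

// The same operation (addition) uses a different intrinsic per element type.
// All function names here are hypothetical helpers for this sketch.

void add_f32(const float* a, const float* b, float* out, std::size_t n) {
    assert(n % 4 == 0);  // 4 floats per 128-bit register
    for (std::size_t i = 0; i < n; i += 4)
        _mm_storeu_ps(out + i,
                      _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
}

void add_f64(const double* a, const double* b, double* out, std::size_t n) {
    assert(n % 2 == 0);  // 2 doubles per register
    for (std::size_t i = 0; i < n; i += 2)
        _mm_storeu_pd(out + i,
                      _mm_add_pd(_mm_loadu_pd(a + i), _mm_loadu_pd(b + i)));
}

void add_i16(const std::int16_t* a, const std::int16_t* b,
             std::int16_t* out, std::size_t n) {
    assert(n % 8 == 0);  // 8 int16 values per register
    for (std::size_t i = 0; i < n; i += 8) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        // wrap-around addition; _mm_adds_epi16 would saturate instead
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i),
                         _mm_add_epi16(va, vb));
    }
}
```

The suffix encodes the element type: `_ps` for packed single precision, `_pd` for packed double precision, `_epi16` for packed 16-bit integers.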
Basic Operations
• Load / Store
• Arithmetic
• Comparison / Logical
• Type conversion
• …
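
The operation categories listed above can be combined in one short loop. The following sketch is an invented example, not taken from the talk: it scales floats, zeroes every lane that is not positive, and converts the result to 32-bit integers. The function name `scale_clamp_to_int` is hypothetical, and the conversion step assumes SSE2.

```cpp
#include <emmintrin.h>  // SSE2 (needed for the float -> int32 conversion)
#include <cassert>
#include <cstddef>
#include <cstdint>

// out[i] = (int) max(in[i] * scale, 0), four elements per iteration.
// scale_clamp_to_int is a hypothetical helper name for this sketch.
void scale_clamp_to_int(const float* in, std::int32_t* out,
                        std::size_t n, float scale) {
    assert(n % 4 == 0);  // sketch only: no scalar loop for leftover elements
    const __m128 vscale = _mm_set1_ps(scale);  // broadcast scale to all lanes
    const __m128 zero = _mm_setzero_ps();
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 v = _mm_loadu_ps(in + i);       // load
        v = _mm_mul_ps(v, vscale);             // arithmetic
        __m128 mask = _mm_cmpgt_ps(v, zero);   // comparison: all-ones where v > 0
        v = _mm_and_ps(v, mask);               // logical: zero the other lanes
        __m128i vi = _mm_cvttps_epi32(v);      // conversion: truncate to int32
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), vi);  // store
    }
}
```

A comparison does not produce a boolean but a per-lane bit mask, which is why it is usually followed by a logical instruction rather than a branch.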
Basic Operations
• Requirements on memory layout for loading and storing data
• Memory addresses (pointers) need to be 16-byte aligned for the aligned load / store instructions (unaligned variants exist)
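
The alignment requirement can be sketched as follows. `_mm_load_ps` and `_mm_store_ps` expect 16-byte aligned addresses, while `_mm_loadu_ps` and `_mm_storeu_ps` accept any address. The helper names are hypothetical, and `alignas` (C++11) is just one way to obtain aligned storage.

```cpp
#include <xmmintrin.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if p is a multiple of 16 bytes (hypothetical helper for this sketch).
bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Multiplies n floats in place using the aligned load/store intrinsics,
// which may fault if data is not 16-byte aligned.
void scale_aligned(float* data, std::size_t n, float factor) {
    assert(is_aligned16(data) && n % 4 == 0);
    const __m128 f = _mm_set1_ps(factor);
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 v = _mm_load_ps(data + i);           // aligned load
        _mm_store_ps(data + i, _mm_mul_ps(v, f));   // aligned store
    }
}

// Same operation for arbitrary pointers, using the unaligned variants.
void scale_unaligned(float* data, std::size_t n, float factor) {
    assert(n % 4 == 0);
    const __m128 f = _mm_set1_ps(factor);
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);          // unaligned load
        _mm_storeu_ps(data + i, _mm_mul_ps(v, f));  // unaligned store
    }
}
```

A buffer declared as `alignas(16) float buf[8];` can be passed to `scale_aligned`, whereas a pointer such as `buf + 1` is only safe with `scale_unaligned`.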
Example: Image Pyramid
• Performance on 2560x1920 image
– Standard C++ version: 7.4 ms
– SSE optimized version: 1.62 ms
=> ≈ 4.6x speedup
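
The transcript does not include the pyramid code itself, so the following is only a sketch of what one pyramid level could look like under assumed conditions: a row-major float image, 2x2 box averaging, a width divisible by 8 and an even height. The function name `downsample2x2` is hypothetical, and this is not necessarily the filter or pixel format that was benchmarked.

```cpp
#include <xmmintrin.h>
#include <cassert>
#include <cstddef>

// Halves a float image by averaging each 2x2 block (hypothetical helper).
// src is width x height, dst is (width / 2) x (height / 2), both row-major.
void downsample2x2(const float* src, std::size_t width, std::size_t height,
                   float* dst) {
    assert(width % 8 == 0 && height % 2 == 0);  // sketch: no edge handling
    const std::size_t out_width = width / 2;
    const __m128 quarter = _mm_set1_ps(0.25f);
    for (std::size_t y = 0; y < height; y += 2) {
        const float* row0 = src + y * width;
        const float* row1 = row0 + width;
        float* out = dst + (y / 2) * out_width;
        for (std::size_t x = 0; x < width; x += 8) {
            // Vertical sums for 8 neighbouring columns (two registers).
            __m128 lo = _mm_add_ps(_mm_loadu_ps(row0 + x),
                                   _mm_loadu_ps(row1 + x));
            __m128 hi = _mm_add_ps(_mm_loadu_ps(row0 + x + 4),
                                   _mm_loadu_ps(row1 + x + 4));
            // Gather even columns (0, 2, 4, 6) and odd columns (1, 3, 5, 7).
            __m128 even = _mm_shuffle_ps(lo, hi, _MM_SHUFFLE(2, 0, 2, 0));
            __m128 odd = _mm_shuffle_ps(lo, hi, _MM_SHUFFLE(3, 1, 3, 1));
            // Horizontal pair sums, then divide by 4: 4 output pixels.
            _mm_storeu_ps(out + x / 2,
                          _mm_mul_ps(_mm_add_ps(even, odd), quarter));
        }
    }
}
```

Each inner iteration reads 16 input pixels and writes 4 output pixels; repeated calls on the successive outputs would build the further pyramid levels.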
Summary
• SSE available on all modern x86 CPUs
• Good for sequential data processing
• Provides considerable speedups (2-4x)
• SSE intrinsic code is harder to write and read
=> Use a wrapper library, e.g. EasySSE, ut-sse
– Need to evaluate / extend / write one
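
To show what such a wrapper might buy in readability, here is a minimal sketch of a hand-written one. `Vec4f` and `saxpy` are hypothetical names invented for this illustration; they do not reflect the actual API of EasySSE or ut-sse.

```cpp
#include <xmmintrin.h>
#include <cassert>
#include <cstddef>

// Vec4f is a hypothetical wrapper type for this sketch; it is NOT the API
// of EasySSE or ut-sse.
struct Vec4f {
    __m128 v;
    explicit Vec4f(__m128 x) : v(x) {}
    explicit Vec4f(float s) : v(_mm_set1_ps(s)) {}  // broadcast one value
    static Vec4f load(const float* p) { return Vec4f(_mm_loadu_ps(p)); }
    void store(float* p) const { _mm_storeu_ps(p, v); }
};

inline Vec4f operator+(Vec4f a, Vec4f b) { return Vec4f(_mm_add_ps(a.v, b.v)); }
inline Vec4f operator*(Vec4f a, Vec4f b) { return Vec4f(_mm_mul_ps(a.v, b.v)); }

// y[i] = a * x[i] + y[i], written with ordinary operators instead of
// nested intrinsic calls.
void saxpy(float a, const float* x, float* y, std::size_t n) {
    assert(n % 4 == 0);  // sketch only: no scalar loop for leftover elements
    const Vec4f va(a);
    for (std::size_t i = 0; i < n; i += 4) {
        Vec4f result = va * Vec4f::load(x + i) + Vec4f::load(y + i);
        result.store(y + i);
    }
}
```

The loop body reads like scalar arithmetic, while the operators are expected to inline to the same intrinsics as hand-written SSE code.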
Further Resources
• Tutorials:
– http://supercomputingblog.com/optimization/getting-started-with-sse-programming/
– http://www.codeproject.com/Articles/4522/Introduction-to-SSE-Programming
– http://sci.tuomastonteri.fi/programming/sse
• MSDN: good reference manual for intrinsics
– http://msdn.microsoft.com/de-de/library/y0dh78ez
• Wrapper Libraries:
– http://sourceforge.net/projects/easysse/
– http://code.google.com/p/ut-sse/