Design of A Custom Vector Operation API Exploiting SIMD Intrinsics

Design of A Custom Vector Operation API Exploiting SIMD Intrinsics within Java
Presented by John-Marc Desmarais
Authors: Jonathan Parri, John-Marc Desmarais, Daniel Shapiro, Miodrag Bolic and Voicu Groza
CARG 2010
Overview
• Introduction
• What is SIMD?
• jSIMD Userflow
• Issues and Considerations
• Java Native Interface
• Current Implementation
• Results
• Future Work
• Conclusion
carg.site.uottawa.ca
Introduction
• Many embedded systems have begun to take advantage of the Java framework.
• JVMs can be embedded or rest on top of the OS.
• SIMD is an often underutilized option available on many processors (e.g. -O3 compilation).
• In Java, it is up to the JVM to decide at runtime how best to use SIMD, if it is available.
SIMD
Single Instruction Multiple Data
Multiple processing elements that perform the same operation on data simultaneously.
[Figure: a single instruction driving multiple functional units, each operating on its own data.]
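The idea can be sketched in plain Java. This is only an emulation for illustration, not real SIMD and not part of jSIMD: the class `LaneSketch`, its method names, and the lane count of 4 (four 32-bit floats filling one 128-bit register) are assumptions made for the example.

```java
import java.util.Arrays;

// Illustrative emulation only: plain Java does not guarantee SIMD execution.
final class LaneSketch {
    // Assumed lane count: four 32-bit floats fit in one 128-bit register.
    static final int LANES = 4;

    // Scalar version: one addition per loop iteration.
    static float[] addScalar(float[] a, float[] b) {
        float[] out = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    // SIMD-style version: each iteration handles LANES elements. On SIMD
    // hardware, the four additions in the loop body would be one instruction.
    static float[] addLanes(float[] a, float[] b) {
        float[] out = new float[a.length];
        int limit = a.length - (a.length % LANES);
        int i = 0;
        for (; i < limit; i += LANES) {
            out[i]     = a[i]     + b[i];
            out[i + 1] = a[i + 1] + b[i + 1];
            out[i + 2] = a[i + 2] + b[i + 2];
            out[i + 3] = a[i + 3] + b[i + 3];
        }
        // Leftover elements that do not fill a whole group of lanes.
        for (; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f, 5f, 6f};
        float[] b = {10f, 20f, 30f, 40f, 50f, 60f};
        // Both versions compute the same element-wise sum.
        System.out.println(Arrays.equals(addScalar(a, b), addLanes(a, b)));
    }
}
```

Both methods return the same result; the second only makes the grouping into lanes explicit, including the scalar tail needed when the vector length is not a multiple of the lane count.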
Common SIMD Implementations
• AMD 3DNow! (x86/x64)
• Intel MMX (x86/x64)
• SSE (Streaming SIMD Extensions) (x86/x64)
• AltiVec from Apple, IBM and Freescale (PowerPC)
• VIS from Sun Microsystems (SPARC)
[Figure: 128-bit registers xmm0 through xmm3.]
jSIMD: User Flow
Current SIMD Optimization Java Flow
• Standard Java
• Profile at Runtime
• Runtime JVM to SIMD Mapping
• Change Java Code
This may take a very long time and may still not achieve the best SIMD usage.
jSIMD SIMD Optimization Approach
• Standard Java with jSIMD
Issues & Considerations
Packing
• Packing and aligning data into SIMD registers is very time consuming.
Transactional
• Intermediate values should not leave SIMD memory and register space.
Target Specifics
• Various targets have different SIMD implementations. SIMD support may not even exist on a target, so a fallback is needed.
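One way to read the packing and transactional points is as an argument for coarse-grained API calls. The sketch below is plain Java, not jSIMD code; `FusedSketch`, `mul`, `add`, and `mulAdd` are hypothetical names chosen for illustration. It contrasts two chained calls, where the intermediate product is written out to an array, with one fused call, where the intermediate value never leaves the loop body (standing in for a value that stays in a SIMD register).

```java
import java.util.Arrays;

// Hypothetical API shapes for illustration; not the actual jSIMD API.
final class FusedSketch {
    // Fine-grained call: the product is materialized in a new array.
    static float[] mul(float[] a, float[] b) {
        float[] out = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] * b[i];
        }
        return out;
    }

    // Fine-grained call: a second pass over the data.
    static float[] add(float[] a, float[] b) {
        float[] out = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    // Coarse-grained call: a[i] * b[i] + c[i] in a single pass. The
    // intermediate product is never stored, so a native version would need
    // to pack and unpack the operands only once.
    static float[] mulAdd(float[] a, float[] b, float[] c) {
        float[] out = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            float product = a[i] * b[i];
            out[i] = product + c[i];
        }
        return out;
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f};
        float[] b = {2f, 2f, 2f, 2f};
        float[] c = {1f, 1f, 1f, 1f};
        float[] chained = add(mul(a, b), c); // two passes, one temporary array
        float[] fused = mulAdd(a, b, c);     // one pass, no temporary array
        System.out.println(Arrays.equals(chained, fused));
    }
}
```

Under this reading, every call across the native boundary pays the packing cost, so grouping a whole expression into one call amortizes that cost over more work.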
jSIMD: SIMD for Java
Java and the JNI
• Java allows programs to use native libraries.
• SIMD instructions can be called manually from native code.
• Solution: map all SIMD intrinsics into the JNI, making them invisible to the Java programmer.
• No system-specific code or headers are permitted in the library, so compilation can be performed automatically on any platform.
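The Java side of such a mapping might look like the following minimal sketch. It is not the jSIMD source: the class `NativeVectorSketch`, the library name `jsimd_sketch`, and the native method `nativeAdd` are all assumptions made for illustration. The sketch tries to load a native library and, if that fails (for example on a target without SIMD support), falls back to a pure Java loop, so the caller sees an ordinary Java method either way.

```java
// Hypothetical wrapper for illustration; names are not from jSIMD.
final class NativeVectorSketch {
    // True only if the (hypothetical) native library could be loaded.
    private static final boolean NATIVE_AVAILABLE = tryLoad();

    private static boolean tryLoad() {
        try {
            System.loadLibrary("jsimd_sketch"); // assumed library name
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false; // no native library: use the Java fallback
        }
    }

    // Assumed native entry point. Its C implementation would wrap the
    // platform's SIMD intrinsics; it is only called when the load succeeded.
    private static native void nativeAdd(float[] a, float[] b, float[] out, int n);

    // Public method seen by the Java programmer; the native call is hidden.
    static float[] add(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector lengths differ");
        }
        float[] out = new float[a.length];
        if (NATIVE_AVAILABLE) {
            nativeAdd(a, b, out, a.length);
        } else {
            for (int i = 0; i < a.length; i++) {
                out[i] = a[i] + b[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[] sum = add(new float[] {1f, 2f, 3f}, new float[] {4f, 5f, 6f});
        System.out.println(java.util.Arrays.toString(sum));
    }
}
```

Declaring a `native` method does not require the library to be present; only invoking it does, which is why the guard on `NATIVE_AVAILABLE` keeps the class usable without the library.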
jSIMD: SIMD for Java
Current Implementation
Running Targets:
• Intel x86/x64
• AMD x86/x64
• SPARC
• PowerPC
Future Targets:
• NIOS II with custom SIMD Unit
jSIMD: User Perspective
Transparency
• Extended Java ISA with parallel SIMD operations.
• Native operations are hidden as Java methods.
• The user is not concerned with the native interface.
[Figure: Java; Base Java ISA; jSIMD API; Native SIMD Mappings in C.]
Results
JVM versus Programmer Know-How
The JVM does an impressive job of SIMD mapping but is not as effective as a determined programmer with an understanding of the underlying target architecture.
Results
[Figure: mean execution time (ms, log scale) versus vector length (100 to 10,000,000) for basic Java, jSIMD, C++, and C++ with SSE.]
Design Space Exploration
• Memory Hierarchy
• Multiprocessors (Add more processors)
• Coprocessors & Hardware Accelerators
• Multi-Core
• Interconnect Topology
• ISE (Instruction Set Extensions)
• SIMD (Single Instruction Multiple Data)
• GPU (Graphical Processing Units)
Future Work
Future as DSE Avenue
• Profile
• Manual Specification and Automatic Detection for Native SIMD
• Analysis
• Vectorize
• Rewrite Java for jSIMD
Conclusion
• We have shown that jSIMD can be used to accelerate VM-based applications more effectively than contemporary automated solutions.
• VMs should integrate this approach into their languages.
• Until such time as VM support is made available, programmers can use our API to accelerate their applications.