Themegallery PowerTemplate
PARALLEL PROCESSING INSTITUTE · FUDAN UNIVERSITY
Outline
Motivation
Design & Implementation
Evaluation
Future work
The popularity of Java
20.299%
Java!
Architecture neutral
Simplified memory management
Security and Productivity
…
Write Once, Run Anywhere
How to further improve Java runtime performance?
Our Research
Leverage the synergy between static and dynamic optimizations
Keep the dynamic environment while leveraging static benefits
Find performance opportunities before runtime
Use static annotations to help runtime optimization
Opencj
It is the first milestone of the whole project
Developed on top of Open64
Takes Java source files or class files as input
Outputs executable code for Linux/IA32 and x86-64
The compilation process is similar to compiling C/C++ applications
Design Overview of Opencj
Migrate the gcj frontend into Open64
Java exception handling
Similar to C++ exception handling, but with some differences, such as:
Runtime exceptions: a/0 (ArithmeticException), NullPointerException
No "catch-all" handler of the kind used in C++
The "finally" mechanism makes Java exception handling more complex than in C++
The key point of Java exception handling is to record the relationships among try/catch/finally blocks.
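A minimal sketch of the try/catch/finally relationships a compiler has to preserve. The class and method names are hypothetical and are not taken from Opencj or gcj; the example only records the order in which the regions run for the two runtime exceptions named above and for the normal path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of Opencj or gcj.
class ExceptionDemo {
    // Records the order in which the try, catch and finally regions execute.
    static List<String> trace(int a, int b, String s) {
        List<String> log = new ArrayList<>();
        try {
            log.add("try");
            int q = a / b;                       // raises ArithmeticException when b == 0
            log.add("len=" + (s.length() + q));  // raises NullPointerException when s == null
        } catch (ArithmeticException e) {
            log.add("catch-arith");
        } catch (NullPointerException e) {
            log.add("catch-npe");
        } finally {
            log.add("finally");                  // runs on every exit path of the try block
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(trace(1, 0, "x"));   // division by zero path
        System.out.println(trace(1, 1, null));  // null dereference path
        System.out.println(trace(4, 2, "ab"));  // normal path
    }
}
```

Every path ends in the finally region, which is why the compiler must know which finally block belongs to which try and catch blocks.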
Devirtualization
Virtual calls make code reuse easy for programmers but are hard for the compiler to analyze
Resolve Java virtual function calls to promote indirect calls into direct calls
Uses class hierarchy analysis and rapid type analysis
Devirtualization is implemented in the IPA phase
Many optimizations can benefit from this transformation
In the SciMark 2.0 Java benchmark, it resolves all 21 user-defined virtual function calls.
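A rough sketch of the idea behind class hierarchy analysis, assuming a toy model where classes and methods are plain strings. ChaSketch and all of its methods are hypothetical and do not reflect Opencj's IPA code. A call can be made direct when only one implementation is possible for the receiver's static type.

```java
import java.util.*;

// Hypothetical toy model of class hierarchy analysis (CHA); not Opencj's implementation.
class ChaSketch {
    private final Map<String, String> parent = new HashMap<>();         // class -> superclass (null for a root)
    private final Map<String, Set<String>> declared = new HashMap<>();  // class -> methods it declares

    void addClass(String name, String superName, String... methods) {
        parent.put(name, superName);
        declared.put(name, new HashSet<>(Arrays.asList(methods)));
    }

    // Walks up from cls to the nearest class that declares the method.
    String implementationFor(String cls, String method) {
        for (String c = cls; c != null; c = parent.get(c)) {
            Set<String> ms = declared.get(c);
            if (ms != null && ms.contains(method)) return c;
        }
        return null;
    }

    boolean isSubclassOf(String cls, String ancestor) {
        for (String c = cls; c != null; c = parent.get(c)) {
            if (c.equals(ancestor)) return true;
        }
        return false;
    }

    // All implementations a call receiver.method() may reach when receiver has the given static type.
    Set<String> targets(String staticType, String method) {
        Set<String> result = new TreeSet<>();
        for (String cls : parent.keySet()) {
            if (isSubclassOf(cls, staticType)) {
                String impl = implementationFor(cls, method);
                if (impl != null) result.add(impl + "." + method);
            }
        }
        return result;
    }

    // A call is a devirtualization candidate when exactly one target is possible.
    boolean canDevirtualize(String staticType, String method) {
        return targets(staticType, method).size() == 1;
    }

    public static void main(String[] args) {
        ChaSketch cha = new ChaSketch();
        cha.addClass("Shape", null, "area", "name");
        cha.addClass("Circle", "Shape", "area");
        cha.addClass("Square", "Shape", "area");
        System.out.println(cha.targets("Shape", "area"));          // several overrides: stays virtual
        System.out.println(cha.canDevirtualize("Shape", "name"));  // never overridden
    }
}
```

Rapid type analysis narrows the candidate set further by considering only classes that the program actually instantiates; this sketch does not model that step, nor interfaces or abstract methods.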
Synchronization elimination
Based on escape analysis
Flow-insensitive, interprocedural analysis
Connection graph: captures the connectivity relationship among objects and object references
Makes it easy to determine whether an object is local to a thread
If a synchronized object is local to a thread, the synchronized operation can be removed
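An illustrative Java case, with hypothetical names. StringBuffer.append is a synchronized method, so each call acquires the buffer's lock. In localOnly the buffer never leaves the method, so no other thread can contend for that lock and the locking is a candidate for removal. In appendShared the buffer is reachable from a static field, so the locking must stay.

```java
// Hypothetical demo class; it only shows the kind of code the optimization targets.
class SyncElisionDemo {
    // Reachable from a static field, so it can be seen by any thread.
    static final StringBuffer shared = new StringBuffer();

    // sb is allocated here and never stored or passed anywhere else:
    // it is local to the calling thread, so its locks are never contended.
    static String localOnly(int n) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < n; i++) {
            sb.append(i);  // synchronized call on a thread-local object
        }
        return sb.toString();  // only the resulting String leaves the method
    }

    // The receiver escapes through the static field, so synchronization is required.
    static int appendShared(int n) {
        for (int i = 0; i < n; i++) {
            shared.append(i);
        }
        return shared.length();
    }

    public static void main(String[] args) {
        System.out.println(localOnly(3));
        System.out.println(appendShared(3));
    }
}
```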
Building the connection graph
Only five kinds of statements are considered:
1. p = new P()
2. p = return_new_P()
3. p = q
4. p = q.f
5. p.f = q
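A simplified sketch of how the five statement kinds could add edges to a connection graph. Everything here is an assumption made for illustration: the class name, the string node naming (for example "A#0" for an object and "A#0.f" for its field), and the choice to process statements once in order. A real flow-insensitive analysis would iterate to a fixed point and would introduce placeholder nodes for objects created outside the method.

```java
import java.util.*;

// Hypothetical, simplified connection graph; not Opencj's data structure.
class ConnectionGraphSketch {
    private final Map<String, Set<String>> pointsTo = new HashMap<>();  // reference -> objects
    private final Map<String, Set<String>> deferred = new HashMap<>();  // reference -> references
    private int nextId = 0;

    private static Set<String> edges(Map<String, Set<String>> m, String key) {
        return m.computeIfAbsent(key, k -> new LinkedHashSet<>());
    }

    // 1. p = new P(): create an object node and a points-to edge from p.
    String assignNew(String p, String type) {
        String obj = type + "#" + (nextId++);
        edges(pointsTo, p).add(obj);
        return obj;
    }

    // 2. p = return_new_P(): modeled here as an allocation whose site is in the callee.
    String assignCallResult(String p, String type) {
        return assignNew(p, type + "@ret");
    }

    // 3. p = q: deferred edge from p to q.
    void assignRef(String p, String q) {
        edges(deferred, p).add(q);
    }

    // 4. p = q.f: p defers to field f of every object q may point to.
    void loadField(String p, String q, String f) {
        for (String obj : objectsOf(q)) edges(deferred, p).add(obj + "." + f);
    }

    // 5. p.f = q: field f of every object p may point to defers to q.
    void storeField(String p, String f, String q) {
        for (String obj : objectsOf(p)) edges(deferred, obj + "." + f).add(q);
    }

    // All objects a reference may point to, following deferred edges transitively.
    Set<String> objectsOf(String ref) {
        Set<String> result = new LinkedHashSet<>();
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(ref);
        while (!work.isEmpty()) {
            String r = work.pop();
            if (!seen.add(r)) continue;  // already visited
            result.addAll(pointsTo.getOrDefault(r, Collections.emptySet()));
            for (String next : deferred.getOrDefault(r, Collections.emptySet())) work.push(next);
        }
        return result;
    }
}
```

For example, after assignNew("a", "A"), assignNew("b", "B"), storeField("a", "f", "b"), loadField("c", "a", "f") and assignRef("d", "c"), the call objectsOf("d") follows d, c, A#0.f and b and reaches the single object B#1.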
Analysis process
Intra-procedural analysis
Check every call graph node to find out whether there is a synchronized call in a PU
Set the initial escape state of each reference node
Inter-procedural analysis
Start from the main function and traverse the call graph in depth-first order
Pass escape states between caller and callee
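A minimal sketch of escape-state propagation, under stated assumptions. It uses three states ordered NoEscape < ArgEscape < GlobalEscape; that ordering, the omission of any other state, and all names below are assumptions made for illustration. It shows only how a state spreads along graph edges, not the depth-first call graph traversal itself.

```java
import java.util.*;

// Hypothetical sketch; the state ordering and all names are assumptions.
class EscapePropagationSketch {
    enum EscapeState {
        NO_ESCAPE, ARG_ESCAPE, GLOBAL_ESCAPE;

        // Merging keeps the state that escapes further.
        EscapeState merge(EscapeState other) {
            return this.ordinal() >= other.ordinal() ? this : other;
        }
    }

    // edges: node -> nodes directly reachable from it in the connection graph.
    // initial: escape states assigned before propagation (e.g. static fields are GLOBAL_ESCAPE).
    static Map<String, EscapeState> propagate(Map<String, List<String>> edges,
                                              Map<String, EscapeState> initial) {
        Map<String, EscapeState> state = new HashMap<>(initial);
        Deque<String> work = new ArrayDeque<>(initial.keySet());
        while (!work.isEmpty()) {
            String n = work.pop();
            EscapeState s = state.get(n);
            for (String succ : edges.getOrDefault(n, Collections.emptyList())) {
                boolean firstVisit = !state.containsKey(succ);
                EscapeState old = state.getOrDefault(succ, EscapeState.NO_ESCAPE);
                EscapeState merged = old.merge(s);
                if (firstVisit || merged != old) {
                    state.put(succ, merged);  // states only move upward, so this terminates
                    work.push(succ);
                }
            }
        }
        return state;
    }

    // Conservative choice for this sketch: only NO_ESCAPE objects lose their synchronization.
    static boolean canRemoveSync(Map<String, EscapeState> state, String obj) {
        return state.getOrDefault(obj, EscapeState.NO_ESCAPE) == EscapeState.NO_ESCAPE;
    }
}
```

An object reachable from a node marked GLOBAL_ESCAPE ends up GLOBAL_ESCAPE itself, while an object reachable only from local references stays NO_ESCAPE.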
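Example 1 [figure not recovered; escape-state labels shown: GlobalEscape, OutEscape, OutEscape, GlobalEscape, NoEscape]
Example 1 [figure not recovered; escape-state labels shown: GlobalEscape, NoEscape, GlobalEscape, NoEscape]
Example 2 [figure not recovered; escape-state labels shown: GlobalEscape, ArgEscape, ArgEscape, GlobalEscape, NoEscape]
Example 2 [figure not recovered; escape-state labels shown: GlobalEscape, NoEscape, GlobalEscape, NoEscape]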
Array bounds check elimination
Array bounds checks guarantee type-safe execution in Java
They prevent many useful code optimizations, since a bounds check may raise an exception
Full elimination: if the check never fails
Partial elimination: whenever possible, move bounds checks out of loops
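A source-level sketch of a fully redundant check, with hypothetical names. The explicit if statement stands in for the check a compiler emits before each array access. Because the loop condition already guarantees 0 <= i < a.length, the check can never fail and can be dropped entirely.

```java
// Hypothetical demo class; the explicit checks model what a compiler inserts implicitly.
class BoundsCheckDemo {
    // Before elimination: one check per access.
    static int sumChecked(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            if (i < 0 || i >= a.length) {
                throw new ArrayIndexOutOfBoundsException(i);  // never reached in this loop
            }
            sum += a[i];
        }
        return sum;
    }

    // After full elimination: the loop bounds prove the index is always valid.
    static int sumUnchecked(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(sumChecked(data) == sumUnchecked(data));
    }
}
```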
Example of ABCE [figure not recovered]
Fully redundant check elimination
Example [figure not recovered; surviving labels: 0<=i1<100, jc1]
Partial elimination
Adopt the loop versioning technique to preserve Java exception semantics
Set trigger conditions before and after the optimized loop
Partially redundant check elimination
Example [figure not recovered]
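A source-level sketch of loop versioning for partial elimination, with hypothetical names. It assumes a single trigger condition tested before the loop; the exact trigger conditions used in Opencj are not recoverable from the slides. When the condition holds, a check-free copy of the loop runs. Otherwise the original loop runs, so the exception is still raised at the same iteration and after the same writes.

```java
// Hypothetical demo class; the explicit check models the compiler-inserted bounds check.
class LoopVersioningDemo {
    static void fill(int[] a, int n, int v) {
        if (n <= a.length) {
            // Fast version: the trigger condition proves every index in [0, n) is in bounds.
            for (int i = 0; i < n; i++) {
                a[i] = v;
            }
        } else {
            // Safe version: the original loop with its per-iteration check,
            // so earlier iterations still take effect before the exception.
            for (int i = 0; i < n; i++) {
                if (i >= a.length) {
                    throw new ArrayIndexOutOfBoundsException(i);
                }
                a[i] = v;
            }
        }
    }

    public static void main(String[] args) {
        int[] ok = new int[4];
        fill(ok, 2, 9);  // takes the fast version
        System.out.println(java.util.Arrays.toString(ok));

        int[] small = new int[3];
        try {
            fill(small, 5, 7);  // takes the safe version and fails at i == 3
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println(java.util.Arrays.toString(small));
        }
    }
}
```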
Checks eliminated by ABCE [results table not recovered]
Total: the total number of checks in the test case
PRCE: the number of checks removed by Partially Redundant Check Elimination
FRCE: the number of checks removed by Fully Redundant Check Elimination
ABCE: FRCE + PRCE
28.4% speedup in the SciMark 2.0 test, lower than we expected
Performance gap between Java & C [chart not recovered; higher is better]
Compiler configurations compared:
opencj -O3 -IPA -fno-bounds-check
opencc -O3 -IPA
gcj -O3 -fno-bounds-check -funroll-loops
gcc -O3 -funroll-loops
Static compilation vs JIT [chart not recovered; higher is better; y-axis 0 to 1000; series: opencj, JDK1.6, Harmony; categories: Composite Score, FFT, SOR, MC, SM, LU]
Comparing two Java running modes:
Running in a JVM
Running the executable file directly
Static compilation vs JIT [chart not recovered; lower is better; y-axis 0 to 16; series: gcj, opencj, JDK1.6, Harmony; categories: compress, jess, db, javac, mpegaudio, mtrt]
JDK 1.6 is best except on mpegaudio
More analysis work needs to be done.
Future Trends – for Java
Where is Java headed with its dynamic optimization framework:
Exploring opportunities to achieve performance parity with native code
Online profiling mechanisms and feedback-directed optimizations are becoming mainstream
…
Java advantages
Several studies show that Java could potentially be faster than C/C++, for several reasons:
C/C++ pointers make optimization difficult
Memory management is easier in Java than in C/C++, as Java only allocates memory through object instantiation, so Java garbage collectors can achieve better cache locality
Dynamic compilation of Java can use additional information available at run time to optimize code more effectively
Future of Opencj
Opencj will achieve better runtime performance by using a JVM as the execution environment
Static annotation with an annotation-aware JIT
- Runtime IPA
Using a just-in-time compiler
- Apply more effective optimizations by profiling runtime information
Using garbage collection
- Better performance due to cache locality
There are three steps in our schedule
Framework, step 1 [diagram not recovered; labels: C/C++/F, .java, .class, FE, IPL, IPA, BE (LNO, WOPT), CG, IR Writer, Byte Code Reader, WHIRL Reader, LWHIRL, HIR ACTIONS, Whirl_to_LIR, LIR ACTIONS, JIT, Interp, runtime library, x86, IA; legend: Existing Module, New Module]
Framework, step 2 [diagram not recovered; labels: C/C++/F, .java, .class, FE, IPL, RIPA, BE (LNO, WOPT), CG, IR Writer, Byte Code Reader, WHIRL Reader, LWHIRL, HIR ACTIONS, RIPA, IR, Whirl_to_LIR, LIR ACTIONS, JIT, Interp, runtime library, x86, IA; legend: Existing Module, New Module]
Framework, final [diagram not recovered; labels: C/C++/F, .java, .class, FE, IPL, HWHIRL, RIPA, BE (LNO, WOPT), CG, IR Writer, Byte Code Reader, WHIRL Reader, LWHIRL, HIR ACTIONS, RIPA, IR, Runtime OPT., W to LIR, LIR ACTIONS, Feedback, JIT, Interp, runtime library, x86, IA; legend: Existing Module, New Module]
Discussion
Shin is the leader of this project
Q&A