PARALLEL PROCESSING INSTITUTE · FUDAN UNIVERSITY
Outline
• Motivation
• Design & Implementation
• Evaluation
• Future work
The popularity of Java
20.299%
Java!
• Architecture neutral
• Simplified memory management
• Security and productivity
• ……

Write Once, Run Anywhere

How to further improve Java runtime performance?
Our Research
Leverage the synergy between static and dynamic optimizations
• Dynamic environment while leveraging static benefits
• Finding performance opportunities before runtime
• Static annotation to help runtime optimization
Opencj
It is the first milestone of the whole project
• Developed on top of Open64
• Takes Java source files or class files as input
• Outputs executable code for Linux on IA-32 and x86-64
• The compilation process is similar to compiling C/C++ applications

Design Overview of Opencj

Migrate the frontend of gcj into Open64
Java exception handling

Similar to C++ exception handling, but with some differences, such as:
• Runtime exceptions: a/0 (division by zero), NullPointerException
• No “catch-all” handler of the kind used in C++
• The “finally” mechanism, which makes Java exception handling more complex than C++

The key point of Java exception handling is to record the relationships among try/catch/finally blocks.
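To make these differences concrete, here is a minimal Java sketch. The class and method names are hypothetical and are not taken from Opencj. It shows a runtime exception raised without any explicit throw statement, and a finally block that runs on both the normal and the exceptional path, which is why a compiler has to record how the try, catch, and finally regions relate to each other.

```java
// Illustrative only: ExceptionDemo and its methods are hypothetical names.
final class ExceptionDemo {
    // Returns a trace of the blocks that executed, so the control flow is visible.
    static String divide(int a, int b) {
        StringBuilder trace = new StringBuilder();
        try {
            trace.append("try;");
            int q = a / b;                 // may raise ArithmeticException at run time (a/0)
            trace.append("q=").append(q).append(';');
        } catch (ArithmeticException e) {
            trace.append("catch;");        // handler is selected by exception type
        } finally {
            trace.append("finally;");      // runs on the normal and the exceptional path
        }
        return trace.toString();
    }

    // A null dereference is also a runtime exception with no explicit throw.
    static String length(String s) {
        try {
            return "len=" + s.length();
        } catch (NullPointerException e) {
            return "npe";
        }
    }

    public static void main(String[] args) {
        System.out.println(divide(6, 3));
        System.out.println(divide(1, 0));
        System.out.println(length(null));
    }
}
```

Here divide(6, 3) returns "try;q=2;finally;", divide(1, 0) returns "try;catch;finally;", and length(null) returns "npe".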
Devirtualization
• Virtual functions make code easy for programmers to reuse but hard for the compiler to analyze
• Resolve Java virtual function calls to promote indirect calls into direct calls
• Uses class hierarchy analysis and rapid type analysis
• Devirtualization is implemented in the IPA phase
• Many optimizations can benefit from this transformation
• In the SciMark 2.0 Java benchmark, it resolves all 21 user-defined virtual function calls
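The idea behind class hierarchy analysis can be sketched in a few lines of Java. This is an illustration under stated assumptions, not Opencj's IPA implementation: classes and methods are modeled as plain strings, every class is assumed to be instantiable, and the names ChaSketch, addClass, targets, and canDevirtualize are hypothetical. A call through a static receiver type can be promoted to a direct call when the receiver type and all of its subclasses share a single implementation of the method.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of class hierarchy analysis over a string-based class model.
final class ChaSketch {
    private final Map<String, String> parent = new HashMap<>();
    private final Map<String, List<String>> children = new HashMap<>();
    private final Map<String, Set<String>> declared = new HashMap<>();

    // Registers a class, its superclass (or null), and the methods it declares itself.
    void addClass(String name, String superName, String... methods) {
        declared.put(name, new HashSet<>(Arrays.asList(methods)));
        if (superName != null) {
            parent.put(name, superName);
            children.computeIfAbsent(superName, k -> new ArrayList<>()).add(name);
        }
    }

    // The class whose implementation runs for a receiver of exactly type cls.
    private String implFor(String cls, String method) {
        for (String c = cls; c != null; c = parent.get(c)) {
            if (declared.getOrDefault(c, Collections.emptySet()).contains(method)) return c;
        }
        return null;
    }

    // All implementations a call through staticType could reach, over its subtree.
    Set<String> targets(String staticType, String method) {
        Set<String> result = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(staticType);
        while (!work.isEmpty()) {
            String c = work.pop();
            String impl = implFor(c, method);
            if (impl != null) result.add(impl + "." + method);
            for (String sub : children.getOrDefault(c, Collections.emptyList())) work.push(sub);
        }
        return result;
    }

    // A unique target means the indirect call can become a direct call.
    boolean canDevirtualize(String staticType, String method) {
        return targets(staticType, method).size() == 1;
    }

    public static void main(String[] args) {
        ChaSketch h = new ChaSketch();
        h.addClass("Shape", null, "area");
        h.addClass("Circle", "Shape", "area");   // overrides area
        h.addClass("Square", "Shape");           // inherits area
        System.out.println(h.targets("Shape", "area"));
        System.out.println(h.canDevirtualize("Square", "area"));
    }
}
```

Rapid type analysis would narrow the target set further by ignoring classes that the program never instantiates; that refinement is left out of this sketch.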
Synchronization elimination

Based on escape analysis
• Flow-insensitive, interprocedural analysis
• Connection graph: captures the connectivity relationships among objects and object references
• Makes it easy to determine whether an object is local to a thread
• If a synchronized object is local to a thread, the synchronization operation can be removed
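A small Java example shows the kind of code this optimization targets. It is a hand-written illustration, not Opencj output, and the class and method names are hypothetical. In joinLocked the buffer never leaves the method, so no other thread can ever contend for its lock; joinUnlocked is a hand-written equivalent of what the code could look like once the locking is removed.

```java
// Illustrative only: SyncElisionDemo and its methods are hypothetical names.
final class SyncElisionDemo {
    // Before: buf is local to the calling thread for its whole lifetime.
    static String joinLocked(String[] parts) {
        StringBuffer buf = new StringBuffer();   // StringBuffer.append is a synchronized method
        for (String p : parts) {
            synchronized (buf) {                 // explicit monitor on a thread-local object
                buf.append(p);
            }
        }
        return buf.toString();                   // only the resulting String escapes
    }

    // After: the same computation with no locking at all.
    static String joinUnlocked(String[] parts) {
        StringBuilder buf = new StringBuilder(); // unsynchronized counterpart
        for (String p : parts) {
            buf.append(p);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"a", "b", "c"};
        System.out.println(joinLocked(parts).equals(joinUnlocked(parts)));
    }
}
```

Both methods return the same string for any input; the difference is only the monitor operations that the first version performs.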
Building the connection graph

Only five kinds of statements need to be considered:
1. p = new P()
2. p = return_new_P()
3. p = q
4. p = q.f
5. p.f = q
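The following Java sketch shows how these five statement kinds could drive a flow-insensitive analysis. It is a simplification under stated assumptions and not the data structure used in Opencj: nodes are plain strings, deferred edges are collapsed by propagating points-to sets directly, a call that returns a new object (kind 2) is treated like an allocation (kind 1), and only global escape is modeled. All class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: "p" is a reference, "O1" an object,
// and "O1.f" the reference node for field f of object O1.
final class ConnectionGraphSketch {
    private final Map<String, Set<String>> pointsTo = new HashMap<>();
    private final List<String[]> stmts = new ArrayList<>();

    void alloc(String p, String obj)         { stmts.add(new String[] {"alloc", p, obj}); }   // kinds 1 and 2
    void copy(String p, String q)            { stmts.add(new String[] {"copy", p, q}); }      // kind 3: p = q
    void load(String p, String q, String f)  { stmts.add(new String[] {"load", p, q, f}); }   // kind 4: p = q.f
    void store(String p, String f, String q) { stmts.add(new String[] {"store", p, f, q}); }  // kind 5: p.f = q

    // Objects that a reference node may point to.
    Set<String> pts(String ref) {
        return pointsTo.computeIfAbsent(ref, k -> new TreeSet<>());
    }

    // dst may point to everything src may point to; returns true if dst grew.
    private boolean flow(String dst, String src) {
        return pts(dst).addAll(new ArrayList<>(pts(src)));
    }

    // Flow-insensitive: statement order is ignored, so iterate to a fixed point.
    void solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] s : stmts) {
                switch (s[0]) {
                    case "alloc":
                        changed |= pts(s[1]).add(s[2]);
                        break;
                    case "copy":
                        changed |= flow(s[1], s[2]);
                        break;
                    case "load":
                        for (String o : new ArrayList<>(pts(s[2]))) changed |= flow(s[1], o + "." + s[3]);
                        break;
                    case "store":
                        for (String o : new ArrayList<>(pts(s[1]))) changed |= flow(o + "." + s[2], s[3]);
                        break;
                    default:
                        throw new IllegalStateException(s[0]);
                }
            }
        }
    }

    // Objects reachable from a global reference, directly or through fields.
    Set<String> globallyEscaping(String globalRef) {
        Set<String> seen = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>(pts(globalRef));
        while (!work.isEmpty()) {
            String obj = work.pop();
            if (!seen.add(obj)) continue;
            for (Map.Entry<String, Set<String>> e : pointsTo.entrySet()) {
                if (e.getKey().startsWith(obj + ".")) work.addAll(e.getValue());
            }
        }
        return seen;
    }
}
```

For example, after recording a = new P() as O1, b = new P() as O2, d = new P() as O3, c = a, b.f = c, and global = b, calling solve() and then globallyEscaping("global") yields O1 and O2. O3 is not in the set, so in this simplified model a synchronized operation on d would be a candidate for removal.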
Analysis process

Intra-procedural analysis
• Check every call graph node to find out whether there is a synchronized call in a PU (program unit)
• Set the initial escape state of each reference node

Inter-procedural analysis
• Start from the main function and traverse the call graph in depth-first order
• Pass escape states between caller and callee
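Example 1
[Figure not preserved; escape-state labels: GlobalEscape, OutEscape, OutEscape, GlobalEscape, NoEscape]
Example 1 (continued)
[Figure not preserved; escape-state labels: GlobalEscape, NoEscape, GlobalEscape, NoEscape]
Example 2
[Figure not preserved; escape-state labels: GlobalEscape, ArgEscape, ArgEscape, GlobalEscape, NoEscape]
Example 2 (continued)
[Figure not preserved; escape-state labels: GlobalEscape, NoEscape, GlobalEscape, NoEscape]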
Array bounds check elimination
Array bounds checks guarantee type-safe execution in Java
• They prevent many useful code optimizations, since an array bounds check may raise an exception
• Full elimination: if the check never fails
• Partial elimination: whenever possible, move bounds checks out of loops
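A minimal Java example of a check that never fails is shown here. It is illustrative only, and the class name is hypothetical. The loop bounds pin the index to the range 0 <= i < 100 and the array length is 100, so every implicit bounds check in the loop body is fully redundant and a compiler that can prove this may drop the checks entirely.

```java
// Illustrative only: FullyRedundantCheck is a hypothetical name.
final class FullyRedundantCheck {
    static int fillAndSum() {
        int[] a = new int[100];
        int s = 0;
        for (int i = 0; i < 100; i++) {   // value range of i in the body: 0 <= i < 100
            a[i] = i;                     // a.length is 100, so this check can never fail
            s += a[i];                    // same array, same index: also redundant
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(fillAndSum());
    }
}
```

The method returns 4950, the sum of 0 through 99, and can never throw ArrayIndexOutOfBoundsException.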
Example of ABCE
[Figure not preserved]
Fully redundant check elimination
Example [figure not preserved; surviving labels: 0 <= i1 < 100, jc1]
Fully redundant check elimination (continued)
Example [figure not preserved]
Partial elimination
• Adopts the loop versioning technique to guarantee Java's exception semantics
• Sets trigger conditions before and after the optimized loop
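Loop versioning can be sketched in Java as follows. This is a model under stated assumptions, not Opencj's transformation: Java source cannot switch off the real bounds checks, so the sketch represents each check as an explicit call to a hypothetical boundsCheck helper and counts how many run. Only a trigger test before the loop is modeled. The fast version runs when the test proves every index is in range; otherwise the original loop runs, so an exception is raised at the same iteration with the same side effects already performed.

```java
// Illustrative only: LoopVersioningSketch and its members are hypothetical names.
final class LoopVersioningSketch {
    static int checksExecuted = 0;   // counts modeled checks, for illustration only

    // Models the implicit bounds check attached to an access a[i].
    private static void boundsCheck(int i, int length) {
        checksExecuted++;
        if (i < 0 || i >= length) throw new ArrayIndexOutOfBoundsException(i);
    }

    // Original loop: one check per iteration.
    static void scaleOriginal(int[] a, int n) {
        for (int i = 0; i < n; i++) {
            boundsCheck(i, a.length);
            a[i] *= 2;
        }
    }

    // Versioned loop: test once before the loop, then pick a version.
    static void scaleVersioned(int[] a, int n) {
        if (a != null) {                 // keep the null case on the original path
            checksExecuted++;            // the single hoisted trigger test
            if (n <= a.length) {
                for (int i = 0; i < n; i++) {
                    a[i] *= 2;           // fast version: per-iteration checks are redundant
                }
                return;
            }
        }
        scaleOriginal(a, n);             // safe version: same exception at the same iteration
    }
}
```

With a 100-element array and n = 100, scaleOriginal performs 100 modeled checks while scaleVersioned performs one. With a 3-element array and n = 5, scaleVersioned doubles all three elements and then throws ArrayIndexOutOfBoundsException, exactly as the original loop does.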
Partial redundant check elimination

Example
Checks eliminated by ABCE
Total: the total number of checks in the test case
PRCE: the number of checks removed by Partial Redundant Check Elimination
FRCE: the number of checks removed by Fully Redundant Check Elimination
ABCE: FRCE + PRCE
28.4% speedup in the SciMark 2.0 test, lower than we expected
Performance gap between Java & C
[Figure not preserved: chart, higher is better. Configurations compared:]
opencj -O3 -IPA -fno-bounds-check
opencc -O3 -IPA
gcj -O3 -fno-bounds-check -funroll-loops
gcc -O3 -funroll-loops
Static compilation vs JIT
[Figure not preserved: chart, higher is better; y-axis 0 to 1000; series: opencj, JDK1.6, Harmony; categories: Composite Score, FFT, SOR, MC, SM, LU]
• Comparing two Java running modes:
  - Running in JVM
  - Running executable file directly
Static compilation vs JIT
[Figure not preserved: chart, lower is better; y-axis 0 to 16; series: gcj, opencj, JDK1.6, Harmony; categories: compress, jess, db, javac, mpegaudio, mtrt]
• JDK 1.6 is best on every benchmark except mpegaudio
• More analysis work needs to be done
Future Trends – for Java

Where is Java headed with its dynamic optimization framework:
• Exploring opportunities to achieve performance parity with native code
• Online profiling mechanisms and feedback-directed optimizations becoming mainstream
• …
Java advantages

Several studies show that Java could potentially be faster than C/C++, for several reasons:
• C/C++ pointers make optimization difficult
• Memory management is easier in Java than in C/C++, because Java allocates memory only through object instantiation, so Java garbage collectors can achieve better cache coherence
• Dynamic compilation of Java can use additional information available at run time to optimize code more effectively
Future of Opencj

Opencj will achieve better runtime performance by using the JVM as the execution environment
• Static annotation with an annotation-aware JIT
  - Runtime IPA
• Using a just-in-time compiler
  - Apply more effective optimizations by profiling runtime information
• Using garbage collection
  - Better performance due to cache coherence
There are three steps in our schedule

Framework: step 1
[Diagram not preserved. Box labels: C/C++/F, .java, .class, C/C++, FE, IPL, IPA, BE (LNO, WOPT), CG, IR Writer, Byte Code Reader, WHIRL Reader, LWHIRL, HIR ACTIONS, Whirl_to_LIR, LIR ACTIONS, JIT, Interp, runtime library, x86, IA. Legend: Existing Module, New Module.]

Framework: step 2
[Diagram not preserved. Box labels: C/C++/F, .java, .class, C/C++, FE, IPL, RIPA, RIPA IR, BE (LNO, WOPT), CG, IR Writer, Byte Code Reader, WHIRL Reader, LWHIRL, HIR ACTIONS, Whirl_to_LIR, LIR ACTIONS, JIT, Interp, runtime library, x86, IA. Legend: Existing Module, New Module.]

Framework: final
[Diagram not preserved. Box labels: C/C++/F, .java, .class, C/C++, FE, IPL, HWHIRL, RIPA, RIPA IR, Runtime OPT., Feedback, BE (LNO, WOPT), CG, IR Writer, Byte Code Reader, WHIRL Reader, LWHIRL, HIR ACTIONS, W to LIR, LIR ACTIONS, JIT, Interp, runtime library, x86, IA. Legend: Existing Module, New Module.]
Discussion
Shin is the leader of this project
• Q&A