Measuring Wait and Service Time in Java Using Bytecode

Download Report

Transcript Measuring Wait and Service Time in Java Using Bytecode

Measuring Wait and Service Time in Java Using
Bytecode Instrumentation
Amol Khanapurkar, TCS
Chetan Phalak, TCS
Agenda
• Problem Statement – Measuring Response Time
components
• Available Solutions
• Our solution
• Validating solution
•Q & A
2
Problem Statement
• Code comprises of Serial and Parallel portions. Serial portions limit scalability.
• Larger serial portion manifests in form of loss of throughput / lack of scalability
3
Problem Statement
• For purpose of this discussion, lets assume that code is inherently multi-threaded and serial
portion of the code is implemented using Critical Section.
A Critical Section in Java looks as below
synchronized(_lock)
{
try
{
sharedVal++;
}catch(Exception e)
{
e.printStackTrace();
}
}
4
Acquire a lock
Thread-safe operation
Problem Statement - Response Time Components
Response Time measurements in multi-threaded environment involving Critical Sections
1
1
2
3
2
3
1
2
3
4
5
6
6
7
7
Critical Section
1
Service Time
Wait Time
1
2
4
5
Threads
1
2
3
4
5
6
7
5
Problem Statement – Service and Wait Time
• Java provides 3 mechanisms to manage concurrency
– Synchronized Block
– Synchronized Object
Out-of-Scope
– Synchronized Method
public void inc(){
long t1 = System.currentTimeMillis();
synchronized(_lockA1){
long t2 = System.currentTimeMillis();
try{
sharedVal++;
}catch(InterruptedException e){
e.printStackTrace();
}
long t3 = System.currentTimeMillis();
}
long t4 = System.currentTimeMillis();
}
6
Where,
• (t4 - t1) is the Response Time,
• (t3 - t2) is the Service Time and
• (t2 - t1) is the Wait Time
Also, in practice t3 = t4
Alternate Solutions
• Logging - Insert probes in source code using logging techniques
– Logging has its own overheads
– After the logs have been written to, these logs will need to be crunched programmatically
to get the desired information
– Even for simple programs, the crunching program can get complex because it has to tag
the appropriate timestamps to appropriate threads
– For complex programs involving nested synchronized blocks (e.g. code that implements
2-phase commits), the crunching program can quickly become more complex and may
require significant development and testing time
– This method will fail in cases where source code is not available
• Buy tools - Very few commercial tools provide this data.
7
Our Solution
• Our objective in building a state machine is to get the following details about a Synchronized
Block.
– Location of the block i.e. which class and which method is synchronized()
– Variable name on which this block is synchronized()
– Breakup of synchronized() block response time into wait and service time components
• We take same approach as discussed in Logging section in previous slide to take our
measurements except that the probes are inserted using Bytecode Instrumentation
techniques.
• We call the Bytecode instrumentation infrastructure, the State Machine. Lets see how the
state machine works.
8
Introduction to Bytecode Instrumentation
• Java bytecode is the instruction set of the Java virtual machine. Each bytecode is
composed by one, or in some cases two, bytes that represent the instruction (opcode), along
with zero or more bytes for passing parameters. Of the 256 possible byte-long opcodes, 198
are currently in use, 51 are reserved for future use. : Source Wikipedia
• Bytecode instrumentation is the process of reading the compiled code and adding /
transforming the existing code into new code with additional behavior(s). The input and
output of the instrumentation process, both is the bytecode itself.
• We use ObjectWeb ASM library to do the instrumentation. ASM library has 2 APIs – The
Streaming API and the Tree API. We use the Streaming API to do the instrumentation.
• Load-time instrumentation is convenient, however in our prototype implementation, due to
time constraints, we did compile-time instrumentation. Conversion is trivial and is just a
matter of implementing the right API calls which provides hooks for load-time
instrumentation.
9
Introduction to Bytecode Instrumentation
public void inc(){
long t1 = System.currentTimeMillis();
synchronized(_lockA1){
long t2 = System.currentTimeMillis();
try{
sharedVal++;
Thread.sleep(10);
}catch(InterruptedException e){
e.printStackTrace();
}
long t3 = System.currentTimeMillis();
}
long t4 = System.currentTimeMillis();
}
Fig 1. Source Code
Fig 2. Javap depiction of bytecode
10
public void inc();
Code:
0: invokestatic #4; //Method java/lang/System.currentTimeMillis:()J
3: lstore_1
4: aload_0
5: getfield
#3; //Field _lockA1:Ljava/lang/Object;
8: dup
9: astore_3
10: monitorenter
11: invokestatic #4; //Method java/lang/System.currentTimeMillis:()J
14: lstore 4
16: aload_0
17: dup
18: getfield
#5; //Field sharedVal:I
21: iconst_1
22: iadd
23: putfield
#5; //Field sharedVal:I
26: ldc2_w #6; //long 10l
29: invokestatic #8; //Method java/lang/Thread.sleep:(J)V
32: goto 42
35: astore 6
37: aload 6
39: invokevirtual #10; //Method java/lang/InterruptedException.printStackT
race:()V
42: invokestatic #4; //Method java/lang/System.currentTimeMillis:()J
45: lstore 6
47: aload_3
48: monitorexit
49: goto 59
52: astore 8
54: aload_3
55: monitorexit
56: aload 8
58: athrow
59: invokestatic #4; //Method java/lang/System.currentTimeMillis:()J
62: lstore_3
63: return
Exception table:
from to target type
16 32 35 Class java/lang/InterruptedException
11
52
49
56
52 any
52 any
Bytecode instrumentation challenges
• The bytecode will change when source code changes. How to write instrumentation logic
such that measurements become source code agnostic?
• Java Source code and individual Java opcodes are well described in the literature. But
mapping between source code and opcode is not well described. It’s a standard practice to
use javap utility of the JDK to figure out what bytecode gets generated for a given source
code.
• The JVM specification is prescriptive i.e. it talks only about what should be the behavior of a
JVM. It does not talk about How a particular behavior should be implemented. Hence
ensuring portability of the solutions across JDK vendors is non-trivial.
• Steep learning curve (Java opcode to API call mapping), hence easy to make mistakes.
• Trial and error method was the order of the day to build an empirical model of how the Sync
Block in Java works.
11
State Machine – The Empirical Model
public void inc();
Code:
0: invokestatic #4; //Method java/lang/System.currentTimeMillis:()J
3: lstore_1
4: aload_0
5: getfield
#3; //Field _lockA1:Ljava/lang/Object;
8: dup
9: astore_3
10: monitorenter
11: invokestatic #4; //Method java/lang/System.currentTimeMillis:()J
14: lstore 4
16: aload_0
17: dup
18: getfield
#5; //Field sharedVal:I
21: iconst_1
22: iadd
23: putfield
#5; //Field sharedVal:I
26: ldc2_w #6; //long 10l
29: invokestatic #8; //Method java/lang/Thread.sleep:(J)V
32: goto 42
35: astore 6
37: aload 6
39: invokevirtual #10; //Method java/lang/InterruptedException.printStackT
race:()V
42: invokestatic #4; //Method java/lang/System.currentTimeMillis:()J
45: lstore 6
47: aload_3
48: monitorexit
49: goto 59
52: astore 8
54: aload_3
55: monitorexit
56: aload 8
58: athrow
59: invokestatic #4; //Method java/lang/System.currentTimeMillis:()J
62: lstore_3
63: return
Call Stack
Event of interest
invokestatic
getfield
public void inc(){
long t1 = System.currentTimeMillis();
lstore
astore
aload
monitorenter
synchronized(_lockA1){
long t2 = System.currentTimeMillis();
try{
sharedVal++;
Thread.sleep(10);
}catch(InterruptedException e){
e.printStackTrace();
}
long t3 = System.currentTimeMillis();
}
long t4 = System.currentTimeMillis();
}
getfield
monitorexit
12
dup
astore
monitorenter
….
….
monitorexit
State Machine – The Empirical Model
getfield
astore
monitorenter
monitorexit
Implementation of the State Machine revolves
largely around these 4 Opcodes
The implementation uses 2 instrumentation adapters
1. ResponseTime Adapter
2. ServiceTime Adapter
getfield
• Sets
a flag tobytrue
bytecode
astore 1. Instrumentation Adapters change the state of •the
Takes timestamp (T1)
adding / inserting
their own probes.
ResponseTime
Adapter
monitorenter
2. When more than one adapters are involved, it•Takes
is the responsibility
timestamp (T3)
monitorexit
of the developer to ensure proper order and coordination between
two adapters.
getfield
• Checks if flag is true
thetakes
bytecode.
• Ifout
true,
timestamp (T2)
astore 3. In our solution, we do one instrumentation and write
This bytecode becomes input for the second adapter. • sets flag to false
ServiceTime Adapter
monitorenter
13
State Machine – Implementation nuts & bolts
The ASM API provides the following hooks for events (of interest to us) to be
generated
• visitInsn() :- For monitorenter and monitorexit
• visitVarInsn() :- For astore
• visitFieldInsn() :- For getstatic and getfield
• visitMethodInsn() :- For class and method names
The adapters provide implementations of these methods to achieve the desired
behaviour.
Once all timestamps are embedded in the code, at runtime the code logs the
following tuple to in-memory data structure - <tid, time, locationId> where
• tid - Thread Identifier
• time – Timestamp
• locationID - a combination of class, method and exact location (say line
number or variable on which synchronization happened).
Tuples are then crunched to get Service Time and Wait Time components of
Response Time
14
15
Testing and Validating Solution
1. Theory-based programs which demonstrate computer science
principles or concepts
• Producer-Consumer problem
• Dining Philosopher problem
• Cigarette Smokers problem
• M/M/1 Queues
2. Custom programs which are comparable to code that gets written in IT
industry today
• Database connection pooling code
• Update account balance for money transfer transaction (2-phase
commit)
These programs check the qualitative and quantitative correctness of
the code that passes through the state machine.
16
Testing - Producer–Consumer Problem
Consumer1
Consumer2
Consumern
Buffer Full
Common Buffer
Buffer Empty
Producer1
Producer2
17
Producern
Previous Bytecode Instrumentation Benchmarks (~2011)
Overheads are constant
in the range of
10 – 30 microseconds
Fig. 4 Jensor overheads on response times
Y-axis : Logarithmic scale from 1 microsecond to 10K seconds
X-axis : Samples in increasing order of payload
18
- 18 -
Takeaways
• It is possible to decompose Response Time at critical section
into its Service Time and Wait Time components using BCI
• Having visibility into Wait times can enable tuning /
optimization / redesign of the critical section of the code
• Having visibility into Service time can enable better modeling
of Java applications
• Our State machine based approach is capable of getting
values for above metrics
• Other researchers have built utilities on top of this
implementation to predict performance in presence of
software and hardware resource bottlenecks
19
Q&A
20