DEP: Detailed Execution Profile
Qin Zhao (1), Joon Edward Sim (2), Weng-Fai Wong (1,2)
(1) Singapore-MIT Alliance
(2) Department of Computer Science, National University of Singapore
{zhaoqin,esim,wongwf}@comp.nus.edu.sg
Larry Rudolph
Singapore-MIT Alliance
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology
[email protected]
Chine-Cheng Wu
PAS Lab, CSIE, NTU
Introduction
Previous profiling work needs a large amount of memory and incurs a large slowdown
DEP (Detailed Execution Profile) captures the complete dynamic control flow, data dependences, and memory references at the same time
The profile size is significantly reduced
DEPs are collected with Adept (A Dynamic Execution Profiling Tool), an infrastructure built on the DynamoRIO binary instrumentation framework
DEP Advantages
DEP gives complete coverage of the program, including shared libraries
Multi-threaded applications can be profiled by collecting independent DEPs
Collection is efficient, incurring a 5x slowdown
The profile contains both memory reference and control flow information
Control Flow Profile: DEPc
The traditional approach records each basic block entry using 4 bytes
DEP uses 2 bytes for each entry, plus an extra 2 bytes when needed
H-tag: the high 2 bytes
L-tag: the low 2 bytes
This compression does not guarantee a space saving in every case
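
The H-tag / L-tag split can be illustrated with a minimal sketch. The escape convention used here is an assumption made only so that the example decodes unambiguously, and it is not taken from the slides: a 0x0000 word followed by the H-tag and L-tag is written whenever the H-tag changes or the L-tag itself is 0x0000. The function names are hypothetical.

```python
MARKER = 0x0000  # assumed escape word; the slides do not define the real format


def encode_depc(block_addrs):
    """Encode 32-bit basic block addresses as a list of 16-bit words."""
    words = []
    current_h = None
    for addr in block_addrs:
        h_tag = (addr >> 16) & 0xFFFF  # high 2 bytes
        l_tag = addr & 0xFFFF          # low 2 bytes
        if h_tag != current_h or l_tag == MARKER:
            # Escape case: marker, then the full address as H-tag and L-tag.
            words.extend([MARKER, h_tag, l_tag])
            current_h = h_tag
        else:
            # Common case: the H-tag is unchanged, so only the L-tag is stored.
            words.append(l_tag)
    return words


def decode_depc(words):
    """Rebuild the 32-bit addresses from the 16-bit word stream."""
    addrs = []
    current_h = None
    i = 0
    while i < len(words):
        if words[i] == MARKER:
            current_h = words[i + 1]
            l_tag = words[i + 2]
            i += 3
        else:
            l_tag = words[i]
            i += 1
        addrs.append((current_h << 16) | l_tag)
    return addrs


def depc_bytes(words):
    """Size of the encoded stream in bytes (2 bytes per word)."""
    return 2 * len(words)
```

In this sketch a run of blocks that share an H-tag costs 2 bytes per entry, while a trace that keeps changing H-tags can come out no smaller than the 4-byte-per-entry form.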
Memory References Profile: DEPm
A memory reference is a tuple {pc, addr, size, type}
pc: PC of the memory reference instruction
addr: address of the memory reference
size: size of the data being accessed
type: whether it is a read or a write
DEPm stores only the values necessary to recover these references
Memory Reference
The code below contains three memory references
push ebp;
mov 0 -> [esp+4];
mov 0 -> [esp+8];
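
A rough sketch of why full addresses need not be stored for code like this: if the value of esp at block entry is known, all three {pc, addr, size, type} tuples can be rebuilt from static information about the instructions. The PCs, the 4-byte access sizes, and the table layout below are assumptions for illustration, not the Adept data format.

```python
# Static description of the block, as it could be derived from the binary.
# Each entry: (pc, esp_adjust_before_access, displacement, size, access_type)
# The PCs are hypothetical; sizes assume 32-bit operands.
BLOCK = [
    (0x1000, -4, 0, 4, "write"),  # push ebp: esp -= 4, then store at [esp]
    (0x1001, 0, 4, 4, "write"),   # mov 0 -> [esp+4]
    (0x1009, 0, 8, 4, "write"),   # mov 0 -> [esp+8]
]


def recover_references(esp_at_entry, block=BLOCK):
    """Rebuild the memory reference tuples from one recorded register value."""
    refs = []
    esp = esp_at_entry
    for pc, adjust, disp, size, kind in block:
        esp += adjust  # apply the stack pointer change made by the instruction
        refs.append({"pc": pc, "addr": esp + disp, "size": size, "type": kind})
    return refs
```

For example, `recover_references(0xBFFF0010)` yields the addresses 0xBFFF000C, 0xBFFF0010, and 0xBFFF0014 from a single recorded value.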
BB_pc+Mem_addr Compared to DEP
DEP triggers fewer analyzer calls than BB_pc+Mem_addr, because its smaller profile data overflows the buffer (which signals the analyzer) less often
The instrumentation penalty includes
Stealing and restoring registers
Address calculation
Storage of the address
Updating the profile counter
Extra overhead in DEP
Checking for H-tag changes
Checking and updating register status
DynamoRIO
Runs on IA-32 under both Linux and Windows
DynamoRIO executes an application by copying its code into a code cache and executing it from there
The cached code is the same as the original, except that control transfer operations return to DynamoRIO
The trace cache also holds code for indirect branch lookup
ADEPT :
A Dynamic Execution Profiling Tool
Control Flow : Obtaining DEPc
If the L-tag is 0x0000
Memory References: Obtaining DEPm
Each register variable has two states: UPDATED and RECORDED
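
One plausible reading of the two states, sketched below under stated assumptions: a register becomes UPDATED whenever it is written, and its value is appended to the profile only the first time it is used in an address computation after that, at which point it becomes RECORDED. The class and method names are hypothetical and the slides do not show the actual transition rules.

```python
UPDATED, RECORDED = "UPDATED", "RECORDED"


class RegisterTracker:
    """Tracks, per register, whether its current value is already in the profile."""

    def __init__(self, registers):
        # Assumption: every register starts as UPDATED (value not yet recorded).
        self.state = {reg: UPDATED for reg in registers}
        self.profile = []  # recorded (register, value) pairs

    def on_write(self, reg):
        """The register was modified, so any recorded value is now stale."""
        self.state[reg] = UPDATED

    def on_address_use(self, reg, value):
        """The register is used to form a memory address."""
        if self.state[reg] == UPDATED:
            self.profile.append((reg, value))
            self.state[reg] = RECORDED
        # If the state is RECORDED, nothing is stored: the value is already known.
```

Under this reading, repeated accesses through an unchanged register add nothing to DEPm after the first one.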
Profile Buffer
Stores the collected profile for later analysis
One buffer per thread
A larger buffer reduces the number of analyzer invocations
The profile buffer has two parts, one for DEPc and one for DEPm
A split of 20% for DEPc and 80% for DEPm works well
The analyzer is triggered when the buffer is full, using the OS page segmentation fault signal
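
The buffer behaviour described above can be sketched as follows. This is an illustration only: the sketch detects a full buffer with an explicit size check instead of a fault signal, and the class name, method names, and analyzer callback signature are assumptions.

```python
class ProfileBuffer:
    """A per-thread buffer split into a DEPc part and a DEPm part."""

    def __init__(self, total_bytes, analyzer, depc_percent=20):
        self.depc_capacity = total_bytes * depc_percent // 100
        self.depm_capacity = total_bytes - self.depc_capacity
        self.depc = bytearray()
        self.depm = bytearray()
        self.analyzer = analyzer  # called as analyzer(depc_bytes, depm_bytes)
        self.invocations = 0

    def _flush(self):
        """Hand both parts to the analyzer and start with an empty buffer."""
        self.analyzer(bytes(self.depc), bytes(self.depm))
        self.invocations += 1
        self.depc.clear()
        self.depm.clear()

    def append_depc(self, data):
        # Stand-in for the fault-based trigger: flush when the part would overflow.
        if len(self.depc) + len(data) > self.depc_capacity:
            self._flush()
        self.depc += data

    def append_depm(self, data):
        if len(self.depm) + len(data) > self.depm_capacity:
            self._flush()
        self.depm += data
```

With a fixed amount of profile data, doubling `total_bytes` roughly halves the number of flushes, which is the effect the slide attributes to a larger buffer.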
Optimizing DEPc
Basic block 0x0804ffa4 branches to 0x08050000
Optimizing DEPm
Optimized
Evaluation
Platform: dual-core 3.2 GHz Intel Pentium D 840, 2 GB of RAM
OS: Linux Fedora Core 4 and Windows XP SP2
Benchmarks: SPEC CPU2000 integer benchmarks on Linux; SYSmark 2004 SE on Windows (running Access, PowerPoint, and Word)
Compiler: gcc with the -O3 flag
Execution Time
Relative slowdown
Profile Frameworks
Pin
Count number of basic blocks executed
Count number of memory references
Valgrind
Cachegrind, a cache profiler, is used to capture basic block counts and memory reference counts
eWPP (Extended Whole Program Paths)
Records control flow and dependence information
Uses a two-phase profiling approach
First phase: identify all memory dependences
Second phase: collection
Profile Size and Compressibility
* CF_bit uses bits, plus 4-byte target addresses for indirect branches
CF_bit does not compress well
Normalized by uncompressed BB_pc size
Normalized by uncompressed Mem_addr size
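
As a worked illustration of the normalization, a profile's size is divided by the size of the uncompressed baseline (BB_pc for control flow, Mem_addr for memory references). The sketch below uses a synthetic loop trace and zlib as a stand-in compressor; neither comes from the slides, which do not name the compressor used.

```python
import struct
import zlib


def normalized_size(profile: bytes, baseline: bytes) -> float:
    """Size of a profile relative to the uncompressed baseline profile."""
    return len(profile) / len(baseline)


def bb_pc_profile(block_addrs):
    """Uncompressed BB_pc profile: one 4-byte little-endian address per block."""
    return b"".join(struct.pack("<I", addr) for addr in block_addrs)


# Synthetic trace: a three-block loop executed 1000 times (illustrative data only).
loop_trace = [0x08048010, 0x08048020, 0x08048030] * 1000
baseline = bb_pc_profile(loop_trace)
compressed_ratio = normalized_size(zlib.compress(baseline), baseline)
```

A value below 1.0 means the profile is smaller than the uncompressed baseline; the baseline itself normalizes to exactly 1.0.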
Related Work
Whole Execution Traces (WET)
Simulation environment
Extended Whole Program Paths (eWPP)
Encodes trace information in WPP
Whole Program Paths (WPP)
These approaches have difficulty supporting multi-threaded applications
Conclusion
DEP captures the major aspects of program execution
Control flow and memory references
DEPs are collected by Adept, which can perform online or offline analysis
Adept builds the mapping between the collected information and the original application
Experimental results show a 5x slowdown and a 40% space saving compared to traditional profiles
A complete trace that recovers the whole program execution is not always necessary; particular segments can be reproduced for simulation or replay
Back-up Slides
Recovering memory reference trace
Using a naive approach to recover the memory reference trace from a DEP
Recovering Memory References
Tradeoff
Scenario 1: complete memory reference profile {pc, addr, size, type}
Scenario 2: DEP collected by Adept
Scenario 2 takes almost three times the native execution time