
DEP: Detailed Execution Profile
Qin Zhao(1), Joon Edward Sim(2), Weng-Fai Wong(1,2)
(1) Singapore-MIT Alliance
(2) Department of Computer Science, National University of Singapore
{zhaoqin,esim,wongwf}@comp.nus.edu.sg
Larry Rudolph
Singapore-MIT Alliance
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology
[email protected]
Chine-Cheng Wu
PAS Lab, CSIE, NTU
Introduction
- Previous profiling work requires a large amount of memory and incurs a large slowdown
- DEP (Detailed Execution Profile) captures the complete dynamic control flow, data dependencies, and memory references at the same time
- The profile size is significantly reduced
- DEP is collected with the DynamoRIO binary instrumentation framework, in an infrastructure called Adept (A Dynamic Execution Profiling Tool)
DEP Advantages
- DEP gives complete coverage of the program, including shared libraries
- Each thread of a multi-threaded application can be collected into an independent DEP
- Collection is efficient, incurring a 5x slowdown
- The profile contains both memory reference and control flow information
Control Flow Profile: DEPc
- Traditional approach: record each basic block entry using 4 bytes
- DEP uses 2 bytes for each entry, plus an extra 2 bytes when needed
  - H-tag: the high 2 bytes of the address
  - L-tag: the low 2 bytes of the address
- This compressibility alone does not guarantee space savings
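The slides describe the H-tag and L-tag but not how a decoder tells one from the other. The sketch below assumes a reserved escape word 0x0000 placed in front of each H-tag (the "If the L-tag is 0x0000" slide suggests that value is special, but this exact mechanism is our assumption), so a change of the high half costs 4 extra bytes here rather than the 2 the slide mentions. `encode_depc`, `decode_depc`, and `ESCAPE` are hypothetical names.

```python
# Hypothetical escape word: the slides do not say how H-tags are marked.
ESCAPE = 0x0000


def encode_depc(block_addrs):
    """Encode 32-bit basic-block entry addresses as 16-bit words."""
    words = []
    cur_high = None
    for addr in block_addrs:
        high, low = addr >> 16, addr & 0xFFFF
        # Emit ESCAPE + H-tag when the high half changes, or when the L-tag
        # equals ESCAPE and would otherwise be misread by the decoder.
        if high != cur_high or low == ESCAPE:
            words.extend([ESCAPE, high])
            cur_high = high
        words.append(low)  # L-tag
    return words


def decode_depc(words):
    """Invert encode_depc, rebuilding the full 32-bit addresses."""
    addrs = []
    cur_high = 0
    it = iter(words)
    for w in it:
        if w == ESCAPE:
            cur_high = next(it)  # H-tag
            w = next(it)         # the L-tag that always follows it
        addrs.append((cur_high << 16) | w)
    return addrs
```

For the trace `[0x0804FFA4, 0x0804FF00, 0x08050000]` this produces 7 words (14 bytes) against 12 bytes for plain 4-byte entries, one way the scheme can fail to save space; a loop that stays inside one 64 KB region shrinks to roughly half its 4-byte size.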
Memory Reference Profile: DEPm
- A memory reference is recorded as {pc, addr, size, type}
  - pc: PC of the memory reference instruction
  - addr: address of the memory reference
  - size: size of the data being accessed
  - type: whether it is a read or a write
- Storing only the necessary values that
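To make the size difference concrete, the sketch below packs each reference as a full {pc, addr, size, type} record and, alternatively, as its address alone, on the assumption that pc, size, and type are fixed for a given static instruction and can be recovered from the code. The field widths (4-byte pc and addr, 1-byte size and type) and the function names are hypothetical choices for illustration.

```python
import struct

# Hypothetical field widths for a full record: 4-byte pc, 4-byte addr,
# 1-byte size, 1-byte type ("<" means little-endian with no padding).
FULL_RECORD = struct.Struct("<IIBB")
ADDR_ONLY = struct.Struct("<I")
READ, WRITE = 0, 1


def pack_full(refs):
    """Store every field of each (pc, addr, size, type) reference."""
    return b"".join(FULL_RECORD.pack(pc, addr, size, kind)
                    for pc, addr, size, kind in refs)


def pack_addr_only(refs):
    """Store only the dynamic address; pc, size and type come from the code."""
    return b"".join(ADDR_ONLY.pack(addr) for _, addr, _, _ in refs)
```

With these widths a full record takes 10 bytes and an address-only record 4 bytes, so three references shrink from 30 to 12 bytes.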
Memory Reference
- There are three memory references in the following code:
  - push ebp
  - mov 0 -> [esp+4]
  - mov 0 -> [esp+8]
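A minimal sketch of why one recorded value can cover all three references, assuming the profiler records esp once at block entry and knows each instruction's base register, displacement, and any fixed esp adjustment from the code. In 32-bit IA-32 code, `push ebp` subtracts 4 from esp and then stores at the new [esp]. The `BLOCK` table and `recover_addresses` are hypothetical, not Adept's data structures.

```python
# Static facts about each memory instruction, taken from the code itself:
# (text, base register, displacement, esp adjustment applied first, size, type)
BLOCK = [
    ("push ebp",         "esp", 0, -4, 4, "write"),  # esp -= 4, store at [esp]
    ("mov 0 -> [esp+4]", "esp", 4,  0, 4, "write"),
    ("mov 0 -> [esp+8]", "esp", 8,  0, 4, "write"),
]


def recover_addresses(recorded_esp):
    """Rebuild every {pc-ish text, addr, size, type} from one recorded esp."""
    regs = {"esp": recorded_esp}
    trace = []
    for text, base, disp, adjust, size, kind in BLOCK:
        regs[base] = (regs[base] + adjust) & 0xFFFFFFFF   # 32-bit wraparound
        addr = (regs[base] + disp) & 0xFFFFFFFF
        trace.append((text, addr, size, kind))
    return trace
```

Recording esp = 0x1000 yields the addresses 0x0FFC, 0x1000, and 0x1004: one 4-byte value instead of three.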
BB_pc+Mem_addr Compared with DEP
- DEP triggers fewer analyzer calls than BB_pc+Mem_addr, because its smaller profile data overflows the buffer (which is what signals the analyzer) less often
- The penalty includes:
  - Stealing and restoring registers
  - Address calculation
  - Storage of the address
  - Updating the profile counter
- Extra overhead for DEP:
  - Checking for H-tag changes
  - Checking and updating register status
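The link between profile size and analyzer calls is a ceiling division: each buffer overflow hands one buffer's worth of data to the analyzer. The sketch uses illustrative sizes only (4 bytes per basic-block entry for BB_pc versus 2 for a DEPc L-tag, ignoring H-tags, and a 1 MB buffer); `analyzer_calls` is a hypothetical helper, and the numbers are not measurements from the talk.

```python
def analyzer_calls(num_events, bytes_per_event, buffer_bytes):
    """How many buffer-full signals it takes to drain a profile of this size."""
    total = num_events * bytes_per_event
    return (total + buffer_bytes - 1) // buffer_bytes  # ceiling division
```

For one million basic-block entries and a 1 MB buffer, 4-byte entries need 4 analyzer calls while 2-byte entries need 2, which is why the per-call penalty is paid less often for DEP.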
DynamoRIO
- Runs on IA-32 under both Linux and Windows
- DynamoRIO executes an application by copying user code into a code cache and then executing it there
- The cached code is the same as the original, except that control-transfer operations return to DynamoRIO
- A trace cache holds code for indirect branch lookup
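As a toy model of the execution scheme above (not DynamoRIO's actual API), the sketch treats each basic block as a Python function, copies a block into a dictionary standing in for the code cache the first time it is reached, and sends every block exit back to a dispatcher. `run_with_code_cache` and the loop program are hypothetical. Real DynamoRIO additionally links cached blocks directly to one another, so most exits do not return to the dispatcher.

```python
def run_with_code_cache(program, entry, state):
    """Toy dispatcher: program maps a label to a block function that updates
    state and returns the next label (None to stop)."""
    code_cache = {}
    translations = 0
    dispatches = 0
    label = entry
    while label is not None:
        dispatches += 1  # every block exit comes back to the dispatcher
        block = code_cache.get(label)
        if block is None:
            # Stand-in for copying (and instrumenting) the block into the cache.
            block = program[label]
            code_cache[label] = block
            translations += 1
        label = block(state)
    return translations, dispatches


def make_loop_program():
    """A three-iteration loop split into hypothetical blocks head/body/exit."""
    def head(state):
        state["i"] += 1
        return "body" if state["i"] <= 3 else "exit"

    def body(state):
        state["sum"] += state["i"]
        return "head"

    def exit_block(state):
        return None

    return {"head": head, "body": body, "exit": exit_block}
```

Running the loop from `"head"` with `{"i": 0, "sum": 0}` copies only 3 blocks into the cache but dispatches 8 times, which is why translation cost is amortized while dispatch cost is not.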
Adept: A Dynamic Execution Profiling Tool
Control Flow: Obtaining DEPc
If the L-tag is 0x0000
Memory References: Obtaining DEPm
- Each register variable has two states: UPDATED and RECORDED
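One plausible reading of the two register states, sketched under our own assumptions: a register whose value the profiler does not know (at block entry, or after an instruction overwrites it with an unknown value) is UPDATED; the first memory reference that uses it as a base records its value and flips it to RECORDED; later references through the same register add no profile data, because their addresses follow from the recorded value plus a static displacement. Statically known adjustments such as `push` changing esp are not treated as overwrites. The instruction encoding and `plan_records` are hypothetical.

```python
UPDATED, RECORDED = "UPDATED", "RECORDED"


def plan_records(instrs):
    """Return indices of memory references whose base register must be recorded.

    instrs: list of ("def", reg) for an instruction that overwrites reg with an
    unknown value, or ("mem", reg) for a memory reference addressed via reg.
    """
    status = {}
    to_record = []
    for i, (op, reg) in enumerate(instrs):
        if op == "def":
            status[reg] = UPDATED
        elif op == "mem":
            # Registers not seen yet in this block count as UPDATED (unknown).
            if status.get(reg, UPDATED) == UPDATED:
                to_record.append(i)
                status[reg] = RECORDED
        else:
            raise ValueError(f"unknown op {op!r}")
    return to_record
```

For the sequence mem esp, mem esp, def eax, mem eax, mem esp, def esp, mem esp, only references 0, 3, and 6 emit a register value.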
Profile Buffer
- Stores the collected profile for later analysis
- One buffer per thread
- A larger buffer reduces the number of analyzer invocations
- The buffer is split into two parts, one for DEPc and one for DEPm
- Allocating 20% to DEPc and 80% to DEPm works well
- The analyzer is triggered when the buffer is full, using the OS signal for a page segmentation fault
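A minimal sketch of a per-thread profile buffer with the 20/80 split from the slide. Adept detects a full buffer through a segmentation-fault signal; here an explicit capacity check stands in for that signal. Flushing both parts when either fills is our assumption, and `ProfileBuffer` and the analyzer callback are hypothetical names.

```python
class ProfileBuffer:
    """Per-thread buffer split into a DEPc part and a DEPm part."""

    def __init__(self, total_bytes, analyzer, depc_percent=20):
        self.depc_cap = total_bytes * depc_percent // 100
        self.depm_cap = total_bytes - self.depc_cap
        self.analyzer = analyzer          # called as analyzer(depc, depm)
        self.depc = bytearray()
        self.depm = bytearray()
        self.analyzer_calls = 0

    def _flush(self):
        # Stand-in for the segmentation-fault signal raised when a write
        # runs past the end of the buffer.
        self.analyzer(bytes(self.depc), bytes(self.depm))
        self.analyzer_calls += 1
        self.depc.clear()
        self.depm.clear()

    def add_depc(self, data):
        if len(self.depc) + len(data) > self.depc_cap:
            self._flush()
        self.depc += data

    def add_depm(self, data):
        if len(self.depm) + len(data) > self.depm_cap:
            self._flush()
        self.depm += data
```

With a 100-byte buffer the DEPc part holds 20 bytes, so writing twelve 2-byte L-tags triggers one analyzer call and leaves 4 bytes pending.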
Optimizing DEPc
Basic block 0x0804ffa4 branches to 0x08050000
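Splitting the two addresses in this example into their tags shows what makes it interesting: the branch crosses a 64 KB boundary, so the high half changes, and the target's L-tag is exactly 0x0000. The slides do not show the optimization itself, so this sketch only illustrates the tag arithmetic; `tags` is a hypothetical helper.

```python
def tags(addr):
    """Split a 32-bit address into its (H-tag, L-tag) halves."""
    return addr >> 16, addr & 0xFFFF


src_h, src_l = tags(0x0804FFA4)   # source block
dst_h, dst_l = tags(0x08050000)   # branch target
```

Here `src_h` is 0x0804 and `dst_h` is 0x0805, so a new H-tag is required for the target, and `dst_l` is 0x0000.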
Optimizing DEPm
Optimized
Evaluation
- Platform: dual-core 3.2 GHz Intel Pentium D 840, 2 GB of RAM
- OS: Linux Fedora Core 4 and Windows XP SP2
- Benchmarks: SPEC CPU2000 integer benchmarks on Linux; SysMark 2004SE on Windows (running Access, PowerPoint, and Word)
- Compiler: gcc with the -O3 flag
Execution Time
[Figure: relative slowdown]
Profile Frameworks
- Pin
  - Counts the number of basic blocks executed
  - Counts the number of memory references
- Valgrind
  - Cachegrind is a cache profiler that captures basic block counts and memory reference counts
- eWPP (Extended Whole Program Paths)
  - Records control flow and dependence information
  - Uses a two-phase profiling approach
    - First phase: identify all memory dependences
    - Second phase: collection
Profile Size and Compressibility
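[Figure: profile size and compressibility; control flow profiles normalized by uncompressed BB_pc size, memory reference profiles normalized by uncompressed Mem_addr size]
* CF_bit uses bits, plus 4-byte target addresses for indirect branches; CF_bit does not compress well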
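The footnote only says that CF_bit uses bits plus 4-byte targets for indirect branches. Assuming one bit per conditional-branch outcome and a full 32-bit target per indirect branch, its raw size can be estimated as below; `cf_bit_size` and the event format are hypothetical.

```python
def cf_bit_size(events):
    """Estimate the raw CF_bit profile size in bytes.

    events: sequence of ("cond", taken) or ("indirect", target_addr).
    Assumes 1 bit per conditional outcome and 32 bits per indirect target.
    """
    bits = 0
    for kind, _ in events:
        if kind == "cond":
            bits += 1
        elif kind == "indirect":
            bits += 32
        else:
            raise ValueError(f"unknown event kind {kind!r}")
    return (bits + 7) // 8  # round up to whole bytes
```

Sixteen conditional branches and one indirect branch take 48 bits, or 6 bytes.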
Related Work
- Whole Execution Traces (WET)
  - Simulation environment
- Extended Whole Program Paths (eWPP)
  - Encodes trace information in WPP
- Whole Program Paths (WPP)
- These approaches have difficulty supporting multi-threaded applications
Conclusion
- DEP captures the major aspects of program execution
  - Control flow and memory references
- DEP is collected by Adept, which can perform online or offline analysis
- Adept builds the mapping between the collected information and the original application
- Experimental results show a 5x slowdown and 40% space savings compared to traditional profiles
- A complete trace recovering the whole program execution is not necessary; a particular segment can be reproduced for simulation or replay
Back-up Slides
Recovering memory reference trace
- Using a naïve approach to recover the memory reference trace from a DEP
Recovering Memory References
Tradeoff
- Scenario 1: complete memory reference profile {pc, addr, size, type}
- Scenario 2: DEP collected by Adept
- Scenario 2 takes almost triple the native execution time