Memory Behavior of Data Structures
Kartik K. Agaram,
Stephen W. Keckler, Calvin Lin, Kathryn McKinley
Department of Computer Sciences
The University of Texas at Austin
Memory hierarchy trends
• Growing latency to main memory
• Growing cache complexity
– More cache levels
– New mechanisms, optimizations
• Growing application complexity
– Lots of abstraction
Hard to predict how an application will perform on a specific system
Application understanding is hard
• Observations can generate Gigabytes of data
– Aggregation is necessary
• Current metrics are too lossy
– Different application behaviors → similar miss rates
• New metrics needed, richer but still concise
Our approach: data structure decomposition
Why decompose by data structure?
• Irregular app = multiple regular data structures
– while (tmp) tmp=tmp->next;
• Data structures are high-level
– Results easy to visualize
– Can be correlated back to application source code
Outline
• Data structure decomposition using DTrack
– Automatic instrumentation + timing simulation
• Methodology
– Tools, configurations simulated, benchmarks
studied
• Results
– Data structures causing the most misses
– Different types of access patterns
– Case study: data structure criticality
Conventional simulation methodology
• Simulated application shares resources with simulator
  – disk, file system, network
• …but is not aware of it
[Diagram: Application on top of Simulator, both running on the Host Processor and sharing its Resources]
A different perspective
[Diagram: Application communicating with both the Simulator and the Resources]
Application can communicate with simulator
Leave core application oblivious; automatically add simulator-aware instrumentation
DTrack
[Toolflow: Application Sources → Source Translator → Instrumented Sources → Compiler → Application Executable → Simulator → Detailed Statistics]
- DTrack’s protocol for application-simulator communication
DTrack’s protocol
1. Application stores mapping at a predetermined shared location
   – (start address, end address) → variable name
2. Application signals simulator somehow
   • We enhance ISA with new opcode
   • Other techniques possible
3. Simulator detects signal, reads shared location
→ Simulator now knows variable names of address regions
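The shared mapping in step 1 can be pictured as a small table of address ranges and names; a minimal sketch in C, where the struct layout, table size, and all identifiers are illustrative assumptions rather than DTrack's actual format:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record for one (start address, end address) -> name
   mapping, stored at a location the simulator also knows about. */
typedef struct {
    uintptr_t start;     /* first byte of the variable          */
    uintptr_t end;       /* one past the last byte              */
    char      name[32];  /* source-level variable name          */
} dtrack_entry;

#define DTRACK_MAX_ENTRIES 1024
static dtrack_entry dtrack_map[DTRACK_MAX_ENTRIES];
static int dtrack_count;

/* Record one mapping; afterwards the application would signal the
   simulator (e.g. via the overloaded opcode). */
static void dtrack_record(const void *addr, size_t size, const char *name) {
    if (dtrack_count < DTRACK_MAX_ENTRIES) {
        dtrack_entry *e = &dtrack_map[dtrack_count++];
        e->start = (uintptr_t)addr;
        e->end   = (uintptr_t)addr + size;
        strncpy(e->name, name, sizeof e->name - 1);
        e->name[sizeof e->name - 1] = '\0';
    }
}
```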
DTrack instrumentation
Global variables: just after initialization
Before:
int globalTime;
int main() {
  …
}
After:
int globalTime;
int main() {
  print(FILE, "globalTime", &globalTime, sizeof(globalTime));
  …
  asm("mop");
}
DTrack instrumentation
Heap variables: just after allocation
Before:
x = malloc(4);
After:
x = malloc(4);
DTRACK_PTR = x;
DTRACK_NAME = "x";
DTRACK_SIZE = 4;
asm("mop");
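The heap instrumentation above could be packaged as a wrapper macro; a sketch, assuming DTRACK_PTR, DTRACK_NAME, and DTRACK_SIZE are globals at a location known to the simulator (the macro name is hypothetical, and the signaling opcode is elided so the code also runs natively):

```c
#include <stdlib.h>

/* Hypothetical DTrack-style heap instrumentation: allocate, then
   publish the pointer, the variable's source name, and its size. */
static void *DTRACK_PTR;
static const char *DTRACK_NAME;
static size_t DTRACK_SIZE;

#define DTRACK_MALLOC(var, size)          \
    do {                                  \
        (var) = malloc(size);             \
        DTRACK_PTR  = (var);              \
        DTRACK_NAME = #var;               \
        DTRACK_SIZE = (size);             \
        /* asm("mop"); -- signal opcode, omitted for native runs */ \
    } while (0)
```

Since the macro stringizes the variable name, each call site publishes its own identifier without extra bookkeeping.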
Design decisions
• Source-based rather than binary-based
translation
• Local variables – no instrumentation
– Instrumenting every call/return is too much
overhead
– Doesn’t cause many cache misses anyway
– Dynamic allocation on the stack: handle alloca
just like malloc
• Signalling opcode: overload an existing one
– avoid modifying compiler, allow running natively
Instrumentation can perturb app behavior
• Minimizing perturbance
  – Global variables are easy
    • One-time cost
  – Heap variables are harder
    • Instruction count overhead <4%
    • DTRACK_PTR, etc. always hit in the cache even with frequent malloc
• Measuring perturbance
  – Communicate specific start and end points in application to simulator
  – Compare instruction counts between them with and without instrumentation
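The comparison in the last step reduces to a simple overhead calculation over the two instruction counts; a trivial sketch (the helper name is illustrative):

```c
#include <stdint.h>

/* Percentage overhead of instrumentation, from instruction counts
   measured between the same start/end markers in the instrumented
   and uninstrumented runs. */
static double overhead_pct(uint64_t with_instr, uint64_t without_instr) {
    return 100.0 * ((double)with_instr - (double)without_instr)
                 / (double)without_instr;
}
```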
Outline
• Data structure decomposition using DTrack
– Automatic instrumentation + timing simulation
• Methodology
– Tools, configurations simulated, benchmarks
studied
• Results
– Data structures causing the most misses
– Different types of access patterns
– Case study: data structure criticality
Methodology
• Source translator: C-Breeze
• Compiler: Alpha GEM cc
• Simulator: sim-alpha
– Validated model of 21264 pipeline
• Simulated machine: Alpha 21264
– 4-way issue, 64KB 3-cycle DL1
• Benchmarks: 12 C applications from SPEC
CPU2000 suite
Major data structures by DL1 misses
[Bar chart: percentage of DL1 misses (0–90%) attributed to the major data structures of each benchmark, including 164.gzip, 175.vpr, 176.gcc, 177.mesa, 179.art, 181.mcf, 183.equake, 188.ammp, 197.parser, 256.bzip2, and 300.twolf]
Code + Data profile = Access pattern
Large variety in access patterns:
art:
  f1[i], i=i+1
  bu[i], i=i+1
mcf:
  node->child, node->parent, node->sibling, node->siblingp
  node[i]
twolf:
  i=i+1
  t1 = b[c[i]->cblock]
  t2 = t1->tileterm
  t3 = n[t2->net]
  …
  node = DFS(node)
  i=rand()
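The two extremes of these patterns can be written out as C stand-ins; the types, field names, and functions below are hypothetical illustrations, not code from the benchmarks:

```c
/* Streaming (art-style): sequential array walk, f1[i] with i=i+1 */
static double stream_sum(const double *f1, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += f1[i];
    return s;
}

/* Pointer chasing (mcf-style): follow child links, node = node->child */
struct node { struct node *child, *parent, *sibling; };

static int chase_depth(const struct node *node) {
    int depth = 0;
    while (node) {
        depth++;
        node = node->child;
    }
    return depth;
}
```

The streaming walk is predictable and prefetch-friendly; the chase serializes each access behind the previous load, which is why the same miss count can cost very different stall time.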
Most misses ≣ Most pipeline stalls?
• Process:
– Detect stall cycles when no instructions were
committed
– Assign blame to data structure of oldest instruction
in pipeline
• Result
– Stall cycle ranks track miss count ranks
– Exceptions:
• tds in 179.art
• search in 186.crafty
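The blame step above can be sketched as a lookup from an address into the published (start, end) → name ranges; everything here (names, linear scan, parallel arrays) is an illustrative assumption, not the simulator's actual implementation:

```c
#include <stddef.h>
#include <stdint.h>

/* Map the address touched by the oldest in-flight instruction back to
   a data-structure name. Linear scan kept for clarity; a simulator
   would likely use a sorted structure. */
static const char *blame(uintptr_t addr,
                         const uintptr_t *starts, const uintptr_t *ends,
                         const char *const *names, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (addr >= starts[i] && addr < ends[i])
            return names[i];
    return "unknown";   /* e.g. stack or unmapped address */
}
```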
Summary
• Toolchain for mapping addresses to high-level data structures
  – Communicating information to simulator
• Reveals new patterns about applications
  – Applications show wide variety of distributions
  – Within an application, data structures have a variety of access patterns
  – Misses not correlated to accesses or footprint
  – …but they correlate well with data structure criticality