Analytic Models and Empirical Search: A Hybrid Approach to Code

Download Report

Transcript Analytic Models and Empirical Search: A Hybrid Approach to Code

Analytic Models and Empirical
Search: A Hybrid Approach to
Code Optimization
A. Epshteyn1, M. Garzaran1, G. DeJong1,
D. Padua1, G. Ren1, X. Li1,
K. Yotov2, K. Pingali2
1 University
of Illinois at Urbana-Champaign
2 Cornell University
Two approaches to code
optimization:
• Empirical Search
• Models
– E.g., execute and
– E.g., calculate the
measure different
best tile size for MM
versions of MM code
as a function of cache
with different tile sizes.
size.
– Slow
– Fast
– Accurate because of
– May be inaccurate
feedback
– No verification
through feedback
Hybrid Approach
• Faster than empirical search
• More accurate than the model
– Use the model as a prior
– Use active sampling to minimize the amount
of searching
Why is Speed Important?
• Adaptation may have to be applied at
runtime, where running time is critical.
• Adaptation may have to be applied at
compile time (e.g., with feedback from a
fast simulator)
• Library routines can be used as a
benchmark to evaluate alternative
machine designs.
Problem: Matrix Multiplication
• Tiling
– Improves the locality of references
• Cache Blocking (NB): Matrix is decomposed into smaller
subblocks of size NBxNB
• Matrix multiplication - illustrative example for testing the
hybrid approach
• Ultimate goal: a learning compiler that specializes itself to
its installation environment, user profile, etc.
Empirical Search: ATLAS
• Try tiling parameters NB in the range
16... min( 80, L1 cache size ) in steps of 4
Model (Yotov et. al.)
• Compute NB which optimizes the use of the L1 cache.
• Constructed by analyzing the memory access trace of the
matrix multiplication code.
• Formula:
max NB
 NB 2



NB
L1 size
such that 

3
*

1




L
1
line
size
L
1
line
size
L1 line size




• Has been extended to optimize the use of the L2 cache
Model in action:
• Performance curve:
• Vertical lines:
model-predicted L1
and L2 blocking
factors
• Whether to tile for
the L1 or the L2
cache depends on
the architecture
and the application
Hybrid approach
• Model performance
with a family of
regression curves
• Regression (nonparam)
– minimizing the
average error
• Regression (ML)
– Distribution over
regression curves
– Pick the most likely
curve
Regression (Bayesian)
• Prior distribution P(curve) over regression
curves
– Make regression curves with model-predicted maxima
more likely
• Posterior distribution given the data (Bayes rule):
– P(curve|data)=P(data|curve) P(curve)/P(data)
• Pick the maximum a-posteriori curve
– Picks curves with peaks in model-predicted locations
when the data sample is small
– Picks curves which fit the data best when the sample
is large
Active sampling
•
Objectives:
1) Sample at lower-tile sizes – takes less time
2) Explore – don’t oversample in the same
region
3) Get information about the dominant peak
Solution: Potential Fields
objectives 1,2
• Positive charge at
the origin
• Negative charges
at previously
sampled points
• Sample at the point
which minimizes
the field
Potential Fields
objective 3
• Positive charge in the region of the dominant
peak
• How do we know which peak dominates:
– Distribution over regression curves

• can compute:
P(peak1 is located at x), P(peak2 is located at x),
P(peak1 is of height h), P(peak2 is of height h)
• Hence, can compute P(peak1 dominates peak2)
• Impose a positive charge in the region of each peak
proportional to its probability of domination
Results I – Regression Curves
Results II – Time, Performance
Time (mins)
Performance (MFLOPS)
Model
Hybrid
ATLAS
Sparc
376.66
851.04
832.63
SGI
499.81
553.15
505.4
Model
Hybrid
ATLAS
Sparc
0:00
3:12
8:59
SGI
0:00
14:02
59:00
• Sparc – actual improvement due to the hybrid search for NB: ~10%
• SGI – improvement over both the model and ATLAS due to choosing to
tile for the L2 cache
Results III – Library Performance
Conclusion
• Approach: incorporates the prior.
• Active sampling: actively picks to sample
in the most informative region.
• Decreases the search time of the empirical
search, improves on the model’s
performance.