Coding highly parallel instructions on TigerSHARC


Generation of highly parallel
code for TigerSHARC
processors
An introduction
Background assumed
Familiarity with TigerSHARC
architecture
Familiarity with TigerSHARC
programmer’s model for registers
Some assembly experience
An interest in beating the compiler
in those special cases when you
need the last drop of blood out of
the CPU :-)
Introduction
to highly parallel TigerSHARC code Copyright M. Smith and S. Lei Contact [email protected]
4/8/2016
To be tackled
What’s causing the problem
– General limitations of instruction sets
How to recognize when you might
be coming up against TigerSHARC
architecture limitations
A process for optimizing the
TigerSHARC parallelism
– Example -- Temperature conversion
– Bonus if time permits
-- Average and instantaneous power
When are DSP instructions valid?
You are going to customize
– When can you use the DSP instructions?
– Most -- From Monday to Friday
– Some Only between 9:00 a.m. and 9:00 p.m.
 Check against architecture
 MIMD -- Parallel ops MUST be able to do this
– Can it be fetched in one cycle (1 instruction line)?
– Can it be executed in one cycle (resource question)?
– Can it execute without conflicting with other instructions?
– Then PROBABLY legal
 HOWEVER -- The designers had the final decision
and you have to live by that decision!
Under best conditions
 If instruction described the right way
– 2 data memory accesses (in or out), each with a
REQUIRED post-modification operation,
possibly with a modify register containing the
value 0
– 1 add compute operation on data registers
– 1 multiply compute operation on data
registers
– Ability to repeat the same code in both the X and Y compute blocks
– Sometimes – audio for example – do left
channel in X and right channel in Y
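The per-cycle limits above can be turned into a quick lower bound on loop length. The following is a rough sketch only: the `LineCapacity` struct and the function names are made up for illustration, the capacities are the ones quoted on this slide, and instruction latencies and wider memory accesses are ignored.

```c
#include <assert.h>

/* Per-cycle capacity of one instruction line, taken from the slide.
   The struct and its field names are illustrative assumptions. */
typedef struct {
    int mem_accesses;   /* data memory accesses per cycle (slide: 2) */
    int adds;           /* add operations per cycle per compute block (slide: 1) */
    int multiplies;     /* multiply operations per cycle per compute block (slide: 1) */
    int compute_blocks; /* 1 = X only, 2 = X and Y together */
} LineCapacity;

/* Ceiling division for non-negative work and positive capacity. */
static int ceil_div(int work, int capacity)
{
    assert(work >= 0 && capacity > 0);
    return (work + capacity - 1) / capacity;
}

/* Lower bound on cycles for a batch of work:
   the busiest resource sets the floor. */
int min_cycles(LineCapacity cap, int mem_ops, int add_ops, int mul_ops)
{
    int mem = ceil_div(mem_ops, cap.mem_accesses);
    int add = ceil_div(add_ops, cap.adds * cap.compute_blocks);
    int mul = ceil_div(mul_ops, cap.multiplies * cap.compute_blocks);
    int worst = mem;

    if (add > worst) worst = add;
    if (mul > worst) worst = mul;
    return worst;
}
```

For example, four temperature conversions need 8 memory accesses (4 loads, 4 stores), 4 multiplies and 4 adds. With X and Y both in use, `min_cycles` gives 4 cycles, so under these assumptions the memory accesses, not the arithmetic, set the limit.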
Introduction to PPPPIC
Professor’s Personal Process for
Parallel Instruction Coding
Basic code development -- any system
Write the “C” code for the function
void Convert(float *temperature,
float *result, int N)
which converts an array of temperatures
measured in “Celsius” (Canadian Market)
to “Fahrenheit” (Tourist Trade)
 Convert the code to TigerSHARC assembly
code, following the standard coding and
documentation practices, or just use the
compiler to do the job for you
Standard “C” code
void Convert(float *temperature, float *result, int N)
{
    int count;
    for (count = 0; count < N; count++) {
        *result = (*temperature) * 9 / 5 + 32;  /* Celsius to Fahrenheit */
        temperature++;
        result++;
    }
}
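Before any optimizing starts, it helps to pin down what "correct" means so later versions can be compared against it. A minimal sketch follows; `ConvertSelfCheck`, `abs_diff` and the tolerance are assumptions added for illustration, while `Convert` repeats the slide's arithmetic.

```c
#include <assert.h>

/* Reference version, same arithmetic as the slide. */
void Convert(float *temperature, float *result, int N)
{
    int count;
    for (count = 0; count < N; count++) {
        *result = (*temperature) * 9 / 5 + 32;
        temperature++;
        result++;
    }
}

/* Absolute difference without pulling in the math library. */
static float abs_diff(float a, float b)
{
    return (a > b) ? (a - b) : (b - a);
}

/* Hypothetical self-check: three well-known Celsius/Fahrenheit pairs. */
void ConvertSelfCheck(void)
{
    float celsius[3]  = { 0.0f, 100.0f, -40.0f };
    float expected[3] = { 32.0f, 212.0f, -40.0f };
    float fahrenheit[3];
    int i;

    Convert(celsius, fahrenheit, 3);
    for (i = 0; i < 3; i++) {
        /* Tolerance is an arbitrary choice for this sketch. */
        assert(abs_diff(fahrenheit[i], expected[i]) < 1e-3f);
    }
}
```

The same check can be rerun unchanged against each hand-tuned version, which is a cheap way to catch an optimization that is fast but wrong.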
Process for developing custom code
 Rewrite the “C” code using “LOAD/STORE”
techniques -- TigerSHARC is essentially superscalar RISC
 Write the assembly code using a hardware loop
– Check that end of loop label is in the correct place
 REWRITE the assembly code using registers
and instructions that COULD be used in parallel
IF you could find the correct optimization
approach
 Move algorithm to “Resource Usage Chart”
 Optimize (Attempt to)
 Compare and contrast time -- include set up and
loop control time -- was it worth the effort?
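The last step, "was it worth the effort?", is simple arithmetic once setup and loop costs are counted. The sketch below is one way to frame it; the `LoopCost` struct, the function names and every cycle count in the usage note are hypothetical and not measurements from TigerSHARC hardware.

```c
#include <assert.h>

/* Cost model for one version of the loop (illustrative names). */
typedef struct {
    long setup_cycles;      /* one-time cost before the loop starts */
    long cycles_per_pass;   /* cycles for one trip round the loop */
    long elements_per_pass; /* array elements handled per trip */
} LoopCost;

/* Total cycles to process n elements, setup included. */
long total_cycles(LoopCost c, long n)
{
    long passes;

    assert(c.setup_cycles >= 0 && c.cycles_per_pass > 0);
    assert(c.elements_per_pass > 0 && n >= 0);
    passes = (n + c.elements_per_pass - 1) / c.elements_per_pass;
    return c.setup_cycles + passes * c.cycles_per_pass;
}

/* Smallest n (up to max_n) where the tuned version is strictly
   faster than the plain one; returns -1 if it never wins. */
long break_even_n(LoopCost plain, LoopCost tuned, long max_n)
{
    long n;

    for (n = 1; n <= max_n; n++) {
        if (total_cycles(tuned, n) < total_cycles(plain, n)) {
            return n;
        }
    }
    return -1;
}
```

With made-up costs of `{4, 6, 1}` for the plain loop and `{20, 4, 2}` for a tuned loop, the tuned version first wins at 5 elements. For short arrays the extra setup swamps the gain, which is the point of including setup and loop control in the comparison.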
TigerSHARC-style
load/store “C” code
void Convert(register float *temperature,
             register float *answer, register int N) {
    register int count;
    register float scratch;
    for (count = 0; count < N; count++) {
        scratch = *temperature;             /* load */
        scratch = scratch * (9.0f / 5.0f);  /* multiply; integer 9 / 5 would evaluate to 1 */
        scratch = scratch + 32;             /* add */
        *answer = scratch;                  /* store */
        temperature++;
        answer++;
    }
}
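One way to prepare the load/store version for parallel coding is to handle two elements per pass with separate scratch values, so that one stream could later be assigned to X registers and the other to Y. This is only a C-level sketch of that idea: the name `ConvertTwoLane` and the `x_`/`y_` naming are assumptions, and nothing here guarantees how a compiler or a hand-written instruction line would actually schedule the operations.

```c
#include <assert.h>

/* Hypothetical two-elements-per-pass rewrite of the load/store code. */
void ConvertTwoLane(const float *temperature, float *answer, int N)
{
    const float scale = 9.0f / 5.0f;  /* float constant, not integer 9 / 5 */
    const float offset = 32.0f;
    float x_scratch, y_scratch;       /* candidates for X and Y registers */
    int count;

    assert(N >= 0);

    /* Main loop: the x_ and y_ streams never depend on each other. */
    for (count = 0; count + 1 < N; count += 2) {
        x_scratch = temperature[count];      /* two loads */
        y_scratch = temperature[count + 1];
        x_scratch = x_scratch * scale;       /* two multiplies */
        y_scratch = y_scratch * scale;
        x_scratch = x_scratch + offset;      /* two adds */
        y_scratch = y_scratch + offset;
        answer[count] = x_scratch;           /* two stores */
        answer[count + 1] = y_scratch;
    }

    /* Leftover element when N is odd. */
    if (count < N) {
        answer[count] = temperature[count] * scale + offset;
    }
}
```

Written this way, each line of the loop body pairs operations that use different scratch values and the same kind of resource, which makes it easier to see what could share an instruction line when the code is moved to a resource usage chart.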