Coding highly parallel instructions on TigerSHARC
Generation of highly parallel
code for TigerSHARC
processors
An introduction
Background assumed
Familiarity with TigerSHARC
architecture
Familiarity with TigerSHARC
programmer’s model for registers
Some assembly experience
An interest in beating the compiler
in those special cases when you
need the last drop of blood out of
the CPU :-)
2 / 45 + B14
Introduction
to highly parallel TigerSHARC code Copyright M. Smith and S. Lei Contact [email protected]
4/8/2016
To be tackled
What’s causing the problem
– General limitations of instruction sets
How to recognize when you might
be coming up against TigerSHARC
architecture limitations
A process for optimizing the
TigerSHARC parallelism
– Example -- Temperature conversion
– Bonus if time permits
-- Average and instantaneous power
When are DSP instructions valid?
You are going to customize
– When can you use the DSP instructions?
– Most -- From Monday to Friday
– Some -- Only between 9:00 a.m. and 9:00 p.m.
Check against architecture
MIMD -- Parallel ops MUST be able to do this
– Can it be fetched in one cycle (1 instruction line)?
– Can it be executed in one cycle (resource question)?
– Can it execute without conflicting with other instructions?
– Then PROBABLY legal
HOWEVER -- The designers had the final decision
and you have to live by that decision!
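The first two questions above can be sketched as a simple resource count. This is only an illustration, not a statement of the real assembler rules: the struct `InstructionLine`, the function `line_probably_legal`, and the limit of 4 instructions per line are assumptions, and the per-operation limits are taken from the "Under best conditions" slide for a single compute block. The third question (conflicts between instructions) is not modelled at all.

```c
/* Hypothetical description of one instruction line (illustration only). */
struct InstructionLine {
    int instruction_count;  /* instructions placed on this line */
    int memory_accesses;    /* data memory reads or writes */
    int adds;               /* add compute operations, one compute block */
    int multiplies;         /* multiply compute operations, one compute block */
};

/* Assumed limits; check the processor documentation before relying on them. */
enum {
    MAX_INSTRUCTIONS    = 4,
    MAX_MEMORY_ACCESSES = 2,
    MAX_ADDS            = 1,
    MAX_MULTIPLIES      = 1
};

/* Returns 1 if the line passes the fetch and resource questions, 0 otherwise.
   A result of 1 still only means "PROBABLY legal". */
int line_probably_legal(const struct InstructionLine *line)
{
    if (line->instruction_count > MAX_INSTRUCTIONS) return 0;  /* fetch question */
    if (line->memory_accesses > MAX_MEMORY_ACCESSES) return 0; /* resource question */
    if (line->adds > MAX_ADDS) return 0;
    if (line->multiplies > MAX_MULTIPLIES) return 0;
    return 1;
}
```

Even when such a check passes, the assembler has the final word, as the slide says.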
Under best conditions
If the instruction is described the right way
– 2 data memory accesses (in or out), each with a
REQUIRED post-modification operation,
possibly with a modify register containing the
value 0
– 1 add compute operation on data registers
– 1 multiply compute operation on data
registers
– Ability to redo code using both X and Y
– Sometimes – audio for example – do left
channel in X and right channel in Y
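The left-channel-in-X, right-channel-in-Y idea can be pictured in plain "C" as two independent lanes doing the same work. This is a host-side sketch only: the function `ScaleStereo`, its parameters, and the variables `x_lane` and `y_lane` are hypothetical names standing in for registers in the X and Y compute blocks, and nothing here shows how the compiler or assembler would actually schedule the operations.

```c
/* Illustrative model: the same multiply and add applied to two channels.
   x_lane and y_lane stand in for X and Y compute block registers. */
void ScaleStereo(const float *left_in, const float *right_in,
                 float *left_out, float *right_out,
                 int N, float gain, float offset)
{
    int count;
    for (count = 0; count < N; count++) {
        float x_lane = *left_in++;    /* memory access, pointer post-modified */
        float y_lane = *right_in++;   /* second memory access */
        x_lane = x_lane * gain;       /* multiply in X */
        y_lane = y_lane * gain;       /* same multiply in Y */
        x_lane = x_lane + offset;     /* add in X */
        y_lane = y_lane + offset;     /* same add in Y */
        *left_out++ = x_lane;         /* store left result */
        *right_out++ = y_lane;        /* store right result */
    }
}
```

Because the two lanes never read each other's values, each X operation and its Y twin are candidates for sharing an instruction line.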
Introduction to PPPPIC
Professor’s Personal Process for
Parallel Instruction Coding
Basic code development -- any system
Write the “C” code for the function
void Convert(float *temperature,
float *result, int N)
which converts an array of temperatures
measured in “Celsius” (Canadian Market)
to “Fahrenheit” (Tourist Trade)
Convert the code to TigerSHARC assembly
code, following the standard coding and
documentation practices, or just use the
compiler to do the job for you
Standard “C” code
void Convert(float *temperature, float *result, int N)
{
    int count;
    for (count = 0; count < N; count++) {
        *result = (*temperature) * 9 / 5 + 32;
        temperature++;
        result++;
    }
}
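Before moving to assembly it can help to check the "C" version on a host compiler against a few known temperatures. A minimal sketch follows: `Convert` and its signature come from the slide, while the helper `convert_matches_reference`, its 16-element scratch buffer, and the tolerance parameter are assumptions added for illustration.

```c
/* Reference version from the slide: Celsius to Fahrenheit. */
void Convert(float *temperature, float *result, int N)
{
    int count;
    for (count = 0; count < N; count++) {
        *result = (*temperature) * 9 / 5 + 32;
        temperature++;
        result++;
    }
}

/* Hypothetical helper: returns 1 if every converted value lies within
   tol of the expected Fahrenheit value, 0 otherwise (N limited to 16). */
int convert_matches_reference(float *celsius, const float *expected,
                              int N, float tol)
{
    float out[16];
    int i;
    if (N < 0 || N > 16) return 0;
    Convert(celsius, out, N);
    for (i = 0; i < N; i++) {
        float diff = out[i] - expected[i];
        if (diff < 0) diff = -diff;   /* absolute difference */
        if (diff > tol) return 0;
    }
    return 1;
}
```

The same reference values can later be reused to check a hand-written assembly version against the compiler's output.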
Process for developing custom code
Rewrite the “C” code using “LOAD/STORE”
techniques -- TigerSHARC is essentially superscalar RISC
Write the assembly code using a hardware loop
– Check that end of loop label is in the correct place
REWRITE the assembly code using registers
and instructions that COULD be used in parallel
IF you could find the correct optimization
approach
Move algorithm to “Resource Usage Chart”
Optimize (Attempt to)
Compare and contrast time -- include set up and
loop control time -- was it worth the effort?
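The last step, deciding whether the effort paid off, comes down to simple arithmetic once cycle counts have been measured. The sketch below assumes a cost model of a fixed set-up cost plus a per-iteration cost. The function names `total_cycles` and `optimized_is_faster` are hypothetical, and the cycle counts passed in would have to come from your own measurements.

```c
/* Assumed cost model: fixed set-up cost plus a constant cost per loop pass. */
long total_cycles(long setup_cycles, long cycles_per_iteration, long N)
{
    return setup_cycles + cycles_per_iteration * N;
}

/* Returns 1 if the optimized version needs fewer cycles than the original
   for an array of N points, 0 otherwise. */
int optimized_is_faster(long original_setup, long original_per_iteration,
                        long optimized_setup, long optimized_per_iteration,
                        long N)
{
    long original  = total_cycles(original_setup, original_per_iteration, N);
    long optimized = total_cycles(optimized_setup, optimized_per_iteration, N);
    return optimized < original;
}
```

A heavily parallel loop often pays a larger set-up cost, so for small N the original version may still be faster.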
TigerSHARC-style
load/store “C” code
void Convert(register float *temperature,
             register float *answer, register int N) {
    register int count;
    register float scratch;
    for (count = 0; count < N; count++) {
        scratch = *temperature;             /* load */
        scratch = scratch * (9.0f / 5.0f);  /* multiply; integer 9 / 5 would be 1 */
        scratch = scratch + 32;             /* add */
        *answer = scratch;                  /* store */
        temperature++;                      /* post-modify the pointers */
        answer++;
    }
}
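One trap when splitting the expression into separate load/store steps is the scale constant. In "C", `9 / 5` with two integer operands is integer division and evaluates to 1, so the constant has to be written in floating point to get 1.8. The two functions below are hypothetical names used only to show the difference.

```c
/* Integer constants: 9 / 5 is evaluated as integer division and gives 1,
   so this computes celsius * 1 + 32. */
float scale_integer_constant(float celsius)
{
    return celsius * (9 / 5) + 32;
}

/* Floating-point constants: 9.0f / 5.0f gives approximately 1.8. */
float scale_float_constant(float celsius)
{
    return celsius * (9.0f / 5.0f) + 32.0f;
}
```

For an input of 100 degrees Celsius the first version returns 132 rather than the expected 212. The original one-line form, `(*temperature) * 9 / 5 + 32`, does not have this problem because the float operand is multiplied first.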