Load a Constant 1 of 4


Assignment

One of the simplest operations in C is to assign a
constant to a variable:
int x;
x = 10;

The variable x will contain the value 10 decimal.
180909
ajay patil
TMS320C6000 Assembly Language
and its Rules
Load a Constant 1 of 4




We will use register A1 to hold variable x:
In assembly language we write:
MVK 10, A1;
The instruction MVK moves (copies) the constant 10
into register A1.
Register A1 now contains 0000000Ah.
Load a Constant 2 of 4


The correct syntax is number then register. The number
can also be in hexadecimal:
MVK 10, A1; 
MVK 0xA, A1; 
MVK 0Ah, A1; 
Do not use #
MVK #10, A1; X
Load a Constant 3 of 4


Loading the full 32 bits of a register needs 2 instructions:
MVK 0x5678, B2;
MVKLH 0x1234, B2;
Register B2 now contains 12345678h
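
The two-instruction load can be modelled in C. This is only a sketch: the helper names mvk, mvklh and load_constant_32 are hypothetical, and it assumes that MVK sign-extends its 16-bit constant into the whole register while MVKLH replaces only the upper 16 bits.

```c
#include <stdint.h>

/* Model of MVK: the 16-bit constant is sign-extended to 32 bits. */
static uint32_t mvk(uint16_t cst)
{
    return (cst & 0x8000u) ? (0xFFFF0000u | cst) : cst;
}

/* Model of MVKLH: the constant replaces the upper 16 bits only. */
static uint32_t mvklh(uint16_t cst, uint32_t reg)
{
    return ((uint32_t)cst << 16) | (reg & 0xFFFFu);
}

/* MVK low, B2 followed by MVKLH high, B2 */
static uint32_t load_constant_32(uint16_t low, uint16_t high)
{
    return mvklh(high, mvk(low));
}
```

Under these assumptions, load_constant_32(0x5678, 0x1234) gives 12345678h. The model also suggests why MVK comes first: it overwrites the upper half of the register.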
Load a Constant 4 of 4

To load the full 32 bits of a register with 0 (zero)
requires only a single instruction:
ZERO B2;

Register B2 now contains 00000000h
Incrementing a Register 1 of 2

To increment a variable in C we can write:
int x;
x++;

This adds 1 to the value of the variable x
Incrementing a Register 2 of 2

In assembly language we use the instruction ADDK (add
constant)
ADDK 1, A2 ;

This adds the constant 1 to the contents of register A2
Decrementing a Register 1 of 2

To decrement a variable in C we can write:
int x;
x--;

This subtracts 1 from the variable x.
Decrementing a Register 2 of 2

In assembly language we again use the instruction
ADDK (add constant)
ADDK -1, A2;


This adds the constant -1 to the contents of register A2
There is no such instruction as SUBK
No Operation

The TMS320C6000 provides an instruction that does
nothing except take time. This is called NOP (No
operation)
NOP ;

If we want to execute 4 NOP instructions one after
another we can write:
NOP 4;

This instruction can be used to generate time delays.
Topic Two

Controlling program flow
Testing Conditions 1 of 5


The if-else construct is widely used in C.
Consider the following simple piece of
code:
int x, y ;
if ( x != 0 )
{ y++; }

This means if x is not equal to zero, then increment
variable y.
Testing Conditions 2 of 5

The assembler provides a neat way to do this. Assuming
x is stored in register A1 and y is stored in register A2:
[A1] ADDK 1, A2;
The term in [ ] is the condition to be tested. If the
condition "A1 is not equal to zero" is true, add 1 to the
value in A2. Otherwise do nothing.
Testing Conditions 3 of 5

Consider another piece of C code:
int x, y ;
if ( x == 0 )
{ y--; }

This means if x is equal to zero, decrement y.
Testing Conditions 4 of 5

Again the assembler provides a neat way to do this.
Assuming x is stored in register A1 and y is stored in
A2:
[!A1] ADDK -1, A2;
The term in [ ] is the condition to be tested. If the
condition "A1 is equal to zero" is true, add -1 to the
value in A2. Otherwise do nothing.
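
The two conditional forms can be pictured as C functions. This is a sketch for illustration only: the function names are hypothetical, and a1 and a2 stand in for registers A1 and A2.

```c
#include <stdint.h>

/* [A1] ADDK k, A2 : the add happens only when A1 is non-zero. */
static int32_t addk_if_nonzero(int32_t a1, int32_t a2, int32_t k)
{
    return (a1 != 0) ? a2 + k : a2;
}

/* [!A1] ADDK k, A2 : the add happens only when A1 is zero. */
static int32_t addk_if_zero(int32_t a1, int32_t a2, int32_t k)
{
    return (a1 == 0) ? a2 + k : a2;
}
```

In both cases the register A2 is left unchanged when the condition is false, which is what "otherwise do nothing" means.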
Testing Conditions 5 of 5

The test can use register A1, A2, B0, B1 or B2:
[A1] MVK 10, A2;
[A0] MVK 10, A2; X
[A3] MVK 10, A2; X
[!B0] MVK 10, A2;
[B1] MVK 10, A2;
[B3] MVK 10, A2; X
Branch Instructions 1 of 3

Program execution can be forced to a different place using
the B (branch) instruction:
label: B label;

When the B (branch) is reached, the next instruction to
be executed will be at the address label. It is similar
to the goto instruction in C.
Branch Instructions 2 of 3

Rather than using a label with the instruction B, a
register can be used.
MVKL label, B3;
MVKH label, B3;
B B3;
This method is used by the C compiler, usually with
B3, to return from a function.
Branch Instructions 3 of 3

The instruction B can also be combined with a test for
a condition.
label: ADDK 1, A3;
[A1] B label;
When the B (branch) is reached, the next instruction to
be executed will be at the address label, but only if
A1 is non-zero.
Implementing a Delay Loop 1 of 2

In C, a delay loop can be implemented using the do-while construct:
int i = 10;
do {
i--;
} while (i != 0);
We start with i = 10. Every time through the loop i
is decremented. When i == 0 then the loop
terminates.
Implementing a Delay Loop 2 of 2

In assembly language, we can use A1 to hold i. This
can be decremented and tested:
MVK 10, A1 ; A1 = 10
loop: ADDK -1, A1 ; Decrement A1
[A1] B loop ; Branch to loop
We start with A1 = 10. Every time through the loop A1
is decremented. When A1 == 0 then the loop
terminates.
Topic Three

Allocating storage for variables and constants.
To Declare a Variable 1 of 2

In C code, a 32-bit variable can be declared as follows:
int x;

In assembly language use:
x: .usect ".far",4, 4
To Declare a Variable 2 of 2

This means:

x: A label. Where to find the variable
.usect In un-initialised data
".far", Large memory model
4, How many bytes
4 Align on 4-byte boundary
To Declare a Buffer 1 of 2

In C code, a 32-element buffer can be declared as an
array:
int buffer[32];

In assembly language use:
buffer: .usect ".far",128, 4
To Declare a Buffer 2 of 2

This means:

buffer: A label. Where to find the data
.usect In un-initialised data
".far", Large memory model
128, How many bytes
4 Align on 4-byte boundary
To Declare Constants 1 of 3

In C code, an array of constants can be declared as:
const int constants[5] =
{1,2,3,4,5};

This is an array of 5 read-only constants of value 1, 2,
3, 4 and 5.
To Declare Constants 2 of 3

In assembly language use:
.sect ".const"
.align 4
coefficients:
.field 1, 32 ;
.field 2, 32 ;
.field 3, 32 ;
.field 4, 32 ;
.field 5, 32 ;
To Declare Constants 3 of 3





Here .sect ".const" tells the linker where in
memory to store the values.
.align 4 means align on a 4-byte boundary.
The constants are found at the address
coefficients.
Each constant is declared as a field of a given value,
and size 32 bits:
.field 3, 32
Topic Four

Using pointers.
Pointing to a Buffer 1 of 3

To set up a pointer to a buffer in C we write:
int buffer[32];
int *ptr;
ptr = &buffer[0];

The pointer ptr is given the address of the start of
the buffer.
Pointing to a Buffer 2 of 3

When using TMS320C6000 assembly
language, it is usual practice to use the
following registers as pointers:

A4, A5, A6, A7
B4, B5, B6, B7

These registers also support circular
addressing.
Pointing to a Buffer 3 of 3

To use register A4 as the pointer to the
buffer:

buffer: .usect ".far",128, 4
MVKL buffer, A4
MVKH buffer, A4

The first instruction MVKL writes to the low
half of register A4.
The second instruction MVKH writes to
the high half of register A4.
Moving Data to a Register 1 of 2

We can load a register with the 32-bit contents of a data
memory address. Assume that register A4 points to
buffer[0]
LDW *A4, A5;


The instruction LDW (load word) copies a word of data
from buffer[0] to register A5
Here W = word = 32 bits
Moving Data to a Register 2 of 2

The instruction LDW takes 4 cycles to get the data,
which makes it slow. Care is needed to wait the required
time, for example using 4 NOPs.
LDW *A4, A5;
NOP 4 ; A5 not ready
ADDK 2, A5 ; A5 now ready
Moving Data from a Register

We can store the 32-bit contents of a register at an
address in data memory. Assume that register A5 points
to buffer[0]:
STW A4, *A5;


The instruction STW (store word) copies a word of
data from register A4 to buffer[0]. The data are
available immediately.
Here W = word = 32 bits
Operations on Pointers 1 of 4





Several pointer operations are possible in C:
*ptr++; Post-increment
*ptr--; Post-decrement
*++ptr; Pre-increment
*--ptr; Pre-decrement
Operations on Pointers 2 of 4





The same pointer operations are available in assembly
language:
*A4++; Post-increment
*A5--; Post-decrement
*++B6; Pre-increment
*--B7; Pre-decrement
Operations on Pointers 3 of 4

The pointer increment and decrement operators can be
used with load and store instructions.

Suppose we want to copy data from one place to
another. In C we might write:

for ( i = 0 ; i < 10 ; i++)
{ *ptr2++ = *ptr1++; }
Operations on Pointers 4 of 4

In assembly language, the part:
*ptr2++ = *ptr1++;

Could be written as:
LDW *A4++, A0;
NOP 4;
STW A0, *A5++;
Topic Five

Multiplications and Division
Multiplications 1 of 5

Multiplication is widely used in DSP for Finite Impulse
Response (FIR) filters and correlation.
Multiplications 2 of 5

Multiply instructions use registers:

MPY A1, A2, A3;

Multiply the 16-bit value in register A1 by the 16-bit
value in register A2 and put the 32-bit product in
register A3.
In other words, A3 = A1 x A2
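
To make the 16-bit by 16-bit behaviour concrete, here is a small C model. It is a sketch under the assumption that MPY uses only the low 16 bits of each source register and treats them as signed values; the function name mpy is chosen here for illustration.

```c
#include <stdint.h>

/* Model of MPY: signed multiply of the low 16 bits of each register. */
static int32_t mpy(uint32_t src1, uint32_t src2)
{
    int32_t a = (int32_t)(src1 & 0xFFFFu);
    int32_t b = (int32_t)(src2 & 0xFFFFu);
    if (a & 0x8000) a -= 0x10000;   /* sign-extend bit 15 */
    if (b & 0x8000) b -= 0x10000;
    return a * b;                   /* always fits in 32 bits */
}
```

Note that the upper 16 bits of each source register are ignored in this model, so a register holding 00010002h multiplies as if it held 2.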

Multiplications 3 of 5

Multiplication instructions can only use registers. They
cannot use pointer operations:

MPY A3, A4, A5
MPY *A3, A4, A5 X
MPY A3, *A4++, A5 X
Multiplications 4 of 5

The MPY instruction has one delay slot. This means
that the product is not available until 2 cycles after the
MPY instruction.
MPY A3, A4, A5;
NOP ; Wait 1 cycle
STW A5, *A4 ; Store product

It may be necessary to follow the MPY instruction with
a NOP.
Multiplications 5 of 5


For multiplications by powers of 2, for example 2, 4, 8,
16, 32 etc, use the instruction SHL (Shift Left).
SHL A3, 1, A3; Multiply by 2
SHL A4, 2, A4; Multiply by 4
SHL B5, 3, B5; Multiply by 8
SHL B7, 8, B7; Multiply by 256
This is a single-cycle instruction.
Division


To divide by powers of 2, for example 2, 4, 8, 16, 32
etc, use the instruction SHR (Shift Right).
SHR B3, 1, B3; Divide by 2
SHR A4, 2, A5; Divide by 4
SHR B5, 3, A3; Divide by 8
SHR B7, 8, B7; Divide by 256
This is a single-cycle instruction.
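
The shifts can be compared with ordinary multiplication and division in C. The sketch below uses hypothetical helpers shl and shr and assumes that SHR is an arithmetic (sign-preserving) shift. For negative values a right shift rounds towards minus infinity, whereas the C operator / rounds towards zero, so the two can differ by one.

```c
#include <stdint.h>

/* Model of SHL: shift left by n bits (n < 32), i.e. multiply by 2^n modulo 2^32. */
static uint32_t shl(uint32_t x, unsigned n)
{
    return x << n;
}

/* Model of SHR: arithmetic shift right by n bits (n < 32), i.e. floor(x / 2^n). */
static int32_t shr(int32_t x, unsigned n)
{
    /* For negative x, ~x is non-negative, so the shift is well defined in C. */
    return (x >= 0) ? (x >> n) : ~(~x >> n);
}
```

For example, shr(20, 2) gives 5, but shr(-7, 1) gives -4 while -7 / 2 in C gives -3.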
Topic Six

Introducing Delay Slots
Delay Slots 1 of 3



So far we have ignored the time it takes the processor
to implement an instruction.
In fact, the instruction B takes 6 cycles before the
branch actually occurs.
Rather than just waiting 6 cycles, the TMS320C6000
allows 5 other instructions to be executed in the meantime.
These are called delay slots.
Delay Slots 2 of 3


For correct operation of the processor we need to put
5 NOPs (or other instructions) after the B instruction.
loop: B loop; 1 cycle
NOP; 1st delay slot
NOP; 2nd delay slot
NOP; 3rd delay slot
NOP; 4th delay slot
NOP; 5th delay slot
NOP; B taken here.
Delay Slots 3 of 3

For correct operation of the next instruction, the delay
loop we saw earlier should be written as:
MVK 10, A1 ; A1 = 10
loop: ADDK -1, A1 ; Decrement A1
[A1] B loop ; Branch to loop
NOP 5 ; 5 delay slots
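
The length of the delay can be estimated by counting cycles. The C sketch below is a rough estimate only: the function name is hypothetical, and it assumes one cycle each for MVK, ADDK and B, five cycles for NOP 5, and no memory stalls or interrupts.

```c
/* Estimated cycles for:
 *        MVK  n, A1
 * loop:  ADDK -1, A1
 *   [A1] B    loop
 *        NOP  5
 * with n >= 1 passes through the loop.
 */
static unsigned delay_loop_cycles(unsigned n)
{
    const unsigned setup = 1;          /* MVK */
    const unsigned per_iteration = 1   /* ADDK */
                                 + 1   /* B */
                                 + 5;  /* NOP 5 fills the delay slots */
    return setup + n * per_iteration;
}
```

With these assumptions, the loop count of 10 used above gives an estimate of 71 cycles.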
Topic Seven

Writing an assembly language function callable
from C code.
C Callable Function 1 of 6

Suppose we want to write the following
assembly language function that adds
together two numbers:
int sum ( int x, int y)
{return (x + y);}

To use the function in C we might write:
int result;
result = function (100,
C Callable Function 2 of 6

The C compiler implements the function as follows:


Parameter x is passed in A4
Parameter y is passed in B4
The return value is in A4:

The C function can be thought of as:

A4 sum ( A4, B4);

C Callable Function 3 of 6

In assembly language we write:

.global _sum;
.sect ".text";
.align 4;
_sum: ADD A4, B4, A4;
B B3 ;
NOP 5 ;
C Callable Function 4 of 6

The .global assembler directive makes the label
_sum available outside this module.
.global _sum;

Notice the underscore at the beginning of _sum. This
is a C compiler convention.
C Callable Function 5 of 6

The line .sect ".text" puts the code in the code
segment.

The assembler directive .align 4 aligns the code on
a 32-bit boundary.

The instruction ADD A4, B4, A4 adds the value
in A4 to the value in B4 and puts the result in A4.
C Callable Function 6 of 6
Finally, return from the function:
B B3 ;
NOP 5 ;


Just before the function sum() is called, the compiler
puts the return address in register B3.
Important: Do not change the register B3 inside the
function. The return address will be lost and the
program may well crash!
Allocating Local Variables 1 of 5

In C, variables are sometimes used only in a particular
function. These are local variables.
int my_function (void)
{
int x; // A local variable.
}

The variable x is only available within
my_function()
Allocating Local Variables 2 of 5

To allocate temporary storage for a 32-bit variable (4
bytes), subtract 4+4 from the Stack Pointer (SP) at the
beginning of the function.
.asg B15, SP ; Make SP a name for B15
function:
SUB SP, 8, SP
Allocating Local Variables 3 of 5

To write to a local variable use:
STW A4,*+SP(4)

This stores the contents of register A4 at the data
memory location with a positive offset of 4 bytes from
SP.
This can be used as a “push” instruction.

Allocating Local Variables 4 of 5

To read a local variable use:
LDW *+SP(4), A2

Here *+SP(4) means the contents of the memory
location at a positive offset of 4 bytes from SP.

This can be used as a “pop” instruction.
Allocating Local Variables 5 of 5

At the end of the function, add the same number of
bytes to the SP to restore it to its original value.
ADDK 8, SP
B B3 ; Return
Topic Eight

Parallel Operations.
Parallel Operations 1 of 4

We have already seen how to write constants to
registers.
MVK 1234h, A4; 1 cycle
MVK 5678h, B4; 1 cycle

In this case we write the value 1234h to register A4
then write 5678h to register B4.

This takes 2 cycles.
Parallel Operations 2 of 4

We can write the same instructions using the ||
operator:
MVK 1234h, A4;
|| MVK 5678h, B4;


In this case we write the value 1234h to register A4, at
the same time as we write 5678h to register B4.
This takes 1 cycle.
Parallel Operations 3 of 4

We need one register from A0 to A15 and the other
from one of B0 to B15.
MVK 1234h, A4;
|| MVK 5678h, A5; X

We cannot perform parallel operations on two registers
from the same register bank.
180909
ajay patil
TMS320C6000 Assembly Language
and its Rules
67
Parallel Operations 4 of 4




Parallel operations are very useful for stereo audio
processing.
We can process both the left channel and the right
channel at exactly the same time.
This means it can be as fast to process two channels as
it is to process one.
This cannot be done in C.
Topic Nine

Using Circular Addressing for Circular Buffers.
Circular Buffers 1 of 7

We can implement a circular buffer in C code as
follows:
int buffer[16];
static int * ptr; // Pointer
ptr = &buffer[0]; // Initialise
if ( ptr < &buffer[15] )
ptr++; // Increment
else
ptr = &buffer[0]; // Back to start
Circular Buffers 2 of 7



The TMS320C6000 can support circular buffers of size
8, 16, 32, 64, 128 etc bytes.
To set up a particular register for use as circular buffer,
we must configure the AMR (Address Mode Register).
At power up, the AMR contains 00000000h.
Circular Buffers 3 of 7

To read the AMR register we must use the special
instruction MVC (move control register).
MVC AMR, B1; Copy AMR to B1
MVC AMR, A1; X Must be B-side

To write to the AMR register we again use the special
instruction MVC (move control register).
MVC A1, AMR; Copy A1 to AMR
Circular Buffers 4 of 7

For example, to set up A7 for use in circular addressing,
we write the value 00050040h to the AMR (Address
Mode Register).
MVKL 00050040h, A2
MVKH 00050040h, A2
MVC A2, AMR ; Update AMR
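
The value 00050040h can be built up field by field. The C sketch below assumes the AMR layout as I understand it from the CPU reference guide: a 2-bit mode field per pointer register in the order A4, A5, A6, A7, B4, B5, B6, B7 starting at bit 0, block size field BK0 at bits 16 to 20 and BK1 at bits 21 to 25, with a block size of 2^(N+1) bytes. The names amr_mode and amr_value are hypothetical, and the layout should be checked against the reference guide before use.

```c
#include <stdint.h>

enum amr_mode { AMR_LINEAR = 0, AMR_CIRC_BK0 = 1, AMR_CIRC_BK1 = 2 };

/* reg_index selects the pointer register:
 * A4=0, A5=1, A6=2, A7=3, B4=4, B5=5, B6=6, B7=7.
 * block_bytes is the circular buffer size, a power of two >= 2.
 * Intended for the two circular modes only. */
static uint32_t amr_value(unsigned reg_index, enum amr_mode mode,
                          unsigned block_bytes)
{
    unsigned n = 0;                         /* block size = 2^(n+1) bytes */
    while ((2u << n) < block_bytes)
        n++;
    uint32_t bk_shift = (mode == AMR_CIRC_BK1) ? 21u : 16u;
    return ((uint32_t)n << bk_shift) | ((uint32_t)mode << (2u * reg_index));
}
```

Under these assumptions, amr_value(3, AMR_CIRC_BK0, 64) selects A7 with a 64-byte block and reproduces the value 00050040h used above.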
Circular Buffers 5 of 7

The buffer has a size 16 * 4 = 64 bytes.
int buffer[16];

For circular addressing, the buffer must be aligned on a
boundary equal to the size of the buffer.
buffer: .usect ".far", 64, 64
Circular Buffers 6 of 7

To store the value of the ptr we can write:
ptr: .usect ".far", 4, 4

At the beginning of the program, set ptr to contain the
starting address of the buffer.
Circular Buffers 7 of 7

To read a value from the buffer with increment of A7:
LDW *A7++,A3; Read from buffer

To write a value from a register back to the buffer :
STW A3, *A7; Write to buffer
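
What the hardware does to A7 on each post-increment can be modelled in C. This is a sketch of the idea rather than a description of the actual circuitry: the function name circ_advance is hypothetical, and it assumes the block size is a power of two and the buffer is aligned to that size.

```c
#include <stdint.h>

/* Advance a byte address by `step` inside a circular block.
 * Only the low bits of the address change, so the pointer wraps
 * back to the start of the block instead of running past the end. */
static uint32_t circ_advance(uint32_t addr, uint32_t step, uint32_t block_bytes)
{
    uint32_t mask = block_bytes - 1u;
    return (addr & ~mask) | ((addr + step) & mask);
}
```

With a 64-byte buffer at address 1000h, advancing by one word from 103Ch wraps back to 1000h. This is why the buffer must be aligned on a boundary equal to its size.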
Topic Ten

40-bit Operations.
40-bit Operations 1 of 8



So far we have used registers A0 to A15 and B0 to B15
for 32-bit operations.
The TMS320C6000 also supports 40-bit maths.
Let us look at a simple addition.
ADD A2, A1:A0, A3:A2
40-bit Operations 2 of 8

Here A1:A0 is a 40-bit register.
8 bits are provided by A1
32 bits are provided by A0

ADD A2, A1:A0, A3:A2

This means: add the 40-bit value in register pair A1:A0 to
the 32-bit value in register A2, then put the 40-bit result in
register pair A3:A2.


40-bit Operations 3 of 8

40-bit operations are particularly useful for FIR filters,
which use a large number of multiplies and additions.

Suppose we are implementing a 64-element FIR filter
using 32-bit maths. To prevent overflow, we have to
divide each multiplication by 32 before adding. This can
mean a loss of accuracy.
40-bit Operations 4 of 8

In C code, the divide by 32 is implemented as a shift
right 5 places:
int temp = 0; // 32-bit variable
for ( i = 0 ; i < 64 ; i++)
{
temp = input[i]*coeff[i];
result += (temp >> 5);
}
result >>= 10;
40-bit Operations 5 of 8

Using 40-bit maths, there is no need to perform a
division before each addition:
long temp = 0; // 40 bits
for ( i = 0 ; i < 64 ; i++)
{
temp += input[i]*coeff[i];
}
temp >>= 15;
40-bit Operations 6 of 8

The FIR implementation becomes:
MVK 64, B1
loop: LDW *B4++, A0 ; B4 -> coeffs
LDW *B5++, A1 ; B5 -> inputs
NOP 4
MPY A0, A1, A2 ; Multiply
ADDK -1, B1
ADD A2, A3:A2, A3:A2 ; Accumulate
[B1] B loop
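
The 40-bit accumulation can be modelled in C using a 64-bit integer that is wrapped to 40 bits after each addition. This is a sketch only: the names wrap40, add40 and fir40 are hypothetical, and the inputs are assumed to be 16-bit samples and coefficients, matching the 16-bit by 16-bit multiply.

```c
#include <stdint.h>

/* Keep only the low 40 bits of v and sign-extend from bit 39. */
static int64_t wrap40(int64_t v)
{
    uint64_t u = (uint64_t)v & 0xFFFFFFFFFFull;
    if (u & 0x8000000000ull)
        return (int64_t)u - 0x10000000000ll;
    return (int64_t)u;
}

/* Model of a 40-bit ADD: 32-bit source plus 40-bit accumulator. */
static int64_t add40(int32_t src, int64_t acc)
{
    return wrap40(acc + (int64_t)src);
}

/* Multiply-accumulate without scaling each product first. */
static int64_t fir40(const int16_t *input, const int16_t *coeff, int taps)
{
    int64_t acc = 0;
    for (int i = 0; i < taps; i++)
        acc = add40((int32_t)input[i] * coeff[i], acc);
    return acc;
}
```

The extra 8 bits give room for the sum of many 32-bit products. In this model, adding 5 to 7FFFFFFFh gives 80000004h as a positive 40-bit value, where a 32-bit register would have overflowed.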
40-bit Operations 7 of 8

When performing 40-bit operations, the
instruction ADDK cannot be used. The
instruction ADD must be used instead:
ADDK 1, A1:A0 X
ADD 1, A1:A0, A1:A0

Similarly, the instruction SUB must be
used to subtract from a 40-bit register:
ADDK -1, A5:A4 X
SUB 1, A5:A4, A5:A4
40-bit Operations 8 of 8

When converting a 40-bit value to 32 bits, it is wise to use the SAT (saturate)
instruction to prevent the sign changing:
SAT A1:A0, A4
Suppose A1:A0 contains
00 FFFF FFFFh. This is a positive
number.
However, the 32-bit value in A0 is
FFFF FFFFh. This is a negative number.
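
Saturation can be pictured as clamping to the 32-bit range. The C sketch below is a model only: the function name sat40_to_32 is hypothetical, and the 40-bit value is held in a 64-bit integer.

```c
#include <stdint.h>

/* Model of SAT: clamp a signed 40-bit value to the signed 32-bit range. */
static int32_t sat40_to_32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;   /* 7FFFFFFFh */
    if (v < INT32_MIN) return INT32_MIN;   /* 80000000h */
    return (int32_t)v;
}
```

In this model the value 00 FFFF FFFFh saturates to 7FFFFFFFh, the largest positive 32-bit number, so the sign is preserved.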
Topic Eleven

Optimising assembly code for speed.
Optimising Code 1 of 5

Let us start with a very simple C function that copies
one block of data to another:
void copy(int* p1, int* p2, int size)
{
while (size--)
{
*p2++ = *p1++;
}
}
Optimising Code 2 of 5

In assembly language we could write:
loop: LDW *A4++, A0
NOP 4
STW A0, *B4++
ADDK -1, A1
[A1] B loop
NOP 5
B B3
NOP 5
This takes 144 cycles to execute.
Optimising Code 3 of 5
We can move the ADDK and B loop instructions
upwards:
loop: LDW *A4++, A0
ADDK -1, A1
[A1] B loop
NOP 2 ; Lose 2 NOPs here
STW A0, *B4++
NOP 2 ; Lose 2 NOPs here
B B3
NOP 5
Optimising Code 4 of 5
We can also move the B B3
instruction upwards:
loop: LDW *A4++, A0
ADDK -1, A1
[A1] B loop
|| [!A1] B B3 ; Add test
NOP 1
STW A0, *B4++
NOP 3
This takes 93 cycles to execute.
Optimising Code 5 of 5



The optimised version runs 35% faster than the unoptimised version.
The only downside is that the code is harder to read
and debug.
It is therefore recommended that the code is written
and tested, then optimised and re-tested.
Topic Twelve

A typical application of assembly language.
Implementing a Stereo FIR Filter 1 of 6



We will design a stereo Finite Impulse Response (FIR)
filter that processes both the left and right hand audio
channels at exactly the same time. It will use 64
coefficients.
We will bring together several techniques explained
earlier.
We will compare performance with the C code version.
Implementing a Stereo FIR Filter 2 of 6

We will start with a C callable function.
int stereo_FIR (const int *,
int x,
int y )
{
};


const int * points to the filter coefficients.
Here x and y are the inputs.
Implementing a Stereo FIR Filter 3 of 6

The C function has register usage as follows:
A4 stereo_FIR ( A4, B4, A6 )



A4 points to the coefficients.
B4 contains x. A6 contains y.
Note that there can only be one return value, so A4 will
contain both outputs.
Implementing a Stereo FIR Filter 4 of 6

We can start with the same code we used for 40-bit
operations, but modified for buffer1.
MVK 64, B1
loop: LDW *A4++, A0 ; A4 -> coeffs
LDW *A5++, A1 ; A5 -> buffer1
NOP 4
MPY A0, A1, A2 ; Multiply
ADDK -1, B1
ADD A2, A7:A6, A7:A6 ; Accumulate
[B1] B loop
Implementing a Stereo FIR Filter 5 of 6

Now we add parallel operations for the second channel
using B instead of A:
loop: LDW *A4++, A0 ; A4 -> coeffs
LDW *A5++, A1 ; A5 -> buffer1
|| LDW *B5++, B1 ; B5 -> buffer2
NOP 4
MPY A0, A1, A2 ; Multiply
|| MPY B0, B1, B2 ;
ADDK -1, B1
ADD A2,A7:A6,A7:A6 ; Accumulate
|| ADD B2,B7:B6,B7:B6 ;
[B1] B loop
Implementing a Stereo FIR Filter 6 of 6


The operations on the B registers are done at exactly the
same time as those on the A registers.
The full assembly code for the stereo FIR filter is given
in the files FIR_filters_asm.asm and
FIR_filters_asm.h
Topic Thirteen

Some other information.
Other Instructions




Besides the instructions given here, the C67xx and
C64xx have additional assembly language instructions.
The C67xx has floating point instructions.
The C64xx has more registers and supports 32-bit
maths.
See the References section for details.
References


TMS320C6000 CPU and Instruction Set Reference
Guide SPRU189.
TMS320C6000 Assembly Language Tools User's
Guide SPRU186.
