ENCM515 -- Compare 68K and SHARC

Comparing 68k (CISC) with
21k (Superscalar RISC DSP)
M. R. Smith,
Electrical and Computer Engineering
University of Calgary, Alberta, Canada
smithmr @ ucalgary.ca
To be tackled today

When to use assembly code
Useful sub-set of 68K CISC instructions
Recap Effective addressing modes
Load/Store Programming style for 68K
Load/Store Architecture of 21K by comparison with 68K
7/7/2015
ENCM515 -- Compare 68k and 21k
Copyright [email protected]
2 / 37
“Reminder”

Reuse the following ENCM415 concepts
    Don’t use “Assembly Code” unless you “really have” to
    Write in “C/C++” whenever appropriate
    Connect to the hardware “in assembler” using instructions that always work -- RISC-like (MIPS)
    Understand the linkages between “assembly” and “C”
    Customize “C” only when necessary
ENCM515
    Basic requirement for “Custom DSP” code -- you need to know the features of the processor
    Recognize that speed comes from instructions that work only under special conditions because of processor architectural constraints -- opcode size, bus availability
Very limited set of instructions used
in Assembly Code most of the time

Operational Instructions
    MOVE
    ADD, SUB
    AND, OR
    (FADD, FSUB)
Program Flow
    BRA, JMP, JSR, RTS, TRAP
    CMP, BNE, BEQ
    BHI, BLO, BLS (unsigned branches)
    BGE, BLT, BGT (signed branches)
Easiest way to program 68K in assembly



Have a PSP process to avoid the stupid mistakes that stop you getting to the stuff that is worth doing
Never bother with the complex EA-mode instructions
    You don’t gain much anyway
Program the CISC as if it had a “LOAD/STORE” architecture like the MIPS processor
    MOVE memory to register (LOAD)
    MOVE register to memory (STORE)
    OPERATE register on register -- memory access in FETCH only
    Plus a few other non-RISC instructions that you find very useful (e.g. ADD.L #5, D0)
Customize for speed later -- if it is worth the effort
    EASIER TO CUSTOMIZE when in this “simple” mode
Easiest way to program 21k in assembly



Have a PSP process to avoid the stupid mistakes that stop you getting to the stuff that is worth doing
Never bother with the complex EA-mode instructions
    You don’t gain much anyway
Program the superscalar RISC DSP, which has a “LOAD/STORE” architecture like the MIPS processor PLUS DSP-special
    MOVE memory to register (LOAD)
    MOVE register to memory (STORE)
    OPERATE register on register
    Plus a few other non-RISC instructions that you find very useful (e.g. ADD.L #5, D0)
Customize for speed later -- if it is worth the effort
Some of the effective address modes for 68k MOVE

Register to Register -- RISC like
    MOVE.L D1, D0             [D0] <- [D1] (31:0)
    21k equivalent            R0 = R1;
Immediate to Register -- RISC like
    MOVE.L #0x5000, D1        [D1] <- 0x5000 (31:0)
    21k equivalent            R1 = 0x5000;
Memory to Register -- RISC like
    MOVE.L 0x5000, D1         [D1] <- [M(0x5000)] (31:0)
    21k equivalent            R1 = dm(0x5000);
Memory to Memory -- CISC
    MOVE.L 0x5000, 0x6000     [M(0x6000)] <- [M(0x5000)] (31:0)
    21k equivalent
Look behind the instruction
at the architecture

68k --- MOVE.L D0, D0
    Involves fetching the instruction (4 cycles); everything else is then done without extra (slow) memory operations
21k --- R0 = R1
    Involves fetching the instruction (1 cycle); everything else is then done without extra memory operations. Pipelining issue
Look behind the instruction
at the architecture

68k --- MOVE.L #0x5000, D0
    Involves fetching the instruction (4 cycles), then fetching the hi (4 cycles) and low (4 cycles) components of the constant stored in program space; everything else is then done without extra memory operations -- Really MOVE.L #0x00005000, D0
21k --- R0 = 0x5000
    Involves fetching the instruction (1 cycle); everything else is then done without extra memory operations. More like MOVEQ.L #0x5000, D0 where the constant is built into the op-code
Look behind the instruction
at the architecture

68k --- MOVE.L 0x5000, D0
    Involves fetching the instruction (4), then fetching the hi (4) and low (4) components of the address constant stored in program space, then fetching the hi (4) and low (4) halves of the value from adjacent addresses in data space; everything else is then done without extra memory operations. Again really MOVE.L 0x00005000, D0
21k --- R0 = dm(0x5000)
    Involves fetching the instruction (1) and then later fetching the value from data memory space (1). More like MOVE.L (Address_temp), D0 with the address register being preloaded during the instruction fetch.
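The cycle counts quoted on the last three slides can be tallied as a quick arithmetic check. The sketch below is only a bookkeeping model under the slides' own assumptions: 4 cycles for each 16-bit bus access on the 68k and 1 cycle per access on the 21k. The function names are hypothetical, and execution-phase and pipelining effects are ignored.

```c
#include <assert.h>

/* Cycles per memory access, taken from the slide annotations. */
enum { CYCLES_68K_ACCESS = 4, CYCLES_21K_ACCESS = 1 };

/* 68k: one opcode word, plus 16-bit extension words holding a constant
   or absolute address, plus 16-bit data accesses. */
int cycles_68k(int extension_words, int data_words)
{
    assert(extension_words >= 0 && data_words >= 0);
    return CYCLES_68K_ACCESS * (1 + extension_words + data_words);
}

/* 21k: one instruction fetch, plus single-cycle data accesses. */
int cycles_21k(int data_accesses)
{
    assert(data_accesses >= 0);
    return CYCLES_21K_ACCESS * (1 + data_accesses);
}
```

With these assumptions, cycles_68k(0, 0) gives 4 for the register move, cycles_68k(2, 0) gives 12 for MOVE.L #0x5000, D0, and cycles_68k(2, 2) gives 20 for MOVE.L 0x5000, D0. The matching 21k figures are cycles_21k(0) = 1 for the first two cases and cycles_21k(1) = 2 for R0 = dm(0x5000).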
Some of the effective address modes for ADD

Register to Register -- RISC like
    ADD.L D1, D0              [D0] <- [D0] + [D1]
    21k equivalent            R0 = R0 + R1;
Immediate to Register -- CISC
    ADD.L #0x5000, D1         [D1] <- [D1] + 0x5000
    21k equivalent
Memory to Register -- CISC
    ADD.L 0x5000, D1          [D1] <- [D1] + [M(0x5000)]
Memory to Memory -- CISC
    ADD.L 0x5000, 0x6000      [M(0x6000)] <- [M(0x6000)] + [M(0x5000)]
    Illegal on 68K; 21k illegal too
Look behind the instruction
at the architecture

68k --- ADD.L #0x5000, D0
    Involves fetching the instruction (4), then fetching the hi (4) and low (4) components of the constant stored in program space, and then doing the addition during the “execution” phase. On the 68k the 32-bit add takes extra cycles.
21k --- R1 = 0x5000; R0 = R1 + R0;
    Involves fetching the two instructions; everything else is then done without extra memory operations. More like MOVEQ.L #0x5000, D0
Basic LOAD/STORE operations

LOAD -- Memory to register
    [Reg] <- [Memory(address)]
    MOVE.L 0x5000, D1         [D1] <- [Memory(0x5000)]
    CAREFUL!!!! 21k -- NOT QUITE, 2 memory busses
        R1 = dm(0x5000);
        R1 = pm(0x5000);
        F1 = dm(0x5000);
STORE -- Register to Memory
    [Memory(address)] <- [Reg]
    MOVE.L D1, 0x5000         [Memory(0x5000)] <- [D1]
        dm(0x5000) = R1;
        pm(0x5000) = R1;
Basic LOAD/STORE operations

LOAD register with a constant
    [Reg] <- constant value
    MOVE.L #0x5000, D1        [D1] <- 0x5000
    CAREFUL!!!! 21k -- NOT QUITE
        R1 = 0x5000;
        Can’t always make parallel
Basic Register-to Register operations

LOAD -- Register to register
    [Reg] <- [Reg2]
    MOVE.L D1, D0             [D0] <- [D1]
    CAREFUL!!!! 21k -- NOT QUITE, especially when parallel
        R0 = R1;
        Sometimes R0 = pass R1; is better
Operation -- Register to register
    [Reg] <- [Reg] Operation [Reg2]
    ADD.L D1, D0              [D0] <- [D0] + [D1]
        R0 = R1 + R2; is also possible on 21k
Basic 68k Register-to Register operations

Operation -- Register to register
[Reg] <- [Reg] Operation [Reg2]
    ADD.L D0, D1              [D1] <- [D1] + [D0]
    SUB.L D0, D1              [D1] <- [D1] - [D0]
    AND.L D0, D1              [D1] <- [D1] & [D0]
    OR.L D0, D1               [D1] <- [D1] | [D0]
    CMP.L D0, D1              [BB] <- [D1] - [D0]
    ASR #3, D0                [D0] <- [D0] >> 3 (signed)
    LSR #3, D0                [D0] <- [D0] >> 3 (unsigned)
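The difference between the two shifts in the table, ASR (signed) and LSR (unsigned), only shows up when the top bit is set. The C sketch below illustrates it on 32-bit values, which is an assumption since the slide gives no size suffix; the function names are hypothetical. The arithmetic shift is written with unsigned operations so that it does not rely on how a particular C compiler shifts negative numbers.

```c
#include <assert.h>
#include <stdint.h>

/* LSR: logical shift right, zeros enter from the top (unsigned view). */
uint32_t lsr32(uint32_t value, unsigned count)
{
    assert(count < 32);
    return value >> count;
}

/* ASR: arithmetic shift right, the sign bit is replicated (signed view). */
uint32_t asr32(uint32_t value, unsigned count)
{
    assert(count < 32);
    uint32_t shifted = value >> count;
    if ((value & 0x80000000u) != 0 && count > 0) {
        /* Fill the vacated top bits with copies of the sign bit. */
        shifted |= ~(0xFFFFFFFFu >> count);
    }
    return shifted;
}
```

For the bit pattern 0xFFFFFFF0 (that is, -16 as a signed value), asr32(0xFFFFFFF0, 3) returns 0xFFFFFFFE (-2), while lsr32(0xFFFFFFF0, 3) returns 0x1FFFFFFE. For a positive value such as 0x50 both return 0x0A.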
Basic 21k Register-to Register operations

Operation -- Register to register
[Reg] <- [Reg1] Operation [Reg2]
    [D1] <- [D2] + [D0]
    [D1] <- [D1] - [D2]
    [D1] <- [D1] & [D2]
    [D1] <- [D1] | [D2]
    Compare
    [D0] <- [D0] >> 3 (signed)
    [D0] <- [D0] >> 3 (unsigned)
YOU COMPLETE THE 21k Instructions
Basic Indirect Addressing Operations to Memory

LOAD INDIRECT
    [Reg] <- [Memory([AddressReg2])]
    MOVE.L (A0), D0           [D0] <- [Memory([A0])]
    CAREFUL!!!! 21k -- NOT QUITE
        R1 = dm(0, I4);
        R1 = dm(I4, 0);
        R1 = pm(I12, 0);
        R1 = dm(I12, 0);      NO!
LOAD INDIRECT with CONSTANT offset
    [Reg] <- [Memory([AddressReg2 + offset])]
    MOVE.L (8, A0), D0        [D0] <- [Memory([A0] + 8)]
        R1 = dm(2, I4);
        R1 = dm(I4, 2);       NO!
Same with store operations
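The operand order inside dm( ) decides which of two addressing behaviours is used. With the modifier first, dm(2, I4), the access goes to I4 + 2 and I4 is left alone; with the index register first, dm(I4, 2), the access goes to I4 and I4 is then changed by 2. The C model below is a minimal sketch of that distinction; the struct, the tiny memory size and the function names are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { DM_WORDS = 64 };          /* tiny stand-in for data memory */

typedef struct {
    int32_t  dm[DM_WORDS];       /* word-addressed memory */
    uint32_t i4;                 /* index (address) register I4 */
} Dag;

/* R = dm(mod, I4): pre-modify. Address is I4 + mod, I4 is unchanged. */
int32_t load_premodify(const Dag *dag, int32_t mod)
{
    uint32_t address = dag->i4 + (uint32_t)mod;
    assert(address < DM_WORDS);
    return dag->dm[address];
}

/* R = dm(I4, mod): post-modify. Address is I4, then I4 = I4 + mod. */
int32_t load_postmodify(Dag *dag, int32_t mod)
{
    assert(dag->i4 < DM_WORDS);
    int32_t value = dag->dm[dag->i4];
    dag->i4 += (uint32_t)mod;
    return value;
}
```

With a modifier of 0 both functions read the same word and leave I4 where it was, which is why dm(0, I4) and dm(I4, 0) are both acceptable stand-ins for MOVE.L (A0), D0. With a modifier of 2 only the pre-modify form behaves like MOVE.L (8, A0), D0, because the post-modify form reads the wrong word and moves the pointer.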
Indirect Addressing Operations to Memory

LOAD INDIRECT with Register offset
    [Reg] <- [Memory([AddressReg2 + offset])]
    MOVE.L (A0,D1), D0        [D0] <- [Memory([A0] + [D1])]      D1 used as loop counter
    CAREFUL!!!! 21k -- NOT QUITE
        R0 = dm(R1, I4);             NO!!
        M4 = R1; R0 = dm(M4, I4);
        R1 = dm(I4, M4);             NO!!
        R1 = pm(M12, I12);
        R1 = pm(M4, I12);            NO!!
Same with store operations
LOAD INDIRECT with Register + constant offset
    [Reg] <- [Memory([AddressReg2 + offset1 + offset2])]
    MOVE.L (8, A0, D1), D0    [D0] <- [Memory([A0] + [D1] + 8)]
    21k -- NO!!, needs multiple 21k instructions
MOVE.L (8,A0,D1),D0








Fetch the MOVE instruction (4 cycles)
Fetch the value 8 (4 cycles)
Move A0 to ALU then add D1 (loop variable)
Move result of ALU to ALU then add 8 (structure offset)
Move result to address register -- fetch memory value and store in high part of D0 (4 cycles)
Move result of ALU and add 2 (next address) (?)
Move result to address register -- fetch memory value and store in low part of D0 (4 cycles)
Note A0 and D1 must remain unchanged
MOVE.L (8,A0,D1),D0 -- 21k style









A0 -> I4, D1 -> R1, D0 -> R0
Fetch the MOVE instruction (4 cycles)
Fetch the value 8 (4 cycles)                                  R2 = 8;
Move A0 to ALU then add D1                                    R2 = R1 + R2;
Move result of ALU to ALU then add 8                          M4 = R2;
Move result to address register -- fetch memory value
    and store in high part of D0
Move result of ALU and add 2 (next address)
Move result to address register -- fetch memory value
    and store in low part of D0                               R0 = dm(M4, I4)
If using 21k hardware loop, how do you access the loop counter with minimum overhead?
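As a cross-check on the instruction sequence above, the address arithmetic can be written out in C. This is only a sketch of the address calculation, not of the memory access itself: the function names are hypothetical, and the displacement is kept in the same units on both sides, as the slide does with the value 8.

```c
#include <assert.h>
#include <stdint.h>

/* 68k: MOVE.L (8,A0,D1),D0 forms the address inside one instruction. */
uint32_t ea_68k(uint32_t a0, uint32_t d1, uint32_t displacement)
{
    return a0 + d1 + displacement;
}

/* 21k: the same address built step by step, following the slide. */
uint32_t ea_21k(uint32_t i4, uint32_t r1, uint32_t displacement)
{
    uint32_t r2 = displacement;   /* R2 = 8;                            */
    r2 = r1 + r2;                 /* R2 = R1 + R2;                      */
    uint32_t m4 = r2;             /* M4 = R2;                           */
    return i4 + m4;               /* address used by R0 = dm(M4, I4)    */
}

/* Quick self-check: both routes give the same address. */
void check_effective_address(void)
{
    assert(ea_68k(0x4000, 12, 8) == 0x4014);
    assert(ea_21k(0x4000, 12, 8) == 0x4014);
}
```

Neither function changes its base or index argument, which mirrors the requirement that A0 and D1 (I4 and R1 on the 21k) stay unchanged; the 21k version pays for this with the scratch registers R2 and M4.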
Indirect Addressing Operations to Memory

LOAD INDIRECT with register post-increment
    [Reg] <- [Memory([AddressReg2])]
    [AddressReg2] <- [AddressReg2] + 4
    MOVE.L (A0)+, D0          [D0] <- [Memory([A0])] ; [A0] <- [A0] + 4
        R0 = dm(I4, 1);
LOAD INDIRECT with register pre-decrement
    [AddressReg2] <- [AddressReg2] - 4
    [Reg] <- [Memory([AddressReg2])]
    MOVE.L -(A0), D0          [A0] <- [A0] - 4 ; [D0] <- [Memory([A0])]
        Modify (I4, -1);
        R0 = dm(0, I4);
21k processor is DSP



Digital Signal Processing Processor
Customized for DSP
In real life, the programmer must be really close to the architecture to get speed
However, most of the time, treat it like a version of the 68K
Compare MOVE on 21K and 68K

                          21k                                   68k
Register to Register      R1 = R0                               MOVE.L D0, D1
Immediate to Register     R0 = 0x5000                           MOVE.L #0x5000, D0
Memory to Register        R0 = dm(0x5000)                       MOVE.L 0x5000, D0
                          R0 = pm(0x5000)                       --- No equivalent ---
Memory to Memory          -- No equivalent --                   MOVE.L 0x5000, 0x6000
                          R0 = dm(0x5000); dm(0x6000) = R0;
Comparing ADD operations

                              21k                               68k
Register to Register Add      R1 = R1 + R0                      ADD.L D0, D1
Immediate to Register Add     -- No equivalent --               ADD.L #0x5000, D0
                              R1 = 0x5000; R0 = R1 + R0;
Memory to Register Add        -- No equivalent --               ADD.L 0x5000, D0
                              What are the equivalents?
Memory to Memory              Not available on EITHER processor
                              What are the equivalents?


Easiest way to program 21K assembly
Can’t bother with the complex instructions
DSP has “LOAD/STORE” architecture like the MIPS processor
    MOVE memory to register (LOAD)
    MOVE register to memory (STORE)
    OPERATE register on register
    There are no other types of instructions
Customize for speed later using hardware
Develop a process to avoid the standard simple errors so that you can get to the stuff that is important.
Most of you will not bother to use the process for 5 minutes in order to avoid wasting 1 hour of time
Basic LOAD/STORE operations

LOAD -- Memory to register
    [Reg] <- [Memory(address)]
    21k: R0 = dm(0x5000)          68k: MOVE.L 0x5000, D0
STORE -- Register to Memory
    [Memory(address)] <- [Reg]
    21k: pm(0x5000) = R0          68k: no 68k equivalent for pm
Basic LOAD/STORE operations

LOAD register with a constant
    [Reg] <- constant value
    21k: R0 = 0x5000              68k: MOVE.L #0x5000, D0
Basic Register-to Register operations

LOAD -- Register to register
    [Reg] <- [Reg2]
    21k: R0 = R1                  68k: MOVE.L D1, D0
Operation -- Register to register
    [Reg] <- [Reg] Operation [Reg2]
    21k: R1 = R1 + R0             68k: ADD.L D0, D1
    21k: R1 = R2 + R3             68k: -- no equivalent --
Basic Register-to Register operations

Operation -- Register to register
    [Reg] <- [Reg] Operation [Reg2]
    21k: R1 = R1 + R0             68k: ADD.L D0, D1
    21k: R1 = R1 - R0             68k: SUB.L D0, D1
    21k: R1 = R1 AND R0           68k: AND.L D0, D1
    21k: R1 = R1 OR R0            68k: OR.L D0, D1
    21k: -- many alternatives --  68k: CMP.L D0, D1
Basic Indirect Addressing Operations to Memory

LOAD INDIRECT
    [Reg] <- [Memory([AddressReg2])]
    21k: R0 = dm(I0)              68k: MOVE.L (A0), D0
LOAD INDIRECT with CONSTANT offset
    [Reg] <- [Memory([AddressReg2 + offset])]
    21k: R0 = dm(2, I4)           68k: MOVE.L (8, A0), D0
    but
    21k: R0 = pm(2, I12)          68k: -- No need for distinction --
Special DAGS for custom data and program memory ops
Indirect Addressing Operations to Memory

LOAD INDIRECT with Register offset
    [Reg] <- [Memory([AddressReg2 + offset])]
    21k: R0 = dm(M4, I4)          68k: MOVE.L (A0,D1), D0
    Order is absolutely key -- dm(I4, M4) means something VERY different
Same with store operations
LOAD INDIRECT with Register + constant offset
    [Reg] <- [Memory([AddressReg2 + offset1 + offset2])]
    21k: -- NO Equivalent --      68k: MOVE.L (8, A0, D1), D0
    but wait till Lab. 2, 3 and 4 for some REALLY fancy SHARC addressing modes
Indirect Addressing Operations to Memory

LOAD INDIRECT with register post-increment
    [Reg] <- [Memory([AddressReg2])]
    [AddressReg2] <- [AddressReg2] + 4
    21k: R0 = dm(I4, M6) (with M6 preset to 1)       68k: MOVE.L (A0)+, D0
    R0 = dm(I4, 1) -- An instruction that is only useful on a Monday/Weds and our labs are on Friday and exams on Tues!
LOAD INDIRECT with register pre-decrement
    21k: R0 = dm(I4, M7) (with M7 preset to -1)      68k: MOVE.L -(A0), D0
    R0 = dm(I4, -1) -- Only useful on a Monday/Weds
    R0 = dm(I4, M15) illegal, but R0 = pm(I12, M15) is OKAY
You complete, without next slide
// long int value = 6;
// Memory[2000] = value;
// Memory[3000] = 7;
// long int *pt = &Memory[4000];
// *pt = value;
// *pt = 9;
// *pt++ = value + 1;
// *pt-- = value + 2;
Fix RISC architecture and speed Issues
#define valueR1 R1
valueR1 = 6;                    // long int value = 6;
dm(2000) = valueR1;             // Memory[2000] = value;
#define tempR0 R0
tempR0 = 7;
dm(3000) = tempR0;              // Memory[3000] = 7;
#define ptI4 I4
ptI4 = 4000;                    // long int *pt = &Memory[4000];
dm(ptI4) = valueR1;             // *pt = value;
tempR0 = 9;
dm(ptI4, M5) = tempR0;          // *pt = 9;            M5 preset to 0 by C start-up procedure
#define tempR2 R2
tempR2 = valueR1 + 1;
dm(ptI4, M6) = tempR2;          // *pt++ = value + 1;  M6 preset to +1 by C start-up procedure
tempR0 = 2;
tempR2 = valueR1 + tempR0;
dm(ptI4, M7) = tempR2;          // *pt-- = value + 2;  M7 preset to -1 by C start-up procedure
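One way to check a hand-written 21k solution is to run the original C statements and compare the final memory contents. The sketch below wraps the exercise in a function; the array size and the function name are assumptions made for the illustration, and Memory here is an ordinary C array, not the processor's memory map.

```c
#include <assert.h>

enum { MEMORY_WORDS = 5000 };    /* large enough for indices 2000 to 4001 */
long int Memory[MEMORY_WORDS];

/* Runs the statements from the exercise and returns the final pointer. */
long int *run_exercise(void)
{
    long int value = 6;
    Memory[2000] = value;
    Memory[3000] = 7;
    long int *pt = &Memory[4000];
    *pt = value;
    *pt = 9;
    *pt++ = value + 1;   /* store 7 at Memory[4000], then pt -> Memory[4001] */
    *pt-- = value + 2;   /* store 8 at Memory[4001], then pt -> Memory[4000] */
    assert(pt == &Memory[4000]);
    return pt;
}
```

After the call, Memory[2000] holds 6, Memory[3000] holds 7, Memory[4000] holds 7 (the 9 is overwritten) and Memory[4001] holds 8, with the pointer back at Memory[4000]. The assembly version should leave the same values at the same word addresses and I4 back at 4000.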
NON-NEGOTIABLE

NON-NEGOTIABLE -- means that is the way the processor is designed and you can’t fight it
NON-NEGOTIABLE -- means that if you don’t do it this way you will waste a lot of time in the labs on the simple stuff -- and lose many marks in quizzes
NON-NEGOTIABLE -- means that this is fixed, standard, life. Develop a simple PSP process to review code to make sure this stuff is not there and you can get onto the interesting stuff.
CONTRACT -- The moment the class stops making 80% of these simple errors, I will stop taking most marks off in the quizzes for the simple stuff.
Tackled today






When to use assembly code
Useful sub-set of 68K CISC instructions
Recap Effective addressing modes
Load/Store Programming style for 68K
Load/Store Architecture of 21K by comparison with 68K
21K architecture is customized for DSP
