DAC Presentation kit

Matlab Extensions for the
Development, Testing and Verification
of Real-Time DSP Software
David P. Magee
Communication Systems Engineer
Texas Instruments
Dallas, TX
Presentation Outline
 DSP Software Development
 DSP Simulator
 Introduction to Intrinsics
 FFT Example
 Algorithm Optimization Results
 Other Matlab and Simulink Extensions
 Closing Remarks
 Q&A
DSP Software Development
 Common steps for DSP software development
Step 1: Develop Understanding
 Develop Floating Point Simulation
 Debug Simulation
Step 2: Address Scaling Issues
 Develop Fixed Point Simulation
 Debug Simulation
Step 3: Optimize for Performance
 Develop Assembly Code
 Debug Assembly Code
Issues with the 3 Step Approach
 Each step takes time and resources
 Algorithm testing at each stage
 Multiple versions of the algorithm – version control
headaches
 Evaluation of processor instruction set
compatibility and MIPS requirements often occurs
late in the software development cycle
 Debugging algorithms on a pipelined and/or
parallel processor can be very difficult (the
problem is getting more difficult as processors
become more complicated)
Can the development cycle be improved ? Yes !
Improved Software Development Cycle
 Merge Steps 2 and 3
Step 1: Develop Understanding
 Develop Floating Point Simulation
 Debug Simulation
Step 2: Address Scaling Issues and Optimize for Performance Simultaneously
 Develop Fixed Point Simulation and Assembly Code Simultaneously
 Debug Simulation and Assembly Code
Question: How can these steps be combined ?
Matlab + DSP Simulator
 Develop Floating Point and Fixed Point Simulations in a single
development environment - Matlab
 Develop and test C/C++ code for Fixed Point Simulation in
cooperation with the DSP Simulator
 Migrate the C/C++ code directly to the target DSP
[Figure: block diagram with the labels Matlab Simulation Environment, System Simulation (Floating Point Simulation), System Simulation (Fixed Point Simulation), Host Environment, DSP Simulator]
DSP Simulator in Matlab
Develop and Debug Fixed Point C/C++ Code in Matlab
Benefits:
 Accelerate the development and analysis of DSP code
 A mechanism to implement your IP blocks in efficient DSP code
 Process large amounts of data
 Compare fixed point and floating point algorithm implementations
 Provide mixed simulation environment with fixed point and
floating point algorithm implementations
 Advanced graphing capabilities
[Figure: block diagram with the labels DSP Simulator, C/C++ code, MEX-file, Matlab]
What is a MEX-file ?
 A file containing one function that interfaces C/C++
code to the Matlab shell
 MathWorks specifies the syntax for this function
void mexFunction(int nlhs,mxArray *plhs[ ],
int nrhs,const mxArray *prhs[ ])
 See http://www.mathworks.com
 Enter "mex files" into their search engine
What is a DSP Simulator ?
 A library of functions that simulate the
mathematical operations of DSP assembly
instructions.
 For TI DSPs, the compiler recognizes special
functions called Intrinsics and maps them directly
into inline assembly instructions
 In the DSP Simulator, make each function represent
a supported compiler Intrinsic
Intrinsic Example
 ADD2: adds the upper 16-bit halves and the lower 16-bit halves of
two 32-bit registers independently (no carry between the halves)
 Intrinsic: dst = _add2(src1,src2)
 Assembly Instruction: ADD2 (.unit) src1,src2,dst
Example:
C code
Function Example() {
.
y = _add2(a,b);
.
}
Compile =>
C6x Assembly Code
.
ADD2 .S1 A1,A2,A3
.
DSP Simulator Example
 C Code with _add2() Intrinsic
C code
Function Example() {
.
y = _add2(a,b);
.
}
DSP Simulator
typedef struct _REG32X2
{
short lo;
short hi;
} reg32x2;
int32 _add2(int32 a,int32 b) {
int32 y;
reg32x2 *pa,*pb,*py;
pa = (reg32x2 *)&a; pb = (reg32x2 *)&b;
py = (reg32x2 *)&y;
py->lo = pa->lo+pb->lo;
py->hi = pa->hi+pb->hi;
return(y);
} // end of _add2() function
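The pointer-cast version above depends on the host's byte order. As a minimal sketch, assuming only that the two 16-bit lanes are added independently with wraparound, the same arithmetic can be written with shifts and masks. The name `sim_add2` is hypothetical and is not part of the DSP Simulator.

```cpp
#include <cstdint>

// Hypothetical helper (not from the DSP Simulator): portable model of the
// ADD2 arithmetic. Each 16-bit lane is added separately, so a carry out of
// the lower lane never reaches the upper lane.
std::uint32_t sim_add2(std::uint32_t a, std::uint32_t b)
{
    std::uint32_t lo = (a + b) & 0x0000FFFFu;                  // lower lane, wraps at 16 bits
    std::uint32_t hi = ((a >> 16) + (b >> 16)) & 0x0000FFFFu;  // upper lane, wraps at 16 bits
    return (hi << 16) | lo;
}
```

For example, `sim_add2(0x0001FFFF, 0x00010001)` gives `0x00020000`: the lower lane wraps to zero and the upper lane is unaffected by that carry.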
DSP Simulator
 How many Intrinsics exist for each DSP family ?
TMS320C54x: 36
TMS320C55x: 42
TMS320C62x: 59
TMS320C64x: 135
TMS320C64x+: 162
TMS320C67x: 68
Most algorithms previously written in assembly code can now be
expressed in C/C++ code with Intrinsic function calls
DSP Simulator
 Consists of two files
 C6xSimulator.c
 C6xSimulator.h
 Contains C functions for representing the
numerical operations of 158 DSP assembly
instructions
 Can control endianness with a symbolic constant
DSP Simulator and C++
 DSP Simulator works in C++ programming
environments
 Partition data into appropriate types (real, complex)
and bit widths (8/16/32 bits)
 Write functions in C++
 Use operator overloading for required data types to
map operators to the desired Intrinsic functions
Benefit: Operator overloading allows for easy migration to next
generation DSP instruction sets
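The operator overloading idea can be sketched as follows. Everything named here is an assumption for illustration: `Complex16`, `make_complex16` and `sim_add2` are hypothetical, and `sim_add2` merely stands in for the simulator's `_add2()` so that the block compiles on its own.

```cpp
#include <cstdint>

// Hypothetical stand-in for the simulator's _add2(): lane-wise 16-bit add.
inline std::uint32_t sim_add2(std::uint32_t a, std::uint32_t b)
{
    std::uint32_t lo = (a + b) & 0x0000FFFFu;
    std::uint32_t hi = ((a >> 16) + (b >> 16)) & 0x0000FFFFu;
    return (hi << 16) | lo;
}

// Hypothetical data type: a complex value with 16-bit real and imaginary
// parts packed into one 32-bit word (real part in the upper half).
struct Complex16 {
    std::uint32_t bits;
};

inline Complex16 make_complex16(std::int16_t re, std::int16_t im)
{
    std::uint32_t hi = static_cast<std::uint16_t>(re);
    std::uint32_t lo = static_cast<std::uint16_t>(im);
    return Complex16{(hi << 16) | lo};
}

// operator+ maps complex addition onto a single packed add, so algorithm
// code can write a + b without naming the Intrinsic.
inline Complex16 operator+(Complex16 a, Complex16 b)
{
    return Complex16{sim_add2(a.bits, b.bits)};
}
```

If a later instruction set offers a better packed add, only the body of `operator+` would need to change; code written as `a + b` stays the same.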
Using the DSP Simulator
 Develop C/C++ code with Intrinsic function calls
 Compile and link the C/C++ code and the DSP
Simulator to form a Matlab executable file
 Debug and evaluate the performance of the fixed
point algorithms in Matlab
 Rely on TI tools to generate an optimized assembly
version of the C/C++ code for the target DSP
Benefit: One version of C/C++ code runs in Matlab
and in the target DSP !
Migrating C/C++ Code to the DSP
 How does it work ?
C/C++ code can directly access DSP assembly instructions
without actually writing assembly code
Benefit: Eliminate headaches associated with assembly
programming
Pipeline scheduling
Register allocation
Unit allocation
Stack manipulation
Parallel instruction debug
Conclusion: Make the compiler do the hard work !
When is the C/C++ Code Optimized ?
 Look at compiler report in the assembly file to
determine unit loading.
 Look at the assembly code. Are all the units being
used each cycle ?
 Try to balance loading by using a different sequence of
Intrinsics to perform the same overall mathematical
operation, e.g. X * 4 => X << 2
 May require manual unrolling of loops.
 Determine the ideal number of MAC operations for
an algorithm and compare it to the compiler report
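The MAC comparison in the last point can be made concrete with a little arithmetic. The sketch below assumes a peak rate of `macs_per_cycle` multiply-accumulates per cycle, supplied by the caller (it is a parameter, not a figure from the presentation), and compares the resulting lower bound with the kernel cycle count read from the compiler report. Both function names are hypothetical.

```cpp
#include <cstdint>

// Lower bound on kernel cycles: ceil(total_macs / macs_per_cycle).
// macs_per_cycle is an assumed peak rate for the target DSP.
std::uint64_t ideal_cycles(std::uint64_t total_macs, std::uint64_t macs_per_cycle)
{
    return (total_macs + macs_per_cycle - 1) / macs_per_cycle;
}

// Fraction of the ideal throughput that the compiled kernel achieves
// (1.0 means the compiler report matches the lower bound).
double mac_efficiency(std::uint64_t total_macs, std::uint64_t macs_per_cycle,
                      std::uint64_t reported_cycles)
{
    return static_cast<double>(ideal_cycles(total_macs, macs_per_cycle)) /
           static_cast<double>(reported_cycles);
}
```

A kernel with 64 MACs at an assumed 4 MACs per cycle has a lower bound of 16 cycles; if the compiler reports 20 cycles, the efficiency is 0.8 and there may still be room to rebalance the units.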
Limitations
 DSP software engineer must perform algorithm
mapping from floating point to fixed point manually
 ranges for floating point values
 fixed point scaling issues
 saturation issues
 DSP software architecture is limited to the
creativity of the software engineer
Recommendation: Develop an automated tool that
converts Matlab/Simulink floating
point files to fixed point DSP C/C++
code using the programming
guidelines discussed in the paper.
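The manual mapping steps listed above (value ranges, scaling, saturation) can be illustrated with a minimal sketch. It assumes a Q15 format (1 sign bit, 15 fraction bits) for values in [-1, 1); the presentation does not name a specific format, and the function names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Convert a floating point value to Q15: scale by 2^15, round to nearest,
// then saturate to the 16-bit range instead of letting the value wrap.
std::int16_t float_to_q15(double x)
{
    double scaled = std::round(x * 32768.0);
    if (scaled > 32767.0)  return 32767;    // saturate positive overflow
    if (scaled < -32768.0) return -32768;   // saturate negative overflow
    return static_cast<std::int16_t>(scaled);
}

// Convert back to floating point, e.g. to measure the quantization error.
double q15_to_float(std::int16_t q)
{
    return static_cast<double>(q) / 32768.0;
}
```

Note that +1.0 is not representable in this format and saturates to 32767, which is one reason the range of every floating point signal has to be known before the scaling is chosen.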
FFT Example
Developed an FFT for the C64x DSP architecture
Briefly discuss
 FFT Functions
 FFT Simulation File
 Comparison between hand coded assembly
and C code with Intrinsics
 Software development time
 Software performance
FFT Functions
The FFT functions
 Main FFT function
 First FFT stage
 Radix-2 stage
 Radix-4 stage
 Last FFT stage
Example: Radix-2 stage
 Uses _mpyhir() and _mpylir() Intrinsics
// inside the Radix-2 stage
for(k=Nover2;k>0;k--)
{
.
// compute the real part
// (x0.real-x1.real)*w1.real
reg2 = _mpyhir(w1,reg1real);
// (x0.imag-x1.imag)*w1.imag
reg3 = _mpylir(w1,reg1imag);
reg2 -= reg3;
// compute the imag part
// (x0.imag-x1.imag)*w1.real
reg4 = _mpyhir(w1,reg1imag);
// (x0.real-x1.real)*w1.imag
reg5 = _mpylir(w1,reg1real);
reg4 += reg5;
.
}
Note: Twiddle factor indexing not shown in this Example
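A minimal sketch of how the two Intrinsics in the Radix-2 stage might be modeled on a host. It assumes each one multiplies a signed 16-bit half of the first operand by the signed 32-bit second operand, rounds by adding 2^14, and shifts right by 15; that behavior should be confirmed against the TI instruction set reference. The names `sim_mpyhir`, `sim_mpylir` and `radix2_twiddle`, and the packing of the twiddle factor (real part in the upper half, imaginary part in the lower half), are assumptions for illustration.

```cpp
#include <cstdint>

// Assumed model: (16-bit half of src1) * (32-bit src2), rounded by adding
// 2^14 and shifted right by 15. Right-shifting a negative value is assumed
// to be an arithmetic shift, as on mainstream compilers.
static std::int32_t mul16x32_round(std::int16_t half, std::int32_t src2)
{
    std::int64_t product = static_cast<std::int64_t>(half) * src2;
    return static_cast<std::int32_t>((product + 0x4000) >> 15);
}

// Upper 16 bits of src1 times src2.
std::int32_t sim_mpyhir(std::uint32_t src1, std::int32_t src2)
{
    std::int16_t hi = static_cast<std::int16_t>(static_cast<std::uint16_t>(src1 >> 16));
    return mul16x32_round(hi, src2);
}

// Lower 16 bits of src1 times src2.
std::int32_t sim_mpylir(std::uint32_t src1, std::int32_t src2)
{
    std::int16_t lo = static_cast<std::int16_t>(static_cast<std::uint16_t>(src1 & 0xFFFFu));
    return mul16x32_round(lo, src2);
}

// Multiply the difference (diff_real + j*diff_imag) by the packed twiddle w,
// using the same four products as the Radix-2 loop.
void radix2_twiddle(std::uint32_t w, std::int32_t diff_real, std::int32_t diff_imag,
                    std::int32_t &out_real, std::int32_t &out_imag)
{
    out_real = sim_mpyhir(w, diff_real) - sim_mpylir(w, diff_imag);
    out_imag = sim_mpyhir(w, diff_imag) + sim_mpylir(w, diff_real);
}
```

With `w = 0x40004000` (0.5 + 0.5j in Q15) and a difference of 1000 + 2000j, this model returns -500 + 1500j, which matches the floating point product.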
FFT Simulation File
The simulation file is a Matlab script file
 Performs the simulation
 Calls the floating point Matlab FFT function fft()
 Calls the fixed point FFT function ti_fft()
 Compares the frequency responses of fixed point
and floating point FFTs in Matlab
 Computes the SNR, NSR, etc. using Matlab
% test_fft.m
% initialize some parameters
Nin = 64;
N = 128;
NumFFTs = 1000;
% create a random input, one signal per row
h = rand(NumFFTs,Nin);
% zero-pad each row from Nin to N samples
h = [h,zeros(NumFFTs,N-Nin)];
% compute FFT of each row using Matlab function
Hd = fft(h,[],2);
% call the fixed point function
% (the slide passes h1dfilt, which the script never defines; h is assumed)
[H] = ti_fft(h,Nin,N);
% compute the NSR in dB scale, one value per FFT
e = Hd-H;
NSR = 10*log10(sum(abs(e).^2,2)...
./sum(abs(Hd).^2,2));
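The NSR computed in the last two lines of the script is the ratio of error energy to reference energy in dB. For readers who want to run the same check outside Matlab, here is a minimal sketch for a single FFT; the function name `nsr_db` is hypothetical.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// NSR in dB for one FFT: 10*log10( sum|ref - test|^2 / sum|ref|^2 ).
// Assumes both vectors have the same length and ref is not all zeros.
double nsr_db(const std::vector<std::complex<double>> &ref,
              const std::vector<std::complex<double>> &test)
{
    double error_energy = 0.0;
    double ref_energy = 0.0;
    for (std::size_t k = 0; k < ref.size(); ++k) {
        error_energy += std::norm(ref[k] - test[k]);  // |e|^2
        ref_energy += std::norm(ref[k]);              // |Hd|^2
    }
    return 10.0 * std::log10(error_energy / ref_energy);
}
```

A more negative result means the fixed point output is closer to the floating point reference.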
FFT Development Time
Software Development Time Comparison
 Time required to develop hand-coded assembly functions
 2-3 person months
 Time required to develop C code with Intrinsic function calls
 2-3 person weeks
Development time is reduced by a factor of 4 to 5 !
FFT Performance Comparison
Metric: Kernel sizes and cycle counts
 Kernel sizes for hand-coded assembly functions
 FirstFFTStage: 18*(N/16)
 R2Stage: 7*(N/8)
 R4Stage: 12*(N/8)
 LastFFTStage: 24*(N/16)
 Kernel sizes for C code with Intrinsic function calls
 FirstFFTStage: 19*(N/16)
 R2Stage: 8*(N/8)
 R4Stage: 14*(N/8)
 LastFFTStage: 27*(N/16)
Intrinsics performance is within about 15% of assembly (per-stage overhead from the kernel sizes listed: roughly 6% to 17%) !
Algorithm Optimization Results
Algorithm     C Intrinsics   Code Size   Assembly   Code Size
              (cycles)       (bytes)     (cycles)   (bytes)
autocorr      97             800         66         384
bit_unpack    124            192         108        192
bk_massey     302            416         262        320
ch_search     485            1056        321        864
dotprod       32             160         29         160
fir_cplx      1227           832         985        448
forney        195            864         156        864
maxval        38             128         34         128
rs_encoder    463            512         402        288
syndrome      510            1216        478        1152
vecsum        45             96          40         96
In most cases, Intrinsics cycle counts are within roughly 15% of assembly (7 of the 11 algorithms listed) !
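The comparison can be checked per algorithm with a one-line calculation. The sketch below is only the arithmetic; the function name `cycle_overhead_percent` is hypothetical.

```cpp
// Extra cycles of the Intrinsics version relative to hand-coded assembly,
// expressed as a percentage of the assembly cycle count.
double cycle_overhead_percent(unsigned intrinsics_cycles, unsigned assembly_cycles)
{
    return 100.0 * (static_cast<double>(intrinsics_cycles) - assembly_cycles) /
           assembly_cycles;
}
```

Using the cycle counts listed for each algorithm, syndrome (510 vs 478 cycles) comes out at about 6.7% and dotprod (32 vs 29 cycles) at about 10.3%.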
Matlab Function Libraries
For a particular DSP application
 The DSP Simulator emulates the
numerical behavior of the DSP
instructions
 Power User develops a library of
optimized algorithms that contain
Intrinsic function calls
 General user writes C/C++ code
that calls the optimized functions in
the library
 The user’s C/C++ code is compiled
with the DSP Simulator, the library
and the MEX-file
 User tests the algorithms for
performance, evaluates cycle
counts, etc. in Matlab
 The same C/C++ code is migrated
directly to the target DSP
[Figure: block diagram with the labels Library (Function 1, Function 2, ..., Function N), DSP Simulator, C/C++ code, MEX-file, Matlab]
Matlab Function Library Examples
Math Library: OuterProduct, VectorSum, InnerProduct
Communications Library: FIR, RS, BF, ChanEst, NoiseEst, Viterbi
Controls Library: NoiseEst, Hinf, SlidingMode, PID, ResEqu, IC
Benefit: Ability to share fixed-point DSP C/C++ code
and test vectors between multiple users
Closing Remarks
DSP Simulator Benefits
 Develop fixed point DSP code in Matlab
 Easily compare floating point and fixed point algorithm
implementations in Matlab
 Bit-true, fixed point simulations
 Reduce software development time by a factor of 4 to 5
 Incorporate DSP code into higher level system simulations
 Debugging code in Matlab is easier than in a real-time system
 Easily evaluate/predict MIPS requirements
 Run the same C/C++ source code in Matlab and in the DSP
 Easily migrate algorithms to new DSP instruction sets
 Develop software before next generation DSPs are available
Q&A
 Thanks for attending my presentation !