
Program design and analysis
Designing embedded programs is more difficult and challenging than designing PC/workstation programs.
Design challenges:
- rich functionality
- meet system deadlines
- fit into restricted memory
- meet power requirements
- debugging
© 2000 Morgan Kaufman
Overheads for Computers as Components
Assembly and linking
Steps in compilation:
HLL files → compiler → assembly code
assembly code → assembler (symbolic to bit-level translation) → object code
object code → linker (executable built from many files, addresses linked, …) → executable binary
executable binary → loader → program in memory
Multiple-module programs
Programs may be composed from several files; addresses for instructions/data are assigned by the linker.
Addresses become more specific during processing:
- relative addresses are measured relative to the start of a module;
- absolute addresses are measured relative to the start of the CPU address space.
Assemblers
Major tasks:
- generate binary for symbolic instructions;
- translate labels into addresses;
- handle pseudo-ops (data, etc.).
Assembly labels:
        ORG 100        ← pseudo-op: sets the starting point in memory
label1  ADR r4,c
Two-pass assembly
Pass 1:
scan the code to determine the address of each location → generate the symbol table.
Pass 2:
assemble instructions using the label addresses computed in the first pass → generate binary instructions.
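The two passes above can be sketched in C. This is only an illustrative sketch: the names `pass1` and `lookup`, the label-array input format, and the fixed 4-location instruction size are assumptions, not the book's code.

```c
#include <assert.h>
#include <string.h>

#define INSTR_SIZE 4   /* assume each instruction occupies 4 locations */
#define MAX_SYMS   16

struct sym { char name[16]; unsigned addr; };
static struct sym symtab[MAX_SYMS];
static int nsyms = 0;

/* Pass 1: walk the source lines, tracking the program location
 * counter (PLC); record (label, PLC) for every labeled line.
 * labels[i] is the label on line i, or NULL if the line has none. */
void pass1(const char *labels[], int nlines, unsigned start)
{
    unsigned plc = start;
    for (int i = 0; i < nlines; i++) {
        if (labels[i]) {
            strcpy(symtab[nsyms].name, labels[i]);
            symtab[nsyms].addr = plc;
            nsyms++;
        }
        plc += INSTR_SIZE;   /* advance past this instruction */
    }
}

/* Pass 2 would call this to resolve each label to its address. */
unsigned lookup(const char *name)
{
    for (int i = 0; i < nsyms; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return symtab[i].addr;
    return 0;
}
```

With the labels of the slide's four-instruction example and the PLC starting at 0x7, this reproduces xx = 0xb and yy = 0x13.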
Symbol table
Assembly code (left) is scanned with the PLC, producing the symbol table (right):

        ADD r0,r1,r2
xx      ADD r3,r4,r5        xx  0x8
        CMP r0,r3
yy      SUB r5,r6,r7        yy  0x10
Symbol table generation
Use program location counter (PLC) to
determine address of each location.
Scan program, keeping count of PLC.
Addresses are generated at assembly
time, not execution time.
Symbol table example
NOTE: assume that each instruction takes 4 memory locations.

PLC=0x7          ADD r0,r1,r2
PLC=0xb     xx   ADD r3,r4,r5
PLC=0xf          CMP r0,r3
PLC=0x13    yy   SUB r5,r6,r7

Symbol table: xx = 0xb, yy = 0x13
Pseudo-operations
Pseudo-ops do not generate instructions:
ORG sets the program location.
EQU generates a symbol table entry without advancing the PLC.
Example:
        add r0, r1, r2
FOO     EQU 5
BAZ     SUB r3,r4,#FOO
- The label FOO=5 is added to the symbol table.
- The address of BAZ remains the same as when the 2nd statement is not present.
Linking
Programs are written as several smaller pieces.
Programs may use library routines (.o files).
The linker combines several object modules into a single executable module.
Jobs:
- put modules in order;
- resolve labels across modules.
Externals and entry points

File 1:
        ADD r1,r2,r3
        B a            ← external reference
        ADR r4,yyy     ← external reference

File 2:
xxx     ADD r3,r4,r5
a       %1             ← entry point
yyy     %1             ← entry point

Some labels are both defined and used in the same file; some will be defined in a single file but used elsewhere.
Module ordering
Controlling where code modules are loaded in memory is important, e.g.:
- interrupt code must be placed at precise locations in memory;
- code has to be placed in specific memory locations in EPROM/RAM.
… Module ordering
Code modules must be placed at absolute positions in the memory space.
A load map or linker flags (given by the user) control the order of modules:
module1 → module2 → module3
Dynamic linking
Some operating systems link modules dynamically at run time:
- shares one copy of a library among all executing programs → saves storage space, since separate copies of commonly used routines (such as I/O) need not be linked into every executable program;
- allows programs to be updated with new versions of libraries;
- some delay is introduced before the program starts execution.
Program design and analysis
We need to control:
- where interrupt code is placed;
- where data and instructions are placed;
- execution speed, memory/power consumption.
Topics: compilation flow; basic statement translation; basic optimizations.
Compilation: high-level → low-level code
Compilation strategy:
compilation = translation + optimization
(high-level language translation plus finding a better instruction sequence).
The compiler determines the quality of the code:
- use of CPU resources, e.g., register usage;
- memory access scheduling;
- code size.
Basic compilation phases
HLL
→ parsing, symbol table, semantic analysis
→ machine-independent optimizations (e.g., arithmetic simplifications: use one memory location for x in x = c*x)
→ machine-dependent optimizations
→ assembly
Statement translation and optimization
Source code is translated into an intermediate form such as a CDFG.
The CDFG is transformed/optimized.
The CDFG is translated into instructions with optimization decisions → machine dependent.
Arithmetic expressions
Expression: a*b + 5*(c-d)
DFG: a and b feed a * node; c and d feed a - node; that result and the constant 5 feed a second * node; the two products feed a + node. The intermediate edges (w, x, y, z) are temporary variables.
- Some machines perform memory-to-memory arithmetic.
- In many machines, e.g., ARM, we have to load variables into registers.
Arithmetic expressions, cont'd.
The DFG operations are ordered (1: a*b; 2: c-d; 3: 5*(c-d); 4: the final +) and code is generated:

        ADR r4,a       ; 1: a*b
        MOV r1,[r4]
        ADR r4,b
        MOV r2,[r4]
        MUL r3,r1,r2
        ADR r4,c       ; 2: c-d
        MOV r1,[r4]
        ADR r4,d
        MOV r5,[r4]
        SUB r6,r1,r5
        MUL r7,r6,#5   ; 3: 5*(c-d)
        ADD r8,r7,r3   ; 4: sum

An obvious optimization: reuse a register whose value is no longer needed.
Control code generation
if (a+b > 0)
    x = 5;
else
    x = 7;
CDFG: a decision node tests a+b>0; the true branch is x=5, the false branch is x=7.
Control code generation, cont'd.
The CDFG (block 1: the test a+b>0; block 2: x=5; block 3: x=7) is walked and code is generated:

        ADR r5,a        ; block 1: test
        LDR r1,[r5]
        ADR r5,b
        LDR r2,[r5]
        ADD r3,r1,r2
        BLE label3
        LDR r3,#5       ; block 2: x=5
        ADR r5,x
        STR r3,[r5]
        B stmtent
label3  LDR r3,#7       ; block 3: x=7
        ADR r5,x
        STR r3,[r5]
stmtent ...
Procedure linkage
The CPU's subroutine call is not usually sufficient to support procedures in programming languages.
Need code to:
- call and return;
- pass parameters and results.
Parameters and return values are passed on the stack.
Procedures with few parameters may use registers.
A typical procedure linkage: f(arg_0, …, arg_n-1)
- Push arg_n-1 (onto stack)
- …
- Push arg_1
- Push arg_0
- Store the return address in a register
- Branch to the procedure → B (f, return address)
- De-allocate arguments: pop n words off the stack
- Use a register for the return value of f(.)
Stack-based management (handled by the compiler)
A block of memory called a "stack frame" is required for every call of a function.
The stack grows and shrinks during execution as SP moves through the total stack space, so the worst-case usage is difficult to predict at compile time.
SP
Procedure stacks
proc1(int a) {
int b, c;
b=5;
proc2(b);
c=2+b;
…
}
growth
proc1
FP
frame pointer: defines
end of last frame
SP
stack pointer: defines
end of current frame
© 2000 Morgan
Kaufman
proc2
5
accessed relative to SP
Overheads for Computers as
Components
ARM procedure linkage
APCS (ARM Procedure Call Standard), used with BL foo:
- r0-r3 pass parameters into the procedure; extra parameters are put on the stack frame.
- r0 holds the return value.
- r4-r7 hold register variables.
- r11 is the frame pointer, r13 is the stack pointer.
- r10 holds the limiting address on stack size, used to check for stack overflows.
8051, PIC: static allocation (no runtime memory allocation)
How each byte of RAM is used is established at compile time.
Global and static data are allocated in fixed locations.
Local data is stored in a block set aside for each function, i.e., a local variable x is stored in the same place for each invocation → non-reentrant code.
Data layout in memory: 1-D arrays, 2-D arrays, structures
The compiler must translate references to data structures into references to raw memory → address computations.
Different types of data structures use different data layouts.
Some offsets into a data structure can be computed at compile time; others must be computed at run time.
One-dimensional arrays
A C array name points to the 0th element:
aptr → a[0]
a[1] = *(aptr + 1)
a[2] = *(aptr + 2)
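The pointer view of a 1-D array can be checked directly in C; `a_demo` and its values are illustrative, not from the slides.

```c
/* A C array name decays to a pointer to element 0, so a[1] and
 * *(aptr + 1) denote the same access. */
int a_demo(void)
{
    int a[3] = { 10, 20, 30 };
    int *aptr = a;        /* aptr -> a[0] */
    return *(aptr + 1);   /* same element as a[1] */
}
```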
Two-dimensional arrays: different possibilities
A 2-D array is accessed through a 1-D array.
Row-major layout for an N x M array: row 0 (a[0,0], a[0,1], …) is followed by row 1 (a[1,0], a[1,1], …), and so on.
a[i,j] = a[i*M + j]
The index calculation is done at run time, as i and j change.
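A minimal sketch of the row-major index computation, assuming hypothetical fixed dimensions N=3, M=4; the compiler emits exactly this i*M+j arithmetic for each 2-D access.

```c
enum { N = 3, M = 4 };   /* illustrative array dimensions */

/* Element (i,j) of a row-major N x M array is element i*M + j of
 * the underlying 1-D block of memory. */
int rowmajor_index(int i, int j)
{
    return i * M + j;
}
```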
Structures
Fields within structures are at static offsets:
struct {
    int field1;    /* 4 bytes */
    char field2;
} mystruct;
struct mystruct a, *aptr = &a;
aptr points to field1; a.field2 lives at *(aptr + 4), with the addition done at compile time.
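The static-offset claim can be checked with C's offsetof. The struct mirrors the slide's mystruct; the expected 4-byte offset of field2 assumes a 4-byte int (the usual case, but an assumption).

```c
#include <stddef.h>

/* Field offsets are fixed at compile time, so aptr->field2 lowers
 * to a load at a constant byte offset from aptr. */
struct mystruct {
    int  field1;   /* starts at offset 0 */
    char field2;   /* starts right after field1 */
};
```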
Expression simplification: machine independent
Constant folding:
8 + 1 = 9, e.g., the addition in for (i=0; i<NOPS+1; i++) … is done at compile time.
Algebraic:
a*b + a*c = a*(b+c)
Strength reduction:
a*2 → a<<1
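A sketch of the strength-reduction substitution; the function names are illustrative. For non-negative values the two forms compute the same result, which is why the compiler can swap a multiply for a shift.

```c
/* Strength reduction: replace an expensive multiply with a cheap
 * shift when the multiplier is a power of two. */
int times2_mul(int a)   { return a * 2; }
int times2_shift(int a) { return a << 1; }   /* same value for a >= 0 */
```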
Dead code elimination
Dead code:
#define DEBUG 0    /* compile flag */
if (DEBUG) dbg(p1);
With DEBUG folded to the constant 0, the call dbg(p1) is unreachable; it can be eliminated by analysis of control flow and constant folding.
Procedure inlining
Eliminates procedure linkage overhead:
int foo(a,b,c) { return a + b - c; }
z = foo(w,x,y);
→ Inlining result: z = w + x - y;
C++ provides an inline construct.
More code/memory size → may slow down cached systems.
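The inlining example can be checked directly: `foo` and a hand-inlined equivalent (`foo_inlined`, a hypothetical name) compute the same value, so the compiler may substitute the body for the call.

```c
/* Inlining replaces the call z = foo(w,x,y) with the body,
 * eliminating push/branch/return linkage overhead. */
static inline int foo(int a, int b, int c) { return a + b - c; }

/* What the compiler effectively generates at the call site. */
static inline int foo_inlined(int w, int x, int y) { return w + x - y; }
```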
Loop transformations
Loops are compact but can contribute to a
large fraction of computation time
Goals:
reduce loop overhead;
improve memory system performance.
Loop unrolling
Reduces loop overhead and enables some other optimizations.
for (i=0; i<4; i++)
    a[i] = b[i] * c[i];    /* iteration and test contribute to overhead */
→ a[0]=b[0]*c[0]; …; a[3]=b[3]*c[3];
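The unrolled loop written out as a C function (`mul4` is an illustrative name): the four straight-line assignments replace four test-and-increment iterations.

```c
/* Fully unrolled version of: for (i=0; i<4; i++) a[i] = b[i]*c[i];
 * No index increment or loop test remains in the generated code. */
void mul4(int a[4], const int b[4], const int c[4])
{
    a[0] = b[0] * c[0];
    a[1] = b[1] * c[1];
    a[2] = b[2] * c[2];
    a[3] = b[3] * c[3];
}
```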
Loop fusion
Fusion combines two loops into one:
for (i=0; i<N; i++) a[i] = b[i] * 5;
for (j=0; j<N; j++) w[j] = c[j] * d[j];
→ for (i=0; i<N; i++) {
      a[i] = b[i] * 5; w[i] = c[i] * d[i];
  }
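A sketch of the fused loop as a function; `fused` and the fixed size FN (standing in for the slide's N) are illustrative assumptions.

```c
enum { FN = 4 };   /* hypothetical loop bound */

/* Both computations share one loop's test and increment. */
void fused(int a[FN], const int b[FN],
           int w[FN], const int c[FN], const int d[FN])
{
    for (int i = 0; i < FN; i++) {
        a[i] = b[i] * 5;
        w[i] = c[i] * d[i];
    }
}
```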
Register allocation
Goals:
choose register to hold each variable;
determine lifespan of variable in the
register.
Register lifetime graph
w = a + b;  (t=1)
x = c + w;  (t=2)
y = c + d;  (t=3)
Naive allocation: assign each variable to a separate register → 7 registers (a, b, c, d, w, x, y).
Lifetime graph over t=1,2,3: at most 3 variables are live at any time → 3 registers required.
Using your compiler
Understand the various optimization levels (-O1, -O2, etc.).
Study the assembly language output.
Program design and
analysis
Optimizing for execution time.
Optimizing for energy/power.
Optimizing for program size.
Motivation
Embedded systems must often meet
deadlines.
Faster may not be fast enough.
Need to be able to analyze execution
time.
Worst-case, not typical.
Need techniques for reliably improving
execution time.
Run times will vary
Program execution times depend on several factors:
- input data values → different execution paths;
- state of the instruction and data caches;
- pipelining effects;
- other effects: e.g., floating-point operations are among the most sensitive to data values.
Measuring program speed
CPU simulator:
- I/O may be hard;
- may not be totally accurate.
Hardware timer → start/stop at the beginning/end of the code:
- requires a board and an instrumented program.
Logic analyzer:
- limited logic analyzer memory buffer;
- relies on the code being able to produce identifiable bus events.
Program performance metrics
Average-case: for typical data values, whatever they are.
Worst-case: for any possible input set; needed for meeting deadlines.
Best-case: for any possible input set.
Too-fast programs may cause critical races at the system level → timing requirements should be met as well.
Performance analysis
Elements of program performance (Shaw):
execution time = program path + instruction timing.
The path is the sequence of instructions executed and depends on data values.
Instruction timing depends on data dependencies and pipeline/cache behavior.
Program paths: longest path != longest execution time
Consider the for loop:
i=0; f=0;
for (i=0, f=0; i<N; i++)
    f = f + c[i]*x[i];
- The loop initiation block is executed once.
- The loop test (i<N) is executed N+1 times.
- The loop body and variable update (i = i+1) are executed N times.
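The execution counts can be verified by instrumenting the loop; `loop_counts` is a hypothetical helper, and the c[] and x[] data are dummies.

```c
/* Count how often the loop test and the loop body execute for a
 * given N (n <= 8 assumed, to fit the dummy arrays). */
void loop_counts(int n, int *tests, int *bodies)
{
    int c[8] = {0}, x[8] = {0};   /* dummy data */
    int f = 0, i;
    *tests = 0;
    *bodies = 0;
    for (i = 0; ; i++) {
        (*tests)++;               /* the test i<N runs here */
        if (!(i < n)) break;
        (*bodies)++;              /* body + variable update */
        f = f + c[i] * x[i];
    }
    (void)f;
}
```

For N=4 this gives 5 test executions and 4 body executions, matching the N+1 / N counts above.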
Loop optimizations
Loops are good targets for optimization.
Basic loop optimizations:
code motion;
induction-variable elimination;
strength reduction (x*2 -> x<<1).
Code motion: moving unnecessary code out of a loop
for (i=0; i<N*M; i++)
    z[i] = a[i] + b[i];
The product N*M does not change inside the loop, yet it is recomputed at every loop test. Move it out of the loop:
i=0; X = N*M;
while (i < X) {
    z[i] = a[i] + b[i];
    i = i+1;
}
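A sketch of the hoisted version as a C function (`copy_hoisted` is an illustrative name): the bound n*m is computed once before the loop instead of at every test.

```c
/* Code motion applied to: for (i=0; i<n*m; i++) z[i] = a[i]+b[i]; */
void copy_hoisted(int *z, const int *a, const int *b, int n, int m)
{
    int limit = n * m;   /* hoisted out of the loop */
    for (int i = 0; i < limit; i++)
        z[i] = a[i] + b[i];
}
```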
Induction variable
elimination
Induction variable: value derived from
loop iteration, e.g., loop index.
Consider loop:
for (i=0; i<N; i++)
for (j=0; j<M; j++)
z[i,j] = b[i,j];
Rather than re-compute i*M+j for each
array in each iteration, share induction
variable between arrays, increment at end
of loop body.
Induction variable elimination
for (i=0; i<N; i++)
    for (j=0; j<M; j++) {
        zbinduct = i*M + j;
        *(zptr+zbinduct) = *(bptr+zbinduct);
    }
OR: eliminate the multiplication (strength reduction):
zbinduct = 0;
for (i=0; i<N; i++)
    for (j=0; j<M; j++) {
        *(zptr+zbinduct) = *(bptr+zbinduct);
        zbinduct++;
    }
Performance optimization hints
Simple high-level statements may be time consuming, e.g., dynamic memory allocation with malloc().
Use registers efficiently.
Use page-mode memory accesses → arrange data contiguously so that accesses fall within one page.
Analyze cache behavior.
Energy/power optimization
Energy: ability to do work; most important in battery-powered systems.
Power: energy per unit time; important even in wall-plug systems, since power becomes heat → controlling power increases reliability and reduces cost.
Memory accesses are a major component of power consumption → reduce memory accesses.
Turn off parts of the system when not needed.
Measuring energy consumption
Execute a small loop and measure the current I drawn by the CPU with an ammeter:
while (TRUE)
    a();
Calculate power as: (power with a() in the while loop) minus (power without a() in the while loop).
Sources of energy consumption
Relative energy per operation:
- memory transfer: 33 (most expensive)
- external I/O: 10
- SRAM write: 9
- SRAM read: 4.4
- multiply: 3.6
- add: 1
- register access: <1 (most efficient)
Cache behavior is important
The cache is power hungry (it is built from SRAM).
Energy consumption has a sweet spot as cache size changes:
- cache too small: the program thrashes, burning energy on external memory accesses;
- cache too large: the cache itself burns too much power.
→ There is an optimum cache size.
Optimizing for energy
First-order optimization:
high performance = low energy.
Not many instructions trade speed for
energy.
Generally speaking: Making program run
faster also reduces energy consumption.
Optimizing for program size
Goals:
- reduce hardware cost of memory;
- reduce power consumption of memory units.
Two opportunities:
- data;
- instructions.
Data size minimization
Reuse constants, variables, and data buffers in different parts of the code; this requires careful verification of correctness.
Select buffer sizes carefully.
Generate data on-the-fly using instructions → although the code takes space, there may be a net space saving.
Reducing code size
Avoid function inlining.
Encapsulate functions in subroutines → because of the linkage overhead, there is a minimum function body size for which a subroutine makes sense.
Choose a CPU with compact instructions.
Use specialized instructions where possible → e.g., multiply-accumulate is smaller and faster than separate arithmetic operations.
Program design and
analysis
Program validation and testing.
Goals
Make sure the software works as intended.
We will concentrate on functional testing; performance testing is harder.
What tests are required to adequately test the program?
Basic testing procedure
Provide the program with inputs.
Execute the program.
Compare the outputs to expected results.
Types of software testing
Black-box: tests are generated without
knowledge of program internals.
Clear-box (white-box): tests are
generated from the program structure.
Program design and
analysis
Software modem.
Theory of operation
Frequency-shift keying (FSK): separate frequencies represent 0 and 1, and the transmitted waveform switches between the two frequencies over time.
FSK encoding
Generate waveforms based on the current bit: the bit stream (e.g., 0110101) drives a bit-controlled waveform generator.
FSK decoding
A/D converter → zero filter → detector → 0 bit
A/D converter → one filter → detector → 1 bit
Each filter passes the frequencies in the range representing its bit value and rejects those representing the other.
Transmission scheme
Send data in 8-bit bytes, with arbitrary spacing between bytes.
Each byte starts with a 0 start bit; its length identifies it as the start bit.
The receiver measures the length of the start bit to synchronize itself to the remaining 8 bits; samples are taken at bit midpoints.
start (0) | bit 1 | bit 2 | bit 3 | ... | bit 8
Requirements
Inputs: analog sound input, reset button.
Outputs: analog sound output, LED bit display.
Functions:
- Transmitter: sends data from memory in 8-bit bytes plus start bit.
- Receiver: automatically detects bytes and reads bits; displays the current bit on the LED.
Performance: 1200 baud.
Manufacturing cost: dominated by the microprocessor and analog I/O.
Power: powered by AC.
Physical size/weight: small desktop object.
Specification
Class diagram:
Line-in* (input()) 1..1 Receiver (sample-in(), bit-out())
Line-out* (output()) 1..1 Transmitter (bit-in(), sample-out())
System architecture
Interrupt handlers for samples:
input and output.
Transmitter.
Receiver.
Transmitter
Waveform generation by table lookup:
float sine_wave[N_SAMP] = { 0.0, 0.5, 0.866, 1.0, 0.866, 0.5,
    0.0, -0.5, -0.866, -1.0, -0.866, -0.5, 0.0 };
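Table-lookup generation can be sketched as below. The slide only gives the table; `next_sample` and the integer phase/step scheme are illustrative assumptions. Stepping through the table at different rates produces the two FSK tones.

```c
#define N_SAMP 13
static const float sine_wave[N_SAMP] = {
    0.0f, 0.5f, 0.866f, 1.0f, 0.866f, 0.5f, 0.0f,
    -0.5f, -0.866f, -1.0f, -0.866f, -0.5f, 0.0f
};

/* Return the current sample and advance the phase by `step` table
 * entries, wrapping at the end of the table. A larger step gives a
 * higher output frequency for the same sample rate. */
float next_sample(int *phase, int step)
{
    float s = sine_wave[*phase];
    *phase = (*phase + step) % N_SAMP;
    return s;
}
```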
Receiver
Filters (FIR for simplicity) use circular
buffers to hold data.
Timer measures bit length.
State machine recognizes start bits, data
bits.
Hardware platform
CPU.
A/D converter.
D/A converter.
Timer.
Component design and
testing
Easy to test transmitter and receiver on
host.
Transmitter can be verified with speaker
outputs.
Receiver verification tasks:
start bit recognition;
data bit recognition.
System integration and testing
Use loopback mode to test components against each other.
Loopback can be done in software or by connecting the D/A and A/D converters.