EE249 Discussion Section

A Method for Architecture Exploration for Heterogeneous Signal Processing Systems
Sam Williams
EE249 Discussion Section
October 15, 2002
Related Work – System Level Modeling and Analysis
• Polis/CFSMs
– Elements are mapped to hardware and software components
– Performance evaluated via simulation
– Hardware/Software synthesis
• Chinook
– Design of embedded systems
– Mapping to IP blocks
– Synthesized communication
• RASSP System
– VHDL modeling of DSPs
– ADEPT environment for hardware/software co-design
• Abstract architecture models can speed up design space exploration
• SPADE separates architecture from application models
– Functionality is not modeled in architecture
Basics – Workloads and Resources
• Applications generate workloads
– Computation
– Communication
– Storage
• Architecture provides resources
– Computation  processors, coprocessors, ASIC’s, etc…
– Communication  buses, ethernet,
specialized interfaces, etc…
– Memory  RAMs, ROMs, etc…
• A system is the realization of a graph that connects computation and memory components via communication components, together with the mapping of applications onto it
Basics – Traces
• Signals
– Logic transitions
– Hardware specific
[Figure: signals a, b, c]
• Instructions
– Specific to ISA
– RISC instructions
    dli a0,addr
    ld a1,0(a0)
    addi a1,1,a1
    sd a1,0(a0)
• Macro Instructions / Functions (extremely coarse-grain)
– iDCT
– Structure moves
    load_next_frame(frame);
    decode_frame(frame,temp);
    copy_frame_to_buffer(temp);
    update();
    …
Architecture Modeling
• Functional models not required
• Data dependent behavior results in data
dependent traces
• Built from library of components
• Processing Resource:
– Trace Driven Execution Unit (TDEU) = trace interpreter
• Table of latencies for each instruction
• Could be extended for other metrics (power, cost, etc…)
– Some number of communication interfaces
• Each translates the generic internal protocol to a specific one
• Other resources include buses and memories
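
The trace interpreter idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the instruction names and latency values are invented, and the function name `run_trace` is hypothetical, not something from the paper or the talk.

```python
from collections import Counter

# Hypothetical latency table (cycles per trace instruction); values are made up.
LATENCY = {"idct": 320, "vld": 150, "copy": 40}

def run_trace(trace, latency=LATENCY):
    """Interpret a trace: add up cycles per instruction, doing no functional work."""
    cycles = 0
    per_op = Counter()
    for op in trace:
        cost = latency[op]  # a KeyError means this unit has no entry for the op
        cycles += cost
        per_op[op] += cost
    return cycles, dict(per_op)

total, breakdown = run_trace(["vld", "idct", "idct", "copy"])
```

Swapping the table for one that holds energy or cost per instruction would be one way to extend the same loop to other metrics.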
Application Modeling
• Map functions to Kahn process networks
• Unbounded FIFOs – acceptable approximation
• Read/Write operations
– generate a trace entry (bytes transferred over channel)
– perform the port accesses in the Kahn process network
• Execution operation
– only generates trace entries
[Figure: functions F1, F2, F3 and Kahn processes P1, P2]
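
A minimal Python sketch of the three operations follows, assuming hypothetical `Channel` and `Process` classes (neither name comes from the talk). Read and write both move a token and log a trace entry, while execute only logs.

```python
from collections import deque

class Channel:
    """Unbounded FIFO between two processes."""
    def __init__(self):
        self.fifo = deque()

class Process:
    """Hypothetical process wrapper that records a trace as a side effect."""
    def __init__(self, name):
        self.name = name
        self.trace = []

    def read(self, channel, nbytes):
        # A real Kahn read blocks on an empty channel; this sketch assumes
        # data is already present (popleft raises IndexError otherwise).
        token = channel.fifo.popleft()
        self.trace.append(("read", nbytes))
        return token

    def write(self, channel, token, nbytes):
        channel.fifo.append(token)
        self.trace.append(("write", nbytes))

    def execute(self, op):
        # Trace entry only; no data moves.
        self.trace.append(("execute", op))

ch = Channel()
p1, p2 = Process("P1"), Process("P2")
p1.execute("decode")
p1.write(ch, token="frame0", nbytes=1024)
frame = p2.read(ch, nbytes=1024)
```

After this run, `p1.trace` holds an execute entry followed by a write entry, and `p2.trace` holds a single read entry.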
Mapping, Simulation, and Analysis
• Mapping
– Processes are mapped to a TDEU (n to 1)
– Ports are mapped to interfaces of the TDEU (1 to 1)
• Simulation
– Application and Architectural models are co-simulated
– Traces are generated on the fly
– Performance numbers are obtained by executing the traces on the architecture model
• Analysis
– Utilization, Stalls, Latencies, Bandwidth
– Could add power, area, cost, etc…
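
To make the n-to-1 mapping and the utilization metric concrete, here is a small Python sketch. The process names, unit names, and cycle counts are all invented for illustration; a real run would take them from the co-simulation.

```python
# Hypothetical mapping: several processes share one TDEU (n to 1).
mapping = {"P1": "cpu0", "P2": "cpu0", "P3": "dsp0"}

# Hypothetical busy cycles per process, as a simulation might report them.
busy_cycles = {"P1": 400, "P2": 200, "P3": 900}

def utilization(mapping, busy_cycles, total_cycles):
    """Fraction of simulated time each unit spent executing trace entries."""
    busy = {}
    for proc, unit in mapping.items():
        busy[unit] = busy.get(unit, 0) + busy_cycles[proc]
    return {unit: cycles / total_cycles for unit, cycles in busy.items()}

util = utilization(mapping, busy_cycles, total_cycles=1000)
```

Stalls, latencies, and bandwidth could be aggregated per unit in the same way.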
The Y-Chart
• Applications and architecture are clearly separable
• Several applications will be run on this system
• Representative applications are collected
• Designer makes a best guess at architecture
• System is evaluated by mapping each application to the architecture, simulating, and analyzing resulting numbers
• Designer then redesigns architecture and/or
applications and repeats the mapping/simulation flow
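
The evaluate-and-repeat loop can be sketched as below. The `simulate` function is a stand-in with a made-up performance model, and the application and architecture parameters are hypothetical. In the flow described above the redesign step is done by the designer, so enumerating candidates up front is a simplification.

```python
def simulate(app, arch):
    """Stand-in for mapping plus co-simulation; returns frames dropped (made-up model)."""
    return max(0, app["load"] - arch["processors"] * arch["throughput"])

def explore(apps, candidates):
    """Evaluate every candidate architecture against every application."""
    results = {}
    for name, arch in candidates.items():
        results[name] = sum(simulate(app, arch) for app in apps)
    best = min(results, key=results.get)
    return best, results

apps = [{"load": 30}, {"load": 50}]
candidates = {
    "A": {"processors": 2, "throughput": 10},
    "B": {"processors": 4, "throughput": 10},
}
best, results = explore(apps, candidates)
```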
Y-Chart (continued)
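[Figure: Y-chart flow. Inputs: spec and blocks (architecture model); applications in C/C++ (application models). Both feed mapping, then simulation and analysis. Feedback arcs: remap, repartition, rearchitect. Function|latency table sources: cycle accurate simulator, databook, simulations, guesses.]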
MPEG2 Example
• C code was partitioned and mapped to a Kahn process network
• The network was run standalone to gather operation frequencies and bandwidth requirements
• Mapped to a TriMedia MPEG2 system (10 processing elements, 33 interfaces)
• Simulated over a series of streams, bus loads, and frame periods; the resulting metric is frames dropped
• Performance simulation was about 3600 times slower than the hardware
– 300 CPU days for a 2-hour movie
– Limits analysis to short clips
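
The 300 CPU day figure follows directly from the slowdown factor, as this short Python check shows. The last line is an extra illustration, not a number from the talk.

```python
SLOWDOWN = 3600          # simulation time / real time, from the slide
movie_hours = 2

cpu_hours = movie_hours * SLOWDOWN      # 7200 CPU hours
cpu_days = cpu_hours / 24               # 300.0 CPU days

# Illustration only: seconds of video simulated per CPU hour.
clip_seconds_per_cpu_hour = 3600 / SLOWDOWN
```

At this rate one CPU hour covers one second of video, which is why only short clips are practical.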
Conclusion
✓ Easy exploration of heterogeneous programmable architectures
✓ On-the-fly trace-driven co-simulation
✓ Functionality is not required, only behavior
✓ Can be extended to analyze any number of metrics (power, cost, area, etc.), though the authors did not do so
– Frames_Dropped(x,y,z,…)=0
– Power(x,y,z,…)<25W
– Cost(x,y,z,…)<$30
× Application is partitioned by hand
× Mapping is performed by hand
× Performance characteristics of components must be simulated, known, or estimated
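
The three example constraints can be read as a feasibility test over design points. The Python sketch below assumes each design point has already been reduced to a dictionary of metrics. The design point names and metric values are invented, and only the three thresholds come from the slide.

```python
def feasible(metrics):
    """Constraints from the slide: no dropped frames, under 25 W, under $30."""
    return (metrics["frames_dropped"] == 0
            and metrics["power_w"] < 25
            and metrics["cost_usd"] < 30)

# Hypothetical design points with made-up metric values.
design_points = {
    "x1": {"frames_dropped": 0, "power_w": 18.0, "cost_usd": 27.5},
    "x2": {"frames_dropped": 3, "power_w": 12.0, "cost_usd": 19.0},
    "x3": {"frames_dropped": 0, "power_w": 31.0, "cost_usd": 24.0},
}

ok = [name for name, m in design_points.items() if feasible(m)]
```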