Basics of Product Development
Predictable Design of Embedded Systems using Networked Architectures
Henk Corporaal
www.ics.ele.tue.nl/~heco
ASCI Winterschool on Embedded Systems
Rockanje, March 2006
Outline
Trends and design problems
Unpredictability
Platforms
Predictable design
Proposed design flow
Open issues
Note: this lecture is not about a solved problem
ASCI Winterschool 2006
Henk Corporaal
(2)
Outline
Trends and design problems
Embedded systems everywhere
Design practice
Design complexity
Memory wall
Unpredictability
Platforms
Predictable design
Design flow
Open issues
Embedded systems everywhere
Convergence of 3 Cs: computers, communications and consumer electronics
The computer enters the 3rd phase: computing power - networking - intelligent processing
The world is 1 network: wherever, whenever, all information and communication available
We get a smart environment
Design practice: informal system specification
[Figure: system people write a paper spec and split it into tasks; hardware people (VHDL, Verilog) and software people (C, ASM) work separately; integration comes last]
Design practice
[Figure: Y-Chart (Gajski-Kuhn) with axes Behavioral specification, Structure description and Physical realization; abstraction levels System, Algorithm, R/T, Logic, Circuit]
Design flow is a path in the Y-chart
Down to RT level the flow is largely manual
Design complexity problem
[Figure: complexity (log scale, 10^1 to 10^3) versus year (4, 8, 12, 16)]
Process technology: +58%
HW design productivity: +21% (HW gap)
SW productivity: +8% (SW gap)
Hitting the memory wall
[Figure: performance (log scale, 1 to 1000) versus time, 1980 to 2005 [Patterson]]
µProc (CPU, “Moore’s Law”): 55%/year
DRAM: 7%/year
Processor-Memory Performance Gap: grows 50% / year
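A small worked example of the arithmetic behind the gap, taking the two growth rates on this slide at face value. The function name and the 10-year horizon are illustrative choices, not part of the lecture.

```python
def gap_growth(cpu_rate: float, dram_rate: float) -> float:
    """Yearly growth factor of the processor-memory performance gap."""
    return (1.0 + cpu_rate) / (1.0 + dram_rate)

yearly = gap_growth(0.55, 0.07)   # rates quoted on the slide
decade = yearly ** 10             # compounded over 10 years (illustrative horizon)
print(f"gap grows {100 * (yearly - 1):.0f}% per year, factor {decade:.0f} per decade")
```

With these inputs the ratio is about 1.45 per year, which is in the same range as the rounded 50% per year quoted in the figure.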
Outline
Trends and design problems
Unpredictability
Platforms
Predictable design
Proposed design flow
Open issues
Unpredictability at all levels
applications
architectures
DSM VLSI design
Uncertainty increases at all levels
Application: Two forms of unpredictability
[Figure: video processing graphs built from Video In1, Video In2, Txt gen, NR, HSRC, VSRC, mix, 100Hz, Peak, Matrix and mem blocks, with resource usage plotted over time]
Applications can be data dependent
Applications may have different scenarios
In addition: a dynamically changing set of applications
Multi-standard modem operation
Several applications have to be activated simultaneously
Too many combinations for an analysis at design time (non-deterministic events)
[Figure: compute load (0 to 125) over time for a multi-standard modem [Philips EVP]; phases shown: initial acquisition, UMTS connected, UMTS connected / WLAN acquisition, inter-system handover, WLAN connected / UMTS monitoring; tasks shown: SCH search (SCH), CPICH search, RAKE chip-rate processing, RAKE sym-rate processing, WLAN acquisition, WLAN receiver]
Architecture unpredictability
[Figure: platform with CPUs and caches ($), many IP blocks, several interconnects, a memory arbiter (mem arb.) and external memory (ext. mem)]
Local schedulers: e.g. RR, TDMA, FCFS, LRU, EDLF, FIFO, priority, …
OS: task switching, interrupts
interconnect: busses, bridges, networks
memory controllers
external memory
cache strategy, cache pollution
What is the global behavior (end-to-end), composed of interacting local solutions?
DSM VLSI Unpredictability
Global wiring delay becomes dominant over gate delay (timing closure)
[Figure: gate delay (ps) and wire delay (ps/mm), 0 to 400, versus technology (0.5, 0.35, 0.25, 0.18, 0.13, 0.1 micron)]
DSM VLSI Unpredictability
[Figure: length of isosynchronous zone as a function of frequency]
Other DSM problems:
Clock distribution, skew
VDD and VSS voltage drop
Signal integrity, cross-talk
Variance in process parameters increases
Unpredictability: Design Closure problems
Design closure = a realization meets all requirements, including functionality, speed, power, area, yield, etc., without design iterations
[Figure: application → mapping & scheduling → architecture → placement & routing → FPGA realization / VLSI realization]
Closure problem at all levels
Unpredictability: Design Closure problems
[Figure: computational requirements (0% to 1200%) versus time; the variation spans orders of magnitude]
Mapping with performance guarantees looks impossible!
Solution ingredients:
Higher abstraction levels
SW and HW IP reuse / PnP principle
Standards
Avoid large design iterations
Design correct by synthesis
Avoid worst case resource requirements
How do we achieve all of this?
Outline
Trends and design problems
Unpredictability
Platforms
Predictable design
Design flow
Open issues
What is a platform?
Definition:
A platform is a generic but domain-specific information processing (sub-)system
• Generic means that it is flexible, containing programmable component(s).
• Platforms are meant to quickly realize your next system (in a certain domain).
• Single chip?
Platforms, why?
- Reuse
- Short Time-to-Market
- High Quality
• Flexible and Programmable
• Large software component
• Standardization
• Optimized for specific domain
• and you do not have to solve this design closure problem!
Platforms separate the design communities !
SDT: system design technology
PDT: platform design technology
[Figure: layers Applications, Platform and Enabling technologies, with design technology split into SDT and PDT]
Platform examples: Digital camera
Sanyo [Okada99]
TI OMAP
Up to 192Mbyte off-chip memory
192Kbyte shared SRAM
8Kb data cache (2-way,
512 lines of 16 bytes)
Write buffer (17 elements)
16Kb (2-way)
16Kb (2-way)
8Kb mem (2x 4K)
64Kb dual port (8x 4K x 16b)
96Kb single port (12x 4k x 16b)
32Kb ROM
SpaceCake (Philips research)
Homogeneous: set of equal tiles
Per tile e.g.:
n * MIPS
m * TriMedia
Accelerators
k * L2 Cache bank
Shared memory
Cache coherency
Big interconnect switch
Inter Tile:
Router
Message passing
Working on inter tile cache coherence
[Figure: single tile with switch and L2 cache memory banks]
IMAGINE Stream Processor (Stanford)
IMAGINE = SIMD of VLIWs
It is controlled by a host processor, which sends it stream instructions (Load, store, receive, send, VLIW op, load microcode)
Hybrid FPGAs: Xilinx Virtex 4-Pro
GHz IO: Up to 16 serial transceivers
PowerPCs
Memory blocks & Multipliers
Reconfigurable logic blocks
[Figure courtesy of Xilinx (Virtex II Pro): PowerPC cores embedded in reconfigurable logic]
Fundamental platform design decisions
Homogeneous versus Heterogeneous ?
Bus versus Network ?
Shared memory versus Message passing ?
QoS support, Guarantees built-in ?
Generic versus Application specific ?
What types of parallelism to support ?
ILP, DLP, TLP
Focus on Performance, Power or Cost ?
Memory organisation ?
HW or SW reconfigurable ?
And further:
OS support, Middleware ?
Mapping support?
Homogeneous or Heterogeneous
Homogeneous:
replication effect
memory dominated anyway
solve realization issues once and for all
less flexible
Homogeneous or Heterogeneous
Heterogeneous
more flexible
better fit to application domain
smaller increments
no tile reuse
Homogeneous or Heterogeneous
Middle of the road approach
Flexible tiles
Fixed tile structure at top level
[Figure: tiles connected through routers]
HW or SW reconfigurable?
[Figure: reconfiguration time (1 cycle, loopbuffer, context, reset) versus data path granularity (fine to coarse); labeled points: FPGA (spatial mapping), VLIW (temporal mapping), subword parallelism]
Outline
Trends and design problems
Unpredictability
Platforms
Predictable design
Current practice
Predictability
Architecture consequences
Design consequences
Design flow
Open issues
How should we design ?
Trajectory, from Idea to Realization
Decisions based on models
Abstract from implementation details (not all known yet)
Relatively cheap to create, validate and simulate
[Figure: trajectory from Idea (concepts, requirements) to Realization; during design time the design problem “steers” a loop of four activities]
• Generate Ideas
• Construct Models
• Evaluate Properties
• Make Design Decisions
Current practice
Mapping, easy, but...
Given
reference C code for application, e.g. MPEG-4 Motion Estimation
platform: SUPERDUPER-LX50
Task
map application on architecture
[Figure: Idea → code fragment: a=b*5+d; for (...) {..}]
But … wait a moment
me@work> CC -o2 mpeg4_me mpeg4_me.c
Thank you for running SUPERDUPER-LX50 compiler.
Your program uses 257321886 bytes memory, 78 Watt, 428798765291 clock cycles
Current design process
[Figure: application → mapping → constraints OK? yes: done; no: iterate]
Post analysis: check constraints after mapping
Simulation based
Does it still work for other data?
Does it still work when other applications are active?
Too many iterations
Easy to program, hard to tune
Can this be improved? e.g. Constraints = input
Predictable design
What is it?
Being able to reason at a high level about a design (in terms of functional and non-functional properties) and
Being able to realize this design without time consuming iterations in the design flow (design closure)
How:
Predictable architecture
Making resources predictable
Proper modeling of less predictable elements
Predictable design flow
Compositionality
Composability
Design time analysis
Run time analysis
Making architectures predictable
Getting rid of all unpredictable elements
Caches ?
No problem, but the WCET estimate may be large and unacceptable!
Software controlled
locked cache lines
non-cacheable memory
controlled replacement
Shared memory
Communication
Making architectures predictable: NoC
Philips AETHEREAL
Router provides both guaranteed throughput (GT) and best effort (BE) services to communicate with IPs.
Combination of GT and BE leads to efficient use of bandwidth and simple programming model.
[Figure: router network of routers (R); each IP attaches to the network through a network interface]
Making the NoC predictable:
how to support GT traffic?
Time wheel concept
control injection traffic at network interface
[Figure: time wheel with 8 slots, numbered 1 to 8, rotating over time]
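A minimal sketch of the time wheel idea, under stated assumptions: a network interface only injects guaranteed-throughput traffic of a connection in the slots reserved for it, so bandwidth and waiting time follow directly from the slot table. The 8-slot wheel matches the figure; the connection names, slot choices and helper functions are hypothetical and not taken from the AETHEREAL design.

```python
WHEEL_SIZE = 8  # number of slots on the time wheel, as drawn on the slide

def reserve(table, connection, slots):
    """Reserve the given slot indices for a GT connection; fail on conflict."""
    for s in slots:
        if table[s] is not None:
            raise ValueError(f"slot {s} already owned by {table[s]}")
    for s in slots:
        table[s] = connection

def guaranteed_fraction(table, connection):
    """Fraction of the link bandwidth guaranteed to the connection."""
    return sum(1 for owner in table if owner == connection) / len(table)

def worst_case_wait(table, connection):
    """Largest distance, in slots, between consecutive owned slots on the wheel."""
    owned = [i for i, owner in enumerate(table) if owner == connection]
    if not owned:
        raise ValueError("connection owns no slots")
    n = len(table)
    gaps = [(owned[(k + 1) % len(owned)] - owned[k]) % n or n
            for k in range(len(owned))]
    return max(gaps)

table = [None] * WHEEL_SIZE
reserve(table, "video", [0, 4])   # hypothetical connections
reserve(table, "audio", [2])
print(guaranteed_fraction(table, "video"), worst_case_wait(table, "video"))
```

Slots left unreserved in this sketch would be available to best-effort traffic; spreading a connection's slots evenly over the wheel shortens its worst-case wait.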
Making the design flow predictable :
Compositionality
High level design: P(x,y) if [P(a,b),...] !
Low level design: P(x,y) if [P(a,b),...] ?
[Figure: components a, b, x, y, z composed at the high and at the low design level]
Making the design flow predictable
Design time
Determination of upper bounds on time and resources
pareto curves
Scenario discovery:
separate your application into parts for which the upper bounds are not too far from the worst case
[Figure: frequency (Freq) versus load histogram, split into scenarios Sc1, Sc2, Sc3]
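A minimal sketch of why splitting into scenarios helps, assuming profiled load samples that are already labeled with a scenario. The sample values are invented, and the maximum observed load stands in for an analytically derived upper bound.

```python
from collections import defaultdict

# Hypothetical (scenario, load) observations, e.g. cycles per frame from profiling.
samples = [("Sc1", 10), ("Sc1", 12), ("Sc1", 11),
           ("Sc2", 30), ("Sc2", 34),
           ("Sc3", 90), ("Sc3", 100)]

def scenario_bounds(samples):
    """Upper bound per scenario (here: the maximum observed load)."""
    bounds = defaultdict(int)
    for scenario, load in samples:
        bounds[scenario] = max(bounds[scenario], load)
    return dict(bounds)

def mean_overestimate(samples, bound_of):
    """Average of (bound - actual load) over all samples."""
    return sum(bound_of(sc) - load for sc, load in samples) / len(samples)

bounds = scenario_bounds(samples)
global_bound = max(load for _, load in samples)
per_scenario = mean_overestimate(samples, lambda sc: bounds[sc])
single_bound = mean_overestimate(samples, lambda sc: global_bound)
print(bounds, per_scenario, single_bound)
```

Reserving resources per scenario wastes far less than reserving for one global worst case, provided the active scenario can be detected at run time.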
What do we want ? Design time analysis
Single application
Reasoning about end-to-end timing constraints (for given resources and quality) = predictability
Which local arbitration mechanisms are needed ?
How to translate this to the global level ?
Example:
Given
Comp. Resources
Bandwidth
Buffer size
Throughput
[Figure: application graph with actors A1 to A5 mapped on processors P1 to P4; Pareto curve of 1/Throughput versus Cost (resources), with a point (q1,c1)]
Scenarios: MP3
What do we want ? Composability
Multiple applications
If app. 1 and app. 2 each fit individually, what can be said about the combination?
Concept of virtual platform
[Figure: actors A1 to A4 mapped on processors Proc1 and Proc2]
Predictability: Composability
Can we add Pareto points?
[Figure: quality (Q) versus cost (resources) Pareto curves for application 1, with point (q1,c1), and application 2, with point (q2,c2)]
Combined point: (q1+q2, c1+c2)?
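A minimal sketch of what adding Pareto points would look like if quality and cost were simply additive, which is exactly the assumption the slide questions. The point values and function names are hypothetical.

```python
from itertools import product

def pareto_front(points):
    """Keep (quality, cost) points that no other point dominates.

    p dominates r if p has quality >= r's and cost <= r's, and p differs from r.
    """
    unique = set(points)
    return sorted(
        r for r in unique
        if not any(p != r and p[0] >= r[0] and p[1] <= r[1] for p in unique)
    )

def combine(front_a, front_b):
    """Pairwise sums (q1+q2, c1+c2), pruned back to a Pareto front."""
    sums = [(qa + qb, ca + cb)
            for (qa, ca), (qb, cb) in product(front_a, front_b)]
    return pareto_front(sums)

app1 = [(1, 2), (3, 5), (4, 9)]   # hypothetical (quality, cost) points
app2 = [(2, 1), (5, 6)]
print(combine(app1, app2))
```

The sum is only meaningful when the two applications do not interfere on shared resources; otherwise the combined cost can exceed c1+c2.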
Problem: Predictable Resource utilization?
[Figure: two application graphs A and B, each with actors of execution time 50, to be mapped and scheduled on processors P1, P2 and P3]
Mapping & Scheduling
Problem – Predictable Resource utilization?
[Figure: graphs A and B (actor execution times 50) scheduled on processors P1, P2 and P3 over t0 to t3]
Add ordering dependences (edges)
Scheduling conflict!
Only 50% processor utilization!
Where is the problem?
Different throughput is obtained for different orders of actors
The number of possibilities for the overall graph increases exponentially with the number of actors and individual graphs
Very difficult to do a complete analysis to obtain an optimal order
Hard to model and analyze different arbitration strategies realistically
Problem – Too many possibilities!
[Figure: three application graphs A, B and C with actor execution times of 1, 3 and 5]
So, what is Composability?
The degree to which we can analyze the applications in isolation: Throughput, Latency, Resource utilization, Deadlock, Switching / reconfiguration overhead, etc.
Design time analysis for complete system is too expensive and often infeasible
Each job should be executed as if it had access to its own dedicated resources (virtualization)
Consider applications separately and then reason about the behavior of overall system
Providing a Bound for Resources
Arbitration strategy plays an important role in determining resource requirement
A naive strategy leads to over-estimation of resources
Worst-case estimate is not always possible
Need predictable arbitration mechanism
More ‘realistic’ worst case bounds
Handle dynamism in the system
An overall quality versus resources Pareto curve needed
Making the design flow predictable:
Run-time aspects
Scalable applications
QoS management
[Figure: each application (or application / scenario pair) has a local manager; the local managers talk to a global manager on the platform through a QoS protocol]
Match quality with resources
[Figure: inverse quality (Quality^-1) versus computational requirements]
Outline
Trends and design problems
Unpredictability
Platforms
Predictable design
Design flow
Open issues
Design flow
[Figure: design flow from Idea (C, requirements spec) through models / spec (Reactive Process Network, POOSL/SystemC, Kahn Process Network (YAPI), BDF, SDF) to the Platform, correct by synthesis]
RPN (Reactive Process Networks):
events and streaming
Event part (Event_in → Event_out):
• Processing of events
• Finite State Machine
• Controlling host-CPU (e.g. ARM)
• RTOS; hard real-time
• ‘classical’ SW complexity
Streaming part (Stream_in → Stream_out):
• Soft Real-time
• Compute intensive
• Special hardware
[Figure: the two parts are connected by mode and status signals]
POOSL Modeling Language
Mathematically defined semantics
Allows formal analysis of model properties
Can formally describe:
concurrency
synchronous communication
timing (delay statements)
functionality
[Figure: two communicating processes P1 and P2; example statement: delay 1;]
POOSL: Phases of Model Execution
[Figure: model time advancing in alternating phases through the state space: asynchronous actions execution and synchronous time passage]
From Model to Realization
Model (POOSL):
a()();
sel
delay d1; b()();
or
c()(); delay d2;
les;
[Figure: state diagram: S1 -a-> S2; S2 -delay d1-> S3 -b-> S5; S2 -c-> S4 -delay d2-> S6]
Possible (timed) execution traces of the model:
(S1, t1), (S2, t1), (S3, t1+d1), (S5, t1+d1)
(S1, t1), (S2, t1), (S4, t1), (S6, t1+d2)
Possible (timed) execution traces of the realization:
(S1, t1), (S2, t1+wcet(a)), (S3, t1+d1), (S5, t1+d1+wcet(b))
(S1, t1), (S2, t1+wcet(a)), (S4, t1+wcet(a)+wcet(c)), (S6, t1+d2)
ε-Hypothesis: property preservation
If the time-deviation between two timed execution traces is less than ε, then, if one trace satisfies a real-time property, that property, weakened up to ε, is preserved in the second one as well
[Figure: model time with actions a at t1 and b at t2, separated by delay d1; physical time with a at t’1 + ε1 and b at t’2 + ε2, where ε1, ε2 < ε]
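A minimal sketch of the hypothesis as a check on two concrete traces. The trace values, the latency property and both function names are illustrative assumptions. The slide does not spell out how the weakening is defined; this sketch relaxes the deadline by 2·eps because a latency involves two timestamps, each of which may deviate by less than eps.

```python
def within_epsilon(model_trace, physical_trace, eps):
    """True if both traces visit the same states and every timestamp differs by < eps."""
    if len(model_trace) != len(physical_trace):
        return False
    return all(
        s_m == s_p and abs(t_m - t_p) < eps
        for (s_m, t_m), (s_p, t_p) in zip(model_trace, physical_trace)
    )

def latency_ok(trace, start, end, deadline):
    """Real-time property: time from state `start` to state `end` is at most `deadline`."""
    times = dict(trace)
    return times[end] - times[start] <= deadline

# Illustrative traces: (state, time) pairs.
model = [("S1", 0.0), ("S2", 0.0), ("S3", 5.0), ("S5", 5.0)]
physical = [("S1", 0.0), ("S2", 0.4), ("S3", 5.0), ("S5", 5.3)]
eps = 0.5

print(within_epsilon(model, physical, eps))             # traces are eps-close
print(latency_ok(model, "S1", "S5", 5.0))               # property holds in the model
print(latency_ok(physical, "S1", "S5", 5.0))            # exact property may fail
print(latency_ok(physical, "S1", "S5", 5.0 + 2 * eps))  # weakened property holds
```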
Extending SDF
SADF: Scenario Aware Data Flow
Can deal with dynamism
Still possible to reason about
deadlock,
resource utilization,
latency and throughput
Currently implemented in POOSL
SADF example: MPEG-2 Decoder
Pipelined MPEG-2 decoder for I and P frames
VLD and IDCT fire per macro-block
MC and RC fire per frame
FD (frame detector) models control part of VLD that determines frame type
Image size = 176x144
I-frame: 99 macro-blocks, no motion vectors
Px-frame: x macro-blocks, motion vectors from VLD to MC, previous frame from RC to MC
P0-frame (still video): copy previous frame
FD model based on occurrence probability of frame types
Execution time distributions of kernels determined with profiling tool
[Figure: SADF graph with processes FD, VLD, IDCT, MC and RC; channel rates are 1, 3 or the parameters a, b, c, d, e]
Rate | I | P0 | Px
a | 0 | 0 | 1
b | 0 | 0 | x
c | 99 | 1 | x
d | 1 | 0 | 1
e | 99 | 0 | x
x ∈ {30, 40, 50, 60, 70, 80, 99}
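A short check of where the 99 macro-blocks per I-frame come from, assuming the standard MPEG-2 macro-block size of 16x16 luminance pixels. The function name is illustrative.

```python
MB_SIZE = 16  # MPEG-2 macro-blocks cover 16x16 luminance pixels

def macroblocks_per_frame(width: int, height: int) -> int:
    """Number of macro-blocks in a frame whose dimensions are multiples of 16."""
    if width % MB_SIZE or height % MB_SIZE:
        raise ValueError("frame dimensions must be multiples of 16 in this sketch")
    return (width // MB_SIZE) * (height // MB_SIZE)

print(macroblocks_per_frame(176, 144))  # 11 columns x 9 rows = 99
```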
Results for MPEG-2 Decoder
Time unit = 1 kCycle
Accuracy results based on confidence levels of 0.95
Process | Throughput
VLD | 0.063 (rel. error ≤ 0.036%)
IDCT | 0.063 (rel. error ≤ 0.036%)
MC | 0.00106 (rel. error ≤ 0.190%)
RC | 0.00106 (rel. error ≤ 0.191%)
Process | Max. Latency between Successive Firings | Average Latency between Successive Firings | Variance in Latency between Successive Firings
VLD | 710 | 15.99 (rel. error ≤ 0.031%) | 75.38 (rel. error ≤ 0.18%)
IDCT | 698 | 15.99 (rel. error ≤ 0.031%) | 56.45 (rel. error ≤ 4.99%)
MC | 3305 | 940.3 (rel. error ≤ 0.017%) | 2.4·10^5 (rel. error ≤ 3.46%)
RC | 2216 | 940.3 (rel. error ≤ 0.017%) | 1.5·10^5 (rel. error ≤ 4.99%)
Channel Memory between Processes | Maximum Occupancy | Time-Average Occupancy | Time-Variance in Occupancy
VLD and IDCT | 9 | 1.910 (rel. error ≤ 0.064%) | 0.528 (rel. error ≤ 1.99%)
IDCT and RC | 154 | 60.19 (rel. error ≤ 0.178%) | 671.8 (rel. error ≤ 4.55%)
VLD and MC | 133 | 34.73 (rel. error ≤ 0.517%) | 698.4 (rel. error ≤ 4.39%)
MC and RC | 1 | 0.577 (rel. error ≤ 0.561%) | 0.244 (rel. error ≤ 3.27%)
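A quick consistency check on the reported numbers: in the long run, throughput is the reciprocal of the average time between successive firings. The dictionaries simply restate values from this slide; the variable names are illustrative.

```python
# Average latency between successive firings, in kCycles, as reported on the slide.
avg_latency = {"VLD": 15.99, "IDCT": 15.99, "MC": 940.3, "RC": 940.3}
# Throughput in firings per kCycle, as reported on the slide.
reported = {"VLD": 0.063, "IDCT": 0.063, "MC": 0.00106, "RC": 0.00106}

for process, latency in avg_latency.items():
    derived = 1.0 / latency
    print(f"{process}: 1/latency = {derived:.5f}, reported = {reported[process]}")
```

The derived values agree with the reported throughputs up to rounding.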
Design flow
Run-time
Combine pareto points
exploit pareto algebra
QoS management / scalable application
Mapping multiple jobs
Multiple jobs can be active simultaneously.
When can a second job start?
Are the requested resources available?
If not, can the quality level be lowered?
If not, can other jobs go for a lower quality?
If yes, independent from other jobs?
How to give guarantees?
[Figure: resource usage (up to 100%) over time for jobs T0, T1 and T2, including reconfiguration]
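The chain of questions above can be sketched as a simple admission test. This is one hypothetical greedy policy, not the lecture's resource manager: the demand table, job names and quality levels are invented, levels are assumed to be consecutive integers, and reconfiguration cost is ignored.

```python
def total(demand, choice):
    """Total resource use (percent) of a quality assignment {job: level}."""
    return sum(demand[job][level] for job, level in choice.items())

def admit(demand, running, new_job, capacity=100):
    """Return a quality assignment {job: level} that fits, or None to reject.

    demand[job][level] is the resource need of a job at a quality level;
    higher level numbers mean higher quality. `running` is the current assignment.
    """
    choice = dict(running)
    # Try the new job at its highest quality first, then at lower ones.
    for level in sorted(demand[new_job], reverse=True):
        choice[new_job] = level
        if total(demand, choice) <= capacity:
            return choice
    # The new job is now at its lowest level; lower other jobs step by step.
    for job in running:
        while choice[job] > min(demand[job]):
            choice[job] -= 1  # assumes consecutive integer levels
            if total(demand, choice) <= capacity:
                return choice
    return None

demand = {
    "T0": {1: 30, 2: 50},
    "T1": {1: 20, 2: 40},
    "T2": {1: 25, 2: 45},
}
print(admit(demand, {"T0": 2, "T1": 2}, "T2"))
```

A greedy policy like this gives no guarantee of the best overall quality; it only illustrates the order in which the questions can be answered.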
Combining Pareto points
[Figure: cost versus cycle budget curves for applications 1, 2 and 3, with marks at 80 and 100 on the cycle budget axis]
• A new thread frame coming
• 20 cycle budgets available
Combining Pareto points
[Figure: the same cost versus cycle budget curves for applications 1, 2 and 3; application 3 is placed at cycle budget 20]
feasible, but optimal?
Combining Pareto points
[Figure: the same curves with cycle budget shifted between applications: a cost increase (1) for a running application near budget 80, and a cost decrease (2) for application 3 when moving from budget 20 to 40]
cost decrease (2) > cost increase (1): a better solution
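A minimal sketch of the trade-off on these three slides, using exhaustive search over one Pareto point per application. The curves are hypothetical; only the total budget of 100 and the idea of shifting budget toward application 3 come from the slides.

```python
from itertools import product

def best_allocation(curves, budget):
    """Pick one (cycle_budget, cost) point per application.

    Returns (total_cost, choice) with minimal total cost among combinations
    whose cycle budgets sum to at most `budget`, or None if nothing fits.
    """
    names = list(curves)
    best = None
    for combo in product(*(curves[n] for n in names)):
        if sum(cycles for cycles, _ in combo) > budget:
            continue
        cost = sum(c for _, c in combo)
        if best is None or cost < best[0]:
            best = (cost, dict(zip(names, combo)))
    return best

# Hypothetical Pareto points (cycle budget, cost); more cycles means lower cost.
curves = {
    "app1": [(40, 8), (50, 6)],
    "app2": [(20, 6), (30, 4)],
    "app3": [(20, 9), (40, 4)],
}
print(best_allocation(curves, budget=100))
```

In this example, giving application 3 only the 20 leftover cycles is feasible at total cost 19, while taking 20 cycles away from the running applications lowers the total cost to 18. Exhaustive search grows exponentially with the number of applications, so it is only suitable for small sets of Pareto points.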
Outline
Trends and design problems
Unpredictability
Platforms
Predictable design
Design flow
Open issues
Open issues
Gap between specification and architecture modeling
High level modeling
use of modeling pattern library
Incorporate multiple pareto solutions into DSE
Pareto Algebra
Get synthesis correct for
control applications including compute intensive tasks
mapping to multi-processor
Managing QoS
Scenario detection, merging, prediction and exploitation
Runtime resource manager optimizing overall quality
Measuring overall quality
Open issues (cont'd)
Architecture modeling
how to deal with local memory (scratch pad / cache)
Modeling scheduling and arbitration
make things composable !
Definition NAL (run-time services)
Automatic partitioning
e.g., SPRINT tool of IMEC is a good start (C to SystemC)
VLSI tiling
…. and many more …..
e.g. see: Ogras et al., "Key research problems in NoC design: a holistic perspective", CODES+ISSS 2005