
Low Contention Mapping of RT Tasks
onto a TilePro 64 Core Processor
1 Background Introduction = why
2 Goal
3 What
4 How
5 Experimental Result
6 Advantage & Limitation
7 Significance & Improvement
Lei Cui
Related Terms & Concepts
Predictability
TilePro 64-Core Processor
Contention
Static Timing analysis
NoC
IPC
Full-duplex (communication)
Jitter
Hyper-Period
1 Background Introduction (why)

Predictable task execution is essential in an RT system. For RT tasks, an
upper bound on execution time can be determined via static timing analysis.
However, this method may produce unsafe underestimates when the underlying
communication paths are not determined: when data from multiple sources
share part of a routing path in the NoC, the messages contend for the
shared links.

Contention analysis is therefore required to provide safe and reliable
bounds. The paper uses a multi-core architecture and maps tasks to cores
in such a way that contention is minimized. Moreover, the fewer cores are
available, the more likely it is that IPC incurs overhead.

In short, contention leads to latency, which leads to unsafe
underestimation, which in turn leads to unpredictability.
1 Background Introduction (continued)
Example:
Two messages, 38 and 42, are sent at the same time.
Effect:
Contention arises on link 45. This causes delay and latency, then
missed deadlines and unbounded time, and finally unpredictability,
so the system is no longer RT.
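The contention scenario above can be sketched in a few lines. This is only an illustration under stated assumptions: it assumes dimension-ordered (X first, then Y) static routing on a 2D mesh, and the coordinates, `xy_route` and `shared_links` are hypothetical names chosen for this sketch. They do not correspond to messages 38 and 42 or link 45 in the slide's figure.

```python
from typing import List, Set, Tuple

Coord = Tuple[int, int]       # (x, y) position of a core in the mesh
Link = Tuple[Coord, Coord]    # directed link between two adjacent cores


def xy_route(src: Coord, dst: Coord) -> List[Link]:
    """Directed links used when routing along X first, then along Y (assumed policy)."""
    links: List[Link] = []
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def shared_links(a: List[Link], b: List[Link]) -> Set[Link]:
    """Links that appear in both routes, i.e. candidates for contention."""
    return set(a) & set(b)


# Two hypothetical messages sent at the same time.
msg_a = xy_route((0, 0), (2, 1))
msg_b = xy_route((1, 0), (2, 2))
contended = shared_links(msg_a, msg_b)
```

Links are modelled as directed pairs, so two messages travelling in opposite directions over the same full-duplex connection are not reported as contending.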
Drawbacks
1) Exhaustive approaches do not scale beyond small NoC mesh sizes, as
they can take days to solve mapping layouts.
2) Previous work viewed communication as temporally stateless, which
limited the amount of communication that could feasibly be solved.
3) It also resulted in overly conservative solutions, in that any
potential for common message routes was considered contention.
Improvement
1) Separate temporally disjoint messages when analyzing link
contention scenarios, thereby increasing communication predictability.
2 Goal
Increase the predictability of RT tasks on
NoC architectures
Models and solutions to lower or minimize
contention during communication.

3 What (Contributions)

Exhaustive Solver Model: exhaustively
maps RT tasks onto cores to minimize
contention and improve predictability

SBTF: maps communication traces into time
frames to ensure separate analysis of
temporally disjoint communication

Heuristic Model (HSolver): rapid
discovery of low-contention solutions
4 How – SBTF (Software-Based Temporal Framing)
Temporal Framing
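A minimal sketch of the framing idea, not the paper's SBTF implementation: messages are bucketed into fixed-length time frames, and two messages are treated as contending only if they fall into the same frame and share a link. The `Message` type, the link labels, the frame length and the assumption that each message fits inside one frame are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, NamedTuple, Tuple


class Message(NamedTuple):
    name: str
    send_time: int           # release time in cycles (hypothetical unit)
    links: FrozenSet[str]    # labels of the NoC links the message traverses


def frame_messages(msgs: List[Message], frame_len: int) -> Dict[int, List[Message]]:
    """Group messages by the index of the time frame they are sent in."""
    frames: Dict[int, List[Message]] = defaultdict(list)
    for m in msgs:
        frames[m.send_time // frame_len].append(m)
    return dict(frames)


def contended_pairs(msgs: List[Message], frame_len: int) -> List[Tuple[str, str]]:
    """Pairs of messages that share a link within the same frame."""
    frames = frame_messages(msgs, frame_len)
    pairs = []
    for idx in sorted(frames):
        for a, b in combinations(frames[idx], 2):
            if a.links & b.links:
                pairs.append((a.name, b.name))
    return pairs


trace = [
    Message("m1", 0, frozenset({"L1", "L2"})),
    Message("m2", 10, frozenset({"L2", "L3"})),
    Message("m3", 120, frozenset({"L2"})),
]
framed = contended_pairs(trace, frame_len=100)       # m3 is temporally disjoint
stateless = contended_pairs(trace, frame_len=1000)   # one huge frame: no separation
```

With 100-cycle frames only `m1` and `m2` are flagged. With a single large frame, which mimics a temporally stateless analysis, all three pairs are flagged, which is the over-conservative outcome that framing is meant to avoid.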
4 How – Exhaustive Solver Model
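As a rough illustration of exhaustive mapping, the sketch below tries every placement of tasks onto the cores of a tiny mesh and keeps the one with the lowest contention cost. The cost function (number of message pairs sharing a directed link), the X-then-Y routing and all names are assumptions made for this sketch. The transcript does not give the paper's actual cost model.

```python
from itertools import combinations, permutations
from typing import Dict, List, Set, Tuple

Coord = Tuple[int, int]
Link = Tuple[Coord, Coord]


def xy_links(src: Coord, dst: Coord) -> Set[Link]:
    """Directed links used by X-then-Y routing (assumed routing policy)."""
    links: Set[Link] = set()
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.add(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.add(((x, y), (x, ny)))
        y = ny
    return links


def contention_cost(mapping: Dict[int, Coord], messages: List[Tuple[int, int]]) -> int:
    """Hypothetical cost: number of message pairs whose routes share a link."""
    routes = [xy_links(mapping[s], mapping[d]) for s, d in messages]
    return sum(1 for a, b in combinations(routes, 2) if a & b)


def exhaustive_map(tasks: List[int], cores: List[Coord], messages: List[Tuple[int, int]]):
    """Try every assignment of tasks to distinct cores; return (cost, mapping)."""
    best_cost, best_mapping = None, None
    for placement in permutations(cores, len(tasks)):
        mapping = dict(zip(tasks, placement))
        cost = contention_cost(mapping, messages)
        if best_cost is None or cost < best_cost:
            best_cost, best_mapping = cost, mapping
    return best_cost, best_mapping


cores = [(0, 0), (1, 0), (0, 1), (1, 1)]   # 2x2 mesh
tasks = [0, 1, 2, 3]
messages = [(0, 3), (1, 3)]                # tasks 0 and 1 both send to task 3
naive_cost = contention_cost(dict(zip(tasks, cores)), messages)
best_cost, best_mapping = exhaustive_map(tasks, cores, messages)
```

The search visits every permutation of cores, so its running time grows factorially with the number of tasks. This is consistent with the scalability drawback noted earlier for exhaustive approaches.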
4 How – Exhaustive Solver Model (continued)
For example:
4 How – Heuristic Model (HSolver)
4 How – Heuristic Model (HSolver, continued)
Example:
Maximum Cross Chat First (TMH)
Degree(8) = 4, Degree(6) = 4 ==> 8, 6 map to empty cores (Group 1)
Degree(3) = 3, Degree(4) = 3 ==> 3, 4 map to empty cores (Group 2)
Degree(7) = 2, Degree(1) = 2 ==> 7, 1 map to empty cores (Group 3)
Degree(5) = 1, Degree(2) = 1 ==> 5, 2 map to empty cores (Group 4)
Degree(0) = 0 ==> 0 maps to an empty core (Group 5)
The task scheduling sequence is:
8, 6, (6, 8), 3, 4, (4, 3), 7, 1, (1, 7), 5, 2, (2, 5), 0
The final chosen sequence is:
8, 6, 3, 4, 7, 1, 5, 2, 0
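The ordering in this example can be reproduced with a short sketch. It takes the degree values from the slide, sorts tasks by descending degree, and groups tasks of equal degree. The tie order within a group is simply the order in which the slide lists the tasks, and the function names are hypothetical.

```python
from itertools import groupby
from typing import Dict, List

# Degree values copied from the example above, in the order they are listed.
degree: Dict[int, int] = {8: 4, 6: 4, 3: 3, 4: 3, 7: 2, 1: 2, 5: 1, 2: 1, 0: 0}


def selection_order(deg: Dict[int, int]) -> List[int]:
    """Tasks sorted by descending degree; the sort is stable, so ties keep listing order."""
    return sorted(deg, key=lambda task: -deg[task])


def degree_groups(deg: Dict[int, int]) -> List[List[int]]:
    """Consecutive tasks of equal degree, highest degree first."""
    order = selection_order(deg)
    return [list(group) for _, group in groupby(order, key=lambda task: deg[task])]


order = selection_order(degree)    # [8, 6, 3, 4, 7, 1, 5, 2, 0]
groups = degree_groups(degree)     # [[8, 6], [3, 4], [7, 1], [5, 2], [0]]
```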
Maximum Cross Chat First (CMH)
[Figure labels: Task, Core]
5 Experimental Result (Ex 1)
The 1st experiment compares the minimum solutions of each solver
as the complexity of the system increases.
It evaluates the minimum aggregate cost across 100 randomly
generated task sets for the naive, heuristic and exhaustive
mappings as the NoC size increases along with a linear
increase in the number of messages.
5 Experimental Result (Ex 2)
The 2nd experiment evaluates the HSolver approach to determine
the rate at which each heuristic was used to generate the low-cost solution.
Percent Use of Core Selection Strategies
Percent Use of Task Selection Strategies
The left result shows the core selection strategies and how often each was used
during heuristic solving. The effectiveness of the core strategies varies
significantly. Overall, minimizing the distance between frequently communicating
cores is the most beneficial heuristic.
The right result shows the task selection strategies, where two strategies
account for 98% of the low-cost solutions. The most effective solution is
generally obtained by selecting tasks by Maximum Cross-Chat relative to the
currently mapped tasks.
5 Experimental Result (Ex 3)
The 3rd experiment assesses the impact of link contention on communication jitter.
The X-axis represents the 10 randomly generated task
sets, each of which contains 200 messages within
its hyper-period.
The Y-axis represents the standard deviation, in clock
cycles, of each task set for the three
mapping approaches.
The figure shows that even a single contended link can have a significant impact on the
standard deviation of transfer latencies.
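To make the jitter metric concrete, the sketch below computes the standard deviation of transfer latencies for two made-up latency samples. The numbers are purely illustrative and are not measurements from the paper. The use of the population standard deviation is also an assumption, since the slides do not say which variant was used.

```python
from statistics import pstdev
from typing import List


def jitter(latencies_cycles: List[int]) -> float:
    """Jitter taken as the population standard deviation of latencies, in cycles."""
    return pstdev(latencies_cycles)


# Illustrative values only: identical transfers, one of which hits a contended link.
uncontended = [100, 100, 100, 100]
contended = [100, 100, 100, 180]

jitter_uncontended = jitter(uncontended)
jitter_contended = jitter(contended)
```

In this toy sample a single delayed transfer moves the jitter from zero to roughly 35 cycles, which shows how one contended link can dominate the standard deviation.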
The table shows the timing results for each configuration
evaluated in this experiment. All results determined by
the heuristic approach converged within a second.
Using the exhaustive solver, convergence can take up
to 70 minutes for solutions with contention.
5 Experimental Result (Ex 4)
The 4th experiment illustrates the impact of unavoidable contention on real-time predictability.
The figures depict the cost of sends and receives for one-to-one and two-to-one
pairings of senders and receivers.
This experiment shows the worst case experienced over multiple runs and
emphasises the significant impact that contention can have on bounding WCET.
6 Focus-on & Improvement
NoC architecture with static routing without alternate path routing
Address homogeneous architecture & resource mapping to reduce overhead
Hard RT system and consider communication first rather
Predictability for RT system instead of power & utilize currently available
architectures instead of resorting to simulation
Reduction of contention to increase predictability
Implement on top of an architecture that does not provide contention
avoidance at the hardware level
Software model allows for variable frame sizing to avoid impeding performance
in system with little contention
Improvement: 1) the exhaustive solver determines the optimal mapping for
solvable NoCs; 2) HSolver generates fast, low-contention solutions for
heavily contended NoCs; 3) HSolver can reduce aggregate contention by up to
70% while reducing jitter by up to 40%.
7 Significance

1) The first work to consider IPC for WC time frames to simplify analysis and to
measure the impact on actual hardware for NoC-based real-time multi-core systems.

2) The first work to address predictability of NoC communication by framing
messages into temporal windows for real-time tasks.
Question
Experiment 3
 Experiment 4
