
Andreas Savva, UCY
Final Project Report
Republic of Cyprus, European Union
• Introduction to many-core architectures.
• Main technical objectives of the project.
• Project breakdown.
• Work packages.
• Using the developed framework – case studies.
• Simulation and results.
• Project outcomes / deliverables.
• Emerging dominant trend in general-purpose CPUs
• Expected to be interconnected using on-chip networks
• Tens to hundreds of cores
• Simple cores, large parallelism
• Several design parameters
• I/O system
• Processor architecture
• Interconnection network architecture
• This project aims to:
• Develop a simulation and evaluation framework so that researchers can explore the aforementioned design parameters
1. Developed a simulation and evaluation framework for many-core architectures using the Java programming language.
2. Developed benchmarks in order to evaluate many-core architectures.
3. Developed an on-chip network simulator which supports different architectures, routing algorithms and traffic patterns.
4. Developed a cross-compiler in the C/C++ programming language which translates programs into instructions that can be executed by the architectures under evaluation.
5. Developed new architectures in order to evaluate the framework.
• Work Packages:
• Progress and Result Dissemination (WP1, WP2).
• Develop simulator in order to interconnect cores (WP3).
• Develop models for the execution units and the cores (WP4).
• Develop Cross-Compiler (WP5).
• Create benchmarks to measure performance (WP6).
• Develop new architectures to evaluate the framework (WP7).
Implementation Strategy
WP1 + WP2 (progress and results dissemination) overlap with the other work packages:
• WP3: develop many-core simulator
• WP4: develop execution units
• WP5: cross compiler
• WP6: benchmarks
• WP7: evaluate framework
• Kick-Off Meeting December 2008
• Targeted Application Models Developed
• Application Design Trade-Offs
• Roles
• Six-Month Progress Reports
• 18-Month (Interim) Progress Report
• Financial Issues
• Final Progress Report
• Final Financial Issues
• Project Website
• http://www.ece.ucy.ac.cy/labs/easoc/Research/SEFMA/home.html
• Publications
• Publications in selected Journals and Conferences.
• Determine specifications for the many-core network simulator.
• Evaluate existing simulation frameworks
• POPNET simulator – C++ programming language.
• GPNOC simulator – Java programming language.
• Adapt a simulation framework in order to simulate our many-core systems.
• Develop traffic models based on many-core applications for future evaluation
• Random Traffic Pattern.
• Tornado Traffic Pattern.
• Transpose Traffic Pattern.
• Neighbor Traffic Pattern.
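The four traffic patterns can be illustrated with a minimal Python sketch for a k×k network. The slides do not give the exact definitions used in the simulator, so the formulas below follow common textbook conventions and are assumptions; the function names are hypothetical.

```python
import random

def random_dest(src, k, rng=random):
    """Uniform random: any node other than the source (requires k >= 2)."""
    while True:
        dst = (rng.randrange(k), rng.randrange(k))
        if dst != src:
            return dst

def tornado_dest(src, k):
    """Tornado (assumed definition): shift by ceil(k/2) - 1 in each dimension, wrapping."""
    shift = (k + 1) // 2 - 1
    x, y = src
    return ((x + shift) % k, (y + shift) % k)

def transpose_dest(src):
    """Transpose: node (x, y) sends to node (y, x)."""
    x, y = src
    return (y, x)

def neighbor_dest(src, k):
    """Neighbor (assumed definition): shift by one in each dimension, wrapping."""
    x, y = src
    return ((x + 1) % k, (y + 1) % k)
```

For an 8×8 network, `tornado_dest((0, 0), 8)` shifts by three hops per dimension, while `transpose_dest` stresses the diagonal of the network.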
• Develop communication protocol between units and network
• Design and develop unit models
• Cores.
• Memory.
• Input/output data models.
• Framework to develop models based on the specifications.
• Create instruction set architecture.
• Study existing compilers for RISC processors.
• Adapt existing compiler to translate programs into machine
instructions.
• Adapt compiler into the framework.
• Define and evaluate all possible functions of the system based on:
• Performance
• Power consumption
• Reliability
• Develop algorithms to measure performance, power consumption and reliability.
• Develop benchmarks for many-core processors in Assembly language.
• WP Goals:
• Develop and evaluate novel many-core architectures.
• Develop and evaluate algorithms for work distribution in
many-core processors.
• Cross-evaluation of the developed framework based on the
new many-core architectures.
USING/EVALUATING THE FRAMEWORK
Case Studies
• Power consumption: a major limitation in NoCs.
• Links and NoC routers: the most power-hungry components.
• Intel's Teraflop NoC prototype suggests that link power consumption could be as high as 17%, with the rest of the power consumed by the routers.
• Reduce both static and dynamic power consumption.
• Previously proposed work focuses on simple static threshold mechanisms.
Need for a new intelligent dynamic power management policy for NoCs.
Threshold-based algorithm for turning links off/on:
• Run simulation and check link utilization.
• Choose threshold.
• Run simulation.
• If the new link utilization is smaller than the threshold → turn the link off for a period of time.
• After x cycles turn the link back on.
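The static threshold steps above can be sketched in Python as follows. This is an illustrative sketch, not the project's simulator code: the function name, the data structures and the choice to count the off period in monitoring intervals (rather than raw cycles) are assumptions.

```python
def static_threshold_step(utilization, off_timers, threshold, off_period):
    """Advance one monitoring interval of a static on/off link policy.

    utilization: dict link_id -> utilization in [0, 1] measured during the
                 last interval (only for links that were on).
    off_timers:  dict link_id -> remaining intervals the link stays off
                 (updated in place).
    Returns the set of links that are off during the next interval.
    """
    # Links whose off period has expired are turned back on.
    for link in list(off_timers):
        off_timers[link] -= 1
        if off_timers[link] <= 0:
            del off_timers[link]
    # Links that were on and under-utilized are turned off.
    for link, u in utilization.items():
        if link not in off_timers and u < threshold:
            off_timers[link] = off_period
    return set(off_timers)
```

With a threshold of 0.1 and an off period of two intervals, a link measured at 5% utilization is switched off for the next two intervals and becomes available again on the third.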
NEXT: A new Intelligent Dynamic on/off Link
Management for NoCs based on ANNs.
Figure: ANN structure with an input layer, a hidden layer and an output neuron.
Artificial Neural Networks
• Information processing paradigm
inspired by the way biological
neurons process information.
• Composed of a large number of
highly interconnected processing
elements (neurons) working in
unison to solve specific problems.
• Used as prediction and
forecasting mechanisms in several
application areas
• Able to determine hidden and
strongly non-linear dependencies.
Intelligent ANN algorithm:
• Pre-training.
• Choose links with minimum link utilization.
• Size of network more manageable.
• Prediction scheme based on ANN.
• Divide network into smaller nets.
• Pass chosen links as inputs to the ANNs.
• Output → links to turn off.
ANNs can be used for prediction since they can discover hidden dependencies.
Power savings for 8x8 mesh and torus networks
Figure: ANN predictor with NoCs. An 8×8 network of PEs is partitioned into four 4×4 networks, each with its own ANN (ANN 1 to ANN 4).
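The partitioning of an 8×8 network into four 4×4 regions can be illustrated with a small Python sketch that maps a router coordinate to the region (and hence the ANN) that monitors it. The function name and the row-major numbering of regions from 0 to 3 are assumptions made for illustration.

```python
def region_of(x, y, region_size=4, mesh_size=8):
    """Return the index of the region that contains router (x, y).

    Regions are numbered row by row starting from 0 (assumed numbering).
    """
    regions_per_row = mesh_size // region_size
    return (y // region_size) * regions_per_row + (x // region_size)

# Group all 64 routers of the 8x8 network by their monitoring region.
regions = {}
for y in range(8):
    for x in range(8):
        regions.setdefault(region_of(x, y), []).append((x, y))
```

Each of the four resulting groups contains the 16 routers of one 4×4 region.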
• Experiments with several NoC region sizes.
• Compare hardware overheads and corresponding power savings.
• A 4×4 NoC region offers satisfactory power savings and lower ANN overheads than a 5×5 NoC region.
• A 3×3 NoC region does not provide enough information to the ANN to make accurate predictions.
• We designed the ANN-based system to monitor 4×4 NoC regions.
Power savings and hardware overheads for 3x3, 4x4 and 5x5 NoC regions
Prediction scheme based on ANN
• The ANN mechanism receives the average link utilization of every link in the 4×4 NoC partition.
• The ANN uses the utilization values to find the optimal threshold.
• Determine whether a link is going to be turned off or on for the next n-cycle interval.
Flowchart: Monitor link utilization → receive link utilization for a 4×4 NoC partition → received from ALL links? (No: keep receiving; Yes/timeout: continue) → neural network → intelligently computed threshold → choose links based on threshold → output control packets to turn links on/off → next time interval.
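The prediction flow can be sketched in Python as a small feed-forward network that turns utilization values into a threshold, followed by the link selection step. The slides do not specify the activation function, the trained weights or how link utilizations map onto input neurons, so the sigmoid activation, the weight arguments and the function names here are assumptions.

```python
import math

def sigmoid(z):
    """Logistic activation (assumed; the slides do not name the activation)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_threshold(inputs, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: input layer -> hidden layer -> single output neuron.

    w_hidden holds one weight row per hidden neuron; the weights would come
    from pre-training, which is not modelled in this sketch.
    """
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

def links_to_turn_off(link_utilization, threshold):
    """Select links whose utilization is below the computed threshold."""
    return sorted(link for link, u in link_utilization.items() if u < threshold)
```

In use, `predict_threshold` would be called once per time interval for each 4×4 partition, and the result passed to `links_to_turn_off` to decide which control packets to send.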
ANN hardware optimization
• A 4×4 ANN monitors 16 routers => at least 8 input neurons.
• Eight neurons at the input layer of the ANN => the hidden layer should have five neurons (8 / 2 + 1 = 5).
• This follows the rule of thumb that a satisfactory number of hidden-layer neurons equals half the number of input neurons plus one.
Try to minimize the size of the hidden layer…
• Choose an appropriate size for the hidden layer of the ANN.
• Three different ANNs were developed, with five, four and three neurons in the hidden layer.
• Using four neurons (instead of five) in the hidden layer exhibits the best power savings for all the traffic patterns.
Power savings for different numbers of neurons in the hidden layer
• How does the bit representation of the training weights affect the threshold computation?
• 24-, 16-, 8-, 6- and 4-bit representations were used.
• 24, 16, 8 and 6 bits show similar power savings, but these savings are significantly reduced when 4 bits are used, due to reduced training accuracy.
• => 6 bits are chosen, which made the multiplier-accumulation hardware very small.
Power savings for different training weight bit representations
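The effect of weight bit width can be illustrated with a short Python sketch that rounds a weight to a signed fixed-point grid. The actual number format used in the hardware is not given in the slides, so the symmetric range [-w_max, w_max] and the function name are assumptions.

```python
def quantize(weight, bits, w_max=1.0):
    """Round a weight to the nearest level of a signed fixed-point grid.

    Assumed format: 2**(bits - 1) - 1 positive levels, symmetric around zero,
    with weights clipped to the range [-w_max, w_max].
    """
    levels = 2 ** (bits - 1) - 1
    clipped = max(-w_max, min(w_max, weight))
    return round(clipped / w_max * levels) / levels * w_max

def max_rounding_error(bits, w_max=1.0):
    """Worst-case rounding error for in-range weights: half a grid step."""
    return w_max / (2 * (2 ** (bits - 1) - 1))
```

Under these assumptions the worst-case rounding error is about 0.016 with 6 bits and about 0.071 with 4 bits, which gives an intuition for why accuracy degrades at 4 bits.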
• Power savings of the ANN-based mechanism are better than the savings in the other cases.
• The ANN-based mechanism can identify a significant amount of future behavior in the observed traffic patterns.
• It can intelligently select the threshold necessary for the next timing interval.
Power savings for 8x8 mesh and torus networks
• Measure throughput in each mechanism.
• Although having no on/off mechanism yields a higher throughput, the ANN-based technique shows better throughput results than statically determined threshold techniques.
Throughput for 8x8 mesh and torus networks
• Measure energy in each mechanism.
• Energy consumed using the ANN mechanism is less than in the other cases.
• The ANN exhibits a reduction in the overall energy, because of a balanced performance-to-power savings ratio, when compared to not having on/off links or to static threshold computation.
Normalized Energy for 8x8 torus networks
• Measure packet latency in
each mechanism.
• The ANN-based mechanism
incurs more delay, but we
believe that the delay
penalty is acceptable when
compared to the associated
power savings.
Average Packet Latency
New Intelligent ANN algorithm:
• Pre-training.
• Choose router ports with minimum port utilization.
• Size of network more manageable.
• Prediction scheme based on ANN.
• Divide network into smaller nets.
• Pass chosen ports as inputs to the ANNs.
• Output → ports to turn off.
Flowchart: Monitor port utilization → receive port utilization for a 4×4 NoC partition → received from ALL ports? (No: keep receiving; Yes/timeout: continue) → neural network → intelligently computed threshold → choose ports based on threshold → output control packets to turn ports on/off → next time interval.
• When router ports become unavailable, temporarily or permanently, X-Y routing cannot guarantee a deadlock-free system.
• Since router ports are turned off in our work, a new routing algorithm must be developed to make sure that there are no deadlocks.
• Fully adaptive routing algorithms perform better in the presence of faults, but they are very difficult to implement due to higher overhead in silicon area and energy consumption.
• Based on this, a partially adaptive routing algorithm was chosen in order to achieve a certain degree of fault tolerance in our system.
• The Fault Tolerant Negative First algorithm is based on the turn model.
• It forbids certain turns so that deadlock can be avoided.
• A packet is first routed in the negative direction in each dimension and then in the positive direction: the message first moves west or south until the corresponding offset is zero, and after that it moves north or east.
Negative First Routing Algorithm in 8x8 Mesh network
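The basic negative-first rule described above can be sketched in Python for a 2D mesh. This sketch covers only minimal negative-first routing; the fault-tolerant extensions of the algorithm used in the project are not modelled. The coordinate convention (x grows eastward, y grows northward) and the function names are assumptions.

```python
def negative_first_hops(cur, dst):
    """Return the output directions permitted at router `cur` for a packet to `dst`.

    All negative hops (west, south) are taken before any positive hop
    (east, north). An empty list means the packet has arrived.
    """
    off_x, off_y = dst[0] - cur[0], dst[1] - cur[1]
    negative = [d for d, off in (("W", off_x), ("S", off_y)) if off < 0]
    if negative:
        return negative
    return [d for d, off in (("E", off_x), ("N", off_y)) if off > 0]

STEP = {"W": (-1, 0), "E": (1, 0), "S": (0, -1), "N": (0, 1)}

def route(src, dst):
    """Follow the first permitted direction at each hop; return the routers visited."""
    path, cur = [], src
    while cur != dst:
        dx, dy = STEP[negative_first_hops(cur, dst)[0]]
        cur = (cur[0] + dx, cur[1] + dy)
        path.append(cur)
    return path
```

When both offsets are negative, the function returns two permitted directions, which is where the partial adaptivity comes from: the router may choose either one, for example to avoid a port that has been turned off.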
• The power savings of the ANN-based mechanism are better than those of the statically determined case and of the case without any on/off ports, for all the traffic models.
Power savings for 8x8 mesh and torus networks
• Having no on/off mechanism yields a higher throughput; however, the ANN-based technique yields better throughput when compared to the statically determined threshold.
Normalized throughput for 8x8 mesh and torus networks
• The framework can be used by researchers to evaluate many-core architectures.
• It helps to compare how the number of cores affects the total power consumption of the network.
• Intel showed that the number of cores may be affected by power consumption, because of the increased number of routers, interconnects and data travelling through the network.
• Researchers can do parameter exploration related to many-core architectures.
• This new Network-on-Chip framework helps researchers to solve different NoC tasks through simulations.
• Smooth flow of work
• Some simulator problems have been overcome
• Help from Dr. Soteriou and Drs. Michael and Hadjicostis
• Results Dissemination on target with Project Goals.
• Publications in conferences/journals
• Participation in ISVLSI Conference July 2011, Chennai, India.
• Publication in Journal of Electrical and Computer Engineering, Hindawi
Publishing Corporation, 2012.
• Submission at the ISVLSI 2012: paper for turning router ports on/off.
(Under Review)
ARTICLES:
• A. Savva, T. Theocharides, V. Soteriou, “Intelligent On/Off Link
Management for On-Chip Networks”, In Proc. IEEE Annual Symposium
on VLSI, pp. 343 – 344, July 2011.
• Under Review: A. Savva, T. Theocharides, V. Soteriou, “Intelligent
On/Off Router Ports Management for Networks on Chip”, ISVLSI
Conference 2012
JOURNALS:
• Andreas G. Savva, T. Theocharides, V. Soteriou, "Intelligent On/Off
Dynamic Link Management for On-Chip Networks," Journal of
Electrical and Computer Engineering, vol. 2012, Article ID 107821,
2012
POSTER:
• Poster at HiPEAC Ph.D. Student Poster Presentation - Paphos, Cyprus,
January 2009.
WORKSHOP:
• Results of this work were presented in a workshop at KIOS Research
Centre – 30 Nov. 2011
• D1: Six Month, Interim, Final Report, Financial Reports
• D2: Project Website, Publications
• D3: Network communication simulator in Java, four traffic models for simulation and evaluation of the network (source code available)
• D4: RISC processor models, memory models, core models, Input
Output models (VHDL/C++ Code)
• D5: Cross-compiler
• D6: Benchmarks, Algorithms for power consumption and
performance measurements.
• D7: Many-core architectures, Evaluation of the developed
framework.
• Dr. Maria K. Michael – for the verification and automation
algorithms feedback.
• Dr. Christoforos Hadjicostis – for the reliability aspects and the
discrete event algorithms employed in building the simulator.
• Dr. Vassos Soteriou - for the feedback on the Interconnect.
• Dr. Theocharis Theocharides - for the coordination of this project
and all the help.
Project Host Organization
University of Cyprus
Andreas Savva, Theocharis Theocharides, Maria K. Michael, Christoforos Hadjicostis
Collaborating Partners
Cyprus University of Technology
Vassos Soteriou