Advanced Processor Technologies

Advanced Processor Technologies
Group overview
1
APT group
• Mission:
– Moore’s Law will soon deliver billion-transistor chips
– How do we make the best use of a billion transistors?
• parallel processing
• systems-on-chip
• novel architectures
• …?
2
Strategy/Vision
• An integrated focus on many-core systems
– hardware, architecture, run-time systems, compilation, programming languages
– general-purpose and special-purpose
– homogeneous and heterogeneous
3
Strategy/Vision
• Underpinning technology themes:
– energy-efficiency
– reliability & fault-tolerance
– FPGAs & reconfigurability
– silicon design
– 3D packaging and modelling
– many-core interconnect
– neural systems engineering
4
Major Funding
• ERC Advanced Grant: Biologically-Inspired Massively Parallel Computing
• EPSRC Programme Grant: PAMELA
• EU ICT Flagship: Human Brain Project
• Plus…
– BABEL, BrainScaleS, PRiME, DOME, GAELS, INPUT, AXLE, Teraflux, AnyScale Apps …
5
Many-core Architecture and Software
Mikel Lujan
Antoniu Pop
6
Buying a single-core processor is difficult!
Multi-core processors bring fundamental changes to Computer Science
[applications, programming languages, compilers, runtime systems (OS), computer architecture]
7
Active projects
• Many-core systems
– Teraflux – parallel computational model
• Novel programming language
• Novel many-core memory organization
– Focus on hardware/software co-design of Managed Runtime Environments and Low-Power Many-core Architectures
• DOME – Delaying and Overcoming Microprocessor Errors
• PAMELA – Computer Vision & Data Centers
• AnyScale Apps – Approximate Computing
• Big Data
– AXLE
• Accelerating Analytics of Big Data
– RETHINK.big – EU Roadmap for Big Data
8
OpenStream Project
• Make dataflow programming easy
– OpenMP-style annotations on C code
– Visualization tool to debug performance issues: OSTV
• Efficient execution on many-cores
– Code generation for existing (x86, ARM) and experimental architectures (Teraflux)
– Optimizations (e.g., polyhedral compilation)
– Runtime algorithms optimized for weak memory consistency models
9
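The slide describes dataflow execution only at a high level. The following Python sketch illustrates the underlying idea under a much simplified model: a task runs as soon as every stream it reads holds a value, regardless of the order in which the tasks were written. `Task` and `run_dataflow` are hypothetical names; this is not OpenStream's annotation syntax or runtime, only a sequential simulation of the execution model.

```python
from collections import deque


class Task:
    """Hypothetical dataflow task: reads named input streams, writes named output streams."""

    def __init__(self, name, inputs, outputs, fn):
        self.name = name
        self.inputs = inputs    # stream names this task consumes
        self.outputs = outputs  # stream names this task produces
        self.fn = fn            # maps input values to a tuple of output values


def run_dataflow(tasks, initial):
    """Fire tasks in whatever order data availability allows.

    In this simplified sketch each task fires once; a task is ready when
    every one of its input streams holds at least one value.
    """
    streams = {name: deque(values) for name, values in initial.items()}
    pending = list(tasks)
    trace = []
    progress = True
    while pending and progress:
        progress = False
        for task in list(pending):
            # Ready check: every input stream exists and is non-empty.
            if all(streams.get(s) for s in task.inputs):
                args = [streams[s].popleft() for s in task.inputs]
                for s, value in zip(task.outputs, task.fn(*args)):
                    streams.setdefault(s, deque()).append(value)
                pending.remove(task)
                trace.append(task.name)
                progress = True
    return streams, trace


# "add" is listed first but cannot fire until "y" and "z" have been produced.
tasks = [
    Task("add", ["y", "z"], ["out"], lambda y, z: (y + z,)),
    Task("square", ["x"], ["y"], lambda x: (x * x,)),
    Task("make_z", [], ["z"], lambda: (10,)),
]
streams, trace = run_dataflow(tasks, {"x": [3]})
print(trace)
print(list(streams["out"]))
```

Although `add` appears first in the task list, it fires last, because its inputs only become available after `square` and `make_z` have run. A real dataflow runtime would also run independent ready tasks such as `square` and `make_z` in parallel.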
Managed Runtime Environments
• Java and .NET are examples of managed runtime environments (JVM, CLR)
• Key elements: JIT compilation and control of memory allocation
• Research opportunities:
– Scaling MREs for many-core architectures (GPUs)
– Hardware acceleration of MREs
– Using MREs for low-power computing
– Using MREs to deal with faults and transistor wear-out -> DOME
10
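Control of memory allocation is one of the key MRE elements listed above. A common technique in managed runtimes, shown here purely as an illustration and not as the group's design, is bump-pointer allocation into a nursery region: allocating is just an alignment round-up, a bounds check and a pointer increment. The `BumpAllocator` class below is a hypothetical sketch that hands out offsets rather than real memory addresses.

```python
class BumpAllocator:
    """Hypothetical bump-pointer nursery that returns offsets into a region."""

    def __init__(self, size, alignment=8):
        self.size = size            # total bytes in the region
        self.alignment = alignment  # every object starts on this boundary
        self.top = 0                # offset of the next free byte

    def allocate(self, nbytes):
        # Round the request up to the next multiple of the alignment.
        aligned = (nbytes + self.alignment - 1) // self.alignment * self.alignment
        if self.top + aligned > self.size:
            return None  # region full: a real runtime would trigger a minor GC here
        offset = self.top
        self.top += aligned
        return offset

    def reset(self):
        # After a copying collector evacuates live objects, reuse the whole region.
        self.top = 0


nursery = BumpAllocator(64)
print(nursery.allocate(10), nursery.allocate(8), nursery.allocate(40))
print(nursery.allocate(1))  # no space left
```

Returning `None` stands in for the point where a real runtime would start a minor garbage collection; once the survivors have been copied out, `reset()` makes the whole region available again.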
Simulate/Prototype many-core architectures
• Designing a chip is expensive and time-consuming
• Computer architects build software models to simulate new architectures
• Simulation can be slow (months to run one application)
• How can we accelerate this process? Research opportunities:
– New modelling techniques
– FPGA prototyping
11
AXLE & Big Data
• Collaboration with Dr. Javier Navaridas & Dr. Gavin Brown (MLO group)
• The amount of data generated by scientific experiments and the social web keeps growing!
• Graph-based data -> complex computation
• How can we make sense of this data deluge?
– New learning techniques capable of working at scale
– Redesign architectures (clusters/data centres) and software for low-power analytics
– Accelerate software (JIT adaptation) for data processing
– Hardware acceleration for low-power learning algorithms
12
Communication Architectures
Javier Navaridas
13
Interconnection Networks
• On-chip networks
– Tile-based systems
– Heterogeneous systems
• High performance computing networks
– Massively Parallel Processing systems
– Compute (Super)Clusters
• Datacentre networks
– Off-the-shelf equipment
– High performance alternatives
• Performance Metrics
– Throughput, Latency
– Power, Area
– Fault tolerance
– Application running time
14
Topics of interest
• Topologies
– Routing
– Wiring
– Fault resilience
– Deadlock avoidance
• Router microarchitecture
– Congestion control
– Quality of Service
– Fault tolerance
• Scheduling and resource management
– Task placement
• System and workload modelling
– Analytical modelling
– Simulation
15
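As a concrete instance of the routing and deadlock-avoidance topics, the sketch below implements dimension-order (XY) routing on a 2-D mesh, a textbook deadlock-free scheme. It is shown for illustration and is not claimed to be one of the group's algorithms; representing routers as `(x, y)` tuples is an assumption of the sketch.

```python
def xy_route(src, dst):
    """Dimension-order (XY) routing on a 2-D mesh without wrap-around links.

    A packet first travels along X until the column matches, then along Y.
    Because no packet ever turns from Y back into X, the channel dependency
    graph is acyclic, which makes the scheme deadlock-free on a mesh.
    Returns the routers visited, including src and dst.
    """
    x, y = src
    dest_x, dest_y = dst
    path = [(x, y)]
    while x != dest_x:  # X dimension first
        x += 1 if dest_x > x else -1
        path.append((x, y))
    while y != dest_y:  # then Y dimension
        y += 1 if dest_y > y else -1
        path.append((x, y))
    return path


path = xy_route((0, 0), (2, 1))
print(path, "hops:", len(path) - 1)
```

The hop count (`len(path) - 1`) always equals the Manhattan distance, so XY routing is minimal; its weakness is that it offers no path diversity for steering around congested or faulty links.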
INPUT Project
• In collaboration with Durham University
• Investigate how practical and theoretical aspects can be put together to improve efficiency and performance
• Main research questions
– Can we design interconnection networks that are incrementally expandable?
– Can distance properties be more accurately ascertained?
– Can minimal routing algorithms be developed?
– Can we embed theoretical properties into the router architecture?
– Can we reflect component behaviour in theoretical analyses?
– How well do theoretically advantageous networks perform under realistic conditions?
– Can we describe specific traffic patterns arising from applications using graph theory?
– Can we transform traffic pattern graphs so that they are ‘embeddable’ into a network?
16
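Two of the distance properties referred to above, network diameter and average distance, can be computed exactly for small topologies with one breadth-first search per node. The sketch below assumes an unweighted, connected graph given as an adjacency dictionary; the `ring` and `hypercube` builders are illustrative examples, not INPUT project topologies.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_properties(adj):
    """Return (diameter, average distance) over all ordered pairs of distinct nodes."""
    diameter, total, pairs = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                diameter = max(diameter, d)
                total += d
                pairs += 1
    return diameter, total / pairs


def ring(n):
    """Bidirectional ring of n nodes."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def hypercube(k):
    """k-dimensional hypercube: neighbours differ in exactly one address bit."""
    return {i: [i ^ (1 << b) for b in range(k)] for i in range(2 ** k)}


print("ring(8):     ", distance_properties(ring(8)))
print("hypercube(3):", distance_properties(hypercube(3)))
```

With the same 8 nodes, the hypercube reduces the diameter from 4 to 3 and the average distance from 16/7 to 12/7 hops, at the cost of three links per node instead of two.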
High Performance Computing
Graham Riley
17
Graham Riley
• Interests:
– Parallel Performance Analysis and Improvement
• Techniques and methods
• Scientific applications
– Numerical algorithms and implementations
– Flexible software coupling technologies
• Flexible construction and deployment of complex multi-model software
– The ‘Exascale’ challenge
• Scalable software and many-core hardware
• Good links to weather and climate modellers
– UK Met Office, European and US centres
18
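Parallel performance analysis usually starts from measured speedup and efficiency, and Amdahl's law shows why the serial fraction matters so much at Exascale processor counts. A minimal Python sketch of these standard formulas follows; the 1% serial fraction in the example loop is an arbitrary assumption.

```python
def amdahl_speedup(serial_fraction, processors):
    """Amdahl's law: upper bound on speedup when `serial_fraction` of the
    single-processor run time cannot be parallelised."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def measured_speedup(t_serial, t_parallel):
    """Speedup observed from two wall-clock timings."""
    return t_serial / t_parallel


def efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors


# Assumed 1% serial fraction, evaluated at increasing core counts.
for p in (16, 256, 4096):
    s = amdahl_speedup(0.01, p)
    print(f"p={p:5d}  speedup<={s:7.1f}  efficiency<={efficiency(s, p):.3f}")
```

With a 1% serial fraction the speedup can never exceed 1/0.01 = 100, however many cores are added, which is why scalable software must drive the serial part towards zero.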
Current projects
• ‘GungHo’ (NERC)
– Developing a new, highly scalable dynamical core for the Met Office’s atmosphere model
• IS-ENES (EU FP7)
– Scalable software infrastructures for Earth System Modelling
• ERMITAGE (EU FP7)
– Coupling technology for Integrated Assessment modelling
• Climate impact and mitigation
• PAMELA (EPSRC programme grant)
– Mobile vision scene understanding application
– From algorithms to specialized hardware via compiler and run-time systems
19
Neural Systems Engineering
Steve Furber,
Jim Garside,
Dave Lester
20
SpiNNaker project
• A million mobile phone processors in one computer
• Able to model about 1% of the human brain…
• …or 10 mice!
21
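A quick back-of-envelope check of the 1% claim, written in Python for consistency with the other examples. The neuron counts are order-of-magnitude assumptions (roughly 10^11 neurons for a human brain and 10^8 for a mouse), not SpiNNaker specifications; only the core count and the 1% figure come from the slide.

```python
# Back-of-envelope check of the SpiNNaker claims.
# ASSUMPTIONS (order of magnitude only, not SpiNNaker specifications):
HUMAN_BRAIN_NEURONS = 1e11   # roughly 10^11 neurons in a human brain
MOUSE_BRAIN_NEURONS = 1e8    # roughly 10^8 neurons in a mouse brain
# From the slide:
CORES = 1e6                  # "a million mobile phone processors"
FRACTION_OF_HUMAN = 0.01     # "about 1% of the human brain"

modelled_neurons = HUMAN_BRAIN_NEURONS * FRACTION_OF_HUMAN
neurons_per_core = modelled_neurons / CORES
mice_equivalent = modelled_neurons / MOUSE_BRAIN_NEURONS

print(f"neurons modelled: {modelled_neurons:.0e}")
print(f"neurons per core: {neurons_per_core:.0f}")
print(f"mouse brains:     {mice_equivalent:.0f}")
```

Under these assumptions each core must simulate on the order of a thousand neurons in real time, and 10^9 neurons corresponds to about ten mouse brains, which matches the slide's two comparisons.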
SpiNNaker chip
Multi-chip packaging by UNISEM Europe
22
SpiNNaker circuit boards
23
SpiNNaker applications
• A wide range of global collaborators
• Annual workshops:
– Capo Caccia
– Telluride
24
PhD projects
• Recent:
– SpiNNaker monitoring
– PyNN -> SpiNNaker
– Real-time neural learning algorithms
– Modelling the rat barrel cortex
– Technology scaling on SpiNNaker
• Future:
– System software
• run-time fault-tolerance, scaling, …
– SpiNNaker2 architecture exploration
– Neural network models
• learning algorithms, rewiring
– Robotics using SpiNNaker
– Non-neural algorithms
• graphics, physics modelling, …
25
3-D Integrated Circuits & Systems
Vasilis Pavlidis
www.cs.man.ac.uk/~pavlidiv
26
3-D Integration Benefits
[Figure: 2-D global wire of 20 mm vs. 3-D global wire of 12 mm]
• Integrate disparate technologies/components
• The same total area for the two circuits
• Delay improvement for 3-D of up to 54%*
• Architectural and physical design implications leading to several research questions
* “ASU Predictive Technology Model.” [Online]. Available: http://www.eas.asu.edu/~ptm/
27
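The benefit of shortening a global wire from 20 mm to 12 mm depends on the delay model. The sketch below compares two idealised textbook cases: an unrepeated distributed RC wire, whose 50% delay is about 0.38 r c L^2, and an optimally repeated wire, whose delay grows roughly linearly with L. The per-millimetre parameters are placeholders, only the ratios matter, and this is not the model behind the slide's 54% figure.

```python
# Compare a 20 mm (2-D) and a 12 mm (3-D) global wire under two idealised
# delay models. r and c are placeholder per-mm values; only ratios matter.
R_PER_MM = 1.0  # hypothetical wire resistance per mm (arbitrary units)
C_PER_MM = 1.0  # hypothetical wire capacitance per mm (arbitrary units)


def unrepeated_delay(length_mm):
    """50% delay of a distributed RC line: about 0.38 * r * c * L^2."""
    return 0.38 * R_PER_MM * C_PER_MM * length_mm ** 2


def repeated_delay(length_mm, delay_per_mm=1.0):
    """With optimal repeater insertion, delay grows roughly linearly in L."""
    return delay_per_mm * length_mm


def improvement(delay_fn, length_2d=20.0, length_3d=12.0):
    """Fractional delay reduction from shortening the wire."""
    return 1.0 - delay_fn(length_3d) / delay_fn(length_2d)


print(f"unrepeated wire: {improvement(unrepeated_delay):.0%} delay reduction")
print(f"repeated wire:   {improvement(repeated_delay):.0%} delay reduction")
```

The quadratic model gives a 64% reduction and the linear model a 40% reduction; the slide's figure of up to 54%, obtained with the ASU Predictive Technology Model, lies between these two bounds.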
3-D Integration Design Technologies
• Through-silicon-via (TSV) based systems
– Not mature for high-volume manufacturing (HVM)
• Silicon interposers
– In HVM for Xilinx FPGAs
– Glass interposers are being explored
• 3-D technologies and tools for prototyping are available
[Figure: TSV; Xilinx FPGA Virtex 7]
28
3-D Integration as a Circuit Design Paradigm
• (Re-)Design and assess SpiNNaker-based 3-D architectures
– Power, area, and performance tradeoffs
– Interposer and TSV technologies
• Research methodology
– Use available resources
– Differentiate only where required
• Reorganise the SpiNNaker system at the chip level
– Replace wire bonds with TSVs
• Reorganise the SpiNNaker system at the core level
– Long on-chip wires replaced with short TSVs
29
3-D Integration as a System Integration Approach
• Heterogeneous 3-D integration
– Preached a lot but hardly explored!
• Develop techniques and methods for “Mix-and-Match” systems
– How do you model…?
– How do you evaluate…?
– How do you integrate…?
– How do you manufacture…?
• The physical proximity of diverse systems may not come for free!
-> Interdisciplinary research is a prerequisite for such systems
-> Rather application-driven
30
Asynchronous Logic Design Tools
[Doug Edwards,]
Jim Garside,
Steve Furber
31
Previous Projects
• Balsa
– world-leading, publicly available asynchronous synthesis tool
– used for complete microprocessors
• SEDATE
– delay-insensitive datapath synthesis
• GALSA
– framework for heterogeneous GALS (globally asynchronous, locally synchronous) systems
• ...
32
GAELS
• Globally Asynchronous Elastic Logic Synthesis
– modern SoCs comprise numerous semi-autonomous subsystems
– shrinking transistors have hard-to-predict variations
• Address using Elastic Logic
– new, delay-tolerant paradigm
– new project!
33
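Elastic logic replaces fixed clock-cycle timing assumptions with a handshake: a stage passes data forward only when it holds a valid item and the next stage is ready, so stalls propagate backwards without losing or duplicating data. The Python sketch below simulates a linear pipeline with valid/ready flow control, cycle by cycle. It is a generic illustration under simplifying assumptions (one item per slot, `None` meaning an empty slot, a combinational ready chain), not the GAELS synthesis method.

```python
def step(stages, offer, sink_ready):
    """Advance a linear elastic pipeline by one clock cycle.

    stages: list of slots; None means empty (valid = 0), anything else is data.
    offer: item the producer presents this cycle, or None.
    sink_ready: whether the consumer accepts the last slot's item this cycle.
    Returns (new_stages, consumed_item, offer_accepted).
    """
    n = len(stages)
    # can_accept[i]: slot i may take an item this cycle (index n is the sink).
    # A slot can accept if it is empty or its own item is moving on.
    can_accept = [False] * (n + 1)
    can_accept[n] = sink_ready
    for i in range(n - 1, -1, -1):
        can_accept[i] = stages[i] is None or can_accept[i + 1]

    new = list(stages)
    consumed = None
    if stages[n - 1] is not None and sink_ready:
        consumed = stages[n - 1]
        new[n - 1] = None
    # Move valid items forward, back to front, wherever the next slot is ready.
    for i in range(n - 2, -1, -1):
        if stages[i] is not None and can_accept[i + 1]:
            new[i + 1] = stages[i]
            new[i] = None
    accepted = offer is not None and can_accept[0]
    if accepted:
        new[0] = offer
    return new, consumed, accepted


def simulate(n_stages, inputs, sink_pattern):
    """Run the pipeline for len(sink_pattern) cycles; return (outputs, final slots)."""
    stages = [None] * n_stages
    pending = list(inputs)
    outputs = []
    for sink_ready in sink_pattern:
        offer = pending[0] if pending else None
        stages, consumed, accepted = step(stages, offer, sink_ready)
        if accepted:
            pending.pop(0)
        if consumed is not None:
            outputs.append(consumed)
    return outputs, stages


# Consumer stalls for 5 cycles, then accepts every cycle.
print(simulate(3, [1, 2, 3, 4, 5], [False] * 5 + [True] * 6))
```

While the consumer stalls, the three slots fill and the producer is held off; once the consumer resumes, every item emerges exactly once and in order.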
Reconfigurable Processing
Dirk Koch
Jim Garside
34
Current Computing
• Energy use and design productivity are today's major concerns!
• Software
– offers very good programmability
– But: highly inefficient
• Hardware
– limited programmability
– greater efficiency
– But: expensive to develop
• FPGAs: take the best of both
35
FPGAs: State of the Art
• Modern FPGAs (e.g., XC6VSX475T) provide:
– 1000 32-bit multipliers
– 500 MHz clock speed
– 4.8 MB on-chip memory @ 5 TB/s (aggregated)
– Less than 30 W of power
• Allow highly customized hardware (do more in fewer cycles)
-> High performance and low power
• But: difficult to program (Verilog/VHDL)
• Compilers for C, Java, etc. are on the horizon
36
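The headline FPGA figures can be turned into peak-throughput and efficiency numbers with a few lines of arithmetic. The sketch assumes every multiplier does useful work on every cycle, an idealised peak that real designs rarely sustain.

```python
# Peak-rate arithmetic for the XC6VSX475T figures quoted on the slide.
# ASSUMPTION: every multiplier does useful work on every clock cycle.
MULTIPLIERS = 1000          # 32-bit multipliers
CLOCK_HZ = 500e6            # 500 MHz
MAX_POWER_W = 30.0          # "less than 30 W"
MEM_BW_BYTES_PER_S = 5e12   # 5 TB/s aggregate on-chip memory bandwidth

peak_mults_per_s = MULTIPLIERS * CLOCK_HZ
# Power is an upper bound, so this is a lower bound on efficiency at peak.
min_mults_per_joule = peak_mults_per_s / MAX_POWER_W
bytes_per_mult = MEM_BW_BYTES_PER_S / peak_mults_per_s

print(f"peak multiplies/s:         {peak_mults_per_s:.1e}")
print(f"multiplies per joule (>=): {min_mults_per_joule:.2e}")
print(f"memory bytes per multiply: {bytes_per_mult:.1f}")
```

At that peak the device would perform 5 × 10^11 multiplies per second, at least about 16.7 × 10^9 multiplies per joule given the 30 W bound, with about 10 bytes of on-chip memory bandwidth available per multiply.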
Change Hardware at Runtime
Example: Database Acceleration
Compose FPGA processing pipelines by stitching together SQL modules
[Figure: a query pipeline from "in" to "out" built by stitching SQL modules (>, +, join, sort, mean); static design: PCIe, memory, filesystem, management, reconfiguration]
Design goals:
• 512-bit datapath
• 300+ MHz (Virtex-6)
• Dozens of concurrently working accelerators
• 100x faster than x86 for some queries
37
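The idea of composing a query from prebuilt operator modules can be mimicked in software with Python generators, where each stage consumes the stream produced by the previous one. The module names below (`filter_gt`, `hash_join`, `sort_by`, `mean_of`) and the sample tables are hypothetical stand-ins for the SQL modules in the figure; the sketch says nothing about how the FPGA modules are actually implemented.

```python
def scan(rows):
    """Source module: stream rows in order."""
    yield from rows


def filter_gt(stream, key, threshold):
    """Filter module: pass rows whose `key` exceeds `threshold`."""
    return (row for row in stream if row[key] > threshold)


def hash_join(stream, table, key):
    """Join module: build a hash index on `table`, then probe it with the stream."""
    index = {}
    for row in table:
        index.setdefault(row[key], []).append(row)
    for row in stream:
        for match in index.get(row[key], []):
            yield {**row, **match}


def sort_by(stream, key):
    """Sort module: blocking, since it must see all input before emitting."""
    return iter(sorted(stream, key=lambda row: row[key]))


def mean_of(stream, key):
    """Aggregation module: arithmetic mean of one column."""
    values = [row[key] for row in stream]
    return sum(values) / len(values)


def compose(source, *modules):
    """Stitch modules together so each consumes the previous one's output."""
    stream = source
    for module in modules:
        stream = module(stream)
    return stream


orders = [
    {"cust": 1, "amount": 50},
    {"cust": 2, "amount": 5},
    {"cust": 1, "amount": 20},
    {"cust": 3, "amount": 70},
]
customers = [{"cust": 1, "region": "N"}, {"cust": 3, "region": "S"}]

rows = list(compose(
    scan(orders),
    lambda s: filter_gt(s, "amount", 10),
    lambda s: hash_join(s, customers, "cust"),
    lambda s: sort_by(s, "amount"),
))
print(rows)
print(mean_of(rows, "amount"))
```

As in a hardware pipeline, the filter and the probe side of the join process rows as they arrive, whereas `sort_by` is blocking and has to buffer its whole input before producing any output.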
Research
• Methodologies and design tools
• Applications (video, database, embedded)
[Figure (panels a, b): system specification (communication architecture & floorplan) -> generate static system; generate module repository (modules V1–V7); modules placed on a grid of logic tiles and memory tiles (x, y axes) and linked with:]
bitlink module.bit -pos X,Y static.bit -outfile initial.bit
• Goal: allow “civilians” to program FPGAs (and to use dynamic reconfiguration)
38
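The `-pos X,Y` argument of the `bitlink` command on this slide places a module at a chosen position in the fabric. A relocatable module can only go where the underlying tiles match its resource footprint and are not already in use. The sketch below enumerates such positions on a toy grid of logic (`L`) and memory (`M`) tiles; the grid model and `feasible_positions` are hypothetical and do not describe how `bitlink` works internally.

```python
def feasible_positions(fabric, footprint, occupied=frozenset()):
    """Return every (x, y) at which a relocatable module can be placed.

    fabric: list of rows; fabric[y][x] is a tile type, e.g. "L" (logic)
            or "M" (memory). This toy model is an assumption.
    footprint: rows of tile types the module needs, anchored at its top-left.
    occupied: set of (x, y) tiles already used by other modules.
    """
    rows, cols = len(fabric), len(fabric[0])
    h, w = len(footprint), len(footprint[0])
    positions = []
    for y in range(rows - h + 1):
        for x in range(cols - w + 1):
            # Every tile under the module must match its type and be free.
            fits = all(
                fabric[y + dy][x + dx] == footprint[dy][dx]
                and (x + dx, y + dy) not in occupied
                for dy in range(h)
                for dx in range(w)
            )
            if fits:
                positions.append((x, y))
    return positions


fabric = [
    "LLMLLM",  # row y = 0
    "LLMLLM",  # row y = 1
]
print(feasible_positions(fabric, ["LM"]))
print(feasible_positions(fabric, ["LM"], occupied={(1, 0), (2, 0)}))
```

Because FPGA resources are arranged in columns, a module needing a logic tile next to a memory tile can only be relocated in steps that preserve that column pattern, here to x = 1 or x = 4.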
Mobile Systems Architecture
Nick Filer
with help from Barry Cheetham
39
Nick Filer
• Interests:
– Wireless networks of all types, mainly:
• Ad hoc networks
• Voice over IP
• Sensors/Things (data collection), protocols...
• Pocket networks (e.g. mobile phones, PDAs), ...
• Information dissemination; big data over networks
– Supported by:
• Simulation, analysis and software generation tools
40
Current Interests
• Support for adaptable network stacks
• Mobile pocket networks
• Low-power wireless sensor networks
• Neighbour detection and handover in mobile wireless networks
41
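For neighbour detection and handover, a widely used rule (similar in spirit to the LTE A3 event, though not claimed to be the approach used here) is to hand over only when a neighbour's signal beats the serving cell by a hysteresis margin for several consecutive measurements. The sketch below assumes each measurement period yields a dictionary of received signal strengths in dBm that always includes the serving cell; the cell names, readings and thresholds are illustrative.

```python
def handover_decisions(samples, hysteresis_db=3.0, time_to_trigger=3):
    """Decide when to hand over from the serving cell to a neighbour.

    samples: one dict per measurement period, mapping cell id -> RSS in dBm.
    A handover fires only when the same neighbour beats the serving cell by
    more than hysteresis_db for time_to_trigger consecutive periods, which
    suppresses "ping-pong" handovers caused by short fluctuations.
    Returns (final serving cell, list of (period, old_cell, new_cell)).
    """
    serving = max(samples[0], key=samples[0].get)  # attach to strongest cell
    candidate, count = None, 0
    events = []
    for t, rss in enumerate(samples):
        neighbours = {c: s for c, s in rss.items() if c != serving}
        best = max(neighbours, key=neighbours.get) if neighbours else None
        if best is not None and rss[best] > rss[serving] + hysteresis_db:
            # Count consecutive periods in which the same neighbour is better.
            count = count + 1 if best == candidate else 1
            candidate = best
            if count >= time_to_trigger:
                events.append((t, serving, best))
                serving, candidate, count = best, None, 0
        else:
            candidate, count = None, 0
    return serving, events


samples = [
    {"A": -70, "B": -80},
    {"A": -75, "B": -71},
    {"A": -76, "B": -70},
    {"A": -74, "B": -72},  # B's advantage drops below the margin here
    {"A": -78, "B": -70},
    {"A": -80, "B": -69},
    {"A": -82, "B": -68},
]
print(handover_decisions(samples))
```

The brief advantage of cell B at periods 1 and 2 does not trigger a handover because it is interrupted at period 3; only the sustained advantage from period 4 onwards does.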
DYVERSE
Hybrid Dynamical Systems
Eva Navarro López
42