A Linux-based Software Platform for the Reconfigurable Scaleable Computing Project

John A. Williams*, Neil W. Bergmann*, Robert F. Hodson+
* School of ITEE, The University of Queensland, Brisbane, Australia
+ NASA Langley Research Center, Hampton, Virginia

MAPLD 2005/1001

Outline
- RSC Overview
  - Concept, participants
- Existing Technology
  - MicroBlaze, uClinux
- New Developments
  - Vision, Multiprocessing, MPI, NoC
- Status and outlook
  - Planned Investigations, Progress

Reconfigurable Scaleable Computing
- Features
  - Next-generation on-board computing platform
  - FPGA-based reconfigurable computer
  - Soft CPU cores + embedded Linux operating system
  - Hybrid SW/HW application environment
  - Hierarchical, scaleable computing network
- Selected for funding in 2004 H&RT call for proposals

Reconfigurable Scaleable Computing
- Participants
  - NASA LaRC (project lead, hardware design)
  - UQ (operating system, message passing libraries)
  - ASRC (system modeling, performance analysis)
  - Jefferson Labs (consulting)
  - StarBridge Systems (graphical design tools)
  - NASA Office of Logic Design
  - NSA

Reconfigurable Scaleable Computing

MicroBlaze
- 32-bit RISC, Harvard-architecture soft processor
- Targeted to Xilinx logic primitives
  - ~1000-1500 slices (roughly 10-14% of an XC4V-LX25)
- Parameterisable
  - Caches
  - ALU, FPU
  - Memory/bus interfaces
    - Local Memory Bus (LMB)
    - On-chip Peripheral Bus (OPB)
    - Fast Simplex Links (FSL)

MicroBlaze
- Logic utilisation in RPM prototype FPGA device (16K icache & 16K dcache)

Selected Device: 4vlx25ff668-10
  Number of Slices:              1504 out of 10752   13%
  Number of Slice Flip Flops:    1172 out of 21504    5%
  Number of 4 input LUTs:        2238 out of 21504   10%
  Number of FIFO16/RAMB16s:        24 out of    72   33%
    Number used as RAMB16s:        24
  Number of DSP48s:                 3 out of    48    6%

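The percentage column appears to be truncated to an integer rather than rounded. For example, 1504 of 10752 slices is 13.99%, yet it is shown as 13%. The following C sketch reproduces the column from the used/available counts, assuming truncation. The struct and function names are hypothetical and are not part of the Xilinx tools.

```c
#include <stddef.h>

/* One row of a map-report style utilisation summary (hypothetical type). */
struct util_row {
    const char *resource;
    unsigned used;
    unsigned available;
};

/* Percentage of a resource in use, truncated towards zero by integer
 * division (assumption: matches the report, e.g. 1504/10752 -> 13). */
unsigned util_percent(unsigned used, unsigned available)
{
    if (available == 0)
        return 0;                       /* avoid division by zero */
    return (unsigned)((used * 100UL) / available);
}

/* Counts copied from the 4vlx25ff668-10 utilisation figures above. */
static const struct util_row rpm_rows[] = {
    { "Slices",           1504, 10752 },
    { "Slice Flip Flops", 1172, 21504 },
    { "4 input LUTs",     2238, 21504 },
    { "FIFO16/RAMB16s",     24,    72 },
    { "DSP48s",              3,    48 },
};

/* Percentage for row i of the table; 0 for an out-of-range index. */
unsigned rpm_row_percent(size_t i)
{
    if (i >= sizeof rpm_rows / sizeof rpm_rows[0])
        return 0;
    return util_percent(rpm_rows[i].used, rpm_rows[i].available);
}
```

Calling `rpm_row_percent(i)` for rows 0 to 4 yields 13, 5, 10, 33 and 6, which matches the percentage column.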
MicroBlaze, Linux and RSC
- Why?
  - Path for existing applications onto RSC
  - Standard platform improves design efficiency
    - Application development/debug
    - Multiprocessing/clustering
    - Software infrastructure
    - Interoperability (networking, file systems, …)
  - UQ research focus in rSoC
    - Integration of custom hardware (for speed) with conventional processor/OS modules (for flexibility)

MicroBlaze, Linux and RSC
- Why not?
  - Performance
    - FPGAs roughly 10x less efficient than fixed silicon
    - CPUs less efficient than custom hardware
      - A serialised abstraction of intrinsically parallel hardware
    - Linux less efficient than deeply embedded software
      - Abstraction incurs performance penalty
  - Stability/reliability
    - RSC is a data processing/computation platform
    - Not part of spacecraft survivability
  - MicroBlaze and Linux are only part of the solution

Vision
- Heterogeneous multiprocessing
  - Multiple software tasks per processor
  - Multiple processors per chip/RPM
  - Hardware co-processors
  - Multiple RPMs per stack
  - Multiple stacks per system
- RSC is an exotic computing machine
  - How do we program it?

Vision
- Linux-based multiprocessing
  - To SW apps, RSC is a Linux cluster
  - Critical computation offloaded to hardware
    - EITHER co-processors to CPU nodes,
    - OR peers in the computational network
- Find the sweet spot
  - Runtime performance vs design effort
- RSC is an exotic computing machine
  - We must make it seem straightforward

Vision
- Make it look like Linux
- Build on the enormous library of Linux knowledge, tools, apps, documentation, training and skills
- Ability to prototype realistic user apps on a Linux desktop is tremendously valuable

MicroBlaze Multiprocessing
- Many processors give performance and reliability: parallelism is key
- MicroBlaze achieves 4-8x better MIPS/LUT than any other soft CPU architecture (in Xilinx FPGAs)
- We can put about 8 CPUs in an FPGA
  - What are the hardware architectural issues?
  - How do we use them efficiently?

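The "about 8 CPUs" figure can be roughly checked against the slice counts given earlier. This assumes the XC4V-LX25 (10752 slices) and about 1000 to 1500 slices per MicroBlaze. It counts slices only and ignores buses, memory controllers and peripherals, so it is an upper-bound sketch rather than a placement result:

```latex
N_{\mathrm{CPU}} \le \left\lfloor \frac{S_{\mathrm{device}}}{S_{\mathrm{CPU}}} \right\rfloor,
\qquad
\left\lfloor \frac{10752}{1504} \right\rfloor = 7,
\qquad
\left\lfloor \frac{10752}{1000} \right\rfloor = 10
```

Block RAM may bind first. If each CPU instance used all 24 RAMB16s reported for the 16K/16K cache configuration, only $\lfloor 72/24 \rfloor = 3$ would fit, so an 8-CPU design would need smaller caches.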
MicroBlaze Multiprocessing
- Implicit multiprocessing
  - SMP: looks like one fast processor
- Explicit multiprocessing
  - protoSMP: looks like many processors
  - MPI: looks like a cluster
- Multi-level multiprocessing

MicroBlaze Multiprocessing
- Symmetric Multiprocessing (SMP)
  - N CPUs as a single virtual machine
  - Implicit parallelism
    - Hidden by OS and hardware
  - Hardware support
    - Cache coherency
    - Memory architectures
    - Distributed interrupt dispatch

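The following C sketch illustrates "implicit parallelism, hidden by OS and hardware" using ordinary POSIX threads, assuming a threads library is available on the target. The program never names a processor. Under an SMP kernel the scheduler may place each thread on a different CPU, while under a uniprocessor kernel the same code runs correctly but serially. `parallel_sum` and `MAX_THREADS` are hypothetical names for illustration.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16          /* hypothetical upper bound for this sketch */

/* Work descriptor for one thread: a contiguous slice of the input. */
struct sum_task {
    const long *data;
    size_t begin, end;
    long partial;               /* written by the worker, read after join */
};

static void *sum_worker(void *arg)
{
    struct sum_task *t = arg;
    long acc = 0;
    for (size_t i = t->begin; i < t->end; i++)
        acc += t->data[i];
    t->partial = acc;
    return NULL;
}

/* Sum n values with nthreads threads. The OS, not the program, decides
 * which CPU runs each thread. Returns 0 on success, -1 on error. */
int parallel_sum(const long *data, size_t n, unsigned nthreads, long *out)
{
    pthread_t tid[MAX_THREADS];
    struct sum_task task[MAX_THREADS];
    unsigned started = 0;
    int rc = 0;

    if (nthreads == 0 || nthreads > MAX_THREADS)
        return -1;

    for (unsigned k = 0; k < nthreads; k++) {
        task[k].data = data;
        task[k].begin = n * k / nthreads;        /* even split of [0, n) */
        task[k].end = n * (k + 1) / nthreads;
        task[k].partial = 0;
        if (pthread_create(&tid[k], NULL, sum_worker, &task[k]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }

    long total = 0;
    for (unsigned k = 0; k < started; k++) {     /* join every started thread */
        pthread_join(tid[k], NULL);
        total += task[k].partial;
    }
    if (rc == 0)
        *out = total;
    return rc;
}
```

The protoSMP alternative cannot use this model directly, because each CPU runs its own kernel and threads cannot migrate between images.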
SMP vs ProtoSMP
[Figure: SMP: 1 virtual machine. MBlaze0 to MBlaze3 share one Kernel Memory (holding per-CPU data structures), one Application Memory, one interrupt controller (INTC) and one set of I/O (serial, ethernet, …).]

MicroBlaze Multiprocessing
- ProtoSMP
  - N CPUs on shared bus
  - Private address zones within shared physical memory
  - Common shared memory region with IPC protocols
  - Shared memory multicomputing

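One simple IPC protocol for the common shared region is a single-producer, single-consumer mailbox. It needs no atomic read-modify-write, only ordered single-word loads and stores. The C sketch below assumes a hypothetical `struct mailbox` placed in the shared region. C11 atomics provide the ordering on a hosted, cache-coherent test machine. On protoSMP hardware, where per-CPU caches are not coherent, the region would instead have to be mapped uncached or explicitly flushed and invalidated around each access.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MBOX_SLOTS 8u   /* must be a power of two so indices wrap cleanly */

/* Hypothetical mailbox in the common shared-memory region.
 * Exactly one CPU sends and exactly one CPU receives (SPSC). */
struct mailbox {
    _Atomic uint32_t head;      /* next slot to write; only the sender stores */
    _Atomic uint32_t tail;      /* next slot to read; only the receiver stores */
    uint32_t slot[MBOX_SLOTS];
};

void mbox_init(struct mailbox *m)
{
    atomic_init(&m->head, 0);
    atomic_init(&m->tail, 0);
}

/* Sender side: returns false if the mailbox is full. */
bool mbox_send(struct mailbox *m, uint32_t msg)
{
    uint32_t head = atomic_load_explicit(&m->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&m->tail, memory_order_acquire);
    if (head - tail == MBOX_SLOTS)              /* unsigned wrap-around is safe */
        return false;
    m->slot[head % MBOX_SLOTS] = msg;
    /* Publish the slot contents before advancing head. */
    atomic_store_explicit(&m->head, head + 1, memory_order_release);
    return true;
}

/* Receiver side: returns false if the mailbox is empty. */
bool mbox_recv(struct mailbox *m, uint32_t *msg)
{
    uint32_t tail = atomic_load_explicit(&m->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&m->head, memory_order_acquire);
    if (head == tail)
        return false;
    *msg = m->slot[tail % MBOX_SLOTS];
    /* Release the slot back to the sender only after copying it out. */
    atomic_store_explicit(&m->tail, tail + 1, memory_order_release);
    return true;
}
```

For two-way traffic between a pair of CPUs, each direction would use its own mailbox.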
SMP vs ProtoSMP
[Figure: ProtoSMP: N virtual machines. Each of MBlaze0 to MBlaze3 has its own kernel image (Image 0 to Image 3) with private Kernel Memory and Application Memory, its own INTC and virtual I/O (Virt. I/O); the physical I/O (serial, ethernet, …) appears once.]

SMP vs ProtoSMP
- SMP
  - Pros
    - Implicit parallelism and inter-CPU comms
    - Efficient memory and cache re-use
  - Cons
    - Specialised hardware support (caches, distributed interrupts)
    - Requires kernel support
- ProtoSMP
  - Pros
    - Simplicity
    - Use existing HW components
    - No changes in kernel
  - Cons
    - Explicit parallelism and inter-CPU comms
    - Memory waste
    - Virtual IO model (N terminals)

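The "private address zones" layout and its "memory waste" can be sketched in C. The layout assumed here is hypothetical: physical RAM is split evenly into one private zone per CPU, with a common shared region at the top. The waste counted is only the duplicated kernel image (N images where SMP needs one). The base address, sizes and kernel footprint used with these functions are illustrative inputs, not measured RSC figures.

```c
#include <stdint.h>

/* A contiguous physical address range (hypothetical type). */
struct zone {
    uint32_t base;
    uint32_t size;
};

/* Private zone for CPU `cpu` (0-based): RAM minus the shared region,
 * divided evenly between ncpus. Returns 0 on success, -1 on bad input. */
int protosmp_zone(uint32_t ram_base, uint32_t ram_size, uint32_t shared_size,
                  unsigned ncpus, unsigned cpu, struct zone *out)
{
    if (ncpus == 0 || cpu >= ncpus || shared_size > ram_size)
        return -1;
    uint32_t per_cpu = (ram_size - shared_size) / ncpus;
    out->base = ram_base + (uint32_t)cpu * per_cpu;
    out->size = per_cpu;
    return 0;
}

/* Base of the common shared region, placed at the top of RAM. */
uint32_t protosmp_shared_base(uint32_t ram_base, uint32_t ram_size,
                              uint32_t shared_size)
{
    return ram_base + ram_size - shared_size;
}

/* Extra memory spent on kernel images compared with SMP: one image is
 * needed anyway, the other ncpus - 1 copies are duplication. */
uint32_t protosmp_duplicated_kernel(uint32_t kernel_bytes, unsigned ncpus)
{
    return ncpus == 0 ? 0 : (uint32_t)(ncpus - 1) * kernel_bytes;
}
```

For example, with 64 MiB of RAM at a hypothetical base of `0x10000000`, a 4 MiB shared region and four CPUs, each CPU gets a 15 MiB zone, and CPU 2 starts at `0x11E00000`.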
RSC Network
- Parallel processing architectures are often limited by CPU/memory bandwidth and interprocess communications bandwidth.
- RSC has several potential bottlenecks: RPM memory, PCI backplane, inter-stack networks.
- Need to leave scope for high-speed comms, e.g. with RocketIO on FPGAs.

RSC Network
- It is useful if applications can initially be developed without regard to partitioning and communications
- This implies a uniform interprocess communications mechanism
- We choose MPI

MPI on MicroBlaze-uClinux
- MPI: Message Passing Interface
- API for explicit message passing between processes
  - Multiple processes on one machine, or
  - Distributed across many machines

MPI on MicroBlaze-uClinux
- MPICH implementation from Argonne National Laboratory
- MPICH2: complete reimplementation of MPI conforming to the MPI-2 standard
  - Layered implementation abstracting the MPI application interface from the underlying physical transport
  - Process Management Interface

MPI on MicroBlaze-uClinux
[Figure: MPICH2 software architecture, from http://www.sharcnet.ca/fw2003/slides/mpich2-details.ppt. Layers: Application; MPI; MPICH, ROMIO and MPE; ADI3 (devices: CH3 Device, BG/L, Myrinet ...) and ADIO (PVFS, GPFS, XFS ...); CH3 channels (Sock, SHM, SSM, IB …).]

MPI on MicroBlaze-uClinux
- MPICH2 on MicroBlaze
  - sock implementation over TCP/IP sockets
    - Starting point for RSC, with COTS demo MicroBlaze multiprocessing experiments
  - shm shared memory wrapper, great for SMP/protoSMP
  - Create new wrapper layer around RSC interconnect/NoC architecture once finalised
- Can hardware co-processors look like MPI?

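The "wrapper layer" idea means that upper layers call through a fixed table of operations and never see whether bytes travel over TCP sockets, shared memory or an on-chip network. The C sketch below shows this with a hypothetical `struct transport_ops` and a toy loopback transport standing in for a shared-memory buffer. It illustrates the layering technique only. It is NOT MPICH2's actual CH3 channel interface, whose functions and signatures differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical transport interface: one instance per physical transport
 * (sockets, shared memory, an RSC NoC driver, ...). */
struct transport_ops {
    const char *name;
    int (*send)(void *ctx, int dest, const void *buf, size_t len);
    int (*recv)(void *ctx, int src, void *buf, size_t len);
};

/* An endpoint binds the upper layer to one transport instance. */
struct endpoint {
    const struct transport_ops *ops;
    void *ctx;
};

/* Upper-layer calls: identical whichever transport sits underneath. */
int ep_send(struct endpoint *ep, int dest, const void *buf, size_t len)
{
    return ep->ops->send(ep->ctx, dest, buf, len);
}

int ep_recv(struct endpoint *ep, int src, void *buf, size_t len)
{
    return ep->ops->recv(ep->ctx, src, buf, len);
}

/* Toy loopback transport: a single in-memory message buffer. */
struct loopback {
    unsigned char data[256];
    size_t len;
    int full;
};

static int lb_send(void *ctx, int dest, const void *buf, size_t len)
{
    struct loopback *lb = ctx;
    (void)dest;                         /* single peer: rank ignored */
    if (lb->full || len > sizeof lb->data)
        return -1;                      /* buffer busy or message too big */
    memcpy(lb->data, buf, len);
    lb->len = len;
    lb->full = 1;
    return 0;
}

static int lb_recv(void *ctx, int src, void *buf, size_t len)
{
    struct loopback *lb = ctx;
    (void)src;
    if (!lb->full || len < lb->len)
        return -1;                      /* nothing pending or buffer too small */
    memcpy(buf, lb->data, lb->len);
    lb->full = 0;
    return (int)lb->len;                /* bytes delivered */
}

const struct transport_ops loopback_ops = { "loopback", lb_send, lb_recv };
```

Supporting a new interconnect would mean writing another `transport_ops` instance. Code built on `ep_send`/`ep_recv` stays unchanged.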
COTS Demo Platform
- Two ethernet ports per board, up to 4 MicroBlaze CPUs per board
  - Four boards per demo cluster
- Variety of cluster configuration experiments
  - 4x uniprocessor
  - 4x4-way protoSMP
  - 4x2x2-way protoSMP
  - …

Status and Outlook
- Detailed SMP vs protoSMP feasibility study
  - Commenced Q2 2005
- MPICH2 port investigations commenced
  - Baseline implementation: uniprocessor over TCP/IP
  - Work commenced Q2 2005

Conclusion
- MicroBlaze and uClinux are part of the solution
  - Those parts which are Linux look like desktop/cluster Linux
  - Deliberate decisions in the trade-off between design efficiency and runtime efficiency
- Looking ahead
  - Linux abstractions over RSC hardware
    - Intra-board, inter-board, inter-stack, …
  - Development and debug environments
  - Seamless integration with custom hardware
    - Viva, VHDL, …