A Linux-based Software Platform for the Reconfigurable Scaleable Computing Project
John A. Williams*, Neil W. Bergmann*
Robert F. Hodson+
* School of ITEE
The University of Queensland
Brisbane, Australia
+ NASA Langley Research Centre
Hampton, Virginia
Williams
MAPLD 2005/1001
Outline
RSC Overview
  Concept, participants
Existing Technology
  MicroBlaze, uClinux
New Developments
  Vision, Multiprocessing, MPI, NoC
Status and outlook
  Planned Investigations, Progress
Reconfigurable Scaleable Computing
Features
Next-generation on-board computing platform
FPGA-based reconfigurable computer
Soft CPU cores + embedded Linux operating system
Hybrid SW/HW application environment
Hierarchical, scaleable computing network
Selected for funding in 2004 H&RT call for proposals
Reconfigurable Scaleable Computing
Participants
NASA LaRC (project lead, hardware design)
UQ (operating system, message passing libraries)
ASRC (system modeling, performance analysis)
Jefferson Labs (consulting)
StarBridge Systems (graphical design tools)
NASA Office of Logic Design
NSA
Reconfigurable Scaleable Computing
[Figure: slide image not preserved in transcript]
MicroBlaze
32-bit RISC, Harvard soft processor
Targeted to Xilinx logic primitives
  ~1000-1500 slices (10% of XC4V-LX25)
Parameterisable
  Caches
  ALU, FPU
Memory/bus interfaces
  Local memory bus (LMB)
  On-chip Peripheral Bus (OPB)
  Fast Simplex Links (FSL)
MicroBlaze
Logic utilisation in RPM prototype FPGA device (16K icache & 16K dcache)
Selected Device : 4vlx25ff668-10
Number of Slices:              1504  out of  10752   13%
Number of Slice Flip Flops:    1172  out of  21504    5%
Number of 4 input LUTs:        2238  out of  21504   10%
Number of FIFO16/RAMB16s:        24  out of     72   33%
  Number used as RAMB16s:        24
Number of DSP48s:                 3  out of     48    6%
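The utilisation figures reported for the prototype device can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the truncating percentage is an assumption about how the report rounds, and the "copies that fit" figure is a crude upper bound that ignores routing, shared peripherals and any other logic on the device.

```python
# Figures copied from the utilisation report: resource -> (used, available).
UTILISATION = {
    "Slices": (1504, 10752),
    "Slice Flip Flops": (1172, 21504),
    "4 input LUTs": (2238, 21504),
    "FIFO16/RAMB16s": (24, 72),
    "DSP48s": (3, 48),
}


def percent_used(used, available):
    """Integer percentage, truncated (assumed to match the report's rounding)."""
    return used * 100 // available


def copies_that_fit(utilisation):
    """For each resource, how many identical cores the device could hold."""
    return {
        name: available // used
        for name, (used, available) in utilisation.items()
    }


def limiting_resource(utilisation):
    """The resource that runs out first, and the core count it allows."""
    fits = copies_that_fit(utilisation)
    name = min(fits, key=fits.get)
    return name, fits[name]


if __name__ == "__main__":
    for name, (used, available) in UTILISATION.items():
        print(f"{name:18s} {percent_used(used, available):3d}%")
    print(limiting_resource(UTILISATION))
```

For this particular cache configuration the scarcest resource is block RAM rather than slices, so a configuration with smaller caches would give a different bound.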
MicroBlaze, Linux and RSC
Why?
Path for existing applications onto RSC
Standard platform improves design efficiency
  Application development/debug
  Multiprocessing/clustering
  Software infrastructure
  Interoperability (networking, file systems, …)
UQ research focus in rSoC
  Integration of custom hardware (for speed) with conventional processor/OS modules (for flexibility)
MicroBlaze, Linux and RSC
Why not?
Performance
FPGAs roughly 10x less efficient than fixed silicon
CPUs less efficient than custom hardware
A serialised abstraction of intrinsically parallel hardware
Less efficient than deeply embedded software
Abstraction incurs performance penalty
Stability/reliability
RSC is a data processing/computation platform
Not part of spacecraft survivability
MicroBlaze and Linux are only part of the solution
Vision
Heterogeneous multiprocessing
RSC is an exotic computing machine
Multiple software tasks per processor
Multiple processors per chip/RPM
Hardware Co-processors
Multiple RPMs per stack
Multiple stacks per system
How do we program it?
Vision
Linux-based multiprocessing
To SW apps, RSC is a Linux cluster
Critical computation offloaded to hardware
  EITHER Co-processors to CPU nodes,
  OR Peers in the computational network
Find the sweet spot
Runtime performance vs design effort
RSC is an exotic computing machine
  We must make it seem straightforward
Vision
Make it look like Linux
Build on enormous library of Linux knowledge, tools, apps, documentation, training and skills
Ability to prototype realistic user apps on Linux desktop is tremendously valuable
MicroBlaze Multiprocessing
Lots of processors give performance and reliability – parallelism is key
MicroBlaze achieves 4-8x better MIPS/LUT than any other soft CPU architecture (in Xilinx FPGAs)
We can put about 8 CPUs in an FPGA
  What are the hardware architectural issues?
  How to use it efficiently?
MicroBlaze Multiprocessing
Implicit multiprocessing
  SMP, looks like one fast processor
Explicit multiprocessing
  protoSMP, looks like many processors
Multi-level multiprocessing
  MPI, looks like a cluster
MicroBlaze Multiprocessing
Symmetric Multiprocessing (SMP)
N CPUs as a single virtual machine
Implicit parallelism
  Hidden by OS and hardware
Hardware support
  Cache coherency
  Memory architectures
  Distributed interrupt dispatch
SMP vs ProtoSMP
[Figure: SMP – 1 virtual machine. Labels: MBlaze0, MBlaze1, MBlaze2, MBlaze3; Kernel Memory (Per-CPU data structures); Application Memory; INTC; I/O (serial, ethernet, …)]
MicroBlaze Multiprocessing
ProtoSMP
N CPUs on shared bus
Private address zones within shared physical memory
Common shared memory region with IPC protocols
Shared memory multicomputing
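The protoSMP arrangement (one private address zone per CPU plus a common region, all carved out of the same physical memory) can be sketched as a simple memory-map calculation. Everything concrete here is an assumption for illustration: the base address, the RAM and shared-region sizes, the equal-sized private zones and the names `Region` and `protosmp_memory_map` are hypothetical and not taken from the RSC design.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass(frozen=True)
class Region:
    name: str
    base: int
    size: int

    @property
    def end(self):
        """First address past the region (exclusive bound)."""
        return self.base + self.size


def protosmp_memory_map(ram_base, ram_size, n_cpus, shared_size):
    """Split one physical RAM into n private zones plus one shared IPC region.

    Each CPU's kernel image and applications would live in its own zone;
    only the final region is visible to every CPU.
    """
    if n_cpus < 1:
        raise ValueError("need at least one CPU")
    if not 0 < shared_size < ram_size:
        raise ValueError("shared region must be smaller than the RAM")
    private_size = (ram_size - shared_size) // n_cpus
    if private_size == 0:
        raise ValueError("no room left for private zones")
    regions = [
        Region(f"cpu{i}-private", ram_base + i * private_size, private_size)
        for i in range(n_cpus)
    ]
    regions.append(
        Region("shared-ipc", ram_base + n_cpus * private_size, shared_size)
    )
    return regions


if __name__ == "__main__":
    # Hypothetical board: 64 MiB of RAM, 4 CPUs, 4 MiB shared for IPC.
    for region in protosmp_memory_map(0x80000000, 64 * MIB, 4, 4 * MIB):
        print(f"{region.name:13s} {region.base:#010x} - {region.end - 1:#010x}")
```

The sketch also makes the "memory waste" trade-off visible: every private zone has to be large enough for a full kernel image and its applications, whether or not that CPU uses the space.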
SMP vs ProtoSMP
[Figure: ProtoSMP – N virtual machines. Labels, repeated once per CPU: MBlaze0 to MBlaze3, Image 0 to Image 3, Kernel Memory, Application Memory, INTC, Virt. I/O; plus one I/O (serial, ethernet, …) block]
SMP vs ProtoSMP
SMP
  Pros
    Implicit parallelism and inter-CPU comms
    Efficient memory and cache re-use
  Cons
    Specialised hardware support (caches, distributed interrupts)
    Requires kernel support
ProtoSMP
  Pros
    Simplicity
    Use existing HW components
    No changes in kernel
  Cons
    Explicit parallelism and inter-CPU comms
    Memory waste
    Virtual IO model (N terminals)
RSC Network
Parallel processing architectures often limited by CPU/memory bandwidth and interprocess comms bandwidth.
RSC has several potential bottlenecks: RPM memory, PCI backplane, interstack networks.
Need to leave scope for high-speed comms, e.g. with Rocket I/O on FPGAs
RSC Network
Useful if applications can be initially developed without regard to partitioning and communications
Implies a uniform interprocess communications mechanism
We choose MPI
MPI on MicroBlaze-uClinux
MPI - Message Passing Interface
API for explicit message passing between processes
  Multiple processes on one machine, or
  Distributed across many machines
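The explicit message-passing style can be illustrated with a toy model in which each "rank" owns a mailbox and all cooperation happens through send and receive calls. This is not the MPI API: threads inside one Python process stand in for MPI processes, and the `World` class and `parallel_sum` function are hypothetical names invented for the sketch. In a real MPI program the same pattern would be written with calls such as `MPI_Send` and `MPI_Recv`.

```python
import queue
import threading


class World:
    """Toy stand-in for an MPI communicator: one mailbox per rank."""

    def __init__(self, size):
        self.size = size
        self._mailboxes = [queue.Queue() for _ in range(size)]

    def send(self, dest, payload, source):
        """Deliver a message to the mailbox of rank `dest`."""
        self._mailboxes[dest].put((source, payload))

    def recv(self, rank):
        """Block until a message for `rank` arrives; return (source, payload)."""
        return self._mailboxes[rank].get()


def parallel_sum(values, n_workers):
    """Rank 0 scatters chunks, workers reply with partial sums."""
    world = World(n_workers + 1)  # rank 0 is the coordinator

    def worker(rank):
        _, chunk = world.recv(rank)             # wait for work from rank 0
        world.send(0, sum(chunk), source=rank)  # reply with the partial sum

    threads = [
        threading.Thread(target=worker, args=(rank,))
        for rank in range(1, n_workers + 1)
    ]
    for thread in threads:
        thread.start()
    for rank in range(1, n_workers + 1):
        # Strided slices give every element to exactly one worker.
        world.send(rank, values[rank - 1::n_workers], source=0)
    total = sum(world.recv(0)[1] for _ in range(n_workers))
    for thread in threads:
        thread.join()
    return total


if __name__ == "__main__":
    print(parallel_sum(list(range(100)), 4))
```

Because the workers share nothing except messages, the same program structure applies whether the ranks sit on one machine or are distributed across many; only the transport underneath changes.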
MPI on MicroBlaze-uClinux
MPICH implementation, Argonne National Laboratory
MPICH2 – complete reimplementation of MPI conforming to MPI-2 standard
  Layered implementation abstracting MPI application interface from underlying physical transport
  Process Management Interface
MPI on MicroBlaze-uClinux
[Figure: MPICH2 layered architecture, from http://www.sharcnet.ca/fw2003/slides/mpich2-details.ppt]
Application
MPI
MPICH: MPE, ROMIO
ADI3: CH3 Device, BG/L, Myrinet, ...
ADIO: PVFS, GPFS, XFS, ...
CH3: Sock, SHM, SSM, IB, …
MPI on MicroBlaze-uClinux
MPICH2 on MicroBlaze
sock implementation over TCP/IP sockets
  Starting point for RSC, with COTS demo MicroBlaze multiprocessing experiments
shm shared memory wrapper, great for SMP/protoSMP
Create new wrapper layer around RSC interconnect/NoC architecture once finalised
Can hardware co-processors look like MPI?
COTS Demo Platform
Two Ethernet ports per board, up to 4 MicroBlaze per board
  Four boards per demo cluster
Variety of cluster configuration experiments
  4x uniprocessor
  4x4-way protoSMP
  4x2x2-way protoSMP
  …
Status and Outlook
Detailed SMP vs protoSMP feasibility study
  Commenced Q2 2005
MPICH2 port investigations commenced
  Baseline implementation uniprocessor over TCP/IP
  Work commenced Q2 2005
Conclusion
MicroBlaze and uClinux are part of the solution
  Those parts which are Linux look like desktop/cluster Linux
  Deliberate decisions in the trade-off of design vs runtime efficiency
Looking ahead
  Linux abstractions over RSC hardware
    Intra-board, inter-board, inter-stack, …
  Development and debug environments
  Seamless integration with custom hardware
    Viva, VHDL, …