360N: Computer Architecture Spring 2005
RAMP-White
Derek Chiou and Hari Angepat
The University of Texas at Austin
© Derek Chiou
Supported in part by DOE, NSF, IBM, Intel, and Xilinx
RAMP-White Requirements
Coherent shared memory experimental platform
Scalable to the same level as other RAMP machines
1K eventual target
Down to 2
Full system (OS, I/O, etc.)
Intentions
7/8/2015
Configurable coherence protocol, engine
ISA/Architecture independent (like all RAMP efforts)
Use different cores
Integrate components from other RAMP participants
A test-bed for sharing IP
Derek Chiou, RAMP-White Tutorial, FCRC 2007
Texas Modifications to RAMP-White
New code in Bluespec rather than Verilog/VHDL
Many advantages including interfaces, configurability
My group’s hardware development is exclusively Bluespec
Free/low cost for academics (www.bluespec.com)
Start with XUP board
We had XUP before BEE2
Embedded PowerPC is starting core
It’s a free, fast core with real (incoherent) 16KB caches
RAMP is core independent
My research needs fast cores
Can then use synthesizable 405s
No space issues on XUP
2 Leons + MMU + memory controller barely fits (no space for our stuff)
Multi-OS shared space
Processors map to shared global space
May try SMP OS, but unlikely to scale well to 1K processors
High-Level Architecture Philosophy
Flexibility
Avoid wasted work
Easy changes
Module-agnostic
Interfaces
Complete set of necessary interfaces
All communication via messages
Fixed fields, but fields are configurable
“shims” connect components to White infrastructure
Processors, network, I/O, etc.
Use existing IP
Building one instance to confirm interface completeness
32b Address in Shared Memory Machine??
4GB possible per BEE2 FPGA
Need more than 32b
Eventually, hope for 64b soft-core processors
For now two options: live with 4GB space
Or, provide one more layer of translation
Physical address in certain region is global virtual address
Translated by hardware to node + physical address
Also useful for multiple OSs in single memory
OSs tend to assume they own physical address 0
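The extra translation layer described above can be sketched as follows. This is only an illustration under stated assumptions: the window base, window size, segment granularity, and the names GlobalTranslator, map_segment, and translate are all hypothetical and do not come from RAMP-White. The idea shown is that a processor physical address inside a designated window is treated as a global address and mapped by a table to a node plus a node-local physical address.

```python
from typing import NamedTuple, Optional

# All constants are illustrative assumptions, not RAMP-White values.
GLOBAL_BASE = 0x8000_0000   # start of the window treated as global
GLOBAL_SIZE = 0x4000_0000   # size of that window (1 GB here)
SEGMENT_BITS = 24           # translation granularity (16 MB here)


class Target(NamedTuple):
    node: int        # node that owns the memory
    phys_addr: int   # physical address on that node


class GlobalTranslator:
    """Hypothetical model of the global address translation table."""

    def __init__(self) -> None:
        self.table = {}  # segment index -> (node, node-local physical base)

    def map_segment(self, segment: int, node: int, phys_base: int) -> None:
        if phys_base % (1 << SEGMENT_BITS):
            raise ValueError("physical base must be segment aligned")
        self.table[segment] = (node, phys_base)

    def translate(self, addr: int) -> Optional[Target]:
        """Return the (node, physical address) target, or None if local."""
        if not (GLOBAL_BASE <= addr < GLOBAL_BASE + GLOBAL_SIZE):
            return None  # outside the window: an ordinary local access
        segment, within = divmod(addr - GLOBAL_BASE, 1 << SEGMENT_BITS)
        node, base = self.table[segment]  # KeyError if the segment is unmapped
        return Target(node, base + within)


translator = GlobalTranslator()
translator.map_segment(0, node=3, phys_base=0x1000_0000)
print(translator.translate(0x8000_0010))  # global access, routed to node 3
print(translator.translate(0x0000_1000))  # local access, not translated
```

Because the table entry holds the node-local base, each OS can keep assuming it owns physical address 0 while the shared region lives elsewhere.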
RAMP-White Block Diagram
[Figure: block diagram. Components: Processor (marked “Proc dependent”), Coherent $, IO & Platform Devices, Intersection Unit (IU), Network Interface Unit (NIU), Network Router, Memory Controller (MC)]
Three Phase Approach to Hardware
Phase 1: Incoherent shared memory
No hardware global cache, just global shared memory support
However, software can maintain coherence if necessary
Network virtual memory
Run a simulator on top of the processor
Ring network
Phase 2: Ring-based coherence (scalable bus)
Requires a coherent cache, IU awareness
Running what is essentially a snoopy protocol
True coherence engine not required
But, very restricted communication
Sufficient for testing, modeling many targets
Phase 3: General network-based coherence
Requires general coherence engine, general network
[Figure: two node diagrams with processor (P), caches ($), coherent cache (C$, labeled “Optional cache for local memory”), IU, MC, and I/O]
Intersection Unit
Processor interface: slave, snoop
Network interface: master (send), slave (receive)
Memory interface: master (issue memory requests)
Programmable memory regions: global (local and remote), local; translation
Hooks for coherency engine
Bluespec nice to specify coherence engine
Incoherent version is a special case
[Figure: Processor, Coherent $, Intersection Unit (IU), Network Interface Unit (NIU), Memory Controller (MC), IO & Platform Devices]
Intersection Unit Internals
[Figure: Intersection Unit internals. Labels: Proc, Memory Controller & DRAM, Net, IO, IO Controller, Global Address Translation, Intersection Unit Controller, hardware, BRAMs]
Network Interface Unit
Currently two virtual channels
Split into two components
One input/one output
Msg composition/Queuing
Net transmit/receive
Insert/extract for ring
Intended to permit other net-specific transmit/receive
Creates a simple unidirectional ring
Can interface to more advanced fabrics
[Figure: Processor, Coherent $, IO & Platform Devices, Intersection Unit (IU), Memory Controller (MC), Network Interface Unit (NIU)]
IU Internal Message
PRI CMD PERM SIZE TAG
GADDR
DATA
Defaults
PRI: High priority, Low priority
CMD: Read, Write, Coherence, …
PERM: Modified, Exclusive, Shared, Invalid
SIZE: Byte, word, double word, cache-line
GADDR: global address (translated by IU)
DATA: dependent on size
Bluespec permits easy modification for your protocol
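As a rough software model of the message layout above, the fields could be written as a small record type. This is a sketch only, not the Bluespec definition: the field widths are not given in the slides, the byte counts for word (4), double word (8), and cache line (32) are assumptions, and the CMD list is left at the three commands named because the slide elides the rest.

```python
from dataclasses import dataclass
from enum import Enum


class Pri(Enum):
    HIGH = 0
    LOW = 1


class Cmd(Enum):
    READ = 0
    WRITE = 1
    COHERENCE = 2  # the slide lists further commands only as "…"


class Perm(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class Size(Enum):
    # Byte counts are assumptions made for this sketch.
    BYTE = 1
    WORD = 4
    DOUBLE_WORD = 8
    CACHE_LINE = 32


@dataclass(frozen=True)
class IUMessage:
    pri: Pri
    cmd: Cmd
    perm: Perm
    size: Size
    tag: int
    gaddr: int          # global address, translated by the IU
    data: bytes = b""   # empty for requests that carry no data

    def __post_init__(self) -> None:
        # DATA depends on SIZE: when present it must match exactly.
        if self.data and len(self.data) != self.size.value:
            raise ValueError("data length does not match SIZE")


write = IUMessage(Pri.HIGH, Cmd.WRITE, Perm.MODIFIED, Size.WORD,
                  tag=7, gaddr=0x8000_0010, data=b"\x01\x02\x03\x04")
print(write)
```

Adapting the message to another protocol would mean changing the enumerations or adding fields, which is the kind of modification the slide says Bluespec makes easy.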
Network Message
PRI DEST SRC SIZE NETTAG CMD
MESSAGE
PRI: High and Low
DEST,SRC: destination, source of message
SIZE: Total message size
NETTAG: network tag (optional)
CMD: network command (optional)
MESSAGE: data
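To show how a message with these fields might travel over the simple unidirectional ring, here is a minimal cycle-by-cycle model. It is a sketch under assumptions: NetMessage, RingNode, and run_ring are hypothetical names, SIZE is modeled as the payload length only, priority is carried but not used for arbitration, and a node inserts a new message only when its ring slot is free.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NetMessage:
    pri: int                      # 0 = high, 1 = low (encoding assumed)
    dest: int
    src: int
    message: bytes
    nettag: Optional[int] = None  # optional field
    cmd: Optional[int] = None     # optional field

    @property
    def size(self) -> int:
        return len(self.message)  # payload bytes only in this sketch


class RingNode:
    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self.to_send = deque()    # messages waiting to be inserted
        self.received = []        # messages extracted from the ring

    def step(self, incoming: Optional[NetMessage]) -> Optional[NetMessage]:
        """Handle one ring slot and return what is passed downstream."""
        if incoming is not None:
            if incoming.dest != self.node_id:
                return incoming               # forward: slot stays occupied
            self.received.append(incoming)    # extract: slot becomes free
        if self.to_send:
            return self.to_send.popleft()     # insert into the free slot
        return None


def run_ring(nodes: List[RingNode], cycles: int) -> None:
    links = [None] * len(nodes)   # links[i] is the slot arriving at node i
    for _ in range(cycles):
        outputs = [node.step(links[i]) for i, node in enumerate(nodes)]
        # Node i receives from node i-1; index -1 closes the ring.
        links = [outputs[i - 1] for i in range(len(nodes))]


nodes = [RingNode(i) for i in range(4)]
nodes[1].to_send.append(NetMessage(pri=0, dest=3, src=1, message=b"hi"))
run_ring(nodes, cycles=4)
print(nodes[3].received)
```

A more advanced fabric would replace only the step and run_ring parts, while the message format stays the same.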
Programmer View
Sequential consistency
PowerPC
Global addresses labeled as uncached
Ordered accesses from PowerPC 405
Coherent global cache still uncached
Soft cores can be weaker
User interface
Terminal per core/OS if desired
Mmap to map shared memory
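The mmap step can be illustrated with a small stand-in. The slides do not say which device or file exposes the global region, so this sketch maps an ordinary temporary file twice and shows that a store through one mapping is visible through the other; open_shared_region, demo, and REGION_SIZE are hypothetical names chosen for the example.

```python
import mmap
import os
import struct
import tempfile

REGION_SIZE = 4096  # assumed size of the shared region for this sketch


def open_shared_region(path: str, size: int = REGION_SIZE) -> mmap.mmap:
    """Map `path` as a shared, read/write region of `size` bytes."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        os.ftruncate(fd, size)       # make sure the backing file is large enough
        return mmap.mmap(fd, size)   # shared mapping by default
    finally:
        os.close(fd)                 # the mapping keeps its own descriptor


def demo() -> int:
    """Write through one mapping and read the value back through another."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "global_region")
        writer = open_shared_region(path)
        reader = open_shared_region(path)
        try:
            struct.pack_into("<I", writer, 0, 42)
            return struct.unpack_from("<I", reader, 0)[0]
        finally:
            writer.close()
            reader.close()


print(demo())
```

On the real platform the two mappings would belong to different OS images, and the backing store would be the global memory region rather than a file.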
Operating System
Issues with SMP OS on embedded PowerPC
Incoherent cache
Load-reservation/store-conditional instructions not MP capable
Also missing TLB invalidation & OpenPIC (interprocessor interrupts, bring-up)
How scalable anyway? (1K processors)
Therefore, separate OS per core
Region of memory is global
Mmap
Locks implemented using regular loads/stores + sequential consistency
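One well-known way to build a lock from plain loads and stores under sequential consistency is Peterson's algorithm for two parties. The slides do not say which algorithm RAMP-White uses, so the following is only an illustration of the idea. It uses Python threads in place of the two processors and relies on CPython executing these loads and stores in program order; PetersonLock and run_counter_test are hypothetical names.

```python
import threading
import time


class PetersonLock:
    """Two-party mutual exclusion using only loads and stores."""

    def __init__(self) -> None:
        self.flag = [False, False]  # flag[i]: party i wants the lock
        self.turn = 0               # which party must wait on a tie

    def acquire(self, me: int) -> None:
        other = 1 - me
        self.flag[me] = True        # store: announce interest
        self.turn = other           # store: give the other party priority
        # Loads: wait while the other party is interested and has priority.
        while self.flag[other] and self.turn == other:
            time.sleep(0)           # yield so the other thread can run

    def release(self, me: int) -> None:
        self.flag[me] = False       # store: no longer interested


def run_counter_test(iterations: int = 1000) -> int:
    """Take lock, increment counter, release lock, from two threads."""
    lock = PetersonLock()
    counter = [0]

    def worker(me: int) -> None:
        for _ in range(iterations):
            lock.acquire(me)
            counter[0] += 1         # critical section
            lock.release(me)

    threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


print(run_counter_test())
```

The algorithm is only correct if stores and loads are not reordered, which is why the sequential consistency of uncached global accesses matters here.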
Status: Phase 1 RAMP-White
Hari Angepat did the work
Components
Written in Bluespec
NIU code complete and tested
2 processor ring
IU code complete and tested
Hardware intended to target different ISAs
Processor Slave (no coherence right now)
PLB Master/slave interface (I/O)
NIU interface
PLB master and slave shims written
Some preliminary OS work
Multi-image mmap interface running
Current RAMP-White Phase 1
[Figure: current Phase 1 system. Labels: Linux (x2), PPC 405 (x2), Intersection Unit (IU) (x2), Network Interface Unit (NIU) (x2), PLB shim, Memory Controller (MC), IO & Platform Devices]
Phase 1 Demo on XUP Configuration
See both processors boot and run (top, cpu_info)
Run a simple “take-lock, increment counter, release lock” test
Our Long Term Plans
Phase 1, XUP just started to work
With multi-OS, limited device support
Limited alpha release end of 3Q07
Phase 2
Coherent cache, IU forwarding modifications
Better OS support (ProtoFlex?)
Limited alpha release 1Q08
Phase 3
Arbitrary network, cache coherency engine
Getting network from Washington, Berkeley
RDL? Leon?
Release depends on ease of integration
Conclusions
RAMP-White architecture
ISA/implementation agnostic
Care taken to not be specific
Phased approach minimizes wasted work
Designed to be easy to modify for your purpose
Many architectures only require modified coherence engine, maybe cache
Running on XUP
RAMP-White Phase 1 works
We will be our own customer
Building cycle-accurate x86 CMP simulator on top
Extra slides
Node Architecture
[Figure: four node diagrams, each built from P, $, IU, MC, and I/O; two of them also include C$]
Generalized Architecture
[Figure: generalized architecture. Labels: Proc, $, Mem, MC, PLB, OPB bridge, Intersection Unit (IU), Network Interface Unit (NIU); regions marked “Proc dependent” and “Proc independent”]
Sharing IP: Some Preliminary Experience
We looked at RAMP-Red XUP
Used some code (PLB master)
Red-BEE is not ready to distribute
Looking for switch code
Berkeley’s code on CVS repository
But, we can’t use memory controller because we don’t have BEE2 board yet
Bluespec
We are spinning almost all of our own code right now
Would like to steal software
OS (kernel proxy)
SMP OS port
Naming
MPI reference design in BEE2 repository
Is that RAMP-Blue?
A central CVS repository for RAMP code?
Sharing Over the Long Term
Processor is shared
MC is shared
Borrow half from Berkeley?
Network can be shared
Trying to make ours general
NIU can be shared
CMU/Stanford
IU functionality can be shared
Peripherals
Transactional/traditional
Borrow Stanford’s?
Coherency engine can be shared
Xilinx or Berkeley
Coherent cache can be shared
Leon
PowerPC
MicroBlaze
Everything else
Borrow Berkeley’s?
[Figure: node diagram with Proc, $, IU, NIU, CCE, Mem, MC]