The Raw Microprocessor

THE RAW MICROPROCESSOR: A
COMPUTATIONAL FABRIC FOR
SOFTWARE CIRCUITS AND GENERAL-PURPOSE PROGRAMS
Taylor, M.B.; Kim, J.; Miller, J.; Wentzlaff, D.; Ghodrat, F.; Greenwald, B.; Hoffman,
H.; Johnson, P.; Jae-Wook Lee; Lee, W.; Ma, A.; Saraf, A.; Seneski, M.; Shnidman,
N.; Strumpen, V.; Frank, M.; Amarasinghe, S.; Agarwal, A.
IEEE Micro, Volume 22, Issue 2, March-April 2002, pp. 25-35
Wire delay is emerging as the natural
limiter to microprocessor scalability. A
new architectural approach could solve
this problem, as well as deliver
unprecedented performance, energy
efficiency, and cost effectiveness.
Problem: How to leverage growing quantities of
chip resources even as wire delays become
substantial?
• Scalable ISA
• Provide a parallel, software interface to the gate, wire, and pin resources of the chip
• Allow programmers more control of physical resources to achieve maximum performance and energy efficiency
Technology Trends
• Until recently, the abstraction of a wire as an instantaneous connection between transistors has shaped assumptions and architectural designs
• However, today, it takes on the order of two clock cycles for a signal to travel from edge-to-edge of a 2-GHz processor die
• Processor manufacturers have strived to maintain high clock rates in spite of the increased impact of wire delay; but materials and process changes have not been sufficient to solve the problem
The Response of Existing Architectures
The Raw Microprocessor
• Attempts to minimize the ISA gap by exposing underlying physical resources as architectural entities
• Uses an array of identical, programmable tiles
The Raw Microprocessor
Each tile contains:
• One static communication router
• Two dynamic communication routers
• An eight-stage, in-order, single-issue, MIPS-style processor
• A four-stage, pipelined, floating-point unit
• A 32-Kbyte data cache
• 96 Kbytes of software-managed instruction cache
• The tiles interconnect using four 32-bit full-duplex on-chip networks, consisting of over 12,500 wires.
• Each tile connects only to its four neighbors.
• The longest wire in the system is no longer than the length or width of a tile. This property ensures high clock speeds and the continued scalability of the architecture.
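The nearest-neighbor wiring described above can be sketched as a small grid model. This is an illustration only: the 4x4 grid size and the function names `neighbors` and `hops` are assumptions made for the example, not details taken from the slides.

```python
# Illustrative model only: a rows x cols grid of tiles, each wired to its
# nearest neighbors. The 4x4 default is an assumption for a 16-tile chip.

def neighbors(row, col, rows=4, cols=4):
    """Return the tiles directly wired to (row, col): at most four."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]

def hops(src, dst):
    """Count the nearest-neighbor links a word crosses between two tiles."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

print(neighbors(0, 0))       # corner tile: only two on-chip neighbors
print(neighbors(1, 1))       # interior tile: four neighbors
print(hops((0, 0), (3, 3)))  # corner to corner on the assumed 4x4 grid
```

Because every link spans one tile, growing the grid adds hops to a long route but never lengthens any individual wire, which is the scalability argument the slide makes.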
Pin Multiplexing
• On the edges of the network, the network buses are multiplexed onto pins
• Prototype uses 1,657 pins and provides 14 full-duplex, 32-bit, 7.5-Gbps I/O ports at 225 MHz
Architectural Entities
Raw processors will have:
• More functional units, as well as more flexible and efficient pin utilization
• Higher pin count due to this efficiency
• More predictability and higher clock frequencies due to explicit exposure of wire delay
Application Mapping
• Applications can leverage the Raw static network’s ASIC-like place-and-route facility; applications that do so are called software circuits
• The Raw operating system allows both space and time multiplexing of processes; it allocates a rectangular block of tiles to each process
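A minimal sketch of rectangular tile allocation follows. The slides do not say which placement policy the Raw operating system uses, so the first-fit search here, the 4x4 grid, and the names `allocate` and `release` are all assumptions made for illustration.

```python
# Hypothetical first-fit allocator; the real Raw OS policy is not given
# in the slides. A grid cell holds the owning process id, or None if free.

def allocate(grid, height, width, pid):
    """Claim the first free height x width rectangle; return its top-left corner or None."""
    rows, cols = len(grid), len(grid[0])
    for top in range(rows - height + 1):
        for left in range(cols - width + 1):
            cells = [(r, c)
                     for r in range(top, top + height)
                     for c in range(left, left + width)]
            if all(grid[r][c] is None for r, c in cells):
                for r, c in cells:
                    grid[r][c] = pid
                return (top, left)
    return None

def release(grid, pid):
    """Free every tile owned by pid so another process can reuse them."""
    for row in grid:
        for c, owner in enumerate(row):
            if owner == pid:
                row[c] = None

grid = [[None] * 4 for _ in range(4)]   # assumed 4x4 array of tiles
print(allocate(grid, 2, 2, "A"))        # space multiplexing: A takes a 2x2 block
print(allocate(grid, 2, 4, "B"))        # B takes a 2x4 block elsewhere
release(grid, "A")                      # time multiplexing: A's tiles are reused later
```

Several processes holding disjoint rectangles at once is space multiplexing; releasing a rectangle and handing it to another process is time multiplexing.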
Design Decisions
Compute Processor:
• Focus: tightly coupling the network interfaces to the processor pipeline
• Networks are register mapped and integrated directly into the bypass paths of the pipeline
• Intertile networking extends the bypass concept into 2-D
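To make "register mapped" concrete, here is a toy model in which one register name is backed by network queues rather than storage. The register name `net`, the `Tile` class, and its methods are hypothetical; they are not Raw's actual register names or instruction set.

```python
from collections import deque

class Tile:
    """Toy compute processor whose register "net" is wired to the network."""

    NET = "net"  # hypothetical name for the network-mapped register

    def __init__(self):
        self.regs = {}          # ordinary registers
        self.net_in = deque()   # words arriving from the network
        self.net_out = deque()  # words leaving for the network

    def read(self, reg):
        # Reading the network register consumes the next incoming word.
        if reg == self.NET:
            return self.net_in.popleft()
        return self.regs[reg]

    def write(self, reg, value):
        # Writing the network register sends the value instead of storing it.
        if reg == self.NET:
            self.net_out.append(value)
        else:
            self.regs[reg] = value

    def add(self, dst, a, b):
        self.write(dst, self.read(a) + self.read(b))

tile = Tile()
tile.net_in.extend([3, 4])
tile.add("net", "net", "net")   # receive two operands and send the sum in one instruction
print(list(tile.net_out))
```

No separate send or receive instruction appears in the model, which is the point of mapping the network into the register file. Real hardware would stall on an empty input queue; this sketch raises `IndexError` instead.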
Design Decisions
Static Router:
• Routing instructions determine the routing path
• The static routers collectively reconfigure the entire communication pattern of the network on a cycle-by-cycle basis
• One-cycle-per-hop latency between tiles
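The idea of per-cycle routing instructions can be sketched with a toy simulator. The instruction encoding below (a dict from input port to output port, one per cycle), the port names, and the function `run` are assumptions for illustration and do not reflect the real static router's instruction format or its pipeline.

```python
# Toy model of a compile-time routing schedule. "P" is the local processor
# port; "N", "S", "E", "W" are the links to neighboring tiles.

STEP = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}

def run(schedule, injected, cycles):
    """Return (cycle, tile, word) for every word delivered to a processor."""
    in_flight = dict(injected)  # (tile, input port) -> word waiting there
    delivered = []
    for cycle in range(cycles):
        next_flight = {}
        for (tile, port), word in in_flight.items():
            program = schedule.get(tile, [])
            route = program[cycle] if cycle < len(program) else {}
            out = route.get(port)
            if out is None:
                next_flight[(tile, port)] = word  # no route this cycle: wait
            elif out == "P":
                delivered.append((cycle, tile, word))
            else:
                dr, dc = STEP[out]
                neighbor = (tile[0] + dr, tile[1] + dc)
                next_flight[(neighbor, OPPOSITE[out])] = word  # one link per cycle
        in_flight = next_flight
    return delivered

# Send one word from tile (0, 0) to tile (0, 2): two hops eastward.
schedule = {
    (0, 0): [{"P": "E"}],
    (0, 1): [{}, {"W": "E"}],
    (0, 2): [{}, {}, {"W": "P"}],
}
print(run(schedule, {((0, 0), "P"): 42}, cycles=3))
```

Each tile's list is its own instruction stream, and together the streams define the whole network's communication pattern for each cycle. The sketch ignores port conflicts and flow control, which a real schedule would have to respect.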
Design Decisions
Static Router:
• 5-stage pipeline that exploits parallelism in routing
Design Decisions
Dynamic Networks:
• Support the need for dynamic events and message passing
• Better suited to long data streams because of their large overhead
Implementation
• IBM’s SA-27E, 0.15-micron, six-level copper, ASIC process
• 25 W power consumption
• Wire delay in tiles was large enough that placement could not be ignored
Implementation
• Applications with very little ILP generally do not benefit from running on Raw
• For applications with moderate to significant ILP, performance increases are observed
• Authors attain speedups ranging from 6x to 11x versus a single tile on SPECfp applications for a 16-tile Raw processor, and 9x to 19x for 32 tiles
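One way to read these speedups is as parallel efficiency, meaning speedup divided by tile count. That metric is not given in the slides; the short calculation below applies it to the quoted ranges, and the function name `efficiency` is chosen for the example.

```python
def efficiency(speedup, tiles):
    """Fraction of ideal linear speedup achieved on the given number of tiles."""
    return speedup / tiles

# Speedup ranges quoted above: (low, high) versus a single tile.
reported = {16: (6, 11), 32: (9, 19)}

for tiles, (low, high) in reported.items():
    print(tiles, "tiles:", efficiency(low, tiles), "to", efficiency(high, tiles))
```

By this measure the speedup continues to grow from 16 to 32 tiles, while the fraction of ideal linear scaling is somewhat lower at 32 tiles than at 16.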
Conclusion
• Replicated tile design saved time in design, RTL Verilog coding, resynthesis, verification, placement, and back-end flow
• Virtual Raw systems can be created from glueless connection of up to 64 chips
• Authors believe that reaching the point at which a Raw tile is a relatively small portion of total computation could change the way we compute
Discussion
Discussion Questions
• Does this paper discuss enough real program and benchmark results?
• Is 25W power consumption “energy efficient” for the performance they have indicated?
• Are there negative consequences of exposing so much complexity to the software/programmer?
• How can the functionality of this processor be likened to a 2-D pipeline?
• Does cost need to be addressed?
• How advantageous is the design time reduction achieved through redundancy?