DOE ASCI TeraFLOPS

Rejitha Anand
CMPS 5433
Accelerated Strategic Computing Initiative (ASCI)
• A large, complex, multifaceted, highly integrated research and
development effort created by the US Department of Energy (DOE)
• Goal: deploy a 1 TFLOPS system by the end of 1996, a 10 TFLOPS
system by 1999, and a 100 TFLOPS system by 2002, with all of these
systems designed at similar cost
• First phase: built by Intel, known as the ASCI Option Red or
Intel TFLOPS supercomputer

Physical Features
• Occupies 1,600 sq. ft. of floor space (excluding the network
resources, tertiary storage and other supporting hardware)
• Uses 9,216 Intel Pentium Pro processors in over 4,500 nodes
• 596 Gbytes of RAM; the nodes are connected through a 38x32x2 mesh
• Two independent 1 Tbyte disk systems
• The disks can be switched so the machine can be used for both
classified and unclassified computing

ASCI TFLOPS Hardware
• Massively parallel processor (MPP): a distributed-memory, MIMD,
message-passing supercomputer
• All aspects of the system are scalable, including aggregate
communication bandwidth, the number of compute nodes, the
amount of main memory, disk storage and I/O bandwidth
• Organized into four partitions: compute, service, system and I/O

Partitions
• Service partition: provides an integrated, scalable host that supports
interactive users, application development and system administration
• I/O partition: supports the scalable file system and network services
• System partition: supports system Reliability, Availability and
Serviceability (RAS) services
• Compute partition: contains the nodes that deliver floating-point
performance and is where parallel applications execute
SYSTEM BLOCK DIAGRAM
Figure: Logical system block diagram for the ASCI Option Red supercomputer. This system uses a split-plane mesh topology and has 4 partitions: System, Service, I/O and Compute. Two different kinds of node boards are used: the Eagle node and the Kestrel node. The operator's console (the SPS station) is connected to an independent Ethernet network that ties together patch support boards on each card cage.
ASCI TFLOP FLOOR PLAN
ASCI SYSTEM PLAN
Figure: Schematic diagram of the ASCI Option Red supercomputer as it will be installed at Sandia National Laboratories in Albuquerque, NM. The cabinets near each end labeled with an X are the disconnect cabinets used to isolate one end or the other. Each end of the computer has its own I/O subsystem (the group of 5 cabinets at the bottom and the left) and its own SPS station (next to the I/O cabinets). The lines show the SCSI cables connecting the I/O nodes to the I/O cabinets. The curved line at the top of the page shows the windowed wall to the room where the machine operators will sit. The black square in the middle of the room is a support post.
PENTIUM PRO PROCESSOR
• Both a CISC and a RISC chip: x86 (CISC) instructions are decoded
internally into RISC-like micro-operations
• Peak floating-point rate of 200 MFLOPS at 200 MHz
• Peak multiply rate of 100 MFLOPS at 200 MHz
• Includes separate on-chip data and instruction L1 caches
(8 Kbytes each) and an L2 cache (256 Kbytes)

EAGLE BOARD
• The node boards used in the I/O and system partitions are the
Eagle boards
• Each node includes two 200 MHz Pentium Pro processors.
These two processors support two on-board PCI interfaces that
each provide 133 MB/sec of I/O bandwidth.
• Each Eagle board provides ample processing capacity and
throughput to support a wide variety of high-performance I/O
devices
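
The 133 MB/sec figure can be sanity-checked with a little arithmetic. The sketch below assumes each interface is a conventional 32-bit PCI bus clocked at about 33.33 MHz; the slides do not state the bus width or clock, so both constants are assumptions, and the names are illustrative only.

```python
# Assumption (not stated in the slides): each interface is a conventional
# 32-bit PCI bus clocked at about 33.33 MHz.
PCI_WIDTH_BYTES = 4
PCI_CLOCK_MHZ = 33.33

# Figures taken from the slide.
PCI_INTERFACES_PER_EAGLE_NODE = 2
QUOTED_MB_PER_S = 133


def pci_peak_mb_per_s(width_bytes, clock_mhz):
    """Peak transfer rate: one bus-width transfer per clock cycle."""
    return width_bytes * clock_mhz


derived = pci_peak_mb_per_s(PCI_WIDTH_BYTES, PCI_CLOCK_MHZ)
aggregate = PCI_INTERFACES_PER_EAGLE_NODE * QUOTED_MB_PER_S

print(f"derived per-interface peak: {derived:.1f} MB/s")  # 133.3 MB/s
print(f"aggregate per Eagle node:   {aggregate} MB/s")     # 266 MB/s
```

Under these assumptions the derived peak agrees with the quoted 133 MB/sec, and the two interfaces together give an Eagle node roughly 266 MB/sec of peak I/O bandwidth.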

EAGLE BOARD
Figure: The ASCI Option Red supercomputer I/O node (Eagle board). The NIC connects to the MRC on the backplane through the ICF link.
KESTREL BOARD
• Kestrel boards are used in the compute and service partitions. Each
Kestrel board holds two compute nodes.
• The nodes are connected through their network interface chips (NICs),
with one of the NICs connecting to a mesh router chip (MRC) on the
backplane.
• Each node on the Kestrel board includes its own boot support
(flash ROM and simple I/O devices) through a PCI bridge on its
local bus.

KESTREL BOARD
Figure: The ASCI Option Red supercomputer Kestrel board. This board includes two compute nodes daisy-chained together through their NICs. One of the NICs connects to the MRC on the backplane through the ICF link.
INTERCONNECTION FACILITY
• The interconnection facility (ICF) uses a dual-plane mesh to provide
better aggregate bandwidth and to support routing around mesh failures.
It uses two custom components: the NIC and the MRC
• Mesh Router Chip (MRC): sits on the system backplane and routes
messages across the machine
• Network Interface Chip (NIC): resides on each node and provides an
interface between the node's memory bus and the MRC
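
To give a feel for the size of the mesh, the sketch below models the 38x32x2 dimensions quoted on the Physical Features slide as a plain grid. Several things here are assumptions and not facts from the slides: one MRC per grid position, no wraparound links, a link between the two planes at every position, and minimum hop counts measured as Manhattan distance. It does not describe the machine's actual routing algorithm.

```python
from math import prod

# Mesh dimensions as quoted on the Physical Features slide.
MESH_DIMS = (38, 32, 2)


def router_positions(dims=MESH_DIMS):
    """Number of grid positions (assumed: one MRC per position)."""
    return prod(dims)


def neighbors(pos, dims=MESH_DIMS):
    """Adjacent positions in a mesh without wraparound links (assumption)."""
    result = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coord = pos[axis] + step
            if 0 <= coord < size:
                result.append(pos[:axis] + (coord,) + pos[axis + 1:])
    return result


def min_hops(src, dst):
    """Minimum router-to-router hops: Manhattan distance between positions."""
    return sum(abs(a - b) for a, b in zip(src, dst))


def diameter(dims=MESH_DIMS):
    """Largest minimum hop count, reached between opposite corners."""
    return sum(size - 1 for size in dims)


print(router_positions())                  # 2432
print(len(neighbors((0, 0, 0))))           # 3 (corner position)
print(len(neighbors((5, 5, 0))))           # 5 (interior position)
print(min_hops((0, 0, 0), (37, 31, 1)))    # 69
```

Under these assumptions a message between opposite corners crosses at most 69 router-to-router links, and an interior router has five neighbors: four within its plane and one in the other plane.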
INTERCONNECTION FACILITY
Figure: ASCI Option Red supercomputer 2-plane Interconnection Facility (ICF). The red squares on each node board are the Network Interface Chips (NIC), while the black squares on the dual backplanes are the Mesh Router Chips (MRC). Bandwidth figures are given for NIC-MRC and MRC-MRC communication. Bidirectional bandwidths are given on the left side of the figure, while unidirectional bandwidths are given on the right side. In both cases, sustainable (as opposed to peak) numbers are given in parentheses.
OPERATING SYSTEM
• Uses two different operating systems for different parts of the machine
• The service, I/O and system partitions run Intel's distributed
version of UNIX, developed for the Paragon XP/S supercomputer
• The compute partition runs Cougar, a lightweight kernel (LWK)
• Cougar is based on the Puma operating system developed at Sandia National
Laboratories and the University of New Mexico

APPLICATIONS
• Provides computational and simulation capabilities that help
scientists understand aging weapons, predict when components will
have to be replaced, and evaluate the implications of changes in
materials and fabrication processes
• Achieves higher-resolution, higher-fidelity, three-dimensional physics
and full-system modeling capabilities to reduce reliance on empirical
judgments

ASCI Option Red / Intel TFLOPS supercomputer
Performance Goals
• Deliver a sustained TFLOPS on MP-LINPACK
before the end of 1996
• Run a yet-to-be-defined ASCI application using all
memory and all nodes by June 1997
PERFORMANCE
• Set the MP-LINPACK record at 1.06 TFLOPS
using only 7,264 Pentium Pro processors.
• In June 1997, with the full machine installed, it broke its
own record, running the MP-LINPACK benchmark at 1.34
TFLOPS.
• The system has a peak performance of 1.8 TFLOPS
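
The 1.8 TFLOPS peak follows directly from the per-processor figures quoted earlier in the slides, and it gives a rough measure of MP-LINPACK efficiency. The sketch below is illustrative only: the function names are made up, and it assumes the 1.34 TFLOPS run used all 9,216 processors, which the slides do not state.

```python
# Figures quoted in the slides.
PEAK_MFLOPS_PER_CPU = 200          # Pentium Pro peak at 200 MHz
CLOCK_MHZ = 200
TOTAL_CPUS = 9_216                 # processors in the full system
FULL_MACHINE_LINPACK_TFLOPS = 1.34


def peak_tflops(cpus, mflops_per_cpu=PEAK_MFLOPS_PER_CPU):
    """Theoretical peak in TFLOPS for a given processor count."""
    return cpus * mflops_per_cpu / 1_000_000


def linpack_efficiency(measured_tflops, cpus):
    """Measured rate as a fraction of theoretical peak."""
    return measured_tflops / peak_tflops(cpus)


flops_per_cycle = PEAK_MFLOPS_PER_CPU / CLOCK_MHZ
system_peak = peak_tflops(TOTAL_CPUS)
# Assumption: the 1.34 TFLOPS run used all 9,216 processors.
full_efficiency = linpack_efficiency(FULL_MACHINE_LINPACK_TFLOPS, TOTAL_CPUS)

print(f"{flops_per_cycle:.0f} flop per cycle per processor")  # 1
print(f"system peak: {system_peak:.4f} TFLOPS")               # 1.8432
print(f"efficiency:  {full_efficiency:.1%}")                  # 72.7%
```

On these numbers the full machine sustained a little under three quarters of its theoretical peak on MP-LINPACK.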
Noteworthy for several reasons
• The world's first TFLOPS supercomputer
• I/O, memory, compute nodes and communication are scalable to an
extreme degree
• Standard parallel interfaces make it simple to port parallel
applications to this system
• The system uses two operating systems to make the computer familiar
to the user (UNIX) and non-intrusive for the scalable application
(Cougar)
• Makes use of Commercial Commodity Off The Shelf (CCOTS)
technology to maintain affordability

ON-GOING COMPUTING ELEMENT
• ASCI Blue Pacific supercomputer
• ASCI Blue Mountain supercomputer
• ASCI White supercomputer (IBM)
• ASCI Q
• ASCI Purple (IBM)
REFERENCES
http://ipdps.eece.unm.edu/1996/PAPERS/S03/TMATTSO/TMATTSO.PDF
http://www.sandia.gov/ASCI/Red/papers/Mattson/OVERVIEW.html
ftp://download.intel.com/technology/itj/q11998/pdf/overview.pdf