Transcript Document

IRAM Vision Statement
Microprocessor & DRAM
on a single chip:
– 10X capacity vs. DRAM
– on-chip memory latency 5-10X, bandwidth 50-100X
– improve energy efficiency 2X-4X (no off-chip bus)
– serial I/O 5-10X v. buses
– smaller board area/volume
– adjustable memory size/width
• Most transistors per Microprocessor in 1 year?
[Diagram: a conventional system built in a logic fab (Proc, L1 caches, L2$, bus, I/O, with DRAM off-chip) next to an IRAM built in a DRAM fab (Proc, bus, DRAM, and I/O on one chip)]
Slide 1
V-IRAM1: 0.18 µm, Fast Logic, 200 MHz
1.6 GFLOPS(64b)/6.4 GOPS(16b)/32MB
[Block diagram: a 2-way superscalar processor with 16K I cache and 16K D cache; a vector unit with a vector instruction queue, vector registers, a load/store unit, and arithmetic units (+, ×, ÷), each datapath configurable as 4 x 64, 8 x 32, or 16 x 16; serial I/O; and a memory crossbar switch connecting 4 x 64-bit ports to the on-chip DRAM banks (M) and I/O]
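A quick back-of-the-envelope check that the headline rates follow from the 200 MHz clock; the per-cycle operation counts are my assumptions (for example, a multiply and an add across 4 x 64-bit lanes, with 4x as many 16-bit subword operations), not a breakdown stated on the slide:

```python
# Sketch: sanity-check V-IRAM1 peak rates from the 200 MHz clock.
clock_hz = 200e6

# Assumption: 8 FP ops/cycle in 64-bit mode (e.g., + and x across 4 x 64-bit lanes),
# and 4x as many ops/cycle in 16-bit mode (16 x 16-bit subwords per datapath).
flops_per_cycle_64b = 8
ops_per_cycle_16b = 32

print(f"{clock_hz * flops_per_cycle_64b / 1e9:.1f} GFLOPS (64b)")  # 1.6 GFLOPS
print(f"{clock_hz * ops_per_cycle_16b / 1e9:.1f} GOPS (16b)")      # 6.4 GOPS
```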
Slide 2
IRAM Statistics
• 2 Watts, 3 GOPS, Multimedia ready (including
memory) AND can compile for it
• 150 Million transistors
– Intel @ 50M?
• Industrial strength compilers
• Tape out March 2001?
• 6 grad students
• Thanks to
– DARPA: fund effort
– IBM: donate masks, fab
– Avanti: donate CAD tools
– MIPS: donate MIPS core
– Cray: Compilers
Slide 3
Big Fat Web Servers
• Maintenance is the challenge, as well as
availability and growth
• Says who:
– Lampson in keynote address at SOSP
– Hennessy in keynote address at ISCA
– Gray in Turing Award address
• 2X HW costs and 1/2 maintenance cost is a
big win
Slide 4
Intelligent Storage Project Goals
• ISTORE: a hardware/software
architecture for building scaleable,
self-maintaining storage
– An introspective system: it monitors itself
and acts on its observations
• Self-maintenance: does not rely on
administrators to configure, monitor, or
tune system
Slide 5
ISTORE-I Hardware
• ISTORE uses “intelligent” hardware
– Intelligent Chassis: scaleable, redundant, fast
network + UPS
– Intelligent Disk “Brick”: a disk plus a fast
embedded CPU, memory, and redundant network
interfaces (CPU, memory, NI per device)
Slide 6
ISTORE-II Hardware Vision
• System-on-a-chip enables computer, memory,
redundant network interfaces without
significantly increasing size of disk
• Target for +5-7 years:
• 1999 IBM MicroDrive:
– 1.7” x 1.4” x 0.2”
(43 mm x 36 mm x 5 mm)
– 340 MB, 5400 RPM,
5 MB/s, 15 ms seek
• 2006 MicroDrive?
– 9 GB, 50 MB/s
(1.6X/yr capacity,
1.4X/yr BW)
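The 2006 figures follow from compounding the slide's growth rates over seven years; a minimal check of that arithmetic (the seven-year horizon is inferred from the 1999 and 2006 dates):

```python
# Sketch: extrapolate the 1999 IBM MicroDrive to 2006 using the slide's growth rates.
capacity_mb = 340           # 1999 MicroDrive capacity
bandwidth_mb_s = 5          # 1999 MicroDrive transfer rate
years = 2006 - 1999         # 7-year horizon

capacity_2006 = capacity_mb * 1.6 ** years      # 1.6X/yr capacity
bandwidth_2006 = bandwidth_mb_s * 1.4 ** years  # 1.4X/yr bandwidth

print(f"~{capacity_2006 / 1000:.0f} GB")        # ~9 GB
print(f"~{bandwidth_2006:.0f} MB/s")            # ~53 MB/s, close to the 50 MB/s on the slide
```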
Slide 7
2006 ISTORE
• ISTORE node
– Add 20% pad to MicroDrive size for packaging,
connectors
– Then double thickness to add IRAM
– 2.0” x 1.7” x 0.5” (51 mm x 43 mm x 13 mm)
• Crossbar switches growing by Moore’s Law
– 2X/1.5 yrs → 4X transistors/3 yrs
– Crossbars grow by N² → 2X switch size/3 yrs
– 16 x 16 in 1999 → 64 x 64 in 2005
• ISTORE rack (19” x 33” x 84”)
– 1 tray (3” high) → 16 x 32
→ 512 ISTORE nodes / tray
• 20 trays + switches + UPS
→ 10,240 ISTORE nodes / rack (!)
Slide 8
Disk Limit: I/O Buses
• Cannot use 100% of bus
– Queuing Theory (< 70%)
– Command overhead (Effective size = size x 1.2)
• Bus rate vs. Disk rate
– SCSI: Ultra2 (40 MHz), Wide (16 bit): 80 MByte/s
– FC-AL: 1 Gbit/s = 125 MByte/s (single disk in 2002)
• Multiple copies of data, SW layers
[Diagram: CPU and memory on the memory bus; an internal I/O bus and an external (PCI) I/O bus connected through controllers (C); SCSI controllers fan out to 15 disks]
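A minimal sketch of what those two derating factors do to a nominal bus rate, using the 80 MByte/s Ultra2 Wide SCSI figure; treating 70% utilization and the 1.2x command overhead as simple multiplicative penalties is my simplification, not the slide's model:

```python
# Sketch: effective disk I/O bus bandwidth after queuing and command overhead.
nominal_mb_s = 80          # Ultra2 Wide SCSI: 40 MHz x 16 bit = 80 MByte/s
utilization = 0.70         # queuing theory: keep the bus below ~70% busy
command_overhead = 1.2     # effective transfer size = size x 1.2

effective_mb_s = nominal_mb_s * utilization / command_overhead
print(f"~{effective_mb_s:.0f} MByte/s of useful data")   # ~47 MByte/s
```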
Slide 9
Conclusion and Status 1/2
• IRAM attractive for both drivers of Next
Generation: Mobile Consumer Electronic Devices
and Scaleable Infrastructure
– Small size, low power, high bandwidth
• ISTORE: hardware/software architecture for
single-use, introspective storage
• Based on
– intelligent, self-monitoring hardware
– a virtual database of system status and statistics
– a software toolkit that uses a domain-specific
declarative language to specify integrity constraints
• 1st HW Prototype being constructed;
1st SW Prototype just starting
Slide 10
Backup Slides
Slide 11
Related Work
• ISTORE adds to several recent research
efforts
• Active Disks, NASD (UCSB, CMU)
• Network service appliances (NetApp, Snap!,
Qube, ...)
• High availability systems (Compaq/Tandem, ...)
• Adaptive systems (HP AutoRAID, M/S
AutoAdmin, M/S Millennium)
• Plug-and-play system construction (Jini, PC
Plug&Play, ...)
Slide 12
Other (Potential) Benefits of ISTORE
• Scalability: add processing power, memory, and
network bandwidth as disks are added
• Smaller footprint vs. traditional server/disk
• Less power
– embedded processors vs. servers
– spin down idle disks?
• For decision-support or web-service
applications, potentially better performance
than traditional servers
Slide 13
State of the Art: Seagate Cheetah 36
– 36.4 GB, 3.5 inch disk
– 12 platters, 24 surfaces
– 10,000 RPM
– 18.3 to 28 MB/s internal
media transfer rate
(14 to 21 MB/s user data)
– 9772 cylinders (tracks),
(71,132,960 sectors total)
– Avg. seek: read 5.2 ms, write
6.0 ms (Max. seek: 12/13 ms;
1 track: 0.6/0.9 ms)
– $2100 or 17MB/$ (6¢/MB)
(list price)
– 0.15 ms controller time
source: www.seagate.com
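A quick check that the quoted cost per megabyte is consistent with the list price and capacity on this slide:

```python
# Sketch: cost-per-capacity check for the Seagate Cheetah 36 (list price).
capacity_mb = 36.4 * 1000   # 36.4 GB
list_price = 2100           # dollars

print(f"{capacity_mb / list_price:.0f} MB/$")             # ~17 MB/$
print(f"{100 * list_price / capacity_mb:.1f} cents/MB")   # ~5.8 cents/MB (slide rounds to 6)
```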
Slide 14
TD Saw 2 Error Messages per Day
• SCSI Error Messages:
– Time Outs: the response is a BUS RESET command
– Parity: the cause of an aborted request
• Data Disk Error Messages:
– Hardware Error: The command terminated unsuccessfully
due to a non-recoverable HW failure.
– Medium Error: The operation was unsuccessful due
to a flaw in the medium (try reassigning sectors)
– Recovered Error: The last command completed with
the help of some error recovery at the target
– Not Ready: The drive cannot be accessed
Slide 15
Tertiary Disk SCSI Time Outs
+ Hardware Failures (m11)
[Chart: disks on SCSI Bus 0 of m11, 8/15/98 to 8/31/98; daily counts (0-10) of SCSI time outs and disk hardware failures]
Slide 16
Can we predict a disk failure?
• Yes, look for Hardware Error messages
– These messages lasted for 8 days between:
»8-17-98 and 8-25-98
– On disk 9 there were:
»1763 Hardware Error Messages, and
»297 SCSI Timed Out Messages
• On 8-28-98: Disk 9 on SCSI Bus 0 of
m11 was “fired”, i.e. it appeared to be
about to fail, so it was swapped
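The pattern described here, a sustained burst of Hardware Error messages preceding replacement, maps naturally onto a simple log-scanning predictor. A minimal sketch, assuming syslog-style lines that contain a disk identifier and the strings "Hardware Error" or "SCSI Timed Out"; the log format, threshold, and function names are illustrative assumptions, not the project's actual tool:

```python
# Sketch: flag disks whose recent logs show a burst of Hardware Error messages.
# Assumes one log line per event containing a disk id and an error-type string.
from collections import Counter

HW_ERROR = "Hardware Error"
THRESHOLD = 100  # illustrative: hundreds of messages over a few days, as seen on disk 9

def suspect_disks(log_lines, disk_ids):
    """Return disk ids whose Hardware Error count meets or exceeds THRESHOLD."""
    counts = Counter()
    for line in log_lines:
        for disk in disk_ids:
            if disk in line and HW_ERROR in line:
                counts[disk] += 1
    return [disk for disk, n in counts.items() if n >= THRESHOLD]

# Example: disk 9 logged 1763 Hardware Error and 297 SCSI Timed Out messages over 8 days.
logs = (["m11 bus0 disk9: Hardware Error"] * 1763
        + ["m11 bus0 disk9: SCSI Timed Out"] * 297)
print(suspect_disks(logs, ["disk9", "disk3"]))   # ['disk9']
```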
Slide 17
Tertiary Disk: SCSI Bus Parity Errors
[Chart: disks on SCSI Bus 2, 9/2/98 to 10/22/98; daily counts (0-15) of SCSI parity errors]
Slide 18
Can We Predict Other Kinds of Failures?
• Yes, the flurry of parity errors on m2
occurred between:
– 1-1-98 and 2-3-98, as well as
– 9-3-98 and 10-12-98
• On 11-24-98
– m2 had a bad enclosure
→ cables or connections defective
– The enclosure was then replaced
Slide 19
User Decision Support Demand
vs. Processor speed
[Chart, log scale 1-100 over 1996-2000: database demand grows 2X / 9-12 months (“Greg’s Law”) while CPU speed grows 2X / 18 months (“Moore’s Law”); the widening difference is the Database-Processor Performance Gap]
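A minimal sketch of how fast that gap opens, compounding the two growth rates over the chart's 1996-2000 window; the 10.5-month doubling time for database demand is my choice of midpoint within the slide's 9-12 month range:

```python
# Sketch: database demand vs. CPU speed growth, 1996-2000.
years = 4                            # 1996 to 2000
demand_doubling_months = 10.5        # "Greg's Law": 2X every 9-12 months (midpoint)
cpu_doubling_months = 18             # "Moore's Law": 2X every 18 months

demand_growth = 2 ** (years * 12 / demand_doubling_months)
cpu_growth = 2 ** (years * 12 / cpu_doubling_months)

print(f"demand: {demand_growth:.0f}x, CPU: {cpu_growth:.1f}x, "
      f"gap: {demand_growth / cpu_growth:.1f}x")   # demand: 24x, CPU: 6.3x, gap: 3.7x
```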
Slide 20