Transcript 2 - SLAC

Department of Particle & Particle Astrophysics
Sea-Of-Flash-Interface
SOFI introduction and status
The PetaCache Review
Michael Huffer, [email protected]
Stanford Linear Accelerator Center
November 02, 2006
Outline
• Background
  – History of PPA involvement
  – Synergy with current activities
• Requirements
  – Usage model
  – System requirements
  – Individual client requirements
• Implementation
  – Abstract model and features
  – Building Blocks
• Deliverables
  – Packaging
• Schedule
  – Status
  – Milestones
• Summary
  – Reuse
  – Conclusions
2
Background
• The Research Engineering Group (REG) supports a wide range of activities with limited resources
  – LSST, SNAP, ILC, SiD, EXO, LHC, LCLS, etc.
• Using these resources effectively requires understanding:
  – our core competencies
  – the requirements of future electronics systems
• Two imperatives for REG:
  – Support upcoming experiments
  – Build for the future by advancing core competencies
• What are:
  – more detailed examples of a couple of upcoming experiments?
  – the necessary core competencies?
3
LSST
"The Large Synoptic Survey Telescope (LSST) is a proposed ground-based 8.4-meter, 10 square-degree-field telescope that will provide digital imaging of faint astronomical objects across the entire sky, night after night. In a relentless campaign of 15 second exposures, LSST will cover the available sky every three nights, opening a movie-like window on objects that change or move on rapid timescales: exploding supernovae, potentially hazardous near-Earth asteroids, and distant Kuiper Belt Objects. The superb images from the LSST will also be used to trace billions of remote galaxies and measure the distortions in their shapes produced by lumps of Dark Matter, providing multiple tests of the mysterious Dark Energy."
• SLAC/KIPAC is the lead institution for the camera
  – Camera contains > 3 gigapixels
    • > 6 gigabytes of data/image (see the arithmetic below)
    • Readout time is 1-2 seconds
  – KIPAC delivers the camera DAQ system
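A quick check of the image size, assuming 2 bytes (16 bits) per pixel; the pixel depth is not stated on the slide:

\[ 3\times 10^{9}\ \text{pixels} \times 2\ \text{bytes/pixel} = 6\times 10^{9}\ \text{bytes} \approx 6\ \text{Gbytes per image} \]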
4
SNAP
"The Supernova/Acceleration Probe (SNAP) satellite observatory is capable of measuring thousands of distant supernovae and mapping hundreds to thousands of square degrees of the sky for gravitational lensing each year. The results will include a detailed expansion history of the universe over the last 10 billion years, determination of its spatial curvature to provide a fundamental test of inflation - the theoretical mechanism that drove the initial formation of structure in the universe, precise measures of the amounts of the key constituents of the universe, ΩM and ΩΛ, and the behavior of the dark energy and its evolution over time."
• SLAC is the lead institution for all non-FPA related electronics
  – One contact every 24 hours
  – Requires data to be stored on board the instrument
  – Storage capacity is roughly 1 terabyte (includes redundancy)
  – Examining NAND flash as the solution to the storage problem
5
Core competencies
• System on Chip (SOC)
  – Integrated processors and functional blocks on an FPGA
• Small-footprint, high-performance, persistent memory systems
  – NAND flash
• Open Source R/T kernels
  – RTEMS (Real-Time Executive for Multiprocessor Systems)
• High-performance serial data transport and switching
  – MGTs (Multi-Gigabit Transceivers)
• Modern networking protocols:
  – 10 Gigabit Ethernet
  – InfiniBand
  – PCI-Express
6
PetaCache consistent with mission?
Use core technology?
Project      SOC   Memory   R/T kernels   H/S transport
LSST         yes   no       yes           yes
SNAP         no    yes      yes           no
Petacache    yes   yes      yes           yes

Main Entry: syn·er·gy
Pronunciation: 'si-n&r-jE
Function: noun
Inflected Form(s): plural -gies
Etymology: New Latin synergia, from Greek synergos working together
1 : SYNERGISM; broadly : combined action or operation
2 : a mutually advantageous conjunction or compatibility of distinct business participants or elements (as resources or efforts)
7
Usage model
"Lots of storage, shared concurrently by many clients, distributed over a large number of hosts"
[Diagram: many clients, each on a host, reaching data storage through a distribution, transport & management layer]
• System requirements:
  – Scalable, both in:
    • Storage capacity
    • Number of concurrent clients
  – Large address space
  – Random access
  – Support population evolution
• Features:
  – Changes are quasi-adiabatic
    • "Write once, read many"
  – Able to treat as a read-only system
• Requirements not addressed in this phase:
  – Access control
  – Redundancy
  – Cost
8
Client Requirements
• Uniform access time to fetch a "fixed" amount of data from storage
  – Implies: deterministic and relatively "small" latency in round-trip time
    • Where "fixed" is O(8 Kbytes) and "small" is O(200 microseconds)
  – Need approximately 40 Mbytes/sec between client & storage (see the worked numbers below)
• Access time scales independent of:
  – Address
  – Number of concurrent clients
  (The PetaCache project focus is on this issue alone)
• Two contributions to latency:
  – Storage access time
  – Distribution, transport, and management overhead
  (The SOFI architecture attempts to address both)
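The 40 Mbytes/sec figure follows directly from the fetch size and the round-trip budget; checking the arithmetic:

\[ \frac{8\ \text{Kbytes}}{200\ \mu\text{s}} \approx \frac{8\times 10^{3}\ \text{bytes}}{2\times 10^{-4}\ \text{s}} = 4\times 10^{7}\ \text{bytes/s} = 40\ \text{Mbytes/sec} \]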
9
Abstract model
[Diagram: clients connected through content-addressable switching to many memory servers, each a Flash Memory Controller (FMC)]
• Key features:
  – Available concurrency and bandwidth scale with storage capacity
  – Many individual "memory servers"
    • Access granularity is 8 Kbytes
    • 16 GBytes of memory/server
    • 40 Mbytes/sec/server
  – Load leveling
    • Data randomly distributed over memory servers (see the sketch below)
    • Multicast for concurrent addressing
    • Both client- and server-side caching
  – Two address spaces
    • Physical page access
    • Logical block access
    • Hides data distribution from the client
  – Network attached storage
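The slide says data is randomly distributed over the memory servers but does not give the distribution function; a minimal sketch of static load leveling, assuming a hash of the logical coordinates picks the server (all types and names are illustrative, not SOFI code):

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Illustrative only: one entry per memory server (an FMC behind a SAM).
struct MemoryServer {
  uint32_t address;   // network address of the server
};

// Static load leveling: a logical block is mapped to a server by hashing
// its (partition, bundle, block) coordinates, so consecutive blocks are
// spread pseudo-randomly across all servers and the client never needs
// to know the physical location of the data.
const MemoryServer& serverFor(const std::vector<MemoryServer>& servers,
                              uint64_t partition, uint64_t bundle, uint64_t block) {
  uint64_t h = std::hash<uint64_t>{}(partition);
  h ^= std::hash<uint64_t>{}(bundle) + 0x9e3779b97f4a7c15ULL + (h << 6) + (h >> 2);
  h ^= std::hash<uint64_t>{}(block)  + 0x9e3779b97f4a7c15ULL + (h << 6) + (h >> 2);
  return servers[h % servers.size()];
}
```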
10
Building Blocks
[Block diagram; the FSM/SAM/CIM assembly forms the network attached storage:]
• Four Slice Module (FSM): 256 GBytes of flash
• Slice Access Module (SAM, 1 of n): connected to the FSM by PGP (Pretty Good Protocol) at 1 GByte/sec
• Cluster Inter-Connect Module (CIM): 8 x 10-G Ethernet (8 GBytes/sec) toward the host inter-connect
• Host inter-connect: 10-G Ethernet
• Host (1 of n): runs the application-specific client interface (SOFI); attaches at 1 Gigabit Ethernet (0.1 GByte/sec)
11
Four Slice Module (FSM)
[Block diagram of one slice (1 x 4 slices per module): an FPGA, with clock and configuration, containing the PGP & command encode/decode block toward the PHY (initiator, CRC-in/CRC-out), in-bound and out-bound transfer paths with arbiters (encode/decode), and four Flash Memory Controllers (FMC1–FMC4) connecting to DIMMs (1 DIMM = 8 devices, 32 GBytes)]
12
Flash Memory Controller (FMC)
• Implemented as core IP
• Controls 16 GBytes of memory (4 devices) in units of:
  – Pages (8 Kbytes)
  – Blocks (512 Kbytes)
• Queues operations (see the sketch below):
  – Read Page (in units of 128-byte chunks)
  – Write Page
  – Erase Block
  – Read statistics counters
  – Read device attributes
• Transfers data at 40 Mbytes/sec
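A minimal encoding of the operation set listed above, purely for illustration; the real core IP's command and register format is not given on the slide:

```cpp
#include <cstdint>
#include <queue>

// Sizes follow the slide: 8 KB pages, 512 KB blocks (64 pages per block),
// page reads delivered in 128-byte chunks.
enum class FmcOp : uint8_t {
  ReadPage,        // read one 8 KB page, returned in 128-byte chunks
  WritePage,       // program one 8 KB page
  EraseBlock,      // erase one 512 KB block
  ReadStatistics,  // read the statistics counters
  ReadAttributes   // read the device attributes
};

struct FmcCommand {
  FmcOp    op;
  uint32_t page;   // page index within the 16 GBytes managed by this FMC
};

// The FMC queues operations and services them in order.
using FmcQueue = std::queue<FmcCommand>;
```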
13
Universal Protocol Adapter (UPA)
The SAM is ½ of a UPA pair.
[Board diagram, right side: FPGA (SOC) – Xilinx XC4VFX60 (PPC-405 @ 450 MHz, 200 DSPs, lots of gates), fabric clock, MGT clock, memory (512 Mbytes, Micron RLDRAM II), configuration memory (128 Mbytes, Samsung K9F5608), Multi-Gigabit Transceivers (MGT, 8 lanes). Left side: MFD, reset options, JTAG, 100-baseT, reset.]
14
UPA Features
• "Fat" memory subsystem
  – Sustains 8 Gbytes/sec
  – "Plug-in" DMA interface (PIC)
    • Designed as a set of IP cores
    • Designed to work in conjunction with MGT and protocol cores
• Bootstrap loader (with up to 16 boot options and images)
• Interface to configuration memory
• Open Source R/T kernel (RTEMS)
• 100 base-T Ethernet interface
• Full network stack
"Think of the UPA as a Single Board Computer (SBC) which interfaces to one or more busses through its MGTs"
15
UPA Customization for SAM
• Implements two cores:
  – PGP
  – 10-GE
• All 8 lanes of MGT used:
  – 4 lanes for the PGP core
  – 4 lanes for 10-GE
• Network driver to interface 10-GE to the network stack
• Executes application code to satisfy:
  – Server side of the SOFI client interface
    • Physical-to-logical translation
    • Server-side caching
  – FSM management software
    • Proxy for the FMC command set
    • Maintains bad blocks
    • Maintains available blocks
16
Cluster Inter-connect Module (CIM)
[Block diagram:]
• High-speed data switch: 24 x 10-GE (Fulcrum FM2224)
  – 10-GE (XUI) links to the SAMs (high-speed)
  – 8 links to the host inter-connect (data network)
• Low-speed management switch: 24 x FE + 4 x GE (Zarlink ZL33020)
  – 100-baseT links to the SAMs (low-speed)
  – 1000-baseT links to the host inter-connect (management network)
• Switch management (UPA)
17
Client/Server Interface
• The client interface resides on the host
• Servers reside on SAMs
• Any one client on any one host has uniform access to all flash storage
• Clients access flash through the network inter-connect
• Abstract inter-connect model
  – Delivered implementation is IP (UDP and multicast services)
• The interface delivers three types of services (see the sketch below):
  – Random read access to objects within the store
  – Population of objects within the store (write and erase access)
  – Access to performance metrics
• The client interface is object-oriented (C++)
  – Class library (distributed as a set of binaries and header files)
• Two address spaces (physical & logical)
  – Clients access information only in logical space
  – Clients are not sensitive to the actual physical location of information
  – Population distribution is pseudo-random (static load leveling)
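The slide names the three service types but not the class signatures; a minimal sketch of what such a C++ client interface could look like (all names and signatures are illustrative assumptions, not the delivered SOFI library):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative client-side interface covering the three service types:
// random read access, population (write/erase), and performance metrics.
class SofiClient {
public:
  // Random read access: fetch 'lengthBytes' starting 'offsetBlocks' into a
  // bundle of a partition (logical addressing only).
  virtual std::vector<uint8_t> read(const std::string& partition,
                                    const std::string& bundle,
                                    uint64_t offsetBlocks,
                                    std::size_t lengthBytes) = 0;

  // Population: append data to a bundle ("write once, read many").
  virtual void populate(const std::string& partition,
                        const std::string& bundle,
                        const std::vector<uint8_t>& data) = 0;

  // Access to performance metrics kept by the servers.
  virtual std::string metrics(const std::string& partition) = 0;

  virtual ~SofiClient() = default;
};
```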
18
Addressing
• Physical addressing (1 page = 8 Kbytes), see the illustrative struct below:
  Interconnect (2^0) x Manager (2^32) x Slice (2^2) x Controller (2^2) x Page (2^21)
  = 2^57 pages = 128 peta-pages (1M peta-bytes)
• Logical addressing (1 block = 8 Kbytes):
  Interconnect (2^0) x Partition (2^64) x Bundle (2^64) x Block (2^64)
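An illustrative C++ packing of the physical page address fields with the sizes given above; 2^21 pages of 8 Kbytes is the 16 GBytes handled by one FMC (the field names and the packing itself are assumptions, not the SOFI wire encoding):

```cpp
#include <cstdint>

// 2^32 managers, 2^2 slices, 2^2 controllers, 2^21 pages, and a single
// interconnect (2^0, i.e. zero bits). 32 + 2 + 2 + 21 = 57 bits, which
// addresses 2^57 pages of 8 KB each.
struct PhysicalPageAddress {
  uint64_t manager    : 32;  // which manager (SAM/FSM)
  uint64_t slice      : 2;   // slice within the module (4 slices)
  uint64_t controller : 2;   // FMC within the slice (4 controllers)
  uint64_t page       : 21;  // 8 KB page within one FMC's 16 GBytes
};

static_assert(sizeof(PhysicalPageAddress) <= sizeof(uint64_t),
              "the 57 used bits fit in a single 64-bit word");
```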
19
Using the interface
• A partition is a management tool
  – Segments storage logically into disjoint sets
  – One-to-one correspondence between a partition and a server
  – One SAM may host more than one server
• A bundle is an organizational tool
  – A bundle belongs to one (and only one) partition
  – A bundle is an access-pattern hint. Allows:
    • fetch look-ahead
    • optimization of overlapping fetches from different clients
• Both partitions and bundles are assigned unique identifiers (over all time)
• Identifiers may have character names (aliases)
  – Assigned at population time
• A client query is composed of: partition/cluster/offset/length (see the example below)
  – offset is expressed in units of blocks
  – length is expressed in units of bytes
• A client may query by either identifier or alias
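Reusing the illustrative SofiClient sketched on the Client/Server Interface slide, and assuming the "cluster" component of the query refers to the bundle defined above, a query by alias might be composed like this (names and values are hypothetical):

```cpp
#include <cstdint>
#include <vector>

// Read 64 Kbytes starting at block 1024 (i.e. 8 Mbytes into the bundle),
// addressing the partition and bundle by their character-name aliases.
std::vector<uint8_t> fetchExample(SofiClient& sofi) {
  return sofi.read(/*partition=*/"calibration",
                   /*bundle=*/"run2006",
                   /*offsetBlocks=*/1024,    // offset in units of 8 KB blocks
                   /*lengthBytes=*/65536);   // length in units of bytes
}
```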
20
Deliverables
• Two FSMs (8 slices)
  – 1/2 TByte
• Two SAMs
  – Enough to support FSM operations
• Client/Server interface (SOFI)
  – Targeted to Linux
• How will the hardware be packaged?
  – Where packaging is defined as:
    • How the building blocks are partitioned
    • The specification of the electro-mechanical interfaces
21
The "Chassis"
• 2 FSMs/card
  – 1/2 TByte
• 16 cards/bank
  – 8 TByte
• 2 banks/chassis
  – 64 SAMs
  – 1 CIM
  – 16 TByte
• 3 chassis/rack
  – 48 TByte (see the roll-up below)
[Mechanical sketch: 1U air-outlet, 1U fan-tray, passive backplane, 8U supervisor card with X2 (XENPAK MSA) ports, 4U line cards, 1U air-inlet; accepts DC power]
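The capacities roll up from the 256 GByte Four Slice Module:

\[
\begin{aligned}
2\ \text{FSM/card} \times 256\ \text{GB} &= 0.5\ \text{TB per card}\\
16\ \text{cards/bank} \times 0.5\ \text{TB} &= 8\ \text{TB per bank}\\
2\ \text{banks/chassis} \times 8\ \text{TB} &= 16\ \text{TB per chassis}\\
3\ \text{chassis/rack} \times 16\ \text{TB} &= 48\ \text{TB per rack}
\end{aligned}
\]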
22
48 TByte facility
[Diagram: SOFI chassis (1 chassis shown), connected through a Catalyst 6500 (3 x 4 10GE, 2 x 48 1GE) to the SOFI hosts (1 x 96) running xRootD servers]
23
Schedule/Status
• Methodology:
  – Hardware
    • Implement 3 "platforms"
      – One for each type of module
    • Decouple packaging from architectural & implementation issues…
      – Evaluate layout issues concerning high-speed signals
      – Evaluate potential packaging solutions
      – Allow concurrent development of VHDL & CPU code
  – Software
    • Emulate the FSM component of the server software
      – Complete/debug in the absence of hardware
      – Allows clients an "early look" at the interface
[Diagram of the client/server software stacks – Host: client API, logical/physical translation, cache management, IP protocol implementation; SAM: IP protocol implementation, logical/physical translation, cache management, FSM interface; the two sides joined by "the wire"]
24
Evaluation platforms
• UPA
  – Memory subsystem
  – Bootstrap loader
  – Configuration memory
  – RTEMS
  – Network stack/network driver interface issues
• CIM
  – Low- and high-speed management
  – Evaluate different physical interfaces (including X2)
• FSM line card (depending on packaging, this could be the production prototype)
  – FMC debug
  – PGP debug
25
Schedule
[Gantt chart spanning October through March; products: SOFI, UPA/10GE driver, UPA/PGP, UPA/10GE MAC, RTEMS/UPA, PIC, Backplane, Line Card PCB, Supervisor PCB, Chassis/mechanical, UPA platform, CIM platform; activities: specification, design, implement, schematic, layout, spin/load, debug; numbered markers 1–5 flag key points in the schedule]
26
Milestones
Milestone                                    Date
RTEMS running on UPA evaluation platform     2nd week of December 2006
SOFI (emulation) ready                       3rd week of January 2007
Supervisor PCB ready for debug               3rd week of January 2007
Chassis & PCBs complete                      3rd week of February 2007
Start Test & Integrate                       2nd week of March 2007
27
Status
Products         specification   design        implementation
SOFI             ✓               in-progress   in-progress
DIMM             ✓               ✓             ✓
FCS              ✓               ✓             ✓
FSM              ✓               ✓             in-progress
SAM              ✓               ✓             in-progress
CIM              ✓               ✓             in-progress
UPA              ✓               ✓             in-progress
PGP core         ✓               ✓             ✓
10-GE core       ✓               ✓             ✓
The "chassis"    ✓               ✓             ✓
28
Products & Reuse
Targeted for use?
Product         Petacache   LSST Camera DAQ   SNAP   LCLS DAQ   Atlas Trigger Upgrade
UPA             yes         yes               no     yes        yes
10-GE core      yes         yes               no     yes        yes
PGP core        yes         yes               no     yes        yes
FCS             yes         no                yes    no         no
CIM             yes         yes               no     yes        yes
FSM             yes         no                no     no         no
SAM             yes         no                no     no         no
DIMM            yes         no                no     no         no
SOFI            yes         no                no     no         no
The "chassis"   yes         maybe             no     maybe      maybe
29
Conclusions
• Robust and well-developed architecture
  – Concurrency and bandwidth scale as storage is added
  – The logical address space hides the actual data distribution from the client
  – Network attached storage
  – Scalable (in size and number of users)
• Packaging solution may need an iteration…
• Schedule
  – Somewhat unstable, however…
    • the sequence and activities are to a large degree correct
    • the risk is in development of the 10-GE core
  – Well along the implementation road
• Well-developed synergy between Petacache and the current activities of ESE
  – Great mechanism to develop core competencies
  – Many of the project deliverables are directly usable in other experiments
30