Filers - Usenix
Surfing Technology Curves
Steve Kleiman
CTO
Network Appliance Inc.
1
Book Plug
The Innovator’s Dilemma - When New
Technologies Cause Great Firms to Fail
Clayton M. Christensen
2
About NetApp
Two product lines:
Network Attached File Servers (a.k.a. filers)
Web proxy caches: NetCache
Founded in 1992
>$1B revenue run rate
>70% CAGR since founding
>120% last year
3
Filers: Fast, Simple, Reliable and Multi-protocol
System            CPUs  Result   Overall Resp.  Resp. @ Max  Ops per SpecRate  Ops/FS  RAID
Sun E 3500/4500   2     8,165    3.04           23.8         20.4              340     no
HP-9000 N4000     4     15,270   1.91           3.7          10.4              318     yes
NetApp 840        1     15,235   1.54           3.6          46                15,235  yes
4
Filers: Fast, Simple, Reliable and Multi-protocol
Disk management
Filer finds disks and organizes into RAID groups
and spares automatically
Simple addition of storage
Automatic RAID reconstruction
Data management
Snapshots
SnapRestore
SnapMirror
Simple upgrade
Small command set
5
Filers: Fast, Simple, Reliable and Multi-protocol
Built-in RAID
Easy hardware maintenance
Hot plug disk, power, fans
Low MTTR
Cluster Failover
Autosupport
>99.995% measured field availability
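As a rough worked example of what the availability figure above implies, the sketch below converts an availability fraction into a yearly downtime budget. The function name and the 365-day year are assumptions made for illustration; the slide gives only the percentage.

```python
# Minutes in a non-leap year (an assumption; the slide does not define the period).
MINUTES_PER_YEAR = 365 * 24 * 60

def max_downtime_minutes_per_year(availability: float) -> float:
    """Yearly downtime implied by an availability fraction such as 0.99995."""
    return (1.0 - availability) * MINUTES_PER_YEAR

# 99.995% availability leaves roughly 26 minutes of downtime per year.
print(f"{max_downtime_minutes_per_year(0.99995):.1f} minutes/year")
```

Since the slide says ">99.995%", this is an upper bound on the downtime rather than a measured value.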
6
Filers: Fast, Simple, Reliable and Multi-protocol
NFS
CIFS
CIFS and NFS attributes
HTTP
FTP
DAFS
Internet Cache
FTP
Streaming media
7
Wave 1:
Networks, Appliances and Software
8
Network and Storage Bandwidth
Year   Storage (MB/s)   Network (MB/s)   Penalty
1992   10               0.1              100-to-1
1994   20               1                20-to-1
1996   40               10               4-to-1
1998   100              100              1-to-1
2001   200-400          1000             .2-to-1
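The penalty column appears to be storage bandwidth divided by network bandwidth for each year. The sketch below recomputes it from the slide's figures; the `penalty` function and the row layout are illustrative names, and the figures are treated as per-second rates.

```python
def penalty(storage_mb: float, network_mb: float) -> float:
    """Ratio of local storage bandwidth to network bandwidth."""
    return storage_mb / network_mb

# (year, storage low, storage high, network), taken from the slide.
rows = [
    (1992, 10, 10, 0.1),
    (1994, 20, 20, 1),
    (1996, 40, 40, 10),
    (1998, 100, 100, 100),
    (2001, 200, 400, 1000),
]

for year, lo, hi, net in rows:
    low, high = penalty(lo, net), penalty(hi, net)
    if low == high:
        print(f"{year}: {low:g}-to-1")
    else:
        print(f"{year}: {low:g}-to-1 up to {high:g}-to-1")
```

For 2001 the storage range of 200-400 gives a ratio between 0.2 and 0.4; the slide quotes the lower end.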
9
The Appliance Revolution
1980s (General Purpose): UNIX providing Application, Print, File Service, Routing, ...
1990s (Appliance Based): UNIX and Windows/NT (Application), Printer (Print), Filer (File Service), Router/Switch (Routing), ...
10
Appliance philosophy
Appliance philosophy breeds focus
External simplicity leads to internal simplicity
RISC argument
Don’t have to be all things to all people
Limited compatibility constraints
Interfaces are bits on wire
Think different!
Can innovate with both software and hardware
11
Filer Architecture
Commercial off-the-shelf chips
Any appropriate architecture
CPU: i486, Pentium, Alpha ’064, Alpha ’164, PIII
Board level integration
1 or more CPUs (4)
1 or more PCI busses (4)
High bandwidth switches
Multiple memory banks
Integrated I/O
NVRAM
[Diagram labels: Mem, PCI, NVRAM]
12
Roads Not Taken
No “unobtainium”
Minimalist infrastructure
No special purpose busses
No big MPs
Motherboards only: no cache coherent
backplanes
No functionally distributed computers
No special purpose networks (e.g. HIPPI)
No block access protocols
13
DataOnTap Architecture
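[Layer diagram; labels as they appear on the slide]
Daemons, Shells, Commands; Java Virtual Machine; Lib
Protocols: NFS, CIFS, HTTP, DAFS
Transports: TCP/IP, VIPL
Network interfaces: ATM, GbE, FDDI, 100BT, VI NIC*
Storage: WAFL, RAID, Disk (FCAL, SCSI)
SK
* VI supported on FC (Future: GbE, Infiniband)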
14
DataOnTap
Simple Kernel
Message passing
Non-preemptive
Sample optimizations
Checksum caching
Suspend/Resume
Cache hit pass through
15
WAFL: Write Anywhere File Layout
Log-like write throughput
No segment cleaning (LFS)
Write data allocated to optimize RAID performance
Delayed write allocation
Active data is never overwritten (shadow paging)
On-disk data is always consistent
File system state is changed atomically
Every 10 sec, by default
Client modification requests are logged to NVRAM
NVRAM log is replayed only on reboot
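The interaction between the atomic state change and the NVRAM log can be sketched with a toy model. This is only an illustration of the bullets above, not WAFL's implementation: `ToyFiler`, its methods, and the dictionary standing in for the on-disk image are all hypothetical.

```python
class ToyFiler:
    """Toy model: consistent on-disk image plus an NVRAM request log."""

    def __init__(self):
        self.on_disk = {}    # last consistent image; survives a crash
        self.pending = {}    # in-memory changes not yet on disk; lost on crash
        self.nvram_log = []  # client requests since the last consistency point

    def write(self, name, data):
        self.nvram_log.append((name, data))  # log the request to NVRAM first
        self.pending[name] = data
        return "ack"

    def read(self, name):
        return self.pending[name] if name in self.pending else self.on_disk[name]

    def consistency_point(self):
        # Build the new image beside the old one (shadow paging), then switch
        # to it in one step; the old image is never modified in place.
        new_image = dict(self.on_disk)
        new_image.update(self.pending)
        self.on_disk = new_image
        self.pending = {}
        self.nvram_log = []  # logged requests are now reflected on disk

    def reboot_after_crash(self):
        self.pending = {}                  # main memory is lost
        for name, data in self.nvram_log:  # replay happens only on reboot
            self.pending[name] = data

filer = ToyFiler()
filer.write("a", "v1")
filer.consistency_point()
filer.write("b", "v2")      # acknowledged, but not yet on disk
filer.reboot_after_crash()
print(filer.on_disk, filer.read("b"))
```

In this model the on-disk image is consistent at every instant, and the NVRAM log is only read on the reboot path, which matches the last two bullets.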
16
Wave 2:
Memory-to-Memory Interconnects
(a.k.a. NUMA, NORMA)
17
Problem:
Remove single points of failure
Without doubling hardware
Minimizing performance overhead
Without decreasing reliability
18
Clustered Failover Architecture
[Diagram: Filer 1 and Filer 2 both attach to the Network and are linked to each other by ServerNet; each filer has its own NVRAM and Fibre Channel connections]
19
Memory-to-Memory Interconnects
Efficient transfer model
Allows minimal overhead on receiver
Scalable Bandwidth
High speed ASIC based switching
Gigabit technology
Open architecture
PCI, not coherent bus interface
Incorporate multiple technologies
Relatively inexpensive
20
Mirroring NVRAM
NVRAM is split into local and partner regions
Data is assembled in NVRAM
Data is DMAed from NVRAM to equivalent offset in remote node
Client reply is sent when log entry DMA completes
[Diagram labels: CPU, NVRAM, PCI Bus, ServerNet DMA; "To partner NVRAM"; "NVRAM data from partner"]
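The mirroring steps above can be sketched as a toy model in which a byte copy stands in for the ServerNet DMA. The `Node` class, its field names, and the fixed region size are assumptions made for illustration only.

```python
class Node:
    """Toy filer node whose NVRAM has a local region and a partner region."""

    def __init__(self, region_size: int):
        self.local = bytearray(region_size)    # this node's own log
        self.partner = bytearray(region_size)  # mirror of the other node's log
        self.next_offset = 0
        self.peer = None

    def log_request(self, entry: bytes) -> str:
        start, end = self.next_offset, self.next_offset + len(entry)
        if end > len(self.local):
            raise MemoryError("NVRAM log region is full")
        # 1. Assemble the log entry in local NVRAM.
        self.local[start:end] = entry
        # 2. Copy it to the equivalent offset in the remote node's partner
        #    region (stand-in for the DMA over the interconnect).
        self.peer.partner[start:end] = self.local[start:end]
        self.next_offset = end
        # 3. Reply to the client only after the copy has completed.
        return "reply"

filer1, filer2 = Node(64), Node(64)
filer1.peer, filer2.peer = filer2, filer1

filer1.log_request(b"write x")
print(bytes(filer2.partner[:7]))
```

Because the entry lands at the same offset on both nodes, the surviving node could replay its partner region after a failover without any address translation.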
21
Leveraged Components
Memory-to-Memory interconnects
Low overhead, high-bandwidth, cheap
WAFL
Always consistent file system
Built-in NVRAM logging/replay
Fibre Channel disks
Two independent ports
Single function appliance software
Simple, low-overhead failover
22
Wave 3:
The Internet
23
The Consequences of
Higher-speed Internet Access
200K-400K home cable head-end
Requires 1.5-3Gbps access capability
30% subscription rate, 20% online
Minimum 128Kbps BW
Enterprise
Remote sites still connected by slow links
Require high-quality access to content
Overloaded web servers
ISP
Require distribution and caching of large
media files
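The cable head-end figures on this slide can be checked with a short calculation. The sketch assumes that "20% online" means 20% of subscribers are active at the same time and uses decimal unit conversion; the function and parameter names are illustrative.

```python
def head_end_bandwidth_gbps(homes: int,
                            subscription_rate: float = 0.30,
                            online_fraction: float = 0.20,
                            per_user_kbps: float = 128) -> float:
    """Aggregate access bandwidth needed at one cable head-end, in Gbps."""
    active_users = homes * subscription_rate * online_fraction
    return active_users * per_user_kbps / 1_000_000  # Kbps -> Gbps

for homes in (200_000, 400_000):
    print(f"{homes:,} homes: {head_end_bandwidth_gbps(homes):.2f} Gbps")
```

Under these assumptions, 200K homes need about 1.5 Gbps and 400K homes about 3 Gbps, which agrees with the range quoted on the slide.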
24
Yet Another Appliance
Cisco
NetApp
25
NetCache
HTTP/FTP proxy cache appliance
Highly deployable
Forward and reverse proxy
Transparency
Filtering
iCAP
Enables value added services
Virus scanning, transcoding, ad insertion, …
Stream splitting
Stream caching
Content distribution
26
Cacheable Content
[Chart: cacheable content plotted against time, with regions labeled Static Content, Dynamic Content, and Streaming Media]
27
Wave 4:
The Death of Tapes
28
Using Tapes for Disaster Recovery
Year   Drive Capacity   # Drives   Capacity   # Tapes Required   # Tape drives to restore in 8 hours
1999   36G              168        6TB        172                21
2000   72G              216        16TB       160                28
2001   144G             500a       72TB       360                63
a: with SAN
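The relationships between the columns can be recomputed from the slide's figures. In the sketch below, total capacity is drive capacity times drive count, and the throughput each tape drive would need follows from the 8-hour window. The function names and the decimal GB-to-MB conversion are assumptions; the per-drive rate is derived here and does not appear on the slide.

```python
RESTORE_WINDOW_SECONDS = 8 * 3600  # "restore in 8 hours"

def total_capacity_gb(drive_gb: int, n_drives: int) -> int:
    """Raw capacity of the disk population in GB."""
    return drive_gb * n_drives

def required_mb_per_s(capacity_gb: float, n_tape_drives: int) -> float:
    """Sustained rate each tape drive needs to restore everything in the window."""
    return capacity_gb * 1000 / n_tape_drives / RESTORE_WINDOW_SECONDS

# (year, drive capacity in GB, number of disk drives, number of tape drives)
rows = [(1999, 36, 168, 21), (2000, 72, 216, 28), (2001, 144, 500, 63)]

for year, drive_gb, n_drives, n_tape_drives in rows:
    cap = total_capacity_gb(drive_gb, n_drives)
    rate = required_mb_per_s(cap, n_tape_drives)
    print(f"{year}: {cap / 1000:.1f} TB, {rate:.1f} MB/s per tape drive")
```

The computed capacities round to the 6TB, 16TB, and 72TB shown in the table.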
29
SnapMirror
Remote asynchronous mirroring
Continuous incremental update
Only allocated blocks are transmitted
Automatic resynchronization after
disconnect
Destination is always a consistent
“snapshot” of source
[Diagram: Filer connected over a WAN to a second Filer]
30
Creating a Snapshot
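[Diagram in three panels over disk blocks A, B, C, D]
Before Snapshot: the Active FS references blocks A, B, C, D
After Snapshot: the Active FS and the Snapshot both reference A, B, C, D
After Block Update: the Snapshot still references A, B, C, D; the Active FS references A, B, D and the new block C’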
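The three stages on this slide can be sketched with a toy volume in which a snapshot copies only block references and an update writes a new block instead of overwriting the old one. `ToyVolume` and its methods are hypothetical names used for illustration, not WAFL data structures.

```python
class ToyVolume:
    """Toy copy-on-write volume: file systems are lists of block IDs."""

    def __init__(self, blocks: dict):
        self.disk = dict(blocks)     # block ID -> contents
        self.active = list(blocks)   # block IDs referenced by the active FS
        self.snapshots = {}          # snapshot name -> list of block IDs

    def snapshot(self, name: str) -> None:
        # Copies references only; no data blocks are duplicated.
        self.snapshots[name] = list(self.active)

    def update(self, old_id: str, new_id: str, data: str) -> None:
        # New data goes to a new block; the old block stays untouched so
        # any snapshot that references it remains valid.
        self.disk[new_id] = data
        self.active[self.active.index(old_id)] = new_id

vol = ToyVolume({"A": "a0", "B": "b0", "C": "c0", "D": "d0"})
vol.snapshot("snap1")
vol.update("C", "C'", "c1")

print("active:  ", vol.active)
print("snapshot:", vol.snapshots["snap1"])
```

After the update, the snapshot and the active file system share blocks A, B, and D, and differ only in C versus C’.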
31
WAFL: Block Map File
Multiple bits per 4KB block
Column for allocated block in the active file system
Columns for allocated blocks in snapshots
Taking a Snapshot
Copy root inode
[Diagram: block map with rows Block 1 through Block 8 and columns S1, S2, S3, FS]
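One way to picture the block map is as a small bit field per block, with one bit for the active file system and one per snapshot. The sketch below is a toy model under that assumption: the bit assignments, the class, and the rule that a snapshot column starts as a copy of the FS column are illustrative choices, not the on-disk format.

```python
FS_BIT = 1  # bit 0: block is allocated in the active file system

class ToyBlockMap:
    """One integer per block; bit 0 is the FS column, bit n is snapshot Sn."""

    def __init__(self, n_blocks: int):
        self.bits = [0] * n_blocks

    def allocate(self, block: int) -> None:
        self.bits[block] |= FS_BIT

    def release_from_active(self, block: int) -> None:
        self.bits[block] &= ~FS_BIT

    def take_snapshot(self, snap: int) -> None:
        # The snapshot's column starts as a copy of the FS column.
        for i, b in enumerate(self.bits):
            if b & FS_BIT:
                self.bits[i] = b | (1 << snap)

    def delete_snapshot(self, snap: int) -> None:
        for i in range(len(self.bits)):
            self.bits[i] &= ~(1 << snap)

    def is_free(self, block: int) -> bool:
        # A block is reusable only when no column references it.
        return self.bits[block] == 0

bmap = ToyBlockMap(8)
bmap.allocate(2)
bmap.take_snapshot(1)          # S1 now also references block 2
bmap.release_from_active(2)    # active FS no longer uses block 2
print(bmap.is_free(2))         # still held by S1
bmap.delete_snapshot(1)
print(bmap.is_free(2))
```

The point of the model is that a block returns to the free pool only after both the active file system and every snapshot have dropped it.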
32
Consistent Image Propagation
[Diagram in two panels, each showing numbered consistent images (1 through 6) at the Source and the images propagated to the Destination]
Fast Network or Slow Modification Rate
Slow Network or High Modification Rate
33
Wave 5:
Local File Sharing and
Virtual Interface Architecture
34
ISPs: Scalable Services
Scalability
Scale compute power and storage independently
Resiliency
Cost
Commodity hardware and Open Systems standards
[Diagram: Internet or Intranet, Load Balancing Switch, Application Servers, Gigabit Switch, and File Servers (three F760 filers) inside the Data Center]
35
Database
Better Manageability
Offline backup with snapshots
Replication
Recovery from snapshots
Easy storage management
Equal or better performance
Less retuning
F760
36
Local File Sharing
Geographically constrained
1 or 2 machine rooms
Mostly homogeneous clients
Can be large or small
1 - 100 machines
Single administrative control
High performance applications
Web service, Cache
Email, News
Database, GIS
37
Local File Sharing Architecture Characteristics
Applications tend to avoid OS
e.g. No virtual memory
Applications tend to have OS adaptation layer
Different access protocol requirements
e.g. high-performance locking, recovery,
streaming
38
What is VI?
Virtual Interface (VI) Architecture
VI architecture organization
Promoted by Intel, Compaq and Microsoft
VI Developer’s Forum
Standard capabilities
Send/receive message, remote DMA read/write
Multiple channels with send/completion queues
Data transfer bypasses kernel
Memory pre-registration
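The capabilities listed above (pre-registered memory, queued descriptors, completions, remote DMA write) can be pictured with a toy model. This is not the VIPL API: every class and method name below is hypothetical, and a byte copy stands in for the NIC's DMA engine.

```python
from collections import deque

class ToyEndpoint:
    """Toy model of one side of a VI-style channel."""

    def __init__(self):
        self.registered = {}             # handle -> pre-registered buffer
        self.send_queue = deque()        # descriptors posted by the application
        self.completion_queue = deque()  # completions reported back to it
        self.peer = None

    def register_memory(self, handle: str, size: int) -> bytearray:
        self.registered[handle] = bytearray(size)
        return self.registered[handle]

    def post_rdma_write(self, local: str, remote: str, length: int) -> None:
        if local not in self.registered:
            raise KeyError("buffer must be registered before it is used")
        self.send_queue.append((local, remote, length))

    def nic_process(self) -> None:
        # Stand-in for the NIC: move data straight between registered
        # buffers, with no intermediate kernel copy in this model.
        while self.send_queue:
            local, remote, length = self.send_queue.popleft()
            src = self.registered[local]
            dst = self.peer.registered[remote]
            if length > len(src) or length > len(dst):
                raise ValueError("transfer exceeds a registered buffer")
            dst[:length] = src[:length]
            self.completion_queue.append((local, "done"))

a, b = ToyEndpoint(), ToyEndpoint()
a.peer, b.peer = b, a
tx = a.register_memory("tx", 8)
b.register_memory("rx", 8)
tx[:5] = b"hello"
a.post_rdma_write("tx", "rx", 5)
a.nic_process()
print(bytes(b.registered["rx"][:5]), a.completion_queue[0])
```

The model only captures the shape of the interaction: the application posts a descriptor and later sees a completion, while the data lands directly in a buffer the receiver registered in advance.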
39
VI Architecture
[Layer diagram with separate Data and Control paths]
User: Application, VIPL Library
Kernel: KVIPL client, KVIPL Module, VI compliant NIC driver
Hardware: VI compliant NIC
40
VI-compliant implementations
Fibre Channel (FC-VI draft standard)
e.g. Troika, Emulex
Giganet
ServerNet II
InfiniBand
Enables 1U MP heads
Future: VI over TCP/IP
41
How VI Improves Data Transfer
No fragmentation, reassembly and
realignment data copies
No user/kernel boundary crossing
No user/kernel data copies
Data transfer direct to application buffers
42
Direct Access File System
[Layer diagram with separate Data and Control paths]
User: Application Buffers, File Access API, DAFS, VIPL* API, VIPL
Kernel: VI NIC Driver
Hardware: NIC, Memory
* VI Provider Layer specification maintained by the VI Developers Forum
43
DAFS Benefits
File access protocol with implicit data sharing
Direct application access
File data transfers directly to application buffers
Bypasses Operating System
File semantics
Optimized for high throughput and low latency
Consistent high speed locking
Graceful recovery/failover of clients and servers
Fencing
Enhanced data recovery
Leverages VI for transport independence
44
DAFS vs. SAN
Protocols \ Wires   Direct (direct transfer to memory)   Network (TCP/IP)
Block               Local Attached, SAN                  SCSI over IP
File                DAFS                                 NAS
45
Summary
Wave 1: Filers
Technology: Fast networks, commodity
servers
Environment: Appliance-ization
Wave 2: Failover
Technology: Memory-to-memory
interconnects, dual-ported FC disks
Environment: 24x7 requirements
Wave 3: NetCache
Technology: Internet, HTTP
Environment: High BW requirements, POP
deployability
46
Summary
Wave 4: SnapMirror
Technology: Disk areal density, Fibre
Channel, fast networks
Environment: Cost of downtime for recovery
Wave 5: DAFS
Technology: VI architecture
Environment: Local file sharing
47