Boxwood: Distributed Data Structures as
Download
Report
Transcript Boxwood: Distributed Data Structures as
Boxwood: Distributed Data
Structures as Storage
Infrastructure
Lidong Zhou
Microsoft Research Silicon Valley
Team Members:
Chandu Thekkath, Marc Najork, Nick Murphy
Trends in Storage Systems: Distribution,
Virtualization, and Abstractions
The case for distributed storage architecture
Enables incremental expandability
Benefits performance and reliability
Virtualization facilitates management
Scalability, automatic reconfiguration, load
balancing, and fault tolerance
Abstractions for managing complexity
10/27/2003
Boxwood
2
Going Beyond Virtual Disks
Virtual disk provides a low-level abstraction
Advantages of higher-level abstractions
Complexity pushed to each client and duplicated
Limited “intelligence” due to lack of structural info.
Reduce client complexity and eliminate duplication
Exploit structural information for better load
balancing, pre-fetching, and caching
There is no universal abstraction
10/27/2003
Boxwood
3
Boxwood System Architecture
Supports multiple data abstractions
Space abstraction: Segments named by UIDs
Structure abstractions (B-Link Tree, Hash Table, Skip List)
Structure
Abstractions
B-Link Tree
Hash Table
UID
Space
10/27/2003
Boxwood
4
UID Space Abstraction
Segments in a flat UID space
UID names segments
Deallocated UIDs are never reused
Offloads address management from clients
Clients can provide “hints” (e.g., for co-location)
A virtualization layer
Distributed, fault tolerant, and incrementally
expandable
Performs simple capacity and load balancing
10/27/2003
Boxwood
5
UID Space: Design and Implementation
Client
Space
Clerk
B-Link
Tree
Hash
Table
Consensus
(Paxos)
UID
Space
Disks
10/27/2003
Boxwood
6
Higher-Level Structure Abstractions
UID Space emphasizes universality and simplicity
Structure abstractions more attractive for
sophisticated clients
Co-location, pre-fetching, and caching strategies supported
inside Boxwood
Better abstractions for databases and file systems
First Boxwood structure abstraction: B-Link trees
Supports tree creation/destruction, insert, delete, lookup,
and enumeration
A good abstraction with wide applicability
Enough complexity to expose fundamental issues
10/27/2003
Boxwood
7
B-Link Tree Layer
Built on the UID space
Highly concurrent BLink tree operations
with distributed locking
Logging/recovery to
ensure atomicity of BLink tree operations
10/27/2003
B-Link
Tree
B-Link
Tree
Distributed
Locking
UID Space
Boxwood
8
Current Status
UID space and B-Link tree modules
Distributed UID space with capacity balancing
B-Link tree algorithm with distributed locking
Logging/recovery for tolerating transient failures
Paxos consensus
On-going Work
Run-time verification of the B-Link tree algorithm
(Shaz Qadeer and Serdar Tasiran)
A distributed file system on Boxwood abstractions
10/27/2003
Boxwood
9
File System on Boxwood:
A High-Level Design
B-Link tree abstraction is ideal for directories
and meta-data
Files are implemented using both B-Link tree
abstraction and UID space abstraction
B-Link trees for i-node: mapping from file offsets
to (UID, offset)
UID space for actual user data store
10/27/2003
Boxwood
10
File System on Boxwood:
The Architecture
Client
Client
NFS/CIFS
File
Server
File
Server
B-Link
Tree
B-Link
Tree
Distributed
Locking
UID Space
10/27/2003
Boxwood
11
Future Work
Finish module implementation
load balancing
automatic reconfiguration
chained de-clustered disk
More abstractions and clients to explore
utility, generality, and flexibility of Boxwood
10/27/2003
Boxwood
12
Related Work
Distributed Storage/Operating Systems
Scalable Distributed Data Structures
Virtual/Logical disks
File systems
Database systems
Linear Hash Table (LH) and its variants (Litwin, 1980--present)
Scalable distributed hash table (Gribble, et al., 2000)
Highly concurrent B-tree (Lehman&Yao, 1981; Sagiv, 1986)
10/27/2003
Boxwood
13
Early Experience and Observations
Virtualized distributed storage infrastructure that
exports good abstractions is promising
Multiple layers of abstractions are beneficial
Distributed file system on Boxwood is straightforward
Manages complexity
Caters to the needs of a wide variety of clients
Use of a matching high-level abstraction is ideal
A low-level abstraction offers more flexibility, but requires
more bookkeeping
A good exercise to uncover fundamental
principles in scalable/reliable distributed systems
10/27/2003
Boxwood
14