Boxwood: Distributed Data Structures as

Download Report

Transcript Boxwood: Distributed Data Structures as

Boxwood: Distributed Data
Structures as Storage
Infrastructure
Lidong Zhou
Microsoft Research Silicon Valley
Team Members:
Chandu Thekkath, Marc Najork, Nick Murphy
Trends in Storage Systems: Distribution,
Virtualization, and Abstractions

The case for distributed storage architecture



Enables incremental expandability
Benefits performance and reliability
Virtualization facilitates management
Scalability, automatic reconfiguration, load
balancing, and fault tolerance

Abstractions for managing complexity
10/27/2003
Boxwood
2
Going Beyond Virtual Disks

Virtual disk provides a low-level abstraction



Advantages of higher-level abstractions



Complexity pushed to each client and duplicated
Limited “intelligence” due to lack of structural info.
Reduce client complexity and eliminate duplication
Exploit structural information for better load
balancing, pre-fetching, and caching
There is no universal abstraction
10/27/2003
Boxwood
3
Boxwood System Architecture
Supports multiple data abstractions


Space abstraction: Segments named by UIDs
Structure abstractions (B-Link Tree, Hash Table, Skip List)
Structure
Abstractions
B-Link Tree
Hash Table
UID
Space
10/27/2003
Boxwood
4
UID Space Abstraction

Segments in a flat UID space





UID names segments
Deallocated UIDs are never reused
Offloads address management from clients
Clients can provide “hints” (e.g., for co-location)
A virtualization layer


Distributed, fault tolerant, and incrementally
expandable
Performs simple capacity and load balancing
10/27/2003
Boxwood
5
UID Space: Design and Implementation
Client
Space
Clerk
B-Link
Tree
Hash
Table
Consensus
(Paxos)
UID
Space
Disks
10/27/2003
Boxwood
6
Higher-Level Structure Abstractions


UID Space emphasizes universality and simplicity
Structure abstractions more attractive for
sophisticated clients



Co-location, pre-fetching, and caching strategies supported
inside Boxwood
Better abstractions for databases and file systems
First Boxwood structure abstraction: B-Link trees



Supports tree creation/destruction, insert, delete, lookup,
and enumeration
A good abstraction with wide applicability
Enough complexity to expose fundamental issues
10/27/2003
Boxwood
7
B-Link Tree Layer



Built on the UID space
Highly concurrent BLink tree operations
with distributed locking
Logging/recovery to
ensure atomicity of BLink tree operations
10/27/2003
B-Link
Tree
B-Link
Tree
Distributed
Locking
UID Space
Boxwood
8
Current Status

UID space and B-Link tree modules





Distributed UID space with capacity balancing
B-Link tree algorithm with distributed locking
Logging/recovery for tolerating transient failures
Paxos consensus
On-going Work

Run-time verification of the B-Link tree algorithm
(Shaz Qadeer and Serdar Tasiran)

A distributed file system on Boxwood abstractions
10/27/2003
Boxwood
9
File System on Boxwood:
A High-Level Design


B-Link tree abstraction is ideal for directories
and meta-data
Files are implemented using both B-Link tree
abstraction and UID space abstraction


B-Link trees for i-node: mapping from file offsets
to (UID, offset)
UID space for actual user data store
10/27/2003
Boxwood
10
File System on Boxwood:
The Architecture
Client
Client
NFS/CIFS
File
Server
File
Server
B-Link
Tree
B-Link
Tree
Distributed
Locking
UID Space
10/27/2003
Boxwood
11
Future Work

Finish module implementation




load balancing
automatic reconfiguration
chained de-clustered disk
More abstractions and clients to explore
utility, generality, and flexibility of Boxwood
10/27/2003
Boxwood
12
Related Work

Distributed Storage/Operating Systems




Scalable Distributed Data Structures



Virtual/Logical disks
File systems
Database systems
Linear Hash Table (LH) and its variants (Litwin, 1980--present)
Scalable distributed hash table (Gribble, et al., 2000)
Highly concurrent B-tree (Lehman&Yao, 1981; Sagiv, 1986)
10/27/2003
Boxwood
13
Early Experience and Observations


Virtualized distributed storage infrastructure that
exports good abstractions is promising
Multiple layers of abstractions are beneficial



Distributed file system on Boxwood is straightforward



Manages complexity
Caters to the needs of a wide variety of clients
Use of a matching high-level abstraction is ideal
A low-level abstraction offers more flexibility, but requires
more bookkeeping
A good exercise to uncover fundamental
principles in scalable/reliable distributed systems
10/27/2003
Boxwood
14