ppt - CIS @ Temple University

Mariposa: a wide-area
distributed database
system
Kumar Ramdurgkar.
CIS 661
Mariposa Distributed Database
Management System
Principal Investigator: Prof. Michael Stonebraker
SECTION 1
 Introduction
to Mariposa
LAN Vs WAN databases
LAN database management is common and is
most often used in industries where the
data is local to the installation.
 LAN has a single RDBMS source.
 LAN is maintained by a well defined set of
rules, data types, and services.

The difference ?
WAN Databases
Many databases interconnected over a
WAN
 In WAN there are many sites participating
in the DBMS
 Different site administrators.
 Different data types, extensions
and service handling times.
 How do we interconnect ?
 What are the issues ?

Issues and problems
Network connections and traffic.
 Different ‘load’ handling capabilities and
service times.
 Different data type and extensions.
 A single program acting as a query
optimizer will NOT work

continued…
Issues and problems
Cost based optimization does not respond
well to site specific type extensions and
access constraints, charging algorithms
and time-of-day constraints.
 LAN algorithms do not scale properly to
suit a WAN DBMS.

The Solution…
An excellent idea ! MARIPOSA

UBID !! Have you been there ??

The Mariposa is a distributed
DBMS working on the economic
paradigm of Bidding.
Mariposa was proposed by:
Michael Stonebraker, Paul M. Aoki,
Witold Litwin, Avi Pfeffer, Adam Sah,
Jeff Sidell, Carl Staelin, Andrew Yu
Proposed: Nov 1994 Accepted: Sept 1995
Mariposa… vision
Standard approach for distributed data.
 A set of standard guidelines for WAN
databases.
 Application of query storage and
optimization using a different perspective.
 Scalability and data explosion handling.
 A query optimizer for the WWW ??


Need to formalize
WAN Guidelines for Mariposa
Scalability to a large number of
cooperating sites.
 Data mobility.
 No global synchronization of data.
 Total local autonomy and complete control.
 Easily configurable policies for changing
the behavior of Mariposa.

Mariposa System architecture
Microeconomic mechanisms.
 All Mariposa clients and servers have an
account with a network bank.
 A user allocates a budget in the currency
of this bank to each query.
 The goal of the query processing system
is to solve the query within the allotted
budget by contracting with various Mariposa
processing sites.

Mariposa Broker mechanism
Obtains bids for pieces of a query from sites.
 Uses a distributed advertising system
instead of the usual metadata mechanisms
used in LAN systems.
 The server that has advertised the best
time for the given query wins.

Scalability
Site can join Mariposa by buying ‘objects’
and advertising services
 Site can leave Mariposa by selling objects
and by ceasing to bid.
 Hence a highly scalable system.


In fact, the success of Mariposa depends
on a large number of sites participating in
the system.
Storage decisions
Objects have no notion of home.
 All secondary indices are moved with the
objects.
 Avoidance of global sync is simplified
because of the economic paradigm.
 Mariposa fosters data mobility and free
trade of objects
 Object here means ‘data’

Total local control
Since each Mariposa site is free to bid on
any business of interest, it has total local
autonomy.
 Each site is expected to maximize its
individual profit per unit of operating time
and to bid on those queries that it feels will
accomplish this goal.

Sounds good… any drawbacks ??
Some queries may not be solvable either
because nobody will bid on them or the
minimum bid exceeds what the client is
willing to pay.
 A site can refuse to give up objects
 A site may not find buyers for objects that
it wants to sell.

SECTION 2
Mariposa
architecture
Mariposa Architectural details
Hardware Flow chart
 Processes (bidding, bid protocols,
acceptance, finding bidders, sub–query
bidding, network bidding, splitting and
combining)
 Code languages (RUSH)
 Mariposa experiments and results
 Conclusions

Architecture overview





Client query in SQL3
Middleware consists
of several query
separators and query
brokers.
Broker and Bidder
coded in RUSH.
Local execution at
the site that wins the
bid.
Details…
Architecture details
Processes : Bidding
Each query Q has a budget B(t) that can
be used to solve the query
 The budget is a value the user gives to
solve this query.
 The broker receives the query plan for Q
and tries to get each fragment solved using
either the expensive bid protocol or a
cheaper purchase order protocol.
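As a minimal sketch of what a time-dependent budget B(t) might look like, the snippet below models a budget that pays less the longer the answer takes. The linear decay, the `make_budget` name, and its parameters are illustrative assumptions, not Mariposa's actual definition.

```python
from typing import Callable


def make_budget(max_price: float, deadline: float) -> Callable[[float], float]:
    """Return a hypothetical budget function B(t).

    Assumption: the user pays max_price for an instant answer and the
    price falls linearly to zero at the deadline.
    """
    def budget(t: float) -> float:
        if t < 0:
            raise ValueError("delay must be non-negative")
        if t >= deadline:
            return 0.0  # too late: the user pays nothing
        return max_price * (1.0 - t / deadline)
    return budget


B = make_budget(max_price=10.0, deadline=20.0)
print(B(0.0), B(5.0), B(20.0))
```

Any other non-increasing shape (a step function, for example) could be swapped in without changing how a broker would consult it.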

Processes : Bidding
Brokers split each query into subqueries
and solicit bids for each subquery
 There is a set sequence of sub query
execution.
 Finding the right winners is implemented in
a greedy algorithm at the broker.
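The slides do not spell out the greedy algorithm, so the following is only one plausible simplification: take the cheapest bid for each subquery, then check the total against the budget. It assumes subqueries run one after another, so costs and delays add up. The `pick_winners` name, the bid tuples, and the flat budget are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A bid is (site, cost, delay). All names and numbers are illustrative.
Bid = Tuple[str, float, float]


def pick_winners(
    bids: Dict[str, List[Bid]],
    budget_at: Callable[[float], float],
) -> Optional[Dict[str, Bid]]:
    """Greedily take the cheapest bid for each subquery.

    Returns None when a subquery has no bids or when even the cheapest
    plan costs more than the budget allows for its total delay.
    """
    winners: Dict[str, Bid] = {}
    for subquery, offers in bids.items():
        if not offers:
            return None  # nobody bid on this piece
        # Cheapest first; break ties by shorter delay.
        winners[subquery] = min(offers, key=lambda b: (b[1], b[2]))
    total_cost = sum(b[1] for b in winners.values())
    total_delay = sum(b[2] for b in winners.values())
    if total_cost > budget_at(total_delay):
        return None  # cheapest plan still exceeds what the user will pay
    return winners


bids = {
    "Q1": [("berkeley", 4.0, 2.0), ("san_diego", 3.0, 5.0)],
    "Q2": [("santa_barbara", 2.0, 1.0)],
}
flat_budget = lambda delay: 6.0  # hypothetical: pay up to 6 at any delay
winners = pick_winners(bids, flat_budget)
print(winners)
```

The two `None` cases correspond to an unsolvable query: either no site bids, or the minimum bids exceed what the client is willing to pay.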

Processes : Bid Protocols

The expensive bid protocol has 2 phases:
 The broker sends requests, and each bidder
sends back a triplet (Ci, Di, Ei) for subquery
Qi: the cost Ci, the delay Di, and the
expiration time Ei of the bid.
 The broker notifies winners (and losers).

The purchase order protocol is faster: the
Broker sends the query directly to the site
where it is most likely to be processed.
There is a risk that the query might not be
processed in the given time.
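The second phase of the expensive bid protocol can be sketched as follows. The `Bid` class mirrors the (Ci, Di, Ei) triplet. The rule that the cheapest unexpired bid wins, and every name in the code, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Bid:
    site: str
    cost: float        # Ci: price asked for subquery Qi
    delay: float       # Di: promised time to deliver the result
    expires_at: float  # Ei: time after which the bid is void


def run_bid_protocol(bids: List[Bid], now: float) -> Tuple[Optional[Bid], List[Bid]]:
    """Phase 1 has already gathered `bids`; phase 2 picks a winner.

    Assumption: the winner is simply the cheapest unexpired bid.
    Returns (winner, losers) so the caller can notify both groups.
    """
    live = [b for b in bids if b.expires_at > now]
    if not live:
        return None, list(bids)
    winner = min(live, key=lambda b: b.cost)
    losers = [b for b in bids if b is not winner]
    return winner, losers


bids = [
    Bid("site_a", cost=5.0, delay=3.0, expires_at=100.0),
    Bid("site_b", cost=4.0, delay=6.0, expires_at=50.0),
    Bid("site_c", cost=2.0, delay=9.0, expires_at=10.0),  # expired at now=20
]
winner, losers = run_bid_protocol(bids, now=20.0)
print(winner.site, [b.site for b in losers])
```

A purchase order would skip this exchange entirely and send the query straight to one site, trading the round trip for the risk that the site does not deliver in time.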
Finding Bidders
Brokers examine ‘Ad Tables’ to find out the
servers that are willing to perform the task
at hand.
 Using records in an Ad Table the server
posts its bids.
 Ad tables typically have the bidding
information for the sample query
structures run on that server.

Sample Ad Table design

Not all fields might be used
Bidding strategies
Bulk purchase contracts allowing lower
than normal bids (wholesale)
 Coupons
 Sale
 Broker intelligence (remember last
successful bid history and try that site
query combination again)

Processes: Network Bidding
Account for network bandwidth.
 Data size comes into the consideration.
 Minimum available bandwidth is calculated
from node to node.
 This bandwidth must be reserved to
achieve desired performance.
 Mariposa uses the Tenet protocols RTIP and
RCAP for network bidding.

Coding (RUSH language)
Mariposa provides a low level, very
efficient embedded scripting language and
rule system called Rush
 Using Rush, it is straightforward to change
policy decisions; one simply modifies the
rules by which these modules are
implemented.
 The Mariposa architecture is primarily
coded in Rush.

SECTION 3
Mariposa
experiments and
results
Operational system
Mariposa is operational on Digital Equipment
Corp. Alpha AXP workstations at UC
Berkeley.
 The basic server engine is that of
POSTGRES.
 Implementation of the Rush language itself
has required careful design and
performance engineering.
 Requires a multithreaded network
communication package.

Experiment setup
Workstations connected by 10 Mb/s
Ethernet
 WAN experiments conducted at night.
 The benchmark database consists of three
tables, R1, R2 and R3.
 The workload query is an equijoin of all
three tables:

SELECT * FROM R1, R2, R3
WHERE R1.u1 = R2.u1
AND R2.u1 = R3.u1

In the wide area case, the query originates at
Berkeley and performs the join over the WAN
connecting UC Berkeley, UC Santa Barbara and
UC San Diego.
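To see what the workload query computes, it can be run locally on toy data with Python's built-in `sqlite3` module. This is a single-machine illustration only: the `payload` column and the sample rows are assumptions, and nothing here reproduces the distributed execution or the benchmark data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Three toy tables sharing the join column u1.
for name in ("R1", "R2", "R3"):
    cur.execute(f"CREATE TABLE {name} (u1 INTEGER, payload TEXT)")
cur.executemany("INSERT INTO R1 VALUES (?, ?)", [(1, "a"), (2, "b")])
cur.executemany("INSERT INTO R2 VALUES (?, ?)", [(1, "c"), (3, "d")])
cur.executemany("INSERT INTO R3 VALUES (?, ?)", [(1, "e"), (2, "f")])

# The equijoin from the slide, unchanged.
rows = cur.execute(
    "SELECT * FROM R1, R2, R3 "
    "WHERE R1.u1 = R2.u1 AND R2.u1 = R3.u1"
).fetchall()
print(rows)
conn.close()
```

Only rows whose `u1` value appears in all three tables survive the join.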
Timing Results
Conclusions
Mariposa is a prototype data management
system that unifies the best features of
distributed operating system and
distributed database management system
research.
 Distributed query optimization has been
identified as an area that will receive a
strong emphasis and we will also examine
how to build a system that has a rule
system at its core.

Conclusions

Future work remains in the areas of
system robustness, distributed failure
recovery, and performance assessment.
References

Mariposa home
http://s2k-ftp.cs.berkeley.edu:8000/mariposa/index.html
Thank you.