Administrivia - Andrew.cmu.edu
Information Resources
Management
April 17, 2001
Agenda
Administrivia
Database Architectures
Administrivia
Homework #8
Database Architectures
Centralized
Client-Server
Parallel - single site
Distributed - multiple sites
Database Architectures
[Diagram: Centralized, Client-Server, Distributed (Parallel); labels: Function, Data]
Centralized
PC, Mini, or Mainframe
Single Database
Single Database Manager
One or More Users
Data and Function in One Place
Client-Server
PCs to Mainframes to Minis
PC to PC
Mainframe to Mainframe
Use Desktop Processing Power
Better User Interface
Greater Functionality
Retain Centralized Control of Data
Client-Server: Basic Model
[Diagram: several clients send a Request to one Server and receive a Result]
Servers
Supercomputer
Mainframe
Mini
PC Server
All retain all data
Client-Server Architecture
[Diagram: Client (Front-End) and Server (Back-End), with clients ranging from Thin to Fat; labels: Data, Function]
Functionality
Presentation
I/O Processing
Validation
Business Rules
Application Logic
Data Management
Validation
Error Handling
“Thin” Client
Presentation Services Only
Accept Input
Format Output
Display
Server does all processing
“Fat” Client
Presentation
Validation
Application Logic - Programs
Data Management
Send SQL to Server
Server is just DBMS
“In Between” Client
Client
Presentation
Some Application Logic
Server
Some Application Logic
Data Management and Services
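The difference between a thin and a fat client is where validation and application logic run. The sketch below uses Python's standard `sqlite3` module as a stand-in for the server's DBMS; the `Server` class, its `place_order` and `execute_sql` methods, and the `orders` table are all hypothetical. The thin client only forwards input and displays the answer; the fat client validates locally and sends SQL, so the server is just the DBMS. An "in between" client would split the validation and logic across both.

```python
import sqlite3


class Server:
    """Hypothetical back end; an in-memory SQLite database stands in for the DBMS."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE orders (item TEXT, qty INTEGER)")

    def place_order(self, item: str, qty_text: str) -> str:
        # Thin-client style: validation, business rules, and data management all run here.
        if not qty_text.isdigit() or int(qty_text) == 0:
            return "error: quantity must be a positive integer"
        self.db.execute("INSERT INTO orders VALUES (?, ?)", (item, int(qty_text)))
        self.db.commit()
        return "ok"

    def execute_sql(self, sql: str, params: tuple = ()) -> list:
        # Fat-client style: the server is just the DBMS and runs the SQL it is sent.
        rows = self.db.execute(sql, params).fetchall()
        self.db.commit()
        return rows


def thin_client(server: Server, item: str, qty_text: str) -> str:
    # Presentation only: accept input, forward it, display the server's answer.
    return server.place_order(item, qty_text)


def fat_client(server: Server, item: str, qty_text: str) -> str:
    # Presentation, validation, and application logic run on the client.
    if not qty_text.isdigit() or int(qty_text) == 0:
        return "error: quantity must be a positive integer"
    server.execute_sql("INSERT INTO orders VALUES (?, ?)", (item, int(qty_text)))
    return "ok"


server = Server()
print(thin_client(server, "book", "2"))  # ok
print(fat_client(server, "pen", "0"))    # error: quantity must be a positive integer
print(server.execute_sql("SELECT item, qty FROM orders"))  # [('book', 2)]
```

Both clients end up with the same data in the database; what changes is how much work each side does and how much logic has to be redistributed when the application is updated.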
Benefits of Client-Server
Use Local Processing Power
Better User Interface
Some Functionality if System Down
Use Sunk Costs of PCs
Support Reengineering
Support Intranets
Flexibility, Scalability, Customizability
Challenges of Client-Server
Cost of (Upgraded) PCs
Network Reliance
Distributing Application Updates
Management of Complex System
Problem Identification & Resolution
Application Partitioning
Other Client-Server Architectures
Traditional is Two-Tiered (client-server)
Three-Tiered
Client-Application Server-DB Server
(PC - Mini - Mainframe)
(PC - PC Server - Mainframe)
Beyond Three
PC - PC Server - Web Server - Mini - Mainframe
Client-Server vs. Distributed
Client-Server: Application Distribution
Distributed: Data Distribution
Often, “client-server” is used to refer to either application distribution, data distribution, or both.
Middleware
What if
Multiple databases (sources) need to be accessed from a single client?
Different kinds of clients?
Mix of clients and servers?
Want to take advantage of the existing base of applications (legacy systems)?
Middleware
Fat Clients just send SQL transactions
Other types of transactions may be needed, depending on the server (system)
Middleware
Software that shields applications from the complexity of the operating environment.
[Diagram: several clients connect through middleware to multiple (legacy) systems]
Types of Middleware
Transaction Processing (TP) Monitor
Database Middleware
Remote Procedure Call (RPC)
Message-Oriented Middleware (MOM)
Object-Request Brokers
(CORBA - ORB)
TP Monitor
Synchronous - sender must wait
Queuing
Message Delivery
Insured Delivery
Either Direction
Database Middleware
Variety of Clients/Platforms
Variety of Servers/DBMSs/Platforms
Specific to DB transactions (SQL)
Message-Oriented Middleware (MOM)
Asynchronous - clients do not wait
Queues & Queue Management/Recovery
Message Delivery
Insured Delivery
Either Direction
(like email or EDI, but for transactions only)
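The key contrast between a TP monitor and MOM is whether the sender waits. The sketch below is a minimal, single-process illustration under stated assumptions: `server_handle` is a hypothetical stand-in for the receiving system, and Python's standard `queue.Queue` stands in for a message queue. Unlike real MOM products, it does not persist the queue, so it shows none of the recovery or insured-delivery behavior listed above.

```python
import queue


def server_handle(message: str) -> str:
    """Hypothetical processing of one transaction message by the receiving system."""
    return f"processed {message}"


def synchronous_send(message: str) -> str:
    # Synchronous (TP-monitor style): the sender blocks until the reply comes back.
    return server_handle(message)


class MessageQueue:
    """Asynchronous (MOM style): messages wait in a queue until the receiver is ready."""

    def __init__(self) -> None:
        self._q: "queue.Queue[str]" = queue.Queue()

    def send(self, message: str) -> None:
        # The sender returns immediately; delivery happens later.
        self._q.put(message)

    def deliver_all(self) -> list[str]:
        # The receiver drains the queue whenever it is available.
        results = []
        while not self._q.empty():
            results.append(server_handle(self._q.get()))
        return results


print(synchronous_send("T1"))  # processed T1
mq = MessageQueue()
mq.send("T2")                  # sender continues without waiting for a reply
mq.send("T3")
print(mq.deliver_all())        # ['processed T2', 'processed T3']
```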
Advantages of Middleware
Leverage sunk costs (legacy systems)
Reduce development cost
Reduce development time
Increase responsiveness
Improve overall systems management
Consolidate diffuse information
Challenges of Middleware
Cost
Session management - Transaction state
Security
Network reliance
Diversity of systems - lack of standards
Constant technology change
Availability of talent
Middleware Management
Parallel and Distributed
Client-Server is an attempt to improve performance
Reduce time to execute a transaction
Parallel
Reduce time to get the data
Distributed
Parallel Systems
Single site for data
Very Large databases
Operations performed simultaneously
Parallel Database Architectures
Shared Memory
Shared Disk
Shared Nothing
Hierarchical
Shared Memory
[Diagram: several processors (P) attached to one shared memory (M)]
Shared Memory
Advantages
Extremely efficient communications
Disadvantages
Max of 32/64 processors
Bus becomes bottleneck
Shared Disk
[Diagram: three processor (P) / memory (M) pairs sharing the same disks]
Shared Disk
Advantages
No bus bottleneck
Fault tolerance provided
Disadvantages
Disk access becomes bottleneck
Shared Nothing
[Diagram: three processor (P) / memory (M) nodes, each with its own disk, sharing nothing]
Shared Nothing
Advantages
No disk bottleneck
Highly scalable
Disadvantages
High communication overhead/cost
Between processors
To another processor’s data
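In a shared-nothing system each node owns one partition of the data, so a query runs in parallel on the local partitions and only the partial results travel between processors. The sketch below assumes hash partitioning on a hypothetical `emp_id` key across a hypothetical `NUM_NODES` nodes, and simulates the nodes as plain lists in one process; `zlib.crc32` is used because Python's built-in string hash is randomized between runs.

```python
from collections import Counter
import zlib

NUM_NODES = 3  # hypothetical number of processor/memory/disk nodes


def node_for(key: str) -> int:
    # Deterministic hash partitioning: each key is owned by exactly one node.
    return zlib.crc32(key.encode()) % NUM_NODES


def partition(rows):
    nodes = [[] for _ in range(NUM_NODES)]
    for row in rows:
        nodes[node_for(row["emp_id"])].append(row)
    return nodes


def local_count_by_dept(node_rows):
    # Runs on each node using only that node's own data.
    return Counter(row["dept"] for row in node_rows)


def parallel_count_by_dept(rows):
    partials = [local_count_by_dept(n) for n in partition(rows)]
    # Combining partial results is the step that needs communication between processors.
    total = Counter()
    for p in partials:
        total.update(p)
    return dict(total)


rows = [
    {"emp_id": "e1", "dept": "IT"},
    {"emp_id": "e2", "dept": "HR"},
    {"emp_id": "e3", "dept": "IT"},
]
print(sorted(parallel_count_by_dept(rows).items()))  # [('HR', 1), ('IT', 2)]
```

Only small per-node counts are exchanged here; a join on a column other than the partitioning key would instead have to ship rows to another processor, which is where the communication cost listed above comes from.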
Hierarchical
[Diagram: groups of processors (P) sharing a memory (M), with the groups connected to one another]
Hierarchical
Advantages
Best of all worlds
Disadvantages
Worst of all worlds
Some high communication overhead/cost
Between subsystems
Complexity
Distributed Databases
Client-Server - distribute functionality
What about distributing data?
Distributed Databases
Overview
Distributed Storage
Distributed Queries
Distributed Transactions
Multidatabase (Middleware)
Distributed Databases
Multiple locations
Single logical database
Several physical databases
Network connections
Advantages
Sharing across locations
Local control
Availability
Challenges
Development costs
People & Equipment
Testing
Problem identification & resolution
Technical expertise
Network dependence
Increased processing overhead
Distributed Data Storage
Replication
Fragmentation
Both
Replication
Data is repeated
Spectrum of options available
Temporary replication of specific rows
Replicate infrequently changed data
Replicate by site
Central site holds all data; each local site holds only its own data
Full replication
Everything everywhere
Concerns with Replication
Availability needed
Amount of parallelism in reads
Overhead of updates
Keeping replicas updated
Conflicting updates
Fragmentation
Partitioning
Divide data into subsets based on need
Have to be able to pull fragments back together to get the original tables
Fragmentation
Horizontal
by rows
specified conditions
Vertical
by column
each requires the primary key (or a created key)
Mixed
by row and column
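The requirement that fragments can be pulled back together determines how each kind is rebuilt: horizontal fragments by a union of rows, vertical fragments by a join on the key that every fragment carries. The sketch below uses a hypothetical `EMPLOYEE` table with `emp_id` as its primary key; the helper names are illustrative, and the horizontal conditions are assumed to cover every row exactly once.

```python
# Hypothetical Employee table; emp_id is the primary key.
EMPLOYEE = [
    {"emp_id": 1, "name": "Ames", "city": "Pgh", "salary": 50000},
    {"emp_id": 2, "name": "Baker", "city": "NY", "salary": 65000},
    {"emp_id": 3, "name": "Cole", "city": "Pgh", "salary": 58000},
]


def horizontal(rows, condition):
    # Horizontal fragment: the rows that satisfy a specified condition.
    return [r for r in rows if condition(r)]


def vertical(rows, columns, key="emp_id"):
    # Vertical fragment: a subset of columns, always including the key.
    cols = [key] + [c for c in columns if c != key]
    return [{c: r[c] for c in cols} for r in rows]


def rebuild_horizontal(*fragments):
    # Union of disjoint, complete row fragments restores the table.
    return [r for frag in fragments for r in frag]


def rebuild_vertical(left, right, key="emp_id"):
    # Joining the column fragments on the key restores full rows.
    by_key = {r[key]: r for r in right}
    return [{**r, **by_key[r[key]]} for r in left]


pgh = horizontal(EMPLOYEE, lambda r: r["city"] == "Pgh")
others = horizontal(EMPLOYEE, lambda r: r["city"] != "Pgh")
names = vertical(EMPLOYEE, ["name", "city"])
pay = vertical(EMPLOYEE, ["salary"])
print(rebuild_vertical(names, pay) == EMPLOYEE)  # True
```

Without the key in every vertical fragment, `rebuild_vertical` would have nothing to match on, which is why the key (or a created one) is required.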
Fragmentation & Replication
Repeat as necessary:
Replicate fragments
Fragment replicas
Don’t lose track of what you have and where it is!
Network Transparency
Distributing data should not require that the user know where or how it’s been distributed.
The database should be seen as a single entity no matter how fragmented and replicated it becomes.
Network Transparency
Some DBMSs are starting to provide this level of functionality, so transparency exists even at the program level, but in many cases this “transparency” must be programmed into the applications.
It must always be designed into the database.
Distributed Queries
How do you query data that is everywhere?
Efficiency vs. Overhead
Splitting the query apart
Keeping track of the data/locations
Making sure everything gets executed
Putting the results back together
Generating network traffic
Handling partial results
Distributed Queries
Full replication can avoid the overhead
Huge increase in update overhead
Parallel execution no longer possible
Additional costs of replication
Example
5 sites - NY, Pgh, Chicago, Dallas, Los Angeles
Data fragmented by site - no replication
Query (in Pgh), for the highest salary and who earns it:
SELECT Name, Salary FROM Employee
WHERE Salary = (SELECT MAX(Salary) FROM Employee)
Option 1 - High Bandwidth
1. Have all sites send their full employee tables to Pgh.
2. Build a temporary employee table.
3. Run the query against this table.
Option 2 - Not So High Bandwidth
1. Examine the query and determine that it can be run separately at each location and the results combined.
2. Submit just the query to each location.
3. Wait for the results from each city.
4. As results return, build a temporary table (5 rows only).
5. Find the max using the temporary table.
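The two options in the Pgh example differ mainly in how many rows cross the network. The sketch below simulates the five sites with hypothetical employee data (names and salaries are invented for illustration) and counts the rows each option moves to Pgh; for simplicity it also counts Pgh's own rows and ignores salary ties.

```python
# Hypothetical Employee fragments, one per site (no replication).
SITES = {
    "NY": [("Ames", 72000), ("Baker", 61000)],
    "Pgh": [("Cole", 58000), ("Diaz", 66000)],
    "Chicago": [("Ebert", 70000)],
    "Dallas": [("Fox", 54000), ("Gray", 75000)],
    "Los Angeles": [("Hale", 69000)],
}


def salary(row):
    return row[1]


def option1_ship_tables(sites):
    # Every site ships its full table; Pgh runs the query on the combined temporary table.
    temp = [row for rows in sites.values() for row in rows]
    return max(temp, key=salary), len(temp)


def option2_ship_query(sites):
    # Pgh ships only the query; each site returns its own highest-paid row.
    temp = [max(rows, key=salary) for rows in sites.values()]
    return max(temp, key=salary), len(temp)  # one row per site


print(option1_ship_tables(SITES))  # (('Gray', 75000), 8)
print(option2_ship_query(SITES))   # (('Gray', 75000), 5)
```

Both options return the same answer; Option 2 moves one row per site regardless of table size, while Option 1 grows with the size of every site's table.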
Distributed Transactions
Transaction Types
Coordinators
Commit Protocols
Concurrency Controls
Deadlocks
Transaction Types
Local - transaction only needs local data
Global - transaction uses non-local data
My global transaction becomes someone else’s local transaction
Either type of transaction must still have the ACID properties - global transactions are the concern
System Structure
Things to do:
1. Process local transactions
(transaction manager)
2. Process and track global transactions
(transaction coordinator)
Global Processing
1. Recognize as global
2. Break up transaction
3. Distribute pieces
4. Assemble results
5. Coordinate termination
6. Handle problems
Coordinator of Coordinators
Coordinate among sites
Detect problems
Attempt to fix
Share status with others
Coordinator Failure
Backup Coordinator
receives all messages - maintains state
monitors coordinator
automatically takes over if coordinator is down
avoids delays - increases overhead
Election
site with the highest pre-assigned number becomes coordinator
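The election rule reduces to "the live site with the highest pre-assigned number wins", which is the outcome of a bully-style election. The sketch below shows only that outcome: the function name is hypothetical, and `is_alive` stands in for the message exchange a real election uses to discover which sites still respond.

```python
def elect_coordinator(site_ids, is_alive):
    """Pick the live site with the highest pre-assigned number.

    site_ids: integer site numbers (hypothetical numbering).
    is_alive: callable reporting whether a site currently responds.
    Returns None if no site can be reached.
    """
    live = [s for s in site_ids if is_alive(s)]
    return max(live) if live else None


down = {5}  # the old coordinator, site 5, has failed
print(elect_coordinator([1, 2, 3, 4, 5], lambda s: s not in down))  # 4
```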
Commit Protocols
Two-Phase
Three-Phase
All sites must commit or all sites have to roll back
Replicated data only
Two-Phase Commit
Phase 1
Send PREPARE to all sites
Sites respond READY or ABORT
Phase 2
If all sites READY,
COMMIT locally - Send COMMITs
If any site is not READY or time expires,
ROLLBACK locally - Send ROLLBACKs
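The two phases above can be sketched as follows. This is a single-process illustration under stated assumptions: `Site` and `two_phase_commit` are hypothetical names, a site with `responds=False` simulates a timeout, and logs are plain lists rather than forced-to-disk records. The coordinator decides COMMIT only if every vote is READY, records the decision locally, then sends it to all sites.

```python
from enum import Enum


class Vote(Enum):
    READY = "READY"
    ABORT = "ABORT"


class Site:
    """Hypothetical participant; can_commit simulates whether its local work succeeded."""

    def __init__(self, name, can_commit=True, responds=True):
        self.name = name
        self.can_commit = can_commit
        self.responds = responds
        self.log = []

    def prepare(self):
        # Phase 1: vote on PREPARE. No response models a timeout.
        if not self.responds:
            return None
        vote = Vote.READY if self.can_commit else Vote.ABORT
        self.log.append(vote.value)
        return vote

    def finish(self, decision):
        # Phase 2: record and apply the coordinator's decision.
        self.log.append(decision)


def two_phase_commit(coordinator_log, sites):
    # Phase 1: send PREPARE to all sites and collect the votes.
    votes = [site.prepare() for site in sites]
    # Phase 2: COMMIT only if every site said READY; ABORT or timeout means ROLLBACK.
    decision = "COMMIT" if all(v is Vote.READY for v in votes) else "ROLLBACK"
    coordinator_log.append(decision)  # decide locally first
    for site in sites:
        # Send the decision to all sites (an unreachable site would get it on recovery).
        site.finish(decision)
    return decision


log = []
print(two_phase_commit(log, [Site("NY"), Site("Pgh"), Site("Dallas")]))  # COMMIT
print(two_phase_commit([], [Site("NY"), Site("Pgh", can_commit=False)]))  # ROLLBACK
```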
Two-Phase Commit
[Diagram: coordinator and three sites; a site requests commit]
Two-Phase Commit Phase 1
[Diagram: coordinator sends PREPARE to all sites]
Two-Phase Commit Phase 1
[Diagram: sites respond READY]
Two-Phase Commit Phase 2
[Diagram: coordinator COMMITs locally]
Two-Phase Commit Phase 2
[Diagram: coordinator sends COMMIT to all sites]
Two-Phase Commit Phase 1
[Diagram: a site responds ABORT or does not respond]
Two-Phase Commit Phase 2
[Diagram: coordinator ROLLBACKs locally]
Two-Phase Commit Phase 2
[Diagram: coordinator sends ROLLBACK to all sites]
Site Failure - Recovery
If COMMIT or ROLLBACK is logged, REDO or UNDO as normal
If READY only
Check with coordinator or other sites
COMMIT or ROLLBACK as they report
If no one can be reached, wait and retry (the site cannot decide alone)
Coordinator Failure
Ask the sites
If one has COMMIT, then REDO
If one has ROLLBACK, then UNDO
If one doesn’t have READY, UNDO
If all READY only
Coordinator must decide
Sites must wait and locks are held
“Blocking” occurs
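The coordinator-failure rules form a small decision table over the states the surviving sites report. The sketch below encodes it directly; the function name and the string encoding of site states (including `"NONE"` for a site that never recorded READY) are hypothetical.

```python
def decide_after_coordinator_failure(site_states):
    """Apply the coordinator-failure rules to the states reported by the sites.

    site_states: strings, each "COMMIT", "ROLLBACK", "READY", or "NONE"
    (the site never recorded READY). The encoding is hypothetical.
    """
    if "COMMIT" in site_states:
        return "REDO"   # the decision was COMMIT, so every site commits
    if "ROLLBACK" in site_states:
        return "UNDO"   # the decision was ROLLBACK
    if any(state != "READY" for state in site_states):
        return "UNDO"   # the coordinator cannot have decided COMMIT
    return "BLOCK"      # all READY: wait for the coordinator while holding locks


print(decide_after_coordinator_failure(["READY", "READY", "READY"]))  # BLOCK
```

The final case is the blocking problem: every site is READY, so none of them knows what the coordinator decided, and none may decide alone.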
Three-Phase Commit
Phase 1
Send PREPARE
Sites respond READY or ABORT
Phase 2
If all sites READY, send PRECOMMIT
Else, ROLLBACK
Sites must ACKNOWLEDGE
Phase 3
If at least K sites ACKNOWLEDGE, send COMMIT
Coordinator Failure
Three-Phase Commit prevents blocking
If coordinator fails
New coordinator is selected
Sites queried to determine status
New coordinator resumes
Network Partitioning
Network split creates two separate networks
Each “half” selects a coordinator
Coordinators make independent decisions
Result could be different decisions
Resolution of the network problem may create a need to resolve database problems
Concurrency Control
Single Lock Manager
Multiple Lock Managers
Single Lock Manager
One site for all locking
All other sites must go to it
Can read from anywhere
Updates must be to all copies
Advantages: Simple, Easy deadlock detection
Disadvantages: Bottleneck, Vulnerability
Simple Multiple Lock Mgrs
Each site locks a unique partition of the
data
non-replicated data
Advantages: Fairly simple, reduced bottlenecks
Disadvantages: Complicated deadlock detection
Majority Protocol
Each site locks its own data
replication possible
Request owner for lock on data that isn’t local
When there are multiple owners (n copies), a majority, floor(n/2) + 1, must grant the lock
Advantages: No bottlenecks
Disadvantages: More messages sent, Complicated deadlock detection, More deadlocks (e.g., two transactions each lock half of the copies)
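The majority rule and its extra-deadlock risk can be seen with four copies. In the sketch below, `ReplicaLockManager`, `majority`, and `try_lock` are hypothetical names, and each site grants a simple exclusive lock on its copy. When T1 reaches two sites first and T2 reaches the other two, each holds half the copies, neither reaches floor(4/2) + 1 = 3, and each waits on the other.

```python
class ReplicaLockManager:
    """Hypothetical lock manager at one site, holding one copy of a data item."""

    def __init__(self):
        self.holder = None  # transaction currently holding the lock, if any

    def request(self, txn):
        if self.holder is None:
            self.holder = txn
            return True
        return self.holder == txn


def majority(n):
    # A majority of n copies: floor(n/2) + 1.
    return n // 2 + 1


def try_lock(txn, replica_managers):
    # The lock is held only once a majority of the copies' managers grant it.
    granted = sum(1 for m in replica_managers if m.request(txn))
    return granted >= majority(len(replica_managers))


replicas = [ReplicaLockManager() for _ in range(4)]
# Interleaved requests: T1 reaches two sites first, T2 reaches the other two.
for m in replicas[:2]:
    m.request("T1")
for m in replicas[2:]:
    m.request("T2")
print(try_lock("T1", replicas), try_lock("T2", replicas))  # False False
```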
Biased Protocol
Reduced form of Majority Protocol
For a READ, only need any single lock
For a WRITE, need all locks
Advantages: No bottlenecks, Reduced traffic
Disadvantages: Update traffic, Deadlocks
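The biased protocol favors reads: one shared lock on any copy suffices, while a write needs exclusive locks on every copy. The sketch below is a minimal illustration with hypothetical names; on a failed write it simply keeps whatever locks it already obtained, as a transaction waiting for the rest would, and it does not model waiting or release.

```python
class BiasedLockManager:
    """Hypothetical per-copy lock manager with shared (read) and exclusive (write) locks."""

    def __init__(self):
        self.readers = set()
        self.writer = None

    def lock_shared(self, txn):
        if self.writer in (None, txn):
            self.readers.add(txn)
            return True
        return False

    def lock_exclusive(self, txn):
        # Granted only if no other transaction reads or writes this copy.
        if self.writer in (None, txn) and self.readers <= {txn}:
            self.writer = txn
            return True
        return False


def read_lock(txn, replicas):
    # READ: a shared lock on any single copy is enough.
    return any(m.lock_shared(txn) for m in replicas)


def write_lock(txn, replicas):
    # WRITE: an exclusive lock is needed on every copy.
    return all(m.lock_exclusive(txn) for m in replicas)


copies = [BiasedLockManager() for _ in range(3)]
print(read_lock("T1", copies))   # True: one copy is enough
print(write_lock("T2", copies))  # False: T1's read lock blocks the write
print(read_lock("T3", copies))   # True: readers do not block each other
```

This is the trade-off on the slide: reads touch one site, but every update must reach and lock all copies.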
Primary Copy
Site designated to hold “primary” copy
Multiple sites
Replicated Data
All locks through that site
Advantages: Fairly simple, reduced bottlenecks
Disadvantages: Vulnerability, Complicated deadlock detection
Other Than Locking
Timestamps
Centralized generation
Local generation
Timestamp tests determine ability to read or write
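One common way to generate timestamps locally is to pair a per-site counter with a unique site number, so timestamps from different sites never collide. The tests below follow the basic timestamp-ordering protocol, which is an assumption about which tests the slide intends. A read fails if a younger transaction (larger timestamp) already wrote the item; a write fails if a younger transaction already read or wrote it. A rejected transaction would be rolled back and restarted with a new timestamp. All class names are hypothetical.

```python
import itertools


class SiteClock:
    """Local generation: a per-site counter paired with a unique site number."""

    def __init__(self, site_id):
        self.site_id = site_id
        self._counter = itertools.count(1)

    def next_timestamp(self):
        # Tuples compare the counter first, then the site number, so ties cannot occur.
        return (next(self._counter), self.site_id)


class DataItem:
    def __init__(self):
        self.read_ts = (0, 0)   # largest timestamp that has read this item
        self.write_ts = (0, 0)  # largest timestamp that has written this item

    def read(self, ts):
        # Reject if a younger transaction already wrote the item.
        if ts < self.write_ts:
            return False
        self.read_ts = max(self.read_ts, ts)
        return True

    def write(self, ts):
        # Reject if a younger transaction already read or wrote the item.
        if ts < self.read_ts or ts < self.write_ts:
            return False
        self.write_ts = ts
        return True


pgh, ny = SiteClock(1), SiteClock(2)
older, younger = pgh.next_timestamp(), ny.next_timestamp()  # (1, 1) and (1, 2)
x = DataItem()
print(x.write(younger), x.read(older))  # True False
```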
Deadlocks & Distributed Data
Centralized
One Site
Distributed
Centralized - same advantages and disadvantages as other centralized control (database or locking)
Distributed Deadlock
Detection
Each site tracks all transactions accessing its own data
Dummy transaction for transactions that originated here but are executing elsewhere
If a deadlock is found that includes the dummy transaction
Must send deadlock information to other sites
They check for deadlock
May have to pass on to another site
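Each site can keep a local wait-for graph (an edge T1 → T2 means T1 waits for T2) plus a dummy node standing in for the rest of the system. The modeling below is an assumption in the spirit of the slide: `T → EXT` means T waits for something at another site, and `EXT → T` means a remote transaction may be waiting on T. A local cycle through `EXT` only signals a possible deadlock, so the graph is sent on; once two sites' graphs are merged and `EXT` is dropped, a remaining cycle is a real deadlock. All names and the two-site scenario are hypothetical.

```python
EXT = "EXT"  # dummy node standing in for transactions at other sites


def find_cycle(graph):
    """Return one cycle in a wait-for graph {txn: set of txns it waits for}, or None."""
    visiting, done, path = set(), set(), []

    def dfs(node):
        visiting.add(node)
        path.append(node)
        for nxt in graph.get(node, ()):
            if nxt in visiting:
                return path[path.index(nxt):]
            if nxt not in done:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        visiting.discard(node)
        done.add(node)
        path.pop()
        return None

    for start in list(graph):
        if start not in done:
            cycle = dfs(start)
            if cycle:
                return cycle
    return None


def merge(*graphs):
    # The receiving site combines its own graph with the one sent to it.
    combined = {}
    for g in graphs:
        for txn, waits_for in g.items():
            combined.setdefault(txn, set()).update(waits_for)
    return combined


def drop_ext(graph):
    # Remove the dummy node so that only real wait-for edges remain.
    return {t: {w for w in ws if w != EXT} for t, ws in graph.items() if t != EXT}


# Site A: T2 waits for T1 here; T1 waits elsewhere; someone elsewhere waits for T2.
site_a = {EXT: {"T2"}, "T2": {"T1"}, "T1": {EXT}}
# Site B: T1 waits for T2 here; T2 waits elsewhere; someone elsewhere waits for T1.
site_b = {EXT: {"T1"}, "T1": {"T2"}, "T2": {EXT}}

print(EXT in find_cycle(site_a))                         # True: possible deadlock, send to B
print(find_cycle(drop_ext(site_a)))                      # None: no deadlock from A's data alone
print(sorted(find_cycle(drop_ext(merge(site_a, site_b)))))  # ['T1', 'T2']
```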
Homework #9
Continuing with the Carnegie Library
Client/Server
Distributed Database