Advanced Distributed Software Architectures and Technology group

Download Report

Transcript Advanced Distributed Software Architectures and Technology group

ADSaT
Transactions and Databases
Paul Greenfield
CSIRO
Advanced Distributed Software Architectures and Technology group
1
ADSaT
This Week
• More on transactions
– Left overs
– http://research.microsoft.com/~gray/
wics_99_TP
• Isolation and locking
– How do we achieve isolation?
• Recovery
– How do we recover after failure?
Advanced Distributed Software Architectures and Technology group
2
ADSaT
Why bother with TP?
• Use two-tier apps with
database transactions?
– Business logic in client
and stored procedures
– Fast!
– Scalable?
– Maintainable?
– Cheaper?
– Flexible??
Stored
procedures
Database
Server
Advanced Distributed Software Architectures and Technology group
3
ADSaT
Two-tier Applications
• The most recent ‘legacy’
• Stored procedures
– Different and proprietary languages
– Integrated debugging?
– Re-use in different applications?
• DB connection per client
– Even when not active
Advanced Distributed Software Architectures and Technology group
4
ADSaT
Three-tier Applications
• Business logic written in common or
standard languages (VB, C++, Java)
• Clean separation of business logic
– Easier re-use and maintainability?
• Use server resources only for
active transactions
– Process and connection pooling
Advanced Distributed Software Architectures and Technology group
5
ADSaT
TP Implementation
• What are the TP programs?
–
–
–
–
Small ‘one-shot’ executable programs?
Application programs fed from queue?
Libraries called from a process?
Libraries called from threads?
• Answer have an effect on
performance, integrity and
management
Advanced Distributed Software Architectures and Technology group
6
ADSaT
One-shot Programs
• Old-style solution (CICS, TIP, …)
• Schedule application to run when
transaction request arrives
– Start app, process request, terminate
– Single function per application
• OS/TP monitor support for
– Fast application startup
– Application recycling (reduce overheads)
Advanced Distributed Software Architectures and Technology group
7
ADSaT
Queued Applications
• TP application ‘always’ running
– Instances balanced against load
– Queue of waiting requests
– Application supports multiple functions
• Group functions into applications
• Clients not bound to server applications
– Tune response times
• Faster response time for some transactions
• Multiple copies of critical applications
Advanced Distributed Software Architectures and Technology group
8
ADSaT
TP Processes
Client bound to server process
– Typical CORBA approach
– Queue of requests for each server
– Need to run/manage multiple servers
• Tune response times?
– Can allocate transactions to programs
– Fast, critical transactions delayed?
• Need for load balancing
– Unequal server load possible
Advanced Distributed Software Architectures and Technology group
9
ADSaT
TP Process - Orbix
Server
processes
Server
objects
Waiting requests
Advanced Distributed Software Architectures and Technology group
10
ADSaT
Orbix Example
Milliseconds
Configuration 1: 20 Servers
3500
3000
2500
2000
1500
1000
500
0
buy
create
getholding
query code
queryid
sell
update
100
200
300
400
Number of Clients
Advanced Distributed Software Architectures and Technology group
11
ADSaT
TP Threads
• Thread pool inside a server process
–
–
–
–
No binding from client to thread
Objects live in process address space
Threads have access to all objects
Queue of requests shared by all threads
• No need for load balancing
– No idle/busy processes
– No way to push priority of some
transactions – may not matter?
Advanced Distributed Software Architectures and Technology group
12
ADSaT
TP Threads - MTS
Server
threads
Waiting
requests
Proxy Active
server server
objects objects
Advanced Distributed Software Architectures and Technology group
13
ADSaT
MTS Example
Transaction times - C++ & Keytable
Response time (ms)
5000
100 clients
4500
200 clients
4000
400 clients
3500
600 clients
3000
2500
2000
1500
1000
500
0
Buy
Create
Account
Get Holding QueryCode
Stmnt
QueryID
Sell
Update
Advanced Distributed Software Architectures and Technology group
14
ADSaT
Failure
• Need to isolate faults
–
–
–
–
Failing application takes down what??
Entire application process?
Process holding thread pool?
Entire transaction system?
• Need to run applications as separate
processes or have careful fault traps
Advanced Distributed Software Architectures and Technology group
15
ADSaT
What Goes Where?
• Routing and directories
– Where to send a request message?
– Where to create a remote object?
• Routing tables
– Table of what requests go where
• Directories/name servers
– Database and server that knows who is
providing what service
Advanced Distributed Software Architectures and Technology group
16
ADSaT
Directory/Name Servers
• Map name onto server locations
• Could be part of TP system
– CORBA Name Servers
• Could be part of system-wide
directory
– Active Directory for COM+
• ‘Hard-wiring’ also works
– Administration costs can be high
Advanced Distributed Software Architectures and Technology group
17
ADSaT
Name Servers
• Client asks name server where to
find a service when creating object
• Servers advertise their services to
the name server
• Load balancing by name server
distributing requests over multiple
server processes and systems
Advanced Distributed Software Architectures and Technology group
18
ADSaT
Name Servers
Goods?
Name Server
A
Use object X
on server B
Object
Goods server
Client
App
B
Object
Goods server
Advanced Distributed Software Architectures and Technology group
19
ADSaT
Request Integrity
• What happens to requests on failure
– Transactions ensure database integrity
– Incoming requests can be saved to disk
– Fetch request operation included as
part of transaction
• Undone and request requeued on failure
• Need to avoid failure loops!
• Easy recovery from transient errors
Advanced Distributed Software Architectures and Technology group
20
ADSaT
Response Integrity
• Are responses part of transaction?
– Rolled out if transaction fails
– Recovered and sent after system
recovery if committed
• Is this reasonable? Sent to who??
• Just discard?
• Need feedback to know delivery succeeded
• Just what does the operator see/do?
– Wait? Retry? Check success?
Advanced Distributed Software Architectures and Technology group
21
ADSaT
RPC Extras
• DCE, CORBA, COM, … are language
and platform independent
– Interfaces specified in IDL
– Marshalling translates between
languages and platforms
• Character sets, byte order, …
– Translate to and from ‘canonical’ form
– Or use ‘receiver makes it right’
• Send in client format
• Receiver translates only if necessary
Advanced Distributed Software Architectures and Technology group
22
ADSaT
IDL Example
• COM IDL fragment
– More detail in a later lecture!!
[object, uuid(6B29FC40-CA47-1067-B31D-00DD010662DA)]
interface IHop : IUnknown {
import “unknwn.idl”; // bring in definition of IUnknown
HRESULT
Walk([in] long How_far);
HRESULT
Hop([in] long How_far);
HRESULT
Bound([in] BSTR Over_what);
}
Advanced Distributed Software Architectures and Technology group
23
ADSaT
Nested Transactions
• Calling a transaction from anywhere
– Directly from a client
– From within a transaction
• Start a sub-transaction, linked into the
parent transaction
– All transactions committed together
• Sub-transaction commit does not really
commit and make changes durable. Changes
made visible to other sub-transactions.
Advanced Distributed Software Architectures and Technology group
24
ADSaT
Nested Transactions
• Not widely supported
• Alternative programming models
– Top-level transactional service code
calling on business logic
– MTS and EJB ‘requires transaction’
• Run in existing transaction if there is one
• Start new transaction otherwise
• More in MTS/COM+ and EJB lectures
Advanced Distributed Software Architectures and Technology group
25
ADSaT
Nested Transactions
Function transfer(src, dest, amt)
tx_start
withdraw(src, amt)
deposit(src, amt)
tx_commit
Function transfer(src, dest, amt)
tx_start
withdraw(src, amt)
deposit(src, amt)
tx_commit
Function withdraw(src, amt)
tx_start
……..
Tx_commit
Function withdraw(src, amt)
……..
Function deposit(dest, amt)
tx_start
……..
Tx_commit
Function deposit(dest, amt)
……..
Transactional Services
Nested Transactions
Advanced Distributed Software Architectures and Technology group
26
ADSaT
Isolation and Locking
• How do resource managers achieve
the illusion of ‘isolation’
– Application programmers can (largely)
pretend no other programs are running
concurrently
– Done using ‘locks’ and ‘lock managers’
– Application programmers still need to
be aware of possible problems
Advanced Distributed Software Architectures and Technology group
27
ADSaT
Serialisable
• Concurrent execution of concurrent
transactions has the same effect
as running them serially.
– One after another with no overlap
• Highest level SQL Isolation Level
• Implemented by locking resources
before they are used
Advanced Distributed Software Architectures and Technology group
28
ADSaT
Locks
• Lock data before using it
–
–
–
–
Set read lock before reading
Set write lock before writing
Wait if lock cannot be granted
Locks only granted if no conflicts
• Read locks conflict with write locks
• Write locks conflict with both read and
write locks
Advanced Distributed Software Architectures and Technology group
29
ADSaT
Locks
• Locks affect performance
– All computers wait at the same speed
– Can result in single-threading
• Concurrent transactions waiting for
access to the same resource
• Strongly influenced by application design
• Locks introduce new problems
– deadlocks
Advanced Distributed Software Architectures and Technology group
30
ADSaT
Two-phase Rule
• Correct locking avoids problems
– Locks have to be held until commit to
achieve isolation
• Locks are held for longer
• Performance is reduced
– Two phases
• Locking resources
• Unlocking (only at commit)
• Avoids cascading aborts
Advanced Distributed Software Architectures and Technology group
31
ADSaT
Lock Managers
• Code that manages locking
– Maintains a lock table
• Keeps track of all locks in the database
• Waiting requests and granted locks
– Lock operations are atomic
• Protected by low-level locks (mutex, spin)
Locks granted
Locks requested
x T1(read), T2(read)
T3(write)
y T2(write)
T4(read), T1(read)
z T1(read)
Advanced Distributed Software Architectures and Technology group
32
ADSaT
Lock Managers
• Distributed systems can have
interesting locking problems
– No lock analysis across databases?
• Distributed databases have
distributed lock managers
– Shared lock state
– Communication between LMs
Advanced Distributed Software Architectures and Technology group
33
ADSaT
Lock Types
• More than just read and write!
–
–
–
–
–
Shared (read) locks
Exclusive (write) locks
Update (read then write)
Intent locks (lock also held at finer level)
Key locks (lock ranges within keys)
Advanced Distributed Software Architectures and Technology group
34
ADSaT
Lock Granularity
• What is locked?
–
–
–
–
Whole database
Whole table?
Page of data?
Individual record?
• All of the above at times
– X lock on record
– IX locks on page and table
– S locks on database
Advanced Distributed Software Architectures and Technology group
35
ADSaT
Tables to Records
Table
Page
Page
Page
Record
Record
Record
Advanced Distributed Software Architectures and Technology group
36
ADSaT
Lock Granularity
• Level of locking a DB decision
– Fine grain locks give less contention
and better performance
– Fine grain locks using lots of locks and
are more expensive to manage
• Choose record lock when..
– Just locking a few records
• Otherwise get coarser locks
Advanced Distributed Software Architectures and Technology group
37
ADSaT
Lock Escalation
• DB can start with record locks and
move to page/table locks
– Finds that many locks are being held
for the page/table
– Escalate lock up a level
– Free lock resources
• Guess at proper locking level and
adjust as needed (up only?)
Advanced Distributed Software Architectures and Technology group
38
ADSaT
SQL Isolation Levels
• Uncommitted read (dirty read)
– Read all changes, no locks, no waits
– Fastest and sometimes useful
• Statistical scans of data
• Committed read (SQL default)
– Only read committed data
– Release read locks after use
– Repeating an SQL statement can give
different results each time
Advanced Distributed Software Architectures and Technology group
39
ADSaT
SQL Isolation Levels
• Repeatable read
– Same query always returns same data
• Can get phantoms – new records
– Keep shared locks until Commit
• Serializable (TP Isolation)
– Same query returns same data
• No phantoms!
• Lock data that does not exist
– Need to keep key locks as well
Advanced Distributed Software Architectures and Technology group
40
ADSaT
Locking Hints
• DB decides what locks to use
– Shared or exclusive lock?
– Locks can be converted normally
– Programmer can override with ‘hints’
• Programmer knows what will happen next
• Avoid deadlocks?
Select * from accounts (updlock) where acc_no = 123
Update accounts set balance= … where acc_no=123
Advanced Distributed Software Architectures and Technology group
41
ADSaT
Deadlocks
• Normally applications just wait for
locks to be granted
• Sometimes dependencies between
locks means they would wait forever
Granted
A
T1
T2
T1 Lock A
B
T2
T1
Lock B
Lock B T2
Lock A
Waiting
Advanced Distributed Software Architectures and Technology group
42
ADSaT
Deadlocks
• Db performs locking graph analysis
• Deadlock if loop found!
• Solution?
– Pick a process/transaction and return
a db error
– Application recovers or dies…
– Transaction abort and retry?
Advanced Distributed Software Architectures and Technology group
43
ADSaT
Deadlocks
• Deadlock avoidance is an application
coding problem – and a hard one
– Use ‘canonical locking orders’
• Define a standard locking order
• Invoice header before invoice details
– Nice idea in theory
– Can still get ‘conversion deadlocks’
Advanced Distributed Software Architectures and Technology group
44
ADSaT
Conversion Deadlocks
• Database uses shared locks rather
than exclusive locks for reading
– Can convert to exclusive later
– Deadlocks when DB cannot do convert
Granted
K1
T1(s)
T1(x)
Select next from keytable where type=1
T2(s)
T2(x)
Update keytable set next=next+1 where type=1
Waiting
Advanced Distributed Software Architectures and Technology group
45
ADSaT
Conversion Deadlocks
• A use for locking hints
– Tell DB to get exclusive lock earlier
Granted
K1
T1(x)
T2(x)
Select next from keytable (updlock) where type=1
Update keytable set next=next+1 where type=1
Waiting
Advanced Distributed Software Architectures and Technology group
46
ADSaT
Performance
• Blocking on waits undesirable
– Remove hot spots
• ‘next entry’ counters, summary
information, end of file counter
• Avoid altogether
• Cache high contention records
– Reduce ‘path length’
– Obtain locks as late as possible
Advanced Distributed Software Architectures and Technology group
47
ADSaT
Performance
C++ transaction rates
500
450
400
350
TPS
300
250
200
150
Local keytable
Local Identity
100
Remote identity 10M
50
Remote identity 100M
Remote keytable 100M
0
0
200
400
600
800
1000
1200
Client threads
Advanced Distributed Software Architectures and Technology group
48
ADSaT
Performance
C++ response times
remote db - identity & keytable
10000
Read ident
9000
Update ident
Average ident
Read key
8000
Update key
Average key
Response time (ms)
7000
6000
5000
4000
3000
2000
1000
0
0
200
400
600
800
1000
1200
Clients
Advanced Distributed Software Architectures and Technology group
49
ADSaT
Recovery
• Durability and redundancy
– Keep critical information on disk
– In-memory copies for performance
– Ensure disk writes complete before
continuing at critical times
– Keep multiple copies of disk data
• Protecting against …
– Memory loss when system fails
– Disk file loss with disk failure
Advanced Distributed Software Architectures and Technology group
50
ADSaT
Database Model
• Really two databases
– Database tables on disk +
in-memory changed pages/records
• For performance
– Logged changes on disk/tape +
database dump
• For durability
• The log really the durable database
– Can recreate the disk/memory form
Advanced Distributed Software Architectures and Technology group
51
ADSaT
Logging
• Write to log…
– Before images
• Changes, deletions
– After images
• Changes, insertions
– Data pages, index blocks, storage
allocation
• Need to wait for log flushes
– Can be major performance bottleneck
– Batch flushes by adding a short delay
Advanced Distributed Software Architectures and Technology group
52
ADSaT
Logging
• Write-log-ahead
– Never flush an uncommitted change to
the database.
• Changes can be flushed after they
have been committed
– Leave in memory until cache manager
needs the space…
Advanced Distributed Software Architectures and Technology group
53
ADSaT
Commit
• Changes are written to a log page
– Page write initiated when page full
• At commit time
– Flush all logged changes to disk
– Flush logged commit record to disk
• Changes are now in stable storage
– Database is recoverable
Advanced Distributed Software Architectures and Technology group
54
ADSaT
Recovery
• Recover from abort
– Apply before images if necessary to
pages in cache
• Recovery from system failure
– Apply after images to disk pages
• Recovery from media failure
– Restore from backup
– Apply after images to disk pages
Advanced Distributed Software Architectures and Technology group
55
ADSaT
Checkpoints
• How can we recover more quickly?
– How far back do we go in the log?
• When do we know that there are no more
log records that need to be applied?
– Problem comes from caching and lazy
database page writes
• Checkpoints force database pages
back out to disk now and then
– Stop recovery when checkpoint found
– Fuzzy checkpoints to improve CP cost
Advanced Distributed Software Architectures and Technology group
56
ADSaT
Checkpoints
Last
checkpoint
All updates in
stable database
Log
Classic checkpoint
All updates in
stable database
Log
2nd last
checkpoint
Last
checkpoint
Fuzzy checkpoint
Advanced Distributed Software Architectures and Technology group
57
ADSaT
Media failure?
• Duplicate the media (disks)
– RAID disks
– Mirror/shadow disks
– Avoid sharing anything
• Multiple disks with multiple controllers
• Remote sites for backup?
– Put logs on mirror/RAID at least
• Archive logs to tape or …
Advanced Distributed Software Architectures and Technology group
58
ADSaT
Performance
• Disk performance is the key
– Disks are slow to rotate (latency)
– Disk heads are slow to move (seek)
• One heavily used file per disk is best
– Allocate DB files and logs across disks
to balance out usage
– Number of disks can be more
important than storage capacity
Advanced Distributed Software Architectures and Technology group
59
ADSaT
Next week
• Security!
–
–
–
–
–
Access control
Authentication
Data privacy
Public key crypto
SSL/TLS
Advanced Distributed Software Architectures and Technology group
60