CSIS0402 System Architecture Distributed Computing

Download Report

Transcript CSIS0402 System Architecture Distributed Computing

CSIS0402 System Architecture
Distributed Computing
- Middleware Technology (1)
K.P. Chow
University of Hong Kong
Early Days
Mainframe terminal networks
 Connecting “dumb” green-screen
terminals to mainframe
 Terminal identified by line id
Distributed network
 Each machine can have many
connections (TCP/IP port no.)
 Each machine is identified by its
IP address
Programs
Terminal
handler
Terminals
Programs
Network
interface
Distributed Systems

First distributed system: ARPANET
– 1969: 4-node network
– 1972: about 50 computers

Communication between companies in the same industry:
– SWIFT (Society for Worldwide Interbank Financial Telecommunication)
for international money transfers in the financial industry
– IATA (International Air Transport Association) in the airline industry

Vendor specific network architecture (1970s)
–
–
–
–
–
IBM’s System Network Architecture (SNA)
Sperry’s Distributed Computing Architecture (DCA)
Burroughs’ Network Architecture (BNA)
DEC’s Distributed Network Architecture (DNA)
Basic services: file transfer, remote printing, terminal transfer, remote file
access
– Distributed applications: e-mail
Open Systems Interconnection (OSI)



Force IT vendors to implement the same standards, e.g. RS232 and X.25
Use International Organization for Standardization (ISO) as the standards
authority
Based on OSI (Open Systems Interconnection) series of standards
– OSI Basic Reference Model: seven-layer model
Protocol between each layer
Sender
Flow of
data
Application
Application
Presentation
Presentation
Session
Session
Transport
Network
Transport
Network
Data Link
Data Link
Physical
Physical
Receiver
Flow of
data
TCP/IP

Open Systems Standards
– Too slow
– Standards are too complex, e.g. OSI virtual terminal
standard

Unix
–
–
–
–

Open systems
Cheap and portable
Many versions: Berkeley version and AT&T version
Network support: TCP/IP
TCP/IP
Session
TCP
UDP
IP
Ethernet
– Internet Protocol (IP): network standard
– Transmission Control Protocol (TCP): session standard for
program-to-program communication over IP
– Provides application programming interfaces (APIs) for
sending messages/data over the network: Sockets
– Services: telnet, snmp, ftp
Transport
Network
ATM Data Link
Why Middleware?
When building distributed systems, need to build the middleware:
 Easy of use: compare to Sockets
 Location transparency:
– Application should not need to know the network and application address
– Possible to move an application from a machine to another machine with
different network address without recompilation




Message delivery integrity: messages should not be lost or duplicated
Message content integrity: messages should not be corrupted
Language transparency: a program using middleware should be able to
communicate with another program written in a different language
Support client/server model: server provides a service for the client
Traditional Middleware Models





Remote Procedure Call (RPC)
Remote data access
Transaction processing
Distributed transaction processing
Message queue
Remote Procedure Call (RPC)


Access a remote service through remote procedure calls: the syntax in
the client (the caller) and the server (the called) remains the same as if
they were on the same machine
Known RPC mechanism:
– Open Network Computing (ONC) from SUN
– Distributed Computing Environment (DCE) from Open Software
Foundation (OSF)
Price =
getPrice(Product);
Interface
Definition
Language (IDL)
Skeleton
Stub
Client
Server
Price =
getPrice(Product);
Compilation
Environment
Remote Procedure Calls (RPC)

Stub and skeletons
– Generated by IDL compiler
– Small chunks of C code that are compiled and linked into the client and
server programs



Stub: converts parameters into a string of bits and sends the message
over the network
Skeleton: takes the messages and converts it back to parameters, and
calls the server program
Marshalling: handles different data format, e.g. byte ordering
Call getPrice(string Product)
return float // price
Parameters
marshalled into a
string of bits
Client
010011010110011110001010
Server
getPrice(string Product)
return float // price
RPC and Multi-threading

Using RPC
– A client is blocked when it calls a remote procedure
– Client may be left waiting due to message lost, server too slow, or server
halted
– Server is also blocked when process a request

Multi-thread
– Allow client to read from the keyboard or the mouse while waiting for
remote procedure call returns
– Allow server to handle many clients simultaneously, e.g. 1000 threads to
handle 1000 clients
– May have synchronization problem during resources sharing: need to use
locks or semaphores, e.g. withdraw money from the same account from 2
different ATMs
– Difficult to test and debug multi-thread program
Remote Database Access



Provides the ability to read or write to a database that is physically on a
different machine from the client program
Vendor specific, e.g. Oracle’s Oracle Transparent gateway, IBM’s
DRDA
ODBC (Open Database Connectivity)
–
–
–
–
Remote database access provided by Microsoft
Standard programming interface (only) for Windows, not middleware
Vendors will have to provides the client and server middleware
ODBC client software that is written to communicate remotely to an
Oracle database will not communicate to an IBM database
Remote Database Access – Microsoft’s Approach



Move away from ODBC,
use OLE DB instead
OLE DB uses COMobjects, rather complex
ADO (Active Data
Objects): another standard
by Microsoft
– A simpler COMRemote DB access
compliant interface
– Uses OLE DB, and OLE
DB uses an ODBC data
provider
Host DB interface
Database
Application or Tool
ADO
OLE DB
ODBC
Remote DB Access – SQL Parsing

Parse step - turns the SQL commands
into a query plan:
Client
–
–
–
–

Define tables that will be accessed
List of indexes to be used
Filtered by which expression
Define the output
Execute query – parameters will be
provided
– Output can be any length, e.g. a
million rows, large overhead on
network



Optimization: can be achieved using
stored procedure
Good for ad-hoc queries
Not good for transaction processing
Server
SQL Text
Parse
Query
Query Output
Description
Execute command
plus parameters
Execute
Query
Output Data
Query
Plan
Transaction Processing

Transaction
– A transaction is a sequence of operations that are carried out together and
form a single unit
– Must be either end up completed (committed) or is completely undone
– E.g. withdraw money from a bank account:




write a record of the debit
update the account total
update the bank teller record
Early transaction processing system:
– Handled by transaction monitor, e.g. CICS on IBM MVS, TIP on Unisys
2200, COMS on Unisys A Series
– On a single machine (mainframe)


Distributed transaction processing: databases were on different
machines
How important is transaction?
– Business involves transaction, not customer, e.g.


Give the bank a check to settle the credit card bill (atomic action)
The bank has a complex business process to ensure the each payment is settled
(transaction)
Transaction Characteristics – ACID Properties


A for atomic: the transaction is never half done, if there is an error, it is
completely undone
C for consistent: the transaction changes the DB from one consistent
state to another consistent state, i.e.
– The DB data integrity constraints hold true
– The DB need not be consistent within the transaction
– Includes explicit data integrity (e.g. product codes must be between 8 to
10 digits long) and internal integrity constraints (e.g. all index entries must
point at valid records)


I for isolation: the data updates within a transaction are not visible to
other transaction until the transaction is completed
D for durable: when a transaction is done, it really is done and the
updates will not disappear at sometime in the future
Transactional Architecture




Local transactions: all operations are carried out on the same database
engine
Distributed transactions: involve multiple database servers, which are
usually deployed on different hosts, requires a transaction manager to
co-ordinate the processing and managed by a 2-phase commit process
Global transactions: a distributed transaction that encompasses not
only different servers, but different application, e.g. EJB application
that sends messages to a legacy system using JMS
Transaction demarcation:
– Transaction demarcation is defined as the specification of transaction
boundaries: indicating when a transaction begins and ends
– The execution of a program “crosses a transactional boundary” whenever
there is a change in transaction context
– A transaction context is the association of a transaction with an application
component
Classic Example

Transfer money from a savings account to a current account, one of the
two followings should occur:
– Money is withdrawn from savings account and deposited into the current
account
– Money is not taken out from the savings account and is not deposited into
the current account

Classic implementation:
begin_transaction()
withdraw_from_savings()
deposit_to_current()
commit_transaction()


With one DB, the DBMS can serve as the transaction coordinator
With multiple DBs, we need distributed transaction coordinator to
coordinate the processing using 2-phase commit
Transaction using Objects

Class SavingsAccount with the
following withdraw method that
can be reused:

Class CurrentAccount with the
following deposit method that
can be reused:
withdraw(amount)
{
begin_transaction();
…
withdraw_money;
…
end_transaction();
}
deposit(amount)
{
begin_transaction();
…
deposit_money;
…
end_transaction();
}
Transaction involves multiple objects

Implement the transfer operation:
transfer(savingsAccount, currentAccount, amount)
{
savingsAccount.withdraw(amount);
currentAccount.deposit(amount);
}

Problem:
– savingsAccount.withdraw(amount) and currentAccount.deposit(amount) are
in different transactions
Will the following work?


Remove the transaction boundaries from
withdraw and deposit
Add the transaction boundary to the transfer:
transfer(savingsAccount, currentAccount, amount)
{
begin_transaction();
savingsAccount.withdraw(amount);
currentAccount.deposit(amount);
end_transaction();
}

Problem: withdraw() and deposit() are no longer
transactionally protected
withdraw(amount)
{
…
withdraw_money;
…
}
deposit(amount)
{
…
deposit_money;
…
}
Automatic Transaction Boundary Management

No transaction boundary management code in
the business method:
transfer(savingsAccount, currentAccount, amount)
{
savingsAccount.withdraw(amount);
currentAccount.deposit(amount);
}


Transactions are defined by the run-time
environment using transaction attributes, e.g.
Required
Supported by modern transaction servers, e.g.
MS MTS, J2EE
withdraw(amount)
{
…
withdraw_money;
…
}
deposit(amount)
{
…
deposit_money;
…
}
Transaction Attributes
TransferSession
transfer(savingsAccount,currentAccount,amount)
RequiresNew
{
savingsAccount.withdraw(amount);
currentAccount.deposit(amount);
}
SavingsAccount
withdraw(amount) Required
{
…
withdraw_money;
…
}
CurrentAccount
deposit(amount) Required
{
…
deposit_money;
…
}
Transactional Context
Required
4.Invoke withdraw() SavingsAccount
1.Invoke transfer()
withdraw( )
RequiresNew
TranferSession
transfer( )
2.Register
TransferSession
7.Invoke deposit()
5.Register
SavingsAccount
CurrentAccount
deposit( )
8.Register
CurrentAccount
Transaction Manager
6.Add SavingAccount
3.Create a new
Transactional
Context and add
TransferSession
TransferSession
9.Add
CurrentAccount
SavingsAccount
CurrentAccount
Transaction Context X
Required
Transactional Context (cont)
Required
4.Invoke withdraw() SavingsAccount
1.Invoke transfer()
withdraw( )
RequiresNew
TranferSession
transfer( )
2.
5.
7.Invoke deposit()
8
CurrentAccount
deposit( )
Transaction Manager
3.
6.
TransferSession
SavingsAccount
Transaction Context X CurrentAccount
9.
10. Check TxCtx X
to make sure
that updates to
each will work
Required
11. Commit all updates or
roll back all updates
Transaction Isolation Level

Isolation: a transaction should be ignorance of the existence of any
other transactions
– If a second transaction attempts to read the data being modified by the
first transaction, that second transaction will not see any changes the first
transaction has made

E.g. if my wife read a joint saving account balance at the same time I
was transferring money from saving to current, the following events
may occur:
–
–
–
–
–

I see my saving account balance $1000
I initiate a transfer of $500 from saving to current
The system debits my saving account
My wife requests the balance for the saving account and sees $1000
The system credits my current account
Full transaction isolation: a transaction act on the DB as if it were the
only thread operating on the DB
– Unable to support high concurrency requirement in real systems
Isolation Levels


To achieve better concurrency
ANSI SQL identifies 4 distinct transaction isolation levels
– To balance performance needs with data integrity needs

Isolation conditions:
– Dirty read: a transaction A views the uncommitted changes of another
transaction B, if transaction B rolls back its change, transaction A is said
to have “dirty” data
– Nonrepeatable read: a transaction A reads different data from the same
query when it is issued multiple times and other transactions have changed
the rows between reads by transaction A. A transaction that mandates
repeatable reads will not see the committed changes made by other
transactions.
– Phantom read: a phantom reads deals with changes in other transactions
that would result in new rows matching the transaction’s WHERE clause.
E.g. if transaction A reads all accounts with balance < 100 and A performs
2 reads, a phantom read allows for new rows to appear in second read
based on changes made by other transactions
ANSI SQL Transaction Isolation Levels





Read uncommitted transactions: allows dirty reads, nonrepeatable
reads, and phantom reads
Read committed transactions: only data committed to the DB may be
read, can perform nonrepeatable and phantom reads
Repeatable read transactions: committed, repeatable reads as well as
phantom reads are allowed, nonrepeatable reads are not allowed
Serializable transactions: only committed, repeatable reads are
allowed, phantom reads are not allowed
The support of different isolation levels depends on the database
engine
Distributed Transaction Processing
Steps
Client
1. The client tells the middleware that
a transaction begins
Begin Transaction
Server A
2. The client calls server A
3. Server A updates the database
4. The client calls the server B
Update DB on A
5. Server B updates its database
Server B
6. The client tells the middleware that
the transaction ends


If the updates to 2nd DB failed (5),
updates to 1st DB are rolled back
To maintain ACID properties, all
locks acquired by DB cannot be
released until end of transaction (6)
Update DB on B
Commit
2 phase commit
X/Open




A consortium to establish the standards for distributed transaction
processing (now called Open Group)
Standard protocol between the middleware and the database is called
XA protocol
XA compliant: a DB that can co-operate with X/Open DTP
middleware in a two-phase commit protocol
X/Open standard for client/server contains 3 protocols:
– Protocol based on SNA LU6.2 (IBM): peer-to-peer protocol with no
marshalling
– Protocol based on DCE’s remote procedure calls (from Encina): parameter
marshalling and threads are blocked during a call
– XATMI protocol based on ATMI (from Tuxedo): uses message format
called View Buffers, supports both RPC-like calls and unblocked calls
Message Queuing



Program-to-queue communication
Message queue: like a very fast mail box, i.e. put a message in a box
without the recipient being active
Actions:
– Put: puts a message into the queue
– Get: takes a message out of the queue

Message queue software:
– Transfer of messages from queue to queue
– Ensures message arrives eventually and only one copy of the message is
placed in the destination queue
Get
Put
Put
Queue
Queue
Get
Message Queue Software







Queue have names
The queues are independent of program, i.e. many programs can perform
Puts and Gets on the same queue, and a program can access many queues
If the network goes down, messages will wait in the queue until the
network comes up
The queues can be put onto a disk so that the queue is not lost even the
system goes down
The queue can be a resource manager and co-operate with a transaction
manager, i.e. if the message is put in a queue during a transaction and the
transaction later aborted, then the DB will be rolled back and the
message is removed from the queue
Some message queue systems can cross networks of different types, e.g.
message goes through SNA leg, TCP/IP leg and then a Novell IPX leg
Efficient: used in applications require sub-second response time
Message Queue Software



Products: IBM MQSeries, Microsoft MSMQ, BEA Systems Tuxedo/Q
Disadvantage: no parameter marshalling, up to the sender and the
receiver to interpret the data
Peer-to-peer instead of client/server
MQ vs. DTP – DTP solution
Machine X
Example: transfer money from
account A to account B using
DTP
 Actions “Debit Account A” and
“Credit Account B” are done in
one distributed transaction
 Any failure aborts the whole
transaction
Disadvantages:
 Performance degrade due to
two-phase commit
 If either of the system is down
or network is down, the
transaction cannot take place
Machine Y

Start Transaction
Debit Account A
Initiate Partner
Transaction
Start Transaction
Credit Account B
2 phase commit
MQ vs. DTP – MQ solution
Machine X
Message queue solution to the
same problem
 Message is not allowed to reach
machine Y until the first
transaction has committed
(make sure 1st transaction is not
aborted)
Flaw (not atomic):
 If the destination transaction
fails, money is taken out of A
and disappears
Machine Y

Start Transaction
Debit Account A
Send Message
Start Transaction
Commit
Read Transaction
Credit Account B
Commit
MQ vs. DTP – MQ with Reversal
Machine X
Reversal transaction in case
transaction cannot be completed
on Machine Y
Flaw (not isolated):
 Fails if account A is deleted
before the reversal takes effect,
i.e. action “Credit Account A”
cannot be processed
Machine Y

Start Transaction
Debit Account A
Send Message
Start Transaction
Commit
Read Message
FAIL !!
Practical solution:
 Wait for a complaint and do
manual adjustment
Start Transaction
Read Message
Credit Account A
Commit
Send Message
What will happen to these technologies?
Technology
RPC
Replacement
Component middleware
Distributed transaction middleware Component middleware
MSMQ
Provides a component interface to
message queue and supports
sending objects
MQSeries
Will exist
Remote data access
Will exist to solve simple problems
that require quick solution