Distributed System Concepts and Architectures

By Master Prince

Outline
• Advantages and disadvantages of distributed OS
• Goals
• Transparency
• Services
• Architecture Models
• Communication Network Protocols
• Major Design Issues
• Distributed Computing Environment (DCE)

Distributed OS
• An integration of system services, presenting a transparent
view of a multiple computer system with distributed
resources and control
• A collection of independent computers that appear to the
users of the system as a single computer
• Examples
– Personal workstations + a pool of processors + single file system
– Robots on the assembly line + Robots in the parts department
– A large bank with hundreds of branch offices all over the world

Advantages of Distributed Systems Over Centralized Systems
• Economics – microprocessors offer a better price/performance ratio than mainframes
• Speed – a distributed system may have more total
computing power than a mainframe
• Inherent distribution – some applications involve spatially
separated machines
• Reliability – if one machine crashes, the system as a whole can still survive
• Incremental growth – computing power can be added in
small increments

Advantages of Distributed Systems Over Isolated Computers
• Data sharing – allow many users to access a common database
• Device sharing – allow many users to share expensive
peripherals like color printers
• Communication – make human-to-human communication
easier, for example, by E-mail
• Flexibility – spread the workload over the available machines in the most cost-effective way

Disadvantages of Distributed Systems
• Software – distributed software is complex to design, implement, and debug
• Networking – the network can saturate or cause other
problems
• Security – easy access also applies to secret data

Goals (I)
• Provide a high-performance and robust computing environment with minimal user awareness of the management and control of distributed system resources
• Efficiency – difficult to achieve because of communication delays
– Propagation delay – unavoidable (bounded by signal speed and distance)
– Protocol overhead
• Effective communication primitives, good protocols
– Load distribution – bottlenecks or congestion in the network or software
• Balance and overlap computation and communication
• Distributed processing and load sharing

Goals (II)
• Flexibility
– User view: friendly system and freedom in using the system
• Friendliness: user interface, consistency, reliability → use OO (object-oriented) design
• Freedom:
– No unreasonable restrictions in using the system
– Easy to build additional tools or services
– System view
• Ability to evolve and migrate
• Modularity, scalability, portability and interoperability
– Difficult to achieve…
• Heterogeneous HW/SW components

Goals (III)
• Consistency – hard to maintain because of the lack of global information, replication and partitioning of data, component failures, and complex interaction among components
– User needs: uniformity in using the system and predictable system
behavior
– System needs: proper concurrency control mechanisms and failure handling and recovery procedures
• Robustness – must cope with failures in communication links, processing nodes, and client/server processes
– System must reinitialize itself to a state where integrity is preserved, with only a small loss in performance
– Handle exceptions and errors, changes to topology, long message
delays, inability to locate server
– Security: reliability, protection, and access control

Transparency
• Transparency
– Hide all irrelevant system-dependent details from users
– Create an illusion of the model users are supposed to see
– Trade-off between simplicity and effectiveness
• Objective
– Provide a logical view of a physical system and at the same time
reduce the effect and awareness of the physical system to a
minimum

Types of Transparency (I)
• Access: access local and remote system objects in the same way
– Phone (local) VS. letter (remote)
• Location (name): No awareness of object location - use
logical names
– Area code for other cities
• Migration: object can be moved to different locations without
changing names
– Local numbers change when one moves to another city
– Need universal name (symbolic or numerical)
• Concurrency: sharing of objects without interference
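
The access, location, and migration transparencies above can be sketched in Python. In this minimal illustration, assuming a directory that maps logical names to objects, the client always calls the same read() operation through the same logical name, whether the object is local or has migrated to another node. LocalFile, RemoteFile, Directory, and client_read are hypothetical names; RemoteFile only stands in for a real remote request/reply exchange.

```python
from typing import Protocol


class FileObject(Protocol):
    """Uniform interface shared by local and remote objects (access transparency)."""

    def read(self) -> str: ...


class LocalFile:
    def __init__(self, data: str) -> None:
        self.data = data

    def read(self) -> str:
        return self.data


class RemoteFile:
    """Hypothetical stand-in for an object on another node."""

    def __init__(self, node: str, data: str) -> None:
        self.node = node
        self.data = data

    def read(self) -> str:
        # Placeholder: a real system would exchange request/reply messages with self.node.
        return self.data


class Directory:
    """Maps logical names to objects, so clients never see physical locations."""

    def __init__(self) -> None:
        self._table: dict[str, FileObject] = {}

    def bind(self, name: str, obj: FileObject) -> None:
        self._table[name] = obj

    def lookup(self, name: str) -> FileObject:
        return self._table[name]


def client_read(directory: Directory, name: str) -> str:
    # The client uses only the logical name and the common read() operation.
    return directory.lookup(name).read()


directory = Directory()
directory.bind("reports/q1", LocalFile("q1 figures"))
print(client_read(directory, "reports/q1"))  # local object

# Migration: the object moves to another node but keeps its logical name.
directory.bind("reports/q1", RemoteFile("node-7", "q1 figures"))
print(client_read(directory, "reports/q1"))  # same call, same name
```

The design choice is that rebinding happens only in the directory; client code is unchanged after the move, which is what the migration transparency bullet asks for.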

Types of Transparency (II)
• Relocation: a resource may be moved to another location when in use
• Replication: consistency of multiple instances of files and data
• Parallelism: permit parallel activities without users knowing how, where,
and when these activities are carried out by the system
• Failure: fault tolerance, graceful performance degradation, minimum
damages to the user
• Performance: consistent and predictable performance level even when the structure or load distribution changes
• Size: modularity and scalability; incremental growth in HW without user awareness
• Persistence: (software) resource may be in memory or on disk
• Revision: SW revisions not visible (vertical growth)

Categorization of Transparency Based on System Goals
• Efficiency
– Concurrency
– Parallelism
– Performance
• Consistency
– Access
– Replication
– Performance
– Persistence
• Flexibility
– Access
– Location
– Relocation
– Migration
– Size
– Revision
• Robustness
– Failure
– Replication
– Size
– Revision

Distributed System Issues and Transparencies
(Major issues → transparency)
• Communication, synchronization, distributed algorithms → interaction and control transparency
• Process scheduling, deadlock handling, load balancing → performance transparency
• Resource scheduling, file sharing, concurrency control → resource transparency
• Failure handling, configuration, redundancy → failure transparency

Services (I)
• Primitive services – most fundamental, in the kernel
– Must be implemented in the kernel of each node in the system
– Communication – message passing (send/receive primitives)
• Synchronous or asynchronous
– Inter-node, inter-process synchronization
• Provided by synchronous communication semantics or by a synchronization server
– Processor multiplexing – process server (for transparency reasons)
• Creation, deletion, and tracking of memory and processing time
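
A rough sketch of the send/receive primitives in Python, with threads standing in for processes and a queue.Queue standing in for kernel message buffers (both assumptions for illustration). An asynchronous send returns as soon as the message is buffered; a synchronous send blocks until the receiver has taken the message. Mailbox, send_async, and send_sync are hypothetical names, not a real kernel API.

```python
import queue
import threading


class Mailbox:
    """A hypothetical per-process mailbox offering send/receive primitives."""

    def __init__(self):
        self._queue = queue.Queue()

    def send_async(self, msg):
        # Asynchronous (non-blocking) send: buffer the message and return at once.
        self._queue.put((msg, None))

    def send_sync(self, msg):
        # Synchronous (blocking) send: wait until the receiver has taken the message.
        delivered = threading.Event()
        self._queue.put((msg, delivered))
        delivered.wait()

    def receive(self):
        # Blocking receive: wait for the next message, then release a blocked sender.
        msg, delivered = self._queue.get()
        if delivered is not None:
            delivered.set()
        return msg


box = Mailbox()
received = []


def receiver():
    for _ in range(2):
        received.append(box.receive())


t = threading.Thread(target=receiver)
t.start()
box.send_async("hello")  # returns immediately
box.send_sync("world")   # returns only after receiver() has taken it
t.join()
print(received)  # ['hello', 'world']
```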

Services (II)
• Services by system servers – fundamental, but not needed in the kernel
– Provide fundamental services for managing processes, files, and process
communication
– Can be implemented anywhere in the system, and still perform functions
basic to the operation of a distributed system
– Mapping logical names to physical addresses
• Name server: locate processes, users, machines
• Directory server: locate files, communication ports
– Translate addresses and locations into communication paths: network server
– Broadcast messages: broadcast or multicast servers
– Clocks for synchronization – there is no shared global clock, and nodes cannot agree exactly on global clock information
• Time server: physical clocks and logical clocks (for event ordering)
– File servers, print servers, migration server, authentication server
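
Logical clocks for event ordering are commonly implemented with Lamport's scheme, used here as one well-known technique rather than the only option. In this sketch each process keeps a counter, increments it on every local or send event, and on receipt advances past the message's timestamp, so a send is always ordered before its matching receive. The class name LamportClock is hypothetical.

```python
class LamportClock:
    """Lamport logical clock for ordering events across processes."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the clock.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time):
        # On receipt, move past both the local clock and the message timestamp.
        self.time = max(self.time, msg_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
p.tick()              # p: 1
ts = p.send()         # p: 2, message carries 2
q.tick()              # q: 1
recv = q.receive(ts)  # q: max(1, 2) + 1 = 3
print(ts, recv)       # 2 3
```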

Services (III)
• Value-added Services - not essential in implementation of
system but useful, higher-level or special purpose services
(such as user applications)
– Increase computational performance, enhance fault tolerance,
cooperative activities
– Example: Web server
– Groups of interacting processes
• Group server: membership (add/remove), admission policies,
privileges
– Distributed conferencing server and concurrent editing server

System Architecture Models
• System Architectures
– Workstation-server model
• Client workstations
– Local processing capability and interface to the network
• Server workstations
– Dedicated for special services
– Processor pool model – all processing power is collected in one place; users use terminals only
• Terminal: remote booting, remote file mounting, virtual terminal
handling, packet assembling and disassembling (PAD)
• File and processor allocation done by system
– Integrated hybrid model

Workstation-Server Model
[Figure: workstation-server model with a file server and a printer server]

Processor-Pool Model
[Figure: processor-pool model]

Communication Network Architecture Models
• HW interconnection + inter-node inter-process communication protocols
• Hardware interconnection
– Point-to-point links – direct connections between pairs of nodes
– Multipoint links – allow connection of nodes into clusters
• Common bus – time shared
– IEEE 802 LAN Standard – Ethernet, Token Bus/Ring, FDDI…
• Switch – space/time multiplexing at higher HW cost/complexity
– Private switches for multiprocessor systems – cross-bar…
– Public switches – ISDN, SMDS, ATM
• LAN, MAN, WAN
• Ratio of propagation delay to transmission delay
– LAN: small; components are close together, so LANs are more suitable for distributed processing
– MAN/WAN: large; more communication-oriented
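
The ratio can be made concrete as a = t_prop / t_trans, where t_prop = distance / signal speed and t_trans = frame size / bandwidth. The numbers below are assumptions chosen only for illustration: a signal speed of 2 × 10^8 m/s, one 1500-byte frame, a 100 Mb/s link, a 100 m LAN, and a 4000 km WAN link.

```python
SIGNAL_SPEED = 2e8  # m/s; assumed propagation speed in copper or fiber (about 2/3 c)


def delay_ratio(distance_m, frame_bits, bandwidth_bps):
    """Return a = propagation delay / transmission delay."""
    propagation = distance_m / SIGNAL_SPEED
    transmission = frame_bits / bandwidth_bps
    return propagation / transmission


frame = 1500 * 8                            # one 1500-byte frame, in bits
lan = delay_ratio(100, frame, 100e6)        # 100 m LAN at 100 Mb/s
wan = delay_ratio(4_000_000, frame, 100e6)  # 4000 km WAN link at 100 Mb/s
print(f"LAN a = {lan:.4f}, WAN a = {wan:.1f}")  # LAN a = 0.0042, WAN a = 166.7
```

With a much less than 1, the first bit reaches the receiver long before the sender finishes the frame, so nodes see each other's activity almost immediately; with a much greater than 1, many frames are in flight at once, and a protocol must keep the pipe full to use the link well.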

WAN, MAN, LAN
[Figure: WAN, MAN, and LAN interconnection with point-to-point links]

Communication Network Protocols
• Communication Protocol: set of rules that regulate the
exchange of messages to provide a reliable and orderly flow
of information among communicating processes
• Connection-oriented communication service – Phone
– Need explicit set up of a connection channel before communication
– Messages are delivered reliably and in sequence
– Virtual circuit (logical) or circuit switching (physical)
• Connectionless communication service – postal service
– No initial connection establishment is necessary
– Messages are delivered on a best-effort basis in timing and route
and may arrive in arbitrary order
– Datagram (logical) or packet switching (physical)

OSI Protocol Suite
• Seven-layer protocol suite
• OSI focuses on interconnecting computers
• A process communicates with a remote process by passing
data through the seven layers, then the physical network,
and finally through the remote layers in reverse order
– Segmenting/reassembling
– Transparency between layers – encapsulation
• Add header for protocol data unit (PDU) from upper layer
• The corresponding remote layer strips off the header
• A gateway or intermediate node only stores and forwards messages at the three lower, network-dependent layers
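
The encapsulation rule above (each layer adds a header to the PDU from the layer above, and the peer layer strips it) can be sketched as follows. The bracketed text headers are a hypothetical stand-in for real binary protocol headers, and segmentation is omitted.

```python
LAYERS = ["application", "presentation", "session", "transport",
          "network", "data link"]


def encapsulate(data: bytes) -> bytes:
    # Going down the stack, each layer prefixes its own header to the PDU
    # handed down from the layer above (the physical layer just sends bits).
    pdu = data
    for layer in LAYERS:
        pdu = f"[{layer}]".encode() + pdu
    return pdu


def decapsulate(frame: bytes) -> bytes:
    # Going up the remote stack, each peer layer strips the header its
    # counterpart added, outermost (data link) first.
    pdu = frame
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not pdu.startswith(header):
            raise ValueError(f"missing {layer} header")
        pdu = pdu[len(header):]
    return pdu


frame = encapsulate(b"hello")
print(frame)  # b'[data link][network][transport][session][presentation][application]hello'
print(decapsulate(frame))  # b'hello'
```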

OSI Protocol Suite (Cont.)
[Figure: two end hosts, each with the seven layers Application, Presentation, Session, Transport, Network, Data Link, and Physical, communicating layer by layer through peer-to-peer protocols; an intermediate node between them implements only the Network, Data Link, and Physical layers; the nodes are joined by communication links]

OSI Protocol Suite (Cont.) – Physical Layer
• Specify the electrical and mechanical characteristics of the physical
communication link – standardize
– Coding method, modulation technique, wire/connector specification
– Sharing of common bus needs interface standards for the medium access
control in the data link layer
• Reliable mapping of signals to bits – need bit synchronization
• Bit synchronization
– Detection of the beginning of a bit and a sequence of bits
– Bit synchronous: large blocks of bits transmitted at a regular rate
• Offer higher data transfer speed and better link utilization
– Character asynchronous: small fixed-size bit sequences transmitted
asynchronously
• Low-speed character-oriented terminals
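
Character-asynchronous transmission is often done with start/stop framing. The sketch below assumes the common convention of one start bit (0), eight data bits sent least-significant bit first, and one stop bit (1), with no parity. Each character carries its own synchronization, which suits low-speed terminals but spends 2 of every 10 bits on framing. The function names are hypothetical.

```python
def frame_char(byte: int) -> list[int]:
    """Frame one character: start bit 0, 8 data bits (LSB first), stop bit 1."""
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]


def deframe_char(bits: list[int]) -> int:
    # The receiver resynchronizes on the start bit, samples exactly 8 data
    # bits, and checks that the stop bit is present.
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


bits = frame_char(ord("A"))
print(bits)                     # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
print(chr(deframe_char(bits)))  # A
```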

OSI Protocol Suite (Cont.) – Data Link Control (DLC) Layer
• Ensure reliable data transfer of groups of bits (frames)
• Configuration setup
– Establishment and termination of a connection
– Full- or half-duplex, synchronous or asynchronous connection?
• Error controls
– Transmission errors and loss or replication of data frames
– Detected by checksum or time-out mechanisms
– Recovered by retransmissions or forward error corrections
• Sequencing
– Maintain an orderly delivery of frames by sequence numbers
– Sequence number can assist error control and flow control of data frames
• Flow control of data frames
– Permit the transmission of a frame only if it falls into an allowed window of buffers at the sender and the receiver
• Multipoint configuration: DLC sublayer – MAC sublayer – Physical layer
– Resolve the access contention of the multiple access channel
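
Error control, sequencing, and window-based flow control can be combined in a small simulation. This sketch uses Go-Back-N, one standard sliding-window scheme chosen here for illustration, with CRC-32 (zlib.crc32) as the checksum. The channel model (each listed frame damaged exactly once by a single bit flip, acknowledgments never lost, non-empty payloads) is an assumption that keeps the example deterministic.

```python
import zlib


def make_frame(seq: int, payload: bytes) -> tuple[int, bytes, int]:
    # A frame carries a sequence number, the payload, and a CRC-32 checksum.
    return seq, payload, zlib.crc32(payload)


def go_back_n(payloads: list[bytes], window: int, corrupt_once: set[int]) -> list[bytes]:
    """Simulate Go-Back-N over a link that corrupts each listed frame once."""
    pending = set(corrupt_once)  # frames the simulated link will still damage
    delivered: list[bytes] = []
    expected = 0  # receiver: next in-sequence frame number
    base = 0      # sender: oldest unacknowledged frame
    while base < len(payloads):
        # Flow control: send only the frames that fall inside the window.
        for seq in range(base, min(base + window, len(payloads))):
            s, data, crc = make_frame(seq, payloads[seq])
            if s in pending:
                pending.discard(s)
                data = bytes([data[0] ^ 0x01]) + data[1:]  # single-bit error
            # Receiver: the checksum detects errors, and sequencing rejects
            # any frame that is not the next one expected.
            if zlib.crc32(data) == crc and s == expected:
                delivered.append(data)
                expected += 1
        # Cumulative ACK: the window slides to the receiver's position, and
        # every frame after a damaged one is sent again ("go back N").
        base = expected
    return delivered


frames = [b"f0", b"f1", b"f2", b"f3", b"f4"]
result = go_back_n(frames, window=3, corrupt_once={1})
print(result)  # [b'f0', b'f1', b'f2', b'f3', b'f4']
```

Frame f1 is damaged on its first trip, so the receiver also rejects f2 as out of sequence; both are resent when the window slides back to f1.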

OSI Protocol Suite (Cont.) – Network Layer
• Address issues of sending packets across the network
through several link segments
• Routing function
– Which link should be selected for forwarding a packet, based on its
destination address
– Static or dynamic routing; centralized or distributed
– Routing decision can be made when a connection is requested and being established (connection-oriented), or on a packet-by-packet basis (connectionless, multiple-path routing)
• Error, sequencing, and flow control function
– Reassemble packets and discard duplicate ones
– Congestion control at heavily used (favorable) routing nodes
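
The routing function can be illustrated with Dijkstra's shortest-path algorithm, one common basis for distributed, dynamic (link-state) routing; it is swapped in here as an example, not taken from the slides. The sketch builds a destination-to-next-hop table for one node, then recomputes it after a link failure. The topology and link costs are hypothetical.

```python
import heapq


def routing_table(graph: dict[str, dict[str, int]], source: str) -> dict[str, str]:
    """Dijkstra's shortest paths from `source`, returned as destination -> next hop."""
    dist = {source: 0}
    next_hop: dict[str, str] = {}
    heap = [(0, source, source)]  # (distance, node, first hop on the path)
    visited = set()
    while heap:
        d, node, hop = heapq.heappop(heap)
        if node in visited:
            continue  # stale entry: a shorter path was already settled
        visited.add(node)
        if node != source:
            next_hop[node] = hop
        for neighbor, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                # Leaving the source, the first hop is the neighbor itself;
                # otherwise it is inherited from the path to `node`.
                first = neighbor if node == source else hop
                heapq.heappush(heap, (nd, neighbor, first))
    return next_hop


# Hypothetical four-node network with symmetric link costs.
links = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
print(routing_table(links, "A"))  # {'B': 'B', 'C': 'B', 'D': 'B'}

# Dynamic routing: if link A-B fails, recompute and traffic shifts to C.
failed = {n: {m: c for m, c in nbrs.items() if {n, m} != {"A", "B"}}
          for n, nbrs in links.items()}
print(routing_table(failed, "A"))  # {'C': 'C', 'D': 'C', 'B': 'C'}
```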

OSI Protocol Suite (Cont.) – Transport Layer
• The most important layer from the OS view
– The only interface between the communication sub-network layers and the network-independent layers
• Provide reliable end-to-end communication between peer processes
– All network-dependent faults or problems are to be shielded from the
communicating processes
– Message packets (breaking/reassembling)
– Multiple sessions can be multiplexed on one transport connection
– One session may occupy multiple transport connections
– Five classes (TP0 to TP4) of transport services to support sessions
• Depend on application and network quality
• TP4: multiplexing, error detection, and retransmission
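
A rough sketch of breaking a message into numbered packets and reassembling it at the receiving end, discarding duplicates and tolerating out-of-order arrival. The packet size, the fixed shuffle seed, and the function names are assumptions for illustration; retransmission of missing packets is left out.

```python
import random


def segment(message: bytes, size: int) -> list[tuple[int, bytes]]:
    # Break the message into numbered packets of at most `size` bytes.
    return [(i, message[off:off + size])
            for i, off in enumerate(range(0, len(message), size))]


def reassemble(packets: list[tuple[int, bytes]], count: int) -> bytes:
    # Packets may arrive out of order or duplicated; keep one copy per
    # sequence number and rebuild the message once all have arrived.
    received: dict[int, bytes] = {}
    for seq, chunk in packets:
        received.setdefault(seq, chunk)  # later duplicates are discarded
    missing = set(range(count)) - set(received)
    if missing:
        raise ValueError(f"missing packets {sorted(missing)}")
    return b"".join(received[i] for i in range(count))


msg = b"distributed systems need reliable transport"
pkts = segment(msg, 8)
arrival = pkts + pkts[:2]          # the network duplicated two packets
random.Random(1).shuffle(arrival)  # and delivered everything out of order
print(reassemble(arrival, len(pkts)) == msg)  # True
```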

OSI Protocol Suite (Cont.) – Session, Presentation, Application Layers
• Session layer: adds dialog and synchronization services to the transport layer
– Dialog: establishment of sessions
– Synchronization: allow processes to insert checkpoints for efficient
recovery from system crashes
• Presentation layer: data encryption, compression, and code
conversion for messages that use different coding schemes
• Application layer: standard is completely left to the designer
of the application

TCP/IP Protocol Suite
• Address inter-process and inter-node communication
– How is communication between a pair of processes maintained?
• Transport layer → TCP (comparable to TP4 in OSI)
• Connection-oriented (TCP) or connectionless (UDP)
– How are messages routed through the network nodes?
• Network layer → IP (a little more than the OSI network layer)
• Virtual circuit or datagram
• TCP/IP focuses on interconnecting networks
• Any combination of (TCP, UDP) × (virtual circuit, datagram IP)
– Shift burden of maintaining reliable communication from network to OS
• Port and Socket (more in Chapter 4)
– Port: inter-process communication endpoints
– Socket: interface to port
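
To make ports and sockets concrete, here is a sketch using Python's standard socket module (a BSD-style socket interface) on the loopback address. The TCP socket must connect before data flows, matching the connection-oriented service; the UDP socket simply addresses each datagram. Binding to port 0 asks the OS for any free port, and the message contents are arbitrary.

```python
import socket

HOST = "127.0.0.1"  # loopback only; binding to port 0 lets the OS choose a free port

# Connection-oriented (TCP): a connection is set up before any data flows.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind((HOST, 0))
listener.listen(1)
client = socket.create_connection(listener.getsockname(), timeout=5)
server, _ = listener.accept()
server.settimeout(5)

payload = b"over a virtual circuit"
client.sendall(payload)
tcp_data = b""
while len(tcp_data) < len(payload):  # TCP is a byte stream: read until complete
    chunk = server.recv(1024)
    if not chunk:
        break  # peer closed the connection early
    tcp_data += chunk

# Connectionless (UDP): no setup; each datagram carries its destination address.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind((HOST, 0))
receiver.settimeout(5)
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"as a datagram", receiver.getsockname())
udp_data, _ = receiver.recvfrom(1024)

print(tcp_data, udp_data)  # b'over a virtual circuit' b'as a datagram'

for s in (client, server, listener, sender, receiver):
    s.close()
```

Each (address, port) pair returned by getsockname() is the endpoint the slide calls a port; the socket object is the process's interface to it.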

TCP/IP Protocol Suite (Cont.)
[Figure: application processes on two hosts communicate through peer-to-peer protocols; each host has a transport layer, an internet layer, and a data link and physical layer, and a gateway between them implements the internet layer over two data link and physical layers; the data unit is a message between application processes, a packet at the transport layer, a datagram at the internet layer, and a frame in bits on the link]

Major Design Issues
• A distributed system consists of concurrent processes accessing distributed resources (which may be shared or replicated) through message passing in a network environment that may be unreliable and contain untrusted components
– How to model and identify objects
– How to coordinate the interaction among objects
– How to achieve communication among objects
– How to manage shared or replicated objects
– How to protect objects and system security
• How to support transparency

Major Design Issues – Object Models and Naming Schemes
• Objects: processes, data files, memory, devices,
processors, networks
• Assume all objects can be represented uniformly
– An object is represented abstractly by the allowable operations
– The physical details of the object are transparent to other objects
• To identify a server:
– By name - map name to logical address
– Physical or logical address - done by network service, port for logical
– By service - needed by CAS
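
Identifying a server by name or by service can be sketched with a small name server. This is a minimal illustration, assuming a single in-memory table; the NameServer class, server names, and addresses below are hypothetical.

```python
class NameServer:
    """Maps logical names to addresses and records which services each name offers."""

    def __init__(self):
        self._addresses = {}  # logical name -> (host, port)
        self._services = {}   # service type -> set of logical names

    def register(self, name, address, services):
        self._addresses[name] = address
        for service in services:
            self._services.setdefault(service, set()).add(name)

    def resolve(self, name):
        # Identify a server by name: map the name to its current address.
        return self._addresses[name]

    def find_by_service(self, service):
        # Identify servers by the service they provide, not by who they are.
        return sorted(self._services.get(service, set()))


ns = NameServer()
ns.register("fs1", ("10.0.0.5", 7001), ["file"])
ns.register("fs2", ("10.0.0.6", 7001), ["file", "print"])
print(ns.resolve("fs1"))           # ('10.0.0.5', 7001)
print(ns.find_by_service("file"))  # ['fs1', 'fs2']
```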

Major Design Issues – Distributed Coordination
• Coordinate interacting concurrent processes to achieve synchronization
• Requirements
– Barrier synchronization – a set of processes (or events) must reach a
common synchronization point before they can continue
– Condition coordination – a set of processes (or events) must wait for a condition set asynchronously by other processes, to maintain some ordering of execution
– Mutual exclusion - concurrent processes must have mutual exclusion when
accessing a critical shared resource
• Need knowledge of state information about other processes
– Through messages → inaccurate or incomplete (unreliable network)
– Centralized coordinator (leader election) or distributed resolution
• Deadlock handling – detect and recover
• Assimilate partial global state information and use it for decision making
– Exchange local knowledge among cooperating sites
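
Mutual exclusion through a centralized coordinator can be sketched as below, assuming the coordinator has already been chosen (for example by leader election) and that messages are reliable. Requests are granted in arrival order; the Coordinator class and its GRANT/QUEUED replies are hypothetical.

```python
from collections import deque


class Coordinator:
    """Centralized mutual exclusion: grants the critical resource in request order."""

    def __init__(self):
        self.holder = None
        self.waiting = deque()

    def request(self, pid):
        # Grant at once if the resource is free; otherwise queue the request
        # (the requester stays blocked until it is granted later).
        if self.holder is None:
            self.holder = pid
            return "GRANT"
        self.waiting.append(pid)
        return "QUEUED"

    def release(self, pid):
        if pid != self.holder:
            raise ValueError(f"process {pid} does not hold the resource")
        # Hand the resource to the next waiting process, if any.
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


c = Coordinator()
print(c.request(1), c.request(2), c.request(3))  # GRANT QUEUED QUEUED
print(c.release(1))  # 2
print(c.release(2))  # 3
```

The single coordinator keeps the protocol simple (three messages per entry) but is a single point of failure, which is why it is paired with leader election.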

Major Design Issues (Cont.)
• IPC - Use high-level methods for transparency in communication
– Message passing – low level and physical
– Client/Server Model - system interactions through message exchanges:
request/reply
– RPC - request/reply like procedure call, built on top of client/server model
• RPC assumes point to point, but need groups (multicast, broadcast)
• Distributed Resources - data processing capacity
– Multiprocessor scheduling - static load distribution vs. dynamic load sharing
• Process migration, real-time scheduling
– Distributed file system and distributed shared memory
• Sharing and replication of data
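
The RPC idea (a request/reply exchange dressed up as a procedure call) can be sketched with a client stub that marshals arguments into a request message and unmarshals the reply. JSON marshalling, the ClientStub and server_dispatch names, and the direct function call standing in for the network are all assumptions for illustration; a real RPC system would also handle lost messages, timeouts, and binding to a server.

```python
import json


# Server side: the procedures a hypothetical server exports.
def add(a, b):
    return a + b


PROCEDURES = {"add": add}


def server_dispatch(request: str) -> str:
    # Unmarshal the request, call the procedure, and marshal the reply.
    call = json.loads(request)
    try:
        result = PROCEDURES[call["proc"]](*call["args"])
        return json.dumps({"id": call["id"], "result": result})
    except Exception as exc:  # report failures back to the caller
        return json.dumps({"id": call["id"], "error": repr(exc)})


class ClientStub:
    """Makes a remote call look like a local procedure call."""

    def __init__(self, transport):
        self._transport = transport  # sends a request message, returns the reply
        self._next_id = 0

    def call(self, proc, *args):
        self._next_id += 1
        request = json.dumps({"id": self._next_id, "proc": proc, "args": list(args)})
        reply = json.loads(self._transport(request))
        if "error" in reply:
            raise RuntimeError(reply["error"])
        return reply["result"]


stub = ClientStub(transport=server_dispatch)  # stand-in for a network round trip
print(stub.call("add", 2, 3))  # 5
```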

Major Design Issues (Cont.)
• Fault tolerance and security
– Failure - unintentional intrusion - redundancy alleviates it
– Security violation - intentional intrusion - need secure communication
processes, integrity of messages
– Need to authenticate clients/servers and messages
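
Message integrity and authentication are commonly provided with a message authentication code. This sketch uses HMAC-SHA256 from Python's standard hmac and hashlib modules, assuming the two parties already share a secret key (key distribution is out of scope, and the key value is a placeholder). It detects tampering but does not encrypt the message.

```python
import hashlib
import hmac

SHARED_KEY = b"example-key-distributed-out-of-band"  # hypothetical pre-shared key


def sign(message: bytes, key: bytes = SHARED_KEY) -> bytes:
    # Sender attaches an HMAC-SHA256 tag computed over the message.
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(message: bytes, tag: bytes, key: bytes = SHARED_KEY) -> bool:
    # Receiver recomputes the tag; compare_digest avoids timing side channels.
    return hmac.compare_digest(sign(message, key), tag)


msg = b"transfer 100 to account 42"
tag = sign(msg)
print(verify(msg, tag))                            # True
print(verify(b"transfer 900 to account 42", tag))  # False: message was altered
```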

Distributed Computing Environment (DCE)
• Proposed by Open Software Foundation (OSF)
– Develop and standardize an open Unix environment that is free from
the influence of AT&T and Sun
• DCE: an integrated package of software and tools for developing distributed applications on an existing OS
• Hierarchically layered architecture

DCE Architecture
[Figure: DCE architecture]