Introduction

Distributed Systems COEN 317
Introduction
Chapter 1,2,3
JoAnne Holliday
Email: [email protected] (best way to reach me)
Office: Engineering 247, (408) 551-1941
Office Hours: TW 3:00-4:30 and by appointment
Class web page: http://www.cse.scu.edu/~jholliday/
Textbook: Distributed Systems,
Principles and Paradigms
By Tanenbaum and van Steen
We will cover chapters 4-8 and parts of chapter 9.
Read chapter 1. Review chapter 2 as needed for
networks and chapter 3 as needed for threads and processes.
Chapter 1: Introduction
Chapter 2: Communication, Networking
Chapter 3: Processes
Definition of a Distributed System (1)
A distributed system is:
A collection of independent
computers that appears to its
users as a single coherent
system.
Definition of a Distributed System (2)
Figure 1.1: A distributed system organized as middleware.
Note that the middleware layer extends over multiple machines.
Threads (chapter 3)
Message propagation times are long. Send a
message and let one thread wait for the response
while another continues with its task.
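A minimal sketch of this idea in Python. The function `slow_request` is a hypothetical stand-in for a remote call, and the propagation delay is simulated with `time.sleep`. One thread waits for the response while the main thread continues with other work.

```python
import threading
import time

def slow_request(result, delay=0.2):
    """Hypothetical stand-in for a remote call; the sleep simulates propagation time."""
    time.sleep(delay)
    result["reply"] = "pong"

def run():
    result = {}
    waiter = threading.Thread(target=slow_request, args=(result,))
    waiter.start()                    # this thread waits for the response

    local_work = sum(range(1000))     # meanwhile the main thread continues with its task

    waiter.join()                     # pick up the response once it has arrived
    return local_work, result["reply"]
```

The main thread only blocks at `join()`, after its own work is finished.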
Distributed systems
“Distributed System” covers a wide range of
architectures from slightly more distributed
than a centralized system to a truly distributed
network of peers.
One Extreme: Centralized
Centralized: mainframe and dumb
terminals
All of the computation is done on the
mainframe. Each line or keystroke is sent
from the terminal to the mainframe.
Moving Towards Distribution
In a client-server system, the clients
are workstations or computers in
their own right and perform
computations and formatting of the
data.
However, the data and the application which
manipulates it ultimately reside on the server.
More Decentralization
In Distributed-with-Coordinator, the nodes or sites
depend on a coordinator node with extra knowledge
or processing abilities.
The coordinator might be used
only in case of failures or
other problems.
True Decentralization
A true distributed system has no distinguished node
that acts as a coordinator; all nodes or sites are
equals.
The nodes may choose
to elect one of their
own to act as a
temporary coordinator
or leader.
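A minimal sketch of electing a temporary leader among equals. The rule "highest ID wins" is an assumption chosen for illustration, and the sketch assumes every node already knows which nodes are alive; a real election protocol has to establish that by exchanging messages.

```python
def elect_leader(live_node_ids):
    """Pick the highest ID among the nodes believed to be alive (assumed rule)."""
    if not live_node_ids:
        raise ValueError("no live nodes to elect from")
    return max(live_node_ids)

nodes = {3, 5, 7}
leader = elect_leader(nodes)          # node 7 leads while it is alive
nodes.discard(leader)                 # the leader fails
new_leader = elect_leader(nodes)      # the remaining nodes elect node 5
```

Because every node applies the same rule to the same set, all nodes arrive at the same leader without any node being special in advance.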
Distributed Systems: Pro and Con
Some things that were difficult in a centralized system
become easier
– Doing tasks faster by doing them in parallel
– Avoiding a single point of failure (all eggs in one basket)
– Geographical distribution
Some things become more difficult
– Transaction commit
– Snapshots, time and causality
– Agreement (consensus)
Advantages of the True Distributed System
• No central server or coordinator means it is
scalable
• SDDS, Scalable Distributed Data Structures,
attempt to move distributed systems from a
small number of nodes to thousands of nodes
• We need scalable algorithms to operate on
these networks/structures
– For example peer-to-peer networks
Transparency in a Distributed System
Transparency | Description
Access | Hide differences in data representation and how a resource is accessed
Location | Hide where a resource is located
Migration | Hide that a resource may move to another location
Relocation | Hide that a resource may be moved to another location while in use
Replication | Hide that copies of a resource exist and a user might use different ones at different times
Concurrency | Hide that a resource may be shared by several competitive users
Failure | Hide the failure and recovery of a resource
Persistence | Hide whether a (software) resource is in memory or on disk
Important: location, migration (relocation), replication,
concurrency, failure.
Scalability
• Something is scalable if it “increases linearly with
size” where size is usually number of nodes or
distance.
• “X is scalable with the number of nodes”
• Every site (node) is directly connected to every other
site through a communication channel. Number of
channels is NOT scalable. For N sites there are N(N-1)/2
channels.
• Sites connected in a ring. # of channels IS scalable.
(N channels for N sites)
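The two topologies can be compared with a few lines of Python. This is an illustrative sketch and the function names are made up for the example. A fully connected network needs one channel per pair of sites, which is N(N-1)/2 and grows quadratically; a ring needs N.

```python
def full_mesh_channels(n):
    """Channels when every site has a direct link to every other site: n(n-1)/2."""
    return n * (n - 1) // 2

def ring_channels(n):
    """Channels when the sites form a ring: one per site (for n >= 3)."""
    return n

for n in (4, 16, 64, 256):
    print(n, full_mesh_channels(n), ring_channels(n))
```

For 256 sites the full mesh needs 32,640 channels, while the ring needs 256.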
Scalability Problems
Concept | Example
Centralized services | A single server for all users
Centralized data | A single on-line telephone book
Centralized algorithms | Doing routing based on complete information
Examples of scalability limitations.
Scaling Techniques (1)
Figure 1.4: The difference between letting
a) a server or
b) a client check forms as they are being filled in.
Scaling Techniques (2)
Figure 1.5: An example of dividing the DNS name space into zones.
Characteristics of Scalable
Distributed Algorithms
• No machine (node, site) has complete
information about the system state.
• Sites make decisions based only on local
information.
• Failure of one site does not ruin the algorithm.
• There is no implicit assumption that a global
clock exists.
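A minimal sketch of an algorithm with these properties, assuming sites arranged in a ring. Each site repeatedly replaces its value with the average of its own value and its two neighbors' values. No site ever sees the whole system state, yet all sites approach the global average. The rounds run in lockstep here only to keep the simulation short; that is a simplification of the sketch, not a requirement of the idea.

```python
def local_average_step(values):
    """One round: each site averages its own value with its two ring neighbors."""
    n = len(values)
    return [
        (values[(i - 1) % n] + values[i] + values[(i + 1) % n]) / 3
        for i in range(n)
    ]

def run_rounds(values, rounds):
    """Apply the local averaging step `rounds` times."""
    for _ in range(rounds):
        values = local_average_step(values)
    return values

final = run_rounds([10.0, 0.0, 0.0, 0.0, 0.0], 50)
```

Each step uses only a site's own value and its neighbors' values, so the decision at every site is based on local information.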
Homogeneous and tightly coupled vs
heterogeneous and loosely coupled
We will study heterogeneous and
loosely coupled systems.
Multiprocessors (1)
Figure 1.7: A bus-based multiprocessor.
Multiprocessors (2)
Figure 1.8:
a) A crossbar switch
b) An omega switching network
Homogeneous Multicomputer Systems
Figure 1-9:
a) Grid
b) Hypercube: 2^N nodes, each of degree N
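A short sketch of the hypercube structure, assuming nodes are labeled with the integers 0 to 2^N - 1. Two nodes are neighbors when their labels differ in exactly one bit, so each node has N neighbors. The function name is made up for the example.

```python
def hypercube_neighbors(node, dim):
    """Neighbors of `node` in a dim-dimensional hypercube: flip each of the dim bits."""
    return [node ^ (1 << bit) for bit in range(dim)]

dim = 3
nodes = list(range(2 ** dim))                   # 2^N nodes for dimension N
degrees = [len(hypercube_neighbors(v, dim)) for v in nodes]
```

For N = 3 this gives 8 nodes, each of degree 3.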
Software Concepts
System | Description | Main Goal
DOS | Tightly-coupled operating system for multiprocessors and homogeneous multicomputers | Hide and manage hardware resources
NOS | Loosely-coupled operating system for heterogeneous multicomputers (LAN and WAN) | Offer local services to remote clients
Middleware | Additional layer atop NOS implementing general-purpose services | Provide distribution transparency
• DOS (Distributed Operating Systems)
• NOS (Network Operating Systems)
• Middleware
Uniprocessor Operating Systems
Figure 1.11: Separating applications from operating system code through
a microkernel.
Distributed Operating Systems
Figure 1.14
May share memory or other resources.
Network Operating System
Figure 1-19: General structure of a network operating system.
Middleware based Distributed System
Figure 1-22: General structure of a distributed system as middleware.
Middleware and Openness
Figure 1.23: In an open middleware-based distributed system, the protocols
used by each middleware layer should be the same, as well as
the interfaces they offer to applications.
Comparison between Systems
Item | Distributed OS (Multiproc.) | Distributed OS (Multicomp.) | Network OS | Middleware-based OS
Degree of transparency | Very High | High | Low | High
Same OS on all nodes | Yes | Yes | No | No
Number of copies of OS | 1 | N | N | N
Basis for communication | Shared memory | Messages | Files | Model specific
Resource management | Global, central | Global, distributed | Per node | Per node
Scalability | No | Moderately | Yes | Varies
Openness | Closed | Closed | Open | Open
A comparison between multiprocessor operating systems,
multicomputer operating systems, network operating
systems, and middleware based distributed systems.
Modern Architectures
Figure 1-31: An example of horizontal distribution of a Web service.
Two meanings of synchronous and
asynchronous communications
• Synchronous communication originally means that a process
blocks after sending a message until the answer arrives, or
blocks at a receive until a message arrives.
• Sync and async have also come to describe the
communication channels with which they are used.
• Synchronous: message transit time is short and
bounded. If a site does not respond within x seconds, it can be
declared dead. Simplifies algorithms!
• Asynchronous: message transit time is unbounded. If a
message is not received in a given time interval, the sender
could just be slow.
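A minimal sketch of timeout-based failure detection, with a hypothetical `site` function standing in for a remote node and `time.sleep` simulating message transit time. With a bounded transit time, a timeout larger than the bound gives a reliable verdict. Without a bound, the same timeout can wrongly suspect a site that is only slow.

```python
import queue
import threading
import time

def site(reply_queue, delay):
    """Hypothetical remote site that answers after `delay` seconds."""
    time.sleep(delay)
    reply_queue.put("alive")

def probe(delay, timeout):
    """Return 'alive' if a reply arrives within `timeout` seconds, else 'suspected'."""
    replies = queue.Queue()
    threading.Thread(target=site, args=(replies, delay), daemon=True).start()
    try:
        return replies.get(timeout=timeout)
    except queue.Empty:
        return "suspected"

fast = probe(delay=0.01, timeout=0.5)   # reply arrives well within the bound
slow = probe(delay=0.5, timeout=0.05)   # site is alive but slow, so it is suspected
```

The second probe shows the asynchronous case: the site has not failed, yet the timeout alone cannot tell a slow site from a dead one.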
What makes Distributed Systems
Difficult?
• Asynchrony – even “synchronous” systems
have time lag.
• Limited local knowledge – algorithms can
consider only information acquired locally.
• Failures – parts of the distributed system can
fail independently leaving some nodes
operational and some not.
Example: Byzantine Agreement
Introduced as a voting problem (Lamport, Shostak, Pease ’82)
A and B can defeat the enemy iff both attack
A sends message to B: Attack at Noon!
(Figure: General A, General B, The Enemy)
Byzantine Agreement
Impossible with unreliable networks
Possible given some guarantees of reliability
– Guaranteed delivery within bounded time
– Limitations on corruption of messages
– Probabilistic guarantees (send multiple messages)
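A small sketch of the probabilistic guarantee, assuming each copy of a message is lost independently with the same probability. The function name and the independence assumption are illustrative. Sending k copies makes the loss of all of them unlikely, but never impossible.

```python
def delivery_probability(loss_rate, copies):
    """Probability that at least one of `copies` independent sends gets through."""
    return 1 - loss_rate ** copies

for copies in (1, 2, 3, 5):
    print(copies, delivery_probability(0.1, copies))
```

With a 10% loss rate, three copies give a delivery probability of about 0.999. For any nonzero loss rate the result stays below 1, which is why the guarantee is only probabilistic.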