Distributed systems
Advanced Operating Systems
Lecture 9: Distributed Systems (introduction)
University of Tehran
Dept. of EE and Computer Engineering
By: Dr. Nasser Yazdani
Univ. of Tehran
Distributed Operating Systems
1
Covered topic
Distributed systems: why and how
References
Chapter 1 of the textbook
Outline
Why Distributed systems
Challenges.
Communication
Distributed Operating systems
Architectural models
Distributed System? (examples)
The Internet
A sensor network
Gnutella peer-to-peer system
Food web of Little Rock Lake, WI
Problems?
Bigger problems such as weather forecasting, economic modeling, scientific problems, etc.
Faster machines? It is getting harder to extract the performance modern applications require out of a single-processor machine
Some applications are inherently distributed, e.g. sensor networks
Too much data to store in one place
More efficient use of resources through sharing
Solution: Distributed computing
Distributed systems
Definitions
A collection of autonomous computers linked by a network, with software designed to produce an integrated computing facility
A distributed system is a collection of independent computers that appear to users as a single computer
A system in which hardware or software components located at networked computers communicate and coordinate their actions only by passing messages
Examples
World Wide Web
Automatic Teller Machines
Cell Phones
A working definition
A distributed system is a collection of entities, each of which is autonomous, programmable, asynchronous and failure-prone, and which communicate through an unreliable communication medium.
Our interest in distributed systems involves algorithmics, design and implementation, maintenance, and study
Advantages
Item: Description
Economics: Microprocessors offer a better price/performance than mainframes
Speed: A distributed system may have more total computing power than a mainframe
Inherent distribution: Some applications involve spatially separated machines
Reliability: If one machine crashes, the system as a whole can still survive
Incremental growth: Computing power can be added in small increments
Disadvantages
Item: Description
Software: Little software (operating systems, etc.) exists at present for distributed systems
Networking: The network can saturate or cause other problems
Security: Easy access also applies to secret data (privacy!)
A range of challenges
Failures (of nodes or Network)
Asynchrony
Scalability
Security
Consequences
Concurrency
Concurrency is the norm instead of the exception
Synchronization is critical
No Global Clock
There is a limit as to how accurate a global clock can be
Contrary to parallel systems
Independent Failures
The more stuff you add, the more likely something will break
Single system view says independent failures should not affect users
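One common way to order events without a global clock is a logical clock. The sketch below shows a Lamport logical clock; the slide does not name this technique, and the class and method names are illustrative assumptions.

```python
class LamportClock:
    """Logical clock: a counter that orders events without a global clock."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # A local event advances the counter by one.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time):
        # On receipt, move past both the local counter and the sender's stamp.
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()                    # local event on a
stamp = a.send()            # a sends a message carrying its timestamp
b_time = b.receive(stamp)   # b's clock jumps ahead of the send event
print(stamp, b_time)
```

The receive rule guarantees that a message is always timestamped later at the receiver than at the sender, even though the two machines never compare wall-clock time.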
Communication Issues
Building a system out of interconnected computers requires that some major issues be addressed
Independent failure
Unreliable communication
Insecure Communication
Costly Communication
Distributed Operating Systems
A distributed operating system supports the encapsulation and protection of resources inside servers, and it supports the mechanisms required to access these resources, including naming, communication and scheduling
The software for multiple-CPU systems can be divided into three rough classes
Network operating systems (file servers)
Distributed Operating Systems
Shared Memory Multiprocessors
Parallel Computing
A large collection of processing elements that can communicate and cooperate to solve large problems quickly
A form of information processing which uses concurrent events during execution
In other words, both the language and the hardware support concurrency
Parallel Architectures
Unlike traditional von Neumann machines, there is no single standard architecture used on parallel machines
In fact, dozens of different parallel architectures have been built and are being used
Several people have tried to classify the different types of parallel machines
The taxonomy proposed by Flynn is the most commonly used
Example: building a mail server
Mail arrives from the outside world
Store it until the user reads, deletes, or saves it
Solution:
One server with a disk to store mailboxes
Problems:
Performance: stable performance under high load
Consistency with respect to client-side copies
Concurrent mail arrival and deletion
Crash recovery (crash while updating a mailbox)
Availability
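Two of the problems listed above, concurrent arrival/deletion and a crash in the middle of a mailbox update, can be illustrated on a single server. The following is a minimal sketch, not the lecture's design: the `Mailbox` class, the JSON file format, and the file name are assumptions. A lock serialises updates, and each update writes a complete new file and then swaps it in with `os.replace`, so a crash should leave either the old or the new mailbox on disk rather than a half-written one.

```python
import json
import os
import tempfile
import threading


class Mailbox:
    """Hypothetical single-server mailbox stored as one JSON file."""

    def __init__(self, path):
        self.path = path
        self.lock = threading.Lock()  # serialises concurrent arrival and deletion

    def _load(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path, "r", encoding="utf-8") as f:
            return json.load(f)

    def _store(self, messages):
        # Write a complete new copy first, then swap it in as one step.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(messages, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def deliver(self, message):
        with self.lock:
            messages = self._load()
            messages.append(message)
            self._store(messages)

    def delete(self, index):
        with self.lock:
            messages = self._load()
            del messages[index]
            self._store(messages)

    def read(self):
        with self.lock:
            return self._load()


workdir = tempfile.mkdtemp()
box = Mailbox(os.path.join(workdir, "alice.json"))
threads = [threading.Thread(target=box.deliver, args=(f"msg {i}",)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
box.delete(0)
remaining = box.read()
print(len(remaining))
```

This addresses neither performance under load nor availability: the single server and its disk remain a bottleneck and a single point of failure, which is the motivation for distributing the service.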
Other problems?
Not necessarily plenty of bandwidth
Not necessarily low latency
Significant variance in latency and bandwidth
Frequent and unpredictable partial failure of channels
Lost messages, etc.
What else has changed?
We don't have hardware support for synchronization/atomicity among hosts
We don't have a global timer or clock
Frequent and unpredictable failure of some CPUs, I/O devices, etc.
Snoopy caches are not practical, because broadcasting is too expensive.
Challenges
There are a number of challenges found in building distributed systems
Heterogeneity
Openness
Security
Scalability
Failure Handling
Concurrency
Transparency
Heterogeneity
Applies to
Networks
Computer Hardware
Operating Systems
Programming Languages
Implementations
Middleware refers to a software layer that helps to handle heterogeneity
Openness
The characteristic that a system can be extended in various ways
Hardware extensions
Software extensions
Historically, computer systems were largely closed
UNIX broke the mold for operating systems
The IBM PC broke the mold for hardware
Security
Security is a huge issue in computing in general, but even more so in distributed computing
Communication
Distributed Resources
Infrastructure Attacks
Scalability
Distributed systems operate at many different scales
Two workstations and a file server
Department computers…
Often the more important question is not whether you can scale, but whether you can scale well
Consider the Internet
Failure Handling
What happens when a fault occurs?
Detect
Mask
Tolerate
Fault-tolerant design is based on two approaches
Hardware redundancy
Software recovery
Hardware Redundancy
Two computers are employed for a single application, one acting as a standby
A very costly, but often very effective, solution
Redundancy can be planned at a finer grain
Individual servers can be replicated
Redundant hardware can be used for noncritical activities when no faults are present
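The standby arrangement described above can be sketched as a client-side failover loop. This is an illustration under assumptions: the `Replica` class, the `call_with_failover` helper, and the use of `ConnectionError` to signal a crashed machine are all hypothetical names chosen for the example.

```python
class Replica:
    """Hypothetical server that may be down."""

    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


def call_with_failover(replicas, request):
    """Try the primary first, then each standby, masking individual failures."""
    last_error = None
    for replica in replicas:
        try:
            return replica.handle(request)
        except ConnectionError as err:
            last_error = err  # failure detected; fall through to the next replica
    raise RuntimeError("all replicas failed") from last_error


primary = Replica("primary", alive=False)
standby = Replica("standby")
result = call_with_failover([primary, standby], "GET /inbox")
print(result)
```

The caller sees a normal reply even though the primary has crashed, which is the "mask" step of failure handling; only when every replica is down does the failure become visible.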
Software Redundancy
Software must be designed so that the state of permanent data can be recovered or “rolled back” when a fault is detected
Transaction processing
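A rough sketch of the roll-back idea follows. It uses an in-memory dictionary as a stand-in for permanent data, and the `Store` class and the rule that negative values count as a fault are assumptions made for illustration only.

```python
import copy


class Store:
    """Hypothetical key-value store with all-or-nothing updates."""

    def __init__(self):
        self.data = {}

    def transaction(self, updates):
        snapshot = copy.deepcopy(self.data)  # state to return to on a fault
        try:
            for key, value in updates:
                if value < 0:
                    raise ValueError(f"invalid value for {key}")
                self.data[key] = value
        except Exception:
            self.data = snapshot  # roll back every partial update
            raise


store = Store()
store.transaction([("alice", 10), ("bob", 5)])
try:
    # The first update succeeds, the second faults, so both are undone.
    store.transaction([("alice", 0), ("bob", -15)])
except ValueError:
    pass
print(store.data)
```

A real transaction system would also have to survive a crash of the machine itself, typically by logging to stable storage; the snapshot here only covers faults detected while the process is still running.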
Concurrency
Concurrency in a distributed system does not necessarily mean concurrency within a single program
Many users invoke similar commands
Many different server processes may be running
Synchronization, of course, is a problem
Transparency
Transparency: Description
Access: Hide differences in data representation and how a resource is accessed
Location: Hide where a resource is located
Migration: Hide that a resource may move to another location
Relocation: Hide that a resource may be moved to another location while in use
Replication: Hide that a resource may have several copies
Concurrency: Hide that a resource may be shared by several competitive users
Failure: Hide the failure and recovery of a resource
Persistence: Hide whether a (software) resource is in memory or on disk
Scalability Problems
Concept: Example
Centralized services: A single server for all users
Centralized data: A single on-line telephone book
Centralized algorithms: Doing routing based on complete information
Examples of scalability limitations.
Scaling Techniques (1)
Figure 1.4: The difference between letting (a) a server or (b) a client check forms as they are being filled in.
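The effect of moving form checking to the client can be sketched by counting server requests. The field names, the validation rules, and the `Server` and `client_submit` names below are assumptions for illustration.

```python
def validate(form):
    """Return the names of fields that fail a basic check."""
    errors = []
    if "@" not in form.get("email", ""):
        errors.append("email")
    if not form.get("name", "").strip():
        errors.append("name")
    return errors


class Server:
    def __init__(self):
        self.requests = 0

    def submit(self, form):
        self.requests += 1     # every call here stands for a network round trip
        return validate(form)  # the server still re-checks what it receives


def client_submit(server, form):
    errors = validate(form)
    if errors:
        return errors          # rejected locally; the server is never contacted
    return server.submit(form)


server = Server()
bad = client_submit(server, {"name": "", "email": "nobody"})
good = client_submit(server, {"name": "Sara", "email": "sara@example.org"})
print(bad, good, server.requests)
```

Only the valid form reaches the server, so the work of rejecting bad input is spread over the clients instead of concentrating on one machine.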
Scaling Techniques (2)
Figure 1.5: An example of dividing the DNS name space into zones.
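The idea of splitting a name space into zones, each answering only for its own part and delegating the rest, can be sketched as follows. This is a simplified model rather than the real DNS protocol; the `Zone` class, the `resolve` function, and the example names and address are hypothetical.

```python
class Zone:
    """Hypothetical zone: answers for its own names, delegates the rest."""

    def __init__(self, name):
        self.name = name
        self.records = {}      # full name -> address, answered by this zone
        self.delegations = {}  # name suffix -> child Zone responsible for it


def resolve(root, name):
    """Walk from the root zone down to the zone that holds the record."""
    zone = root
    path = [zone.name]
    labels = name.split(".")
    while True:
        if name in zone.records:
            return zone.records[name], path
        # Look for the longest suffix of the name that this zone delegates.
        for i in range(len(labels)):
            suffix = ".".join(labels[i:])
            if suffix in zone.delegations:
                zone = zone.delegations[suffix]
                path.append(zone.name)
                break
        else:
            raise KeyError(name)


root = Zone(".")
org = Zone("org")
example = Zone("example.org")
root.delegations["org"] = org
org.delegations["example.org"] = example
example.records["www.example.org"] = "192.0.2.10"  # made-up documentation address

address, path = resolve(root, "www.example.org")
print(address, path)
```

No single zone holds the whole name space: the root only knows who is responsible for `org`, and so on down, which is what lets the naming service grow without a central table.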
Hardware Models
Figure 1.6: Different basic organizations and memories in distributed computer systems
Multiprocessors (1)
Figure 1.7: A bus-based multiprocessor.
Multiprocessors (2)
Figure 1.8: (a) A crossbar switch. (b) An omega switching network.
Homogeneous Multicomputer Systems
Figure 1-9: (a) Grid. (b) Hypercube.
Software Models
System: Description. Main goal
DOS: Tightly-coupled operating system for multiprocessors and homogeneous multicomputers. Main goal: hide and manage hardware resources
NOS: Loosely-coupled operating system for heterogeneous multicomputers (LAN and WAN). Main goal: offer local services to remote clients
Middleware: Additional layer atop NOS implementing general-purpose services. Main goal: provide distribution transparency
An overview of DOS (Distributed Operating Systems), NOS (Network Operating Systems), and middleware.
Uniprocessor Operating Systems
Figure 1.11: Separating applications from operating system code through a microkernel.
Multicomputer Operating Systems (1)
[Figure 1.14]
Multicomputer Operating Systems (2)
Figure 1.15: Alternatives for blocking and buffering in message passing.
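The difference between blocking and non-blocking sends on a bounded buffer can be tried out with Python's standard `queue.Queue`. This is a single-process sketch in which threads stand in for sender and receiver machines; the buffer size of 2 and the message names are arbitrary assumptions.

```python
import queue
import threading

channel = queue.Queue(maxsize=2)  # bounded buffer between sender and receiver

channel.put("m1")
channel.put("m2")                 # the buffer is now full

try:
    channel.put_nowait("m3")      # non-blocking send: fails instead of waiting
    overflowed = False
except queue.Full:
    overflowed = True

received = []


def receiver():
    for _ in range(3):
        received.append(channel.get())  # blocking receive: waits for a message


t = threading.Thread(target=receiver)
t.start()
channel.put("m3")                 # blocking send: waits until a slot is free
t.join()
print(overflowed, received)
```

With a blocking send the sender is slowed to the receiver's pace; with a non-blocking send the sender keeps running but must decide what to do when the buffer is full.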
Distributed Shared Memory Systems (1)
(a) Pages of address space distributed among four machines
(b) Situation after CPU 1 references page 10
(c) Situation if page 10 is read only and replication is used
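The three situations can be mimicked with a small page table. This is a toy model: the `DSM` class is hypothetical, and the initial placement of page 10 on machine 2 is an assumption, since the figure itself is not reproduced here.

```python
class DSM:
    """Toy page table for distributed shared memory."""

    def __init__(self, placement, read_only=()):
        # placement maps page number -> the machine that initially holds it
        self.copies = {page: {machine} for page, machine in placement.items()}
        self.read_only = set(read_only)

    def reference(self, machine, page):
        holders = self.copies[page]
        if machine in holders:
            return "hit"                 # page is already local
        if page in self.read_only:
            holders.add(machine)         # read-only pages can be copied
            return "replicated"
        self.copies[page] = {machine}    # writable pages move to the requester
        return "migrated"


# Assumed placement: page 10 starts on machine 2.
writable = DSM({10: 2})
first = writable.reference(1, 10)
second = writable.reference(1, 10)

shared = DSM({10: 2}, read_only=[10])
third = shared.reference(1, 10)
print(first, second, third, sorted(shared.copies[10]))
```

Migration leaves exactly one copy, so the previous holder will fault if it touches the page again; replication of read-only pages avoids that because no copy can become stale.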
Distributed Shared Memory Systems (2)
Figure 1.18: False sharing of a page between two independent processes.
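False sharing can be made concrete by counting how often a page has to move between machines. The sketch below assumes a 4096-byte page, a single-owner (migrating) page policy, and two made-up variable addresses; none of these values come from the slide.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def page_of(address):
    return address // PAGE_SIZE


def count_transfers(writes):
    """Count page migrations for a sequence of (machine, address) writes."""
    owner = {}
    transfers = 0
    for machine, address in writes:
        page = page_of(address)
        if page in owner and owner[page] != machine:
            transfers += 1  # the page must be shipped to the writing machine
        owner[page] = machine
    return transfers


# Two independent variables: machine 1 writes address 0, machine 2 writes address 8.
same_page = [(m, addr) for _ in range(3) for m, addr in ((1, 0), (2, 8))]
# The same access pattern with the second variable placed on its own page.
separate = [(m, addr) for _ in range(3) for m, addr in ((1, 0), (2, 4096))]

print(count_transfers(same_page), count_transfers(separate))
```

The two processes never touch each other's data, yet when both variables sit on one page every write forces a transfer; placing them on separate pages removes the traffic entirely.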
Network Operating System (1)
Figure 1-19: General structure of a network operating system.
Positioning Middleware
Figure 1-22: General structure of a distributed system as middleware.
Software Layers
Applications, services
Middleware
Operating system
Platform
Computer and network hardware
Middleware
What does it do?
Provides an API for the application
Hides the underlying heterogeneity
Examples
Sun RPC, ISIS
CORBA
RMI
DCOM
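One concrete way middleware hides heterogeneity is by marshalling arguments into an agreed wire format, so that machines with different native byte orders still understand each other. The sketch below uses Python's standard `struct` module with network byte order; the message layout and the `marshal`/`unmarshal` names are assumptions and do not correspond to the format of any of the systems listed above.

```python
import struct

# "!" selects network (big-endian) byte order with no padding:
# I = 32-bit unsigned request id, d = 64-bit float, H = 16-bit text length.
HEADER = "!IdH"


def marshal(request_id, value, text):
    payload = text.encode("utf-8")
    return struct.pack(HEADER, request_id, value, len(payload)) + payload


def unmarshal(data):
    size = struct.calcsize(HEADER)
    request_id, value, length = struct.unpack(HEADER, data[:size])
    text = data[size:size + length].decode("utf-8")
    return request_id, value, text


wire = marshal(7, 2.5, "balance")
decoded = unmarshal(wire)
print(wire[:4], decoded)
```

The application only sees `marshal` and `unmarshal`; the byte order and field sizes of the sending and receiving hardware never appear in its code.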
Middleware and Openness
Figure 1.23: In an open middleware-based distributed system, the protocols used by each middleware layer should be the same, as well as the interfaces they offer to applications.
Comparison between Systems
Item                     Distributed OS    Distributed OS       Network OS  Middleware-based OS
                         (multiproc.)      (multicomp.)
Degree of transparency   Very high         High                 Low         High
Same OS on all nodes     Yes               Yes                  No          No
Number of copies of OS   1                 N                    N           N
Basis for communication  Shared memory     Messages             Files       Model specific
Resource management      Global, central   Global, distributed  Per node    Per node
Scalability              No                Moderately           Yes         Varies
Openness                 Closed            Closed               Open        Open
Next Lecture
DS Architecture
References
Chapter 2 of the book
The Anatomy of the Grid
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications.