Slides - Poznańskie Centrum Superkomputerowo-Sieciowe (Poznań Supercomputing and Networking Center)


e-mail [email protected]
http://www.man.poznan.pl/
1
POZNAŃ SUPERCOMPUTING AND NETWORKING CENTER
Homogeneous and heterogeneous environments
• Homogeneous environment:
• uniform
• components share the same values and characteristics
• scalable
• Heterogeneous environment:
• diverse components
• different operating systems
• different architectures
• different vendors
• a varied set of parameters and characteristics
• scalable
• harder to manage
2
POZNAŃ SUPERCOMPUTING AND NETWORKING CENTER
Resources
• processor (CPU, type)
• clock frequency (different CPU boards)
• type, e.g. scalar, vector, graphics
• RAM (type, size)
• I/O
• network interfaces
• disks
• 'graphics engines'
• mass storage
• single systems (nodes in a network)
• specialized systems (compute, graphics, archiving, etc.)
3
POZNAŃ SUPERCOMPUTING AND NETWORKING CENTER
Resource requirements 1/2
[Diagram: a Compute - Data - Visualize triangle; BIG Compute Problems, BIG Visualization Problems and BIG Data Problems each combine computing, visualization and data handling, in different proportions.]
4
POZNAŃ SUPERCOMPUTING AND NETWORKING CENTER
Resource requirements 2/2
[Diagram: example workloads (weather simulation, traditional big supercomputer, repository/archive, signal processing, web serving, media streaming) plotted against CPU, I/O and Storage axes - "Scale in Any and All Dimensions".]
5
6
Cluster types
- high-availability clusters, whose task is to keep the system running continuously and to shift the load onto spare nodes when a failure occurs (e.g. WWW and e-commerce servers)
- compute clusters (capability clusters), whose task is the parallel processing of applications for scientific, engineering or design purposes. Efficient inter-node communication mechanisms are required so that a high degree of parallelism (fine-grain granularity) can be exploited. Compute clusters are usually dedicated to a specific application, and programs are executed sequentially and do not compete with one another for access to resources.
- scalability clusters, whose task is to improve the efficiency of program execution by assigning nodes to applications appropriately. Management software is required that provides job launching, load balancing, load analysis and job management. Distributed jobs, if any, may exploit parallelism at the level of procedures and modules.
7
Single system image
Single Point of Entry: A user can connect to the cluster as a single system (like telnet
beowulf.myinstitute.edu), instead of connecting to individual nodes as in the case of
distributed systems (like telnet node1.beowulf.myinstitute.edu).
Single File Hierarchy (SFH): On entering into the system, the user sees a file system as a
single hierarchy of files and directories under the same root directory. Examples: xFS
and Solaris MC Proxy.
Single Point of Management and Control: The entire cluster can be monitored or controlled from a single window using a single GUI tool, much like an NT workstation managed by the Task Manager tool, or PARMON monitoring the cluster resources.
Single Virtual Networking: This means that any node can access any network connection
throughout the cluster domain even if the network is not physically connected to all
nodes in the cluster.
Single Memory Space: This is the illusion of a single shared memory built over the memories associated with the nodes of the cluster.
Single Job Management System: A user can submit a job from any node using a
transparent job submission mechanism. Jobs can be scheduled to run in either batch,
interactive, or parallel modes (discussed later). Example systems include LSF and
CODINE.
Single User Interface: The user should be able to use the cluster through a single GUI. The interface must have the same look and feel as an interface available for workstations (e.g., Solaris OpenWin or Windows NT GUI).
8
Single system image
Availability Support Functions
Single I/O Space (SIOS): This allows any node to perform I/O operations on locally or remotely located peripheral or disk devices. In this SIOS design, disks associated with cluster nodes, RAIDs, and peripheral devices form a single address space.
Single Process Space: Processes have a unique cluster-wide process id. A process on any node can create child processes on the same or a different node (through a UNIX fork) or communicate with any other process on a remote node (through signals and pipes); a minimal sketch follows this slide. The cluster should support globalized process management and allow processes to be managed and controlled as if they were running on local machines.
Checkpointing and Process Migration: Checkpointing mechanisms allow a process state and intermediate computing results to be saved periodically. When a node fails, processes that were running on it can be restarted on another working node.
9
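To make the process-space bullet above concrete, here is a minimal local C sketch of the POSIX calls it mentions (fork and a pipe). Under a single-system-image cluster with a single process space these same calls are meant to work even when the child ends up on another node; this sketch simply runs on one machine and is not tied to any particular SSI implementation.

/* fork a child and read one message from it through a pipe */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fd[2];
    if (pipe(fd) == -1) { perror("pipe"); return 1; }

    pid_t pid = fork();               /* under SSI this id would be cluster-wide unique */
    if (pid == -1) { perror("fork"); return 1; }

    if (pid == 0) {                   /* child: under SSI it could live on another node */
        close(fd[0]);
        const char msg[] = "hello from the child";
        write(fd[1], msg, sizeof msg);
        close(fd[1]);
        return 0;
    }

    close(fd[1]);                     /* parent: read what the child sent */
    char buf[64] = {0};
    read(fd[0], buf, sizeof buf - 1);
    close(fd[0]);
    waitpid(pid, NULL, 0);
    printf("parent received: %s (child pid %d)\n", buf, (int)pid);
    return 0;
}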
Degree of complexity
C-brick - CPU Module
R-brick - Router Interconnect
I-brick - Base I/O Module
P-brick - PCI Expansion
X-brick - XIO Expansion
D-brick - Disk Storage
G-brick - Graphics Expansion
10
POZNAŃ SUPERCOMPUTING AND NETWORKING CENTER
Homogeneous clusters
• GigaRing, SuperCluster (T3E)
• PowerChallengeArray
• POE
• DCE
• Managing large volumes of data
• Archiving systems
11
Poznań Supercomputing and Networking Center
Massively Parallel Processing (MPP)
• Massively parallel approaches achieve high processing rates by
assembling large numbers of relatively slow processors
• Traditional approaches focus on improving the speed of individual processors and assemble only a few of these powerful processors into a complete machine
• Improving network speed and reducing communication overhead
• Examples :
– Thinking Machines (CM-2, CM-5)
– Intel Paragon
– Kendall Square (KS-1)
– SGI Origin 2000
– Cray T3D, T3E
Poznań Supercomputing and Networking Center
MPP's network topologies
Some commonly used network topologies:

Topology              Connectivity
Ring                  2
2-Dimensional Mesh    4
3-Dimensional Mesh    6
Hypercube             N   (2^N nodes; e.g. N = 3)
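A small C sketch backing the hypercube row above: in an N-dimensional hypercube with 2^N nodes, every node has exactly N neighbours, namely the nodes whose binary index differs in a single bit (the dimension N = 3 used here is just the example from the table).

/* list the neighbours of every node in an N-dimensional hypercube */
#include <stdio.h>

int main(void) {
    const int N = 3;                      /* dimension, as in the N = 3 example */
    const int nodes = 1 << N;             /* 2^N = 8 nodes */
    for (int node = 0; node < nodes; node++) {
        printf("node %d neighbours:", node);
        for (int dim = 0; dim < N; dim++)
            printf(" %d", node ^ (1 << dim));  /* flip one bit per dimension */
        printf("\n");
    }
    return 0;
}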
Poznań Supercomputing and Networking Center
Cray T3E, T3D
• The Cray MPP system contains four types of components: processing
element nodes, the interconnect network, I/O gateways and a clock
• Network topology: 3D Mesh
[Diagram: Cray T3D system components - processing element nodes (Node A, Node B) and an I/O gateway attached to the interconnect network, with links in the +X/-X, +Y/-Y and +Z/-Z directions.]
Poznań Supercomputing and Networking Center
Cray T3E
Processing Element Nodes (PE)
• Each PE contains a microprocessor, local memory and support circuitry
• 64-bit DEC Alpha RISC processor
• Very high scalability (8 ... 2048 CPUs)
[Diagram: two PE nodes (Node A, Node B), each with a CPU and local memory behind a switch, connected to the network by links.]
Poznań Supercomputing and Networking Center
Cray T3E
Interconnect Network
• The interconnect network provides communication paths between PEs
• The links form a three-dimensional matrix of paths that connect the nodes in the X, Y and Z dimensions
• A communication link transfers data and control information between two network routers and connects two nodes in one dimension. A communication link is actually two unidirectional channels; each channel in the link carries data, control and acknowledge signals.
• Dimension order routing (a predefined way in which information travels; sketched below)
• Fault tolerance
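A hedged C sketch of the dimension-order routing idea mentioned above: the route first corrects the X coordinate, then Y, then Z, one hop at a time. The mesh coordinates and endpoints below are invented for illustration; this is the textbook routing rule, not the T3E router logic itself.

/* print a dimension-ordered route through a 3D mesh */
#include <stdio.h>

typedef struct { int x, y, z; } Coord;

static void route(Coord src, Coord dst) {
    Coord cur = src;
    /* correct X first, then Y, then Z, one hop per step */
    while (cur.x != dst.x) { cur.x += (dst.x > cur.x) ? 1 : -1; printf("-> (%d,%d,%d)  X hop\n", cur.x, cur.y, cur.z); }
    while (cur.y != dst.y) { cur.y += (dst.y > cur.y) ? 1 : -1; printf("-> (%d,%d,%d)  Y hop\n", cur.x, cur.y, cur.z); }
    while (cur.z != dst.z) { cur.z += (dst.z > cur.z) ? 1 : -1; printf("-> (%d,%d,%d)  Z hop\n", cur.x, cur.y, cur.z); }
}

int main(void) {
    Coord src = {0, 0, 0}, dst = {2, 1, 3};   /* example endpoints */
    printf("start (%d,%d,%d)\n", src.x, src.y, src.z);
    route(src, dst);
    return 0;
}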
Poznań Supercomputing and Networking Center
Cray T3E
Distributed operating system (Unicos/mk)
In the CRAY T3E systems, the local memory of each PE must contain a
copy of the microkernel and one or more servers. Under Unicos/mk each
PE is configured as one of the following types of PEs:
• Support PEs
The local memory of support PEs contains a copy of the microkernel and
servers. The exact number and type of servers vary depending on
configuration tuning.
• User PEs
The local memory of user PEs contains a copy of the microkernel and a
minimum number of servers. Because it contains a limited amount of
operating system code, most of a user PE’s local memory is available to
the user. User PEs include command and application PEs
• Redundant PE
A redundant PE is not configured into the system until an active PE fails.
Poznań Supercomputing and Networking Center
Cray T3E
Distributed operating system (Unicos/microkernel)
• Unicos/mk does not require a common memory architecture. Unlike Unicos, the functions of Unicos/mk are divided between a microkernel and numerous servers. For this reason, Unicos/mk is referred to as a serverized operating system.
• Serverized operating systems offer a distinct advantage for the Cray T3E
system because of its distributed memory architecture. Within these systems,
the local memory of each PE is not required to hold the entire set of OS code
• The operating system can be distributed across the PEs in the whole system
• Under Unicos/mk, traditional UNICOS processes are implemented as actors. An actor represents a resource allocation entity. The microkernel views all user processes, servers and daemons as actors
• A multiple PE application has one actor per PE. User and daemon actors
reside in user address space; server actors reside in supervisory (kernel
address) space.
19
T3EMS - PE configuration
20
T3E - job scheduling
21
Modules of the psched daemon
Gang scheduler
Provides application CPU and memory residency control by enabling
you to schedule all members of an application together. This
guarantees that the application members are synchronized across all
PEs spanning the application.
Load balancer
Measures how well processes and applications are acted upon and
serviced in each scheduling domain. Based on this information, the
load balancer may decide to move commands and applications among
eligible PEs in each domain.
MUSE
Implements a scheduling strategy similar to the fair-share scheduler in
UNICOS. MUSE allows the system to be shared among groups in an
organized way by assigning resources to the most deserving process.
Resource manager
Collects and analyses information about resource usage within the
machine for internal and external use. The object manager then makes
this information available in a uniform way to service providers such
as NQE.
22
Gang scheduling
All processes of an application are assigned to resources at the same time (a toy illustration follows this slide).
Parameters
• Heartbeat - the length of the time quantum granted to an application
• Partial - allows partial scheduling when free resources are available
• Variation - variation of the time quantum
23
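A toy C illustration of the gang-scheduling idea from the slide above: on every heartbeat one application is selected and all of its members are "run" on their PEs at the same time. The application names are borrowed from the psview listings later in these slides, but the PE counts and number of heartbeats are made up; this is a conceptual sketch, not psched.

/* round-robin gang scheduling over a fixed set of applications */
#include <stdio.h>

typedef struct { const char *name; int first_pe, npes; } App;

int main(void) {
    App apps[] = { {"a.out", 0, 3}, {"nel186_4.exe", 0, 4} };
    const int napps = 2;
    const int heartbeats = 4;            /* number of time quanta to simulate */

    for (int t = 0; t < heartbeats; t++) {
        const App *gang = &apps[t % napps];  /* pick the next gang */
        printf("heartbeat %d: %s runs on PEs %d..%d (all members together)\n",
               t, gang->name, gang->first_pe, gang->first_pe + gang->npes - 1);
    }
    return 0;
}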
Load balancing in the interactive domain
Processes are moved between processors depending on the resources they use. The cost of moving a job is taken into account.
24
Load balancing in the application domain
• Minimize swapping
• Minimize migration cost
• Perform expensive migrations only when necessary
• Minimize the number of parties
• Maximize the contiguously allocated PEs per party
Parameters
Heartbeat - frequency
MigrationDelay - the minimum time between migrations of the same application
MigrationGravity - the direction in which applications are shifted (down, up, or both)
NoPreemptiveMigration - migrate only if there are further applications waiting to run
25
MUSE scheduler
A fixed percentage of CPU time is allocated regardless of the number of processes a user runs.
26
psview - MUSE
lotus 9% psview -m APP
Status of MUSE Domain: APP
PE Range    : 0 - 0x4
Mode        : Active
Share by    : UID
Heartbeat   : 600 seconds
Decay       : 3600 seconds
OsHeartbeat : 60 seconds

                Entitlement       MUSE   LongTerm Interval
Name         Absolute Relative  Factor    Usage    Usage   Type
------------ -------- -------- -------- -------- -------- -------
root                1   1.0000        -        -        - Root
Users             100   0.5000        -   0.9630        - Group
komasa            100   0.5000   0.2596   0.9630   0.6824 Active
Staff             100   0.5000        -   0.0370        - Group
pawelw            100   0.5000   1.0000   0.0370   0.3176 Active
27
psview -gang
lotus 12% psview -g APP
Status of Gang Scheduler Domain: APP
PE Range   : 0 - 0x4
Mode       : Full Gang Scheduling
Gangs      : 3
Parties    : 2
Time Slice : 50 - 800; Current: 800; Minimum: 5
Status     : schedule change pending

Rank Command Name     User     PE-Range    Id    Status
==== ================ ======== =========== ===== =======
0    a.out            pawelw   0x003-0x004 19415
     a.out            pawelw   000-0x002   19087
1    nel186_4.exe     komasa   000-0x003   81257 swapped (1 of 4)
28
Poznań Supercomputing and Networking Center
GigaRing Channel
• The GigaRing channel architecture is a modification of the Scalable Coherent Interface (SCI) specification and is designed to be the common channel that carries information between Input/Output Nodes (IONs)
• This channel consists of a pair of 500 MB/s channels configured as counter-rotating rings
• The two rings form a single logical channel with a maximum
bandwidth of 1.0 GB/s. Protocol overhead lowers the channel rate to
920 MB/s.
• A client connects to the GigaRing channel through the ION via a 64-bit
full-duplex interface
• Detection of lost packets and cyclic redundancy checksums
Poznań Supercomputing and Networking Center
GigaRing Channel
The counter rotating rings provide two forms of system resiliency:
• Ring folding
• Ring masking
[Diagram: GigaRing node interface - a client-specific chip attached through a 64-bit client port to the GigaRing node chip, which drives the positive in/out and negative in/out links of the counter-rotating rings.]
Poznań Supercomputing and Networking Center
GigaRing Channel
Ring Folding
• The GigaRing channel can be software configured to map out one or more IONs from the system. Ring folding converts the counter-rotating rings to form a single ring
• The maximum channel bandwidth for a folded ring is approximately 500 MB/s
[Diagram: IONs on a folded GigaRing channel.]
Poznań Supercomputing and Networking Center
GigaRing Channel
Ring Masking
• Ring masking removes one of the counter-rotating rings from the system, which results in one fully connected, unidirectional ring
• The maximum channel bandwidth = 500 MB/s
[Diagram: IONs on a GigaRing channel with one ring masked out.]
Poznań Supercomputing and Networking Center
GigaRing Channel
Input/Output Nodes (ION)
• All devices that connect directly to the GigaRing channel are
considered to be IONs
• There are three types of IONs :
Single-purpose Node (SPN)
Multipurpose node (MPN)
Mainframe node
• Available mainframe nodes :
Cray T90
Cray T3E
Cray J90se
Poznań Supercomputing and Networking Center
GigaRing Channel
[Diagram: an example GigaRing configuration - Cray T3E, Cray T90, Cray J90se and Cray J90 mainframe nodes, a disk array and an HPN-2 (HIPPI) node connecting to a HIPPI network, all attached to GigaRing channels.]
Poznań Supercomputing and Networking Center
SuperCluster Environment
[Diagram: SuperCluster environment - Cray T3E, Cray T90 and J90 parallel vector supercomputers together with heterogeneous workstation servers, connected through a HIPPI switch to a HIPPI disk array and to HIPPI, ATM, FDDI and Ethernet networks; the software layer includes PVM, DCE, NQE, DFS and NFS.]
Poznań Supercomputing and Networking Center
SuperCluster Software Components
• Job distribution and load balancing: Cray NQX (NQE for Unicos)
• Open systems remote file access: NFS
• Standard, secured distributed file system: DCE DFS Server
• Client/server based distributed computing: DCE Client Services
• Cray Message Passing Toolkit (MPT): PVM, MPI
• High performance, resilient file sharing (opt.): Shared File System (SFS)
• Client/server hierarchical storage management (opt.): Data Migration Facility (DMF)
Poznań Supercomputing and Networking Center
SuperCluster Software Components
Network Queuing Environment (NQE)
• NQE consists of four components: the Network Queuing System (NQS), the Network Load Balancer (NLB), the File Transfer Agent (FTA) and the NQE clients.
• NQE is a batch queuing system that automatically load balances jobs across heterogeneous systems on a network. It runs each job submitted to the network as efficiently as possible on the resources available.
• This provides faster turnaround for users and automatic load balancing to ensure that all systems on the network are used effectively.
[Diagram: NQE clients submit jobs to the NQE master server (NQS, FTA, NLB server, collector), which distributes them to NQE execution servers (NQS, FTA, collector).]
Poznań Supercomputing and Networking Center
POWER CHALLENGEarray
• Consists of up to eight Power Challenge or Power Onyx
(POWERnode) supercomputing systems connected by a high
performance HIPPI interconnect
• Two-level communication hierarchy: CPUs within a POWERnode communicate via a fast shared-bus interconnect, and CPUs in different POWERnodes communicate via the HIPPI interconnect
[Diagram: several POWERnodes, each with processors (P) and memory (M) on a shared bus, connected through a HIPPI switch.]
Poznań Supercomputing and Networking Center
POWER CHALLENGEarray
Parallel programming models supported:
• Shared memory with n processes inside a POWERnode
• Message passing with n processes inside a POWERnode
• Hybrid model with n processes inside a POWERnode, using a
combination of shared memory and message passing
• Message passing with n processes over p POWERnodes
• Hybrid model with n processes over p POWERnodes, using a
combination of shared memory within a POWERnode system and
message passing between POWERnodes
Poznań Supercomputing and Networking Center
Message Passing MPI Model
[Diagram: MPI tasks inside a POWERnode communicate through shared memory ("multiparallel memory sharing"); MPI tasks on different POWERnodes communicate via sockets.]
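A minimal MPI example in C matching the model above: task 1 sends a value to task 0 with MPI_Send/MPI_Recv, so the tasks exchange messages instead of sharing an address space. It assumes a compiler wrapper such as mpicc and the site's usual launcher; nothing here is specific to the POWER CHALLENGEarray.

/* one task sends an integer to task 0 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 1) {
        int payload = 42;                                  /* value to ship to task 0 */
        MPI_Send(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else if (rank == 0 && size > 1) {
        int payload;
        MPI_Recv(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("task 0 received %d from task 1\n", payload);
    }

    MPI_Finalize();
    return 0;
}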
Poznań Supercomputing and Networking Center
POWER CHALLENGEarray
Software:
• Native POWERnode tools
IRIX 6.x, XFS, NFS, MIPSpro compilers, scientific and math libraries,
development environment
• Array services
Allows the array to be managed and administered as a single system
• Distributed program development tools
HPF, MPI and PVM libraries, tools for distributed program
visualization and debugging (Upshot, XPVM)
• Distributed batch processing tools
LSF, CODINE
• Distributed system management tools
IRIXPro, Performance Co-Pilot (PCP)
Poznań Supercomputing and Networking Center
An array session is a set of processes, possibly running across several POWERnodes, that are related to one another by a single, unique identifier called the Array Session Handle (ASH). A local ASH is assigned by the kernel and is guaranteed to be unique within a single POWERnode, whereas a global ASH is assigned by the array services daemon and is unique across the entire POWER CHALLENGEarray.
[Diagram: an array session (ARRAY 1) whose processes (Process 1, Process 2, Process 3) span several POWERnodes; each of POWERnode1-POWERnode4 runs an array services daemon.]
Parallel Operating Environment
• Parallel Operating Environment - an environment for parallel work
• Simplifies launching parallel programs
• A single point of management - a console common to all processes
• Simple configuration via environment variables (or parameters)
• MPL, MPI, your own parallel or even serial programs
Poznańskie Centrum
Superkomputerowo-Sieciowe
Parallel Operating Environment
The POE consists of parallel compiler scripts, POE environment variables, parallel
debugger(s) and profiler(s), MPL, and parallel visualization tools. These tools allow
one to develop, execute, profile, debug, and fine-tune parallel code.
The Partition Manager controls a partition, or group of nodes on which you wish to
run your program. The Partition Manager requests the nodes for your parallel job,
acquires the nodes necessary for that job (if the Resource Manager is not used), copies
the executables from the initiating node to each node in the partition, loads executables
on every node in the partition, and sets up standard I/O.
The Resource Manager keeps track of the nodes currently processing a parallel task, and, when nodes are requested from the Partition Manager, it allocates nodes for use. The Resource Manager attempts to enforce a "one parallel task per node" rule.
The Processor Pools are sets of nodes dedicated to a particular type of process (such as
interactive, batch, I/O intensive) which have been grouped together by the system
administrator(s).
44
What is POE?
POE encompasses a collection of software tools designed to
provide an environment for developing, executing, debugging and
profiling parallel C, C++ and Fortran programs.
• Facilities to manage your parallel execution environment (environment
variables and command line flags)
• Message Passing Interface (MPI) library for interprocess communications
• Subset of MPI-2
• Low-level Application Programming Interface (LAPI)
• Parallel compiler scripts
• Parallel file copy utilities
• Authentication utilities
• Parallel debuggers
• Parallel profiling tools
• Dynamic probe class library (DPCL) parallel tools development API
45
What is POE?
Much of what POE does is designed to be transparent to the
parallel user. Some of these tasks include:
• Linking the necessary parallel libraries during compilation (via parallel
compiler scripts)
• Finding and acquiring machines (nodes) for your parallel job
• Loading your executable onto all nodes acquired for your parallel job
• Handling all stdin, stderr and stdout between the nodes of your parallel job
• Signal handling for all tasks in your job
• Providing intertask communications facilities
• Managing the use of processor and network adapter resources
• Retrieving system and job status information when requested
• Error detection and reporting
• Providing support for run-time profiling and analysis tools
46
Basic POE Environment Variables
MP_PROCS
The number of task processes for your parallel job. May be used alone or in
conjunction with MP_NODES and/or MP_TASKS_PER_NODE to specify how many
tasks are loaded onto a physical SP node. The maximum value for MP_PROCS is dependent upon the version of PE software installed (currently ranging from 128 to 2048). If not set, the default is 1.
MP_NODES
Specifies the number of physical nodes on which to run the parallel tasks. May be
used alone or in conjunction with MP_TASKS_PER_NODE and/or MP_PROCS.
MP_TASKS_PER_NODE
Specifies the number of tasks to be run on each of the physical nodes. May be used in
conjunction with MP_NODES and/or MP_PROCS.
MP_RESD
Specifies whether or not LoadLeveler should be used to allocate nodes. Valid values
are either "yes" (non-specific node allocation) or "no" (specific node allocation). If not
set, the default value is context sensitive to other POE variables. Batch systems typically
override/ignore user settings for this environment variable.
47
Basic POE Environment Variables
MP_RMPOOL
Specifies the SP system pool number that should be used for non-specific node allocation. This
is only valid if you are using the LoadLeveler for non-specific node allocation (from a single pool)
without a host list file. Batch systems typically override/ignore user settings for this environment
variable.
MP_HOSTFILE
This environment variable is used only if you wish to explicitly select which nodes will be
allocated for your POE job (specific node allocation). If you prefer to let LoadLeveler
automatically allocate nodes then this variable should be set to NULL or "". If used, this variable
specifies the name of a file which contains the actual machine (domain) names of nodes you wish
to use. It can also be used to specify which pools should be used. The default filename is "host.list"
in the current directory.
MP_EUILIB
Specifies which of two protocols should be used for task communications. Valid values are
either "ip" for Internet Protocol or "us" for User Space protocol. The default is "ip", while "us" is
faster.
MP_EUIDEVICE
A node may be physically connected to different networks. This environment variable is used to
specify which network adapter should be used for communications. Valid values are: "en0"
(ethernet), "fi0" (FDDI), "tr0" (token-ring), or "css0" (high-performance switch). Note that valid
values will also depend upon the actual physical network configuration of the node.
48
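As a small, hedged illustration, the C sketch below just reports the POE settings visible to a task through getenv(); only variables described on the two slides above are read, and the MP_PROCS and MP_EUILIB defaults follow the text.

/* report the POE environment settings this task can see */
#include <stdio.h>
#include <stdlib.h>

static const char *get(const char *name, const char *fallback) {
    const char *v = getenv(name);
    return v ? v : fallback;
}

int main(void) {
    printf("MP_PROCS          = %s\n", get("MP_PROCS", "1 (default)"));
    printf("MP_NODES          = %s\n", get("MP_NODES", "(not set)"));
    printf("MP_TASKS_PER_NODE = %s\n", get("MP_TASKS_PER_NODE", "(not set)"));
    printf("MP_EUILIB         = %s\n", get("MP_EUILIB", "ip (default)"));
    printf("MP_EUIDEVICE      = %s\n", get("MP_EUIDEVICE", "(not set)"));
    return 0;
}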
System Status Array
The leftmost area represents a list
of POE jobs which the Resource
Manager knows about. Clicking on
one of these jobs selects it.
The rightmost area provides a list of node names; nodes are listed in order from left to right and from top to bottom.
The central area provides a grid of
squares, each square representing a
machine/node.
Pink squares represent lightly utilized nodes.
Yellow squares represent heavily utilized nodes.
Gray squares are nonexistent nodes
or nodes that are not available for
monitoring.
Squares with green boxes indicate
which nodes are associated with a
selected POE job number.
49
DCE
1.DCE provides tools and services that support distributed applications
(DCE RPC, DCE Threads, DCE Directory Service, Security Service and Distributed Time Service).
2.DCE's set of services is integrated and comprehensive.
3.DCE provides interoperability and portability across
heterogeneous platforms.
4.DCE supports data sharing.
5.DCE participates in a global computing environment.
(X.500 and Domain Name Service (DNS))
50
Potential Users of DCE
1.An office with isolated computing resources can network the computers together and use
DCE for data and resource sharing.
2.An organization consisting of multiple computing sites that are already interconnected
by a network can use DCE to tie together and access resources across the different
sites.
3.Any computing organization comprising, or expecting to comprise in the future, more
cooperating hosts than can be easily administered manually
4.Organizations that write distributed applications can use DCE as a platform for their
software. Applications that are written on DCE can be readily ported to other software
and hardware platforms that also support DCE.
5.Organizations wishing to use applications that run on DCE platforms.
6.Organizations that wish to participate in networked computing on a global basis.
7.System vendors whose customers are in any of the preceding categories.
8.Organizations that would like to make a service available over the network on one system (for example, a system running a non-UNIX operating system), and have it accessible from other kinds of systems (for example, workstations running UNIX).
51
DCE Models of Distributed Computing
• The Client/Server Model
• The Remote Procedure Call Model
• The Data Sharing Model
• The Distributed Object Model
52
DCE architecture
53
[Diagram: a DCE cell - DCE clients (using the DCE API) communicate over the network with the Security server, the DTS server and its time provider, the CDS server and the DFS file server (UFS/LFS); a GDA agent connects the cell to the global X.500 directory (GDS server) and to DNS.]
54
Architectural Overview of DCE
• DCE Threads supports the creation, management, and synchronization of
multiple threads of control within a single process. This component is
conceptually a part of the operating system layer, the layer below DCE. DCE
threads are used by other DCE components and are also available for
applications to use.
• The DCE Remote Procedure Call facility consists of both a development
tool and a runtime service. The development tool consists of a language (and
its compiler) that supports the development of distributed applications
following the client/server model. It automatically generates code that
transforms procedure calls into network messages. The runtime service
implements the network protocols by which the client and server sides of an
application communicate. DCE RPC also includes software for generating
unique identifiers, which are useful in identifying service interfaces and
other resources.
55
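To illustrate what the generated RPC code does, here is a hand-written C sketch of the marshalling idea: a client-side "stub" flattens a procedure call into a message, and a server-side dispatcher unpacks it and calls the real routine. It uses no DCE APIs and is not IDL compiler output; the operation number, the add() routine and the message layout are invented for the example.

/* a local sketch of RPC marshalling and dispatch */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* the "remote" procedure as implemented on the server */
static int32_t add(int32_t a, int32_t b) { return a + b; }

/* client stub: marshal the call into a flat message buffer */
static size_t marshal_add(uint8_t *msg, int32_t a, int32_t b) {
    int32_t opnum = 1;                      /* which operation of the interface */
    memcpy(msg,     &opnum, 4);
    memcpy(msg + 4, &a,     4);
    memcpy(msg + 8, &b,     4);
    return 12;                              /* message length in bytes */
}

/* server dispatcher: unmarshal the message and invoke the real procedure */
static int32_t dispatch(const uint8_t *msg) {
    int32_t opnum, a, b;
    memcpy(&opnum, msg,     4);
    memcpy(&a,     msg + 4, 4);
    memcpy(&b,     msg + 8, 4);
    return (opnum == 1) ? add(a, b) : -1;
}

int main(void) {
    uint8_t msg[12];
    marshal_add(msg, 2, 40);                     /* what a client call add(2, 40) becomes */
    printf("result = %d\n", (int)dispatch(msg)); /* what the runtime does on the server side */
    return 0;
}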
Architectural Overview of DCE
• The DCE Directory Service is a central repository for information about
resources in the distributed system. Typical resources are users, machines, and
RPC-based services. The information consists of the name of the resource and its
associated attributes. Typical attributes could include a user's home directory, or
the location of an RPC-based server.
• The DCE Directory Service comprises several parts: the Cell Directory Service
(CDS), the Global Directory Service (GDS), the Global Directory Agent
(GDA), and a directory service programming interface. CDS manages a database
of information about the resources in a group of machines called a DCE cell.
(Cells are described in the next section.) GDS implements an international
standard directory service and provides a global namespace that connects the
local DCE cells into one worldwide hierarchy. GDA acts as a go-between for
cell and global directory services. Both CDS and GDS are accessed using a
single directory service application programming interface, the X/Open
Directory Service (XDS) Advanced Programming Interface (API).
56
Architectural Overview of DCE
• The DCE Distributed Time Service (DTS) provides synchronized
time on the computers participating in a Distributed Computing
Environment. DTS synchronizes a DCE host's time with Coordinated
Universal Time (UTC), an international time standard.
• The DCE Security Service provides secure communications and
controlled access to resources in the distributed system. There are four
aspects to DCE security: authentication, secure communications,
authorization, and auditing. These aspects are implemented by several
services and facilities that together constitute the DCE Security
Service, including the registry service, the authentication service, the
privilege service, the access control list (ACL) facility, the login
facility, and the audit service.
57
Architectural Overview of DCE
• The DCE Distributed File Service allows users to access and share files
stored on a file server anywhere on the network, without having to know
the physical location of the file. Files are part of a single, global
namespace, so no matter where in the network a user is, the file can be
found by using the same name. DFS achieves high performance,
particularly through caching of file system data, so that many users can
access files that are located on a given file server without prohibitive
amounts of network traffic and resulting delays.
• DCE DFS includes a physical file system, the DCE Local File System
(LFS), which supports special features that are useful in a distributed
environment. They include the ability to replicate data; log file system
data, enabling quick recovery after a crash; simplify administration by
dividing the file system into easily managed units called filesets; and
associate ACLs with files and directories.
58
Security Service
Manages the security of the resources that make up a cell
• Authentication service
reliable, mutual authentication of communicating processes
• Privilege service
authorization of access to resources and secure propagation of user-identifying information in a distributed environment
• Registry service
management and maintenance of a replicated database of the users, groups and service principals available in the cell
• ACL facility
access control to resources by means of access control lists (ACLs)
• Login facility
authorization of users in the DCE environment
• Audit service
monitoring and recording of operations related to all the security services
59
Security Service
60
Distributed File Service (DFS)
A distributed, replicated file system built on top of the DCE naming service and the security service
• hides the details of the physical location of files
• makes DFS files and directories available on all nodes under the same name
• provides replication, access control for objects (ACLs) and fast crash recovery
• Cache Manager
manages the client-side caching mechanism
• File Exporter
services DFS client requests to the exported file system
• Token Manager
manages shared (concurrent) access to the file system
• Replication Server
responsible for keeping replicas consistent across the cell's DFS servers
• Update Server
allows easy distribution of new software versions within the cell
• Backup Server
provides automatic backup of filesets
61
Finding a service in DCE
62
DCE RPC
63
The LSF queuing system on top of DCE
Advantages of using DCE to couple compute systems under LSF:
• central user management (Registry Service)
• a global namespace
• a common file system (DFS)
• improved performance (caching)
• reliability (data replication)
• data integrity checking and confidentiality
• encrypted transmission
• access control lists
• strengthened user authorization and authentication mechanisms
• the Key Distribution Center (KDC) model
• identification based on credentials
64