Transcript Name Space

Chapter 5 Naming (II)
Speaker:Jyun-Yao Huang ([email protected])
Application and Practice of Distributed Systems
1
Solution to Location Service
• Simple solutions: both solutions below are applicable
only to local area networks
• Broadcasting and multicasting
• Forwarding Pointers
• Home-Based Approaches
• Distributed Hash Tables
• Hierarchical Approaches
Application and Practice of Distributed Systems
2
Hierarchical Approaches
• A network is divided into a collection of domains
• Tree-like domain structure (next slide)
• The root (directory) node knows all the entities
• Each domain has a directory node that keeps track of
all entities in that domain
• An entity currently located in domain D is represented
by a location record in the directory node dir(D)
• Keep the entity’s current address
Application and Practice of Distributed Systems
3
Hierarchical Approaches (cont.)
• Hierarchical organization of a location service into
domains, each having an associated directory node
Application and Practice of Distributed Systems
4
Hierarchical Approaches (cont.)
• If a leaf domain
• The location record contains a entity’s current address
• If a higher domain directory node
• The location record, for each entity, containing a link to the
next lower-level child domain’s directory node
• Thus, the root node has a location record for each
entity
Application and Practice of Distributed Systems
5
Hierarchical Approaches (cont.)
• If an entity is duplicated (see next slide)
• Say in D1 and D2, then the dir node of the smallest domain
containing both D1 and D2 will have two pointers
• One for each subdomain
Application and Practice of Distributed Systems
6
Replicated Entities
Application and Practice of Distributed Systems
7
Hierarchical Approaches (Cont.)
• Lookup operation is forwarded from a leaf domain to
a higher level domain
• The lookup operation exploits locality
• The entity is searched in a gradually increasing ring
• The ring expands until root since root has a loc. record for
each entity
Application and Practice of Distributed Systems
8
Lookup Operation
Application and Practice of Distributed Systems
9
Hierarchical Approaches (Cont.)
• Update operations also exploits locality
• E.g., an entity E created an replica in leaf domain D
• Lead to install the chain of pointers in a top-down fashion
• Start at the M
• However, the chain of pointers can also be constructed from
the bottom up
• i.e., create a location record before passing the insert request to its
parent node
• Advantage. an address becomes available for lookups as soon as
possible
Application and Practice of Distributed Systems
10
Update Operations
Application and Practice of Distributed Systems
11
Outline
• Names, Identifiers, and Addresses
• Flat Naming
• Structured Naming
• Attribute-Based Naming
Application and Practice of Distributed Systems
12
Introduction to Structured Naming
• Flat names are good for machine
• But not convenient for humans to use
• Sol. Structured Naming
• Composed from simple, human-readable names
• Outline
• – Name Space
• – Name Resolution
• – The implementation of a Name Space
• – Example: The Domain Name System
Application and Practice of Distributed Systems
13
Name Spaces
• Names are often organized into namespaces
• A name space can be represented as a labeled, directed
graph with two types of nodes
• leaf nodes: a named entity with no outgoing edges
• directory nodes: has a number of outgoing edges, each
labeled with a name
• Store a table called directory table: (edge label, node identifier)
• Each namespace has at least one root node
• No incoming edge
Application and Practice of Distributed Systems
14
Name Space Organization
• Could be a tree
• Each node except the root has exactly one incoming edge
• Each node has exactly one associated (absolute) path name
• Could be a DAG (Directed Acyclic Graph)
• A node may have more than one incoming edge
• But not permit to have a cycle
• Could have a general graph structure (cycles
permitted).
• Each entity could have an infinite number of names
Application and Practice of Distributed Systems
15
Name Spaces and Graphs
Application and Practice of Distributed Systems
16
Path Name
• Each path in the name space
• N:<label-1, label-2, …, label-n>, where N is the first node
• Absolute path name: N is a root node
• relative path name: N is not root
• Similar to file systems, e.g., /home/steen/mbox
• Global name: denotes the same entity no matter where the
name is used
• Local name: name interpretation depends on where the
name is being used
• Example: an environment variable in UNIX is a local name
• E.g., HOME refer to the home directory of a user
• Each user has its own copy of this variable, which is initialized to
the global user’s home directory
Application and Practice of Distributed Systems
17
Structured Naming
• Name Space
• Name Resolution
• The implementation of a Name Space
• Example: The Domain Name System
Application and Practice of Distributed Systems
18
Name Resolution
• Given just the path name, the process of looking up
information stored in the node
• And assuming, of course, that you know where to start …
• Closure mechanism: knowing how and where to start
name resolution
• i.e., the mechanism to select the initial node from which to
start name resolution
• E.g. UNIX file system: root node is stored in a specific place
in the disk
• E.g. 0031204447784: dial a phone number
Application and Practice of Distributed Systems
19
Link
• An alias is another name for the same entity
• Two approaches to implement an alias
• Hard link approach: allow multiple absolute paths
names to refer to the same node
• See previous figure, both /keys and /home/steen/keys
are called hard links to node n5
• Soft link or symbolic link approach: a node
containing the absolute path name of another node
• See the next figure
Application and Practice of Distributed Systems
20
Symbolic Link
• The concept of a symbolic link explained in a naming
graph. The node n5 has two names
Application and Practice of Distributed Systems
21
Mount
• Name resolution can be used to merge different name
spaces in a transparent way
• A mounted file system in NFS (Network File System)
• Let a directory node store the identifier of a directory node
from a different name space (foreign name space)
• Mount point: the directory node storing the node identifier of
another name space
• E.g., /home/vu in Fig. 4-4
• Mounting point: the directory node in the foreign name space
• E.g., /Home/steen in Fig. 4-4
Application and Practice of Distributed Systems
22
Mount (Cont.)
• To mount, we need at least the following information
• The name of an access protocol
• The name of the (foreign) server
• The name of the mounting point in the foreign name space
• Represent the three names above as a URL used in
NFS
• E.g., a URL used in NFS (Network File System)
nfs://filts.cs.vu.nl//home/steen
• A file (now a directory) called /home/steen on an NFS file
server flits.cs.vu.nl, which can be access by NFS protocol
Application and Practice of Distributed Systems
23
Mounting
• Mounting remote name spaces through a specific process
protocol (in this case Sun’s Network File System protocol NFS).
Application and Practice of Distributed Systems
24
Structured Naming
• Name Space
• Name Resolution
• The implementation of a Name Space
• Example: The Domain Name System
Application and Practice of Distributed Systems
25
The Implementation of a Name Space
• A Name Service allows users and processes to add,
remove and lookup names.
• Name services are implemented by Name Servers
• On LAN’s – a single server usually suffices
• On WAN’s – a distributed solution is often more
practical
Application and Practice of Distributed Systems
26
The Implementation of a Name Space
(Cont.)
• Namespaces (and services) are usually organised into
one of three layers
• Global Layer: highest level nodes
• Stable; dir tables change very infrequently
• Administrational Layer: directory nodes managed by
a single organization
• Relatively stable; although changes can occur more frequently.
• Managerial Layer: nodes change frequently
• Nodes maintained by users as well as administrators
• Nodes are the ‘leaf entities’ and can often change.
Application and Practice of Distributed Systems
27
DNS Name Space Partition
• An example partitioning of the DNS name space, including Internet-
accessible files, into the three name space layers. A “zone” in DNS is a
non-overlapping part of the namespace that is implemented by a
separate name server.
Application and Practice of Distributed Systems
28
The Implementation of a Name Space
(Cont.)
• Availability and performance in each layer have
different requirements
• Global layer and administrative layers
• Address the availability features
• If fail, a large part of the name space will be unreachable
• Solution: replication and client caching
• In contrast, managerial layer address the performance,
i.e., quick response time
• Solution: local, high-performance name server
Application and Practice of Distributed Systems
29
Name Server Comparison
Application and Practice of Distributed Systems
30
Name Resolution Process
• Two common approaches: Iterative and Recursive
• Iterative name resolution
• E.g., root:<nl,vu,cs,ftp,pub,globe,index.txt>
• It URL notation: ftp://ftp.cs.vu.nl/pub/globe/index.txt
• A name resolver in client hands over the complete name to
the root name server
• Root name server resolves the path name as far as it can and returns the
resulting name server to the client
• The client passes the remaining path to the returned name
server
• Recursive name resolution
• A name server passes the result to the next name server
Application and Practice of Distributed Systems
31
Iterative Name Resolution
Application and Practice of Distributed Systems
32
Recursive Name Resolution
Application and Practice of Distributed Systems
33
Advantages and Disadvantages
• Recursive name resolution
• Bad: puts a higher performance demand on each name server
• Good:
• (1) Caching is more effective compared to iterative name resolution.
One name server may cache the address of many of its lower-level
name server
• (2) Communication cost may be reduced (following the next slide)
• Iterative name resolution
• Bad: caching is restricted to the client side
• (1) If client A and B resolve the same name, caching in A is useless
• (2) Sol: a local, intermediate name server shared by all clients
under its administration such that its cache is shared
Application and Practice of Distributed Systems
34
Caching in Recursive Name Resolution
Application and Practice of Distributed Systems
35
Communication Cost in Iterative vs.
Recursive Resolution
Application and Practice of Distributed Systems
36
Structured Naming
• Name Space
• Name Resolution
• The implementation of a Name Space
• Example: The Domain Name System
Application and Practice of Distributed Systems
37
Example: The Domain Name System
(DNS)
• One of the largest distributed naming services
• Used to lookup up host addresses and mail servers
• DNS name space is organized as a rooted tree
• Each label (the bit between the ‘.’) must be < 64 chars
• Each path (the whole thing) must be < 256 chars.
• The root is given the name ‘.’ (although, in practice, the dot is
rarely shown nor required).
• Example: flicts.cs.vu.nl. (root)
• A subtree within DNS is referred to as a “domain”.
• A path name is referred to as a “domain name”.
• These can be relative or absolute.
Application and Practice of Distributed Systems
38
Example: The Domain Name System
(DNS) (Cont.)
• An Internet directory service
• Domain names are translated into IP addresses
• Also controls email delivery
• DNS system consists of three components
• DNS data: called resource records
• Servers: called name servers
• Internet protocols for fetching data from the servers
Application and Practice of Distributed Systems
39
• The billions of resource records in the DNS are split
into millions of files called zones
• Zones are kept on authoritative servers distributed all
over the Internet
• Each zone is implemented by a name server
• Always replicated for availability
• Updates for a zone is handled by the primary
Application and Practice of Distributed Systems
40
Domain Name System (DNS) (Cont.)
• The contents of a node is a collection of resource
records
• Different types of resource records
• SOA (start of authority): info such as email address of the
system admin
• A (address): IP addr of one of the host
• MX (mail exchange): a mail server in this domain
• SRV (server name): the name of a server for a specific service.
• The service is identified by a name with the name of a protocol
• E.g. www.cs.un.nl: the web server in cs.vu.nl.
Application and Practice of Distributed Systems
41
Domain Name System (DNS) (Cont.)
• NS: the name of a name server for some a zone
• CNAME: the canonical, or primary name of a alias
• PTR: do a "reverse" DNS lookup
• That is, they have your IP address and want to know what your
host/domain is.
• But the name of the pointer record is not the IP • address itself, but is
the IP address’ four IP octets in reverse order followed by INADDR.
ARPA.
• Ex: 192.168.0.1 becomes 1.0.168.192.IN-ADDR.ARPA.
• HINFO: store additional information on a host
• Machine type and operating system
• TXT: any other kind of data that is useful to user
Application and Practice of Distributed Systems
42
DNS – Types of Resource Record
Application and Practice of Distributed Systems
43
DNS Implementation Example
Application and Practice of Distributed Systems
44
DNS Implementation Example (cont.)
Application and Practice of Distributed Systems
45
DNS Example Explanation
• Previous slide shows part of info in cs.vu.nl domain
• 4 name servers for this zone
• TXT record give addition information on this zone
• 1 mail servers handle incoming mail addressed to users in this
domain
• The number specifies their priority (1: highest)
• star.cs.vu.nl is a name server with 2 IP address (2 NICs)
• Zephyr.cs.vu.nl is a mail server, backed up by another mail
server tornado.cs.vu.nl
• Web server and ftp server are implemented at the same
machine
Application and Practice of Distributed Systems
46
Decentralized DNS Implementation
• Fully decentralized by DNS system: scalability
• Sol. map DNS to a DHT-based implementation
• Example: CoDoNS
• Compute the hash of a DNS name
• Use the hash as a key
• Lookup in a distributed hash table
• To further minimize the number of hops in routing a
request => replicating DNS record
Application and Practice of Distributed Systems
47
CoDoNS
Application and Practice of Distributed Systems
48
Outline
• Names, Identifiers, and Addresses
• Flat Naming
• Structured Naming
• Attribute-Based Naming
Application and Practice of Distributed Systems
49
Introduction
• Flat and structured names
• Provide a location-independent way to refer to entity
• Human friendliness for structured names
• But…in some cases
• We would look up entities by means of their attributes using
(attribute, value)
• Example: traditional directory services (a.k.a., yellow pages)
• Sol. attributed-based naming
Application and Practice of Distributed Systems
50
Attribute-Based Naming
• Directory Services
• Hierarchical Implementation: LDAP
• Decentralized Implementations
Application and Practice of Distributed Systems
51
Directory Services
• Attributed-based naming systems are also known as
directory services
• Structured naming systems are generally called
naming systems
• But, how to describe a resource?
• Sol. Resource Description Framework (RDF)
• Each resource is described by (subject, predicate, object)
• E.g, (Person, name, Alice)
Application and Practice of Distributed Systems
52
Attribute-Based Naming
• Directory Services
• Hierarchical Implementation: LDAP
• Decentralized Implementations
Application and Practice of Distributed Systems
53
Hierarchical Implementation
• But, how to search using the RDF?
• Example: lightweight directory access protocol
(LDAP)
Application and Practice of Distributed Systems
54
LDAP
• A naming service operates very much like the
telephone director
• Find ‘B’, then find ‘Barry’, then find ‘Paul’, then get the
number.
• With a directory service, client look for an entity
based on its properties instead of its full name
• This is more like the Yellow Pages
• Find ‘Perl Consultants’, obtain the list, search the list, find
‘Paul Barry’, then get the number
Application and Practice of Distributed Systems
55
More on LDAP
• LDAP: a directory service
• A client can look for an entity based on a description of
properties instead of a full name
• LDAP: a directory service
• Consist of a number of records,
• A record is called a directory entry
• Directory entries in LDAP are roughly equivalent to resource
record in DNS.
• Each entry is a collection of (attribute, value)
• A collection of directory entries is referred to as a Directory
Information Base (DIB).
Application and Practice of Distributed Systems
56
The LDAP Name Space Example
Application and Practice of Distributed Systems
57
The LDAP Name Space Example
• OrganizationUnit = department
• Country and locality: where the entry is stored
• A naming attribute is also called Relative
Distinguished Name (RDN)
• The first 5 attributes are all naming attributes
• Can be concatenated to form a globally unique name
• E.g. /C=NL/O= Vrije Universiteit /OU= Math. & Comp. Sc
• Operation “read” and “list” are supported
• Read a single record
• List all records that match the search
Application and Practice of Distributed Systems
58
More on LDAP (Cont.)
• RDN’s can be arranged in sequence into a Directory
Information Tree (DIT).
• The DIT is usually partitioned and distributed across
several servers, which is called Directory Service
Agents – DSA
• Clients are known as Directory User Agents – DUA.
Application and Practice of Distributed Systems
59
The LDAP DIT
Application and Practice of Distributed Systems
60
The LDAP DIT (cont.)
Application and Practice of Distributed Systems
61
Attribute-Based Naming
• Directory Services
• Hierarchical Implementation: LDAP
• Decentralized Implementations
Application and Practice of Distributed Systems
62
Attribute-Based Naming - Decentralised
Implementation
• Driven by advent of peer-to-peer
• Key issue-efficient map (attribute,value) pairs for
efficient search
• Avoid exhaustive search of network
• Two approaches:
• Distributed Hash Tables (INS/Twine)
• Semantic Overlay Networks
Application and Practice of Distributed Systems
63
Mapping to Distributed Hash Tables
• Users specify a list of (attribute, value) pairs
• Each description is translated into an attribute-value
tree (AVTree)
Application and Practice of Distributed Systems
64
Mapping to Distributed Hash Tables
Application and Practice of Distributed Systems
65
Mapping to Distributed Hash Tables
(Cont.)
• Then, transform the AVTrees into a collection of keys
• To look up in a DHT system
• Each path originating in the root is assigned a unique hash
value
• For example:
• – hash(type-book) = h1 (=5)
• – hash(type-book-author) = h2 (=7 )
• – hash(type-book-author-Tolkien) = h3 (=21 )
• – hash(type-book-title) = h4 (=6 )
• – hash(type-book-title-LOTR) = h5 (=33 )
• – hash(genre-fantasy) = h6 (=11 )
Application and Practice of Distributed Systems
66
Mapping to Distributed Hash Tables
(Cont.)
• Node in DHT responsible for hash value will keep a
reference to actual resource
• Query for (type-book) will get hashed to value 5 and sent to node
responsible for storing hash value 5
• In the example, six nodes storing information on Tolkien’s
• LOTR book
• Redundancy
• In practice
• A hash such as h1 is rather general and will be generated often
• Can be filtered out of the system
• Only the most specific hashes need to be evaluated
Application and Practice of Distributed Systems
67
Semantic Overlay Networks
• When there is no organized attribute-based naming
resolution scheme
• Nodes must discover where resources are located
• To make queries efficient, nodes can keep track of
nodes with similar resources
• Semantically proximal
• Example: commonality in the metadata information
• Have the same collection of attributes
• The nodes and these links from a semantic overlay
network
Application and Practice of Distributed Systems
68
Semantic Overlay Networks (cont.)
• However, measuring similarity based on attributes is
difficult
• Different nodes have different definitions of attributes
• Another solution: use file names
• Similarity measured as number of files in common
• As shown in the following slide
• Top layer is a semantic overlay network
• Maintains links between similar nodes
• Bottom layer is random overlay network
• Maintains links to uniform randomly-selected nodes
Application and Practice of Distributed Systems
69
Semantic Overlay Networks
Application and Practice of Distributed Systems
70