UNIX Internals – the New Frontiers

Distributed File Systems

Difference between a Distributed OS (DOS) and a Distributed FS (DFS)
- A distributed OS looks like a centralized OS but runs simultaneously on multiple machines. It may provide a file system shared by all its host machines.
- A distributed FS is a software layer that manages communication between conventional operating systems and file systems.

General Characteristics of DFS
- Network transparency
- Location transparency and location independence
- User mobility
- Fault tolerance
- Scalability
- File mobility

Design Considerations
- Name space
- Stateful or stateless
- Semantics of sharing
  - UNIX semantics
  - Session semantics
- Remote access method

Network File System (NFS)
- Based on the client-server model
- Client and server communicate via remote procedure calls (RPC)

User Perspective
- An NFS server exports one or more file systems
- Hard mount: the client must get a reply, so it keeps retrying until the server responds
- Soft mount: returns an error if the server does not respond
- Spongy mount: hard for mount, soft for I/O
- Commands:
    mount -t nfs nfssrv:/usr /usr
    mount -t nfs nfssrv:/usr/u1 /u1
    mount -t nfs nfssrv:/usr /users
    mount -t nfs nfssrv:/usr/local /usr/local

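The difference between hard and soft mounts can be sketched as a retry loop. This is only an illustration of the semantics listed above, not client kernel code: the function name `rpc_call`, the `send` callback, and the `max_tries` limit are all assumptions made for the example.

```python
def rpc_call(send, hard, max_tries=3):
    """Issue a request until a reply arrives (hypothetical helper).

    send() is assumed to return a reply, or None when the request times out.
    A hard mount retries forever; a soft mount gives up after max_tries.
    """
    tries = 0
    while True:
        reply = send()
        if reply is not None:
            return reply
        tries += 1
        if not hard and tries >= max_tries:
            # Soft mount: report the failure to the caller.
            raise TimeoutError("NFS server not responding")


def make_flaky_server(failures):
    """Return a send() stand-in that times out `failures` times, then replies."""
    state = {"left": failures}

    def send():
        if state["left"] > 0:
            state["left"] -= 1
            return None
        return "reply"

    return send
```

With a server that times out four times, `rpc_call(make_flaky_server(4), hard=True)` eventually returns the reply, while the same call with `hard=False` raises `TimeoutError` after the third attempt.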
Design goals
- Not restricted to UNIX
- Not dependent on any particular hardware
- Simple recovery mechanisms
- Transparent access to remote files
- UNIX semantics
- NFS performance must be comparable to that of a local disk
- Transport-independent

NFS components
- NFS protocol
- RPC protocol
- XDR (External Data Representation)
- NFS server code
- NFS client code
- Mount protocol
- Daemon processes (nfsd, mountd, biod)
- NLM (Network Lock Manager) and NSM (Network Status Monitor)

Statelessness
- Each request is independent
- It makes crash recovery simple
  - Client crash
  - Server crash
- Problem:
  - The server must commit all modifications to stable storage before replying to a request.

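A small sketch of what "each request is independent" means in practice: every read or write names the file and the offset explicitly, so the server keeps no open-file table or file position between calls. The class name `StatelessFileServer` and the dictionary standing in for stable storage are assumptions for illustration only.

```python
class StatelessFileServer:
    """Toy server: no per-client state, every request is self-describing."""

    def __init__(self, stable_storage):
        # Maps a file handle to the file's bytes; stands in for the disk.
        self.storage = stable_storage

    def read(self, handle, offset, count):
        # The offset comes from the client, not from server-side state.
        return self.storage[handle][offset:offset + count]

    def write(self, handle, offset, data):
        old = self.storage.get(handle, b"").ljust(offset, b"\x00")
        # The change lands in "stable storage" before the reply is returned.
        self.storage[handle] = old[:offset] + data + old[offset + len(data):]
        return len(data)
```

Because nothing but the storage survives between requests, a "rebooted" server (a fresh `StatelessFileServer` over the same storage) answers a retried request exactly as the original would have.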
10.4 The protocol suite
- Why XDR?
  - Machines differ in the internal representation of data elements: byte order and sizes of types.
  - Opaque (byte stream)
  - Typed
    - Little-endian
    - Big-endian

XDR
- Integers
  - 32 bits; byte 0 (leftmost) is the most significant; signed integers use two's complement
- Variable-length opaque data
  - Length (4 bytes), then the data, NULL-padded to a 4-byte boundary
- Strings
  - Length (4 bytes), then the ASCII string, NULL-padded to a 4-byte boundary
- Arrays
  - Size (4 bytes), then elements of the same type
- Structures
  - Components in natural order

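The encoding rules above can be sketched with Python's standard `struct` module. This is a minimal illustration rather than the Sun XDR library; the helper names `xdr_int`, `xdr_opaque`, and `xdr_string` are hypothetical.

```python
import struct


def xdr_int(value):
    """Encode a signed 32-bit integer, most significant byte first."""
    return struct.pack(">i", value)


def xdr_opaque(data):
    """Encode variable-length opaque data: 4-byte length, data, NULL padding."""
    pad = (4 - len(data) % 4) % 4  # bytes needed to reach a 4-byte boundary
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def xdr_string(text):
    """Encode an ASCII string the same way as opaque data."""
    return xdr_opaque(text.encode("ascii"))


def little_endian_int(value):
    """For comparison only: the same integer in little-endian byte order."""
    return struct.pack("<i", value)
```

For example, `xdr_int(1)` yields the bytes `00 00 00 01`, while `little_endian_int(1)` yields `01 00 00 00`; this is the byte-order difference that XDR hides from the two endpoints.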
RPC
- Specifies the format of communications between the client and the server.
- Sun RPC: synchronous requests only.
- Implemented on UDP/IP.
- Authentication to identify callers
  - AUTH_NULL, AUTH_UNIX, AUTH_SHORT, AUTH_DES, and AUTH_KERB
- RPC language compiler: rpcgen

10.5 NFS Implementation
- Control flow
- Vnode
- Rnode

File Handle
- The server assigns a file handle on lookup, create, or mkdir.
- Subsequent I/O operations use it.
- A file handle = opaque 32-byte object = <file system ID, inode number, generation number>
- The generation number lets the server detect a stale handle, that is, one whose inode has since been allocated to another file.

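The role of the generation number can be sketched as follows. The byte layout chosen here (three big-endian 32-bit fields, zero-padded to 32 bytes) is an assumption made for the example; the real layout is private to the server, which is why the handle is opaque to clients. `make_handle`, `parse_handle`, and `InodeTable` are hypothetical names.

```python
import struct

HANDLE_SIZE = 32
_FIELDS = ">III"  # assumed layout: file system ID, inode number, generation


def make_handle(fsid, inode, generation):
    """Pack the three fields and pad the result to a 32-byte opaque object."""
    return struct.pack(_FIELDS, fsid, inode, generation).ljust(HANDLE_SIZE, b"\x00")


def parse_handle(handle):
    """Recover (fsid, inode, generation); only the server does this."""
    return struct.unpack(_FIELDS, handle[:struct.calcsize(_FIELDS)])


class InodeTable:
    """Tracks the current generation number of each inode."""

    def __init__(self):
        self.generation = {}

    def allocate(self, inode):
        # Each reuse of an inode bumps its generation number.
        self.generation[inode] = self.generation.get(inode, 0) + 1
        return self.generation[inode]

    def is_stale(self, handle):
        _, inode, gen = parse_handle(handle)
        return self.generation.get(inode) != gen
```

A handle built when inode 7 was first allocated becomes stale as soon as inode 7 is allocated again for a different file, because the stored generation no longer matches.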
The mount operation
- nfs_mount(): sends an RPC request with the pathname as its argument
- The mountd daemon translates the pathname
- Checks that the client may mount it
- Replies success with a file handle
- The client initializes the vfs and records the server's name and address
- Allocates the rnode and vnode
- The server must check access rights on each request

Pathname Lookup
- Client:
  - Initiates a lookup during open, create, and stat
  - Starting from the current or root directory, proceeds one component at a time
  - Sends a request to the server if the directory is an NFS directory
- Server:
  - From file handle -> FS ID -> vfs -> VGET -> vnode -> VOP_LOOKUP -> vnode and pointer
  - VOP_GETATTR -> VOP_FID -> file handle
  - Reply message = status + file handle + file attributes
- Client:
  - Gets the reply, allocates an rnode and vnode, copies the information, and proceeds to search for the next component

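The component-at-a-time lookup described above can be sketched as a loop that issues one request per pathname component. `FakeServer` and `resolve` are hypothetical stand-ins; a real client would allocate an rnode and vnode at each step instead of just carrying the handle forward.

```python
class FakeServer:
    """Stand-in server: maps (directory handle, name) to (handle, attributes)."""

    def __init__(self, entries):
        self.entries = entries
        self.calls = 0  # counts lookup requests, one per component

    def lookup(self, dir_handle, name):
        self.calls += 1
        return self.entries.get((dir_handle, name))


def resolve(server, root_handle, path):
    """Resolve a path one component at a time, starting at root_handle."""
    handle, attrs = root_handle, None
    for component in path.strip("/").split("/"):
        reply = server.lookup(handle, component)
        if reply is None:
            raise FileNotFoundError(component)
        # The reply carries the file handle and attributes of this component.
        handle, attrs = reply
    return handle, attrs
```

Resolving `usr/local` against such a server costs two lookup requests, one for `usr` and one for `local`, which is why deep paths are comparatively expensive over NFS.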
10.6 UNIX Semantics
NFS leads to a few incompatibilities with UNIX because it is stateless.
- Open file permission
  - UNIX checks permissions at open
  - NFS checks permissions on each read and write
  - In NFS, the server always allows the owner of the file to read or write the file.
- Writing to a write-protected file?
  - The client saves the attributes containing the file permissions at open time

Deletion of open files
- The server has no idea that the file is open.
- The client renames the file to be deleted.
- It deletes the renamed file when the file is closed.
- What if the file is deleted from a different machine?

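A rough sketch of the rename-on-delete workaround, under the assumption that the server's directory can be modeled as a dictionary. The class name `OpenFileClient` and the `.nfs_` prefix for the temporary name are illustrative choices, not the naming any particular client uses.

```python
class OpenFileClient:
    """Toy client that defers deletion of files it still has open."""

    def __init__(self, server_files):
        self.server = server_files  # name -> data, stands in for the server
        self.open_files = set()
        self.renamed = {}           # original name -> temporary name

    def open(self, name):
        self.open_files.add(name)

    def unlink(self, name):
        if name in self.open_files:
            # Still open locally: rename instead of removing.
            temp = ".nfs_" + name
            self.server[temp] = self.server.pop(name)
            self.renamed[name] = temp
        else:
            del self.server[name]

    def close(self, name):
        self.open_files.discard(name)
        temp = self.renamed.pop(name, None)
        if temp is not None:
            del self.server[temp]   # the real delete happens at close
```

The sketch only protects against a delete issued by the same client; a delete from another machine removes the file at once, which is the open question the last bullet raises.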
Reads and Writes
- UNIX locks the vnode at the start of I/O
- NFS clients can lock the vnode only on the same machine.
- NFS offers no protection against overlapping I/O requests from other clients.
- Locking through the NLM (Network Lock Manager) protocol is only advisory.

10.7 NFS Performance
- Bottlenecks
  - Writes must be committed to stable storage
  - Fetching file attributes requires one RPC call per file
  - Processing retransmitted requests adds to the load on the server

Client-side caching
- Caches both file blocks and file attributes
- To avoid using invalid data:
  - Keep an expiry time in the kernel
  - 60 seconds for rechecking the modified time
  - This reduces but does not eliminate the problem

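The expiry-time idea can be sketched as a small attribute cache. The class name `AttrCache`, the `fetch` callback (standing in for a GETATTR request to the server), and the injectable `clock` are assumptions made so the example is self-contained and testable.

```python
import time


class AttrCache:
    """Caches file attributes and refetches them after ttl seconds."""

    def __init__(self, fetch, ttl=60.0, clock=time.monotonic):
        self.fetch = fetch      # called with a file handle on a miss
        self.ttl = ttl
        self.clock = clock
        self.entries = {}       # handle -> (attributes, expiry time)

    def getattr(self, handle):
        now = self.clock()
        entry = self.entries.get(handle)
        if entry is not None and now < entry[1]:
            return entry[0]     # still fresh: no request to the server
        attrs = self.fetch(handle)
        self.entries[handle] = (attrs, now + self.ttl)
        return attrs
```

Within the expiry interval the client can still act on attributes that another client has changed on the server, which is why the technique reduces the problem without eliminating it.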
Deferral of writes
- Asynchronous writes for full blocks
- Delayed writes for partial blocks
- Delayed writes are flushed when the file is closed, or every 30 seconds by the biod daemon
- The server uses an NVRAM buffer and flushes the buffer to disk
- Write-gathering:
  - The server waits, processes more than one write to the same file together, and replies to each
  - The server processes the gathered write requests

The retransmissions cache
- Idempotent requests
- Nonidempotent requests
- Problem: retransmissions
  - The client sends a remove request
  - The server removes the file and sends a success reply, but the reply is lost
  - The client retransmits the remove
  - The server processes the remove request again
  - The remove fails, and the server sends a remove failure
  - The client receives the error message
- Check the (xid) cache on the server: xid, procedure number, and client ID
- The cache is checked only when a request fails

New implementation
- Caches all requests
- Checks xid, procedure number, client ID, state field, and timestamp
- If the request is in progress, discard it; if it is done, discard it if the timestamp shows the request is within the throwaway window (3-6 s)
- Otherwise, process the request if it is idempotent
- For a nonidempotent request, check whether the file has been modified: if not, send success; otherwise, retry it.

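The decision rules of the newer retransmission cache can be sketched as below. This is an interpretation of the bullets above, not server source code: the class name `RetransmitCache`, the string return values, and the `file_modified` flag (standing in for the server's check of the file) are assumptions.

```python
IN_PROGRESS, DONE = "in_progress", "done"


class RetransmitCache:
    """Keyed by (xid, procedure number, client ID)."""

    def __init__(self, window=3.0):
        self.window = window    # throwaway window in seconds
        self.entries = {}       # key -> (state, timestamp)

    def classify(self, key, now, idempotent, file_modified=False):
        """Return 'execute', 'drop', or 'reply_success' for a request."""
        entry = self.entries.get(key)
        if entry is None:
            self.entries[key] = (IN_PROGRESS, now)
            return "execute"            # first time this request is seen
        state, stamp = entry
        if state == IN_PROGRESS:
            return "drop"               # original is still being served
        if now - stamp < self.window:
            return "drop"               # reply was sent very recently
        if idempotent or file_modified:
            self.entries[key] = (IN_PROGRESS, now)
            return "execute"            # safe, or necessary, to redo
        return "reply_success"          # already done; do not repeat it

    def finish(self, key, now):
        """Record that the request completed and its reply was sent."""
        self.entries[key] = (DONE, now)
```

In the remove scenario, the retransmitted request finds a completed entry for the same xid and is answered with success instead of being executed a second time, so the client no longer sees a spurious failure.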
10.9 NFS Security
- NFS access control
  - Applied on mount and on each request
  - Governed by an exports list
  - Mount: the server checks the list and denies ineligible clients
  - Request: carries authentication information in AUTH_UNIX form (UID, GID)
- Loophole: an impostor can present another user's <UID, GID> to access that user's files

UID Remapping
- A translation map for each client.
- The same UID may map to a different UID on the server
- A UID with no match in the map is mapped to nobody
- Can be implemented at the RPC level
- Can be implemented at the NFS level
  - Merging the map and the /etc/exports file

Root Remapping
- Maps the superuser to nobody
- Limits the access of the client's superuser to files on the server
- The UNIX security framework is designed for an isolated, multi-user environment in which the users trust each other.

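UID and root remapping can be sketched as a per-client lookup with a fallback to nobody. The numeric value used for `NOBODY` and the function name `remap_uid` are assumptions for illustration; systems differ in the UID they reserve for nobody.

```python
NOBODY = 65534  # assumed UID for "nobody"; the actual value varies by system


def remap_uid(client_map, client_uid, squash_root=True):
    """Translate a UID from an AUTH_UNIX credential into a server-side UID.

    client_map is the translation map for one client (client UID -> server UID).
    """
    if squash_root and client_uid == 0:
        return NOBODY                       # root remapping
    return client_map.get(client_uid, NOBODY)  # unmatched UIDs become nobody
```

For a client whose map is `{1001: 2001}`, UID 1001 is served as 2001, an unknown UID 1500 is served as nobody, and the client's superuser (UID 0) is also served as nobody unless `squash_root` is turned off and the map lists UID 0.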
10.10 NFS Version 3
- Commit request
  - When the client writes, the kernel sends an asynchronous write
  - The server saves the data in its local cache and replies immediately
  - The client holds its copy of the data until the process closes the file, and then sends a commit request
  - The server flushes the data to disk
- File length:
  - From 32 bits (4 GB) to 64 bits (2^34 GB)
- READDIRPLUS = (LOOKUP + GETATTR)
  - Returns names, file handles, and file attributes

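The commit sequence listed above can be sketched with two toy classes. `V3Server` and `V3Client` are hypothetical names, the dictionaries stand in for the server's cache and disk, and the sketch leaves out everything else in the protocol.

```python
class V3Server:
    """Toy server with a volatile cache and a stable disk."""

    def __init__(self):
        self.cache = {}   # offset -> data, lost if the server crashes
        self.disk = {}    # offset -> data, stable storage

    def write(self, offset, data):
        self.cache[offset] = data   # reply at once, before data is stable
        return "ok"

    def commit(self):
        self.disk.update(self.cache)  # flush cached writes to disk
        self.cache.clear()
        return "ok"


class V3Client:
    """Keeps a copy of written data until the server has committed it."""

    def __init__(self, server):
        self.server = server
        self.pending = {}

    def write(self, offset, data):
        self.pending[offset] = data
        self.server.write(offset, data)

    def close(self):
        if self.server.commit() == "ok":
            self.pending.clear()      # safe to drop the copies now
```

The point of the client-side copy is that data acknowledged but not yet committed can be sent again if the server loses its cache before the commit.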
Other DFS
- The Andrew File System (10.15 – 10.17)
- The DCE Distributed File System (10.18 – 10.18.5)