UNIX Internals – the New Frontiers
Distributed File Systems
Difference between DOS and DFS
A distributed OS looks like a centralized OS, but runs simultaneously on multiple machines. It may provide a FS shared by all its host machines.
A distributed FS is a software layer that manages communication between conventional operating systems and file systems.
General Characteristics of DFS
Network transparency
Location transparency & location independence
User mobility
Fault tolerance
Scalability
File mobility
Design Considerations
Name space
Stateful or stateless
Semantics of sharing: UNIX semantics, session semantics
Remote access method
Network File System (NFS)
Based on the client-server model
Communicates via remote procedure calls
User Perspective
An NFS server exports one or more file systems.
Hard mount: the client retries until it gets a reply.
Soft mount: the client gives up after repeated timeouts and returns an error.
Spongy mount: hard for the mount operation, soft for subsequent I/O.
Commands:
mount -t nfs nfssrv:/usr /usr
mount -t nfs nfssrv:/usr/u1 /u1
mount -t nfs nfssrv:/usr /users
mount -t nfs nfssrv:/usr/local /usr/local
Design goals
Not restricted to UNIX
Not dependent on any particular hardware
Simple recovery mechanisms
Transparent access to remote files
UNIX semantics
NFS performance comparable to that of a local disk
Transport independence
NFS components
NFS protocol
RPC protocol
XDR (External Data Representation)
NFS server code
NFS client code
Mount protocol
Daemon processes (nfsd, mountd, biod)
NLM (Network Lock Manager) & NSM (Network Status Monitor)
Statelessness
Each request is independent.
This makes crash recovery simple, for both client crashes and server crashes.
Problem: the server must commit all modifications to stable storage before replying to a request (sketched below).
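As a rough illustration of that requirement, the sketch below shows the shape of a stateless write handler: each request carries everything needed, and the data is forced to disk with fsync() before the reply would be sent. The handler and its arguments are hypothetical, not NFS server code; only the write-then-sync-then-reply ordering reflects the text.

#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stateless write handler: the request itself names the file,
 * offset, and data, and the data reaches stable storage before we reply. */
static int handle_write_request(const char *path, off_t offset,
                                const void *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;                       /* would reply with an error status */

    if (pwrite(fd, data, len, offset) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return 0;                            /* only now is it safe to reply OK */
}

int main(void)
{
    const char msg[] = "committed before the reply\n";

    if (handle_write_request("/tmp/nfs_demo.txt", 0, msg, sizeof msg - 1) == 0)
        printf("reply: success\n");
    else
        printf("reply: failure\n");
    return 0;
}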
10.4 The protocol suite
Why XDR?
Machines differ in their internal representation of data elements:
ordering and sizes of types
opaque (byte stream) vs. typed data
little-endian vs. big-endian
XDR
Integers: 32 bits; byte 0 is leftmost (most significant); signed integers use 2's complement.
Variable-length opaque data: length (4 bytes), then the data, NULL-padded to a 4-byte boundary.
Strings: length (4 bytes), then the ASCII characters, NULL-padded (see the encoding sketch below).
Arrays: size (4 bytes), then each element encoded the same way.
Structures: components encoded in their natural order, each according to its type of data.
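To make the layout concrete, here is a small sketch (not the system XDR library) that encodes a string the way the slide describes: a 4-byte big-endian length followed by the bytes, zero-padded to a 4-byte boundary.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Encode a string in XDR style: 4-byte big-endian length, then the bytes,
 * padded with zero bytes up to a multiple of 4.
 * Returns the number of bytes written into buf. */
static size_t xdr_encode_string(uint8_t *buf, const char *s)
{
    uint32_t len = (uint32_t)strlen(s);
    size_t padded = (len + 3) & ~(size_t)3;   /* round up to 4-byte boundary */

    buf[0] = (uint8_t)(len >> 24);            /* most significant byte first */
    buf[1] = (uint8_t)(len >> 16);
    buf[2] = (uint8_t)(len >> 8);
    buf[3] = (uint8_t)(len);

    memcpy(buf + 4, s, len);
    memset(buf + 4 + len, 0, padded - len);   /* zero padding */
    return 4 + padded;
}

int main(void)
{
    uint8_t buf[64];
    size_t n = xdr_encode_string(buf, "hello");   /* 5 chars -> 3 pad bytes */

    for (size_t i = 0; i < n; i++)
        printf("%02x ", buf[i]);
    printf("\n");   /* 00 00 00 05 68 65 6c 6c 6f 00 00 00 */
    return 0;
}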
RPC
Specifies the format of communications between the client and the server (call header sketched below).
Sun RPC supports synchronous requests only.
Implemented on UDP/IP.
Authentication to identify callers: AUTH_NULL, AUTH_UNIX, AUTH_SHORT, AUTH_DES, and AUTH_KERB.
RPC language compiler: rpcgen.
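For orientation, a Sun RPC call message begins with fields along the lines of the simplified struct below. On the wire these fields are XDR-encoded rather than sent as a raw C struct, and this rendering only names the pieces discussed in the text; it is not the library's actual data structure.

#include <stdint.h>
#include <stdio.h>

/* Simplified view of the fixed part of a Sun RPC call message.
 * Each field is a 4-byte quantity, XDR-encoded on the wire. */
struct rpc_call_header {
    uint32_t xid;          /* transaction ID: matches replies to calls and
                              lets the server detect retransmissions */
    uint32_t msg_type;     /* 0 = call, 1 = reply */
    uint32_t rpc_version;  /* RPC protocol version */
    uint32_t program;      /* e.g. the NFS program number */
    uint32_t version;      /* program version */
    uint32_t procedure;    /* which remote procedure to invoke */
    /* followed by a credential (AUTH_NULL, AUTH_UNIX, ...), a verifier,
       and the XDR-encoded arguments of the procedure */
};

int main(void)
{
    struct rpc_call_header h = { 0x1234, 0, 2, 100003, 2, 1 };  /* sample values */

    printf("call xid=%u program=%u proc=%u\n", h.xid, h.program, h.procedure);
    return 0;
}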
10.5 NFS Implementation
Control flow
Vnode
Rnode
File Handle
The server assigns a file handle on lookup, create, or mkdir.
Subsequent I/O operations use that handle.
A file handle is an opaque 32-byte object = <file system ID, inode number, generation number> (see the sketch below).
The generation number is used to detect an obsolete handle, i.e., one whose inode has since been allocated to another file.
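A minimal sketch of what a server might pack into such a handle; the field names and the helper are hypothetical, and the structure is simply zero-padded out to the opaque 32-byte object the text mentions.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NFS_FHSIZE 32   /* opaque file handle size mentioned in the text */

/* Hypothetical layout of the information a server packs into a handle. */
struct fh_contents {
    uint32_t fsid;        /* exported file system ID */
    uint32_t inode;       /* inode number within that file system */
    uint32_t generation;  /* detects reuse of the inode for another file */
};

/* Build an opaque handle: copy the contents and zero-fill the rest. */
static void make_file_handle(uint8_t fh[NFS_FHSIZE],
                             uint32_t fsid, uint32_t inode, uint32_t gen)
{
    struct fh_contents c = { fsid, inode, gen };

    memset(fh, 0, NFS_FHSIZE);
    memcpy(fh, &c, sizeof c);
}

int main(void)
{
    uint8_t fh[NFS_FHSIZE];

    make_file_handle(fh, 1, 4711, 2);   /* example values */
    printf("handle bytes used: %zu of %d\n",
           sizeof(struct fh_contents), NFS_FHSIZE);
    return 0;
}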
The mount operation
nfs_mount() sends an RPC request whose argument is the pathname of the directory to be mounted.
The mountd daemon translates the pathname, checks that the client is allowed to mount it, and on success replies with a file handle.
The client initializes a vfs and records the server's name and address.
It allocates an rnode and a vnode for the root of the mounted file system.
The server must still check access rights on each subsequent request.
Pathname Lookup
Client:
Initiates a lookup during open, create, and stat.
Starting from the current or root directory, proceeds one component at a time (see the sketch below).
Sends a request to the server whenever the component is in an NFS directory.
Server:
From the file handle -> FS ID -> vfs -> VGET -> vnode -> VOP_LOOKUP -> vnode & pointer; then VOP_GETATTR and VOP_FID -> file handle.
Reply message = status + file handle + file attributes.
Client:
Gets the reply, allocates an rnode and vnode, copies the information, and proceeds to search for the next component.
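The component-at-a-time behavior can be sketched as below; remote_lookup() is a stand-in for the lookup RPC (here it only prints what would be sent), not actual NFS client code.

#include <stdio.h>
#include <string.h>

/* Stand-in for the lookup RPC: takes the handle of a directory and one
 * name, and would return the handle and attributes of that entry. */
static int remote_lookup(int dir_handle, const char *name)
{
    printf("LOOKUP(dir=%d, name=\"%s\")\n", dir_handle, name);
    return dir_handle + 1;   /* pretend the server handed back a new handle */
}

int main(void)
{
    char path[] = "/usr/local/bin/cc";
    int handle = 0;          /* handle of the mounted root, from mountd */

    /* One RPC per pathname component, as described above. */
    for (char *comp = strtok(path, "/"); comp; comp = strtok(NULL, "/"))
        handle = remote_lookup(handle, comp);

    printf("final handle: %d\n", handle);
    return 0;
}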
10.6 UNIX Semantics
Statelessness leads to a few incompatibilities between NFS and UNIX.
Open-file permissions:
UNIX checks permissions only at open; NFS checks them on each read and write.
To compensate, the NFS server always allows the owner of a file to read or write it.
What about writing to a file that has since been write-protected? The client saves the attributes, including the file permissions, at open time.
Deletion of open files
The server has no idea which files clients have open.
The client renames a file that is deleted while still open, and actually deletes it when it is closed.
This does not help when the delete happens on a different machine.
Reads and Writes
UNIX locks the vnode at the start of an I/O operation.
NFS clients can lock the vnode only against other processes on the same machine.
NFS offers no protection against overlapping I/O requests from different clients.
Locking through the NLM (Network Lock Manager) protocol is only advisory.
10.7 NFS Performance
Bottlenecks:
Writes must be committed to stable storage.
Fetching file attributes requires one RPC call per file.
Processing retransmitted requests adds to the load on the server.
Client-side caching
The client caches both file blocks and file attributes.
To avoid using invalid data, the kernel keeps an expiry time for each cache entry; after 60 seconds it rechecks the file's modification time on the server (see the sketch below).
This reduces, but does not eliminate, the problem.
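A rough sketch of the expiry check described above; the structure, the fixed 60-second constant, and fetch_mtime_from_server() are illustrative stand-ins, not the real NFS client code.

#include <stdbool.h>
#include <stdio.h>
#include <time.h>

#define ATTR_TIMEOUT 60   /* seconds before the cached attributes expire */

struct attr_cache_entry {
    time_t cached_at;     /* when the attributes were fetched */
    time_t mtime;         /* server's modification time at that point */
};

/* Stand-in for a GETATTR call to the server. */
static time_t fetch_mtime_from_server(void) { return 1000; }

/* Cached data may be used if the entry is fresh, or if a recheck shows
 * that the file has not been modified on the server. */
static bool cache_still_valid(struct attr_cache_entry *e)
{
    time_t now = time(NULL);

    if (now - e->cached_at < ATTR_TIMEOUT)
        return true;                         /* within the expiry window */

    time_t mtime = fetch_mtime_from_server();
    if (mtime == e->mtime) {                 /* unchanged: refresh the timer */
        e->cached_at = now;
        return true;
    }
    return false;                            /* modified: discard cached blocks */
}

int main(void)
{
    struct attr_cache_entry e = { time(NULL), 1000 };
    printf("cache valid: %s\n", cache_still_valid(&e) ? "yes" : "no");
    return 0;
}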
Deferral of writes
Asynchronous writes for full blocks, delayed writes for partial blocks.
Delayed writes are flushed when the file is closed, or every 30 seconds by the biod daemon.
The server can use an NVRAM buffer and flush the buffer to disk later.
Write-gathering: the server waits briefly, gathers several writes to the same file, processes them together, and then replies to each.
The retransmissions cache
Requests are either idempotent or nonidempotent.
Problem: retransmission of a nonidempotent request such as remove:
The server removes the file and sends a success reply, but the reply is lost.
The client retransmits the remove.
The server processes the remove request again, hits an error, and sends a remove failure.
The client receives the error message, even though the remove succeeded.
The server keeps a cache of recent requests, keyed by xid, procedure number, and client ID.
The cache is checked only when a request fails.
New implementation
Caches all requests.
Each entry holds the xid, procedure number, client ID, a state field, and a timestamp.
If the matching request is still in progress, the retransmission is discarded; if it is done, the retransmission is discarded when the timestamp falls within the throwaway window (3-6 s).
Otherwise the request is processed if it is idempotent; for a nonidempotent request, the server checks whether the file has been modified: if not, it replies with success; otherwise, it retries the request (see the sketch below).
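A sketch of the decision logic just described. The cache structure, the function, and the constants are hypothetical; only the in-progress / done / throwaway-window / idempotency branching follows the text.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define THROWAWAY_WINDOW 6   /* seconds; the text gives a 3-6 second window */

enum req_state { REQ_IN_PROGRESS, REQ_DONE };

struct cache_entry {          /* one cached request */
    uint32_t xid;
    uint32_t procedure;
    uint32_t client_id;
    enum req_state state;
    time_t completed_at;
};

enum action { DISCARD, PROCESS, REPLY_SUCCESS };

/* Decide what to do with a retransmitted request that matches entry e. */
static enum action handle_retransmission(const struct cache_entry *e,
                                         bool idempotent,
                                         bool file_modified_since)
{
    time_t now = time(NULL);

    if (e->state == REQ_IN_PROGRESS)
        return DISCARD;                       /* original still being served */

    if (e->state == REQ_DONE &&
        now - e->completed_at <= THROWAWAY_WINDOW)
        return DISCARD;                       /* just answered; drop duplicate */

    if (idempotent)
        return PROCESS;                       /* safe to simply redo it */

    /* Nonidempotent: only safe to claim success if nothing changed since. */
    return file_modified_since ? PROCESS : REPLY_SUCCESS;
}

int main(void)
{
    struct cache_entry e = { 42, 12 /* illustrative procedure number */, 7,
                             REQ_DONE, time(NULL) };

    enum action a = handle_retransmission(&e, false, false);
    printf("action: %d (0=discard, 1=process, 2=reply success)\n", a);
    return 0;
}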
10.9 NFS Security
NFS access control happens at mount time and on each request.
Mount: the server checks an exports list and denies ineligible clients.
Request: each request carries authentication information, in AUTH_UNIX form (UID, GID); see the credential sketch below.
Loophole: an imposter can present someone else's <UID, GID> and access that user's files.
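For reference, the AUTH_UNIX credential that accompanies each request carries roughly the fields below. This is a simplified C rendering, not the library's definition; on the wire the fields are XDR-encoded.

#include <stdint.h>
#include <stdio.h>

#define MAX_AUTH_GIDS 16   /* the credential carries a small list of groups */

/* Simplified view of an AUTH_UNIX credential. The server trusts these
 * values as sent by the client, which is exactly the loophole above. */
struct auth_unix_cred {
    uint32_t stamp;                    /* arbitrary value chosen by client */
    char     machinename[255];         /* client host name */
    uint32_t uid;                      /* caller's user ID */
    uint32_t gid;                      /* caller's primary group ID */
    uint32_t ngids;                    /* number of supplementary groups */
    uint32_t gids[MAX_AUTH_GIDS];      /* supplementary group IDs */
};

int main(void)
{
    struct auth_unix_cred c = { 0 };

    c.uid = 1001;                      /* whatever the client claims */
    c.gid = 100;
    printf("server will trust uid=%u gid=%u\n", c.uid, c.gid);
    return 0;
}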
UID Remapping
The server keeps a translation map for each client.
The same UID may map to a different UID on the server (see the sketch below).
A UID with no match in the map becomes "nobody".
Can be implemented at the RPC level or at the NFS level.
Another option is merging the map into the /etc/exports file.
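A minimal sketch of such a per-client map; the table, the lookup function, and the use of 65534 for "nobody" are illustrative assumptions.

#include <stdint.h>
#include <stdio.h>

#define UID_NOBODY 65534   /* common choice for "nobody"; an assumption here */

struct uid_map_entry {
    uint32_t client_uid;   /* UID as presented by this client */
    uint32_t server_uid;   /* UID it is treated as on the server */
};

/* Per-client translation map: unknown UIDs fall through to "nobody". */
static uint32_t remap_uid(const struct uid_map_entry *map, int n,
                          uint32_t client_uid)
{
    for (int i = 0; i < n; i++)
        if (map[i].client_uid == client_uid)
            return map[i].server_uid;
    return UID_NOBODY;
}

int main(void)
{
    struct uid_map_entry map[] = { { 501, 1001 }, { 502, 1002 } };

    printf("501 -> %u\n", remap_uid(map, 2, 501));   /* 1001 */
    printf("777 -> %u\n", remap_uid(map, 2, 777));   /* nobody */
    return 0;
}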
Root Remapping
Maps the superuser to "nobody".
Limits what the client's superuser can access on the server.
The UNIX security framework was designed for an isolated, multi-user environment in which the users trust each other.
10.10 NFS Version 3
Commit request:
The client writes; the kernel sends asynchronous write requests.
The server saves the data in its local cache and replies immediately.
The client holds its copy of the data until the process closes the file and sends a commit request.
The server then flushes the data to disk.
File length: from 32 bits (4 GB) to 64 bits (2^34 GB).
READDIRPLUS = LOOKUP + GETATTR: returns names, file handles, and file attributes.
Other DFS
The Andrew File System (10.15 – 10.17)
The DCE Distributed File System (10.18 – 10.18.5)