OSPP: File Systems - People Search Directory

Download Report

Transcript OSPP: File Systems - People Search Directory

File Systems
Main Points
• File layout
• Directory layout
The External View of the File Manager
Application
Program
Windows
Hardware
Operating Systems: A Modern
Perspective, Chapter 13
Memory Mgr
Process Mgr
File Mgr
UNIX
Device Mgr
WriteFile()
CreateFile()
CloseHandle() ReadFile()
SetFilePointer()
Memory Mgr
Process Mgr
Device Mgr
File Mgr
mount()
write()
close() open()
read()
lseek()
Why Programmers Need Files
HTML
Editor
<head>
…
</head>
<body>
…
</body>
Web
Browser
foo.html
File
Manager
• Persistent storage
• Shared device
Operating Systems: A Modern
Perspective, Chapter 13
<head>
…
</head>
<body>
…
</body>
File
Manager
• Structured information
• Can be read by any applic
• Accessibility
• Protocol
File Management
• File is a named, ordered collection of
information
• The file manager administers the collection
by:
– Storing the information on a device
– Mapping the block storage to a logical view
– Allocating/deallocating storage
– Providing file directories
Operating Systems: A Modern
Perspective, Chapter 13
Information Structure
Applications
Records
Byte Stream Files
Stream-Block Translation
Storage device
Operating Systems: A Modern
Perspective, Chapter 13
Byte Stream File Interface
fileID = open(fileName)
close(fileID)
read(fileID, buffer, length)
write(fileID, buffer, length)
seek(fileID, filePosition)
Operating Systems: A Modern
Perspective, Chapter 13
Low Level Files
fid = open(“fileName”,…);
…
read(fid, buf, buflen);
…
close(fid);
int
int
int
int
int
open(…) {…}
close(…) {…}
read(…) {…}
write(…) {…}
seek(…) {…}
Storage device response to commands
Operating Systems: A Modern
Perspective, Chapter 13
b0 b1 b2
...
bi
...
Stream-Block Translation
Structured Files
Records
Structured Record Files
Record-Block Translation
Operating Systems: A Modern
Perspective, Chapter 13
Record-Oriented Sequential Files
Logical Record
fileID = open(fileName)
close(fileID)
getRecord(fileID, record)
putRecord(fileID, record)
seek(fileID, position)
Operating Systems: A Modern
Perspective, Chapter 13
Record-Oriented Sequential Files
Logical Record
H byte header
Operating Systems: A Modern
Perspective, Chapter 13
k byte logical record
...
Record-Oriented Sequential Files
Logical Record
H byte header
k byte logical record
...
...
Physical Storage Blocks
Operating Systems: A Modern
Perspective, Chapter 13
Fragment
Indexed Sequential File
• Suppose we want to directly access records
• Add an index to the file
fileID = open(fileName)
close(fileID)
getRecord(fileID, index)
index = putRecord(fileID, record)
deleteRecord(fileID, index)
Operating Systems: A Modern
Perspective, Chapter 13
Indexed Sequential File (cont)
Application structure
Account #
Index
index = i
012345
123456
i
294376
k
index = k
...
529366
...
965987
Operating Systems: A Modern
Perspective, Chapter 13
j
index = j
File System Design
• File System is an organized collection of
regular files and directories (mkfs)
• Data structures
– Directories: file name -> file metadata
• Store directories as files
– File metadata: how to find file data blocks
– Free map: list of free disk blocks
File System Design Constraints
• For small files:
– Small blocks for storage efficiency
– Files used together should be stored together
• For large files:
– Contiguous allocation for sequential access
– Efficient lookup for random access
• May not know at file creation
– Whether file will become small or large
Design Challenges
• Index structure
– How do we locate the blocks of a file?
• Index granularity
– What block size do we use?
• Free space
– How do we find unused blocks on disk?
• Locality
– How do we preserve spatial locality?
• Reliability
– What if machine crashes in middle of a file system op?
File Systems
• Traditional FFS file system (Linux)
• Microsoft’s FAT, FAT2 and NTFS file systems
• Journaling file systems, ext3
• … others
Microsoft File Allocation Table (FAT)
• Linked list index structure
– Simple, easy to implement
– Still widely used (e.g., thumb drives)
• File table:
– Linear map of all blocks on disk
– Each file a linked list of blocks
FAT
FAT
• Pros:
– Easy to find free block
– Easy to append to a file
– Easy to delete a file
• Cons:
– Small file access is slow
– Random access is very slow
– Fragmentation
• File blocks for a given file may be scattered
• Files in the same directory may be scattered
• Problem becomes worse as disk fills
Berkeley UNIX FFS (Fast File System)
• inode table
– Analogous to FAT table
• inode
– Metadata
• File owner, access permissions, access times, …
– Set of 12 data pointers
– With 4KB blocks => max size of 48KB files
File System Structure
• Basic unit for allocating space on the disk is a
block
Disk
File
System
partition
Boot Block
partition
Super-block
i-node table
partition
Data blocks
I-nodes
• Each file or directory in the file system has a
unique entry in the i-node table.
• File type (regular, symbolic link, directory…)
• Owner
• Permissions
• Timestamps for last access; last modification, last
status change
• Size
• …
i-node entry
DB 0
0
DB 5
…
5
DB
…
11
IPB
IPB
12
13
14
15
IPB
DB
FFS inode
• Metadata
– File owner, access permissions, access times, …
• Set of 12 data pointers
– With 4KB blocks => max size of 48KB files
• Indirect block pointer
– pointer to disk block of data pointers
• Indirect block: 1K data blocks => 4MB (+48KB)
FFS inode
• Metadata
– File owner, access permissions, access times, …
• Set of 12 data pointers
– With 4KB blocks => max size of 48KB
• Indirect block pointer
– pointer to disk block of data pointers
– 4KB block size => 1K data blocks => 4MB
• Doubly indirect block pointer
– Doubly indirect block => 1K indirect blocks
– 4GB (+ 4MB + 48KB)
FFS inode
• Metadata
– File owner, access permissions, access times, …
• Set of 12 data pointers
– With 4KB blocks => max size of 48KB
• Indirect block pointer
– pointer to disk block of data pointers
– 4KB block size => 1K data blocks => 4MB
• Doubly indirect block pointer
– Doubly indirect block => 1K indirect blocks
– 4GB (+ 4MB + 48KB)
• Triply indirect block pointer
– Triply indirect block => 1K doubly indirect blocks
– 4TB (+ 4GB + 4MB + 48KB)
FFS Asymmetric Tree
• Small files: shallow tree
– Efficient storage for small files
• Large files: deep tree
– Efficient lookup for random access in large files
Disk Organization
Boot Sector
Blk0
Volume Directory
…
Blk1
Blkk-1
Track 0, Cylinder 0
Blk2k-1
Track 0, Cylinder 1
Blk
Track 1, Cylinder 0
Blk
Track N-1, Cylinder 0
Blk
Track N-1, Cylinder M-1
…
Blkk
Blkk+1
…
Blk
…
Blk
…
Blk
…
Blk
…
Blk
Operating Systems: A Modern
Perspective, Chapter 13
Blk
…
Low-level File System Architecture
Block 0
b0 b1 b2 b3
…
…
bn-1
...
Sequential Device
Operating Systems: A Modern
Perspective, Chapter 13
Randomly Accessed Device
Block Management
• The job of selecting & assigning storage
blocks to the file
• Three basic strategies:
– Contiguous allocation
– Linked lists
– Indexed allocation
Operating Systems: A Modern
Perspective, Chapter 13
Contiguous Allocation
• Maps the N blocks into N contiguous blocks
on the secondary storage device
• Difficult to support dynamic file sizes
File descriptor
Head position
237
…
First block
785
Number of blocks25
Operating Systems: A Modern
Perspective, Chapter 13
Linked Lists
• Each block contains a header with
– Number of bytes in the block
– Pointer to next block
• Blocks need not be contiguous
• Files can expand and contract
• Seeks can be slow
First block
…
Head: 417
...
Operating Systems: A Modern
Perspective, Chapter 13
Length
Length
Length
Byte 0
...
Byte 4095
Byte 0
...
Byte 4095
Byte 0
...
Byte 4095
Block 0
Block 1
Block N-1
Indexed Allocation
• Extract headers and put them in an index
• Simplify seeks
• May link indices together (for large files)
Index block
…
Head: 417
...
Byte 0
...
Byte 4095
Length
Byte 0
...
Byte 4095
Length
Length
Operating Systems: A Modern
Perspective, Chapter 13
Block 0
Byte 0
...
Byte 4095
Block N-1
Block 1
DOS FAT Files
File Descriptor
43
Disk
Block
254
…
107
Disk
Block
Disk
Block
File Descriptor
43
107
254
43
Disk
Block
…
Disk
Block
107
254
Operating Systems: A Modern
File
Access Table (FAT)
Perspective, Chapter 13
Disk
Block
UNIX Files
inode
mode
owner
…
Direct block 0
Direct block 1
…
Direct block 11
Single indirect
Double indirect
Triple indirect
Data
Data
Data
Index
Data
Index
Data
Index
Index
Index
Data
Index
Index
Data
Index
Index
Operating Systems: A Modern
Perspective, Chapter 13
Data
Data
Unallocated Blocks
• How should unallocated blocks be
managed?
• Need a data structure to keep track of them
– Linked list
• Very large
• Hard to manage spatial locality
– Block status map (“disk map”)
• Bit per block
• Easy to identify nearby free blocks
• Useful for disk recovery
Operating Systems: A Modern
Perspective, Chapter 13
FFS Locality
• Block group allocation
– Block group is a set of nearby cylinders
– Files in same directory located in same group
– Subdirectories located in different block groups
• inode table spread throughout disk
– inodes, bitmap near file blocks
• First fit allocation
– Small files fragmented, large files contiguous
FFS First Fit Block Allocation
FFS First Fit Block Allocation
FFS First Fit Block Allocation
FFS
• Pros
– Efficient storage for both small and large files
– Locality for both small and large files
– Locality for metadata and data
• Cons
– Inefficient for tiny files (a 1 byte file requires both an
inode and a data block)
– Inefficient encoding when file is mostly contiguous on
disk (no equivalent to superpages)
– Need to reserve 10-20% of free space to prevent
fragmentation
NTFS
• Master File Table
– Flexible 1KB storage for metadata and data
• Extents
– Block pointers cover runs of blocks
– Similar approach in linux (ext4)
– File create can provide hint as to size of file
• Journalling for reliability
– Discussed next time
NTFS Small File
NTFS Medium File
NTFS Indirect Block
NTFS Multiple Indirect Blocks
Named Data in a File System
Directories
• A set of logically associated files and sub
directories
• File manager provides set of controls:
– enumerate
– copy
– rename
– delete
– traverse
– etc.
Operating Systems: A Modern
Perspective, Chapter 13
Directory Implementation
• Device Directory
– A device can contain a collection of files
– Easier to manage if there is a root for every file
on the device -- the device root directory
• File Directory
– Typical implementations have directories
implemented as a file with a special format
– Entries in a file directory are handles for other
files (which can be files or subdirectories)
Operating Systems: A Modern
Perspective, Chapter 13
Directory Structures
• How should files be organized within
directory?
– Flat name space
• All files appear in a single directory
– Hierarchical name space
• Directory contains files and subdirectories
• Each file/directory appears as an entry in exactly
one other directory -- a tree
• Popular variant: All directories form a tree, but a
file can have multiple parents.
Operating Systems: A Modern
Perspective, Chapter 13
Directories
unix
etc
passwd
home
motd
pro
dev
twd
unix
...
slide1 slide2
Directory Representation
Component Name
Inode Number
directory entry
.
..
unix
etc
home
pro
dev
1
1
117
4
18
36
93
Directories
Directories
• Directories can be files
– Map file name to file number (MFT #, inode num)
• Table of file name -> file number
– Small directories: linear search
Large Directories: B-Trees
Hard Links
unix
etc
home
pro
dev
twd
image motd
unix
...
slide1 slide2
% ln /unix /etc/image
# link system call
Directory Representation
.
..
unix
etc
home
pro
dev
1
1
117
4
18
36
93
.
..
image
motd
4
1
117
33
Create hard links
$ echo –n “It is good to collect things, “ > abc
$ ln abc xyz
$ echo “ but it is better to go on walks” >> xyz
$ cat abc
It is good to collect things, but it is better to on
walks
Hard links
• If one is removed, the other name and the file
itself continue to exist.
Soft (symbolic) links
• Differing from a hard link, a soft link (or
symbolic link) is a special kind of file
containing the name of another file.
• Created with the ln –s
Soft Links
unix
etc
home
pro
dev
twd
image twd
/home/twd
% ln –s /unix /home/twd/mylink
% ln –s /home/twd /etc/twd
# symlink system call
unix
...
slide1 slide2
mylink
/unix
Working Directory
• Maintained in kernel for each process
– paths not starting from “/” start with the
working directory
– changed by use of the chdir system call
– displayed (via shell) using “pwd”
• how is this done?
UNIX mount Command
/
/
bin usr etc
bill
bin usr etc
foo
bill
nutt
foo
/
nutt
FS
abc cde
xyz
/
FS
abc cde
Operating Systems: A Modern
Perspective, Chapter 13
blah
xyz
blah
mount FS at foo
VFS-based File Manager
Exports OS-specific API
File System Independent
Part of File Manager
Virtual File System Switch
MS-DOS Part of
File Manager
Operating Systems: A Modern
Perspective, Chapter 13
ISO 9660 Part of
File Manager
…
ext2 Part of
File Manager
File
Management
Operating Systems: A Modern
Perspective, Chapter 13
13
An open() Operation
• Locate the on-device (external) file
descriptor
• Extract info needed to read/write file
• Authenticate that process can access the
file
• Create an internal file descriptor in primary
memory
• Create an entry in a “per process” open file
status table
• Allocate resources, e.g., buffers, to support
file usage
Operating Systems: A Modern
Perspective, Chapter 13
File Manager Data Structures
2 Keep the state
of the processfile session
3 Return a
reference to
the data
structure
Process-File
Session
Open File
Descriptor
External File Descriptor
Operating Systems: A Modern
Perspective, Chapter 13
1 Copy info from
external to the
open file
descriptor
Opening a UNIX File
fid = open(“fileA”, flags);
…
read(fid, buffer, len);
0
1
2
3
stdin
stdout
stderr
...
On-Device File Descriptor
File structure
inode
Open File Table
Operating Systems: A Modern
Perspective, Chapter 13
Internal File Descriptor
File Descriptors
•External name
•Current state
•Sharable
•Owner
•User
•Locks
•Protection settings
•Length
•Time of creation
•Time of last modification
•Time of last access
•Reference count
•Storage device details
Operating Systems: A Modern
Perspective, Chapter 13
Marshalling the Byte Stream
• Must read at least one buffer ahead on
input
• Must write at least one buffer behind on
output
• Seek  flushing the current buffer and
finding the correct one to load into
memory
• Inserting/deleting bytes in the interior of
the stream
Operating Systems: A Modern
Perspective, Chapter 13
Full Block Buffering
• Storage devices use block I/O
• Files place an explicit order on the bytes
• Therefore, it is possible to predict what is likely to
be read after bytei
• When file is opened, manager reads as many
blocks ahead as feasible
• After a block is logically written, it is queued for
writing behind, whenever the disk is available
• Buffer pool – usually variably sized, depending on
virtual memory needs
– Interaction with the device manager and memory
manager
Operating Systems: A Modern
Perspective, Chapter 13
File-Descriptor Table
File-descriptor
table
0
1
2
3
File descriptor
.
ref
count
.
.
User
address space
n–1
Kernel address space
access
mode
file
inode
location pointer
Allocation of File Descriptors
• Whenever a process requests a new file descriptor,
the lowest numbered file descriptor not already
associated with an open file is selected; thus
#include <fcntl.h>
#include <unistd.h>
close(0);
fd = open("file", O_RDONLY);
– will always associate file with file descriptor 0
(assuming that the open succeeds)
Redirecting Output … Twice
if (fork() == 0) {
/* set up file descriptors 1 and 2 in the
close(1);
close(2);
if (open("/home/twd/Output", O_WRONLY) ==
exit(1);
}
if (open("/home/twd/Output", O_WRONLY) ==
exit(1);
}
execl("/home/twd/bin/program", "program",
exit(1);
}
/* parent continues here */
child process */
-1) {
-1) {
0);
Redirected Output
File-descriptor
table
File descriptor 1
1
WRONLY
0
inode
pointer
1
WRONLY
0
inode
pointer
File descriptor 2
User
address space
Kernel address space
Redirected Output After Write
File-descriptor
table
File descriptor 1
1
WRONLY
100
inode
pointer
1
WRONLY
0
inode
pointer
File descriptor 2
User
address space
Kernel address space
Sharing Context Information
if (fork() == 0) {
/* set up file descriptors 1 and 2 in the child process */
close(1);
close(2);
if (open("/home/twd/Output", O_WRONLY) == -1) {
exit(1);
}
dup(1); /* set up file descriptor 2 as a duplicate of 1 */
execl("/home/twd/bin/program", "program", 0);
exit(1);
}
/* parent continues here */
Redirected Output After Dup
File-descriptor
table
File descriptor 1
2
File descriptor 2
User
address space
Kernel address space
WRONLY
100
inode
pointer
Fork and File Descriptors
int logfile = open("log", O_WRONLY);
if (fork() == 0) {
/* child process computes something, then does: */
write(logfile, LogEntry, strlen(LogEntry));
…
exit(0);
}
/* parent process computes something, then does: */
write(logfile, LogEntry, strlen(LogEntry));
…
File Descriptors After Fork
logfile
Parent’s
address space
2
logfile
Child’s
address space
Kernel address space
WRONLY
0
inode
pointer
Naming
• (almost) everything has a path name
– files
– directories
– devices (known as special files)
•
•
•
•
keyboards
displays
disks
etc.
Uniformity
int file = open("/home/twd/data", O_RDWR);
// opening a normal file
int device = open("/dev/tty", O_RDWR);
// opening a device (one’s terminal
// or window)
int bytes = read(file, buffer, sizeof(buffer));
write(device, buffer, bytes);