CENG334 Introduction to Operating Systems

Download Report

Transcript CENG334 Introduction to Operating Systems

CENG334
Introduction to Operating Systems
Filesystems
Topics:
•Erol Sahin
•Dept of Computer Eng.
•Middle East Technical University
•Ankara, TURKEY
URL: http://kovan.ceng.metu.edu.tr/ceng334
© 2006 Matt Welsh – Harvard University
1
Virtual File Systems




VFS: another layer of
abstraction
Upper interface: for
processes implementing
POSIX interface
Lower interface: for concrete
file systems
VFS translates the POSIX
calls to the calls of the
filesystems under it.
Developed by Sun to support
© 2006Matt Welsh – Harvard
University
Tanenbaum,
Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
2
VFS- 2



At boot time, the root filesystem is registered with VFS.
When other filesystems are mounted, they must also register with
VFS.
When a filesystem registers, it provides the list of addresses of the
functions that the VFS demands, such as reading a block.

After regitratin, when one opens a file:

open(“/usr/include/unistd.h”, O_RDONLY)


VFS creates a v-node and makes a call to the concrete filesystem
to return all the information needed.
The created v-node also contains pointers to the table of functions
for the concrete filesystem that the file resides.
© 2006 Matt Welsh – Harvard University
3
Virtual File Systems (2)
Figure 4-19. A simplified view of the data structures and code
used by the VFS and concrete file system to do a read.
© 2006 Matt Welsh – Harvard
University
Tanenbaum,
Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
4
Keeping track of Free Blocks -1



Linked list of disk blocks, with
each block filled with free disk
block numbers.
With 1KB block size, and 32
bit disk block number, each
block can hold the numbers
for 255 free blocks. The 256th
slot points to the next block
holding free disk blocks.
Example:




500 GB disk has approx. 488
million blocks.
If the disk is empty: storing the
free blocks requires 488
million/255 = 1.9 million blocks.
If the disk is nearly full, then the
linked list requires small.
Storage is essential free.. Free
blocks are used.
© 2006 Matt Welsh – Harvard University
5
Cont’d


The free list method can be enhanced..
Instead of keeping track of each free block, we can keep track of
consecutive free blocks represented by:



The address of the first free block
The count of free blocks
In the free list method, only one block of pointers need to be stored
in the memory. When it runs out, another one can be brought into
the memory.
© 2006 Matt Welsh – Harvard University
6
Keeping track of Free Blocks -2

One bit for keeping the state of each block.



1 : free blocks
0 : allocated blocks
500 GB disk, 1KB blocks


488 million bits = 60000 1KB blocks
For an empty disk:
 Less space than linked list method, since we use
1-bit instead of 32-bits for each free block
© 2006 Matt Welsh – Harvard University
7
Disk Quotas
Figure 4-24. Quotas are kept track of on a per-user basis
in a quota table.
© 2006 Matt Welsh – Harvard
University
Tanenbaum,
Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
8
File System Backups (1)
Backups are generally made to handle one of two
potential problems:
•
•
Recover from disaster.
Recover from stupidity.
•Thrash bins
© 2006 Matt Welsh – Harvard
University
Tanenbaum,
Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
9
Backup what?
•
In order to have storage efficiency, one can
choose not to backup:
•Executable files (since they can usually be restored
from manufacturer CD-ROMs
•Temporary files
• /tmp
•Special files (which correspond to I/O devices)
• /dev
© 2006 Matt Welsh – Harvard
University
Tanenbaum,
Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
10
Back-up issues

Full backup



Incremental backup





Reduced storage
Less secure: a single bad byte can screw the whole backup
Backing up and active filesystem is tricky


Backup only the files that were changed after the most recent backup
Typically, a weekly backup is followed by daily incremental backups
Smaller backup size, and faster
Compressed backups


backup all the filesystem providing a snapshot of the system at that point which can be fully
restored
Typically done weekly or monthly
Since during backup, files and directories are being modified
Makes the system less secure


Each backup tape/disk needs to be as safe as the serve itself..
It doesn’t matter if your backup tapes are lying around, even if you have the most secure
computer system..
© 2006 Matt Welsh – Harvard University
11
Dumping strategies – physical dump

Algorithm


Pros



Starts at block 0 of the disk and copies all the data onto a tape/disk
Simple to implement in a bug-free way
Fast
Cons


Backups the free blocks as well
 Backuping a free disk takes as much storage/time as a full disk
Dumping of “bad blocks” is a concern
 Typically disk controllers provide bad block replacement transparently without the
OS even knowing about it
 If a block goes bad after formatting, then the OS typically creates a “file”
consisting of al the bad blocks.
© 2006 Matt Welsh – Harvard University
12
Dumping strategies – logical dump

Algorithm


Full backup: Traverses the
filesystem as a tree and
creates the same filesystem
structure on the backup
disk/tape.
Partial backup: all the
directories on the path to the
particular file needs to be
saved. For instance, backing
up file 9 requires the saving of
directories, 1,5,6, and 7.
Squares are directories, circles are files.
Shaded items have been modified since
last dump. Each directory and file is
labeled by its i-node number.
© 2006 Matt Welsh – Harvard University
13
Dumping strategies – incremental logical dump


Bitmaps, indexed by i-node
number are used.
Phase 1: Examine all files
and directories below the
starting directory (root in
this case), and


mark the modified files.
mark ALL THE DIRECTORIES
Squares are directories, circles are files.
Shaded items have been modified since
last dump. Each directory and file is
labeled by its i-node number.
© 2006 Matt Welsh – Harvard University
14
Dumping strategies – incremental logical dump

Phase 2: Recursively walk
the tree again, and


UNMARK all the directories
that do not have any modified
files under them.
Note:


10 and 11 are unmarked
5 and 6 remain marked
Squares are directories, circles are files.
Shaded items have been modified since
last dump. Each directory and file is
labeled by its i-node number.
© 2006 Matt Welsh – Harvard University
15
Dumping strategies – incremental logical dump

Phase 3: Scan the i-nodes
in numerical order and

Dump all the marked
directories
Squares are directories, circles are files.
Shaded items have been modified since
last dump. Each directory and file is
labeled by its i-node number.
© 2006 Matt Welsh – Harvard University
16
Dumping strategies – incremental logical dump

Phase 4: dump the marked
files.
Squares are directories, circles are files.
Shaded items have been modified since
last dump. Each directory and file is
labeled by its i-node number.
© 2006 Matt Welsh – Harvard University
17
Dumping strategies – incremental logical dump

Restoring:


Restore all the directories that
were backupped
Restore all the files
Squares are directories, circles are files.
Shaded items have been modified since
last dump. Each directory and file is
labeled by its i-node number.
© 2006 Matt Welsh – Harvard University
18
Issues

Links:


Holes:





If a file is linked to more than one directory, only one copy should be saved.
In UNIX, some file, such as core files may contain holes.
These files, write a few bytes, and then seek to a distant file offset and write some
more bytes.
These empty blocks that are not written should not be dumped and stored.
Cores typically have may megabytes of empty blocks.
Special files

Such as named pipes (which can appear anywhere in the filesystem) should not be
dumped.
© 2006 Matt Welsh – Harvard University
19
Dumping strategies – logical dump

Pros


Allows to restore a single file. If
directories that lie on the path
to the file-to-be-restored were
deleted, then they would also
be restored.
Cons

Slow and complicated.
© 2006 Matt Welsh – Harvard University
20
File system consistency

Consistency check is typically done after a crash..



Redundant information in the filesystem is used:



UNIX: fsck
Windoze: scandisk
Check the blocks
Check the files
Two tables, each containing a counter initialized to 0


Blocks in use: How many times a block is present in a file
 Read all the i-nodes using a raw device (not through the filesystem calls)
 For each block that is referenced in the inode structure, increment the
corresponding block use counter by one.
Free blocks:
 Examine the free block list of free block bitmap structure
 Each appearance of a free block increments the counter by one
© 2006 Matt Welsh – Harvard University
21
File System Consistency
Figure 4-27. File system states. (a) Consistent. (b) Missing block.
(c) Duplicate block in free list. (d) Duplicate data block.
© 2006 Matt Welsh – Harvard
University
Tanenbaum,
Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
22
Missing block
•
•
Harmless but wastes space
Action: Add the missing block to the free list.
© 2006 Matt Welsh – Harvard
University
Tanenbaum,
Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
23
Duplicate block in free list
•
•
Can only occur in linked list representation. Bitmap
representation does not have this problem.
Action: Rebuild the free list.
© 2006 Matt Welsh – Harvard
University
Tanenbaum,
Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
24
Duplicate data block
•
•
The worst thing that can happen! A data block appears in two different
files..
Action:
•Allocate a free block
•Copy the contents into the new block.
•Change the links such that each copy appears once in each file.
•For sure, the contents of one of the files is garbled.
•The filesystem is made to be consistent.
•The use is informed.
© 2006 Matt Welsh – Harvard
University
Tanenbaum,
Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
25
Consistency check for directories

Uses a table of counters per file (rather than per block)

Starts from the root and traverses the tree




For each i-node, it increments the corresponding counter for that file
 Remember due to hard links, a file can appear more than once
It then checks the link counts stored in the i-nodes to these values.
If the link count > counter
 Even if the file is deleted by from all the directory entries, it will continue to exist.
 Solution: correct the link count
If the counter > link count
 Although the file is linked from, say, two directories, removal from one would
cause the i-node deleted leaving the other one invalid.
 Solution: correct the link count
© 2006 Matt Welsh – Harvard University
26
Caching


Reading a 32-bit word from memory takes 10 nsec.
Hard disks can transfer data at: 100MB/sec, that is 40nsec per 32bit words..

PLUS 5-10 msecs of seek time!

Caching aims to fill in the gap..

Often thousands of blocks are kept in cache.
© 2006 Matt Welsh – Harvard University
27
Caching (1)
Hash the device and the disk address and look up the result in
a hash table.
All the blocks with the same cache value are chained together
through a linked list.
In addition to this, a bidirectional link runs through all the
blocks implementing a LRU list.
© 2006 Matt Welsh – Harvard
University
Tanenbaum,
Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
28
Caching (2)
•
•
•
•
Some blocks, such as i-node blocks, are rarely referenced two times
within a short interval.
Consider a modified LRU scheme, taking two factors into account:
•Is the block likely to be needed again soon?
•Is the block essential to the consistency of the file system?
For both questions, blocks can be divided into categories
•i-node blokcs
•Indirect blocks
•Directory blocks
•Data blocks
• Full
• Partially full
Blocks that are essential for filesystem consistency should be written to disk
immediately – write-through-caches
•UNIX: synch (every 30 seconds)
•Windows: In the past: none, recent ones: FlushFileBuffers
© 2006 Matt Welsh – Harvard
University
Tanenbaum,
Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
29
Block read ahead

The system tries to get blocks into the cache, before they are
accessed to increase the hit rate.


Sequential access: performance improvement
Random access: performance degradation
© 2006 Matt Welsh – Harvard University
30
Reducing Disk Arm Motion
Figure 4-29. (a) I-nodes placed at the start of the disk.
(b) Disk divided into cylinder groups, each with its own blocks
and i-nodes.
© 2006 Matt Welsh – Harvard
University
Tanenbaum,
Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
31
The ISO 9660 File System
Figure 4-30. The ISO 9660 directory entry.
© 2006 Matt Welsh – Harvard
University
Tanenbaum,
Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
32
Rock Ridge Extensions
Rock Ridge extension fields:
•
•
•
•
•
•
•
•
PX - POSIX attributes.
PN - Major and minor device numbers.
SL - Symbolic link.
NM - Alternative name.
CL - Child location.
PL - Parent location.
RE - Relocation.
TF - Time stamps.
© 2006 Matt Welsh – Harvard
University
Tanenbaum,
Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
33
Joliet Extensions
Joliet extension fields:
•
•
•
•
Long file names.
Unicode character set.
Directory nesting deeper than eight levels.
Directory names with extensions
© 2006 Matt Welsh – Harvard
University
Tanenbaum,
Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
34
The MS-DOS File System (1)
Figure 4-31. The MS-DOS directory entry.
© 2006 Matt Welsh – Harvard
University
Tanenbaum,
Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
35
The MS-DOS File System (2)
Figure 4-32. Maximum partition size for different block sizes. The
empty boxes represent forbidden combinations.
© 2006 Matt Welsh – Harvard
University
Tanenbaum,
Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
36
The UNIX V7 File System (1)
Figure 4-33. A UNIX V7 directory entry.
© 2006 Matt Welsh – Harvard
University
Tanenbaum,
Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
37
The UNIX V7 File System (2)
Figure 4-34. A UNIX i-node.
© 2006 Matt Welsh – Harvard
University
Tanenbaum,
Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
38
The UNIX V7 File System (3)
Figure 4-35. The steps in looking up /usr/ast/mbox.
© 2006 Matt Welsh – Harvard
University
Tanenbaum,
Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
39