CENG334 Introduction to Operating Systems

Transcript CENG334 Introduction to Operating Systems

CENG334
Introduction to Operating Systems
Filesystems – Case studies
Topics:
FAT
UNIX V7
Erol Sahin
Dept of Computer Eng.
Middle East Technical University
Ankara, TURKEY
1
FAT (MS-DOS) Filesystems
MS-DOS, Windows 98, Windows ME
Still supported in Windows NT, XP, Vista
Its use has been shifted towards embedded devices such as



Digital cameras
MP3 playes
iPod (the default filesystem)
2
Directories
Filenames are limited to 8+3 characters.. Smaller ones are left justified and padded with
space.
Attributes: read-only, archive, hidden, system
Time represented with 2 bytes: correct upto +-2 second
Date: Counts in three fields: day (5 bit), month (4 bits), year (7 bits).

Contains: Y2108 problem
File size: 2 bytes. Theoretically 4GB limit, but the limit is 2GB due to other reasons.
10 bits reserved for future use.
3
File Allocation Table
MS-DOS keeps track of files through
FAT table hold in memory.
First block number (2 bytes) used as
the index to the FAT table which
has 64K entries.
Block size can be set as multiple of
512 byes.
Three versions of FAT depending on
the number of bits a disk address
contains:



FAT-12
FAT-16
FAT-32 (actually should be called as
FAT-28)
FAT is also used for keeping track of
free blocks.
4
FAT-12
Block size: 512 bytes
(2^12-10) X 512 bytes =~ 2MB

10 disk addresses are used as
special markers.
FAT table size: 4096 entries
with 2 bytes each.
Worked well for floppy disks.
The limit was extended using
larger block sizes: 1KB, 2KB,
4KB providing support for
partitions 16MB.
Limited for hard disks.
5
The MS-DOS File System (2)
Figure 4-32. Maximum partition size for different block sizes. The
empty boxes represent forbidden combinations.
Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
6
The UNIX V7 File System (1)
A UNIX V7 directory entry.
Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
7
The UNIX V7 File System (2)
A UNIX i-node.
Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
8
The UNIX V7 File System (3)
Figure 4-35. The steps in looking up /usr/ast/mbox.
Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
9
10
11
12
13
14
The ISO 9660 File System
The ISO 9660 directory entry.
Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
15
Rock Ridge Extensions
Rock Ridge extension fields:
•
•
•
•
•
•
•
•
PX - POSIX attributes.
PN - Major and minor device numbers.
SL - Symbolic link.
NM - Alternative name.
CL - Child location.
PL - Parent location.
RE - Relocation.
TF - Time stamps.
Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
16
Joliet Extensions
Joliet extension fields:
•
•
•
•
Long file names.
Unicode character set.
Directory nesting deeper than eight levels.
Directory names with extensions
Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
17
File System Mounting
A file system must be mounted before it can be accessed
A unmounted file system is mounted at a mount point
18
Virtual File Systems
VFS: another layer of
abstraction
Upper interface: for processes
implementing POSIX
interface
Lower interface: for concrete
file systems
VFS translates the POSIX calls
to the calls of the filesystems
under it.
19
VFS- 2
At boot time, the root filesystem is registered with VFS.
When other filesystems are mounted, they must also register with VFS.
When a filesystem registers, it provides the list of addresses of the functions that
the VFS demands, such as reading a block.
After registration, when one opens a file:
open(“/usr/include/unistd.h”, O_RDONLY)
VFS creates a v-node and makes a call to the concrete filesystem to return all the
information needed.
The created v-node also contains pointers to the table of functions for the concrete
filesystem that the file resides.
20
Virtual File Systems (2)
A simplified view of the data structures and code used by the VFS and concrete
file system to do a read.
21
CENG334
Introduction to Operating Systems
Filesystem Corruption and
Backups
Erol Sahin
Dept of Computer Eng.
Middle East Technical University
Ankara, TURKEY
22
Filesystem corruption
What happens when you are making changes to a filesystem and the
system crashes?



Example: Modifying block 5 of a large directory, adding lots of new file entries
System crashes while the block is being written
The new files are “lost!”
System runs fsck program on reboot


Scans through the entire filesystem and locates corrupted inodes and directories
Can typically find the bad directory, but may not be able to repair it!
 The directory could have been left in any state during the write
fsck can take a very long time on large filesystems

And, no guarantees that it fixes the problems anyway
23
Example:
Example: removing a file requires
 Remove the file from its directory
 Release the i-node to the pool of free i-nodes
 Return all the disk blocks to the pool of free disk blocks
In the absence of crashes the order these steps taken do not matter.
In the presence of crashes, however, it does!
24
Example:
Example: removing a file requires
 Remove the file from its directory
 Release the i-node to the pool of free i-nodes
 Return all the disk blocks to the pool of free disk blocks
The inodes and file blocks will not be accessible from any file yet they
will not be available for reassignment.
25
Example:
Example: removing a file requires
 Remove the file from its directory
 Release the i-node to the pool of free i-nodes
 Return all the disk blocks to the pool of free disk blocks
The directory node will point to an invalid inode or (if the inode is
reassigned) point to a different file.
The blocks of the file will not be available for reassignment.
26
Example:
Example: removing a file requires
 Remove the file from its directory
 Release the i-node to the pool of free i-nodes
 Return all the disk blocks to the pool of free disk blocks
The file will point to empty blocks, or (after reassignment) it will share
the blocks of other files to which these were reassigned..
27
File system consistency
Consistency check is typically done after a crash..


UNIX: fsck
Windoze: scandisk
Redundant information in the filesystem is used:


Check the blocks
Check the files
Two tables, each containing a counter initialized to 0


Blocks in use: How many times a block is present in a file
 Read all the i-nodes using a raw device (not through the filesystem calls)
 For each block that is referenced in the inode structure, increment the
corresponding block use counter by one.
Free blocks:
 Examine the free block list of free block bitmap structure
 Each appearance of a free block increments the counter by one
28
File System Consistency
File system states. (a) Consistent. (b) Missing block. (c) Duplicate
block in free list. (d) Duplicate data block.
Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
29
Missing block
•
•
Harmless but wastes space
Action: Add the missing block to the free list.
Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
30
Duplicate block in free list
•
•
Can only occur in linked list representation. Bitmap
representation does not have this problem.
Action: Rebuild the free list.
Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
31
Duplicate data block
•
•
The worst thing that can happen! A data block appears in two different
files..
Action:
•Allocate a free block
•Copy the contents into the new block.
•Change the links such that each copy appears once in each file.
•For sure, the contents of one of the files is garbled.
•The filesystem is made to be consistent.
•The user is informed.
Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
32
Consistency check for directories
Uses a table of counters per file (rather than per block)
Starts from the root and traverses the tree




For each i-node, it increments the corresponding counter for that file
 Remember due to hard links, a file can appear more than once
It then checks the link counts stored in the i-nodes to these values.
If the link count > counter
 Even if the file is deleted by from all the directory entries, it will continue to exist.
 Solution: correct the link count
If the counter > link count
 Although the file is linked from, say, two directories, removal from one would
cause the i-node deleted leaving the other one invalid.
 Solution: correct the link count
33
Journaling Filesystems
Ensure that changes to the filesystem are made atomically



That is, a group of changes are made all together, or not at all
In the directory modification example, this means that after the system reboots:
 The directory either looks exactly as it did before the block was modified
 Or the directory looks exactly as it did after the block was modified
Cannot leave an FS entity (data block, inode, directory, etc.) in an intermediate state!
Idea: Maintain a log of all changes to the filesystem


Log contains entries that indicate what was done
e.g., “Directory 2841 had inodes 404, 407, and 408 added to it”
To make a filesystem change:



1. Write an intent-to-commit record to the log
2. Write the appropriate changes to the log
 Do not modify the filesystem data directly!!!
3. Write a commit record to the log
This is essentially the same as the notion of database transactions
34
Journaling FS Recovery
What happens when the system crashes?


Filesystem data has not actually been modified, just the log!
So, the FS itself reflects only what happened before the crash
Periodically synchronize the log with the filesystem data


Called a checkpoint
Ensures that the FS data reflects all of the changes in the log
No need to scan the entire filesystem after a crash...

Only need to look at the log entries since the last checkpoint!
For each log entry, see if the commit record is there

If not, consider the changes incomplete, and don't try to make them
35
Journaling FS Example
File 1
File 2
Checkpoint
Log
36
Journaling FS Example
File 1
File 2
Checkpoint
Log
37
Journaling FS Example
File 1
File 2
Checkpoint
Log
Filesystem reflects changes up to last checkpoint
Fsck scans changelog from last checkpoint forward
Doesn't find a commit record ... changes are simply ignored
38
File System Backups (1)
Backups are generally made to handle one of two
potential problems:
•
•
Recover from disaster.
Recover from stupidity.
•Thrash bins
Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
39
Backup what?
•
In order to have storage efficiency, one can
choose not to backup:
•Executable files (since they can usually be restored
from manufacturer CD-ROMs
•Temporary files
• /tmp
•Special files (which correspond to I/O devices)
• /dev
Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
40
Back-up issues
Full backup


backup all the filesystem providing a snapshot of the system at that point which can be fully
restored
Typically done weekly or monthly
Incremental backup



Backup only the files that were changed after the most recent backup
Typically, a weekly backup is followed by daily incremental backups
Smaller backup size, and faster
Compressed backups


Reduced storage
Less secure: a single bad byte can screw the whole backup
Backing up and active filesystem is tricky

Since during backup, files and directories are being modified
Makes the system less secure


Each backup tape/disk needs to be as safe as the serve itself..
It doesn’t matter if your backup tapes are lying around, even if you have the most secure
computer system..
41
Dumping strategies – physical dump
Algorithm

Starts at block 0 of the disk and copies all the data onto a tape/disk
Pros


Simple to implement in a bug-free way
Fast
Cons


Backups the free blocks as well
 Backuping a free disk takes as much storage/time as a full disk
Dumping of “bad blocks” is a concern
 Typically disk controllers provide bad block replacement transparently without the
OS even knowing about it
 If a block goes bad after formatting, then the OS typically creates a “file”
consisting of all the bad blocks.
42
Dumping strategies – logical dump
Pros

Allows to restore a single file. If
directories that lie on the path
to the file-to-be-restored were
deleted, then they would also
be restored.
Cons

Slow and complicated.
43
Dumping strategies – logical dump
Algorithm


Full backup: Traverses the
filesystem as a tree and
creates the same filesystem
structure on the backup
disk/tape.
Partial backup: all the
directories on the path to the
particular file needs to be
saved. For instance, backing
up file 9 requires the saving of
directories, 1,5,6, and 7.
Squares are directories, circles are files.
Shaded items have been modified since
last dump. Each directory and file is
labeled by its i-node number.
44
Dumping strategies – incremental logical dump
Bitmaps, indexed by i-node
number are used.
Phase 1: Examine all files
and directories below the
starting directory (root in
this case), and


mark the modified files.
mark ALL THE DIRECTORIES
Squares are directories, circles are files.
Shaded items have been modified since
last dump. Each directory and file is
labeled by its i-node number.
45
Dumping strategies – incremental logical dump
Phase 2: Recursively walk
the tree again, and

UNMARK all the directories
that do not have any modified
files under them.
Note:


10 and 11 are unmarked
5 and 6 remain marked
Squares are directories, circles are files.
Shaded items have been modified since
last dump. Each directory and file is
labeled by its i-node number.
46
Dumping strategies – incremental logical dump
Phase 3: Scan the i-nodes
in numerical order and

Dump all the marked
directories
Squares are directories, circles are files.
Shaded items have been modified since
last dump. Each directory and file is
labeled by its i-node number.
47
Dumping strategies – incremental logical dump
Phase 4: dump the marked
files.
Squares are directories, circles are files.
Shaded items have been modified since
last dump. Each directory and file is
labeled by its i-node number.
48
Dumping strategies – incremental logical dump
Restoring:


Restore all the directories that
were backupped
Restore all the files
Squares are directories, circles are files.
Shaded items have been modified since
last dump. Each directory and file is
labeled by its i-node number.
49
Issues
Links:

If a file is linked to more than one directory, only one copy should be saved.
Holes:




In UNIX, some file, such as core files may contain holes.
These files, write a few bytes, and then seek to a distant file offset and write some
more bytes.
These empty blocks that are not written should not be dumped and stored.
Cores typically have may megabytes of empty blocks.
Special files

Such as named pipes (which can appear anywhere in the filesystem) should not be
dumped.
50
CENG334
Introduction to Operating Systems
Filesystem Caching
Topics:
Disks
Erol Sahin
Dept of Computer Eng.
Middle East Technical University
Ankara, TURKEY
51
File System Caching
Most filesystems cache significant amounts of disk in memory


e.g., Linux tries to use all “free” physical memory as a giant cache
Avoids huge overhead for going to disk for every I/O
Issues:


When do you commit a write to the disk?
 What happens if you write only to the memory cache and then the system crashes?
How do you keep the memory and disk views of a file consistent?
 What if the file metadata (inodes, etc.) is modified before the data blocks?
Read-ahead


Read a few extra blocks into memory when you do one read operation
 Amortize the cost of the seek
Useful if the blocks of a file are laid out in contiguous blocks
 Take advantage of sequential access patterns on the file
52
Caching
Reading a 32-bit word from memory takes 10 nsec.
Hard disks can transfer data at: 100MB/sec, that is 40nsec per 32-bit
words..

PLUS 5-10 msecs of seek time!
Caching aims to fill in the gap..
Often thousands of blocks are kept in cache.
53
Caching (1)
Hash the device and the disk address and look up the result in
a hash table.
All the blocks with the same cache value are chained together
through a linked list.
In addition to this, a bidirectional link runs through all the
blocks implementing a LRU list.
Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
54
Caching (2)
•
•
•
•
Some blocks, such as i-node blocks, are rarely referenced two times
within a short interval.
Consider a modified LRU scheme, taking two factors into account:
•Is the block likely to be needed again soon?
•Is the block essential to the consistency of the file system?
For both questions, blocks can be divided into categories
•i-node blocks
•Indirect blocks
•Directory blocks
•Data blocks
• Full
• Partially full
Blocks that are essential for filesystem consistency should be written to disk
immediately – write-through-caches
•UNIX: synch (every 30 seconds)
•Windows: In the past: none, recent ones: FlushFileBuffers
Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
55
Block read ahead
The system tries to get blocks into the cache, before they are
accessed to increase the hit rate.


Sequential access: performance improvement
Random access: performance degradation
56
Reducing Disk Arm Motion
Figure 4-29. (a) I-nodes placed at the start of the disk.
(b) Disk divided into cylinder groups, each with its own blocks
and i-nodes.
Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
57