A Fast File System for Unix

Download Report

Transcript A Fast File System for Unix

A Fast File System for Unix
Marshall K. McKusick ,
William N. Joy,
Samuel J. Leffler ,
Robert S. Fabry
Computer Science Research Group,
University of California , Berkeley.
1984
Presented By:Aravind Subhash
Basic Unix File Abstraction and
System Calls
•
Unix abstraction
– Everything to appear as if it was sequential stream of bytes.
•
Set of functions that provides this abstraction :– Open : When a process calls open , try to locate the file the process identifies and
return a file descriptor.
– Close : release file descriptor from use and make file inaccessible until another
open call is made.
– Read:allow to read n bytes out of a file and into some space in a process’s
memory.Read call starts where the previous read left off, Seek Pointer
– Write : allow to write n bytes from a process’s memory in to a file.If write call
goes past end of file , file should be made larger to accommodate new data.
– Truncate : Need to be able to remove data from end of the file stream.
– Seek : Allow to move the seek pointer in a file , allowing to choose which part of
the byte stream we read and write to.
2
Old File System
• Developed at Bell Labs .
– Each disk drive is divided into one or more
partitions.
– Each disk partition can contain one file
system.
– File system never spans multiple partitions
3
File System Layout on Disk
•
Picture from Modern Operating Systems -Andrew S. Tanenbaum
4
Old File System
• File System is described by its
– Super Block
•
•
•
•
Number of data blocks in the file system
A Count of the maximum number of files
A Pointer to the free list
A Linked list of all free blocks in the file system
• Within the file system
– Files
• Descriptor associated with every file called - Inode
5
Old File System
• Inode
– Information describing file ownership
– Time stamps marking last modification / access times
– An Array of indices that point to the data blocks for the file.
6
Inode Data Structure
•
Given the I-Node it is possible
to find all the blocks of the
file.
•
In order to accomodate file
growth , last disk address of
inode is used to store address
of a block containing more
disk block addresses.
•
Advantage of Inode Scheme
over linked files using an inmemory table is that the
i-node need only be in
memory when the
corresponding file is open.
Picture from Modern Operating Systems 7
-Andrew S. Tanenbaum
Old File System
• An Inode may also contain
– References to indirect blocks containing further
data block indices.
• Singly Indirect Block
– contains 128 further block addresses
• Doubly Indirect Block
– contains 128 addresses of further singly indirect blocks
• Triply Indirect Block
– contains 128 addresses of further doubly indirect blocks
8
Inode Illustrated
•
Picture from Modern Operating Systems -Andrew S. Tanenbaum
9
Original Unix File System
• Original 512 byte UNIX file system was
incapable of providing the data throughput rates
– This was required by many applications
• VLSI design / Image processing
– small amount of processing on large quantities of data
• Programs that map files from the file system into large virtual
address spaces
• What was needed ?
– A file system providing higher bandwidth
10
Initial works at Berkeley
-on the Unix File System
• Aimed to improve reliability .How ?
– Factors to be considered
• Seek time
Overhead
• Rotational delay
• Transfer Rate
Throughput
– Need to increase Throughput , Minimise overhead . How?
11
picture source :http://www.csee.umbc.edu/~plusquel/611/slides/chap6_1.html
File System performance
• Improved by a factor of 2+
– Cause : Changing block size from 512 to
1024 Bytes
• Increase caused because of 2 factors :– Each disk transfer accessed twice as much data
– Most files could be described without need to
access indirect blocks as direct blocks contained
twice as much data.
» Fragmented free spaces at the end of each
file so new blocks can be inserted next to old
blocks.
» To aid in updates to files.
12
Great System but what is still
holding it back.
• Although throughput had doubled ,
– Still used only about 4 % of disk bandwidth.
• Although initially free list was ordered for optimal access
– Got quickly scrambled as files were created and removed.
• Over time free list became entirely random
– caused files to have their blocks allocated randomly over the
disk.
• This forced a seek before every block access.
• Transfer rates deteriorated because of randomisation of data block
placement.
13
New File System
– Each disk drive contains one or more file systems.
– Each file system is described by its super block.
– Each file system block has minimum size of 4k bytes
• to create files as large as 232 bytes with only 2 levels of
indirection
– The new file system divides a disk partition into one
or more areas called cylinder groups.
14
http://www.seas.ucla.edu/classes/mkampe/cs111.sq05/docs/bsd.html
New File System
•
•
Each cylinder group has
bookkeeping information
– redundant copy of super
block
– space for inodes
– bit map describing
available blocks in the
cylinder group
– summary info. of usage of
data blocks within
cylinder group.
Redundant information spirals
down into the pack so that
data on any single track ,
cylinder , or platter can be lost
without losing all copies of the
super block.
15
Fast File System Optimises Reads
• Data laid out such that larger blocks can be transferred in
a single disk transaction
– Greatly increasing file system throughput
• By increasing block size , disk accesses in the new file
system could transfer upto four times as much
information per disk transaction.
16
Optimising Storage Allocation
• As block size on the disk increases , waste rises quickly.
17
Optimising Storage Allocation
-New File System
• Divide single file system block into fragments.
» {2,4,8 addressable fragments }.
• Block map is associated with each cylinder group
– records the space available in a cylinder group at the fragment level.
•Each bit in the map records status of fragment.
X fragment in use
O fragment available
18
File System Parameterisation
• Next goal of the new file system
– To parameterize the processor capabilities and mass
storage characteristics
•  Blocks can be allocated in an optimum
configuration dependent way.
• Parameters used
– Speed of the processor
– Hardware support for mass storage transfers
– Characteristics of the mass storage devices.
19
Global Layout Policies
• Use file system wide summary information
• Responsible for deciding the placement of new directories and files.
• Calculate rotationally optimal block layouts.
– Decide when to force long seek to new cylinder group if there are
insufficient blocks left in the current cylinder group .
• Improve performance by clustering related information.
20
Global Layout Policies Call Local
Allocation Routines
• Local Allocation Routines
– Use locally optimal scheme to layout data blocks.
• Methods to improve file system performance
– Increase locality of reference
• This minimises seek latency
– Improve layout of data
• To make larger transfers possible
21
Layout Policies
• The global layout policy tries to place all the inodes of
files in a directory in the same cylinder group.
• Allocation of Inodes within a cylinder group is done using the next
free strategy.
• Spills when using datablocks are handled by redirecting
block allocation to a different cylinder group.
New File System
•All inodes within in a
particular cylinder group can
be read with 8-16 disk
transfers.
Old File System
•Requires one disk transfer
to fetch the inode for each
file in a directory
22
Four level allocation strategy used by
local allocator
Use next available block rotationally closest to the requested block
If there are no blocks available on the same cylinder , use a block
within the same cylinder group
If that cylinder group is entirely full , quadratically hash
the cylinder group number to choose another cylinder
group to look for a free block
If hash fails , apply exhaustive search to all
cylinder groups
Quadratic hash is used : Fast in finding unused slots in nearly full
hash tables .
23
Performance Results
• Inode layout policy is effective
– Large directory having many directories within.
– # disk accesses for inodes cut by a factor of 2.
– Large directory containing only files.
– # disk accesses for inodes cut by a factor of 8.
24
Throughput Analysis
The slower write rates (when compared to the read rates) in the 4096 types occur because the
kernel has to do twice as many disk allocations per second , making the processor unable to
keep up with the disk transfer rate.
25
Throughput Analysis Results
•
•
•
•
Percentage(%) of bandwidth is a measure of the effective utilisation of
the disk by the file system.
Both reads and writes are faster in the new system .
Biggest contributing factor in this speedup is because of the larger block
size used by the new file system.
(New File System) uses 47% of the disk bandwidth
26
File System Functional
Enhancements
• Long file names
– (New)File names can be of arbitrary length.
• File Locking
– New File System
• Provides file locking. [ 2 Schemes : Hard locks , Advisory locks ]
– Old File System
• No provision for locking files
• Main drawbacks
– Processes consumed CPU time by looping over attempts to create
locks
– Locks left lying around because of system crashes had to be
manually removed
– Process running as sys.admin. were forced to use a different
mechanism.
27
File System Functional
Enhancements
•
Symbolic links
– Is implemented as a file that contains a pathname.
– When system encounters a symbolic link while interpreting a component of a
pathname , the contents of the symbolic link is prepended to the rest of the
pathname,and this name is interpreted to yield the resulting pathname.
– Allows references across physical file systems and supports inter-machine linkage.
28
picture source : http://www.seas.ucla.edu/classes/mkampe/cs111.sq05/docs/bsd.html
File System Functional
Enhancements
• Rename
– (New)Create new version of temporary file and rename temporary file.
– (Old ) Required three calls to the system
• If programs were interrupted or the system crashed between these calls ,
target file could be left with only its temporary name.
• Quotas
– (Old)Any single user can allocate all available space in the file system
Shared User Systems this might not be acceptable
– (New)Quota mechanism sets limit on both number of inodes and the
number of disk blocks that a user may allocate.
29
Current Trends & Conclusion
• Original File System  Fast File System
[FreeBSD] [NetBSD] [OpenBSD] [NeXTStep] [Solaris] [Ext2]
[Linux Native]
30
Source :http://www.tldp.org/FAQ/Linux-FAQ/partitions.html
Extra Slides
31
Linux Kernel File System
•
•
•
Linux kernel contains a virtual
file system layer which is used
during system calls acting on files.
The VFS is an indirection layer
which handles the file oriented
system calls and calls the
necessary functions in the physical
filesystem code to do the I/O.
This indirection mechanism is
frequently used in the Unix like
operating systems to ease the
integration and the use of several
filesystem types.
Design and Implementation of the Second Extended
Filesystem ,Rémy Card, Theodore Ts'o, Stephen
Tweedie,
32
Linux Kernel File System
•
•
•
When a process issues a file
oriented system call , the kernel
calls a function contained in the
VFS.
This function(1) handles the
structure independent
manipulations and redirects the
call to a function(2) contained in
the physical filesystem code
This function(2) is responsible for
handling the structure dependent
operations.File system code uses
the buffer cache functions to
request I/O on devices.
Design and Implementation of the Second Extended
Filesystem ,Rémy Card, Theodore Ts'o, Stephen
Tweedie,
33