Transcript file

Operating Systems
CST 352
File Systems
4/2/2016
CST 352 - Operating Systems
1
Topics
Introduction
File System Considerations
Naming
Structure
Types
Access
Operations
4/2/2016
CST 352 - Operating Systems
2
Topics
Directories
Single Level
Hierarchical
Operations
Implementation
Layout
Allocation
Directories
Free Space Management
4/2/2016
CST 352 - Operating Systems
3
Introduction
All computer systems need to retrieve and
store information.
A machine must be capable of being
powered down without losing important
information.
Information must also remain viable outside
the reference of a creating/consuming
process.
4/2/2016
CST 352 - Operating Systems
4
Introduction
Persistence Requirements
Information must transcend process
boundaries.
Information must transcend power cycles
of a machine.
Multiple processes/threads must be able to
have simultaneous access to information.
Stored information must be capable of
growing to large quantities.
4/2/2016
CST 352 - Operating Systems
5
Introduction
To deal with the aforementioned
requirements, the most common
solution is to use a magnetic disk.
On the magnetic disk, information needs
to be organized in groups, known as
files.
4/2/2016
CST 352 - Operating Systems
6
Introduction
A File is an abstract way to represent
information stored on some persistent
media, such as a magnetic disk.
Using a file, information can be created
by a thread in a process and written to a
disk, then later read by a thread in a
separate process.
4/2/2016
CST 352 - Operating Systems
7
File System Considerations
File Naming
To deal with files on a magnetic disk, the
process using the file must have some way
to refer to the chunk of disk storage.
File names must conform to standards set by
the operating system.
A typical naming scheme deals with two
parts – a name and an extension.
The extension can be used by the OS to
associate files with application programs.
4/2/2016
CST 352 - Operating Systems
8
File System Considerations
File Naming – Possible components.
protocol (or scheme) — access method (e.g., http, ftp, file etc.)
host (or network-ID) — host name, IP address, domain name, or LAN
network name (e.g., wikipedia.org, 207.142.131.206,
\\MYCOMPUTER, SYS:, etc.)
device (or node) — port, socket, drive, root mountpoint, disc, volume
(e.g., C:, /, SYSLIB, etc.)
directory (or path) — directory tree (e.g., /usr/bin, \TEMP,
[USR.LIB.SRC], etc.)
file — base name of the file
type (format or extension) — indicates the content type of the file (e.g.,
.txt, .exe, .COM, etc.)
version — revision number of the file
4/2/2016
CST 352 - Operating Systems
9
File System Considerations
File Structure
Structure as simply a sequence of 8 bit values.
Most common and most flexible.
Using process must interpret the file.
Structure as a sequence of fixed length records.
Allows each record to be structured for block read.
Structure fields in the file using a BTree type
structure.
Allows fast access and indexing.
4/2/2016
CST 352 - Operating Systems
10
File System Considerations
File Types
Regular Files: These files are created by
users and contain information relative to
those user processes.
Directories: System files used to manage
groupings of files.
Character Special Files: Used by the
system for spooling of serial I/O devices.
Block Special Files: Used by the system to
model disk I/O.
4/2/2016
CST 352 - Operating Systems
11
File System Considerations
File Types
Regular Files are of two types:
ASCII– The file can be dumped directly
to the screen. All characters are printable.
Binary – The file contains a sequence of
binary information, not necessarily
ASCII.
In UNIX, binary files start with a “magic
number”. The OS uses this to determine if the
file is a true executable file.
4/2/2016
CST 352 - Operating Systems
12
File System Considerations
File Access
Sequential Access – The bytes in a file
must be accessed in a serial fashion.
There can be no random seeks through
a file.
Random Access – The bytes in a file
can be accesses in any order. Access is
done based on a “key-value” pair.
4/2/2016
CST 352 - Operating Systems
13
File System Considerations
File Organization
Sequential Access
Pile
Variable length records
Variable set of fields
Chronological order
4/2/2016
Each record in the file contains a burst of data.
Data is appended to the file as it shows up for write.
Read access must be done sequentially.
Search for a particular item must be an exhaustive
search.
CST 352 - Operating Systems
14
File System Considerations
File Organization
Sequential Access
Pile
Length
Record 1
Record 2
Record 3
Record 4
Record 5
Record 6
etc...
4/2/2016
CST 352 - Operating Systems
15
File System Considerations
File Organization
Sequential Access
Sequential File
Fixed-length Records.
Fixed set of fields in fixed order.
Sequential order based on a “key” field.
4/2/2016
CST 352 - Operating Systems
16
File System Considerations
File Organization
Sequential Access
Sequential File
4/2/2016
Key
Data
Cksum
Key
Data
Cksum
Key
Data
Cksum
Key
Data
Cksum
Key
Data
Cksum
Key
Data
Cksum
CST 352 - Operating Systems
Record 1
Record 2
Record 3
Record 4
Record 5
Record 6
etc...
17
File System Considerations
File Organization
Random Access
Indexed Sequential File
Characteristics are the same as those of a
sequential file.
A “key” based index of access pointers (file
pointers) is maintained to give random access
points into file records.
4/2/2016
CST 352 - Operating Systems
18
File System Considerations
File Organization
Random Access
Indexed
Sequential File
Index Tree
4/2/2016
Key
Data
Cksum
Key
Data
Cksum
Key
Data
Cksum
Key
Data
Cksum
Key
Data
Cksum
Key
Data
Cksum
CST 352 - Operating Systems
Record 1
Record 2
Record 3
Record 4
Record 5
Record 6
etc...
19
File System Considerations
File Organization
Random Access
Indexed File
A tree based index is created to give direct
access to file records.
Multiple tree indexes may be deployed to
search on different key types.
Records can be variable length.
4/2/2016
CST 352 - Operating Systems
20
File System Considerations
File Organization
Random Access
Indexed File
4/2/2016
Primary
Index
Tree
CST 352 - Operating Systems
Secondary
Index Tree
21
File System Considerations
File Organization
Random Access
Hashed Index
Disk records are of variable length.
A hash table is deployed to map a key to the
actual disk address.
4/2/2016
CST 352 - Operating Systems
22
File System Considerations
File Organization
Random Access
Hashed Index
4/2/2016
Hash
Table
File Nodes
CST 352 - Operating Systems
23
File System Considerations
File Attributes
In the header of a file, special attributes are stored
for management of the file:
Permission – What process can and cannot
access the file.
Password – File access control.
Creator – What process or user created the file.
Owner – Who is the current owner of the file.
RW Flag – Readable?
4/2/2016
CST 352 - Operating Systems
24
File System Considerations
File Attributes
Hidden Flag – Can the file be seen by
directory listings?
System Flag – Is this a system file?
Archive Flag – Has this file been
archived?
Binary Flag – Is this a binary file?
Etc.
4/2/2016
CST 352 - Operating Systems
25
File System Considerations
File Operations
Create – Write a file entry point in the file
system.
Delete – Free up any disk space associated
with the file to be deleted.
Open – Get the file attributes and address
copied from the disk into main memory.
Initialize the file pointer.
Close – Remove the file attribute cache from
main memory.
4/2/2016
CST 352 - Operating Systems
26
File System Considerations
File Operations (cont’d)
Read – Read a sequence of bytes from the
file from the current file pointer position.
Write – Write a sequence of bytes to the file
based on the current file pointer position.
Append – Add a sequence of bytes to the
end of a file.
Seek – Move the file pointer to a new
position in the file.
4/2/2016
CST 352 - Operating Systems
27
File System Considerations
File Operations (cont’d)
Get Attributes – Fetch only the
attributes of a file.
Set Attributes – Set the attributes of a
file.
Rename – Assign a new logical name
to the file.
4/2/2016
CST 352 - Operating Systems
28
File System Considerations
Pile
File System Generic
Layered
Architecture
Indexed
Seq.
Seq.
Indexed
Hashed
Logical I/O Interface
Basic I/O Subsystem
Basic File System - Low Level Access
Device Drivers - Magnetic Disk, DVD, CDROM, etc
Physical Device
4/2/2016
CST 352 - Operating Systems
29
File System Considerations
File System Generic Layered Architecture
Physical Device – The actual hardware.
Device Drivers – Perform operations on the hardware
(e.g. start, stop, read, write, etc.)
Basic File System – Block interface, buffering, read
commands, write commands.
Basic I/O Subsystem – File I/O initiation and
termination. Management of control structures.
Logical I/O Interface – Present file I/O to the file
system as records of data.
4/2/2016
CST 352 - Operating Systems
30
Directories
Directories provide an abstract method to
create a “file of files”.
Creating a directory allows the ability to
store multiple files in a file system.
4/2/2016
CST 352 - Operating Systems
31
Directories
Single Level
There is one “file of files” in the system.
The directory can only contain files, not
directories.
4/2/2016
CST 352 - Operating Systems
32
Directories
Hierarchical
Allow any directory to contain directories
as one of the entries.
A directory that can contain files and other
directories.
4/2/2016
CST 352 - Operating Systems
33
Directories
Operations
Find – find a file or directory in a directory.
Create File – create a new file entry in this
directory.
Delete File – delete a file from this directory.
List – list a directory or file or all directories
and file contained in this directory.
4/2/2016
CST 352 - Operating Systems
34
Implementation
Allocation Strategies
To manage files, the free space on the disk
must be tracked.
Every time a file is created and written to,
the file manager must write to a block
(group of sectors) on disk.
In addition to handling free and used disk
space, the file system must keep track of
what blocks go with which files.
4/2/2016
CST 352 - Operating Systems
35
Implementation
Allocation Strategies
Contiguous – Keep a file as a contiguous
sequence of disk blocks.
Advantages:
Simple implementation
 Must only keep track of the starting block and
the number of blocks used for the file.
Read performance is ideal
 A file is located in a continuous sequence of
blocks, requiring minimum seek.
4/2/2016
CST 352 - Operating Systems
36
Implementation
Allocation Strategies
Contiguous (cont’d)
disadvantages:
High fragmentation
 Initially, new files are added to the end of free space.
 As files are freed, space will open up.
 The file system will then use this free space, most
likely for a file smaller than the one freed up.
To create a file, the file size must be known in advance.
Files that grow larger that their original size must be
relocated to a new contiguous area of free blocks on the
disk.
4/2/2016
CST 352 - Operating Systems
37
Implementation
Allocation Strategies
Linked List – Keep a linked list of free
disk sectors.
The start of each free block has the address to
the next free block.
When a file is created, the next free block
will be added to the file descriptor.
The file system just needs to keep track of the
first block address.
4/2/2016
CST 352 - Operating Systems
38
Implementation
Allocation Strategies
Linked List (cont’d)
Advantages
There will be no disk fragmentation
Management is simple
Disadvantages
File reading is limited to sequential access
Random access is non existent
4/2/2016
CST 352 - Operating Systems
39
Implementation
Free Space Management
Bitmap
Keep a bitmap where each bit corresponds to
a block on disk.
1 – allocated
0 – free
4/2/2016
CST 352 - Operating Systems
40
Implementation
Popular File Systems
•
•
•
•
•
•
•
FAT (File Allocation Table – Created by Bill
Gates)
NTFS (New Technology File System – Microsoft)
XFS (X File System – Silicon Graphics)
HFS+ (Hierarchical File System – Apple)
EXT 2/3/4….(Linux, Android, others???)
ZFS (Free BSD) – popular??? I don’t know
UFS (Unix File System) – not so popular any more
4/2/2016
CST 352 - Operating Systems
41
Implementation
Popular File Systems
File Allocation Table (FAT) – Keep the
file pointers in a table in memory (an
array implementation of a linked list).
A Cluster is a Group of Sectors on the Hard Drive
that have information in them.
A 16K Cluster has 32 Sectors in it (512*32=16384).
Each Cluster has an entry in the FAT Table.
FAT 16 – Limited to 216 (16 bit) entries (clusters).
A File name maps to an entry in the FAT table.
4/2/2016
CST 352 - Operating Systems
42
Implementation
Popular File Systems
File Allocation Table (FAT) – Entry
Structure:
FAT Code Range
0000h
0002h-FFEFh
FFF0h-FFF6h
FFF7h
FFF8h-FFFFh
4/2/2016
Meaning
Available Cluster
Used, Next Cluster in File
Reserved Cluster
BAD Cluster
Used, Last Cluster in File
CST 352 - Operating Systems
43
Implementation
Popular File Systems
File Allocation Table (FAT) – Keep the
file pointers in a table in memory (an
array implementation of a linked list).
Advantages
File pointer access is fast because it is in
memory.
Entire block is available for memory.
File chain may be followed without accessing the
disk.
4/2/2016
CST 352 - Operating Systems
44
Implementation
Popular File Systems
File Allocation Table (FAT)
disadvantages
The entire FAT must be in memory.
 20 Gbyte disk with a 1 Kb block
requiring 20 million entries. Each entry
is 4 bytes (FAT 32). The table will
therefore be 60 Mbytes of memory.
4/2/2016
CST 352 - Operating Systems
45
Implementation
Popular File Systems
NTFS
Physical disk space is divided into
clusters (like FAT).
MFT -12% of the partition is set aside
for the Master File Table.
The first 16 MFT files are special
“housekeeping” files for NTFS (called
metafiles).
4/2/2016
CST 352 - Operating Systems
46
Implementation
Popular File Systems
NTFS
MFT - the common table of files.
The centralized directory of all
remaining disk files and itself.
MFT is divided into records of the fixed
size (usually 1 KBytes)
Each record corresponds to some file.
4/2/2016
CST 352 - Operating Systems
47
Implementation
Popular File Systems
NTFS - Partition Layout
4/2/2016
CST 352 - Operating Systems
48
Implementation
Popular File Systems
NTFS - MFT Entry Structure
4/2/2016
CST 352 - Operating Systems
49
Implementation
Popular File Systems
NTFS – Metafiles
$MFT
$MFTmirr
$LogFile
$Volume
$AttrDef
$.
$Bitmap
$Boot
$Quota
$Upcase
4/2/2016
Itself MFT
copy of the first 16 MFT records placed in the middle of the disk
journaling support file
housekeeping information - volume label, file system version, etc.
list of standard files attributes on the volume
root directory
volume free space bitmap
boot sector (bootable partition)
file where the users rights on disk space usage are recorded (began
to work only in NT5)
File - the table of accordance between capital and small
letters in files names on current volume. It is necessary because in
NTFS file names are stored in Unicode that makes 65 thousand
various characters and it is not easy to search for their large and
small equivalents.
CST 352 - Operating Systems
50
Implementation
Popular File Systems
NTFS
The $. Metafile points to the root directory file.
Directory files are divided into blocks
Each block contains: file name, base attributes, reference to
the element MFT which gives the complete information on
an element of the directory.
The inner structure of the directory is a binary tree.
4/2/2016
CST 352 - Operating Systems
51
Implementation
Popular File Systems
I-node – Each file has a data node created
for it that contains the address of every
block for the file (essentially a FAT for
each file, created in memory upon file
open).
Advantages – Typically takes up less memory
than the FAT.
Disadvantage – More overhead in file
creation.
4/2/2016
CST 352 - Operating Systems
52
Implementation
Popular File
Systems
XFS –B-Trees
of I-nodes.
4/2/2016
CST 352 - Operating Systems
53
Implementation
XFS
4/2/2016
CST 352 - Operating Systems
54
Implementation
Popular File
Systems
HFS+ - Volume
Layout
Volumes are
structured as BTrees.
4/2/2016
CST 352 - Operating Systems
55
Implementation
Popular File
Systems
HFS+ - B-Tree
based
Each node contains
file information.
4/2/2016
CST 352 - Operating Systems
56
Implementation
Popular File Systems (HFS+)
Each B-tree contains a single header node. The header
node is always the first node in the B-tree. It contains the
information needed to find other any other node in the tree.
Map nodes contain map records, which hold any
allocation data (a bitmap that describes the free nodes in the
B-tree) that overflows the map record in the header node.
Index nodes hold pointer records that determine the
structure of the B-tree.
Leaf nodes hold data records that contain the data
associated with a given key. The key for each data record
must be unique.
4/2/2016
CST 352 - Operating Systems
57