Lecture 12 EXT4

Embedded Software Lab
Daejun Park, Eunsoo Park
Overview FS
As covered in Chapter 12, VFS gives users an abstract view of the file system. This lecture therefore covers specific file systems connected to VFS:
• EXT2, EXT3
• EXT4
[Figure: specific FS implementations beneath VFS: EXT2/3, EXT4, NTFS, F2FS, ...]
Ext2 Disk Data Structure
• The block bitmap and the inode bitmap must each be stored in a single block
• These parts are duplicated in each block group
We will cover each component of a block group:
• Super block, group descriptor, bitmaps, inode table
Super Block
2 sectors (1024 bytes) that describe the file system:
• Volume label
• Block size
• # blocks per group
• # reserved blocks before the 1st block group
• Block group number of this superblock
• Count of free inodes & blocks (total over all groups)
The 1st superblock starts 1024 bytes past the beginning of the file system
• The first two sectors are used to store boot code
Super Block(2)
<ext2_super_block>
Type        Field                 Description
__le32      s_inodes_count        # of inodes in filesystem
__le32      s_blocks_count        # of blocks in filesystem
__le32      s_free_blocks_count   Free blocks counter
__le32      s_free_inodes_count   Free inodes counter
__le32      s_log_block_size      Block size (0: 1024 bytes, 1: 2048 bytes, …)
__le32      s_blocks_per_group    # of blocks per group
__le32      s_inodes_per_group    # of inodes per group
__le16      s_state               Status flag (mounted, unmounted, error)
__le16      s_block_group_nr      Block group number of this superblock
char [64]   s_last_mounted        Pathname of last mount point
…           …                     …
Additional fields are for ext3 compatibility (journaling) and for e2fsck.
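The s_log_block_size encoding and the fixed 1024-byte superblock offset can be sketched in a few lines of C. The helper names below are hypothetical and exist only for illustration; they are not functions from the kernel sources.

```c
#include <stdint.h>

/* Offset of the first superblock from the start of the file system. */
#define SUPERBLOCK_OFFSET 1024u

/* Decode s_log_block_size: 0 -> 1024 bytes, 1 -> 2048 bytes, 2 -> 4096 bytes. */
static inline uint32_t block_size_from_log(uint32_t s_log_block_size)
{
    return 1024u << s_log_block_size;
}

/* Index of the block that contains byte 1024, i.e. the first superblock. */
static inline uint32_t superblock_block(uint32_t block_size)
{
    return SUPERBLOCK_OFFSET / block_size;
}
```

With 1024-byte blocks the superblock sits in block 1 (block 0 holds the boot code); with larger blocks it sits inside block 0.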
Group Descriptor, Bitmap
<ext2_group_desc>
Type         Field                  Description
__le32       bg_block_bitmap        Block number of block bitmap
__le32       bg_inode_bitmap        Block number of inode bitmap
__le32       bg_inode_table         Block number of first inode table block
__le16       bg_free_blocks_count   Number of free blocks in the group
__le16       bg_free_inodes_count   Number of free inodes in the group
__le16       bg_used_dirs_count     Number of directories in the group
__le16       bg_pad                 Alignment to word
__le32 [3]   bg_reserved            Nulls to pad out 24 bytes
Can we decide the number of blocks in a partition from the size of one block?
Ex) ext2 with one block = 4KB:
• A one-block bitmap holds 4KB × 8 = 32K bits, so it can track as many as 32K blocks
• 32K × 4KB = 128MB → maximum capacity of one group
In conclusion, the block size determines how many blocks can be allocated in each group, and therefore in the partition.
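The bitmap arithmetic above can be checked with a small sketch. The function names are hypothetical; the only assumption is the one the slide states, namely that a group's block bitmap occupies exactly one block.

```c
#include <stdint.h>

/* One bitmap block has block_size * 8 bits, one bit per block in the group. */
static inline uint64_t max_blocks_per_group(uint32_t block_size)
{
    return (uint64_t)block_size * 8;
}

/* Maximum bytes one block group can cover. */
static inline uint64_t max_group_bytes(uint32_t block_size)
{
    return max_blocks_per_group(block_size) * block_size;
}
```

For 4KB blocks this gives 32768 blocks and 128MB per group, matching the example.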
Inode Table
• Multiple consecutive blocks, each of which contains a predefined number of inodes.
Inode
• All inodes have the same size: 128 bytes
• Each inode corresponds to one file and stores the file's primary metadata, such as the file's size, ownership, and temporal information.
• An inode is allocated to each file and directory
• A directory holds file/directory names and pointers to inodes in the table
• An inode points to the file's content blocks
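Because every inode has the same size and each group stores a fixed number of them, an inode number can be turned into a position in the inode table with plain arithmetic. The following is a minimal sketch; the struct and function names are hypothetical, and it assumes inode numbers start at 1, as they do in Ext2.

```c
#include <stdint.h>

struct inode_location {
    uint32_t group;        /* block group that holds the inode */
    uint32_t index;        /* index of the inode inside that group's table */
    uint32_t table_block;  /* block offset from the start of the inode table */
    uint32_t byte_offset;  /* byte offset of the inode inside that block */
};

static inline struct inode_location
locate_inode(uint32_t ino, uint32_t inodes_per_group,
             uint32_t inode_size, uint32_t block_size)
{
    struct inode_location loc;
    uint32_t index = (ino - 1) % inodes_per_group;  /* inode numbers start at 1 */

    loc.group = (ino - 1) / inodes_per_group;
    loc.index = index;
    loc.table_block = (index * inode_size) / block_size;
    loc.byte_offset = (index * inode_size) % block_size;
    return loc;
}
```

The absolute block is then bg_inode_table of that group plus table_block.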
Inode
<ext2_inode>
Type                     Field           Description
__le16                   i_mode          File type and access rights
__le16                   i_uid           Owner identifier
__le32                   i_size          File length in bytes
__le16                   i_links_count   Hard links counter
__le32                   i_blocks        Number of data blocks of the file (in 512-byte units)
__le32 [EXT2_N_BLOCKS]   i_block         Pointers to data blocks
__le32                   i_file_acl      File access control list
__le32                   i_dir_acl       Directory access control list
union                    osd1, osd2      OS info
Inode(2)
• Access Control Lists (ACL)
- A file protection mechanism in Unix filesystems
- An ACL can be associated with each file
- A user may specify, for each of their files, the names of specific users and the privileges to be given to these users
- Linux 2.6 fully supports ACLs by making use of inode extended attributes (extended attributes were introduced mainly to support ACLs)
Inode(3)
File_type   Description        Explanation
0           Unknown
1           Regular file       Needs data blocks only when it starts to have data (a newly created file is empty and has none)
2           Directory          Data blocks store filenames together with the corresponding inode numbers. Such data blocks contain structures of type ext2_dir_entry_2 (EXT2_NAME_LEN: 255)
3           Character device   No data block, just the inode
4           Block device       No data block, just the inode
5           Named pipe         No data block, just the inode
6           Socket             No data block, just the inode
7           Symbolic link      Pathname of up to 60 characters → stored in the inode; pathname longer than 60 characters → one data block
Inode(4)
<ext2_dir_entry_2>
Type                   Field       Description
__le32                 inode       Inode number
__le16                 rec_len     Directory entry length (offset to the next entry)
__u8                   name_len    Filename length (real)
__u8                   file_type   File type
char [EXT2_NAME_LEN]   name        Filename (the entry length is padded to a multiple of 4, for efficiency)
[Figure: example directory block with byte offsets; a deleted entry is skipped by enlarging the previous entry's rec_len (12+16)]
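The "multiple of 4" rule can be made concrete with a small sketch. It assumes the fixed part of ext2_dir_entry_2 is 8 bytes (4 for inode, 2 for rec_len, 1 for name_len, 1 for file_type), which follows from the field types in the table; the function name is hypothetical.

```c
#include <stdint.h>

#define DIR_ENTRY_HEADER 8u  /* inode (4) + rec_len (2) + name_len (1) + file_type (1) */

/* Smallest entry length that holds the name, rounded up to a multiple of 4. */
static inline uint32_t dir_rec_len(uint32_t name_len)
{
    return (DIR_ENTRY_HEADER + name_len + 3u) & ~3u;
}
```

An entry named "." therefore needs 12 bytes, and a 5-character name needs 16, which is where a value such as 12+16 in a merged rec_len can come from.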
Inode(5)
Inode & Directory
• Maps a file name to the related inode
• A directory is itself a file (supporting the file hierarchy)
[Figure: inode table and disk blocks used to resolve /home/alphabet.txt. Directory blocks list names (usr, home, dev, etc; reports.doc, hello.c, alphabet.txt) with their inode numbers; the inode of alphabet.txt (status: file, size: 26) points to the data block holding "abcdefghi…".]
Inode(6)
[Figure: disk layout with boot block, super block, inode table, and data blocks 20 to 43. The root directory inode points to block 20, whose directory entries (., .., File1.c, mydir, myfile, mydir2) map names to inode numbers. File1.c uses blocks 21 to 23, mydir uses block 24 (entries a.hwp, b.c, Test.c, Note.doc), and myfile uses direct blocks plus an indirect block (28).]
Memory Data Structures
• For performance, most of the information stored in the disk data structures of an Ext2 partition is copied into RAM when the file system is mounted
• The kernel uses the page cache to keep disk data structures up to date
In dynamic mode, the data is kept in a cache as long as the associated object is in use; when the file is closed or the data block is deleted, the data may be removed from the cache.
Memory Data Structures(2)
[Figure: superblock object on disk and in memory, linking the VFS s_fs_info field and the buffer head]
Memory Data Structures(3)
ext2_fill_super()
• Allocates the buffers for all the objects and reads them in or points to them
• The s_debts field maintains the balance between regular files and directories: it increases as the number of directories grows and decreases otherwise
[Figure: memory data structures after ext2_fill_super() completes]
Creating EXT Filesystem
mke2fs (the utility for making an Ext2 filesystem)
1. Initializes the superblock and the group descriptors.
2. For each block group, reserves all the disk blocks needed to store the superblock, the group descriptors, the inode table, and the two bitmaps.
3. Initializes the inode bitmap and the data block bitmap of each block group to 0.
4. Initializes the inode table of each block group.
5. Creates the / root directory.
6. Creates the lost+found directory, which is used by e2fsck to link the lost and defective blocks it finds.
7. Updates the inode bitmap and the data block bitmap of the block group in which the two previous directories have been created.
8. Groups the defective blocks (if any) in the lost+found directory.
Memory Data Structures(4)
• Inode Object
For each component of the pathname that is not already in the dentry cache, a new dentry object and a new inode object are created.
• When the VFS accesses an Ext2 disk inode, it creates a corresponding inode descriptor of type ext2_inode_info
• The inode descriptor includes:
- The whole VFS inode object
- Most of the fields found in the disk's inode structure that are not kept in the VFS inode
- The i_next_alloc_block and i_next_alloc_goal fields, which store the logical block number and physical block number of the disk block most recently allocated to the file
- The i_acl and i_default_acl fields, which point to the ACLs of the file
Methods
Ext2 Super Block Operations <fs/ext2/super.c>
• ext2_sops points to the EXT2-specific operations (..., alloc_inode, read_inode, write_inode, ...)
Ext2 inode Operations
• Include directory operations specific to EXT2
• Include regular file operations specific to EXT2
• If a method is NULL, the VFS generic method is called, or nothing is done.
Methods(2)
[Tables: EXT2 file operations and EXT2 inode operations]
Managing Disk Space
A FS tries to keep the blocks of a file contiguous. However, blocks can become scattered, and file holes make volumes bigger.
We will cover the inode and data block operations with two goals in mind:
• Avoid file fragmentation
• Volume management must work as fast as possible
Managing Disk Space – Creating Inode
• Creating inodes
[Figure: inode creation flow; find_group_orlov() and find_group_other() select the block group]
Managing Disk Space – Deleting Inode
• Deleting inodes
[Figure: inode deletion flow; clear_inode()]
Managing Disk Space – Data Blocks Addressing
• Data Blocks Addressing
Blocks may be referred to either by their relative position inside the file (their file block number) or by their position inside the disk partition (LBN, logical block number).
Accessing the byte at offset f in a file takes two steps:
• Derive the file block number from the offset f
• Translate the file block number to the LBN
The translation is the hard part, because the blocks of a file are not necessarily adjacent on disk.
EXT2 provides a method to store the connection between each file block number and the LBN on disk.
We will look at the i_block field in detail.
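The two steps can be sketched as follows. The first function derives the file block number from the offset f; the second only reports which level of the i_block structure would hold the mapping, without reading any disk block. All names are hypothetical, and the sketch assumes 12 direct pointers and 4-byte LBNs, the values the slides use for their size calculations.

```c
#include <stdint.h>

enum addr_level {
    DIRECT, SINGLE_INDIRECT, DOUBLE_INDIRECT, TRIPLE_INDIRECT, OUT_OF_RANGE
};

#define N_DIRECT 12u  /* assumed number of direct pointers in i_block */

/* Step 1: file block number that contains byte offset f. */
static inline uint64_t file_block_number(uint64_t f, uint32_t block_size)
{
    return f / block_size;
}

/* Step 2 (first half): which level of indirection maps this file block. */
static inline enum addr_level level_for_block(uint64_t fbn, uint32_t block_size)
{
    uint64_t ptrs = block_size / 4;  /* 4-byte LBNs per block */

    if (fbn < N_DIRECT)
        return DIRECT;
    fbn -= N_DIRECT;
    if (fbn < ptrs)
        return SINGLE_INDIRECT;
    fbn -= ptrs;
    if (fbn < ptrs * ptrs)
        return DOUBLE_INDIRECT;
    fbn -= ptrs * ptrs;
    if (fbn < ptrs * ptrs * ptrs)
        return TRIPLE_INDIRECT;
    return OUT_OF_RANGE;
}
```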
Managing Disk Space – Data Blocks Addressing(2)
• The i_block field in the disk inode is an array of EXT2_N_BLOCKS components that contain logical block numbers.
We can calculate the upper bound on file size for each level of indirection (4KB blocks, 4-byte block numbers). Each total includes the levels before it:
direct:            12 × 4KB = 48KB
single indirect:   (4KB/4B) × 4KB = 1024 × 4KB = 4MB, total 4MB + 48KB
double indirect:   (4KB/4B) × (4KB/4B) × 4KB = 4GB, total 4GB + 4MB + 48KB
A 4KB block holds pointers to 1024 LBNs.
[Figure: i_block layout with direct, indirect, double indirect, and triple indirect pointers; file block numbers 0, 1, 2, 3, 4, ... correspond to byte offsets 0, 4096, 4096 × 2, 4096 × 3, 4096 × 4, ...]
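The size bounds can be reproduced with a short loop. This is only a sketch of the arithmetic, with a hypothetical function name; it assumes 12 direct pointers and 4-byte block numbers as in the example.

```c
#include <stdint.h>

/* Upper bound on file size when `levels` levels of indirection are used
 * (0 = direct only, 1 = up to single indirect, 2 = up to double indirect). */
static inline uint64_t max_file_bytes(unsigned levels, uint32_t block_size)
{
    uint64_t ptrs = block_size / 4;  /* LBNs per indirect block */
    uint64_t blocks = 12;            /* direct blocks */
    uint64_t reach = 1;

    for (unsigned i = 0; i < levels; i++) {
        reach *= ptrs;               /* blocks reachable through this level */
        blocks += reach;
    }
    return blocks * block_size;
}
```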
File Hole
• A file hole is a portion of a regular file that contains "\0" characters and is not stored in any data block on disk
• File holes were introduced to avoid wasting disk space.
• A block is assigned to a file only when the process needs to write data into it
Condition: i_size > 512 × i_blocks → the file has a hole!
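The hole condition translates directly into code. The function name is hypothetical, and the sketch assumes i_blocks is counted in 512-byte units, as the condition itself implies.

```c
#include <stdint.h>

/* Nonzero when the file is longer than the space allocated to it. */
static inline int has_file_hole(uint64_t i_size, uint64_t i_blocks)
{
    return i_size > 512u * i_blocks;
}
```

For example, a 1MB file with only one 4KB block allocated (i_blocks = 8) has a hole, while a 4KB file with the same allocation does not.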
Allocating a data block
• ext2_get_block() searches for a free block
• File fragmentation should be reduced:
– Try to keep the metadata and data blocks close together
– Try to keep the files under the same directory together
Releasing a Data block
EXT3 Overview
• Inter-compatible
– Ext2 can be converted to Ext3
– Ext3 can be read by Ext2
• Ext3 adds journaling for consistency
– The journal is a small, circular area written before writing to the disk
– After a crash, read the journal to ensure all write operations were completed
• Redo any that were not completed
[Figure: ext3 = ext2 + journaling]
EXT3 Filesystem
• Designed with two simple concepts in mind:
– To be a journaling filesystem
– To be compatible with the old Ext2 filesystem
• Journaling Filesystems
– Updates to filesystem blocks might be kept in dynamic memory for a long period of time before being flushed to disk
– A dramatic event such as a power-down failure or a system crash might thus leave the filesystem in an inconsistent state
– To overcome this problem, each traditional Unix filesystem is checked before being mounted → this takes too long
– The goal is to avoid running time-consuming consistency checks on the whole filesystem
– Instead, look in a special disk area, named the journal, that contains the most recent disk write operations
EXT3 Journaling
• The idea behind Ext3 journaling
– First, a copy of the blocks to be written is stored in the journal
– When the I/O data transfer to the journal is completed (in short, data is committed to the journal), the blocks are written in the filesystem
• When a system failure occurs before a commit to the journal
– Either the copies of the blocks relative to the high-level change are missing from the journal or they are incomplete
– e2fsck ignores them
• When a system failure occurs after a commit to the journal
– The copies of the blocks are valid, and e2fsck writes them into the filesystem
EXT3 Journaling(3)
• The first block in the journal is the journal superblock, which contains the address of the first logging data and its sequence number.
• Updates are done in transactions, and each transaction has a sequence number.
• Each transaction starts with a descriptor block that contains the transaction sequence number and a list of the blocks being updated.
• Following the descriptor block are the updated blocks.
• When the updates have been written to disk, a commit block is written with the same sequence number.
[Figure: a transaction in the journal; checkpoint = write to the disk]
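The descriptor/commit pairing is what lets recovery tell complete transactions from incomplete ones. Below is a minimal sketch over a simplified in-memory model of the journal; the types and the function are hypothetical and do not reflect the real JBD on-disk format. A transaction counts as replayable only if its descriptor block is followed, after its data blocks, by a commit block with the same sequence number.

```c
#include <stddef.h>
#include <stdint.h>

enum jblock_type { J_DESCRIPTOR, J_DATA, J_COMMIT };

struct jblock {
    enum jblock_type type;
    uint32_t sequence;  /* meaningful for descriptor and commit blocks */
};

/* Count the transactions, starting at first_seq, that may be replayed. */
static inline size_t count_replayable(const struct jblock *log, size_t n,
                                      uint32_t first_seq)
{
    size_t replayable = 0;
    uint32_t expect = first_seq;
    size_t i = 0;

    while (i < n) {
        if (log[i].type != J_DESCRIPTOR || log[i].sequence != expect)
            break;                    /* not the transaction we expect */
        i++;
        while (i < n && log[i].type == J_DATA)
            i++;                      /* skip the copies of updated blocks */
        if (i == n || log[i].type != J_COMMIT || log[i].sequence != expect)
            break;                    /* no matching commit: ignore it */
        i++;
        replayable++;
        expect++;
    }
    return replayable;
}
```

A log that ends in the middle of a transaction yields only the transactions committed before it, which matches the rule that incomplete transactions are ignored.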
Ext3 Journaling(2)
Before committing, file manipulations are gathered into a unit called a "transaction".
[Figure: manipulations A, B, and C modify blocks X, Y, Z, and W; the journal section holds a descriptor block, the copies of the updated blocks, and a commit block for the transaction]
EXT3 Journaling modes
There are three journaling modes
Mode        Role                                                                   Pros & Cons
Journal     All filesystem data and metadata changes are logged into the journal   Safest and slowest
Ordered     Only changes to filesystem metadata are logged into the journal        Default Ext3 journaling mode
Writeback   Only changes to filesystem metadata are logged into the journal        Fastest mode but not safe; this is the method found on the other journaling filesystems
Ext3 – Journal Structure
The journal section is located either in the filesystem or in another partition.
• In the filesystem, inode number 8 points to the journal section
• It has no directory entry, so users cannot see it → .journal
[Figure: the journal as a circular buffer: journal super block (s_start) followed by transactions, each made of a descriptor block, blocks with headers, and a commit block]
EXT3 JBD (Journaling Block Device) Layer
[Figure: a file operation produces log records over blocks, grouped into a transaction]
JBD must also protect itself from system failures that could corrupt the journal. It does so via three fundamental units:
• Log Record
- Describes a single update of a disk block of the journaling filesystem
• Atomic Operation Handle
- Includes the log records relative to a single high-level change of the filesystem
- Typically, each system call modifying the filesystem gives rise to a single atomic operation handle
- To start an atomic operation, the Ext3 filesystem invokes the journal_start() JBD function, which allocates, if necessary, a new atomic operation handle and inserts it into the current transaction
• Transaction
- Includes several atomic operation handles whose log records are marked valid for e2fsck at the same time.
EXT3 JBD (Journaling Block Device) Layer(2)
How a transaction works
• Complete: all log records included in the transaction are written in the journal (e2fsck works well)
→ t_state = T_FINISHED
• Incomplete: e2fsck ignores an incomplete transaction
→ t_state can be set to one of these flags: T_RUNNING, T_LOCKED, T_FLUSH, T_COMMIT
How Journaling Works(2)
[Figure: write path in <Ordered Mode>: preparation (journal_get_write_access() registers the target buffer head with JBD), start, commit complete, and checkpoint, driven by kjournald2, until the write operation is done]
EXT4-Overview
• EXT4: October 2008, stable code in Linux 2.6.28
– Preliminary development version in Linux 2.6.19
– Easy upgrade from ext3
– Utilizes the previous work and focuses on adding advanced features
– A new scalable, enterprise-ready file system in a short time
• Maintainers
– Theodore Ts'o
– Andreas Dilger
EXT4-Usage
2010/1/15: Google announced that it would upgrade its storage infrastructure from ext2 to ext4.
2010/12/14: Google announced that it would use ext4 instead of YAFFS on Android 2.3.
EXT4-features
• Bigger file/filesystem size support
– Compared to ext3, ext4 supports files 8 times larger
– and filesystems 65536 times larger
• I/O performance improvement
– Delayed allocation, multi-block allocator, extent map, and persistent preallocation
– Fast fsck: flex_bg and uninit_bg
– Reliability: journal checksumming
– Maintenance: online defragmentation
– Misc: backward compatibility with ext2/ext3, nanosecond timestamps, subdirectory scalability, etc.
EXT3 vs. EXT4
Scalability Enhancements
• ext3: 16TB file system size limit
– Caused by the 32-bit block number
– 4KB (block size) × 2^32 (blocks_count: unsigned int) = 16TB
• ext4: 1EB
– 48-bit block numbers
– 4KB × 2^48 = 2^(12+48) B = 1EB (about 1000^6 bytes, i.e. a million TB)
– Metadata in the superblock, the group descriptors, and the journal:
• New fields added for the most significant 32 bits of the block-counter variables s_free_blocks_count, s_blocks_count, and s_r_blocks_count
– JBD → JBD2 (supports 48-bit block addresses)
Why not 64-bit support?
• 1EB is enough for the current situation
• A full e2fsck of a 1EB file system would take 119 years, so reliability becomes the issue
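The two size limits follow from the same formula, block size times the number of addressable blocks. A small sketch with a hypothetical function name; it is only valid while the result fits in 64 bits, which holds for both cases on the slide.

```c
#include <stdint.h>

/* Largest file system, in bytes, addressable with the given block-number width. */
static inline uint64_t max_fs_bytes(unsigned block_number_bits, uint32_t block_size)
{
    return (uint64_t)block_size << block_number_bits;
}
```

With 4KB blocks, 32 bits give 2^44 bytes (16TB) and 48 bits give 2^60 bytes (1EB), a factor of 65536.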
Scalability Enhancements(2)
• Extent: represents a range of contiguous physical blocks
• Efficient for representing large files
• Better CPU utilization, fewer metadata I/Os
• One extent: up to 2^15 contiguous blocks (128MB with 1 block = 4KB)
• 4 extents fit in the ext4 inode structure, after the extent header
< ext4_inode i_block[EXT4_N_BLOCKS] >: 15 × 4-byte array = 60 bytes
header (12 bytes) | extent0 (12 bytes) | extent1 (12 bytes) | extent2 (12 bytes) | extent3 (12 bytes)
Scalability Enhancements(3)
/* This is the extent on-disk structure. It's used at the bottom of the tree. */
struct ext4_extent {
    __le32 ee_block;       /* first logical block extent covers */
    __le16 ee_len;         /* number of blocks covered by extent */
    __le16 ee_start_hi;    /* high 16 bits of physical block */
    __le32 ee_start_lo;    /* low 32 bits of physical block */
};
/* This is index on-disk structure. It's used at all the levels except the bottom. */
struct ext4_extent_idx {
    __le32 ei_block;       /* index covers logical blocks from 'block' */
    __le32 ei_leaf_lo;     /* pointer to the physical block of the next
                            * level. leaf or next index could be there */
    __le16 ei_leaf_hi;     /* high 16 bits of physical block */
    __u16  ei_unused;
};
struct ext4_extent_header {
    __le16 eh_magic;       /* probably will support different formats */
    __le16 eh_entries;     /* number of valid entries */
    __le16 eh_max;         /* capacity of store in entries */
    __le16 eh_depth;       /* has tree real underlying blocks? */
    __le32 eh_generation;  /* generation of the tree */
};
• ee_len: the MSB of this 16-bit field is used as a flag, which is why the # of contiguous blocks per extent is 2^15
• eh_magic: identifies the mapping format (block mapped or extent), for robustness
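How the split fields of ext4_extent combine can be sketched as below. The struct uses host-order integers instead of the on-disk little-endian types, and the helper names are hypothetical. The length encoding, where a value above 2^15 marks an uninitialized extent, reflects my understanding of the Linux convention and should be treated as an assumption.

```c
#include <stdint.h>

#define EXT_INIT_MAX_LEN (1u << 15)  /* 2^15 blocks per initialized extent */

/* Host-order copy of the ext4_extent fields (assumed already byte-swapped). */
struct extent_fields {
    uint32_t ee_block;
    uint16_t ee_len;
    uint16_t ee_start_hi;
    uint32_t ee_start_lo;
};

/* 48-bit physical start block: high 16 bits followed by low 32 bits. */
static inline uint64_t extent_start(const struct extent_fields *e)
{
    return ((uint64_t)e->ee_start_hi << 32) | e->ee_start_lo;
}

/* Assumed convention: lengths above 2^15 mark an uninitialized extent. */
static inline int extent_is_uninitialized(const struct extent_fields *e)
{
    return e->ee_len > EXT_INIT_MAX_LEN;
}

/* Number of blocks covered, with the uninitialized marker removed. */
static inline uint32_t extent_length(const struct extent_fields *e)
{
    return e->ee_len <= EXT_INIT_MAX_LEN ? e->ee_len
                                         : e->ee_len - EXT_INIT_MAX_LEN;
}
```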
Scalability Enhancements(4)
Scalability Enhancements(5)
• Large files
– Ext3 file size is limited by the i_blocks counter value in Linux
• Block size 4KB: max file size 4TB = (4KB/4B)^3 × 4KB → file system level limit
• i_blocks counts sectors (512B): 2^32 × 512B = 2TB → Linux limitation
– ext4: HUGE_FILE feature added
• 32-bit logical block numbers with extents: 2^32 × 4KB = 16TB
• Large number of files
– Ext3 allocates inodes statically, so the number of inodes is fixed → this limits the # of files
– Dynamic inode tables: a cluster of contiguous inode table blocks (ITBC) can be allocated on demand
– 15-bit relative block number: 2^15 = 4K × 8 bits (block bitmap)
– 4-bit offset: 4KB (1 block) / 256B (default ext4 inode structure) = 2^4 (16)
[Figure: 64-bit inode layout]
Scalability Enhancements(6)
• Directory scalability
– ext3: maximum of 32,000 subdirectories; directory entries are kept in a linked list → very inefficient with large numbers of entries
– ext4: stores directory entries in a constant-depth HTree data structure
• (a specialized BTree-like structure using 32-bit hashes)
• Large inode and fast extended attributes
– The default inode structure size is 128 bytes (already crowded)
– In ext4, the default inode structure size → 256 bytes
– Fixed-field section: nanosecond timestamps, fast extended attributes (EAs)
Reliability Enhancements
• Reliability is very important to ext3 and is one of the reasons for its vast popularity.
– Robust metadata design, internal redundancy at various levels, and built-in integrity checking using checksums.
– Also important is the speed at which a file system is recovered after corruption.
• Unused inode count and fast e2fsck
– (next slide)
• Checksumming
– Adds metadata checksumming
– Easily detects corruption and avoids blindly trusting the data
– Group descriptors and the journal have a checksum added
Reliability Enhancements(2)
• Unused inode count and fast e2fsck
– The uninitialized groups and inode table high watermark feature allows much of the lengthy e2fsck pass 1 scanning to be safely skipped.
– Reduces the total time taken by e2fsck by 2 to 20 times
– Enabled at mke2fs time or using tune2fs via the "-O uninit_groups" option.
– The kernel stores the number of unused inodes at the end of each block group's inode table.
• EXT3
– e2fsck time grows linearly with the total number of inodes, regardless of how many are used.
– e2fsck takes the same amount of time with zero used files as with 2.1M used files.
• EXT4 with the unused inode high watermark feature
– e2fsck time is only dependent on the number of used inodes.
[Figure: e2fsck time for ext3 (0, 100k, and 2.1M files) and ext4 (100k and 2.1M files)]
Block Allocation Enhancements
• Persistent preallocation
– Preallocates blocks for a file up front
– Useful for databases and streaming media servers
– Ensures contiguous allocation as far as possible for a file
– Blocks are allocated but uninitialized
– The MSB of the extent length field indicates whether a given extent contains uninitialized data.
• Delayed block allocation
– Block allocations are postponed to page flush time rather than done during the write()
– Combines many block allocation requests into a single request
• Reduces fragmentation and saves CPU cycles.
• Avoids unnecessary block allocation for short-lived files
– There is a trade-off between performance and reliability
– 30% improved throughput, 50% reduction in CPU usage
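From user space, preallocation is typically requested through posix_fallocate(), the standard POSIX call for reserving file space; that it ends up using ext4's persistent preallocation is the intended behavior on ext4 but is not verified here. The sketch below uses a hypothetical helper and a scratch file under /tmp, chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a scratch file, preallocate `len` bytes for it, and return the
 * resulting file size, or -1 on error. The file is removed afterwards. */
static inline long long preallocate_scratch_file(off_t len)
{
    char path[] = "/tmp/prealloc-XXXXXX";  /* hypothetical scratch location */
    struct stat st;
    long long size = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    /* posix_fallocate() returns 0 on success and an error number otherwise. */
    if (posix_fallocate(fd, 0, len) == 0 && fstat(fd, &st) == 0)
        size = (long long)st.st_size;
    close(fd);
    unlink(path);
    return size;
}
```

A database or media server would call posix_fallocate() once on its own file descriptor before writing, so that later writes land in space that is already reserved.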
Block Allocation Enhancements(2)
• Online defragmentation
– With age, the filesystem can still become quite fragmented
– e4defrag
• Creates a temporary inode and allocates contiguous extents using multiple block allocation
• Copies the original file data to the page cache and flushes the dirty pages to the temporary inode's blocks
• Migrates the block pointers from the temporary inode to the original inode
Ext3 vs. Ext4 in terms of Scalability
Problems with the Ext3 block allocator
• Lacks free extent information across the file system
- Uses only the bitmap to search for free blocks to reserve
- Searches for free blocks only inside the reservation window
• Doesn't differentiate allocation for small / large files
Multiple Blocks Allocator
• EXT3 block reservation
– Subsequent requests for blocks for a file get served before being interleaved
– Per-file reservation window
• EXT4 Multiple Blocks Allocator
– Different strategies for different allocation requests
– Per-block-group buddy cache
• Contiguous multiple blocks are allocated at once to prevent file fragmentation.
• Builds per-block-group free extent information based on the on-disk block bitmap to guide the search for free extents
• Generated at filesystem mount time and stored in memory using a buddy structure.
Multiple Blocks Allocator(2)
• Different strategies for different allocation requests
– Better allocation for small and large files
• The Ext4 multiple block allocator maintains two preallocated spaces
– Small allocation requests
• Per-CPU locality group preallocation
• Used so that small files are placed closer together on disk
– Large allocation requests
• Per-file (per-inode) preallocation
• Used so that larger files are less interleaved
• Which preallocation space to use
– Depends on the total size derived from the current file size and the allocation request size
– If the total size < stream_req blocks, the per-CPU locality group preallocation space is used
– The default is 16 (/proc/fs/ext4/<partition>/stream_req)
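The selection rule can be sketched as a single comparison. The slide says only that the total size is "derived" from the current file size and the request size; taking the larger of the two, both in blocks, is an assumption made here for illustration, and all names are hypothetical.

```c
#include <stdint.h>

enum prealloc_space { PER_CPU_LOCALITY_GROUP, PER_INODE };

#define DEFAULT_STREAM_REQ 16u  /* default stream_req, in blocks */

/* Pick the preallocation space for a request.
 * file_blocks: current file size in blocks.
 * request_end_blocks: file size in blocks after the request (assumed input). */
static inline enum prealloc_space
choose_prealloc(uint64_t file_blocks, uint64_t request_end_blocks,
                uint64_t stream_req)
{
    /* Assumption: the "total size" is the larger of the two values. */
    uint64_t total = file_blocks > request_end_blocks ? file_blocks
                                                      : request_end_blocks;

    return total < stream_req ? PER_CPU_LOCALITY_GROUP : PER_INODE;
}
```

With the default of 16, a 2-block file growing to 4 blocks is served from the per-CPU locality group, while any file reaching 16 blocks or more uses per-inode preallocation.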
Multiple Block Allocator(3)
• Per-block-group buddy cache
– Used when blocks cannot be allocated from the preallocation space
– Contiguous free blocks of a block group are managed by the buddy system in memory (2^0 to 2^13 blocks)
Multiple Blocks Allocator(4)
• Per-block-group buddy cache
– Blocks left unused by the current allocation are added to the inode preallocation
– Inode preallocation lets those blocks be assigned preferentially when the next block allocation request comes; consequently, contiguous multiple blocks are used
– For a file smaller than 16 blocks, the unused blocks are added to the per-CPU locality group to pack small files together