Ext4 is the evolution of the most used Linux filesystem, Ext3. In many ways, Ext4 is a deeper improvement over Ext3 than Ext3 was over Ext2. Ext3 was mostly about adding journaling to Ext2, but Ext4 modifies important data structures of the filesystem such as the ones destined to store the file data. The result is a filesystem with an improved design, better performance, reliability and features.
Ext4 was released as a functionally complete and stable filesystem in Linux 2.6.28, and it’s getting included in all the modern distros (in some cases as the default fs), so if you are using a modern distro, it’s possible that you already have Ext4 support and you don’t need to modify your system to run Ext4.
Any existing Ext3 filesystem can be migrated to Ext4 with an easy procedure which consists in running a couple of commands in read-only mode (described in the next section). This means that we can improve the performance, storage limits and features of your current filesystems without reformatting and/or reinstalling wer OS and software environment. If we need the advantages of Ext4 on a production system, we can upgrade the filesystem.
Bigger filesystem / file sizes
Ext3 support 16 TB of maximum filesystem size, and 2 TB of maximum file size. Ext4 adds 48-bit block addressing, so it will have 1 EB of maximum filesystem size and 16 TB of maximum file size. 1 EB = 1,048,576 TB (1 EB = 1024 PB, 1 PB = 1024 TB, 1 TB = 1024 GB).
Sub directory scalability
The maximum possible number of sub directories contained in a single directory in Ext3 is 32000. Ext4 breaks that limit and allows a unlimited number of sub directories.
The traditionally Unix-derived filesystems like Ext3 use a indirect block mapping scheme to keep track of each block used for the blocks corresponding to the data of a file. This is inefficient for large files, specially on large file delete and truncate operations, because the mapping keeps a entry for every single block, and big files have many blocks -> huge mappings, slow to handle. Modern filesystems use a different approach called “extents”. An extent is basically a bunch of contiguous physical blocks. It basically says “The data is in the next n blocks”. For example, a 100 MB file can be allocated into a single extent of that size, instead of needing to create the indirect mapping for 25600 blocks (4 KB per block). Huge files are split in several extents. Extents improve the performance and also help to reduce the fragmentation, since an extent encourages continuous lawets on the disk.
When Ext3 needs to write new data to the disk, there’s a block allocator that decides which free blocks will be used to write the data. But the Ext3 block allocator only allocates one block (4KB) at a time. That means that if the system needs to write the 100 MB data mentioned in the previous point, it will need to call the block allocator 25600 times. Not only this is inefficient, it doesn’t allow the block allocator to optimize the allocation policy because it doesn’t knows how many total data is being allocated, it only knows about a single block. Ext4 uses a “multiblock allocator” (mballoc) which allocates many blocks in a single call, instead of a single block per call, avoiding a lot of overhead. This improves the performance, and it’s particularly useful with delayed allocation and extents. This feature doesn’t affect the disk format.
Delayed allocation is a performance feature (it doesn’t change the disk format) found in a few modern filesystems such as XFS, ZFS, btrfs or Reiser 4, and it consists in delaying the allocation of blocks as much as possible, contrary to what traditionally filesystems (such as Ext3, reiser3, etc) do: allocate the blocks as soon as possible. For example, if a process write()s, the filesystem code will allocate immediately the blocks where the data will be placed – even if the data is not being written right now to the disk and it’s going to be kept in the cache for some time. This approach has disadvantages. For example when a process is writing continually to a file that grows, successive write()s allocate blocks for the data, but they don’t know if the file will keep growing. Delayed allocation, on the other hand, does not allocate the blocks immediately when the process write()s, rather, it delays the allocation of the blocks while the file is kept in cache, until it is really going to be written to the disk. This gives the block allocator the opportunity to optimize the allocation in situations where the old system couldn’t. Delayed allocation plays very nicely with the two previous features mentioned, extents and multiblock allocation, because in many workloads when the file is written finally to the disk it will be allocated in extents whose block allocation is done with the mballoc allocator. The performance is much better, and the fragmentation is much improved in some workloads.
In ext4, unallocated block groups and sections of the inode table are marked as such. This enables e2fsck to skip them entirely on a check and greatly reduce the time it takes to check a file system of the size ext4 is built to support. This feature is implemented in version 2.6.24 of the Linux kernel.
Ext4 uses checksums in the journal to improve reliability, since the journal is one of the most used files of the disk. This feature has a side benefit; it can safely avoid a disk I/O wait during the journaling process, improving performance slightly. The technique of journal checksumming was inspired by a research paper from the University of Wisconsin titled IRON File Systems (specifically, section 6, called “transaction checksums”).
“No Journaling” mode
Journaling ensures the integrity of the filesystem by keeping a log of the ongoing disk changes. However, it is know to have a small overhead. Some people with special requirements and workloads can run without a journal and its integrity advantages. In Ext4 the journaling feature can be disabled, which provides a small performance improvement.