This is a nonportable, Linux-specific system call. For the portable, POSIX.1-specified method of ensuring that space is allocated for a file, see posix_fallocate(3).
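fallocate() allows the caller to directly manipulate the disk space allocated for the file referred to by fd, for the byte range starting at offset and continuing for len bytes, that is, [offset, offset + len).
The mode argument determines the operation to be performed on the given range. Details of the supported operations are given in the subsections below.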
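The default operation of fallocate() (that is, mode = 0) allocates disk space within the range specified by offset and len. If offset + len is greater than the file size, the file size is changed. Any subregion within the range that did not contain data before the call is initialized to zero. This default behavior closely resembles that of the posix_fallocate(3) library function, and is intended as a way of implementing that function optimally.
After a successful call, subsequent writes into the range specified by offset and len are guaranteed not to fail because of lack of disk space.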
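Note: what is this good for? According to the blog post "Using fallocate for file preallocation or hole punching", it has the following benefits:
(1) The file can occupy contiguous disk sectors as far as possible, which reduces disk seek overhead for later writes and reads;
(2) Disk space is claimed quickly, which guards against running out of space during use;
(3) When data is appended later, the file size does not need to change (provided the writes stay within the preallocated size), so no metadata modification is involved afterwards.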
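As a minimal sketch of the default (mode = 0) behavior, the Python snippet below uses os.posix_fallocate, the standard-library wrapper for posix_fallocate(3), which the text above describes as closely resembling fallocate() with mode 0. The helper name preallocate and the use of a temporary file are illustrative assumptions, not part of the man page.

```python
import os
import tempfile


def preallocate(fd: int, offset: int, length: int) -> int:
    """Reserve disk space for [offset, offset + length) and return the file size."""
    # Raises OSError (for example ENOSPC) if the space cannot be reserved.
    os.posix_fallocate(fd, offset, length)
    return os.fstat(fd).st_size


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        size = preallocate(f.fileno(), 0, 1 << 20)  # reserve 1 MiB
        print("file size after preallocation:", size)
```

Because the range extends past the end of the empty file, the reported size grows to offset + len, and the new region reads back as zeros.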
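If the FALLOC_FL_KEEP_SIZE flag is specified in mode, the behavior of the call is similar: disk space is still allocated for the file, but the file size is not changed, even if offset + len is greater than the file size. Preallocating this way can be used to optimize append workloads, because an append then does not need to request additional disk space.
If the FALLOC_FL_UNSHARE_RANGE flag is specified in mode, shared file data extents are made private to the file, to guarantee that a subsequent write will not fail due to lack of space. Typically, this is done by performing a copy-on-write operation on all shared data in the file. Not all filesystems support this flag.
Because allocation is done in block-size chunks, fallocate() may allocate a larger range of disk space than was specified.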
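The rounding effect can be illustrated with a little arithmetic. The sketch below is only a model: the function name allocated_span and the 4096-byte block size are assumptions, and a real filesystem may allocate differently.

```python
def allocated_span(offset: int, length: int, block_size: int = 4096) -> tuple[int, int]:
    """Return the block-aligned [start, end) span covering [offset, offset + length)."""
    if length <= 0:
        raise ValueError("length must be positive")
    start = (offset // block_size) * block_size              # round the start down
    end = -(-(offset + length) // block_size) * block_size   # round the end up
    return start, end


if __name__ == "__main__":
    start, end = allocated_span(1000, 5000)
    print(f"requested 5000 bytes, block-aligned span is {end - start} bytes")
```

With these assumed numbers, a 5000-byte request starting at offset 1000 touches two 4096-byte blocks, so 8192 bytes would be allocated.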
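Specifying the FALLOC_FL_PUNCH_HOLE flag in mode deallocates space (that is, creates a hole) in the byte range starting at offset and continuing for len bytes. Within the specified range, partial filesystem blocks (blocks that lie only partly inside the range) are zeroed, and whole filesystem blocks are removed from the file. After a successful call, subsequent reads from this range return zeros.
The FALLOC_FL_PUNCH_HOLE flag must be ORed with FALLOC_FL_KEEP_SIZE in mode; in other words, FALLOC_FL_PUNCH_HOLE cannot change the file size.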
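Python's standard library has no direct wrapper for fallocate() with a mode argument, so the following sketch calls the libc function through ctypes. It assumes a Linux system with glibc, where c_long matches off_t for the plain fallocate symbol; the helper name punch_hole is hypothetical. The flag values are the ones defined in <linux/falloc.h>.

```python
import ctypes
import os
import tempfile

# Flag values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02


def punch_hole(fd: int, offset: int, length: int) -> None:
    """Deallocate [offset, offset + length) in fd without changing the file size."""
    libc = ctypes.CDLL(None, use_errno=True)
    # int fallocate(int fd, int mode, off_t offset, off_t len);
    # Assumption: c_long matches off_t for this symbol (true on glibc).
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_long, ctypes.c_long]
    libc.fallocate.restype = ctypes.c_int
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE  # the two flags must be ORed
    if libc.fallocate(fd, mode, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        os.write(f.fileno(), b"x" * 16384)
        try:
            punch_hole(f.fileno(), 4096, 8192)
            print("hole punched; size is still", os.fstat(f.fileno()).st_size)
        except OSError as exc:
            print("hole punching not supported here:", exc)
```

The call raises OSError when the filesystem does not support the operation, which matches the error behavior described in the text.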
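Not all filesystems support FALLOC_FL_PUNCH_HOLE; if a filesystem does not support the operation, an error is returned. The fallocate(2) man page lists the filesystems known to support it.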
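Specifying the FALLOC_FL_COLLAPSE_RANGE flag in mode removes a byte range from a file without leaving a hole. The range to be collapsed starts at offset and continues for len bytes. When the operation completes, the contents of the file starting at offset + len are moved down to offset, and the file is len bytes smaller.
A filesystem may place limitations on the granularity of the operation in order to ensure efficient implementation. Typically, offset and len must be multiples of the filesystem logical block size, which varies according to the filesystem type and configuration. If a filesystem has such a requirement and it is violated, fallocate() fails with the error EINVAL.
If the region specified by offset plus len reaches or passes the end of file, an error is returned; instead, use ftruncate(2) to truncate a file.
No other flags may be specified in mode in conjunction with FALLOC_FL_COLLAPSE_RANGE.
As of Linux 3.15, FALLOC_FL_COLLAPSE_RANGE is supported by ext4 (only for extent-based files) and XFS.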
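To make the collapse semantics concrete, here is a small in-memory model rather than a real system call. The function name collapse_range and the 4096-byte block size are assumptions; the checks mirror the EINVAL conditions described above.

```python
def collapse_range(data: bytes, offset: int, length: int, block_size: int = 4096) -> bytes:
    """Model FALLOC_FL_COLLAPSE_RANGE on an in-memory byte string."""
    if offset % block_size or length % block_size:
        raise ValueError("EINVAL: offset and len must be multiples of the block size")
    if offset + length >= len(data):
        raise ValueError("EINVAL: range reaches or passes end of file; truncate instead")
    # Everything from offset + length onward slides down to offset.
    return data[:offset] + data[offset + length:]


if __name__ == "__main__":
    blocks = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
    result = collapse_range(blocks, 4096, 4096)
    print(len(blocks), "->", len(result))  # the file shrinks by len bytes
```

Removing the middle block leaves the A block followed directly by the C block, with no hole in between.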
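Specifying the FALLOC_FL_ZERO_RANGE flag in mode zeroes space in the byte range starting at offset and continuing for len bytes. Within the specified range, blocks are preallocated for the regions that span holes in the file. After a successful call, subsequent reads from this range return zeros.
Zeroing is done within the filesystem preferably by converting the range into unwritten extents. This approach means that the specified range will not be physically zeroed out on the device (except for partial blocks at either end of the range), and I/O is (otherwise) required only to update metadata. In other words, rather than physically zeroing the disk, the filesystem preferably marks the range as unwritten, so that data read back into the page cache is zero. In that case only the file metadata needs to be modified.
If the FALLOC_FL_KEEP_SIZE flag is additionally specified in mode, the behavior of the call is similar, but the file size is not changed even if offset + len is greater than the file size. This is the same behavior as when preallocating space with FALLOC_FL_KEEP_SIZE specified.
Not all filesystems support FALLOC_FL_ZERO_RANGE; if a filesystem does not support the operation, an error is returned. The fallocate(2) man page lists the filesystems known to support it.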
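Specifying the FALLOC_FL_INSERT_RANGE flag in mode inserts a hole of len bytes starting at offset, increasing the file space without overwriting any existing data. The contents of the file from offset onward are shifted to higher offsets by len bytes.
This mode has the same limitations regarding the granularity of the operation as FALLOC_FL_COLLAPSE_RANGE. If the granularity requirements are not met, fallocate() fails with the error EINVAL. If offset is equal to or greater than the end of file, an error is returned. For such operations (that is, inserting a hole at the end of file), ftruncate(2) should be used.
No other flags may be specified in mode in conjunction with FALLOC_FL_INSERT_RANGE.
Currently only XFS (since Linux 4.1) and ext4 (since Linux 4.2) support this flag.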
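The insert operation can be modeled in memory in the same spirit. This is a sketch, not a system call: the function name insert_range and the 4096-byte block size are assumptions, and the hole is represented by zero bytes because a hole reads back as zeros.

```python
def insert_range(data: bytes, offset: int, length: int, block_size: int = 4096) -> bytes:
    """Model FALLOC_FL_INSERT_RANGE on an in-memory byte string."""
    if offset % block_size or length % block_size:
        raise ValueError("EINVAL: offset and len must be multiples of the block size")
    if offset >= len(data):
        raise ValueError("EINVAL: offset is at or past end of file; use ftruncate instead")
    # Existing data from offset onward shifts up by length bytes.
    return data[:offset] + b"\0" * length + data[offset:]


if __name__ == "__main__":
    blocks = b"A" * 4096 + b"B" * 4096
    result = insert_range(blocks, 4096, 4096)
    print(len(blocks), "->", len(result))  # the file grows by len bytes
```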
On success, fallocate() returns zero. On error, -1 is returned and errno is set to indicate the error.
fallocate() is available on Linux since kernel 2.6.23. Support is provided by glibc since version 2.10. The FALLOC_FL_* flags are defined in glibc headers only since version 2.18.
fallocate() is Linux-specific.
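A man page is simply a Manual Page. To use it, run man followed by a command name, and the official description and usage of that command are displayed. This is the most basic of basics. However, man pages, like every manual in history, are very formal and dry, so a number of derivatives have appeared:
TLDR stands for "Too Long; Didn't Read". In writing, the phrase signals that a long passage is coming, but in Linux it means the opposite: condensing a lengthy manual into two or three sentences that go straight to showing how a command is used.
tldr is a Linux command-line tool with its own official site.
Note: installation methods differ across devices and platforms, so check the official site for details.
bropages is comparable to a community edition of tldr: the community contributes usage examples for each command, and the examples are ranked by votes. Because of this, bropages has to query the network every time it runs.
Usage: bro followed by the command name.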
Anatomy of Linux flash file systems: Options and architectures
Summary: You've probably heard of Journaling Flash File System (JFFS) and Yet
Another Flash File System (YAFFS), but do you know what it means to have a file
system that assumes an underlying flash device? This article introduces you to
flash file systems for Linux®, explores how they care for their underlying
consumable devices (flash parts) through wear leveling, and identifies the
various flash file systems available along with their fundamental designs.
Solid-state drives are all the rage these days, but embedded systems have
used solid-state devices for storage for quite some time. You'll find flash
file systems used in personal digital assistants (PDAs), cell phones, MP3
players, digital cameras, USB flash drives (UFDs), and even laptop computers.
In many cases, the file systems for commercial devices can be custom and
proprietary, but they face the same challenges discussed below.
Flash-based file systems come in a variety of forms. This article explores
a couple of the read-only file systems and also reviews the various read/write
file systems available today and how they work. But first, let's explore the
flash devices and the challenges that they introduce.
Flash memory technologies
Flash memory, which can come in several different technologies, is non-volatile
memory, which means that its contents persist after its source of power is
removed. For a great history of flash memory devices, see Resources.
Two of the most common types of flash devices are defined by their
respective technologies: NOR and NAND. NOR-based flash is the older technology
that supported high read performance at the expense of smaller capacities. NAND
flash offers higher capacities with significantly faster write and erase
performance. NAND also requires a much more complicated input/output (I/O)
interface.
Flash parts are commonly divided into partitions, which allows
multiple operations to occur simultaneously (erasing one partition while
reading from another). Partitions are further divided into blocks
(commonly 64KB or 128KB in size). Firmware that uses the partitions can further
apply unique segmenting to the blocks—for example, 512-byte segments within a
block, not including metadata.
Flash devices exhibit a common constraint that requires device management
when compared to other storage devices such as RAM disks. The only Write
operation permitted on a flash memory device is to change a bit from a one to a
zero. If the reverse operation is needed, then the block must be erased (to
reset all bits to the one state). This means that other valid data within the
block must be moved for it to persist. NOR flash memory can typically be
programmed a byte at a time, whereas NAND flash memory must be programmed in
multi-byte bursts (typically, 512 bytes).
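The write constraint described above can be modeled in a few lines. This is a toy sketch, not driver code: the class name FlashBlock and the 16-byte block size are assumptions chosen for illustration.

```python
class FlashBlock:
    """Toy model of one flash block: programming clears bits, erasing sets them."""

    def __init__(self, size: int = 16):
        self.cells = bytearray(b"\xff" * size)  # erased state: every bit is 1
        self.erase_count = 0

    def program(self, index: int, value: int) -> None:
        """Write one byte; a program operation can only change bits from 1 to 0."""
        if value & ~self.cells[index] & 0xFF:
            raise ValueError("cannot set a 0 bit back to 1 without erasing the block")
        self.cells[index] &= value

    def erase(self) -> None:
        """Reset the whole block to all ones and count the erase cycle."""
        self.cells[:] = b"\xff" * len(self.cells)
        self.erase_count += 1


if __name__ == "__main__":
    block = FlashBlock()
    block.program(0, 0xA5)      # fine: only clears bits
    try:
        block.program(0, 0xFF)  # would need 0 -> 1 transitions
    except ValueError as exc:
        print("rejected:", exc)
    block.erase()
    print("after erase:", hex(block.cells[0]), "erase cycles:", block.erase_count)
```

Tracking erase_count per block is what later makes wear leveling possible.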
The process of erasing a block differs between the two memory types. Each
requires a special Erase operation that covers an entire block of the flash
memory. NOR technology requires a precursor step to clear all values to zero
before the Erase operation can begin. An Erase is a special operation
with the flash device and can be time-consuming. Erasing is an electrical
operation that drains the electrons from each cell in an entire block.
NOR flash devices typically require seconds for the Erase operation,
whereas a NAND device can erase in milliseconds. A key characteristic of flash
devices is the number of Erase operations that can be performed. In a NOR
device, each block in the flash memory can be erased up to 100,000 times. NAND
flash memories can be erased up to one million times.
Flash memory challenges
In addition to and as a result of the constraints explored in the previous
section, managing flash devices presents several challenges. The three most
important are garbage collection, managing bad blocks, and wear leveling.
Garbage collection
Garbage collection is the process of reclaiming invalid blocks (those that
contain some amount of invalid data). Reclamation involves moving the valid
data to a new block, and then erasing the invalid block to make it available.
This process is commonly done in the background or as needed, if the file
system is low on available space.
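A minimal sketch of the reclamation step, assuming a block is represented as a list of (payload, is_valid) pages; the representation and the function name reclaim are illustrative assumptions.

```python
from typing import List, Tuple

Page = Tuple[bytes, bool]  # (payload, is_valid)


def reclaim(victim: List[Page], target: List[Page]) -> int:
    """Copy valid pages from victim to target, erase victim, and return the
    number of page slots recovered."""
    live = [page for page in victim if page[1]]
    recovered = len(victim) - len(live)
    target.extend(live)  # valid data must be moved so that it persists
    victim.clear()       # stands in for the block Erase operation
    return recovered


if __name__ == "__main__":
    victim = [(b"a", True), (b"b", False), (b"c", True)]
    target: List[Page] = []
    print("slots recovered:", reclaim(victim, target))
```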
Managing bad blocks
Over time, flash devices can develop bad blocks through use and can even
ship from the manufacturer with blocks that are bad and cannot be used. You can
detect the presence of bad blocks from a failed flash operation (such as an
Erase) or an invalid Write operation (discovered through an invalid Error
Correction Code, or ECC).
After bad blocks have been identified, they are marked within the flash
itself in a bad block table. How this is done is device-dependent but can be
implemented with a separate set of reserved blocks managed separately from
normal data blocks. The process of handling bad blocks—whether they ship with
the device or appear over time—is called bad block management. In some
cases, this functionality is implemented in hardware by an internal
microcontroller and is therefore transparent to the upper-level file system.
Wear leveling
Recall that flash devices are consumable parts: You can perform a finite
number of Erase cycles on each block before the block becomes bad (and must
therefore be tagged by bad block management). To maximize the life of the
flash, wear-leveling algorithms are provided. Wear leveling comes in two
varieties: dynamic wear leveling and static wear leveling.
Dynamic wear leveling addresses the problem of a limited number of Erase
cycles for a given block. Rather than randomly using blocks as they are
available, dynamic wear-leveling algorithms attempt to evenly distribute the
use of blocks so that each gets uniform use. Static wear-leveling algorithms
address an even more interesting problem. In addition to a maximum number of
Erase cycles, certain flash devices suffer from a maximum number of Read cycles
between Erase cycles. This means that if data sits for too long in a block and
is read too many times, the data can dissipate and result in data loss. Static
wear-leveling algorithms address this by periodically moving stale data to new
blocks.
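Both varieties can be sketched as simple selection policies. The function names, the dictionary-based bookkeeping, and the read_limit threshold below are assumptions for illustration; real implementations track this state in the flash translation layer or the file system.

```python
def pick_block(erase_counts: dict, free_blocks: set) -> int:
    """Dynamic wear leveling: choose the free block erased the fewest times."""
    if not free_blocks:
        raise RuntimeError("no free blocks; garbage collection is needed")
    # Ties are broken by block number so the choice is deterministic.
    return min(free_blocks, key=lambda b: (erase_counts.get(b, 0), b))


def stale_blocks(reads_since_erase: dict, read_limit: int) -> list:
    """Static wear leveling: list blocks whose data should be moved because
    they have been read too many times since their last erase."""
    return sorted(b for b, reads in reads_since_erase.items() if reads >= read_limit)


if __name__ == "__main__":
    print(pick_block({0: 5, 1: 2, 2: 2, 3: 9}, {0, 2, 3}))
    print(stale_blocks({0: 10, 1: 1000, 2: 999}, read_limit=1000))
```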
System architecture
So far, I've explored flash devices and their fundamental challenges. Now,
look at how these pieces come together as part of a layered architecture (see
Figure 1). At the top is the virtual file system (VFS), which presents a common
interface to higher-level applications. The VFS is followed by the flash file
system, which will be covered in the next section. Next is the Flash
Translation Layer (FTL), which provides for overall management of the flash
device, including allocation of blocks from the underlying flash device as well
as address translation, dynamic wear leveling, and garbage collection. In some
flash devices, a portion of the FTL can be implemented in hardware.
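The address translation role of the FTL can be illustrated with a toy model. The class name SimpleFTL and its page-level mapping are assumptions; the sketch omits wear leveling and garbage collection and only shows why a logical rewrite lands on a new physical page.

```python
class SimpleFTL:
    """Toy flash translation layer: remaps logical pages to fresh physical pages."""

    def __init__(self, physical_pages: int):
        self.free = list(range(physical_pages))  # unused physical pages
        self.mapping = {}                        # logical page -> physical page
        self.obsolete = set()                    # physical pages awaiting garbage collection
        self.store = {}                          # physical page -> payload

    def write(self, logical: int, payload: bytes) -> int:
        if not self.free:
            raise RuntimeError("out of free pages; garbage collection is needed")
        physical = self.free.pop(0)
        old = self.mapping.get(logical)
        if old is not None:
            self.obsolete.add(old)  # flash cannot be overwritten in place
        self.mapping[logical] = physical
        self.store[physical] = payload
        return physical

    def read(self, logical: int) -> bytes:
        return self.store[self.mapping[logical]]


if __name__ == "__main__":
    ftl = SimpleFTL(physical_pages=4)
    ftl.write(7, b"v1")
    ftl.write(7, b"v2")  # same logical page, new physical page
    print(ftl.read(7), "obsolete physical pages:", ftl.obsolete)
```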
The Linux kernel uses the Memory Technology Device (MTD) interface, which
is a generic interface for flash devices. The MTD can automatically detect the
width of the flash device bus and the number of devices necessary for
implementing the bus width.
Flash file systems
Several flash file systems are available for Linux. The next sections explain the design and advantages of each.
Journaling Flash File System
One of the earliest flash file systems for Linux is called the Journaling
Flash File System. JFFS is a log-structured file system that was designed
for NOR flash devices. It was unique and addressed a variety of problems with
flash devices, but it created another.
JFFS viewed the flash device as a circular log of blocks. Data written to
the flash is written to the tail, and blocks at the head are reclaimed. The
space between the tail and head is free space; when this space becomes low, the
garbage collector is executed. The garbage collector moves valid blocks to the
tail of the log, skips invalid or obsolete blocks, and erases them (see Figure
2). The result is a file system that is automatically wear leveled both
statically and dynamically. The fundamental problem with this architecture is
that the flash device is erased too often (instead of an optimal erase
strategy), which wears the device out too quickly.
When a JFFS is mounted, the structural details are read into memory, which can be slow at mount-time and consume more memory than desired.
Journaling Flash File System 2
Although JFFS was very useful in its time, its wear-leveling algorithm
tended to shorten the life of NOR flash devices. The result was a redesign of
the underlying algorithm to remove the circular log. The JFFS2 algorithm was
designed for NAND flash devices and also includes improved performance with
compression.
In JFFS2, each block in the flash is treated independently. JFFS2 maintains
block lists to sufficiently wear-level the device. The clean list represents
blocks on the device that are full of valid nodes. The dirty list contains
blocks with at least one obsoleted node. Finally, the free list represents the
blocks that have been erased and are available for use.
The garbage collection algorithm can then intelligently decide what to
reclaim in a reasonable way. Currently, the algorithm probabilistically selects
from the clean or dirty list. The dirty list is selected 99 percent of the time
to reclaim blocks (moving the valid contents to another block), and the clean
list is selected 1 percent of the time (simply moving the contents to a new
block). In both cases, the selected block is erased and placed on the free list
(see Figure 3). This allows the garbage collector to re-use blocks that are
obsoleted (or partially so) but still move data around the flash to support
static wear leveling.
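The 99/1 selection described above can be sketched as follows. This is not the JFFS2 source: the function names and the list-of-names representation are assumptions, and relocating the block contents is left out so that only the selection policy is shown.

```python
import random


def choose_gc_list(rng: random.Random) -> str:
    """Pick the list to collect from: 'clean' 1 percent of the time, else 'dirty'."""
    return "clean" if rng.randrange(100) == 0 else "dirty"


def collect_one(clean: list, dirty: list, free: list, rng: random.Random) -> None:
    """Move one block from the chosen list to the free list
    (its valid contents are assumed to have been relocated first)."""
    source = clean if choose_gc_list(rng) == "clean" else dirty
    if not source:  # fall back if the chosen list is empty
        source = dirty or clean
    if source:
        free.append(source.pop(0))


if __name__ == "__main__":
    rng = random.Random(0)
    picks = [choose_gc_list(rng) for _ in range(10_000)]
    print("clean picks out of 10000:", picks.count("clean"))
```

Occasionally collecting from the clean list is what moves long-lived data around, which is how the scheme supports static wear leveling.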
Yet Another Flash File System
YAFFS is another flash file system developed for NAND flash. The initial
version (YAFFS) supported flash devices with 512-byte pages, but the newer
version (YAFFS2) supports newer devices with larger page sizes and greater
Write constraints.
In most flash file systems, obsolete blocks are marked as such, but YAFFS2
additionally marks blocks with monotonically increasing sequence numbers. When
the file system is scanned at mount time, the valid inodes can be quickly
identified. YAFFS also maintains trees in RAM to represent the block structure
of the flash device, including fast mounting through checkpointing —the
process of saving the RAM tree structure to the flash device on a normal
unmount so that it can be quickly read and restored to RAM at mount time (see
Figure 4). Mount-time performance is a great advantage of YAFFS2 over other
flash file systems.
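The role of sequence numbers during a mount-time scan can be sketched with a simplified model. The tuple layout (object_id, chunk_id, sequence, payload) and the function name latest_chunks are assumptions; the actual YAFFS2 tags and scan order are more involved.

```python
def latest_chunks(scanned):
    """Given (object_id, chunk_id, sequence, payload) tuples found while scanning
    flash, keep only the newest copy of each (object_id, chunk_id)."""
    newest = {}
    for object_id, chunk_id, sequence, payload in scanned:
        key = (object_id, chunk_id)
        # A higher sequence number means the copy was written more recently.
        if key not in newest or sequence > newest[key][0]:
            newest[key] = (sequence, payload)
    return {key: payload for key, (sequence, payload) in newest.items()}


if __name__ == "__main__":
    scanned = [(1, 0, 10, b"old"), (1, 0, 12, b"new"), (2, 0, 11, b"x")]
    print(latest_chunks(scanned))
```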
Read-only compressed file systems
In some embedded systems, there's no need to provide a mutable file system:
An immutable one will suffice. Linux supports a variety of read-only file
systems; two of the most useful are cramfs and SquashFS.
Cramfs
The cramfs file system is a compressed read-only Linux file system that can
exist within flash devices. The primary characteristics of cramfs are that it
is both simple and space-efficient. This file system is used in small-footprint
embedded designs.
While cramfs metadata is not compressed, cramfs uses zlib compression on a
per-page basis to allow random page access (pages are decompressed upon
access).
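Per-page compression can be illustrated with Python's zlib module. This is a sketch of the idea only, not the cramfs on-disk format; the 4096-byte page size and the function names are assumptions.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size for this sketch


def compress_pages(data: bytes, page_size: int = PAGE_SIZE) -> list:
    """Compress each page independently so any page can be decompressed alone."""
    return [zlib.compress(data[i:i + page_size]) for i in range(0, len(data), page_size)]


def read_page(pages: list, index: int) -> bytes:
    """Random access: decompress only the requested page."""
    return zlib.decompress(pages[index])


if __name__ == "__main__":
    image = bytes(range(256)) * 40  # 10240 bytes, so three pages
    pages = compress_pages(image)
    print("pages:", len(pages), "page 1 length:", len(read_page(pages, 1)))
```

Compressing pages independently costs some compression ratio compared with compressing the whole image at once, but it avoids decompressing everything to read one page.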
You can play with cramfs using the mkcramfs utility and the loopback device.
SquashFS
SquashFS is another compressed read-only Linux file system that is useful
within flash devices. You'll also find SquashFS in numerous Live CD Linux
distributions. In addition to supporting zlib for compression, SquashFS uses
Lempel-Ziv-Markov chain Algorithm (LZMA) for improved compression and speed.
Like cramfs, you can use SquashFS on a standard Linux system with mksquashfs and the loopback device.
Going further
Like most of open source, software continues to evolve, and new flash file
systems are under development. An interesting alternative still in development
is LogFS, which includes some very novel ideas. For example, LogFS maintains a
tree structure on the flash device itself so that the mount times are similar
to traditional file systems, such as ext2. It also uses a wandering tree for
garbage collection (a form of B+tree). What makes LogFS particularly
interesting, however, is that it is very scalable and can support large flash
parts.
With the growing popularity of flash file systems, you'll see a
considerable amount of research being applied toward them. LogFS is one
example, but other options, such as UbiFS, are also growing. Flash file systems
are interesting architecturally and will continue to be a source of innovation in the future.