Linux man page translation: fallocate(2)


fallocate - manipulate file space

This is a nonportable, Linux-specific system call. For the portable, POSIX.1-specified method of ensuring that space is allocated for a file, see posix_fallocate(3).

fallocate() allows the caller to directly manipulate the disk space allocated for the file referred to by fd, over the byte range [offset, offset + len).

The mode argument determines the operation to be performed on the given range. Details of the supported operations are given in the subsections below.

The default operation of fallocate() (i.e., mode is 0) allocates disk space within the range specified by offset and len. The file size is changed if offset + len is greater than the current file size, and the region beyond the original extent of the file is initialized to zero. This default behavior closely resembles that of the posix_fallocate(3) library function, and is the recommended way to implement posix_fallocate(3) optimally.

After a successful call, subsequent writes into the range specified by offset and len are guaranteed not to fail because of lack of disk space.
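As a concrete illustration, here is a minimal sketch of preallocation with the default operation. The file name and the 64 MiB size are made up for the example; on glibc, fallocate() is declared in <fcntl.h> when _GNU_SOURCE is defined.

```c
#define _GNU_SOURCE            /* expose fallocate() in <fcntl.h> on glibc */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    const off_t size = 64 * 1024 * 1024;               /* example: reserve 64 MiB */

    int fd = open("data.bin", O_CREAT | O_RDWR, 0644); /* hypothetical file name */
    if (fd == -1) {
        perror("open");
        return EXIT_FAILURE;
    }

    /* mode == 0: allocate the byte range [0, size); the file grows to `size`
     * and the newly allocated region reads back as zeros. */
    if (fallocate(fd, 0, 0, size) == -1) {
        perror("fallocate");
        close(fd);
        return EXIT_FAILURE;
    }

    /* From here on, writes anywhere inside [0, size) cannot fail with ENOSPC. */
    close(fd);
    return EXIT_SUCCESS;
}
```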

Note: what is this actually good for? According to the blog post 用fallocate进行"文件预留"或"文件打洞" (using fallocate for file preallocation and hole punching), it brings the following benefits:

(1) The file can occupy disk sectors that are as contiguous as possible, reducing seek overhead when the file is later written and read;

(2) Disk space is claimed up front, so the file cannot run out of space while it is being used;

(3) When data is appended later, the file size does not need to change, so the appends do not involve metadata updates.

If the FALLOC_FL_KEEP_SIZE flag is specified in mode, the behavior of the call is similar: disk space is still allocated for the file, but the file size is not changed. Preallocating this way can be used to optimize append workloads, because appends no longer have to allocate additional disk space.
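A minimal sketch of this append-oriented preallocation, assuming fd is an already-open, writable file descriptor; the 16 MiB figure is arbitrary. The FALLOC_FL_* constants come from <fcntl.h> with _GNU_SOURCE (glibc 2.18 or later) or from <linux/falloc.h>.

```c
/* Reserve 16 MiB past the current end of file without changing the visible
 * file size; later appends can then fill the reserved blocks without having
 * to allocate more space. */
off_t cur_end = lseek(fd, 0, SEEK_END);              /* current file size */
if (cur_end == (off_t)-1)
    perror("lseek");
else if (fallocate(fd, FALLOC_FL_KEEP_SIZE, cur_end, 16 * 1024 * 1024) == -1)
    perror("fallocate(FALLOC_FL_KEEP_SIZE)");
```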

If the FALLOC_FL_UNSHARE_RANGE flag is specified in mode, shared file data extents within the range are made private to the file, to guarantee that a subsequent write will not fail due to lack of space. Typically this is done by performing a copy-on-write operation on all of the shared data in the file. This flag may not be supported by all filesystems.

Because allocation is done in block-size chunks, fallocate() may allocate a larger range of disk space than was specified.

When FALLOC_FL_PUNCH_HOLE is specified in mode, space in the given range is deallocated, i.e., a hole is created. Within the specified range, partial filesystem blocks (blocks that only partly fall inside the range) are zeroed, and whole filesystem blocks are removed from the file. After a successful call, subsequent reads from this range return zeros.

FALLOC_FL_PUNCH_HOLE must be ORed with FALLOC_FL_KEEP_SIZE in mode; in other words, punching a hole can never change the file size.

Not all filesystems support FALLOC_FL_PUNCH_HOLE; if a filesystem doesn't support the operation, an error is returned. The operation is supported by at least XFS, ext4, Btrfs, and tmpfs.
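A hedged sketch of punching a hole, again assuming an open descriptor fd on a filesystem that supports the flag; the offset and length are example values only.

```c
/* Deallocate the byte range [offset, offset + len): whole blocks inside the
 * range are freed, partial blocks are zeroed, and reads return zeros.
 * FALLOC_FL_KEEP_SIZE is mandatory, so the file size does not change. */
off_t offset = 4096;
off_t len    = 1024 * 1024;
if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, offset, len) == -1)
    perror("fallocate(FALLOC_FL_PUNCH_HOLE)");
```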

When FALLOC_FL_COLLAPSE_RANGE is specified in mode, the specified byte range is removed from the file, without leaving a hole. After the operation completes, the file contents that started at offset + len are appended at offset, and the file size is reduced by len.

A filesystem may place restrictions on the granularity of the operation, in order to ensure efficient implementation. Typically, offset and len must be a multiple of the filesystem logical block size, which varies according to the filesystem type and configuration. If a filesystem has such a requirement, fallocate() fails with the error EINVAL when the requirement is violated.

If the region specified by offset plus len reaches or passes the end of the file, an error is returned; instead, use ftruncate(2) to truncate a file.

The FALLOC_FL_COLLAPSE_RANGE flag is incompatible with any other flags.

As of Linux 3.15, FALLOC_FL_COLLAPSE_RANGE is supported by ext4 (only for extent-based files) and XFS.
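A sketch of collapsing a range, assuming an ext4 or XFS file and a 4096-byte filesystem block size (both assumptions made only for this example):

```c
/* Remove one 4096-byte block starting at offset 8192.  Data that followed
 * the removed range slides down to offset 8192 and the file shrinks by
 * `len`.  Both offset and len must be multiples of the filesystem block
 * size, and the range must end before EOF. */
off_t offset = 8192;
off_t len    = 4096;
if (fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, offset, len) == -1)
    perror("fallocate(FALLOC_FL_COLLAPSE_RANGE)");
```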

When FALLOC_FL_ZERO_RANGE is specified in mode, disk space is allocated within the specified range and any holes in it are filled. After a successful call, subsequent reads from this range return zeros.

Zeroing is done within the filesystem preferably by converting the range into unwritten extents. This approach means that the specified range will not be physically zeroed out on the device (except for partial blocks at either end of the range), and I/O is (otherwise) required only to update metadata. The likely meaning is that, rather than physically zeroing the device, the filesystem marks the range as unwritten so that reads (and the page cache) come back as zeros; in that case only the file's metadata needs to be updated.

If the FALLOC_FL_KEEP_SIZE flag is additionally specified in mode, the behavior of the call is similar, but the file size will not be changed even when offset + len is greater than the file size. This behavior is the same as when preallocating space with FALLOC_FL_KEEP_SIZE specified.

Not all filesystems support FALLOC_FL_ZERO_RANGE; if a filesystem doesn't support the operation, an error is returned. The operation is supported by at least XFS and ext4 (for extent-based files).
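A sketch of zeroing a range in place, with the same assumptions about fd and the example offset and length used above:

```c
/* Zero the byte range [offset, offset + len) without punching it out of the
 * file; on supporting filesystems this is typically done by marking extents
 * as unwritten rather than writing zeros to the device. */
if (fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, len) == -1)
    perror("fallocate(FALLOC_FL_ZERO_RANGE)");
```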

When FALLOC_FL_INSERT_RANGE is specified in mode, a hole of len bytes is inserted starting at offset, increasing the file's space without overwriting any existing data.

This mode has the same limitations as FALLOC_FL_COLLAPSE_RANGE regarding the granularity of the operation. If the granularity requirements are not met, fallocate() fails with the error EINVAL. If the offset is equal to or greater than the end of the file, an error is returned; for such an operation (i.e., inserting a hole at the end of the file), ftruncate(2) should be used.

The FALLOC_FL_INSERT_RANGE flag is incompatible with any other flags.

Currently, only XFS (since Linux 4.1) and ext4 (since Linux 4.2) support this flag.
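And a sketch of inserting a hole, subject to the same block-size granularity assumption as in the collapse example:

```c
/* Insert a hole of `len` bytes at `offset`, shifting the existing data up
 * and growing the file by `len`.  offset and len must be block-aligned, and
 * offset must lie strictly before the current end of file. */
if (fallocate(fd, FALLOC_FL_INSERT_RANGE, offset, len) == -1)
    perror("fallocate(FALLOC_FL_INSERT_RANGE)");
```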

On success, fallocate() returns zero. On error, -1 is returned and errno is set to indicate the error.

fallocate() is available on Linux since kernel 2.6.23. Support is provided by glibc since version 2.10. The FALLOC_FL_* flags are defined in glibc headers only since version 2.18.

fallocate() is Linux-specific.

A man page is simply a Manual Page. You use it by running man command, which displays all of the official documentation and usage for that command. This is the most basic of basics.

The trouble is that, like every manual ever written, man pages are far too official and dry, which is why a few derivatives have appeared:

TLDR stands for "Too long; didn't read". In writing, the phrase warns that a long passage is coming, but on Linux it means the opposite: a long manual is boiled down to two or three sentences that go straight to the point and show how the command is used.

tldr is a Linux command-line tool; see its official website for installation instructions.

Note: the installation method differs between devices and platforms, so check the official website for details.

bropages is essentially a community edition of tldr: the community contributes usage examples for each command, and the examples are ranked by votes. Because of this, bro needs a network connection each time it runs.

The usage is: bro command

Anatomy of Linux flash file systems

Options and architectures

Summary: You've probably heard of Journaling Flash File System (JFFS) and Yet Another Flash File System (YAFFS), but do you know what it means to have a file system that assumes an underlying flash device? This article introduces you to flash file systems for Linux®, explores how they care for their underlying consumable devices (flash parts) through wear leveling, and identifies the various flash file systems available along with their fundamental designs.

Solid-state drives are all the rage these days, but embedded systems have used solid-state devices for storage for quite some time. You'll find flash file systems used in personal digital assistants (PDAs), cell phones, MP3 players, digital cameras, USB flash drives (UFDs), and even laptop computers. In many cases, the file systems for commercial devices can be custom and proprietary, but they face the same challenges discussed below.

Flash-based file systems come in a variety of forms. This article explores a couple of the read-only file systems and also reviews the various read/write file systems available today and how they work. But first, let's explore the flash devices and the challenges that they introduce.

Flash memory technologies

Flash memory, which can come in several different technologies, is non-volatile memory, which means that its contents persist after its source of power is removed. For a great history of flash memory devices, see Resources.

Two of the most common types of flash devices are defined by their respective technologies: NOR and NAND. NOR-based flash is the older technology that supported high read performance at the expense of smaller capacities. NAND flash offers higher capacities with significantly faster write and erase performance. NAND also requires a much more complicated input/output (I/O) interface.

Flash parts are commonly divided into partitions, which allows multiple operations to occur simultaneously (erasing one partition while reading from another). Partitions are further divided into blocks (commonly 64KB or 128KB in size). Firmware that uses the partitions can further apply unique segmenting to the blocks—for example, 512-byte segments within a block, not including metadata.

Flash devices exhibit a common constraint that requires device management when compared to other storage devices such as RAM disks. The only Write operation permitted on a flash memory device is to change a bit from a one to a zero. If the reverse operation is needed, then the block must be erased (to reset all bits to the one state). This means that other valid data within the block must be moved for it to persist. NOR flash memory can typically be programmed a byte at a time, whereas NAND flash memory must be programmed in multi-byte bursts (typically, 512 bytes).

The process of erasing a block differs between the two memory types. Each requires a special Erase operation that covers an entire block of the flash memory. NOR technology requires a precursor step to clear all values to zero before the Erase operation can begin. An Erase is a special operation with the flash device and can be time-consuming. Erasing is an electrical operation that drains the electrons from each cell in an entire block.

NOR flash devices typically require seconds for the Erase operation, whereas a NAND device can erase in milliseconds. A key characteristic of flash devices is the number of Erase operations that can be performed. In a NOR device, each block in the flash memory can be erased up to 100,000 times. NAND flash memories can be erased up to one million times.

Flash memory challenges

In addition to and as a result of the constraints explored in the previous section, managing flash devices presents several challenges. The three most important are garbage collection, managing bad blocks, and wear leveling.

Garbage collection

Garbage collection is the process of reclaiming invalid blocks (those that contain some amount of invalid data). Reclamation involves moving the valid data to a new block, and then erasing the invalid block to make it available. This process is commonly done in the background or as needed, if the file system is low on available space.

Managing bad blocks

Over time, flash devices can develop bad blocks through use and can even ship from the manufacturer with blocks that are bad and cannot be used. You can detect the presence of bad blocks from a failed flash operation (such as an Erase) or an invalid Write operation (discovered through an invalid Error Correction Code, or ECC).

After bad blocks have been identified, they are marked within the flash itself in a bad block table. How this is done is device-dependent but can be implemented with a separate set of reserved blocks managed separately from normal data blocks. The process of handling bad blocks—whether they ship with the device or appear over time—is called bad block management. In some cases, this functionality is implemented in hardware by an internal microcontroller and is therefore transparent to the upper-level file system.

Wear leveling

Recall that flash devices are consumable parts: You can perform a finite number of Erase cycles on each block before the block becomes bad (and must therefore be tagged by bad block management). To maximize the life of the flash, wear-leveling algorithms are provided. Wear leveling comes in two varieties: dynamic wear leveling and static wear leveling.

Dynamic wear leveling addresses the problem of a limited number of Erase cycles for a given block. Rather than randomly using blocks as they are available, dynamic wear-leveling algorithms attempt to evenly distribute the use of blocks so that each gets uniform use. Static wear-leveling algorithms address an even more interesting problem. In addition to a maximum number of Erase cycles, certain flash devices suffer from a maximum number of Read cycles between Erase cycles. This means that if data sits for too long in a block and is read too many times, the data can dissipate and result in data loss. Static wear-leveling algorithms address this by periodically moving stale data to new blocks.

System architecture

So far, I've explored flash devices and their fundamental challenges. Now, look at how these pieces come together as part of a layered architecture (see Figure 1). At the top is the virtual file system (VFS), which presents a common interface to higher-level applications. The VFS is followed by the flash file system, which will be covered in the next section. Next is the Flash Translation Layer (FTL), which provides for overall management of the flash device, including allocation of blocks from the underlying flash device as well as address translation, dynamic wear leveling, and garbage collection. In some flash devices, a portion of the FTL can be implemented in hardware.

The Linux kernel uses the Memory Technology Device (MTD) interface, which is a generic interface for flash devices. The MTD can automatically detect the width of the flash device bus and the number of devices necessary for implementing the bus width.

Flash file systems

Several flash file systems are available for Linux. The next sections explain the design and advantages of each.

Journaling Flash File System

One of the earliest flash file systems for Linux is called the Journaling Flash File System. JFFS is a log-structured file system that was designed for NOR flash devices. It was unique and addressed a variety of problems with flash devices, but it created another.

JFFS viewed the flash device as a circular log of blocks. Data written to the flash is written to the tail, and blocks at the head are reclaimed. The space between the tail and head is free space; when this space becomes low, the garbage collector is executed. The garbage collector moves valid blocks to the tail of the log, skips invalid or obsolete blocks, and erases them (see Figure 2). The result is a file system that is automatically wear leveled both statically and dynamically. The fundamental problem with this architecture is that the flash device is erased too often (instead of an optimal erase strategy), which wears the device out too quickly.

When a JFFS is mounted, the structural details are read into memory, which can be slow at mount time and consume more memory than desired.

Journaling Flash File System 2

Although JFFS was very useful in its time, its wear-leveling algorithm tended to shorten the life of NOR flash devices. The result was a redesign of the underlying algorithm to remove the circular log. The JFFS2 algorithm was designed for NAND flash devices and also includes improved performance with compression.

In JFFS2, each block in the flash is treated independently. JFFS2 maintains block lists to sufficiently wear-level the device. The clean list represents blocks on the device that are full of valid nodes. The dirty list contains blocks with at least one obsoleted node. Finally, the free list represents the blocks that have been erased and are available for use.

The garbage collection algorithm can then intelligently decide what to reclaim in a reasonable way. Currently, the algorithm probabilistically selects from the clean or dirty list. The dirty list is selected 99 percent of the time to reclaim blocks (moving the valid contents to another block), and the clean list is selected 1 percent of the time (simply moving the contents to a new block). In both cases, the selected block is erased and placed on the free list (see Figure 3). This allows the garbage collector to re-use blocks that are obsoleted (or partially so) but still move data around the flash to support static wear leveling.
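To make the selection policy concrete, here is a small illustrative sketch in C (not JFFS2's actual kernel code) of the 99-to-1 choice between the dirty and clean lists:

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative only: decide which list the garbage collector takes its next
 * victim block from.  The dirty list wins ~99% of the time (reclaiming
 * obsoleted space); the clean list wins ~1% of the time so that long-lived
 * data also gets rewritten, giving static wear leveling. */
static const char *gc_pick_list(void)
{
    return (rand() % 100 == 0) ? "clean" : "dirty";
}

int main(void)
{
    for (int i = 0; i < 5; i++)
        printf("next victim comes from the %s list\n", gc_pick_list());
    return 0;
}
```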

Yet Another Flash File System

YAFFS is another flash file system developed for NAND flash. The initial version (YAFFS) supported flash devices with 512-byte pages, but the newer version (YAFFS2) supports newer devices with larger page sizes and greater Write constraints.

In most flash file systems, obsolete blocks are marked as such, but YAFFS2 additionally marks blocks with monotonically increasing sequence numbers. When the file system is scanned at mount time, the valid inodes can be quickly identified. YAFFS also maintains trees in RAM to represent the block structure of the flash device, including fast mounting through checkpointing—the process of saving the RAM tree structure to the flash device on a normal unmount so that it can be quickly read and restored to RAM at mount time (see Figure 4). Mount-time performance is a great advantage of YAFFS2 over other flash file systems.

Read-only compressed file systems

In some embedded systems, there's no need to provide a mutable file system: an immutable one will suffice. Linux supports a variety of read-only file systems; two of the most useful are cramfs and SquashFS.

Cramfs

The cramfs file system is a compressed read-only Linux file system that can exist within flash devices. The primary characteristics of cramfs are that it is both simple and space-efficient. This file system is used in small-footprint embedded designs.

While cramfs metadata is not compressed, cramfs uses zlib compression on a per-page basis to allow random page access (pages are decompressed upon access).

You can play with cramfs using the mkcramfs utility and the loopback device.

SquashFS

SquashFS is another compressed read-only Linux file system that is useful within flash devices. You'll also find SquashFS in numerous Live CD Linux distributions. In addition to supporting zlib for compression, SquashFS uses the Lempel-Ziv-Markov chain algorithm (LZMA) for improved compression and speed.

Like cramfs, you can use SquashFS on a standard Linux system with mksquashfs and the loopback device.

Going further

Like most open source software, flash file systems continue to evolve, and new ones are under development. An interesting alternative still in development is LogFS, which includes some very novel ideas. For example, LogFS maintains a tree structure on the flash device itself so that the mount times are similar to traditional file systems, such as ext2. It also uses a wandering tree (a form of B+tree) for garbage collection. What makes LogFS particularly interesting, however, is that it is very scalable and can support large flash parts.

With the growing popularity of flash file systems, you'll see a considerable amount of research being applied toward them. LogFS is one example, but other options, such as UbiFS, are also growing. Flash file systems are interesting architecturally and will continue to be a source of innovation in the future.

