Linux Capabilities Overview

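For the purpose of performing permission checks, traditional UNIX implementations distinguish two categories of processes: privileged processes (whose effective user ID is 0, referred to as superuser or root), and unprivileged processes (whose effective UID is nonzero). Privileged processes bypass all kernel permission checks, while unprivileged processes are subject to full permission checking based on the process's credentials (usually: effective UID, effective GID, and supplementary group list).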

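Starting with kernel 2.2, Linux divides the privileges traditionally associated with the superuser into distinct units, known as capabilities, which can be independently enabled and disabled. Capabilities are a per-thread attribute.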

The following list shows the capabilities implemented on Linux, and the operations or behaviors that each capability permits:

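A full implementation of capabilities requires:

1. For all privileged operations, the kernel must check whether the thread has the required capability in its effective set.
2. The kernel must provide system calls allowing a thread's capability sets to be changed and retrieved.
3. The filesystem must support attaching capabilities to an executable file, so that a process gains those capabilities when the file is executed.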

Before kernel 2.6.24, only the first two of these requirements were met; since kernel 2.6.24, all three are met.

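Each thread has three capability sets, each containing zero or more of the capabilities above:

Permitted: a limiting superset for the effective capabilities that the thread may assume. It is also a limiting superset for the capabilities that may be added to the inheritable set by a thread that does not have CAP_SETPCAP in its effective set.

Inheritable: the set of capabilities preserved across an execve(2).

Effective: the set of capabilities the kernel uses to perform permission checks for the thread.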
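Since each capability set is just a collection of capability numbers, it is convenient to picture it as a bitmask in which bit n stands for capability n. The Python sketch below assumes that representation; the few constants are taken from include/linux/capability.h, while the helper names and the "server" scenario are hypothetical.

```python
# Sketch: a capability set as an integer bitmask, bit n = capability n.
# Constants are a small subset of include/linux/capability.h.
CAP_CHOWN = 0
CAP_KILL = 5
CAP_NET_BIND_SERVICE = 10
CAP_NET_RAW = 13
CAP_SETFCAP = 31


def cap_bit(cap: int) -> int:
    """Mask with only capability `cap` set."""
    return 1 << cap


def make_set(*caps: int) -> int:
    """Build a capability set from individual capability numbers."""
    mask = 0
    for cap in caps:
        mask |= cap_bit(cap)
    return mask


def has_cap(capset: int, cap: int) -> bool:
    """True if capability `cap` is present in `capset`."""
    return bool(capset & cap_bit(cap))


def is_subset(a: int, b: int) -> bool:
    """True if every capability in `a` is also in `b`."""
    return (a & ~b) == 0


# Hypothetical thread: a server allowed to bind ports below 1024.
permitted = make_set(CAP_NET_BIND_SERVICE, CAP_NET_RAW)
effective = make_set(CAP_NET_BIND_SERVICE)
inheritable = 0

print(is_subset(effective, permitted))           # True
print(has_cap(effective, CAP_NET_BIND_SERVICE))  # True
print(has_cap(effective, CAP_NET_RAW))           # False: permitted, not effective
```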

A child created via fork(2) inherits copies of its parent's capability sets. See below for a discussion of the treatment of capabilities during execve(2).

Using capset(2), a thread may manipulate its own capability sets (see below).

Since Linux 3.2, the file /proc/sys/kernel/cap_last_cap exposes the numerical value of the highest capability supported by the running kernel; this can be used to determine the highest bit that may be set in a capability set.
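As a small sketch (Python, assuming a Linux host), the value in cap_last_cap can be turned into a mask of every capability bit the running kernel knows about. The helper names are illustrative; the reader returns None when the file is absent, as on kernels older than 3.2.

```python
from pathlib import Path
from typing import Optional

CAP_LAST_CAP = Path("/proc/sys/kernel/cap_last_cap")


def read_last_cap(path: Path = CAP_LAST_CAP) -> Optional[int]:
    """Highest capability number supported by the running kernel,
    or None if the file is missing (pre-3.2 kernel or non-Linux host)."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def full_capability_mask(last_cap: int) -> int:
    """Mask with bits 0..last_cap set: every capability the kernel knows."""
    return (1 << (last_cap + 1)) - 1


if __name__ == "__main__":
    last = read_last_cap()
    if last is None:
        print("cap_last_cap is not available here")
    else:
        print(f"highest capability: {last}, mask: {full_capability_mask(last):#x}")
```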

Since kernel 2.6.24, the kernel supports associating capability sets with an executable file using setcap(8). The file capability sets are stored in an extended attribute (see setxattr(2)) named security.capability. Writing to this extended attribute requires the CAP_SETFCAP capability. The file capability sets, in conjunction with the capability sets of the thread, determine the capabilities of a thread after an execve(2).
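The security.capability attribute holds a small binary record, struct vfs_cap_data in the kernel headers: a little-endian magic/revision word whose low bit is the file effective flag, followed by pairs of 32-bit permitted/inheritable words. The Python sketch below decodes that record via os.getxattr(); it assumes Linux, and it also accepts revision 3 (namespaced file capabilities, added in kernels newer than this page) while ignoring that revision's trailing root UID.

```python
import os
import struct
from typing import NamedTuple, Optional

# Constants from the kernel's capability headers (struct vfs_cap_data).
VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_REVISION_1 = 0x01000000   # one 32-bit word per set
VFS_CAP_REVISION_2 = 0x02000000   # two 32-bit words per set
VFS_CAP_REVISION_3 = 0x03000000   # revision 2 plus a root UID
VFS_CAP_FLAGS_EFFECTIVE = 0x000001


class FileCaps(NamedTuple):
    permitted: int
    inheritable: int
    effective: bool  # a single flag, not a set


def parse_vfs_cap_data(raw: bytes) -> FileCaps:
    """Decode the little-endian blob stored in security.capability."""
    (magic_etc,) = struct.unpack_from("<I", raw, 0)
    revision = magic_etc & VFS_CAP_REVISION_MASK
    if revision == VFS_CAP_REVISION_1:
        words = 1
    elif revision in (VFS_CAP_REVISION_2, VFS_CAP_REVISION_3):
        words = 2  # a revision-3 trailing root UID is ignored here
    else:
        raise ValueError(f"unsupported capability revision {revision:#x}")
    permitted = inheritable = 0
    for i in range(words):
        perm, inh = struct.unpack_from("<II", raw, 4 + 8 * i)
        permitted |= perm << (32 * i)
        inheritable |= inh << (32 * i)
    return FileCaps(permitted, inheritable, bool(magic_etc & VFS_CAP_FLAGS_EFFECTIVE))


def read_file_caps(path: str) -> Optional[FileCaps]:
    """File capabilities of `path`, or None if it has none (or no xattr support)."""
    try:
        raw = os.getxattr(path, "security.capability")
    except (OSError, AttributeError):  # ENODATA, ENOENT, or no os.getxattr
        return None
    return parse_vfs_cap_data(raw)
```

On a file prepared with `setcap cap_net_bind_service=+ep`, read_file_caps() would be expected to report bit 10 (CAP_NET_BIND_SERVICE) in the permitted set with the effective flag set, which is the same information getcap(8) prints.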

The three file capability sets are:

During an execve(2), the kernel calculates the new capabilities of the process using the following algorithm:

where:

A privileged file is one that has capabilities or has the set-user-ID or set-group-ID bit set.
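For reference, capabilities(7) expresses the execve(2) calculation, for kernels with ambient capabilities (Linux 4.3 and later), in terms of P (the thread's sets before the call), P' (after), F (the file's sets) and cap_bset (the bounding set). The ambient set holds capabilities a thread keeps across execve(2) of a non-privileged program. A Python sketch of that rule follows; it deliberately leaves out the root special cases and securebits, and the values in the example are hypothetical.

```python
from typing import NamedTuple


class ThreadCaps(NamedTuple):
    permitted: int
    inheritable: int
    effective: int
    ambient: int


class FileCaps(NamedTuple):
    permitted: int
    inheritable: int
    effective: bool  # single bit


def caps_after_execve(p: ThreadCaps, f: FileCaps, cap_bset: int,
                      privileged_file: bool) -> ThreadCaps:
    """P' computed from P, F and the bounding set, following capabilities(7).
    Root special cases and securebits are intentionally left out."""
    # P'(ambient) = (file is privileged) ? 0 : P(ambient)
    ambient = 0 if privileged_file else p.ambient
    # P'(permitted) = (P(inheritable) & F(inheritable)) | (F(permitted) & cap_bset) | P'(ambient)
    permitted = (p.inheritable & f.inheritable) | (f.permitted & cap_bset) | ambient
    # P'(effective) = F(effective) ? P'(permitted) : P'(ambient)
    effective = permitted if f.effective else ambient
    # P'(inheritable) = P(inheritable): unchanged
    return ThreadCaps(permitted, p.inheritable, effective, ambient)


NET_BIND = 1 << 10            # CAP_NET_BIND_SERVICE
ALL = (1 << 38) - 1           # assumes capabilities 0..37 exist
empty = ThreadCaps(0, 0, 0, 0)

# Unprivileged caller runs a file carrying "cap_net_bind_service=ep".
print(caps_after_execve(empty, FileCaps(NET_BIND, 0, True), ALL, privileged_file=True))
# ThreadCaps(permitted=1024, inheritable=0, effective=1024, ambient=0)
```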

In order to provide an all-powerful root using capability sets, during an execve(2):

The upshot of the above rules, combined with the capabilities transformations described above, is that when a process execve(2)s a set-user-ID-root program, or when a process with an effective UID of 0 execve(2)s a program, it gains all capabilities in its permitted and effective capability sets, except those masked out by the capability bounding set. This provides semantics that are the same as those provided by traditional UNIX systems.
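capabilities(7) states the root special cases as two rules: for a set-user-ID-root program, or when the real UID is 0, the file permitted and inheritable sets are treated as all ones; for a set-user-ID-root program, or when the effective UID is 0, the file effective bit is treated as set. The Python sketch below applies those two rules before the ordinary transformation (ambient capabilities and securebits are ignored, and the bounding set is a hypothetical one), which reproduces the "everything except the bounding set" outcome described above.

```python
from typing import Tuple


def exec_caps(p_inheritable: int, bset: int, all_caps: int, *, ruid: int, euid: int,
              setuid_root: bool = False, f_permitted: int = 0,
              f_inheritable: int = 0, f_effective: bool = False) -> Tuple[int, int]:
    """(permitted, effective) after execve(2), with the root rules applied first.
    Ambient capabilities and securebits are not modelled."""
    if setuid_root or ruid == 0:
        # Rule 1: file permitted and inheritable sets become "all ones".
        f_permitted = f_inheritable = all_caps
    if setuid_root or euid == 0:
        # Rule 2: the file effective bit is treated as set.
        f_effective = True
    permitted = (p_inheritable & f_inheritable) | (f_permitted & bset)
    effective = permitted if f_effective else 0
    return permitted, effective


ALL = (1 << 38) - 1                  # assumes capabilities 0..37
CAP_SYS_MODULE = 16
bset = ALL & ~(1 << CAP_SYS_MODULE)  # hypothetical bounding set

print(exec_caps(0, bset, ALL, ruid=0, euid=0) == (bset, bset))  # True
print(exec_caps(0, bset, ALL, ruid=1000, euid=1000))            # (0, 0)
```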

The capability bounding set is a security mechanism that can be used to limit the capabilities that can be gained during an execve(2). The bounding set is used in the following ways:

Note that the bounding set masks the file permitted capabilities, but not the inheritable capabilities. If a thread maintains a capability in its inheritable set that is not in its bounding set, then it can still gain that capability in its permitted set by executing a file that has the capability in its inheritable set.
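A worked example of this gap, with hypothetical values: CAP_SYS_ADMIN (capability 21) has been dropped from the bounding set but is still in the thread's inheritable set, and the executed file lists it in both its permitted and inheritable sets. Only the two permitted-set terms of the execve(2) calculation are shown.

```python
CAP_SYS_ADMIN = 21
ADMIN = 1 << CAP_SYS_ADMIN
ALL = (1 << 38) - 1                  # assumes capabilities 0..37

bset = ALL & ~ADMIN                  # dropped from the bounding set
p_inheritable = ADMIN                # ...but still inheritable in the thread
f_permitted = f_inheritable = ADMIN  # the file grants it both ways

from_file_permitted = f_permitted & bset          # masked by the bounding set
from_inheritable = p_inheritable & f_inheritable  # not masked
new_permitted = from_file_permitted | from_inheritable

print(hex(new_permitted))  # 0x200000: CAP_SYS_ADMIN is still gained
```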

Depending on the kernel version, the capability bounding set is either a system-wide attribute, or a per-process attribute.

In kernels before 2.6.25, the capability bounding set is a system-wide attribute that affects all threads on the system. The bounding set is accessible via the file /proc/sys/kernel/cap-bound. (Confusingly, this bit-mask parameter is expressed as a signed decimal number in /proc/sys/kernel/cap-bound.)

Only the init process may set capabilities in the capability bounding set; other than that, the superuser (more precisely: programs with the CAP_SYS_MODULE capability) may only clear capabilities from this set.

On a standard system the capability bounding set always masks out the CAP_SETPCAP capability. To remove this restriction (dangerous!), modify the definition of CAP_INIT_EFF_SET in include/linux/capability.h and rebuild the kernel.
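Because /proc/sys/kernel/cap-bound prints the mask as a signed decimal, it is easier to read after reinterpreting it as an unsigned bit mask. The sketch assumes a 32-bit capability word (as in the kernels that had this file), and the sample value -257 is only an illustration: it is the mask with every bit set except bit 8, CAP_SETPCAP.

```python
CAP_SETPCAP = 8


def decode_cap_bound(text: str, width: int = 32) -> int:
    """Turn the signed decimal from /proc/sys/kernel/cap-bound into an
    unsigned mask (two's complement reinterpretation, 32 bits assumed)."""
    return int(text.strip()) & ((1 << width) - 1)


mask = decode_cap_bound("-257\n")        # illustrative value
print(hex(mask))                         # 0xfffffeff
print(bool(mask & (1 << CAP_SETPCAP)))   # False: CAP_SETPCAP is masked out
```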

The system-wide capability bounding set feature was added to Linux starting with kernel version 2.2.11.

From Linux 2.6.25, the capability bounding set is a per-thread attribute. (There is no longer a systemwide capability bounding set.)

The bounding set is inherited at fork(2) from the thread's parent, and is preserved across an execve(2).

A thread may remove capabilities from its capability bounding set using the prctl(2) PR_CAPBSET_DROP operation, provided it has the CAP_SETPCAP capability. Once a capability has been dropped from the bounding set, it cannot be restored to that set. A thread can determine if a capability is in its bounding set using the prctl(2) PR_CAPBSET_READ operation.
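Python's standard library has no prctl() wrapper, so the sketch below calls the C library through ctypes, assuming Linux with glibc or musl. PR_CAPBSET_READ (23) and PR_CAPBSET_DROP (24) are the option numbers from <linux/prctl.h>; the helper names are hypothetical. Reading is harmless; drop_from_bounding_set() is defined but deliberately not called, since the drop cannot be undone.

```python
import ctypes
import os

# Option numbers from <linux/prctl.h>.
PR_CAPBSET_READ = 23
PR_CAPBSET_DROP = 24

_libc = ctypes.CDLL(None, use_errno=True)  # assumes Linux (glibc or musl)


def _prctl(option: int, arg2: int) -> int:
    zero = ctypes.c_ulong(0)
    res = _libc.prctl(option, ctypes.c_ulong(arg2), zero, zero, zero)
    if res == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return res


def in_bounding_set(cap: int) -> bool:
    """PR_CAPBSET_READ: 1 if `cap` is in the bounding set, 0 if not (EINVAL if unknown)."""
    return _prctl(PR_CAPBSET_READ, cap) == 1


def drop_from_bounding_set(cap: int) -> None:
    """PR_CAPBSET_DROP: irreversible for this thread; EPERM without CAP_SETPCAP."""
    _prctl(PR_CAPBSET_DROP, cap)


def bounding_set(last_cap: int) -> int:
    """Collect the bounding set as a bitmask by probing capabilities 0..last_cap."""
    mask = 0
    for cap in range(last_cap + 1):
        if in_bounding_set(cap):
            mask |= 1 << cap
    return mask


if __name__ == "__main__":
    print(f"bounding set (caps 0-31): {bounding_set(31):#010x}")
```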

Removing capabilities from the bounding set is supported only if file capabilities are compiled into the kernel. In kernels before Linux 2.6.33, file capabilities were an optional feature configurable via the CONFIG_SECURITY_FILE_CAPABILITIES option. Since Linux 2.6.33, the configuration option has been removed and file capabilities are always part of the kernel. When file capabilities are compiled into the kernel, the init process (the ancestor of all processes) begins with a full bounding set. If file capabilities are not compiled into the kernel, then init begins with a full bounding set minus CAP_SETPCAP, because this capability has a different meaning when there are no file capabilities.

Removing a capability from the bounding set does not remove it from the thread's inheritable set. However, it does prevent the capability from being added back into the thread's inheritable set in the future.

To preserve the traditional semantics for transitions between 0 and nonzero user IDs, the kernel makes the following changes to a thread's capability sets on changes to the thread's real, effective, saved set, and filesystem user IDs (using setuid(2), setresuid(2), or similar):

If a thread that has a 0 value for one or more of its user IDs wants to prevent its permitted capability set being cleared when it resets all of its user IDs to nonzero values, it can do so using the prctl(2) PR_SET_KEEPCAPS operation or the SECBIT_KEEP_CAPS securebits flag described below.
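capabilities(7) gives the core of these UID-transition rules as: if at least one of the real, effective and saved set-user-IDs was 0 and all of them become nonzero, the permitted and effective sets are cleared; an effective UID change from 0 to nonzero clears the effective set; a change from nonzero to 0 copies the permitted set into the effective set. The Python model below follows those three rules and shows how the "keep capabilities" flag (the keep_caps argument) changes the result. It is a sketch of documented behavior, not kernel code; the separate filesystem-UID rule and the ambient set are omitted.

```python
from typing import NamedTuple, Tuple


class Uids(NamedTuple):
    real: int
    effective: int
    saved: int


def caps_after_uid_change(old: Uids, new: Uids, permitted: int, effective: int,
                          keep_caps: bool = False) -> Tuple[int, int]:
    """(permitted, effective) after a setuid-family call, per capabilities(7).
    Filesystem-UID and ambient-set handling are omitted from this sketch."""
    if 0 in old and 0 not in new and not keep_caps:
        # Every one of real/effective/saved left 0: both sets are cleared.
        permitted = effective = 0
    if old.effective == 0 and new.effective != 0:
        effective = 0            # cleared even when keep_caps is set
    elif old.effective != 0 and new.effective == 0:
        effective = permitted    # regaining EUID 0 restores effective from permitted
    return permitted, effective


ALL = (1 << 38) - 1  # assumes capabilities 0..37
root, user = Uids(0, 0, 0), Uids(1000, 1000, 1000)
print(caps_after_uid_change(root, user, ALL, ALL))                              # (0, 0)
print(caps_after_uid_change(root, user, ALL, ALL, keep_caps=True) == (ALL, 0))  # True
```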

A thread can retrieve and change its capability sets using the capget(2) and capset(2) system calls. However, the use of cap_get_proc(3) and cap_set_proc(3), both provided in the libcap package, is preferred for this purpose. The following rules govern changes to the thread capability sets:
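capabilities(7) lists four rules for such changes: without CAP_SETPCAP in the effective set, the new inheritable set must be a subset of the old inheritable and permitted sets combined; since Linux 2.6.25, the new inheritable set must be a subset of the old inheritable set and the bounding set combined; the new permitted set must be a subset of the old permitted set; and the new effective set must be a subset of the new permitted set. The checker below is a Python sketch of those rules (the function name is hypothetical); the second example shows why a capability dropped from the bounding set cannot return to the inheritable set.

```python
from typing import List

CAP_SETPCAP = 8


def _subset(a: int, b: int) -> bool:
    return (a & ~b) == 0


def capset_violations(*, old_perm: int, old_inh: int, old_eff: int, bset: int,
                      new_perm: int, new_inh: int, new_eff: int) -> List[int]:
    """Numbers of the capabilities(7) rules a capset(2) request would break."""
    broken = []
    # 1. Without CAP_SETPCAP, new inheritable must lie within old inheritable | old permitted.
    if not (old_eff & (1 << CAP_SETPCAP)) and not _subset(new_inh, old_inh | old_perm):
        broken.append(1)
    # 2. New inheritable must lie within old inheritable | bounding set.
    if not _subset(new_inh, old_inh | bset):
        broken.append(2)
    # 3. Permitted capabilities can only be dropped, never added.
    if not _subset(new_perm, old_perm):
        broken.append(3)
    # 4. Effective must be a subset of the new permitted set.
    if not _subset(new_eff, new_perm):
        broken.append(4)
    return broken


NET, ADMIN, ALL = 1 << 10, 1 << 21, (1 << 38) - 1  # assumes capabilities 0..37
# Trying to add CAP_SYS_ADMIN to the permitted set breaks rule 3.
print(capset_violations(old_perm=NET, old_inh=0, old_eff=NET, bset=ALL,
                        new_perm=NET | ADMIN, new_inh=0, new_eff=0))  # [3]
# CAP_SYS_ADMIN is gone from the bounding set: it cannot be made inheritable.
print(capset_violations(old_perm=ALL, old_inh=0, old_eff=ALL, bset=ALL & ~ADMIN,
                        new_perm=ALL, new_inh=ADMIN, new_eff=ALL))    # [2]
```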

Starting with kernel 2.6.26, and with a kernel in which file capabilities are enabled, Linux implements a set of per-thread securebits flags that can be used to disable special handling of capabilities for UID 0 (root). These flags are as follows:

Each of the above "base" flags has a companion "locked" flag. Setting any of the "locked" flags is irreversible, and has the effect of preventing further changes to the corresponding "base" flag. The locked flags are: SECBIT_KEEP_CAPS_LOCKED, SECBIT_NO_SETUID_FIXUP_LOCKED, SECBIT_NOROOT_LOCKED, and SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED.

The securebits flags can be modified and retrieved using the prctl(2) PR_SET_SECUREBITS and PR_GET_SECUREBITS operations. The CAP_SETPCAP capability is required to modify the flags.

The securebits flags are inherited by child processes. During an execve(2), all of the flags are preserved, except SECBIT_KEEP_CAPS which is always cleared.
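To make the value returned by PR_GET_SECUREBITS readable, the Python sketch below maps bit positions to names using the layout of <linux/securebits.h> (SECBIT_NOROOT at bit 0 up to SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED at bit 7), and models the execve(2) rule that only SECBIT_KEEP_CAPS is cleared. PR_GET_SECUREBITS is 27 in <linux/prctl.h>; the live read uses ctypes and runs only on Linux.

```python
import ctypes
import sys
from typing import List

# Bit positions from <linux/securebits.h>.
SECUREBIT_NAMES = [
    "SECBIT_NOROOT",                       # bit 0
    "SECBIT_NOROOT_LOCKED",                # bit 1
    "SECBIT_NO_SETUID_FIXUP",              # bit 2
    "SECBIT_NO_SETUID_FIXUP_LOCKED",       # bit 3
    "SECBIT_KEEP_CAPS",                    # bit 4
    "SECBIT_KEEP_CAPS_LOCKED",             # bit 5
    "SECBIT_NO_CAP_AMBIENT_RAISE",         # bit 6
    "SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED",  # bit 7
]
SECBIT_KEEP_CAPS = 1 << 4
PR_GET_SECUREBITS = 27  # <linux/prctl.h>


def decode_securebits(value: int) -> List[str]:
    """Names of the securebits flags set in `value`."""
    return [name for bit, name in enumerate(SECUREBIT_NAMES) if value & (1 << bit)]


def securebits_after_execve(value: int) -> int:
    """All flags survive execve(2) except SECBIT_KEEP_CAPS."""
    return value & ~SECBIT_KEEP_CAPS


if __name__ == "__main__" and sys.platform.startswith("linux"):
    zero = ctypes.c_ulong(0)
    bits = ctypes.CDLL(None).prctl(PR_GET_SECUREBITS, zero, zero, zero, zero)
    print(bits, decode_securebits(bits) if bits >= 0 else "prctl failed")
```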

An application can use the following call to lock itself, and all of its descendants, into an environment where the only way of gaining capabilities is by executing a program with associated file capabilities:
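In C, capabilities(7) writes this as a single prctl(PR_SET_SECUREBITS, ...) call passing SECBIT_KEEP_CAPS_LOCKED, SECBIT_NO_SETUID_FIXUP, SECBIT_NO_SETUID_FIXUP_LOCKED, SECBIT_NOROOT and SECBIT_NOROOT_LOCKED (SECBIT_KEEP_CAPS itself left off). The Python/ctypes sketch below mirrors that call, assuming Linux; PR_SET_SECUREBITS is 28 in <linux/prctl.h>. Because the change is irreversible and needs CAP_SETPCAP, lock_securebits() is defined but not invoked.

```python
import ctypes
import os

PR_SET_SECUREBITS = 28  # <linux/prctl.h>
SECBIT_NOROOT = 1 << 0
SECBIT_NOROOT_LOCKED = 1 << 1
SECBIT_NO_SETUID_FIXUP = 1 << 2
SECBIT_NO_SETUID_FIXUP_LOCKED = 1 << 3
SECBIT_KEEP_CAPS_LOCKED = 1 << 5  # SECBIT_KEEP_CAPS (bit 4) stays off

LOCKDOWN = (SECBIT_KEEP_CAPS_LOCKED
            | SECBIT_NO_SETUID_FIXUP
            | SECBIT_NO_SETUID_FIXUP_LOCKED
            | SECBIT_NOROOT
            | SECBIT_NOROOT_LOCKED)


def lock_securebits() -> None:
    """Irreversibly apply LOCKDOWN to this thread (inherited by children).
    Requires CAP_SETPCAP; raises OSError (typically EPERM) otherwise."""
    libc = ctypes.CDLL(None, use_errno=True)
    zero = ctypes.c_ulong(0)
    if libc.prctl(PR_SET_SECUREBITS, ctypes.c_ulong(LOCKDOWN), zero, zero, zero) == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


if __name__ == "__main__":
    print(f"securebits to set: {LOCKDOWN:#04x}")  # 0x2f
```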

For a discussion of the interaction of capabilities and user namespaces, see user_namespaces(7).

No standards govern capabilities, but the Linux capability implementation is based on the withdrawn POSIX.1e draft standard; see ⟨ http://wt.tuxomania.net/publications/posix.1e/ ⟩.

From kernel 2.5.27 to kernel 2.6.26, capabilities were an optional kernel component that could be enabled or disabled via the CONFIG_SECURITY_CAPABILITIES kernel configuration option.

The /proc/PID/task/TID/status file can be used to view the capability sets of a thread. The /proc/PID/status file shows the capability sets of a process's main thread. Before Linux 3.8, nonexistent capabilities were shown as being enabled (1) in these sets. Since Linux 3.8, all nonexistent capabilities (above CAP_LAST_CAP) are shown as disabled (0).
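The capability lines in these status files (CapInh, CapPrm, CapEff, and on newer kernels CapBnd and CapAmb) are hexadecimal masks, so they can be turned back into integers with a few lines of Python. This is a sketch assuming Linux; the sample text in the check below is illustrative, not captured from a real system, and fields a kernel does not provide are simply absent from the result.

```python
from pathlib import Path
from typing import Dict, Optional

CAP_FIELDS = ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb")


def parse_cap_fields(status_text: str) -> Dict[str, int]:
    """Pull the hexadecimal capability masks out of a /proc/.../status file."""
    caps: Dict[str, int] = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in CAP_FIELDS:
            caps[key] = int(value.strip(), 16)
    return caps


def read_thread_caps(pid: str = "self", tid: Optional[int] = None) -> Dict[str, int]:
    """Main thread via /proc/PID/status, or one thread via /proc/PID/task/TID/status."""
    if tid is None:
        path = Path(f"/proc/{pid}/status")
    else:
        path = Path(f"/proc/{pid}/task/{tid}/status")
    return parse_cap_fields(path.read_text())


if __name__ == "__main__" and Path("/proc/self/status").exists():
    for name, mask in read_thread_caps().items():
        print(f"{name}: {mask:#018x}")
```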

The libcap package provides a suite of routines for setting and getting capabilities that is more comfortable and less likely to change than the interface provided by capset(2) and capget(2). This package also provides the setcap(8) and getcap(8) programs. It can be found at ⟨ http://www.kernel.org/pub/linux/libs/security/linux-privs ⟩.

Before kernel 2.6.24, and from kernel 2.6.24 to kernel 2.6.32 if file capabilities are not enabled, a thread with the CAP_SETPCAP capability can manipulate the capabilities of threads other than itself. However, this is only theoretically possible, since no thread ever has CAP_SETPCAP in either of these cases:

See also: capsh(1), setpriv(1), prctl(2), setfsuid(2), cap_clear(3), cap_copy_ext(3), cap_from_text(3), cap_get_file(3), cap_get_proc(3), cap_init(3), capgetp(3), capsetp(3), libcap(3), credentials(7), user_namespaces(7), pthreads(7), getcap(8), setcap(8)

include/linux/capability.h in the Linux kernel source tree

This page is part of release 4.04 of the Linux man-pages project. A description of the project, information about reporting bugs, and the latest version of this page, can be found at http://www.kernel.org/doc/man-pages/ .

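An operating system may contain a great deal of information about the current state of the system. As the system changes, these data structures must change to reflect the new state. For example, when a user logs into the system, a new process is created. The kernel must create a data structure representing the new process and link it with the data structures representing the other processes in the system. Most data structures live in physical memory and are accessible only by the kernel or its subsystems. Data structures contain data and pointers: the addresses of other data structures or the addresses of routines. Taken together, this mixture can make the Linux kernel's data structures look very confusing. Although a data structure may be used by several kernel subsystems at once, each one has a specific purpose. The key to understanding the Linux kernel is understanding its data structures and the functions that manipulate them. This book therefore bases its description of the Linux kernel on its data structures, discussing the algorithms of each kernel subsystem, how each subsystem accomplishes its tasks, and how it uses the kernel's data structures.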

2.3.1 Linked Lists

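Linux uses a number of software engineering techniques to link its data structures together. On many occasions it uses linked, or chained, data structures. Each data structure describes a single instance of something, such as a process or a network device, and the kernel must be able to find all of these instances. In a linked list, a root pointer contains the address of the first data structure, and each data structure in the list contains a pointer to the next one. The last element's next pointer must be 0 or NULL to mark the end of the list. In a doubly linked list, each element contains pointers to both the previous and the next element. Doubly linked lists make it easier to add or remove elements in the middle of the list, at the cost of more memory accesses. This is a typical operating-system trade-off: memory accesses versus CPU cycles.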
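The kernel itself implements this in C (for example with struct list_head); the Python sketch below is only an illustration of the idea. It keeps a root ("head") pointer, uses None where C would use NULL, and shows that removing an element from the middle touches only its neighbours' pointers, with no walk over the list.

```python
from typing import Iterator, Optional


class Node:
    def __init__(self, value: object) -> None:
        self.value = value
        self.prev: Optional["Node"] = None
        self.next: Optional["Node"] = None


class DoublyLinkedList:
    def __init__(self) -> None:
        self.head: Optional[Node] = None  # the "root pointer"
        self.tail: Optional[Node] = None

    def append(self, value: object) -> Node:
        node = Node(value)
        node.prev = self.tail
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node
        return node

    def remove(self, node: Node) -> None:
        """Unlink `node` in O(1): only its neighbours' pointers change."""
        if node.prev is None:
            self.head = node.next
        else:
            node.prev.next = node.next
        if node.next is None:
            self.tail = node.prev
        else:
            node.next.prev = node.prev
        node.prev = node.next = None

    def __iter__(self) -> Iterator[object]:
        node = self.head
        while node is not None:  # None plays the role of the NULL terminator
            yield node.value
            node = node.next


procs = DoublyLinkedList()
init = procs.append("init")
sshd = procs.append("sshd")
bash = procs.append("bash")
procs.remove(sshd)   # delete from the middle without walking the list
print(list(procs))   # ['init', 'bash']
```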

2.3.2 Hash Tables

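Linked lists are a convenient way to tie data structures together, but navigating them can be inefficient: to find a particular element, you may have to look through the whole list. Linux uses another technique, hashing, to get around this. A hash table is an array (or vector) of pointers, where an array is simply a set of items stored one after another in memory. Each pointer in the hash table heads a separate linked list. If you had data structures describing the people in a village, you could use a person's age as an index: to find a particular person's data, you would use their age as an index into the population hash table and follow the pointer to the data structure holding that person's details. Because many people in the village are likely to share the same age, each hash table pointer becomes a pointer to a chain of data structures describing people of that age. Searching one of these short chains is still much faster than searching all of the data structures. Since hash tables speed up access to data structures, Linux often uses them to implement caches. A cache is a subset of the available information that is accessed frequently; data structures that the kernel uses often are placed in a cache and kept there. The drawback of caches is that they are more complex to use and maintain than simple linked lists or hash tables. If a data structure can be found in the cache (a cache hit), all is well. If it cannot, it must be found by searching and then added to the cache. If the cache is already full, Linux must decide which entry to discard, with the risk that the discarded entry is exactly the one Linux needs next.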
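The village example translates directly into a hash table whose buckets are short chains, and the cache discussion into a fixed-size cache with an eviction policy. The Python sketch below is illustrative only: the text does not say which entry Linux discards, so least-recently-used eviction is used here purely as one common policy, and the bucket count of 128 is an arbitrary assumption.

```python
from collections import OrderedDict
from typing import Callable, Hashable, List, Optional, Tuple

Person = Tuple[str, int]  # (name, age)


class AgeHashTable:
    """Array of buckets indexed by age; each bucket is a short chain of people."""

    def __init__(self, nbuckets: int = 128) -> None:
        self.buckets: List[List[Person]] = [[] for _ in range(nbuckets)]

    def _chain(self, age: int) -> List[Person]:
        return self.buckets[age % len(self.buckets)]

    def add(self, name: str, age: int) -> None:
        self._chain(age).append((name, age))

    def find(self, name: str, age: int) -> Optional[Person]:
        for person in self._chain(age):  # only this chain is searched
            if person[0] == name:
                return person
        return None


class LRUCache:
    """Fixed-size cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[Hashable, object]" = OrderedDict()
        self.hits = self.misses = 0

    def get(self, key: Hashable, load: Callable[[Hashable], object]) -> object:
        if key in self.entries:               # cache hit
            self.hits += 1
            self.entries.move_to_end(key)
            return self.entries[key]
        self.misses += 1                      # cache miss: do the slow lookup
        value = load(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # discard the least recently used
        return value


village = AgeHashTable()
village.add("Ann", 34)
village.add("Bob", 34)
village.add("Cai", 7)
print(village.find("Bob", 34))  # ('Bob', 34)
```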

2.3.3 Abstract Interfaces

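The Linux kernel often abstracts its interfaces. An interface is a collection of routines and data structures that operate in a well-defined way. For example, all network device drivers must provide certain routines that operate on particular data structures. Generic layers of code can then use the services of lower layers of specific code: the networking layer, for instance, is generic and is supported by device-specific code that conforms to a standard interface. Often these lower layers register themselves with the higher layer at system startup. This registration usually involves adding a data structure to a linked list. For example, each filesystem built into the kernel registers itself with the kernel at boot time, and the file /proc/filesystems shows which filesystems have been registered. The registration data structure often includes pointers to functions: when a filesystem registers with the Linux kernel, it must pass in the addresses of the routines used to mount it.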
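The registration pattern can be sketched in Python, with a callable standing in for a C function pointer and a module-level list standing in for the kernel's list of filesystem types. The name register_filesystem is borrowed from the kernel for readability, but everything here is a hypothetical analogy rather than kernel API; the "nodev" prefix imitates the way /proc/filesystems marks filesystems that need no block device.

```python
from typing import Callable, List


class FileSystemType:
    """Registration record: a name plus a pointer to the fs-specific mount routine."""

    def __init__(self, name: str, mount: Callable[[str], str],
                 requires_dev: bool = True) -> None:
        self.name = name
        self.mount = mount
        self.requires_dev = requires_dev


_registered: List[FileSystemType] = []  # the kernel-side list (simplified)


def register_filesystem(fs: FileSystemType) -> None:
    """Called once per filesystem at start-up; adds its record to the list."""
    if any(other.name == fs.name for other in _registered):
        raise ValueError(f"filesystem {fs.name!r} already registered")
    _registered.append(fs)


def proc_filesystems() -> str:
    """Render the list in the style of /proc/filesystems."""
    return "\n".join(("" if fs.requires_dev else "nodev") + "\t" + fs.name
                     for fs in _registered)


def mount(fstype: str, source: str) -> str:
    """Generic layer: look up the record and call through its function pointer."""
    for fs in _registered:
        if fs.name == fstype:
            return fs.mount(source)
    raise LookupError(f"unknown filesystem type {fstype!r}")
```

The generic mount() never needs to know how a particular filesystem works; it only relies on each registered record supplying a callable with the agreed signature.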

Original article: https://outofmemory.cn/yw/7317575.html
