Linux Capabilities Overview

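To perform permission checks, traditional UNIX implementations distinguish two categories of processes: privileged processes (whose effective user ID is 0, referred to as superuser or root), and unprivileged processes (whose effective UID is nonzero). Privileged processes bypass all kernel permission checks, while unprivileged processes are subject to full permission checking based on the process's credentials (usually: effective UID, effective GID, and supplementary group list).

Starting with kernel 2.2, Linux divides the privileges traditionally associated with superuser into distinct units, known as capabilities, which can be independently enabled and disabled. Capabilities are a per-thread attribute.

The following list shows the capabilities implemented on Linux, and the operations or behaviors that each capability permits:

A full implementation of capabilities requires that:

Before kernel 2.6.24, only the first two of these requirements are met; since kernel 2.6.24, all three requirements are met.

Each thread has three capability sets containing zero or more of the above capabilities: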

A child created via fork(2) inherits copies of its parent's capability sets. See below for a discussion of the treatment of capabilities during execve(2).

Using capset(2), a thread may manipulate its own capability sets (see below).

Since Linux 3.2, the file /proc/sys/kernel/cap_last_cap exposes the numerical value of the highest capability supported by the running kernel; this can be used to determine the highest bit that may be set in a capability set.
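As a minimal sketch of how that file might be used, the following Python reads cap_last_cap and builds a mask with every supported capability bit set. The function names are illustrative and not part of any library, and the read is guarded because the file exists only on Linux 3.2 or later.

```python
from pathlib import Path

CAP_LAST_CAP_PATH = Path("/proc/sys/kernel/cap_last_cap")


def full_capability_mask(last_cap: int) -> int:
    """Return a bit mask with bits 0..last_cap (inclusive) set."""
    if last_cap < 0:
        raise ValueError("last_cap must be non-negative")
    return (1 << (last_cap + 1)) - 1


def read_last_cap(path: Path = CAP_LAST_CAP_PATH) -> int:
    """Parse the decimal capability number the kernel exposes in cap_last_cap."""
    return int(path.read_text().strip())


if __name__ == "__main__":
    try:
        last = read_last_cap()
    except (OSError, ValueError) as exc:
        # Not Linux, or a kernel older than 3.2.
        print(f"cap_last_cap not available: {exc}")
    else:
        print(f"highest capability: {last}")
        print(f"full mask: {full_capability_mask(last):016x}")
```

Any bit above the value read from the file does not correspond to a capability known to the running kernel.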

Since kernel 2.6.24, the kernel supports associating capability sets with an executable file using setcap(8). The file capability sets are stored in an extended attribute (see setxattr(2)) named security.capability. Writing to this extended attribute requires the CAP_SETFCAP capability. The file capability sets, in conjunction with the capability sets of the thread, determine the capabilities of a thread after an execve(2).

The three file capability sets are:

During an execve(2), the kernel calculates the new capabilities of the process using the following algorithm:

where:

A privileged file is one that has capabilities or has the set-user-ID or set-group-ID bit set.
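The transformation formula itself did not survive in this copy of the text. The sketch below models the rules as they are documented in capabilities(7) for kernels that support ambient capabilities; the class and function names are hypothetical, and capability sets are represented as plain integer bit masks. It is an illustration of the set arithmetic, not kernel code.

```python
from dataclasses import dataclass

CAP_NET_RAW = 13  # capability number from <linux/capability.h>


@dataclass(frozen=True)
class ThreadCaps:
    permitted: int
    inheritable: int
    effective: int
    ambient: int
    bounding: int


@dataclass(frozen=True)
class FileCaps:
    permitted: int
    inheritable: int
    effective: bool   # the file effective "set" is a single bit
    privileged: bool  # has file capabilities, or set-user-ID/set-group-ID bit


def caps_after_execve(p: ThreadCaps, f: FileCaps) -> ThreadCaps:
    """Apply the documented execve(2) capability transformation."""
    # Ambient capabilities are dropped when a privileged file is executed.
    ambient = 0 if f.privileged else p.ambient
    # The bounding set masks the file permitted set only.
    permitted = (p.inheritable & f.inheritable) | (f.permitted & p.bounding) | ambient
    effective = permitted if f.effective else ambient
    # The inheritable and bounding sets are carried over unchanged.
    return ThreadCaps(permitted, p.inheritable, effective, ambient, p.bounding)


if __name__ == "__main__":
    shell = ThreadCaps(0, 0, 0, 0, bounding=(1 << 38) - 1)
    ping = FileCaps(permitted=1 << CAP_NET_RAW, inheritable=0,
                    effective=True, privileged=True)
    print(caps_after_execve(shell, ping))
```

In this example an unprivileged thread executing a file that carries CAP_NET_RAW in its permitted set (with the effective bit on) ends up with exactly that one capability, unless the bounding set has masked it out.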

In order to provide an all-powerful root using capability sets, during an execve(2):

The upshot of the above rules, combined with the capabilities transformations described above, is that when a process execve(2)s a set-user-ID-root program, or when a process with an effective UID of 0 execve(2)s a program, it gains all capabilities in its permitted and effective capability sets, except those masked out by the capability bounding set. This provides semantics that are the same as those provided by traditional UNIX systems.

The capability bounding set is a security mechanism that can be used to limit the capabilities that can be gained during an execve(2). The bounding set is used in the following ways:

Note that the bounding set masks the file permitted capabilities, but not the inheritable capabilities. If a thread maintains a capability in its inheritable set that is not in its bounding set, then it can still gain that capability in its permitted set by executing a file that has the capability in its inheritable set.

Depending on the kernel version, the capability bounding set is either a system-wide attribute, or a per-process attribute.

In kernels before 2.6.25, the capability bounding set is a system-wide attribute that affects all threads on the system. The bounding set is accessible via the file /proc/sys/kernel/cap-bound. (Confusingly, this bit mask parameter is expressed as a signed decimal number in /proc/sys/kernel/cap-bound.)

Only the init process may set capabilities in the capability bounding set; other than that, the superuser (more precisely: programs with the CAP_SYS_MODULE capability) may only clear capabilities from this set.

On a standard system the capability bounding set always masks out the CAP_SETPCAP capability. To remove this restriction (dangerous!), modify the definition of CAP_INIT_EFF_SET in include/linux/capability.h and rebuild the kernel.

The system-wide capability bounding set feature was added to Linux starting with kernel version 2.2.11.

From Linux 2.6.25, the capability bounding set is a per-thread attribute. (There is no longer a systemwide capability bounding set.)

The bounding set is inherited at fork(2) from the thread's parent, and is preserved across an execve(2).

A thread may remove capabilities from its capability bounding set using the prctl(2) PR_CAPBSET_DROP operation, provided it has the CAP_SETPCAP capability. Once a capability has been dropped from the bounding set, it cannot be restored to that set. A thread can determine if a capability is in its bounding set using the prctl(2) PR_CAPBSET_READ operation.
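A hedged sketch of the read side of this interface: the Python below calls prctl(2) through ctypes to ask whether a capability is still in the calling thread's bounding set. The constant values are copied from the kernel headers (PR_CAPBSET_READ is 23), the helper name is hypothetical, and the code assumes a Linux system whose C library exports prctl.

```python
import ctypes
import os
import sys

PR_CAPBSET_READ = 23  # from <linux/prctl.h>
CAP_CHOWN = 0         # from <linux/capability.h>
CAP_NET_RAW = 13


def in_bounding_set(cap: int) -> bool:
    """Return True if capability number `cap` is in the caller's bounding set."""
    libc = ctypes.CDLL(None, use_errno=True)
    # prctl(2) is variadic; pass the unused arguments explicitly as unsigned long.
    res = libc.prctl(PR_CAPBSET_READ, ctypes.c_ulong(cap),
                     ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0))
    if res < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return res == 1


if __name__ == "__main__" and sys.platform.startswith("linux"):
    for name, number in (("CAP_CHOWN", CAP_CHOWN), ("CAP_NET_RAW", CAP_NET_RAW)):
        print(f"{name} in bounding set: {in_bounding_set(number)}")
```

Dropping a capability with PR_CAPBSET_DROP is deliberately left out of the sketch, since it requires CAP_SETPCAP and cannot be undone for the calling thread.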

Removing capabilities from the bounding set is supported only if file capabilities are compiled into the kernel. In kernels before Linux 2.6.33, file capabilities were an optional feature configurable via the CONFIG_SECURITY_FILE_CAPABILITIES option. Since Linux 2.6.33, the configuration option has been removed and file capabilities are always part of the kernel. When file capabilities are compiled into the kernel, the init process (the ancestor of all processes) begins with a full bounding set. If file capabilities are not compiled into the kernel, then init begins with a full bounding set minus CAP_SETPCAP, because this capability has a different meaning when there are no file capabilities.

Removing a capability from the bounding set does not remove it from the thread's inheritable set. However it does prevent the capability from being added back into the thread's inheritable set in the future.

To preserve the traditional semantics for transitions between 0 and nonzero user IDs, the kernel makes the following changes to a thread's capability sets on changes to the thread's real, effective, saved set, and filesystem user IDs (using setuid(2), setresuid(2), or similar):

If a thread that has a 0 value for one or more of its user IDs wants to prevent its permitted capability set being cleared when it resets all of its user IDs to nonzero values, it can do so using the prctl(2) PR_SET_KEEPCAPS operation or the SECBIT_KEEP_CAPS securebits flag described below.

A thread can retrieve and change its capability sets using the capget(2) and capset(2) system calls. However, the use of cap_get_proc(3) and cap_set_proc(3), both provided in the libcap package, is preferred for this purpose. The following rules govern changes to the thread capability sets:

Starting with kernel 2.6.26, and with a kernel in which file capabilities are enabled, Linux implements a set of per-thread securebits flags that can be used to disable special handling of capabilities for UID 0 (root). These flags are as follows:

Each of the above "base" flags has a companion "locked" flag. Setting any of the "locked" flags is irreversible, and has the effect of preventing further changes to the corresponding "base" flag. The locked flags are: SECBIT_KEEP_CAPS_LOCKED, SECBIT_NO_SETUID_FIXUP_LOCKED, SECBIT_NOROOT_LOCKED, and SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED.

The securebits flags can be modified and retrieved using the prctl(2) PR_SET_SECUREBITS and PR_GET_SECUREBITS operations. The CAP_SETPCAP capability is required to modify the flags.

The securebits flags are inherited by child processes. During an execve(2), all of the flags are preserved, except SECBIT_KEEP_CAPS which is always cleared.

An application can use the following call to lock itself, and all of its descendants, into an environment where the only way of gaining capabilities is by executing a program with associated file capabilities:
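The call itself is missing from this copy. In capabilities(7) it is a prctl(PR_SET_SECUREBITS, ...) call that sets SECBIT_NOROOT and SECBIT_NO_SETUID_FIXUP together with their locks, and locks SECBIT_KEEP_CAPS in the off state. The sketch below computes that flag mask in Python; the bit values follow the kernel's securebits.h header, and the lock_down() helper is a hypothetical wrapper that is defined but not called, because the operation is irreversible and requires CAP_SETPCAP.

```python
import ctypes
import os

PR_SET_SECUREBITS = 28  # from <linux/prctl.h>

# Bit values from <linux/securebits.h>: each base flag is followed by its lock.
SECBIT_NOROOT = 1 << 0
SECBIT_NOROOT_LOCKED = 1 << 1
SECBIT_NO_SETUID_FIXUP = 1 << 2
SECBIT_NO_SETUID_FIXUP_LOCKED = 1 << 3
SECBIT_KEEP_CAPS = 1 << 4
SECBIT_KEEP_CAPS_LOCKED = 1 << 5

# SECBIT_KEEP_CAPS itself is left off; only its lock is set.
LOCKDOWN_FLAGS = (
    SECBIT_KEEP_CAPS_LOCKED
    | SECBIT_NO_SETUID_FIXUP
    | SECBIT_NO_SETUID_FIXUP_LOCKED
    | SECBIT_NOROOT
    | SECBIT_NOROOT_LOCKED
)


def lock_down() -> None:
    """Irreversibly apply LOCKDOWN_FLAGS to the calling thread (needs CAP_SETPCAP)."""
    libc = ctypes.CDLL(None, use_errno=True)
    res = libc.prctl(PR_SET_SECUREBITS, ctypes.c_ulong(LOCKDOWN_FLAGS),
                     ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0))
    if res != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


if __name__ == "__main__":
    print(f"securebits lockdown mask: {LOCKDOWN_FLAGS:#04x}")
```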

For a discussion of the interaction of capabilities and user namespaces, see user_namespaces(7).

No standards govern capabilities, but the Linux capability implementation is based on the withdrawn POSIX.1e draft standard; see ⟨ http://wt.tuxomania.net/publications/posix.1e/ ⟩.

From kernel 2.5.27 to kernel 2.6.26, capabilities were an optional kernel component, and could be enabled/disabled via the CONFIG_SECURITY_CAPABILITIES kernel configuration option.

The /proc/PID/task/TID/status file can be used to view the capability sets of a thread. The /proc/PID/status file shows the capability sets of a process's main thread. Before Linux 3.8, nonexistent capabilities were shown as being enabled (1) in these sets. Since Linux 3.8, all nonexistent capabilities (above CAP_LAST_CAP) are shown as disabled (0).
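For illustration, the capability lines of a status file can be pulled out with a few lines of Python. This is a sketch: the parser name is hypothetical, and the field names (CapInh, CapPrm, CapEff, CapBnd, CapAmb) are those used by the kernel in /proc/PID/status, each followed by a hexadecimal bit mask.

```python
from pathlib import Path

CAP_FIELDS = ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb")


def parse_cap_sets(status_text: str) -> dict:
    """Map each capability field found in a status file to its integer bit mask."""
    sets = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in CAP_FIELDS:
            sets[key] = int(value.strip(), 16)
    return sets


if __name__ == "__main__":
    try:
        text = Path("/proc/self/status").read_text()
    except OSError as exc:
        print(f"status file not available: {exc}")
    else:
        for field, mask in parse_cap_sets(text).items():
            print(f"{field}: {mask:016x}")
```

Replacing /proc/self/status with /proc/PID/task/TID/status inspects a specific thread instead of the caller's main thread.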

The libcap package provides a suite of routines for setting and getting capabilities that is more comfortable and less likely to change than the interface provided by capset(2) and capget(2). This package also provides the setcap(8) and getcap(8) programs. It can be found at ⟨ http://www.kernel.org/pub/linux/libs/security/linux-privs ⟩.

Before kernel 2.6.24, and from kernel 2.6.24 to kernel 2.6.32 if file capabilities are not enabled, a thread with the CAP_SETPCAP capability can manipulate the capabilities of threads other than itself. However, this is only theoretically possible, since no thread ever has CAP_SETPCAP in either of these cases:

capsh(1), setpriv(1), prctl(2), setfsuid(2), cap_clear(3), cap_copy_ext(3), cap_from_text(3), cap_get_file(3), cap_get_proc(3), cap_init(3), capgetp(3), capsetp(3), libcap(3), credentials(7), user_namespaces(7), pthreads(7), getcap(8), setcap(8)

include/linux/capability.h in the Linux kernel source tree

This page is part of release 4.04 of the Linux man-pages project. A description of the project, information about reporting bugs, and the latest version of this page, can be found at http://www.kernel.org/doc/man-pages/ .

description: "Using CAP to grant privileges to containers"

date: 2021.11.07 10:34

categories:

- K8s

tags: [Linux, K8s]

keywords: Linux, CAP, capabilities

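Original article: https://www.jianshu.com/p/a7f6c4f420fa

Pronunciation

Translated as "abilities" or "functions" and usually abbreviated CAP; below we refer to Capabilities as CAP.

Starting with kernel 2.2, Linux divides the privileges traditionally associated with the superuser root into distinct units, called CAP.

CAP exists as an attribute of a thread (Linux does not really distinguish between processes and threads), and each unit can be enabled and disabled independently.

As a result, the permission check becomes:

When a privileged operation is performed and the effective identity of the process is not root, the kernel checks whether the process has the CAP corresponding to that operation, and uses the result to decide whether the operation is allowed.

For example, sending a signal (kill()) to a process owned by another user requires CAP_KILL; setting the system time requires CAP_SYS_TIME.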

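Before CAP existed, system processes fell into two kinds:

Privileged processes can do everything, including administrative-level kernel calls, while unprivileged processes are restricted to the subset of calls available to a standard user.

Some executables need to be run by standard users but also need to make privileged kernel calls; they need the suid bit set, which effectively grants them privileged access. (The classic example is ping, which is granted full privileged access in order to make ICMP calls.)

These executables are prime targets for attackers: if a vulnerability in one of them can be exploited, the attacker can escalate their privilege level on the system.

So the kernel developers came up with a more nuanced solution: CAP.

The intent is simple: divide all possible privileged kernel calls into groups of related functionality, and grant a process only the subset it needs.

The kernel calls were therefore divided into a few dozen categories, and this has largely been successful.

Returning to the ping example, with CAP it only needs to be granted the single capability CAP_NET_RAW to do its job, which greatly reduces the security risk.

Note: on older operating systems, ping is made usable by ordinary users by setting the SUID bit on it.

This is a significant security risk; on the author's operating system (CentOS 7), the ping command is already implemented using CAP.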

Set capabilities for a Container

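With Linux capabilities, you can grant certain privileges to a process without granting all the privileges of the root user.

To add or remove Linux capabilities for a container, include the capabilities field in the securityContext section of the container manifest.

The output shows the process IDs (PIDs) of the container:

The output shows the capabilities bitmap for the process:

Decoded:

Next, run a container that is the same as the preceding container, except that it has additional capabilities set.

Capabilities bitmap for the process:

Decoded capabilities bitmap for the process: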

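For definitions of the capability constants, see capability.h.

Note: Linux capability constants have the form CAP_XXX.

However, when you list capabilities in a container manifest, you must omit the CAP_ portion of the constant.

For example, to add CAP_SYS_TIME, include SYS_TIME in your list of capabilities.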
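The manifest from the original article is not preserved here, so the following is only a sketch of what such a manifest could look like, built as a Python dictionary and printed as JSON (which kubectl accepts alongside YAML). The pod name, image, and helper functions are hypothetical; the securityContext.capabilities.add field path is the one described above.

```python
import json


def manifest_cap_name(constant: str) -> str:
    """Strip the CAP_ prefix, as required when listing capabilities in a manifest."""
    prefix = "CAP_"
    return constant[len(prefix):] if constant.startswith(prefix) else constant


def pod_with_capabilities(name: str, image: str, add: list) -> dict:
    """Build a single-container Pod manifest that adds the given capabilities."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "securityContext": {
                        "capabilities": {
                            "add": [manifest_cap_name(c) for c in add],
                        }
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    # "cap-demo" and "busybox" are placeholder values for illustration.
    pod = pod_with_capabilities("cap-demo", "busybox",
                                ["CAP_NET_ADMIN", "CAP_SYS_TIME"])
    print(json.dumps(pod, indent=2))
```

Capabilities can be removed in the same way by adding a "drop" list next to "add" under capabilities.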

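Here we introduce several values in the process status that relate to Capabilities:

Taking the capability bitmap of the process from the example above, where no CAP was configured:

The comparison shows that the root user inside the container runtime does not hold every privilege; it has only 14 capabilities by default, and any others must be explicitly enabled before use.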
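The bitmaps from the original example are not preserved in this copy. As a sketch of the decoding step, the Python below expands a hexadecimal capability bitmap into capability names, similar in spirit to capsh --decode. The sample value 00000000a80425fb is an assumption: it is the default commonly reported for Docker-based container runtimes, not a value taken from the lost output. The name table follows the numbering in the kernel's capability.h.

```python
# Capability names indexed by capability number, per <linux/capability.h>.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(bitmap_hex: str) -> list:
    """Return the names of all capabilities whose bit is set in the bitmap."""
    mask = int(bitmap_hex, 16)
    names = []
    for bit in range(mask.bit_length()):
        if (mask >> bit) & 1:
            # Bits beyond the table are reported by number.
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else str(bit))
    return names


if __name__ == "__main__":
    # Assumed sample value; see the note above the code.
    caps = decode_caps("00000000a80425fb")
    print(len(caps), "capabilities:")
    for cap in caps:
        print(" ", cap)
```

For the assumed sample value the decoder lists 14 capabilities, among them CAP_NET_RAW and CAP_KILL but not CAP_SYS_TIME or CAP_NET_ADMIN, which is consistent with the count given above.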

Evidently, when the image specifies a non-privileged USER to run as, the CAP configuration does not take effect.

Introduction to Linux Capabilities

Linux Capabilities: Why They Exist and How They Work

