aboutsummaryrefslogtreecommitdiff
path: root/Documentation/sysctl
diff options
context:
space:
mode:
authorPeter Xu2019-05-13 17:16:41 -0700
committerLinus Torvalds2019-05-14 09:47:45 -0700
commitcefdca0a86be517bc390fc4541e3674b8e7803b0 (patch)
treef85716c23f5e1356c8e5213162489a04d40b06f9 /Documentation/sysctl
parentf0fd50504a54f5548eb666dc16ddf8394e44e4b7 (diff)
userfaultfd/sysctl: add vm.unprivileged_userfaultfd
Userfaultfd can be misued to make it easier to exploit existing use-after-free (and similar) bugs that might otherwise only make a short window or race condition available. By using userfaultfd to stall a kernel thread, a malicious program can keep some state that it wrote, stable for an extended period, which it can then access using an existing exploit. While it doesn't cause the exploit itself, and while it's not the only thing that can stall a kernel thread when accessing a memory location, it's one of the few that never needs privilege. We can add a flag, allowing userfaultfd to be restricted, so that in general it won't be useable by arbitrary user programs, but in environments that require userfaultfd it can be turned back on. Add a global sysctl knob "vm.unprivileged_userfaultfd" to control whether userfaultfd is allowed by unprivileged users. When this is set to zero, only privileged users (root user, or users with the CAP_SYS_PTRACE capability) will be able to use the userfaultfd syscalls. Andrea said: : The only difference between the bpf sysctl and the userfaultfd sysctl : this way is that the bpf sysctl adds the CAP_SYS_ADMIN capability : requirement, while userfaultfd adds the CAP_SYS_PTRACE requirement, : because the userfaultfd monitor is more likely to need CAP_SYS_PTRACE : already if it's doing other kind of tracking on processes runtime, in : addition of userfaultfd. In other words both syscalls works only for : root, when the two sysctl are opt-in set to 1. [dgilbert@redhat.com: changelog additions] [akpm@linux-foundation.org: documentation tweak, per Mike] Link: http://lkml.kernel.org/r/20190319030722.12441-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Suggested-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Maxime Coquelin <maxime.coquelin@redhat.com> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Pavel Emelyanov <xemul@virtuozzo.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Martin Cracauer <cracauer@cons.org> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Kees Cook <keescook@chromium.org> Cc: Mel Gorman <mgorman@suse.de> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'Documentation/sysctl')
-rw-r--r--Documentation/sysctl/vm.txt12
1 files changed, 12 insertions, 0 deletions
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 3f13d8599337..749322060f10 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -61,6 +61,7 @@ Currently, these files are in /proc/sys/vm:
- stat_refresh
- numa_stat
- swappiness
+- unprivileged_userfaultfd
- user_reserve_kbytes
- vfs_cache_pressure
- watermark_boost_factor
@@ -818,6 +819,17 @@ The default value is 60.
==============================================================
+unprivileged_userfaultfd
+
+This flag controls whether unprivileged users can use the userfaultfd
+system calls. Set this to 1 to allow unprivileged users to use the
+userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
+privileged users (with SYS_CAP_PTRACE capability).
+
+The default value is 1.
+
+==============================================================
+
- user_reserve_kbytes
When overcommit_memory is set to 2, "never overcommit" mode, reserve