2017-11-01 14:08:43 +00:00
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
shm: add memfd_create() syscall
memfd_create() is similar to mmap(MAP_ANON), but returns a file-descriptor
that you can pass to mmap(). It can support sealing and avoids any
connection to user-visible mount-points. Thus, it's not subject to quotas
on mounted file-systems and can be used like malloc()'ed memory, except that
it comes with a file-descriptor.
memfd_create() returns the raw shmem file, so calls like ftruncate() can
be used to modify the underlying inode. Also calls like fstat() will
return proper information and mark the file as regular file. If you want
sealing, you can specify MFD_ALLOW_SEALING. Otherwise, sealing is not
supported (like on all other regular files).
Compared to O_TMPFILE, it does not require a tmpfs mount-point and is not
subject to a filesystem size limit. It is still properly accounted to
memcg limits, though, and to the same overcommit or no-overcommit
accounting as all user memory.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
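The flow described in this commit message (create the file, size it with ftruncate(), map it, then seal it) might look roughly like the sketch below. The helper name make_sealed_memfd() is hypothetical and not part of the kernel or libc; the sketch assumes a Linux system whose libc exposes memfd_create() and the F_SEAL_* constants under _GNU_SOURCE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a sealable memfd holding a copy of `data`,
 * then seal it against resizing and writing.
 * Returns the file descriptor, or -1 on error.
 */
int make_sealed_memfd(const char *name, const void *data, size_t len)
{
	void *map;
	int fd;

	assert(name != NULL && data != NULL);
	if (len == 0)
		return -1;	/* mmap() rejects zero-length mappings */

	fd = memfd_create(name, MFD_CLOEXEC | MFD_ALLOW_SEALING);
	if (fd < 0)
		return -1;

	/* The memfd starts empty; ftruncate() sets the size of the inode. */
	if (ftruncate(fd, (off_t)len) < 0)
		goto fail;

	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto fail;
	memcpy(map, data, len);
	/* Unmap first: F_SEAL_WRITE is refused while a writable shared mapping exists. */
	munmap(map, len);

	if (fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE) < 0)
		goto fail;
	return fd;

fail:
	close(fd);
	return -1;
}
```

A caller could hand the returned descriptor to another process, which can then map it read-only and rely on the contents and size no longer changing.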
2014-08-08 21:25:29 +00:00
#ifndef _UAPI_LINUX_MEMFD_H
#define _UAPI_LINUX_MEMFD_H
2017-09-06 23:24:16 +00:00
#include <asm-generic/hugetlb_encode.h>
2014-08-08 21:25:29 +00:00
/* flags for memfd_create(2) (unsigned int) */
#define MFD_CLOEXEC 0x0001U
#define MFD_ALLOW_SEALING 0x0002U
2017-09-06 23:24:16 +00:00
#define MFD_HUGETLB 0x0004U
mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC
The new MFD_NOEXEC_SEAL and MFD_EXEC flags allow applications to set the
executable bit at creation time (memfd_create).
When MFD_NOEXEC_SEAL is set, the memfd is created without the executable bit
(mode: 0666) and sealed with F_SEAL_EXEC, so it cannot be chmod-ed to be
executable (mode: 0777) after creation.
When the MFD_EXEC flag is set, the memfd is created with the executable bit
(mode: 0777); this is the same as the old behavior of memfd_create.
The new pid namespaced sysctl vm.memfd_noexec has 3 values:
0: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
MFD_EXEC was set.
1: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
MFD_NOEXEC_SEAL was set.
2: memfd_create() without MFD_NOEXEC_SEAL will be rejected.
The sysctl allows finer control of memfd_create for old-software that
doesn't set the executable bit, for example, a container with
vm.memfd_noexec=1 means the old-software will create non-executable memfd
by default. Also, the value of memfd_noexec is passed to child namespace
at creation time. For example, if the init namespace has
vm.memfd_noexec=2, all its children namespaces will be created with 2.
[akpm@linux-foundation.org: add stub functions to fix build]
[akpm@linux-foundation.org: remove unneeded register_pid_ns_ctl_table_vm() stub, per Jeff]
[akpm@linux-foundation.org: s/pr_warn_ratelimited/pr_warn_once/, per review]
[akpm@linux-foundation.org: fix CONFIG_SYSCTL=n warning]
Link: https://lkml.kernel.org/r/20221215001205.51969-4-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@google.com>
Co-developed-by: Daniel Verkamp <dverkamp@chromium.org>
Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
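To illustrate the MFD_NOEXEC_SEAL behavior described above, here is a minimal sketch. The helper names create_noexec_memfd() and exec_bit_locked() are hypothetical. The fallback defines are only there for libc headers that predate these flags; 0x0008U is taken from this header, and the F_SEAL_EXEC value 0x0020 is assumed from the kernel's linux/fcntl.h. On kernels without this change, memfd_create() is expected to reject the unknown flag with EINVAL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Fallbacks for older libc headers (values assumed from the kernel UAPI). */
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif
#ifndef F_SEAL_EXEC
#define F_SEAL_EXEC 0x0020
#endif

/* Hypothetical helper: returns a non-executable memfd, or -1 with errno set. */
int create_noexec_memfd(const char *name)
{
	assert(name != NULL);
	return memfd_create(name, MFD_CLOEXEC | MFD_NOEXEC_SEAL);
}

/*
 * Hypothetical check: returns 1 if the fd carries F_SEAL_EXEC, has no
 * executable bits, and refuses a chmod to 0777; returns 0 otherwise.
 */
int exec_bit_locked(int fd)
{
	struct stat st;
	int seals = fcntl(fd, F_GET_SEALS);

	if (seals < 0 || !(seals & F_SEAL_EXEC))
		return 0;
	if (fstat(fd, &st) < 0 || (st.st_mode & 0111))
		return 0;
	/* With F_SEAL_EXEC set, adding executable bits should fail. */
	return fchmod(fd, 0777) < 0;
}
```

Software that needs an executable memfd would pass MFD_EXEC instead, which keeps working when vm.memfd_noexec is 0 or 1 but, per the description above, is rejected when it is 2.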
2022-12-15 00:12:03 +00:00
/* not executable and sealed to prevent changing to executable. */
#define MFD_NOEXEC_SEAL 0x0008U
/* executable */
#define MFD_EXEC 0x0010U
2017-09-06 23:24:16 +00:00
/*
 * Huge page size encoding when MFD_HUGETLB is specified, and a huge page
 * size other than the default is desired. See hugetlb_encode.h.
 * All known huge page size encodings are provided here. It is the
 * responsibility of the application to know which sizes are supported on
 * the running system. See mmap(2) man page for details.
 */
#define MFD_HUGE_SHIFT HUGETLB_FLAG_ENCODE_SHIFT
#define MFD_HUGE_MASK HUGETLB_FLAG_ENCODE_MASK

#define MFD_HUGE_64KB HUGETLB_FLAG_ENCODE_64KB
#define MFD_HUGE_512KB HUGETLB_FLAG_ENCODE_512KB
#define MFD_HUGE_1MB HUGETLB_FLAG_ENCODE_1MB
#define MFD_HUGE_2MB HUGETLB_FLAG_ENCODE_2MB
#define MFD_HUGE_8MB HUGETLB_FLAG_ENCODE_8MB
#define MFD_HUGE_16MB HUGETLB_FLAG_ENCODE_16MB
2018-10-05 22:51:54 +00:00
#define MFD_HUGE_32MB HUGETLB_FLAG_ENCODE_32MB
2017-09-06 23:24:16 +00:00
#define MFD_HUGE_256MB HUGETLB_FLAG_ENCODE_256MB
2018-10-05 22:51:54 +00:00
#define MFD_HUGE_512MB HUGETLB_FLAG_ENCODE_512MB
2017-09-06 23:24:16 +00:00
#define MFD_HUGE_1GB HUGETLB_FLAG_ENCODE_1GB
#define MFD_HUGE_2GB HUGETLB_FLAG_ENCODE_2GB
#define MFD_HUGE_16GB HUGETLB_FLAG_ENCODE_16GB
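As a rough illustration of the encoding these MFD_HUGE_* macros rely on: the page size is stored as its base-2 logarithm in a small bit field of the flags word. The sketch below uses local stand-in names (EX_HUGE_SHIFT, EX_HUGE_MASK, ex_huge_encode(), ex_huge_decode()), all hypothetical; the shift of 26 and mask of 0x3f are assumed from asm-generic/hugetlb_encode.h and should be checked against that header.

```c
#include <assert.h>

/*
 * Local stand-ins for HUGETLB_FLAG_ENCODE_SHIFT and HUGETLB_FLAG_ENCODE_MASK.
 * The values 26 and 0x3f are assumptions taken from hugetlb_encode.h.
 */
#define EX_HUGE_SHIFT 26
#define EX_HUGE_MASK  0x3fU

/* Encode log2(page size) into the flag bits, e.g. 21 for 2 MB pages. */
unsigned int ex_huge_encode(unsigned int log2_size)
{
	assert(log2_size <= EX_HUGE_MASK);
	return (log2_size & EX_HUGE_MASK) << EX_HUGE_SHIFT;
}

/* Recover log2(page size) from a flags word; 0 means the default size. */
unsigned int ex_huge_decode(unsigned int flags)
{
	return (flags >> EX_HUGE_SHIFT) & EX_HUGE_MASK;
}
```

Under these assumptions, ex_huge_encode(21) yields the same bits as MFD_HUGE_2MB, and the result would be OR-ed with MFD_HUGETLB when calling memfd_create().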
2014-08-08 21:25:29 +00:00
#endif /* _UAPI_LINUX_MEMFD_H */