linux-kernelorg-stable/fs
Lorenzo Stoakes 5dba5cc2e0 mm: introduce VM_MAYBE_GUARD and make visible in /proc/$pid/smaps
Patch series "introduce VM_MAYBE_GUARD and make it sticky", v4.

Currently, guard regions are not visible to users except through
/proc/$pid/pagemap, with no explicit visibility at the VMA level.

This makes the feature less useful, as it isn't entirely apparent which
VMAs may have these entries present, especially when performing actions
which walk through memory regions such as those performed by CRIU.

This series addresses this issue by introducing the VM_MAYBE_GUARD flag
which fulfils this role, updating the smaps logic to display an entry for
these.

The semantics of this flag are that a guard region MAY be present if set
(we cannot be sure, as we can't efficiently track whether an
MADV_GUARD_REMOVE finally removes all the guard regions in a VMA) - but if
not set the VMA definitely does NOT have any guard regions present.

It's problematic to establish this flag without further action, because
that means that VMAs with guard regions in them become non-mergeable with
adjacent VMAs for no especially good reason.

To work around this, this series also introduces the concept of 'sticky'
VMA flags - that is, flags which:

a. if set in one VMA and not in another, still permit those VMAs to be
   merged (if otherwise compatible).

b. when those VMAs are merged, must be set in the resultant VMA.

The VMA logic is updated to propagate these flags correctly.
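
As a rough illustration of rules (a) and (b), here is a minimal userland
sketch. It is a model under stated assumptions, not the kernel's merge
code: MODEL_STICKY_MASK, model_flags_mergeable() and model_merged_flags()
are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t vm_flags_t;

#define VM_READ        0x1ULL
#define VM_WRITE       0x2ULL
#define VM_MAYBE_GUARD 0x800ULL	/* value stated in this patch */

/* Hypothetical: the set of sticky flags in this model. */
#define MODEL_STICKY_MASK VM_MAYBE_GUARD

/* Rule (a): flags are merge-compatible if they differ only in sticky bits. */
static bool model_flags_mergeable(vm_flags_t a, vm_flags_t b)
{
	return ((a ^ b) & ~MODEL_STICKY_MASK) == 0;
}

/* Rule (b): the merged VMA keeps every sticky bit set in either input. */
static vm_flags_t model_merged_flags(vm_flags_t a, vm_flags_t b)
{
	assert(model_flags_mergeable(a, b));
	return a | (b & MODEL_STICKY_MASK);
}
```

In this model, a VM_READ VMA and a VM_READ | VM_MAYBE_GUARD VMA are
mergeable and the result carries VM_MAYBE_GUARD, whereas a difference in
VM_WRITE still prevents the merge.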

Additionally, VM_MAYBE_GUARD being an explicit VMA flag allows us to solve
an issue with file-backed guard regions - previously these established an
anon_vma object for file-backed mappings solely to have vma_needs_copy()
correctly propagate guard region mappings to child processes.

We introduce a new flag alias VM_COPY_ON_FORK (which currently only
specifies VM_MAYBE_GUARD) and update vma_needs_copy() to check explicitly
for this flag and to copy page tables if it is present, which resolves
this issue.
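
The fork-time decision described above can be sketched roughly as follows.
This is a simplified userland model, not the real vma_needs_copy(), which
checks further conditions; struct model_vma and its has_anon_vma field are
hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t vm_flags_t;

#define VM_MAYBE_GUARD  0x800ULL
/* Alias as described in the series: currently only VM_MAYBE_GUARD. */
#define VM_COPY_ON_FORK VM_MAYBE_GUARD

/* Hypothetical, heavily reduced stand-in for a VMA. */
struct model_vma {
	vm_flags_t flags;
	bool has_anon_vma;
};

/* Decide whether page tables should be copied to the child on fork. */
static bool model_vma_needs_copy(const struct model_vma *src)
{
	assert(src);

	/* Explicit flag check: no anon_vma is needed just to force a copy. */
	if (src->flags & VM_COPY_ON_FORK)
		return true;

	/* Mappings with anonymous pages are copied as before. */
	return src->has_anon_vma;
}
```

With the explicit flag, a file-backed VMA that may hold guard regions
requests a page table copy without having an anon_vma attached.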

Additionally, we add the ability for allow-listed VMA flags to be
atomically writable with only mmap/VMA read locks held.

The only flag allow-listed so far is VM_MAYBE_GUARD, and we take care to
ensure that permitting this does not introduce any races.

This allows us to maintain guard region installation as a read-locked
operation and not endure the overhead of obtaining a write lock here.
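
The idea of an allow-listed, atomically settable flag can be sketched in
userland with C11 atomics. This only models the concept: the kernel uses
its own primitives and locking rules, and MODEL_ATOMIC_SET_ALLOWED and
model_set_flag_atomic() are hypothetical names for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define VM_MAYBE_GUARD 0x800ULL

/* Hypothetical allow-list of flags that may be set without a write lock. */
#define MODEL_ATOMIC_SET_ALLOWED VM_MAYBE_GUARD

/* Hypothetical, heavily reduced stand-in for a VMA. */
struct model_vma {
	_Atomic uint64_t flags;
};

/*
 * Set a flag with an atomic read-modify-write so that concurrent holders
 * of a read lock cannot lose each other's updates. Flags outside the
 * allow-list are refused and leave the VMA untouched.
 */
static bool model_set_flag_atomic(struct model_vma *vma, uint64_t flag)
{
	assert(vma);

	if (flag & ~MODEL_ATOMIC_SET_ALLOWED)
		return false;

	atomic_fetch_or(&vma->flags, flag);
	return true;
}
```

Because the flag is only ever set, never cleared, on this path, two
concurrent callers converge on the same final value regardless of order.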

Finally we introduce extensive VMA userland tests to assert that the
sticky VMA logic behaves correctly as well as guard region self tests to
assert that smaps visibility is correctly implemented.


This patch (of 9):

Currently, if a user needs to determine if guard regions are present in a
range, they have to scan all VMAs (or have knowledge of which ones might
have guard regions).

Since commit 8e2f2aeb8b ("fs/proc/task_mmu: add guard region bit to
pagemap") and the related commit a516403787 ("fs/proc: extend the
PAGEMAP_SCAN ioctl to report guard regions"), users can use either
/proc/$pid/pagemap or the PAGEMAP_SCAN functionality to perform this
operation at a virtual address level.
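
For reference, a per-address check against /proc/$pid/pagemap might look
roughly like the sketch below. The pagemap file holds one 64-bit entry per
virtual page; the guard region bit position (58) is an assumption here and
should be checked against Documentation/admin-guide/mm/pagemap.rst. The
helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit position of the guard region bit in a pagemap entry. */
#define MODEL_PM_GUARD_REGION_BIT 58
/* Each pagemap entry is a single 64-bit value. */
#define MODEL_PM_ENTRY_BYTES 8

/* File offset in /proc/$pid/pagemap of the entry describing vaddr. */
static uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
	assert(page_size != 0);
	return (vaddr / page_size) * MODEL_PM_ENTRY_BYTES;
}

/* True if the (assumed) guard region bit is set in a pagemap entry. */
static bool pagemap_entry_is_guard(uint64_t entry)
{
	return (entry >> MODEL_PM_GUARD_REGION_BIT) & 1;
}
```

A caller would pread() eight bytes at pagemap_offset() for every page of
interest, which is the per-address cost that a VMA-level indication avoids.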

This is not ideal, and it gives no visibility at a /proc/$pid/smaps level
that guard regions exist in ranges.

This patch remedies the situation by establishing a new VMA flag,
VM_MAYBE_GUARD, to indicate that a VMA may contain guard regions (it is
uncertain because we cannot reasonably determine whether a
MADV_GUARD_REMOVE call has removed all of the guard regions in a VMA, and
additionally VMAs may change across merge/split).

We utilise 0x800 for this flag, which makes it available to 32-bit
architectures also. This bit was previously used by VM_DENYWRITE, which
was removed in commit 8d0920bde5 ("mm: remove VM_DENYWRITE") and hasn't
been reused yet.

We also update the smaps logic and documentation to identify these VMAs.
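
A userland consumer could test for the new entry in a VmFlags line along
these lines. The helper below is a hypothetical sketch; the two-letter
mnemonic is passed in rather than hard-coded, since this message does not
state it ("gu" is assumed in the usage note and should be confirmed against
Documentation/filesystems/proc.rst).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if a smaps "VmFlags:" line contains the given mnemonic as a
 * whole whitespace-separated token. Lines with other keys return false.
 */
static bool vmflags_line_has(const char *line, const char *mnemonic)
{
	static const char prefix[] = "VmFlags:";
	size_t mlen = strlen(mnemonic);
	const char *p;

	assert(mlen > 0);

	if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
		return false;

	p = line + sizeof(prefix) - 1;
	while (*p) {
		size_t len;

		p += strspn(p, " \t\n");	/* skip separators */
		len = strcspn(p, " \t\n");	/* length of next token */
		if (len == mlen && strncmp(p, mnemonic, mlen) == 0)
			return true;
		p += len;
	}
	return false;
}
```

For example, vmflags_line_has(line, "gu") would be called on each VmFlags
line read from /proc/$pid/smaps, assuming "gu" is the mnemonic in use.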

Another major use of this functionality is that we can use it to identify
that we ought to copy page tables on fork.

We do not actually implement usage of this flag in mm/madvise.c yet as we
need to allow some VMA flags to be applied atomically under mmap/VMA read
lock in order to avoid the need to acquire a write lock for this purpose.

Link: https://lkml.kernel.org/r/cover.1763460113.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/cf8ef821eba29b6c5b5e138fffe95d6dcabdedb9.1763460113.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:58 -08:00
9p Revert "fs/9p: Refresh metadata in d_revalidate for uncached mode too" 2025-10-22 14:25:27 +09:00
adfs
affs
afs Simplifying ->d_name audits, easy part. 2025-10-03 11:14:02 -07:00
autofs
befs
bfs
btrfs for-6.18-rc4-tag 2025-11-04 14:25:38 +09:00
cachefiles
ceph Some messenger improvements from Eric and Max, a patch to address the 2025-10-10 11:30:19 -07:00
coda
configfs file->f_path constification 2025-10-03 16:32:36 -07:00
cramfs Patch series in this pull request: 2025-10-02 18:44:54 -07:00
crypto fscrypt: fix left shift underflow when inode->i_blkbits > PAGE_SHIFT 2025-11-04 16:37:38 -08:00
debugfs vfs-6.18-rc1.async 2025-09-29 11:55:15 -07:00
devpts
dlm dlm for 6.18 2025-09-29 15:24:58 -07:00
ecryptfs mount-related stuff for this cycle 2025-10-03 10:19:44 -07:00
efivarfs vfs-6.18-rc1.misc 2025-09-29 09:03:07 -07:00
efs
erofs erofs: consolidate z_erofs_extent_lookback() 2025-10-22 07:54:31 +08:00
exfat exfat: fix out-of-bounds in exfat_nls_to_ucs2() 2025-10-15 17:53:20 +09:00
exportfs
ext2
ext4 Ext4 bug fixes for 6.18-rc2, including 2025-10-15 07:51:57 -07:00
f2fs f2fs: fix wrong block mapping for multi-devices 2025-10-13 23:55:44 +00:00
fat
freevxfs
fuse Revert "fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP" 2025-10-10 13:44:03 +02:00
gfs2 finish_no_open calling conventions change 2025-10-03 10:59:31 -07:00
hfs
hfsplus
hostfs
hpfs - Avoid -Wflex-array-member-not-at-end warnings 2025-10-10 14:06:02 -07:00
hugetlbfs mm/hugetlbfs: update hugetlbfs to use mmap_prepare 2025-11-16 17:28:13 -08:00
iomap iomap: open code bio_iov_iter_get_bdev_pages 2025-10-07 08:05:44 -06:00
isofs
jbd2 jbd2: ensure that all ongoing I/O complete before freeing blocks 2025-10-10 13:10:06 -04:00
jffs2
jfs A few fixes and cleanups for JFS. 2025-10-03 13:54:23 -07:00
kernfs vfs-6.18-rc1.misc 2025-09-29 09:03:07 -07:00
lockd
minix
netfs vfs-6.18-rc1.workqueue 2025-09-29 10:27:17 -07:00
nfs NFS4: Fix state renewals missing after boot 2025-10-13 14:33:00 -04:00
nfs_common
nfsd nfsd-6.18 fixes: 2025-10-28 12:13:20 -07:00
nilfs2 nilfs2: avoid having an active sc_timer before freeing sci 2025-11-09 21:19:46 -08:00
nls
notify fs/notify: call exportfs_encode_fid with s_umount 2025-10-06 16:31:52 +02:00
ntfs3 mm: add vma_desc_size(), vma_desc_pages() helpers 2025-11-16 17:28:11 -08:00
ocfs2 ocfs2: clear extent cache after moving/defragmenting extents 2025-10-15 13:24:33 -07:00
omfs
openpromfs
orangefs orangefs: Two cleanups and a bug fix. 2025-10-03 13:59:56 -07:00
overlayfs ovl: remove redundant IOCB_DIO_CALLER_COMP clearing 2025-10-10 14:02:47 +02:00
proc mm: introduce VM_MAYBE_GUARD and make visible in /proc/$pid/smaps 2025-11-20 13:43:58 -08:00
pstore pstore update for v6.18-rc1 2025-09-29 18:08:34 -07:00
qnx4
qnx6
quota
ramfs mm: consistently use current->mm in mm_get_unmapped_area() 2025-11-16 17:27:57 -08:00
resctrl mm: update resctl to use mmap_prepare 2025-11-16 17:28:14 -08:00
romfs
smb three smb client fixes 2025-11-08 10:17:30 -08:00
squashfs Patch series in this pull request: 2025-10-02 18:44:54 -07:00
sysfs sysfs: check visibility before changing group attribute ownership 2025-10-17 09:48:34 +02:00
tests
tracefs
ubifs Summary of significant series in this pull request: 2025-10-02 18:18:33 -07:00
udf
ufs
unicode
vboxsf
verity Optimize fsverity with 2-way interleaved hashing 2025-09-29 15:55:20 -07:00
xfs xfs: free xfs_busy_extents structure when no RT extents are queued 2025-11-06 08:59:19 +01:00
zonefs
Kconfig Summary of significant series in this pull request: 2025-10-02 18:18:33 -07:00
Kconfig.binfmt
Makefile Remove bcachefs core code 2025-09-29 13:43:52 -07:00
aio.c Summary of significant series in this pull request: 2025-10-02 18:18:33 -07:00
anon_inodes.c
attr.c
backing-file.c
bad_inode.c
binfmt_elf.c
binfmt_elf_fdpic.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
bpf_fs_kfuncs.c
buffer.c
char_dev.c
compat_binfmt_elf.c
coredump.c coredump: fix core_pattern input validation 2025-10-07 13:12:46 +02:00
d_path.c
dax.c treewide: include linux/pgalloc.h instead of asm/pgalloc.h 2025-11-16 17:28:25 -08:00
dcache.c vfs: Don't leak disconnected dentries on umount 2025-10-07 13:09:08 +02:00
direct-io.c
drop_caches.c
eventfd.c
eventpoll.c
exec.c coredump: fix core_pattern input validation 2025-10-07 13:12:46 +02:00
fcntl.c fcntl: trim arguments 2025-09-26 10:21:23 +02:00
fhandle.c namespace-6.18-rc1 2025-09-29 11:20:29 -07:00
file.c
file_attr.c fs: return EOPNOTSUPP from file_setattr/file_getattr syscalls 2025-10-10 13:46:00 +02:00
file_table.c fs: update comment in init_file() 2025-10-07 12:48:33 +02:00
filesystems.c
fs-writeback.c vfs-6.18-rc1.writeback 2025-09-29 11:34:40 -07:00
fs_context.c
fs_parser.c
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c
init.c
inode.c vfs-6.18-rc1.inode 2025-09-29 09:42:30 -07:00
internal.h file->f_path constification 2025-10-03 16:32:36 -07:00
ioctl.c
kernel_read_file.c
libfs.c
locks.c
mbcache.c
mnt_idmapping.c
mount.h mount-related stuff for this cycle 2025-10-03 10:19:44 -07:00
mpage.c
namei.c file->f_path constification 2025-10-03 16:32:36 -07:00
namespace.c vfs_parse_fs_string() stuff 2025-10-03 10:51:44 -07:00
nsfs.c nsfs: handle inode number mismatches gracefully in file handles 2025-10-07 12:48:33 +02:00
open.c file->f_path constification 2025-10-03 16:32:36 -07:00
pidfs.c file->f_path constification 2025-10-03 16:32:36 -07:00
pipe.c
pnode.c
pnode.h
posix_acl.c
proc_namespace.c
read_write.c
readdir.c
remap_range.c
select.c
seq_file.c
signalfd.c
splice.c
stack.c
stat.c
statfs.c
super.c mount-related stuff for this cycle 2025-10-03 10:19:44 -07:00
sync.c
sysctls.c
timerfd.c
userfaultfd.c
utimes.c
xattr.c