JIRA: https://issues.redhat.com/browse/RHEL-27739
This patch is a backport of the following upstream commit:
commit 1eeaa4fd39b0b1b3e986f8eab6978e69b01e3c5e
Author: Liu Shixin <liushixin2@huawei.com>
Date: Fri Sep 23 11:33:47 2022 +0800
memory: move hotplug memory notifier priority to same file for easy sorting
The priority of hotplug memory callback is defined in a different file.
And there are some callers using numbers directly. Collect them together
into include/linux/memory.h for easy reading. This allows us to sort
their priorities more intuitively without additional comments.
Link: https://lkml.kernel.org/r/20220923033347.3935160-9-liushixin2@huawei.com
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Waiman Long <longman@redhat.com>
Cc: zefan li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Audra Mitchell <audra@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27739
This patch is a backport of the following upstream commit:
commit cddb8d09ff1e477de8236a061a5017b21bab3c14
Author: Liu Shixin <liushixin2@huawei.com>
Date: Fri Sep 23 11:33:43 2022 +0800
mm/mmap: use hotplug_memory_notifier() directly
Commit 76ae847497bc52 ("Documentation: raise minimum supported version of
GCC to 5.1") updated the minimum gcc version to 5.1. So the problem
mentioned in f02c696800 ("include/linux/memory.h: implement
register_hotmemory_notifier()") no longer exist. So we can now switch to
use hotplug_memory_notifier() directly rather than
register_hotmemory_notifier().
Link: https://lkml.kernel.org/r/20220923033347.3935160-5-liushixin2@huawei.com
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Waiman Long <longman@redhat.com>
Cc: zefan li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Audra Mitchell <audra@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit 6c28ca6485ddd7c5da171e479e3ebfbe661efc4d
Author: Liam Howlett <liam.howlett@oracle.com>
Date: Mon Dec 5 19:23:17 2022 +0000
mmap: fix do_brk_flags() modifying obviously incorrect VMAs
Add more sanity checks to the VMA that do_brk_flags() will expand. Ensure
the VMA matches basic merge requirements within the function before
calling can_vma_merge_after().
Drop the duplicate checks from vm_brk_flags() since they will be enforced
later.
The old code would expand file VMAs on brk(), which is functionally
wrong and also dangerous in terms of locking because the brk() path
isn't designed for file VMAs and therefore doesn't lock the file
mapping. Checking can_vma_merge_after() ensures that new anonymous
VMAs can't be merged into file VMAs.
See https://lore.kernel.org/linux-mm/CAG48ez1tJZTOjS_FjRZhvtDA-STFmdw8PEizPDwMGFd_ui0Nrw@mail.gmail.com/
Link: https://lkml.kernel.org/r/20221205192304.1957418-1-Liam.Howlett@oracle.com
Fixes: 2e7ce7d354f2 ("mm/mmap: change do_brk_flags() to expand existing VMA and add do_brk_munmap()")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Suggested-by: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit 4a42344081ff7fbb890c0741e11d22cd7f658894
Author: Ian Cowan <ian@linux.cowan.aero>
Date: Sun Nov 13 19:33:49 2022 -0500
mm: mmap: fix documentation for vma_mas_szero
When the struct_mm input, mm, was changed to a struct ma_state, mas, the
documentation for the function was never updated. This updates that
documentation reference.
Link: https://lkml.kernel.org/r/20221114003349.41235-1-ian@linux.cowan.aero
Signed-off-by: Ian Cowan <ian@linux.cowan.aero>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit cc674ab3c0188002917c8a2c28e4424131f1fd7e
Author: Li Zetao <lizetao1@huawei.com>
Date: Fri Oct 28 15:37:17 2022 +0800
mm/mmap: fix memory leak in mmap_region()
There is a memory leak reported by kmemleak:
unreferenced object 0xffff88817231ce40 (size 224):
comm "mount.cifs", pid 19308, jiffies 4295917571 (age 405.880s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
60 c0 b2 00 81 88 ff ff 98 83 01 42 81 88 ff ff `..........B....
backtrace:
[<ffffffff81936171>] __alloc_file+0x21/0x250
[<ffffffff81937051>] alloc_empty_file+0x41/0xf0
[<ffffffff81937159>] alloc_file+0x59/0x710
[<ffffffff81937964>] alloc_file_pseudo+0x154/0x210
[<ffffffff81741dbf>] __shmem_file_setup+0xff/0x2a0
[<ffffffff817502cd>] shmem_zero_setup+0x8d/0x160
[<ffffffff817cc1d5>] mmap_region+0x1075/0x19d0
[<ffffffff817cd257>] do_mmap+0x727/0x1110
[<ffffffff817518b2>] vm_mmap_pgoff+0x112/0x1e0
[<ffffffff83adf955>] do_syscall_64+0x35/0x80
[<ffffffff83c0006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
The root cause was traced to an error handing path in mmap_region() when
arch_validate_flags() or mas_preallocate() fails. In the shared anonymous
mapping sence, vma will be setuped and mapped with a new shared anonymous
file via shmem_zero_setup(). So in this case, the file resource needs to
be released.
Fix it by calling fput(vma->vm_file) and unmap_region() when
arch_validate_flags() or mas_preallocate() returns an error in the shared
anonymous mapping sence.
Link: https://lkml.kernel.org/r/20221028073717.1179380-1-lizetao1@huawei.com
Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree")
Fixes: c462ac288f ("mm: Introduce arch_validate_flags()")
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit 1db43d3f3733351849ddca4b573c037c7821bfd8
Author: Liam Howlett <liam.howlett@oracle.com>
Date: Tue Oct 25 16:12:49 2022 +0000
mmap: fix remap_file_pages() regression
When using the VMA iterator, the final execution will set the variable
'next' to NULL which causes the function to fail out. Restore the break
in the loop to exit the VMA iterator early without clearing NULL fixes the
issue.
Link: https://lore.kernel.org/lkml/29344.1666681759@jrobl/
Link: https://lkml.kernel.org/r/20221025161222.2634030-1-Liam.Howlett@oracle.com
Fixes: 763ecb035029 (mm: remove the vma linked list)
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: "J. R. Okajima" <hooanon05g@gmail.com>
Tested-by: "J. R. Okajima" <hooanon05g@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit a57b70519d1f7c53be98478623652738e5ac70d5
Author: Liam Howlett <liam.howlett@oracle.com>
Date: Tue Oct 18 19:17:12 2022 +0000
mm/mmap: fix MAP_FIXED address return on VMA merge
mmap should return the start address of newly mapped area when successful.
On a successful merge of a VMA, the return address was changed and thus
was violating that expectation from userspace.
This is a restoration of functionality provided by 309d08d9b3
(mm/mmap.c: fix mmap return value when vma is merged after call_mmap()).
For completeness of fixing MAP_FIXED, implement the comments from the
previous discussion to never update the address and fail if the address
changes. Leaving the error as a WARN_ON() to avoid crashing the kernel.
Link: https://lkml.kernel.org/r/20221018191613.4133459-1-Liam.Howlett@oracle.com
Link: https://lore.kernel.org/all/Y06yk66SKxlrwwfb@lakrids/
Link: https://lore.kernel.org/all/20201203085350.22624-1-liuzixian4@huawei.com/
Fixes: 4dd1b84140c1 ("mm/mmap: use advanced maple tree API for mmap_region()")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Cc: Liu Zixian <liuzixian4@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit 1cd916d0340d0f45b151599c24ec40b5b2fd8e4a
Author: Andrew Morton <akpm@linux-foundation.org>
Date: Tue Oct 18 13:57:37 2022 -0700
mm/mmap.c: __vma_adjust(): suppress uninitialized var warning
The code is OK, but it fools gcc.
mm/mmap.c:802 __vma_adjust() error: uninitialized symbol 'next_next'.
Fixes: 524e00b36e8c5 ("mm: remove rb tree.")
Reported-by: kernel test robot <lkp@intel.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit 5789151e48acc3fd34d2109bf2021dc4df5e33e9
Author: Mike Kravetz <mike.kravetz@oracle.com>
Date: Mon Oct 17 19:49:45 2022 -0700
mm/mmap: undo ->mmap() when mas_preallocate() fails
A memory leak in hugetlb_reserve_pages was reported in [1]. The root
cause was traced to an error path in mmap_region when mas_preallocate()
fails. In this case, the vma is freed after a successful call to
filesystem specific mmap. The hugetlbfs mmap routine may allocate data
structures pointed to by m_private_data. These need to be cleaned up by
the hugetlb vm_ops->close() routine.
The same issue was addressed by commit deb0f6562884 ("mm/mmap: undo
->mmap() when arch_validate_flags() fails") for the arch_validate_flags()
test. Go to the same close_and_free_vma label if mas_preallocate() fails.
[1] https://lore.kernel.org/linux-mm/CAKXUXMxf7OiCwbxib7MwfR4M1b5+b3cNTU7n5NV9Zm4967=FPQ@mail.gmail.com/
Link: https://lkml.kernel.org/r/20221018024945.415036-1-mike.kravetz@oracle.com
Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit deb0f6562884b5b4beb883d73e66a7d3a1b96d99
Author: Carlos Llamas <cmllamas@google.com>
Date: Fri Sep 30 00:38:43 2022 +0000
mm/mmap: undo ->mmap() when arch_validate_flags() fails
Commit c462ac288f ("mm: Introduce arch_validate_flags()") added a late
check in mmap_region() to let architectures validate vm_flags. The check
needs to happen after calling ->mmap() as the flags can potentially be
modified during this callback.
If arch_validate_flags() check fails we unmap and free the vma. However,
the error path fails to undo the ->mmap() call that previously succeeded
and depending on the specific ->mmap() implementation this translates to
reference increments, memory allocations and other operations what will
not be cleaned up.
There are several places (mainly device drivers) where this is an issue.
However, one specific example is bpf_map_mmap() which keeps count of the
mappings in map->writecnt. The count is incremented on ->mmap() and then
decremented on vm_ops->close(). When arch_validate_flags() fails this
count is off since bpf_map_mmap_close() is never called.
One can reproduce this issue in arm64 devices with MTE support. Here the
vm_flags are checked to only allow VM_MTE if VM_MTE_ALLOWED has been set
previously. From userspace then is enough to pass the PROT_MTE flag to
mmap() syscall to trigger the arch_validate_flags() failure.
The following program reproduces this issue:
#include <stdio.h>
#include <unistd.h>
#include <linux/unistd.h>
#include <linux/bpf.h>
#include <sys/mman.h>
int main(void)
{
union bpf_attr attr = {
.map_type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(long long),
.max_entries = 256,
.map_flags = BPF_F_MMAPABLE,
};
int fd;
fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
mmap(NULL, 4096, PROT_WRITE | PROT_MTE, MAP_SHARED, fd, 0);
return 0;
}
By manually adding some log statements to the vm_ops callbacks we can
confirm that when passing PROT_MTE to mmap() the map->writecnt is off upon
->release():
With PROT_MTE flag:
root@debian:~# ./bpf-test
[ 111.263874] bpf_map_write_active_inc: map=9 writecnt=1
[ 111.288763] bpf_map_release: map=9 writecnt=1
Without PROT_MTE flag:
root@debian:~# ./bpf-test
[ 157.816912] bpf_map_write_active_inc: map=10 writecnt=1
[ 157.830442] bpf_map_write_active_dec: map=10 writecnt=0
[ 157.832396] bpf_map_release: map=10 writecnt=0
This patch fixes the above issue by calling vm_ops->close() when the
arch_validate_flags() check fails, after this we can proceed to unmap and
free the vma on the error path.
Link: https://lkml.kernel.org/r/20220930003844.1210987-1-cmllamas@google.com
Fixes: c462ac288f ("mm: Introduce arch_validate_flags()")
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Liam Howlett <liam.howlett@oracle.com>
Cc: Christian Brauner (Microsoft) <brauner@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: <stable@vger.kernel.org> [5.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Conflicts: mm/mmap.c - We already have
54a611b60590 ("Maple Tree: add new data structure")
so mas_preallocate doesn't have a vma argument
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit 28c5609fb236807910ca347ad3e26c4567998526
Author: Liam Howlett <liam.howlett@oracle.com>
Date: Tue Oct 11 16:08:37 2022 +0000
mm/mmap: preallocate maple nodes for brk vma expansion
If the brk VMA is the last vma in a maple node and meets the rare criteria
that it can be expanded, then preallocation is necessary to avoid a
potential fs_reclaim circular lock issue on low resources.
At the same time use the actual vma start address (unaligned) when calling
vma_adjust_trans_huge().
Link: https://lkml.kernel.org/r/20221011160624.1253454-1-Liam.Howlett@oracle
.com
Fixes: 2e7ce7d354f2 (mm/mmap: change do_brk_flags() to expand existing VMA a
nd add do_brk_munmap())
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit eef199440df950942b3c7ef2e2de507fd6ced031
Author: Jakub Matěna <matenajakub@gmail.com>
Date: Fri Jun 3 16:57:18 2022 +0200
mm: refactor of vma_merge()
Patch series "Refactor of vma_merge and new merge call", v4.
I am currently working on my master's thesis trying to increase number of
merges of VMAs currently failing because of page offset incompatibility
and difference in their anon_vmas. The following refactor and added merge
call included in this series is just two smaller upgrades I created along
the way.
This patch (of 2):
Refactor vma_merge() to make it shorter and more understandable. Main
change is the elimination of code duplicity in the case of merge next
check. This is done by first doing checks and caching the results before
executing the merge itself. The variable 'area' is divided into 'mid' and
'res' as previously it was used for two purposes, as the middle VMA
between prev and next and also as the result of the merge itself. Exit
paths are also unified.
Link: https://lkml.kernel.org/r/20220603145719.1012094-1-matenajakub@gmail.com
Link: https://lkml.kernel.org/r/20220603145719.1012094-2-matenajakub@gmail.com
Signed-off-by: Jakub Matěna <matenajakub@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Rik van Riel <riel@surriel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit c154124fe925a451e471233aa7d1ab9a91f0a5ad
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date: Tue Sep 6 19:49:06 2022 +0000
mm/mmap.c: pass in mapping to __vma_link_file()
__vma_link_file() resolves the mapping from the file, if there is one.
Pass through the mapping and check the vm_file externally since most
places already have the required information and check of vm_file.
Link: https://lkml.kernel.org/r/20220906194824.2110408-71-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit d0601a500c35856f9c134126b2423c9cfc86c701
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date: Tue Sep 6 19:49:06 2022 +0000
mm/mmap: drop range_has_overlap() function
Since there is no longer a linked list, the range_has_overlap() function
is identical to the find_vma_intersection() function.
Link: https://lkml.kernel.org/r/20220906194824.2110408-70-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Conflicts:
include/linux/mm.h - We already have
21b85b09527c ("madvise: use zap_page_range_single for madvise dontneed")
so keep declaration for zap_page_range_single
kernel/fork.c - We already have
f1a7941243c1 ("mm: convert mm's rss stats into percpu_counter")
so keep declaration of i
mm/mmap.c - We already have
a1e8cb93bf ("mm: drop oom code from exit_mmap")
and
db3644c677 ("mm: delete unused MMF_OOM_VICTIM flag")
so keep setting MMF_OOM_SKIP in mm->flags
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit 763ecb035029f500d7e6dc99acd1ad299b7726a1
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date: Tue Sep 6 19:49:06 2022 +0000
mm: remove the vma linked list
Replace any vm_next use with vma_find().
Update free_pgtables(), unmap_vmas(), and zap_page_range() to use the
maple tree.
Use the new free_pgtables() and unmap_vmas() in do_mas_align_munmap(). At
the same time, alter the loop to be more compact.
Now that free_pgtables() and unmap_vmas() take a maple tree as an
argument, rearrange do_mas_align_munmap() to use the new tree to hold the
vmas to remove.
Remove __vma_link_list() and __vma_unlink_list() as they are exclusively
used to update the linked list.
Drop linked list update from __insert_vm_struct().
Rework validation of tree as it was depending on the linked list.
[yang.lee@linux.alibaba.com: fix one kernel-doc comment]
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=1949
Link: https://lkml.kernel.org/r/20220824021918.94116-1-yang.lee@linux.alib
aba.comLink: https://lkml.kernel.org/r/20220906194824.2110408-69-Liam.Howlett@or
acle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Conflicts: We already have
51d3d5eb74ff ("mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA")
so call userfaultfd_set_vm_flags instead of setting
vma->vm_flags directly. Also add the missing close brace
mentioned in the backport of 51d3d5eb74ff's Conflicts section.
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit 69dbe6daf1041e32e003f966d71f70f20c63af53
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date: Tue Sep 6 19:48:57 2022 +0000
userfaultfd: use maple tree iterator to iterate VMAs
Don't use the mm_struct linked list or the vma->vm_next in prep for
removal.
Link: https://lkml.kernel.org/r/20220906194824.2110408-45-Liam.Howlett@oracl
e.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit 67e7c16764c3cbf84a57d441fba3474217ac08d6
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date: Tue Sep 6 19:48:52 2022 +0000
mm/mmap: change do_brk_munmap() to use do_mas_align_munmap()
do_brk_munmap() has already aligned the address and has a maple tree state
to be used. Use the new do_mas_align_munmap() to avoid unnecessary
alignment and error checks.
Link: https://lkml.kernel.org/r/20220906194824.2110408-30-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Conflicts: The backport of
54a611b60590 ("Maple Tree: add new data structure")
removed the vma argument of mas_preallocate. This causes a merge
conflict with code being removed as well as the new
mas_preallocate call this patch adds
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit 11f9a21ab65542189372b7d64bb2d2937dfdc9dc
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date: Tue Sep 6 19:48:52 2022 +0000
mm/mmap: reorganize munmap to use maple states
Remove __do_munmap() in favour of do_munmap(), do_mas_munmap(), and
do_mas_align_munmap().
do_munmap() is a wrapper to create a maple state for any callers that have
not been converted to the maple tree.
do_mas_munmap() takes a maple state to mumap a range. This is just a
small function which checks for error conditions and aligns the end of the
range.
do_mas_align_munmap() uses the aligned range to mumap a range.
do_mas_align_munmap() starts with the first VMA in the range, then finds
the last VMA in the range. Both start and end are split if necessary.
Then the VMAs are removed from the linked list and the mm mlock count is
updated at the same time. Followed by a single tree operation of
overwriting the area in with a NULL. Finally, the detached list is
unmapped and freed.
By reorganizing the munmap calls as outlined, it is now possible to avoid
extra work of aligning pre-aligned callers which are known to be safe,
avoid extra VMA lookups or tree walks for modifications.
detach_vmas_to_be_unmapped() is no longer used, so drop this code.
vm_brk_flags() can just call the do_mas_munmap() as it checks for
intersecting VMAs directly.
Link: https://lkml.kernel.org/r/20220906194824.2110408-29-Liam.Howlett@oracl
e.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Conflicts: mm/mmap.c - Because of
4dd1b84140c1 ("mm/mmap: use advanced maple tree API for mmap_region()")
we get a merge conflict in mmap_region. Also because of 4dd1b84140c1
the moved version of mmap_region also needs to remove the vma
argument from mas_preallocate.
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit e99668a56430a25a871113bcd3989ed20eae1cfc
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date: Tue Sep 6 19:48:52 2022 +0000
mm/mmap: move mmap_region() below do_munmap()
Relocation of code for the next commit. There should be no changes here.
Link: https://lkml.kernel.org/r/20220906194824.2110408-28-Liam.Howlett@oracl
e.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit 7964cf8caa4dfa42c4149f3833d3878713cda3dc
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date: Tue Sep 6 19:48:51 2022 +0000
mm: remove vmacache
By using the maple tree and the maple tree state, the vmacache is no
longer beneficial and is complicating the VMA code. Remove the vmacache
to reduce the work in keeping it up to date and code complexity.
Link: https://lkml.kernel.org/r/20220906194824.2110408-26-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Conflicts: mm/mmap.c - We already have
54a611b60590 ("Maple Tree: add new data structure")
so mas_preallocate no longer takes a vma argument.
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit 4dd1b84140c1b87a89d69a683bebbbdaeb620e39
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date: Tue Sep 6 19:48:51 2022 +0000
mm/mmap: use advanced maple tree API for mmap_region()
Changing mmap_region() to use the maple tree state and the advanced maple
tree interface allows for a lot less tree walking.
This change removes the last caller of munmap_vma_range(), so drop this
unused function.
Add vma_expand() to expand a VMA if possible by doing the necessary
hugepage check, uprobe_munmap of files, dcache flush, modifications then
undoing the detaches, etc.
Link: https://lkml.kernel.org/r/20220906194824.2110408-25-Liam.Howlett@oracl
e.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit abdba2dda0c477ca708a939b02f9b2e74666ed2d
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date: Tue Sep 6 19:48:50 2022 +0000
mm: use maple tree operations for find_vma_intersection()
Move find_vma_intersection() to mmap.c and change implementation to maple
tree.
When searching for a vma within a range, it is easier to use the maple
tree interface.
Exported find_vma_intersection() for kvm module.
Link: https://lkml.kernel.org/r/20220906194824.2110408-24-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit 2e7ce7d354f2fae4c9becb8af799cbedf4f71665
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date: Tue Sep 6 19:48:50 2022 +0000
mm/mmap: change do_brk_flags() to expand existing VMA and add do_brk_munmap()
Avoid allocating a new VMA when it a vma modification can occur. When a
brk() can expand or contract a VMA, then the single store operation will
only modify one index of the maple tree instead of causing a node to split
or coalesce. This avoids unnecessary allocations/frees of maple tree
nodes and VMAs.
Move some limit & flag verifications out of the do_brk_flags() function to
use only relevant checks in the code path of bkr() and vm_brk_flags().
Set the vma to check if it can expand in vm_brk_flags() if extra criteria
are met.
Drop userfaultfd from do_brk_flags() path and only use it in
vm_brk_flags() path since that is the only place a munmap will happen.
Add a wraper for munmap for the brk case called do_brk_munmap().
Link: https://lkml.kernel.org/r/20220906194824.2110408-23-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit 3b0e81a1cdc9afbddb0543d08e38edb4e33c4baf
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date: Tue Sep 6 19:48:49 2022 +0000
mmap: change zeroing of maple tree in __vma_adjust()
Only write to the maple tree if we are not inserting or the insert isn't
going to overwrite the area to clear. This avoids spanning writes and
node coealescing when unnecessary.
The change requires a custom search for the linked list addition to find
the correct VMA for the prev link.
Link: https://lkml.kernel.org/r/20220906194824.2110408-19-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Conflicts: mm/mmap.c -
We already have
54a611b60590 ("Maple Tree: add new data structure")
so mas_preallocate no longer takes a vma argument.
We already have
92b7399695a5 ("mmap: fix copy_vma() failure path")
so keep check for new_vma->vm_file, the fput on it, and
the unlink_anon_vmas call
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit 524e00b36e8c547f5582eef3fb645a8d9fc5e3df
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date: Tue Sep 6 19:48:48 2022 +0000
mm: remove rb tree.
Remove the RB tree and start using the maple tree for vm_area_struct
tracking.
Drop validate_mm() calls in expand_upwards() and expand_downwards() as the
lock is not held.
Link: https://lkml.kernel.org/r/20220906194824.2110408-18-Liam.Howlett@oracl
e.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit 3499a13168da6a0c122c70f24e653b650d18c882
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date: Tue Sep 6 19:48:47 2022 +0000
mm/mmap: use maple tree for unmapped_area{_topdown}
The maple tree code was added to find the unmapped area in a previous
commit and was checked against what the rbtree returned, but the actual
result was never used. Start using the maple tree implementation and
remove the rbtree code.
Add kernel documentation comment for these functions.
Link: https://lkml.kernel.org/r/20220906194824.2110408-14-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit 7fdbd37da5c6ff002dc6d15e89a7708c2df4928e
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date: Tue Sep 6 19:48:47 2022 +0000
mm/mmap: use the maple tree for find_vma_prev() instead of the rbtree
Use the maple tree's advanced API and a maple state to walk the tree for
the entry at the address of the next vma, then use the maple state to walk
back one entry to find the previous entry.
Add kernel documentation comments for this API.
Link: https://lkml.kernel.org/r/20220906194824.2110408-13-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit be8432e7166ef8cc5647d6d350e73897d48a9659
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date: Tue Sep 6 19:48:46 2022 +0000
mm/mmap: use the maple tree in find_vma() instead of the rbtree.
Using the maple tree interface mt_find() will handle the RCU locking and
will start searching at the address up to the limit, ULONG_MAX in this
case.
Add kernel documentation to this API.
Link: https://lkml.kernel.org/r/20220906194824.2110408-12-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit 2e3af1db174423e0fb75c7887251f168d8401424
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date: Tue Sep 6 19:48:46 2022 +0000
mmap: use the VMA iterator in count_vma_pages_range()
This simplifies the implementation and is faster than using the linked
list.
Link: https://lkml.kernel.org/r/20220906194824.2110408-11-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit f39af05949a4280b9f04d5dd0f606b81aac3dae8
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date: Tue Sep 6 19:48:46 2022 +0000
mm: add VMA iterator
This thin layer of abstraction over the maple tree state is for iterating
over VMAs. You can go forwards, go backwards or ask where the iterator
is. Rename the existing vma_next() to __vma_next() -- it will be removed
by the end of this series.
Link: https://lkml.kernel.org/r/20220906194824.2110408-10-Liam.Howlett@oracle.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27736
commit d4af56c5c7c6781ca6ca8075e2cf5bc119ed33d1
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date: Tue Sep 6 19:48:45 2022 +0000
mm: start tracking VMAs with maple tree
Start tracking the VMAs with the new maple tree structure in parallel with
the rb_tree. Add debug and trace events for maple tree operations and
duplicate the rb_tree that is created on forks into the maple tree.
The maple tree is added to the mm_struct including the mm_init struct,
added support in required mm/mmap functions, added tracking in kernel/fork
for process forking, and used to find the unmapped_area and checked
against what the rbtree finds.
This also moves the mmap_lock() in exit_mmap() since the oom reaper call
does walk the VMAs. Otherwise lockdep will be unhappy if oom happens.
When splitting a vma fails due to allocations of the maple tree nodes,
the error path in __split_vma() calls new->vm_ops->close(new). The page
accounting for hugetlb is actually in the close() operation, so it
accounts for the removal of 1/2 of the VMA which was not adjusted. This
results in a negative exit value. To avoid the negative charge, set
vm_start = vm_end and vm_pgoff = 0.
There is also a potential accounting issue in special mappings from
insert_vm_struct() failing to allocate, so reverse the charge there in
the failure scenario.
Link: https://lkml.kernel.org/r/20220906194824.2110408-9-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Yu Zhao <yuzhao@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: SeongJae Park <sj@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-1848
commit 0ba09b1733878afe838fe35c310715fda3d46428
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sun Dec 4 12:51:59 2022 -0800
Revert "mm: align larger anonymous mappings on THP boundaries"
This reverts commit f35b5d7d676e59e401690b678cd3cfec5e785c23.
It has been reported to cause huge performance regressions on some loads
(will-it-scale.per_process_ops, but also building the kernel with
clang).
The commit did speed up gcc builds by a small amount, so it's not an
unambiguous regression, but until the big regressions are understood,
let's revert it.
Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/r/202210181535.7144dd15-yujie.liu@intel.com
Reported-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/lkml/Y1DNQaoPWxE%2BrGce@dev-arch.thelio-3990X/
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Conflicts: mm/mmap.c - We don't have
524e00b36e8c ("mm: remove rb tree.")
so don't add out_vma_link label or call new_vma's close method
JIRA: https://issues.redhat.com/browse/RHEL-1848
commit 92b7399695a5cc961c44fc6e4624d3bc3c699ee7
Author: Liam Howlett <liam.howlett@oracle.com>
Date: Tue Oct 11 20:36:51 2022 +0000
mmap: fix copy_vma() failure path
The anon vma was not unlinked and the file was not closed in the failure
path when the machine runs out of memory during the maple tree
modification. This caused a memory leak of the anon vma chain and vma
since neither would be freed.
Link: https://lkml.kernel.org/r/20221011203621.1446507-1-Liam.Howlett@oracle
.com
Fixes: 524e00b36e8c ("mm: remove rb tree")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Tested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-1848
commit f35b5d7d676e59e401690b678cd3cfec5e785c23
Author: Rik van Riel <riel@surriel.com>
Date: Tue Aug 9 14:24:57 2022 -0400
mm: align larger anonymous mappings on THP boundaries
Align larger anonymous memory mappings on THP boundaries by going through
thp_get_unmapped_area if THPs are enabled for the current process.
With this patch, larger anonymous mappings are now THP aligned. When a
malloc library allocates a 2MB or larger arena, that arena can now be
mapped with THPs right from the start, which can result in better TLB hit
rates and execution time.
Link: https://lkml.kernel.org/r/20220809142457.4751229f@imladris.surriel.com
Signed-off-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
commit b3541d912a84dc40cabb516f2deeac9ae6fa30da
Author: Suren Baghdasaryan <surenb@google.com>
Date: Tue May 31 15:31:00 2022 -0700
mm: delete unused MMF_OOM_VICTIM flag
With the last usage of MMF_OOM_VICTIM in exit_mmap gone, this flag is now
unused and can be removed.
[akpm@linux-foundation.org: remove comment about now-removed mm_is_oom_victim()]
Link: https://lkml.kernel.org/r/20220531223100.510392-2-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christian Brauner (Microsoft) <brauner@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168372
Signed-off-by: Nico Pache <npache@redhat.com>
Conflicts:
mm/mmap.c: slight differences in unmap_vmas and free_pgtables
arguments.
commit bf3980c85212fc71512d27a46f5aab66f46ca284
Author: Suren Baghdasaryan <surenb@google.com>
Date: Tue May 31 15:30:59 2022 -0700
mm: drop oom code from exit_mmap
The primary reason to invoke the oom reaper from the exit_mmap path used
to be a prevention of an excessive oom killing if the oom victim exit
races with the oom reaper (see [1] for more details). The invocation has
moved around since then because of the interaction with the munlock logic
but the underlying reason has remained the same (see [2]).
Munlock code is no longer a problem since [3] and there shouldn't be any
blocking operation before the memory is unmapped by exit_mmap so the oom
reaper invocation can be dropped. The unmapping part can be done with the
non-exclusive mmap_sem and the exclusive one is only required when page
tables are freed.
Remove the oom_reaper from exit_mmap which will make the code easier to
read. This is really unlikely to make any observable difference although
some microbenchmarks could benefit from one less branch that needs to be
evaluated even though it almost never is true.
[1] 2129258024 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
[2] 27ae357fa8 ("mm, oom: fix concurrent munlock and oom reaper unmap, v3")
[3] a213e5cf71cb ("mm/munlock: delete munlock_vma_pages_all(), allow oomreap")
[akpm@linux-foundation.org: restore Suren's mmap_read_lock() optimization]
Link: https://lkml.kernel.org/r/20220531223100.510392-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christian Brauner (Microsoft) <brauner@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168372
Signed-off-by: Nico Pache <npache@redhat.com>
Conflicts: fs/userfaultfd.c - We don't have
69dbe6daf104 ("userfaultfd: use maple tree iterator to iterate VMAs")
so we don't have a closing brace like the upstream patch expects
Bugzilla: https://bugzilla.redhat.com/2160210
commit 51d3d5eb74ff53b92dcff48b30ae2ed8edd85a32
Author: David Hildenbrand <david@redhat.com>
Date: Fri Dec 9 09:09:12 2022 +0100
mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA
Currently, we don't enable writenotify when enabling userfaultfd-wp on a
shared writable mapping (for now only shmem and hugetlb). The consequence
is that vma->vm_page_prot will still include write permissions, to be set
as default for all PTEs that get remapped (e.g., mprotect(), NUMA hinting,
page migration, ...).
So far, vma->vm_page_prot is assumed to be a safe default, meaning that we
only add permissions (e.g., mkwrite) but not remove permissions (e.g.,
wrprotect). For example, when enabling softdirty tracking, we enable
writenotify. With uffd-wp on shared mappings, that changed. More details
on vma->vm_page_prot semantics were summarized in [1].
This is problematic for uffd-wp: we'd have to manually check for a uffd-wp
PTEs/PMDs and manually write-protect PTEs/PMDs, which is error prone.
Prone to such issues is any code that uses vma->vm_page_prot to set PTE
permissions: primarily pte_modify() and mk_pte().
Instead, let's enable writenotify such that PTEs/PMDs/... will be mapped
write-protected as default and we will only allow selected PTEs that are
definitely safe to be mapped without write-protection (see
can_change_pte_writable()) to be writable. In the future, we might want
to enable write-bit recovery -- e.g., can_change_pte_writable() -- at more
locations, for example, also when removing uffd-wp protection.
This fixes two known cases:
(a) remove_migration_pte() mapping uffd-wp'ed PTEs writable, resulting
in uffd-wp not triggering on write access.
(b) do_numa_page() / do_huge_pmd_numa_page() mapping uffd-wp'ed PTEs/PMDs
writable, resulting in uffd-wp not triggering on write access.
Note that do_numa_page() / do_huge_pmd_numa_page() can be reached even
without NUMA hinting (which currently doesn't seem to be applicable to
shmem), for example, by using uffd-wp with a PROT_WRITE shmem VMA. On
such a VMA, userfaultfd-wp is currently non-functional.
Note that when enabling userfaultfd-wp, there is no need to walk page
tables to enforce the new default protection for the PTEs: we know that
they cannot be uffd-wp'ed yet, because that can only happen after enabling
uffd-wp for the VMA in general.
Also note that this makes mprotect() on ranges with uffd-wp'ed PTEs not
accidentally set the write bit -- which would result in uffd-wp not
triggering on later write access. This commit makes uffd-wp on shmem
behave just like uffd-wp on anonymous memory in that regard, even though,
mixing mprotect with uffd-wp is controversial.
[1] https://lkml.kernel.org/r/92173bad-caa3-6b43-9d1e-9a471fdbc184@redhat.co
m
Link: https://lkml.kernel.org/r/20221209080912.7968-1-david@redhat.com
Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs
")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Ives van Hoorne <ives@codesandbox.io>
Debugged-by: Peter Xu <peterx@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2160210
commit cdb5c9e53f2e7166409dbf7248364f592d11bd1c
Author: Miaohe Lin <linmiaohe@huawei.com>
Date: Sat Jul 9 17:25:27 2022 +0800
mm/mmap: fix obsolete comment of find_extend_vma
mmget_still_valid() has already been removed via commit 4d45e75a99 ("mm:
remove the now-unnecessary mmget_still_valid() hack"). Update the
corresponding comment.
Link: https://lkml.kernel.org/r/20220709092527.47778-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2160210
commit 43957b5d11037a651d162f65c682ec3c76777fc8
Author: Anshuman Khandual <anshuman.khandual@arm.com>
Date: Mon Jul 11 12:35:36 2022 +0530
mm/mmap: define DECLARE_VM_GET_PAGE_PROT
This just converts the generic vm_get_page_prot() implementation into a
new macro i.e DECLARE_VM_GET_PAGE_PROT which later can be used across
platforms when enabling them with ARCH_HAS_VM_GET_PAGE_PROT. This does
not create any functional change.
Link: https://lkml.kernel.org/r/20220711070600.2378316-3-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2160210
commit 840532711d7299d7e937952482ec899d4622c452
Author: Anshuman Khandual <anshuman.khandual@arm.com>
Date: Mon Jul 11 12:35:35 2022 +0530
mm/mmap: build protect protection_map[] with __P000
Patch series "mm/mmap: Drop __SXXX/__PXXX macros from across platforms",
v7.
__SXXX/__PXXX macros are unnecessary abstraction layer in creating the
generic protection_map[] array which is used for vm_get_page_prot(). This
abstraction layer can be avoided, if the platforms just define the array
protection_map[] for all possible vm_flags access permission combinations
and also export vm_get_page_prot() implementation.
This series drops __SXXX/__PXXX macros from across platforms in the tree.
First it build protects generic protection_map[] array with '#ifdef
__P000' and moves it inside platforms which enable
ARCH_HAS_VM_GET_PAGE_PROT. Later this build protects same array with
'#ifdef ARCH_HAS_VM_GET_PAGE_PROT' and moves inside remaining platforms
while enabling ARCH_HAS_VM_GET_PAGE_PROT. This adds a new macro
DECLARE_VM_GET_PAGE_PROT defining the current generic vm_get_page_prot(),
in order for it to be reused on platforms that do not require custom
implementation. Finally, ARCH_HAS_VM_GET_PAGE_PROT can just be dropped,
as all platforms now define and export vm_get_page_prot(), via looking up
a private and static protection_map[] array. protection_map[] data type
has been changed as 'static const' on all platforms that do not change it
during boot.
This patch (of 26):
Build protect generic protection_map[] array with __P000, so that it can
be moved inside all the platforms one after the other. Otherwise there
will be build failures during this process.
CONFIG_ARCH_HAS_VM_GET_PAGE_PROT cannot be used for this purpose as only
certain platforms enable this config now.
Link: https://lkml.kernel.org/r/20220711070600.2378316-1-anshuman.khandual@arm.com
Link: https://lkml.kernel.org/r/20220711070600.2378316-2-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Conflicts: drop changes to arch/loongarch/Kconfig - unsupported config
Bugzilla: https://bugzilla.redhat.com/2160210
commit ee65728e103bb7dd99d8604bf6c7aa89c7d7e446
Author: Mike Rapoport <rppt@kernel.org>
Date: Mon Jun 27 09:00:26 2022 +0300
docs: rename Documentation/vm to Documentation/mm
so it will be consistent with code mm directory and with
Documentation/admin-guide/mm and won't be confused with virtual machines.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Acked-by: Wu XiangCheng <bobwxc@email.cn>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2160210
commit 613bec092fe78307a8b130353ce1ef340915587f
Author: Yang Shi <shy828301@gmail.com>
Date: Thu May 19 14:08:50 2022 -0700
mm: mmap: register suitable readonly file vmas for khugepaged
The readonly FS THP relies on khugepaged to collapse THP for suitable
vmas. But the behavior is inconsistent for "always" mode
(https://lore.kernel.org/linux-mm/00f195d4-d039-3cf2-d3a1-a2c88de397a0@suse.cz/).
The "always" mode means THP allocation should be tried all the time and
khugepaged should try to collapse THP all the time. Of course the
allocation and collapse may fail due to other factors and conditions.
Currently file THP may not be collapsed by khugepaged even though all the
conditions are met. That does break the semantics of "always" mode.
So make sure readonly FS vmas are registered to khugepaged to fix the
break.
Register suitable vmas in common mmap path, that could cover both readonly
FS vmas and shmem vmas, so remove the khugepaged calls in shmem.c.
Still need to keep the khugepaged call in vma_merge() since vma_merge() is
called in a lot of places, for example, madvise, mprotect, etc.
Link: https://lkml.kernel.org/r/20220510203222.24246-9-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Vlastmil Babka <vbabka@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Song Liu <song@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2160210
commit c791576c60288c89b351ea2d2098f6a872d78fa7
Author: Yang Shi <shy828301@gmail.com>
Date: Thu May 19 14:08:50 2022 -0700
mm: khugepaged: introduce khugepaged_enter_vma() helper
The khugepaged_enter_vma_merge() actually does as the same thing as the
khugepaged_enter() section called by shmem_mmap(), so consolidate them
into one helper and rename it to khugepaged_enter_vma().
Link: https://lkml.kernel.org/r/20220510203222.24246-8-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Acked-by: Vlastmil Babka <vbabka@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Song Liu <song@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2160210
commit 3afa793082e624f7bd83533010ff4a676451d4ee
Author: Anshuman Khandual <anshuman.khandual@arm.com>
Date: Thu Apr 28 23:16:14 2022 -0700
mm/mmap: drop arch_vm_get_page_pgprot()
There are no platforms left which use arch_vm_get_page_prot(). Just drop
generic arch_vm_get_page_prot().
Link: https://lkml.kernel.org/r/20220414062125.609297-8-anshuman.khandual@arm.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2160210
commit 5dcfc6a1cc53ea0859de98a7380933f9e779859d
Author: Anshuman Khandual <anshuman.khandual@arm.com>
Date: Thu Apr 28 23:16:13 2022 -0700
mm/mmap: drop arch_filter_pgprot()
There are no platforms left which subscribe ARCH_HAS_FILTER_PGPROT. Hence
drop generic arch_filter_pgprot() and also config ARCH_HAS_FILTER_PGPROT.
Link: https://lkml.kernel.org/r/20220414062125.609297-7-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2160210
commit c5d8a3643d91be748d7ff12eedc5876f32cc8283
Author: Miaohe Lin <linmiaohe@huawei.com>
Date: Thu Apr 28 23:16:12 2022 -0700
mm/mmap.c: use helper mlock_future_check()
Use helper mlock_future_check() to check whether it's safe to enlarge the
locked_vm to simplify the code. Minor readability improvement.
Link: https://lkml.kernel.org/r/20220402032231.64974-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2160210
commit 325bca1fe0b1bb9f535e69bb9ec48d4a6e0ca3ce
Author: Rolf Eike Beer <eb@emlix.com>
Date: Thu Apr 28 23:16:11 2022 -0700
mm/mmap.c: use mmap_assert_write_locked() instead of open coding it
In case the lock is actually not held at this point.
Link: https://lkml.kernel.org/r/5827758.TJ1SttVevJ@mobilepool36.emlix.com
Signed-off-by: Rolf Eike Beer <eb@emlix.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
commit f96f7a40874d7c746680c0b9f57cef2262ae551f
Author: David Hildenbrand <david@redhat.com>
Date: Thu Aug 11 12:34:34 2022 +0200
mm/hugetlb: fix hugetlb not supporting softdirty tracking
Patch series "mm/hugetlb: fix write-fault handling for shared mappings", v2.
I observed that hugetlb does not support/expect write-faults in shared
mappings that would have to map the R/O-mapped page writable -- and I
found two case where we could currently get such faults and would
erroneously map an anon page into a shared mapping.
Reproducers part of the patches.
I propose to backport both fixes to stable trees. The first fix needs a
small adjustment.
This patch (of 2):
Staring at hugetlb_wp(), one might wonder where all the logic for shared
mappings is when stumbling over a write-protected page in a shared
mapping. In fact, there is none, and so far we thought we could get away
with that because e.g., mprotect() should always do the right thing and
map all pages directly writable.
Looks like we were wrong:
--------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <sys/mman.h>
#define HUGETLB_SIZE (2 * 1024 * 1024u)
static void clear_softdirty(void)
{
int fd = open("/proc/self/clear_refs", O_WRONLY);
const char *ctrl = "4";
int ret;
if (fd < 0) {
fprintf(stderr, "open(clear_refs) failed\n");
exit(1);
}
ret = write(fd, ctrl, strlen(ctrl));
if (ret != strlen(ctrl)) {
fprintf(stderr, "write(clear_refs) failed\n");
exit(1);
}
close(fd);
}
int main(int argc, char **argv)
{
char *map;
int fd;
fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT);
if (!fd) {
fprintf(stderr, "open() failed\n");
return -errno;
}
if (ftruncate(fd, HUGETLB_SIZE)) {
fprintf(stderr, "ftruncate() failed\n");
return -errno;
}
map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (map == MAP_FAILED) {
fprintf(stderr, "mmap() failed\n");
return -errno;
}
*map = 0;
if (mprotect(map, HUGETLB_SIZE, PROT_READ)) {
fprintf(stderr, "mmprotect() failed\n");
return -errno;
}
clear_softdirty();
if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) {
fprintf(stderr, "mmprotect() failed\n");
return -errno;
}
*map = 0;
return 0;
}
--------------------------------------------------------------------------
Above test fails with SIGBUS when there is only a single free hugetlb page.
# echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# ./test
Bus error (core dumped)
And worse, with sufficient free hugetlb pages it will map an anonymous page
into a shared mapping, for example, messing up accounting during unmap
and breaking MAP_SHARED semantics:
# echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# ./test
# cat /proc/meminfo | grep HugePages_
HugePages_Total: 2
HugePages_Free: 1
HugePages_Rsvd: 18446744073709551615
HugePages_Surp: 0
Reason in this particular case is that vma_wants_writenotify() will
return "true", removing VM_SHARED in vma_set_page_prot() to map pages
write-protected. Let's teach vma_wants_writenotify() that hugetlb does not
support softdirty tracking.
Link: https://lkml.kernel.org/r/20220811103435.188481-1-david@redhat.com
Link: https://lkml.kernel.org/r/20220811103435.188481-2-david@redhat.com
Fixes: 64e455079e ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Jamie Liu <jamieliu@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org> [3.18+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2089498
Signed-off-by: Nico Pache <npache@redhat.com>