Commit Graph

619 Commits

Author SHA1 Message Date
Audra Mitchell c9d5756843 memory: move hotplug memory notifier priority to same file for easy sorting
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit 1eeaa4fd39b0b1b3e986f8eab6978e69b01e3c5e
Author: Liu Shixin <liushixin2@huawei.com>
Date:   Fri Sep 23 11:33:47 2022 +0800

    memory: move hotplug memory notifier priority to same file for easy sorting

    The priority of hotplug memory callback is defined in a different file.
    And there are some callers using numbers directly.  Collect them together
    into include/linux/memory.h for easy reading.  This allows us to sort
    their priorities more intuitively without additional comments.

    Link: https://lkml.kernel.org/r/20220923033347.3935160-9-liushixin2@huawei.com
    Signed-off-by: Liu Shixin <liushixin2@huawei.com>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: Waiman Long <longman@redhat.com>
    Cc: zefan li <lizefan.x@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:42:51 -04:00
Audra Mitchell 74d4b8f72a mm/mmap: use hotplug_memory_notifier() directly
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit cddb8d09ff1e477de8236a061a5017b21bab3c14
Author: Liu Shixin <liushixin2@huawei.com>
Date:   Fri Sep 23 11:33:43 2022 +0800

    mm/mmap: use hotplug_memory_notifier() directly

    Commit 76ae847497bc52 ("Documentation: raise minimum supported version of
    GCC to 5.1") updated the minimum gcc version to 5.1.  So the problem
    mentioned in f02c696800 ("include/linux/memory.h: implement
    register_hotmemory_notifier()") no longer exists.  So we can now switch to
    using hotplug_memory_notifier() directly rather than
    register_hotmemory_notifier().

    Link: https://lkml.kernel.org/r/20220923033347.3935160-5-liushixin2@huawei.com
    Signed-off-by: Liu Shixin <liushixin2@huawei.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: Waiman Long <longman@redhat.com>
    Cc: zefan li <lizefan.x@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:42:51 -04:00
Chris von Recklinghausen 41c8c0ebba mmap: fix do_brk_flags() modifying obviously incorrect VMAs
JIRA: https://issues.redhat.com/browse/RHEL-27736

commit 6c28ca6485ddd7c5da171e479e3ebfbe661efc4d
Author: Liam Howlett <liam.howlett@oracle.com>
Date:   Mon Dec 5 19:23:17 2022 +0000

    mmap: fix do_brk_flags() modifying obviously incorrect VMAs

    Add more sanity checks to the VMA that do_brk_flags() will expand.  Ensure
    the VMA matches basic merge requirements within the function before
    calling can_vma_merge_after().

    Drop the duplicate checks from vm_brk_flags() since they will be enforced
    later.

    The old code would expand file VMAs on brk(), which is functionally
    wrong and also dangerous in terms of locking because the brk() path
    isn't designed for file VMAs and therefore doesn't lock the file
    mapping.  Checking can_vma_merge_after() ensures that new anonymous
    VMAs can't be merged into file VMAs.

    See https://lore.kernel.org/linux-mm/CAG48ez1tJZTOjS_FjRZhvtDA-STFmdw8PEizPDwMGFd_ui0Nrw@mail.gmail.com/

    Link: https://lkml.kernel.org/r/20221205192304.1957418-1-Liam.Howlett@oracle.com
    Fixes: 2e7ce7d354f2 ("mm/mmap: change do_brk_flags() to expand existing VMA and add do_brk_munmap()")
    Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
    Suggested-by: Jann Horn <jannh@google.com>
    Cc: Jason A. Donenfeld <Jason@zx2c4.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Yu Zhao <yuzhao@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:20:07 -04:00
Chris von Recklinghausen 2e800149c5 mm: do not BUG_ON missing brk mapping, because userspace can unmap it
JIRA: https://issues.redhat.com/browse/RHEL-27736

commit f5ad5083404bb56c9de777dccb68c6672ef6487e
Author: Jason A. Donenfeld <Jason@zx2c4.com>
Date:   Fri Dec 2 17:27:24 2022 +0100

    mm: do not BUG_ON missing brk mapping, because userspace can unmap it

    The following program will trigger the BUG_ON that this patch removes,
    because the user can munmap() mm->brk:

      #include <sys/syscall.h>
      #include <sys/mman.h>
      #include <assert.h>
      #include <unistd.h>

      static void *brk_now(void)
      {
        return (void *)syscall(SYS_brk, 0);
      }

      static void brk_set(void *b)
      {
        assert(syscall(SYS_brk, b) != -1);
      }

      int main(int argc, char *argv[])
      {
        void *b = brk_now();
        brk_set(b + 4096);
        assert(munmap(b - 4096, 4096 * 2) == 0);
        brk_set(b);
        return 0;
      }

    Compile that with musl, since glibc actually uses brk(), and then
    execute it, and it'll hit this splat:

      kernel BUG at mm/mmap.c:229!
      invalid opcode: 0000 [#1] PREEMPT SMP
      CPU: 12 PID: 1379 Comm: a.out Tainted: G S   U             6.1.0-rc7+ #419
      RIP: 0010:__do_sys_brk+0x2fc/0x340
      Code: 00 00 4c 89 ef e8 04 d3 fe ff eb 9a be 01 00 00 00 4c 89 ff e8 35 e0 fe ff e9 6e ff ff ff 4d 89 a7 20>
      RSP: 0018:ffff888140bc7eb0 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 00000000007e7000 RCX: ffff8881020fe000
      RDX: ffff8881020fe001 RSI: ffff8881955c9b00 RDI: ffff8881955c9b08
      RBP: 0000000000000000 R08: ffff8881955c9b00 R09: 00007ffc77844000
      R10: 0000000000000000 R11: 0000000000000001 R12: 00000000007e8000
      R13: 00000000007e8000 R14: 00000000007e7000 R15: ffff8881020fe000
      FS:  0000000000604298(0000) GS:ffff88901f700000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000603fe0 CR3: 000000015ba9a005 CR4: 0000000000770ee0
      PKRU: 55555554
      Call Trace:
       <TASK>
       do_syscall_64+0x2b/0x50
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x400678
      Code: 10 4c 8d 41 08 4c 89 44 24 10 4c 8b 01 8b 4c 24 08 83 f9 2f 77 0a 4c 8d 4c 24 20 4c 01 c9 eb 05 48 8b>
      RSP: 002b:00007ffc77863890 EFLAGS: 00000212 ORIG_RAX: 000000000000000c
      RAX: ffffffffffffffda RBX: 000000000040031b RCX: 0000000000400678
      RDX: 00000000004006a1 RSI: 00000000007e6000 RDI: 00000000007e7000
      RBP: 00007ffc77863900 R08: 0000000000000000 R09: 00000000007e6000
      R10: 00007ffc77863930 R11: 0000000000000212 R12: 00007ffc77863978
      R13: 00007ffc77863988 R14: 0000000000000000 R15: 0000000000000000
       </TASK>

    Instead, just return the old brk value if the original mapping has been
    removed.

    [akpm@linux-foundation.org: fix changelog, per Liam]
    Link: https://lkml.kernel.org/r/20221202162724.2009-1-Jason@zx2c4.com
    Fixes: 2e7ce7d354f2 ("mm/mmap: change do_brk_flags() to expand existing VMA and add do_brk_munmap()")
    Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
    Reviewed-by: SeongJae Park <sj@kernel.org>
    Cc: Yu Zhao <yuzhao@google.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Howells <dhowells@redhat.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Will Deacon <will@kernel.org>
    Cc: Jann Horn <jannh@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:20:07 -04:00
Chris von Recklinghausen 65e2538817 mm: mmap: fix documentation for vma_mas_szero
JIRA: https://issues.redhat.com/browse/RHEL-27736

commit 4a42344081ff7fbb890c0741e11d22cd7f658894
Author: Ian Cowan <ian@linux.cowan.aero>
Date:   Sun Nov 13 19:33:49 2022 -0500

    mm: mmap: fix documentation for vma_mas_szero

    When the struct_mm input, mm, was changed to a struct ma_state, mas, the
    documentation for the function was never updated.  This updates that
    documentation reference.

    Link: https://lkml.kernel.org/r/20221114003349.41235-1-ian@linux.cowan.aero
    Signed-off-by: Ian Cowan <ian@linux.cowan.aero>
    Acked-by: David Hildenbrand <david@redhat.com>
    Cc: Liam Howlett <liam.howlett@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:20:04 -04:00
Chris von Recklinghausen 28ad3f239c mm/mmap: fix memory leak in mmap_region()
JIRA: https://issues.redhat.com/browse/RHEL-27736

commit cc674ab3c0188002917c8a2c28e4424131f1fd7e
Author: Li Zetao <lizetao1@huawei.com>
Date:   Fri Oct 28 15:37:17 2022 +0800

    mm/mmap: fix memory leak in mmap_region()

    There is a memory leak reported by kmemleak:

      unreferenced object 0xffff88817231ce40 (size 224):
        comm "mount.cifs", pid 19308, jiffies 4295917571 (age 405.880s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          60 c0 b2 00 81 88 ff ff 98 83 01 42 81 88 ff ff  `..........B....
        backtrace:
          [<ffffffff81936171>] __alloc_file+0x21/0x250
          [<ffffffff81937051>] alloc_empty_file+0x41/0xf0
          [<ffffffff81937159>] alloc_file+0x59/0x710
          [<ffffffff81937964>] alloc_file_pseudo+0x154/0x210
          [<ffffffff81741dbf>] __shmem_file_setup+0xff/0x2a0
          [<ffffffff817502cd>] shmem_zero_setup+0x8d/0x160
          [<ffffffff817cc1d5>] mmap_region+0x1075/0x19d0
          [<ffffffff817cd257>] do_mmap+0x727/0x1110
          [<ffffffff817518b2>] vm_mmap_pgoff+0x112/0x1e0
          [<ffffffff83adf955>] do_syscall_64+0x35/0x80
          [<ffffffff83c0006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

    The root cause was traced to an error handling path in mmap_region() when
    arch_validate_flags() or mas_preallocate() fails.  In the shared anonymous
    mapping case, the vma will be set up and mapped with a new shared anonymous
    file via shmem_zero_setup().  So in this case, the file resource needs to
    be released.

    Fix it by calling fput(vma->vm_file) and unmap_region() when
    arch_validate_flags() or mas_preallocate() returns an error in the shared
    anonymous mapping case.

    Link: https://lkml.kernel.org/r/20221028073717.1179380-1-lizetao1@huawei.com
    Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree")
    Fixes: c462ac288f ("mm: Introduce arch_validate_flags()")
    Signed-off-by: Li Zetao <lizetao1@huawei.com>
    Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:20:04 -04:00
Chris von Recklinghausen f67b71bff9 mmap: fix remap_file_pages() regression
JIRA: https://issues.redhat.com/browse/RHEL-27736

commit 1db43d3f3733351849ddca4b573c037c7821bfd8
Author: Liam Howlett <liam.howlett@oracle.com>
Date:   Tue Oct 25 16:12:49 2022 +0000

    mmap: fix remap_file_pages() regression

    When using the VMA iterator, the final execution will set the variable
    'next' to NULL which causes the function to fail out.  Restoring the break
    in the loop, so that the VMA iterator exits early without setting 'next'
    to NULL, fixes the issue.

    Link: https://lore.kernel.org/lkml/29344.1666681759@jrobl/
    Link: https://lkml.kernel.org/r/20221025161222.2634030-1-Liam.Howlett@oracle.com
    Fixes: 763ecb035029 (mm: remove the vma linked list)
    Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
    Reported-by: "J. R. Okajima" <hooanon05g@gmail.com>
    Tested-by: "J. R. Okajima" <hooanon05g@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:20:03 -04:00
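
The regression described in the commit above follows a common iterator pitfall: a loop driven by an iterator leaves its cursor NULL once the range is exhausted, so code that inspects the cursor after the loop needs an early break. The following is a minimal userspace sketch of that pattern only. The `struct region` type and the `iter_next()` and `find_covering()` helpers are hypothetical and are not the kernel's VMA iterator API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: a half-open [start, end) range. */
struct region {
    unsigned long start;
    unsigned long end;
};

/* Hypothetical iterator: returns the next element, or NULL when exhausted. */
static struct region *iter_next(struct region *arr, size_t n, size_t *pos)
{
    return (*pos < n) ? &arr[(*pos)++] : NULL;
}

/*
 * Walk the regions until one reaches 'end'.
 * With the break, 'next' still points at that region after the loop.
 * Without it, the loop runs until iter_next() returns NULL, so 'next'
 * is NULL afterwards and the caller sees a failure.
 */
static struct region *find_covering(struct region *arr, size_t n,
                                    unsigned long end, int use_break)
{
    size_t pos = 0;
    struct region *next;

    assert(arr != NULL);
    while ((next = iter_next(arr, n, &pos)) != NULL) {
        if (use_break && next->end >= end)
            break;      /* exit early and keep 'next' valid */
    }
    return next;        /* NULL if the iterator ran to completion */
}
```

Calling `find_covering()` with `use_break` set to 0 models the regressed behaviour: the cursor is always NULL after the loop, even when a matching region exists.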
Chris von Recklinghausen 2cc45e24fc mm/mmap: fix MAP_FIXED address return on VMA merge
JIRA: https://issues.redhat.com/browse/RHEL-27736

commit a57b70519d1f7c53be98478623652738e5ac70d5
Author: Liam Howlett <liam.howlett@oracle.com>
Date:   Tue Oct 18 19:17:12 2022 +0000

    mm/mmap: fix MAP_FIXED address return on VMA merge

    mmap should return the start address of newly mapped area when successful.
    On a successful merge of a VMA, the return address was changed and thus
    was violating that expectation from userspace.

    This is a restoration of functionality provided by 309d08d9b3
    (mm/mmap.c: fix mmap return value when vma is merged after call_mmap()).
    For completeness of fixing MAP_FIXED, implement the comments from the
    previous discussion to never update the address and fail if the address
    changes.  The error is left as a WARN_ON() to avoid crashing the kernel.

    Link: https://lkml.kernel.org/r/20221018191613.4133459-1-Liam.Howlett@oracle.com
    Link: https://lore.kernel.org/all/Y06yk66SKxlrwwfb@lakrids/
    Link: https://lore.kernel.org/all/20201203085350.22624-1-liuzixian4@huawei.com/
    Fixes: 4dd1b84140c1 ("mm/mmap: use advanced maple tree API for mmap_region()")
    Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
    Reported-by: Mark Rutland <mark.rutland@arm.com>
    Cc: Liu Zixian <liuzixian4@huawei.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Jason Gunthorpe <jgg@nvidia.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:20:02 -04:00
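
The userspace expectation stated in the commit above can be checked with a small program. This is a hedged sketch for Linux, not the reproducer from the original report: it uses anonymous memory rather than a driver-backed file, and the function name `map_fixed_returns_requested()` is made up for illustration. It maps over the second page of an existing two-page mapping with identical protection and flags, a situation in which the kernel may merge VMAs, and compares the returned address with the requested one.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE     /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if mmap(MAP_FIXED) hands back exactly the requested address,
 * 0 if it returns a different address, and -1 if a mapping call fails.
 */
static int map_fixed_returns_requested(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = (size_t)page;

    /* Reserve two pages so the second one is safe to map over. */
    char *base = mmap(NULL, 2 * len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    /*
     * Replace the second page. Whether or not the kernel merges the new
     * mapping with its neighbour, the return value must be 'want'.
     */
    void *want = base + len;
    void *got = mmap(want, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

    munmap(base, 2 * len);
    if (got == MAP_FAILED)
        return -1;
    return got == want;
}
```

A return value of 0 would indicate the kind of violation the commit describes, where the address returned to userspace differs from the one passed with MAP_FIXED.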
Chris von Recklinghausen ee3e4872dd mm/mmap.c: __vma_adjust(): suppress uninitialized var warning
JIRA: https://issues.redhat.com/browse/RHEL-27736

commit 1cd916d0340d0f45b151599c24ec40b5b2fd8e4a
Author: Andrew Morton <akpm@linux-foundation.org>
Date:   Tue Oct 18 13:57:37 2022 -0700

    mm/mmap.c: __vma_adjust(): suppress uninitialized var warning

    The code is OK, but it fools gcc.

    mm/mmap.c:802 __vma_adjust() error: uninitialized symbol 'next_next'.

    Fixes: 524e00b36e8c5 ("mm: remove rb tree.")
    Reported-by: kernel test robot <lkp@intel.com>
    Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:20:02 -04:00
Chris von Recklinghausen 56aa2a93bd mm/mmap: undo ->mmap() when mas_preallocate() fails
JIRA: https://issues.redhat.com/browse/RHEL-27736

commit 5789151e48acc3fd34d2109bf2021dc4df5e33e9
Author: Mike Kravetz <mike.kravetz@oracle.com>
Date:   Mon Oct 17 19:49:45 2022 -0700

    mm/mmap: undo ->mmap() when mas_preallocate() fails

    A memory leak in hugetlb_reserve_pages was reported in [1].  The root
    cause was traced to an error path in mmap_region when mas_preallocate()
    fails.  In this case, the vma is freed after a successful call to
    filesystem specific mmap.  The hugetlbfs mmap routine may allocate data
    structures pointed to by m_private_data.  These need to be cleaned up by
    the hugetlb vm_ops->close() routine.

    The same issue was addressed by commit deb0f6562884 ("mm/mmap: undo
    ->mmap() when arch_validate_flags() fails") for the arch_validate_flags()
    test.  Go to the same close_and_free_vma label if mas_preallocate() fails.

    [1] https://lore.kernel.org/linux-mm/CAKXUXMxf7OiCwbxib7MwfR4M1b5+b3cNTU7n5NV9Zm4967=FPQ@mail.gmail.com/

    Link: https://lkml.kernel.org/r/20221018024945.415036-1-mike.kravetz@oracle.com
    Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree")
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
    Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
    Cc: Andrii Nakryiko <andrii@kernel.org>
    Cc: Carlos Llamas <cmllamas@google.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:20:02 -04:00
Chris von Recklinghausen c9c38b2760 mm/mmap: undo ->mmap() when arch_validate_flags() fails
JIRA: https://issues.redhat.com/browse/RHEL-27736

commit deb0f6562884b5b4beb883d73e66a7d3a1b96d99
Author: Carlos Llamas <cmllamas@google.com>
Date:   Fri Sep 30 00:38:43 2022 +0000

    mm/mmap: undo ->mmap() when arch_validate_flags() fails

    Commit c462ac288f ("mm: Introduce arch_validate_flags()") added a late
    check in mmap_region() to let architectures validate vm_flags.  The check
    needs to happen after calling ->mmap() as the flags can potentially be
    modified during this callback.

    If the arch_validate_flags() check fails we unmap and free the vma.  However,
    the error path fails to undo the ->mmap() call that previously succeeded
    and depending on the specific ->mmap() implementation this translates to
    reference increments, memory allocations and other operations that will
    not be cleaned up.

    There are several places (mainly device drivers) where this is an issue.
    However, one specific example is bpf_map_mmap() which keeps count of the
    mappings in map->writecnt.  The count is incremented on ->mmap() and then
    decremented on vm_ops->close().  When arch_validate_flags() fails this
    count is off since bpf_map_mmap_close() is never called.

    One can reproduce this issue on arm64 devices with MTE support.  Here the
    vm_flags are checked to only allow VM_MTE if VM_MTE_ALLOWED has been set
    previously.  From userspace it is then enough to pass the PROT_MTE flag to
    the mmap() syscall to trigger the arch_validate_flags() failure.

    The following program reproduces this issue:

      #include <stdio.h>
      #include <unistd.h>
      #include <linux/unistd.h>
      #include <linux/bpf.h>
      #include <sys/mman.h>

      int main(void)
      {
            union bpf_attr attr = {
                    .map_type = BPF_MAP_TYPE_ARRAY,
                    .key_size = sizeof(int),
                    .value_size = sizeof(long long),
                    .max_entries = 256,
                    .map_flags = BPF_F_MMAPABLE,
            };
            int fd;

            fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
            mmap(NULL, 4096, PROT_WRITE | PROT_MTE, MAP_SHARED, fd, 0);

            return 0;
      }

    By manually adding some log statements to the vm_ops callbacks we can
    confirm that when passing PROT_MTE to mmap() the map->writecnt is off upon
    ->release():

    With PROT_MTE flag:
      root@debian:~# ./bpf-test
      [  111.263874] bpf_map_write_active_inc: map=9 writecnt=1
      [  111.288763] bpf_map_release: map=9 writecnt=1

    Without PROT_MTE flag:
      root@debian:~# ./bpf-test
      [  157.816912] bpf_map_write_active_inc: map=10 writecnt=1
      [  157.830442] bpf_map_write_active_dec: map=10 writecnt=0
      [  157.832396] bpf_map_release: map=10 writecnt=0

    This patch fixes the above issue by calling vm_ops->close() when the
    arch_validate_flags() check fails, after this we can proceed to unmap and
    free the vma on the error path.

    Link: https://lkml.kernel.org/r/20220930003844.1210987-1-cmllamas@google.com
    Fixes: c462ac288f ("mm: Introduce arch_validate_flags()")
    Signed-off-by: Carlos Llamas <cmllamas@google.com>
    Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Reviewed-by: Liam Howlett <liam.howlett@oracle.com>
    Cc: Christian Brauner (Microsoft) <brauner@kernel.org>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Suren Baghdasaryan <surenb@google.com>
    Cc: <stable@vger.kernel.org>    [5.10+]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:20:01 -04:00
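
The reference-count imbalance described in the commit above can be modelled outside the kernel. The sketch below is a toy under stated assumptions: `struct toy_map`, `toy_mmap()`, `toy_close()` and `toy_map_region()` are hypothetical names, the `flags_valid` parameter stands in for the late arch_validate_flags() check, and `undo_on_error` switches between the old and the fixed error path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a driver object that counts live mappings. */
struct toy_map {
    int writecnt;
};

/* Models ->mmap(): takes a reference on success. */
static int toy_mmap(struct toy_map *m)
{
    m->writecnt++;
    return 0;
}

/* Models vm_ops->close(): drops the reference taken by toy_mmap(). */
static void toy_close(struct toy_map *m)
{
    assert(m->writecnt > 0);
    m->writecnt--;
}

/*
 * Models the tail of a mapping routine: call ->mmap(), then run a late
 * validation step that can still fail.
 */
static int toy_map_region(struct toy_map *m, bool flags_valid,
                          bool undo_on_error)
{
    int error = toy_mmap(m);

    if (error)
        return error;

    if (!flags_valid) {
        error = -1;
        goto close_and_free;
    }
    return 0;

close_and_free:
    if (undo_on_error)
        toy_close(m);   /* the fix: undo what ->mmap() did */
    /* unmapping and freeing the vma would follow here */
    return error;
}
```

With `undo_on_error` false the counter stays at 1 after a failed call, which mirrors the `writecnt=1` seen at release time in the log above. With it set to true the counter returns to 0.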
Chris von Recklinghausen 99fcc27d26 mm/mmap: preallocate maple nodes for brk vma expansion
Conflicts: mm/mmap.c - We already have
	54a611b60590 ("Maple Tree: add new data structure")
	so mas_preallocate doesn't have a vma argument

JIRA: https://issues.redhat.com/browse/RHEL-27736

commit 28c5609fb236807910ca347ad3e26c4567998526
Author: Liam Howlett <liam.howlett@oracle.com>
Date:   Tue Oct 11 16:08:37 2022 +0000

    mm/mmap: preallocate maple nodes for brk vma expansion

    If the brk VMA is the last vma in a maple node and meets the rare criteria
    that it can be expanded, then preallocation is necessary to avoid a
    potential fs_reclaim circular lock issue on low resources.

    At the same time use the actual vma start address (unaligned) when calling
    vma_adjust_trans_huge().

    Link: https://lkml.kernel.org/r/20221011160624.1253454-1-Liam.Howlett@oracle.com
    Fixes: 2e7ce7d354f2 (mm/mmap: change do_brk_flags() to expand existing VMA and add do_brk_munmap())
    Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
    Reported-by: Yu Zhao <yuzhao@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:20:01 -04:00
Chris von Recklinghausen 32ff8772ac mm: refactor of vma_merge()
JIRA: https://issues.redhat.com/browse/RHEL-27736

commit eef199440df950942b3c7ef2e2de507fd6ced031
Author: Jakub Matěna <matenajakub@gmail.com>
Date:   Fri Jun 3 16:57:18 2022 +0200

    mm: refactor of vma_merge()

    Patch series "Refactor of vma_merge and new merge call", v4.

    I am currently working on my master's thesis trying to increase number of
    merges of VMAs currently failing because of page offset incompatibility
    and difference in their anon_vmas.  The following refactor and added merge
    call included in this series is just two smaller upgrades I created along
    the way.

    This patch (of 2):

    Refactor vma_merge() to make it shorter and more understandable.  Main
    change is the elimination of code duplicity in the case of merge next
    check.  This is done by first doing checks and caching the results before
    executing the merge itself.  The variable 'area' is divided into 'mid' and
    'res' as previously it was used for two purposes, as the middle VMA
    between prev and next and also as the result of the merge itself.  Exit
    paths are also unified.

    Link: https://lkml.kernel.org/r/20220603145719.1012094-1-matenajakub@gmail.com
    Link: https://lkml.kernel.org/r/20220603145719.1012094-2-matenajakub@gmail.com
    Signed-off-by: Jakub Matěna <matenajakub@gmail.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Liam Howlett <liam.howlett@oracle.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:19:58 -04:00
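
The refactor described in the commit above separates "decide" from "act": the merge conditions are evaluated once, cached, and only then is a merge carried out. The following is a simplified userspace sketch of that structure. `struct toy_vma`, `enum merge_result` and `toy_merge()` are hypothetical, and the compatibility test is reduced to adjacency plus equal flags, which is far less than the real vma_merge() checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced VMA: a half-open range plus flags. */
struct toy_vma {
    unsigned long start, end;
    unsigned long flags;
};

enum merge_result { MERGE_NONE, MERGE_PREV, MERGE_NEXT, MERGE_BOTH };

/*
 * Try to merge the new range [start, end) with its neighbours.
 * Either neighbour may be NULL.
 */
static enum merge_result toy_merge(struct toy_vma *prev, struct toy_vma *next,
                                   unsigned long start, unsigned long end,
                                   unsigned long flags)
{
    assert(start < end);

    /* Phase 1: run each check once and cache the result. */
    bool merge_prev = prev && prev->end == start && prev->flags == flags;
    bool merge_next = next && next->start == end && next->flags == flags;

    /* Phase 2: act on the cached results. */
    if (merge_prev && merge_next) {
        prev->end = next->end;  /* prev absorbs the new range and next; */
        return MERGE_BOTH;      /* the caller would then drop next      */
    }
    if (merge_prev) {
        prev->end = end;
        return MERGE_PREV;
    }
    if (merge_next) {
        next->start = start;
        return MERGE_NEXT;
    }
    return MERGE_NONE;
}
```

Because the checks run before any state changes, the "merge with next" test appears once rather than being repeated in each branch, which is the duplication the commit message says it removes.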
Chris von Recklinghausen 1dabfb38a4 mm/mmap.c: pass in mapping to __vma_link_file()
JIRA: https://issues.redhat.com/browse/RHEL-27736

commit c154124fe925a451e471233aa7d1ab9a91f0a5ad
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date:   Tue Sep 6 19:49:06 2022 +0000

    mm/mmap.c: pass in mapping to __vma_link_file()

    __vma_link_file() resolves the mapping from the file, if there is one.
    Pass through the mapping and check the vm_file externally since most
    places already have the required information and check of vm_file.

    Link: https://lkml.kernel.org/r/20220906194824.2110408-71-Liam.Howlett@oracle.com
    Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
    Tested-by: Yu Zhao <yuzhao@google.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Howells <dhowells@redhat.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:19:58 -04:00
Chris von Recklinghausen 819fcca42d mm/mmap: drop range_has_overlap() function
JIRA: https://issues.redhat.com/browse/RHEL-27736

commit d0601a500c35856f9c134126b2423c9cfc86c701
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date:   Tue Sep 6 19:49:06 2022 +0000

    mm/mmap: drop range_has_overlap() function

    Since there is no longer a linked list, the range_has_overlap() function
    is identical to the find_vma_intersection() function.

    Link: https://lkml.kernel.org/r/20220906194824.2110408-70-Liam.Howlett@oracle.com
    Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Tested-by: Yu Zhao <yuzhao@google.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Howells <dhowells@redhat.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:19:58 -04:00
Chris von Recklinghausen 26437a89ef mm: remove the vma linked list
Conflicts:
	include/linux/mm.h - We already have
		21b85b09527c ("madvise: use zap_page_range_single for madvise dontneed")
		so keep declaration for zap_page_range_single
	kernel/fork.c - We already have
		f1a7941243c1 ("mm: convert mm's rss stats into percpu_counter")
		so keep declaration of i
	mm/mmap.c - We already have
		a1e8cb93bf ("mm: drop oom code from exit_mmap")
		and
		db3644c677 ("mm: delete unused MMF_OOM_VICTIM flag")
		so keep setting MMF_OOM_SKIP in mm->flags

JIRA: https://issues.redhat.com/browse/RHEL-27736

commit 763ecb035029f500d7e6dc99acd1ad299b7726a1
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date:   Tue Sep 6 19:49:06 2022 +0000

    mm: remove the vma linked list

    Replace any vm_next use with vma_find().

    Update free_pgtables(), unmap_vmas(), and zap_page_range() to use the
    maple tree.

    Use the new free_pgtables() and unmap_vmas() in do_mas_align_munmap().  At
    the same time, alter the loop to be more compact.

    Now that free_pgtables() and unmap_vmas() take a maple tree as an
    argument, rearrange do_mas_align_munmap() to use the new tree to hold the
    vmas to remove.

    Remove __vma_link_list() and __vma_unlink_list() as they are exclusively
    used to update the linked list.

    Drop linked list update from __insert_vm_struct().

    Rework validation of tree as it was depending on the linked list.

    [yang.lee@linux.alibaba.com: fix one kernel-doc comment]
      Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=1949
      Link: https://lkml.kernel.org/r/20220824021918.94116-1-yang.lee@linux.alibaba.com
    Link: https://lkml.kernel.org/r/20220906194824.2110408-69-Liam.Howlett@oracle.com
    Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
    Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
    Tested-by: Yu Zhao <yuzhao@google.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Howells <dhowells@redhat.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:19:57 -04:00
Chris von Recklinghausen b0ae9352e7 userfaultfd: use maple tree iterator to iterate VMAs
Conflicts: We already have
	51d3d5eb74ff ("mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA")
	so call userfaultfd_set_vm_flags instead of setting
	vma->vm_flags directly. Also add the missing close brace
	mentioned in the backport of 51d3d5eb74ff's Conflicts section.

JIRA: https://issues.redhat.com/browse/RHEL-27736

commit 69dbe6daf1041e32e003f966d71f70f20c63af53
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date:   Tue Sep 6 19:48:57 2022 +0000

    userfaultfd: use maple tree iterator to iterate VMAs

    Don't use the mm_struct linked list or the vma->vm_next in prep for
    removal.

    Link: https://lkml.kernel.org/r/20220906194824.2110408-45-Liam.Howlett@oracle.com
    Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
    Tested-by: Yu Zhao <yuzhao@google.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Howells <dhowells@redhat.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:19:51 -04:00
Chris von Recklinghausen 4576fd9615 mm/mmap: change do_brk_munmap() to use do_mas_align_munmap()
JIRA: https://issues.redhat.com/browse/RHEL-27736

commit 67e7c16764c3cbf84a57d441fba3474217ac08d6
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date:   Tue Sep 6 19:48:52 2022 +0000

    mm/mmap: change do_brk_munmap() to use do_mas_align_munmap()

    do_brk_munmap() has already aligned the address and has a maple tree state
    to be used.  Use the new do_mas_align_munmap() to avoid unnecessary
    alignment and error checks.

    Link: https://lkml.kernel.org/r/20220906194824.2110408-30-Liam.Howlett@oracle.com
    Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
    Tested-by: Yu Zhao <yuzhao@google.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Howells <dhowells@redhat.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:19:47 -04:00
Chris von Recklinghausen 72ad80b0bf mm/mmap: reorganize munmap to use maple states
Conflicts: The backport of
	54a611b60590 ("Maple Tree: add new data structure")
	removed the vma argument of mas_preallocate. This causes a merge
	conflict with code being removed as well as the new
	mas_preallocate call this patch adds

JIRA: https://issues.redhat.com/browse/RHEL-27736

commit 11f9a21ab65542189372b7d64bb2d2937dfdc9dc
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date:   Tue Sep 6 19:48:52 2022 +0000

    mm/mmap: reorganize munmap to use maple states

    Remove __do_munmap() in favour of do_munmap(), do_mas_munmap(), and
    do_mas_align_munmap().

    do_munmap() is a wrapper to create a maple state for any callers that have
    not been converted to the maple tree.

    do_mas_munmap() takes a maple state to munmap a range.  This is just a
    small function which checks for error conditions and aligns the end of the
    range.

    do_mas_align_munmap() uses the aligned range to munmap a range.
    do_mas_align_munmap() starts with the first VMA in the range, then finds
    the last VMA in the range.  Both start and end are split if necessary.
    Then the VMAs are removed from the linked list and the mm mlock count is
    updated at the same time.  This is followed by a single tree operation
    that overwrites the area with a NULL.  Finally, the detached list is
    unmapped and freed.

    By reorganizing the munmap calls as outlined, it is now possible to avoid
    the extra work of aligning pre-aligned callers which are known to be safe,
    and to avoid extra VMA lookups or tree walks for modifications.

    detach_vmas_to_be_unmapped() is no longer used, so drop this code.

    vm_brk_flags() can just call the do_mas_munmap() as it checks for
    intersecting VMAs directly.

    Link: https://lkml.kernel.org/r/20220906194824.2110408-29-Liam.Howlett@oracle.com
    Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
    Tested-by: Yu Zhao <yuzhao@google.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Howells <dhowells@redhat.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:19:47 -04:00
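
The commit above describes do_mas_munmap() as a small function that checks for error conditions and aligns the end of the range. The following is only a userspace sketch of that kind of check, not the kernel code: the names model_munmap_range() and model_page_align(), the 4 KiB page size, and the 47-bit address limit are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel's page and task sizes. */
#define MODEL_PAGE_SIZE 4096ULL
#define MODEL_TASK_SIZE (1ULL << 47)

/* Round len up to the next multiple of the page size. */
static unsigned long long model_page_align(unsigned long long len)
{
    return (len + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1);
}

/*
 * Validate the range [start, start + len) and compute its page-aligned end.
 * Returns 0 and stores the aligned end in *end_out, or -EINVAL on a bad range.
 */
static int model_munmap_range(unsigned long long start, unsigned long long len,
                              unsigned long long *end_out)
{
    unsigned long long end;

    assert(end_out != NULL);

    /* The start address must already be page aligned. */
    if ((start & (MODEL_PAGE_SIZE - 1)) != 0)
        return -EINVAL;
    /* The range must fit below the address-space limit. */
    if (start > MODEL_TASK_SIZE || len > MODEL_TASK_SIZE - start)
        return -EINVAL;
    /* Only the end is aligned here; a zero-length range is rejected. */
    end = start + model_page_align(len);
    if (end == start)
        return -EINVAL;

    *end_out = end;
    return 0;
}
```

A caller that has already validated and aligned its range, as the commit says do_brk_munmap() has, would not need to repeat these checks.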
Chris von Recklinghausen 2f6e019c3d mm/mmap: move mmap_region() below do_munmap()
Conflicts: mm/mmap.c - Because of
	4dd1b84140c1 ("mm/mmap: use advanced maple tree API for mmap_region()")
	we get a merge conflict in mmap_region. Also because of 4dd1b84140c1
	the moved version of mmap_region also needs to remove the vma
	argument from mas_preallocate.

JIRA: https://issues.redhat.com/browse/RHEL-27736

commit e99668a56430a25a871113bcd3989ed20eae1cfc
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date:   Tue Sep 6 19:48:52 2022 +0000

    mm/mmap: move mmap_region() below do_munmap()

    Relocation of code for the next commit.  There should be no changes here.

    Link: https://lkml.kernel.org/r/20220906194824.2110408-28-Liam.Howlett@oracle.com
    Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
    Tested-by: Yu Zhao <yuzhao@google.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Howells <dhowells@redhat.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:19:46 -04:00
Chris von Recklinghausen eb370ae179 mm: remove vmacache
JIRA: https://issues.redhat.com/browse/RHEL-27736

commit 7964cf8caa4dfa42c4149f3833d3878713cda3dc
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date:   Tue Sep 6 19:48:51 2022 +0000

    mm: remove vmacache

    By using the maple tree and the maple tree state, the vmacache is no
    longer beneficial and is complicating the VMA code.  Remove the vmacache
    to reduce the work in keeping it up to date and code complexity.

    Link: https://lkml.kernel.org/r/20220906194824.2110408-26-Liam.Howlett@oracle.com
    Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Tested-by: Yu Zhao <yuzhao@google.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Howells <dhowells@redhat.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:19:46 -04:00
Chris von Recklinghausen c1a476da65 mm/mmap: use advanced maple tree API for mmap_region()
Conflicts: mm/mmap.c - We already have
	54a611b60590 ("Maple Tree: add new data structure")
	so mas_preallocate no longer takes a vma argument.

JIRA: https://issues.redhat.com/browse/RHEL-27736

commit 4dd1b84140c1b87a89d69a683bebbbdaeb620e39
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date:   Tue Sep 6 19:48:51 2022 +0000

    mm/mmap: use advanced maple tree API for mmap_region()

    Changing mmap_region() to use the maple tree state and the advanced maple
    tree interface allows for a lot less tree walking.

    This change removes the last caller of munmap_vma_range(), so drop this
    unused function.

    Add vma_expand() to expand a VMA if possible by doing the necessary
    hugepage check, uprobe_munmap of files, dcache flush, modifications then
    undoing the detaches, etc.

    Link: https://lkml.kernel.org/r/20220906194824.2110408-25-Liam.Howlett@oracle.com
    Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
    Tested-by: Yu Zhao <yuzhao@google.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Howells <dhowells@redhat.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:19:45 -04:00
Chris von Recklinghausen c0d1a47b97 mm: use maple tree operations for find_vma_intersection()
JIRA: https://issues.redhat.com/browse/RHEL-27736

commit abdba2dda0c477ca708a939b02f9b2e74666ed2d
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date:   Tue Sep 6 19:48:50 2022 +0000

    mm: use maple tree operations for find_vma_intersection()

    Move find_vma_intersection() to mmap.c and change implementation to maple
    tree.

    When searching for a vma within a range, it is easier to use the maple
    tree interface.

    Exported find_vma_intersection() for kvm module.

    Link: https://lkml.kernel.org/r/20220906194824.2110408-24-Liam.Howlett@oracle.com
    Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
    Tested-by: Yu Zhao <yuzhao@google.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Howells <dhowells@redhat.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:19:45 -04:00
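
To illustrate what searching for a vma within a range means in the commit above, here is a userspace sketch that returns the first area overlapping a half-open range. It uses a sorted array and a binary search in place of the maple tree. struct model_vma and model_find_intersection() are hypothetical names, and nothing here reflects the kernel's actual implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: the half-open range [start, end). */
struct model_vma {
    unsigned long start;
    unsigned long end;
};

/*
 * Return the first range in a sorted, non-overlapping array that
 * intersects [start, end), or NULL if no range does.
 */
static const struct model_vma *
model_find_intersection(const struct model_vma *vmas, size_t n,
                        unsigned long start, unsigned long end)
{
    size_t lo = 0, hi = n;

    assert(start < end);

    /* Binary search for the first range whose end lies above start. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (vmas[mid].end > start)
            hi = mid;
        else
            lo = mid + 1;
    }

    /* That range intersects only if it also begins before end. */
    if (lo < n && vmas[lo].start < end)
        return &vmas[lo];
    return NULL;
}
```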
Chris von Recklinghausen 0ff85d75fb mm/mmap: change do_brk_flags() to expand existing VMA and add do_brk_munmap()
JIRA: https://issues.redhat.com/browse/RHEL-27736

commit 2e7ce7d354f2fae4c9becb8af799cbedf4f71665
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date:   Tue Sep 6 19:48:50 2022 +0000

    mm/mmap: change do_brk_flags() to expand existing VMA and add do_brk_munmap()

    Avoid allocating a new VMA when a vma modification can occur.  When a
    brk() can expand or contract a VMA, then the single store operation will
    only modify one index of the maple tree instead of causing a node to split
    or coalesce.  This avoids unnecessary allocations/frees of maple tree
    nodes and VMAs.

    Move some limit & flag verifications out of the do_brk_flags() function to
    use only relevant checks in the code path of brk() and vm_brk_flags().

    Set the vma to check if it can expand in vm_brk_flags() if extra criteria
    are met.

    Drop userfaultfd from do_brk_flags() path and only use it in
    vm_brk_flags() path since that is the only place a munmap will happen.

    Add a wrapper for munmap for the brk case called do_brk_munmap().

    Link: https://lkml.kernel.org/r/20220906194824.2110408-23-Liam.Howlett@oracle.com
    Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
    Tested-by: Yu Zhao <yuzhao@google.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Howells <dhowells@redhat.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:19:45 -04:00
Chris von Recklinghausen 178f60b2ca mmap: change zeroing of maple tree in __vma_adjust()
JIRA: https://issues.redhat.com/browse/RHEL-27736

commit 3b0e81a1cdc9afbddb0543d08e38edb4e33c4baf
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date:   Tue Sep 6 19:48:49 2022 +0000

    mmap: change zeroing of maple tree in __vma_adjust()

    Only write to the maple tree if we are not inserting or the insert isn't
    going to overwrite the area to clear.  This avoids spanning writes and
    node coalescing when unnecessary.

    The change requires a custom search for the linked list addition to find
    the correct VMA for the prev link.

    Link: https://lkml.kernel.org/r/20220906194824.2110408-19-Liam.Howlett@oracle.com
    Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
    Tested-by: Yu Zhao <yuzhao@google.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Howells <dhowells@redhat.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:19:44 -04:00
Chris von Recklinghausen adeb5664bb mm: remove rb tree.
Conflicts: mm/mmap.c -
	We already have
	54a611b60590 ("Maple Tree: add new data structure")
	so mas_preallocate no longer takes a vma argument.
	We already have
	92b7399695a5 ("mmap: fix copy_vma() failure path")
	so keep check for new_vma->vm_file, the fput on it, and
	the unlink_anon_vmas call

JIRA: https://issues.redhat.com/browse/RHEL-27736

commit 524e00b36e8c547f5582eef3fb645a8d9fc5e3df
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date:   Tue Sep 6 19:48:48 2022 +0000

    mm: remove rb tree.

    Remove the RB tree and start using the maple tree for vm_area_struct
    tracking.

    Drop validate_mm() calls in expand_upwards() and expand_downwards() as the
    lock is not held.

    Link: https://lkml.kernel.org/r/20220906194824.2110408-18-Liam.Howlett@oracle.com
    Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
    Tested-by: Yu Zhao <yuzhao@google.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Howells <dhowells@redhat.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:19:44 -04:00
Chris von Recklinghausen 2a24708eb2 mm/mmap: use maple tree for unmapped_area{_topdown}
JIRA: https://issues.redhat.com/browse/RHEL-27736

commit 3499a13168da6a0c122c70f24e653b650d18c882
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date:   Tue Sep 6 19:48:47 2022 +0000

    mm/mmap: use maple tree for unmapped_area{_topdown}

    The maple tree code was added to find the unmapped area in a previous
    commit and was checked against what the rbtree returned, but the actual
    result was never used.  Start using the maple tree implementation and
    remove the rbtree code.

    Add kernel documentation comment for these functions.

    Link: https://lkml.kernel.org/r/20220906194824.2110408-14-Liam.Howlett@oracle.com
    Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
    Tested-by: Yu Zhao <yuzhao@google.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Howells <dhowells@redhat.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:19:43 -04:00
Chris von Recklinghausen 1f27f16a19 mm/mmap: use the maple tree for find_vma_prev() instead of the rbtree
JIRA: https://issues.redhat.com/browse/RHEL-27736

commit 7fdbd37da5c6ff002dc6d15e89a7708c2df4928e
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date:   Tue Sep 6 19:48:47 2022 +0000

    mm/mmap: use the maple tree for find_vma_prev() instead of the rbtree

    Use the maple tree's advanced API and a maple state to walk the tree for
    the entry at the address of the next vma, then use the maple state to walk
    back one entry to find the previous entry.

    Add kernel documentation comments for this API.

    Link: https://lkml.kernel.org/r/20220906194824.2110408-13-Liam.Howlett@oracle.com
    Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Tested-by: Yu Zhao <yuzhao@google.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: David Howells <dhowells@redhat.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:19:42 -04:00
Chris von Recklinghausen b9ee174385 mm/mmap: use the maple tree in find_vma() instead of the rbtree.
JIRA: https://issues.redhat.com/browse/RHEL-27736

commit be8432e7166ef8cc5647d6d350e73897d48a9659
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date:   Tue Sep 6 19:48:46 2022 +0000

    mm/mmap: use the maple tree in find_vma() instead of the rbtree.

    Using the maple tree interface mt_find() will handle the RCU locking and
    will start searching at the address up to the limit, ULONG_MAX in this
    case.

    Add kernel documentation to this API.

    Link: https://lkml.kernel.org/r/20220906194824.2110408-12-Liam.Howlett@oracle.com
    Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Tested-by: Yu Zhao <yuzhao@google.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: David Howells <dhowells@redhat.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:19:42 -04:00
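
The commit above says the search starts at the address and runs up to the limit. In practice a find_vma()-style lookup returns the first area that ends above the address, so the result need not contain the address itself. The sketch below models that behaviour in userspace with a linear scan over a sorted array. struct model_area and model_find_vma() are hypothetical names, and the maple tree and RCU locking are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: the half-open range [start, end). */
struct model_area {
    unsigned long start;
    unsigned long end;
};

/*
 * Return the first area, in ascending address order, whose end lies
 * above addr, or NULL if every area ends at or below addr.
 */
static const struct model_area *
model_find_vma(const struct model_area *areas, size_t n, unsigned long addr)
{
    assert(areas != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (areas[i].end > addr)
            return &areas[i];
    }
    return NULL;
}
```

A caller that needs an area actually containing the address still has to compare the address against the start of the returned area.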
Chris von Recklinghausen 69ac8bdd8a mmap: use the VMA iterator in count_vma_pages_range()
JIRA: https://issues.redhat.com/browse/RHEL-27736

commit 2e3af1db174423e0fb75c7887251f168d8401424
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Tue Sep 6 19:48:46 2022 +0000

    mmap: use the VMA iterator in count_vma_pages_range()

    This simplifies the implementation and is faster than using the linked
    list.

    Link: https://lkml.kernel.org/r/20220906194824.2110408-11-Liam.Howlett@oracle.com
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
    Tested-by: Yu Zhao <yuzhao@google.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: David Howells <dhowells@redhat.com>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:19:42 -04:00
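
As a rough illustration of what counting the pages of the VMAs in a range involves, the userspace sketch below clamps each area to the requested range and sums the overlap in pages. It walks a plain array rather than using the VMA iterator. struct model_map, model_count_pages(), and the 4 KiB page shift are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page shift (4 KiB pages). */
#define MODEL_PAGE_SHIFT 12

/* Hypothetical stand-in for a VMA: the half-open range [start, end). */
struct model_map {
    unsigned long start;
    unsigned long end;
};

/* Count the pages of every area that falls inside [addr, end). */
static unsigned long
model_count_pages(const struct model_map *maps, size_t n,
                  unsigned long addr, unsigned long end)
{
    unsigned long nr_pages = 0;

    assert(addr <= end);

    for (size_t i = 0; i < n; i++) {
        /* Clamp the area to the requested range. */
        unsigned long lo = maps[i].start > addr ? maps[i].start : addr;
        unsigned long hi = maps[i].end < end ? maps[i].end : end;

        if (lo < hi)
            nr_pages += (hi - lo) >> MODEL_PAGE_SHIFT;
    }
    return nr_pages;
}
```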
Chris von Recklinghausen 0833f676ec mm: add VMA iterator
JIRA: https://issues.redhat.com/browse/RHEL-27736

commit f39af05949a4280b9f04d5dd0f606b81aac3dae8
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Tue Sep 6 19:48:46 2022 +0000

    mm: add VMA iterator

    This thin layer of abstraction over the maple tree state is for iterating
    over VMAs.  You can go forwards, go backwards or ask where the iterator
    is.  Rename the existing vma_next() to __vma_next() -- it will be removed
    by the end of this series.

    Link: https://lkml.kernel.org/r/20220906194824.2110408-10-Liam.Howlett@oracle.com
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
    Tested-by: Yu Zhao <yuzhao@google.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: David Howells <dhowells@redhat.com>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:19:41 -04:00
Chris von Recklinghausen cde31a5d92 mm: start tracking VMAs with maple tree
JIRA: https://issues.redhat.com/browse/RHEL-27736

commit d4af56c5c7c6781ca6ca8075e2cf5bc119ed33d1
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date:   Tue Sep 6 19:48:45 2022 +0000

    mm: start tracking VMAs with maple tree

    Start tracking the VMAs with the new maple tree structure in parallel with
    the rb_tree.  Add debug and trace events for maple tree operations and
    duplicate the rb_tree that is created on forks into the maple tree.

    The maple tree is added to the mm_struct including the mm_init struct,
    added support in required mm/mmap functions, added tracking in kernel/fork
    for process forking, and used to find the unmapped_area and checked
    against what the rbtree finds.

    This also moves the mmap_lock() in exit_mmap() since the oom reaper call
    does walk the VMAs.  Otherwise lockdep will be unhappy if oom happens.

    When splitting a vma fails due to allocations of the maple tree nodes,
    the error path in __split_vma() calls new->vm_ops->close(new).  The page
    accounting for hugetlb is actually in the close() operation,  so it
    accounts for the removal of 1/2 of the VMA which was not adjusted.  This
    results in a negative exit value.  To avoid the negative charge, set
    vm_start = vm_end and vm_pgoff = 0.

    There is also a potential accounting issue in special mappings from
    insert_vm_struct() failing to allocate, so reverse the charge there in
    the failure scenario.

    Link: https://lkml.kernel.org/r/20220906194824.2110408-9-Liam.Howlett@oracle.com
    Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Tested-by: Yu Zhao <yuzhao@google.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Howells <dhowells@redhat.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: SeongJae Park <sj@kernel.org>
    Cc: Sven Schnelle <svens@linux.ibm.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-01 11:19:41 -04:00
Chris von Recklinghausen eebc80b8a1 Revert "mm: align larger anonymous mappings on THP boundaries"
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 0ba09b1733878afe838fe35c310715fda3d46428
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Sun Dec 4 12:51:59 2022 -0800

    Revert "mm: align larger anonymous mappings on THP boundaries"

    This reverts commit f35b5d7d676e59e401690b678cd3cfec5e785c23.

    It has been reported to cause huge performance regressions on some loads
    (will-it-scale.per_process_ops, but also building the kernel with
    clang).

    The commit did speed up gcc builds by a small amount, so it's not an
    unambiguous regression, but until the big regressions are understood,
    let's revert it.

    Reported-by: kernel test robot <yujie.liu@intel.com>
    Link: https://lore.kernel.org/r/202210181535.7144dd15-yujie.liu@intel.com
    Reported-by: Nathan Chancellor <nathan@kernel.org>
    Link: https://lore.kernel.org/lkml/Y1DNQaoPWxE%2BrGce@dev-arch.thelio-3990X/
    Cc: Huang, Ying <ying.huang@intel.com>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Yang Shi <shy828301@gmail.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:13 -04:00
Chris von Recklinghausen 0847d3f5c9 mmap: fix copy_vma() failure path
Conflicts: mm/mmap.c - We don't have
	524e00b36e8c ("mm: remove rb tree.")
	so don't add out_vma_link label or call new_vma's close method

JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 92b7399695a5cc961c44fc6e4624d3bc3c699ee7
Author: Liam Howlett <liam.howlett@oracle.com>
Date:   Tue Oct 11 20:36:51 2022 +0000

    mmap: fix copy_vma() failure path

    The anon vma was not unlinked and the file was not closed in the failure
    path when the machine runs out of memory during the maple tree
    modification.  This caused a memory leak of the anon vma chain and vma
    since neither would be freed.

    Link: https://lkml.kernel.org/r/20221011203621.1446507-1-Liam.Howlett@oracle.com
    Fixes: 524e00b36e8c ("mm: remove rb tree")
    Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
    Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
    Tested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:00 -04:00
Chris von Recklinghausen c97e820389 mm: align larger anonymous mappings on THP boundaries
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit f35b5d7d676e59e401690b678cd3cfec5e785c23
Author: Rik van Riel <riel@surriel.com>
Date:   Tue Aug 9 14:24:57 2022 -0400

    mm: align larger anonymous mappings on THP boundaries

    Align larger anonymous memory mappings on THP boundaries by going through
    thp_get_unmapped_area if THPs are enabled for the current process.

    With this patch, larger anonymous mappings are now THP aligned.  When a
    malloc library allocates a 2MB or larger arena, that arena can now be
    mapped with THPs right from the start, which can result in better TLB hit
    rates and execution time.

    Link: https://lkml.kernel.org/r/20220809142457.4751229f@imladris.surriel.com
    Signed-off-by: Rik van Riel <riel@surriel.com>
    Reviewed-by: Yang Shi <shy828301@gmail.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:24 -04:00
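
The commit above does not spell out how the alignment is computed. As a purely illustrative sketch, rounding a candidate address up to the next 2 MB boundary could look like the following. model_thp_align() and MODEL_PMD_SIZE are hypothetical names, and the 2 MB size is taken from the arena size mentioned in the message, not from the kernel source.

```c
#include <assert.h>

/* Hypothetical huge page size, matching the 2MB arena mentioned above. */
#define MODEL_PMD_SIZE (2UL * 1024 * 1024)

/* The mask arithmetic below only works for power-of-two sizes. */
static_assert((MODEL_PMD_SIZE & (MODEL_PMD_SIZE - 1)) == 0,
              "MODEL_PMD_SIZE must be a power of two");

/* Round addr up to the next multiple of MODEL_PMD_SIZE. */
static unsigned long model_thp_align(unsigned long addr)
{
    /* (0 - addr) & (size - 1) is the distance to the next boundary. */
    return addr + ((0UL - addr) & (MODEL_PMD_SIZE - 1));
}
```

Rounding up like this wastes up to one huge page of address space per mapping, which is why an allocator using it would typically search for a region padded by that amount first.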
Nico Pache db3644c677 mm: delete unused MMF_OOM_VICTIM flag
commit b3541d912a84dc40cabb516f2deeac9ae6fa30da
Author: Suren Baghdasaryan <surenb@google.com>
Date:   Tue May 31 15:31:00 2022 -0700

    mm: delete unused MMF_OOM_VICTIM flag

    With the last usage of MMF_OOM_VICTIM in exit_mmap gone, this flag is now
    unused and can be removed.

    [akpm@linux-foundation.org: remove comment about now-removed mm_is_oom_victim()]
    Link: https://lkml.kernel.org/r/20220531223100.510392-2-surenb@google.com
    Signed-off-by: Suren Baghdasaryan <surenb@google.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Christian Brauner (Microsoft) <brauner@kernel.org>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Jann Horn <jannh@google.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Liam Howlett <liam.howlett@oracle.com>

    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168372
Signed-off-by: Nico Pache <npache@redhat.com>
2023-06-14 15:11:01 -06:00
Nico Pache a1e8cb93bf mm: drop oom code from exit_mmap
Conflicts:
       mm/mmap.c: slight differences in unmap_vmas and free_pgtables
        arguments.

commit bf3980c85212fc71512d27a46f5aab66f46ca284
Author: Suren Baghdasaryan <surenb@google.com>
Date:   Tue May 31 15:30:59 2022 -0700

    mm: drop oom code from exit_mmap

    The primary reason to invoke the oom reaper from the exit_mmap path used
    to be a prevention of an excessive oom killing if the oom victim exit
    races with the oom reaper (see [1] for more details).  The invocation has
    moved around since then because of the interaction with the munlock logic
    but the underlying reason has remained the same (see [2]).

    Munlock code is no longer a problem since [3] and there shouldn't be any
    blocking operation before the memory is unmapped by exit_mmap so the oom
    reaper invocation can be dropped.  The unmapping part can be done with the
    non-exclusive mmap_sem and the exclusive one is only required when page
    tables are freed.

    Remove the oom_reaper from exit_mmap which will make the code easier to
    read.  This is really unlikely to make any observable difference although
    some microbenchmarks could benefit from one less branch that needs to be
    evaluated even though it almost never is true.

    [1] 2129258024 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
    [2] 27ae357fa8 ("mm, oom: fix concurrent munlock and oom reaper unmap, v3")
    [3] a213e5cf71cb ("mm/munlock: delete munlock_vma_pages_all(), allow oomreap")

    [akpm@linux-foundation.org: restore Suren's mmap_read_lock() optimization]
    Link: https://lkml.kernel.org/r/20220531223100.510392-1-surenb@google.com
    Signed-off-by: Suren Baghdasaryan <surenb@google.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Christian Brauner (Microsoft) <brauner@kernel.org>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Jann Horn <jannh@google.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
    Cc: Liam Howlett <liam.howlett@oracle.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168372
Signed-off-by: Nico Pache <npache@redhat.com>
2023-06-14 15:11:01 -06:00
Chris von Recklinghausen c18d0d6ff7 mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA
Conflicts: fs/userfaultfd.c - We don't have
	69dbe6daf104 ("userfaultfd: use maple tree iterator to iterate VMAs")
	so we don't have a closing brace like the upstream patch expects

Bugzilla: https://bugzilla.redhat.com/2160210

commit 51d3d5eb74ff53b92dcff48b30ae2ed8edd85a32
Author: David Hildenbrand <david@redhat.com>
Date:   Fri Dec 9 09:09:12 2022 +0100

    mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA

    Currently, we don't enable writenotify when enabling userfaultfd-wp on a
    shared writable mapping (for now only shmem and hugetlb).  The consequence
    is that vma->vm_page_prot will still include write permissions, to be set
    as default for all PTEs that get remapped (e.g., mprotect(), NUMA hinting,
    page migration, ...).

    So far, vma->vm_page_prot is assumed to be a safe default, meaning that we
    only add permissions (e.g., mkwrite) but not remove permissions (e.g.,
    wrprotect).  For example, when enabling softdirty tracking, we enable
    writenotify.  With uffd-wp on shared mappings, that changed.  More details
    on vma->vm_page_prot semantics were summarized in [1].

    This is problematic for uffd-wp: we'd have to manually check for uffd-wp
    PTEs/PMDs and manually write-protect PTEs/PMDs, which is error prone.
    Prone to such issues is any code that uses vma->vm_page_prot to set PTE
    permissions: primarily pte_modify() and mk_pte().

    Instead, let's enable writenotify such that PTEs/PMDs/...  will be mapped
    write-protected as default and we will only allow selected PTEs that are
    definitely safe to be mapped without write-protection (see
    can_change_pte_writable()) to be writable.  In the future, we might want
    to enable write-bit recovery -- e.g., can_change_pte_writable() -- at more
    locations, for example, also when removing uffd-wp protection.

    This fixes two known cases:

    (a) remove_migration_pte() mapping uffd-wp'ed PTEs writable, resulting
        in uffd-wp not triggering on write access.
    (b) do_numa_page() / do_huge_pmd_numa_page() mapping uffd-wp'ed PTEs/PMDs
        writable, resulting in uffd-wp not triggering on write access.

    Note that do_numa_page() / do_huge_pmd_numa_page() can be reached even
    without NUMA hinting (which currently doesn't seem to be applicable to
    shmem), for example, by using uffd-wp with a PROT_WRITE shmem VMA.  On
    such a VMA, userfaultfd-wp is currently non-functional.

    Note that when enabling userfaultfd-wp, there is no need to walk page
    tables to enforce the new default protection for the PTEs: we know that
    they cannot be uffd-wp'ed yet, because that can only happen after enabling
    uffd-wp for the VMA in general.

    Also note that this makes mprotect() on ranges with uffd-wp'ed PTEs not
    accidentally set the write bit -- which would result in uffd-wp not
    triggering on later write access.  This commit makes uffd-wp on shmem
    behave just like uffd-wp on anonymous memory in that regard, even though
    mixing mprotect with uffd-wp is controversial.

    [1] https://lkml.kernel.org/r/92173bad-caa3-6b43-9d1e-9a471fdbc184@redhat.com

    Link: https://lkml.kernel.org/r/20221209080912.7968-1-david@redhat.com
    Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs")
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Reported-by: Ives van Hoorne <ives@codesandbox.io>
    Debugged-by: Peter Xu <peterx@redhat.com>
    Acked-by: Peter Xu <peterx@redhat.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
    Cc: Nadav Amit <nadav.amit@gmail.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:34 -04:00
Chris von Recklinghausen 900ee6e27a mm/mmap: fix obsolete comment of find_extend_vma
Bugzilla: https://bugzilla.redhat.com/2160210

commit cdb5c9e53f2e7166409dbf7248364f592d11bd1c
Author: Miaohe Lin <linmiaohe@huawei.com>
Date:   Sat Jul 9 17:25:27 2022 +0800

    mm/mmap: fix obsolete comment of find_extend_vma

    mmget_still_valid() has already been removed via commit 4d45e75a99 ("mm:
    remove the now-unnecessary mmget_still_valid() hack").  Update the
    corresponding comment.

    Link: https://lkml.kernel.org/r/20220709092527.47778-1-linmiaohe@huawei.com
    Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:27 -04:00
Chris von Recklinghausen 24b42a8de4 mm/mmap: build protect protection_map[] with ARCH_HAS_VM_GET_PAGE_PROT
Bugzilla: https://bugzilla.redhat.com/2160210

commit 09095f74130dfb2110ef2bcdd9ad0d42addaa1d5
Author: Anshuman Khandual <anshuman.khandual@arm.com>
Date:   Mon Jul 11 12:35:41 2022 +0530

    mm/mmap: build protect protection_map[] with ARCH_HAS_VM_GET_PAGE_PROT

    Now that protection_map[] has been moved inside those platforms that
    enable ARCH_HAS_VM_GET_PAGE_PROT, the generic protection_map[] array can
    be protected with CONFIG_ARCH_HAS_VM_GET_PAGE_PROT instead of __P000.

    Link: https://lkml.kernel.org/r/20220711070600.2378316-8-anshuman.khandual@arm.com
    Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Brian Cain <bcain@quicinc.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Chris Zankel <chris@zankel.net>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Dinh Nguyen <dinguyen@kernel.org>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Heiko Carstens <hca@linux.ibm.com>
    Cc: Huacai Chen <chenhuacai@kernel.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Jeff Dike <jdike@addtoit.com>
    Cc: Jonas Bonn <jonas@southpole.se>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Nicholas Piggin <npiggin@gmail.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Richard Henderson <rth@twiddle.net>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Sam Ravnborg <sam@ravnborg.org>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vasily Gorbik <gor@linux.ibm.com>
    Cc: Vineet Gupta <vgupta@kernel.org>
    Cc: WANG Xuerui <kernel@xen0n.name>
    Cc: Will Deacon <will@kernel.org>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:25 -04:00
Chris von Recklinghausen 9fe954b5ac mm/mmap: define DECLARE_VM_GET_PAGE_PROT
Bugzilla: https://bugzilla.redhat.com/2160210

commit 43957b5d11037a651d162f65c682ec3c76777fc8
Author: Anshuman Khandual <anshuman.khandual@arm.com>
Date:   Mon Jul 11 12:35:36 2022 +0530

    mm/mmap: define DECLARE_VM_GET_PAGE_PROT

    This just converts the generic vm_get_page_prot() implementation into a
    new macro i.e DECLARE_VM_GET_PAGE_PROT which later can be used across
    platforms when enabling them with ARCH_HAS_VM_GET_PAGE_PROT.  This does
    not create any functional change.
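
    As an illustration, the generic lookup wrapped by such a macro can be
    modelled in user space as below.  This is a sketch under assumptions:
    the pgprot_t type and the table contents are toy stand-ins (entry i just
    stores i), and the kernel-only symbol export is left out.

```c
#define VM_READ   0x1UL
#define VM_WRITE  0x2UL
#define VM_EXEC   0x4UL
#define VM_SHARED 0x8UL

/* Toy stand-in for the architecture's page protection type. */
typedef struct { unsigned long pgprot; } pgprot_t;

/* Toy table: one entry per read/write/exec/shared combination. */
static const pgprot_t protection_map[16] = {
        {0}, {1}, {2},  {3},  {4},  {5},  {6},  {7},
        {8}, {9}, {10}, {11}, {12}, {13}, {14}, {15},
};

/* Expands to a vm_get_page_prot() that indexes the table by access bits. */
#define DECLARE_VM_GET_PAGE_PROT                                        \
pgprot_t vm_get_page_prot(unsigned long vm_flags)                       \
{                                                                       \
        return protection_map[vm_flags &                                \
                (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)];            \
}

DECLARE_VM_GET_PAGE_PROT
```

    A platform without special requirements would place the macro once next
    to its own protection_map[]; flag bits outside the four access bits are
    masked off before the lookup.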

    Link: https://lkml.kernel.org/r/20220711070600.2378316-3-anshuman.khandual@arm.com
    Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Suggested-by: Christoph Hellwig <hch@infradead.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Brian Cain <bcain@quicinc.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Chris Zankel <chris@zankel.net>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Dinh Nguyen <dinguyen@kernel.org>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Heiko Carstens <hca@linux.ibm.com>
    Cc: Huacai Chen <chenhuacai@kernel.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Jeff Dike <jdike@addtoit.com>
    Cc: Jonas Bonn <jonas@southpole.se>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Nicholas Piggin <npiggin@gmail.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Richard Henderson <rth@twiddle.net>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Sam Ravnborg <sam@ravnborg.org>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vasily Gorbik <gor@linux.ibm.com>
    Cc: Vineet Gupta <vgupta@kernel.org>
    Cc: WANG Xuerui <kernel@xen0n.name>
    Cc: Will Deacon <will@kernel.org>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:25 -04:00
Chris von Recklinghausen e00be64829 mm/mmap: build protect protection_map[] with __P000
Bugzilla: https://bugzilla.redhat.com/2160210

commit 840532711d7299d7e937952482ec899d4622c452
Author: Anshuman Khandual <anshuman.khandual@arm.com>
Date:   Mon Jul 11 12:35:35 2022 +0530

    mm/mmap: build protect protection_map[] with __P000

    Patch series "mm/mmap: Drop __SXXX/__PXXX macros from across platforms",
    v7.

    __SXXX/__PXXX macros are an unnecessary abstraction layer in creating the
    generic protection_map[] array which is used for vm_get_page_prot().  This
    abstraction layer can be avoided, if the platforms just define the array
    protection_map[] for all possible vm_flags access permission combinations
    and also export vm_get_page_prot() implementation.

    This series drops __SXXX/__PXXX macros from across platforms in the tree.
    First, it build-protects the generic protection_map[] array with '#ifdef
    __P000' and moves it inside the platforms which enable
    ARCH_HAS_VM_GET_PAGE_PROT.  Later, it build-protects the same array with
    '#ifdef ARCH_HAS_VM_GET_PAGE_PROT' and moves it inside the remaining
    platforms while enabling ARCH_HAS_VM_GET_PAGE_PROT.  This adds a new macro
    DECLARE_VM_GET_PAGE_PROT defining the current generic vm_get_page_prot(),
    in order for it to be reused on platforms that do not require custom
    implementation.  Finally, ARCH_HAS_VM_GET_PAGE_PROT can just be dropped,
    as all platforms now define and export vm_get_page_prot(), via looking up
    a private and static protection_map[] array.  The protection_map[] data
    type has been changed to 'static const' on all platforms that do not
    change it during boot.

    This patch (of 26):

    Build protect generic protection_map[] array with __P000, so that it can
    be moved inside all the platforms one after the other.  Otherwise there
    will be build failures during this process.
    CONFIG_ARCH_HAS_VM_GET_PAGE_PROT cannot be used for this purpose as only
    certain platforms enable this config now.

    Link: https://lkml.kernel.org/r/20220711070600.2378316-1-anshuman.khandual@arm.com
    Link: https://lkml.kernel.org/r/20220711070600.2378316-2-anshuman.khandual@arm.com
    Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Brian Cain <bcain@quicinc.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: Chris Zankel <chris@zankel.net>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Dinh Nguyen <dinguyen@kernel.org>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Heiko Carstens <hca@linux.ibm.com>
    Cc: Huacai Chen <chenhuacai@kernel.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Jeff Dike <jdike@addtoit.com>
    Cc: Jonas Bonn <jonas@southpole.se>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Nicholas Piggin <npiggin@gmail.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Richard Henderson <rth@twiddle.net>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Sam Ravnborg <sam@ravnborg.org>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vasily Gorbik <gor@linux.ibm.com>
    Cc: Vineet Gupta <vgupta@kernel.org>
    Cc: WANG Xuerui <kernel@xen0n.name>
    Cc: Will Deacon <will@kernel.org>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:25 -04:00
Chris von Recklinghausen 267a7a9b62 docs: rename Documentation/vm to Documentation/mm
Conflicts: drop changes to arch/loongarch/Kconfig - unsupported config

Bugzilla: https://bugzilla.redhat.com/2160210

commit ee65728e103bb7dd99d8604bf6c7aa89c7d7e446
Author: Mike Rapoport <rppt@kernel.org>
Date:   Mon Jun 27 09:00:26 2022 +0300

    docs: rename Documentation/vm to Documentation/mm

    so it will be consistent with code mm directory and with
    Documentation/admin-guide/mm and won't be confused with virtual machines.

    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Suggested-by: Matthew Wilcox <willy@infradead.org>
    Tested-by: Ira Weiny <ira.weiny@intel.com>
    Acked-by: Jonathan Corbet <corbet@lwn.net>
    Acked-by: Wu XiangCheng <bobwxc@email.cn>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:15 -04:00
Chris von Recklinghausen de9b55c603 mm: mmap: register suitable readonly file vmas for khugepaged
Bugzilla: https://bugzilla.redhat.com/2160210

commit 613bec092fe78307a8b130353ce1ef340915587f
Author: Yang Shi <shy828301@gmail.com>
Date:   Thu May 19 14:08:50 2022 -0700

    mm: mmap: register suitable readonly file vmas for khugepaged

    The readonly FS THP relies on khugepaged to collapse THP for suitable
    vmas.  But the behavior is inconsistent for "always" mode
    (https://lore.kernel.org/linux-mm/00f195d4-d039-3cf2-d3a1-a2c88de397a0@suse.cz/).

    The "always" mode means THP allocation should be tried all the time and
    khugepaged should try to collapse THP all the time.  Of course the
    allocation and collapse may fail due to other factors and conditions.

    Currently file THP may not be collapsed by khugepaged even though all the
    conditions are met.  That does break the semantics of "always" mode.

    So make sure readonly FS vmas are registered to khugepaged to fix the
    break.

    Register suitable vmas in the common mmap path, which covers both readonly
    FS vmas and shmem vmas, so remove the khugepaged calls in shmem.c.

    The khugepaged call in vma_merge() still needs to be kept, since
    vma_merge() is called in a lot of places, for example, madvise, mprotect,
    etc.

    Link: https://lkml.kernel.org/r/20220510203222.24246-9-shy828301@gmail.com
    Signed-off-by: Yang Shi <shy828301@gmail.com>
    Reported-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Vlastmil Babka <vbabka@suse.cz>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Song Liu <songliubraving@fb.com>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Zi Yan <ziy@nvidia.com>
    Cc: Theodore Ts'o <tytso@mit.edu>
    Cc: Song Liu <song@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:10 -04:00
Chris von Recklinghausen 4243b55200 mm: khugepaged: introduce khugepaged_enter_vma() helper
Bugzilla: https://bugzilla.redhat.com/2160210

commit c791576c60288c89b351ea2d2098f6a872d78fa7
Author: Yang Shi <shy828301@gmail.com>
Date:   Thu May 19 14:08:50 2022 -0700

    mm: khugepaged: introduce khugepaged_enter_vma() helper

    The khugepaged_enter_vma_merge() actually does the same thing as the
    khugepaged_enter() section called by shmem_mmap(), so consolidate them
    into one helper and rename it to khugepaged_enter_vma().

    Link: https://lkml.kernel.org/r/20220510203222.24246-8-shy828301@gmail.com
    Signed-off-by: Yang Shi <shy828301@gmail.com>
    Acked-by: Vlastmil Babka <vbabka@suse.cz>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: Song Liu <song@kernel.org>
    Cc: Song Liu <songliubraving@fb.com>
    Cc: Theodore Ts'o <tytso@mit.edu>
    Cc: Zi Yan <ziy@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:09 -04:00
Chris von Recklinghausen 6b1c735a16 mm/mmap: drop arch_vm_get_page_pgprot()
Bugzilla: https://bugzilla.redhat.com/2160210

commit 3afa793082e624f7bd83533010ff4a676451d4ee
Author: Anshuman Khandual <anshuman.khandual@arm.com>
Date:   Thu Apr 28 23:16:14 2022 -0700

    mm/mmap: drop arch_vm_get_page_pgprot()

    There are no platforms left which use arch_vm_get_page_prot(). Just drop
    generic arch_vm_get_page_prot().

    Link: https://lkml.kernel.org/r/20220414062125.609297-8-anshuman.khandual@arm.com
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: linux-mm@kvack.org
    Cc: linux-kernel@vger.kernel.org
    Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: David S. Miller <davem@davemloft.net>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Khalid Aziz <khalid.aziz@oracle.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:55 -04:00
Chris von Recklinghausen 5d4aa4e95c mm/mmap: drop arch_filter_pgprot()
Bugzilla: https://bugzilla.redhat.com/2160210

commit 5dcfc6a1cc53ea0859de98a7380933f9e779859d
Author: Anshuman Khandual <anshuman.khandual@arm.com>
Date:   Thu Apr 28 23:16:13 2022 -0700

    mm/mmap: drop arch_filter_pgprot()

    There are no platforms left which subscribe to ARCH_HAS_FILTER_PGPROT.
    Hence drop generic arch_filter_pgprot() and also config
    ARCH_HAS_FILTER_PGPROT.

    Link: https://lkml.kernel.org/r/20220414062125.609297-7-anshuman.khandual@arm.com
    Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: David S. Miller <davem@davemloft.net>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Khalid Aziz <khalid.aziz@oracle.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:55 -04:00
Chris von Recklinghausen 129e987f2c mm/mmap.c: use helper mlock_future_check()
Bugzilla: https://bugzilla.redhat.com/2160210

commit c5d8a3643d91be748d7ff12eedc5876f32cc8283
Author: Miaohe Lin <linmiaohe@huawei.com>
Date:   Thu Apr 28 23:16:12 2022 -0700

    mm/mmap.c: use helper mlock_future_check()

    Use helper mlock_future_check() to check whether it's safe to enlarge the
    locked_vm to simplify the code.  Minor readability improvement.
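
    The shape of such a check can be sketched in user space as follows.
    This is a simplified model under stated assumptions: the function name,
    the flag value, the 4 KiB page size and the explicit limit and
    capability parameters are all hypothetical stand-ins for kernel state.

```c
#include <errno.h>
#include <stdbool.h>

#define TOY_PAGE_SHIFT 12          /* assumption: 4 KiB pages */
#define TOY_VM_LOCKED  0x2000UL    /* flag value chosen for this sketch */

/*
 * Returns 0 if growing the locked area by len bytes is acceptable,
 * -EAGAIN if it would exceed the limit without the needed privilege.
 */
static int toy_mlock_future_check(unsigned long locked_vm,
                                  unsigned long flags,
                                  unsigned long len,
                                  unsigned long limit_bytes,
                                  bool can_ipc_lock)
{
        if (flags & TOY_VM_LOCKED) {
                /* Pages already locked plus the pages about to be added. */
                unsigned long locked = (len >> TOY_PAGE_SHIFT) + locked_vm;
                unsigned long lock_limit = limit_bytes >> TOY_PAGE_SHIFT;

                if (locked > lock_limit && !can_ipc_lock)
                        return -EAGAIN;
        }
        return 0;
}
```

    Callers that used to open-code this comparison can instead make a
    single call and act on the return value.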

    Link: https://lkml.kernel.org/r/20220402032231.64974-1-linmiaohe@huawei.com
    Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:55 -04:00
Chris von Recklinghausen 17e82bef21 mm/mmap.c: use mmap_assert_write_locked() instead of open coding it
Bugzilla: https://bugzilla.redhat.com/2160210

commit 325bca1fe0b1bb9f535e69bb9ec48d4a6e0ca3ce
Author: Rolf Eike Beer <eb@emlix.com>
Date:   Thu Apr 28 23:16:11 2022 -0700

    mm/mmap.c: use mmap_assert_write_locked() instead of open coding it

    In case the lock is actually not held at this point.

    Link: https://lkml.kernel.org/r/5827758.TJ1SttVevJ@mobilepool36.emlix.com
    Signed-off-by: Rolf Eike Beer <eb@emlix.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:55 -04:00
Nico Pache 4484fd4079 mm/hugetlb: fix hugetlb not supporting softdirty tracking
commit f96f7a40874d7c746680c0b9f57cef2262ae551f
Author: David Hildenbrand <david@redhat.com>
Date:   Thu Aug 11 12:34:34 2022 +0200

    mm/hugetlb: fix hugetlb not supporting softdirty tracking

    Patch series "mm/hugetlb: fix write-fault handling for shared mappings", v2.

    I observed that hugetlb does not support/expect write-faults in shared
    mappings that would have to map the R/O-mapped page writable -- and I
    found two cases where we could currently get such faults and would
    erroneously map an anon page into a shared mapping.

    Reproducers are part of the patches.

    I propose to backport both fixes to stable trees.  The first fix needs a
    small adjustment.

    This patch (of 2):

    Staring at hugetlb_wp(), one might wonder where all the logic for shared
    mappings is when stumbling over a write-protected page in a shared
    mapping.  In fact, there is none, and so far we thought we could get away
    with that because e.g., mprotect() should always do the right thing and
    map all pages directly writable.

    Looks like we were wrong:

    --------------------------------------------------------------------------
     #include <stdio.h>
     #include <stdlib.h>
     #include <string.h>
     #include <fcntl.h>
     #include <unistd.h>
     #include <errno.h>
     #include <sys/mman.h>

     #define HUGETLB_SIZE (2 * 1024 * 1024u)

     static void clear_softdirty(void)
     {
             int fd = open("/proc/self/clear_refs", O_WRONLY);
             const char *ctrl = "4";
             int ret;

             if (fd < 0) {
                     fprintf(stderr, "open(clear_refs) failed\n");
                     exit(1);
             }
             ret = write(fd, ctrl, strlen(ctrl));
             if (ret != strlen(ctrl)) {
                     fprintf(stderr, "write(clear_refs) failed\n");
                     exit(1);
             }
             close(fd);
     }

     int main(int argc, char **argv)
     {
             char *map;
             int fd;

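             /* O_CREAT requires an explicit mode argument. */
             fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT, 0600);
             /* open() returns -1 on failure; 0 is a valid descriptor. */
             if (fd < 0) {
                     fprintf(stderr, "open() failed\n");
                     return -errno;
             }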
             if (ftruncate(fd, HUGETLB_SIZE)) {
                     fprintf(stderr, "ftruncate() failed\n");
                     return -errno;
             }

             map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
             if (map == MAP_FAILED) {
                     fprintf(stderr, "mmap() failed\n");
                     return -errno;
             }

             *map = 0;

             if (mprotect(map, HUGETLB_SIZE, PROT_READ)) {
                     fprintf(stderr, "mprotect() failed\n");
                     return -errno;
             }

             clear_softdirty();

             if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) {
                     fprintf(stderr, "mprotect() failed\n");
                     return -errno;
             }

             *map = 0;

             return 0;
     }
    --------------------------------------------------------------------------

    The above test fails with SIGBUS when there is only a single free hugetlb page.
     # echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
     # ./test
     Bus error (core dumped)

    And worse, with sufficient free hugetlb pages it will map an anonymous page
    into a shared mapping, for example, messing up accounting during unmap
    and breaking MAP_SHARED semantics:
     # echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
     # ./test
     # cat /proc/meminfo | grep HugePages_
     HugePages_Total:       2
     HugePages_Free:        1
     HugePages_Rsvd:    18446744073709551615
     HugePages_Surp:        0
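
    The HugePages_Rsvd value above equals 2^64 - 1, which is what an
    unsigned 64-bit counter holds after being decremented once from zero.
    Reading it as an underflowed counter is an interpretation of the output;
    the helper below is a hypothetical sketch of the arithmetic only.

```c
#include <stdint.h>

/* Decrement an unsigned 64-bit counter; the result wraps modulo 2^64. */
static uint64_t toy_counter_dec(uint64_t counter)
{
        return counter - 1;
}
```

    Calling toy_counter_dec(0) yields UINT64_MAX, i.e. 18446744073709551615.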

    The reason in this particular case is that vma_wants_writenotify() will
    return "true", removing VM_SHARED in vma_set_page_prot() to map pages
    write-protected. Let's teach vma_wants_writenotify() that hugetlb does not
    support softdirty tracking.

    Link: https://lkml.kernel.org/r/20220811103435.188481-1-david@redhat.com
    Link: https://lkml.kernel.org/r/20220811103435.188481-2-david@redhat.com
    Fixes: 64e455079e ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared")
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Peter Feiner <pfeiner@google.com>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Cyrill Gorcunov <gorcunov@openvz.org>
    Cc: Pavel Emelyanov <xemul@parallels.com>
    Cc: Jamie Liu <jamieliu@google.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Cc: Bjorn Helgaas <bhelgaas@google.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: <stable@vger.kernel.org>    [3.18+]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2089498
Signed-off-by: Nico Pache <npache@redhat.com>
2022-11-08 10:11:40 -07:00