linux-kernelorg-stable/mm
Harry Yoo e3253bab3c mm: introduce and use {pgd,p4d}_populate_kernel()
commit f2d2f9598e upstream.

Introduce and use {pgd,p4d}_populate_kernel() in core MM code when
populating PGD and P4D entries for the kernel address space.  These
helpers ensure proper synchronization of page tables when updating the
kernel portion of top-level page tables.

Until now, the kernel has relied on each architecture to handle
synchronization of top-level page tables in an ad-hoc manner.  For
example, see commit 9b861528a8 ("x86-64, mem: Update all PGDs for direct
mapping and vmemmap mapping changes").

However, this approach has proven fragile for following reasons:

  1) It is easy to forget to perform the necessary page table
     synchronization when introducing new changes.
     For instance, commit 4917f55b4e ("mm/sparse-vmemmap: improve memory
     savings for compound devmaps") overlooked the need to synchronize
     page tables for the vmemmap area.

  2) It is also easy to overlook that the vmemmap and direct mapping areas
     must not be accessed before explicit page table synchronization.
     For example, commit 8d400913c2 ("x86/vmemmap: handle unpopulated
     sub-pmd ranges")) caused crashes by accessing the vmemmap area
     before calling sync_global_pgds().

To address this, as suggested by Dave Hansen, introduce _kernel() variants
of the page table population helpers, which invoke architecture-specific
hooks to properly synchronize page tables.  These are introduced in a new
header file, include/linux/pgalloc.h, so they can be called from common
code.

They reuse existing infrastructure for vmalloc and ioremap.
Synchronization requirements are determined by ARCH_PAGE_TABLE_SYNC_MASK,
and the actual synchronization is performed by
arch_sync_kernel_mappings().

This change currently targets only x86_64, so only PGD and P4D level
helpers are introduced.  Currently, these helpers are no-ops since no
architecture sets PGTBL_{PGD,P4D}_MODIFIED in ARCH_PAGE_TABLE_SYNC_MASK.

In theory, PUD and PMD level helpers can be added later if needed by other
architectures.  For now, 32-bit architectures (x86-32 and arm) only handle
PGTBL_PMD_MODIFIED, so p*d_populate_kernel() will never affect them unless
we introduce a PMD level helper.

[harry.yoo@oracle.com: fix KASAN build error due to p*d_populate_kernel()]
  Link: https://lkml.kernel.org/r/20250822020727.202749-1-harry.yoo@oracle.com
Link: https://lkml.kernel.org/r/20250818020206.4517-3-harry.yoo@oracle.com
Fixes: 8d400913c2 ("x86/vmemmap: handle unpopulated sub-pmd ranges")
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: bibo mao <maobibo@loongson.cn>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Christoph Lameter (Ampere) <cl@gentwo.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Adjust context ]
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:42 +02:00
..
damon mm/damon/ops-common: ignore migration request to invalid nodes 2025-08-28 16:31:03 +02:00
kasan mm: introduce and use {pgd,p4d}_populate_kernel() 2025-09-19 16:35:42 +02:00
kfence kfence: skip __GFP_THISNODE allocations on NUMA systems 2025-02-17 10:05:31 +01:00
kmsan dma: kmsan: export kmsan_handle_dma() for modules 2025-03-13 13:01:58 +01:00
Kconfig resource: remove dependency on SPARSEMEM from GET_FREE_REGION 2024-10-28 21:40:39 -07:00
Kconfig.debug slub: Introduce CONFIG_SLUB_RCU_DEBUG 2024-08-27 14:12:51 +02:00
Makefile mm: introduce numa_emulation 2024-09-03 21:15:31 -07:00
backing-dev.c writeback: support retrieving per group debug writeback stats of bdi 2024-05-05 17:53:51 -07:00
balloon_compaction.c mm: remove MIGRATE_SYNC_NO_COPY mode 2024-07-03 19:30:00 -07:00
bootmem_info.c
cma.c mm/cma: add cma_{alloc,free}_folio() 2024-09-03 21:15:36 -07:00
cma.h
cma_debug.c
cma_sysfs.c
compaction.c mm/compaction: fix bug in hugetlb handling pathway 2025-04-25 10:47:53 +02:00
debug.c mm: open-code page_folio() in dump_page() 2024-12-14 20:03:33 +01:00
debug_page_alloc.c
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: clear page table entries at destroy_args() 2025-08-28 16:31:05 +02:00
dmapool.c
dmapool_test.c mm/dmapool: add MODULE_DESCRIPTION() 2024-07-03 19:29:58 -07:00
early_ioremap.c
execmem.c mm/execmem, arch: convert remaining overrides of module_alloc to execmem 2024-05-14 00:31:43 -07:00
fadvise.c introduce fd_file(), convert all accessors to it. 2024-08-12 22:00:43 -04:00
fail_page_alloc.c fault-inject: improve build for CONFIG_FAULT_INJECTION=n 2024-09-01 20:43:33 -07:00
failslab.c fault-inject: improve build for CONFIG_FAULT_INJECTION=n 2024-09-01 20:43:33 -07:00
filemap.c readahead: fix return value of page_cache_next_miss() when no hole is found 2025-08-28 16:30:58 +02:00
folio-compat.c mm: remove putback_lru_page() 2024-09-09 16:38:59 -07:00
gup.c mm/gup: revert "mm: gup: fix infinite loop within __get_longterm_locked" 2025-07-06 11:01:43 +02:00
gup_test.c
gup_test.h
highmem.c mm/highmem: make nr_free_highpages() return "unsigned long" 2024-07-03 19:30:06 -07:00
hmm.c mm/hmm: move pmd_to_hmm_pfn_flags() to the respective #ifdeffery 2025-08-15 12:14:13 +02:00
huge_memory.c mm/huge_memory: fix dereferencing invalid pmd migration entry 2025-05-18 08:24:51 +02:00
hugetlb.c mm/hugetlb: unshare page tables during VMA split, not before 2025-06-27 11:11:40 +01:00
hugetlb_cgroup.c mm: memcg: don't call propagate_protected_usage() needlessly 2024-09-01 20:25:50 -07:00
hugetlb_vmemmap.c mm/hugetlb_vmemmap: don't synchronize_rcu() without HVO 2024-09-01 20:25:45 -07:00
hugetlb_vmemmap.h
hwpoison-inject.c mm/hwpoison: add MODULE_DESCRIPTION() 2024-07-03 19:29:58 -07:00
init-mm.c
internal.h mm: fix folio_pte_batch() on XEN PV 2025-05-18 08:24:51 +02:00
interval_tree.c
io-mapping.c
ioremap.c
khugepaged.c mm: khugepaged: fix call hpage_collapse_scan_file() for anonymous vma 2025-08-01 09:48:47 +01:00
kmemleak.c mm: fix possible deadlock in kmemleak 2025-09-09 18:58:16 +02:00
ksm.c mm/ksm: fix -Wsometimes-uninitialized from clang-21 in advisor_mode_show() 2025-08-01 09:48:42 +01:00
list_lru.c mm: list_lru: fix UAF for memory cgroup 2024-08-07 18:33:56 -07:00
maccess.c
madvise.c mm: close theoretical race where stale TLB entries could linger 2025-06-27 11:11:38 +01:00
mapping_dirty_helpers.c
memblock.c memblock: Accept allocated memory before use in memblock_double_array() 2025-05-18 08:24:54 +02:00
memcontrol-v1.c mm/thp: fix deferred split unqueue naming and locking 2024-11-05 16:49:54 -08:00
memcontrol-v1.h mm: memcg: declare do_memsw_account inline 2024-12-14 20:03:33 +01:00
memcontrol.c memcg: always call cond_resched() after fn() 2025-05-29 11:03:22 +02:00
memfd.c mm: reinstate ability to map write-sealed memfd mappings read-only 2025-01-09 13:33:54 +01:00
memory-failure.c mm/memory-failure: fix infinite UCE for VM_PFNMAP pfn 2025-08-28 16:31:05 +02:00
memory-tiers.c memory tiers: use default_dram_perf_ref_source in log message 2024-09-26 14:01:44 -07:00
memory.c mm: fix apply_to_existing_page_range() 2025-04-25 10:47:53 +02:00
memory_hotplug.c mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper 2025-04-20 10:15:50 +02:00
mempolicy.c mm/mempolicy: fix migrate_to_node() assuming there is at least one VMA in a MM 2024-12-14 20:03:32 +01:00
mempool.c mm: fix xyz_noprof functions calling profiled functions 2024-06-05 19:19:26 -07:00
memremap.c mm: convert put_devmap_managed_page_refs() to put_devmap_managed_folio_refs() 2024-05-05 17:53:49 -07:00
memtest.c
migrate.c mm/migrate: fix shmem xarray update during migration 2025-03-28 22:03:30 +01:00
migrate_device.c mm/migrate_device: don't add folio to be freed to LRU in migrate_device_finalize() 2025-02-27 04:30:22 -08:00
mincore.c mm: provide mm_struct and address to huge_ptep_get() 2024-07-12 15:52:15 -07:00
mlock.c mm/mlock: set the correct prev on failure 2024-11-07 14:14:58 -08:00
mm_init.c mm: drop CONFIG_HAVE_ARCH_NODEDATA_EXTENSION 2024-09-03 21:15:28 -07:00
mm_slot.h
mmap.c mm: reinstate ability to map write-sealed memfd mappings read-only 2025-01-09 13:33:54 +01:00
mmap_lock.c mm: mmap_lock: replace get_memcg_path_buf() with on-stack buffer 2024-07-03 19:30:26 -07:00
mmu_gather.c
mmu_notifier.c mm: move internal core VMA manipulation functions to own file 2024-09-01 20:25:54 -07:00
mmzone.c mm: improve code consistency with zonelist_* helper functions 2024-09-01 20:25:55 -07:00
mprotect.c mm: refactor map_deny_write_exec() 2024-11-05 16:49:55 -08:00
mremap.c mm/mremap: correctly handle partial mremap() of VMA starting at 0 2025-04-20 10:15:49 +02:00
mseal.c ALong with the usual shower of singleton patches, notable patch series in 2024-09-21 07:29:05 -07:00
msync.c
nommu.c nommu: pass NULL argument to vma_iter_prealloc() 2024-11-11 17:20:23 -08:00
numa.c mm: make range-to-target_node lookup facility a part of numa_memblks 2024-09-03 21:15:32 -07:00
numa_emulation.c mm: introduce numa_emulation 2024-09-03 21:15:31 -07:00
numa_memblks.c mm: numa_clear_kernel_node_hotplug: Add NUMA_NO_NODE check for node id 2024-10-28 21:40:40 -07:00
oom_kill.c memcg: fix soft lockup in the OOM process 2025-02-08 09:58:19 +01:00
page-writeback.c mm: fix ratelimit_pages update error in dirty_ratio_handler() 2025-06-27 11:11:22 +01:00
page_alloc.c page_pool: Move pp_magic check into helper functions 2025-06-19 15:31:42 +02:00
page_counter.c mm, memcg: cg2 memory{.swap,}.peak write handlers 2024-09-01 20:25:53 -07:00
page_ext.c mm: don't account memmap per-node 2024-08-15 22:16:14 -07:00
page_idle.c
page_io.c mm: count zeromap read and set for swapout and swapin 2024-11-11 00:00:37 -08:00
page_isolation.c mm/hugetlb: wait for hugetlb folios to be freed 2025-03-22 12:54:28 -07:00
page_owner.c mm/page-owner: use gfp_nested_mask() instead of open coded masking 2024-05-19 14:40:44 -07:00
page_poison.c
page_reporting.c
page_reporting.h
page_table_check.c mm/page_table_check: fix crash on ZONE_DEVICE 2024-06-15 10:43:04 -07:00
page_vma_mapped.c mm: make page_mapped_in_vma() hugetlb walk aware 2025-04-20 10:15:49 +02:00
pagewalk.c mm/pagewalk: fix usage of pmd_leaf()/pud_leaf() without present check 2024-10-28 21:40:38 -07:00
percpu-internal.h mm: remove CONFIG_MEMCG_KMEM 2024-07-10 12:14:54 -07:00
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c mm: introduce and use {pgd,p4d}_populate_kernel() 2025-09-19 16:35:42 +02:00
pgalloc-track.h
pgtable-generic.c mm: fix race between __split_huge_pmd_locked() and GUP-fast 2024-05-07 10:37:00 -07:00
process_vm_access.c
ptdump.c mm/ptdump: take the memory hotplug lock inside ptdump_walk_pgd() 2025-08-20 18:30:55 +02:00
readahead.c mm/readahead: fix large folio support in async readahead 2025-01-09 13:33:54 +01:00
rmap.c mm/rmap: reject hugetlb folios in folio_make_device_exclusive() 2025-04-20 10:15:49 +02:00
rodata_test.c
secretmem.c fs: export anon_inode_make_secure_inode() and fix secretmem LSM bypass 2025-07-10 16:05:09 +02:00
shmem.c mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper 2025-04-20 10:15:50 +02:00
shmem_quota.c shmem_quota: build the object file conditionally to the config option 2024-09-01 20:25:45 -07:00
show_mem.c mm/show_mem.c: report alloc tags in human readable units 2024-09-17 01:07:00 -07:00
shrinker.c mm: shrinker: avoid memleak in alloc_shrinker_info 2024-10-31 20:27:04 -07:00
shrinker_debug.c mm: shrinker: use min() to improve shrinker_debugfs_scan_write() 2024-09-03 21:15:40 -07:00
shuffle.c
shuffle.h
slab.h mm/slub: Avoid list corruption when removing a slab from the full list 2024-12-09 10:41:04 +01:00
slab_common.c slab: Fix too strict alignment check in create_cache() 2024-12-09 10:41:07 +01:00
slub.c mm/slub: avoid accessing metadata when pointer is invalid in object_err() 2025-09-09 18:58:22 +02:00
sparse-vmemmap.c mm: introduce and use {pgd,p4d}_populate_kernel() 2025-09-19 16:35:42 +02:00
sparse.c mm: fix accounting of memmap pages 2025-09-09 18:58:22 +02:00
swap.c mm: page_alloc: move mlocked flag clearance into free_pages_prepare() 2024-11-11 17:20:23 -08:00
swap.h mm: fix swap_read_folio_zeromap() for large folios with partial zeromap 2024-09-17 01:07:01 -07:00
swap_cgroup.c mm: attempt to batch free swap entries for zap_pte_range() 2024-09-03 21:15:33 -07:00
swap_slots.c
swap_state.c mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios 2024-09-17 01:07:01 -07:00
swapfile.c mm: swap: fix potential buffer overflow in setup_clusters() 2025-08-15 12:14:14 +02:00
truncate.c mm: Fix missing folio invalidation calls during truncation 2024-08-24 16:09:16 +02:00
usercopy.c
userfaultfd.c mm/userfaultfd: fix kmap_local LIFO ordering for CONFIG_HIGHPTE 2025-09-09 18:58:15 +02:00
util.c mm: only enforce minimum stack gap size if it's sensible 2024-09-01 20:26:02 -07:00
vma.c mm/vma: reset VMA iterator on commit_merge() OOM failure 2025-07-06 11:01:48 +02:00
vma.h mm/vma: add give_up_on_oom option on modify/merge, use in uffd release 2025-04-25 10:48:06 +02:00
vma_internal.h mm/hugetlb: unshare page tables during VMA split, not before 2025-06-27 11:11:40 +01:00
vmalloc.c mm/vmalloc: leave lazy MMU mode on PTE mapping error 2025-07-17 18:37:14 +02:00
vmpressure.c
vmscan.c mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_list 2025-08-01 09:48:44 +01:00
vmstat.c vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event 2024-12-09 10:41:01 +01:00
workingset.c cachestat: do not flush stats in recency check 2024-07-03 22:40:37 -07:00
z3fold.c mm/z3fold: add __percpu annotation to *unbuddied pointer in struct z3fold_pool 2024-09-01 20:25:56 -07:00
zbud.c
zpool.c
zsmalloc.c mm/zsmalloc: do not pass __GFP_MOVABLE if CONFIG_COMPACTION=n 2025-08-01 09:48:44 +01:00
zswap.c mm: zswap: fix crypto_free_acomp() deadlock in zswap_cpu_comp_dead() 2025-04-10 14:39:40 +02:00