linux-kernelorg-stable/Documentation
zhenwei pi 67f22ba775 mm/memory-failure: disable unpoison once hw error happens
Currently unpoison_memory(unsigned long pfn) is designed for soft
poison(hwpoison-inject) only.  Since 17fae1294a, the KPTE gets cleared
on a x86 platform once hardware memory corrupts.

Unpoisoning a hardware corrupted page puts page back buddy only, the
kernel has a chance to access the page with *NOT PRESENT* KPTE.  This
leads BUG during accessing on the corrupted KPTE.

Suggested by David&Naoya, disable unpoison mechanism when a real HW error
happens to avoid BUG like this:

 Unpoison: Software-unpoisoned page 0x61234
 BUG: unable to handle page fault for address: ffff888061234000
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0002) - not-present page
 PGD 2c01067 P4D 2c01067 PUD 107267063 PMD 10382b063 PTE 800fffff9edcb062
 Oops: 0002 [#1] PREEMPT SMP NOPTI
 CPU: 4 PID: 26551 Comm: stress Kdump: loaded Tainted: G   M       OE     5.18.0.bm.1-amd64 #7
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...
 RIP: 0010:clear_page_erms+0x7/0x10
 Code: ...
 RSP: 0000:ffffc90001107bc8 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: 0000000000000901 RCX: 0000000000001000
 RDX: ffffea0001848d00 RSI: ffffea0001848d40 RDI: ffff888061234000
 RBP: ffffea0001848d00 R08: 0000000000000901 R09: 0000000000001276
 R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
 R13: 0000000000000000 R14: 0000000000140dca R15: 0000000000000001
 FS:  00007fd8b2333740(0000) GS:ffff88813fd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffff888061234000 CR3: 00000001023d2005 CR4: 0000000000770ee0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  <TASK>
  prep_new_page+0x151/0x170
  get_page_from_freelist+0xca0/0xe20
  ? sysvec_apic_timer_interrupt+0xab/0xc0
  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
  __alloc_pages+0x17e/0x340
  __folio_alloc+0x17/0x40
  vma_alloc_folio+0x84/0x280
  __handle_mm_fault+0x8d4/0xeb0
  handle_mm_fault+0xd5/0x2a0
  do_user_addr_fault+0x1d0/0x680
  ? kvm_read_and_reset_apf_flags+0x3b/0x50
  exc_page_fault+0x78/0x170
  asm_exc_page_fault+0x27/0x30

Link: https://lkml.kernel.org/r/20220615093209.259374-2-pizhenwei@bytedance.com
Fixes: 847ce401df ("HWPOISON: Add unpoisoning support")
Fixes: 17fae1294a ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned")
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>	[5.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16 19:11:32 -07:00
..
ABI Devicetree fixes for 5.19, part 2: 2022-06-10 11:57:36 -07:00
PCI
RCU Merge branch 'exp.2022.05.11a' into HEAD 2022-05-11 11:49:35 -07:00
accounting delayacct: track delays from write-protect copy 2022-06-01 15:55:25 -07:00
admin-guide Driver core changes for 5.19-rc1 2022-06-03 11:48:47 -07:00
arc
arm docs: arm: tcm: Fix typo in description of TCM and MMU usage 2022-06-09 12:56:33 -06:00
arm64 arm64/sme: Fix SVE/SME typo in ABI documentation 2022-06-08 18:38:31 +01:00
block
bpf
cdrom It was a moderately busy cycle for documentation; highlights include: 2022-05-25 11:17:41 -07:00
core-api It was a moderately busy cycle for documentation; highlights include: 2022-05-25 11:17:41 -07:00
cpu-freq
crypto
dev-tools Yang Shi has improved the behaviour of khugepaged collapsing of readonly 2022-05-26 12:32:41 -07:00
devicetree Devicetree fixes for 5.19, part 2: 2022-06-10 11:57:36 -07:00
doc-guide
driver-api docs: Move the HTE documentation to driver-api/ 2022-06-09 10:02:47 -06:00
fault-injection
fb
features Documentation/features: Update the arch support status files 2022-06-09 09:35:57 -06:00
filesystems netfs: Rename the netfs_io_request cleanup op and give it an op pointer 2022-06-10 20:55:21 +01:00
firmware-guide TTY / Serial driver changes for 5.19-rc1 2022-06-03 11:08:40 -07:00
firmware_class
fpga Documentation: fpga: dfl: add link address of feature id table 2022-05-10 16:05:27 +08:00
gpu
hid
hwmon hwmon: Make chip parameter for with_info API mandatory 2022-05-22 11:32:31 -07:00
i2c
ia64
iio
images docs: add SVG version of the Linux logo 2022-06-01 09:32:45 -06:00
infiniband
input documentation: Format button_dev as a pointer. 2022-06-01 09:34:28 -06:00
isdn
kbuild Kbuild updates for v5.19 2022-05-26 12:09:50 -07:00
kernel-hacking
leds leds: qcom-lpg: Require pattern to follow documentation 2022-05-24 22:08:10 +02:00
litmus-tests
livepatch
locking
loongarch Documentation: LoongArch: Add basic documentations 2022-06-03 20:09:27 +08:00
m68k
maintainer
mhi
mips
misc-devices Documentation: Wire Oxford Semiconductor PCIe (Tornado) 950 2022-05-19 18:24:22 +02:00
netlabel
networking net/ipv6: Expand and rename accept_unsolicited_na to accept_untracked_na 2022-05-31 11:36:57 +02:00
nios2
nvdimm
openrisc
parisc
pcmcia
peci
power
powerpc powerpc: Enable the DAWR on POWER9 DD2.3 and above 2022-05-22 15:59:53 +10:00
process scripts/check-local-export: avoid 'wait $!' for process substitution 2022-06-10 03:47:13 +09:00
riscv Documentation: riscv: Add sv48 description to VM layout 2022-06-01 20:38:34 -07:00
s390
scheduler
scsi
security integrity-v5.19 2022-05-24 13:50:39 -07:00
sh
sound
sparc
sphinx docs: pdfdocs: Add space for chapter counts >= 100 in TOC 2022-05-17 13:41:26 -06:00
sphinx-static
spi
staging
target
timers
tools Updates to Real Time Linux Analysis tool for 5.19: 2022-05-29 10:48:58 -07:00
trace tracing/timerlat: Print stacktrace in the IRQ handler if needed 2022-05-26 21:13:00 -04:00
translations Documentation/zh_CN: Add basic LoongArch documentations 2022-06-03 20:09:27 +08:00
usb docs: usb: fix literal block marker in usbmon verification example 2022-06-09 09:50:03 -06:00
userspace-api media: lirc: add missing exceptions for lirc uapi header file 2022-05-26 14:30:17 -07:00
virt S390: 2022-05-26 14:20:14 -07:00
vm mm/memory-failure: disable unpoison once hw error happens 2022-06-16 19:11:32 -07:00
w1
watchdog
x86 It was a moderately busy cycle for documentation; highlights include: 2022-05-25 11:17:41 -07:00
xtensa
.gitignore
Changes
CodingStyle
Kconfig
Makefile
SubmittingPatches
arch.rst Documentation: LoongArch: Add basic documentations 2022-06-03 20:09:27 +08:00
asm-annotations.rst
atomic_bitops.txt
atomic_t.txt
conf.py docs/conf.py: Cope with removal of language=None in Sphinx 5.0.0 2022-06-01 09:26:05 -06:00
docutils.conf
dontdiff
index.rst docs: Move the HTE documentation to driver-api/ 2022-06-09 10:02:47 -06:00
memory-barriers.txt