linux-kernelorg-stable/tools/testing/selftests
Luis Gerhorst d6f1c85f22 bpf: Fall back to nospec for Spectre v1
This implements the core of the series and causes the verifier to fall
back to mitigating Spectre v1 using speculation barriers. The approach
was presented at LPC'24 [1] and RAID'24 [2].

If we find any forbidden behavior on a speculative path, we insert a
nospec (e.g., an lfence speculation barrier on x86) before the instruction
and stop verifying the path. While verifying a speculative path, we can
furthermore stop verification of that path whenever we encounter a
nospec instruction.

A minimal example program would look as follows:

	A = true
	B = true
	if A goto e
	f()
	if B goto e
	unsafe()
e:	exit

There are the following speculative and non-speculative paths
(`cur->speculative` and `speculative` referring to the values of the
push_stack() parameters):

- A = true
- B = true
- if A goto e
  - A && !cur->speculative && !speculative
    - exit
  - !A && !cur->speculative && speculative
    - f()
    - if B goto e
      - B && cur->speculative && !speculative
        - exit
      - !B && cur->speculative && speculative
        - unsafe()
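The flags in this tree follow from how push_stack() builds the pushed state: it copies `cur` and ORs the `speculative` argument into the copy's flag, so a branch pushed from a speculative path (such as the `B` branch above) is itself speculative. The C sketch below models only that rule; `struct toy_verifier_state` and toy_push_stack() are hypothetical stand-ins, not the kernel's types or functions:

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct bpf_verifier_state, reduced to one flag. */
struct toy_verifier_state {
	bool speculative;
};

/*
 * Model of the push_stack() rule: the new state is a copy of cur with the
 * speculative argument ORed in, so anything pushed from a speculative path
 * stays speculative.
 */
static struct toy_verifier_state
toy_push_stack(const struct toy_verifier_state *cur, bool speculative)
{
	struct toy_verifier_state st = *cur;

	st.speculative = st.speculative || speculative;
	return st;
}
```

Pushing from the architectural state with `speculative` false or true gives the `A` and `!A` rows; pushing from the resulting speculative state yields a speculative state for both `B` and `!B`.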

If f() contains any unsafe behavior under Spectre v1 and the unsafe
behavior matches `state->speculative &&
error_recoverable_with_nospec(err)`, do_check() will now add a nospec
before f() instead of rejecting the program:

	A = true
	B = true
	if A goto e
	nospec
	f()
	if B goto e
	unsafe()
e:	exit

Alternatively, the algorithm also takes advantage of nospec instructions
inserted for other reasons (e.g., Spectre v4). Taking the program above
as an example, speculative path exploration can stop before f() if a
nospec was inserted there because of Spectre v4 sanitization.

In this example, all instructions after the nospec are dead code (and
with the nospec they are also dead code speculatively).
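Both behaviors (inserting a nospec where a speculative path fails, and stopping at a nospec that is already there) can be modeled with a toy path explorer. Everything in this sketch is hypothetical: toy_do_check(), the three-opcode instruction set, and the `nospec[]` array standing in for per-insn metadata; it is not verifier.c code, and it treats every speculative failure as recoverable. A branch whose condition is always true pushes its fall-through as a speculative state; an insn that fails verification rejects the program on an architectural path but, on a speculative path, gets a nospec in front of it and ends that path:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical three-opcode toy ISA; not real BPF instructions. */
enum toy_op { TOY_IF_TRUE, TOY_INSN, TOY_EXIT };

struct toy_insn {
	enum toy_op op;
	int target;  /* TOY_IF_TRUE: jump target (forward jumps only) */
	bool unsafe; /* TOY_INSN: fails verification if reached */
};

struct toy_state {
	int pc;
	bool speculative;
};

#define TOY_MAX_STACK 32

/*
 * Explore the architectural path and every mispredicted fall-through.
 * nospec[i] means "a nospec precedes insn i"; entries may be preset (e.g.
 * from Spectre v4 sanitization) and are also set by the fallback.
 * Returns 0 if accepted, a negative errno if rejected.
 */
static int toy_do_check(const struct toy_insn *prog, int len, bool *nospec)
{
	struct toy_state stack[TOY_MAX_STACK];
	int top = 0;

	stack[top++] = (struct toy_state){ .pc = 0, .speculative = false };
	while (top > 0) {
		struct toy_state st = stack[--top];

		while (st.pc < len) {
			const struct toy_insn *insn = &prog[st.pc];

			/* A barrier ends any speculative path that reaches it. */
			if (st.speculative && nospec[st.pc])
				break;
			if (insn->op == TOY_EXIT)
				break;
			if (insn->op == TOY_IF_TRUE) {
				/* The condition is always true, so the fall-through
				 * only runs under misprediction: push it as speculative. */
				if (top == TOY_MAX_STACK)
					return -ENOMEM;
				stack[top++] = (struct toy_state){
					.pc = st.pc + 1, .speculative = true };
				st.pc = insn->target;
				continue;
			}
			if (insn->unsafe) {
				/* Architecturally reachable: a real error. */
				if (!st.speculative)
					return -EACCES;
				/* Speculative only: fence it off and stop this path. */
				nospec[st.pc] = true;
				break;
			}
			st.pc++;
		}
	}
	return 0;
}

/* The example above: 0 if A, 1 f(), 2 if B, 3 unsafe(), 4 e: exit. */
static const struct toy_insn toy_example[] = {
	{ TOY_IF_TRUE, 4, false }, /* if A goto e */
	{ TOY_INSN, 0, true },     /* f(), unsafe under speculation */
	{ TOY_IF_TRUE, 4, false }, /* if B goto e */
	{ TOY_INSN, 0, true },     /* unsafe() */
	{ TOY_EXIT, 0, false },    /* e: exit */
};
```

With toy_example, the fallback marks insn 1 (f()) and never reaches unsafe(). If f() is made safe, insn 3 (unsafe()) is marked instead; if a nospec already precedes f(), speculative exploration stops there and nothing new is marked.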

For this, it relies on the fact that speculation barriers generally
prevent all later instructions from executing if the speculation was not
correct:

* On Intel x86_64, lfence acts as a full speculation barrier, not only as
  a load fence [3]:

    An LFENCE instruction or a serializing instruction will ensure that
    no later instructions execute, even speculatively, until all prior
    instructions complete locally. [...] Inserting an LFENCE instruction
    after a bounds check prevents later operations from executing before
    the bound check completes.

  This was experimentally confirmed in [4].

* On AMD x86_64, lfence is dispatch-serializing [5] (this requires MSR
  C001_1029[1] to be set if the MSR is supported, which happens in
  init_amd()). AMD further specifies "A dispatch serializing instruction
  forces the processor to retire the serializing instruction and all
  previous instructions before the next instruction is executed" [8]. As
  dispatch is not specific to memory loads or branches, lfence therefore
  also affects all subsequent instructions on AMD. Also, if retiring a
  branch means its PC change becomes architectural (which it should), any
  "wrong" speculation is aborted, as required for this series.

* ARM's SB speculation barrier instruction also affects "any instruction
  that appears later in the program order than the barrier" [6].

* PowerPC's barrier also affects all subsequent instructions [7]:

    [...] executing an ori R31,R31,0 instruction ensures that all
    instructions preceding the ori R31,R31,0 instruction have completed
    before the ori R31,R31,0 instruction completes, and that no
    subsequent instructions are initiated, even out-of-order, until
    after the ori R31,R31,0 instruction completes. The ori R31,R31,0
    instruction may complete before storage accesses associated with
    instructions preceding the ori R31,R31,0 instruction have been
    performed
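The Intel guidance quoted above (a barrier after a bounds check) corresponds to the classic Spectre v1 pattern below, written as a userspace C sketch. spec_barrier() and load_checked() are hypothetical names; the per-architecture instructions are the ones cited above, the arm64 variant assumes the CPU implements the SB extension, and other architectures are deliberately left unsupported:

```c
#include <stddef.h>

/* Per-architecture speculation barrier (sketch, GCC/Clang inline asm). */
#if defined(__x86_64__) || defined(__i386__)
#define spec_barrier() __asm__ __volatile__("lfence" ::: "memory")
#elif defined(__aarch64__)
/* Encoding of SB; assumes FEAT_SB is implemented, otherwise it traps. */
#define spec_barrier() __asm__ __volatile__(".inst 0xd50330ff" ::: "memory")
#elif defined(__powerpc64__)
#define spec_barrier() __asm__ __volatile__("ori 31,31,0" ::: "memory")
#else
#error "no speculation barrier known for this architecture in this sketch"
#endif

/* Bounds check, then barrier, then the dependent load. */
static int load_checked(const int *arr, size_t len, size_t idx)
{
	if (idx >= len)
		return 0;
	/* Per the vendor text above, later insns do not execute, even
	 * speculatively, before the bounds check has completed. */
	spec_barrier();
	return arr[idx];
}
```

In the BPF case, BPF_NOSPEC plays the role of spec_barrier(): the JIT emits the architecture's barrier where the verifier placed the nospec.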

Regarding the example, this implies that `if B goto e` will not execute
before `if A goto e` completes. Once `if A goto e` completes, the CPU
should find that the speculation was wrong and continue with `exit`.

If there is any other path that leads to `if B goto e` (and therefore
`unsafe()`) without going through `if A goto e`, then a nospec will
still be needed there. However, this patch assumes this other path will
be explored separately and therefore be discovered by the verifier even
if the exploration discussed here stops at the nospec.

This patch furthermore has the unfortunate consequence that Spectre v1
mitigations now only support architectures which implement BPF_NOSPEC.
Before this commit, Spectre v1 mitigations prevented exploits by
rejecting the programs on all architectures. Because some JITs do not
implement BPF_NOSPEC, this patch may therefore regress unpriv BPF's
security to a limited extent:

* The regression is limited to systems that are vulnerable to Spectre
  v1, have unprivileged BPF enabled, and do NOT emit insns for
  BPF_NOSPEC. The latter is not the case for x86 64- and 32-bit, arm64,
  and powerpc 64-bit, which are therefore not affected by the regression.
  According to commit a6f6a95f25 ("LoongArch, bpf: Fix jit to skip
  speculation barrier opcode"), LoongArch is not vulnerable to Spectre
  v1 and therefore also not affected by the regression.

* To the best of my knowledge, this regression may therefore only affect
  MIPS. This is deemed acceptable because unpriv BPF is still disabled
  there by default. As stated in a previous commit, BPF_NOSPEC could be
  implemented for MIPS based on GCC's speculation_barrier
  implementation.

* It is unclear which other architectures (besides x86 64- and 32-bit,
  ARM64, PowerPC 64-bit, LoongArch, and MIPS) supported by the kernel
  are vulnerable to Spectre v1. Also, it is not clear if barriers are
  available on these architectures. Implementing BPF_NOSPEC on these
  architectures is therefore non-trivial. Searching GCC and the kernel
  for speculation barrier implementations for these architectures
  yielded no results.

* If any of those regressed systems is also vulnerable to Spectre v4,
  the system was already vulnerable to Spectre v4 attacks based on
  unpriv BPF before this patch and the impact is therefore further
  limited.

As an alternative to regressing security, one could still reject
programs if the architecture does not emit BPF_NOSPEC (e.g., by removing
the empty BPF_NOSPEC-case from all JITs except for LoongArch where it
appears justified). However, this will cause rejections on these archs
that are likely unfounded in the vast majority of cases.

In the tests, some are now successful where we previously had a
false-positive (i.e., rejection). Change them to reflect where the
nospec should be inserted (using __xlated_unpriv) and modify the error
message if the nospec is able to mitigate a problem that previously
shadowed another problem (in that case __xlated_unpriv does not work,
therefore just add a comment).

Define SPEC_V1 once here to avoid duplicating this ifdef whenever we
check for nospec insns using __xlated_unpriv. This also improves
readability. PowerPC can probably also be added here. However, omit it
for now because the BPF CI currently does not test PowerPC.

Limit the fallback to EPERM, EACCES, and EINVAL (rather than everything
except EFAULT and ENOMEM), as this already has the desired effect for
most real-world programs. I briefly went through all occurrences of
EPERM, EINVAL, and EACCES in verifier.c to validate that catching them
like this makes sense.
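The errno filter, used in the `state->speculative && error_recoverable_with_nospec(err)` condition quoted earlier, could look roughly like this. The function name comes from that condition; the body is a sketch assuming exactly the EPERM/EACCES/EINVAL set described here, not a copy of verifier.c, and toy_should_insert_nospec() is a hypothetical helper:

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Sketch of the errno filter: only these errors are treated as fixable by
 * a nospec. Everything else, including -EFAULT and -ENOMEM, still rejects.
 */
static bool error_recoverable_with_nospec(int err)
{
	return err == -EPERM || err == -EACCES || err == -EINVAL;
}

/* Hypothetical helper mirroring the condition checked in do_check(). */
static bool toy_should_insert_nospec(bool state_speculative, int err)
{
	return state_speculative && error_recoverable_with_nospec(err);
}
```

An error outside the set, or any error on an architectural path, still rejects the program.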

Thanks to Dustin for their help in checking the vendor documentation.

[1] https://lpc.events/event/18/contributions/1954/ ("Mitigating
    Spectre-PHT using Speculation Barriers in Linux eBPF")
[2] https://arxiv.org/pdf/2405.00078 ("VeriFence: Lightweight and
    Precise Spectre Defenses for Untrusted Linux Kernel Extensions")
[3] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/runtime-speculative-side-channel-mitigations.html
    ("Managed Runtime Speculative Execution Side Channel Mitigations")
[4] https://dl.acm.org/doi/pdf/10.1145/3359789.3359837 ("Speculator: a
    tool to analyze speculative execution attacks and mitigations" -
    Section 4.6 "Stopping Speculative Execution")
[5] https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/software-techniques-for-managing-speculation.pdf
    ("White Paper - SOFTWARE TECHNIQUES FOR MANAGING SPECULATION ON AMD
    PROCESSORS - REVISION 5.09.23")
[6] https://developer.arm.com/documentation/ddi0597/2020-12/Base-Instructions/SB--Speculation-Barrier-
    ("SB - Speculation Barrier - Arm Armv8-A A32/T32 Instruction Set
    Architecture (2020-12)")
[7] https://wiki.raptorcs.com/w/images/5/5f/OPF_PowerISA_v3.1C.pdf
    ("Power ISA™ - Version 3.1C - May 26, 2024 - Section 9.2.1 of Book
    III")
[8] https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/40332.pdf
    ("AMD64 Architecture Programmer’s Manual Volumes 1–5 - Revision 4.08
    - April 2024 - 7.6.4 Serializing Instructions")

Signed-off-by: Luis Gerhorst <luis.gerhorst@fau.de>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Henriette Herzog <henriette.herzog@rub.de>
Cc: Dustin Nguyen <nguyen@cs.fau.de>
Cc: Maximilian Ott <ott@cs.fau.de>
Cc: Milan Stephan <milan.stephan@fau.de>
Link: https://lore.kernel.org/r/20250603212428.338473-1-luis.gerhorst@fau.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-09 20:11:10 -07:00
acct selftests: acct: Add ksft_exit_skip if not running as root 2025-01-14 17:06:31 -07:00
alsa selftests/alsa: Fix circular dependency involving global-timer 2024-12-20 10:00:41 +01:00
amd-pstate
arm64 kselftest/arm64: Set default OUTPUT path when undefined 2025-05-16 15:15:13 +01:00
bpf bpf: Fall back to nospec for Spectre v1 2025-06-09 20:11:10 -07:00
breakpoints
cachestat
capabilities
cgroup Generic: 2025-06-02 12:24:58 -07:00
clone3 selftests/pidfd: fixes syscall number defines 2025-03-25 14:59:05 +01:00
connector
core
coredump selftests/coredump: add tests for AF_UNIX coredumps 2025-05-21 13:59:12 +02:00
cpu-hotplug
cpufreq kselftest: cpufreq: Get rid of double suspend in rtcwake case 2025-05-09 12:43:39 -06:00
damon selftests/damon/_damon_sysfs: skip testcases if CONFIG_DAMON_SYSFS is disabled 2025-05-31 22:46:15 -07:00
devices
dma
dmabuf-heaps
drivers net: devmem: ncdevmem: remove unused variable 2025-05-27 19:19:36 -07:00
dt
efivarfs selftests/efivarfs: add concurrent update tests 2025-01-21 16:34:41 +01:00
exec AT_EXECVE_CHECK update for v6.14-rc1 (fix1) 2025-01-31 17:12:31 -08:00
fchmodat2
filelock
filesystems - The 3 patch series "hung_task: extend blocking task stacktrace dump to 2025-05-31 19:12:53 -07:00
firmware
fpu
ftrace selftests/ftrace: Convert poll to a gen_file 2025-05-09 12:43:10 -06:00
futex selftests/futex: Fix spelling mistake "unitiliazed" -> "uninitialized" 2025-05-21 13:57:41 +02:00
gpio selftests: gpio: gpio-aggregator: add a test case for _sysfs prefix reservation 2025-04-14 22:30:01 +02:00
hid lib/crc: remove unnecessary prompt for CONFIG_CRC_T10DIF 2025-04-04 11:31:42 -07:00
ia64
intel_pstate
iommu iommufd: Test attach before detaching pasid 2025-03-28 11:40:41 -03:00
ipc selftests/ipc: Remove unused variables 2025-01-14 17:06:31 -07:00
ir
kcmp
kexec selftests/kexec: Add x86_64 selftest for kexec-jump and exception handling 2025-04-10 12:17:14 +02:00
kmod lib/test_kmod: do not hardcode/depend on any filesystem 2025-05-11 17:54:09 -07:00
kselftest printf: convert self-test to KUnit 2025-03-13 10:26:33 -07:00
kselftest_harness selftests: harness: Add kselftest harness selftest 2025-05-21 15:32:27 +02:00
kvm KVM SVM changes for 6.16: 2025-05-27 12:15:49 -04:00
landlock selftests/landlock: Add PID tests for audit records 2025-04-11 12:53:22 +02:00
lib lib/prime_numbers: KUnit test should not select PRIME_NUMBERS 2025-04-15 13:50:43 -07:00
livepatch Livepatching changes for 6.15 2025-03-27 19:26:10 -07:00
lkdtm
locking
lsm selftests: refactor the lsm `flags_overset_lsm_set_self_attr` test 2024-12-18 18:14:29 -05:00
media_tests selftest: media_tests: fix trivial UAF typo 2025-01-14 17:06:31 -07:00
membarrier
memfd selftests/memfd/memfd_test: fix possible NULL pointer dereference 2025-01-25 20:22:44 -08:00
memory-hotplug
mincore 31 hotfixes. 9 are cc:stable and the remainder address post-6.15 issues 2025-04-16 20:07:32 -07:00
mm - The 2 patch series "zram: support algorithm-specific parameters" from 2025-06-02 16:00:26 -07:00
module
mount
mount_setattr selftests/mount_settattr: remove duplicate syscall definitions 2025-05-12 11:40:12 +02:00
move_mount_set_group
mqueue
mseal_system_mappings selftest: test system mappings are sealed 2025-04-01 15:17:16 -07:00
nci selftests: nci: Fix "Electrnoics" to "Electronics" 2025-05-20 18:13:43 -07:00
net selftests: netfilter: Fix skip of wildcard interface test 2025-05-28 09:48:41 +02:00
nolibc selftests/nolibc: drop include guards around standard headers 2025-05-21 15:32:27 +02:00
ntb
openat2
pci_endpoint misc: pci_endpoint_test: Add support for PCITEST_IRQ_TYPE_AUTO 2025-03-26 06:11:54 +00:00
pcie_bwctrl selftests/pcie_bwctrl: Fix test progs list 2025-04-18 08:23:22 -05:00
perf_events selftests/perf_events: Fix spelling mistake "sycnhronize" -> "synchronize" 2025-04-29 13:35:55 -06:00
pid_namespace selftests: pid_namespace: add missing sys/mount.h include in pid_max.c 2025-05-09 13:12:33 -06:00
pidfd vfs-6.16-rc1.selftests 2025-05-26 11:32:28 -07:00
power_supply
powerpc powerpc updates for 6.15 2025-03-27 19:39:08 -07:00
prctl
proc
pstore
ptp testptp: Add option to open PHC in readonly mode 2025-03-05 12:43:54 +00:00
ptrace selftests/ptrace: add a test case for PTRACE_SET_SYSCALL_INFO 2025-05-11 17:48:16 -07:00
rcutorture rcutorture: Fix issue with re-using old images on ARM64 2025-05-16 11:15:34 -04:00
resctrl selftests/resctrl: Discover SNC kernel support and adjust messages 2025-01-14 17:06:32 -07:00
ring-buffer selftests/ring-buffer: Add test for out-of-bound pgoff mapping 2025-01-14 17:06:32 -07:00
riscv selftests: riscv: fix v_exec_initval_nolibc.c 2025-04-01 07:03:04 +00:00
rlimits
rseq rseq/selftests: Fix namespace collision with rseq UAPI header 2025-03-19 21:26:24 +01:00
rtc rtc: remove 'setdate' test program 2025-04-01 15:25:15 +02:00
rust
safesetid
sched sched/debug: Remove CONFIG_SCHED_DEBUG from self-test config files 2025-03-19 22:23:24 +01:00
sched_ext selftests/sched_ext: Update test enq_select_cpu_fails 2025-05-21 07:35:58 -10:00
seccomp selftests: seccomp: Fix "performace" to "performance" 2025-05-20 13:16:39 -07:00
sgx
signal
size
sparc64
splice
static_keys
sync
syscall_user_dispatch
sysctl sysctl: Add 0012 to test the u8 range check 2025-04-14 14:13:41 +02:00
tc-testing Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-05-28 10:11:15 +02:00
tdx
thermal/intel selftests: fix some typos in tools/testing/selftests 2025-05-11 17:54:13 -07:00
timens selftests/timens: timerfd: Use correct clockid type in tclock_gettime() 2025-05-09 13:12:57 -06:00
timers selftests/timers: Improve skew_consistency by testing with other clockids 2025-03-21 19:16:18 +01:00
tmpfs selftests: tmpfs: Add kselftest support to tmpfs 2025-01-14 17:06:32 -07:00
tpm2 selftests: tpm2: test_smoke: use POSIX-conformant expression operator 2025-04-08 14:56:13 -06:00
tty
turbostat
ublk selftests: ublk: add test for UBLK_F_QUIESCE 2025-05-23 09:42:12 -06:00
uevent
user_events selftests/user_events: Fix failures caused by test code 2025-02-24 16:37:17 -07:00
vDSO Updates for the VDSO infrastructure: 2025-03-25 11:30:42 -07:00
watchdog
wireguard wireguard: selftests: specify -std=gnu17 for bash 2025-05-27 09:06:19 +02:00
x86 Merge commit 'its-for-linus-20250509-merge' into x86/core, to resolve conflicts 2025-05-13 10:47:10 +02:00
zram selftests/zram: gitignore output file 2025-01-14 17:06:31 -07:00
.gitignore selftests: tpm2: create a dedicated .gitignore 2025-04-08 14:56:13 -06:00
Makefile Networking changes for 6.16. 2025-05-28 15:24:36 -07:00
gen_kselftest_tar.sh
kselftest.h Revert "selftests: kselftest: Fix build failure with NOLIBC" 2025-02-26 22:13:48 +01:00
kselftest_deps.sh
kselftest_harness.h selftests: harness: Stop using setjmp()/longjmp() 2025-05-21 15:32:37 +02:00
kselftest_install.sh
kselftest_module.h
lib.mk selftests: Add headers target 2025-03-03 20:00:12 +01:00
run_kselftest.sh selftests/run_kselftest.sh: Use readlink if realpath is not available 2025-05-15 16:52:47 -06:00