linux-kernelorg-stable/tools/testing/selftests
Eduard Zingerman 2793a8b015 bpf: exact states comparison for iterator convergence checks
Convergence for open coded iterators is computed in is_state_visited()
by examining states with branches count > 1 and using states_equal().
states_equal() computes sub-state relation using read and precision marks.
Read and precision marks are propagated from children states,
thus are not guaranteed to be complete inside a loop when branches
count > 1. This could be demonstrated using the following unsafe program:

     1. r7 = -16
     2. r6 = bpf_get_prandom_u32()
     3. while (bpf_iter_num_next(&fp[-8])) {
     4.   if (r6 != 42) {
     5.     r7 = -32
     6.     r6 = bpf_get_prandom_u32()
     7.     continue
     8.   }
     9.   r0 = r10
    10.   r0 += r7
    11.   r8 = *(u64 *)(r0 + 0)
    12.   r6 = bpf_get_prandom_u32()
    13. }

Here verifier would first visit path 1-3, create a checkpoint at 3
with r7=-16, continue to 4-7,3 with r7=-32.

Because instructions at 9-12 had not been visitied yet existing
checkpoint at 3 does not have read or precision mark for r7.
Thus states_equal() would return true and verifier would discard
current state, thus unsafe memory access at 11 would not be caught.

This commit fixes this loophole by introducing exact state comparisons
for iterator convergence logic:
- registers are compared using regs_exact() regardless of read or
  precision marks;
- stack slots have to have identical type.

Unfortunately, this is too strict even for simple programs like below:

    i = 0;
    while(iter_next(&it))
      i++;

At each iteration step i++ would produce a new distinct state and
eventually instruction processing limit would be reached.

To avoid such behavior speculatively forget (widen) range for
imprecise scalar registers, if those registers were not precise at the
end of the previous iteration and do not match exactly.

This a conservative heuristic that allows to verify wide range of
programs, however it precludes verification of programs that conjure
an imprecise value on the first loop iteration and use it as precise
on the second.

Test case iter_task_vma_for_each() presents one of such cases:

        unsigned int seen = 0;
        ...
        bpf_for_each(task_vma, vma, task, 0) {
                if (seen >= 1000)
                        break;
                ...
                seen++;
        }

Here clang generates the following code:

<LBB0_4>:
      24:       r8 = r6                          ; stash current value of
                ... body ...                       'seen'
      29:       r1 = r10
      30:       r1 += -0x8
      31:       call bpf_iter_task_vma_next
      32:       r6 += 0x1                        ; seen++;
      33:       if r0 == 0x0 goto +0x2 <LBB0_6>  ; exit on next() == NULL
      34:       r7 += 0x10
      35:       if r8 < 0x3e7 goto -0xc <LBB0_4> ; loop on seen < 1000

<LBB0_6>:
      ... exit ...

Note that counter in r6 is copied to r8 and then incremented,
conditional jump is done using r8. Because of this precision mark for
r6 lags one state behind of precision mark on r8 and widening logic
kicks in.

Adding barrier_var(seen) after conditional is sufficient to force
clang use the same register for both counting and conditional jump.

This issue was discussed in the thread [1] which was started by
Andrew Werner <awerner32@gmail.com> demonstrating a similar bug
in callback functions handling. The callbacks would be addressed
in a followup patch.

[1] https://lore.kernel.org/bpf/97a90da09404c65c8e810cf83c94ac703705dc0e.camel@gmail.com/

Co-developed-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Co-developed-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20231024000917.12153-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-10-23 21:49:31 -07:00
..
alsa
amd-pstate
arm64
bpf bpf: exact states comparison for iterator convergence checks 2023-10-23 21:49:31 -07:00
breakpoints
cachestat
capabilities
cgroup
clone3
connector
core
cpu-hotplug
cpufreq
damon
dma
dmabuf-heaps
drivers selftests: netdevsim: use suitable existing dummy file for flash test 2023-10-13 10:43:13 +01:00
efivarfs
exec
fchmodat2
filelock
filesystems
firmware
fpu
ftrace
futex
gpio
hid
ia64
intel_pstate
iommu
ipc
ir
kcmp
kexec
kmod
kselftest
kvm
landlock
lib
livepatch
lkdtm
locking
media_tests
membarrier
memfd
memory-hotplug
mincore
mm
mount
mount_setattr
move_mount_set_group
mqueue
nci
net selftests: net: remove unused variables 2023-10-16 09:20:08 +01:00
netfilter
nolibc
nsfs
ntb
openat2
perf_events
pid_namespace
pidfd
powerpc
prctl
proc
pstore
ptp ptp: add testptp mask test 2023-10-15 20:07:52 +01:00
ptrace
rcutorture
resctrl
riscv
rlimits
rseq
rtc
safesetid
sched
seccomp
sgx
sigaltstack
size
sparc64
splice
static_keys
sync
syscall_user_dispatch
sysctl
tc-testing
tdx
timens
timers
tmpfs
tpm2
tty
uevent
user
user_events
vDSO
watchdog
wireguard
x86
zram
.gitignore
Makefile
gen_kselftest_tar.sh
kselftest.h
kselftest_deps.sh
kselftest_harness.h
kselftest_install.sh
kselftest_module.h
lib.mk
run_kselftest.sh