Go to file
David Howells 89635eae07
netfs: Fix race between cache write completion and ALL_QUEUED being set
When netfslib is issuing subrequests, the subrequests start processing
immediately and may complete before we reach the end of the issuing
function.  At the end of the issuing function we set NETFS_RREQ_ALL_QUEUED
to indicate to the collector that we aren't going to issue any more subreqs
and that it can do the final notifications and cleanup.

Now, this isn't a problem if the request is synchronous
(NETFS_RREQ_OFFLOAD_COLLECTION is unset) as the result collection will be
done in-thread and we're guaranteed an opportunity to run the collector.

However, if the request is asynchronous, collection is primarily triggered
by the termination of subrequests queuing it on a workqueue.  Now, a race
can occur here if the app thread sets ALL_QUEUED after the last subrequest
terminates.

This can happen most easily with the copy2cache code (as used by Ceph)
where, in the collection routine of a read request, an asynchronous write
request is spawned to copy data to the cache.  Folios are added to the
write request as they're unlocked, but there may be a delay before
ALL_QUEUED is set as the write subrequests may complete before we get
there.

If all the write subreqs have finished by the ALL_QUEUED point, no further
events happen and the collection never happens, leaving the request
hanging.

Fix this by queuing the collector after setting ALL_QUEUED.  This is a bit
heavy-handed and it may be sufficient to do it only if there are no extant
subreqs.

Also add a tracepoint to cross-reference both requests in a copy-to-request
operation and add a trace to the netfs_rreq tracepoint to indicate the
setting of ALL_QUEUED.

Fixes: e2d46f2ec3 ("netfs: Change the read result collector to only use one work item")
Reported-by: Max Kellermann <max.kellermann@ionos.com>
Link: https://lore.kernel.org/r/CAKPOu+8z_ijTLHdiCYGU_Uk7yYD=shxyGLwfe-L7AV3DhebS3w@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/20250711151005.2956810-3-dhowells@redhat.com
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Viacheslav Dubeyko <slava@dubeyko.com>
cc: Alex Markuze <amarkuze@redhat.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: netfs@lists.linux.dev
cc: ceph-devel@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-14 11:05:02 +02:00
Documentation i2c-for-6.16-rc5 2025-07-05 12:54:24 -07:00
LICENSES
arch - Make sure AMD SEV guests using secure TSC, include a TSC_FACTOR which 2025-07-06 10:44:20 -07:00
block block-6.16-20250626 2025-06-27 09:02:33 -07:00
certs
crypto crypto: wp512 - Use API partial block handling 2025-06-23 16:56:56 +08:00
drivers - Initialize sysfs attributes properly to avoid lockdep complaining about 2025-07-06 09:29:24 -07:00
fs netfs: Fix race between cache write completion and ALL_QUEUED being set 2025-07-14 11:05:02 +02:00
include netfs: Fix race between cache write completion and ALL_QUEUED being set 2025-07-14 11:05:02 +02:00
init futex: Temporary disable FUTEX_PRIVATE_HASH 2025-07-01 15:02:05 +02:00
io_uring io_uring-6.16-20250630 2025-06-30 16:32:43 -07:00
ipc - The 3 patch series "hung_task: extend blocking task stacktrace dump to 2025-05-31 19:12:53 -07:00
kernel - Fix the calculation of the deadline server task's runtime as this mishap was 2025-07-06 11:17:47 -07:00
lib Including fixes from Bluetooth. 2025-07-03 09:18:55 -07:00
mm secretmem: use SB_I_NOEXEC 2025-07-07 13:26:09 +02:00
net Including fixes from Bluetooth. 2025-07-03 09:18:55 -07:00
rust Driver core fixes for 6.16-rc3 2025-06-18 14:31:16 -07:00
samples - The 3 patch series "hung_task: extend blocking task stacktrace dump to 2025-05-31 19:12:53 -07:00
scripts scripts/gdb: fix dentry_name() lookup 2025-06-25 15:55:03 -07:00
security selinux: change security_compute_sid to return the ssid or tsid on match 2025-06-19 16:13:16 -04:00
sound ALSA: hda/realtek: Fix built-in mic on ASUS VivoBook X507UAR 2025-06-26 08:02:44 +02:00
tools - Fix the compilation of an x86 kernel on a big engian machine due to 2025-07-06 10:55:59 -07:00
usr
virt Merge branch 'kvm-lockdep-common' into HEAD 2025-05-28 06:29:17 -04:00
.clang-format
.clippy.toml
.cocciconfig
.editorconfig
.get_maintainer.ignore
.gitattributes
.gitignore
.mailmap 16 hotfixes. 6 are cc:stable and the remainder address post-6.15 issues 2025-06-27 20:34:10 -07:00
.pylintrc
.rustfmt.toml
COPYING
CREDITS CREDITS: Add entry for Shannon Nelson 2025-06-21 07:34:28 -07:00
Kbuild
Kconfig
MAINTAINERS Including fixes from Bluetooth. 2025-07-03 09:18:55 -07:00
Makefile Linux 6.16-rc5 2025-07-06 14:10:26 -07:00
README

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.