Compare commits


143 Commits

Greg Kroah-Hartman f1e375d5eb Linux 6.12.48
Link: https://lore.kernel.org/r/20250917123344.315037637@linuxfoundation.org
Tested-by: Hardik Garg <hargar@linux.microsoft.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Tested-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Tested-by: Brett A C Sheffield <bacs@librecast.net>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Tested-by: Ron Economos <re@w6rz.net>
Tested-by: Brett Mastbergen <bmastbergen@ciq.com>
Tested-by: Mark Brown <broonie@kernel.org>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Tested-by: Pavel Machek (CIP) <pavel@denx.de>
Tested-by: Peter Schneider <pschneider1968@googlemail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:52 +02:00
Guenter Roeck 9e70cd1b77 x86: disable image size check for test builds
commit 00a241f528 upstream.

64-bit allyesconfig builds fail with

x86_64-linux-ld: kernel image bigger than KERNEL_IMAGE_SIZE

Bisect points to commit 6f110a5e4f ("Disable SLUB_TINY for build
testing") as the responsible commit.  Reverting that patch does indeed fix
the problem.  Further analysis shows that disabling SLUB_TINY enables
KASAN, and that KASAN is responsible for the image size increase.

Solve the build problem by disabling the image size check for test
builds.

[akpm@linux-foundation.org: add comment, fix nearby typo (sink->sync)]
[akpm@linux-foundation.org: fix comment snafu
  Link: https://lore.kernel.org/oe-kbuild-all/202504191813.4r9H6Glt-lkp@intel.com/]
Link: https://lkml.kernel.org/r/20250417010950.2203847-1-linux@roeck-us.net
Fixes: 6f110a5e4f ("Disable SLUB_TINY for build testing")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: <x86@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:52 +02:00
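
The mechanics of the fix, as a sketch: the size assertion lives in the
x86 linker script, and the guard shown here (COMPILE_TEST) is an
assumption based on the changelog, not the verbatim patch.

	/* arch/x86/kernel/vmlinux.lds.S (sketch) */
	#ifndef CONFIG_COMPILE_TEST
	/* KASAN and friends can blow past the limit on test-only builds */
	. = ASSERT((_end - _text <= KERNEL_IMAGE_SIZE),
		   "kernel image bigger than KERNEL_IMAGE_SIZE");
	#endif
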
Florian Westphal 44b2be6d59 netfilter: nft_set_pipapo: fix null deref for empty set
commit 30c1d25b98 upstream.

Blamed commit broke the check for a null scratch map:
  -  if (unlikely(!m || !*raw_cpu_ptr(m->scratch)))
  +  if (unlikely(!raw_cpu_ptr(m->scratch)))

This should have been "if (!*raw_ ...)".
Use the pattern of the avx2 version, which is more readable.

This can only be reproduced if avx2 support isn't available.

Fixes: d8d871a35c ("netfilter: nft_set_pipapo: merge pipapo_get/lookup")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:51 +02:00
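
In other words, a sketch of the restored check in the fix above,
following the avx2 variant's pattern:

	struct nft_pipapo_match *m = rcu_dereference(priv->match);

	/* the blamed commit tested the per-cpu pointer slot itself;
	 * it must dereference it and test the scratch map it points to */
	if (unlikely(!m || !*raw_cpu_ptr(m->scratch)))
		return NULL;
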
Alex Deucher 5539bc82ce drm/amdgpu: fix a memory leak in fence cleanup when unloading
commit 7838fb5f11 upstream.

Commit b61badd20b ("drm/amdgpu: fix usage slab after free")
reordered when amdgpu_fence_driver_sw_fini() was called. After
that patch, amdgpu_fence_driver_sw_fini() effectively became
a no-op, as the sched entities were never freed because the
ring pointers were already set to NULL.  Remove the NULL
setting.

Reported-by: Lin.Cao <lincao12@amd.com>
Cc: Vitaly Prosyak <vitaly.prosyak@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Fixes: b61badd20b ("drm/amdgpu: fix usage slab after free")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit a525fa37aac36c4591cc8b07ae8957862415fbd5)
Cc: stable@vger.kernel.org
[ Adapt to conditional check ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:51 +02:00
Jani Nikula 215ea32e1f drm/i915/power: fix size for for_each_set_bit() in abox iteration
commit cfa7b76597 upstream.

for_each_set_bit() expects size to be in bits, not bytes. The abox mask
iteration uses bytes, but it works by coincidence, because the local
variable holding the mask is unsigned long, and the mask only ever has
bit 2 as the highest bit. Using a smaller type could lead to subtle and
very hard-to-track bugs.

Fixes: 62afef2811 ("drm/i915/rkl: RKL uses ABOX0 for pixel transfers")
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: stable@vger.kernel.org # v5.9+
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20250905104149.1144751-1-jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit 7ea3baa6efe4bb93d11e1c0e6528b1468d7debf6)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
[ adapted struct intel_display *display parameters to struct drm_i915_private *dev_priv ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:51 +02:00
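
The pitfall in miniature; a sketch rather than the driver code, with
use_abox() as a hypothetical stand-in consumer:

	unsigned long abox_mask = 0x6;	/* bits 1 and 2, as in the driver */
	unsigned int bit;

	/* wrong: sizeof() is bytes (8), so only bits 0..7 are walked;
	 * that happens to cover this mask, hence "works by coincidence" */
	for_each_set_bit(bit, &abox_mask, sizeof(abox_mask))
		use_abox(bit);

	/* right: for_each_set_bit() takes the size in bits */
	for_each_set_bit(bit, &abox_mask, BITS_PER_TYPE(unsigned long))
		use_abox(bit);
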
Buday Csaba b9f9035d94 net: mdiobus: release reset_gpio in mdiobus_unregister_device()
commit 8ea25274eb upstream.

reset_gpio is claimed in mdiobus_register_device(), but it is not
released in mdiobus_unregister_device(). It is instead only
released when the whole MDIO bus is unregistered.
When a device uses the reset_gpio property, it becomes impossible
to unregister it and register it again, because the GPIO remains
claimed.
This patch resolves that issue.

Fixes: bafbdd527d ("phylib: Add device reset GPIO support") # see notes
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Cc: Csókás Bence <csokas.bence@prolan.hu>
[ csokas.bence: Resolve rebase conflict and clarify msg ]
Signed-off-by: Buday Csaba <buday.csaba@prolan.hu>
Link: https://patch.msgid.link/20250807135449.254254-2-csokas.bence@prolan.hu
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
[ csokas.bence: Use the v1 patch on top of 6.12, as specified in notes ]
Signed-off-by: Bence Csókás <csokas.bence@prolan.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:51 +02:00
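
A sketch of the symmetry being restored, assuming the fix simply
mirrors the register-side claim in the unregister path:

	/* mdiobus_unregister_device() gains the matching release for
	 * the GPIO claimed in mdiobus_register_device() */
	gpiod_put(mdiodev->reset_gpio);
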
K Prateek Nayak 01e528e63c x86/cpu/topology: Always try cpu_parse_topology_ext() on AMD/Hygon
commit cba4262a19 upstream.

Support for parsing the topology on AMD/Hygon processors using CPUID leaf 0xb
was added in

  3986a0a805 ("x86/CPU/AMD: Derive CPU topology from CPUID function 0xB when available").

In an effort to keep all the topology parsing bits in one place, this commit
also introduced a pseudo dependency on the TOPOEXT feature to parse the CPUID
leaf 0xb.

The TOPOEXT feature (CPUID 0x80000001 ECX[22]) advertises support for the
Cache Properties leaf 0x8000001d and the CPUID leaf 0x8000001e EAX for
"Extended APIC ID"; however, support for leaf 0xb was introduced alongside
x2APIC support not only on AMD [1], but also historically on x86 [2].

Similar to 0xb, support for the extended CPU topology leaf 0x80000026 also
does not depend on the TOPOEXT feature.

The support for these leaves is expected to be confirmed by ensuring

  leaf <= {extended_}cpuid_level

and then parsing the level 0 of the respective leaf to confirm EBX[15:0]
(LogProcAtThisLevel) is non-zero as stated in the definition of
"CPUID_Fn0000000B_EAX_x00 [Extended Topology Enumeration]
(Core::X86::Cpuid::ExtTopEnumEax0)" in Processor Programming Reference (PPR)
for AMD Family 19h Model 01h Rev B1 Vol1 [3] Sec. 2.1.15.1 "CPUID Instruction
Functions".

This has not been a problem on bare-metal platforms, since support for
TOPOEXT (Fam 0x15 and later) predates support for CPUID leaf 0xb (Fam
0x17 [Zen2] and later). However, for AMD guests on QEMU, the "x2apic"
feature can be enabled independently of the "topoext" feature, and QEMU
expects the topology and the initial APIC ID to be parsed using CPUID
leaf 0xb (especially when the number of cores > 255), which is populated
independently of the "topoext" feature flag.

Unconditionally call cpu_parse_topology_ext() on AMD and Hygon processors to
first parse the topology using the XTOPOLOGY leaves (0x80000026 / 0xb) before
using the TOPOEXT leaf (0x8000001e).

While at it, break down the single large comment in parse_topology_amd() to
better highlight the purpose of each CPUID leaf.

Fixes: 3986a0a805 ("x86/CPU/AMD: Derive CPU topology from CPUID function 0xB when available")
Suggested-by: Naveen N Rao (AMD) <naveen@kernel.org>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org # Only v6.9 and above; depends on x86 topology rewrite
Link: https://lore.kernel.org/lkml/1529686927-7665-1-git-send-email-suravee.suthikulpanit@amd.com/ [1]
Link: https://lore.kernel.org/lkml/20080818181435.523309000@linux-os.sc.intel.com/ [2]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 [3]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:51 +02:00
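
A sketch of the resulting flow in parse_topology_amd();
cpu_parse_topology_ext() is named in the changelog, while the fallback
helper name here is an assumption:

	static void parse_topology_amd(struct topo_scan *tscan)
	{
		/* The extended topology leaves (0x80000026 / 0xb) do not
		 * depend on TOPOEXT, so always try them first. */
		if (cpu_parse_topology_ext(tscan))
			return;

		/* Leaf 0x8000001e ("Extended APIC ID") does need TOPOEXT. */
		if (cpu_feature_enabled(X86_FEATURE_TOPOEXT))
			parse_8000_001e(tscan);	/* helper name assumed */
	}
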
Johan Hovold 170eaf97d5 phy: ti-pipe3: fix device leak at unbind
commit e19bcea997 upstream.

Make sure to drop the reference to the control device taken by
of_find_device_by_node() during probe when the driver is unbound.

Fixes: 918ee0d21b ("usb: phy: omap-usb3: Don't use omap_get_control_dev()")
Cc: stable@vger.kernel.org	# 3.13
Cc: Roger Quadros <rogerq@kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20250724131206.2211-4-johan@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:51 +02:00
Johan Hovold 4dae01a7b2 phy: ti: omap-usb2: fix device leak at unbind
commit 64961557ef upstream.

Make sure to drop the reference to the control device taken by
of_find_device_by_node() during probe when the driver is unbound.

Fixes: 478b6c7436 ("usb: phy: omap-usb2: Don't use omap_get_control_dev()")
Cc: stable@vger.kernel.org	# 3.13
Cc: Roger Quadros <rogerq@kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20250724131206.2211-3-johan@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:51 +02:00
Johan Hovold 28cfc6ab15 phy: tegra: xusb: fix device and OF node leak at probe
commit bca065733a upstream.

Make sure to drop the references taken to the PMC OF node and device by
of_parse_phandle() and of_find_device_by_node() during probe.

Note that holding a reference to the PMC device does not prevent the
PMC regmap from going away (e.g. if the PMC driver is unbound), so
there is no need to keep the reference.

Fixes: 2d10214872 ("phy: tegra: xusb: Add wake/sleepwalk for Tegra210")
Cc: stable@vger.kernel.org	# 5.14
Cc: JC Kuo <jckuo@nvidia.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20250724131206.2211-2-johan@kernel.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:51 +02:00
Miaoqian Lin feb1f80228 dmaengine: dw: dmamux: Fix device reference leak in rzn1_dmamux_route_allocate
commit aa2e1e4563 upstream.

The reference taken by of_find_device_by_node()
must be released when not needed anymore.
Add missing put_device() call to fix device reference leaks.

Fixes: 134d9c52fc ("dmaengine: dw: dmamux: Introduce RZN1 DMA router support")
Cc: stable@vger.kernel.org
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20250902090358.2423285-1-linmq006@gmail.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:51 +02:00
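
The preceding four fixes all close the same hole: of_find_device_by_node()
takes a reference on the embedded struct device, and that reference must
be dropped with put_device() when it is no longer needed. A minimal sketch
of the pattern:

	struct platform_device *pdev;

	pdev = of_find_device_by_node(np);	/* takes a device reference */
	if (!pdev)
		return -ENODEV;

	/* ... use pdev ... */

	put_device(&pdev->dev);	/* drop it on error paths and at unbind */
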
Stephan Gerhold 1fc14731f0 dmaengine: qcom: bam_dma: Fix DT error handling for num-channels/ees
commit 5068b52548 upstream.

When we don't have a clock specified in the device tree, we have no way to
ensure the BAM is on. This is often the case for remotely-controlled or
remotely-powered BAM instances. In this case, we need to read num-channels
from the DT to have all the necessary information to complete probing.

However, at the moment invalid device trees without clock and without
num-channels still continue probing, because the error handling is missing
return statements. The driver will then later try to read the number of
channels from the registers. This is unsafe, because it relies on boot
firmware and lucky timing to succeed. Unfortunately, the lack of proper
error handling here has been abused for several Qualcomm SoCs upstream,
causing early boot crashes in several situations [1, 2].

Avoid these early crashes by erroring out when any of the required DT
properties are missing. Note that this will break some of the existing DTs
upstream (mainly BAM instances related to the crypto engine). However,
clearly these DTs have never been tested properly, since the error in the
kernel log was just ignored. It's safer to disable the crypto engine for
these broken DTBs.

[1]: https://lore.kernel.org/r/CY01EKQVWE36.B9X5TDXAREPF@fairphone.com/
[2]: https://lore.kernel.org/r/20230626145959.646747-1-krzysztof.kozlowski@linaro.org/

Cc: stable@vger.kernel.org
Fixes: 48d163b1aa ("dmaengine: qcom: bam_dma: get num-channels and num-ees from dt")
Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20250212-bam-dma-fixes-v1-8-f560889e65d8@linaro.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:51 +02:00
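
A sketch of the clock-less probe path with the previously missing
returns; the property names follow the changelog and the Fixes subject
("num-channels" and "qcom,num-ees"), so treat the details as assumptions:

	/* no clock: the BAM registers may not be accessible, so both
	 * properties must come from the DT */
	ret = of_property_read_u32(pdev->dev.of_node, "num-channels",
				   &bdev->num_channels);
	if (ret)
		return ret;	/* previously missing: probe carried on */

	ret = of_property_read_u32(pdev->dev.of_node, "qcom,num-ees",
				   &bdev->num_ees);
	if (ret)
		return ret;	/* previously missing */
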
Takashi Iwai 877135c58a usb: gadget: midi2: Fix MIDI2 IN EP max packet size
commit 116e79c679 upstream.

The EP-IN of MIDI2 (altset 1) wasn't initialized in
f_midi2_create_usb_configs() as it's an INT EP, unlike the other BULK
EPs.  But this leaves the max packet size unchanged no matter which
speed is used, resulting in very slow access.
The wMaxPacketSize values set there look legit for INT EPs as well, so
let's initialize the MIDI2 EP-IN there too, to achieve the equivalent
speed.

Fixes: 8b645922b2 ("usb: gadget: Add support for USB MIDI 2.0 function driver")
Cc: stable <stable@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20250905133240.20966-1-tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:51 +02:00
Takashi Iwai 47949bcf66 usb: gadget: midi2: Fix missing UMP group attributes initialization
commit 21d8525d2e upstream.

The gadget card driver forgot to call snd_ump_update_group_attrs()
after adding FBs, and this leaves the UMP group attributes
uninitialized.  As a result, a -ENODEV error is returned when opening
a legacy rawmidi device, since the group appears inactive.

This patch adds the missing call to address the behavior above.

Fixes: 8b645922b2 ("usb: gadget: Add support for USB MIDI 2.0 function driver")
Cc: stable <stable@kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20250904153932.13589-1-tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:51 +02:00
RD Babiera edfa1f21c2 usb: typec: tcpm: properly deliver cable vdms to altmode drivers
commit f34bfcc77b upstream.

tcpm_handle_vdm_request delivers messages to the partner altmode or the
cable altmode depending on the SVDM response type, which is incorrect.
The partner or cable should be chosen based on the received message type
instead.

Also add this filter to ADEV_NOTIFY_USB_AND_QUEUE_VDM, which is used when
the Enter Mode command is responded to by a NAK on SOP or SOP' and when
the Exit Mode command is responded to by an ACK on SOP.

Fixes: 7e7877c55e ("usb: typec: tcpm: add alt mode enter/exit/vdm support for sop'")
Cc: stable@vger.kernel.org
Signed-off-by: RD Babiera <rdbabiera@google.com>
Reviewed-by: Badhri Jagan Sridharan <badhri@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://lore.kernel.org/r/20250821203759.1720841-2-rdbabiera@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:51 +02:00
Alan Stern 2d10b29a7e USB: gadget: dummy-hcd: Fix locking bug in RT-enabled kernels
commit 8d63c83d8e upstream.

Yunseong Kim and the syzbot fuzzer both reported a problem in
RT-enabled kernels caused by the way dummy-hcd mixes interrupt
management and spin-locking.  The pattern was:

	local_irq_save(flags);
	spin_lock(&dum->lock);
	...
	spin_unlock(&dum->lock);
	...		// calls usb_gadget_giveback_request()
	local_irq_restore(flags);

The code was written this way because usb_gadget_giveback_request()
needs to be called with interrupts disabled and the private lock not
held.

While this pattern works fine in non-RT kernels, it's not good when RT
is enabled.  RT kernels handle spinlocks much like mutexes; in particular,
spin_lock() may sleep.  But sleeping is not allowed while local
interrupts are disabled.

To fix the problem, rewrite the code to conform to the pattern used
elsewhere in dummy-hcd and other UDC drivers:

	spin_lock_irqsave(&dum->lock, flags);
	...
	spin_unlock(&dum->lock);
	usb_gadget_giveback_request(...);
	spin_lock(&dum->lock);
	...
	spin_unlock_irqrestore(&dum->lock, flags);

This approach satisfies the RT requirements.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Cc: stable <stable@kernel.org>
Fixes: b4dbda1a22 ("USB: dummy-hcd: disable interrupts during req->complete")
Reported-by: Yunseong Kim <ysk@kzalloc.com>
Closes: <https://lore.kernel.org/linux-usb/5b337389-73b9-4ee4-a83e-7e82bf5af87a@kzalloc.com/>
Reported-by: syzbot+8baacc4139f12fa77909@syzkaller.appspotmail.com
Closes: <https://lore.kernel.org/linux-usb/68ac2411.050a0220.37038e.0087.GAE@google.com/>
Tested-by: syzbot+8baacc4139f12fa77909@syzkaller.appspotmail.com
CC: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
CC: stable@vger.kernel.org
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/bb192ae2-4eee-48ee-981f-3efdbbd0d8f0@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:50 +02:00
Mathias Nyman e64b2ff864 xhci: fix memory leak regression when freeing xhci vdev devices depth first
commit edcbe06453 upstream.

A suspend-resume cycle test revealed a memory leak in 6.17-rc3.

Turns out the slot_id race fix changes accidentally end up calling
xhci_free_virt_device() with an incorrect vdev parameter.
The vdev variable was reused for temporary purposes right before
calling xhci_free_virt_device().

Fix this by passing the correct vdev parameter.

The slot_id race fix that caused this regression was targeted for stable,
so this needs to be applied there as well.

Fixes: 2eb0337615 ("usb: xhci: Fix slot_id resource race conflict")
Reported-by: David Wang <00107082@163.com>
Closes: https://lore.kernel.org/linux-usb/20250829181354.4450-1-00107082@163.com
Suggested-by: Michal Pecio <michal.pecio@gmail.com>
Suggested-by: David Wang <00107082@163.com>
Cc: stable@vger.kernel.org
Tested-by: David Wang <00107082@163.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20250902105306.877476-4-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:50 +02:00
Palmer Dabbelt cfcde627f0 RISC-V: Remove unnecessary include from compat.h
[ Upstream commit 8d4f1e05ff ]

Without this I get a bunch of build errors like

    In file included from ./include/linux/sched/task_stack.h:12,
                     from ./arch/riscv/include/asm/compat.h:12,
                     from ./arch/riscv/include/asm/pgtable.h:115,
                     from ./include/linux/pgtable.h:6,
                     from ./include/linux/mm.h:30,
                     from arch/riscv/kernel/asm-offsets.c:8:
    ./include/linux/kasan.h:50:37: error: ‘MAX_PTRS_PER_PTE’ undeclared here (not in a function); did you mean ‘PTRS_PER_PTE’?
       50 | extern pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS];
          |                                     ^~~~~~~~~~~~~~~~
          |                                     PTRS_PER_PTE
    ./include/linux/kasan.h:51:8: error: unknown type name ‘pmd_t’; did you mean ‘pgd_t’?
       51 | extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD];
          |        ^~~~~
          |        pgd_t
    ./include/linux/kasan.h:51:37: error: ‘MAX_PTRS_PER_PMD’ undeclared here (not in a function); did you mean ‘PTRS_PER_PGD’?
       51 | extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD];
          |                                     ^~~~~~~~~~~~~~~~
          |                                     PTRS_PER_PGD
    ./include/linux/kasan.h:52:8: error: unknown type name ‘pud_t’; did you mean ‘pgd_t’?
       52 | extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD];
          |        ^~~~~
          |        pgd_t
    ./include/linux/kasan.h:52:37: error: ‘MAX_PTRS_PER_PUD’ undeclared here (not in a function); did you mean ‘PTRS_PER_PGD’?
       52 | extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD];
          |                                     ^~~~~~~~~~~~~~~~
          |                                     PTRS_PER_PGD
    ./include/linux/kasan.h:53:8: error: unknown type name ‘p4d_t’; did you mean ‘pgd_t’?
       53 | extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
          |        ^~~~~
          |        pgd_t
    ./include/linux/kasan.h:53:37: error: ‘MAX_PTRS_PER_P4D’ undeclared here (not in a function); did you mean ‘PTRS_PER_PGD’?
       53 | extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
          |                                     ^~~~~~~~~~~~~~~~
          |                                     PTRS_PER_PGD

Link: https://lore.kernel.org/r/20241126143250.29708-1-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:50 +02:00
Andreas Kemnade eba05e46f8 regulator: sy7636a: fix lifecycle of power good gpio
[ Upstream commit c05d0b32ee ]

Attach the power good gpio to the regulator device devres instead of the
parent device to fix problems if probe is run multiple times
(rmmod/insmod or some deferral).

Fixes: 8c485bedfb ("regulator: sy7636a: Initial commit")
Signed-off-by: Andreas Kemnade <akemnade@kernel.org>
Reviewed-by: Alistair Francis <alistair@alistair23.me>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
Message-ID: <20250906-sy7636-rsrc-v1-2-e2886a9763a7@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:50 +02:00
Anders Roxell 069fd1688c dmaengine: ti: edma: Fix memory allocation size for queue_priority_map
[ Upstream commit e63419dbf2 ]

Fix a critical memory allocation bug in edma_setup_from_hw() where
queue_priority_map was allocated with insufficient memory. The code
declared queue_priority_map as s8 (*)[2] (pointer to array of 2 s8),
but allocated memory using sizeof(s8) instead of the correct size.

This caused out-of-bounds memory writes when accessing:
  queue_priority_map[i][0] = i;
  queue_priority_map[i][1] = i;

The bug manifested as kernel crashes with "Oops - undefined instruction"
on ARM platforms (BeagleBoard-X15) during EDMA driver probe, as the
memory corruption triggered kernel hardening features on Clang.

Change the allocation to use sizeof(*queue_priority_map) which
automatically gets the correct size for the 2D array structure.

Fixes: 2b6b3b7420 ("ARM/dmaengine: edma: Merge the two drivers under drivers/dma/")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Link: https://lore.kernel.org/r/20250830094953.3038012-1-anders.roxell@linaro.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:50 +02:00
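
The bug and fix in two lines, as described above (the element count n
is a stand-in for the driver's expression):

	s8 (*queue_priority_map)[2];

	/* before: n * sizeof(s8) bytes for n two-byte rows: too small */
	queue_priority_map = devm_kcalloc(dev, n, sizeof(s8), GFP_KERNEL);

	/* after: sizeof(*queue_priority_map) == sizeof(s8[2]) */
	queue_priority_map = devm_kcalloc(dev, n, sizeof(*queue_priority_map),
					  GFP_KERNEL);
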
Dan Carpenter ec5430d090 dmaengine: idxd: Fix double free in idxd_setup_wqs()
[ Upstream commit 39aaa33744 ]

The cleanup in idxd_setup_wqs() has had a couple of bugs because the error
handling is a bit subtle.  It's simpler to just rewrite it in a cleaner
way.  The issues here are:

1) If "idxd->max_wqs" is <= 0 then we call put_device(conf_dev) when
   "conf_dev" hasn't been initialized.
2) If kzalloc_node() fails then again "conf_dev" is invalid.  It's
   either uninitialized or it points to the "conf_dev" from the
   previous iteration so it leads to a double free.

It's better to free partial loop iterations within the loop and then
the unwinding at the end can handle whole loop iterations.  I also
renamed the labels to describe what the goto does and not where the goto
was located.

Fixes: 3fd2f4bc01 ("dmaengine: idxd: fix memory leak in error handling path of idxd_setup_wqs")
Reported-by: Colin Ian King <colin.i.king@gmail.com>
Closes: https://lore.kernel.org/all/20250811095836.1642093-1-colin.i.king@gmail.com/
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/aJnJW3iYTDDCj9sk@stanley.mountain
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:50 +02:00
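
The general shape of the rework, sketched with hypothetical helpers:
free the partial iteration inside the loop, so the unwind label only
ever handles whole iterations.

	for (i = 0; i < n; i++) {
		item = kzalloc_node(sizeof(*item), GFP_KERNEL, node);
		if (!item)
			goto err_free_items;	/* nothing partial to free */
		if (setup(item)) {		/* hypothetical setup step */
			kfree(item);		/* partial iteration freed here */
			goto err_free_items;
		}
		items[i] = item;
	}
	return 0;

	err_free_items:
		while (i--)
			teardown(items[i]);	/* whole iterations only */
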
Yi Sun ce81905bec dmaengine: idxd: Fix refcount underflow on module unload
[ Upstream commit b7cb9a0343 ]

A recent refactor introduced a misplaced put_device() call, resulting in a
reference count underflow during module unload.

There is no need to add additional put_device() calls for idxd groups,
engines, or workqueues. Although the commit claims: "Note, this also
fixes the missing put_device() for idxd groups, engines, and wqs."

It appears no such omission actually existed. The required cleanup is
already handled by the call chain:
idxd_unregister_devices() -> device_unregister() -> put_device()

Extend idxd_cleanup() to handle the remaining necessary cleanup and
remove idxd_cleanup_internals(), which duplicates deallocation logic
for idxd, engines, groups, and workqueues. Memory management is also
properly handled through the Linux device model.

Fixes: a409e919ca ("dmaengine: idxd: Refactor remove call with idxd_cleanup() helper")
Signed-off-by: Yi Sun <yi.sun@intel.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>

Link: https://lore.kernel.org/r/20250729150313.1934101-3-yi.sun@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:50 +02:00
Yi Sun dd7a7e4326 dmaengine: idxd: Remove improper idxd_free
[ Upstream commit f41c538881 ]

The call to idxd_free() introduces a duplicate put_device() leading to a
reference count underflow:
refcount_t: underflow; use-after-free.
WARNING: CPU: 15 PID: 4428 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110
...
Call Trace:
 <TASK>
  idxd_remove+0xe4/0x120 [idxd]
  pci_device_remove+0x3f/0xb0
  device_release_driver_internal+0x197/0x200
  driver_detach+0x48/0x90
  bus_remove_driver+0x74/0xf0
  pci_unregister_driver+0x2e/0xb0
  idxd_exit_module+0x34/0x7a0 [idxd]
  __do_sys_delete_module.constprop.0+0x183/0x280
  do_syscall_64+0x54/0xd70
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

The idxd_unregister_devices() which is invoked at the very beginning of
idxd_remove(), already takes care of the necessary put_device() through the
following call path:
idxd_unregister_devices() -> device_unregister() -> put_device()

In addition, when CONFIG_DEBUG_KOBJECT_RELEASE is enabled, put_device() may
trigger asynchronous cleanup via schedule_delayed_work(). If idxd_free() is
called immediately after, it can result in a use-after-free.

Remove the improper idxd_free() to avoid both the refcount underflow and
potential memory corruption during module unload.

Fixes: d5449ff1b0 ("dmaengine: idxd: Add missing idxd cleanup to fix memory leak in remove call")
Signed-off-by: Yi Sun <yi.sun@intel.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>

Link: https://lore.kernel.org/r/20250729150313.1934101-2-yi.sun@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:50 +02:00
Pengyu Luo fcd4f1af12 phy: qualcomm: phy-qcom-eusb2-repeater: fix override properties
[ Upstream commit 942e47ab22 ]

property "qcom,tune-usb2-preem" is for EUSB2_TUNE_USB2_PREEM
property "qcom,tune-usb2-amplitude" is for EUSB2_TUNE_IUSB2

The downstream correspondence is as follows:
EUSB2_TUNE_USB2_PREEM: Tx pre-emphasis tuning
EUSB2_TUNE_IUSB2: HS transmit amplitude
EUSB2_TUNE_SQUELCH_U: Squelch detection threshold
EUSB2_TUNE_HSDISC: HS disconnect threshold
EUSB2_TUNE_EUSB_SLEW: slew rate

Fixes: 31bc94de76 ("phy: qualcomm: phy-qcom-eusb2-repeater: Don't zero-out registers")
Signed-off-by: Pengyu Luo <mitltlatltl@gmail.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Luca Weiss <luca.weiss@fairphone.com>
Link: https://lore.kernel.org/r/20250812093957.32235-1-mitltlatltl@gmail.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:50 +02:00
Hangbin Liu dac341e357 hsr: use hsr_for_each_port_rtnl in hsr_port_get_hsr
[ Upstream commit 393c841fe4 ]

hsr_port_get_hsr() iterates over ports using hsr_for_each_port(),
but many of its callers do not hold the required RCU lock.

Switch to hsr_for_each_port_rtnl(), since most callers already hold
the rtnl lock. After review, all callers are covered by either the rtnl
lock or the RCU lock, except hsr_dev_xmit(). Fix this by adding an
RCU read lock there.

Fixes: c5a7591172 ("net/hsr: Use list_head (and rcu) instead of array for slave devices.")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250905091533.377443-3-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:50 +02:00
Hangbin Liu d04d9d1aea hsr: use rtnl lock when iterating over ports
[ Upstream commit 8884c69399 ]

hsr_for_each_port is called in many places without holding the RCU read
lock; this may trigger warnings on debug kernels. Most of the callers
actually hold the rtnl lock. So add a new helper hsr_for_each_port_rtnl
to allow callers in suitable contexts to iterate ports safely without
explicit RCU locking.

This patch only fixes the callers that hold the rtnl lock. Other
callers will be fixed in later patches.

Fixes: c5a7591172 ("net/hsr: Use list_head (and rcu) instead of array for slave devices.")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250905091533.377443-2-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:50 +02:00
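
A sketch of what such a helper can look like, assuming hsr's existing
port-list fields; list_for_each_entry_rcu() accepts a lockdep condition
as its fourth argument:

	#define hsr_for_each_port_rtnl(hsr, port) \
		list_for_each_entry_rcu((port), &(hsr)->ports, port_list, \
					lockdep_rtnl_is_held())
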
Murali Karicheri c707d2c554 net: hsr: Add VLAN CTAG filter support
[ Upstream commit 1a8a63a530 ]

This patch adds support for VLAN ctag based filtering at slave devices.
The slave ethernet device may be capable of filtering ethernet packets
based on VLAN ID. This requires that when the VLAN interface is created
over an HSR/PRP interface, it passes the VID information to the
associated slave ethernet devices so that it updates the hardware
filters to filter ethernet frames based on VID. This patch adds the
required functions to propagate the vid information to the slave
devices.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20241106091710.3308519-3-danishanwar@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 8884c69399 ("hsr: use rtnl lock when iterating over ports")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:50 +02:00
Florian Westphal d74b49bb6b netfilter: nf_tables: restart set lookup on base_seq change
[ Upstream commit b2f742c846 ]

The hash, hash_fast, rhash and bitwise sets may indicate no result even
though a matching element exists during a short time window while other
cpu is finalizing the transaction.

This happens when the hash lookup/bitwise lookup function has picked up
the old genbit, right before it was toggled by nf_tables_commit(), but
then the same cpu managed to unlink the matching old element from the
hash table:

cpu0					cpu1
  has added new elements to clone
  has marked elements as being
  inactive in new generation
					perform lookup in the set
  enters commit phase:
					A) observes old genbit
   increments base_seq
I) increments the genbit
II) removes old element from the set
					B) finds matching element
					C) returns no match: found
					element is not valid in old
					generation

					Next lookup observes new genbit and
					finds matching e2.

Consider a packet matching element e1, e2.

cpu0 processes following transaction:
1. remove e1
2. adds e2, which has same key as e1.

P matches both e1 and e2.  Therefore, cpu1 should always find a match
for P. Due to the above race, this is not the case:

cpu1 observed the old genbit.  e2 will not be considered once it is found.
The element e1 is not found anymore if cpu0 managed to unlink it from the
hlist before cpu1 found it during list traversal.

The situation only occurs for a brief time period, lookups happening
after I) observe new genbit and return e2.

This problem exists in all set types except nft_set_pipapo, so fix it once
in nft_lookup rather than each set ops individually.

Sample the base sequence counter, which gets incremented right before the
genbit is changed.

Then, if no match is found, retry the lookup if the base sequence was
altered in between.

If the base sequence hasn't changed:
 - No update took place: no-match result is expected.
   This is the common case.  or:
 - nf_tables_commit() hasn't progressed to genbit update yet.
   Old elements were still visible and nomatch result is expected, or:
 - nf_tables_commit updated the genbit:
   We picked up the new base_seq, so the lookup function also picked
   up the new genbit, no-match result is expected.

If the old genbit was observed, then nft_lookup also picked up the old
base_seq: nft_lookup_should_retry() returns true and relookup is performed
in the new generation.

This problem was added when the unconditional synchronize_rcu() call
that followed the current/next generation bit toggle was removed.

Thanks to Pablo Neira Ayuso for reviewing an earlier version of this
patchset, for suggesting re-use of existing base_seq and placement of
the restart loop in nft_set_do_lookup().

Fixes: 0cbc06b3fa ("netfilter: nf_tables: remove synchronize_rcu in commit phase")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:50 +02:00
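
A sketch of the restart loop in nft_set_do_lookup(); the field
placement (net->nft.base_seq) follows the companion patches below, the
rest is assumption:

	const struct nft_set_ext *
	nft_set_do_lookup(const struct net *net, const struct nft_set *set,
			  const u32 *key)
	{
		const struct nft_set_ext *ext;
		unsigned int seq;

		do {
			seq = READ_ONCE(net->nft.base_seq);
			ext = set->ops->lookup(net, set, key);
			if (ext)
				break;
			/* no match: if a commit bumped base_seq meanwhile,
			 * the genbit may have flipped under us; retry */
		} while (READ_ONCE(net->nft.base_seq) != seq);

		return ext;
	}
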
Florian Westphal 4c34625f7d netfilter: nf_tables: make nft_set_do_lookup available unconditionally
[ Upstream commit 11fe5a82e5 ]

This function was added for retpoline mitigation and is replaced by a
static inline helper if mitigations are not enabled.

Enable this helper function unconditionally so next patch can add a lookup
restart mechanism to fix possible false negatives while transactions are
in progress.

Adding lookup restarts in nft_lookup_eval doesn't work as nft_objref would
then need the same copypaste loop.

This patch is separate to ease review of the actual bug fix.

Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Stable-dep-of: b2f742c846 ("netfilter: nf_tables: restart set lookup on base_seq change")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:50 +02:00
Florian Westphal 259c4e86d0 netfilter: nf_tables: place base_seq in struct net
[ Upstream commit 64102d9bbc ]

This will soon be read from packet path around same time as the gencursor.

Both gencursor and base_seq get incremented almost at the same time, so
it makes sense to place them in the same structure.

This doesn't increase struct net size on 64bit due to padding.

Signed-off-by: Florian Westphal <fw@strlen.de>
Stable-dep-of: b2f742c846 ("netfilter: nf_tables: restart set lookup on base_seq change")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:50 +02:00
Phil Sutter dbe85d3115 netfilter: nf_tables: Reintroduce shortened deletion notifications
[ Upstream commit a1050dd071 ]

Restore commit 28339b21a3 ("netfilter: nf_tables: do not send complete
notification of deletions") and fix it:

- Avoid upfront modification of 'event' variable so the conditionals
  become effective.
- Always include NFTA_OBJ_TYPE attribute in object notifications, user
  space requires it for proper deserialisation.
- Catch DESTROY events, too.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Stable-dep-of: b2f742c846 ("netfilter: nf_tables: restart set lookup on base_seq change")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:49 +02:00
Florian Westphal 9f1cc747c9 netfilter: nft_set_rbtree: continue traversal if element is inactive
[ Upstream commit a60f7bf4a1 ]

When the rbtree lookup function finds a match in the rbtree, it sets the
range start interval to a potentially inactive element.

Then, after tree lookup, if the matching element is inactive, it returns
NULL and suppresses a matching result.

This is wrong and leads to false negative matches when a transaction has
already entered the commit phase.

cpu0					cpu1
  has added new elements to clone
  has marked elements as being
  inactive in new generation
					perform lookup in the set
  enters commit phase:
I) increments the genbit
					A) observes new genbit
					B) finds matching range
					C) returns no match: found
					range invalid in new generation
II) removes old elements from the tree
					C New nft_lookup happening now
				       	  will find matching element,
					  because it is no longer
					  obscured by old, inactive one.

Consider a packet matching range r1-r2:

cpu0 processes following transaction:
1. remove r1-r2
2. add r1-r3

P is contained in both ranges. Therefore, cpu1 should always find a match
for P.  Due to the above race, this is not the case:

cpu1 does find r1-r2, but then ignores it due to the genbit indicating
the range has been removed.  It does NOT test for further matches.

The situation persists for all lookups until after cpu0 hits II) after
which r1-r3 range start node is tested for the first time.

Move the "interval start is valid" check ahead so that tree traversal
continues if the starting interval is not valid in this generation.

Thanks to Stefan Hanreich for providing an initial reproducer for this
bug.

Reported-by: Stefan Hanreich <s.hanreich@proxmox.com>
Fixes: c1eda3c639 ("netfilter: nft_rbtree: ignore inactive matching element with no descendants")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:49 +02:00
Florian Westphal 6fe348e837 netfilter: nft_set_pipapo: don't check genbit from packetpath lookups
[ Upstream commit c4eaca2e10 ]

The pipapo set type is special in that it has two copies of its
datastructure: one live copy containing only valid elements and one
on-demand clone used during transaction where adds/deletes happen.

This clone is not visible to the datapath.

This is unlike all other set types in nftables, those all link new
elements into their live hlist/tree.

For those sets, the lookup functions must skip the new elements while the
transaction is ongoing to ensure consistency.

As the clone is shallow, removal does have an effect on the packet path:
once the transaction enters the commit phase the 'gencursor' bit that
determines which elements are active and which elements should be ignored
(because they are no longer valid) is flipped.

This causes the datapath lookup to ignore these elements if they are found
during lookup.

This opens up a small race window where pipapo has an inconsistent view of
the dataset from when the transaction-cpu flipped the genbit until the
transaction-cpu calls nft_pipapo_commit() to swap live/clone pointers:

cpu0					cpu1
  has added new elements to clone
  has marked elements as being
  inactive in new generation
					perform lookup in the set
  enters commit phase:

I) increments the genbit
					A) observes new genbit
  removes elements from the clone so
  they won't be found anymore
					B) lookup in datastructure
					   can't see new elements yet,
					   but old elements are ignored
					   -> Only matches elements that
					   were not changed in the
					   transaction
II) calls nft_pipapo_commit(), clone
    and live pointers are swapped.
					C New nft_lookup happening now
				       	  will find matching elements.

Consider a packet matching range r1-r2:

cpu0 processes following transaction:
1. remove r1-r2
2. add r1-r3

P is contained in both ranges. Therefore, cpu1 should always find a match
for P.  Due to the above race, this is not the case:

cpu1 does find r1-r2, but then ignores it due to the genbit indicating
the range has been removed.

At the same time, r1-r3 is not visible yet, because it can only be found
in the clone.

The situation persists for all lookups until after cpu0 hits II).

The fix is easy: Don't check the genbit from pipapo lookup functions.
This is possible because unlike the other set types, the new elements are
not reachable from the live copy of the dataset.

The clone/live pointer swap is enough to avoid matching on old elements
while at the same time all new elements are exposed in one go.

After this change, step B above returns a match in r1-r2.
This is fine: r1-r2 only becomes truly invalid the moment they get freed.
This happens after a synchronize_rcu() call and rcu read lock is held
via netfilter hook traversal (nf_hook_slow()).

Cc: Stefano Brivio <sbrivio@redhat.com>
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:49 +02:00
Florian Westphal 42a02ba5be netfilter: nft_set_pipapo: don't return bogus extension pointer
[ Upstream commit c8a7c2c608 ]

Dan Carpenter says:
Commit 17a20e09f0 ("netfilter: nft_set: remove one argument from
lookup and update functions") [..] leads to the following Smatch
static checker warning:

 net/netfilter/nft_set_pipapo_avx2.c:1269 nft_pipapo_avx2_lookup()
 error: uninitialized symbol 'ext'.

Fix this by initing ext to NULL and set it only once we've found
a match.

Fixes: 17a20e09f0 ("netfilter: nft_set: remove one argument from lookup and update functions")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/netfilter-devel/aJBzc3V5wk-yPOnH@stanley.mountain/
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Stable-dep-of: c4eaca2e10 ("netfilter: nft_set_pipapo: don't check genbit from packetpath lookups")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:49 +02:00
Florian Westphal 3a2d45819a netfilter: nft_set_pipapo: merge pipapo_get/lookup
[ Upstream commit d8d871a35c ]

The matching algorithm has been implemented thrice:
1. data path lookup, generic version
2. data path lookup, avx2 version
3. control plane lookup

Merge 1 and 3 by refactoring pipapo_get as a common helper, then make
nft_pipapo_lookup and nft_pipapo_get both call the common helper.

Aside from the code savings this has the benefit that we no longer allocate
temporary scratch maps for each control plane get and insertion operation.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Stable-dep-of: c4eaca2e10 ("netfilter: nft_set_pipapo: don't check genbit from packetpath lookups")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:49 +02:00
Florian Westphal 39ce3db914 netfilter: nft_set: remove one argument from lookup and update functions
[ Upstream commit 17a20e09f0 ]

Return the extension pointer instead of passing it as a function
argument to be filled in by the callee.

As-is, whenever false is returned, the extension pointer is not used.

For all set types, when true is returned, the extension pointer was set
to the matching element.

Only exception: nft_set_bitmap doesn't support extensions.
Return a pointer to a static const empty element extension container.

return false -> return NULL
return true -> return the elements' extension pointer.

This saves one function argument.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Stable-dep-of: c4eaca2e10 ("netfilter: nft_set_pipapo: don't check genbit from packetpath lookups")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:49 +02:00
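
The signature change, sketched on the set-ops lookup hook:

	/* before: result reported through an out-argument */
	bool (*lookup)(const struct net *net, const struct nft_set *set,
		       const u32 *key, const struct nft_set_ext **ext);

	/* after: the extension pointer is the return value;
	 * NULL takes the place of the old "return false" */
	const struct nft_set_ext *(*lookup)(const struct net *net,
					    const struct nft_set *set,
					    const u32 *key);
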
Florian Westphal 6c110df7b9 netfilter: nft_set_pipapo: remove unused arguments
[ Upstream commit 7792c1e030 ]

They are not used anymore, so remove them.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Stable-dep-of: c4eaca2e10 ("netfilter: nft_set_pipapo: don't check genbit from packetpath lookups")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:49 +02:00
Anssi Hannula 725b33deeb can: xilinx_can: xcan_write_frame(): fix use-after-free of transmitted SKB
[ Upstream commit ef79f00be7 ]

can_put_echo_skb() takes ownership of the SKB and it may be freed
during or after the call.

However, xilinx_can xcan_write_frame() keeps using the SKB after the call.

Fix that by only calling can_put_echo_skb() after the code is done
touching the SKB.

The tx_lock is held for the entire xcan_write_frame() execution and
also on the can_get_echo_skb() side so the order of operations does not
matter.

An earlier fix commit 3d3c817c3a ("can: xilinx_can: Fix usage of skb
memory") did not move the can_put_echo_skb() call far enough.

Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Fixes: 1598efe57b ("can: xilinx_can: refactor code in preparation for CAN FD support")
Link: https://patch.msgid.link/20250822095002.168389-1-anssi.hannula@bitwise.fi
[mkl: add "commit" in front of sha1 in patch description]
[mkl: fix indention]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:49 +02:00
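
The ordering constraint, sketched with the frame-register writes
condensed into a hypothetical helper:

	/* wrong: the echo stack owns the skb after this call, yet cf
	 * (which points into the skb) is still read below */
	can_put_echo_skb(skb, ndev, idx, 0);
	xcan_write_frame_regs(priv, cf);	/* potential use-after-free */

	/* right: finish touching the skb, then transfer ownership */
	xcan_write_frame_regs(priv, cf);
	can_put_echo_skb(skb, ndev, idx, 0);
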
Tetsuo Handa a6d84e51ab can: j1939: j1939_local_ecu_get(): undo increment when j1939_local_ecu_get() fails
[ Upstream commit 06e02da29f ]

j1939_sk_bind() and j1939_sk_release() call j1939_local_ecu_put() only
when J1939_SOCK_BOUND was already set, but the error handling path of
j1939_sk_bind() will not set J1939_SOCK_BOUND when j1939_local_ecu_get()
fails. Therefore, j1939_local_ecu_get() needs to undo
priv->ents[sa].nusers++ when it returns an error.

Fixes: 9d71dd0c70 ("can: add support of SAE J1939 protocol")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/e7f80046-4ff7-4ce2-8ad8-7c3c678a42c9@I-love.SAKURA.ne.jp
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:49 +02:00
Tetsuo Handa 1ca9748ee5 can: j1939: j1939_sk_bind(): call j1939_priv_put() immediately when j1939_local_ecu_get() failed
[ Upstream commit f214744c8a ]

Commit 25fe97cb76 ("can: j1939: move j1939_priv_put() into sk_destruct
callback") expects that a call to j1939_priv_put() can be unconditionally
delayed until j1939_sk_sock_destruct() is called. But a refcount leak will
happen when j1939_sk_bind() is called again after j1939_local_ecu_get()
from the previous j1939_sk_bind() call returned an error. We need to call
j1939_priv_put() before j1939_sk_bind() returns an error.

Fixes: 25fe97cb76 ("can: j1939: move j1939_priv_put() into sk_destruct callback")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/4f49a1bc-a528-42ad-86c0-187268ab6535@I-love.SAKURA.ne.jp
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:49 +02:00
Alex Deucher 03653847b6 drm/amd/display: use udelay rather than fsleep
[ Upstream commit 1d66c3f2b8 ]

This function can be called from an atomic context so we can't use
fsleep().

Fixes: 01f60348d8 ("drm/amd/display: Fix 'failed to blank crtc!'")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4549
Cc: Wen Chen <Wen.Chen3@amd.com>
Cc: Fangzhi Zuo <jerry.zuo@amd.com>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 27e4dc2c0543fd1808cc52bd888ee1e0533c4a2e)
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:49 +02:00
Michal Schmidt a30afd6617 i40e: fix IRQ freeing in i40e_vsi_request_irq_msix error path
[ Upstream commit 915470e1b4 ]

If request_irq() in i40e_vsi_request_irq_msix() fails in an iteration
later than the first, the error path wants to free the IRQs requested
so far. However, it uses the wrong dev_id argument for free_irq(), so
it does not free the IRQs correctly and instead triggers the warning:

 Trying to free already-free IRQ 173
 WARNING: CPU: 25 PID: 1091 at kernel/irq/manage.c:1829 __free_irq+0x192/0x2c0
 Modules linked in: i40e(+) [...]
 CPU: 25 UID: 0 PID: 1091 Comm: NetworkManager Not tainted 6.17.0-rc1+ #1 PREEMPT(lazy)
 Hardware name: [...]
 RIP: 0010:__free_irq+0x192/0x2c0
 [...]
 Call Trace:
  <TASK>
  free_irq+0x32/0x70
  i40e_vsi_request_irq_msix.cold+0x63/0x8b [i40e]
  i40e_vsi_request_irq+0x79/0x80 [i40e]
  i40e_vsi_open+0x21f/0x2f0 [i40e]
  i40e_open+0x63/0x130 [i40e]
  __dev_open+0xfc/0x210
  __dev_change_flags+0x1fc/0x240
  netif_change_flags+0x27/0x70
  do_setlink.isra.0+0x341/0xc70
  rtnl_newlink+0x468/0x860
  rtnetlink_rcv_msg+0x375/0x450
  netlink_rcv_skb+0x5c/0x110
  netlink_unicast+0x288/0x3c0
  netlink_sendmsg+0x20d/0x430
  ____sys_sendmsg+0x3a2/0x3d0
  ___sys_sendmsg+0x99/0xe0
  __sys_sendmsg+0x8a/0xf0
  do_syscall_64+0x82/0x2c0
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
  [...]
  </TASK>
 ---[ end trace 0000000000000000 ]---

Use the same dev_id for free_irq() as for request_irq().

I tested this with inserting code to fail intentionally.

Fixes: 493fb30011 ("i40e: Move q_vectors from pointer to array to array of pointers")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:49 +02:00
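
A sketch of the corrected unwind loop; the names follow the i40e MSI-X
layout implied by the Fixes subject, and the key point is that
free_irq() must get the same dev_id that request_irq() was given (the
per-vector q_vector):

	while (vector) {
		vector--;
		irq_num = pf->msix_entries[base + vector].vector;
		free_irq(irq_num, vsi->q_vectors[vector]);	/* same dev_id */
	}
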
Kohei Enju 21a3cd01ca igb: fix link test skipping when interface is admin down
[ Upstream commit d709f178ab ]

The igb driver incorrectly skips the link test when the network
interface is admin down (if_running == false), causing the test to
always report PASS regardless of the actual physical link state.

This behavior is inconsistent with other drivers (e.g. i40e, ice, ixgbe,
etc.) which correctly test the physical link state regardless of admin
state.
Remove the if_running check to ensure link test always reflects the
physical link state.

Fixes: 8d420a1b3e ("igb: correct link test not being run when link is down")
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:49 +02:00
Alex Tran 2935d8230e docs: networking: can: change bcm_msg_head frames member to support flexible array
[ Upstream commit 641427d5bf ]

The documentation of the 'bcm_msg_head' struct does not match how
it is defined in 'bcm.h'. Changed the frames member to a flexible array,
matching the definition in the header file.

See commit 94dfc73e7c ("treewide: uapi: Replace zero-length arrays with
flexible-array members")

Signed-off-by: Alex Tran <alex.t.tran@gmail.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20250904031709.1426895-1-alex.t.tran@gmail.com
Fixes: 94dfc73e7c ("treewide: uapi: Replace zero-length arrays with flexible-array members")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217783
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:49 +02:00
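
For reference, the shape the documentation now matches (per
include/uapi/linux/can/bcm.h):

	struct bcm_msg_head {
		__u32 opcode;
		__u32 flags;
		__u32 count;
		struct bcm_timeval ival1, ival2;
		canid_t can_id;
		__u32 nframes;
		struct can_frame frames[];	/* flexible array member */
	};
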
Antoine Tenart badc803b8a tunnels: reset the GSO metadata before reusing the skb
[ Upstream commit e3c674db35 ]

If a GSO skb is sent through a Geneve tunnel and if Geneve options are
added, the split GSO skb might not fit in the MTU anymore and an ICMP
frag needed packet can be generated. In such a case the ICMP packet might
go through the segmentation logic (and be dropped) later if it reaches a
path where the GSO status is checked and segmentation is required.

This is especially true when an OvS bridge is used with a Geneve tunnel
attached to it. The following set of actions could lead to the ICMP
packet being wrongfully segmented:

1. An skb is constructed by the TCP layer (e.g. gso_type SKB_GSO_TCPV4,
   segs >= 2).

2. The skb hits the OvS bridge where Geneve options are added by an OvS
   action before being sent through the tunnel.

3. When the skb is transmitted in the tunnel, the split skb does not fit
   anymore in the MTU and iptunnel_pmtud_build_icmp is called to
   generate an ICMP fragmentation needed packet. This is done by reusing
   the original (GSO!) skb. The GSO metadata is not cleared.

4. The ICMP packet being sent back hits the OvS bridge again and because
   skb_is_gso returns true, it goes through queue_gso_packets...

5. ...where __skb_gso_segment is called. The skb is then dropped.

6. Note that in the above example on re-transmission the skb won't be a
   GSO one as it would be segmented (len > MSS) and the ICMP packet
   should go through.

Fix this by resetting the GSO information before reusing an skb in
iptunnel_pmtud_build_icmp and iptunnel_pmtud_build_icmpv6.

Fixes: 4cb47a8644 ("tunnels: PMTU discovery support for directly bridged IP packets")
Reported-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Link: https://patch.msgid.link/20250904125351.159740-1-atenart@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:48 +02:00
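
A sketch of the reset; the helper name is hypothetical, and the point
is clearing the shared-info GSO state before the skb is reused for the
ICMP reply:

	static void iptunnel_pmtud_gso_reset(struct sk_buff *skb)
	{
		skb_shinfo(skb)->gso_type = 0;
		skb_shinfo(skb)->gso_size = 0;
		skb_shinfo(skb)->gso_segs = 0;
	}
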
Petr Machata 40600cddf4 net: bridge: Bounce invalid boolopts
[ Upstream commit 8625f5748f ]

The bridge driver currently tolerates options that it does not recognize.
Instead, it should bounce them.

Fixes: a428afe82f ("net: bridge: add support for user-controlled bool options")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/e6fdca3b5a8d54183fbda075daffef38bdd7ddce.1757070067.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:48 +02:00
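
A sketch of the bounce, using the uapi's br_boolopt_multi and
BR_BOOLOPT_MAX; the exact mask expression is an assumption:

	/* reject unknown option bits instead of silently ignoring them */
	if (bm->optmask & ~(BIT(BR_BOOLOPT_MAX) - 1))
		return -EINVAL;
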
Alok Tiwari 98c9d88404 genetlink: fix genl_bind() invoking bind() after -EPERM
[ Upstream commit 1dbfb03632 ]

Per family bind/unbind callbacks were introduced to allow families
to track multicast group consumer presence, e.g. to start or stop
producing events depending on listeners.

However, in genl_bind() the bind() callback was invoked even if
capability checks failed and ret was set to -EPERM. This means that
callbacks could run on behalf of unauthorized callers while the
syscall still returned failure to user space.

Fix this by only invoking bind() after the "if (ret) break;" check,
i.e. after permission checks have succeeded.

Fixes: 3de21a8990 ("genetlink: Add per family bind/unbind callbacks")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/20250905135731.3026965-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:48 +02:00
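
The corrected ordering inside genl_bind()'s group loop, as a sketch:

	if (ret)
		break;			/* permission check failed: no callback */

	if (family->bind)
		family->bind(i);	/* runs only for authorized callers */
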
Stefan Wahren 4fe53aaa42 net: fec: Fix possible NPD in fec_enet_phy_reset_after_clk_enable()
[ Upstream commit 03e79de460 ]

The function of_phy_find_device may return NULL, so we need to take
care before dereferencing phy_dev.
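
A minimal sketch of the guard (variable names follow the surrounding
driver code and are assumed here):

    phy_dev = of_phy_find_device(fep->phy_node);
    if (!phy_dev)
            return;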

Fixes: 64a632da53 ("net: fec: Fix phy_device lookup for phy_reset_after_clk_enable()")
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Cc: Christoph Niedermaier <cniedermaier@dh-electronics.com>
Cc: Richard Leitner <richard.leitner@skidata.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20250904091334.53965-1-wahrenst@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:48 +02:00
Chia-I Wu a506ffe193 drm/panthor: validate group queue count
[ Upstream commit a00f2015ac ]

A panthor group can have at most MAX_CS_PER_CSG panthor queues.
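
In other words, group creation should bounce requests exceeding that
limit; roughly (field names assumed):

    if (group_args->queues.count > MAX_CS_PER_CSG)
            return -EINVAL;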

Fixes: 4bdca11507 ("drm/panthor: Add the driver frontend block")
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> # v1
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20250903192133.288477-1-olvaffe@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:48 +02:00
Linus Torvalds e7639cf1e6 Disable SLUB_TINY for build testing
[ Upstream commit 6f110a5e4f ]

... and don't error out so hard on missing module descriptions.

Before commit 6c6c1fc09d ("modpost: require a MODULE_DESCRIPTION()")
we used to warn about missing module descriptions, but only when
building with extra warnings (i.e. 'W=1').

After that commit the warning became an unconditional hard error.

And it turns out not all modules have been converted despite the claims
to the contrary.  As reported by Damian Tometzki, the slub KUnit test
didn't have a module description, and apparently nobody ever really
noticed.

The reason nobody noticed seems to be that the slub KUnit tests get
disabled by SLUB_TINY, which also ends up disabling a lot of other code,
both in tests and in slub itself.  And so anybody doing full build tests
didn't actually see this failure.

So let's disable SLUB_TINY for build-only tests, since it clearly ends
up limiting build coverage.  Also turn the missing module descriptions
error back into a warning, but let's keep it around for non-'W=1'
builds.

Reported-by: Damian Tometzki <damian@riscv-rocks.de>
Link: https://lore.kernel.org/all/01070196099fd059-e8463438-7b1b-4ec8-816d-173874be9966-000000@eu-central-1.amazonses.com/
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Fixes: 6c6c1fc09d ("modpost: require a MODULE_DESCRIPTION()")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:48 +02:00
Fabio Porcedda b15c4bffdc USB: serial: option: add Telit Cinterion LE910C4-WWX new compositions
commit a5a261bea9 upstream.

Add the following Telit Cinterion LE910C4-WWX new compositions:

0x1034: tty (AT) + tty (AT) + rmnet
T:  Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  8 Spd=480 MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=1bc7 ProdID=1034 Rev=00.00
S:  Manufacturer=Telit
S:  Product=LE910C4-WWX
S:  SerialNumber=93f617e7
C:  #Ifs= 3 Cfg#= 1 Atr=e0 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=81(I) Atr=03(Int.) MxPS=  64 Ivl=2ms
E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fe Prot=ff Driver=option
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=83(I) Atr=03(Int.) MxPS=  64 Ivl=2ms
E:  Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=85(I) Atr=03(Int.) MxPS=  64 Ivl=2ms
E:  Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms

0x1036: tty (AT) + tty (AT)
T:  Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 10 Spd=480 MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=1bc7 ProdID=1036 Rev=00.00
S:  Manufacturer=Telit
S:  Product=LE910C4-WWX
S:  SerialNumber=93f617e7
C:  #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=81(I) Atr=03(Int.) MxPS=  64 Ivl=2ms
E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fe Prot=ff Driver=option
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=83(I) Atr=03(Int.) MxPS=  64 Ivl=2ms
E:  Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms

0x1037: tty (diag) + tty (Telit custom) + tty (AT) + tty (AT) + rmnet
T:  Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 15 Spd=480 MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=1bc7 ProdID=1037 Rev=00.00
S:  Manufacturer=Telit
S:  Product=LE910C4-WWX
S:  SerialNumber=93f617e7
C:  #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=83(I) Atr=03(Int.) MxPS=  64 Ivl=2ms
E:  Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fe Prot=ff Driver=option
E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=85(I) Atr=03(Int.) MxPS=  64 Ivl=2ms
E:  Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
E:  Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=87(I) Atr=03(Int.) MxPS=  64 Ivl=2ms
E:  Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms

0x1038: tty (Telit custom) + tty (AT) + tty (AT) + rmnet
T:  Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  9 Spd=480 MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=1bc7 ProdID=1038 Rev=00.00
S:  Manufacturer=Telit
S:  Product=LE910C4-WWX
S:  SerialNumber=93f617e7
C:  #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=82(I) Atr=03(Int.) MxPS=  64 Ivl=2ms
E:  Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fe Prot=ff Driver=option
E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=84(I) Atr=03(Int.) MxPS=  64 Ivl=2ms
E:  Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=86(I) Atr=03(Int.) MxPS=  64 Ivl=2ms
E:  Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms

0x103b: tty (diag) + tty (Telit custom) + tty (AT) + tty (AT)
T:  Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 10 Spd=480 MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=1bc7 ProdID=103b Rev=00.00
S:  Manufacturer=Telit
S:  Product=LE910C4-WWX
S:  SerialNumber=93f617e7
C:  #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=83(I) Atr=03(Int.) MxPS=  64 Ivl=2ms
E:  Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fe Prot=ff Driver=option
E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=85(I) Atr=03(Int.) MxPS=  64 Ivl=2ms
E:  Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms

0x103c: tty (Telit custom) + tty (AT) + tty (AT)
T:  Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 11 Spd=480 MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=1bc7 ProdID=103c Rev=00.00
S:  Manufacturer=Telit
S:  Product=LE910C4-WWX
S:  SerialNumber=93f617e7
C:  #Ifs= 3 Cfg#= 1 Atr=e0 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=82(I) Atr=03(Int.) MxPS=  64 Ivl=2ms
E:  Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fe Prot=ff Driver=option
E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=84(I) Atr=03(Int.) MxPS=  64 Ivl=2ms
E:  Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
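
For illustration only, such compositions are typically matched in
option.c with vendor/product/interface-class entries along these lines
(the exact entries, including any reserved-interface handling, are in
the patch itself):

    { USB_DEVICE_AND_INTERFACE_INFO(0x1bc7 /* Telit */, 0x1034,
                                    0xff, 0xff, 0xff) },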

Cc: stable@vger.kernel.org
Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:48 +02:00
Fabio Porcedda 18bae1d492 USB: serial: option: add Telit Cinterion FN990A w/audio compositions
commit cba70aff62 upstream.

Add the following Telit Cinterion FN990A w/audio compositions:

0x1077: tty (diag) + adb + rmnet + audio + tty (AT/NMEA) + tty (AT) +
tty (AT) + tty (AT)
T:  Bus=01 Lev=01 Prnt=01 Port=09 Cnt=01 Dev#=  8 Spd=480 MxCh= 0
D:  Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=1bc7 ProdID=1077 Rev=05.04
S:  Manufacturer=Telit Wireless Solutions
S:  Product=FN990
S:  SerialNumber=67e04c35
C:  #Ifs=10 Cfg#= 1 Atr=e0 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
E:  Ad=0f(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=83(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
E:  Ad=8e(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 3 Alt= 0 #EPs= 0 Cls=01(audio) Sub=01 Prot=20 Driver=snd-usb-audio
I:  If#= 4 Alt= 1 #EPs= 1 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
E:  Ad=03(O) Atr=0d(Isoc) MxPS=  68 Ivl=1ms
I:  If#= 5 Alt= 1 #EPs= 1 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
E:  Ad=84(I) Atr=0d(Isoc) MxPS=  68 Ivl=1ms
I:  If#= 6 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option
E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=86(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 7 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=88(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 8 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=8a(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 9 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=07(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=8b(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=8c(I) Atr=03(Int.) MxPS=  10 Ivl=32ms

0x1078: tty (diag) + adb + MBIM + audio + tty (AT/NMEA) + tty (AT) +
tty (AT) + tty (AT)
T:  Bus=01 Lev=01 Prnt=01 Port=09 Cnt=01 Dev#= 21 Spd=480 MxCh= 0
D:  Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=1bc7 ProdID=1078 Rev=05.04
S:  Manufacturer=Telit Wireless Solutions
S:  Product=FN990
S:  SerialNumber=67e04c35
C:  #Ifs=11 Cfg#= 1 Atr=e0 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#=10 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=07(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=8b(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=8c(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 2 Alt= 0 #EPs= 1 Cls=02(commc) Sub=0e Prot=00 Driver=cdc_mbim
E:  Ad=83(I) Atr=03(Int.) MxPS=  64 Ivl=32ms
I:  If#= 3 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim
E:  Ad=0f(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=8e(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 4 Alt= 0 #EPs= 0 Cls=01(audio) Sub=01 Prot=20 Driver=snd-usb-audio
I:  If#= 5 Alt= 0 #EPs= 0 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
I:  If#= 6 Alt= 1 #EPs= 1 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
E:  Ad=84(I) Atr=0d(Isoc) MxPS=  68 Ivl=1ms
I:  If#= 7 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option
E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=86(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 8 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=88(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 9 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=8a(I) Atr=03(Int.) MxPS=  10 Ivl=32ms

0x1079: RNDIS + tty (diag) + adb + audio + tty (AT/NMEA) + tty (AT) +
tty (AT) + tty (AT)
T:  Bus=01 Lev=01 Prnt=01 Port=09 Cnt=01 Dev#= 23 Spd=480 MxCh= 0
D:  Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=1bc7 ProdID=1079 Rev=05.04
S:  Manufacturer=Telit Wireless Solutions
S:  Product=FN990
S:  SerialNumber=67e04c35
C:  #Ifs=11 Cfg#= 1 Atr=e0 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 1 Cls=ef(misc ) Sub=04 Prot=01 Driver=rndis_host
E:  Ad=81(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
I:  If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=rndis_host
E:  Ad=0f(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=8e(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#=10 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=07(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=8b(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=8c(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:  If#= 4 Alt= 0 #EPs= 0 Cls=01(audio) Sub=01 Prot=20 Driver=snd-usb-audio
I:  If#= 5 Alt= 0 #EPs= 0 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
I:  If#= 6 Alt= 1 #EPs= 1 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
E:  Ad=84(I) Atr=0d(Isoc) MxPS=  68 Ivl=1ms
I:  If#= 7 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=60 Driver=option
E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=86(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 8 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=88(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
I:  If#= 9 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
E:  Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=8a(I) Atr=03(Int.) MxPS=  10 Ivl=32ms

Cc: stable@vger.kernel.org
Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:48 +02:00
Krzysztof Kozlowski fee858fa03 dt-bindings: serial: brcm,bcm7271-uart: Constrain clocks
commit ee047e1d85 upstream.

Lists should have fixed constraints, because a binding must be specific
with respect to the hardware; thus add the missing constraints on the
number of clocks.

Cc: stable <stable@kernel.org>
Fixes: 88a499cd70 ("dt-bindings: Add support for the Broadcom UART driver")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20250812121630.67072-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:48 +02:00
Hugo Villeneuve d02bb770ec serial: sc16is7xx: fix bug in flow control levels init
commit 535fd4c984 upstream.

When trying to set MCR[2], XON1 is incorrectly accessed instead. And when
writing to the TCR register to configure flow control levels, we are
incorrectly writing to the MSR register. The default value of $00 is then
used for TCR, which means that selectable trigger levels in FCR are used
in place of TCR.

TCR/TLR access requires EFR[4] (enable enhanced functions) and MCR[2]
to be set. EFR[4] is already set in probe().

MCR access requires LCR[7] to be zero.

Since LCR is set to $BF when trying to set MCR[2], XON1 is incorrectly
accessed instead because MCR shares the same address space as XON1.

Since MCR[2] is unmodified and still zero, when writing to TCR we are in
fact writing to MSR because TCR/TLR registers share the same address space
as MSR/SPR.

Fix by first removing useless reconfiguration of EFR[4] (enable enhanced
functions), as it is already enabled in sc16is7xx_probe() since commit
43c51bb573 ("sc16is7xx: make sure device is in suspend once probed").
Now LCR is $00, which means that MCR access is enabled.

Also remove regcache_cache_bypass() calls since we no longer access the
enhanced registers set, and TCR is already declared as volatile (in fact
by declaring MSR as volatile, which shares the same address).

Finally disable access to TCR/TLR registers after modifying them by
clearing MCR[2].

Note: the comment about "... and internal clock div" is wrong and can be
      ignored/removed as access to internal clock div registers (DLL/DLH)
      is permitted only when LCR[7] is logic 1, not when enhanced features
      are enabled. And DLL/DLH access is not needed in sc16is7xx_startup().
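
A sketch of the resulting access sequence (register and bit names as
used by the driver, assumed here; levels are illustrative):

    /* MCR is only accessible while LCR[7] = 0 */
    sc16is7xx_port_write(port, SC16IS7XX_LCR_REG, 0);

    /* set MCR[2] to unlock TCR/TLR, program the levels, then lock again */
    sc16is7xx_port_update(port, SC16IS7XX_MCR_REG,
                          SC16IS7XX_MCR_TCRTLR_BIT,
                          SC16IS7XX_MCR_TCRTLR_BIT);
    sc16is7xx_port_write(port, SC16IS7XX_TCR_REG,
                         SC16IS7XX_TCR_RX_RESUME(24) |
                         SC16IS7XX_TCR_RX_HALT(48));
    sc16is7xx_port_update(port, SC16IS7XX_MCR_REG,
                          SC16IS7XX_MCR_TCRTLR_BIT, 0);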

Fixes: dfeae619d7 ("serial: sc16is7xx")
Cc: stable@vger.kernel.org
Signed-off-by: Hugo Villeneuve <hvilleneuve@dimonoff.com>
Link: https://lore.kernel.org/r/20250731124451.1108864-1-hugo@hugovil.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:48 +02:00
Fabian Vogt 32864297aa tty: hvc_console: Call hvc_kick in hvc_write unconditionally
commit cfd956dcb1 upstream.

After hvc_write completes, call hvc_kick also in the case the output
buffer has been drained, to ensure tty_wakeup gets called.

This fixes an issue where functions waiting for a drained buffer
occasionally got stuck.
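
Schematically (not the verbatim diff):

    /* before: kick only when bytes remained buffered */
    if (hp->n_outbuf > 0)
            hvc_kick();

    /* after: kick unconditionally so tty_wakeup() is always called */
    hvc_kick();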

Cc: stable <stable@kernel.org>
Closes: https://bugzilla.opensuse.org/show_bug.cgi?id=1230062
Signed-off-by: Fabian Vogt <fvogt@suse.de>
Link: https://lore.kernel.org/r/2011735.PYKUYFuaPT@fvogt-thinkpad
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:48 +02:00
Paolo Abeni 8a0e676dc5 Revert "net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups"
commit 63a796558b upstream.

This reverts commit 5537a46794 ("net: usb: asix: ax88772: drop
phylink use in PM to avoid MDIO runtime PM wakeups"), as it breaks
operation of ASIX Ethernet USB dongles after a system suspend-resume
cycle.

Link: https://lore.kernel.org/all/b5ea8296-f981-445d-a09a-2f389d7f6fdd@samsung.com/
Fixes: 5537a46794 ("net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/2945b9dbadb8ee1fee058b19554a5cb14f1763c1.1757601118.git.pabeni@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:48 +02:00
Christoffer Sandberg 7158588efd Input: i8042 - add TUXEDO InfinityBook Pro Gen10 AMD to i8042 quirk table
commit 1939a9fcb8 upstream.

The device occasionally wakes up from suspend with missing input on the
internal keyboard. Setting the quirks appears to fix the issue for this
device as well.

Signed-off-by: Christoffer Sandberg <cs@tuxedo.de>
Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250826142646.13516-1-wse@tuxedocomputers.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:48 +02:00
Jeff LaBundy f80c46c5fb Input: iqs7222 - avoid enabling unused interrupts
commit c9ddc41cdd upstream.

If a proximity event node is defined so as to specify the wake-up
properties of the touch surface, the proximity event interrupt is
enabled unconditionally. This may result in unwanted interrupts.

Solve this problem by enabling the interrupt only if the event is
mapped to a key or switch code.

Signed-off-by: Jeff LaBundy <jeff@labundy.com>
Link: https://lore.kernel.org/r/aKJxxgEWpNaNcUaW@nixie71
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:48 +02:00
Xiongfeng Wang 51d7f652b3 hrtimers: Unconditionally update target CPU base after offline timer migration
commit e895f8e291 upstream.

When testing softirq based hrtimers on an ARM32 board, with high resolution
mode and NOHZ inactive, softirq based hrtimers fail to expire after being
moved away from an offline CPU:

CPU0				CPU1
				hrtimer_start(..., HRTIMER_MODE_SOFT);
cpu_down(CPU1)			...
				hrtimers_cpu_dying()
				  // Migrate timers to CPU0
				  smp_call_function_single(CPU0, retrigger_next_event);
  retrigger_next_event()
    if (!highres && !nohz)
        return;

As retrigger_next_event() is a NOOP when both high resolution timers and
NOHZ are inactive, CPU0's hrtimer_cpu_base::softirq_expires_next is not
updated and the migrated softirq timers never expire unless there is a
softirq based hrtimer queued on CPU0 later.

Fix this by removing the hrtimer_hres_active() and tick_nohz_active() check
in retrigger_next_event(), which enforces a full update of the CPU base.
As this is not a fast path the extra cost does not matter.
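
Schematically, the removed early return looked like this (a sketch; the
function does more work than shown):

    static void retrigger_next_event(void *arg)
    {
            struct hrtimer_cpu_base *base = this_cpu_ptr(&hrtimer_bases);

            /* removed:
             * if (!hrtimer_hres_active(base) && !tick_nohz_active)
             *         return;
             * which skipped the softirq_expires_next update after
             * offline-CPU timer migration
             */
            raw_spin_lock(&base->lock);
            hrtimer_update_base(base);
            raw_spin_unlock(&base->lock);
    }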

[ tglx: Massaged change log ]

Fixes: 5c0930ccaa ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
Co-developed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250805081025.54235-1-wangxiongfeng2@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:47 +02:00
Qu Wenruo 2dd4679961 btrfs: fix corruption reading compressed range when block size is smaller than page size
[ Upstream commit 9786531399 ]

[BUG]
With 64K page size (aarch64 with 64K page size config) and 4K btrfs
block size, the following workload can easily lead to a corrupted read:

        mkfs.btrfs -f -s 4k $dev > /dev/null
        mount -o compress $dev $mnt
        xfs_io -f -c "pwrite -S 0xff 0 64k" $mnt/base > /dev/null
        echo "correct result:"
        od -Ad -t x1 $mnt/base
        xfs_io -f -c "reflink $mnt/base 32k 0 32k" \
               -c "reflink $mnt/base 0 32k 32k" \
               -c "pwrite -S 0xff 60k 4k" $mnt/new > /dev/null
        echo "incorrect result:"
        od -Ad -t x1 $mnt/new
        umount $mnt

This shows the following result:

correct result:
0000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
*
0065536
incorrect result:
0000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
*
0032768 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0061440 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
*
0065536

Notice the zero in the range [32K, 60K), which is incorrect.

[CAUSE]
With extra trace printk, it shows the following events during od:
(some unrelated info removed like CPU and context)

 od-3457   btrfs_do_readpage: enter r/i=5/258 folio=0(65536) prev_em_start=0000000000000000

The "r/i" is indicating the root and inode number. In our case the file
"new" is using ino 258 from fs tree (root 5).

Here notice the @prev_em_start pointer is NULL. This means the
btrfs_do_readpage() is called from btrfs_read_folio(), not from
btrfs_readahead().

 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=0 got em start=0 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=4096 got em start=0 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=8192 got em start=0 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=12288 got em start=0 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=16384 got em start=0 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=20480 got em start=0 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=24576 got em start=0 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=28672 got em start=0 len=32768

These above 32K blocks will be read from the first half of the
compressed data extent.

 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=32768 got em start=32768 len=32768

Note that there is no btrfs_submit_compressed_read() call here, which is
now incorrect.
Although both extent maps at 0 and 32K point to the same compressed
data, their offsets are different and thus cannot be merged into the
same read.

So this means the compressed data read merge check is doing something
wrong.

 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=36864 got em start=32768 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=40960 got em start=32768 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=45056 got em start=32768 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=49152 got em start=32768 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=53248 got em start=32768 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=57344 got em start=32768 len=32768
 od-3457   btrfs_do_readpage: r/i=5/258 folio=0(65536) cur=61440 skip uptodate
 od-3457   btrfs_submit_compressed_read: cb orig_bio: file off=0 len=61440

The function btrfs_submit_compressed_read() is only called at the end of
folio read. The compressed bio will only have an extent map of range [0,
32K), but the original bio passed in is for the whole 64K folio.

This will cause the decompression part to only fill the first 32K,
leaving the rest untouched (aka, filled with zero).

This incorrect compressed read merge leads to the above data corruption.

Similar problems have happened in the past; commit 808f80b467
("Btrfs: update fix for read corruption of compressed and shared
extents") does pretty much the same fix for readahead.

But that was back in 2015, when btrfs still only supported bs (block
size) == ps (page size) cases.
This means btrfs_do_readpage() only needs to handle a folio which
contains exactly one block.

Only btrfs_readahead() can lead to a read covering multiple blocks.
Thus only btrfs_readahead() passes a non-NULL @prev_em_start pointer.

With the v5.15 kernel, btrfs introduced bs < ps support. This breaks the
above assumption that a folio can only contain one block.

Now btrfs_read_folio() can also read multiple blocks in one go.
But btrfs_read_folio() doesn't pass a @prev_em_start pointer, thus the
existing bio force submission check will never be triggered.

In theory, this can also happen for btrfs with large folios, but since
large folio support is still experimental, we don't need to worry about
it; thus only the bs < ps case is affected for now.

[FIX]
Instead of passing @prev_em_start to do the proper compressed extent
check, introduce one new member, btrfs_bio_ctrl::last_em_start, so that
the existing bio force submission logic will always be triggered.
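
A condensed sketch of the idea (names from the description; the exact
placement is simplified):

    /* in struct btrfs_bio_ctrl */
    u64 last_em_start;  /* start of the extent map the bio was built from */

    /* in btrfs_do_readpage(), before adding a compressed block to the bio */
    if (bio_ctrl->last_em_start != em->start) {
            submit_one_bio(bio_ctrl);       /* force submission */
            bio_ctrl->last_em_start = em->start;
    }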

CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[ Adjust context ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:47 +02:00
Boris Burkov 7cd3bc42ad btrfs: use readahead_expand() on compressed extents
[ Upstream commit 9e9ff875e4 ]

We recently received a report of poor performance doing sequential
buffered reads of a file with compressed extents. With bs=128k, a naive
sequential dd ran as fast on a compressed file as on an uncompressed
(1.2GB/s on my reproducing system) while with bs<32k, this performance
tanked down to ~300MB/s.

i.e., slow:

  dd if=some-compressed-file of=/dev/null bs=4k count=X

vs fast:

  dd if=some-compressed-file of=/dev/null bs=128k count=Y

The cause of this slowness is overhead to do with looking up extent_maps
to enable readahead pre-caching on compressed extents
(add_ra_bio_pages()), as well as some overhead in the generic VFS
readahead code we hit more in the slow case. Notably, the main
difference between the two read sizes is that in the large sized request
case, we call btrfs_readahead() relatively rarely while in the smaller
request we call it for every compressed extent. So the fast case stays
in the btrfs readahead loop:

    while ((folio = readahead_folio(rac)) != NULL)
	    btrfs_do_readpage(folio, &em_cached, &bio_ctrl, &prev_em_start);

where the slower one breaks out of that loop every time. This results in
calling add_ra_bio_pages a lot, doing lots of extent_map lookups,
extent_map locking, etc.

This happens because although add_ra_bio_pages() does add the
appropriate un-compressed file pages to the cache, it does not
communicate back to the ractl in any way. To solve this, we should be
using readahead_expand() to signal to readahead to expand the readahead
window.

This change passes the readahead_control into the btrfs_bio_ctrl and in
the case of compressed reads sets the expansion to the size of the
extent_map we already looked up anyway. It skips the subpage case as
that one already doesn't do add_ra_bio_pages().
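
The core of the change is a call along these lines when a compressed
extent map is found (a sketch; the ractl member name is assumed):

    /* grow the readahead window to cover the whole compressed extent */
    if (bio_ctrl->ractl && extent_map_is_compressed(em))
            readahead_expand(bio_ctrl->ractl, em->start, em->len);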

With this change, whether we use bs=4k or bs=128k, btrfs expands the
readahead window up to the largest compressed extent we have seen so far
(in the trivial example: 128k) and the call stacks of the two modes look
identical. Notably, we barely call add_ra_bio_pages at all. And the
performance becomes identical as well. So this change certainly "fixes"
this performance problem.

Of course, it does seem to beg a few questions:

1. Will this waste too much page cache with a too large ra window?
2. Will this somehow cause bugs prevented by the more thoughtful
   checking in add_ra_bio_pages?
3. Should we delete add_ra_bio_pages?

My stabs at some answers:

1. Hard to say. See attempts at generic performance testing below. Is
   there a "readahead_shrink" we should be using? Should we expand more
   slowly, by half the remaining em size each time?
2. I don't think so. Since the new behavior is indistinguishable from
   reading the file with a larger read size passed in, I don't see why
   one would be safe but not the other.
3. Probably! I tested that and it was fine in fstests, and it seems like
   the pages would get re-used just as well in the readahead case.
   However, it is possible some reads that use page cache but not
   btrfs_readahead() could suffer. I will investigate this further as a
   follow up.

I tested the performance implications of this change in 3 ways (using
compress-force=zstd:3 for compression):

Directly test the affected workload of small sequential reads on a
compressed file (improved from ~250MB/s to ~1.2GB/s)

==========for-next==========
  dd /mnt/lol/non-cmpr 4k
  1048576+0 records in
  1048576+0 records out
  4294967296 bytes (4.3 GB, 4.0 GiB) copied, 6.02983 s, 712 MB/s
  dd /mnt/lol/non-cmpr 128k
  32768+0 records in
  32768+0 records out
  4294967296 bytes (4.3 GB, 4.0 GiB) copied, 5.92403 s, 725 MB/s
  dd /mnt/lol/cmpr 4k
  1048576+0 records in
  1048576+0 records out
  4294967296 bytes (4.3 GB, 4.0 GiB) copied, 17.8832 s, 240 MB/s
  dd /mnt/lol/cmpr 128k
  32768+0 records in
  32768+0 records out
  4294967296 bytes (4.3 GB, 4.0 GiB) copied, 3.71001 s, 1.2 GB/s

==========ra-expand==========
  dd /mnt/lol/non-cmpr 4k
  1048576+0 records in
  1048576+0 records out
  4294967296 bytes (4.3 GB, 4.0 GiB) copied, 6.09001 s, 705 MB/s
  dd /mnt/lol/non-cmpr 128k
  32768+0 records in
  32768+0 records out
  4294967296 bytes (4.3 GB, 4.0 GiB) copied, 6.07664 s, 707 MB/s
  dd /mnt/lol/cmpr 4k
  1048576+0 records in
  1048576+0 records out
  4294967296 bytes (4.3 GB, 4.0 GiB) copied, 3.79531 s, 1.1 GB/s
  dd /mnt/lol/cmpr 128k
  32768+0 records in
  32768+0 records out
  4294967296 bytes (4.3 GB, 4.0 GiB) copied, 3.69533 s, 1.2 GB/s

Built the linux kernel from clean (no change)

Ran fsperf. Mostly neutral results with some improvements and
regressions here and there.

Reported-by: Dimitrios Apostolou <jimis@gmx.net>
Link: https://lore.kernel.org/linux-btrfs/34601559-6c16-6ccc-1793-20a97ca0dbba@gmx.net/
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
[ Assert doesn't take a format string ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:47 +02:00
Santhosh Kumar K 044ba8d238 mtd: spinand: winbond: Fix oob_layout for W25N01JW
[ Upstream commit 4550d33e18 ]

Fix the W25N01JW's oob_layout according to the datasheet [1]

[1] https://www.winbond.com/hq/product/code-storage-flash-memory/qspinand-flash/?__locale=en&partNo=W25N01JW

Fixes: 6a804fb72d ("mtd: spinand: winbond: add support for serial NAND flash")
Cc: Sridharan S N <quic_sridsn@quicinc.com>
Cc: stable@vger.kernel.org
Signed-off-by: Santhosh Kumar K <s-k6@ti.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
[ Adjust context ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:47 +02:00
Jeongjun Park deccd93ae1 mm/hugetlb: add missing hugetlb_lock in __unmap_hugepage_range()
[ Upstream commit 21cc2b5c50 ]

When restoring a reservation for an anonymous page, we need to check
whether we are freeing a surplus page.  However, __unmap_hugepage_range()
causes a data race because it reads h->surplus_huge_pages without the
protection of hugetlb_lock.

And adjust_reservation is a boolean variable that indicates whether
reservations for anonymous pages in each folio should be restored.
Therefore, it should be initialized to false for each round of the loop.
However, this variable is only initialized to false once, where it is
defined, not on each iteration.

This means that once adjust_reservation is set to true even once within
the loop, reservations for anonymous pages will be restored
unconditionally in all subsequent rounds, regardless of the folio's state.

To fix this, we need to add the missing hugetlb_lock, unlock the
page_table_lock earlier so that we don't take the hugetlb_lock inside
the page_table_lock, and initialize adjust_reservation to false on each
round within the loop.
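
A condensed sketch of the fixed loop body (helper names such as
__vma_private_lock() are assumptions here, not the verbatim patch):

    adjust_reservation = false;     /* reset on every loop iteration */

    spin_unlock(ptl);               /* drop page_table_lock first, so that */
    spin_lock_irq(&hugetlb_lock);   /* hugetlb_lock never nests inside it  */
    if (!h->surplus_huge_pages && __vma_private_lock(vma) &&
        folio_test_anon(folio))
            adjust_reservation = true;
    spin_unlock_irq(&hugetlb_lock);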

Link: https://lkml.kernel.org/r/20250823182115.1193563-1-aha310510@gmail.com
Fixes: df7a6d1f64 ("mm/hugetlb: restore the reservation if needed")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
Reported-by: syzbot+417aeb05fd190f3a6da9@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=417aeb05fd190f3a6da9
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Breno Leitao <leitao@debian.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Page vs folio differences ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:47 +02:00
Quanmin Yan 5d6eeb3c68 mm/damon/reclaim: avoid divide-by-zero in damon_reclaim_apply_parameters()
commit e6b543ca98 upstream.

When creating a new scheme of DAMON_RECLAIM, the calculation of
'min_age_region' uses 'aggr_interval' as the divisor, which may lead to
division-by-zero errors.  Fix it by directly returning -EINVAL when such a
case occurs.
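
I.e., an early bail-out along these lines in
damon_reclaim_apply_parameters() (sketch):

    if (!damon_reclaim_mon_attrs.aggr_interval)
            return -EINVAL;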

Link: https://lkml.kernel.org/r/20250827115858.1186261-3-yanquanmin1@huawei.com
Fixes: f5a79d7c0c ("mm/damon: introduce struct damos_access_pattern")
Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: ze zuo <zuoze1@huawei.com>
Cc: <stable@vger.kernel.org>	[6.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:47 +02:00
Stanislav Fort 26d29b2ac8 mm/damon/sysfs: fix use-after-free in state_show()
commit 3260a3f082 upstream.

state_show() reads kdamond->damon_ctx without holding damon_sysfs_lock.
This allows a use-after-free race:

CPU 0                         CPU 1
-----                         -----
state_show()                  damon_sysfs_turn_damon_on()
ctx = kdamond->damon_ctx;     mutex_lock(&damon_sysfs_lock);
                              damon_destroy_ctx(kdamond->damon_ctx);
                              kdamond->damon_ctx = NULL;
                              mutex_unlock(&damon_sysfs_lock);
damon_is_running(ctx);        /* ctx is freed */
mutex_lock(&ctx->kdamond_lock); /* UAF */

(The race can also occur with damon_sysfs_kdamonds_rm_dirs() and
damon_sysfs_kdamond_release(), which free or replace the context under
damon_sysfs_lock.)

Fix by taking damon_sysfs_lock before dereferencing the context, mirroring
the locking used in pid_show().

The bug has existed since state_show() first accessed kdamond->damon_ctx.
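
A sketch of the fixed reader, mirroring pid_show() (simplified; helper
names assumed):

    static ssize_t state_show(struct kobject *kobj,
                              struct kobj_attribute *attr, char *buf)
    {
            struct damon_sysfs_kdamond *kdamond = container_of(kobj,
                            struct damon_sysfs_kdamond, kobj);
            bool running;

            if (!mutex_trylock(&damon_sysfs_lock))
                    return -EBUSY;
            running = kdamond->damon_ctx &&
                      damon_sysfs_ctx_running(kdamond->damon_ctx);
            mutex_unlock(&damon_sysfs_lock);

            return sysfs_emit(buf, "%s\n", running ? "on" : "off");
    }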

Link: https://lkml.kernel.org/r/20250905101046.2288-1-disclosure@aisle.com
Fixes: a61ea561c8 ("mm/damon/sysfs: link DAMON for virtual address spaces monitoring")
Signed-off-by: Stanislav Fort <disclosure@aisle.com>
Reported-by: Stanislav Fort <disclosure@aisle.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:47 +02:00
Alex Markuze 305935130d ceph: fix race condition where r_parent becomes stale before sending message
commit bec324f33d upstream.

When the parent directory's i_rwsem is not locked, req->r_parent may become
stale due to concurrent operations (e.g. rename) between dentry lookup and
message creation. Validate that r_parent matches the encoded parent inode
and update to the correct inode if a mismatch is detected.

[ idryomov: folded a follow-up fix from Alex to drop extra reference
  from ceph_get_reply_dir() in ceph_fill_trace():

  ceph_get_reply_dir() may return a different, referenced inode when
  r_parent is stale and the parent directory lock is not held.
  ceph_fill_trace() used that inode but failed to drop the reference
  when it differed from req->r_parent, leaking an inode reference.

  Keep the directory inode in a local variable and iput() it at
  function end if it does not match req->r_parent. ]

Cc: stable@vger.kernel.org
Signed-off-by: Alex Markuze <amarkuze@redhat.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:47 +02:00
Alex Markuze db378e6f83 ceph: fix race condition validating r_parent before applying state
commit 15f519e9f8 upstream.

Add validation to ensure the cached parent directory inode matches the
directory info in MDS replies. This prevents client-side race conditions
where concurrent operations (e.g. rename) cause r_parent to become stale
between request initiation and reply processing, which could lead to
applying state changes to incorrect directory inodes.

[ idryomov: folded a kerneldoc fixup and a follow-up fix from Alex to
  move CEPH_CAP_PIN reference when r_parent is updated:

  When the parent directory lock is not held, req->r_parent can become
  stale and is updated to point to the correct inode.  However, the
  associated CEPH_CAP_PIN reference was not being adjusted.  The
  CEPH_CAP_PIN is a reference on an inode that is tracked for
  accounting purposes.  Moving this pin is important to keep the
  accounting balanced. When the pin was not moved from the old parent
  to the new one, it created two problems: The reference on the old,
  stale parent was never released, causing a reference leak.
  A reference for the new parent was never acquired, creating the risk
  of a reference underflow later in ceph_mdsc_release_request().  This
  patch corrects the logic by releasing the pin from the old parent and
  acquiring it for the new parent when r_parent is switched.  This
  ensures reference accounting stays balanced. ]

Cc: stable@vger.kernel.org
Signed-off-by: Alex Markuze <amarkuze@redhat.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:47 +02:00
Ilya Dryomov 35dbbc3dbf libceph: fix invalid accesses to ceph_connection_v1_info
commit cdbc9836c7 upstream.

There is a place where generic code in messenger.c is reading and
another place where it is writing to con->v1 union member without
checking that the union member is active (i.e. msgr1 is in use).

On 64-bit systems, con->v1.auth_retry overlaps with con->v2.out_iter,
so such a read is almost guaranteed to return a bogus value instead of
0 when msgr2 is in use.  This ends up being fairly benign because the
side effect is just the invalidation of the authorizer and successive
fetching of new tickets.

con->v1.connect_seq overlaps with con->v2.conn_bufs and the fact that
it's being written to can cause more serious consequences, but luckily
it's not something that happens often.
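
The guards are conceptually of this form (a sketch; helper names as used
in messenger.c, assumed here):

    /* con->v1 is only valid when msgr1 is in use */
    if (!ceph_msgr2(from_msgr(con->msgr)))
            con->v1.connect_seq = 0;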

Cc: stable@vger.kernel.org
Fixes: cd1a677cad ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:47 +02:00
Chen Ridong 7e64474aba kernfs: Fix UAF in polling when open file is released
commit 3c9ba2777d upstream.

A use-after-free (UAF) vulnerability was identified in the PSI (Pressure
Stall Information) monitoring mechanism:

BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140
Read of size 8 at addr ffff3de3d50bd308 by task systemd/1

psi_trigger_poll+0x3c/0x140
cgroup_pressure_poll+0x70/0xa0
cgroup_file_poll+0x8c/0x100
kernfs_fop_poll+0x11c/0x1c0
ep_item_poll.isra.0+0x188/0x2c0

Allocated by task 1:
cgroup_file_open+0x88/0x388
kernfs_fop_open+0x73c/0xaf0
do_dentry_open+0x5fc/0x1200
vfs_open+0xa0/0x3f0
do_open+0x7e8/0xd08
path_openat+0x2fc/0x6b0
do_filp_open+0x174/0x368

Freed by task 8462:
cgroup_file_release+0x130/0x1f8
kernfs_drain_open_files+0x17c/0x440
kernfs_drain+0x2dc/0x360
kernfs_show+0x1b8/0x288
cgroup_file_show+0x150/0x268
cgroup_pressure_write+0x1dc/0x340
cgroup_file_write+0x274/0x548

Reproduction Steps:
1. Open test/cpu.pressure and establish epoll monitoring
2. Disable monitoring: echo 0 > test/cgroup.pressure
3. Re-enable monitoring: echo 1 > test/cgroup.pressure

The race condition occurs because:
1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it:
   - Releases PSI triggers via cgroup_file_release()
   - Frees of->priv through kernfs_drain_open_files()
2. While epoll still holds reference to the file and continues polling
3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv

epolling			disable/enable cgroup.pressure
fd=open(cpu.pressure)
while(1)
...
epoll_wait
kernfs_fop_poll
kernfs_get_active = true	echo 0 > cgroup.pressure
...				cgroup_file_show
				kernfs_show
				// inactive kn
				kernfs_drain_open_files
				cft->release(of);
				kfree(ctx);
				...
kernfs_get_active = false
				echo 1 > cgroup.pressure
				kernfs_show
				kernfs_activate_one(kn);
kernfs_fop_poll
kernfs_get_active = true
cgroup_file_poll
psi_trigger_poll
// UAF
...
end: close(fd)

To address this issue, introduce kernfs_get_active_of() for kernfs open
files to obtain active references. This function will fail if the open file
has been released. Replace kernfs_get_active() with kernfs_get_active_of()
to prevent further operations on released file descriptors.
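
A sketch of the new helper (simplified):

    static struct kernfs_node *kernfs_get_active_of(struct kernfs_open_file *of)
    {
            /* fail if the open file has already been released */
            if (of->released)
                    return NULL;

            return kernfs_get_active(of->kn);
    }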

Fixes: 34f26a1561 ("sched/psi: Per-cgroup PSI accounting disable/re-enable interface")
Cc: stable <stable@kernel.org>
Reported-by: Zhang Zhaotian <zhangzhaotian@huawei.com>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20250822070715.1565236-2-chenridong@huaweicloud.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:47 +02:00
Matthieu Baerts (NGI0) 3ac1ec2745 netlink: specs: mptcp: fix if-idx attribute type
[ Upstream commit 7094b84863 ]

This attribute is used as a signed number in the code in pm_netlink.c:

  nla_put_s32(skb, MPTCP_ATTR_IF_IDX, ssk->sk_bound_dev_if))

The specs should then reflect that. Note that other 'if-idx' attributes
from the same .yaml file use a signed number as well.

Fixes: bc8aeb2045 ("Documentation: netlink: add a YAML spec for mptcp")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250908-net-mptcp-misc-fixes-6-17-rc5-v1-1-5f2168a66079@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:47 +02:00
Jakub Kicinski 20a2c389b3 netlink: specs: mptcp: replace underscores with dashes in names
[ Upstream commit 9e6dd4c256 ]

We're trying to add a strict regexp for the name format in the spec.
Underscores will not be allowed, dashes should be used instead.
This makes no difference to C (codegen, if used, replaces special
chars in names) but it gives more uniform naming in Python.

Fixes: bc8aeb2045 ("Documentation: netlink: add a YAML spec for mptcp")
Reviewed-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250624211002.3475021-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 7094b84863 ("netlink: specs: mptcp: fix if-idx attribute type")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:47 +02:00
Matthieu Baerts (NGI0) e295bf08b2 netlink: specs: mptcp: clearly mention attributes
[ Upstream commit bea87657b5 ]

The rendered version of the MPTCP events [1] looked strange, because the
whole content of the 'doc' was displayed in the same block.

It was then not clear that the first words, not even ended by a period,
were the attributes that are defined when such events are emitted. These
attributes have now been moved to the end, prefixed by 'Attributes:' and
ended with a period. Note that '>-' has been added after 'doc:' to allow
':' in the text below.

The documentation in the UAPI header has been auto-generated by:

  ./tools/net/ynl/ynl-regen.sh

Link: https://docs.kernel.org/networking/netlink_spec/mptcp_pm.html#event-type [1]
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241221-net-mptcp-netlink-specs-pm-doc-fixes-v2-2-e54f2db3f844@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 7094b84863 ("netlink: specs: mptcp: fix if-idx attribute type")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:46 +02:00
Matthieu Baerts (NGI0) 5ea53f2701 netlink: specs: mptcp: add missing 'server-side' attr
[ Upstream commit 6b830c6a02 ]

This attribute is added with the 'created' and 'established' events, but
the documentation didn't mention it.

The documentation in the UAPI header has been auto-generated by:

  ./tools/net/ynl/ynl-regen.sh

Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20241221-net-mptcp-netlink-specs-pm-doc-fixes-v2-1-e54f2db3f844@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 7094b84863 ("netlink: specs: mptcp: fix if-idx attribute type")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:46 +02:00
David Rosca 6dc4eddeb7 drm/amdgpu/vcn4: Fix IB parsing with multiple engine info packages
commit 2b10cb58d7 upstream.

There can be multiple engine info packages in one IB, and the first one
may be the common engine, not decode/encode.
We need to parse the entire IB instead of stopping after finding the first
engine info.

Signed-off-by: David Rosca <david.rosca@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit dc8f9f0f45166a6b37864e7a031c726981d6e5fc)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:46 +02:00
David Rosca c53a6447d1 drm/amdgpu/vcn: Allow limiting ctx to instance 0 for AV1 at any time
commit 3318f2d20c upstream.

There is no reason to require this to happen on first submitted IB only.
We need to wait for the queue to be idle, but it can be done at any
time (including when there are multiple video sessions active).

Signed-off-by: David Rosca <david.rosca@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8908fdce0634a623404e9923ed2f536101a39db5)
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:46 +02:00
Thomas Hellström 7d07bc9c4f drm/xe: Attempt to bring bos back to VRAM after eviction
commit 5c87fee3c9 upstream.

VRAM+TT bos that are evicted from VRAM to TT may remain in
TT also after a revalidation following eviction or suspend.

This manifests itself as applications becoming sluggish
after buffer objects get evicted or after a resume from
suspend or hibernation.

If the bo supports placement in both VRAM and TT, and
we are on DGFX, mark the TT placement as fallback. This means
that it is tried only after VRAM + eviction.
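
Schematically (flag name from TTM; the exact placement-building code is
in the patch):

    /* TT becomes a fallback: tried only after VRAM and eviction */
    places[i].flags |= TTM_PL_FLAG_FALLBACK;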

This flaw has probably been present since the xe module was
upstreamed, but use the Fixes: commit below where backporting is
likely to be simple. For earlier versions we need to open-code the
fallback algorithm in the driver.

v2:
- Remove check for dgfx. (Matthew Auld)
- Update the xe_dma_buf kunit test for the new strategy (CI)
- Allow dma-buf to pin in current placement (CI)
- Make xe_bo_validate() for pinned bos a NOP.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5995
Fixes: a78a8da51b ("drm/ttm: replace busy placement with flags v6")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v6.9+
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250904160715.2613-2-thomas.hellstrom@linux.intel.com
(cherry picked from commit cb3d7b3b46b799c96b54f8e8fe36794a55a77f0b)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:46 +02:00
Johan Hovold b58a26cdd4 drm/mediatek: fix potential OF node use-after-free
commit 4de37a48b6 upstream.

The for_each_child_of_node() helper drops the reference it takes to each
node as it iterates over children and an explicit of_node_put() is only
needed when exiting the loop early.

Drop the recently introduced bogus additional reference count decrement
at each iteration that could potentially lead to a use-after-free.
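
The iterator's reference rule, schematically (the error condition is a
placeholder):

    for_each_child_of_node(parent, child) {
            if (something_failed) {
                    of_node_put(child);     /* only needed on early exit */
                    break;
            }
            /* no of_node_put() here: the next iteration drops this ref */
    }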

Fixes: 1f403699c4 ("drm/mediatek: Fix device/node reference count leaks in mtk_drm_get_all_drm_priv")
Cc: Ma Ke <make24@iscas.ac.cn>
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: CK Hu <ck.hu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20250829090345.21075-2-johan@kernel.org/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:46 +02:00
Quanmin Yan af0ae62b93 mm/damon/lru_sort: avoid divide-by-zero in damon_lru_sort_apply_parameters()
commit 711f19dfd7 upstream.

Patch series "mm/damon: avoid divide-by-zero in DAMON module's parameters
application".

DAMON's RECLAIM and LRU_SORT modules perform no validation on
user-configured parameters during application, which may lead to
division-by-zero errors.

Avoid the divide-by-zero by adding validation checks when DAMON modules
attempt to apply the parameters.


This patch (of 2):

During the calculation of 'hot_thres' and 'cold_thres', either
'sample_interval' or 'aggr_interval' is used as the divisor, which may
lead to division-by-zero errors.  Fix it by directly returning -EINVAL
when such a case occurs.  Additionally, since 'aggr_interval' is already
required to be set no smaller than 'sample_interval' in damon_set_attrs(),
only the case where 'sample_interval' is zero needs to be checked.
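
I.e., an early bail-out along these lines in
damon_lru_sort_apply_parameters() (sketch):

    if (!damon_lru_sort_mon_attrs.sample_interval)
            return -EINVAL;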

Link: https://lkml.kernel.org/r/20250827115858.1186261-2-yanquanmin1@huawei.com
Fixes: 40e983cca9 ("mm/damon: introduce DAMON-based LRU-lists Sorting")
Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: ze zuo <zuoze1@huawei.com>
Cc: <stable@vger.kernel.org>	[6.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:46 +02:00
Sang-Heon Jeon 1797fd7b43 mm/damon/core: set quota->charged_from to jiffies at first charge window
commit ce652aac9c upstream.

The kernel initializes the "jiffies" timer to 5 minutes below zero, as
shown in include/linux/jiffies.h:

 /*
 * Have the 32 bit jiffies value wrap 5 minutes after boot
 * so jiffies wrap bugs show up earlier.
 */
 #define INITIAL_JIFFIES ((unsigned long)(unsigned int) (-300*HZ))

And the jiffies comparison helper functions cast the unsigned values to
signed to cover wraparound:

 #define time_after_eq(a,b) \
  (typecheck(unsigned long, a) && \
  typecheck(unsigned long, b) && \
  ((long)((a) - (b)) >= 0))

When quota->charged_from is initialized to 0, time_after_eq() can
incorrectly return FALSE even after reset_interval has elapsed.  This
occurs when (jiffies - reset_interval) produces a value with MSB=1, which
is interpreted as negative in signed arithmetic.

This issue primarily affects 32-bit systems.

On 64-bit systems, MSB=1 values occur only ~292 million years after boot
(assuming HZ=1000), which is almost impossible.

On 32-bit systems, MSB=1 values occur during the first 5 minutes after
boot, and during the second half of every jiffies wraparound cycle,
starting from day 25 (assuming HZ=1000).

When the above unexpected FALSE return from time_after_eq() occurs, the
charging window will not reset.  The user impact depends on the esz value
at that time.

If esz is 0, the scheme ignores the configured quotas and runs without any
limits.

If esz is not 0, the scheme stops working once the quota is exhausted.  It
remains stopped until the charging window finally resets.

So, set quota->charged_from to jiffies in damos_adjust_quota() when it
is considered the first charge window.  By this change, we can avoid the
unexpected FALSE return from time_after_eq().
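
Schematically (sketch):

    /* first charge window: anchor it to now instead of jiffies 0 */
    if (quota->charged_from == 0)
            quota->charged_from = jiffies;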

Link: https://lkml.kernel.org/r/20250822025057.1740854-1-ekffu200098@gmail.com
Fixes: 2b8a248d58 ("mm/damon/schemes: implement size quota for schemes application speed control") # 5.16
Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:46 +02:00
Kyle Meyer de84f2978d mm/memory-failure: fix redundant updates for already poisoned pages
commit 3be306cccd upstream.

Duplicate memory errors can be reported by multiple sources.

Passing an already poisoned page to action_result() causes issues:

* The amount of hardware corrupted memory is incorrectly updated.
* Per NUMA node MF stats are incorrectly updated.
* Redundant "already poisoned" messages are printed.

Avoid those issues by:

* Skipping hardware corrupted memory updates for already poisoned pages.
* Skipping per NUMA node MF stats updates for already poisoned pages.
* Dropping redundant "already poisoned" messages.

Make MF_MSG_ALREADY_POISONED consistent with other action_page_types and
make calls to action_result() consistent for already poisoned normal pages
and huge pages.
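
A sketch of the skip described above; the helper names are assumptions,
not the literal patch:

  /* inside action_result(): skip accounting for duplicate reports */
  if (type != MF_MSG_ALREADY_POISONED) {
          num_poisoned_pages_inc(pfn);
          update_per_node_mf_stats(pfn, result);
  }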

Link: https://lkml.kernel.org/r/aLCiHMy12Ck3ouwC@hpe.com
Fixes: b8b9488d50 ("mm/memory-failure: improve memory failure action_result messages")
Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com>
Reviewed-by: Jiaqi Yan <jiaqiyan@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Acked-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Kyle Meyer <kyle.meyer@hpe.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:46 +02:00
Miaohe Lin 7618fd443a mm/memory-failure: fix VM_BUG_ON_PAGE(PagePoisoned(page)) when unpoison memory
commit d613f53c83 upstream.

When I did memory failure tests, below panic occurs:

page dumped because: VM_BUG_ON_PAGE(PagePoisoned(page))
kernel BUG at include/linux/page-flags.h:616!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 3 PID: 720 Comm: bash Not tainted 6.10.0-rc1-00195-g148743902568 #40
RIP: 0010:unpoison_memory+0x2f3/0x590
RSP: 0018:ffffa57fc8787d60 EFLAGS: 00000246
RAX: 0000000000000037 RBX: 0000000000000009 RCX: ffff9be25fcdc9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff9be25fcdc9c0
RBP: 0000000000300000 R08: ffffffffb4956f88 R09: 0000000000009ffb
R10: 0000000000000284 R11: ffffffffb4926fa0 R12: ffffe6b00c000000
R13: ffff9bdb453dfd00 R14: 0000000000000000 R15: fffffffffffffffe
FS:  00007f08f04e4740(0000) GS:ffff9be25fcc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000564787a30410 CR3: 000000010d4e2000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 unpoison_memory+0x2f3/0x590
 simple_attr_write_xsigned.constprop.0.isra.0+0xb3/0x110
 debugfs_attr_write+0x42/0x60
 full_proxy_write+0x5b/0x80
 vfs_write+0xd5/0x540
 ksys_write+0x64/0xe0
 do_syscall_64+0xb9/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f08f0314887
RSP: 002b:00007ffece710078 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007f08f0314887
RDX: 0000000000000009 RSI: 0000564787a30410 RDI: 0000000000000001
RBP: 0000564787a30410 R08: 000000000000fefe R09: 000000007fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000009
R13: 00007f08f041b780 R14: 00007f08f0417600 R15: 00007f08f0416a00
 </TASK>
Modules linked in: hwpoison_inject
---[ end trace 0000000000000000 ]---
RIP: 0010:unpoison_memory+0x2f3/0x590
RSP: 0018:ffffa57fc8787d60 EFLAGS: 00000246
RAX: 0000000000000037 RBX: 0000000000000009 RCX: ffff9be25fcdc9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff9be25fcdc9c0
RBP: 0000000000300000 R08: ffffffffb4956f88 R09: 0000000000009ffb
R10: 0000000000000284 R11: ffffffffb4926fa0 R12: ffffe6b00c000000
R13: ffff9bdb453dfd00 R14: 0000000000000000 R15: fffffffffffffffe
FS:  00007f08f04e4740(0000) GS:ffff9be25fcc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000564787a30410 CR3: 000000010d4e2000 CR4: 00000000000006f0
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x31c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception ]---

The root cause is that unpoison_memory() tries to check the PG_HWPoison
flags of an uninitialized page.  So VM_BUG_ON_PAGE(PagePoisoned(page)) is
triggered.  This can be reproduced by the steps below:

1. Offline memory block:

 echo offline > /sys/devices/system/memory/memory12/state

2. Get offlined memory pfn:

 page-types -b n -rlN

3. Write pfn to unpoison-pfn:

 echo <pfn> > /sys/kernel/debug/hwpoison/unpoison-pfn

This scenario can be identified by pfn_to_online_page() returning NULL.
ZONE_DEVICE pages are never expected here, so we can simply fail when
pfn_to_online_page() returns NULL to fix the bug.
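
A minimal sketch of the check described (the exact error code is an
assumption):

  p = pfn_to_online_page(pfn);
  if (!p)
          return -EIO;    /* offline section or ZONE_DEVICE pfn */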

Link: https://lkml.kernel.org/r/20250828024618.1744895-1-linmiaohe@huawei.com
Fixes: f1dd2cd13c ("mm, memory_hotplug: do not associate hotadded memory to zones until online")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:46 +02:00
Wei Yang fd714c92b1 mm/khugepaged: fix the address passed to notifier on testing young
commit 394bfac1c7 upstream.

Commit 8ee53820ed ("thp: mmu_notifier_test_young") introduced
mmu_notifier_test_young(), but we are passing the wrong address.
In xxx_scan_pmd(), the actual iteration address is "_address", not
"address".  The variable seems to have been misused from the very
beginning.

Change it to the right one.
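
The change amounts to the following one-liner (context abbreviated):

  -     mmu_notifier_test_young(vma->vm_mm, address)
  +     mmu_notifier_test_young(vma->vm_mm, _address)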

[akpm@linux-foundation.org fix whitespace, per everyone]
Link: https://lkml.kernel.org/r/20250822063318.11644-1-richard.weiyang@gmail.com
Fixes: 8ee53820ed ("thp: mmu_notifier_test_young")
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:46 +02:00
Miklos Szeredi 532b87643f fuse: prevent overflow in copy_file_range return value
commit 1e08938c36 upstream.

The FUSE protocol uses struct fuse_write_out to convey the return value of
copy_file_range, which is restricted to uint32_t.  But the COPY_FILE_RANGE
interface supports 64-bit sized copies.

Currently the number of bytes copied is silently truncated to 32 bits,
which may result in poor performance, or even in failure to copy in case
of truncation to zero.
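
One way to prevent the truncation, sketched here as an assumption rather
than the literal patch, is to clamp the request so the result fits the
32-bit reply field:

  /* hypothetical clamp; fuse_write_out carries a 32-bit size */
  len = min_t(size_t, len, UINT_MAX & PAGE_MASK);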

Reported-by: Florian Weimer <fweimer@redhat.com>
Closes: https://lore.kernel.org/all/lhuh5ynl8z5.fsf@oldenburg.str.redhat.com/
Fixes: 88bc7d5097 ("fuse: add support for copy_file_range()")
Cc: <stable@vger.kernel.org> # v4.20
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:46 +02:00
Miklos Szeredi b7c40f063f fuse: check if copy_file_range() returns larger than requested size
commit e5203209b3 upstream.

Just like write(), copy_file_range() should check if the return value is
less or equal to the requested number of bytes.
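
A sketch of the sanity check described (the reply structure name is an
assumption):

  /* distrust a server that reports copying more than requested */
  if (outarg.size > len)
          return -EIO;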

Reported-by: Chunsheng Luo <luochunsheng@ustc.edu>
Closes: https://lore.kernel.org/all/20250807062425.694-1-luochunsheng@ustc.edu/
Fixes: 88bc7d5097 ("fuse: add support for copy_file_range()")
Cc: <stable@vger.kernel.org> # v4.20
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:46 +02:00
Amir Goldstein 30814d40fc fuse: do not allow mapping a non-regular backing file
commit e9c8da670e upstream.

We do not support passthrough operations other than read/write on
regular files, so allowing non-regular backing files makes no sense.
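
A minimal sketch of the check described (the error code is an
assumption):

  /* reject non-regular files in the backing-open path */
  if (!S_ISREG(file_inode(file)->i_mode))
          return -EINVAL;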

Fixes: efad7153bf ("fuse: allow O_PATH fd for FUSE_DEV_IOC_BACKING_OPEN")
Cc: stable@vger.kernel.org
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:46 +02:00
Christophe Kerello b6f8cd737b mtd: rawnand: stm32_fmc2: fix ECC overwrite
commit 811c0da454 upstream.

In case an OOB write is requested during a data write, the ECC is
currently lost.  Avoid this issue by writing only in the free spare area.
This issue has been seen with a YAFFS2 file system.

Signed-off-by: Christophe Kerello <christophe.kerello@foss.st.com>
Cc: stable@vger.kernel.org
Fixes: 2cd457f328 ("mtd: rawnand: stm32_fmc2: add STM32 FMC2 NAND flash controller driver")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:46 +02:00
Christophe Kerello 26adba1e7d mtd: rawnand: stm32_fmc2: avoid overlapping mappings on ECC buffer
commit 513c40e59d upstream.

Avoid the overlapping mappings shown below by using a contiguous
non-cacheable buffer.

[    4.077708] DMA-API: stm32_fmc2_nfc 48810000.nand-controller: cacheline tracking EEXIST,
overlapping mappings aren't supported
[    4.089103] WARNING: CPU: 1 PID: 44 at kernel/dma/debug.c:568 add_dma_entry+0x23c/0x300
[    4.097071] Modules linked in:
[    4.100101] CPU: 1 PID: 44 Comm: kworker/u4:2 Not tainted 6.1.82 #1
[    4.106346] Hardware name: STMicroelectronics STM32MP257F VALID1 SNOR / MB1704 (LPDDR4 Power discrete) + MB1703 + MB1708 (SNOR MB1730) (DT)
[    4.118824] Workqueue: events_unbound deferred_probe_work_func
[    4.124674] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    4.131624] pc : add_dma_entry+0x23c/0x300
[    4.135658] lr : add_dma_entry+0x23c/0x300
[    4.139792] sp : ffff800009dbb490
[    4.143016] x29: ffff800009dbb4a0 x28: 0000000004008022 x27: ffff8000098a6000
[    4.150174] x26: 0000000000000000 x25: ffff8000099e7000 x24: ffff8000099e7de8
[    4.157231] x23: 00000000ffffffff x22: 0000000000000000 x21: ffff8000098a6a20
[    4.164388] x20: ffff000080964180 x19: ffff800009819ba0 x18: 0000000000000006
[    4.171545] x17: 6361727420656e69 x16: 6c6568636163203a x15: 72656c6c6f72746e
[    4.178602] x14: 6f632d646e616e2e x13: ffff800009832f58 x12: 00000000000004ec
[    4.185759] x11: 00000000000001a4 x10: ffff80000988af58 x9 : ffff800009832f58
[    4.192916] x8 : 00000000ffffefff x7 : ffff80000988af58 x6 : 80000000fffff000
[    4.199972] x5 : 000000000000bff4 x4 : 0000000000000000 x3 : 0000000000000000
[    4.207128] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000812d2c40
[    4.214185] Call trace:
[    4.216605]  add_dma_entry+0x23c/0x300
[    4.220338]  debug_dma_map_sg+0x198/0x350
[    4.224373]  __dma_map_sg_attrs+0xa0/0x110
[    4.228411]  dma_map_sg_attrs+0x10/0x2c
[    4.232247]  stm32_fmc2_nfc_xfer.isra.0+0x1c8/0x3fc
[    4.237088]  stm32_fmc2_nfc_seq_read_page+0xc8/0x174
[    4.242127]  nand_read_oob+0x1d4/0x8e0
[    4.245861]  mtd_read_oob_std+0x58/0x84
[    4.249596]  mtd_read_oob+0x90/0x150
[    4.253231]  mtd_read+0x68/0xac

Signed-off-by: Christophe Kerello <christophe.kerello@foss.st.com>
Cc: stable@vger.kernel.org
Fixes: 2cd457f328 ("mtd: rawnand: stm32_fmc2: add STM32 FMC2 NAND flash controller driver")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:45 +02:00
Alexander Sverdlin 040c78723a mtd: nand: raw: atmel: Respect tAR, tCLR in read setup timing
commit fd779eac2d upstream.

Having a setup time of 0 violates the tAR and tCLR timings of some chips;
for instance, TOSHIBA TC58NVG2S3ETAI0 cannot be detected successfully (the
first ID byte is read duplicated, i.e. 98 98 dc 90 15 76 14 03 instead of
98 dc 90 15 76 ...).

Atmel Application Notes postulated 1 cycle NRD_SETUP without explanation
[1], but it looks more appropriate to just calculate setup time properly.

[1] Link: https://ww1.microchip.com/downloads/aemDocuments/documents/MPU32/ApplicationNotes/ApplicationNotes/doc6255.pdf

Cc: stable@vger.kernel.org
Fixes: f9ce2eddf1 ("mtd: nand: atmel: Add ->setup_data_interface() hooks")
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Tested-by: Alexander Dahl <ada@thorsis.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:45 +02:00
Oleksij Rempel 2e2eb78906 net: usb: asix: ax88772: drop phylink use in PM to avoid MDIO runtime PM wakeups
commit 5537a46794 upstream.

Drop phylink_{suspend,resume}() from ax88772 PM callbacks.

MDIO bus accesses have their own runtime-PM handling and will try to
wake the device if it is suspended. Such wake attempts must not happen
from PM callbacks while the device PM lock is held. Since phylink
{sus|re}sume may trigger MDIO, it must not be called in PM context.

No extra phylink PM handling is required for this driver:
- .ndo_open/.ndo_stop control the phylink start/stop lifecycle.
- ethtool/phylib entry points run in process context, not PM.
- phylink MAC ops program the MAC on link changes after resume.

Fixes: e0bffe3e68 ("net: asix: ax88772: migrate to phylink")
Reported-by: Hubert Wiśniewski <hubert.wisniewski.25632@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Tested-by: Hubert Wiśniewski <hubert.wisniewski.25632@gmail.com>
Tested-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://patch.msgid.link/20250908112619.2900723-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:45 +02:00
Chiasheng Lee f98d88bf36 i2c: i801: Hide Intel Birch Stream SoC TCO WDT
commit 664596bd98 upstream.

Hide the Intel Birch Stream SoC TCO WDT feature since it was removed.

On platforms with a PCH TCO WDT, this redundant device might render
errors like this:

[   28.144542] sysfs: cannot create duplicate filename '/bus/platform/devices/iTCO_wdt'

Fixes: 8c56f9ef25 ("i2c: i801: Add support for Intel Birch Stream SoC")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=220320
Signed-off-by: Chiasheng Lee <chiasheng.lee@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.7+
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Link: https://lore.kernel.org/r/20250901125943.916522-1-chiasheng.lee@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:45 +02:00
Omar Sandoval 9ba898c9fc btrfs: fix subvolume deletion lockup caused by inodes xarray race
commit f6a6c28005 upstream.

There is a race condition between inode eviction and inode caching that
can cause a live struct btrfs_inode to be missing from the root->inodes
xarray. Specifically, there is a window during evict() between the inode
being unhashed and deleted from the xarray. If btrfs_iget() is called
for the same inode in that window, it will be recreated and inserted
into the xarray, but then eviction will delete the new entry, leaving
nothing in the xarray:

Thread 1                          Thread 2
---------------------------------------------------------------
evict()
  remove_inode_hash()
                                  btrfs_iget_path()
                                    btrfs_iget_locked()
                                    btrfs_read_locked_inode()
                                      btrfs_add_inode_to_root()
  destroy_inode()
    btrfs_destroy_inode()
      btrfs_del_inode_from_root()
        __xa_erase

In turn, this can cause issues for subvolume deletion. Specifically, if
an inode is in this lost state, and all other inodes are evicted, then
btrfs_del_inode_from_root() will call btrfs_add_dead_root() prematurely.
If the lost inode has a delayed_node attached to it, then when
btrfs_clean_one_deleted_snapshot() calls btrfs_kill_all_delayed_nodes(),
it will loop forever because the delayed_nodes xarray will never become
empty (unless memory pressure forces the inode out). We saw this
manifest as soft lockups in production.

Fix it by only deleting the xarray entry if it matches the given inode
(using __xa_cmpxchg()).
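
A sketch of the guarded erase (holding the xa_lock, as for __xa_erase(),
is assumed):

  /* erase only if the slot still points at this inode */
  __xa_cmpxchg(&root->inodes, btrfs_ino(inode), inode, NULL, GFP_ATOMIC);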

Fixes: 310b2f5d5a ("btrfs: use an xarray to track open inodes in a root")
Cc: stable@vger.kernel.org # 6.11+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Co-authored-by: Leo Martins <loemra.dev@gmail.com>
Signed-off-by: Leo Martins <loemra.dev@gmail.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:45 +02:00
Boris Burkov 6e9a12ab07 btrfs: fix squota compressed stats leak
commit de134cb54c upstream.

The following workload on a squota enabled fs:

  btrfs subvol create mnt/subvol

  # ensure subvol extents get accounted
  sync
  btrfs qgroup create 1/1 mnt
  btrfs qgroup assign mnt/subvol 1/1 mnt
  btrfs qgroup delete mnt/subvol

  # make the cleaner thread run
  btrfs filesystem sync mnt
  sleep 1
  btrfs filesystem sync mnt
  btrfs qgroup destroy 1/1 mnt

will fail with EBUSY. The reason is that 1/1 does the quick accounting
when we assign subvol to it, gaining its exclusive usage as excl and
excl_cmpr. But then when we delete subvol, the decrement happens via
record_squota_delta() which does not update excl_cmpr, as squota does
not make any distinction between compressed and normal extents. Thus,
we increment excl_cmpr but never decrement it, and are unable to delete
1/1. The two possible fixes are to make squota always mirror excl and
excl_cmpr or to make the fast accounting separately track the plain and
cmpr numbers. The latter felt cleaner to me so that is what I opted for.

Fixes: 1e0e9d5771 ("btrfs: add helper for recording simple quota deltas")
CC: stable@vger.kernel.org # 6.12+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:45 +02:00
Mark Tinguely 1d3c96547e ocfs2: fix recursive semaphore deadlock in fiemap call
commit 04100f775c upstream.

syzbot detected an OCFS2 hang due to a recursive semaphore on a
FS_IOC_FIEMAP of the extent list on a specially crafted mmap file.

context_switch kernel/sched/core.c:5357 [inline]
   __schedule+0x1798/0x4cc0 kernel/sched/core.c:6961
   __schedule_loop kernel/sched/core.c:7043 [inline]
   schedule+0x165/0x360 kernel/sched/core.c:7058
   schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:7115
   rwsem_down_write_slowpath+0x872/0xfe0 kernel/locking/rwsem.c:1185
   __down_write_common kernel/locking/rwsem.c:1317 [inline]
   __down_write kernel/locking/rwsem.c:1326 [inline]
   down_write+0x1ab/0x1f0 kernel/locking/rwsem.c:1591
   ocfs2_page_mkwrite+0x2ff/0xc40 fs/ocfs2/mmap.c:142
   do_page_mkwrite+0x14d/0x310 mm/memory.c:3361
   wp_page_shared mm/memory.c:3762 [inline]
   do_wp_page+0x268d/0x5800 mm/memory.c:3981
   handle_pte_fault mm/memory.c:6068 [inline]
   __handle_mm_fault+0x1033/0x5440 mm/memory.c:6195
   handle_mm_fault+0x40a/0x8e0 mm/memory.c:6364
   do_user_addr_fault+0x764/0x1390 arch/x86/mm/fault.c:1387
   handle_page_fault arch/x86/mm/fault.c:1476 [inline]
   exc_page_fault+0x76/0xf0 arch/x86/mm/fault.c:1532
   asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
RIP: 0010:copy_user_generic arch/x86/include/asm/uaccess_64.h:126 [inline]
RIP: 0010:raw_copy_to_user arch/x86/include/asm/uaccess_64.h:147 [inline]
RIP: 0010:_inline_copy_to_user include/linux/uaccess.h:197 [inline]
RIP: 0010:_copy_to_user+0x85/0xb0 lib/usercopy.c:26
Code: e8 00 bc f7 fc 4d 39 fc 72 3d 4d 39 ec 77 38 e8 91 b9 f7 fc 4c 89
f7 89 de e8 47 25 5b fd 0f 01 cb 4c 89 ff 48 89 d9 4c 89 f6 <f3> a4 0f
1f 00 48 89 cb 0f 01 ca 48 89 d8 5b 41 5c 41 5d 41 5e 41
RSP: 0018:ffffc9000403f950 EFLAGS: 00050256
RAX: ffffffff84c7f101 RBX: 0000000000000038 RCX: 0000000000000038
RDX: 0000000000000000 RSI: ffffc9000403f9e0 RDI: 0000200000000060
RBP: ffffc9000403fa90 R08: ffffc9000403fa17 R09: 1ffff92000807f42
R10: dffffc0000000000 R11: fffff52000807f43 R12: 0000200000000098
R13: 00007ffffffff000 R14: ffffc9000403f9e0 R15: 0000200000000060
   copy_to_user include/linux/uaccess.h:225 [inline]
   fiemap_fill_next_extent+0x1c0/0x390 fs/ioctl.c:145
   ocfs2_fiemap+0x888/0xc90 fs/ocfs2/extent_map.c:806
   ioctl_fiemap fs/ioctl.c:220 [inline]
   do_vfs_ioctl+0x1173/0x1430 fs/ioctl.c:532
   __do_sys_ioctl fs/ioctl.c:596 [inline]
   __se_sys_ioctl+0x82/0x170 fs/ioctl.c:584
   do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
   do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5f13850fd9
RSP: 002b:00007ffe3b3518b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000200000000000 RCX: 00007f5f13850fd9
RDX: 0000200000000040 RSI: 00000000c020660b RDI: 0000000000000004
RBP: 6165627472616568 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe3b3518f0
R13: 00007ffe3b351b18 R14: 431bde82d7b634db R15: 00007f5f1389a03b

ocfs2_fiemap() takes a read lock of the ip_alloc_sem semaphore (since
v2.6.22-527-g7307de80510a) and calls fiemap_fill_next_extent() to read the
extent list of this running mmap executable.  The user-supplied buffer to
hold the fiemap information page faults, calling ocfs2_page_mkwrite(),
which will take a write lock (since v2.6.27-38-g00dc417fa3e7) of the same
semaphore.  This recursive semaphore use holds filesystem locks and causes
a hang of the filesystem.

The ip_alloc_sem protects the inode extent list and size.  Release the
read semaphore before calling fiemap_fill_next_extent() in ocfs2_fiemap()
and ocfs2_fiemap_inline().  This does an unnecessary semaphore lock/unlock
on the last extent but simplifies the error path.
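
A sketch of the pattern described (argument lists abbreviated; the exact
variables are assumptions):

  up_read(&OCFS2_I(inode)->ip_alloc_sem);
  ret = fiemap_fill_next_extent(fieinfo, map_start, phys, len, flags);
  down_read(&OCFS2_I(inode)->ip_alloc_sem);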

Link: https://lkml.kernel.org/r/61d1a62b-2631-4f12-81e2-cd689914360b@oracle.com
Fixes: 00dc417fa3 ("ocfs2: fiemap support")
Signed-off-by: Mark Tinguely <mark.tinguely@oracle.com>
Reported-by: syzbot+541dcc6ee768f77103e7@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=541dcc6ee768f77103e7
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:45 +02:00
Krister Johansen 9be08390ed mptcp: sockopt: make sync_socket_options propagate SOCK_KEEPOPEN
commit 648de37416 upstream.

Users reported a scenario where MPTCP connections that were configured
with SO_KEEPALIVE prior to connect would fail to enable their keepalives
if MPTCP fell back to TCP mode.

After investigating, this affects keepalives for any connection where
sync_socket_options is called on a socket that is in the closed or
listening state.  Joins are handled properly. For connects,
sync_socket_options is called when the socket is still in the closed
state.  The tcp_set_keepalive() function does not act on sockets that
are closed or listening, hence keepalive is not immediately enabled.
Since the SO_KEEPOPEN flag is absent, it is not enabled later in the
connect sequence via tcp_finish_connect.  Setting the keepalive via
sockopt after connect does work, but would not address any subsequently
created flows.

Fortunately, the fix here is straight-forward: set SOCK_KEEPOPEN on the
subflow when calling sync_socket_options.
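
A sketch of the one-line fix described, assuming the usual sk/ssk naming
in sync_socket_options():

  if (sock_flag(sk, SOCK_KEEPOPEN))
          sock_set_flag(ssk, SOCK_KEEPOPEN);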

The fix was validated both by using tcpdump to observe keepalive
packets not being sent before the fix, and being sent after the fix.  It
was also possible to observe via ss that the keepalive timer was not
enabled on these sockets before the fix, but was enabled afterwards.

Fixes: 1b3e7ede13 ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY")
Cc: stable@vger.kernel.org
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/aL8dYfPZrwedCIh9@templeofstupid.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:45 +02:00
Nathan Chancellor 5b4605974b compiler-clang.h: define __SANITIZE_*__ macros only when undefined
commit 3fac212fe4 upstream.

Clang 22 recently added support for defining __SANITIZE__ macros similar
to GCC [1], which causes warnings (or errors with CONFIG_WERROR=y or W=e)
with the existing defines that the kernel creates to emulate this behavior
with existing clang versions.

  In file included from <built-in>:3:
  In file included from include/linux/compiler_types.h:171:
  include/linux/compiler-clang.h:37:9: error: '__SANITIZE_THREAD__' macro redefined [-Werror,-Wmacro-redefined]
     37 | #define __SANITIZE_THREAD__
        |         ^
  <built-in>:352:9: note: previous definition is here
    352 | #define __SANITIZE_THREAD__ 1
        |         ^

Refactor compiler-clang.h to only define the sanitizer macros when they
are undefined and adjust the rest of the code to use these macros for
checking if the sanitizers are enabled, clearing up the warnings and
allowing the kernel to easily drop these defines when the minimum
supported version of LLVM for building the kernel becomes 22.0.0 or newer.
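
The guard described amounts to the following pattern (sketch):

  #if __has_feature(thread_sanitizer)
  /* clang >= 22 predefines this macro; only define it when absent */
  #ifndef __SANITIZE_THREAD__
  #define __SANITIZE_THREAD__
  #endif
  #endif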

Link: https://lkml.kernel.org/r/20250902-clang-update-sanitize-defines-v1-1-cf3702ca3d92@kernel.org
Link: 568c23bbd3 [1]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:45 +02:00
Trond Myklebust 02f6274f9f Revert "SUNRPC: Don't allow waiting for exiting tasks"
commit 199cd9e8d1 upstream.

This reverts commit 14e41b16e8.

This patch breaks the LTP acct02 test, so let's revert and look for a
better solution.

Reported-by: Mark Brown <broonie@kernel.org>
Reported-by: Harshvardhan Jha <harshvardhan.j.jha@oracle.com>
Link: https://lore.kernel.org/linux-nfs/7d4d57b0-39a3-49f1-8ada-60364743e3b4@sirena.org.uk/
Cc: stable@vger.kernel.org # 6.15.x
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:45 +02:00
Salah Triki 589a319dcd EDAC/altera: Delete an inappropriate dma_free_coherent() call
commit ff2a66d21f upstream.

dma_free_coherent() must only be called if the corresponding
dma_alloc_coherent() call has succeeded. Calling it when the allocation fails
leads to undefined behavior.

Delete the wrong call.
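
The rule being enforced, as a hypothetical sketch:

  ptr = dma_alloc_coherent(dev, size, &dma_handle, GFP_KERNEL);
  if (!ptr)
          return -ENOMEM; /* must NOT call dma_free_coherent() here */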

  [ bp: Massage commit message. ]

Fixes: 71bcada88b ("edac: altera: Add Altera SDRAM EDAC support")
Signed-off-by: Salah Triki <salah.triki@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/aIrfzzqh4IzYtDVC@pc
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:45 +02:00
wangzijie 1ddb0a6ffa proc: fix type confusion in pde_set_flags()
[ Upstream commit 0ce9398aa0 ]

Commit 2ce3d282bd ("proc: fix missing pde_set_flags() for net proc
files") missed a key part in the definition of proc_dir_entry:

union {
	const struct proc_ops *proc_ops;
	const struct file_operations *proc_dir_ops;
};

So a dereference of ->proc_ops that assumes it is a proc_ops structure
results in type confusion, and makes the NULL check for 'proc_ops' not
work for proc directories.

Add !S_ISDIR(dp->mode) test before calling pde_set_flags() to fix it.
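
The added test amounts to (sketch):

  /* ->proc_ops is only valid for non-directories (see union above) */
  if (!S_ISDIR(dp->mode))
          pde_set_flags(dp);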

Link: https://lkml.kernel.org/r/20250904135715.3972782-1-wangzijie1@honor.com
Fixes: 2ce3d282bd ("proc: fix missing pde_set_flags() for net proc files")
Signed-off-by: wangzijie <wangzijie1@honor.com>
Reported-by: Brad Spengler <spender@grsecurity.net>
Closes: https://lore.kernel.org/all/20250903065758.3678537-1-wangzijie1@honor.com/
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Stefano Brivio <sbrivio@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:45 +02:00
Kuniyuki Iwashima 539920180c tcp_bpf: Call sk_msg_free() when tcp_bpf_send_verdict() fails to allocate psock->cork.
[ Upstream commit a3967baad4 ]

syzbot reported the splat below. [0]

The repro does the following:

  1. Load a sk_msg prog that calls bpf_msg_cork_bytes(msg, cork_bytes)
  2. Attach the prog to a SOCKMAP
  3. Add a socket to the SOCKMAP
  4. Activate fault injection
  5. Send data less than cork_bytes

At 5., the data is carried over to the next sendmsg() as it is
smaller than the cork_bytes specified by bpf_msg_cork_bytes().

Then, tcp_bpf_send_verdict() tries to allocate psock->cork to hold
the data, but this fails silently due to fault injection + __GFP_NOWARN.

If the allocation fails, we need to revert the sk->sk_forward_alloc
change done by sk_msg_alloc().

Let's call sk_msg_free() when tcp_bpf_send_verdict fails to allocate
psock->cork.

The "*copied" also needs to be updated such that a proper error can
be returned to the caller, sendmsg. It fails to allocate psock->cork.
Nothing has been corked so far, so this patch simply sets "*copied"
to 0.
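
A sketch of the failure handling described (allocation flags and labels
are assumptions):

  psock->cork = kzalloc(sizeof(*psock->cork),
                        GFP_ATOMIC | __GFP_NOWARN);
  if (!psock->cork) {
          sk_msg_free(sk, msg);   /* revert sk_forward_alloc change */
          *copied = 0;
          err = -ENOMEM;
          goto out_err;
  }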

[0]:
WARNING: net/ipv4/af_inet.c:156 at inet_sock_destruct+0x623/0x730 net/ipv4/af_inet.c:156, CPU#1: syz-executor/5983
Modules linked in:
CPU: 1 UID: 0 PID: 5983 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
RIP: 0010:inet_sock_destruct+0x623/0x730 net/ipv4/af_inet.c:156
Code: 0f 0b 90 e9 62 fe ff ff e8 7a db b5 f7 90 0f 0b 90 e9 95 fe ff ff e8 6c db b5 f7 90 0f 0b 90 e9 bb fe ff ff e8 5e db b5 f7 90 <0f> 0b 90 e9 e1 fe ff ff 89 f9 80 e1 07 80 c1 03 38 c1 0f 8c 9f fc
RSP: 0018:ffffc90000a08b48 EFLAGS: 00010246
RAX: ffffffff8a09d0b2 RBX: dffffc0000000000 RCX: ffff888024a23c80
RDX: 0000000000000100 RSI: 0000000000000fff RDI: 0000000000000000
RBP: 0000000000000fff R08: ffff88807e07c627 R09: 1ffff1100fc0f8c4
R10: dffffc0000000000 R11: ffffed100fc0f8c5 R12: ffff88807e07c380
R13: dffffc0000000000 R14: ffff88807e07c60c R15: 1ffff1100fc0f872
FS:  00005555604c4500(0000) GS:ffff888125af1000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555604df5c8 CR3: 0000000032b06000 CR4: 00000000003526f0
Call Trace:
 <IRQ>
 __sk_destruct+0x86/0x660 net/core/sock.c:2339
 rcu_do_batch kernel/rcu/tree.c:2605 [inline]
 rcu_core+0xca8/0x1770 kernel/rcu/tree.c:2861
 handle_softirqs+0x286/0x870 kernel/softirq.c:579
 __do_softirq kernel/softirq.c:613 [inline]
 invoke_softirq kernel/softirq.c:453 [inline]
 __irq_exit_rcu+0xca/0x1f0 kernel/softirq.c:680
 irq_exit_rcu+0x9/0x30 kernel/softirq.c:696
 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1052 [inline]
 sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1052
 </IRQ>

Fixes: 4f738adba3 ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
Reported-by: syzbot+4cabd1d2fa917a456db8@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/68c0b6b5.050a0220.3c6139.0013.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250909232623.4151337-1-kuniyu@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:45 +02:00
Peilin Ye cd1fd26bb1 bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init()
[ Upstream commit 6d78b4473c ]

Currently, calling bpf_map_kmalloc_node() from __bpf_async_init() can
cause various locking issues; see the following stack trace (edited for
style) as one example:

...
 [10.011566]  do_raw_spin_lock.cold
 [10.011570]  try_to_wake_up             (5) double-acquiring the same
 [10.011575]  kick_pool                      rq_lock, causing a hardlockup
 [10.011579]  __queue_work
 [10.011582]  queue_work_on
 [10.011585]  kernfs_notify
 [10.011589]  cgroup_file_notify
 [10.011593]  try_charge_memcg           (4) memcg accounting raises an
 [10.011597]  obj_cgroup_charge_pages        MEMCG_MAX event
 [10.011599]  obj_cgroup_charge_account
 [10.011600]  __memcg_slab_post_alloc_hook
 [10.011603]  __kmalloc_node_noprof
...
 [10.011611]  bpf_map_kmalloc_node
 [10.011612]  __bpf_async_init
 [10.011615]  bpf_timer_init             (3) BPF calls bpf_timer_init()
 [10.011617]  bpf_prog_xxxxxxxxxxxxxxxx_fcg_runnable
 [10.011619]  bpf__sched_ext_ops_runnable
 [10.011620]  enqueue_task_scx           (2) BPF runs with rq_lock held
 [10.011622]  enqueue_task
 [10.011626]  ttwu_do_activate
 [10.011629]  sched_ttwu_pending         (1) grabs rq_lock
...

The above was reproduced on bpf-next (b338cf849ec8) by modifying
./tools/sched_ext/scx_flatcg.bpf.c to call bpf_timer_init() during
ops.runnable(), and hacking the memcg accounting code a bit to make
a bpf_timer_init() call more likely to raise an MEMCG_MAX event.

We have also run into other similar variants (both internally and on
bpf-next), including double-acquiring cgroup_file_kn_lock, the same
worker_pool::lock, etc.

As suggested by Shakeel, fix this by using __GFP_HIGH instead of
GFP_ATOMIC in __bpf_async_init(), so that e.g. if try_charge_memcg()
raises an MEMCG_MAX event, we call __memcg_memory_event() with
@allow_spinning=false and avoid calling cgroup_file_notify() there.
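
The flag change described, as a sketch (variable names assumed):

  -     t = bpf_map_kmalloc_node(map, size, GFP_ATOMIC, map->numa_node);
  +     t = bpf_map_kmalloc_node(map, size, __GFP_HIGH, map->numa_node);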

Depends on mm patch
"memcg: skip cgroup_file_notify if spinning is not allowed":
https://lore.kernel.org/bpf/20250905201606.66198-1-shakeel.butt@linux.dev/

v0 approach s/bpf_map_kmalloc_node/bpf_mem_alloc/
https://lore.kernel.org/bpf/20250905061919.439648-1-yepeilin@google.com/
v1 approach:
https://lore.kernel.org/bpf/20250905234547.862249-1-yepeilin@google.com/

Fixes: b00628b1c7 ("bpf: Introduce bpf timers.")
Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Peilin Ye <yepeilin@google.com>
Link: https://lore.kernel.org/r/20250909095222.2121438-1-yepeilin@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:44 +02:00
KaFai Wan 82967254a9 bpf: Allow fall back to interpreter for programs with stack size <= 512
[ Upstream commit df0cb5cb50 ]

OpenWRT users reported a regression on ARMv6 devices after updating to
the latest HEAD, where the tcpdump filter:

tcpdump "not ether host 3c37121a2b3c and not ether host 184ecbca2a3a \
and not ether host 14130b4d3f47 and not ether host f0f61cf440b7 \
and not ether host a84b4dedf471 and not ether host d022be17e1d7 \
and not ether host 5c497967208b and not ether host 706655784d5b"

fails with warning: "Kernel filter failed: No error information"
when using config:
 # CONFIG_BPF_JIT_ALWAYS_ON is not set
 CONFIG_BPF_JIT_DEFAULT_ON=y

The issue arises because commits:
1. "bpf: Fix array bounds error with may_goto" changed default runtime to
   __bpf_prog_ret0_warn when jit_requested = 1
2. "bpf: Avoid __bpf_prog_ret0_warn when jit fails" returns error when
   jit_requested = 1 but jit fails

This change restores interpreter fallback capability for BPF programs with
stack size <= 512 bytes when jit fails.

Reported-by: Felix Fietkau <nbd@nbd.name>
Closes: https://lore.kernel.org/bpf/2e267b4b-0540-45d8-9310-e127bf95fc63@nbd.name/
Fixes: 6ebc5030e0 ("bpf: Fix array bounds error with may_goto")
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250909144614.2991253-1-kafai.wan@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:44 +02:00
Daniel Borkmann 0126358df1 bpf: Fix out-of-bounds dynptr write in bpf_crypto_crypt
[ Upstream commit f9bb6ffa7f ]

Stanislav reported that in bpf_crypto_crypt() the destination dynptr's
size is not validated to be at least as large as the source dynptr's
size before calling into the crypto backend with 'len = src_len'. This
can result in an OOB write when the destination is smaller than the
source.

Concretely, in mentioned function, psrc and pdst are both linear
buffers fetched from each dynptr:

  psrc = __bpf_dynptr_data(src, src_len);
  [...]
  pdst = __bpf_dynptr_data_rw(dst, dst_len);
  [...]
  err = decrypt ?
        ctx->type->decrypt(ctx->tfm, psrc, pdst, src_len, piv) :
        ctx->type->encrypt(ctx->tfm, psrc, pdst, src_len, piv);

The crypto backend expects pdst to be large enough with a src_len length
that can be written. Add an additional src_len > dst_len check and bail
out if it's the case. Note that these kfuncs are accessible under root
privileges only.
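
The added bound check amounts to (sketch):

  /* bail out before the backend writes src_len bytes into pdst */
  if (src_len > dst_len)
          return -EINVAL;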

Fixes: 3e1c6f3540 ("bpf: make common crypto API for TC/XDP programs")
Reported-by: Stanislav Fort <disclosure@aisle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/r/20250829143657.318524-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:44 +02:00
Thomas Richter 4eebb6c60e s390/cpum_cf: Deny all sampling events by counter PMU
[ Upstream commit ce97123324 ]

Deny all sampling events in the CPUMF counter facility device driver
and return -ENOENT. This return value is used to try other PMUs.
Up to now, events of type PERF_TYPE_HARDWARE were not tested for
sampling and later returned -EOPNOTSUPP, which ends the search for
alternative PMUs. Change that behavior and try other PMUs instead.

Fixes: 613a41b0d1 ("s390/cpum_cf: Reject request for sampling in event initialization")
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:44 +02:00
Thomas Richter 5665ac5c51 s390/pai: Deny all events not handled by this PMU
[ Upstream commit 85941afd2c ]

Each PAI PMU device driver returns -EINVAL when an event is out of
its accepted range. This return value aborts the search for an
alternative PMU device driver to handle this event.
Change the return value to -ENOENT. This return value is used to
try other PMUs instead.  This makes the PMUs more robust when
the sequence of PMU device driver initialization changes (at boot time)
or when modules are used.

Fixes: 39d62336f5 ("s390/pai: add support for cryptography counters")
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:44 +02:00
Pu Lehui 88525accf1 tracing: Silence warning when chunk allocation fails in trace_pid_write
[ Upstream commit cd4453c5e9 ]

Syzkaller trigger a fault injection warning:

WARNING: CPU: 1 PID: 12326 at tracepoint_add_func+0xbfc/0xeb0
Modules linked in:
CPU: 1 UID: 0 PID: 12326 Comm: syz.6.10325 Tainted: G U 6.14.0-rc5-syzkaller #0
Tainted: [U]=USER
Hardware name: Google Compute Engine/Google Compute Engine
RIP: 0010:tracepoint_add_func+0xbfc/0xeb0 kernel/tracepoint.c:294
Code: 09 fe ff 90 0f 0b 90 0f b6 74 24 43 31 ff 41 bc ea ff ff ff
RSP: 0018:ffffc9000414fb48 EFLAGS: 00010283
RAX: 00000000000012a1 RBX: ffffffff8e240ae0 RCX: ffffc90014b78000
RDX: 0000000000080000 RSI: ffffffff81bbd78b RDI: 0000000000000001
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffffffffffffffef
R13: 0000000000000000 R14: dffffc0000000000 R15: ffffffff81c264f0
FS:  00007f27217f66c0(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2e80dff8 CR3: 00000000268f8000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 tracepoint_probe_register_prio+0xc0/0x110 kernel/tracepoint.c:464
 register_trace_prio_sched_switch include/trace/events/sched.h:222 [inline]
 register_pid_events kernel/trace/trace_events.c:2354 [inline]
 event_pid_write.isra.0+0x439/0x7a0 kernel/trace/trace_events.c:2425
 vfs_write+0x24c/0x1150 fs/read_write.c:677
 ksys_write+0x12b/0x250 fs/read_write.c:731
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

We can reproduce the warning by following the steps below:
1. echo 8 >> set_event_notrace_pid. Let tr->filtered_pids own one pid
   and register the sched_switch tracepoint.
2. echo ' ' >> set_event_pid, and perform fault injection during chunk
   allocation of trace_pid_list_alloc. Let pid_list contain no pid and
   assign it to tr->filtered_pids.
3. echo ' ' >> set_event_pid. Let pid_list be NULL and assign it to
   tr->filtered_pids.
4. echo 9 >> set_event_pid, which will trigger the double register
   sched_switch tracepoint warning.

The reason is that syzkaller injects a fault into the chunk allocation
in trace_pid_list_alloc, causing a failure in trace_pid_list_set, which
may trigger a double registration of the same tracepoint. This only
occurs when the system is about to crash, but to suppress this warning,
let's add failure handling logic to trace_pid_list_set.

Link: https://lore.kernel.org/20250908024658.2390398-1-pulehui@huaweicloud.com
Fixes: 8d6e90983a ("tracing: Create a sparse bitmask for pid filtering")
Reported-by: syzbot+161412ccaeff20ce4dde@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67cb890e.050a0220.d8275.022e.GAE@google.com
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:44 +02:00
Jonathan Curley f15ebc876f NFSv4/flexfiles: Fix layout merge mirror check.
[ Upstream commit dd2fa82473 ]

A typo in ff_lseg_match_mirrors makes the comparison ineffective, which
results in merges happening all the time. Merging all the time is
problematic because it marks lsegs invalid. Marking lsegs invalid
causes all outstanding IO to get restarted with EAGAIN and connections
to get closed.

Closing connections constantly triggers race conditions in the RDMA
implementation...

Fixes: 660d1eb223 ("pNFS/flexfile: Don't merge layout segments if the mirrors don't match")
Signed-off-by: Jonathan Curley <jcurley@purestorage.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:44 +02:00
Trond Myklebust b7c6c76c85 NFS: nfs_invalidate_folio() must observe the offset and size arguments
[ Upstream commit b7b8574225 ]

If we're truncating part of the folio, then we need to write out the
data on the part that is not covered by the cancellation.

Fixes: d47992f86b ("mm: change invalidatepage prototype to accept length")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:44 +02:00
Trond Myklebust e1651ba799 NFSv4.2: Serialise O_DIRECT i/o and copy range
[ Upstream commit ca247c8990 ]

Ensure that all O_DIRECT reads and writes complete before copying a file
range, so that the destination is up to date.

Fixes: a5864c999d ("NFS: Do not serialise O_DIRECT reads and writes")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:44 +02:00
Trond Myklebust fc0e6342ad NFSv4.2: Serialise O_DIRECT i/o and clone range
[ Upstream commit c80ebeba11 ]

Ensure that all O_DIRECT reads and writes complete before cloning a file
range, so that both the source and destination are up to date.

Fixes: a5864c999d ("NFS: Do not serialise O_DIRECT reads and writes")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:44 +02:00
Trond Myklebust 5eb9e22919 NFSv4.2: Serialise O_DIRECT i/o and fallocate()
[ Upstream commit b93128f297 ]

Ensure that all O_DIRECT reads and writes complete before calling
fallocate so that we don't race w.r.t. attribute updates.

Fixes: 99f2378322 ("NFSv4.2: Always flush out writes in nfs42_proc_fallocate()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:44 +02:00
Trond Myklebust abfd17844a NFS: Serialise O_DIRECT i/o and truncate()
[ Upstream commit 9eb90f4354 ]

Ensure that all O_DIRECT reads and writes are complete, and prevent the
initiation of new i/o until the setattr operation that will truncate the
file is complete.

Fixes: a5864c999d ("NFS: Do not serialise O_DIRECT reads and writes")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:44 +02:00
Max Kellermann 7f08d14103 fs/nfs/io: make nfs_start_io_*() killable
[ Upstream commit 38a125b315 ]

This allows killing processes that wait for a lock when one process is
stuck waiting for the NFS server.  This aims to complete the coverage
of NFS operations being killable, like nfs_direct_wait() does, for
example.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Stable-dep-of: 9eb90f4354 ("NFS: Serialise O_DIRECT i/o and truncate()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:44 +02:00
Vladimir Riabchun fd84053daf ftrace/samples: Fix function size computation
[ Upstream commit 80d03a4083 ]

In the my_tramp1 function, the .size directive was placed above the
ASM_RET instruction, leading to a wrong function size.
Link: https://lore.kernel.org/aK3d7vxNcO52kEmg@vova-pc
Fixes: 9d907f1ae8 ("samples/ftrace: Fix asm function ELF annotations")
Signed-off-by: Vladimir Riabchun <ferr.lambarginio@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:44 +02:00
Scott Mayhew 57c1bb02b4 nfs/localio: restore creds before releasing pageio data
[ Upstream commit 992203a1fb ]

Otherwise if the nfsd filecache code releases the nfsd_file
immediately, it can trigger the BUG_ON(cred == current->cred) in
__put_cred() when it puts the nfsd_file->nf_file->f_cred.

Fixes: b9f5dd57f4 ("nfs/localio: use dedicated workqueues for filesystem read and write")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Link: https://lore.kernel.org/r/20250807164938.2395136-1-smayhew@redhat.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:43 +02:00
Mike Snitzer a707c9a838 nfs/localio: add direct IO enablement with sync and async IO support
[ Upstream commit 3feec68563 ]

This commit simply adds the required O_DIRECT plumbing.  It doesn't
address the fact that NFS doesn't ensure all writes are page aligned
(nor device logical block size aligned as required by O_DIRECT).

Because NFS will read-modify-write for IO that isn't aligned, LOCALIO
will not use O_DIRECT semantics by default if/when an application
requests the use of O_DIRECT.  Allow the use of O_DIRECT semantics by:
1: Adding a flag to the nfs_pgio_header struct to allow the NFS
   O_DIRECT layer to signal that O_DIRECT was used by the application
2: Adding a 'localio_O_DIRECT_semantics' NFS module parameter that
   when enabled will cause LOCALIO to use O_DIRECT semantics (this may
   cause IO to fail if applications do not properly align their IO).

This commit is derived from code developed by Weston Andros Adamson.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Stable-dep-of: 992203a1fb ("nfs/localio: restore creds before releasing pageio data")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:43 +02:00
Mike Snitzer b0bf81e05b nfs/localio: remove extra indirect nfs_to call to check {read,write}_iter
[ Upstream commit 0978e5b85f ]

Push the read_iter and write_iter availability checks down to
nfs_do_local_read and nfs_do_local_write respectively.

This eliminates a redundant nfs_to->nfsd_file_file() call.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Stable-dep-of: 992203a1fb ("nfs/localio: restore creds before releasing pageio data")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:43 +02:00
Luo Gengkun 3f9b5dfbc4 tracing: Fix tracing_marker may trigger page fault during preempt_disable
[ Upstream commit 3d62ab32df ]

Both tracing_mark_write and tracing_mark_raw_write call
__copy_from_user_inatomic() with preemption disabled. But in some cases,
__copy_from_user_inatomic() may trigger a page fault and subtly call
schedule(). And if the task is then migrated to another cpu, the following
warning will be triggered:
        if (RB_WARN_ON(cpu_buffer,
                       !local_read(&cpu_buffer->committing)))

An example can illustrate this issue:

process flow						CPU
---------------------------------------------------------------------

tracing_mark_raw_write():				cpu:0
   ...
   ring_buffer_lock_reserve():				cpu:0
      ...
      cpu = raw_smp_processor_id()			cpu:0
      cpu_buffer = buffer->buffers[cpu]			cpu:0
      ...
   ...
   __copy_from_user_inatomic():				cpu:0
      ...
      # page fault
      do_mem_abort():					cpu:0
         ...
         # Call schedule
         schedule()					cpu:0
	 ...
   # the task schedule to cpu1
   __buffer_unlock_commit():				cpu:1
      ...
      ring_buffer_unlock_commit():			cpu:1
	 ...
	 cpu = raw_smp_processor_id()			cpu:1
	 cpu_buffer = buffer->buffers[cpu]		cpu:1

As shown above, the process acquires the cpu id twice and the return
values are not the same.

To fix this problem, use copy_from_user_nofault instead of
__copy_from_user_inatomic, as the former performs 'access_ok' checks
before copying.
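
The substitution described, as a sketch (buffer and size names assumed):

  -     len = __copy_from_user_inatomic(&entry->id, ubuf, cnt);
  +     len = copy_from_user_nofault(&entry->id, ubuf, cnt);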

Link: https://lore.kernel.org/20250819105152.2766363-1-luogengkun@huaweicloud.com
Fixes: 656c7f0d2d ("tracing: Replace kmap with copy_from_user() in trace_marker writing")
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:43 +02:00
Trond Myklebust 526d747df4 NFSv4: Clear the NFS_CAP_XATTR flag if not supported by the server
[ Upstream commit 4fb2b677fc ]

nfs_server_set_fsinfo() shouldn't assume that NFS_CAP_XATTR is unset
on entry to the function.

Fixes: b78ef845c3 ("NFSv4.2: query the server for extended attribute support")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:43 +02:00
Trond Myklebust 643ccedbbe NFSv4: Clear NFS_CAP_OPEN_XOR and NFS_CAP_DELEGTIME if not supported
[ Upstream commit b3ac334360 ]

_nfs4_server_capabilities() should clear capabilities that are not
supported by the server.

Fixes: d2a00cceb9 ("NFSv4: Detect support for OPEN4_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:43 +02:00
Trond Myklebust 4e7c053674 NFSv4: Clear the NFS_CAP_FS_LOCATIONS flag if it is not set
[ Upstream commit dd5a8621b8 ]

_nfs4_server_capabilities() is expected to clear any flags that are not
supported by the server.

Fixes: 8a59bb93b7 ("NFSv4 store server support for fs_location attribute")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:43 +02:00
Guenter Roeck 35601bc50d trace/fgraph: Fix error handling
[ Upstream commit ab1396af75 ]

Commit edede7a6dc ("trace/fgraph: Fix the warning caused by missing
unregister notifier") added a call to unregister the PM notifier if
register_ftrace_graph() failed. It does so unconditionally. However,
the PM notifier is only registered with the first call to
register_ftrace_graph(). If the first registration was successful and
a subsequent registration failed, the notifier is now unregistered even
if ftrace graphs are still registered.

Fix the problem by only unregistering the PM notifier during error handling
if there are no active fgraph registrations.
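
A sketch of the condition described (the exact check is an assumption):

  /* error path: keep the PM notifier if fgraph users remain */
  if (!ftrace_graph_active)
          unregister_pm_notifier(&ftrace_suspend_notifier);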

Fixes: edede7a6dc ("trace/fgraph: Fix the warning caused by missing unregister notifier")
Closes: https://lore.kernel.org/all/63b0ba5a-a928-438e-84f9-93028dd72e54@roeck-us.net/
Cc: Ye Weihua <yeweihua4@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250906050618.2634078-1-linux@roeck-us.net
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:43 +02:00
Trond Myklebust 2bc2060856 NFSv4: Don't clear capabilities that won't be reset
[ Upstream commit 31f1a960ad ]

Don't clear the capabilities that are not going to get reset by the call
to _nfs4_server_capabilities().

Reported-by: Scott Haiden <scott.b.haiden@gmail.com>
Fixes: b01f21cacd ("NFS: Fix the setting of capabilities when automounting a new filesystem")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:43 +02:00
Justin Worrell 2f7f112eae SUNRPC: call xs_sock_process_cmsg for all cmsg
[ Upstream commit 9559d2fffd ]

xs_sock_recv_cmsg was failing to call xs_sock_process_cmsg for any cmsg
type other than TLS_RECORD_TYPE_ALERT (TLS_RECORD_TYPE_DATA and other
values were not handled). Based on my reading of the previous commit
(cc5d5908: sunrpc: fix client side handling of tls alerts), it looks
like only iov_iter_revert should be conditional on TLS_RECORD_TYPE_ALERT
(but that other cmsg types should still call xs_sock_process_cmsg). On
my machine, I was unable to connect (over mtls) to an NFS share hosted
on FreeBSD. With this patch applied, I am able to mount the share again.

Fixes: cc5d59081f ("sunrpc: fix client side handling of tls alerts")
Signed-off-by: Justin Worrell <jworrell@gmail.com>
Reviewed-and-tested-by: Scott Mayhew <smayhew@redhat.com>
Link: https://lore.kernel.org/r/20250904211038.12874-3-jworrell@gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:43 +02:00
Tigran Mkrtchyan 606da574c1 flexfiles/pNFS: fix NULL checks on result of ff_layout_choose_ds_for_read
[ Upstream commit 5a46d2339a ]

Recent commit f06bedfa62 ("pNFS/flexfiles: don't attempt pnfs on fatal DS
errors") has changed the error return type of ff_layout_choose_ds_for_read() from
NULL to an error pointer. However, not all code paths have been updated
to match the change. Thus, some non-NULL checks will accept error pointers
as a valid return value.

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: f06bedfa62 ("pNFS/flexfiles: don't attempt pnfs on fatal DS errors")
Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:43 +02:00
David Rosca 8d7cc14712 drm/amdgpu: Add back JPEG to video caps for carrizo and newer
[ Upstream commit 2036be3174 ]

JPEG is not supported on Vega only.

Fixes: 0a6e7b06bd ("drm/amdgpu: Remove JPEG from vega and carrizo video caps")
Signed-off-by: David Rosca <david.rosca@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0f4dfe86fe)
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:43 +02:00
Takashi Iwai 66809e11a2 ALSA: hda/realtek: Fix built-in mic assignment on ASUS VivoBook X515UA
[ Upstream commit 829ee558f3 ]

ASUS VivoBook X515UA with PCI SSID 1043:106f had a default quirk
pickup via the pin table that applies ALC256_FIXUP_ASUS_MIC, but this
leaves a bogus built-in mic pin 0x13 enabled.  This was no big problem
because pin 0x13 was assigned as the secondary mic, but the recent
fix made the entries sorted, hence this bogus pin now appears as the
primary input and breaks mic detection.

To fix the bug, add the right quirk entry for this device, pointing
to ALC256_FIXUP_ASUS_MIC_NO_PRESENCE.
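
A sketch of the kind of entry this adds (exact placement in
patch_realtek.c's quirk table assumed; the SSID and fixup name are
taken from the message above):

    SND_PCI_QUIRK(0x1043, 0x106f, "ASUS VivoBook X515UA",
                  ALC256_FIXUP_ASUS_MIC_NO_PRESENCE),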

Fixes: 3b4309546b ("ALSA: hda: Fix headset detection failure due to unstable sort")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219897
Link: https://patch.msgid.link/20250324153233.21195-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:43 +02:00
Aurabindo Pillai 86c7bcb699 Revert "drm/amd/display: Optimize cursor position updates"
[ Upstream commit a5d258a00b ]

This reverts commit 88c7c56d07c108ed4de319c8dba44aa4b8a38dd1.

SW and HW state do not always match in some cases, causing the cursor
to be disabled.

Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:43 +02:00
Srinivasan Shanmugam 278d96bd0b drm/amd/display: Fix error pointers in amdgpu_dm_crtc_mem_type_changed
[ Upstream commit da29abe71e ]

The function amdgpu_dm_crtc_mem_type_changed was dereferencing pointers
returned by drm_atomic_get_plane_state without checking for errors. This
could lead to undefined behavior if the function returns an error pointer.

This commit adds checks using IS_ERR to ensure that new_plane_state and
old_plane_state are valid before dereferencing them.

Fixes the below:

drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:11486 amdgpu_dm_crtc_mem_type_changed()
error: 'new_plane_state' dereferencing possible ERR_PTR()

drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c
    11475 static bool amdgpu_dm_crtc_mem_type_changed(struct drm_device *dev,
    11476                                             struct drm_atomic_state *state,
    11477                                             struct drm_crtc_state *crtc_state)
    11478 {
    11479         struct drm_plane *plane;
    11480         struct drm_plane_state *new_plane_state, *old_plane_state;
    11481
    11482         drm_for_each_plane_mask(plane, dev, crtc_state->plane_mask) {
    11483                 new_plane_state = drm_atomic_get_plane_state(state, plane);
    11484                 old_plane_state = drm_atomic_get_plane_state(state, plane);
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^ These functions can fail.

    11485
--> 11486                 if (old_plane_state->fb && new_plane_state->fb &&
    11487                     get_mem_type(old_plane_state->fb) != get_mem_type(new_plane_state->fb))
    11488                         return true;
    11489         }
    11490
    11491         return false;
    11492 }

Fixes: 4caacd1671 ("drm/amd/display: Do not elevate mem_type change to full update")
Cc: Leo Li <sunpeng.li@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Hamza Mahfooz <hamza.mahfooz@amd.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Roman Li <roman.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:42 +02:00
Umesh Nerlige Ramappa 996ab5ee7d drm/i915/pmu: Fix zero delta busyness issue
[ Upstream commit cb5fab2afd ]

When running igt@gem_exec_balancer@individual for multiple iterations,
it is seen that the delta busyness returned by PMU is 0. The issue stems
from a combination of 2 implementation specific details:

1) gt_park is throttling __update_guc_busyness_stats() so that it does
not hog PCI bandwidth for some use cases. (Ref: 59bcdb564b)

2) busyness implementation always returns monotonically increasing
counters. (Ref: cf907f6d29)

If an application queried an engine while it was active,
engine->stats.guc.running is set to true. Following that, if all PM
wakeref's are released, then gt is parked. At this time the throttling
of __update_guc_busyness_stats() may result in a missed update to the
running state of the engine (due to (1) above). This means subsequent
calls to guc_engine_busyness() will think that the engine is still
running and they will keep updating the cached counter (stats->total).
This results in an inflated cached counter.

Later, when the application runs a workload and queries for busyness, we
return the cached value since it is larger than the actual value (due to
(2) above).

All subsequent queries will return the same large (inflated) value, so
the application sees a delta busyness of zero.

Fix the issue by resetting the running state of engines each time
intel_guc_busyness_park() is called.

v2: (Rodrigo)
- Use the correct tag in commit message
- Drop the redundant wakeref check in guc_engine_busyness() and update
  commit message

Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13366
Fixes: cf907f6d29 ("i915/guc: Ensure busyness counter increases motonically")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250123193839.2394694-1-umesh.nerlige.ramappa@intel.com
(cherry picked from commit 431b742e2b)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:42 +02:00
Theodore Ts'o aa66603ddf ext4: introduce linear search for dentries
[ Upstream commit 9e28059d56 ]

This patch addresses an issue where some files in case-insensitive
directories become inaccessible due to changes in how the kernel
function utf8_casefold() generates case-folded strings, introduced by
commit 5c26d2f1d3 ("unicode: Don't special case ignorable code
points").

There are good reasons why this change should be made; it's actually
quite stupid that Unicode seems to think that the characters ❤ and ❤️
should be casefolded.  Unfortunately, because of the backwards
compatibility issue, this commit was reverted in 231825b2e1.

This problem is addressed by instituting a brute-force linear fallback
if a lookup fails on a case-folded directory, which does result in a
performance hit when looking up files affected by the change in how
the kernel treats ignorable Unicode characters, or when attempting to
look up non-existent file names.  This fallback can therefore be
disabled by setting an encoding flag if, in the future, the system
administrator or the manufacturer of a mobile handset or tablet can be
sure that there was no opportunity for a kernel to insert file names
with incompatible encodings.
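
A user-space analogue of the fallback, assuming a simplified directory
model (this is not the ext4 code, which walks directory blocks and uses
the filesystem's casefold comparison):

    #include <strings.h>
    #include <stddef.h>

    struct dent { const char *name; };

    /* brute-force scan, used only after the hashed casefolded lookup misses */
    static const struct dent *lookup_linear(const struct dent *dir, size_t n,
                                            const char *want)
    {
            for (size_t i = 0; i < n; i++)
                    if (strcasecmp(dir[i].name, want) == 0) /* stand-in for
                                                             * utf8 casefold compare */
                            return &dir[i];
            return NULL;
    }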

Fixes: 5c26d2f1d3 ("unicode: Don't special case ignorable code points")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:42 +02:00
Huan Yang 3a7fd0e56e Revert "udmabuf: fix vmap_udmabuf error page set"
[ Upstream commit ceb7b62eaa ]

This reverts commit 18d7de823b.

We cannot use vmap_pfn() in vmap_udmabuf() as it would fail the pfn_valid()
check in vmap_pfn_apply(). This is because vmap_pfn() is intended to be
used for mapping non-struct-page memory such as PCIe BARs. Since udmabuf
mostly works with pages/folios backed by shmem/hugetlbfs/THP, vmap_pfn()
is not the right tool or API to invoke for implementing vmap.

Signed-off-by: Huan Yang <link@vivo.com>
Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Reported-by: Bingbu Cao <bingbu.cao@linux.intel.com>
Closes: https://lore.kernel.org/dri-devel/eb7e0137-3508-4287-98c4-816c5fd98e10@vivo.com/T/#mbda4f64a3532b32e061f4e8763bc8e307bea3ca8
Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Link: https://lore.kernel.org/r/20250428073831.19942-2-link@vivo.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:42 +02:00
Maurizio Lombardi 87bbcb73d6 nvme-pci: skip nvme_write_sq_db on empty rqlist
[ Upstream commit 288ff0d10b ]

nvme_submit_cmds() should check the rqlist before calling
nvme_write_sq_db(); if the list is empty, it must return immediately.

Fixes: beadf00885 ("nvme-pci: reverse request order in nvme_queue_rqs")
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:42 +02:00
Fedor Pchelkin 63371be47f dma-debug: fix physical address calculation for struct dma_debug_entry
[ Upstream commit aef7ee7649 ]

The offset into the page should also be considered while calculating the
physical address for struct dma_debug_entry. page_to_phys() just shifts
the PFN left by PAGE_SHIFT bits, so the offset part is zero-filled.
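
A tiny user-space model of the arithmetic (PAGE_SHIFT value assumed):

    #include <stdio.h>
    #include <stdint.h>

    #define PAGE_SHIFT 12   /* assumed 4K pages */

    int main(void)
    {
            uint64_t pfn = 0x1234, offset = 0x80;
            uint64_t wrong = pfn << PAGE_SHIFT;            /* page_to_phys() alone */
            uint64_t right = (pfn << PAGE_SHIFT) + offset; /* plus intra-page offset */

            printf("wrong=%#llx right=%#llx\n",
                   (unsigned long long)wrong, (unsigned long long)right);
            return 0;
    }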

An example (wrong) debug assertion failure with CONFIG_DMA_API_DEBUG
enabled which is observed during systemd boot process after recent
dma-debug changes:

DMA-API: e1000 0000:00:03.0: cacheline tracking EEXIST, overlapping mappings aren't supported
WARNING: CPU: 4 PID: 941 at kernel/dma/debug.c:596 add_dma_entry
CPU: 4 UID: 0 PID: 941 Comm: ip Not tainted 6.12.0+ #288
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:add_dma_entry kernel/dma/debug.c:596
Call Trace:
 <TASK>
debug_dma_map_page kernel/dma/debug.c:1236
dma_map_page_attrs kernel/dma/mapping.c:179
e1000_alloc_rx_buffers drivers/net/ethernet/intel/e1000/e1000_main.c:4616
...

Found by Linux Verification Center (linuxtesting.org).

Fixes: 9d4f645a1f ("dma-debug: store a phys_addr_t in struct dma_debug_entry")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
[hch: added a little helper to clean up the code]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:42 +02:00
Sean Anderson a0d2200def dma-mapping: fix swapped dir/flags arguments to trace_dma_alloc_sgt_err
[ Upstream commit d5bbfbad58 ]

trace_dma_alloc_sgt_err was called with the dir and flags arguments
swapped. Fix this.
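
Why the swap compiled silently: both parameters are plain integer types
in C, so typically nothing but sparse's __bitwise annotation can catch
the transposition. A minimal stand-alone illustration (names here are
hypothetical, not the real tracepoint signature):

    #include <stdio.h>

    enum dma_data_direction { DMA_BIDIRECTIONAL, DMA_TO_DEVICE,
                              DMA_FROM_DEVICE, DMA_NONE };
    typedef unsigned int gfp_t;

    static void trace_alloc(enum dma_data_direction dir, gfp_t flags)
    {
            printf("dir=%d flags=%#x\n", dir, flags);
    }

    int main(void)
    {
            gfp_t flags = 0xcc0;    /* a GFP_KERNEL-like value */

            trace_alloc(flags, DMA_FROM_DEVICE);  /* swapped, yet no warning */
            return 0;
    }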

Fixes: 68b6dbf1f4 ("dma-mapping: trace more error paths")
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202410302243.1wnTlPk3-lkp@intel.com/
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:42 +02:00
Harry Yoo e3253bab3c mm: introduce and use {pgd,p4d}_populate_kernel()
commit f2d2f9598e upstream.

Introduce and use {pgd,p4d}_populate_kernel() in core MM code when
populating PGD and P4D entries for the kernel address space.  These
helpers ensure proper synchronization of page tables when updating the
kernel portion of top-level page tables.

Until now, the kernel has relied on each architecture to handle
synchronization of top-level page tables in an ad-hoc manner.  For
example, see commit 9b861528a8 ("x86-64, mem: Update all PGDs for direct
mapping and vmemmap mapping changes").

However, this approach has proven fragile for the following reasons:

  1) It is easy to forget to perform the necessary page table
     synchronization when introducing new changes.
     For instance, commit 4917f55b4e ("mm/sparse-vmemmap: improve memory
     savings for compound devmaps") overlooked the need to synchronize
     page tables for the vmemmap area.

  2) It is also easy to overlook that the vmemmap and direct mapping areas
     must not be accessed before explicit page table synchronization.
     For example, commit 8d400913c2 ("x86/vmemmap: handle unpopulated
     sub-pmd ranges") caused crashes by accessing the vmemmap area
     before calling sync_global_pgds().

To address this, as suggested by Dave Hansen, introduce _kernel() variants
of the page table population helpers, which invoke architecture-specific
hooks to properly synchronize page tables.  These are introduced in a new
header file, include/linux/pgalloc.h, so they can be called from common
code.

They reuse existing infrastructure for vmalloc and ioremap.
Synchronization requirements are determined by ARCH_PAGE_TABLE_SYNC_MASK,
and the actual synchronization is performed by
arch_sync_kernel_mappings().
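
A sketch of the PGD-level helper described above, under the assumption
that it closely follows the upstream include/linux/pgalloc.h hunk:

    static inline void pgd_populate_kernel(unsigned long addr, pgd_t *pgd,
                                           p4d_t *p4d)
    {
            pgd_populate(&init_mm, pgd, p4d);

            /* no-op until an architecture sets PGTBL_PGD_MODIFIED in the mask */
            if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED)
                    arch_sync_kernel_mappings(addr, addr);
    }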

This change currently targets only x86_64, so only PGD and P4D level
helpers are introduced.  Currently, these helpers are no-ops since no
architecture sets PGTBL_{PGD,P4D}_MODIFIED in ARCH_PAGE_TABLE_SYNC_MASK.

In theory, PUD and PMD level helpers can be added later if needed by other
architectures.  For now, 32-bit architectures (x86-32 and arm) only handle
PGTBL_PMD_MODIFIED, so p*d_populate_kernel() will never affect them unless
we introduce a PMD level helper.

[harry.yoo@oracle.com: fix KASAN build error due to p*d_populate_kernel()]
  Link: https://lkml.kernel.org/r/20250822020727.202749-1-harry.yoo@oracle.com
Link: https://lkml.kernel.org/r/20250818020206.4517-3-harry.yoo@oracle.com
Fixes: 8d400913c2 ("x86/vmemmap: handle unpopulated sub-pmd ranges")
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: bibo mao <maobibo@loongson.cn>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Christoph Lameter (Ampere) <cl@gentwo.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Adjust context ]
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:42 +02:00
Yevgeny Kliteynik 5682aad027 net/mlx5: HWS, change error flow on matcher disconnect
commit 1ce840c7a6 upstream.

Currently, when firmware failure occurs during matcher disconnect flow,
the error flow of the function reconnects the matcher back and returns
an error, which continues running the calling function and eventually
frees the matcher that is being disconnected.
This leads to a case where we have a freed matcher on the matchers list,
which in turn leads to use-after-free and eventual crash.

This patch fixes that by not trying to reconnect the matcher back when
some FW command fails during disconnect.

Note that we're dealing here with a FW error. We can't overcome this
problem. This might lead to a bad steering state (e.g. wrong connection
between matchers), and will also lead to resource leakage, as is the
case with any other error handling during resource destruction.

However, the goal here is to allow the driver to continue and not crash
the machine with a use-after-free error.
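
Conceptually (hypothetical names, not the mlx5hws code; the real hunk to
hws_matcher_disconnect() appears in the diff below):

    ret = fw_detach_matcher(matcher);   /* hypothetical FW call */
    if (ret) {
            /* old flow: list_add() the matcher back, then return; the
             * caller frees it anyway -> stale pointer left on the list.
             * new flow: log a fatal error and accept the HW-state leak. */
            log_err("fatal: FW failure on matcher disconnect");
            return ret;
    }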

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Itamar Gozlan <igozlan@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250102181415.1477316-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jan Alexander Preissler <akendo@akendo.eu>
Signed-off-by: Sujana Subramaniam <sujana.subramaniam@sap.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:42 +02:00
Yeoreum Yun 464a33c29c kunit: kasan_test: disable fortify string checker on kasan_strings() test
commit 7a19afee6f upstream.

Similar to commit 09c6304e38 ("kasan: test: fix compatibility with
FORTIFY_SOURCE"), the kernel is panicking in kasan_strings().

This is due to `src` and `ptr` not being hidden from the optimizer,
which is needed to disable the runtime fortify string checker.
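
The usual fix pattern, as in commit 09c6304e38 (a sketch; the test's
actual buffers and sizes differ):

    char *ptr = buffer;

    OPTIMIZER_HIDE_VAR(ptr);  /* __builtin_object_size(ptr, ...) now returns -1 */
    OPTIMIZER_HIDE_VAR(src);
    strncpy(ptr, src, size);  /* fortify can no longer resolve the object size,
                               * so KASAN gets to report the OOB access instead */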

Call trace:
  __fortify_panic+0x10/0x20 (P)
  kasan_strings+0x980/0x9b0
  kunit_try_run_case+0x68/0x190
  kunit_generic_run_threadfn_adapter+0x34/0x68
  kthread+0x1c4/0x228
  ret_from_fork+0x10/0x20
 Code: d503233f a9bf7bfd 910003fd 9424b243 (d4210000)
 ---[ end trace 0000000000000000 ]---
 note: kunit_try_catch[128] exited with irqs disabled
 note: kunit_try_catch[128] exited with preempt_count 1
     # kasan_strings: try faulted: last
** replaying previous printk message **
     # kasan_strings: try faulted: last line seen mm/kasan/kasan_test_c.c:1600
     # kasan_strings: internal error occurred preventing test case from running: -4

Link: https://lkml.kernel.org/r/20250801120236.2962642-1-yeoreum.yun@arm.com
Fixes: 73228c7ecc ("KASAN: port KASAN Tests to KUnit")
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 16:35:42 +02:00
Baochen Qiang c6eb8d2d63 dma-debug: don't enforce dma mapping check on noncoherent allocations
[ Upstream commit 7e2368a217 ]

As discussed in [1], there is no need to enforce the dma mapping check
on noncoherent allocations; a simple test on the returned CPU address
is good enough.

Add a new pair of debug helpers and use them for noncoherent alloc/free
to fix this issue.

Fixes: efa70f2fdc ("dma-mapping: add a new dma_alloc_pages API")
Link: https://lore.kernel.org/all/ff6c1fe6-820f-4e58-8395-df06aa91706c@oss.qualcomm.com # 1
Signed-off-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://lore.kernel.org/r/20250828-dma-debug-fix-noncoherent-dma-check-v1-1-76e9be0dd7fc@oss.qualcomm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:42 +02:00
Sean Anderson 245eb0b6bd dma-mapping: trace more error paths
[ Upstream commit 68b6dbf1f4 ]

It can be surprising to the user if DMA functions are only traced on
success. On failure, it can be unclear what the source of the problem
is. Fix this by tracing all functions even when they fail. Cases where
we BUG/WARN are skipped, since those should be sufficiently noisy
already.

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Stable-dep-of: 7e2368a217 ("dma-debug: don't enforce dma mapping check on noncoherent allocations")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:42 +02:00
Sean Anderson f776ae61e1 dma-mapping: use trace_dma_alloc for dma_alloc* instead of using trace_dma_map
[ Upstream commit c4484ab86e ]

In some cases, we use trace_dma_map to trace dma_alloc* functions. This
generally follows dma_debug. However, this does not record all of the
relevant information for allocations, such as GFP flags. Create new
dma_alloc tracepoints for these functions. Note that while
dma_alloc_noncontiguous may allocate discontiguous pages (from the CPU's
point of view), the device will only see one contiguous mapping.
Therefore, we just need to trace dma_addr and size.

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Stable-dep-of: 7e2368a217 ("dma-debug: don't enforce dma mapping check on noncoherent allocations")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:42 +02:00
Sean Anderson 1edd532f24 dma-mapping: trace dma_alloc/free direction
[ Upstream commit 3afff779a7 ]

In preparation for using these tracepoints in a few more places, trace
the DMA direction as well. For coherent allocations this is always
bidirectional.

Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Stable-dep-of: 7e2368a217 ("dma-debug: don't enforce dma mapping check on noncoherent allocations")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:41 +02:00
Christoph Hellwig c39e8483db dma-debug: store a phys_addr_t in struct dma_debug_entry
[ Upstream commit 9d4f645a1f ]

dma-debug goes to great length to split incoming physical addresses into
a PFN and offset to store them in struct dma_debug_entry, just to
recombine those for all meaningful uses.  Just store a phys_addr_t
instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Stable-dep-of: 7e2368a217 ("dma-debug: don't enforce dma mapping check on noncoherent allocations")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:41 +02:00
Amir Goldstein b80d9c5208 fhandle: use more consistent rules for decoding file handle from userns
[ Upstream commit bb585591eb ]

Commit 620c266f39 ("fhandle: relax open_by_handle_at() permission
checks") relaxed the conditions for decoding a file handle from a
non-init userns.

The conditions are that the decoded dentry is accessible from the user
provided mountfd (or from the fs root) and that all the ancestors along
the path have a valid id mapping in the userns.

These conditions are intentionally more strict than the condition that
the decoded dentry should be "lookable" by path from the mountfd.

For example, the path /home/amir/dir/subdir is lookable by path from
unpriv userns of user amir, because /home perms is 755, but the owner of
/home does not have a valid id mapping in unpriv userns of user amir.

The current code did not check that the decoded dentry itself has a
valid id mapping in the userns.  There is no security risk in that,
because that final open still performs the needed permission checks,
but this is inconsistent with the checks performed on the ancestors,
so the behavior can be a bit confusing.

Add the check for the decoded dentry itself, so that the entire path,
including the last component has a valid id mapping in the userns.
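
A sketch of such a per-inode check built from the generic idmapping
helpers; the actual fs/fhandle.c change may be structured differently:

    static bool inode_fully_mapped(struct mnt_idmap *idmap,
                                   struct user_namespace *userns,
                                   const struct inode *inode)
    {
            return vfsuid_has_mapping(userns, i_uid_into_vfsuid(idmap, inode)) &&
                   vfsgid_has_mapping(userns, i_gid_into_vfsgid(idmap, inode));
    }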

Fixes: 620c266f39 ("fhandle: relax open_by_handle_at() permission checks")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/20250827194309.1259650-1-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-19 16:35:41 +02:00
145 changed files with 1896 additions and 988 deletions

View File

@ -41,7 +41,7 @@ properties:
- const: dma_intr2
clocks:
minItems: 1
maxItems: 1
clock-names:
const: sw_baud

View File

@ -306,6 +306,19 @@ is issuing IO to the underlying local filesystem that it is sharing with
the NFS server. See: fs/nfs/localio.c:nfs_local_doio() and
fs/nfs/localio.c:nfs_local_commit().
With normal NFS that makes use of RPC to issue IO to the server, if an
application uses O_DIRECT the NFS client will bypass the pagecache but
the NFS server will not. Because the NFS server's use of buffered IO
affords applications to be less precise with their alignment when
issuing IO to the NFS client. LOCALIO can be configured to use O_DIRECT
semantics by setting the 'localio_O_DIRECT_semantics' nfs module
parameter to Y, e.g.:
echo Y > /sys/module/nfs/parameters/localio_O_DIRECT_semantics
Once enabled, it will cause LOCALIO to use O_DIRECT semantics (this may
cause IO to fail if applications do not properly align their IO).
Security
========

View File

@ -22,65 +22,67 @@ definitions:
doc: unused event
-
name: created
doc:
token, family, saddr4 | saddr6, daddr4 | daddr6, sport, dport
doc: >-
A new MPTCP connection has been created. It is the good time to
allocate memory and send ADD_ADDR if needed. Depending on the
traffic-patterns it can take a long time until the
MPTCP_EVENT_ESTABLISHED is sent.
Attributes: token, family, saddr4 | saddr6, daddr4 | daddr6, sport,
dport, server-side.
-
name: established
doc:
token, family, saddr4 | saddr6, daddr4 | daddr6, sport, dport
doc: >-
A MPTCP connection is established (can start new subflows).
Attributes: token, family, saddr4 | saddr6, daddr4 | daddr6, sport,
dport, server-side.
-
name: closed
doc:
token
doc: >-
A MPTCP connection has stopped.
Attribute: token.
-
name: announced
value: 6
doc:
token, rem_id, family, daddr4 | daddr6 [, dport]
doc: >-
A new address has been announced by the peer.
Attributes: token, rem_id, family, daddr4 | daddr6 [, dport].
-
name: removed
doc:
token, rem_id
doc: >-
An address has been lost by the peer.
Attributes: token, rem_id.
-
name: sub-established
value: 10
doc:
token, family, loc_id, rem_id, saddr4 | saddr6, daddr4 | daddr6, sport,
dport, backup, if_idx [, error]
doc: >-
A new subflow has been established. 'error' should not be set.
Attributes: token, family, loc_id, rem_id, saddr4 | saddr6, daddr4 |
daddr6, sport, dport, backup, if-idx [, error].
-
name: sub-closed
doc:
token, family, loc_id, rem_id, saddr4 | saddr6, daddr4 | daddr6, sport,
dport, backup, if_idx [, error]
doc: >-
A subflow has been closed. An error (copy of sk_err) could be set if an
error has been detected for this subflow.
Attributes: token, family, loc_id, rem_id, saddr4 | saddr6, daddr4 |
daddr6, sport, dport, backup, if-idx [, error].
-
name: sub-priority
value: 13
doc:
token, family, loc_id, rem_id, saddr4 | saddr6, daddr4 | daddr6, sport,
dport, backup, if_idx [, error]
doc: >-
The priority of a subflow has changed. 'error' should not be set.
Attributes: token, family, loc_id, rem_id, saddr4 | saddr6, daddr4 |
daddr6, sport, dport, backup, if-idx [, error].
-
name: listener-created
value: 15
doc:
family, sport, saddr4 | saddr6
doc: >-
A new PM listener is created.
Attributes: family, sport, saddr4 | saddr6.
-
name: listener-closed
doc:
family, sport, saddr4 | saddr6
doc: >-
A PM listener is closed.
Attributes: family, sport, saddr4 | saddr6.
attribute-sets:
-
@ -253,8 +255,8 @@ attribute-sets:
name: timeout
type: u32
-
name: if_idx
type: u32
name: if-idx
type: s32
-
name: reset-reason
type: u32

View File

@ -742,7 +742,7 @@ The broadcast manager sends responses to user space in the same form:
struct timeval ival1, ival2; /* count and subsequent interval */
canid_t can_id; /* unique can_id for task */
__u32 nframes; /* number of can_frames following */
struct can_frame frames[0];
struct can_frame frames[];
};
The aligned payload 'frames' uses the same basic CAN frame structure defined

View File

@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
VERSION = 6
PATCHLEVEL = 12
SUBLEVEL = 47
SUBLEVEL = 48
EXTRAVERSION =
NAME = Baby Opossum Posse

View File

@ -9,7 +9,6 @@
*/
#include <linux/types.h>
#include <linux/sched.h>
#include <linux/sched/task_stack.h>
#include <asm-generic/compat.h>
static inline int is_compat_task(void)

View File

@ -761,8 +761,6 @@ static int __hw_perf_event_init(struct perf_event *event, unsigned int type)
break;
case PERF_TYPE_HARDWARE:
if (is_sampling_event(event)) /* No sampling support */
return -ENOENT;
ev = attr->config;
if (!attr->exclude_user && attr->exclude_kernel) {
/*
@ -860,6 +858,8 @@ static int cpumf_pmu_event_init(struct perf_event *event)
unsigned int type = event->attr.type;
int err = -ENOENT;
if (is_sampling_event(event)) /* No sampling support */
return err;
if (type == PERF_TYPE_HARDWARE || type == PERF_TYPE_RAW)
err = __hw_perf_event_init(event, type);
else if (event->pmu->type == type)

View File

@ -286,10 +286,10 @@ static int paicrypt_event_init(struct perf_event *event)
/* PAI crypto PMU registered as PERF_TYPE_RAW, check event type */
if (a->type != PERF_TYPE_RAW && event->pmu->type != a->type)
return -ENOENT;
/* PAI crypto event must be in valid range */
/* PAI crypto event must be in valid range, try others if not */
if (a->config < PAI_CRYPTO_BASE ||
a->config > PAI_CRYPTO_BASE + paicrypt_cnt)
return -EINVAL;
return -ENOENT;
/* Allow only CRYPTO_ALL for sampling */
if (a->sample_period && a->config != PAI_CRYPTO_BASE)
return -EINVAL;

View File

@ -266,7 +266,7 @@ static int paiext_event_valid(struct perf_event *event)
event->hw.config_base = offsetof(struct paiext_cb, acc);
return 0;
}
return -EINVAL;
return -ENOENT;
}
/* Might be called on different CPU than the one the event is intended for. */

View File

@ -174,24 +174,27 @@ static void topoext_fixup(struct topo_scan *tscan)
static void parse_topology_amd(struct topo_scan *tscan)
{
bool has_topoext = false;
/*
* If the extended topology leaf 0x8000_001e is available
* try to get SMT, CORE, TILE, and DIE shifts from extended
* Try to get SMT, CORE, TILE, and DIE shifts from extended
* CPUID leaf 0x8000_0026 on supported processors first. If
* extended CPUID leaf 0x8000_0026 is not supported, try to
* get SMT and CORE shift from leaf 0xb first, then try to
* get the CORE shift from leaf 0x8000_0008.
* get SMT and CORE shift from leaf 0xb. If either leaf is
* available, cpu_parse_topology_ext() will return true.
*/
if (cpu_feature_enabled(X86_FEATURE_TOPOEXT))
has_topoext = cpu_parse_topology_ext(tscan);
bool has_xtopology = cpu_parse_topology_ext(tscan);
if (!has_topoext && !parse_8000_0008(tscan))
/*
* If XTOPOLOGY leaves (0x26/0xb) are not available, try to
* get the CORE shift from leaf 0x8000_0008 first.
*/
if (!has_xtopology && !parse_8000_0008(tscan))
return;
/* Prefer leaf 0x8000001e if available */
if (parse_8000_001e(tscan, has_topoext))
/*
* Prefer leaf 0x8000001e if available to get the SMT shift and
* the initial APIC ID if XTOPOLOGY leaves are not available.
*/
if (parse_8000_001e(tscan, has_xtopology))
return;
/* Try the NODEID MSR */

View File

@ -486,10 +486,18 @@ SECTIONS
}
/*
* The ASSERT() sink to . is intentional, for binutils 2.14 compatibility:
* COMPILE_TEST kernels can be large - CONFIG_KASAN, for example, can cause
* this. Let's assume that nobody will be running a COMPILE_TEST kernel and
* let's assert that fuller build coverage is more valuable than being able to
* run a COMPILE_TEST kernel.
*/
#ifndef CONFIG_COMPILE_TEST
/*
* The ASSERT() sync to . is intentional, for binutils 2.14 compatibility:
*/
. = ASSERT((_end - LOAD_OFFSET <= KERNEL_IMAGE_SIZE),
"kernel image bigger than KERNEL_IMAGE_SIZE");
#endif
/* needed for Clang - see arch/x86/entry/entry.S */
PROVIDE(__ref_stack_chk_guard = __stack_chk_guard);

View File

@ -36,7 +36,6 @@ config UDMABUF
depends on DMA_SHARED_BUFFER
depends on MEMFD_CREATE || COMPILE_TEST
depends on MMU
select VMAP_PFN
help
A driver to let userspace turn memfd regions into dma-bufs.
Qemu can use this to create host dmabufs for guest framebuffers.

View File

@ -74,29 +74,21 @@ static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma)
static int vmap_udmabuf(struct dma_buf *buf, struct iosys_map *map)
{
struct udmabuf *ubuf = buf->priv;
unsigned long *pfns;
struct page **pages;
void *vaddr;
pgoff_t pg;
dma_resv_assert_held(buf->resv);
/**
* HVO may free tail pages, so just use pfn to map each folio
* into vmalloc area.
*/
pfns = kvmalloc_array(ubuf->pagecount, sizeof(*pfns), GFP_KERNEL);
if (!pfns)
pages = kvmalloc_array(ubuf->pagecount, sizeof(*pages), GFP_KERNEL);
if (!pages)
return -ENOMEM;
for (pg = 0; pg < ubuf->pagecount; pg++) {
unsigned long pfn = folio_pfn(ubuf->folios[pg]);
for (pg = 0; pg < ubuf->pagecount; pg++)
pages[pg] = &ubuf->folios[pg]->page;
pfn += ubuf->offsets[pg] >> PAGE_SHIFT;
pfns[pg] = pfn;
}
vaddr = vmap_pfn(pfns, ubuf->pagecount, PAGE_KERNEL);
kvfree(pfns);
vaddr = vm_map_ram(pages, ubuf->pagecount, -1);
kvfree(pages);
if (!vaddr)
return -EINVAL;

View File

@ -48,12 +48,16 @@ static void *rzn1_dmamux_route_allocate(struct of_phandle_args *dma_spec,
u32 mask;
int ret;
if (dma_spec->args_count != RNZ1_DMAMUX_NCELLS)
return ERR_PTR(-EINVAL);
if (dma_spec->args_count != RNZ1_DMAMUX_NCELLS) {
ret = -EINVAL;
goto put_device;
}
map = kzalloc(sizeof(*map), GFP_KERNEL);
if (!map)
return ERR_PTR(-ENOMEM);
if (!map) {
ret = -ENOMEM;
goto put_device;
}
chan = dma_spec->args[0];
map->req_idx = dma_spec->args[4];
@ -94,12 +98,15 @@ static void *rzn1_dmamux_route_allocate(struct of_phandle_args *dma_spec,
if (ret)
goto clear_bitmap;
put_device(&pdev->dev);
return map;
clear_bitmap:
clear_bit(map->req_idx, dmamux->used_chans);
free_map:
kfree(map);
put_device:
put_device(&pdev->dev);
return ERR_PTR(ret);
}

View File

@ -187,27 +187,30 @@ static int idxd_setup_wqs(struct idxd_device *idxd)
idxd->wq_enable_map = bitmap_zalloc_node(idxd->max_wqs, GFP_KERNEL, dev_to_node(dev));
if (!idxd->wq_enable_map) {
rc = -ENOMEM;
goto err_bitmap;
goto err_free_wqs;
}
for (i = 0; i < idxd->max_wqs; i++) {
wq = kzalloc_node(sizeof(*wq), GFP_KERNEL, dev_to_node(dev));
if (!wq) {
rc = -ENOMEM;
goto err;
goto err_unwind;
}
idxd_dev_set_type(&wq->idxd_dev, IDXD_DEV_WQ);
conf_dev = wq_confdev(wq);
wq->id = i;
wq->idxd = idxd;
device_initialize(wq_confdev(wq));
device_initialize(conf_dev);
conf_dev->parent = idxd_confdev(idxd);
conf_dev->bus = &dsa_bus_type;
conf_dev->type = &idxd_wq_device_type;
rc = dev_set_name(conf_dev, "wq%d.%d", idxd->id, wq->id);
if (rc < 0)
goto err;
if (rc < 0) {
put_device(conf_dev);
kfree(wq);
goto err_unwind;
}
mutex_init(&wq->wq_lock);
init_waitqueue_head(&wq->err_queue);
@ -218,15 +221,20 @@ static int idxd_setup_wqs(struct idxd_device *idxd)
wq->enqcmds_retries = IDXD_ENQCMDS_RETRIES;
wq->wqcfg = kzalloc_node(idxd->wqcfg_size, GFP_KERNEL, dev_to_node(dev));
if (!wq->wqcfg) {
put_device(conf_dev);
kfree(wq);
rc = -ENOMEM;
goto err;
goto err_unwind;
}
if (idxd->hw.wq_cap.op_config) {
wq->opcap_bmap = bitmap_zalloc(IDXD_MAX_OPCAP_BITS, GFP_KERNEL);
if (!wq->opcap_bmap) {
kfree(wq->wqcfg);
put_device(conf_dev);
kfree(wq);
rc = -ENOMEM;
goto err_opcap_bmap;
goto err_unwind;
}
bitmap_copy(wq->opcap_bmap, idxd->opcap_bmap, IDXD_MAX_OPCAP_BITS);
}
@ -237,13 +245,7 @@ static int idxd_setup_wqs(struct idxd_device *idxd)
return 0;
err_opcap_bmap:
kfree(wq->wqcfg);
err:
put_device(conf_dev);
kfree(wq);
err_unwind:
while (--i >= 0) {
wq = idxd->wqs[i];
if (idxd->hw.wq_cap.op_config)
@ -252,11 +254,10 @@ err:
conf_dev = wq_confdev(wq);
put_device(conf_dev);
kfree(wq);
}
bitmap_free(idxd->wq_enable_map);
err_bitmap:
err_free_wqs:
kfree(idxd->wqs);
return rc;
@ -918,10 +919,12 @@ static void idxd_remove(struct pci_dev *pdev)
device_unregister(idxd_confdev(idxd));
idxd_shutdown(pdev);
idxd_device_remove_debugfs(idxd);
idxd_cleanup(idxd);
perfmon_pmu_remove(idxd);
idxd_cleanup_interrupts(idxd);
if (device_pasid_enabled(idxd))
idxd_disable_system_pasid(idxd);
pci_iounmap(pdev, idxd->reg_base);
put_device(idxd_confdev(idxd));
idxd_free(idxd);
pci_disable_device(pdev);
}

View File

@ -1283,13 +1283,17 @@ static int bam_dma_probe(struct platform_device *pdev)
if (!bdev->bamclk) {
ret = of_property_read_u32(pdev->dev.of_node, "num-channels",
&bdev->num_channels);
if (ret)
if (ret) {
dev_err(bdev->dev, "num-channels unspecified in dt\n");
return ret;
}
ret = of_property_read_u32(pdev->dev.of_node, "qcom,num-ees",
&bdev->num_ees);
if (ret)
if (ret) {
dev_err(bdev->dev, "num-ees unspecified in dt\n");
return ret;
}
}
ret = clk_prepare_enable(bdev->bamclk);

View File

@ -2063,8 +2063,8 @@ static int edma_setup_from_hw(struct device *dev, struct edma_soc_info *pdata,
* priority. So Q0 is the highest priority queue and the last queue has
* the lowest priority.
*/
queue_priority_map = devm_kcalloc(dev, ecc->num_tc + 1, sizeof(s8),
GFP_KERNEL);
queue_priority_map = devm_kcalloc(dev, ecc->num_tc + 1,
sizeof(*queue_priority_map), GFP_KERNEL);
if (!queue_priority_map)
return -ENOMEM;

View File

@ -128,7 +128,6 @@ static ssize_t altr_sdr_mc_err_inject_write(struct file *file,
ptemp = dma_alloc_coherent(mci->pdev, 16, &dma_handle, GFP_KERNEL);
if (!ptemp) {
dma_free_coherent(mci->pdev, 16, ptemp, dma_handle);
edac_printk(KERN_ERR, EDAC_MC,
"Inject: Buffer Allocation error\n");
return -ENOMEM;

View File

@ -400,9 +400,6 @@ void amdgpu_ring_fini(struct amdgpu_ring *ring)
dma_fence_put(ring->vmid_wait);
ring->vmid_wait = NULL;
ring->me = 0;
if (!ring->is_mes_queue)
ring->adev->rings[ring->idx] = NULL;
}
/**

View File

@ -1813,15 +1813,19 @@ static int vcn_v3_0_limit_sched(struct amdgpu_cs_parser *p,
struct amdgpu_job *job)
{
struct drm_gpu_scheduler **scheds;
/* The create msg must be in the first IB submitted */
if (atomic_read(&job->base.entity->fence_seq))
return -EINVAL;
struct dma_fence *fence;
/* if VCN0 is harvested, we can't support AV1 */
if (p->adev->vcn.harvest_config & AMDGPU_VCN_HARVEST_VCN0)
return -EINVAL;
/* wait for all jobs to finish before switching to instance 0 */
fence = amdgpu_ctx_get_fence(p->ctx, job->base.entity, ~0ull);
if (fence) {
dma_fence_wait(fence, false);
dma_fence_put(fence);
}
scheds = p->adev->gpu_sched[AMDGPU_HW_IP_VCN_DEC]
[AMDGPU_RING_PRIO_DEFAULT].sched;
drm_sched_entity_modify_sched(job->base.entity, scheds, 1);

View File

@ -1737,15 +1737,19 @@ static int vcn_v4_0_limit_sched(struct amdgpu_cs_parser *p,
struct amdgpu_job *job)
{
struct drm_gpu_scheduler **scheds;
/* The create msg must be in the first IB submitted */
if (atomic_read(&job->base.entity->fence_seq))
return -EINVAL;
struct dma_fence *fence;
/* if VCN0 is harvested, we can't support AV1 */
if (p->adev->vcn.harvest_config & AMDGPU_VCN_HARVEST_VCN0)
return -EINVAL;
/* wait for all jobs to finish before switching to instance 0 */
fence = amdgpu_ctx_get_fence(p->ctx, job->base.entity, ~0ull);
if (fence) {
dma_fence_wait(fence, false);
dma_fence_put(fence);
}
scheds = p->adev->gpu_sched[AMDGPU_HW_IP_VCN_ENC]
[AMDGPU_RING_PRIO_0].sched;
drm_sched_entity_modify_sched(job->base.entity, scheds, 1);
@ -1836,22 +1840,16 @@ out:
#define RADEON_VCN_ENGINE_TYPE_ENCODE (0x00000002)
#define RADEON_VCN_ENGINE_TYPE_DECODE (0x00000003)
#define RADEON_VCN_ENGINE_INFO (0x30000001)
#define RADEON_VCN_ENGINE_INFO_MAX_OFFSET 16
#define RENCODE_ENCODE_STANDARD_AV1 2
#define RENCODE_IB_PARAM_SESSION_INIT 0x00000003
#define RENCODE_IB_PARAM_SESSION_INIT_MAX_OFFSET 64
/* return the offset in ib if id is found, -1 otherwise
* to speed up the searching we only search upto max_offset
*/
static int vcn_v4_0_enc_find_ib_param(struct amdgpu_ib *ib, uint32_t id, int max_offset)
/* return the offset in ib if id is found, -1 otherwise */
static int vcn_v4_0_enc_find_ib_param(struct amdgpu_ib *ib, uint32_t id, int start)
{
int i;
for (i = 0; i < ib->length_dw && i < max_offset && ib->ptr[i] >= 8; i += ib->ptr[i]/4) {
for (i = start; i < ib->length_dw && ib->ptr[i] >= 8; i += ib->ptr[i] / 4) {
if (ib->ptr[i + 1] == id)
return i;
}
@ -1866,18 +1864,13 @@ static int vcn_v4_0_ring_patch_cs_in_place(struct amdgpu_cs_parser *p,
struct amdgpu_vcn_decode_buffer *decode_buffer;
uint64_t addr;
uint32_t val;
int idx;
int idx = 0, sidx;
/* The first instance can decode anything */
if (!ring->me)
return 0;
/* RADEON_VCN_ENGINE_INFO is at the top of ib block */
idx = vcn_v4_0_enc_find_ib_param(ib, RADEON_VCN_ENGINE_INFO,
RADEON_VCN_ENGINE_INFO_MAX_OFFSET);
if (idx < 0) /* engine info is missing */
return 0;
while ((idx = vcn_v4_0_enc_find_ib_param(ib, RADEON_VCN_ENGINE_INFO, idx)) >= 0) {
val = amdgpu_ib_get_value(ib, idx + 2); /* RADEON_VCN_ENGINE_TYPE */
if (val == RADEON_VCN_ENGINE_TYPE_DECODE) {
decode_buffer = (struct amdgpu_vcn_decode_buffer *)&ib->ptr[idx + 6];
@ -1889,11 +1882,12 @@ static int vcn_v4_0_ring_patch_cs_in_place(struct amdgpu_cs_parser *p,
decode_buffer->msg_buffer_address_lo;
return vcn_v4_0_dec_msg(p, job, addr);
} else if (val == RADEON_VCN_ENGINE_TYPE_ENCODE) {
idx = vcn_v4_0_enc_find_ib_param(ib, RENCODE_IB_PARAM_SESSION_INIT,
RENCODE_IB_PARAM_SESSION_INIT_MAX_OFFSET);
if (idx >= 0 && ib->ptr[idx + 2] == RENCODE_ENCODE_STANDARD_AV1)
sidx = vcn_v4_0_enc_find_ib_param(ib, RENCODE_IB_PARAM_SESSION_INIT, idx);
if (sidx >= 0 && ib->ptr[sidx + 2] == RENCODE_ENCODE_STANDARD_AV1)
return vcn_v4_0_limit_sched(p, job);
}
idx += ib->ptr[idx] / 4;
}
return 0;
}

View File

@ -239,6 +239,13 @@ static const struct amdgpu_video_codec_info cz_video_codecs_decode_array[] =
.max_pixels_per_frame = 4096 * 4096,
.max_level = 186,
},
{
.codec_type = AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_JPEG,
.max_width = 4096,
.max_height = 4096,
.max_pixels_per_frame = 4096 * 4096,
.max_level = 0,
},
};
static const struct amdgpu_video_codecs cz_video_codecs_decode =

View File

@ -11483,6 +11483,11 @@ static bool amdgpu_dm_crtc_mem_type_changed(struct drm_device *dev,
new_plane_state = drm_atomic_get_plane_state(state, plane);
old_plane_state = drm_atomic_get_plane_state(state, plane);
if (IS_ERR(new_plane_state) || IS_ERR(old_plane_state)) {
DRM_ERROR("Failed to get plane state for plane %s\n", plane->name);
return false;
}
if (old_plane_state->fb && new_plane_state->fb &&
get_mem_type(old_plane_state->fb) != get_mem_type(new_plane_state->fb))
return true;

View File

@ -483,11 +483,10 @@ void dpp1_set_cursor_position(
if (src_y_offset + cursor_height <= 0)
cur_en = 0; /* not visible beyond top edge*/
if (dpp_base->pos.cur0_ctl.bits.cur0_enable != cur_en) {
REG_UPDATE(CURSOR0_CONTROL, CUR0_ENABLE, cur_en);
REG_UPDATE(CURSOR0_CONTROL,
CUR0_ENABLE, cur_en);
dpp_base->pos.cur0_ctl.bits.cur0_enable = cur_en;
}
}
void dpp1_cnv_set_optional_cursor_attributes(

View File

@ -155,11 +155,9 @@ void dpp401_set_cursor_position(
struct dcn401_dpp *dpp = TO_DCN401_DPP(dpp_base);
uint32_t cur_en = pos->enable ? 1 : 0;
if (dpp_base->pos.cur0_ctl.bits.cur0_enable != cur_en) {
REG_UPDATE(CURSOR0_CONTROL, CUR0_ENABLE, cur_en);
dpp_base->pos.cur0_ctl.bits.cur0_enable = cur_en;
}
}
void dpp401_set_optional_cursor_attributes(

View File

@ -1044,13 +1044,11 @@ void hubp2_cursor_set_position(
if (src_y_offset + cursor_height <= 0)
cur_en = 0; /* not visible beyond top edge*/
if (hubp->pos.cur_ctl.bits.cur_enable != cur_en) {
if (cur_en && REG_READ(CURSOR_SURFACE_ADDRESS) == 0)
hubp->funcs->set_cursor_attributes(hubp, &hubp->curs_attr);
REG_UPDATE(CURSOR_CONTROL,
CURSOR_ENABLE, cur_en);
}
REG_SET_2(CURSOR_POSITION, 0,
CURSOR_X_POSITION, pos->x,

View File

@ -718,13 +718,11 @@ void hubp401_cursor_set_position(
dc_fixpt_from_int(dst_x_offset),
param->h_scale_ratio));
if (hubp->pos.cur_ctl.bits.cur_enable != cur_en) {
if (cur_en && REG_READ(CURSOR_SURFACE_ADDRESS) == 0)
hubp->funcs->set_cursor_attributes(hubp, &hubp->curs_attr);
REG_UPDATE(CURSOR_CONTROL,
CURSOR_ENABLE, cur_en);
}
REG_SET_2(CURSOR_POSITION, 0,
CURSOR_X_POSITION, x_pos,

View File

@ -945,7 +945,7 @@ enum dc_status dcn20_enable_stream_timing(
return DC_ERROR_UNEXPECTED;
}
fsleep(stream->timing.v_total * (stream->timing.h_total * 10000u / stream->timing.pix_clk_100hz));
udelay(stream->timing.v_total * (stream->timing.h_total * 10000u / stream->timing.pix_clk_100hz));
params.vertical_total_min = stream->adjust.v_total_min;
params.vertical_total_max = stream->adjust.v_total_max;

View File

@ -1150,7 +1150,7 @@ static void icl_mbus_init(struct drm_i915_private *dev_priv)
if (DISPLAY_VER(dev_priv) == 12)
abox_regs |= BIT(0);
for_each_set_bit(i, &abox_regs, sizeof(abox_regs))
for_each_set_bit(i, &abox_regs, BITS_PER_TYPE(abox_regs))
intel_de_rmw(dev_priv, MBUS_ABOX_CTL(i), mask, val);
}
@ -1603,11 +1603,11 @@ static void tgl_bw_buddy_init(struct drm_i915_private *dev_priv)
if (table[config].page_mask == 0) {
drm_dbg(&dev_priv->drm,
"Unknown memory configuration; disabling address buddy logic.\n");
for_each_set_bit(i, &abox_mask, sizeof(abox_mask))
for_each_set_bit(i, &abox_mask, BITS_PER_TYPE(abox_mask))
intel_de_write(dev_priv, BW_BUDDY_CTL(i),
BW_BUDDY_DISABLE);
} else {
for_each_set_bit(i, &abox_mask, sizeof(abox_mask)) {
for_each_set_bit(i, &abox_mask, BITS_PER_TYPE(abox_mask)) {
intel_de_write(dev_priv, BW_BUDDY_PAGE_MASK(i),
table[config].page_mask);

View File

@ -1469,6 +1469,19 @@ static void __reset_guc_busyness_stats(struct intel_guc *guc)
spin_unlock_irqrestore(&guc->timestamp.lock, flags);
}
static void __update_guc_busyness_running_state(struct intel_guc *guc)
{
struct intel_gt *gt = guc_to_gt(guc);
struct intel_engine_cs *engine;
enum intel_engine_id id;
unsigned long flags;
spin_lock_irqsave(&guc->timestamp.lock, flags);
for_each_engine(engine, gt, id)
engine->stats.guc.running = false;
spin_unlock_irqrestore(&guc->timestamp.lock, flags);
}
static void __update_guc_busyness_stats(struct intel_guc *guc)
{
struct intel_gt *gt = guc_to_gt(guc);
@ -1619,6 +1632,9 @@ void intel_guc_busyness_park(struct intel_gt *gt)
if (!guc_submission_initialized(guc))
return;
/* Assume no engines are running and set running state to false */
__update_guc_busyness_running_state(guc);
/*
* There is a race with suspend flow where the worker runs after suspend
* and causes an unclaimed register access warning. Cancel the worker

View File

@ -381,11 +381,11 @@ static bool mtk_drm_get_all_drm_priv(struct device *dev)
of_id = of_match_node(mtk_drm_of_ids, node);
if (!of_id)
goto next_put_node;
continue;
pdev = of_find_device_by_node(node);
if (!pdev)
goto next_put_node;
continue;
drm_dev = device_find_child(&pdev->dev, NULL, mtk_drm_match);
if (!drm_dev)
@ -411,12 +411,11 @@ next_put_device_drm_dev:
next_put_device_pdev_dev:
put_device(&pdev->dev);
next_put_node:
if (cnt == MAX_CRTC) {
of_node_put(node);
if (cnt == MAX_CRTC)
break;
}
}
if (drm_priv->data->mmsys_dev_num == cnt) {
for (i = 0; i < cnt; i++)

View File

@ -1023,7 +1023,7 @@ static int panthor_ioctl_group_create(struct drm_device *ddev, void *data,
struct drm_panthor_queue_create *queue_args;
int ret;
if (!args->queues.count)
if (!args->queues.count || args->queues.count > MAX_CS_PER_CSG)
return -EINVAL;
ret = PANTHOR_UOBJ_GET_ARRAY(queue_args, &args->queues);

View File

@ -222,7 +222,7 @@ static int evict_test_run_tile(struct xe_device *xe, struct xe_tile *tile, struc
}
xe_bo_lock(external, false);
err = xe_bo_pin_external(external);
err = xe_bo_pin_external(external, false);
xe_bo_unlock(external);
if (err) {
KUNIT_FAIL(test, "external bo pin err=%pe\n",

View File

@ -89,14 +89,6 @@ static void check_residency(struct kunit *test, struct xe_bo *exported,
return;
}
/*
* If on different devices, the exporter is kept in system if
* possible, saving a migration step as the transfer is just
* likely as fast from system memory.
*/
if (params->mem_mask & XE_BO_FLAG_SYSTEM)
KUNIT_EXPECT_TRUE(test, xe_bo_is_mem_type(exported, XE_PL_TT));
else
KUNIT_EXPECT_TRUE(test, xe_bo_is_mem_type(exported, mem_type));
if (params->force_different_devices)

View File

@ -157,6 +157,8 @@ static void try_add_system(struct xe_device *xe, struct xe_bo *bo,
bo->placements[*c] = (struct ttm_place) {
.mem_type = XE_PL_TT,
.flags = (bo_flags & XE_BO_FLAG_VRAM_MASK) ?
TTM_PL_FLAG_FALLBACK : 0,
};
*c += 1;
}
@ -1743,6 +1745,7 @@ uint64_t vram_region_gpu_offset(struct ttm_resource *res)
/**
* xe_bo_pin_external - pin an external BO
* @bo: buffer object to be pinned
* @in_place: Pin in current placement, don't attempt to migrate.
*
* Pin an external (not tied to a VM, can be exported via dma-buf / prime FD)
* BO. Unique call compared to xe_bo_pin as this function has it own set of
@ -1750,7 +1753,7 @@ uint64_t vram_region_gpu_offset(struct ttm_resource *res)
*
* Returns 0 for success, negative error code otherwise.
*/
int xe_bo_pin_external(struct xe_bo *bo)
int xe_bo_pin_external(struct xe_bo *bo, bool in_place)
{
struct xe_device *xe = xe_bo_device(bo);
int err;
@ -1759,9 +1762,11 @@ int xe_bo_pin_external(struct xe_bo *bo)
xe_assert(xe, xe_bo_is_user(bo));
if (!xe_bo_is_pinned(bo)) {
if (!in_place) {
err = xe_bo_validate(bo, NULL, false);
if (err)
return err;
}
if (xe_bo_is_vram(bo)) {
spin_lock(&xe->pinned.lock);
@ -1913,6 +1918,9 @@ int xe_bo_validate(struct xe_bo *bo, struct xe_vm *vm, bool allow_res_evict)
.no_wait_gpu = false,
};
if (xe_bo_is_pinned(bo))
return 0;
if (vm) {
lockdep_assert_held(&vm->lock);
xe_vm_assert_held(vm);

View File

@ -173,7 +173,7 @@ static inline void xe_bo_unlock_vm_held(struct xe_bo *bo)
}
}
int xe_bo_pin_external(struct xe_bo *bo);
int xe_bo_pin_external(struct xe_bo *bo, bool in_place);
int xe_bo_pin(struct xe_bo *bo);
void xe_bo_unpin_external(struct xe_bo *bo);
void xe_bo_unpin(struct xe_bo *bo);

View File

@ -72,7 +72,7 @@ static int xe_dma_buf_pin(struct dma_buf_attachment *attach)
return ret;
}
ret = xe_bo_pin_external(bo);
ret = xe_bo_pin_external(bo, true);
xe_assert(xe, !ret);
return 0;

View File

@ -1057,7 +1057,7 @@ static const struct pci_device_id i801_ids[] = {
{ PCI_DEVICE_DATA(INTEL, METEOR_LAKE_P_SMBUS, FEATURES_ICH5 | FEATURE_TCO_CNL) },
{ PCI_DEVICE_DATA(INTEL, METEOR_LAKE_SOC_S_SMBUS, FEATURES_ICH5 | FEATURE_TCO_CNL) },
{ PCI_DEVICE_DATA(INTEL, METEOR_LAKE_PCH_S_SMBUS, FEATURES_ICH5 | FEATURE_TCO_CNL) },
{ PCI_DEVICE_DATA(INTEL, BIRCH_STREAM_SMBUS, FEATURES_ICH5 | FEATURE_TCO_CNL) },
{ PCI_DEVICE_DATA(INTEL, BIRCH_STREAM_SMBUS, FEATURES_ICH5) },
{ PCI_DEVICE_DATA(INTEL, ARROW_LAKE_H_SMBUS, FEATURES_ICH5 | FEATURE_TCO_CNL) },
{ PCI_DEVICE_DATA(INTEL, PANTHER_LAKE_H_SMBUS, FEATURES_ICH5 | FEATURE_TCO_CNL) },
{ PCI_DEVICE_DATA(INTEL, PANTHER_LAKE_P_SMBUS, FEATURES_ICH5 | FEATURE_TCO_CNL) },

View File

@ -2430,6 +2430,9 @@ static int iqs7222_parse_chan(struct iqs7222_private *iqs7222,
if (error)
return error;
if (!iqs7222->kp_type[chan_index][i])
continue;
if (!dev_desc->event_offset)
continue;

View File

@ -1155,6 +1155,20 @@ static const struct dmi_system_id i8042_dmi_quirk_table[] __initconst = {
.driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS |
SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP)
},
{
.matches = {
DMI_MATCH(DMI_BOARD_NAME, "XxHP4NAx"),
},
.driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS |
SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP)
},
{
.matches = {
DMI_MATCH(DMI_BOARD_NAME, "XxKK4NAx_XxSP4NAx"),
},
.driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS |
SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP)
},
/*
* A lot of modern Clevo barebones have touchpad and/or keyboard issues
* after suspend fixable with the forcenorestore quirk.

View File

@ -1377,14 +1377,24 @@ static int atmel_smc_nand_prepare_smcconf(struct atmel_nand *nand,
if (ret)
return ret;
/*
* Read setup timing depends on the operation done on the NAND:
*
* NRD_SETUP = max(tAR, tCLR)
*/
timeps = max(conf->timings.sdr.tAR_min, conf->timings.sdr.tCLR_min);
ncycles = DIV_ROUND_UP(timeps, mckperiodps);
totalcycles += ncycles;
ret = atmel_smc_cs_conf_set_setup(smcconf, ATMEL_SMC_NRD_SHIFT, ncycles);
if (ret)
return ret;
/*
* The read cycle timing is directly matching tRC, but is also
* dependent on the setup and hold timings we calculated earlier,
* which gives:
*
* NRD_CYCLE = max(tRC, NRD_PULSE + NRD_HOLD)
*
* NRD_SETUP is always 0.
* NRD_CYCLE = max(tRC, NRD_SETUP + NRD_PULSE + NRD_HOLD)
*/
ncycles = DIV_ROUND_UP(conf->timings.sdr.tRC_min, mckperiodps);
ncycles = max(totalcycles, ncycles);

View File

@ -272,6 +272,7 @@ struct stm32_fmc2_nfc {
struct sg_table dma_data_sg;
struct sg_table dma_ecc_sg;
u8 *ecc_buf;
dma_addr_t dma_ecc_addr;
int dma_ecc_len;
u32 tx_dma_max_burst;
u32 rx_dma_max_burst;
@ -902,17 +903,10 @@ static int stm32_fmc2_nfc_xfer(struct nand_chip *chip, const u8 *buf,
if (!write_data && !raw) {
/* Configure DMA ECC status */
p = nfc->ecc_buf;
for_each_sg(nfc->dma_ecc_sg.sgl, sg, eccsteps, s) {
sg_set_buf(sg, p, nfc->dma_ecc_len);
p += nfc->dma_ecc_len;
}
ret = dma_map_sg(nfc->dev, nfc->dma_ecc_sg.sgl,
eccsteps, dma_data_dir);
if (!ret) {
ret = -EIO;
goto err_unmap_data;
sg_dma_address(sg) = nfc->dma_ecc_addr +
s * nfc->dma_ecc_len;
sg_dma_len(sg) = nfc->dma_ecc_len;
}
desc_ecc = dmaengine_prep_slave_sg(nfc->dma_ecc_ch,
@ -921,7 +915,7 @@ static int stm32_fmc2_nfc_xfer(struct nand_chip *chip, const u8 *buf,
DMA_PREP_INTERRUPT);
if (!desc_ecc) {
ret = -ENOMEM;
goto err_unmap_ecc;
goto err_unmap_data;
}
reinit_completion(&nfc->dma_ecc_complete);
@ -929,7 +923,7 @@ static int stm32_fmc2_nfc_xfer(struct nand_chip *chip, const u8 *buf,
desc_ecc->callback_param = &nfc->dma_ecc_complete;
ret = dma_submit_error(dmaengine_submit(desc_ecc));
if (ret)
goto err_unmap_ecc;
goto err_unmap_data;
dma_async_issue_pending(nfc->dma_ecc_ch);
}
@ -949,7 +943,7 @@ static int stm32_fmc2_nfc_xfer(struct nand_chip *chip, const u8 *buf,
if (!write_data && !raw)
dmaengine_terminate_all(nfc->dma_ecc_ch);
ret = -ETIMEDOUT;
goto err_unmap_ecc;
goto err_unmap_data;
}
/* Wait DMA data transfer completion */
@ -969,11 +963,6 @@ static int stm32_fmc2_nfc_xfer(struct nand_chip *chip, const u8 *buf,
}
}
err_unmap_ecc:
if (!write_data && !raw)
dma_unmap_sg(nfc->dev, nfc->dma_ecc_sg.sgl,
eccsteps, dma_data_dir);
err_unmap_data:
dma_unmap_sg(nfc->dev, nfc->dma_data_sg.sgl, eccsteps, dma_data_dir);
@ -996,9 +985,21 @@ static int stm32_fmc2_nfc_seq_write(struct nand_chip *chip, const u8 *buf,
/* Write oob */
if (oob_required) {
ret = nand_change_write_column_op(chip, mtd->writesize,
chip->oob_poi, mtd->oobsize,
false);
unsigned int offset_in_page = mtd->writesize;
const void *buf = chip->oob_poi;
unsigned int len = mtd->oobsize;
if (!raw) {
struct mtd_oob_region oob_free;
mtd_ooblayout_free(mtd, 0, &oob_free);
offset_in_page += oob_free.offset;
buf += oob_free.offset;
len = oob_free.length;
}
ret = nand_change_write_column_op(chip, offset_in_page,
buf, len, false);
if (ret)
return ret;
}
@ -1610,7 +1611,8 @@ static int stm32_fmc2_nfc_dma_setup(struct stm32_fmc2_nfc *nfc)
return ret;
/* Allocate a buffer to store ECC status registers */
nfc->ecc_buf = devm_kzalloc(nfc->dev, FMC2_MAX_ECC_BUF_LEN, GFP_KERNEL);
nfc->ecc_buf = dmam_alloc_coherent(nfc->dev, FMC2_MAX_ECC_BUF_LEN,
&nfc->dma_ecc_addr, GFP_KERNEL);
if (!nfc->ecc_buf)
return -ENOMEM;

View File

@ -122,6 +122,41 @@ static const struct mtd_ooblayout_ops w25n02kv_ooblayout = {
.free = w25n02kv_ooblayout_free,
};
static int w25n01jw_ooblayout_ecc(struct mtd_info *mtd, int section,
struct mtd_oob_region *region)
{
if (section > 3)
return -ERANGE;
region->offset = (16 * section) + 12;
region->length = 4;
return 0;
}
static int w25n01jw_ooblayout_free(struct mtd_info *mtd, int section,
struct mtd_oob_region *region)
{
if (section > 3)
return -ERANGE;
region->offset = (16 * section);
region->length = 12;
/* Extract BBM */
if (!section) {
region->offset += 2;
region->length -= 2;
}
return 0;
}
static const struct mtd_ooblayout_ops w25n01jw_ooblayout = {
.ecc = w25n01jw_ooblayout_ecc,
.free = w25n01jw_ooblayout_free,
};
static int w25n02kv_ecc_get_status(struct spinand_device *spinand,
u8 status)
{
@ -206,7 +241,7 @@ static const struct spinand_info winbond_spinand_table[] = {
&write_cache_variants,
&update_cache_variants),
0,
SPINAND_ECCINFO(&w25m02gv_ooblayout, NULL)),
SPINAND_ECCINFO(&w25n01jw_ooblayout, NULL)),
SPINAND_INFO("W25N02JWZEIF",
SPINAND_ID(SPINAND_READID_METHOD_OPCODE_DUMMY, 0xbf, 0x22),
NAND_MEMORG(1, 2048, 64, 64, 1024, 20, 1, 2, 1),

View File

@ -690,14 +690,6 @@ static void xcan_write_frame(struct net_device *ndev, struct sk_buff *skb,
dlc |= XCAN_DLCR_EDL_MASK;
}
if (!(priv->devtype.flags & XCAN_FLAG_TX_MAILBOXES) &&
(priv->devtype.flags & XCAN_FLAG_TXFEMP))
can_put_echo_skb(skb, ndev, priv->tx_head % priv->tx_max, 0);
else
can_put_echo_skb(skb, ndev, 0, 0);
priv->tx_head++;
priv->write_reg(priv, XCAN_FRAME_ID_OFFSET(frame_offset), id);
/* If the CAN frame is RTR frame this write triggers transmission
* (not on CAN FD)
@ -730,6 +722,14 @@ static void xcan_write_frame(struct net_device *ndev, struct sk_buff *skb,
data[1]);
}
}
if (!(priv->devtype.flags & XCAN_FLAG_TX_MAILBOXES) &&
(priv->devtype.flags & XCAN_FLAG_TXFEMP))
can_put_echo_skb(skb, ndev, priv->tx_head % priv->tx_max, 0);
else
can_put_echo_skb(skb, ndev, 0, 0);
priv->tx_head++;
}
/**

View File

@ -2356,6 +2356,7 @@ static void fec_enet_phy_reset_after_clk_enable(struct net_device *ndev)
*/
phy_dev = of_phy_find_device(fep->phy_node);
phy_reset_after_clk_enable(phy_dev);
if (phy_dev)
put_device(&phy_dev->mdio.dev);
}
}
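A short sketch of the reference rule the added check enforces: of_phy_find_device() returns either a referenced phy_device or NULL, so the reference must be dropped only when a device was actually found.

#include <linux/of_mdio.h>
#include <linux/phy.h>

static void example_reset_phy(struct device_node *phy_node)
{
        struct phy_device *phy_dev = of_phy_find_device(phy_node);

        phy_reset_after_clk_enable(phy_dev);    /* tolerates a NULL phydev */
        if (phy_dev)
                put_device(&phy_dev->mdio.dev); /* drop the lookup reference */
}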

View File

@ -4206,7 +4206,7 @@ free_queue_irqs:
irq_num = pf->msix_entries[base + vector].vector;
irq_set_affinity_notifier(irq_num, NULL);
irq_update_affinity_hint(irq_num, NULL);
free_irq(irq_num, &vsi->q_vectors[vector]);
free_irq(irq_num, vsi->q_vectors[vector]);
}
return err;
}
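free_irq() identifies the registration to remove by the dev_id cookie, so it must receive the exact pointer that was passed to request_irq(). q_vectors[] is an array of pointers, meaning &vsi->q_vectors[vector] is the address of the array slot rather than the vector itself and would never match. A minimal sketch of the correct pairing:

#include <linux/interrupt.h>

static irqreturn_t example_handler(int irq, void *data)
{
        return IRQ_HANDLED;
}

static void example_irq_pairing(unsigned int irq, void *q_vector)
{
        if (!request_irq(irq, example_handler, 0, "example", q_vector))
                free_irq(irq, q_vector);        /* same cookie, not &q_vector */
}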

View File

@ -2081,11 +2081,8 @@ static void igb_diag_test(struct net_device *netdev,
} else {
dev_info(&adapter->pdev->dev, "online testing starting\n");
/* PHY is powered down when interface is down */
if (if_running && igb_link_test(adapter, &data[TEST_LINK]))
if (igb_link_test(adapter, &data[TEST_LINK]))
eth_test->flags |= ETH_TEST_FL_FAILED;
else
data[TEST_LINK] = 0;
/* Online tests aren't run; pass by default */
data[TEST_REG] = 0;

View File

@ -165,14 +165,14 @@ static int hws_matcher_disconnect(struct mlx5hws_matcher *matcher)
next->match_ste.rtc_0_id,
next->match_ste.rtc_1_id);
if (ret) {
mlx5hws_err(tbl->ctx, "Failed to disconnect matcher\n");
goto matcher_reconnect;
mlx5hws_err(tbl->ctx, "Fatal error, failed to disconnect matcher\n");
return ret;
}
} else {
ret = mlx5hws_table_connect_to_miss_table(tbl, tbl->default_miss.miss_tbl);
if (ret) {
mlx5hws_err(tbl->ctx, "Failed to disconnect last matcher\n");
goto matcher_reconnect;
mlx5hws_err(tbl->ctx, "Fatal error, failed to disconnect last matcher\n");
return ret;
}
}
@ -180,27 +180,19 @@ static int hws_matcher_disconnect(struct mlx5hws_matcher *matcher)
if (prev_ft_id == tbl->ft_id) {
ret = mlx5hws_table_update_connected_miss_tables(tbl);
if (ret) {
mlx5hws_err(tbl->ctx, "Fatal error, failed to update connected miss table\n");
goto matcher_reconnect;
mlx5hws_err(tbl->ctx,
"Fatal error, failed to update connected miss table\n");
return ret;
}
}
ret = mlx5hws_table_ft_set_default_next_ft(tbl, prev_ft_id);
if (ret) {
mlx5hws_err(tbl->ctx, "Fatal error, failed to restore matcher ft default miss\n");
goto matcher_reconnect;
return ret;
}
return 0;
matcher_reconnect:
if (list_empty(&tbl->matchers_list) || !prev)
list_add(&matcher->list_node, &tbl->matchers_list);
else
/* insert after prev matcher */
list_add(&matcher->list_node, &prev->list_node);
return ret;
}
static void hws_matcher_set_rtc_attr_sz(struct mlx5hws_matcher *matcher,

View File

@ -97,6 +97,7 @@ int mdiobus_unregister_device(struct mdio_device *mdiodev)
if (mdiodev->bus->mdio_map[mdiodev->addr] != mdiodev)
return -EINVAL;
gpiod_put(mdiodev->reset_gpio);
reset_control_put(mdiodev->reset_ctrl);
mdiodev->bus->mdio_map[mdiodev->addr] = NULL;
@ -814,9 +815,6 @@ void mdiobus_unregister(struct mii_bus *bus)
if (!mdiodev)
continue;
if (mdiodev->reset_gpio)
gpiod_put(mdiodev->reset_gpio);
mdiodev->device_remove(mdiodev);
mdiodev->device_free(mdiodev);
}

View File

@ -989,6 +989,9 @@ static void nvme_submit_cmds(struct nvme_queue *nvmeq, struct rq_list *rqlist)
{
struct request *req;
if (rq_list_empty(rqlist))
return;
spin_lock(&nvmeq->sq_lock);
while ((req = rq_list_pop(rqlist))) {
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);

View File

@ -127,13 +127,13 @@ static int eusb2_repeater_init(struct phy *phy)
rptr->cfg->init_tbl[i].value);
/* Override registers from devicetree values */
if (!of_property_read_u8(np, "qcom,tune-usb2-amplitude", &val))
if (!of_property_read_u8(np, "qcom,tune-usb2-preem", &val))
regmap_write(regmap, base + EUSB2_TUNE_USB2_PREEM, val);
if (!of_property_read_u8(np, "qcom,tune-usb2-disc-thres", &val))
regmap_write(regmap, base + EUSB2_TUNE_HSDISC, val);
if (!of_property_read_u8(np, "qcom,tune-usb2-preem", &val))
if (!of_property_read_u8(np, "qcom,tune-usb2-amplitude", &val))
regmap_write(regmap, base + EUSB2_TUNE_IUSB2, val);
/* Wait for status OK */

View File

@ -3164,18 +3164,22 @@ tegra210_xusb_padctl_probe(struct device *dev,
}
pdev = of_find_device_by_node(np);
of_node_put(np);
if (!pdev) {
dev_warn(dev, "PMC device is not available\n");
goto out;
}
if (!platform_get_drvdata(pdev))
if (!platform_get_drvdata(pdev)) {
put_device(&pdev->dev);
return ERR_PTR(-EPROBE_DEFER);
}
padctl->regmap = dev_get_regmap(&pdev->dev, "usb_sleepwalk");
if (!padctl->regmap)
dev_info(dev, "failed to find PMC regmap\n");
put_device(&pdev->dev);
out:
return &padctl->base;
}
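of_find_device_by_node() returns a referenced platform_device, so every exit path must drop that reference exactly once; the hunk adds the put_device() the -EPROBE_DEFER path was missing. A simplified sketch of the balanced shape (the defer case is folded into NULL here):

#include <linux/of_platform.h>
#include <linux/regmap.h>

static struct regmap *example_get_pmc_regmap(struct device_node *np)
{
        struct platform_device *pdev = of_find_device_by_node(np);
        struct regmap *regmap = NULL;

        if (!pdev)
                return NULL;
        if (platform_get_drvdata(pdev))
                regmap = dev_get_regmap(&pdev->dev, "usb_sleepwalk");
        put_device(&pdev->dev);         /* dropped on every path */
        return regmap;
}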

View File

@ -363,6 +363,13 @@ static void omap_usb2_init_errata(struct omap_usb *phy)
phy->flags |= OMAP_USB2_DISABLE_CHRG_DET;
}
static void omap_usb2_put_device(void *_dev)
{
struct device *dev = _dev;
put_device(dev);
}
static int omap_usb2_probe(struct platform_device *pdev)
{
struct omap_usb *phy;
@ -373,6 +380,7 @@ static int omap_usb2_probe(struct platform_device *pdev)
struct device_node *control_node;
struct platform_device *control_pdev;
const struct usb_phy_data *phy_data;
int ret;
phy_data = device_get_match_data(&pdev->dev);
if (!phy_data)
@ -423,6 +431,11 @@ static int omap_usb2_probe(struct platform_device *pdev)
return -EINVAL;
}
phy->control_dev = &control_pdev->dev;
ret = devm_add_action_or_reset(&pdev->dev, omap_usb2_put_device,
phy->control_dev);
if (ret)
return ret;
} else {
if (of_property_read_u32_index(node,
"syscon-phy-power", 1,

View File

@ -667,12 +667,20 @@ static int ti_pipe3_get_clk(struct ti_pipe3 *phy)
return 0;
}
static void ti_pipe3_put_device(void *_dev)
{
struct device *dev = _dev;
put_device(dev);
}
static int ti_pipe3_get_sysctrl(struct ti_pipe3 *phy)
{
struct device *dev = phy->dev;
struct device_node *node = dev->of_node;
struct device_node *control_node;
struct platform_device *control_pdev;
int ret;
phy->phy_power_syscon = syscon_regmap_lookup_by_phandle(node,
"syscon-phy-power");
@ -704,6 +712,11 @@ static int ti_pipe3_get_sysctrl(struct ti_pipe3 *phy)
}
phy->control_dev = &control_pdev->dev;
ret = devm_add_action_or_reset(dev, ti_pipe3_put_device,
phy->control_dev);
if (ret)
return ret;
}
if (phy->mode == PIPE3_MODE_PCIE) {
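The omap-usb2 hunk above and this ti-pipe3 hunk apply the same idiom: the control device was looked up with a reference held, and devm_add_action_or_reset() ties dropping that reference to the probing device's lifetime. If registering the action fails, the action runs immediately, so no error path can leak the reference. A minimal sketch:

#include <linux/device.h>

static void example_put_device(void *data)
{
        put_device(data);       /* drop the reference taken at lookup time */
}

static int example_track_control_dev(struct device *dev,
                                     struct device *control_dev)
{
        return devm_add_action_or_reset(dev, example_put_device, control_dev);
}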

View File

@ -83,9 +83,11 @@ static int sy7636a_regulator_probe(struct platform_device *pdev)
if (!regmap)
return -EPROBE_DEFER;
gdp = devm_gpiod_get(pdev->dev.parent, "epd-pwr-good", GPIOD_IN);
device_set_of_node_from_dev(&pdev->dev, pdev->dev.parent);
gdp = devm_gpiod_get(&pdev->dev, "epd-pwr-good", GPIOD_IN);
if (IS_ERR(gdp)) {
dev_err(pdev->dev.parent, "Power good GPIO fault %ld\n", PTR_ERR(gdp));
dev_err(&pdev->dev, "Power good GPIO fault %ld\n", PTR_ERR(gdp));
return PTR_ERR(gdp);
}
@ -105,7 +107,6 @@ static int sy7636a_regulator_probe(struct platform_device *pdev)
}
config.dev = &pdev->dev;
config.dev->of_node = pdev->dev.parent->of_node;
config.regmap = regmap;
rdev = devm_regulator_register(&pdev->dev, &desc, &config);

View File

@ -543,9 +543,9 @@ static ssize_t hvc_write(struct tty_struct *tty, const u8 *buf, size_t count)
}
/*
* Racy, but harmless, kick thread if there is still pending data.
* Kick thread to flush if there's still pending data
* or to wake up the write queue.
*/
if (hp->n_outbuf)
hvc_kick();
return written;

View File

@ -1161,17 +1161,6 @@ static int sc16is7xx_startup(struct uart_port *port)
sc16is7xx_port_write(port, SC16IS7XX_FCR_REG,
SC16IS7XX_FCR_FIFO_BIT);
/* Enable EFR */
sc16is7xx_port_write(port, SC16IS7XX_LCR_REG,
SC16IS7XX_LCR_CONF_MODE_B);
regcache_cache_bypass(one->regmap, true);
/* Enable write access to enhanced features and internal clock div */
sc16is7xx_port_update(port, SC16IS7XX_EFR_REG,
SC16IS7XX_EFR_ENABLE_BIT,
SC16IS7XX_EFR_ENABLE_BIT);
/* Enable TCR/TLR */
sc16is7xx_port_update(port, SC16IS7XX_MCR_REG,
SC16IS7XX_MCR_TCRTLR_BIT,
@ -1183,7 +1172,8 @@ static int sc16is7xx_startup(struct uart_port *port)
SC16IS7XX_TCR_RX_RESUME(24) |
SC16IS7XX_TCR_RX_HALT(48));
regcache_cache_bypass(one->regmap, false);
/* Disable TCR/TLR access */
sc16is7xx_port_update(port, SC16IS7XX_MCR_REG, SC16IS7XX_MCR_TCRTLR_BIT, 0);
/* Now, initialize the UART */
sc16is7xx_port_write(port, SC16IS7XX_LCR_REG, SC16IS7XX_LCR_WORD_LEN_8);

View File

@ -1601,6 +1601,7 @@ static int f_midi2_create_card(struct f_midi2 *midi2)
strscpy(fb->info.name, ump_fb_name(b),
sizeof(fb->info.name));
}
snd_ump_update_group_attrs(ump);
}
for (i = 0; i < midi2->num_eps; i++) {
@ -1738,9 +1739,12 @@ static int f_midi2_create_usb_configs(struct f_midi2 *midi2,
case USB_SPEED_HIGH:
midi2_midi1_ep_out_desc.wMaxPacketSize = cpu_to_le16(512);
midi2_midi1_ep_in_desc.wMaxPacketSize = cpu_to_le16(512);
for (i = 0; i < midi2->num_eps; i++)
for (i = 0; i < midi2->num_eps; i++) {
midi2_midi2_ep_out_desc[i].wMaxPacketSize =
cpu_to_le16(512);
midi2_midi2_ep_in_desc[i].wMaxPacketSize =
cpu_to_le16(512);
}
fallthrough;
case USB_SPEED_FULL:
midi1_in_eps = midi2_midi1_ep_in_descs;
@ -1749,9 +1753,12 @@ static int f_midi2_create_usb_configs(struct f_midi2 *midi2,
case USB_SPEED_SUPER:
midi2_midi1_ep_out_desc.wMaxPacketSize = cpu_to_le16(1024);
midi2_midi1_ep_in_desc.wMaxPacketSize = cpu_to_le16(1024);
for (i = 0; i < midi2->num_eps; i++)
for (i = 0; i < midi2->num_eps; i++) {
midi2_midi2_ep_out_desc[i].wMaxPacketSize =
cpu_to_le16(1024);
midi2_midi2_ep_in_desc[i].wMaxPacketSize =
cpu_to_le16(1024);
}
midi1_in_eps = midi2_midi1_ep_in_ss_descs;
midi1_out_eps = midi2_midi1_ep_out_ss_descs;
break;

View File

@ -764,8 +764,7 @@ static int dummy_dequeue(struct usb_ep *_ep, struct usb_request *_req)
if (!dum->driver)
return -ESHUTDOWN;
local_irq_save(flags);
spin_lock(&dum->lock);
spin_lock_irqsave(&dum->lock, flags);
list_for_each_entry(iter, &ep->queue, queue) {
if (&iter->req != _req)
continue;
@ -775,15 +774,16 @@ static int dummy_dequeue(struct usb_ep *_ep, struct usb_request *_req)
retval = 0;
break;
}
spin_unlock(&dum->lock);
if (retval == 0) {
dev_dbg(udc_dev(dum),
"dequeued req %p from %s, len %d buf %p\n",
req, _ep->name, _req->length, _req->buf);
spin_unlock(&dum->lock);
usb_gadget_giveback_request(_ep, _req);
spin_lock(&dum->lock);
}
local_irq_restore(flags);
spin_unlock_irqrestore(&dum->lock, flags);
return retval;
}
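A sketch of the reworked locking: one irqsave lock replaces the separate local_irq_save() plus spin_lock() pair, and the lock (but not the interrupt state) is dropped around the giveback callback, which may re-enter the driver. The completion helper is hypothetical.

#include <linux/spinlock.h>

static void do_giveback(void);  /* hypothetical completion callback */

static void example_dequeue(spinlock_t *lock, bool found)
{
        unsigned long flags;

        spin_lock_irqsave(lock, flags);
        /* ... unlink the request from the endpoint queue ... */
        if (found) {
                spin_unlock(lock);      /* IRQs stay disabled locally */
                do_giveback();
                spin_lock(lock);
        }
        spin_unlock_irqrestore(lock, flags);
}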

View File

@ -939,7 +939,7 @@ static void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_i
out:
/* we are now at a leaf device */
xhci_debugfs_remove_slot(xhci, slot_id);
xhci_free_virt_device(xhci, vdev, slot_id);
xhci_free_virt_device(xhci, xhci->devs[slot_id], slot_id);
}
int xhci_alloc_virt_device(struct xhci_hcd *xhci, int slot_id,

View File

@ -1322,7 +1322,18 @@ static const struct usb_device_id option_ids[] = {
.driver_info = NCTRL(0) | RSVD(3) },
{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1033, 0xff), /* Telit LE910C1-EUX (ECM) */
.driver_info = NCTRL(0) },
{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1034, 0xff), /* Telit LE910C4-WWX (rmnet) */
.driver_info = RSVD(2) },
{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1035, 0xff) }, /* Telit LE910C4-WWX (ECM) */
{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1036, 0xff) }, /* Telit LE910C4-WWX */
{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1037, 0xff), /* Telit LE910C4-WWX (rmnet) */
.driver_info = NCTRL(0) | NCTRL(1) | RSVD(4) },
{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1038, 0xff), /* Telit LE910C4-WWX (rmnet) */
.driver_info = NCTRL(0) | RSVD(3) },
{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x103b, 0xff), /* Telit LE910C4-WWX */
.driver_info = NCTRL(0) | NCTRL(1) },
{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x103c, 0xff), /* Telit LE910C4-WWX */
.driver_info = NCTRL(0) },
{ USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_LE922_USBCFG0),
.driver_info = RSVD(0) | RSVD(1) | NCTRL(2) | RSVD(3) },
{ USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_LE922_USBCFG1),
@ -1369,6 +1380,12 @@ static const struct usb_device_id option_ids[] = {
.driver_info = NCTRL(0) | RSVD(1) },
{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1075, 0xff), /* Telit FN990A (PCIe) */
.driver_info = RSVD(0) },
{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1077, 0xff), /* Telit FN990A (rmnet + audio) */
.driver_info = NCTRL(0) | RSVD(1) | RSVD(2) },
{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1078, 0xff), /* Telit FN990A (MBIM + audio) */
.driver_info = NCTRL(0) | RSVD(1) },
{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1079, 0xff), /* Telit FN990A (RNDIS + audio) */
.driver_info = NCTRL(2) | RSVD(3) },
{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1080, 0xff), /* Telit FE990A (rmnet) */
.driver_info = NCTRL(0) | RSVD(1) | RSVD(2) },
{ USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1081, 0xff), /* Telit FE990A (MBIM) */

View File

@ -2375,17 +2375,21 @@ static void tcpm_handle_vdm_request(struct tcpm_port *port,
case ADEV_NONE:
break;
case ADEV_NOTIFY_USB_AND_QUEUE_VDM:
if (rx_sop_type == TCPC_TX_SOP_PRIME) {
typec_cable_altmode_vdm(adev, TYPEC_PLUG_SOP_P, p[0], &p[1], cnt);
} else {
WARN_ON(typec_altmode_notify(adev, TYPEC_STATE_USB, NULL));
typec_altmode_vdm(adev, p[0], &p[1], cnt);
}
break;
case ADEV_QUEUE_VDM:
if (response_tx_sop_type == TCPC_TX_SOP_PRIME)
if (rx_sop_type == TCPC_TX_SOP_PRIME)
typec_cable_altmode_vdm(adev, TYPEC_PLUG_SOP_P, p[0], &p[1], cnt);
else
typec_altmode_vdm(adev, p[0], &p[1], cnt);
break;
case ADEV_QUEUE_VDM_SEND_EXIT_MODE_ON_FAIL:
if (response_tx_sop_type == TCPC_TX_SOP_PRIME) {
if (rx_sop_type == TCPC_TX_SOP_PRIME) {
if (typec_cable_altmode_vdm(adev, TYPEC_PLUG_SOP_P,
p[0], &p[1], cnt)) {
int svdm_version = typec_get_cable_svdm_version(

View File

@ -108,6 +108,25 @@ struct btrfs_bio_ctrl {
* This is to avoid touching ranges covered by compression/inline.
*/
unsigned long submit_bitmap;
struct readahead_control *ractl;
/*
* The start offset of the last used extent map by a read operation.
*
* This is for proper compressed read merge.
* U64_MAX means we are starting the read and have made no progress yet.
*
* The current btrfs_bio_is_contig() only uses disk_bytenr as
* the condition to check if the read can be merged with previous
* bio, which is not correct. E.g. two file extents pointing to the
* same extent but with different offset.
*
* So here we need to do extra checks to only merge reads that are
* covered by the same extent map.
* Just extent_map::start will be enough, as they are unique
* inside the same inode.
*/
u64 last_em_start;
};
static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
@ -929,6 +948,23 @@ static struct extent_map *get_extent_map(struct btrfs_inode *inode,
return em;
}
static void btrfs_readahead_expand(struct readahead_control *ractl,
const struct extent_map *em)
{
const u64 ra_pos = readahead_pos(ractl);
const u64 ra_end = ra_pos + readahead_length(ractl);
const u64 em_end = em->start + em->ram_bytes;
/* No expansion for holes and inline extents. */
if (em->disk_bytenr > EXTENT_MAP_LAST_BYTE)
return;
ASSERT(em_end >= ra_pos);
if (em_end > ra_end)
readahead_expand(ractl, ra_pos, em_end - ra_pos);
}
/*
* basic readpage implementation. Locked extent state structs are inserted
* into the tree that are removed when the IO is done (by the end_io
@ -937,7 +973,7 @@ static struct extent_map *get_extent_map(struct btrfs_inode *inode,
* return 0 on success, otherwise return error
*/
static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
struct btrfs_bio_ctrl *bio_ctrl, u64 *prev_em_start)
struct btrfs_bio_ctrl *bio_ctrl)
{
struct inode *inode = folio->mapping->host;
struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
@ -994,6 +1030,17 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
iosize = min(extent_map_end(em) - cur, end - cur + 1);
iosize = ALIGN(iosize, blocksize);
/*
* Only expand readahead for extents which are already creating
* the pages anyway in add_ra_bio_pages, which is compressed
* extents in the non subpage case.
*/
if (bio_ctrl->ractl &&
!btrfs_is_subpage(fs_info, folio->mapping) &&
compress_type != BTRFS_COMPRESS_NONE)
btrfs_readahead_expand(bio_ctrl->ractl, em);
if (compress_type != BTRFS_COMPRESS_NONE)
disk_bytenr = em->disk_bytenr;
else
@ -1037,12 +1084,11 @@ static int btrfs_do_readpage(struct folio *folio, struct extent_map **em_cached,
* non-optimal behavior (submitting 2 bios for the same extent).
*/
if (compress_type != BTRFS_COMPRESS_NONE &&
prev_em_start && *prev_em_start != (u64)-1 &&
*prev_em_start != em->start)
bio_ctrl->last_em_start != U64_MAX &&
bio_ctrl->last_em_start != em->start)
force_bio_submit = true;
if (prev_em_start)
*prev_em_start = em->start;
bio_ctrl->last_em_start = em->start;
free_extent_map(em);
em = NULL;
@ -1086,12 +1132,15 @@ int btrfs_read_folio(struct file *file, struct folio *folio)
const u64 start = folio_pos(folio);
const u64 end = start + folio_size(folio) - 1;
struct extent_state *cached_state = NULL;
struct btrfs_bio_ctrl bio_ctrl = { .opf = REQ_OP_READ };
struct btrfs_bio_ctrl bio_ctrl = {
.opf = REQ_OP_READ,
.last_em_start = U64_MAX,
};
struct extent_map *em_cached = NULL;
int ret;
btrfs_lock_and_flush_ordered_range(inode, start, end, &cached_state);
ret = btrfs_do_readpage(folio, &em_cached, &bio_ctrl, NULL);
ret = btrfs_do_readpage(folio, &em_cached, &bio_ctrl);
unlock_extent(&inode->io_tree, start, end, &cached_state);
free_extent_map(em_cached);
@ -2360,19 +2409,22 @@ int btrfs_writepages(struct address_space *mapping, struct writeback_control *wb
void btrfs_readahead(struct readahead_control *rac)
{
struct btrfs_bio_ctrl bio_ctrl = { .opf = REQ_OP_READ | REQ_RAHEAD };
struct btrfs_bio_ctrl bio_ctrl = {
.opf = REQ_OP_READ | REQ_RAHEAD,
.ractl = rac,
.last_em_start = U64_MAX,
};
struct folio *folio;
struct btrfs_inode *inode = BTRFS_I(rac->mapping->host);
const u64 start = readahead_pos(rac);
const u64 end = start + readahead_length(rac) - 1;
struct extent_state *cached_state = NULL;
struct extent_map *em_cached = NULL;
u64 prev_em_start = (u64)-1;
btrfs_lock_and_flush_ordered_range(inode, start, end, &cached_state);
while ((folio = readahead_folio(rac)) != NULL)
btrfs_do_readpage(folio, &em_cached, &bio_ctrl, &prev_em_start);
btrfs_do_readpage(folio, &em_cached, &bio_ctrl);
unlock_extent(&inode->io_tree, start, end, &cached_state);
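Two related changes run through this file: the per-read merge state (prev_em_start, now last_em_start, plus the readahead_control) moves into btrfs_bio_ctrl so each read context carries its own state and must initialize last_em_start to U64_MAX, and compressed extents now expand the readahead window up front. A sketch of the expansion rule, assuming ram_bytes is the uncompressed extent length:

#include <linux/pagemap.h>

static void example_expand(struct readahead_control *ractl,
                           u64 em_start, u64 em_ram_bytes)
{
        const u64 ra_pos = readahead_pos(ractl);
        const u64 ra_end = ra_pos + readahead_length(ractl);
        const u64 em_end = em_start + em_ram_bytes;

        /* Cover the whole extent so a compressed extent is read and
         * decompressed once, not once per readahead batch. */
        if (em_end > ra_end)
                readahead_expand(ractl, ra_pos, em_end - ra_pos);
}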

View File

@ -5634,7 +5634,17 @@ static void btrfs_del_inode_from_root(struct btrfs_inode *inode)
bool empty = false;
xa_lock(&root->inodes);
entry = __xa_erase(&root->inodes, btrfs_ino(inode));
/*
* This btrfs_inode is being freed and has already been unhashed at this
* point. It's possible that another btrfs_inode has already been
* allocated for the same inode and inserted itself into the root, so
* don't delete it in that case.
*
* Note that this shouldn't need to allocate memory, so the gfp flags
* don't really matter.
*/
entry = __xa_cmpxchg(&root->inodes, btrfs_ino(inode), inode, NULL,
GFP_ATOMIC);
if (entry == inode)
empty = xa_empty(&root->inodes);
xa_unlock(&root->inodes);
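The hunk uses __xa_cmpxchg() because it already holds xa_lock; the idiom itself is a conditional erase, sketched here with the self-locking variant: remove the entry at @ino only if it is still this inode, so a new in-memory inode inserted for the same number while this one was being torn down is left untouched.

#include <linux/xarray.h>

static bool example_del_inode(struct xarray *inodes, unsigned long ino,
                              void *inode)
{
        void *entry = xa_cmpxchg(inodes, ino, inode, NULL, GFP_ATOMIC);

        return entry == inode;  /* true only if we removed our own entry */
}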

View File

@ -1501,6 +1501,7 @@ static int __qgroup_excl_accounting(struct btrfs_fs_info *fs_info, u64 ref_root,
struct btrfs_qgroup *qgroup;
LIST_HEAD(qgroup_list);
u64 num_bytes = src->excl;
u64 num_bytes_cmpr = src->excl_cmpr;
int ret = 0;
qgroup = find_qgroup_rb(fs_info, ref_root);
@ -1512,11 +1513,12 @@ static int __qgroup_excl_accounting(struct btrfs_fs_info *fs_info, u64 ref_root,
struct btrfs_qgroup_list *glist;
qgroup->rfer += sign * num_bytes;
qgroup->rfer_cmpr += sign * num_bytes;
qgroup->rfer_cmpr += sign * num_bytes_cmpr;
WARN_ON(sign < 0 && qgroup->excl < num_bytes);
WARN_ON(sign < 0 && qgroup->excl_cmpr < num_bytes_cmpr);
qgroup->excl += sign * num_bytes;
qgroup->excl_cmpr += sign * num_bytes;
qgroup->excl_cmpr += sign * num_bytes_cmpr;
if (sign > 0)
qgroup_rsv_add_by_qgroup(fs_info, qgroup, src);

View File

@ -55,8 +55,6 @@ static int mdsc_show(struct seq_file *s, void *p)
struct ceph_mds_client *mdsc = fsc->mdsc;
struct ceph_mds_request *req;
struct rb_node *rp;
int pathlen = 0;
u64 pathbase;
char *path;
mutex_lock(&mdsc->mutex);
@ -81,8 +79,8 @@ static int mdsc_show(struct seq_file *s, void *p)
if (req->r_inode) {
seq_printf(s, " #%llx", ceph_ino(req->r_inode));
} else if (req->r_dentry) {
path = ceph_mdsc_build_path(mdsc, req->r_dentry, &pathlen,
&pathbase, 0);
struct ceph_path_info path_info;
path = ceph_mdsc_build_path(mdsc, req->r_dentry, &path_info, 0);
if (IS_ERR(path))
path = NULL;
spin_lock(&req->r_dentry->d_lock);
@ -91,7 +89,7 @@ static int mdsc_show(struct seq_file *s, void *p)
req->r_dentry,
path ? path : "");
spin_unlock(&req->r_dentry->d_lock);
ceph_mdsc_free_path(path, pathlen);
ceph_mdsc_free_path_info(&path_info);
} else if (req->r_path1) {
seq_printf(s, " #%llx/%s", req->r_ino1.ino,
req->r_path1);
@ -100,8 +98,8 @@ static int mdsc_show(struct seq_file *s, void *p)
}
if (req->r_old_dentry) {
path = ceph_mdsc_build_path(mdsc, req->r_old_dentry, &pathlen,
&pathbase, 0);
struct ceph_path_info path_info;
path = ceph_mdsc_build_path(mdsc, req->r_old_dentry, &path_info, 0);
if (IS_ERR(path))
path = NULL;
spin_lock(&req->r_old_dentry->d_lock);
@ -111,7 +109,7 @@ static int mdsc_show(struct seq_file *s, void *p)
req->r_old_dentry,
path ? path : "");
spin_unlock(&req->r_old_dentry->d_lock);
ceph_mdsc_free_path(path, pathlen);
ceph_mdsc_free_path_info(&path_info);
} else if (req->r_path2 && req->r_op != CEPH_MDS_OP_SYMLINK) {
if (req->r_ino2.ino)
seq_printf(s, " #%llx/%s", req->r_ino2.ino,

View File

@ -1263,10 +1263,8 @@ static void ceph_async_unlink_cb(struct ceph_mds_client *mdsc,
/* If op failed, mark everyone involved for errors */
if (result) {
int pathlen = 0;
u64 base = 0;
char *path = ceph_mdsc_build_path(mdsc, dentry, &pathlen,
&base, 0);
struct ceph_path_info path_info = {0};
char *path = ceph_mdsc_build_path(mdsc, dentry, &path_info, 0);
/* mark error on parent + clear complete */
mapping_set_error(req->r_parent->i_mapping, result);
@ -1280,8 +1278,8 @@ static void ceph_async_unlink_cb(struct ceph_mds_client *mdsc,
mapping_set_error(req->r_old_inode->i_mapping, result);
pr_warn_client(cl, "failure path=(%llx)%s result=%d!\n",
base, IS_ERR(path) ? "<<bad>>" : path, result);
ceph_mdsc_free_path(path, pathlen);
path_info.vino.ino, IS_ERR(path) ? "<<bad>>" : path, result);
ceph_mdsc_free_path_info(&path_info);
}
out:
iput(req->r_old_inode);
@ -1339,8 +1337,6 @@ static int ceph_unlink(struct inode *dir, struct dentry *dentry)
int err = -EROFS;
int op;
char *path;
int pathlen;
u64 pathbase;
if (ceph_snap(dir) == CEPH_SNAPDIR) {
/* rmdir .snap/foo is RMSNAP */
@ -1359,14 +1355,15 @@ static int ceph_unlink(struct inode *dir, struct dentry *dentry)
if (!dn) {
try_async = false;
} else {
path = ceph_mdsc_build_path(mdsc, dn, &pathlen, &pathbase, 0);
struct ceph_path_info path_info;
path = ceph_mdsc_build_path(mdsc, dn, &path_info, 0);
if (IS_ERR(path)) {
try_async = false;
err = 0;
} else {
err = ceph_mds_check_access(mdsc, path, MAY_WRITE);
}
ceph_mdsc_free_path(path, pathlen);
ceph_mdsc_free_path_info(&path_info);
dput(dn);
/* For non-EACCES cases, let the MDS do the auth check */

View File

@ -368,8 +368,6 @@ int ceph_open(struct inode *inode, struct file *file)
int flags, fmode, wanted;
struct dentry *dentry;
char *path;
int pathlen;
u64 pathbase;
bool do_sync = false;
int mask = MAY_READ;
@ -399,14 +397,15 @@ int ceph_open(struct inode *inode, struct file *file)
if (!dentry) {
do_sync = true;
} else {
path = ceph_mdsc_build_path(mdsc, dentry, &pathlen, &pathbase, 0);
struct ceph_path_info path_info;
path = ceph_mdsc_build_path(mdsc, dentry, &path_info, 0);
if (IS_ERR(path)) {
do_sync = true;
err = 0;
} else {
err = ceph_mds_check_access(mdsc, path, mask);
}
ceph_mdsc_free_path(path, pathlen);
ceph_mdsc_free_path_info(&path_info);
dput(dentry);
/* For non-EACCES cases, let the MDS do the auth check */
@ -614,15 +613,13 @@ static void ceph_async_create_cb(struct ceph_mds_client *mdsc,
mapping_set_error(req->r_parent->i_mapping, result);
if (result) {
int pathlen = 0;
u64 base = 0;
char *path = ceph_mdsc_build_path(mdsc, req->r_dentry, &pathlen,
&base, 0);
struct ceph_path_info path_info = {0};
char *path = ceph_mdsc_build_path(mdsc, req->r_dentry, &path_info, 0);
pr_warn_client(cl,
"async create failure path=(%llx)%s result=%d!\n",
base, IS_ERR(path) ? "<<bad>>" : path, result);
ceph_mdsc_free_path(path, pathlen);
path_info.vino.ino, IS_ERR(path) ? "<<bad>>" : path, result);
ceph_mdsc_free_path_info(&path_info);
ceph_dir_clear_complete(req->r_parent);
if (!d_unhashed(dentry))
@ -791,8 +788,6 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
int mask;
int err;
char *path;
int pathlen;
u64 pathbase;
doutc(cl, "%p %llx.%llx dentry %p '%pd' %s flags %d mode 0%o\n",
dir, ceph_vinop(dir), dentry, dentry,
@ -814,7 +809,8 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
if (!dn) {
try_async = false;
} else {
path = ceph_mdsc_build_path(mdsc, dn, &pathlen, &pathbase, 0);
struct ceph_path_info path_info;
path = ceph_mdsc_build_path(mdsc, dn, &path_info, 0);
if (IS_ERR(path)) {
try_async = false;
err = 0;
@ -826,7 +822,7 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
mask |= MAY_WRITE;
err = ceph_mds_check_access(mdsc, path, mask);
}
ceph_mdsc_free_path(path, pathlen);
ceph_mdsc_free_path_info(&path_info);
dput(dn);
/* For non-EACCES cases, let the MDS do the auth check */

View File

@ -55,6 +55,52 @@ static int ceph_set_ino_cb(struct inode *inode, void *data)
return 0;
}
/*
* Check if the parent inode matches the vino from directory reply info
*/
static inline bool ceph_vino_matches_parent(struct inode *parent,
struct ceph_vino vino)
{
return ceph_ino(parent) == vino.ino && ceph_snap(parent) == vino.snap;
}
/*
* Validate that the directory inode referenced by @req->r_parent matches the
* inode number and snapshot id contained in the reply's directory record. If
* they do not match (which can theoretically happen if the parent dentry was
* moved between the time the request was issued and the reply arrived), fall
* back to looking up the correct inode in the inode cache.
*
* A reference is *always* returned. Callers that receive a different inode
* than the original @parent are responsible for dropping the extra reference
* once the reply has been processed.
*/
static struct inode *ceph_get_reply_dir(struct super_block *sb,
struct inode *parent,
struct ceph_mds_reply_info_parsed *rinfo)
{
struct ceph_vino vino;
if (unlikely(!rinfo->diri.in))
return parent; /* nothing to compare against */
/* If we didn't have a cached parent inode to begin with, just bail out. */
if (!parent)
return NULL;
vino.ino = le64_to_cpu(rinfo->diri.in->ino);
vino.snap = le64_to_cpu(rinfo->diri.in->snapid);
if (likely(ceph_vino_matches_parent(parent, vino)))
return parent; /* matches: use the original reference */
/* Mismatch: this should be rare. Emit a WARN and obtain the correct inode. */
WARN_ONCE(1, "ceph: reply dir mismatch (parent valid %llx.%llx reply %llx.%llx)\n",
ceph_ino(parent), ceph_snap(parent), vino.ino, vino.snap);
return ceph_get_inode(sb, vino, NULL);
}
/**
* ceph_new_inode - allocate a new inode in advance of an expected create
* @dir: parent directory for new inode
@ -1523,6 +1569,7 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
struct ceph_vino tvino, dvino;
struct ceph_fs_client *fsc = ceph_sb_to_fs_client(sb);
struct ceph_client *cl = fsc->client;
struct inode *parent_dir = NULL;
int err = 0;
doutc(cl, "%p is_dentry %d is_target %d\n", req,
@ -1536,10 +1583,17 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
}
if (rinfo->head->is_dentry) {
struct inode *dir = req->r_parent;
if (dir) {
err = ceph_fill_inode(dir, NULL, &rinfo->diri,
/*
* r_parent may be stale when R_PARENT_LOCKED is not set,
* so we need to look up the correct inode
*/
parent_dir = ceph_get_reply_dir(sb, req->r_parent, rinfo);
if (unlikely(IS_ERR(parent_dir))) {
err = PTR_ERR(parent_dir);
goto done;
}
if (parent_dir) {
err = ceph_fill_inode(parent_dir, NULL, &rinfo->diri,
rinfo->dirfrag, session, -1,
&req->r_caps_reservation);
if (err < 0)
@ -1548,14 +1602,14 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
WARN_ON_ONCE(1);
}
if (dir && req->r_op == CEPH_MDS_OP_LOOKUPNAME &&
if (parent_dir && req->r_op == CEPH_MDS_OP_LOOKUPNAME &&
test_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags) &&
!test_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags)) {
bool is_nokey = false;
struct qstr dname;
struct dentry *dn, *parent;
struct fscrypt_str oname = FSTR_INIT(NULL, 0);
struct ceph_fname fname = { .dir = dir,
struct ceph_fname fname = { .dir = parent_dir,
.name = rinfo->dname,
.ctext = rinfo->altname,
.name_len = rinfo->dname_len,
@ -1564,10 +1618,10 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
BUG_ON(!rinfo->head->is_target);
BUG_ON(req->r_dentry);
parent = d_find_any_alias(dir);
parent = d_find_any_alias(parent_dir);
BUG_ON(!parent);
err = ceph_fname_alloc_buffer(dir, &oname);
err = ceph_fname_alloc_buffer(parent_dir, &oname);
if (err < 0) {
dput(parent);
goto done;
@ -1576,7 +1630,7 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req)
err = ceph_fname_to_usr(&fname, NULL, &oname, &is_nokey);
if (err < 0) {
dput(parent);
ceph_fname_free_buffer(dir, &oname);
ceph_fname_free_buffer(parent_dir, &oname);
goto done;
}
dname.name = oname.name;
@ -1595,7 +1649,7 @@ retry_lookup:
dname.len, dname.name, dn);
if (!dn) {
dput(parent);
ceph_fname_free_buffer(dir, &oname);
ceph_fname_free_buffer(parent_dir, &oname);
err = -ENOMEM;
goto done;
}
@ -1610,12 +1664,12 @@ retry_lookup:
ceph_snap(d_inode(dn)) != tvino.snap)) {
doutc(cl, " dn %p points to wrong inode %p\n",
dn, d_inode(dn));
ceph_dir_clear_ordered(dir);
ceph_dir_clear_ordered(parent_dir);
d_delete(dn);
dput(dn);
goto retry_lookup;
}
ceph_fname_free_buffer(dir, &oname);
ceph_fname_free_buffer(parent_dir, &oname);
req->r_dentry = dn;
dput(parent);
@ -1794,6 +1848,9 @@ retry_lookup:
&dvino, ptvino);
}
done:
/* Drop extra ref from ceph_get_reply_dir() if it returned a new inode */
if (unlikely(!IS_ERR_OR_NULL(parent_dir) && parent_dir != req->r_parent))
iput(parent_dir);
doutc(cl, "done err=%d\n", err);
return err;
}
@ -2483,22 +2540,21 @@ int __ceph_setattr(struct mnt_idmap *idmap, struct inode *inode,
int truncate_retry = 20; /* The RMW will take around 50ms */
struct dentry *dentry;
char *path;
int pathlen;
u64 pathbase;
bool do_sync = false;
dentry = d_find_alias(inode);
if (!dentry) {
do_sync = true;
} else {
path = ceph_mdsc_build_path(mdsc, dentry, &pathlen, &pathbase, 0);
struct ceph_path_info path_info;
path = ceph_mdsc_build_path(mdsc, dentry, &path_info, 0);
if (IS_ERR(path)) {
do_sync = true;
err = 0;
} else {
err = ceph_mds_check_access(mdsc, path, MAY_WRITE);
}
ceph_mdsc_free_path(path, pathlen);
ceph_mdsc_free_path_info(&path_info);
dput(dentry);
/* For non-EACCES cases, let the MDS do the auth check */

View File

@ -2686,8 +2686,7 @@ static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
* ceph_mdsc_build_path - build a path string to a given dentry
* @mdsc: mds client
* @dentry: dentry to which path should be built
* @plen: returned length of string
* @pbase: returned base inode number
* @path_info: output path, length, base ino+snap, and freepath ownership flag
* @for_wire: is this path going to be sent to the MDS?
*
* Build a string that represents the path to the dentry. This is mostly called
@ -2705,7 +2704,7 @@ static u8 *get_fscrypt_altname(const struct ceph_mds_request *req, u32 *plen)
* foo/.snap/bar -> foo//bar
*/
char *ceph_mdsc_build_path(struct ceph_mds_client *mdsc, struct dentry *dentry,
int *plen, u64 *pbase, int for_wire)
struct ceph_path_info *path_info, int for_wire)
{
struct ceph_client *cl = mdsc->fsc->client;
struct dentry *cur;
@ -2815,16 +2814,28 @@ retry:
return ERR_PTR(-ENAMETOOLONG);
}
*pbase = base;
*plen = PATH_MAX - 1 - pos;
/* Initialize the output structure */
memset(path_info, 0, sizeof(*path_info));
path_info->vino.ino = base;
path_info->pathlen = PATH_MAX - 1 - pos;
path_info->path = path + pos;
path_info->freepath = true;
/* Set snap from dentry if available */
if (d_inode(dentry))
path_info->vino.snap = ceph_snap(d_inode(dentry));
else
path_info->vino.snap = CEPH_NOSNAP;
doutc(cl, "on %p %d built %llx '%.*s'\n", dentry, d_count(dentry),
base, *plen, path + pos);
base, PATH_MAX - 1 - pos, path + pos);
return path + pos;
}
static int build_dentry_path(struct ceph_mds_client *mdsc, struct dentry *dentry,
struct inode *dir, const char **ppath, int *ppathlen,
u64 *pino, bool *pfreepath, bool parent_locked)
struct inode *dir, struct ceph_path_info *path_info,
bool parent_locked)
{
char *path;
@ -2833,41 +2844,47 @@ static int build_dentry_path(struct ceph_mds_client *mdsc, struct dentry *dentry
dir = d_inode_rcu(dentry->d_parent);
if (dir && parent_locked && ceph_snap(dir) == CEPH_NOSNAP &&
!IS_ENCRYPTED(dir)) {
*pino = ceph_ino(dir);
path_info->vino.ino = ceph_ino(dir);
path_info->vino.snap = ceph_snap(dir);
rcu_read_unlock();
*ppath = dentry->d_name.name;
*ppathlen = dentry->d_name.len;
path_info->path = dentry->d_name.name;
path_info->pathlen = dentry->d_name.len;
path_info->freepath = false;
return 0;
}
rcu_read_unlock();
path = ceph_mdsc_build_path(mdsc, dentry, ppathlen, pino, 1);
path = ceph_mdsc_build_path(mdsc, dentry, path_info, 1);
if (IS_ERR(path))
return PTR_ERR(path);
*ppath = path;
*pfreepath = true;
/*
* ceph_mdsc_build_path already fills path_info, including snap handling.
*/
return 0;
}
static int build_inode_path(struct inode *inode,
const char **ppath, int *ppathlen, u64 *pino,
bool *pfreepath)
static int build_inode_path(struct inode *inode, struct ceph_path_info *path_info)
{
struct ceph_mds_client *mdsc = ceph_sb_to_mdsc(inode->i_sb);
struct dentry *dentry;
char *path;
if (ceph_snap(inode) == CEPH_NOSNAP) {
*pino = ceph_ino(inode);
*ppathlen = 0;
path_info->vino.ino = ceph_ino(inode);
path_info->vino.snap = ceph_snap(inode);
path_info->pathlen = 0;
path_info->freepath = false;
return 0;
}
dentry = d_find_alias(inode);
path = ceph_mdsc_build_path(mdsc, dentry, ppathlen, pino, 1);
path = ceph_mdsc_build_path(mdsc, dentry, path_info, 1);
dput(dentry);
if (IS_ERR(path))
return PTR_ERR(path);
*ppath = path;
*pfreepath = true;
/*
* ceph_mdsc_build_path already fills path_info, including snap from dentry.
* Override with inode's snap since that's what this function is for.
*/
path_info->vino.snap = ceph_snap(inode);
return 0;
}
@ -2877,26 +2894,32 @@ static int build_inode_path(struct inode *inode,
*/
static int set_request_path_attr(struct ceph_mds_client *mdsc, struct inode *rinode,
struct dentry *rdentry, struct inode *rdiri,
const char *rpath, u64 rino, const char **ppath,
int *pathlen, u64 *ino, bool *freepath,
const char *rpath, u64 rino,
struct ceph_path_info *path_info,
bool parent_locked)
{
struct ceph_client *cl = mdsc->fsc->client;
int r = 0;
/* Initialize the output structure */
memset(path_info, 0, sizeof(*path_info));
if (rinode) {
r = build_inode_path(rinode, ppath, pathlen, ino, freepath);
r = build_inode_path(rinode, path_info);
doutc(cl, " inode %p %llx.%llx\n", rinode, ceph_ino(rinode),
ceph_snap(rinode));
} else if (rdentry) {
r = build_dentry_path(mdsc, rdentry, rdiri, ppath, pathlen, ino,
freepath, parent_locked);
doutc(cl, " dentry %p %llx/%.*s\n", rdentry, *ino, *pathlen, *ppath);
r = build_dentry_path(mdsc, rdentry, rdiri, path_info, parent_locked);
doutc(cl, " dentry %p %llx/%.*s\n", rdentry, path_info->vino.ino,
path_info->pathlen, path_info->path);
} else if (rpath || rino) {
*ino = rino;
*ppath = rpath;
*pathlen = rpath ? strlen(rpath) : 0;
doutc(cl, " path %.*s\n", *pathlen, rpath);
path_info->vino.ino = rino;
path_info->vino.snap = CEPH_NOSNAP;
path_info->path = rpath;
path_info->pathlen = rpath ? strlen(rpath) : 0;
path_info->freepath = false;
doutc(cl, " path %.*s\n", path_info->pathlen, rpath);
}
return r;
@ -2973,11 +2996,8 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
struct ceph_client *cl = mdsc->fsc->client;
struct ceph_msg *msg;
struct ceph_mds_request_head_legacy *lhead;
const char *path1 = NULL;
const char *path2 = NULL;
u64 ino1 = 0, ino2 = 0;
int pathlen1 = 0, pathlen2 = 0;
bool freepath1 = false, freepath2 = false;
struct ceph_path_info path_info1 = {0};
struct ceph_path_info path_info2 = {0};
struct dentry *old_dentry = NULL;
int len;
u16 releases;
@ -2987,17 +3007,41 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
u16 request_head_version = mds_supported_head_version(session);
kuid_t caller_fsuid = req->r_cred->fsuid;
kgid_t caller_fsgid = req->r_cred->fsgid;
bool parent_locked = test_bit(CEPH_MDS_R_PARENT_LOCKED, &req->r_req_flags);
ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
req->r_parent, req->r_path1, req->r_ino1.ino,
&path1, &pathlen1, &ino1, &freepath1,
test_bit(CEPH_MDS_R_PARENT_LOCKED,
&req->r_req_flags));
&path_info1, parent_locked);
if (ret < 0) {
msg = ERR_PTR(ret);
goto out;
}
/*
* When the parent directory's i_rwsem is *not* locked, req->r_parent may
* have become stale (e.g. after a concurrent rename) between the time the
* dentry was looked up and now. If we detect that the stored r_parent
* does not match the inode number we just encoded for the request, switch
* to the correct inode so that the MDS receives a valid parent reference.
*/
if (!parent_locked && req->r_parent && path_info1.vino.ino &&
ceph_ino(req->r_parent) != path_info1.vino.ino) {
struct inode *old_parent = req->r_parent;
struct inode *correct_dir = ceph_get_inode(mdsc->fsc->sb, path_info1.vino, NULL);
if (!IS_ERR(correct_dir)) {
WARN_ONCE(1, "ceph: r_parent mismatch (had %llx wanted %llx) - updating\n",
ceph_ino(old_parent), path_info1.vino.ino);
/*
* Transfer CEPH_CAP_PIN from the old parent to the new one.
* The pin was taken earlier in ceph_mdsc_submit_request().
*/
ceph_put_cap_refs(ceph_inode(old_parent), CEPH_CAP_PIN);
iput(old_parent);
req->r_parent = correct_dir;
ceph_get_cap_refs(ceph_inode(req->r_parent), CEPH_CAP_PIN);
}
}
/* If r_old_dentry is set, then assume that its parent is locked */
if (req->r_old_dentry &&
!(req->r_old_dentry->d_flags & DCACHE_DISCONNECTED))
@ -3005,7 +3049,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
ret = set_request_path_attr(mdsc, NULL, old_dentry,
req->r_old_dentry_dir,
req->r_path2, req->r_ino2.ino,
&path2, &pathlen2, &ino2, &freepath2, true);
&path_info2, true);
if (ret < 0) {
msg = ERR_PTR(ret);
goto out_free1;
@ -3036,7 +3080,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
/* filepaths */
len += 2 * (1 + sizeof(u32) + sizeof(u64));
len += pathlen1 + pathlen2;
len += path_info1.pathlen + path_info2.pathlen;
/* cap releases */
len += sizeof(struct ceph_mds_request_release) *
@ -3044,9 +3088,9 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
!!req->r_old_inode_drop + !!req->r_old_dentry_drop);
if (req->r_dentry_drop)
len += pathlen1;
len += path_info1.pathlen;
if (req->r_old_dentry_drop)
len += pathlen2;
len += path_info2.pathlen;
/* MClientRequest tail */
@ -3159,8 +3203,8 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
lhead->ino = cpu_to_le64(req->r_deleg_ino);
lhead->args = req->r_args;
ceph_encode_filepath(&p, end, ino1, path1);
ceph_encode_filepath(&p, end, ino2, path2);
ceph_encode_filepath(&p, end, path_info1.vino.ino, path_info1.path);
ceph_encode_filepath(&p, end, path_info2.vino.ino, path_info2.path);
/* make note of release offset, in case we need to replay */
req->r_request_release_offset = p - msg->front.iov_base;
@ -3223,11 +3267,9 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
msg->hdr.data_off = cpu_to_le16(0);
out_free2:
if (freepath2)
ceph_mdsc_free_path((char *)path2, pathlen2);
ceph_mdsc_free_path_info(&path_info2);
out_free1:
if (freepath1)
ceph_mdsc_free_path((char *)path1, pathlen1);
ceph_mdsc_free_path_info(&path_info1);
out:
return msg;
out_err:
@ -4584,24 +4626,20 @@ static int reconnect_caps_cb(struct inode *inode, int mds, void *arg)
struct ceph_pagelist *pagelist = recon_state->pagelist;
struct dentry *dentry;
struct ceph_cap *cap;
char *path;
int pathlen = 0, err;
u64 pathbase;
struct ceph_path_info path_info = {0};
int err;
u64 snap_follows;
dentry = d_find_primary(inode);
if (dentry) {
/* set pathbase to parent dir when msg_version >= 2 */
path = ceph_mdsc_build_path(mdsc, dentry, &pathlen, &pathbase,
char *path = ceph_mdsc_build_path(mdsc, dentry, &path_info,
recon_state->msg_version >= 2);
dput(dentry);
if (IS_ERR(path)) {
err = PTR_ERR(path);
goto out_err;
}
} else {
path = NULL;
pathbase = 0;
}
spin_lock(&ci->i_ceph_lock);
@ -4634,7 +4672,7 @@ static int reconnect_caps_cb(struct inode *inode, int mds, void *arg)
rec.v2.wanted = cpu_to_le32(__ceph_caps_wanted(ci));
rec.v2.issued = cpu_to_le32(cap->issued);
rec.v2.snaprealm = cpu_to_le64(ci->i_snap_realm->ino);
rec.v2.pathbase = cpu_to_le64(pathbase);
rec.v2.pathbase = cpu_to_le64(path_info.vino.ino);
rec.v2.flock_len = (__force __le32)
((ci->i_ceph_flags & CEPH_I_ERROR_FILELOCK) ? 0 : 1);
} else {
@ -4649,7 +4687,7 @@ static int reconnect_caps_cb(struct inode *inode, int mds, void *arg)
ts = inode_get_atime(inode);
ceph_encode_timespec64(&rec.v1.atime, &ts);
rec.v1.snaprealm = cpu_to_le64(ci->i_snap_realm->ino);
rec.v1.pathbase = cpu_to_le64(pathbase);
rec.v1.pathbase = cpu_to_le64(path_info.vino.ino);
}
if (list_empty(&ci->i_cap_snaps)) {
@ -4711,7 +4749,7 @@ encode_again:
sizeof(struct ceph_filelock);
rec.v2.flock_len = cpu_to_le32(struct_len);
struct_len += sizeof(u32) + pathlen + sizeof(rec.v2);
struct_len += sizeof(u32) + path_info.pathlen + sizeof(rec.v2);
if (struct_v >= 2)
struct_len += sizeof(u64); /* snap_follows */
@ -4735,7 +4773,7 @@ encode_again:
ceph_pagelist_encode_8(pagelist, 1);
ceph_pagelist_encode_32(pagelist, struct_len);
}
ceph_pagelist_encode_string(pagelist, path, pathlen);
ceph_pagelist_encode_string(pagelist, (char *)path_info.path, path_info.pathlen);
ceph_pagelist_append(pagelist, &rec, sizeof(rec.v2));
ceph_locks_to_pagelist(flocks, pagelist,
num_fcntl_locks, num_flock_locks);
@ -4746,17 +4784,17 @@ out_freeflocks:
} else {
err = ceph_pagelist_reserve(pagelist,
sizeof(u64) + sizeof(u32) +
pathlen + sizeof(rec.v1));
path_info.pathlen + sizeof(rec.v1));
if (err)
goto out_err;
ceph_pagelist_encode_64(pagelist, ceph_ino(inode));
ceph_pagelist_encode_string(pagelist, path, pathlen);
ceph_pagelist_encode_string(pagelist, (char *)path_info.path, path_info.pathlen);
ceph_pagelist_append(pagelist, &rec, sizeof(rec.v1));
}
out_err:
ceph_mdsc_free_path(path, pathlen);
ceph_mdsc_free_path_info(&path_info);
if (!err)
recon_state->nr_caps++;
return err;

View File

@ -612,14 +612,24 @@ extern int ceph_mds_check_access(struct ceph_mds_client *mdsc, char *tpath,
extern void ceph_mdsc_pre_umount(struct ceph_mds_client *mdsc);
static inline void ceph_mdsc_free_path(char *path, int len)
/*
* Structure to group path-related output parameters for build_*_path functions
*/
struct ceph_path_info {
const char *path;
int pathlen;
struct ceph_vino vino;
bool freepath;
};
static inline void ceph_mdsc_free_path_info(const struct ceph_path_info *path_info)
{
if (!IS_ERR_OR_NULL(path))
__putname(path - (PATH_MAX - 1 - len));
if (path_info && path_info->freepath && !IS_ERR_OR_NULL(path_info->path))
__putname((char *)path_info->path - (PATH_MAX - 1 - path_info->pathlen));
}
extern char *ceph_mdsc_build_path(struct ceph_mds_client *mdsc,
struct dentry *dentry, int *plen, u64 *base,
struct dentry *dentry, struct ceph_path_info *path_info,
int for_wire);
extern void __ceph_mdsc_drop_dentry_lease(struct dentry *dentry);
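The pointer arithmetic in ceph_mdsc_free_path_info() mirrors how ceph_mdsc_build_path() works: the path is assembled right to left at the end of a PATH_MAX name buffer, so the returned pointer sits pathlen bytes before the terminating NUL and (PATH_MAX - 1 - pathlen) bytes past the allocation start:

  [ unused: PATH_MAX - 1 - pathlen ][ path: pathlen ][ NUL ]
  ^ buffer start                    ^ returned pointer

A usage sketch against the declarations above; zero-initializing the struct keeps the free call safe even when the build fails:

static void example_build(struct ceph_mds_client *mdsc, struct dentry *dentry)
{
        struct ceph_path_info info = { 0 };
        char *path = ceph_mdsc_build_path(mdsc, dentry, &info, 0);

        if (!IS_ERR(path))
                pr_info("path %.*s base ino %llx\n", info.pathlen, path,
                        (unsigned long long)info.vino.ino);
        ceph_mdsc_free_path_info(&info);  /* no-op unless freepath was set */
}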

View File

@ -1462,7 +1462,8 @@ static bool ext4_match(struct inode *parent,
* sure cf_name was properly initialized before
* considering the calculated hash.
*/
if (IS_ENCRYPTED(parent) && fname->cf_name.name &&
if (sb_no_casefold_compat_fallback(parent->i_sb) &&
IS_ENCRYPTED(parent) && fname->cf_name.name &&
(fname->hinfo.hash != EXT4_DIRENT_HASH(de) ||
fname->hinfo.minor_hash != EXT4_DIRENT_MINOR_HASH(de)))
return false;
@ -1595,10 +1596,15 @@ static struct buffer_head *__ext4_find_entry(struct inode *dir,
* return. Otherwise, fall back to doing a search the
* old fashioned way.
*/
if (!IS_ERR(ret) || PTR_ERR(ret) != ERR_BAD_DX_DIR)
goto cleanup_and_exit;
if (IS_ERR(ret) && PTR_ERR(ret) == ERR_BAD_DX_DIR)
dxtrace(printk(KERN_DEBUG "ext4_find_entry: dx failed, "
"falling back\n"));
else if (!sb_no_casefold_compat_fallback(dir->i_sb) &&
*res_dir == NULL && IS_CASEFOLDED(dir))
dxtrace(printk(KERN_DEBUG "ext4_find_entry: casefold "
"failed, falling back\n"));
else
goto cleanup_and_exit;
ret = NULL;
}
nblocks = dir->i_size >> EXT4_BLOCK_SIZE_BITS(sb);

View File

@ -176,6 +176,14 @@ static int vfs_dentry_acceptable(void *context, struct dentry *dentry)
if (!ctx->flags)
return 1;
/*
* Verify that the decoded dentry itself has a valid id mapping.
* In case the decoded dentry is the mountfd root itself, this
* verifies that the mountfd inode itself has a valid id mapping.
*/
if (!privileged_wrt_inode_uidgid(user_ns, idmap, d_inode(dentry)))
return 0;
/*
* It's racy as we're not taking rename_lock but we're able to ignore
* permissions and we just need an approximation whether we were able

View File

@ -3229,7 +3229,7 @@ static ssize_t __fuse_copy_file_range(struct file *file_in, loff_t pos_in,
.nodeid_out = ff_out->nodeid,
.fh_out = ff_out->fh,
.off_out = pos_out,
.len = len,
.len = min_t(size_t, len, UINT_MAX & PAGE_MASK),
.flags = flags
};
struct fuse_write_out outarg;
@ -3295,6 +3295,9 @@ static ssize_t __fuse_copy_file_range(struct file *file_in, loff_t pos_in,
fc->no_copy_file_range = 1;
err = -EOPNOTSUPP;
}
if (!err && outarg.size > len)
err = -EIO;
if (err)
goto out;
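Both additions guard the same invariant from opposite sides: the reply reports the copied byte count in a 32-bit field, so the length sent to the server is capped to a page-aligned value that fits, and a server claiming to have copied more than was requested is treated as a protocol error. A sketch under those assumptions:

#include <linux/kernel.h>
#include <linux/mm.h>

static ssize_t example_clamp_and_check(size_t requested, u32 reply_size)
{
        size_t wire_len = min_t(size_t, requested, UINT_MAX & PAGE_MASK);

        if (reply_size > wire_len)
                return -EIO;    /* server over-reported the copy */
        return reply_size;
}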

View File

@ -233,6 +233,11 @@ int fuse_backing_open(struct fuse_conn *fc, struct fuse_backing_map *map)
if (!file)
goto out;
/* read/write/splice/mmap passthrough only relevant for regular files */
res = d_is_dir(file->f_path.dentry) ? -EISDIR : -EINVAL;
if (!d_is_reg(file->f_path.dentry))
goto out_fput;
backing_sb = file_inode(file)->i_sb;
res = -ELOOP;
if (backing_sb->s_stack_depth >= fc->max_stack_depth)

View File

@ -70,6 +70,24 @@ static struct kernfs_open_node *of_on(struct kernfs_open_file *of)
!list_empty(&of->list));
}
/* Get active reference to kernfs node for an open file */
static struct kernfs_open_file *kernfs_get_active_of(struct kernfs_open_file *of)
{
/* Skip if file was already released */
if (unlikely(of->released))
return NULL;
if (!kernfs_get_active(of->kn))
return NULL;
return of;
}
static void kernfs_put_active_of(struct kernfs_open_file *of)
{
return kernfs_put_active(of->kn);
}
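Every call site below is switched from pinning the node directly to going through these wrappers; the practical difference is the of->released test, which closes the window where an operation (poll in particular, which previously took its reference via the dentry's node) could reactivate a kernfs node after the file had already been released. A minimal caller sketch, with do_op() hypothetical:

static int do_op(struct kernfs_open_file *of);  /* hypothetical work */

static int example_op(struct kernfs_open_file *of)
{
        int ret;

        if (!kernfs_get_active_of(of))  /* fails once ->release() has run */
                return -ENODEV;
        ret = do_op(of);
        kernfs_put_active_of(of);
        return ret;
}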
/**
* kernfs_deref_open_node_locked - Get kernfs_open_node corresponding to @kn
*
@ -139,7 +157,7 @@ static void kernfs_seq_stop_active(struct seq_file *sf, void *v)
if (ops->seq_stop)
ops->seq_stop(sf, v);
kernfs_put_active(of->kn);
kernfs_put_active_of(of);
}
static void *kernfs_seq_start(struct seq_file *sf, loff_t *ppos)
@ -152,7 +170,7 @@ static void *kernfs_seq_start(struct seq_file *sf, loff_t *ppos)
* the ops aren't called concurrently for the same open file.
*/
mutex_lock(&of->mutex);
if (!kernfs_get_active(of->kn))
if (!kernfs_get_active_of(of))
return ERR_PTR(-ENODEV);
ops = kernfs_ops(of->kn);
@ -238,7 +256,7 @@ static ssize_t kernfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
* the ops aren't called concurrently for the same open file.
*/
mutex_lock(&of->mutex);
if (!kernfs_get_active(of->kn)) {
if (!kernfs_get_active_of(of)) {
len = -ENODEV;
mutex_unlock(&of->mutex);
goto out_free;
@ -252,7 +270,7 @@ static ssize_t kernfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
else
len = -EINVAL;
kernfs_put_active(of->kn);
kernfs_put_active_of(of);
mutex_unlock(&of->mutex);
if (len < 0)
@ -323,7 +341,7 @@ static ssize_t kernfs_fop_write_iter(struct kiocb *iocb, struct iov_iter *iter)
* the ops aren't called concurrently for the same open file.
*/
mutex_lock(&of->mutex);
if (!kernfs_get_active(of->kn)) {
if (!kernfs_get_active_of(of)) {
mutex_unlock(&of->mutex);
len = -ENODEV;
goto out_free;
@ -335,7 +353,7 @@ static ssize_t kernfs_fop_write_iter(struct kiocb *iocb, struct iov_iter *iter)
else
len = -EINVAL;
kernfs_put_active(of->kn);
kernfs_put_active_of(of);
mutex_unlock(&of->mutex);
if (len > 0)
@ -357,13 +375,13 @@ static void kernfs_vma_open(struct vm_area_struct *vma)
if (!of->vm_ops)
return;
if (!kernfs_get_active(of->kn))
if (!kernfs_get_active_of(of))
return;
if (of->vm_ops->open)
of->vm_ops->open(vma);
kernfs_put_active(of->kn);
kernfs_put_active_of(of);
}
static vm_fault_t kernfs_vma_fault(struct vm_fault *vmf)
@ -375,14 +393,14 @@ static vm_fault_t kernfs_vma_fault(struct vm_fault *vmf)
if (!of->vm_ops)
return VM_FAULT_SIGBUS;
if (!kernfs_get_active(of->kn))
if (!kernfs_get_active_of(of))
return VM_FAULT_SIGBUS;
ret = VM_FAULT_SIGBUS;
if (of->vm_ops->fault)
ret = of->vm_ops->fault(vmf);
kernfs_put_active(of->kn);
kernfs_put_active_of(of);
return ret;
}
@ -395,7 +413,7 @@ static vm_fault_t kernfs_vma_page_mkwrite(struct vm_fault *vmf)
if (!of->vm_ops)
return VM_FAULT_SIGBUS;
if (!kernfs_get_active(of->kn))
if (!kernfs_get_active_of(of))
return VM_FAULT_SIGBUS;
ret = 0;
@ -404,7 +422,7 @@ static vm_fault_t kernfs_vma_page_mkwrite(struct vm_fault *vmf)
else
file_update_time(file);
kernfs_put_active(of->kn);
kernfs_put_active_of(of);
return ret;
}
@ -418,14 +436,14 @@ static int kernfs_vma_access(struct vm_area_struct *vma, unsigned long addr,
if (!of->vm_ops)
return -EINVAL;
if (!kernfs_get_active(of->kn))
if (!kernfs_get_active_of(of))
return -EINVAL;
ret = -EINVAL;
if (of->vm_ops->access)
ret = of->vm_ops->access(vma, addr, buf, len, write);
kernfs_put_active(of->kn);
kernfs_put_active_of(of);
return ret;
}
@ -455,7 +473,7 @@ static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma)
mutex_lock(&of->mutex);
rc = -ENODEV;
if (!kernfs_get_active(of->kn))
if (!kernfs_get_active_of(of))
goto out_unlock;
ops = kernfs_ops(of->kn);
@ -490,7 +508,7 @@ static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma)
}
vma->vm_ops = &kernfs_vm_ops;
out_put:
kernfs_put_active(of->kn);
kernfs_put_active_of(of);
out_unlock:
mutex_unlock(&of->mutex);
@ -852,7 +870,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait)
struct kernfs_node *kn = kernfs_dentry_node(filp->f_path.dentry);
__poll_t ret;
if (!kernfs_get_active(kn))
if (!kernfs_get_active_of(of))
return DEFAULT_POLLMASK|EPOLLERR|EPOLLPRI;
if (kn->attr.ops->poll)
@ -860,7 +878,7 @@ static __poll_t kernfs_fop_poll(struct file *filp, poll_table *wait)
else
ret = kernfs_generic_poll(of, wait);
kernfs_put_active(kn);
kernfs_put_active_of(of);
return ret;
}
@ -875,7 +893,7 @@ static loff_t kernfs_fop_llseek(struct file *file, loff_t offset, int whence)
* the ops aren't called concurrently for the same open file.
*/
mutex_lock(&of->mutex);
if (!kernfs_get_active(of->kn)) {
if (!kernfs_get_active_of(of)) {
mutex_unlock(&of->mutex);
return -ENODEV;
}
@ -886,7 +904,7 @@ static loff_t kernfs_fop_llseek(struct file *file, loff_t offset, int whence)
else
ret = generic_file_llseek(file, offset, whence);
kernfs_put_active(of->kn);
kernfs_put_active_of(of);
mutex_unlock(&of->mutex);
return ret;
}

View File

@ -881,6 +881,8 @@ static void nfs_server_set_fsinfo(struct nfs_server *server,
if (fsinfo->xattr_support)
server->caps |= NFS_CAP_XATTR;
else
server->caps &= ~NFS_CAP_XATTR;
#endif
}

View File

@ -320,6 +320,7 @@ static void nfs_read_sync_pgio_error(struct list_head *head, int error)
static void nfs_direct_pgio_init(struct nfs_pgio_header *hdr)
{
get_dreq(hdr->dreq);
set_bit(NFS_IOHDR_ODIRECT, &hdr->flags);
}
static const struct nfs_pgio_completion_ops nfs_direct_read_completion_ops = {
@ -471,8 +472,16 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter,
if (user_backed_iter(iter))
dreq->flags = NFS_ODIRECT_SHOULD_DIRTY;
if (!swap)
nfs_start_io_direct(inode);
if (!swap) {
result = nfs_start_io_direct(inode);
if (result) {
/* release the reference that would usually be
* consumed by nfs_direct_read_schedule_iovec()
*/
nfs_direct_req_release(dreq);
goto out_release;
}
}
NFS_I(inode)->read_io += count;
requested = nfs_direct_read_schedule_iovec(dreq, iter, iocb->ki_pos);
@ -1030,7 +1039,14 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter,
requested = nfs_direct_write_schedule_iovec(dreq, iter, pos,
FLUSH_STABLE);
} else {
nfs_start_io_direct(inode);
result = nfs_start_io_direct(inode);
if (result) {
/* release the reference that would usually be
* consumed by nfs_direct_write_schedule_iovec()
*/
nfs_direct_req_release(dreq);
goto out_release;
}
requested = nfs_direct_write_schedule_iovec(dreq, iter, pos,
FLUSH_COND_STABLE);

View File

@ -167,7 +167,10 @@ nfs_file_read(struct kiocb *iocb, struct iov_iter *to)
iocb->ki_filp,
iov_iter_count(to), (unsigned long) iocb->ki_pos);
nfs_start_io_read(inode);
result = nfs_start_io_read(inode);
if (result)
return result;
result = nfs_revalidate_mapping(inode, iocb->ki_filp->f_mapping);
if (!result) {
result = generic_file_read_iter(iocb, to);
@ -188,7 +191,10 @@ nfs_file_splice_read(struct file *in, loff_t *ppos, struct pipe_inode_info *pipe
dprintk("NFS: splice_read(%pD2, %zu@%llu)\n", in, len, *ppos);
nfs_start_io_read(inode);
result = nfs_start_io_read(inode);
if (result)
return result;
result = nfs_revalidate_mapping(inode, in->f_mapping);
if (!result) {
result = filemap_splice_read(in, ppos, pipe, len, flags);
@ -431,9 +437,10 @@ static void nfs_invalidate_folio(struct folio *folio, size_t offset,
dfprintk(PAGECACHE, "NFS: invalidate_folio(%lu, %zu, %zu)\n",
folio->index, offset, length);
if (offset != 0 || length < folio_size(folio))
return;
/* Cancel any unstarted writes on this page */
if (offset != 0 || length < folio_size(folio))
nfs_wb_folio(inode, folio);
else
nfs_wb_folio_cancel(inode, folio);
folio_wait_private_2(folio); /* [DEPRECATED] */
trace_nfs_invalidate_folio(inode, folio_pos(folio) + offset, length);
@ -669,7 +676,9 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
nfs_clear_invalid_mapping(file->f_mapping);
since = filemap_sample_wb_err(file->f_mapping);
nfs_start_io_write(inode);
error = nfs_start_io_write(inode);
if (error)
return error;
result = generic_write_checks(iocb, from);
if (result > 0)
result = generic_perform_write(iocb, from);

View File

@ -292,7 +292,7 @@ ff_lseg_match_mirrors(struct pnfs_layout_segment *l1,
struct pnfs_layout_segment *l2)
{
const struct nfs4_ff_layout_segment *fl1 = FF_LAYOUT_LSEG(l1);
const struct nfs4_ff_layout_segment *fl2 = FF_LAYOUT_LSEG(l1);
const struct nfs4_ff_layout_segment *fl2 = FF_LAYOUT_LSEG(l2);
u32 i;
if (fl1->mirror_array_cnt != fl2->mirror_array_cnt)
@ -772,8 +772,11 @@ ff_layout_choose_ds_for_read(struct pnfs_layout_segment *lseg,
continue;
if (check_device &&
nfs4_test_deviceid_unavailable(&mirror->mirror_ds->id_node))
nfs4_test_deviceid_unavailable(&mirror->mirror_ds->id_node)) {
// reinitialize the error state in case this is the last iteration
ds = ERR_PTR(-EINVAL);
continue;
}
*best_idx = idx;
break;
@ -803,7 +806,7 @@ ff_layout_choose_best_ds_for_read(struct pnfs_layout_segment *lseg,
struct nfs4_pnfs_ds *ds;
ds = ff_layout_choose_valid_ds_for_read(lseg, start_idx, best_idx);
if (ds)
if (!IS_ERR(ds))
return ds;
return ff_layout_choose_any_ds_for_read(lseg, start_idx, best_idx);
}
@ -817,7 +820,7 @@ ff_layout_get_ds_for_read(struct nfs_pageio_descriptor *pgio,
ds = ff_layout_choose_best_ds_for_read(lseg, pgio->pg_mirror_idx,
best_idx);
if (ds || !pgio->pg_mirror_idx)
if (!IS_ERR(ds) || !pgio->pg_mirror_idx)
return ds;
return ff_layout_choose_best_ds_for_read(lseg, 0, best_idx);
}
@ -867,7 +870,7 @@ retry:
req->wb_nio = 0;
ds = ff_layout_get_ds_for_read(pgio, &ds_idx);
if (!ds) {
if (IS_ERR(ds)) {
if (!ff_layout_no_fallback_to_mds(pgio->pg_lseg))
goto out_mds;
pnfs_generic_pg_cleanup(pgio);
@ -1071,11 +1074,13 @@ static void ff_layout_resend_pnfs_read(struct nfs_pgio_header *hdr)
{
u32 idx = hdr->pgio_mirror_idx + 1;
u32 new_idx = 0;
struct nfs4_pnfs_ds *ds;
if (ff_layout_choose_any_ds_for_read(hdr->lseg, idx, &new_idx))
ff_layout_send_layouterror(hdr->lseg);
else
ds = ff_layout_choose_any_ds_for_read(hdr->lseg, idx, &new_idx);
if (IS_ERR(ds))
pnfs_error_mark_layout_for_return(hdr->inode, hdr->lseg);
else
ff_layout_send_layouterror(hdr->lseg);
pnfs_read_resend_pnfs(hdr, new_idx);
}
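These hunks change the lookup helpers from returning NULL on failure to returning ERR_PTR() values, so each caller now tests IS_ERR(); a NULL test against an ERR_PTR value would wrongly look like success. A minimal sketch of the convention:

#include <linux/err.h>

struct example_ds;

static struct example_ds *example_choose(struct example_ds *candidate)
{
        if (!candidate)
                return ERR_PTR(-EINVAL);        /* was: return NULL */
        return candidate;       /* callers test IS_ERR(), not NULL */
}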

View File

@ -761,8 +761,10 @@ nfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
trace_nfs_setattr_enter(inode);
/* Write all dirty data */
if (S_ISREG(inode->i_mode))
if (S_ISREG(inode->i_mode)) {
nfs_file_block_o_direct(NFS_I(inode));
nfs_sync_inode(inode);
}
fattr = nfs_alloc_fattr_with_label(NFS_SERVER(inode));
if (fattr == NULL) {

View File

@ -6,6 +6,7 @@
#include "nfs4_fs.h"
#include <linux/fs_context.h>
#include <linux/security.h>
#include <linux/compiler_attributes.h>
#include <linux/crc32.h>
#include <linux/sunrpc/addr.h>
#include <linux/nfs_page.h>
@ -516,11 +517,11 @@ extern const struct netfs_request_ops nfs_netfs_ops;
#endif
/* io.c */
extern void nfs_start_io_read(struct inode *inode);
extern __must_check int nfs_start_io_read(struct inode *inode);
extern void nfs_end_io_read(struct inode *inode);
extern void nfs_start_io_write(struct inode *inode);
extern __must_check int nfs_start_io_write(struct inode *inode);
extern void nfs_end_io_write(struct inode *inode);
extern void nfs_start_io_direct(struct inode *inode);
extern __must_check int nfs_start_io_direct(struct inode *inode);
extern void nfs_end_io_direct(struct inode *inode);
static inline bool nfs_file_io_is_buffered(struct nfs_inode *nfsi)
@@ -528,6 +529,16 @@ static inline bool nfs_file_io_is_buffered(struct nfs_inode *nfsi)
return test_bit(NFS_INO_ODIRECT, &nfsi->flags) == 0;
}
/* Must be called with exclusively locked inode->i_rwsem */
static inline void nfs_file_block_o_direct(struct nfs_inode *nfsi)
{
if (test_bit(NFS_INO_ODIRECT, &nfsi->flags)) {
clear_bit(NFS_INO_ODIRECT, &nfsi->flags);
inode_dio_wait(&nfsi->vfs_inode);
}
}
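nfs_file_block_o_direct() replaces the io.c-private nfs_block_o_direct() (removed below) so that other compilation units, such as the nfs42 procedures further down, can flush O_DIRECT state before operating on the page cache. A hedged sketch of the expected calling pattern, mirroring the nfs_setattr() hunk above (the surrounding function is hypothetical):

/* Hypothetical caller; assumes inode->i_rwsem is taken exclusively,
 * as the comment above requires. */
static int example_flush_for_pagecache_op(struct inode *inode)
{
	int ret;

	inode_lock(inode);			/* exclusive i_rwsem */
	/* Flip the inode out of O_DIRECT mode and drain in-flight
	 * direct I/O (inode_dio_wait()) before touching the pagecache. */
	nfs_file_block_o_direct(NFS_I(inode));
	ret = nfs_sync_inode(inode);		/* write back dirty pages */
	inode_unlock(inode);
	return ret;
}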
/* namespace.c */
#define NFS_PATH_CANONICAL 1
extern char *nfs_path(char **p, struct dentry *dentry,


@@ -14,15 +14,6 @@
#include "internal.h"
/* Call with exclusively locked inode->i_rwsem */
static void nfs_block_o_direct(struct nfs_inode *nfsi, struct inode *inode)
{
if (test_bit(NFS_INO_ODIRECT, &nfsi->flags)) {
clear_bit(NFS_INO_ODIRECT, &nfsi->flags);
inode_dio_wait(inode);
}
}
/**
* nfs_start_io_read - declare the file is being used for buffered reads
* @inode: file inode
@@ -39,19 +30,28 @@ static void nfs_block_o_direct(struct nfs_inode *nfsi, struct inode *inode)
* Note that buffered writes and truncates both take a write lock on
* inode->i_rwsem, meaning that those are serialised w.r.t. the reads.
*/
void
int
nfs_start_io_read(struct inode *inode)
{
struct nfs_inode *nfsi = NFS_I(inode);
int err;
/* Be an optimist! */
down_read(&inode->i_rwsem);
err = down_read_killable(&inode->i_rwsem);
if (err)
return err;
if (test_bit(NFS_INO_ODIRECT, &nfsi->flags) == 0)
return;
return 0;
up_read(&inode->i_rwsem);
/* Slow path.... */
down_write(&inode->i_rwsem);
nfs_block_o_direct(nfsi, inode);
err = down_write_killable(&inode->i_rwsem);
if (err)
return err;
nfs_file_block_o_direct(nfsi);
downgrade_write(&inode->i_rwsem);
return 0;
}
/**
@@ -74,11 +74,15 @@ nfs_end_io_read(struct inode *inode)
* Declare that a buffered write operation is about to start, and ensure
* that we block all direct I/O.
*/
void
int
nfs_start_io_write(struct inode *inode)
{
down_write(&inode->i_rwsem);
nfs_block_o_direct(NFS_I(inode), inode);
int err;
err = down_write_killable(&inode->i_rwsem);
if (!err)
nfs_file_block_o_direct(NFS_I(inode));
return err;
}
/**
@@ -119,19 +123,28 @@ static void nfs_block_buffered(struct nfs_inode *nfsi, struct inode *inode)
* Note that buffered writes and truncates both take a write lock on
* inode->i_rwsem, meaning that those are serialised w.r.t. O_DIRECT.
*/
void
int
nfs_start_io_direct(struct inode *inode)
{
struct nfs_inode *nfsi = NFS_I(inode);
int err;
/* Be an optimist! */
down_read(&inode->i_rwsem);
err = down_read_killable(&inode->i_rwsem);
if (err)
return err;
if (test_bit(NFS_INO_ODIRECT, &nfsi->flags) != 0)
return;
return 0;
up_read(&inode->i_rwsem);
/* Slow path.... */
down_write(&inode->i_rwsem);
err = down_write_killable(&inode->i_rwsem);
if (err)
return err;
nfs_block_buffered(nfsi, inode);
downgrade_write(&inode->i_rwsem);
return 0;
}
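All three rewritten helpers share one locking shape: take i_rwsem shared and hope the inode is already in the right I/O mode; if not, drop it, take it exclusively to flip the mode, then downgrade_write() back to shared so same-mode I/O can run concurrently. The killable variants add an -EINTR exit when a fatal signal arrives, which is what forces the void-to-int API change above. A condensed sketch of the shape (the flag bit and function name are illustrative):

/* Illustrative condensation of nfs_start_io_read()/nfs_start_io_direct(). */
static int start_io_locked(struct inode *inode, struct nfs_inode *nfsi,
			   int wrong_mode_bit)
{
	int err;

	/* Be an optimist: usually the inode is already in our mode. */
	err = down_read_killable(&inode->i_rwsem);
	if (err)
		return err;		/* -EINTR on a fatal signal */
	if (!test_bit(wrong_mode_bit, &nfsi->flags))
		return 0;		/* fast path, shared lock held */
	up_read(&inode->i_rwsem);

	/* Slow path: switch modes under the exclusive lock ... */
	err = down_write_killable(&inode->i_rwsem);
	if (err)
		return err;
	clear_bit(wrong_mode_bit, &nfsi->flags);
	inode_dio_wait(inode);		/* drain I/O issued in the old mode */
	/* ... then demote to shared; same-mode I/O may now proceed. */
	downgrade_write(&inode->i_rwsem);
	return 0;
}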
/**


@@ -35,6 +35,7 @@ struct nfs_local_kiocb {
struct bio_vec *bvec;
struct nfs_pgio_header *hdr;
struct work_struct work;
void (*aio_complete_work)(struct work_struct *);
struct nfsd_file *localio;
};
@@ -50,6 +51,11 @@ static void nfs_local_fsync_work(struct work_struct *work);
static bool localio_enabled __read_mostly = true;
module_param(localio_enabled, bool, 0644);
static bool localio_O_DIRECT_semantics __read_mostly = false;
module_param(localio_O_DIRECT_semantics, bool, 0644);
MODULE_PARM_DESC(localio_O_DIRECT_semantics,
"LOCALIO will use O_DIRECT semantics to filesystem.");
static inline bool nfs_client_is_local(const struct nfs_client *clp)
{
return !!test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
@@ -274,7 +280,7 @@ nfs_local_iocb_free(struct nfs_local_kiocb *iocb)
static struct nfs_local_kiocb *
nfs_local_iocb_alloc(struct nfs_pgio_header *hdr,
struct nfsd_file *localio, gfp_t flags)
struct file *file, gfp_t flags)
{
struct nfs_local_kiocb *iocb;
@@ -287,11 +293,19 @@ nfs_local_iocb_alloc(struct nfs_pgio_header *hdr,
kfree(iocb);
return NULL;
}
init_sync_kiocb(&iocb->kiocb, nfs_to->nfsd_file_file(localio));
if (localio_O_DIRECT_semantics &&
test_bit(NFS_IOHDR_ODIRECT, &hdr->flags)) {
iocb->kiocb.ki_filp = file;
iocb->kiocb.ki_flags = IOCB_DIRECT;
} else
init_sync_kiocb(&iocb->kiocb, file);
iocb->kiocb.ki_pos = hdr->args.offset;
iocb->localio = localio;
iocb->hdr = hdr;
iocb->kiocb.ki_flags &= ~IOCB_APPEND;
iocb->aio_complete_work = NULL;
return iocb;
}
@@ -346,6 +360,18 @@ nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
nfs_local_hdr_release(hdr, hdr->task.tk_ops);
}
/*
* Complete the I/O from iocb->kiocb.ki_complete()
*
* Note that this function can be called from a bottom-half context,
* hence we need to queue the rpc_call_done() etc. to a workqueue
*/
static inline void nfs_local_pgio_aio_complete(struct nfs_local_kiocb *iocb)
{
INIT_WORK(&iocb->work, iocb->aio_complete_work);
queue_work(nfsiod_workqueue, &iocb->work);
}
static void
nfs_local_read_done(struct nfs_local_kiocb *iocb, long status)
{
@@ -368,6 +394,23 @@ nfs_local_read_done(struct nfs_local_kiocb *iocb, long status)
status > 0 ? status : 0, hdr->res.eof);
}
static void nfs_local_read_aio_complete_work(struct work_struct *work)
{
struct nfs_local_kiocb *iocb =
container_of(work, struct nfs_local_kiocb, work);
nfs_local_pgio_release(iocb);
}
static void nfs_local_read_aio_complete(struct kiocb *kiocb, long ret)
{
struct nfs_local_kiocb *iocb =
container_of(kiocb, struct nfs_local_kiocb, kiocb);
nfs_local_read_done(iocb, ret);
nfs_local_pgio_aio_complete(iocb); /* Calls nfs_local_read_aio_complete_work */
}
static void nfs_local_call_read(struct work_struct *work)
{
struct nfs_local_kiocb *iocb =
@@ -382,12 +425,13 @@ static void nfs_local_call_read(struct work_struct *work)
nfs_local_iter_init(&iter, iocb, READ);
status = filp->f_op->read_iter(&iocb->kiocb, &iter);
WARN_ON_ONCE(status == -EIOCBQUEUED);
nfs_local_read_done(iocb, status);
nfs_local_pgio_release(iocb);
revert_creds(save_cred);
if (status != -EIOCBQUEUED) {
nfs_local_read_done(iocb, status);
nfs_local_pgio_release(iocb);
}
}
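The dropped WARN_ON_ONCE(status == -EIOCBQUEUED) is the heart of this change: with IOCB_DIRECT and a ki_complete callback set, ->read_iter() may legitimately return -EIOCBQUEUED, meaning the lower filesystem took ownership and will call ki_complete() later, possibly from a bottom half, so completion has to be deferred to a workqueue rather than treated as a bug. A condensed restatement of the contract (names as in the diff; the wrapper function is illustrative):

static void example_issue_read(struct nfs_local_kiocb *iocb,
			       struct iov_iter *iter)
{
	struct file *filp = iocb->kiocb.ki_filp;
	ssize_t status;

	status = filp->f_op->read_iter(&iocb->kiocb, iter);
	if (status != -EIOCBQUEUED) {
		/* Completed (or failed) synchronously: finish inline. */
		nfs_local_read_done(iocb, status);
		nfs_local_pgio_release(iocb);
	}
	/* Otherwise the filesystem owns the kiocb; when the direct I/O
	 * finishes it invokes kiocb.ki_complete ==
	 * nfs_local_read_aio_complete(), possibly from a bottom half,
	 * which only queues aio_complete_work on nfsiod_workqueue. */
}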
static int
@@ -396,17 +440,28 @@ nfs_do_local_read(struct nfs_pgio_header *hdr,
const struct rpc_call_ops *call_ops)
{
struct nfs_local_kiocb *iocb;
struct file *file = nfs_to->nfsd_file_file(localio);
/* Don't support filesystems without read_iter */
if (!file->f_op->read_iter)
return -EAGAIN;
dprintk("%s: vfs_read count=%u pos=%llu\n",
__func__, hdr->args.count, hdr->args.offset);
iocb = nfs_local_iocb_alloc(hdr, localio, GFP_KERNEL);
iocb = nfs_local_iocb_alloc(hdr, file, GFP_KERNEL);
if (iocb == NULL)
return -ENOMEM;
iocb->localio = localio;
nfs_local_pgio_init(hdr, call_ops);
hdr->res.eof = false;
if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
iocb->aio_complete_work = nfs_local_read_aio_complete_work;
}
INIT_WORK(&iocb->work, nfs_local_call_read);
queue_work(nfslocaliod_workqueue, &iocb->work);
@@ -536,6 +591,24 @@ nfs_local_write_done(struct nfs_local_kiocb *iocb, long status)
nfs_local_pgio_done(hdr, status);
}
static void nfs_local_write_aio_complete_work(struct work_struct *work)
{
struct nfs_local_kiocb *iocb =
container_of(work, struct nfs_local_kiocb, work);
nfs_local_vfs_getattr(iocb);
nfs_local_pgio_release(iocb);
}
static void nfs_local_write_aio_complete(struct kiocb *kiocb, long ret)
{
struct nfs_local_kiocb *iocb =
container_of(kiocb, struct nfs_local_kiocb, kiocb);
nfs_local_write_done(iocb, ret);
nfs_local_pgio_aio_complete(iocb); /* Calls nfs_local_write_aio_complete_work */
}
static void nfs_local_call_write(struct work_struct *work)
{
struct nfs_local_kiocb *iocb =
@@ -554,14 +627,15 @@ static void nfs_local_call_write(struct work_struct *work)
file_start_write(filp);
status = filp->f_op->write_iter(&iocb->kiocb, &iter);
file_end_write(filp);
WARN_ON_ONCE(status == -EIOCBQUEUED);
nfs_local_write_done(iocb, status);
nfs_local_vfs_getattr(iocb);
nfs_local_pgio_release(iocb);
revert_creds(save_cred);
current->flags = old_flags;
if (status != -EIOCBQUEUED) {
nfs_local_write_done(iocb, status);
nfs_local_vfs_getattr(iocb);
nfs_local_pgio_release(iocb);
}
}
static int
@@ -570,14 +644,20 @@ nfs_do_local_write(struct nfs_pgio_header *hdr,
const struct rpc_call_ops *call_ops)
{
struct nfs_local_kiocb *iocb;
struct file *file = nfs_to->nfsd_file_file(localio);
/* Don't support filesystems without write_iter */
if (!file->f_op->write_iter)
return -EAGAIN;
dprintk("%s: vfs_write count=%u pos=%llu %s\n",
__func__, hdr->args.count, hdr->args.offset,
(hdr->args.stable == NFS_UNSTABLE) ? "unstable" : "stable");
iocb = nfs_local_iocb_alloc(hdr, localio, GFP_NOIO);
iocb = nfs_local_iocb_alloc(hdr, file, GFP_NOIO);
if (iocb == NULL)
return -ENOMEM;
iocb->localio = localio;
switch (hdr->args.stable) {
default:
@@ -588,10 +668,16 @@ nfs_do_local_write(struct nfs_pgio_header *hdr,
case NFS_FILE_SYNC:
iocb->kiocb.ki_flags |= IOCB_DSYNC|IOCB_SYNC;
}
nfs_local_pgio_init(hdr, call_ops);
nfs_set_local_verifier(hdr->inode, hdr->res.verf, hdr->args.stable);
if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
iocb->kiocb.ki_complete = nfs_local_write_aio_complete;
iocb->aio_complete_work = nfs_local_write_aio_complete_work;
}
INIT_WORK(&iocb->work, nfs_local_call_write);
queue_work(nfslocaliod_workqueue, &iocb->work);
@@ -603,16 +689,9 @@ int nfs_local_doio(struct nfs_client *clp, struct nfsd_file *localio,
const struct rpc_call_ops *call_ops)
{
int status = 0;
struct file *filp = nfs_to->nfsd_file_file(localio);
if (!hdr->args.count)
return 0;
/* Don't support filesystems without read_iter/write_iter */
if (!filp->f_op->read_iter || !filp->f_op->write_iter) {
nfs_local_disable(clp);
status = -EAGAIN;
goto out;
}
switch (hdr->rw_mode) {
case FMODE_READ:
@@ -626,8 +705,10 @@ int nfs_local_doio(struct nfs_client *clp, struct nfsd_file *localio,
hdr->rw_mode);
status = -EINVAL;
}
out:
if (status != 0) {
if (status == -EAGAIN)
nfs_local_disable(clp);
nfs_to_nfsd_file_put_local(localio);
hdr->task.tk_status = status;
nfs_local_hdr_release(hdr, call_ops);


@@ -112,6 +112,7 @@ static int nfs42_proc_fallocate(struct rpc_message *msg, struct file *filep,
exception.inode = inode;
exception.state = lock->open_context->state;
nfs_file_block_o_direct(NFS_I(inode));
err = nfs_sync_inode(inode);
if (err)
goto out;
@@ -355,6 +356,7 @@ static ssize_t _nfs42_proc_copy(struct file *src,
return status;
}
nfs_file_block_o_direct(NFS_I(dst_inode));
status = nfs_sync_inode(dst_inode);
if (status)
return status;


@@ -283,9 +283,11 @@ static loff_t nfs42_remap_file_range(struct file *src_file, loff_t src_off,
/* flush all pending writes on both src and dst so that the server
* has the latest data */
nfs_file_block_o_direct(NFS_I(src_inode));
ret = nfs_sync_inode(src_inode);
if (ret)
goto out_unlock;
nfs_file_block_o_direct(NFS_I(dst_inode));
ret = nfs_sync_inode(dst_inode);
if (ret)
goto out_unlock;


@@ -3989,8 +3989,10 @@ static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *f
res.attr_bitmask[2];
}
memcpy(server->attr_bitmask, res.attr_bitmask, sizeof(server->attr_bitmask));
server->caps &= ~(NFS_CAP_ACLS | NFS_CAP_HARDLINKS |
NFS_CAP_SYMLINKS| NFS_CAP_SECURITY_LABEL);
server->caps &=
~(NFS_CAP_ACLS | NFS_CAP_HARDLINKS | NFS_CAP_SYMLINKS |
NFS_CAP_SECURITY_LABEL | NFS_CAP_FS_LOCATIONS |
NFS_CAP_OPEN_XOR | NFS_CAP_DELEGTIME);
server->fattr_valid = NFS_ATTR_FATTR_V4;
if (res.attr_bitmask[0] & FATTR4_WORD0_ACL &&
res.acl_bitmask & ACL4_SUPPORT_ALLOW_ACL)
@@ -4064,7 +4066,6 @@ int nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *fhandle)
};
int err;
nfs_server_set_init_caps(server);
do {
err = nfs4_handle_exception(server,
_nfs4_server_capabilities(server, fhandle),


@@ -2058,6 +2058,7 @@ int nfs_wb_folio_cancel(struct inode *inode, struct folio *folio)
* release it */
nfs_inode_remove_request(req);
nfs_unlock_and_release_request(req);
folio_cancel_dirty(folio);
}
return ret;


@@ -696,6 +696,8 @@ out:
* it not only handles the fiemap for inlined files, but also deals
* with the fast symlink, cause they have no difference for extent
* mapping per se.
*
* Must be called with ip_alloc_sem semaphore held.
*/
static int ocfs2_fiemap_inline(struct inode *inode, struct buffer_head *di_bh,
struct fiemap_extent_info *fieinfo,
@@ -707,6 +709,7 @@ static int ocfs2_fiemap_inline(struct inode *inode, struct buffer_head *di_bh,
u64 phys;
u32 flags = FIEMAP_EXTENT_DATA_INLINE|FIEMAP_EXTENT_LAST;
struct ocfs2_inode_info *oi = OCFS2_I(inode);
lockdep_assert_held_read(&oi->ip_alloc_sem);
di = (struct ocfs2_dinode *)di_bh->b_data;
if (ocfs2_inode_is_fast_symlink(inode))
@@ -722,8 +725,11 @@ static int ocfs2_fiemap_inline(struct inode *inode, struct buffer_head *di_bh,
phys += offsetof(struct ocfs2_dinode,
id2.i_data.id_data);
/* Release the ip_alloc_sem to prevent deadlock on page fault */
up_read(&OCFS2_I(inode)->ip_alloc_sem);
ret = fiemap_fill_next_extent(fieinfo, 0, phys, id_count,
flags);
down_read(&OCFS2_I(inode)->ip_alloc_sem);
if (ret < 0)
return ret;
}
@@ -792,9 +798,11 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
len_bytes = (u64)le16_to_cpu(rec.e_leaf_clusters) << osb->s_clustersize_bits;
phys_bytes = le64_to_cpu(rec.e_blkno) << osb->sb->s_blocksize_bits;
virt_bytes = (u64)le32_to_cpu(rec.e_cpos) << osb->s_clustersize_bits;
/* Release the ip_alloc_sem to prevent deadlock on page fault */
up_read(&OCFS2_I(inode)->ip_alloc_sem);
ret = fiemap_fill_next_extent(fieinfo, virt_bytes, phys_bytes,
len_bytes, fe_flags);
down_read(&OCFS2_I(inode)->ip_alloc_sem);
if (ret)
break;
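The reason for dropping ip_alloc_sem around fiemap_fill_next_extent() in both hunks: the helper copies the extent record into a user buffer, and if that buffer's page is not resident, the resulting page fault re-enters ocfs2, whose fault path can take ip_alloc_sem again while it is already held here. A schematic sketch of the hazard and the resulting pattern (the call-chain comment is an assumption-level summary, not literal code):

/* Holding ip_alloc_sem while copying to user memory can recurse:
 *
 *   ocfs2_fiemap()                  down_read(ip_alloc_sem)
 *     fiemap_fill_next_extent()     copy to user buffer
 *       -> page fault on that buffer
 *          -> ocfs2 fault path      down_read(ip_alloc_sem) again
 *             (deadlocks if a writer queued in between)
 *
 * Hence the dance in the hunks above: */
up_read(&OCFS2_I(inode)->ip_alloc_sem);		/* drop before faulting */
ret = fiemap_fill_next_extent(fieinfo, virt_bytes, phys_bytes,
			      len_bytes, fe_flags);
down_read(&OCFS2_I(inode)->ip_alloc_sem);	/* retake for the next extent */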


@@ -388,6 +388,7 @@ struct proc_dir_entry *proc_register(struct proc_dir_entry *dir,
if (proc_alloc_inum(&dp->low_ino))
goto out_free_entry;
if (!S_ISDIR(dp->mode))
pde_set_flags(dp);
write_lock(&proc_subdir_lock);


@@ -18,23 +18,42 @@
#define KASAN_ABI_VERSION 5
/*
* Clang 22 added preprocessor macros to match GCC, in hopes of eventually
* dropping __has_feature support for sanitizers:
* https://github.com/llvm/llvm-project/commit/568c23bbd3303518c5056d7f03444dae4fdc8a9c
* Create these macros for older versions of clang so that it is easy to clean
* up once the minimum supported version of LLVM for building the kernel always
* creates these macros.
*
* Note: __has_feature(*_sanitizer) is only true if the feature is
* enabled, so there is no need to additionally check defined(CONFIG_*)
* to avoid adding redundant attributes in other configurations.
*/
#if __has_feature(address_sanitizer) || __has_feature(hwaddress_sanitizer)
/* Emulate GCC's __SANITIZE_ADDRESS__ flag */
#if __has_feature(address_sanitizer) && !defined(__SANITIZE_ADDRESS__)
#define __SANITIZE_ADDRESS__
#endif
#if __has_feature(hwaddress_sanitizer) && !defined(__SANITIZE_HWADDRESS__)
#define __SANITIZE_HWADDRESS__
#endif
#if __has_feature(thread_sanitizer) && !defined(__SANITIZE_THREAD__)
#define __SANITIZE_THREAD__
#endif
/*
* Treat __SANITIZE_HWADDRESS__ the same as __SANITIZE_ADDRESS__ in the kernel.
*/
#ifdef __SANITIZE_HWADDRESS__
#define __SANITIZE_ADDRESS__
#endif
#ifdef __SANITIZE_ADDRESS__
#define __no_sanitize_address \
__attribute__((no_sanitize("address", "hwaddress")))
#else
#define __no_sanitize_address
#endif
#if __has_feature(thread_sanitizer)
/* emulate gcc's __SANITIZE_THREAD__ flag */
#define __SANITIZE_THREAD__
#ifdef __SANITIZE_THREAD__
#define __no_sanitize_thread \
__attribute__((no_sanitize("thread")))
#else

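For context, a hedged example of what these attribute macros are for (the function below is hypothetical): code that intentionally touches memory KASAN considers poisoned, such as low-level allocator internals, must be compiled without instrumentation, and under Clang that means suppressing both the address and hwaddress variants, which is exactly what the unified definition above provides.

/* Hypothetical helper: scans raw memory that may include KASAN
 * redzones, so it must opt out of ASAN/HWASAN instrumentation. */
static __no_sanitize_address bool any_byte_set(const u8 *p, size_t n)
{
	while (n--)
		if (*p++)
			return true;
	return false;
}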

@@ -1198,10 +1198,18 @@ extern int send_sigurg(struct file *file);
/* These flags relate to encoding and casefolding */
#define SB_ENC_STRICT_MODE_FL (1 << 0)
#define SB_ENC_NO_COMPAT_FALLBACK_FL (1 << 1)
#define sb_has_strict_encoding(sb) \
(sb->s_encoding_flags & SB_ENC_STRICT_MODE_FL)
#if IS_ENABLED(CONFIG_UNICODE)
#define sb_no_casefold_compat_fallback(sb) \
(sb->s_encoding_flags & SB_ENC_NO_COMPAT_FALLBACK_FL)
#else
#define sb_no_casefold_compat_fallback(sb) (1)
#endif
/*
* Umount options
*/


@@ -1637,6 +1637,7 @@ enum {
NFS_IOHDR_RESEND_PNFS,
NFS_IOHDR_RESEND_MDS,
NFS_IOHDR_UNSTABLE_WRITES,
NFS_IOHDR_ODIRECT,
};
struct nfs_io_completion;

include/linux/pgalloc.h (new file, 29 lines)

@@ -0,0 +1,29 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_PGALLOC_H
#define _LINUX_PGALLOC_H
#include <linux/pgtable.h>
#include <asm/pgalloc.h>
/*
* {pgd,p4d}_populate_kernel() are defined as macros to allow
* compile-time optimization based on the configured page table levels.
* Without this, linking may fail because callers (e.g., KASAN) may rely
* on calls to these functions being optimized away when passing symbols
* that exist only for certain page table levels.
*/
#define pgd_populate_kernel(addr, pgd, p4d) \
do { \
pgd_populate(&init_mm, pgd, p4d); \
if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED) \
arch_sync_kernel_mappings(addr, addr); \
} while (0)
#define p4d_populate_kernel(addr, p4d, pud) \
do { \
p4d_populate(&init_mm, p4d, pud); \
if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_P4D_MODIFIED) \
arch_sync_kernel_mappings(addr, addr); \
} while (0)
#endif /* _LINUX_PGALLOC_H */
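A hedged usage sketch (the walker below is hypothetical): early-boot or memory-hotplug code wiring a new upper-level table into init_mm would call the _kernel variants so that, on architectures that keep separate copies of the kernel page tables, the update is propagated via arch_sync_kernel_mappings():

#include <linux/mm.h>
#include <linux/pgalloc.h>

/* Hypothetical: ensure a P4D table exists for a kernel address. */
static int example_ensure_p4d(unsigned long addr)
{
	pgd_t *pgd = pgd_offset_k(addr);

	if (pgd_none(*pgd)) {
		p4d_t *p4d = (p4d_t *)get_zeroed_page(GFP_KERNEL);

		if (!p4d)
			return -ENOMEM;
		/* Also syncs other page tables when the arch requests it
		 * via ARCH_PAGE_TABLE_SYNC_MASK (see pgtable.h below). */
		pgd_populate_kernel(addr, pgd, p4d);
	}
	return 0;
}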


@@ -1699,8 +1699,8 @@ static inline int pmd_protnone(pmd_t pmd)
/*
* Architectures can set this mask to a combination of PGTBL_P?D_MODIFIED values
* and let generic vmalloc and ioremap code know when arch_sync_kernel_mappings()
* needs to be called.
* and let generic vmalloc, ioremap and page table update code know when
* arch_sync_kernel_mappings() needs to be called.
*/
#ifndef ARCH_PAGE_TABLE_SYNC_MASK
#define ARCH_PAGE_TABLE_SYNC_MASK 0
@@ -1833,10 +1833,11 @@ static inline bool arch_has_pfn_modify_check(void)
/*
* Page Table Modification bits for pgtbl_mod_mask.
*
* These are used by the p?d_alloc_track*() set of functions an in the generic
* vmalloc/ioremap code to track at which page-table levels entries have been
* modified. Based on that the code can better decide when vmalloc and ioremap
* mapping changes need to be synchronized to other page-tables in the system.
* These are used by the p?d_alloc_track*() and p*d_populate_kernel()
* functions in the generic vmalloc, ioremap and page table update code
* to track at which page-table levels entries have been modified.
* Based on that the code can better decide when page table changes need
* to be synchronized to other page-tables in the system.
*/
#define __PGTBL_PGD_MODIFIED 0
#define __PGTBL_P4D_MODIFIED 1


@@ -459,19 +459,17 @@ struct nft_set_ext;
* control plane functions.
*/
struct nft_set_ops {
bool (*lookup)(const struct net *net,
	       const struct nft_set *set,
	       const u32 *key,
	       const struct nft_set_ext **ext);
const struct nft_set_ext * (*lookup)(const struct net *net,
				     const struct nft_set *set,
				     const u32 *key);
bool (*update)(struct nft_set *set,
	       const u32 *key,
	       struct nft_elem_priv *(*new)(struct nft_set *,
					    const struct nft_expr *,
					    struct nft_regs *),
	       const struct nft_expr *expr,
	       struct nft_regs *regs,
	       const struct nft_set_ext **ext);
const struct nft_set_ext * (*update)(struct nft_set *set,
				     const u32 *key,
				     struct nft_elem_priv *(*new)(struct nft_set *,
								  const struct nft_expr *,
								  struct nft_regs *),
				     const struct nft_expr *expr,
				     struct nft_regs *regs);
bool (*delete)(const struct nft_set *set,
const u32 *key);
@@ -1911,7 +1909,6 @@ struct nftables_pernet {
struct mutex commit_mutex;
u64 table_handle;
u64 tstamp;
unsigned int base_seq;
unsigned int gc_seq;
u8 validate_state;
struct work_struct destroy_work;


@@ -94,34 +94,35 @@ extern const struct nft_set_type nft_set_pipapo_type;
extern const struct nft_set_type nft_set_pipapo_avx2_type;
#ifdef CONFIG_MITIGATION_RETPOLINE
bool nft_rhash_lookup(const struct net *net, const struct nft_set *set,
const u32 *key, const struct nft_set_ext **ext);
bool nft_rbtree_lookup(const struct net *net, const struct nft_set *set,
const u32 *key, const struct nft_set_ext **ext);
bool nft_bitmap_lookup(const struct net *net, const struct nft_set *set,
const u32 *key, const struct nft_set_ext **ext);
bool nft_hash_lookup_fast(const struct net *net,
const struct nft_set *set,
const u32 *key, const struct nft_set_ext **ext);
bool nft_hash_lookup(const struct net *net, const struct nft_set *set,
const u32 *key, const struct nft_set_ext **ext);
bool nft_set_do_lookup(const struct net *net, const struct nft_set *set,
const u32 *key, const struct nft_set_ext **ext);
#else
static inline bool
nft_set_do_lookup(const struct net *net, const struct nft_set *set,
const u32 *key, const struct nft_set_ext **ext)
{
return set->ops->lookup(net, set, key, ext);
}
const struct nft_set_ext *
nft_rhash_lookup(const struct net *net, const struct nft_set *set,
const u32 *key);
const struct nft_set_ext *
nft_rbtree_lookup(const struct net *net, const struct nft_set *set,
const u32 *key);
const struct nft_set_ext *
nft_bitmap_lookup(const struct net *net, const struct nft_set *set,
const u32 *key);
const struct nft_set_ext *
nft_hash_lookup_fast(const struct net *net, const struct nft_set *set,
const u32 *key);
const struct nft_set_ext *
nft_hash_lookup(const struct net *net, const struct nft_set *set,
const u32 *key);
#endif
const struct nft_set_ext *
nft_set_do_lookup(const struct net *net, const struct nft_set *set,
const u32 *key);
/* called from nft_pipapo_avx2.c */
bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
const u32 *key, const struct nft_set_ext **ext);
const struct nft_set_ext *
nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
const u32 *key);
/* called from nft_set_pipapo.c */
bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
const u32 *key, const struct nft_set_ext **ext);
const struct nft_set_ext *
nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
const u32 *key);
void nft_counter_init_seqcount(void);
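The datapath effect of the new prototypes, as a hedged before/after sketch of a call site (illustrative, not a literal hunk from this series):

static bool example_lookup(const struct net *net, const struct nft_set *set,
			   const u32 *key)
{
	const struct nft_set_ext *ext;

	/* Old shape: bool return plus a **ext out-parameter:
	 *	if (!nft_set_do_lookup(net, set, key, &ext))
	 *		return false;
	 * New shape: the extension pointer itself signals hit/miss. */
	ext = nft_set_do_lookup(net, set, key);
	return ext != NULL;
}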


@@ -3,6 +3,7 @@
#define _NETNS_NFTABLES_H_
struct netns_nftables {
unsigned int base_seq;
u8 gencursor;
};


@@ -114,10 +114,11 @@ DEFINE_EVENT(dma_unmap, dma_unmap_resource,
enum dma_data_direction dir, unsigned long attrs),
TP_ARGS(dev, addr, size, dir, attrs));
TRACE_EVENT(dma_alloc,
DECLARE_EVENT_CLASS(dma_alloc_class,
TP_PROTO(struct device *dev, void *virt_addr, dma_addr_t dma_addr,
size_t size, gfp_t flags, unsigned long attrs),
TP_ARGS(dev, virt_addr, dma_addr, size, flags, attrs),
size_t size, enum dma_data_direction dir, gfp_t flags,
unsigned long attrs),
TP_ARGS(dev, virt_addr, dma_addr, size, dir, flags, attrs),
TP_STRUCT__entry(
__string(device, dev_name(dev))
@@ -125,6 +126,7 @@ TRACE_EVENT(dma_alloc,
__field(u64, dma_addr)
__field(size_t, size)
__field(gfp_t, flags)
__field(enum dma_data_direction, dir)
__field(unsigned long, attrs)
),
@@ -137,8 +139,9 @@ TRACE_EVENT(dma_alloc,
__entry->attrs = attrs;
),
TP_printk("%s dma_addr=%llx size=%zu virt_addr=%p flags=%s attrs=%s",
TP_printk("%s dir=%s dma_addr=%llx size=%zu virt_addr=%p flags=%s attrs=%s",
__get_str(device),
decode_dma_data_direction(__entry->dir),
__entry->dma_addr,
__entry->size,
__entry->virt_addr,
@@ -146,16 +149,69 @@ TRACE_EVENT(dma_alloc,
decode_dma_attrs(__entry->attrs))
);
TRACE_EVENT(dma_free,
#define DEFINE_ALLOC_EVENT(name) \
DEFINE_EVENT(dma_alloc_class, name, \
TP_PROTO(struct device *dev, void *virt_addr, dma_addr_t dma_addr, \
size_t size, enum dma_data_direction dir, gfp_t flags, \
unsigned long attrs), \
TP_ARGS(dev, virt_addr, dma_addr, size, dir, flags, attrs))
DEFINE_ALLOC_EVENT(dma_alloc);
DEFINE_ALLOC_EVENT(dma_alloc_pages);
DEFINE_ALLOC_EVENT(dma_alloc_sgt_err);
TRACE_EVENT(dma_alloc_sgt,
TP_PROTO(struct device *dev, struct sg_table *sgt, size_t size,
enum dma_data_direction dir, gfp_t flags, unsigned long attrs),
TP_ARGS(dev, sgt, size, dir, flags, attrs),
TP_STRUCT__entry(
__string(device, dev_name(dev))
__dynamic_array(u64, phys_addrs, sgt->orig_nents)
__field(u64, dma_addr)
__field(size_t, size)
__field(enum dma_data_direction, dir)
__field(gfp_t, flags)
__field(unsigned long, attrs)
),
TP_fast_assign(
struct scatterlist *sg;
int i;
__assign_str(device);
for_each_sg(sgt->sgl, sg, sgt->orig_nents, i)
((u64 *)__get_dynamic_array(phys_addrs))[i] = sg_phys(sg);
__entry->dma_addr = sg_dma_address(sgt->sgl);
__entry->size = size;
__entry->dir = dir;
__entry->flags = flags;
__entry->attrs = attrs;
),
TP_printk("%s dir=%s dma_addr=%llx size=%zu phys_addrs=%s flags=%s attrs=%s",
__get_str(device),
decode_dma_data_direction(__entry->dir),
__entry->dma_addr,
__entry->size,
__print_array(__get_dynamic_array(phys_addrs),
__get_dynamic_array_len(phys_addrs) /
sizeof(u64), sizeof(u64)),
show_gfp_flags(__entry->flags),
decode_dma_attrs(__entry->attrs))
);
DECLARE_EVENT_CLASS(dma_free_class,
TP_PROTO(struct device *dev, void *virt_addr, dma_addr_t dma_addr,
size_t size, unsigned long attrs),
TP_ARGS(dev, virt_addr, dma_addr, size, attrs),
size_t size, enum dma_data_direction dir, unsigned long attrs),
TP_ARGS(dev, virt_addr, dma_addr, size, dir, attrs),
TP_STRUCT__entry(
__string(device, dev_name(dev))
__field(void *, virt_addr)
__field(u64, dma_addr)
__field(size_t, size)
__field(enum dma_data_direction, dir)
__field(unsigned long, attrs)
),
@@ -164,17 +220,63 @@ TRACE_EVENT(dma_free,
__entry->virt_addr = virt_addr;
__entry->dma_addr = dma_addr;
__entry->size = size;
__entry->dir = dir;
__entry->attrs = attrs;
),
TP_printk("%s dma_addr=%llx size=%zu virt_addr=%p attrs=%s",
TP_printk("%s dir=%s dma_addr=%llx size=%zu virt_addr=%p attrs=%s",
__get_str(device),
decode_dma_data_direction(__entry->dir),
__entry->dma_addr,
__entry->size,
__entry->virt_addr,
decode_dma_attrs(__entry->attrs))
);
#define DEFINE_FREE_EVENT(name) \
DEFINE_EVENT(dma_free_class, name, \
TP_PROTO(struct device *dev, void *virt_addr, dma_addr_t dma_addr, \
size_t size, enum dma_data_direction dir, unsigned long attrs), \
TP_ARGS(dev, virt_addr, dma_addr, size, dir, attrs))
DEFINE_FREE_EVENT(dma_free);
DEFINE_FREE_EVENT(dma_free_pages);
TRACE_EVENT(dma_free_sgt,
TP_PROTO(struct device *dev, struct sg_table *sgt, size_t size,
enum dma_data_direction dir),
TP_ARGS(dev, sgt, size, dir),
TP_STRUCT__entry(
__string(device, dev_name(dev))
__dynamic_array(u64, phys_addrs, sgt->orig_nents)
__field(u64, dma_addr)
__field(size_t, size)
__field(enum dma_data_direction, dir)
),
TP_fast_assign(
struct scatterlist *sg;
int i;
__assign_str(device);
for_each_sg(sgt->sgl, sg, sgt->orig_nents, i)
((u64 *)__get_dynamic_array(phys_addrs))[i] = sg_phys(sg);
__entry->dma_addr = sg_dma_address(sgt->sgl);
__entry->size = size;
__entry->dir = dir;
),
TP_printk("%s dir=%s dma_addr=%llx size=%zu phys_addrs=%s",
__get_str(device),
decode_dma_data_direction(__entry->dir),
__entry->dma_addr,
__entry->size,
__print_array(__get_dynamic_array(phys_addrs),
__get_dynamic_array_len(phys_addrs) /
sizeof(u64), sizeof(u64)))
);
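The refactor above leans on the tracepoint class machinery: DECLARE_EVENT_CLASS() describes the record layout and print format once, and each DEFINE_EVENT() (wrapped here in DEFINE_ALLOC_EVENT/DEFINE_FREE_EVENT) stamps out a tracepoint that reuses it, so dma_alloc, dma_alloc_pages and dma_alloc_sgt_err share one layout. Each definition generates a trace_<name>() call; an illustrative call site, with arguments assumed from the class prototype above:

/* Illustrative only: emit the dma_alloc_pages event defined above
 * (real call sites live in the kernel/dma code that this series
 * instruments). */
trace_dma_alloc_pages(dev, page_address(page), dma_addr,
		      size, dir, gfp, attrs);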
TRACE_EVENT(dma_map_sg,
TP_PROTO(struct device *dev, struct scatterlist *sgl, int nents,
int ents, enum dma_data_direction dir, unsigned long attrs),
@@ -221,6 +323,41 @@ TRACE_EVENT(dma_map_sg,
decode_dma_attrs(__entry->attrs))
);
TRACE_EVENT(dma_map_sg_err,
TP_PROTO(struct device *dev, struct scatterlist *sgl, int nents,
int err, enum dma_data_direction dir, unsigned long attrs),
TP_ARGS(dev, sgl, nents, err, dir, attrs),
TP_STRUCT__entry(
__string(device, dev_name(dev))
__dynamic_array(u64, phys_addrs, nents)
__field(int, err)
__field(enum dma_data_direction, dir)
__field(unsigned long, attrs)
),
TP_fast_assign(
struct scatterlist *sg;
int i;
__assign_str(device);
for_each_sg(sgl, sg, nents, i)
((u64 *)__get_dynamic_array(phys_addrs))[i] = sg_phys(sg);
__entry->err = err;
__entry->dir = dir;
__entry->attrs = attrs;
),
TP_printk("%s dir=%s dma_addrs=%s err=%d attrs=%s",
__get_str(device),
decode_dma_data_direction(__entry->dir),
__print_array(__get_dynamic_array(phys_addrs),
__get_dynamic_array_len(phys_addrs) /
sizeof(u64), sizeof(u64)),
__entry->err,
decode_dma_attrs(__entry->attrs))
);
TRACE_EVENT(dma_unmap_sg,
TP_PROTO(struct device *dev, struct scatterlist *sgl, int nents,
enum dma_data_direction dir, unsigned long attrs),


@@ -12,31 +12,33 @@
/**
* enum mptcp_event_type
* @MPTCP_EVENT_UNSPEC: unused event
* @MPTCP_EVENT_CREATED: token, family, saddr4 | saddr6, daddr4 | daddr6,
* sport, dport A new MPTCP connection has been created. It is the good time
* to allocate memory and send ADD_ADDR if needed. Depending on the
* @MPTCP_EVENT_CREATED: A new MPTCP connection has been created. It is the
* good time to allocate memory and send ADD_ADDR if needed. Depending on the
* traffic-patterns it can take a long time until the MPTCP_EVENT_ESTABLISHED
* is sent.
* @MPTCP_EVENT_ESTABLISHED: token, family, saddr4 | saddr6, daddr4 | daddr6,
* sport, dport A MPTCP connection is established (can start new subflows).
* @MPTCP_EVENT_CLOSED: token A MPTCP connection has stopped.
* @MPTCP_EVENT_ANNOUNCED: token, rem_id, family, daddr4 | daddr6 [, dport] A
* new address has been announced by the peer.
* @MPTCP_EVENT_REMOVED: token, rem_id An address has been lost by the peer.
* @MPTCP_EVENT_SUB_ESTABLISHED: token, family, loc_id, rem_id, saddr4 |
* saddr6, daddr4 | daddr6, sport, dport, backup, if_idx [, error] A new
* subflow has been established. 'error' should not be set.
* @MPTCP_EVENT_SUB_CLOSED: token, family, loc_id, rem_id, saddr4 | saddr6,
* daddr4 | daddr6, sport, dport, backup, if_idx [, error] A subflow has been
* closed. An error (copy of sk_err) could be set if an error has been
* detected for this subflow.
* @MPTCP_EVENT_SUB_PRIORITY: token, family, loc_id, rem_id, saddr4 | saddr6,
* daddr4 | daddr6, sport, dport, backup, if_idx [, error] The priority of a
* subflow has changed. 'error' should not be set.
* @MPTCP_EVENT_LISTENER_CREATED: family, sport, saddr4 | saddr6 A new PM
* listener is created.
* @MPTCP_EVENT_LISTENER_CLOSED: family, sport, saddr4 | saddr6 A PM listener
* is closed.
* is sent. Attributes: token, family, saddr4 | saddr6, daddr4 | daddr6,
* sport, dport, server-side.
* @MPTCP_EVENT_ESTABLISHED: A MPTCP connection is established (can start new
* subflows). Attributes: token, family, saddr4 | saddr6, daddr4 | daddr6,
* sport, dport, server-side.
* @MPTCP_EVENT_CLOSED: A MPTCP connection has stopped. Attribute: token.
* @MPTCP_EVENT_ANNOUNCED: A new address has been announced by the peer.
* Attributes: token, rem_id, family, daddr4 | daddr6 [, dport].
* @MPTCP_EVENT_REMOVED: An address has been lost by the peer. Attributes:
* token, rem_id.
* @MPTCP_EVENT_SUB_ESTABLISHED: A new subflow has been established. 'error'
* should not be set. Attributes: token, family, loc_id, rem_id, saddr4 |
* saddr6, daddr4 | daddr6, sport, dport, backup, if-idx [, error].
* @MPTCP_EVENT_SUB_CLOSED: A subflow has been closed. An error (copy of
* sk_err) could be set if an error has been detected for this subflow.
* Attributes: token, family, loc_id, rem_id, saddr4 | saddr6, daddr4 |
* daddr6, sport, dport, backup, if-idx [, error].
* @MPTCP_EVENT_SUB_PRIORITY: The priority of a subflow has changed. 'error'
* should not be set. Attributes: token, family, loc_id, rem_id, saddr4 |
* saddr6, daddr4 | daddr6, sport, dport, backup, if-idx [, error].
* @MPTCP_EVENT_LISTENER_CREATED: A new PM listener is created. Attributes:
* family, sport, saddr4 | saddr6.
* @MPTCP_EVENT_LISTENER_CLOSED: A PM listener is closed. Attributes: family,
* sport, saddr4 | saddr6.
*/
enum mptcp_event_type {
MPTCP_EVENT_UNSPEC,
