linux-kernelorg-stable/net
Linus Torvalds c5bfc48d54 vfs-6.16-rc1.coredump

Merge tag 'vfs-6.16-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull coredump updates from Christian Brauner:
 "This adds support for sending coredumps over an AF_UNIX socket. It
  also makes (implicit) use of the new SO_PEERPIDFD ability to hand out
  pidfds for reaped peer tasks

  The new coredump socket will allow userspace to not have to rely on
  usermode helpers for processing coredumps and provides a safe way to
  handle them instead of relying on super privileged coredumping helpers

  This will also be significantly more lightweight since the kernel
  doesn't have to do a fork()+exec() for each crashing process to spawn
  a usermodehelper. Instead the kernel just connects to the AF_UNIX
  socket and userspace can process it concurrently however it sees fit.
  Support for userspace is incoming starting with systemd-coredump

  There's more work coming in that direction next cycle. The rest below
  goes into some details and background

  Coredumping currently supports two modes:

   (1) Dumping directly into a file somewhere on the filesystem.

   (2) Dumping into a pipe connected to a usermode helper process
       spawned as a child of the system_unbound_wq or kthreadd

  For simplicity I'm mostly ignoring (1). There are probably still some
  users of (1) out there, but processing coredumps in this way can be
  considered adventurous, especially in the face of set*id binaries

  The most common option should be (2) by now. It works by allowing
  userspace to put a string into /proc/sys/kernel/core_pattern like:

          |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h

  The "|" at the beginning indicates to the kernel that a pipe must be
  used. The path following the pipe indicator is a path to a binary that
  will be spawned as a usermode helper process. Any additional
  parameters pass information about the task that is generating the
  coredump to the binary that processes the coredump
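
The pipe form described above can be illustrated with a small sketch that
splits such a core_pattern into the helper path and its argument templates,
then fills in the % specifiers. The function names and the sample values are
hypothetical, and the kernel's own expansion may differ in details (for
example in how it treats unknown specifiers); this is only meant to make the
shape of the string concrete.

```python
def parse_pipe_pattern(core_pattern):
    """Split a pipe-style core_pattern into (helper_path, argument_templates)."""
    if not core_pattern.startswith("|"):
        raise ValueError("not a pipe core_pattern")
    # Whitespace separates the helper path from its argument templates.
    helper, *templates = core_pattern[1:].split()
    return helper, templates


def expand_templates(templates, values):
    """Replace each %X specifier with values["X"].

    "%%" becomes a literal "%". Specifiers missing from `values` are kept
    verbatim here; that is a choice of this sketch, not kernel behaviour.
    """
    expanded = []
    for template in templates:
        out, i = [], 0
        while i < len(template):
            ch = template[i]
            if ch == "%" and i + 1 < len(template):
                spec = template[i + 1]
                if spec == "%":
                    out.append("%")
                else:
                    out.append(str(values.get(spec, "%" + spec)))
                i += 2
            else:
                out.append(ch)
                i += 1
        expanded.append("".join(out))
    return expanded


if __name__ == "__main__":
    pattern = "|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h"
    helper, templates = parse_pipe_pattern(pattern)
    # Hypothetical values for a crashing task (PID, UID, GID, signal).
    print(helper, expand_templates(templates, {"P": 1234, "u": 1000, "g": 1000, "s": 11}))
```

The specifier meanings themselves (PID, UID, GID, signal number, and so on)
are documented in core(5); the sketch does not try to reproduce them.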

  In the example the core_pattern shown causes the kernel to spawn
  systemd-coredump as a usermode helper. There are various conceptual
  consequences of this (non-exhaustive list):

   - systemd-coredump is spawned with file descriptor number 0 (stdin)
     connected to the read-end of the pipe. All other file descriptors
     are closed. That specifically includes 1 (stdout) and 2 (stderr).

     This has already caused bugs because userspace assumed that this
     cannot happen (Whether or not this is a sane assumption is
     irrelevant)

   - systemd-coredump will be spawned as a child of system_unbound_wq.
     So it is not a child of any userspace process and specifically not
     a child of PID 1. It cannot be waited upon and is a weird hybrid
     upcall, which is difficult for userspace to control correctly

   - systemd-coredump is spawned with full kernel privileges. This
     necessitates all kinds of weird privilege dropping exercises in
     userspace to make this safe

   - A new usermode helper has to be spawned for each crashing process
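
As an illustration of the first consequence in the list above, a pipe helper
that wants to be robust against starting with only fd 0 open could point the
missing descriptors at /dev/null before doing anything else. The following is
a minimal sketch under that assumption; `ensure_fds_open` is a hypothetical
name and not something systemd-coredump is known to use.

```python
import os


def ensure_fds_open(fds=(1, 2)):
    """Point every closed descriptor in `fds` at /dev/null.

    Returns the list of descriptors that had to be repaired.
    """
    repaired = []
    for fd in fds:
        try:
            os.fstat(fd)  # succeeds only if the descriptor is open
        except OSError:
            null_fd = os.open(os.devnull, os.O_RDWR)
            if null_fd != fd:
                # Move the /dev/null descriptor onto the wanted number.
                os.dup2(null_fd, fd)
                os.close(null_fd)
            repaired.append(fd)
    return repaired


if __name__ == "__main__":
    # In a normally started process nothing needs repairing.
    print(ensure_fds_open())
```

Doing this first means later opens cannot accidentally land on fd 1 or 2 and
receive output that was meant for stdout or stderr.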

  This adds a new mode:

   (3) Dumping into an AF_UNIX socket

  Userspace can set /proc/sys/kernel/core_pattern to:

          @/path/to/coredump.socket

  The "@" at the beginning indicates to the kernel that an AF_UNIX
  coredump socket will be used to process coredumps
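
The three modes are distinguished purely by the first character of
core_pattern, which a monitoring tool could check as sketched below. The
function names are hypothetical, and the sketch does no validation of the
socket path beyond stripping the prefix.

```python
def core_pattern_mode(core_pattern):
    """Return (mode, target) where mode is "pipe", "socket" or "file"."""
    pattern = core_pattern.strip()
    if pattern.startswith("|"):
        return "pipe", pattern[1:]    # helper path plus its arguments
    if pattern.startswith("@"):
        return "socket", pattern[1:]  # path of the AF_UNIX coredump socket
    return "file", pattern            # plain file name template


def current_mode(path="/proc/sys/kernel/core_pattern"):
    """Classify the core_pattern currently configured on this system."""
    with open(path) as f:
        return core_pattern_mode(f.read())


if __name__ == "__main__":
    print(core_pattern_mode("@/path/to/coredump.socket"))
```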

  The coredump socket must be located in the initial mount namespace.
  When a task coredumps it opens a client socket in the initial network
  namespace and connects to the coredump socket:

   - The coredump server uses SO_PEERPIDFD to get a stable handle on the
     connected crashing task. The retrieved pidfd will provide a stable
     reference even if the crashing task gets SIGKILLed while generating
     the coredump. Lacking such a handle is a huge attack vector right
     now

   - By setting core_pipe_limit non-zero userspace can guarantee that
     the crashing task cannot be reaped behind its back and thus
     process all necessary information in /proc/<pid>. The pidfd
     obtained via SO_PEERPIDFD can be used to detect whether /proc/<pid>
     still refers to the same process

     The core_pipe_limit isn't used to rate-limit connections to the
     socket. This can simply be done via the AF_UNIX socket directly

   - The pidfd for the crashing task will contain information about how
     the task coredumps. The PIDFD_GET_INFO ioctl gained a new flag
     PIDFD_INFO_COREDUMP which can be used to retrieve the coredump
     information

     If the coredump server gets a new coredump client connection the
     kernel guarantees that PIDFD_INFO_COREDUMP information is available.

     Currently the following information is provided in the new
     @coredump_mask extension to struct pidfd_info:

      * PIDFD_COREDUMPED is raised if the task did actually coredump

      * PIDFD_COREDUMP_SKIP is raised if the task skipped coredumping
        (e.g., undumpable)

      * PIDFD_COREDUMP_USER is raised if this is a regular coredump and
        doesn't need special care by the coredump server

      * PIDFD_COREDUMP_ROOT is raised if the generated coredump should
        be treated as sensitive and the coredump server should restrict
        access to the generated coredump to sufficiently privileged
        users"
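
The server side of the flow described above might look roughly like the
following sketch: listen on an AF_UNIX stream socket, accept one connection,
ask for the peer's pidfd with SO_PEERPIDFD, stream the dump to a sink, and
interpret a coredump_mask value. Everything here is an assumption for
illustration: the function names are hypothetical, the numeric constants are
what I believe the generic uapi headers define and should be checked against
<asm/socket.h> and <linux/pidfd.h> on the target, and the PIDFD_GET_INFO
ioctl call that would actually fetch the mask is deliberately left out.

```python
import os
import socket
import struct

# Assumed values; verify against the kernel uapi headers before relying on them.
SO_PEERPIDFD = 77  # asm-generic value; some architectures use another number
PIDFD_COREDUMPED = 1 << 0
PIDFD_COREDUMP_SKIP = 1 << 1
PIDFD_COREDUMP_USER = 1 << 2
PIDFD_COREDUMP_ROOT = 1 << 3

_FLAG_NAMES = [
    (PIDFD_COREDUMPED, "PIDFD_COREDUMPED"),
    (PIDFD_COREDUMP_SKIP, "PIDFD_COREDUMP_SKIP"),
    (PIDFD_COREDUMP_USER, "PIDFD_COREDUMP_USER"),
    (PIDFD_COREDUMP_ROOT, "PIDFD_COREDUMP_ROOT"),
]


def decode_coredump_mask(mask):
    """Return the names of the flags raised in a coredump_mask value."""
    return [name for bit, name in _FLAG_NAMES if mask & bit]


def listen_for_coredumps(path, backlog=16):
    """Create the listening socket that core_pattern's "@" path would name."""
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(path)
    listener.listen(backlog)
    return listener


def peer_pidfd(conn):
    """Return a pidfd for the connected peer, or None if unsupported."""
    try:
        raw = conn.getsockopt(socket.SOL_SOCKET, SO_PEERPIDFD, struct.calcsize("i"))
    except OSError:
        return None
    return struct.unpack("i", raw)[0]


def receive_one(listener, sink):
    """Accept one connection and copy everything it sends into `sink`.

    Returns the peer pidfd (the caller must close it) or None.
    """
    conn, _ = listener.accept()
    with conn:
        pidfd = peer_pidfd(conn)
        while chunk := conn.recv(65536):
            sink.write(chunk)
    return pidfd
```

A real server would query the pidfd with PIDFD_GET_INFO before trusting
anything under /proc/<pid>, and would restrict access to the written file
when PIDFD_COREDUMP_ROOT is raised.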

* tag 'vfs-6.16-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  mips, net: ensure that SOCK_COREDUMP is defined
  selftests/coredump: add tests for AF_UNIX coredumps
  selftests/pidfd: add PIDFD_INFO_COREDUMP infrastructure
  coredump: validate socket name as it is written
  coredump: show supported coredump modes
  pidfs, coredump: add PIDFD_INFO_COREDUMP
  coredump: add coredump socket
  coredump: reflow dump helpers a little
  coredump: massage do_coredump()
  coredump: massage format_corename()
2025-05-26 11:17:01 -07:00
..
6lowpan
9p
802
8021q
appletalk treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
atm treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
ax25 treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
batman-adv Here is a batman-adv bugfix: 2025-05-09 17:09:39 -07:00
bluetooth Bluetooth: L2CAP: Fix not checking l2cap_chan security level 2025-05-15 13:09:46 -04:00
bpf
bridge bridge: netfilter: Fix forwarding of fragmented packets 2025-05-16 16:02:06 -07:00
caif
can can: bcm: add missing rcu read protection for procfs content 2025-05-19 16:58:19 +02:00
ceph A small CephFS encryption-related fix and a dead code cleanup. 2025-04-25 15:51:28 -07:00
core vfs-6.16-rc1.pidfs 2025-05-26 10:30:02 -07:00
dcb
dccp tcp/dccp: remove icsk->icsk_ack.timeout 2025-03-25 10:34:33 -07:00
devlink
dns_resolver
dsa net: dsa: microchip: linearize skb for tail-tagging switches 2025-05-16 16:00:17 -07:00
ethernet
ethtool ethtool: cmis_cdb: use correct rpl size in ethtool_cmis_module_poll() 2025-04-11 18:41:19 -07:00
handshake
hsr net: hold instance lock during NETDEV_CHANGE 2025-04-07 11:13:39 -07:00
ieee802154
ife
ipv4 ipsec-2025-05-21 2025-05-22 11:49:53 +02:00
ipv6 ipsec-2025-05-21 2025-05-22 11:49:53 +02:00
iucv
kcm
key
l2tp
l3mdev net: fib_rules: Fix iif / oif matching on L3 master device 2025-04-15 17:54:56 -07:00
lapb treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
llc llc: fix data loss when reading from a socket in llc_ui_recvmsg() 2025-05-19 12:12:54 +01:00
mac80211 wifi: mac80211: Set n_channels after allocating struct cfg80211_scan_request 2025-05-15 13:20:33 +02:00
mac802154
mctp net: mctp: Ensure keys maintain only one ref to corresponding dev 2025-05-09 16:22:53 -07:00
mpls
mptcp mptcp: pm: Defer freeing of MPTCP userspace path manager entries 2025-04-23 16:27:58 -07:00
ncsi treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
netfilter netfilter: ipset: fix region locking in hash types 2025-05-07 23:57:31 +02:00
netlabel
netlink
netrom treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
nfc treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
nsh
openvswitch openvswitch: Fix unsafe attribute parsing in output_userspace() 2025-05-07 16:51:02 -07:00
packet treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
phonet
psample
qrtr
rds
rfkill
rose treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
rxrpc treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
sched sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue() 2025-05-22 11:16:44 +02:00
sctp Including fixes from netfilter. 2025-04-10 08:52:18 -07:00
shaper
smc smc: Fix lockdep false-positive for IPPROTO_SMC. 2025-04-11 14:14:26 -07:00
strparser
sunrpc vfs-6.16-rc1.async.dir 2025-05-26 08:02:43 -07:00
switchdev
tipc net/tipc: fix slab-use-after-free Read in tipc_aead_encrypt_done 2025-05-22 11:33:12 +02:00
tls net/tls: fix kernel panic when alloc_page failed 2025-05-15 07:40:51 -07:00
unix coredump: add coredump socket 2025-05-21 13:59:11 +02:00
vmw_vsock vsock: avoid timeout during connect() if the socket is closing 2025-04-02 17:19:30 -07:00
wireless wifi: cfg80211: fix out-of-bounds access during multi-link element defragmentation 2025-05-06 21:04:40 +02:00
x25 treewide: Switch/rename to timer_delete[_sync]() 2025-04-05 10:30:12 +02:00
xdp xsk: Bring back busy polling support in XDP_COPY 2025-05-21 10:28:23 +01:00
xfrm xfrm: Sanitize marks before insert 2025-05-14 07:18:58 +02:00
Kconfig
Kconfig.debug
Makefile
compat.c
devres.c
socket.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-03-26 09:32:10 -07:00
sysctl_net.c