linux-kernelorg-stable/net/ipv4
Vadim Fedorenko 6e17474aa9 net: fib: restore ECMP balance from loopback
Preferring a nexthop with a matching source address broke ECMP for
packets whose source addresses are not in the broadcast domain but are
instead assigned to loopback/dummy interfaces. The original behaviour
was to balance over the nexthops, whereas now the last nexthop in the
group is always used. To fix the issue, introduce a nexthop scoring
system in which nexthops whose source address equals the requested one
always have higher priority.
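
The scoring idea can be illustrated with a minimal userspace sketch. This
is not the code from fib_semantics.c: the struct, the function name and
the score weights (2 for a source address match, 1 for being the nexthop
picked by the flow hash) are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified nexthop; not the kernel's struct fib_nh. */
struct nh_sketch {
	uint32_t saddr;       /* preferred source address of this nexthop */
	int      upper_bound; /* nexthop covers flow hashes <= this value */
};

/*
 * Return the index of the highest-scoring nexthop (assumed weights):
 *   +2 if the nexthop's source address equals the requested one
 *   +1 if it is the first nexthop whose threshold covers the flow hash
 * Ties keep the earliest nexthop.
 */
static size_t select_nexthop(const struct nh_sketch *nh, size_t n,
			     uint32_t req_saddr, int hash)
{
	size_t best = 0;
	int best_score = -1;
	int hash_claimed = 0;

	assert(n > 0);

	for (size_t i = 0; i < n; i++) {
		int score = 0;

		if (req_saddr && nh[i].saddr == req_saddr)
			score += 2;
		if (!hash_claimed && hash <= nh[i].upper_bound) {
			score += 1;
			hash_claimed = 1;
		}
		if (score > best_score) {
			best = i;
			best_score = score;
		}
	}
	return best;
}
```

With these weights, a requested source address that matches no nexthop
(for example one assigned to dummy0) leaves the flow hash as the only
deciding factor, so traffic is spread over the group again, while a
matching source address always outranks the hash-selected nexthop.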

For the case with 198.51.100.1/32 assigned to dummy0 and routed via the
192.0.2.0/24 and 203.0.113.0/24 networks:

2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether d6:54:8a:ff:78:f5 brd ff:ff:ff:ff:ff:ff
    inet 198.51.100.1/32 scope global dummy0
       valid_lft forever preferred_lft forever
7: veth1@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 06:ed:98:87:6d:8a brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.0.2.2/24 scope global veth1
       valid_lft forever preferred_lft forever
    inet6 fe80::4ed:98ff:fe87:6d8a/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever
9: veth3@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ae:75:23:38:a0:d2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 203.0.113.2/24 scope global veth3
       valid_lft forever preferred_lft forever
    inet6 fe80::ac75:23ff:fe38:a0d2/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever

~ ip ro list:
default
	nexthop via 192.0.2.1 dev veth1 weight 1
	nexthop via 203.0.113.1 dev veth3 weight 1
192.0.2.0/24 dev veth1 proto kernel scope link src 192.0.2.2
203.0.113.0/24 dev veth3 proto kernel scope link src 203.0.113.2

before:
   for i in {1..255} ; do ip ro get 10.0.0.$i; done | grep veth | awk ' {print $(NF-2)}' | sort | uniq -c:
    255 veth3

after:
   for i in {1..255} ; do ip ro get 10.0.0.$i; done | grep veth | awk ' {print $(NF-2)}' | sort | uniq -c:
    122 veth1
    133 veth3

Fixes: 32607a332c ("ipv4: prefer multipath nexthop that matches source address")
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20251221192639.3911901-1-vadim.fedorenko@linux.dev
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-30 11:07:38 +01:00
netfilter netfilter: nf_reject: don't reply to icmp error messages 2025-09-11 15:40:55 +02:00
Kconfig tcp: Convert tcp-md5 to use MD5 library instead of crypto_ahash 2025-10-17 17:14:54 -07:00
Makefile
af_inet.c net: factor-out _sk_charge() helper 2025-11-24 19:49:40 -08:00
ah4.c
arp.c net: Convert struct sockaddr to fixed-size "sa_data[14]" 2025-11-04 19:10:33 -08:00
bpf_tcp_ca.c
cipso_ipv4.c ipv4: icmp: Pass IPv4 control block structure as an argument to __icmp_send() 2025-09-11 12:22:38 +02:00
datagram.c net: Convert proto callbacks from sockaddr to sockaddr_unsized 2025-11-04 19:10:33 -08:00
devinet.c ipv4: Fix NULL vs error pointer check in inet_blackhole_dev_init() 2025-09-03 16:58:44 -07:00
esp4.c tcp: Don't pass hashinfo to socket lookup helpers. 2025-08-25 17:53:35 -07:00
esp4_offload.c xfrm: Determine inner GSO type from packet inner protocol 2025-10-30 11:52:31 +01:00
fib_frontend.c ipv4: Convert ->flowi4_tos to dscp_t. 2025-08-26 17:34:31 -07:00
fib_lookup.h
fib_notifier.c
fib_rules.c ipv4: Convert ->flowi4_tos to dscp_t. 2025-08-26 17:34:31 -07:00
fib_semantics.c net: fib: restore ECMP balance from loopback 2025-12-30 11:07:38 +01:00
fib_trie.c ipv4: Fix reference count leak when using error routes with nexthop objects 2025-12-30 10:39:22 +01:00
fou_bpf.c
fou_core.c net: gro: remove is_ipv6 from napi_gro_cb 2025-09-25 12:42:49 +02:00
fou_nl.c tools: ynl-gen: add regeneration comment 2025-11-25 19:20:42 -08:00
fou_nl.h tools: ynl-gen: add regeneration comment 2025-11-25 19:20:42 -08:00
gre_demux.c
gre_offload.c
icmp.c ipv4: icmp: Add RFC 5837 support 2025-10-29 18:28:29 -07:00
igmp.c
igmp_internal.h
inet_connection_sock.c tcp: remove icsk->icsk_retransmit_timer 2025-11-25 19:28:29 -08:00
inet_diag.c tcp: introduce icsk->icsk_keepalive_timer 2025-11-25 19:28:29 -08:00
inet_fragment.c inet: frags: flush pending skbs in fqdir_pre_exit() 2025-12-10 01:15:27 -08:00
inet_hashtables.c inet: Avoid ehash lookup race in inet_ehash_insert() 2025-10-17 16:08:43 -07:00
inet_timewait_sock.c inet: Avoid ehash lookup race in inet_twsk_hashdance_schedule() 2025-10-17 16:08:43 -07:00
inetpeer.c
ip_forward.c
ip_fragment.c inet: frags: flush pending skbs in fqdir_pre_exit() 2025-12-10 01:15:27 -08:00
ip_gre.c erspan: Initialize options_len before referencing options. 2025-12-23 09:21:00 +01:00
ip_input.c net: ipv4: Remove extern udp_v4_early_demux()/tcp_v4_early_demux() in .c files 2025-10-29 17:05:30 -07:00
ip_options.c net: Switch to skb_dstref_steal/skb_dstref_restore for ip_route_input callers 2025-08-19 17:54:35 -07:00
ip_output.c net: psp: don't assume reply skbs will have a socket 2025-10-03 10:23:50 -07:00
ip_sockglue.c
ip_tunnel.c net/ip6_tunnel: Prevent perpetual tunnel growth 2025-10-13 17:43:46 -07:00
ip_tunnel_core.c tunnels: reset the GSO metadata before reusing the skb 2025-09-09 13:03:33 +02:00
ip_vti.c
ipcomp.c
ipconfig.c net: ipconfig: Replace strncpy with strscpy in ic_proto_name 2025-11-28 20:19:16 -08:00
ipip.c netfilter: flowtable: Add IPIP rx sw acceleration 2025-11-28 00:00:38 +00:00
ipmr.c ipv4: start using dst_dev_rcu() 2025-08-29 19:36:32 -07:00
ipmr_base.c
metrics.c
netfilter.c ipv4: Convert ->flowi4_tos to dscp_t. 2025-08-26 17:34:31 -07:00
netlink.c
nexthop.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-09-25 11:00:59 -07:00
ping.c net: Convert proto callbacks from sockaddr to sockaddr_unsized 2025-11-04 19:10:33 -08:00
proc.c ipv4: snmp: do not use SNMP_MIB_SENTINEL anymore 2025-09-08 18:06:20 -07:00
protocol.c
raw.c net: Convert proto callbacks from sockaddr to sockaddr_unsized 2025-11-04 19:10:33 -08:00
raw_diag.c inet_diag: change inet_diag_bc_sk() first argument 2025-08-29 19:29:24 -07:00
route.c ipv4: route: Prevent rt_bind_exception() from rebinding stale fnhe 2025-11-12 06:46:36 -08:00
syncookies.c tcp: accecn: AccECN negotiation 2025-09-18 08:47:51 +02:00
sysctl_net_ipv4.c tcp: add net.ipv4.tcp_rcvbuf_low_rtt 2025-11-20 17:44:23 -08:00
tcp.c netmem, devmem, tcp: access pp fields through @desc in net_iov 2025-11-27 17:41:51 -08:00
tcp_ao.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2025-09-18 11:26:06 -07:00
tcp_bbr.c
tcp_bic.c
tcp_bpf.c tcp_bpf: Call sk_msg_free() when tcp_bpf_send_verdict() fails to allocate psock->cork. 2025-09-10 06:53:56 -07:00
tcp_cdg.c tcp: cdg: remove redundant __GFP_NOWARN 2025-08-12 14:05:43 -07:00
tcp_cong.c
tcp_cubic.c
tcp_dctcp.c
tcp_dctcp.h
tcp_diag.c inet_diag: change inet_diag_bc_sk() first argument 2025-08-29 19:29:24 -07:00
tcp_fastopen.c tcp: use dst_dev_rcu() in tcp_fastopen_active_disable_ofo_check() 2025-08-29 19:36:32 -07:00
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c
tcp_input.c tcp: add net.ipv4.tcp_rcvbuf_low_rtt 2025-11-20 17:44:23 -08:00
tcp_ipv4.c tcp: introduce icsk->icsk_keepalive_timer 2025-11-25 19:28:29 -08:00
tcp_lp.c net: tcp_lp: fix kernel-doc warnings and update outdated reference links 2025-10-28 17:52:44 -07:00
tcp_metrics.c Networking changes for 6.18. 2025-10-02 15:17:01 -07:00
tcp_minisocks.c tcp: Don't reinitialise tw->tw_transparent in tcp_time_wait(). 2025-11-18 18:00:38 -08:00
tcp_nv.c
tcp_offload.c tcp: gro: inline tcp_gro_pull_header() 2025-11-14 18:00:08 -08:00
tcp_output.c net/smc: bpf: Introduce generic hook for handshake flow 2025-11-10 11:19:41 -08:00
tcp_plb.c
tcp_rate.c
tcp_recovery.c
tcp_scalable.c
tcp_sigpool.c
tcp_timer.c tcp: remove icsk->icsk_retransmit_timer 2025-11-25 19:28:29 -08:00
tcp_ulp.c
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c
tunnel4.c
udp.c net: Convert proto callbacks from sockaddr to sockaddr_unsized 2025-11-04 19:10:33 -08:00
udp_bpf.c
udp_diag.c inet_diag: change inet_diag_bc_sk() first argument 2025-08-29 19:29:24 -07:00
udp_impl.h
udp_offload.c net: gro: remove is_ipv6 from napi_gro_cb 2025-09-25 12:42:49 +02:00
udp_tunnel_core.c net: Convert proto_ops connect() callbacks to use sockaddr_unsized 2025-11-04 19:10:32 -08:00
udp_tunnel_nic.c udp_tunnel: use netdev_warn() instead of netdev_WARN() 2025-09-11 19:09:48 -07:00
udp_tunnel_stub.c
udplite.c
xfrm4_input.c
xfrm4_output.c
xfrm4_policy.c ipv4: Convert ->flowi4_tos to dscp_t. 2025-08-26 17:34:31 -07:00
xfrm4_protocol.c
xfrm4_state.c
xfrm4_tunnel.c