Commit Graph

296 Commits

Author SHA1 Message Date
Sabrina Dubroca 57122c74a0 tls: fix use-after-free on failed backlog decryption
When the decrypt request goes to the backlog and crypto_aead_decrypt
returns -EBUSY, tls_do_decryption will wait until all async
decryptions have completed. If one of them fails, tls_do_decryption
will return -EBADMSG and tls_decrypt_sg jumps to the error path,
releasing all the pages. But the pages have been passed to the async
callback, and have already been released by tls_decrypt_done.

The only true async case is when crypto_aead_decrypt returns
 -EINPROGRESS. With -EBUSY, we already waited so we can tell
tls_sw_recvmsg that the data is available for immediate copy, but we
need to notify tls_decrypt_sg (via the new ->async_done flag) that the
memory has already been released.

Fixes: 859054147318 ("net: tls: handle backlogging of crypto requests")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/4755dd8d9bebdefaa19ce1439b833d6199d4364c.1709132643.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

CVE-2024-26800
(backported from commit 13114dc5543069f7b97991e3b79937b6da05f5b0)
[juergh: Adjusted context and 'return err' instead of 'goto exit_free_skb'
 due to missing commits:
 fd31f3996af2 ("tls: rx: decrypt into a fresh skb")
 6bd116c8c654 ("tls: rx: return the decrypted skb via darg")]
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Mehmet Basaran <mehmet.basaran@canonical.com>
Acked-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-09-27 10:50:35 +02:00
Sabrina Dubroca 81a9adf960 tls: separate no-async decryption request handling from async
If we're not doing async, the handling is much simpler. There's no
reference counting, we just need to wait for the completion to wake us
up and return its result.

We should preferably also use a separate crypto_wait. I'm not seeing a
UAF as I did in the past, I think aec7961916f3 ("tls: fix race between
async notify and socket close") took care of it.

This will make the next fix easier.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/47bde5f649707610eaef9f0d679519966fc31061.1709132643.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

CVE-2024-26800
(cherry picked from commit 41532b785e9d79636b3815a64ddf6a096647d011)
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Mehmet Basaran <mehmet.basaran@canonical.com>
Acked-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
[smb: possibly context adjustments by git]
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-09-27 10:50:35 +02:00
Jakub Kicinski ce09675221 tls: rx: coalesce exit paths in tls_decrypt_sg()
Jump to the free() call, instead of having to remember
to free the memory in multiple places.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

CVE-2024-26800
(backported from commit 03957d84055e59235c7d57c95a37617bd3aa5646)
[juergh: Adjusted context.]
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Mehmet Basaran <mehmet.basaran@canonical.com>
Acked-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-09-27 10:50:35 +02:00
Jakub Kicinski 2dba02cacb tls: fix race between tx work scheduling and socket close
CVE-2024-26585

commit e01e3934a1b2d122919f73bc6ddbe1cdafc4bbdb upstream.

Similarly to previous commit, the submitting thread (recvmsg/sendmsg)
may exit as soon as the async crypto handler calls complete().
Reorder scheduling the work before calling complete().
This seems more logical in the first place, as it's
the inverse order of what the submitting thread will do.

Reported-by: valis <sec@valis.email>
Fixes: a42055e8d2 ("net/tls: Add support for async encryption of records for performance")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
[Lee: Fixed merge-conflict in Stable branches linux-6.1.y and older]
Signed-off-by: Lee Jones <lee@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(backported from commit 196f198ca6fce04ba6ce262f5a0e4d567d7d219d linux-6.1.y)
[juergh: Adjusted context.]
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:50 +02:00
Jakub Kicinski 520a643445 net: tls: handle backlogging of crypto requests
CVE-2024-26584

commit 8590541473188741055d27b955db0777569438e3 upstream.

Since we're setting the CRYPTO_TFM_REQ_MAY_BACKLOG flag on our
requests to the crypto API, crypto_aead_{encrypt,decrypt} can return
 -EBUSY instead of -EINPROGRESS in valid situations. For example, when
the cryptd queue for AESNI is full (easy to trigger with an
artificially low cryptd.cryptd_max_cpu_qlen), requests will be enqueued
to the backlog but still processed. In that case, the async callback
will also be called twice: first with err == -EINPROGRESS, which it
seems we can just ignore, then with err == 0.

Compared to Sabrina's original patch this version uses the new
tls_*crypt_async_wait() helpers and converts the EBUSY to
EINPROGRESS to avoid having to modify all the error handling
paths. The handling is identical.

Fixes: a54667f672 ("tls: Add support for encryption using async offload accelerator")
Fixes: 94524d8fc9 ("net/tls: Add support for async decryption of tls records")
Co-developed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/netdev/9681d1febfec295449a62300938ed2ae66983f28.1694018970.git.sd@queasysnail.net/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[v5.15: fixed contextual merge-conflicts in tls_decrypt_done and tls_encrypt_done]
Cc: <stable@vger.kernel.org> # 5.15
Signed-off-by: Shaoying Xu <shaoyi@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3ade391adc584f17b5570fd205de3ad029090368 linux-5.15.y)
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:50 +02:00
Jakub Kicinski 90764c2caa tls: fix race between async notify and socket close
CVE-2024-26583

commit aec7961916f3f9e88766e2688992da6980f11b8d upstream.

The submitting thread (one which called recvmsg/sendmsg)
may exit as soon as the async crypto handler calls complete()
so any code past that point risks touching already freed data.

Try to avoid the locking and extra flags altogether.
Have the main thread hold an extra reference, this way
we can depend solely on the atomic ref counter for
synchronization.

Don't futz with reiniting the completion, either, we are now
tightly controlling when completion fires.

Reported-by: valis <sec@valis.email>
Fixes: 0cada33241 ("net/tls: fix race condition causing kernel panic")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
[v5.15: fixed contextual conflicts in struct tls_sw_context_rx and func
init_ctx_rx; replaced DEBUG_NET_WARN_ON_ONCE with BUILD_BUG_ON_INVALID
since they're equivalent when DEBUG_NET is not defined]
Cc: <stable@vger.kernel.org> # 5.15
Signed-off-by: Shaoying Xu <shaoyi@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(backported from commit f17d21ea73918ace8afb9c2d8e734dbf71c2c9d7 linux-5.15.y)
[juergh: Adjusted context due to missing commit:
 5c5458ec9d ("net/tls: store async_capable on a single bit")]
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:50 +02:00
Jakub Kicinski b632626101 net: tls: factor out tls_*crypt_async_wait()
CVE-2024-26583

commit c57ca512f3b68ddcd62bda9cc24a8f5584ab01b1 upstream.

Factor out waiting for async encrypt and decrypt to finish.
There are already multiple copies and a subsequent fix will
need more. No functional changes.

Note that crypto_wait_req() returns wait->err

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: aec7961916f3 ("tls: fix race between async notify and socket close")
[v5.15: removed changes in tls_sw_splice_eof and adjusted waiting factor out for
async descrypt in tls_sw_recvmsg]
Cc: <stable@vger.kernel.org> # 5.15
Signed-off-by: Shaoying Xu <shaoyi@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 94afddde1e9285c0aa25a8bb1f4734071e56055b linux-5.15.y)
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:50 +02:00
Sabrina Dubroca 727ee47496 tls: extract context alloc/initialization out of tls_set_sw_offload
CVE-2024-26583

commit 615580cbc99af0da2d1c7226fab43a3d5003eb97 upstream.

Simplify tls_set_sw_offload a bit.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: aec7961916f3 ("tls: fix race between async notify and socket close")
[v5.15: fixed contextual conflicts from unavailable init_waitqueue_head and
skb_queue_head_init calls in tls_set_sw_offload and init_ctx_rx]
Cc: <stable@vger.kernel.org> # 5.15
Signed-off-by: Shaoying Xu <shaoyi@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit fb782814bf09199283e565c7e6d0b6b155363273 linux-5.15.y)
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:50 +02:00
Jakub Kicinski 37a589c6ee tls: rx: simplify async wait
CVE-2024-26583

commit 37943f047bfb88ba4dfc7a522563f57c86d088a0 upstream.

Since we are protected from async completions by decrypt_compl_lock
we can drop the async_notify and reinit the completion before we
start waiting.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: aec7961916f3 ("tls: fix race between async notify and socket close")
Signed-off-by: Shaoying Xu <shaoyi@amazon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 704402f913b809345b2372a05d7822d03f29f923 linux-5.15.y)
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:50 +02:00
Jakub Kicinski a41c968e7d net: tls: fix async vs NIC crypto offload
CVE-2024-26583

commit c706b2b5ed74d30436b85cbd8e63e969f6b5873a upstream.

When NIC takes care of crypto (or the record has already
been decrypted) we forget to update darg->async. ->async
is supposed to mean whether record is async capable on
input and whether record has been queued for async crypto
on output.

Reported-by: Gal Pressman <gal@nvidia.com>
Fixes: 3547a1f9d988 ("tls: rx: use async as an in-out argument")
Tested-by: Gal Pressman <gal@nvidia.com>
Link: https://lore.kernel.org/r/20220425233309.344858-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 9d5932275b3b4a6ffc0be57b1810ad8cf80eafd7 linux-5.15.y)
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:50 +02:00
Sabrina Dubroca 7e7a1ab368 tls: decrement decrypt_pending if no async completion will be called
CVE-2024-26583

[ Upstream commit f7fa16d49837f947ee59492958f9e6f0e51d9a78 ]

With mixed sync/async decryption, or failures of crypto_aead_decrypt,
we increment decrypt_pending but we never do the corresponding
decrement since tls_decrypt_done will not be called. In this case, we
should decrement decrypt_pending immediately to avoid getting stuck.

For example, the prequeue prequeue test gets stuck with mixed
modes (one async decrypt + one sync decrypt).

Fixes: 94524d8fc9 ("net/tls: Add support for async decryption of tls records")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/c56d5fc35543891d5319f834f25622360e1bfbec.1709132643.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 5bc8810b788a564bc7ae27ab3dcfa105339f5d0a linux-5.15.y)
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:50 +02:00
Jakub Kicinski 28dc0d9f24 tls: rx: use async as an in-out argument
CVE-2024-26583

[ Upstream commit 3547a1f9d988d88ecff4fc365d2773037c849f49 ]

Propagating EINPROGRESS thru multiple layers of functions is
error prone. Use darg->async as an in/out argument, like we
use darg->zc today. On input it tells the code if async is
allowed, on output if it took place.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(backported from commit 9ae48288fc8b1aef1ab3a0d998683292767ed057 linux-5.15.y)
[juergh: Adjusted context due to missing commit:
 284b4d93daee ("tls: rx: move counting TlsDecryptErrors for sync")]
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:50 +02:00
Jakub Kicinski e339f2251a tls: rx: assume crypto always calls our callback
CVE-2024-26583

[ Upstream commit 1c699ffa48a15710746989c36a82cbfb07e8d17f ]

If crypto didn't always invoke our callback for async
we'd not be clearing skb->sk and would crash in the
skb core when freeing it. This if must be dead code.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit bdb7fb29236a52c21c6f2b76354c1699ce19050d linux-5.15.y)
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:50 +02:00
Jakub Kicinski b5509995c6 tls: rx: don't track the async count
CVE-2024-26583

[ Upstream commit 7da18bcc5e4cfd14ea520367546c5697e64ae592 ]

We track both if the last record was handled by async crypto
and how many records were async. This is not necessary. We
implicitly assume once crypto goes async it will stay that
way, otherwise we'd reorder records. So just track if we're
in async mode, the exact number of records is not necessary.

This change also forces us into "async" mode more consistently
in case crypto ever decided to interleave async and sync.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit b61dbb5ef449afe4e3d2d2298ebb5db52b33ef80 linux-5.15.y)
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:50 +02:00
Jakub Kicinski 46a71e05ea tls: rx: factor out writing ContentType to cmsg
CVE-2024-26583

[ Upstream commit 06554f4ffc2595ae52ee80aec4a13bd77d22bed7 ]

cmsg can be filled in during rx_list processing or normal
receive. Consolidate the code.

We don't need to keep the boolean to track if the cmsg was
created. 0 is an invalid content type.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 4fd23a600be99c5702b49491899b06ff2f5e51e7 linux-5.15.y)
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:50 +02:00
Jakub Kicinski 51b4d6979a tls: rx: wrap decryption arguments in a structure
CVE-2024-26583

[ Upstream commit 4175eac37123a68ebee71f288826339fb89bfec7 ]

We pass zc as a pointer to bool a few functions down as an in/out
argument. This is error prone since C will happily evalue a pointer
as a boolean (IOW forgetting *zc and writing zc leads to loss of
developer time..). Wrap the arguments into a structure.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 9876554897b3912949f1dc0dfe89c0f6dd9663e3 linux-5.15.y)
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:50 +02:00
Jakub Kicinski 2c0cab77e1 tls: rx: don't report text length from the bowels of decrypt
CVE-2024-26583

[ Upstream commit 9bdf75ccffa690237cd0b472cd598cf6d22873dc ]

We plumb pointer to chunk all the way to the decryption method.
It's set to the length of the text when decrypt_skb_update()
returns.

I think the code is written this way because original TLS
implementation passed &chunk to zerocopy_from_iter() and this
was carried forward as the code gotten more complex, without
any refactoring.

The fix for peek() introduced a new variable - to_decrypt
which for all practical purposes is what chunk is going to
get set to. Spare ourselves the pointer passing, use to_decrypt.

Use this opportunity to clean things up a little further.

Note that chunk / to_decrypt was mostly needed for the async
path, since the sync path would access rxm->full_len (decryption
transforms full_len from record size to text size). Use the
right source of truth more explicitly.

We have three cases:
 - async - it's TLS 1.2 only, so chunk == to_decrypt, but we
           need the min() because to_decrypt is a whole record
	   and we don't want to underflow len. Note that we can't
	   handle partial record by falling back to sync as it
	   would introduce reordering against records in flight.
 - zc - again, TLS 1.2 only for now, so chunk == to_decrypt,
        we don't do zc if len < to_decrypt, no need to check again.
 - normal - it already handles chunk > len, we can factor out the
            assignment to rxm->full_len and share it with zc.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit d6c9c2a66c91407bbb2381a823200164fa4c067b linux-5.15.y)
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:50 +02:00
Jakub Kicinski 258ad497b2 tls: rx: drop unnecessary arguments from tls_setup_from_iter()
CVE-2024-26583

[ Upstream commit d4bd88e67666c73cfa9d75c282e708890d4f10a7 ]

sk is unused, remove it to make it clear the function
doesn't poke at the socket.

size_used is always 0 on input and @length on success.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit ffc8a2b821414e5781df1d0a6b5c40c361174575 linux-5.15.y)
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:49 +02:00
Jakub Kicinski c26f1b98a6 tls: hw: rx: use return value of tls_device_decrypted() to carry status
CVE-2024-26583

[ Upstream commit 71471ca32505afa7c3f7f6a8268716e1ddb81cd4 ]

Instead of tls_device poking into internals of the message
return 1 from tls_device_decrypted() if the device handled
the decryption.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(backported from commit 1abd49fa1ffb43ef31369cfcfdc9d0409db4ea58 linux-5.15.y)
[juergh: Adjusted context due to missing commit:
 8538d29cea ("net/tls: add tracing for device/offload events")]
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:49 +02:00
Jakub Kicinski 1ae0749067 tls: rx: refactor decrypt_skb_update()
CVE-2024-26583

[ Upstream commit 3764ae5ba6615095de86698a00e814513b9ad0d5 ]

Use early return and a jump label to remove two indentation levels.
No functional changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(backported from commit 432d40036f173275fc89f2c154ce927ccb568b7a linux-5.15.y)
[juergh: Adjusted context and drop stats counting due to missing commits:
 5c5d22a750 ("net/tls: avoid spurious decryption error with HW resync")
 5c5ec66858 ("net/tls: add TlsDecryptError stat")]
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:49 +02:00
Jakub Kicinski f31f945607 tls: rx: don't issue wake ups when data is decrypted
CVE-2024-26583

[ Upstream commit 5dbda02d322db7762f1a0348117cde913fb46c13 ]

We inform the applications that data is available when
the record is received. Decryption happens inline inside
recvmsg or splice call. Generating another wakeup inside
the decryption handler seems pointless as someone must
be actively reading the socket if we are executing this
code.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 17d8bda2a6fdb49938d74e8018700e5ae1be1482 linux-5.15.y)
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:49 +02:00
Jakub Kicinski 78bf9e8c79 tls: rx: don't store the decryption status in socket context
CVE-2024-26583

[ Upstream commit 7dc59c33d62c4520a119051d4486c214ef5caa23 ]

Similar justification to previous change, the information
about decryption status belongs in the skb.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(backported from commit de0970d258efa793fd1236a362f0838c9c9d2384 linux-5.15.y)
[juergh: Minor adjustments due to missing commits:
 bc76e5bb12 ("net/tls: store decrypted on a single bit")
 5c5458ec9d ("net/tls: store async_capable on a single bit")]
 8538d29cea ("net/tls: add tracing for device/offload events")]
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:49 +02:00
Jakub Kicinski 2fea9590fe tls: rx: don't store the record type in socket context
CVE-2024-26583

[ Upstream commit c3f6bb74137c68b515b7e2ff123a80611e801013 ]

Original TLS implementation was handling one record at a time.
It stashed the type of the record inside tls context (per socket
structure) for convenience. When async crypto support was added
[1] the author had to use skb->cb to store the type per-message.

The use of skb->cb overlaps with strparser, however, so a hybrid
approach was taken where type is stored in context while parsing
(since we parse a message at a time) but once parsed its copied
to skb->cb.

Recently a workaround for sockmaps [2] exposed the previously
private struct _strp_msg and started a trend of adding user
fields directly in strparser's header. This is cleaner than
storing information about an skb in the context.

This change is not strictly necessary, but IMHO the ownership
of the context field is confusing. Information naturally
belongs to the skb.

[1] commit 94524d8fc9 ("net/tls: Add support for async decryption of tls records")
[2] commit b2c4618162ec ("bpf, sockmap: sk_skb data_end access incorrect when src_reg = dst_reg")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: f7fa16d49837 ("tls: decrement decrypt_pending if no async completion will be called")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(backported from commit 4c68bf84d1623437483411d9268e9a80d4ee0488 linux-5.15.y)
[juergh: Adjusted context due to missing commits:
 b2c4618162ec ("bpf, sockmap: sk_skb data_end access incorrect when src_reg = dst_reg")
 6942a284fb ("net/tls: make inline helpers protocol-aware")
 bc76e5bb12 ("net/tls: store decrypted on a single bit")
 5c5458ec9d ("net/tls: store async_capable on a single bit")]
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:49 +02:00
Jakub Kicinski 4bd59b2628 net: tls: avoid discarding data on record close
CVE-2024-26583

[ Upstream commit 6b47808f223c70ff564f9b363446d2a5fa1e05b2 ]

TLS records end with a 16B tag. For TLS device offload we only
need to make space for this tag in the stream, the device will
generate and replace it with the actual calculated tag.

Long time ago the code would just re-reference the head frag
which mostly worked but was suboptimal because it prevented TCP
from combining the record into a single skb frag. I'm not sure
if it was correct as the first frag may be shorter than the tag.

The commit under fixes tried to replace that with using the page
frag and if the allocation failed rolling back the data, if record
was long enough. It achieves better fragment coalescing but is
also buggy.

We don't roll back the iterator, so unless we're at the end of
send we'll skip the data we designated as tag and start the
next record as if the rollback never happened.
There's also the possibility that the record was constructed
with MSG_MORE and the data came from a different syscall and
we already told the user space that we "got it".

Allocate a single dummy page and use it as fallback.

Found by code inspection, and proven by forcing allocation
failures.

Fixes: e7b159a48b ("net/tls: remove the record tail optimization")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(backported from commit e2d10f1de1fac24e6e41bed71301d7a95aea43c6 linux-5.15.y)
[juergh: Adjusted context due to missing commit:
 6942a284fb ("net/tls: make inline helpers protocol-aware")]
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:49 +02:00
Tariq Toukan 2af1a6e550 net/tls: Multi-threaded calls to TX tls_dev_del
CVE-2024-26583

[ Upstream commit 7adc91e0c93901a0eeeea10665d0feb48ffde2d4 ]

Multiple TLS device-offloaded contexts can be added in parallel via
concurrent calls to .tls_dev_add, while calls to .tls_dev_del are
sequential in tls_device_gc_task.

This is not a sustainable behavior. This creates a rate gap between add
and del operations (addition rate outperforms the deletion rate).  When
running for enough time, the TLS device resources could get exhausted,
failing to offload new connections.

Replace the single-threaded garbage collector work with a per-context
alternative, so they can be handled on several cores in parallel. Use
a new dedicated destruct workqueue for this.

Tested with mlx5 device:
Before: 22141 add/sec,   103 del/sec
After:  11684 add/sec, 11684 del/sec

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 6b47808f223c ("net: tls: avoid discarding data on record close")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 9a15ca893909e1b87d975d0726b79caf5b2f8830 linux-5.15.y)
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:49 +02:00
Tariq Toukan cf15f60d75 net/tls: Perform immediate device ctx cleanup when possible
CVE-2024-26583

[ Upstream commit 113671b255ee3b9f5585a6d496ef0e675e698698 ]

TLS context destructor can be run in atomic context. Cleanup operations
for device-offloaded contexts could require access and interaction with
the device callbacks, which might sleep. Hence, the cleanup of such
contexts must be deferred and completed inside an async work.

For all others, this is not necessary, as cleanup is atomic. Invoke
cleanup immediately for them, avoiding queueing redundant gc work.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 6b47808f223c ("net: tls: avoid discarding data on record close")
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 2d93157b7e2dac9acf3ee4e52189b4c79d1ac77c linux-5.15.y)
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:49 +02:00
Jakub Kicinski 5e68421cd8 net/tls: pass context to tls_device_decrypted()
CVE-2024-26583

Avoid unnecessary pointer chasing and calculations, callers already
have most of the state tls_device_decrypted() needs.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(backported from commit 4de30a8d58)
[juergh: Adjusted context.]
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:49 +02:00
Maxim Mikityanskiy 8f560c7185 net/tls: Remove the context from the list in tls_device_down
CVE-2024-26583

tls_device_down takes a reference on all contexts it's going to move to
the degraded state (software fallback). If sk_destruct runs afterwards,
it can reduce the reference counter back to 1 and return early without
destroying the context. Then tls_device_down will release the reference
it took and call tls_device_free_ctx. However, the context will still
stay in tls_device_down_list forever. The list will contain an item,
memory for which is released, making a memory corruption possible.

Fix the above bug by properly removing the context from all lists before
any call to tls_device_free_ctx.

Fixes: 3740651bf7e2 ("tls: Fix context leak on tls_device_down")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f6336724a4d4220c89a4ec38bca84b03b178b1a3)
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:49 +02:00
Tariq Toukan 726e00ea41 net/tls: Check for errors in tls_device_init
CVE-2024-26583

Add missing error checks in tls_device_init.

Fixes: e8f6979981 ("net/tls: Add generic NIC offload infrastructure")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220714070754.1428-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(backported from commit 3d8c51b25a235e283e37750943bbf356ef187230)
[juergh: Declare err, drop unregister_pernet_subsys() due to missing commit:
 d26b698dd3 ("net/tls: add skeleton of MIB statistics").
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:49 +02:00
Maxim Mikityanskiy 525ac5b33c tls: Fix context leak on tls_device_down
CVE-2024-26583

The commit cited below claims to fix a use-after-free condition after
tls_device_down. Apparently, the description wasn't fully accurate. The
context stayed alive, but ctx->netdev became NULL, and the offload was
torn down without a proper fallback, so a bug was present, but a
different kind of bug.

Due to misunderstanding of the issue, the original patch dropped the
refcount_dec_and_test line for the context to avoid the alleged
premature deallocation. That line has to be restored, because it matches
the refcount_inc_not_zero from the same function, otherwise the contexts
that survived tls_device_down are leaked.

This patch fixes the described issue by restoring refcount_dec_and_test.
After this change, there is no leak anymore, and the fallback to
software kTLS still works.

Fixes: c55dcdd435 ("net/tls: Fix use-after-free after the TLS device goes down and up")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220512091830.678684-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 3740651bf7e200109dd42d5b2fb22226b26f960a)
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:49 +02:00
Jakub Kicinski 388c0caa27 tls: splice_read: fix accessing pre-processed records
CVE-2024-26583

recvmsg() will put peek()ed and partially read records onto the rx_list.
splice_read() needs to consult that list otherwise it may miss data.
Align with recvmsg() and also put partially-read records onto rx_list.
tls_sw_advance_skb() is pretty pointless now and will be removed in
net-next.

Fixes: 692d7b5d1f ("tls: Fix recvmsg() to be able to peek across multiple records")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit e062fe99cccd9ff9f232e593d163ecabd244fae8)
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:49 +02:00
Jim Ma c9fbeb9fb6 tls splice: remove inappropriate flags checking for MSG_PEEK
CVE-2024-26583

In function tls_sw_splice_read, before call tls_sw_advance_skb
it checks likely(!(flags & MSG_PEEK)), while MSG_PEEK is used
for recvmsg, splice supports SPLICE_F_NONBLOCK, SPLICE_F_MOVE,
SPLICE_F_MORE, should remove this checking.

Signed-off-by: Jim Ma <majinjing3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d8654f4f93)
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:49 +02:00
Jakub Kicinski 4239297f7a tls: splice_read: fix record type check
CVE-2024-26583

We don't support splicing control records. TLS 1.3 changes moved
the record type check into the decrypt if(). The skb may already
be decrypted and still be an alert.

Note that decrypt_skb_update() is idempotent and updates ctx->decrypted
so the if() is pointless.

Reorder the check for decryption errors with the content type check
while touching them. This part is not really a bug, because if
decryption failed in TLS 1.3 content type will be DATA, and for
TLS 1.2 it will be correct. Nevertheless its strange to touch output
before checking if the function has failed.

Fixes: fedf201e12 ("net: tls: Refactor control message handling on recv")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(backported from commit 520493f66f6822551aef2879cd40207074fe6980)
[juergh: Adjusted context due to missing commit:
 bc76e5bb12 ("net/tls: store decrypted on a single bit")]
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:49 +02:00
Maxim Mikityanskiy 1f2290853a net/tls: Fix use-after-free after the TLS device goes down and up
CVE-2024-26583

When a netdev with active TLS offload goes down, tls_device_down is
called to stop the offload and tear down the TLS context. However, the
socket stays alive, and it still points to the TLS context, which is now
deallocated. If a netdev goes up, while the connection is still active,
and the data flow resumes after a number of TCP retransmissions, it will
lead to a use-after-free of the TLS context.

This commit addresses this bug by keeping the context alive until its
normal destruction, and implements the necessary fallbacks, so that the
connection can resume in software (non-offloaded) kTLS mode.

On the TX side tls_sw_fallback is used to encrypt all packets. The RX
side already has all the necessary fallbacks, because receiving
non-decrypted packets is supported. The thing needed on the RX side is
to block resync requests, which are normally produced after receiving
non-decrypted packets.

The necessary synchronization is implemented for a graceful teardown:
first the fallbacks are deployed, then the driver resources are released
(it used to be possible to have a tls_dev_resync after tls_dev_del).

A new flag called TLS_RX_DEV_DEGRADED is added to indicate the fallback
mode. It's used to skip the RX resync logic completely, as it becomes
useless, and some objects may be released (for example, resync_async,
which is allocated and freed by the driver).

Fixes: e8f6979981 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(backported from commit c55dcdd435)
[juergh: Adjusted context due to missing commit:
 d5bee7374b ("net/tls: Annotate access to sk_prot with READ_ONCE/WRITE_ONCE")]
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:49 +02:00
Maxim Mikityanskiy 997f98f9e0 net/tls: Replace TLS_RX_SYNC_RUNNING with RCU
CVE-2024-26583

RCU synchronization is guaranteed to finish in finite time, unlike a
busy loop that polls a flag. This patch is a preparation for the bugfix
in the next patch, where the same synchronize_net() call will also be
used to sync with the TX datapath.

Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(backported from commit 05fc8b6cbd)
[juergh: Adjusted context due to missing commit:
 8538d29cea ("net/tls: add tracing for device/offload events")]
Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Acked-by: Manuel Diewald <manuel.diewald@canonical.com>
Acked-by: Philip Cox <philip.cox@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-07-05 10:51:49 +02:00
Sabrina Dubroca 39e4657d4b tls: stop recv() if initial process_rx_list gave us non-DATA
BugLink: https://bugs.launchpad.net/bugs/2060019

[ Upstream commit fdfbaec5923d9359698cbb286bc0deadbb717504 ]

If we have a non-DATA record on the rx_list and another record of the
same type still on the queue, we will end up merging them:
 - process_rx_list copies the non-DATA record
 - we start the loop and process the first available record since it's
   of the same type
 - we break out of the loop since the record was not DATA

Just check the record type and jump to the end in case process_rx_list
did some work.

Fixes: 692d7b5d1f ("tls: Fix recvmsg() to be able to peek across multiple records")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/bd31449e43bd4b6ff546f5c51cf958c31c511deb.1708007371.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com>
Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
2024-04-26 10:54:09 +02:00
Jakub Kicinski 6ae41c0a2a tls: rx: drop pointless else after goto
BugLink: https://bugs.launchpad.net/bugs/2060019

[ Upstream commit d5123edd10cf9d324fcb88e276bdc7375f3c5321 ]

Pointless else branch after goto makes the code harder to refactor
down the line.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: fdfbaec5923d ("tls: stop recv() if initial process_rx_list gave us non-DATA")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com>
Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
2024-04-26 10:54:09 +02:00
Jakub Kicinski ed4294edb6 tls: rx: jump to a more appropriate label
BugLink: https://bugs.launchpad.net/bugs/2060019

[ Upstream commit bfc06e1aaa130b86a81ce3c41ec71a2f5e191690 ]

'recv_end:' checks num_async and decrypted, and is then followed
by the 'end' label. Since we know that decrypted and num_async
are 0 at the start we can jump to 'end'.

Move the init of decrypted and num_async to let the compiler
catch if I'm wrong.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: fdfbaec5923d ("tls: stop recv() if initial process_rx_list gave us non-DATA")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com>
Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
2024-04-26 10:54:09 +02:00
John Fastabend 4aefbc3c1d net: tls, update curr on splice as well
commit c5a595000e2677e865a39f249c056bc05d6e55fd upstream.

The curr pointer must also be updated on the splice similar to how
we do this for other copy types.

Fixes: d829e9c411 ("tls: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Reported-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/r/20231206232706.374377-2-john.fastabend@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

CVE-2024-0646
(cherry picked from commit ba5efd8544fa62ae85daeb36077468bf2ce974ab linux-5.15.y)
Signed-off-by: Magali Lemes <magali.lemes@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2024-02-02 14:13:17 +01:00
Liu Jian 1751cbc90e net/tls: do not free tls_rec on async operation in bpf_exec_tx_verdict()
BugLink: https://bugs.launchpad.net/bugs/2040284

[ Upstream commit cfaa80c91f6f99b9342b6557f0f0e1143e434066 ]

I got the below warning when do fuzzing test:
BUG: KASAN: null-ptr-deref in scatterwalk_copychunks+0x320/0x470
Read of size 4 at addr 0000000000000008 by task kworker/u8:1/9

CPU: 0 PID: 9 Comm: kworker/u8:1 Tainted: G           OE
Hardware name: linux,dummy-virt (DT)
Workqueue: pencrypt_parallel padata_parallel_worker
Call trace:
 dump_backtrace+0x0/0x420
 show_stack+0x34/0x44
 dump_stack+0x1d0/0x248
 __kasan_report+0x138/0x140
 kasan_report+0x44/0x6c
 __asan_load4+0x94/0xd0
 scatterwalk_copychunks+0x320/0x470
 skcipher_next_slow+0x14c/0x290
 skcipher_walk_next+0x2fc/0x480
 skcipher_walk_first+0x9c/0x110
 skcipher_walk_aead_common+0x380/0x440
 skcipher_walk_aead_encrypt+0x54/0x70
 ccm_encrypt+0x13c/0x4d0
 crypto_aead_encrypt+0x7c/0xfc
 pcrypt_aead_enc+0x28/0x84
 padata_parallel_worker+0xd0/0x2dc
 process_one_work+0x49c/0xbdc
 worker_thread+0x124/0x880
 kthread+0x210/0x260
 ret_from_fork+0x10/0x18

This is because the value of rec_seq of tls_crypto_info configured by the
user program is too large, for example, 0xffffffffffffff. In addition, TLS
is asynchronously accelerated. When tls_do_encryption() returns
-EINPROGRESS and sk->sk_err is set to EBADMSG due to rec_seq overflow,
skmsg is released before the asynchronous encryption process ends. As a
result, the UAF problem occurs during the asynchronous processing of the
encryption module.

If the operation is asynchronous and the encryption module returns
EINPROGRESS, do not free the record information.

Fixes: 635d939817 ("net/tls: free record only on encryption error")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/r/20230909081434.2324940-1-liujian56@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2023-10-30 11:42:21 +01:00
Kees Cook dc08591ef5 treewide: Remove uninitialized_var() usage
BugLink: https://bugs.launchpad.net/bugs/2028981

commit 3f649ab728 upstream.

Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
	xargs perl -pi -e \
		's/\buninitialized_var\(([^\)]+)\)/\1/g;
		 s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2023-08-09 12:25:41 +02:00
Hangyu Hua 5b0e3a8b63 net: tls: fix possible race condition between do_tls_getsockopt_conf() and do_tls_setsockopt_conf()
BugLink: https://bugs.launchpad.net/bugs/2023601

commit 49c47cc21b5b7a3d8deb18fc57b0aa2ab1286962 upstream.

ctx->crypto_send.info is not protected by lock_sock in
do_tls_getsockopt_conf(). A race condition between do_tls_getsockopt_conf()
and error paths of do_tls_setsockopt_conf() may lead to a use-after-free
or null-deref.

More discussion:  https://lore.kernel.org/all/Y/ht6gQL+u6fj3dG@hog/

Fixes: 3c4d755915 ("tls: kernel TLS support")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Link: https://lore.kernel.org/r/20230228023344.9623-1-hbh25y@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Meena Shanmugam <meenashanmugam@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Luke Nowakowski-Krijger <luke.nowakowskikrijger@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2023-07-10 17:22:01 +02:00
Jakub Kicinski b2cf94a065 net: tls: avoid hanging tasks on the tx_lock
BugLink: https://bugs.launchpad.net/bugs/2017706

commit f3221361dc85d4de22586ce8441ec2c67b454f5d upstream.

syzbot sent a hung task report and Eric explains that adversarial
receiver may keep RWIN at 0 for a long time, so we are not guaranteed
to make forward progress. Thread which took tx_lock and went to sleep
may not release tx_lock for hours. Use interruptible sleep where
possible and reschedule the work if it can't take the lock.

Testing: existing selftest passes

Reported-by: syzbot+9c0268252b8ef967c62e@syzkaller.appspotmail.com
Fixes: 79ffe6087e ("net/tls: add a TX lock")
Link: https://lore.kernel.org/all/000000000000e412e905f5b46201@google.com/
Cc: stable@vger.kernel.org # wait 4 weeks
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230301002857.2101894-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Luke Nowakowski-Krijger <luke.nowakowskikrijger@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2023-05-12 17:15:15 +02:00
Tariq Toukan 60aefa3f3a net/tls: Fix race in TLS device down flow
BugLink: https://bugs.launchpad.net/bugs/1988225

[ Upstream commit f08d8c1bb97c48f24a82afaa2fd8c140f8d3da8b ]

Socket destruction flow and tls_device_down function sync against each
other using tls_device_lock and the context refcount, to guarantee the
device resources are freed via tls_dev_del() by the end of
tls_device_down.

In the following unfortunate flow, this won't happen:
- refcount is decreased to zero in tls_device_sk_destruct.
- tls_device_down starts, skips the context as refcount is zero, going
  all the way until it flushes the gc work, and returns without freeing
  the device resources.
- only then, tls_device_queue_ctx_destruction is called, queues the gc
  work and frees the context's device resources.

Solve it by decreasing the refcount in the socket's destruction flow
under the tls_device_lock, for perfect synchronization.  This does not
slow down the common likely destructor flow, in which both the refcount
is decreased and the spinlock is acquired, anyway.

Fixes: e8f6979981 ("net/tls: Add generic NIC offload infrastructure")
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2022-09-16 10:59:45 +02:00
Jakub Kicinski b856980060 Revert "net/tls: fix tls_sk_proto_close executed repeatedly"
BugLink: https://bugs.launchpad.net/bugs/1986995

[ Upstream commit 1b205d948fbb06a7613d87dcea0ff5fd8a08ed91 ]

This reverts commit 69135c572d1f84261a6de2a1268513a7e71753e2.

This commit was just papering over the issue, ULP should not
get ->update() called with its own sk_prot. Each ULP would
need to add this check.

Fixes: 69135c572d1f ("net/tls: fix tls_sk_proto_close executed repeatedly")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20220620191353.1184629-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2022-08-26 11:11:14 +02:00
Ziyang Xuan 755e3761ae net/tls: fix tls_sk_proto_close executed repeatedly
BugLink: https://bugs.launchpad.net/bugs/1986995

[ Upstream commit 69135c572d1f84261a6de2a1268513a7e71753e2 ]

After setting the sock ktls, update ctx->sk_proto to sock->sk_prot by
tls_update(), so now ctx->sk_proto->close is tls_sk_proto_close(). When
close the sock, tls_sk_proto_close() is called for sock->sk_prot->close
is tls_sk_proto_close(). But ctx->sk_proto->close() will be executed later
in tls_sk_proto_close(). Thus tls_sk_proto_close() executed repeatedly
occurred. That will trigger the following bug.

=================================================================
KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
RIP: 0010:tls_sk_proto_close+0xd8/0xaf0 net/tls/tls_main.c:306
Call Trace:
 <TASK>
 tls_sk_proto_close+0x356/0xaf0 net/tls/tls_main.c:329
 inet_release+0x12e/0x280 net/ipv4/af_inet.c:428
 __sock_release+0xcd/0x280 net/socket.c:650
 sock_close+0x18/0x20 net/socket.c:1365

Updating a proto which is same with sock->sk_prot is incorrect. Add proto
and sock->sk_prot equality check at the head of tls_update() to fix it.

Fixes: 95fa145479 ("bpf: sockmap/tls, close can race with map free")
Reported-by: syzbot+29c3c12f3214b85ad081@syzkaller.appspotmail.com
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2022-08-26 11:11:11 +02:00
Maxim Mikityanskiy f6554d063b tls: Skip tls_append_frag on zero copy size
BugLink: https://bugs.launchpad.net/bugs/1979014

[ Upstream commit a0df71948e9548de819a6f1da68f5f1742258a52 ]

Calling tls_append_frag when max_open_record_len == record->len might
add an empty fragment to the TLS record if the call happens to be on the
page boundary. Normally tls_append_frag coalesces the zero-sized
fragment to the previous one, but not if it's on page boundary.

If a resync happens then, the mlx5 driver posts dump WQEs in
tx_post_resync_dump, and the empty fragment may become a data segment
with byte_count == 0, which will confuse the NIC and lead to a CQE
error.

This commit fixes the described issue by skipping tls_append_frag on
zero size to avoid adding empty fragments. The fix is not in the driver,
because an empty fragment is hardly the desired behavior.

Fixes: e8f6979981 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220426154949.159055-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2022-06-22 14:51:13 +02:00
Ziyang Xuan 81d964b746 net/tls: fix slab-out-of-bounds bug in decrypt_internal
BugLink: https://bugs.launchpad.net/bugs/1971497

[ Upstream commit 9381fe8c849cfbe50245ac01fc077554f6eaa0e2 ]

The memory size of tls_ctx->rx.iv for AES128-CCM is 12 setting in
tls_set_sw_offload(). The return value of crypto_aead_ivsize()
for "ccm(aes)" is 16. So memcpy() require 16 bytes from 12 bytes
memory space will trigger slab-out-of-bounds bug as following:

==================================================================
BUG: KASAN: slab-out-of-bounds in decrypt_internal+0x385/0xc40 [tls]
Read of size 16 at addr ffff888114e84e60 by task tls/10911

Call Trace:
 <TASK>
 dump_stack_lvl+0x34/0x44
 print_report.cold+0x5e/0x5db
 ? decrypt_internal+0x385/0xc40 [tls]
 kasan_report+0xab/0x120
 ? decrypt_internal+0x385/0xc40 [tls]
 kasan_check_range+0xf9/0x1e0
 memcpy+0x20/0x60
 decrypt_internal+0x385/0xc40 [tls]
 ? tls_get_rec+0x2e0/0x2e0 [tls]
 ? process_rx_list+0x1a5/0x420 [tls]
 ? tls_setup_from_iter.constprop.0+0x2e0/0x2e0 [tls]
 decrypt_skb_update+0x9d/0x400 [tls]
 tls_sw_recvmsg+0x3c8/0xb50 [tls]

Allocated by task 10911:
 kasan_save_stack+0x1e/0x40
 __kasan_kmalloc+0x81/0xa0
 tls_set_sw_offload+0x2eb/0xa20 [tls]
 tls_setsockopt+0x68c/0x700 [tls]
 __sys_setsockopt+0xfe/0x1b0

Replace the crypto_aead_ivsize() with prot->iv_size + prot->salt_size
when memcpy() iv value in TLS_1_3_VERSION scenario.

Fixes: f295b3ae9f ("net/tls: Add support of AES128-CCM based ciphers")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2022-05-20 15:19:54 +02:00
Tianjia Zhang 84ed722504 net/tls: Fix authentication failure in CCM mode
BugLink: https://bugs.launchpad.net/bugs/1956381

commit 5961060692f8b17cd2080620a3d27b95d2ae05ca upstream.

When the TLS cipher suite uses CCM mode, including AES CCM and
SM4 CCM, the first byte of the B0 block is flags, and the real
IV starts from the second byte. The XOR operation of the IV and
rec_seq should be skip this byte, that is, add the iv_offset.

Fixes: f295b3ae9f ("net/tls: Add support of AES128-CCM based ciphers")
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Cc: Vakul Garg <vakul.garg@nxp.com>
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2022-02-03 18:57:38 +01:00
Daniel Jordan 62e7d71652 net/tls: Fix flipped sign in async_wait.err assignment
BugLink: https://bugs.launchpad.net/bugs/1951883

commit 1d9d6fd21a upstream.

sk->sk_err contains a positive number, yet async_wait.err wants the
opposite.  Fix the missed sign flip, which Jakub caught by inspection.

Fixes: a42055e8d2 ("net/tls: Add support for async encryption of records for performance")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2021-11-24 12:09:24 +01:00