Commit Graph

17121 Commits

Joseph Myers ea18d5a4c2 Implement C23 memalignment
Add the C23 memalignment function (query the alignment of a pointer)
to glibc.

Given how simple this operation is, it would make sense for compilers
to inline calls to this function, but I'm treating that as a compiler
matter (compilers should add it as a built-in function) rather than
adding an inline version to glibc headers (although such an inline
version would be reasonable as well).  I've filed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122117 for this feature
in GCC.
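
Conceptually the operation is just the lowest set bit of the address; a
minimal sketch of one possible implementation (not necessarily the exact
glibc code):

    #include <stddef.h>
    #include <stdint.h>

    size_t
    memalignment (const void *p)
    {
      /* Largest power of two that divides the address; 0 for a null
         pointer.  */
      return (uintptr_t) p & -(uintptr_t) p;
    }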

Tested for x86_64 and x86.
2025-10-17 16:56:59 +00:00
Adhemerval Zanella 850d93f514 math: Use binary search on lgammaf slow path
And remove some unused entries of the fallback table.
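
A minimal sketch of the idea, with hypothetical table and field names:
locate a hard case by its bit pattern with a binary search instead of a
linear scan:

    #include <stdint.h>

    struct fixup { uint32_t arg; float value; };  /* hypothetical */

    static int
    find_fixup (const struct fixup *tab, int n, uint32_t arg)
    {
      int lo = 0, hi = n - 1;
      while (lo <= hi)
        {
          int mid = (lo + hi) / 2;
          if (tab[mid].arg < arg)
            lo = mid + 1;
          else if (tab[mid].arg > arg)
            hi = mid - 1;
          else
            return mid;   /* exceptional input: use tab[mid].value */
        }
      return -1;          /* not in the table */
    }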

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-14 11:12:08 -03:00
Adhemerval Zanella 6610a293b3 math: Use stdbit.h instead of builtin in math_config.h
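
A hedged illustration of the kind of substitution involved (function name
hypothetical):

    #include <stdbit.h>
    #include <stdint.h>

    unsigned int
    leading_zeros (uint32_t x)
    {
      /* Replaces __builtin_clz (x); also well defined for x == 0.  */
      return stdc_leading_zeros (x);
    }
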
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-14 11:12:04 -03:00
Adhemerval Zanella ae49afe74d math: Optimize fma call on log2pf1
The fma is required only for x == -0x1.da285cp-5 in FE_TONEAREST
to provide correctly rounded results.
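
A hedged sketch of the pattern (function and operands hypothetical; the
input value is the one named above):

    #include <math.h>

    static float
    mul_add (float a, float b, float c, float x)
    {
      /* Only the single known hard input needs the fused step for a
         correctly rounded result; everywhere else a plain multiply-add
         suffices, which is cheaper on !__FP_FAST_FMA ISAs.  */
      if (__builtin_expect (x == -0x1.da285cp-5f, 0))
        return fmaf (a, b, c);
      return a * b + c;
    }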

Checked on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-14 11:12:00 -03:00
Adhemerval Zanella 82a4f50b4e math: Optimize fma call on asinpif
The fma is required only for x == +/-0x1.6371e8p-4f in FE_TOWARDZERO
to provide correctly rounded results.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-14 11:11:56 -03:00
Adhemerval Zanella fab32b6526 math: Remove erfcf fma usage
The fma is not required to provide correctly rounded results, and
removing it helps on !__FP_FAST_FMA ISAs.
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2025-10-14 08:46:06 -03:00
Adhemerval Zanella 68cb78eccc math: Remove asinhf fma usage
The fma is not required to provide correctly rounded results, and
removing it helps on !__FP_FAST_FMA ISAs.

Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2025-10-14 08:46:06 -03:00
Adhemerval Zanella c075ff00a6 math: Optimize fma call on acospif
The fma is required only for inputs less than 0x1.0fd288p-127.  Also
only add the extra check for !__FP_FAST_FMA targets.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2025-10-14 08:46:06 -03:00
Adhemerval Zanella c9d9336f50 math: Remove acoshf fma usage
The fma is not strictly required to provide correctly rounded results,
and removing it helps on !__FP_FAST_FMA ABIs.

Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2025-10-14 08:46:06 -03:00
Yury Khrustalev ecb0fc2f0f aarch64: tests for SME
This commit adds tests for the following use cases relevant to handling of
the SME state:

 - fork() and vfork()
 - clone() and clone3()
 - signal handler

While most cases are trivial, the case of clone3() is more complicated since
the clone3() symbol is not public in Glibc.

To avoid having to check all possible ways clone3() may be called via other
public functions (e.g. vfork() or pthread_create()), we put together a test
that links directly with clone3.o. The existing functions that can call
clone3() may not actually use it, in which case the outcome of such tests
would be unexpected. Having a direct call to the clone3() symbol in the
test allows us to check precisely what we need to test: that the __arm_za_disable()
function is indeed called and has the desired effect.

Linking to clone3.o also requires linking to __arm_za_disable.o, which in
turn requires the _dl_hwcap2 hidden symbol, which the test has to provide
and initialise before use.

Co-authored-by: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-10-14 09:42:46 +01:00
Yury Khrustalev 27effb3d50 aarch64: clear ZA state of SME before clone and clone3 syscalls
This change adds a call to the __arm_za_disable() function immediately
before the SVC instruction inside clone() and clone3() wrappers. It also
adds a macro for inline clone() used in fork() and adds the same call to
the vfork implementation. This sets the ZA state of SME to "off" on return
from these functions (for both the child and the parent).

The __arm_za_disable() function is described in [1] (8.1.3). Note that
the internal Glibc name for this function is __libc_arm_za_disable().

When this change was originally proposed [2,3], it generated a long
discussion where several questions and concerns were raised. Here we
will address these concerns and explain why this change is useful and,
in fact, necessary.

In a nutshell, a C library that conforms to the AAPCS64 spec [1] (pertinent
to this change are mainly chapters 6.2 and 6.6) should have a call to the
__arm_za_disable() function in clone() and clone3() wrappers. The following
explains in detail why this is the case.

When we consider using the __arm_za_disable() function inside the clone()
and clone3() libc wrappers, we talk about the C library subroutines clone()
and clone3() rather than the syscalls with similar names. In the current
version of Glibc, clone() is public and clone3() is private, but it being
private is not pertinent to this discussion.

We will begin by stating that this change is NOT a bug fix for something
in the kernel. The requirement to call __arm_za_disable() does NOT come from
the kernel. It also is NOT needed to satisfy a contract between the kernel
and userspace. This is why it is not for the kernel documentation to describe
this requirement. This requirement is instead needed to satisfy a pure userspace
scheme outlined in [1] and to make sure that software that uses Glibc (or any
other C library that has correct handling of SME states (see below)) conforms
to [1] without having to unnecessarily become SME-aware and thus lose
portability.

To recap (see [1] (6.2)), the SME extension defines SME state, which is
part of the processor state. Part of this SME state is the ZA state,
which is necessary to manage the ZA storage register in the context of
the ZA lazy saving scheme [1] (6.6). This scheme exists because it would
be challenging to handle the ZA storage of SME in either a callee-saved
or a caller-saved manner.

There are 3 kinds of ZA state that are defined in terms of the PSTATE.ZA
bit and the TPIDR2_EL0 register (see [1] (6.6.3)):

- "off":       PSTATE.ZA == 0
- "active":    PSTATE.ZA == 1 TPIDR2_EL0 == null
- "dormant":   PSTATE.ZA == 1 TPIDR2_EL0 != null

As [1] (6.7.2) outlines, every subroutine has exactly one SME-interface
depending on the permitted ZA-states on entry and on normal return from
a call to this subroutine. Callers of a subroutine must know and respect
the ZA-interface of the subroutines they are using. Using a subroutine
in a way that is not permitted by its ZA-interface is undefined behaviour.

In particular, clone() and clone3() (the C library functions) have the
ZA-private interface. This means that the permitted ZA-states on entry
are "off" and "dormant" and that the permitted states on return are "off"
or "dormant" (but if and only if it was "dormant" on entry).

This means that both functions in question should correctly handle both
"off" and "dormant" ZA-states on entry. The conforming states on return
are "off" and "dormant" (if inbound state was already "dormant").

This change ensures that the ZA-state on return is always "off". Note,
that, in the context of clone() and clone3(), "on return" means a point
when execution resumes at certain address after transferring from clone()
or clone3(). For the caller (we may refer to it as "parent") this is the
return address in the link register where the RET instruction jumps. For
the "child", this is the target branch address.

So, the "off" state on return is permitted and conformant. Why can't we
retain the "dormant" state? In theory, we can, but we shouldn't, here is
why.

Every subroutine with a private-ZA interface, including clone() and clone3(),
must comply with the lazy saving scheme [1] (6.7.2). This puts additional
responsibility on a subroutine if ZA-state on return is "dormant" because
this state has special meaning. The "caller" (that is the place in code
where execution is transferred to, so this includes both "parent" and "child")
may check the ZA-state and use it as per the spec of the "dormant" state that
is outlined in [1] (6.6.6 and 6.6.7).

Conforming to this would require more code inside of clone() and clone3()
which is hardly desirable.

For the return to "parent" this could be achieved in theory, but given that
neither clone() nor clone3() are supposed to be used in the middle of an
SME operation, it wouldn't be useful. For the "return" to "child" this
would be particularly difficult to achieve given the complexity of these
functions and their interfaces. Most importantly, it would be illegal
and somewhat meaningless to allow a "child" to start execution in the
"dormant" ZA-state because the very essence of the "dormant" state implies
that there is a place to return and that there is some outer context that
we are allowed to interact with.

To sum up, calling __arm_za_disable() to ensure the "off" ZA-state when the
execution resumes after a call to clone() or clone3() is correct and also
the most simple way to conform to [1].

Can there be situations when we can avoid calling __arm_za_disable()?

Calling __arm_za_disable() implies a certain (sufficiently small) overhead,
so one might rightly ponder avoiding making a call to this function when
we can afford not to. The most trivial cases like this (e.g. when the
calling thread doesn't have access to SME or to the TPIDR2_EL0 register)
are already handled by this function (see [1] (8.1.3 and 8.1.2)). Reasoning
about other possible use cases would require making code inside clone() and
clone3() more complicated, which would defeat the point of trying to
optimise the call to __arm_za_disable() away.

Why can't the kernel do this instead?

The handling of SME state by the kernel is described in [4]. In short,
the kernel must not impose a specific ZA-interface onto a userspace function.
Interaction with the kernel happens (among other thing) via system calls.
In Glibc many of the system calls (notably including SYS_clone and
SYS_clone3) are used via wrappers; the kernel has no control over these
wrappers and cannot dictate how they should behave, because that is
simply outside of the kernel's remit.

However, in certain cases, the kernel may ensure that a "child" doesn't
start in an incorrect state. This is what is done by the recent change
included in the 6.16 kernel [5]. This is not enough to ensure that code
that uses the clone() and clone3() functions conforms to [1] when it runs
on a system that provides SME, hence this change.

[1]: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
[2]: https://inbox.sourceware.org/libc-alpha/20250522114828.2291047-1-yury.khrustalev@arm.com
[3]: https://inbox.sourceware.org/libc-alpha/20250609121407.3316070-1-yury.khrustalev@arm.com
[4]: https://www.kernel.org/doc/html/v6.16/arch/arm64/sme.html
[5]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cde5c32db55740659fca6d56c09b88800d88fd29

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-10-14 09:42:46 +01:00
Yury Khrustalev b4b713bd89 aarch64: define macro for calling __libc_arm_za_disable
A common sequence of instructions is used in several places
in assembly files, so define it in one place as an assembly
macro.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-10-14 09:42:46 +01:00
Paul Zimmermann ea5b996be9 replace use of double by float [BZ#29326] 2025-10-14 09:46:00 +02:00
Arjun Shankar 88ce558a31 string: Add tests for unique strerror and strsignal strings
strerror, strsignal, and their variants should return unique strings for
each known (and, depending on the function, unknown) error/signal.  Add
tests to verify this for strerror, strerror_r (GNU and XSI compliant
variants), strerror_l (for the C locale), strerrordesc_np,
strsignal, sigabbrev_np, and sigdescr_np.
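
A reduced sketch of what such a uniqueness check looks like (the bound and
structure are hypothetical; the real tests also cover the _r, _l, and _np
variants):

    #include <stdio.h>
    #include <string.h>

    enum { NERRS = 135 };  /* hypothetical range of known errno values */

    int
    main (void)
    {
      char *msgs[NERRS];
      /* strerror may reuse an internal buffer, so copy each result.  */
      for (int i = 0; i < NERRS; i++)
        msgs[i] = strdup (strerror (i));
      int status = 0;
      for (int i = 0; i < NERRS; i++)
        for (int j = i + 1; j < NERRS; j++)
          if (strcmp (msgs[i], msgs[j]) == 0)
            {
              printf ("duplicate for %d and %d: %s\n", i, j, msgs[i]);
              status = 1;
            }
      return status;
    }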

Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-10-13 19:04:44 +02:00
Uros Bizjak 3a0a8eae50 x86: Fix trivial code formatting errors in my last two commits
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
2025-10-12 17:59:16 +02:00
Uros Bizjak bb019bc68f i386: Use __seg_gs qualifiers in PTR_{MANGLE,DEMANGLE}() macros
Use __seg_gs named address space qualifiers in PTR_MANGLE() and
PTR_DEMANGLE() macros to access the pointer_guard field in the TCB.

This change allows the compiler to eliminate redundant reads of
the variable, reducing the number of reads from 105 to 94 and
decreasing the text size of the library by 280 bytes.
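
A hedged sketch of the idiom (TCB layout abridged and hypothetical): a
__seg_gs-qualified dereference is an ordinary C memory access that the
compiler emits with a %gs: prefix and can therefore track like any other
load:

    typedef struct
    {
      unsigned long pointer_guard;   /* abridged, hypothetical layout */
    } tcb_sketch_t;

    static inline unsigned long
    read_pointer_guard (void)
    {
      /* A plain C load with a %gs: segment prefix; the compiler can
         cache or reuse it like any other memory read.  */
      return ((tcb_sketch_t __seg_gs *) 0)->pointer_guard;
    }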

While at it, fix a few trivial whitespace issues as well.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-10-12 17:48:55 +02:00
Uros Bizjak 60e3ada68d x86_64: Use __seg_fs qualifiers in PTR_{MANGLE,DEMANGLE}() macros
Use __seg_fs named address space qualifiers in PTR_MANGLE() and
PTR_DEMANGLE() macros to access the pointer_guard field in the TCB.

This change allows the compiler to eliminate redundant reads of
the variable, reducing the number of reads from 98 to 89 and
decreasing the text size of the library by 512 bytes.

While at it, fix a few trivial whitespace issues as well.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-10-12 17:47:55 +02:00
Yury Khrustalev 7a47a51e8d misc: Fix several typos 2025-10-10 14:52:40 +01:00
Uros Bizjak 3ee23564ce x86: Use typeof_member style in RSEQ area access expressions
Update RSEQ access macros to use `(struct rseq_area) {}.member`
in _Static_assert and __typeof expressions, instead of
RSEQ_SELF()->member.  This adopts the typeof_member style, avoiding
reliance on RSEQ_SELF for compile-time expressions.
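
The idiom, as a hedged sketch (struct fields abridged):

    struct rseq_area { unsigned int cpu_id_start; unsigned int cpu_id; };

    /* The member's type comes from a compound literal; no live object
       (and no RSEQ_SELF ()) is needed at compile time.  */
    _Static_assert (sizeof ((struct rseq_area) {}.cpu_id) == 4,
                    "cpu_id is a 32-bit field");
    typedef __typeof ((struct rseq_area) {}.cpu_id) cpu_id_type;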

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-10-08 09:35:15 +02:00
Uros Bizjak 99518a3a35 x86: Simplify RSEQ area access expressions
Replace manual cast with a direct
`(struct rseq_area __seg_gs *)__rseq_offset` dereference to access
`member`.  This avoids redundant `offsetof(struct rseq_area, member)`
and improves readability while preserving semantics.
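
As a sketch, the access becomes (macro name hypothetical; __rseq_offset is
the glibc-provided offset of the rseq area from the thread pointer, the
segment base used here):

    #define rseq_area_get(member) \
      (((struct rseq_area __seg_gs *) __rseq_offset)->member)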

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-10-08 09:35:15 +02:00
Uros Bizjak e47728a77c x86: Simplify stack and pointer guard macros
Replace manual casts with a direct `(__tcbhead_t __seg_gs *)0`
dereferences for `stack_guard` and `pointer_guard`.  This makes
the macros more straightforward and removes the dependency on
<stdint.h>.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-10-08 09:35:15 +02:00
Uros Bizjak f48b12aab6 x86: Simplify TCB access expressions
Replace manual cast with a direct `(__typeof(*descr) __seg_gs *)0`
dereference to access `member`.  This avoids redundant
`offsetof(struct pthread, member)` and improves readability while
preserving semantics.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-10-08 09:35:15 +02:00
Sunil K Pandey a114e29ddd x86: Detect Intel Nova Lake Processor
Detect Intel Nova Lake Processor and tune it similarly to Intel Panther
Lake.  https://cdrdv2.intel.com/v1/dl/getContent/671368 Section 1.2.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-10-07 20:50:24 -07:00
Sunil K Pandey f8dd52901b x86: Detect Intel Wildcat Lake Processor
Detect Intel Wildcat Lake Processor and tune it similarly to Intel Panther
Lake.  https://cdrdv2.intel.com/v1/dl/getContent/671368 Section 1.2.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-10-07 16:41:06 -07:00
Sachin Monga 2ea943f7d4 ppc64le: Restore optimized strncmp for power10
This patch addresses the actual cause of CVE-2025-5745.

The vector non-volatile registers are no longer used for the
32-byte load and comparison operations.

Additionally, the assembler workaround used earlier for the
lxvp instruction is replaced with the actual instruction.

Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Co-authored-by: Paul Murphy <paumurph@redhat.com>
2025-10-07 03:25:42 -05:00
Sachin Monga 9a40b1cda5 ppc64le: Restore optimized strcmp for power10
This patch addresses the actual cause of CVE-2025-5702.

The vector non-volatile registers are no longer used for the
32-byte load and comparison operations.

Additionally, the assembler workaround used earlier for the
lxvp instruction is replaced with the actual instruction.

Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Co-authored-by: Paul Murphy <paumurph@redhat.com>
2025-10-07 03:20:44 -05:00
Adhemerval Zanella 0c8cdb10a1 arm: Add ARM VFPv4 VFMA instruction support in fma/fmaf (BZ 15503)
It is enabled through math-use-builtins-fma.h if glibc is built
for VFPv4 (__ARM_FEATURE_FMA predefined by GCC), or through IFUNC
(testing HWCAP_ARM_VFPv4) otherwise.

Checked on arm-linux-gnueabihf.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-03 15:19:54 -03:00
Adhemerval Zanella 61ac7c6a75 math: Optimize flt-32 remainder implementation
With same micro-optimization done for the double variant:

  * Combine the |y| zero check.
  * Rework the check to adjust result and call fmod.
  * Remove one check after fmod.
  * Remove float-int-float roundtrip on return.

Also use math_config.h macros and indent the code.  The resulting
strategy is different in enough places that I think it requires a
different copyright.

I see the following performance improvements using remainder benchtests
(using reciprocal-throughput metric):

Architecture     | Input           |   master |   patch  | Improvement
-----------------|-----------------|----------|----------|------------
x86_64           | subnormals      |  20.4176 |  19.6144 |      3.93%
x86_64           | normal          |  54.0939 |  52.2343 |      3.44%
x86_64           | close-exponent  |  23.9120 |  22.3768 |      6.42%
aarch64          | subnormals      |   9.2423 |   8.3825 |      9.30%
aarch64          | normal          |  30.5393 |   29.244 |      4.24%
aarch64          | close-exponent  |  15.5405 |  13.9256 |     10.39%

The aarch64 machine was a Neoverse-N1 with gcc 15.1.1, while the x86_64
was an AMD Ryzen 9 5900X with gcc 15.2.1.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-03 15:19:44 -03:00
Adhemerval Zanella f0facb2d27 math: Optimize dbl-64 remainder implementation
The commit 34b9f8bc17 provides an optimized fmod implementation; use
the same strategy used for remainderf and implement the double variant
on top of fmod.
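
Conceptually, layering remainder() on fmod() looks like the following
sketch (ignoring the exact handling of ties and special cases, which the
real implementation performs on the bit representation):

    #include <math.h>

    static double
    remainder_sketch (double x, double y)
    {
      double ay = fabs (y);
      double r = fmod (x, ay);      /* same sign as x, |r| < |y| */
      if (fabs (r) > 0.5 * ay)
        r -= copysign (ay, r);      /* fold into [-|y|/2, +|y|/2] */
      return r;
    }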

I see the following performance improvements using remainder benchtests
(using reciprocal-throughput metric):

Architecture     | Input           |   master |   patch  | Improvement
-----------------|-----------------|----------|----------|------------
x86_64           | subnormals      |  76.1345 |  21.5334 |     71.72%
x86_64           | normal          | 553.2670 | 426.5670 |     22.90%
x86_64           | close-exponent  |  30.5111 |  22.6893 |     25.64%
aarch64          | subnormals      |  26.0734 |   8.4876 |     67.45%
aarch64          | normal          | 205.2590 |  200.082 |      2.52%
aarch64          | close-exponent  |  13.8481 |  13.6663 |      1.31%

The aarch64 machine was a Neoverse-N1 with gcc 15.1.1, while the x86_64
was an AMD Ryzen 9 5900X with gcc 15.2.1.

This implementation also fixes the math/test-double-remainder issues
on alpha.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-10-03 15:19:31 -03:00
Collin Funk 6d9e110577 math: fix Wshift-overflow warning.
When compiling on x86_64 with -Wshift-overflow=2 you can see the
following warning:

../sysdeps/ieee754/flt-32/math_config.h: In function ‘is_inf’:
../sysdeps/ieee754/flt-32/math_config.h:184:37: warning: result of ‘2139095040 << 1’ requires 33 bits to represent, but ‘int’ only has 32 bits [-Wshift-overflow=]
  184 |   return (x << 1) == (EXPONENT_MASK << 1);
      |                                     ^~

This patch adjusts the definitions to use UINT32_C.  This matches
sysdeps/ieee754/dbl-64/math_config.h, which uses UINT64_C for the
corresponding definitions.
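
A hedged reconstruction of the fix (mask value taken from the warning
above):

    #include <stdint.h>

    #define EXPONENT_MASK UINT32_C (0x7f800000)  /* was a plain int */

    static inline int
    is_inf (uint32_t x)
    {
      /* The mask is now unsigned, so the left shift cannot overflow.  */
      return (x << 1) == (EXPONENT_MASK << 1);
    }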

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-10-02 18:01:23 -07:00
Joseph Myers a7ddbf456d Add once_flag, ONCE_FLAG_INIT and call_once to stdlib.h for C23
C23 adds once_flag, ONCE_FLAG_INIT and call_once to stdlib.h (in C11
they were only in threads.h, in C23 they are in both headers; this
change came from N2840).  Implement this change, with a
bits/types/once_flag.h header for the common type and initializer
definitions.
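
A usage sketch: with this change, C23 programs get call_once without
including <threads.h>:

    #include <stdlib.h>

    static once_flag flag = ONCE_FLAG_INIT;

    static void
    do_init (void)
    {
      /* Runs exactly once, even with concurrent callers.  */
    }

    void
    lazy_init (void)
    {
      call_once (&flag, do_init);
    }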

Note that there's an omnibus bug (bug 33001) that covers more than
just these missing definitions.

This doesn't seem a significant enough feature to be worth mentioning
in NEWS.

ISO C is not concerned with whether functions are in libc or
libpthread, but POSIX links this to what header they are declared in,
so functions declared in stdlib.h are supposed to be in libc.
However, the current edition of POSIX is based on C17; hopefully Hurd
glibc will have completed the merge of libpthread into libc (in
particular, moving call_once) well before a future edition of POSIX
based on C23 (or a later version of ISO C) is released.

Tested for x86_64 and x86.
2025-10-01 15:15:15 +00:00
Joseph Myers 0f201f4a81 Implement C23 memset_explicit (bug 32378)
Add the C23 memset_explicit function to glibc.  Everything here is
closely based on the approach taken for explicit_bzero.  This includes
the bits that relate to internal uses of explicit_bzero within glibc
(although we don't currently have any such internal uses of
memset_explicit), and also includes the nonnull attribute (when we
move to nonnull_if_nonzero for various functions following C2y, this
function should be included in that change).

The function is declared both for __USE_MISC and for __GLIBC_USE (ISOC23)
(so by default not just for compilers defaulting to C23 mode).
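
A usage sketch: like explicit_bzero, the call must survive optimization
even though the buffer is dead afterwards:

    #include <string.h>

    void
    use_key (void)
    {
      char key[32];
      /* ... derive and use the key ... */
      memset_explicit (key, 0, sizeof key);  /* cannot be optimized away */
    }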

Tested for x86_64 and x86.
2025-10-01 15:14:09 +00:00
Collin Funk e7eadbb29f Linux: Fix tst-copy_file_range-large test on recent kernels [BZ #33498]
Instead of returning a negative value, the fixed FUSE copy_file_range
silently truncates the size to UINT_MAX & PAGE_MASK [1]. Allow that value
to be returned as well.

[1] 1e08938c36

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-09-27 18:18:04 -07:00
Luna Lamb 653e6c4fff AArch64: Implement AdvSIMD and SVE log10p1(f) routines
Vector variants of the new C23 log10p1 routines.

Note: Benchmark inputs for log10p1(f) are identical to log1p(f)

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-09-27 12:45:59 +00:00
Luna Lamb db42732474 AArch64: Implement AdvSIMD and SVE log2p1(f) routines
Vector variants of the new C23 log2p1 routines.

Note: Benchmark inputs for log2p1(f) are identical to log1p(f).

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-09-27 12:44:09 +00:00
Uros Bizjak a9a8b106bb x86: Restore "*&" GCC asm memory operand workaround to installed fpu_control.h
fpu_control.h is an installed header, so a wider range of compiler
versions (including ones older than GCC 9) is relevant for it than for
building glibc itself.

Fixes commit 3014dec3ad
('x86: Remove obsolete "*&" GCC asm memory operand workaround')

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
2025-09-24 08:04:41 +02:00
Adhemerval Zanella c40832acff math: Remove unused files
The multiprecision slow paths were removed in glibc 2.28.
2025-09-23 10:29:24 -03:00
Jovan Dmitrovic 70d45632ad mips: Fix delay slot filling in bsd-setjmp.S
In the !defined __PIC__ case, we cannot guarantee that the delay slot
of the final `j` instruction is properly filled without assembler
reordering active.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-09-23 10:29:24 -03:00
Jovan Dmitrovic 3ac2833ec7 mips: Remove strcmp.S
Testing strcmp on MIPS hardware shows that strcmp.S performs worse
than the combination of using the generic strcmp.c implementation
alongside -funroll-loops.

Suggested-by:  Joseph Myers <josmyers@redhat.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-09-23 10:29:24 -03:00
Adhemerval Zanella c1016b727a assert: Refactor assert/assert_perror
It now calls __libc_assert, which contains similar logic. The assert
call only requires memory allocation for the message translation, so
test-assert2.c is adapted to handle it.

It also removes the fxprintf call from assert/assert_perror, although
this is not 100% backwards compatible (the message is written only if
there is a file descriptor associated with stderr).  It now writes bytes
directly without going through the wide stream state.

Checked on aarch64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-09-23 10:29:24 -03:00
Uros Bizjak b8254a047f x86_64: Fix number of operands mismatch for `vdivss'
Fixes commit ff8be6152b
('x86: Use "%v" to emit VEX encoded instructions for AVX targets')

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
2025-09-23 08:13:13 +02:00
Uros Bizjak ff8be6152b x86: Use "%v" to emit VEX encoded instructions for AVX targets
Legacy encodings of SSE instructions incur AVX-SSE domain transition
penalties on some Intel microarchitectures (e.g. Haswell, Broadwell).
Using the VEX forms avoids these penalties and keeps all instructions
in the VEX decode domain.  Use the "%v" sequence to emit the "v" prefix
for opcodes when compiling with -mavx.

No functional changes intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-09-22 17:33:25 +02:00
Uros Bizjak 3014dec3ad x86: Remove obsolete "*&" GCC asm memory operand workaround
GCC now accepts plain variable names as valid lvalues for "m"
constraints, automatically spilling locals to memory if necessary.
The long-standing "*&" pattern was originally used as a defensive
workaround for older compiler versions that rejected operands
such as:

     asm ("incl %0" : "+m"(x));

with errors like "memory input is not directly addressable".
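
For reference, the workaround spelled the same operand as:

     asm ("incl %0" : "+m"(*&x));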

Modern compilers (GCC >= 9) reliably generate correct code
without the workaround, and the resulting assembly is identical.

No functional changes intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-09-22 17:33:25 +02:00
Samuel Thibault 81a6e97791 hurd: Note BZ #30166 as fixed
802b0eba51 ("hurd: implement RLIMIT_AS against Mach RPCs") brought the
needed RLIMIT_AS support for memory-crunchy tests.
2025-09-22 02:17:50 +02:00
Diego Nieto Cid 802b0eba51 hurd: implement RLIMIT_AS against Mach RPCs
Check for VM limit RPCs

  * config.h.in: add #undef for HAVE_MACH_VM_GET_SIZE_LIMIT and
    HAVE_MACH_VM_SET_SIZE_LIMIT.
  * sysdeps/mach/configure.ac: use mach_RPC_CHECK to check for
    vm_set_size_limit and vm_get_size_limit RPCs in gnumach.defs.
  * sysdeps/mach/configure: regenerate file.

Use vm_get_size_limit to initialize RLIMIT_AS

  * hurd/hurdrlimit.c (init_rlimit): use vm_get_size_limit to initialize
    RLIMIT_AS entry of the _hurd_rlimits array.

Notify the kernel of the new VM size limits

  * sysdeps/mach/hurd/setrlimit.c: use the vm_set_size_limit RPC,
    if available, to notify the kernel of the new limits. Retry RPC
    calls if they were interrupted by a signal.
Message-ID: <03fb90a795b354a366ee73f56f73e6ad22a86cda.1755220108.git.dnietoc@gmail.com>
2025-09-22 00:52:37 +02:00
Samuel Thibault c9cc047e9f hurd: catch SIGSEGV on returning from signal handler
Typically on stack overflow, we may not actually have room on the stack to
trampoline back from the signal handler.  We have to detect this before
locking the ss, otherwise the signal thread will be stuck on taking the
ss lock while trying to post SIGSEGV.
2025-09-21 23:45:40 +02:00
Wilco Dijkstra aebaeb2c33 AArch64: Update math-vector-fortran.h
Update math-vector-fortran.h with the latest set of math functions
and sort by name.

Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>
2025-09-19 12:57:47 +00:00
H.J. Lu 1fa5773eb1 x86: Don't use asm statement for trunc/truncf
The compiler inlines trunc and truncf with SSE4.1.  But older versions
of GCC don't inline them with -Os:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121861

Don't use asm statement for trunc and truncf if compiler can inline them
with -Os.  It removes one register move with GCC 16:

__modff_sse41:                        __modff_sse41:
.LFB23:                               .LFB23:
   .cfi_startproc                        .cfi_startproc
   endbr64                               endbr64
   subq  $24, %rsp                       subq  $24, %rsp
   .cfi_def_cfa_offset 32                .cfi_def_cfa_offset 32
   movq  %fs:40, %rax                    movq  %fs:40, %rax
   movq  %rax, 8(%rsp)                   movq  %rax, 8(%rsp)
   xorl  %eax, %eax                      xorl  %eax, %eax
   movd  %xmm0, %eax                     movd  %xmm0, %eax
   addl  %eax, %eax                      addl  %eax, %eax
   cmpl  $-16777216, %eax                cmpl  $-16777216, %eax
   je .L7                                je .L7
                                   >     movaps   %xmm0, %xmm3
   movaps   %xmm0, %xmm4                 movaps   %xmm0, %xmm4
   movss .LC0(%rip), %xmm2         |     movss .LC0(%rip), %xmm1
   movaps   %xmm2, %xmm3           |     movaps   %xmm1, %xmm2
   andps %xmm0, %xmm2              |     roundss  $11, %xmm3, %xmm3
   roundss $11, %xmm0, %xmm1       |     subss %xmm3, %xmm4
   subss %xmm1, %xmm4              |     andps %xmm0, %xmm1
   andnps   %xmm4, %xmm3           |     andnps   %xmm4, %xmm2
   orps  %xmm3, %xmm2              |     orps  %xmm2, %xmm1
.L3:                                  .L3:
   movss %xmm1, (%rdi)             |     movss %xmm3, (%rdi)
   movq  8(%rsp), %rax                   movq  8(%rsp), %rax
   subq  %fs:40, %rax                    subq  %fs:40, %rax
   jne   .L8                             jne   .L8
   movaps   %xmm2, %xmm0           |     movaps   %xmm1, %xmm0
   addq  $24, %rsp                       addq  $24, %rsp
   .cfi_remember_state                   .cfi_remember_state
   .cfi_def_cfa_offset 8                 .cfi_def_cfa_offset 8
   ret                                   ret

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Uros Bizjak <ubizjak@gmail.com>
2025-09-17 04:30:11 -07:00
H.J. Lu d6666eea3e i686: Compile .op files and gmon tests with -mfentry
On i686, after GCC 16 commit:

commit 07d8de9174c421d719649639a1452b8b9f2eee32
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Wed Jul 2 08:58:23 2025 +0800

    x86-64: Add --enable-x86-64-mfentry

which warns about ‘-pg’ without ‘-mfentry’, GCC 16 fails to compile .op
files and gmon tests with the following error when glibc is configured
with --disable-default-pie:

cc1: error: ‘-pg’ without ‘-mfentry’ may be unreliable with shrink wrapping [-Werror]

Compile .op files and gmon tests with -mfentry if it is supported by
CC/TEST_CC and glibc is configured with --disable-default-pie.  This
fixes BZ #33376.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Joseph Myers <josmyers@redhat.com>
2025-09-15 11:14:03 -07:00
Uros Bizjak 041151f439 i386: Use __seg_gs qualifier to cast access to TCB in THREAD_GSCOPE_RESET_FLAG()
Use the __seg_gs named address space qualifier to cast access to the
gscope_flag in the TCB as a %gs: prefixed address.  This enables the
use of the "m" operand constraint, which informs the compiler about
memory access in the inline assembly.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: H.J.Lu <hjl.tools@gmail.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Carlos O'Donell <carlos@redhat.com>
2025-09-14 21:50:50 +02:00