Add the C23 memalignment function (query the alignment of a pointer)
to glibc.
Given how simple this operation is, it would make sense for compilers
to inline calls to this function, but I'm treating that as a compiler
matter (compilers should add it as a built-in function) rather than
adding an inline version to glibc headers (although such an inline
version would be reasonable as well). I've filed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122117 for this feature
in GCC.
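For illustration, a minimal sketch of the operation (glibc's actual
implementation may differ): the alignment of a pointer is the value of
the lowest set bit of its address, and 0 for a null pointer.

  #include <stddef.h>
  #include <stdint.h>

  size_t
  memalignment_sketch (const void *p)
  {
    uintptr_t u = (uintptr_t) p;
    return u & -u;   /* lowest set bit; 0 when p is null */
  }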
Tested for x86_64 and x86.
Also remove some unused entries from the fallback table.
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The fma is required only for x == -0x1.da285cp-5 in FE_TONEAREST
to provide correctly rounded results.
Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The fma is required only for x == +/-0x1.6371e8p-4f in FE_TOWARDZERO
to provide correctly rounded results.
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The fma is not required to provide correctly rounded results, but it
helps on !__FP_FAST_FMA ISAs.
Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
The fma is required only for inputs less than 0x1.0fd288p-127. Also
only add the extra check for !__FP_FAST_FMA targets.
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
The fma is not strictly required to provide correctly rounded results,
but it helps on !__FP_FAST_FMA ABIs.
Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
This commit adds tests for the following use cases relevant to handling of
the SME state:
- fork() and vfork()
- clone() and clone3()
- signal handler
While most cases are trivial, the case of clone3() is more complicated since
the clone3() symbol is not public in Glibc.
To avoid having to check all possible ways clone3() may be called via other
public functions (e.g. vfork() or pthread_create()), we put together a test
that links directly with clone3.o. The existing functions that call
clone3() may not actually use it, in which case the outcome of such tests
would be unpredictable. Having a direct call to the clone3() symbol in the
test allows us to check precisely what we need to test: that the __arm_za_disable()
function is indeed called and has the desired effect.
Linking to clone3.o also requires linking to __arm_za_disable.o, which in
turn requires the _dl_hwcap2 hidden symbol; the test must provide this
symbol and initialise it before use.
Co-authored-by: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This change adds a call to the __arm_za_disable() function immediately
before the SVC instruction inside clone() and clone3() wrappers. It also
adds a macro for inline clone() used in fork() and adds the same call to
the vfork implementation. This sets the ZA state of SME to "off" on return
from these functions (for both the child and the parent).
The __arm_za_disable() function is described in [1] (8.1.3). Note that
the internal Glibc name for this function is __libc_arm_za_disable().
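In the wrappers, the change has roughly this shape (a hedged sketch;
register usage, ordering, and the syscall number are illustrative, not
glibc's exact code):

  // Drop the ZA state right before entering the kernel.
  bl      __libc_arm_za_disable   // ZA state becomes "off"
  mov     x8, #220                // __NR_clone on aarch64
  svc     #0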
When this change was originally proposed [2,3], it generated a long
discussion where several questions and concerns were raised. Here we
will address these concerns and explain why this change is useful and,
in fact, necessary.
In a nutshell, a C library that conforms to the AAPCS64 spec [1] (for this
change, mainly chapters 6.2 and 6.6) should have a call to the
__arm_za_disable() function in its clone() and clone3() wrappers. The following
explains in detail why this is the case.
When we consider using the __arm_za_disable() function inside the clone()
and clone3() libc wrappers, we talk about the C library subroutines clone()
and clone3() rather than the syscalls with similar names. In the current
version of Glibc, clone() is public and clone3() is private, but it being
private is not pertinent to this discussion.
We will begin by stating that this change is NOT a bug fix for something
in the kernel. The requirement to call __arm_za_disable() does NOT come from
the kernel. It also is NOT needed to satisfy a contract between the kernel
and userspace. This is why it is not for the kernel documentation to describe
this requirement. The requirement is instead needed to satisfy a pure
userspace scheme outlined in [1] and to make sure that software using Glibc
(or any other C library that handles SME states correctly (see below))
conforms to [1] without unnecessarily having to become SME-aware and thus
losing portability.
To recap (see [1] (6.2)), the SME extension defines SME state as part of
the processor state. Part of this SME state is the ZA state, which is
needed to manage the ZA storage register in the context of the ZA lazy
saving scheme [1] (6.6). This scheme exists because it would be challenging
to handle the ZA storage of SME in either a callee-saved or a caller-saved
manner.
There are 3 kinds of ZA state, defined in terms of the PSTATE.ZA bit and
the TPIDR2_EL0 register (see [1] (6.6.3)):
- "off":     PSTATE.ZA == 0
- "active":  PSTATE.ZA == 1 && TPIDR2_EL0 == null
- "dormant": PSTATE.ZA == 1 && TPIDR2_EL0 != null
As [1] (6.7.2) outlines, every subroutine has exactly one SME-interface
depending on the permitted ZA-states on entry and on normal return from
a call to this subroutine. Callers of a subroutine must know and respect
the ZA-interface of the subroutines they are using. Using a subroutine
in a way that is not permitted by its ZA-interface is undefined behaviour.
In particular, clone() and clone3() (the C library functions) have a
private-ZA interface. This means that the permitted ZA-states on entry
are "off" and "dormant", and that the permitted states on return are "off"
and "dormant" (the latter if and only if the state was "dormant" on entry).
This means that both functions in question should correctly handle both
"off" and "dormant" ZA-states on entry. The conforming states on return
are "off" and "dormant" (the latter only if the inbound state was already
"dormant").
This change ensures that the ZA-state on return is always "off". Note
that, in the context of clone() and clone3(), "on return" means the point
where execution resumes at a certain address after transferring from clone()
or clone3(). For the caller (we may refer to it as the "parent") this is
the return address in the link register to which the RET instruction jumps.
For the "child", this is the branch target address.
So, the "off" state on return is permitted and conformant. Why can't we
retain the "dormant" state? In theory, we can, but we shouldn't, here is
why.
Every subroutine with a private-ZA interface, including clone() and
clone3(), must comply with the lazy saving scheme [1] (6.7.2). This puts
additional responsibility on a subroutine if the ZA-state on return is
"dormant", because this state has special meaning. The "caller" (that is,
the place in code where execution is transferred to, so this includes both
"parent" and "child") may check the ZA-state and use it as per the spec of
the "dormant" state outlined in [1] (6.6.6 and 6.6.7).
Conforming to this would require more code inside clone() and clone3(),
which is hardly desirable.
For the return to the "parent" this could be achieved in theory, but given
that neither clone() nor clone3() is supposed to be used in the middle of
an SME operation, it wouldn't be useful. For the "return" to the "child" this
would be particularly difficult to achieve given the complexity of these
functions and their interfaces. Most importantly, it would be illegal
and somewhat meaningless to allow a "child" to start execution in the
"dormant" ZA-state because the very essence of the "dormant" state implies
that there is a place to return and that there is some outer context that
we are allowed to interact with.
To sum up, calling __arm_za_disable() to ensure the "off" ZA-state when
execution resumes after a call to clone() or clone3() is correct and also
the simplest way to conform to [1].
Can there be situations when we can avoid calling __arm_za_disable()?
Calling __arm_za_disable() implies a certain (sufficiently small) overhead,
so one might reasonably want to avoid calling this function when we can
afford not to. The most trivial cases (e.g. when the calling thread doesn't
have access to SME or to the TPIDR2_EL0 register) are already handled by
this function (see [1] (8.1.3 and 8.1.2)). Handling other possible cases
would require making the code inside clone() and clone3() more complicated,
which would defeat the point of such an optimisation.
Why can't the kernel do this instead?
The handling of SME state by the kernel is described in [4]. In short,
the kernel must not impose a specific ZA-interface onto a userspace
function. Interaction with the kernel happens (among other things) via
system calls. In Glibc, many of the system calls (notably including
SYS_clone and SYS_clone3) are used via wrappers; the kernel has no control
over these wrappers and cannot dictate how they should behave, because
that is simply outside of the kernel's remit.
However, in certain cases, the kernel may ensure that a "child" doesn't
start in an incorrect state. This is what is done by the recent change
included in the 6.16 kernel [5]. That alone is not enough to ensure that
code using the clone() and clone3() functions conforms to [1] when it runs
on a system that provides SME, hence this change.
[1]: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
[2]: https://inbox.sourceware.org/libc-alpha/20250522114828.2291047-1-yury.khrustalev@arm.com
[3]: https://inbox.sourceware.org/libc-alpha/20250609121407.3316070-1-yury.khrustalev@arm.com
[4]: https://www.kernel.org/doc/html/v6.16/arch/arm64/sme.html
[5]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cde5c32db55740659fca6d56c09b88800d88fd29
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
A common sequence of instructions is used in several places
in assembly files, so define it in one place as an assembly
macro.
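For instance, with the GNU assembler the mechanism looks like this (macro
name and body are illustrative, not the sequence factored out here):

  .macro  SAVE_FP_LR offset
  stp     x29, x30, [sp, #\offset]
  .endm

  // Expanded at each use site:
  SAVE_FP_LR 16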
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
strerror, strsignal, and their variants should return unique strings for
each known (and, depending on the function, unknown) error/signal. Add
tests to verify this for strerror, strerror_r (GNU and XSI compliant
variants), and strerror_l (for the C locale), strerrordesc_np,
strsignal, sigabbrev_np, and sigdescr_np.
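The core of such a test can be sketched as a pairwise comparison
(illustrative, not the actual test code):

  #include <signal.h>
  #include <stdio.h>
  #include <string.h>

  /* Return 1 if any two known signals share a description.  */
  static int
  has_duplicate_strsignal (void)
  {
    char buf[NSIG][128];
    for (int i = 1; i < NSIG; i++)
      {
        const char *s = strsignal (i);
        snprintf (buf[i], sizeof buf[i], "%s", s == NULL ? "" : s);
      }
    for (int i = 1; i < NSIG; i++)
      for (int j = i + 1; j < NSIG; j++)
        if (strcmp (buf[i], buf[j]) == 0)
          return 1;
    return 0;
  }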
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Use __seg_gs named address space qualifiers in PTR_MANGLE() and
PTR_DEMANGLE() macros to access the pointer_guard field in the TCB.
This change allows the compiler to eliminate redundant reads of
the variable, reducing the number of reads from 105 to 94 and
decreasing the text size of the library by 280 bytes.
While at it, fix a few trivial whitespace issues as well.
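The effect can be sketched as follows (simplified: the real macro also
rotates the mangled value):

  /* The guard becomes an ordinary lvalue in the __seg_gs address
     space, so the compiler may CSE repeated reads of it.  */
  #define POINTER_GUARD \
    (((tcbhead_t __seg_gs *) 0)->pointer_guard)
  #define PTR_MANGLE_SKETCH(var) \
    ((var) = (uintptr_t) (var) ^ POINTER_GUARD)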
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Use __seg_fs named address space qualifiers in PTR_MANGLE() and
PTR_DEMANGLE() macros to access the pointer_guard field in the TCB.
This change allows the compiler to eliminate redundant reads of
the variable, reducing the number of reads from 98 to 89 and
decreasing the text size of the library by 512 bytes.
While at it, fix a few trivial whitespace issues as well.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Update RSEQ access macros to use `(struct rseq_area) {}.member`
in _Static_assert and __typeof expressions, instead of
RSEQ_SELF()->member. This adopts the typeof_member style, avoiding
reliance on RSEQ_SELF for compile-time expressions.
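For example (cpu_id stands in here for an arbitrary member of
struct rseq_area):

  /* A compound literal yields the member's type in a constant
     expression, with no runtime object involved.  */
  _Static_assert (sizeof ((struct rseq_area) {}.cpu_id) == 4,
                  "cpu_id must be 32 bits");
  __typeof ((struct rseq_area) {}.cpu_id) saved_cpu_id;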
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Replace manual cast with a direct
`(struct rseq_area __seg_gs *)__rseq_offset` dereference to access
`member`. This avoids redundant `offsetof(struct rseq_area, member)`
and improves readability while preserving semantics.
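The resulting access pattern, sketched with an illustrative member name:

  /* __rseq_offset is the offset of the rseq area from the thread
     pointer, so the cast plus __seg_gs addresses it directly.  */
  #define RSEQ_GETMEM_SKETCH(member) \
    (((struct rseq_area __seg_gs *) __rseq_offset)->member)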
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Replace manual casts with direct `(__tcbhead_t __seg_gs *)0`
dereferences for `stack_guard` and `pointer_guard`. This makes
the macros more straightforward and removes the dependency on
<stdint.h>.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Replace manual cast with a direct `(__typeof(*descr) __seg_gs *)0`
dereference to access `member`. This avoids redundant
`offsetof(struct pthread, member)` and improves readability while
preserving semantics.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
This patch addresses the actual cause of CVE-2025-5745.
The vector non-volatile registers are no longer used for the 32-byte
load and comparison operation.
Additionally, the assembler workaround used earlier for the lxvp
instruction is replaced with the actual instruction.
Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Co-authored-by: Paul Murphy <paumurph@redhat.com>
This patch addresses the actual cause of CVE-2025-5702.
The vector non-volatile registers are no longer used for the 32-byte
load and comparison operation.
Additionally, the assembler workaround used earlier for the lxvp
instruction is replaced with the actual instruction.
Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Co-authored-by: Paul Murphy <paumurph@redhat.com>
It is enabled through math-use-builtins-fma.h if glibc is built
for VFPv4 (__ARM_FEATURE_FMA predefined by GCC), or through an IFUNC
(testing HWCAP_ARM_VFPv4) otherwise.
Checked on arm-linux-gnueabihf.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
With the same micro-optimizations done for the double variant:
* Combine the |y| zero check.
* Rework the check to adjust result and call fmod.
* Remove one check after fmod.
* Remove float-int-float roundtrip on return.
Also use math_config.h macros and indent the code. The resulting
strategy differs in enough places that I think it requires a different
copyright.
I see the following performance improvements using remainder benchtests
(using reciprocal-throughput metric):
Architecture | Input           | master   | patch    | Improvement
-------------|-----------------|----------|----------|------------
x86_64       | subnormals      | 20.4176  | 19.6144  | 3.93%
x86_64       | normal          | 54.0939  | 52.2343  | 3.44%
x86_64       | close-exponent  | 23.9120  | 22.3768  | 6.42%
aarch64      | subnormals      | 9.2423   | 8.3825   | 9.30%
aarch64      | normal          | 30.5393  | 29.244   | 4.24%
aarch64      | close-exponent  | 15.5405  | 13.9256  | 10.39%
The aarch64 machine was a Neoverse-N1 (gcc 15.1.1), while the x86_64
machine was an AMD Ryzen 9 5900X (gcc 15.2.1).
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The commit 34b9f8bc17 provides an optimized fmod implementation; use
the same strategy as for remainderf and implement the double variant
on top of fmod.
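The underlying idea, as a hedged sketch (it ignores the ties-to-even
case at |r| == |y|/2, possible overflow of r + r near the top of the
range, and special inputs, all of which the real code must handle):

  #include <math.h>

  static double
  remainder_sketch (double x, double y)
  {
    double ay = fabs (y);
    double r = fmod (fabs (x), ay);   /* r in [0, |y|) */
    if (r + r > ay)
      r -= ay;                        /* fold into [-|y|/2, |y|/2] */
    return signbit (x) ? -r : r;
  }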
I see the following performance improvements using remainder benchtests
(using reciprocal-throughput metric):
Architecture | Input           | master   | patch    | Improvement
-------------|-----------------|----------|----------|------------
x86_64       | subnormals      | 76.1345  | 21.5334  | 71.72%
x86_64       | normal          | 553.2670 | 426.5670 | 22.90%
x86_64       | close-exponent  | 30.5111  | 22.6893  | 25.64%
aarch64      | subnormals      | 26.0734  | 8.4876   | 67.45%
aarch64      | normal          | 205.2590 | 200.082  | 2.52%
aarch64      | close-exponent  | 13.8481  | 13.6663  | 1.31%
The aarch64 machine was a Neoverse-N1 (gcc 15.1.1), while the x86_64
machine was an AMD Ryzen 9 5900X (gcc 15.2.1).
This implementation also fixes the math/test-double-remainder issues
on alpha.
Tested on aarch64-linux-gnu and x86_64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
When compiling on x86_64 with -Wshift-overflow=2 you can see the
following warning:
../sysdeps/ieee754/flt-32/math_config.h: In function ‘is_inf’:
../sysdeps/ieee754/flt-32/math_config.h:184:37: warning: result of ‘2139095040 << 1’ requires 33 bits to represent, but ‘int’ only has 32 bits [-Wshift-overflow=]
184 | return (x << 1) == (EXPONENT_MASK << 1);
| ^~
This patch adjusts the definitions to use UINT32_C. This matches the
definitions in sysdeps/ieee754/dbl-64/math_config.h which use UINT64_C
for these definitions.
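The shape of the fix (the value is the flt-32 exponent mask):

  #include <stdint.h>
  /* Was: #define EXPONENT_MASK 0x7f800000
     (EXPONENT_MASK << 1 overflows a 32-bit int).  */
  #define EXPONENT_MASK UINT32_C (0x7f800000)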
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
C23 adds once_flag, ONCE_FLAG_INIT and call_once to stdlib.h (in C11
they were only in threads.h, in C23 they are in both headers; this
change came from N2840). Implement this change, with a
bits/types/once_flag.h header for the common type and initializer
definitions.
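With this change, a strictly conforming C23 program can use call_once
without including threads.h:

  #include <stdlib.h>

  static once_flag flag = ONCE_FLAG_INIT;

  static void
  do_init (void)
  {
    /* One-time initialisation.  */
  }

  void
  use_resource (void)
  {
    call_once (&flag, do_init);
  }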
Note that there's an omnibus bug (bug 33001) that covers more than
just these missing definitions.
This doesn't seem a significant enough feature to be worth mentioning
in NEWS.
ISO C is not concerned with whether functions are in libc or
libpthread, but POSIX links this to what header they are declared in,
so functions declared in stdlib.h are supposed to be in libc.
However, the current edition of POSIX is based on C17; hopefully Hurd
glibc will have completed the merge of libpthread into libc (in
particular, moving call_once) well before a future edition of POSIX
based on C23 (or a later version of ISO C) is released.
Tested for x86_64 and x86.
Add the C23 memset_explicit function to glibc. Everything here is
closely based on the approach taken for explicit_bzero. This includes
the bits that relate to internal uses of explicit_bzero within glibc
(although we don't currently have any such internal uses of
memset_explicit), and also includes the nonnull attribute (when we
move to nonnull_if_nonzero for various functions following C2y, this
function should be included in that change).
The function is declared both for __USE_MISC and for __GLIBC_USE (ISOC23)
(so, by default, not just for compilers defaulting to C23 mode).
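A usage sketch: unlike a plain memset, this call must not be optimised
away even though the buffer is dead afterwards.

  #include <string.h>

  void
  handle_secret (void)
  {
    char key[32];
    /* ... derive and use key ... */
    memset_explicit (key, 0, sizeof key);
  }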
Tested for x86_64 and x86.
Instead of returning a negative value, the fixed FUSE copy_file_range
silently truncates the size to UINT_MAX & PAGE_MASK [1]. Allow that value
to be returned as well.
[1] 1e08938c36
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Vector variants of the new C23 log10p1 routines.
Note: Benchmark inputs for log10p1(f) are identical to log1p(f).
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Vector variants of the new C23 log2p1 routines.
Note: Benchmark inputs for log2p1(f) are identical to log1p(f).
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
fpu_control.h is an installed header, so a wider range of compiler
versions (including ones older than GCC 9) is relevant for it than for
building glibc.
Fixes commit 3014dec3ad
('x86: Remove obsolete "*&" GCC asm memory operand workaround')
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
In the !defined __PIC__ case, we cannot guarantee that the delay slot
is properly filled at the final `j` instruction without reordering
active.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Testing strcmp on MIPS hardware shows that strcmp.S performs worse
than the combination of using the generic strcmp.c implementation
alongside -funroll-loops.
Suggested-by: Joseph Myers <josmyers@redhat.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
It now calls __libc_assert, which contains similar logic. The assert
call only requires memory allocation for the message translation, so
test-assert2.c is adapted to handle it.
It also removes the fxprintf from assert/assert_perror, although this
is not 100% backwards compatible (the message is written only if there
is a file descriptor associated with stderr). It now writes bytes
directly without going through the wide stream state.
Checked on aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Legacy encodings of SSE instructions incur AVX-SSE domain transition
penalties on some Intel microarchitectures (e.g. Haswell, Broadwell).
Using the VEX forms avoids these penalties and keeps all instructions
in the VEX decode domain. Use the "%v" sequence to emit the "v" prefix
for opcodes when compiling with -mavx.
No functional changes intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
GCC now accepts plain variable names as valid lvalues for "m"
constraints, automatically spilling locals to memory if necessary.
The long-standing "*&" pattern was originally used as a defensive
workaround for older compiler versions that rejected operands
such as:
asm ("incl %0" : "+m"(x));
with errors like "memory input is not directly addressable".
Modern compilers (GCC >= 9) reliably generate correct code
without the workaround, and the resulting assembly is identical.
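Side by side:

  int x = 0;
  asm ("incl %0" : "+m" (*&x));   /* old defensive workaround */
  asm ("incl %0" : "+m" (x));     /* equivalent on GCC >= 9 */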
No functional changes intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Check for VM limit RPCs
* config.h.in: Add #undef for HAVE_MACH_VM_GET_SIZE_LIMIT and
HAVE_MACH_VM_SET_SIZE_LIMIT.
* sysdeps/mach/configure.ac: Use mach_RPC_CHECK to check for the
vm_set_size_limit and vm_get_size_limit RPCs in gnumach.defs.
* sysdeps/mach/configure: Regenerate.
Use vm_get_size_limit to initialize RLIMIT_AS
* hurd/hurdrlimit.c (init_rlimit): Use vm_get_size_limit to initialize
the RLIMIT_AS entry of the _hurd_rlimits array.
Notify the kernel of the new VM size limits
* sysdeps/mach/hurd/setrlimit.c: Use the vm_set_size_limit RPC,
if available, to notify the kernel of the new limits. Retry RPC
calls if they were interrupted by a signal.
Message-ID: <03fb90a795b354a366ee73f56f73e6ad22a86cda.1755220108.git.dnietoc@gmail.com>
On stack overflow, we typically may not actually have room on the stack
to trampoline back from the signal handler. We have to detect this before
locking the ss; otherwise the signal thread will be stuck taking the
ss lock while trying to post SIGSEGV.
On i686, after GCC 16 commit:
commit 07d8de9174c421d719649639a1452b8b9f2eee32
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Wed Jul 2 08:58:23 2025 +0800
x86-64: Add --enable-x86-64-mfentry
which warns about ‘-pg’ without ‘-mfentry’, GCC 16 fails to compile .op
files and gmon tests with the following error when glibc is configured
with --disable-default-pie:
cc1: error: ‘-pg’ without ‘-mfentry’ may be unreliable with shrink wrapping [-Werror]
Compile .op files and gmon tests with -mfentry if it is supported by
CC/TEST_CC and glibc is configured with --disable-default-pie. This
fixes BZ #33376.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Joseph Myers <josmyers@redhat.com>
Use the __seg_gs named address space qualifier to cast access to the
gscope_flag in the TCB as a %gs: prefixed address. This enables the
use of the "m" operand constraint, which informs the compiler about
memory access in the inline assembly.
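A hedged sketch of the pattern (the tcbhead_t layout follows glibc's x86
TLS headers; treat the names as assumptions):

  /* The flag is now a plain memory operand; no hand-written %gs:
     prefix is needed in the asm template.  */
  #define GSCOPE_FLAG (((tcbhead_t __seg_gs *) 0)->gscope_flag)

  static inline int
  gscope_reset_sketch (void)
  {
    int result = 0;
    asm volatile ("xchgl %0, %1"
                  : "+r" (result), "+m" (GSCOPE_FLAG));
    return result;
  }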
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: H.J.Lu <hjl.tools@gmail.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Carlos O'Donell <carlos@redhat.com>