glibc

Commit Graph

Author	SHA1	Message	Date
Dylan Fleming	fd1d642ef8	AArch64: Remove WANT_SIMD_EXCEPT from aarch64 AdvSIMD math routines Remove legacy code for supporting an old Arm Optimised Routines deprecated feature for throwing SIMD Exceptions. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-11-18 15:51:15 +00:00
Pierre Blanchard	bb6519de1e	AArch64: Fix and improve SVE pow(f) special cases powf: Update scalar special case function to best use new interface. pow: Make specialcase NOINLINE to prevent str/ldr leaking in fast path. Remove dependency on sv_call2, as new callback impl is not a performance gain. Replace with vectorised specialcase since structure of scalar routine is fairly simple. Throughput gain of about 5-10% on V1 for large values and 25% for subnormal `x`. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-18 15:51:15 +00:00
Pierre Blanchard	e889160273	AArch64: fix SVE tanpi(f) [BZ #33642] Fixed svld1rq using incorrect predicates (BZ #33642). Next to no performance variations (tested on V1). Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-18 15:51:15 +00:00
Wilco Dijkstra	989e538224	math: Remove float_t and double_t [BZ #33563] Remove uses of float_t and double_t. This is not useful on modern machines, and does not help given GCC defaults to -fexcess-precision=fast. One use of double_t remains to allow forcing the precision to double on targets where FLT_EVAL_METHOD=2. This fixes BZ #33563 on i486-pc-linux-gnu. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-11-12 19:33:23 +00:00
Yury Khrustalev	a9c426bcca	aarch64: fix includes in SME tests Use the correct include for the SIGCHLD macro: signal.h Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-12 13:45:52 +00:00
Florian Weimer	259adb087d	aarch64: Remove $(aarch64-bti) check The variable was removed in commit `2c421fc430` ("AArch64: Cleanup PAC and BTI"), so this Makefile fragment is always excluded. Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>	2025-11-07 14:12:01 +01:00
Joe Ramsay	e45af510bc	AArch64: Fix instability in AdvSIMD sinh Previously presence of special-cases in one lane could affect the results in other lanes due to unconditional scalar fallback. The old WANT_SIMD_EXCEPT option (which has never been enabled in libmvec) has been removed from AOR, making it easier to spot and fix this. No measured change in performance. This patch applies cleanly as far back as 2.41, however there are conflicts with 2.40 where sinh was first introduced. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-06 18:30:47 +00:00
Joe Ramsay	6c22823da5	AArch64: Fix instability in AdvSIMD tan Previously presence of special-cases in one lane could affect the results in other lanes due to unconditional scalar fallback. The old WANT_SIMD_EXCEPT option (which has never been enabled in libmvec) has been removed from AOR, making it easier to spot and fix this. 4% improvement in throughput with GCC 14 on Neoverse V1. This bug is present as far back as 2.39 (where tan was first introduced). Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-06 18:30:47 +00:00
Joe Ramsay	5b82fb1882	AArch64: Optimise SVE scalar callbacks Instead of using SVE instructions to marshall special results into the correct lane, just write the entire vector (and the predicate) to memory, then use cheaper scalar operations. Geomean speedup of 16% in special intervals on Neoverse with GCC 14. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-06 15:45:37 +00:00
Wilco Dijkstra	324c088a18	nptl: Remove ATOMIC_EXCHANGE_USES_CAS usage The only usage was for pthread_spin_lock, introduced by `12d2dd7060`, as a way to optimize the code for certain architectures. Now that atomic builtins are used by default, let the compiler use the best code sequence for the atomic exchange. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Wilco Dijkstra	53807741fb	Define __HAVE_64B_ATOMICS from compiler support Now that atomic builtins are used by default, we can rely on the compiler to define when to use 64-bit atomic operations. It allows the use of 64-bit atomic operations on some 32-bit ABIs where they were not previously enabled due to missing pre-processor handling: hppa, mips64n32, s390, and sparcv9. Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Uros Bizjak <ubizjak@gmail.com> Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	70ee250fb8	atomic: Consolidate atomic_full_barrier implementation All ABIs save for sparcv9 and s390 define it to __sync_synchronize, which can be mapped to __atomic_thread_fence (__ATOMIC_SEQ_CST). For Sparc, it uses a stricter #StoreStore|#LoadStore|#StoreLoad|#LoadLoad instead of the #StoreLoad generated by __sync_synchronize. For s390x, it defaults to a memory barrier where __sync_synchronize emits a 'bcr 15,0' (which the manual describes as pipeline synchronization). The barrier is used only in one place (pthread_mutex_setprioceiling), and using a stricter barrier for s390 is ok performance-wise. Co-authored-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	b299332fb4	aarch64: Remove unused atomic macros These are already provided by the generic include/atomic.h. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-11-04 04:14:01 -03:00
Adhemerval Zanella	8711c29bb7	aarch64: Fix tst-ifunc-arg-4 on clang-18 It issues: ../sysdeps/aarch64/tst-ifunc-arg-4.c:39:1: error: unused function 'resolver' [-Werror,-Wunused-function] 39 | resolver (uint64_t arg0, const uint64_t arg1[]) | ^~~~~~~~ 1 error generated. clang-19 and onwards do not trigger the warning. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-10-29 12:54:10 -03:00
Adhemerval Zanella	970364dac0	Annotate switch fall-through Clang defaults to warning about missing fall-through annotations, and it does not support all the comment-like annotations that GCC does. Use the C23 [[fallthrough]] attribute instead. Reviewed-by: Collin Funk <collin.funk1@gmail.com>	2025-10-29 12:54:01 -03:00
Wilco Dijkstra	0375e6e233	AArch64: Use math-use-builtins for roundeven(f)/lrint(f)/lround(f) Remove target implementations of roundeven(f)/lrint(f)/lround(f) and use the math-use-builtins mechanism instead. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-10-17 17:03:54 +00:00
Yury Khrustalev	ecb0fc2f0f	aarch64: tests for SME This commit adds tests for the following use cases relevant to handling of the SME state: - fork() and vfork() - clone() and clone3() - signal handler While most cases are trivial, the case of clone3() is more complicated since the clone3() symbol is not public in Glibc. To avoid having to check all possible ways clone3() may be called via other public functions (e.g. vfork() or pthread_create()), we put together a test that links directly with clone3.o. All the existing functions that have calls to clone3() may not actually use it, in which case the outcome of such tests would be unexpected. Having a direct call to the clone3() symbol in the test allows us to check precisely what we need to test: that the __arm_za_disable() function is indeed called and has the desired effect. Linking to clone3.o also requires linking to __arm_za_disable.o, which in turn requires the _dl_hwcap2 hidden symbol, which we have to provide in the test and initialise before use. Co-authored-by: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-10-14 09:42:46 +01:00
Yury Khrustalev	b4b713bd89	aarch64: define macro for calling __libc_arm_za_disable A common sequence of instructions is used in several places in assembly files, so define it in one place as an assembly macro. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-10-14 09:42:46 +01:00
Yury Khrustalev	7a47a51e8d	misc: Fix several typos	2025-10-10 14:52:40 +01:00
Luna Lamb	653e6c4fff	AArch64: Implement AdvSIMD and SVE log10p1(f) routines Vector variants of the new C23 log10p1 routines. Note: Benchmark inputs for log10p1(f) are identical to log1p(f) Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-09-27 12:45:59 +00:00
Luna Lamb	db42732474	AArch64: Implement AdvSIMD and SVE log2p1(f) routines Vector variants of the new C23 log2p1 routines. Note: Benchmark inputs for log2p1(f) are identical to log1p(f). Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-09-27 12:44:09 +00:00
Wilco Dijkstra	aebaeb2c33	AArch64: Update math-vector-fortran.h Update math-vector-fortran.h with the latest set of math functions and sort by name. Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>	2025-09-19 12:57:47 +00:00
Adhemerval Zanella	63ba1a1509	math: Add fetestexcept internal alias This avoids linknamespace issues on old standards. It is required when the fallback fma implementation is used, if/when fma is also used internally by other implementations. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-09-11 14:46:07 -03:00
Adhemerval Zanella	2eb8836de7	math: Add feclearexcept internal alias This avoids linknamespace issues on old standards. It is required when the fallback fma implementation is used, if/when fma is also used internally by other implementations. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-09-11 14:46:07 -03:00
remph	e20ca759af	AArch64: add optimised strspn/strcspn Requires Neon (aka. Advanced SIMD). Looks up 16 characters at a time, for a 2-3x performance improvement, and a ~30% speedup on the strtok & strsep benchtests, as tested on Cortex A-{53,72}. Signed-off-by: remph <lhr@disroot.org> Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-09-10 16:12:23 +00:00
Hasaan Khan	8ced7815fb	AArch64: Implement exp2m1 and exp10m1 routines Vector variants of the new C23 exp2m1 & exp10m1 routines. Note: Benchmark inputs for exp2m1 & exp10m1 are identical to exp2 & exp10 respectively, this also includes the floating point variations. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-09-02 16:50:24 +00:00
Pierre Blanchard	aac077645a	AArch64: Fix SVE powf routine [BZ #33299] Fix a bug in predicate logic introduced in last change. A slight performance improvement from relying on all true predicates during conversion from single to double. This fixes BZ #33299. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-08-20 17:45:21 +00:00
Adhemerval Zanella	20528165bd	Disable SFrame support by default And add extra checks to enable for binutils 2.45 and if the architecture explicitly enables it. When SFrame is disabled, all the related code is also not enabled for backtrace() and _dl_find_object(), so SFrame backtracking is not used even if the binary has the SFrame segment. This patch also adds some other related fixes: * Fixed an issue with AC_CHECK_PROG_VER, where the READELF_SFRAME usage prevented specifying a different readelf through READELF environment variable at configure time. * Add an extra arch-specific internal definition, libc_cv_support_sframe, to disable --enable-sframe on architectures that have binutils but not glibc support (s390x). * Renamed the tests without the .sframe segment and move the tst-backtrace1 from pthread to debug. * Use the built compiler strip to remove the .sframe segment, instead of the system one (which might not support SFrame). Checked on x86_64-linux-gnu and aarch64-linux-gnu. Reviewed-by: Sam James <sam@gentoo.org>	2025-07-24 15:51:58 -03:00
H.J. Lu	848f0e46f0	i386: Update ___tls_get_addr to preserve vector registers Compiler generates the following instruction sequence for dynamic TLS access: leal tls_var@tlsgd(,%ebx,1), %eax call ___tls_get_addr@PLT CALL instruction is transparent to compiler which assumes all registers, except for EFLAGS, AX, CX, and DX, are unchanged after CALL. But ___tls_get_addr is a normal function which doesn't preserve any vector registers. 1. Rename the generic __tls_get_addr function to ___tls_get_addr_internal. 2. Change ___tls_get_addr to a wrapper function with implementations for FNSAVE, FXSAVE, XSAVE and XSAVEC to save and restore all vector registers. 3. dl-tlsdesc-dynamic.h has: _dl_tlsdesc_dynamic: /* Like all TLS resolvers, preserve call-clobbered registers. We need two scratch regs anyway. */ subl $32, %esp cfi_adjust_cfa_offset (32) It is wrong to use movl %ebx, -28(%esp) movl %esp, %ebx cfi_def_cfa_register(%ebx) ... mov %ebx, %esp cfi_def_cfa_register(%esp) movl -28(%esp), %ebx to preserve EBX on stack. Fix it with: movl %ebx, 28(%esp) movl %esp, %ebx cfi_def_cfa_register(%ebx) ... mov %ebx, %esp cfi_def_cfa_register(%esp) movl 28(%esp), %ebx 4. Update _dl_tlsdesc_dynamic to call ___tls_get_addr_internal directly. 5. Add have-test-mtls-traditional to compile tst-tls23-mod.c with traditional TLS variant to verify the fix. 6. Define DL_RUNTIME_RESOLVE_REALIGN_STACK in sysdeps/x86/sysdep.h. This fixes BZ #32996. Co-Authored-By: Adhemerval Zanella <adhemerval.zanella@linaro.org> Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-06-19 04:30:31 +08:00
Luna Lamb	6849c5b791	AArch64: Improve codegen SVE log1p helper Improve codegen by packing coefficients. 4% and 2% improvement in throughput microbenchmark on Neoverse V1, for acosh and atanh respectively. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-06-18 17:28:51 +00:00
Dylan Fleming	dee22d2a81	AArch64: Optimise SVE FP64 Hyperbolics Rework SVE FP64 hyperbolics to use the SVE FEXPA instruction. Also update the special case handling for large inputs to be entirely vectorised. Performance improvements on Neoverse V1: cosh_sve: 19% for |x| < 709, 5x otherwise; sinh_sve: 24% for |x| < 709, 5.9x otherwise; tanh_sve: 12% for |x| < 19, 9x otherwise. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-06-18 17:28:51 +00:00
Dylan Fleming	1e3d1ddf97	AArch64: Optimize SVE exp functions Improve performance of SVE exps by making better use of the SVE FEXPA instruction. Performance improvement on Neoverse V1: exp2_sve: 21% exp2f_sve: 24% exp10f_sve: 23% expm1_sve: 25% Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-06-18 17:28:51 +00:00
Yury Khrustalev	c0f0db2d59	aarch64: simplify calls to __libc_arm_za_disable in assembly There is no functional change in this patch. We remove stores and loads to stack, return address signing, and redundant CFI directives before and after call to __libc_arm_za_disable(). The __libc_arm_za_disable implementation follows a special calling convention that avoids most of the operations that would be necessary for a call to a normal function (see [1] for details). First, we rely on __libc_arm_za_disable() not clobbering certain registers, and we put the return address into one of these registers. Now we don't need to store it on the stack, so we don't need to sign the return address using PAC. Second, as a result of the above, we don't need to update the CFI offset. This patch provides a small optimisation by avoiding an unnecessary store and load on the stack, and also simplifies the assembly code and CFI directives. [1]: https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-06-18 09:42:33 +01:00
Yury Khrustalev	eeedfc2f74	aarch64: GCS: use internal struct in __alloc_gcs No functional change here, just a small refactoring to simplify using __alloc_gcs() for allocating shadow stacks. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-06-18 09:37:13 +01:00
Yury Khrustalev	b15ed85c86	aarch64: fix typo in sysdeps/aarch64/Makefile	2025-06-10 10:48:07 +01:00
Wilco Dijkstra	09795c5612	AArch64: Fix build error with GCC 12.1/12.2 Early versions of GCC 12 didn't support -mtune=neoverse-v2, so use -mtune=neoverse-v1 instead. Reported-by: Yury Khrustalev <yury.khrustalev@arm.com>	2025-06-06 13:22:27 +00:00
Yury Khrustalev	fcd6a8b5c5	aarch64: add __ifunc_hwcap function to be used in ifunc resolvers Add a new helper function __ifunc_hwcap() as a portable way to access HWCAP elements via the parameter(s) passed to an ifunc resolver checking the _IFUNC_ARG_HWCAP bit in the first parameter and size of the buffer in the second parameter. Note that 0 is returned when the requested element is not available or does not correspond to a valid AT_HWCAP{,2,...} value. Also add relevant tests. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-06-05 14:38:51 +01:00
Yury Khrustalev	ea14d04e9a	aarch64: add support for hwcap3,4 Add basic support for hwcap3 and hwcap4 in dynamic loader and ifunc resolvers. Describe new backward-compatible prototype for GNU indirect function resolvers that use a pointer to a uint64_t array instead of a pointer to the __ifunc_arg_t struct. This patch also adds macro _IFUNC_HWCAP_MAX to specify current number of hwcap elements. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-06-05 14:38:03 +01:00
Wilco Dijkstra	aa18367c11	AArch64: Improve enabling of SVE for libmvec When using a -mcpu option in CFLAGS, GCC can report errors when building libmvec. Fix this by overriding both -mcpu and -march with a generic variant with SVE added. Also use a tune for a modern SVE core. Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>	2025-05-29 16:58:49 +00:00
Luna Lamb	da196e6134	AArch64: Improve codegen in SVE log1p Improves memory access, reformat evaluation scheme to pack coefficients. 5% improvement in throughput microbenchmark on Neoverse V1. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-05-29 15:25:35 +00:00
Wilco Dijkstra	2071666d03	AArch64: Fix typo in math-vector.h Fix typo atanpi2->atan2pi in math-vector.h.	2025-05-20 13:44:16 +00:00
Wilco Dijkstra	b990b0aee2	AArch64: Cleanup SVE config and defines Now we finally support modern GCC and binutils, it's time for a cleanup. Remove HAVE_AARCH64_SVE_ASM define and conditional compilation. Remove SVE configure checks for SVE, ACLE and variant-PCS support. Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>	2025-05-20 10:33:55 +00:00
Wilco Dijkstra	2c421fc430	AArch64: Cleanup PAC and BTI Now we finally support modern GCC and binutils, it's time for a cleanup. Use PAC and BTI instructions unconditionally and use proper assembler syntax. Remove the PR target/94791 strip_pac workarounds for buggy GCCs. Remove the PAC/BTI configure checks - always emit GNU property notes on assembly files. Change cfi_window_save to the correct cfi_negate_ra_state unwind directive. Reviewed-by: Matthieu Longo <matthieu.longo@arm.com>	2025-05-19 15:35:32 +00:00
Dylan Fleming	96abd59bf2	AArch64: Implement AdvSIMD and SVE atan2pi/f Implement double and single precision variants of the C23 routine atan2pi for both AdvSIMD and SVE. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-05-19 15:35:25 +00:00
Dylan Fleming	edf6202815	AArch64: Implement AdvSIMD and SVE atanpi/f Implement double and single precision variants of the C23 routine atanpi for both AdvSIMD and SVE. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-05-19 15:34:40 +00:00
Dylan Fleming	0ef2cf44e7	AArch64: Implement AdvSIMD and SVE asinpi/f Implement double and single precision variants of the C23 routine asinpi for both AdvSIMD and SVE. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-05-19 15:33:50 +00:00
Dylan Fleming	993997ca1b	AArch64: Implement AdvSIMD and SVE acospi/f Implement double and single precision variants of the C23 routine acospi for both AdvSIMD and SVE. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-05-19 15:31:59 +00:00
Dylan Fleming	1e84509e00	AArch64: Optimize inverse trig functions Improve performance of Inverse trig functions by altering how coefficients are loaded. Performance improvement on Neoverse V1: SVE acos 14% AdvSIMD acos 6% AdvSIMD asin 6% SVE asin 5% AdvSIMD asinf 2% AdvSIMD atanf 22% SVE atanf 20% SVE atan 11% AdvSIMD atan 5% SVE atan2 7% SVE atan2f 4% AdvSIMD atan2f 3% AdvSIMD atan2 2% Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-05-19 14:54:32 +00:00
Yury Khrustalev	251f932624	aarch64: update tests for SME Add a test that checks that ZA state is disabled after setjmp and sigsetjmp. Update the existing SME test that uses setjmp. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-05-15 14:23:35 +01:00
Yury Khrustalev	a7f6fd976c	aarch64: Disable ZA state of SME in setjmp and sigsetjmp Due to the nature of the ZA state, setjmp() should clear it in the same manner as it is already done by longjmp. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-05-15 14:23:03 +01:00
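
Commits 324c088a18 and 70ee250fb8 in the table above lean on the compiler's atomic builtins: a spin lock built on a plain atomic exchange, and a full barrier expressed as __atomic_thread_fence (__ATOMIC_SEQ_CST). The following is only a minimal sketch of that idea, assuming a GCC- or Clang-compatible compiler that provides the __atomic builtins. The demo_ names and the lock layout are hypothetical and are not glibc's internal code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held.  */
typedef struct { int locked; } demo_spinlock_t;

/* Try once: atomically store 1 and report whether the lock was free.  */
bool demo_spin_trylock (demo_spinlock_t *l)
{
  return __atomic_exchange_n (&l->locked, 1, __ATOMIC_ACQUIRE) == 0;
}

/* Acquire: the compiler picks the instruction sequence for the exchange
   (a native swap, or a CAS/LL-SC loop, depending on the target).  */
void demo_spin_lock (demo_spinlock_t *l)
{
  while (__atomic_exchange_n (&l->locked, 1, __ATOMIC_ACQUIRE) != 0)
    {
      /* Wait with relaxed loads until the lock looks free, then retry.  */
      while (__atomic_load_n (&l->locked, __ATOMIC_RELAXED) != 0)
        ;
    }
}

/* Release: unlocking a lock that is not held is a caller bug.  */
void demo_spin_unlock (demo_spinlock_t *l)
{
  assert (__atomic_load_n (&l->locked, __ATOMIC_RELAXED) != 0);
  __atomic_store_n (&l->locked, 0, __ATOMIC_RELEASE);
}

/* Full barrier expressed with the builtin named in commit 70ee250fb8.  */
void demo_full_barrier (void)
{
  __atomic_thread_fence (__ATOMIC_SEQ_CST);
}
```

A caller would pair demo_spin_lock with demo_spin_unlock around a short critical section; demo_spin_trylock returns false instead of waiting when the lock is already held.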

616 Commits
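
Commit 970364dac0 replaces comment-style fall-through markers with the C23 [[fallthrough]] attribute. As a rough illustration of what such an annotation looks like, here is a small sketch. The DEMO_FALLTHROUGH macro, its pre-C23 fallback to the GNU attribute, and the demo_sum_tail function are all assumptions made for this example so that it also builds with older language modes; they are not taken from the glibc sources.

```c
#include <assert.h>

/* Hypothetical wrapper: use the C23 attribute when the language mode
   provides it, otherwise fall back to the GNU statement attribute.  */
#if defined (__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
# define DEMO_FALLTHROUGH [[fallthrough]]
#elif defined (__GNUC__)
# define DEMO_FALLTHROUGH __attribute__ ((fallthrough))
#else
# define DEMO_FALLTHROUGH ((void) 0)
#endif

/* Sum the first N elements of P, for N between 0 and 3, using
   intentional fall-through from the larger cases to the smaller ones.  */
int demo_sum_tail (const int *p, int n)
{
  int sum = 0;

  assert (n >= 0 && n <= 3);
  switch (n)
    {
    case 3:
      sum += p[2];
      DEMO_FALLTHROUGH;
    case 2:
      sum += p[1];
      DEMO_FALLTHROUGH;
    case 1:
      sum += p[0];
      break;
    default:
      break;
    }
  return sum;
}
```

With the annotations in place, the fall-through warning of both GCC and Clang stays quiet for the marked cases while still flagging any unmarked ones.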