glibc

Author	SHA1	Message	Date
Frédéric Bérat	d4d472366b	docs: Add dynamic linker environment variable docs The Dynamic Linker chapter now includes a new section detailing environment variables that influence its behavior. This new section documents the `LD_DEBUG` environment variable, explaining how to enable debugging output and listing its various keywords like `libs`, `reloc`, `files`, `symbols`, `bindings`, `versions`, `scopes`, `tls`, `all`, `statistics`, `unused`, and `help`. It also documents `LD_DEBUG_OUTPUT`, which controls where the debug output is written, allowing redirection to a file with the process ID appended. This provides users with essential information for controlling and debugging the dynamic linker. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-11-03 10:47:56 +01:00
Frédéric Bérat	332f8e62af	tls: Add debug logging for TLS and TCB management Introduce the `DL_DEBUG_TLS` debug mask to enable detailed logging for Thread-Local Storage (TLS) and Thread Control Block (TCB) management. This change integrates a new `tls` option into the `LD_DEBUG` environment variable, allowing developers to trace: - TCB allocation, deallocation, and reuse events in `dl-tls.c`, `nptl/allocatestack.c`, and `nptl/nptl-stack.c`. - Thread startup events, including the TID and TCB address, in `nptl/pthread_create.c`. A new test, `tst-dl-debug-tid`, has been added to validate the functionality of this new debug logging, ensuring that relevant messages are correctly generated for both main and worker threads. This enhances the debugging capabilities for diagnosing issues related to TLS allocation and thread lifecycle within the dynamic linker. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-11-03 10:47:28 +01:00
Pincheng Wang	720e891637	riscv: Add Zbkb optimized repeat_bytes helper Introduce a RISC-V specific string-misc.h to provide an optimized repeat_bytes implementation when the Zbkb extension is available. The new version uses packh/packw/pack instructions, reducing instruction count and avoiding high-latency instructions. This helper is used by several mem and string functions, and falls back to the generic implementation when Zbkb is not present. Signed-off-by: Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn> Reviewed-by: Peter Bergner <bergner@tenstorrent.com>	2025-10-31 16:23:57 -05:00
Wilco Dijkstra	1136c036a3	math: Remove xfail from pow test [BZ #33563] Remove xfail from pow testcase since pow and powf have been fixed. Also check float128 maximum value. See BZ #33563. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-10-31 19:13:53 +00:00
Wilco Dijkstra	0212fc23b0	math: Fix pow special case [BZ #33563] Fix pow (DBL_MAX, 1.0) to return DBL_MAX when rounding upwards without FMA. This fixes BZ #33563. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-10-31 19:13:41 +00:00
Wilco Dijkstra	8917bd3eb3	math: Fix powf special case [BZ #33563] Fix powf (0x1.fffffep+127, 1.0f) to return 0x1.fffffep+127 when rounding upwards. Clean up the special case code - performance improves by ~1.2%. This fixes BZ #33563. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-10-31 19:12:47 +00:00
Yury Khrustalev	7d99ff550f	debug: mark __libc_message_wrapper as always inline When building with -Og to enable debugging, there is currently a compiler error because if __libc_message_wrapper() is not inline, the __va_arg_pack_len macro cannot be used. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-10-31 10:01:33 +00:00
Yury Khrustalev	2f77aec043	aarch64: fix cfi directives around __libc_arm_za_disable Incorrect CFI directive corrupted call stack information and prevented debuggers from correctly displaying call stack information. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-10-31 09:48:47 +00:00
Eric Wong	3ac0112b5d	cdefs: allow __attribute__ on tcc According to the tcc (tiny C compiler) Changelog, tcc supports __attribute__ since 0.9.3. Looking at history of tcc at <https://repo.or.cz/tinycc.git>, __attribute__ support was added in commit 14658993425878be300aae2e879560698e0c6c4c on 2002-01-03, which also looks like the release of 0.9.3. While I'm unable to find release tags for tcc before 0.9.18 (2003-04-14), the next release (0.9.28) will include __attribute__((cleanup(func)) which I rely on. Reviewed-by: Collin Funk <collin.funk1@gmail.com>	2025-10-30 20:03:00 -07:00
Collin Funk	3fe3f62833	Cleanup some recently added whitespace. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-10-30 18:56:58 -07:00
Yao Zihong	09a94c86ca	riscv: memcpy_noalignment: Reorder to store via a3, then bump a3 Rewrite the copy micro-step from: REG_L a4, 0(a5) addi a3, a3, SZREG addi a5, a5, SZREG REG_S a4, -SZREG(a3) to: REG_L a4, 0(a5) addi a5, a5, SZREG REG_S a4, 0(a3) addi a3, a3, SZREG Semantics are unchanged: both read (a5_old), write (a3_old), and then increment a3/a5 by SZREG. memcpy assumes non-overlapping regions, so the reordering preserves correctness. No functional change. Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn> Reviewed-by: Peter Bergner <bergner@tenstorrent.com>	2025-10-30 17:49:21 -05:00
Yao Zihong	0698fd462a	riscv: memcpy_noalignment: Fold SZREG/BLOCK_SIZE alignment to single andi Simplify the alignment steps for SZREG and BLOCK_SIZE multiples. The previous three-instruction sequences addi a7, a2, -SZREG andi a7, a7, -SZREG addi a7, a7, SZREG and addi a7, a2, -BLOCK_SIZE andi a7, a7, -BLOCK_SIZE addi a7, a7, BLOCK_SIZE are equivalent to a single andi a7, a2, -SZREG andi a7, a2, -BLOCK_SIZE because SZREG and BLOCK_SIZE are powers of two in this context, making the surrounding addi steps cancel out. Folding to one instruction reduces code size with identical semantics. No functional change. sysdeps/riscv/multiarch/memcpy_noalignment.S: Remove redundant addi around alignment; keep a single andi for SZREG/BLOCK_SIZE rounding. Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn> Reviewed-by: Peter Bergner <bergner@tenstorrent.com>	2025-10-30 17:47:24 -05:00
Yao Zihong	444d81284e	riscv: memcpy_noalignment: Make register allocation Zca-friendly Tidy the temporary register allocation to favor registers eligible for compressed encodings when Zca/Zcb are enabled. This keeps the ABI and clobber set unchanged and does not alter control flow or memory access behavior. No functional change. sysdeps/riscv/multiarch/memcpy_noalignment.S: Reassign temps to improve compressed encoding opportunities. Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn> Reviewed-by: Peter Bergner <bergner@tenstorrent.com>	2025-10-30 17:44:58 -05:00
Adhemerval Zanella	ee946212fe	math: Remove the SVID error handling wrapper from yn/jn Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:35 -03:00
Adhemerval Zanella	8d4815e6d7	math: Remove the SVID error handling wrapper from y1/j1 Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:33 -03:00
Adhemerval Zanella	b050cb53b0	math: Remove the SVID error handling wrapper from y0/j0 Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:31 -03:00
Adhemerval Zanella	03eeeba705	math: Remove the SVID error handling from coshf It improves latency by about 3-10% and throughput by about 5-15%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:28 -03:00
Adhemerval Zanella	555c39c0fc	math: Remove the SVID error handling from atanhf It improves latency by about 1-10% and throughput by about 5-10%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:26 -03:00
Adhemerval Zanella	8facb464b4	math: Remove the SVID error handling from acoshf It improves latency by about 3-7% and throughput by about 5-10%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:24 -03:00
Adhemerval Zanella	f92aba68bc	math: Remove the SVID error handling from asinf It improves latency by about 2% and throughput by about 5%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:22 -03:00
Adhemerval Zanella	9f8dea5b5d	math: Remove the SVID error handling from acosf It improves latency by about 2-10% and throughput by about 5-10%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:20 -03:00
Adhemerval Zanella	0b484d7b77	math: Remove the SVID error handling from log10f It improves latency by about 3-10% and throughput by about 5-10%. Tested on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:17 -03:00
Adhemerval Zanella	6deadd4eb6	m68k: Remove SVID error handling on fmod The m68k provided an optimized version through __m81_u(fmod) (mathimpl.h), and gcc does not implement it through a builtin (different than i386). Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:15 -03:00
Adhemerval Zanella	b19904cfb2	m68k: Avoid include e_fmod.c on fmod/remainder implementation And open-code each implementation. It simplifies SVID error handling removal. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:12 -03:00
Adhemerval Zanella	ade9f30ce2	m68k: Remove the SVID error handling from fmodf The m68k provided an optimized version through __m81_u(fmodf) (mathimpl.h), and gcc does not implement it through a builtin (different than i386). Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:10 -03:00
Adhemerval Zanella	1dd2163e51	i386: Remove the SVID error handling from fmodf The optimized i386 version is faster than the generic one, and gcc implements it through the builtin. It allows us to move the implementation to a C one. The performance on a Zen3 chip is slightly better: reciprocal-throughput input master no-SVID improvement i686 subnormals 22.4741 20.1571 10.31% i686 normal 74.1631 70.3606 5.13% i686 close-exponent 22.5625 20.2435 10.28% Tested on i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:41:07 -03:00
Adhemerval Zanella	bfee89dc8a	i386: Remove the SVID error handling from fmod The optimized i386 version is faster than the generic one, and gcc implements it through the builtin. It allows us to move the implementation to a C one. The performance on a Zen3 chip is similar to the SVID one. Tested on i686-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2025-10-30 15:40:41 -03:00
Jiamei Xie	4d86b6cdd8	x86: fix wmemset ifunc stray '!' (bug 33542) The ifunc selector for wmemset had a stray '!' in the X86_ISA_CPU_FEATURES_ARCH_P(...) check: if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2) && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, AVX_Fast_Unaligned_Load, !)) This effectively negated the predicate and caused the AVX2/AVX512 paths to be skipped, making the dispatcher fall back to the SSE2 implementation even on CPUs where AVX2/AVX512 are available. The regression leads to noticeable throughput loss for wmemset. Remove the stray '!' so the AVX_Fast_Unaligned_Load capability is tested as intended and the correct AVX2/EVEX variants are selected. Impact: - On AVX2/AVX512-capable x86_64, wmemset no longer incorrectly falls back to SSE2; perf now shows __wmemset_evex/avx2 variants. Testing: - benchtests/bench-wmemset shows improved bandwidth across sizes. - perf confirm the selected symbol is no longer SSE2. Signed-off-by: xiejiamei <xiejiamei@hygon.com> Signed-off-by: Li jing <lijing@hygon.cn> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-10-29 12:54:14 -03:00
Jiayuan Chen	1177d2f26c	Updates struct tcp_zerocopy_receive from 5.11 to netinet/tcp.h. This patch updates struct tcp_zerocopy_receive to contain fields including copybuf_address, copybuf_len, and others. Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2025-10-29 12:54:12 -03:00
Adhemerval Zanella	8711c29bb7	aarch64: Fix tst-ifunc-arg-4 on clang-18 It issues: ../sysdeps/aarch64/tst-ifunc-arg-4.c:39:1: error: unused function 'resolver' [-Werror,-Wunused-function] 39 \| resolver (uint64_t arg0, const uint64_t arg1[]) \| ^~~~~~~~ 1 error generated. clang-19 and onwards do not trigger the warning. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-10-29 12:54:10 -03:00
Adhemerval Zanella	d49d917b90	Enable --no-undefined-version by default Recent lld versions default to --no-undefined-version, which triggers errors when building multiple libraries. For ld.so on x86_64 it fails with: ld.lld: error: version script assignment of 'GLIBC_2.4' to symbol '__stack_chk_guard' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_PRIVATE' to symbol '__nptl_set_robust_list_avail' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_PRIVATE' to symbol '__pointer_chk_guard' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_PRIVATE' to symbol '_dl_starting_up' failed: symbol not defined While for libc.so: ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_clearerr' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_fgetc' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_fileno' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_freopen' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_fscanf' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_fseek' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_peekc_unlocked' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_stderr_' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_stdin_' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_stdout_' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_pclose' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_perror' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_rewind' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_scanf' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_setbuf' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_setlinebuf' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_wdefault_setbuf' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '_IO_wfile_setbuf' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '__ctype32_tolower' failed: symbol not defined ld.lld: error: version script assignment of 'GLIBC_2.17' to symbol '__ctype32_toupper' failed: symbol not defined ld.lld: error: too many errors emitted, stopping now (use --error-limit=0 to see all errors) The version script is created with multiple missing symbols to simplify the build for multiple ABIs, each of which may have different symbols. For instance, __stack_chk_guard is defined by default. This avoids requiring each ABI to add this symbol to its version script, depending on the stack protector ABI it uses. The libc.so warnings do show unused symbols being defined (like _IO_clearerr), which might trigger potential errors depending on how symbols are exported. However, since we already have ABI checks for missing and extra symbols, the linker's extra checks are not really necessary. The --no-undefined-version is the default for ld.bfd. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-10-29 12:54:06 -03:00
Adhemerval Zanella	1ab6a62e68	Suppress unused command arguments warning with clang clang 20 issues a warning for the unused '-c' argument used to create errlist-data-aux-shared.S, errlist-data-aux.S, siglist-aux-shared.S, and siglist-aux.S. Filter out the '-c' from the $(compile-command.c). Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-10-29 12:54:03 -03:00
Adhemerval Zanella	970364dac0	Annotate switch fall-through clang defaults to warning about missing fall-through and does not support all the comment-like annotations that gcc does. Use the C23 [[fallthrough]] attribute instead. Reviewed-by: Collin Funk <collin.funk1@gmail.com>	2025-10-29 12:54:01 -03:00
Adhemerval Zanella	543ddd628f	argp: Move attribute_hidden to argp-fmtstream.h The internal header redefines some internal argp functions with attribute_hidden, which triggers a clang warning about mismatched attributes. Reviewed-by: Collin Funk <collin.funk1@gmail.com>	2025-10-29 12:54:00 -03:00
Adhemerval Zanella	110ec4954e	argp: Expand argp_usage, _option_is_short, and _option_is_end The argp code uses macro redefinitions to avoid duplicating static inline implementations for argp_usage, _option_is_short, and _option_is_end. However, this causes build issues with clang, as some function prototypes are redefined to add the hidden attribute with libc_hidden_proto. To avoid extensive changes to internal headers, just expand the function implementations and avoid the macro redefine tricks. Reviewed-by: Collin Funk <collin.funk1@gmail.com>	2025-10-29 12:53:57 -03:00
Adhemerval Zanella	36b4c553e6	Replace count_leading_zeros with stdc_leading_zeros Checked on x86_64-linux-gnu and aarch64-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Collin Funk <collin.funk1@gmail.com>	2025-10-29 12:53:55 -03:00
Adhemerval Zanella	f91abbde02	malloc: Remove unused tcache_set_inactive clang warns that this function is not used. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2025-10-29 12:53:53 -03:00
Adhemerval Zanella	602fdf5d69	include: Sync gnulib intprops It syncs with gnulib commit 1790ef25d81983d1d25a77d452c0080345df459b. The main change is to proper support clang by using builtins. It fixes a sprof build issue, where previous version uses the generic code path when building with clang: sprof.c:682:8: error: result of comparison of constant 288230376151711743 with expression of type 'Elf64_Half' (aka 'unsigned short') is always false [-Werror,-Wtautological-constant-out-of-range-compare] 682 \| if (INT_MULTIPLY_WRAPV (ehdr2.e_shnum, sizeof (ElfW(Shdr)), &size)) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../include/intprops.h:415:34: note: expanded from macro 'INT_MULTIPLY_WRAPV' 415 \| _GL_INT_OP_WRAPV (a, b, r, *, _GL_INT_MULTIPLY_RANGE_OVERFLOW) \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../include/intprops.h:504:45: note: expanded from macro '_GL_INT_OP_WRAPV' 504 \| : _GL_INT_OP_WRAPV_LONGISH(a, b, r, op, overflow)) \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~ ../include/intprops.h:511:41: note: expanded from macro '_GL_INT_OP_WRAPV_LONGISH' 511 \| : _GL_INT_OP_CALC (a, b, r, op, overflow, unsigned long int, \ \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 512 \| unsigned long int, 0, ULONG_MAX)) \ \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../include/intprops.h:533:4: note: expanded from macro '_GL_INT_OP_CALC' 533 \| (overflow (a, b, tmin, tmax) \ \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~ ../include/intprops.h:608:22: note: expanded from macro '_GL_INT_MULTIPLY_RANGE_OVERFLOW' 608 \| : (tmax) / (b) < (a))) \| ~~~~~~~~~~~~ ^ ~~~ 1 error generated. Reviewed-by: Collin Funk <collin.funk1@gmail.com>	2025-10-29 12:53:50 -03:00
Adhemerval Zanella	5ee722d3ac	i386: Build s_erf_common.c with -fexcess-precision=standard It is required to provide correctly rounded results. Checked on i686-linux-gnu.	2025-10-29 10:17:34 -03:00
H.J. Lu	14243c9db6	Build programs in $(others-noinstall) like tests Programs in $(others-noinstall) are internal to glibc build and they aren't installed. They should be treated like programs in $(others), but linked like tests so that --enable-hardcoded-path-in-tests also applies to them. Also replace run-via-rtld-prefix with test-via-rtld-prefix when running container tests. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: DJ Delorie <dj@redhat.com>	2025-10-29 12:04:40 +08:00
Osama Abdelkader	96073e9f34	Fix incorrect setrlimit return value checks in tests The setrlimit(2) function returns 0 on success and -1 on error, but several test files were incorrectly checking for a return value of 1 to detect errors. This means the error checks would never trigger, causing tests to continue silently even when setrlimit() failed. This commit fixes the error checks in five files to correctly test for -1, matching both the documented behavior and the pattern used correctly in other parts of the codebase. Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com> Reviewed-by: Collin Funk <collin.funk1@gmail.com>	2025-10-28 18:51:51 -07:00
Joseph Myers	096fcdc0a5	Rename uimaxabs to umaxabs (bug 33325) The C2y function uimaxabs has been renamed to umaxabs. Implement this change in glibc, keeping a compat symbol under the old name, copying the test to test the new name and changing the old test to test the compat symbol. Jakub has done the corresponding change to the built-in function in GCC. Tested for x86_64 and x86.	2025-10-28 12:15:02 +00:00
Adhemerval Zanella	013f5167b9	math: Consolidate CORE-MATH double-double routines For lgamma and tgamma the muldd, mulddd, and polydd are renamed to muldd2, mulddd2, and polydd2 respectively. Checked on aarch64-linux-gnu and x86_64-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-10-27 09:46:04 -03:00
Adhemerval Zanella	e4d812c980	math: Consolidate erf/erfc definitions The common code definitions are consolidated in s_erf_common.h and s_erf_common.c. Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-10-27 09:46:01 -03:00
Adhemerval Zanella	fc419290f9	math: Consolidate internal erf/erfc tables The shared internal data definitions are consolidated in s_erf_data.c and the erfc-only ones are moved to s_erfc_data.c. Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-10-27 09:34:04 -03:00
Adhemerval Zanella	acaad9ab06	math: Use erfc from CORE-MATH The current implementation precision shows the following accuracy, on three ranges ([-DBL_MAX,-5], [-5,5], [5,DBL_MAX]) with 10e9 uniform randomly generated numbers for each range (first column is the accuracy in ULP, with '0' being correctly rounded, second is the number of samples with the corresponding precision): * Range [-DBL_MAX, -5] * FE_TONEAREST 0: 10000000000 100.00% * FE_UPWARD 0: 10000000000 100.00% * FE_DOWNWARD 0: 10000000000 100.00% * FE_TOWARDZERO 0: 10000000000 100.00% * Range [-5, 5] * FE_TONEAREST 0: 8069309665 80.69% 1: 1882910247 18.83% 2: 47485296 0.47% 3: 293749 0.00% 4: 1043 0.00% * FE_UPWARD 0: 5540301026 55.40% 1: 2026739127 20.27% 2: 1774882486 17.75% 3: 567324466 5.67% 4: 86913847 0.87% 5: 3820789 0.04% 6: 18259 0.00% * FE_DOWNWARD 0: 5520969586 55.21% 1: 2057293099 20.57% 2: 1778334818 17.78% 3: 557521494 5.58% 4: 82473927 0.82% 5: 3393276 0.03% 6: 13800 0.00% * FE_TOWARDZERO 0: 6220287175 62.20% 1: 2323846149 23.24% 2: 1251999920 12.52% 3: 190748245 1.91% 4: 12996232 0.13% 5: 122279 0.00% * Range [5, DBL_MAX] * FE_TONEAREST 0: 10000000000 100.00% * FE_UPWARD 0: 10000000000 100.00% * FE_DOWNWARD 0: 10000000000 100.00% * FE_TOWARDZERO 0: 10000000000 100.00% The CORE-MATH implementation is correctly rounded for any rounding mode. The code was adapted to glibc style and to use the definition of math_config.h (to handle errno, overflow, and underflow). Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1, gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows: reciprocal-throughput master patched improvement x86_64 49.0980 267.0660 -443.94% x86_64v2 49.3220 257.6310 -422.34% x86_64v3 42.9539 84.9571 -97.79% aarch64 28.7266 52.9096 -84.18% power10 14.1673 25.1273 -77.36% Latency master patched improvement x86_64 95.6640 269.7060 -181.93% x86_64v2 95.8296 260.4860 -171.82% x86_64v3 91.1658 112.7150 -23.64% aarch64 37.0745 58.6791 -58.27% power10 23.3197 31.5737 -35.39% Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-10-27 09:34:04 -03:00
Adhemerval Zanella	72a48e45bd	math: Use erf from CORE-MATH The current implementation precision shows the following accuracy, on three ranges ([-DBL_MIN, -4.2], [-4.2, 4.2], [4.2, DBL_MAX]) with 10e9 uniform randomly generated numbers for each range (first column is the accuracy in ULP, with '0' being correctly rounded, second is the number of samples with the corresponding precision): * Range [-DBL_MIN, -4.2] * FE_TONEAREST 0: 10000000000 100.00% * FE_UPWARD 0: 10000000000 100.00% * FE_DOWNWARD 0: 10000000000 100.00% * FE_TOWARDZERO 0: 10000000000 100.00% * Range [-4.2, 4.2] * FE_TONEAREST 0: 9764404513 97.64% 1: 235595487 2.36% * FE_UPWARD 0: 9468013928 94.68% 1: 531986072 5.32% * FE_DOWNWARD 0: 9493787693 94.94% 1: 506212307 5.06% * FE_TOWARDZERO 0: 9585271351 95.85% 1: 414728649 4.15% * Range [4.2, DBL_MAX] * FE_TONEAREST 0: 10000000000 100.00% * FE_UPWARD 0: 10000000000 100.00% * FE_DOWNWARD 0: 10000000000 100.00% * FE_TOWARDZERO 0: 10000000000 100.00% The CORE-MATH implementation is correctly rounded for any rounding mode. The code was adapted to glibc style and to use the definition of math_config.h (to handle errno, overflow, and underflow). Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1, gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows: reciprocal-throughput master patched improvement x86_64 38.2754 78.0311 -103.87% x86_64v2 38.3325 75.7555 -97.63% x86_64v3 34.6604 28.3182 18.30% aarch64 23.1499 21.4307 7.43% power10 12.3051 9.3766 23.80% Latency master patched improvement x86_64 84.3062 121.3580 -43.95% x86_64v2 84.1817 117.4250 -39.49% x86_64v3 81.0933 70.6458 12.88% aarch64 35.012 29.5012 15.74% power10 21.7205 18.4589 15.02% For x86_64/x86_64-v2, most of the performance hit came from the fma call through the ifunc mechanism. Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-10-27 09:34:04 -03:00
Adhemerval Zanella	1cae0550e8	math: Use tgamma from CORE-MATH The current implementation precision shows the following accuracy, on one range ([-20,20]) with 10e9 uniform randomly generated numbers for each range (first column is the accuracy in ULP, with '0' being correctly rounded, second is the number of samples with the corresponding precision): * Range [-20,20] * FE_TONEAREST 0: 4504877808 45.05% 1: 4402224940 44.02% 2: 947652295 9.48% 3: 131076831 1.31% 4: 13222216 0.13% 5: 910045 0.01% 6: 35253 0.00% 7: 606 0.00% 8: 6 0.00% * FE_UPWARD 0: 3477307921 34.77% 1: 4838637866 48.39% 2: 1413942684 14.14% 3: 240762564 2.41% 4: 27113094 0.27% 5: 2130934 0.02% 6: 102599 0.00% 7: 2324 0.00% 8: 14 0.00% * FE_DOWNWARD 0: 3923545410 39.24% 1: 4745067290 47.45% 2: 1137899814 11.38% 3: 171596912 1.72% 4: 20013805 0.20% 5: 1773899 0.02% 6: 99911 0.00% 7: 2928 0.00% 8: 31 0.00% * FE_TOWARDZERO 0: 3697160741 36.97% 1: 4731951491 47.32% 2: 1303092738 13.03% 3: 231969191 2.32% 4: 32344517 0.32% 5: 3283092 0.03% 6: 193010 0.00% 7: 5175 0.00% 8: 45 0.00% The CORE-MATH implementation is correctly rounded for any rounding mode. The code was adapted to glibc style and to use the definition of math_config.h (to handle errno, overflow, and underflow). Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1, gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows: reciprocal-throughput master patched improvement x86_64 237.7960 175.4090 26.24% x86_64v2 232.9320 163.4460 29.83% x86_64v3 193.0680 89.7721 53.50% aarch64 113.6340 56.7350 50.07% power10 92.0617 26.6137 71.09% Latency master patched improvement x86_64 266.7190 208.0130 22.01% x86_64v2 263.6070 200.0280 24.12% x86_64v3 214.0260 146.5180 31.54% aarch64 114.4760 58.5235 48.88% power10 84.3718 35.7473 57.63% Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-10-27 09:34:04 -03:00
Adhemerval Zanella	d67d2f4688	math: Use lgamma from CORE-MATH The current implementation precision shows the following accuracy, on two ranges ([-20, 20], [20, 0x5.d53649e2d4674p+1012]) with 10e9 uniform randomly generated numbers for each range (first column is the accuracy in ULP, with '0' being correctly rounded, second is the number of samples with the corresponding precision): * Range [-20, 20] * FE_TONEAREST 0: 6701254075 67.01% 1: 3230897408 32.31% 2: 63986940 0.64% 3: 3605417 0.04% 4: 233189 0.00% 5: 20973 0.00% 6: 1869 0.00% 7: 125 0.00% 8: 4 0.00% * FE_UPWARD 0: 4207428861 42.07% 1: 5001137116 50.01% 2: 740542213 7.41% 3: 49116304 0.49% 4: 1715617 0.02% 5: 54464 0.00% 6: 4956 0.00% 7: 451 0.00% 8: 16 0.00% 9: 2 0.00% * FE_DOWNWARD 0: 4155925193 41.56% 1: 4989821364 49.90% 2: 770312796 7.70% 3: 72014726 0.72% 4: 11040522 0.11% 5: 872811 0.01% 6: 12480 0.00% 7: 106 0.00% 8: 2 0.00% * FE_TOWARDZERO 0: 4225861532 42.26% 1: 5027051105 50.27% 2: 706443411 7.06% 3: 39877908 0.40% 4: 713109 0.01% 5: 47513 0.00% 6: 4961 0.00% 7: 438 0.00% 8: 23 0.00% * Range [20, 0x5.d53649e2d4674p+1012] * FE_TONEAREST 0: 7262241995 72.62% 1: 2737758005 27.38% * FE_UPWARD 0: 4690392401 46.90% 1: 5143728216 51.44% 2: 165879383 1.66% * FE_DOWNWARD 0: 4690333331 46.90% 1: 5143794937 51.44% 2: 165871732 1.66% * FE_TOWARDZERO 0: 4690343071 46.90% 1: 5143786761 51.44% 2: 165870168 1.66% The CORE-MATH implementation is correctly rounded for any rounding mode. The code was adapted to glibc style and to use the definition of math_config.h (to handle errno, overflow, and underflow). Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1, gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1) shows: reciprocal-throughput master patched improvement x86_64 112.9740 135.8640 -20.26% x86_64v2 111.8910 131.7590 -17.76% x86_64v3 108.2800 68.0935 37.11% aarch64 61.3759 49.2403 19.77% power10 42.4483 24.1943 43.00% Latency master patched improvement x86_64 144.0090 167.9750 -16.64% x86_64v2 139.2690 167.1900 -20.05% x86_64v3 130.1320 96.9347 25.51% aarch64 66.8538 53.2747 20.31% power10 49.5076 29.6917 40.03% For x86_64/x86_64-v2, most of the performance hit came from the fma call through the ifunc mechanism. Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-10-27 09:34:04 -03:00
Adhemerval Zanella	140e802cb3	math: Move atanh internal data to separate file The internal data definitions are moved to s_atanh_data.c. It helps on ABIs that build the implementation multiple times for ifunc optimizations, like x86_64. Reviewed-by: DJ Delorie <dj@redhat.com>	2025-10-27 09:34:04 -03:00
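
Commit 0698fd462a above argues that the three-instruction alignment sequence collapses to a single andi when the step is a power of two. The following is a minimal C sketch of that arithmetic, not the glibc assembly itself: the function names are hypothetical, and the step values used when calling it (for example 8 and 64) are illustrative stand-ins for SZREG and BLOCK_SIZE, whose actual values are not stated in the log.

```c
#include <assert.h>
#include <stdint.h>

/* Three-instruction form quoted in the commit message:
     addi a7, a2, -step
     andi a7, a7, -step
     addi a7, a7, step
   Unsigned arithmetic wraps modulo 2^64, mirroring register behaviour.  */
static uint64_t
round_down_three_step (uint64_t len, uint64_t step)
{
  uint64_t t = len - step;   /* addi a7, a2, -step */
  t &= -step;                /* andi a7, a7, -step */
  return t + step;           /* addi a7, a7, step  */
}

/* Folded form: a single  andi a7, a2, -step.  */
static uint64_t
round_down_single (uint64_t len, uint64_t step)
{
  return len & -step;
}

/* Returns 1 if both forms agree for every len in [0, limit), else 0.
   step must be a nonzero power of two; otherwise -step is not a mask.  */
static int
forms_agree (uint64_t step, uint64_t limit)
{
  assert (step != 0 && (step & (step - 1)) == 0);
  for (uint64_t len = 0; len < limit; len++)
    if (round_down_three_step (len, step) != round_down_single (len, step))
      return 0;
  return 1;
}
```

Subtracting a multiple of the step never changes the low bits that the mask clears, so the leading subtraction and trailing addition cancel; calling `forms_agree (8, 4096)` exercises that claim over a small range, including lengths smaller than the step where the subtraction wraps.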

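Commit 96073e9f34 above notes that setrlimit returns 0 on success and -1 on error, so a check against 1 can never fire. As a small sketch of the corrected pattern (the helper names are hypothetical and this is not code from the glibc test suite), the function below re-applies the current RLIMIT_NOFILE limits, an operation that is expected to succeed for an unprivileged process, and tests the result against -1.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Re-apply the current RLIMIT_NOFILE limits unchanged.
   Returns 0 on success and -1 on failure.  */
static int
reapply_nofile_limit (void)
{
  struct rlimit rl;

  if (getrlimit (RLIMIT_NOFILE, &rl) == -1)
    {
      perror ("getrlimit");
      return -1;
    }

  /* setrlimit returns 0 on success and -1 on error; comparing the
     result against 1 would silently ignore every failure.  */
  if (setrlimit (RLIMIT_NOFILE, &rl) == -1)
    {
      perror ("setrlimit");
      return -1;
    }

  return 0;
}

/* Example caller: stop immediately if the limit cannot be applied.  */
static void
require_nofile_limit (void)
{
  int rc = reapply_nofile_limit ();
  assert (rc == 0);
  (void) rc;   /* Avoid an unused-variable warning under NDEBUG.  */
}
```

A test would typically call `require_nofile_limit ()` (or check the return value itself) before relying on the limit being in effect.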
43063 Commits
