It adds the new fields on generic statx struct from Linux commit
5d894321c49e61379189b0ff605f316e39cbd1e9.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
It adds the new constant STATX_DIO_READ_ALIGN and related fields in
generic statx struct from Linux commit
7ed6cbe0f8caa6ee38a2dc8f1b925acb904cc01f.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The commit fc650bfd71 added
STATX_WRITE_ATOMIC/STATX_ATTR_WRITE_ATOMIC on the statx-generic.h
without updating the generic statx struct.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Linux 6.16 adds no new syscalls, while Linux 6.17 adds file_getattr
and file_setattr (commit be7efb2d20d67f334a7de2aef77ae6c69367e646).
Update syscall-names.list and regenerate the arch-syscall.h headers
with build-many-glibcs.py update-syscalls.
The pidfd interface was extended with:
* PIDFD_GET_INFO and pidfd_info (along with related extra flags) to
allow get information about the process without the need to parse
/proc (commit cdda1f26e74ba, Linux 6.13).
* PIDFD_SELF_{THREAD,THREAD_GROUP,SELF,SELF_PROCESS} to allow
pidfd_send_signal refer to the own process or thread lead groups
without the need of allocating a file descriptor (commit f08d0c3a71114,
Linux 6.15).
* PIDFD_INFO_COREDUMP that extends PIDFD_GET_INFO to obtain coredump
information.
Linux uAPI header defines both PIDFD_SELF_THREAD and
PIDFD_SELF_THREAD_GROUP on linux/fcntl.h (since they reserve part of the
AT_* values), however for glibc I do not see any good reason to add pidfd
definitions on fcntl-linux.h.
The tst-pidfd.c is extended with some PIDFD_SELF_* tests and a new
‘tst-pidfd_getinfo.c’ test is added to check PIDFD_GET_INFO. The
PIDFD_INFO_COREDUMP tests would require very large and complex tests
that are already covered by kernel tests.
Checked on aarch64-linux-gnu and x86_64-linux-gnu on kernels 6.8 and
6.17.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
It improves latency for about 3-6% and throughput for about 5-12%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
* include/features.h (_POSIX_C_SOURCE): Document the value of 202405L
for POSIX.1-2024. Set it to 202405L when _GNU_SOURCE or _DEFAULT_SOURCE
is defined.
(_XOPEN_SOURCE): Document the value of 800 for POSIX-1.2024. Set it to
800 when _GNU_SOURCE is defined.
(__USE_XOPEN2K24, __USE_XOPEN2K24XSI): New internal macros. Set them
when _POSIX_C_SOURCE is 202405L or greater and/or when _XOPEN_SOURCE is
800 or greater.
* manual/creature.texi (Feature Test Macros): Document the new values
for _POSIX_C_SOURCE and _XOPEN_SOURCE.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Collin Funk <collin.funk1@gmail.com>
As discussed in bug 28327, the fromfp functions changed type in C23
(compared to the version in TS 18661-1); they now return the same type
as the floating-point argument, instead of intmax_t / uintmax_t.
As with other such incompatible changes compared to the initial TS
18661 versions of interfaces (the types of totalorder functions, in
particular), it seems appropriate to support only the new version as
an API, not the old one (although many programs written for the old
API might in fact work wtih the new one as well). Thus, the existing
implementations should become compat symbols. They are sufficiently
different from how I'd expect to implement the new version that using
separate implementations in separate files is more convenient than
trying to share code, and directly sharing testcases would be
problematic as well.
Rename the existing fromfp implementation and test files to names
reflecting how they're intended to become compat symbols, so freeing
up the existing filenames for a subsequent implementation of the C23
versions of these functions (which is the point at which the existing
implementations would actually become compat symbols).
gen-fromfp-tests.py and gen-fromfp-tests-inputs are not renamed; I
think it will make sense to adapt the test generator to be able to
generate most tests for both versions of the functions (with extra
test inputs added that are only of interest with the C23 version).
The ldbl-opt/nldbl-* files are also not renamed; since those are for a
static only library, no compat versions are needed, and they'll just
have their contents changed when the C23 version is implemented.
Tested for x86_64, and with build-many-glibcs.py.
C23 Annex H adds <math.h> typedefs long_double_t and _FloatN_t
(originally introduced in TS 18661-3), analogous to float_t and
double_t. Add these typedefs to glibc. (There are no _FloatNx_t
typedefs.)
C23 also slightly changes the rules for how such typedef names should
be defined, compared to the definition in TS 18661-3. In both cases,
<TYPE>_t corresponds to the evaluation format for <TYPE>, as specified
by FLT_EVAL_METHOD (for which <math.h> uses glibc's internal
__GLIBC_FLT_EVAL_METHOD). Specifically, each FLT_EVAL_METHOD value
corresponds to some type U (for example, 64 corresponds to U =
_Float64), and for types with exactly the same set of values as U, TS
18661-3 says expressions with those types are to be evaluated to the
range and precision of type U (so <TYPE>_t is defined to U), whereas
C23 only does that for types whose values are a strict subset of those
of type U (so <TYPE>_t is defined to <TYPE>).
As with other cases where semantics changed between TS 18661 and C23,
this patch only implements the newer version of the semantics
(including adjusting existing definitions of float_t and double_t as
needed). The new semantics are contradictory between the main
standard and Annex H for the case of FLT_EVAL_METHOD == 2 and the
choice of double_t when double and long double have the same values
(the main standard says it's defined as long double in that case,
whereas Annex H would define it as double), which I've raised on the
WG14 reflector (but I think setting FLT_EVAL_METHOD == 2 when double
and long double have the same values is a fairly theoretical
combination of features); for now glibc follows the value in the main
standard in that case.
Note that I think all existing GCC targets supported by glibc only use
values -1, 0, 1, 2 or 16 for FLT_EVAL_METHOD (so most of the header
code is somewhat theoretical, though potentially relevant with other
compilers since the choice of FLT_EVAL_METHOD is only an API choice,
not an ABI one; it can vary with compiler options, and these typedefs
should not be used in ABIs). The testcase (expanded to cover the new
typedefs) is really just repeating the same logic in a second place
(so all it really tests is that __GLIBC_FLT_EVAL_METHOD is consistent
with FLT_EVAL_METHOD).
Tested for x86_64 and x86, and with build-many-glibcs.py.
The Linux kernel ABI specifies that the vector registers are not preserved
across system calls, but the __SYSCALL_CLOBBERS macro doesn't mention them.
This could possibly lead to compilers trying to keep data in the vector
registers across the syscall leading to corruption. Add the vector registers
to __SYSCALL_CLOBBERS when the vector extension is enabled. If the vector
extension is enabled, then require GCC 15 or later and RVV 1.0 or later.
Fixes: 36960f0c76 ("RISC-V: Linux Syscall Interface")
Signed-off-by: Peter Bergner <bergner@tenstorrent.com>
In commit 970364dac0 we switched some
/*FALLTHROUGH*/ comments to [[fallthrough]] to avoid warnings with
Clang. However, since gperf emitted different output the buildbot
failed. The buildbot has been updated to use gperf 3.3 which will use
__attribute__ ((__fallthrough__)) where needed to avoid warnings [1].
This patch regenerates these files with the same version.
[1] https://sourceware.org/pipermail/libc-testresults/2025q4/014123.html
Reviewed-by: Mark Wielaard <mark@klomp.org>
i386 and m68k architectures should use math-use-builtins-sqrt.h rather
than relying on architecture-specific or inline assembly implementations.
The PowerPC optimization for PPC 601/603 (30 years old) is removed.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
It improves latency for about 3-10% and throughput for about 5-15%.
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The optimized i386 version is faster than the generic one, and
gcc implements it through the builtin. This optimization enables
us to migrate the implementation to a C version. The performance
on a Zen3 chip is similar to the SVID one.
The m68k provided an optimized version through __m81_u(remainderf)
(mathimpl.h), and gcc does not implement it through a builtin
(different than i386).
Performance improves a bit on x86_64 (Zen3, gcc 15.2.1):
reciprocal-throughput input master NO-SVID improvement
x86_64 subnormals 18.8522 16.2506 13.80%
x86_64 normal 421.8260 403.9270 4.24%
x86_64 close-exponent 21.0579 18.7642 10.89%
i686 subnormals 21.3443 21.4229 -0.37%
i686 normal 525.8380 538.807 -2.47%
i686 close-exponent 21.6589 21.7983 -0.64%
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The optimized i386 version is faster than the generic one, and gcc
implements it through the builtin. This optimization enables us to
migrate the implementation to a C version. The performance on a Zen3
chip is similar to the SVID one.
The m68k provided an optimized version through __m81_u(remainderf)
(mathimpl.h), and gcc does not implement it through a builtin (different
than i386).
Performance improves a bit on x86_64 (Zen3, gcc 15.2.1):
reciprocal-throughput input master NO-SVID improvement
x86_64 subnormals 17.5349 15.6125 10.96%
x86_64 normal 53.8134 52.5754 2.30%
x86_64 close-exponent 20.0211 18.6656 6.77%
i686 subnormals 21.8105 20.1856 7.45%
i686 normal 73.1945 71.2199 2.70%
i686 close-exponent 22.2141 20.331 8.48%
Tested on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The only usage was for pthread_spin_lock, introduced by 12d2dd7060,
as a way to optimize the code for certain architectures. Now that atomic
builtins are used by default, let the compiler use the best code sequence
for the atomic exchange.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Now that atomic builtins are used by default, we can rely on the
compiler to define when to use 64-bit atomic operations.
It allows the use of 64-bit atomic operations on some 32-bit ABIs where
they were not previously enabled due to missing pre-processor handling:
hppa, mips64n32, s390, and sparcv9.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
All ABIs, except alpha and sparc, define it to
atomic_full_barrier/__sync_synchronize, which can be mapped to
__atomic_thread_fence (__ATOMIC_RELEASE).
For alpha, it uses a 'wmb' which does not map to any of C11
barriers.
For sparc it uses a stronger 'member #LoadStore | #StoreStore',
where the release barrier maps to just 'membar #StoreLoad'. The
patch keeps the sparc definition.
For PowerPC, it allows the use of lwsync for additional chips
(since _ARCH_PWR4 does not cover all chips that support it).
Tested on aarch64-linux-gnu.
Co-authored-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
All ABIs, except alpha, powerpc, and x86_64, define it to
atomic_full_barrier/__sync_synchronize, which can be mapped to
__atomic_thread_fence (__ATOMIC_SEQ_CST) in most cases, with the
exception of aarch64 (where the acquire fence is generated as
'dmb ishld' instead of 'dmb ish').
For s390x, it defaults to a memory barrier where __sync_synchronize
emits a 'bcr 15,0' (which the manual describes as pipeline
synchronization).
For PowerPC, it allows the use of lwsync for additional chips
(since _ARCH_PWR4 does not cover all chips that support it).
Tested on aarch64-linux-gnu, where the acquire produces a different
instruction that the current code.
Co-authored-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
All ABIs save for sparcv9 and s390 defines it to __sync_synchronize,
which can be mapped to __atomic_thread_fence (__ATOMIC_SEQ_CST).
For Sparc, it uses a stricter #StoreStore|#LoadStore|#StoreLoad|#LoadLoad
instead of the #StoreLoad generated by __sync_synchronize.
For s390x, it defaults to a memory barrier where __sync_synchronize
emits a 'bcr 15,0' (which the manual describes as pipeline synchronization).
The barrier is used only in one place (pthread_mutex_setprioceiling),
and using a stricter barrier for s390 is ok performance-wise.
Co-authored-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The __HAVE_64B_ATOMICS can be define based on __WORDSIZE, and
the __ARCH_ACQ_INSTR, MUTEX_HINT_*, and barriers definition are
defined by the target cpu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
These are already provided by the generic include/atomic.h and
the resulting macros are not Linux specific.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Both m68k and m68k-colfire do not support 64 bit atomis. The
atomic_barrier syscall on m68k is a no-op, so it can use the compiler
builtin.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The libgcc provides the required support to calling the kernel
auxiliary routines for !__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
These are already provided by the generic include/atomic.h.
Reviewed-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
These are already provided by the generic include/atomic.h. Also
remove outdated comment from unsupported gcc versions.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Build programs in $(others-noinstall) like tests only if libgcc_s is
available. Otherwise, "build-many-glibcs.py compilers" will fail to
build the initial glibc with the initial limited gcc due to the missing
libgcc_s.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
C23 makes assert into a variadic macro to handle cases of an argument
that would be interpreted as a single function argument but more than
one macro argument (in particular, compound literals with an
unparenthesized comma in an initializer list); this change was made by
N2829. Note that this only applies to assert, not to other macros
specified in the C standard with particular numbers of arguments.
Implement this support in glibc. This change is only for C; C++ would
need a separate change to its separate assert implementations. It's
also applied only in C23 mode. It depends on support for (C99)
variadic macros, and also (in order to detect calls where more than
one expression is passed, via an unevaluated function call) a C99
boolean type. These requirements are encapsulated in the definition
of __ASSERT_VARIADIC. Tests with -std=c99 and -std=gnu99 (using
implementations continue to work.
I don't think we have a way in the glibc testsuite to validate that
passing more than one expression as an argument does produce the
desired error.
Tested for x86_64.
The Dynamic Linker chapter now includes a new section detailing
environment variables that influence its behavior.
This new section documents the `LD_DEBUG` environment variable,
explaining how to enable debugging output and listing its various
keywords like `libs`, `reloc`, `files`, `symbols`, `bindings`,
`versions`, `scopes`, `tls`, `all`, `statistics`, `unused`, and `help`.
It also documents `LD_DEBUG_OUTPUT`, which controls where the debug
output is written, allowing redirection to a file with the process ID
appended.
This provides users with essential information for controlling and
debugging the dynamic linker.
Reviewed-by: DJ Delorie <dj@redhat.com>
Introduce the `DL_DEBUG_TLS` debug mask to enable detailed logging for
Thread-Local Storage (TLS) and Thread Control Block (TCB) management.
This change integrates a new `tls` option into the `LD_DEBUG`
environment variable, allowing developers to trace:
- TCB allocation, deallocation, and reuse events in `dl-tls.c`,
`nptl/allocatestack.c`, and `nptl/nptl-stack.c`.
- Thread startup events, including the TID and TCB address, in
`nptl/pthread_create.c`.
A new test, `tst-dl-debug-tid`, has been added to validate the
functionality of this new debug logging, ensuring that relevant messages
are correctly generated for both main and worker threads.
This enhances the debugging capabilities for diagnosing issues related
to TLS allocation and thread lifecycle within the dynamic linker.
Reviewed-by: DJ Delorie <dj@redhat.com>
Introduce a RISC-V specific string-misc.h to provide an optimized
repeat_bytes implementation when the Zbkb extension is available.
The new version uses packh/packw/pack instruction count and avoiding
high latency instructions. This helper is used by several mem and string
functions, and falls back to the generic implementation when Zbkb is not
present.
Signed-off-by: Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn>
Reviewed-by: Peter Bergner <bergner@tenstorrent.com>
Remove xfail from pow testcase since pow and powf have been fixed.
Also check float128 maximum value. See BZ #33563.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Fix pow (DBL_MAX, 1.0) to return DBL_MAX when rouding upwards without FMA.
This fixes BZ #33563.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Fix powf (0x1.fffffep+127, 1.0f) to return 0x1.fffffep+127 when
rouding upwards. Cleanup the special case code - performance
improves by ~1.2%. This fixes BZ #33563.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>