Commit Graph

42475 Commits

Author SHA1 Message Date
Carlos O'Donell 15808c77b3 ppc64le: Revert "powerpc: Optimized strcmp for power10" (CVE-2025-5702)
This reverts commit 3367d8e180

Reason for revert: Power10 strcmp clobbers non-volatile vector
registers (Bug 33056)

Tested on ppc64le without regression.
2025-06-16 18:02:58 -04:00
Carlos O'Donell a7877bb668 ppc64le: Revert "powerpc : Add optimized memchr for POWER10" (Bug 33059)
This reverts commit b9182c793c

Reason for revert: Power10 memchr clobbers v20 vector register
(Bug 33059)

This is not a security issue, unlike CVE-2025-5745 and
CVE-2025-5702.

Tested on ppc64le without regression.
2025-06-16 18:02:58 -04:00
Carlos O'Donell c22de63588 ppc64le: Revert "powerpc: Fix performance issues of strcmp power10" (CVE-2025-5702)
This reverts commit 90bcc8721e

This change is in the chain of the final revert that fixes the CVE
i.e. 3367d8e180

Reason for revert: Power10 strcmp clobbers non-volatile vector
registers (Bug 33056)

Tested on ppc64le with no regressions.
2025-06-16 18:02:58 -04:00
Carlos O'Donell 63c60101ce ppc64le: Revert "powerpc: Optimized strncmp for power10" (CVE-2025-5745)
This reverts commit 23f0d81608

Reason for revert: Power10 strncmp clobbers non-volatile vector
registers (Bug 33060)

Tested on ppc64le with no regressions.
2025-06-16 18:02:58 -04:00
Cupertino Miranda cde5caa4bb malloc: add testing for large tcache support
This patch adds large tcache support tests by re-executing malloc tests
using the tunable:  glibc.malloc.tcache_max=1048576
Test names are postfixed with "largetcache".

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-16 12:54:32 +00:00
Cupertino Miranda cbfd798810 malloc: add tcache support for large chunk caching
The existing tcache implementation in glibc seems to focus on caching
smaller allocations, limiting the size of cached chunks to 1KB.

This patch changes the tcache implementation so that it can cache chunks
of any size.  The implementation adds extra bins (linked lists)
which store chunks in different ranges of allocation sizes.  Bins are
selected by powers of 2, and chunks are inserted in increasing size
order within a bin.  The last bin contains all other allocation sizes.

By default this patch preserves the existing behaviour, limiting
caches to 1KB chunks, but it now allows the maximum size of cached
chunks to be increased with the tunable glibc.malloc.tcache_max.

It also now checks whether a chunk was mmapped, in which case __libc_free
will not add it to the tcache.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-16 12:05:22 +00:00
H.J. Lu 5b7c8d1cd4 Always check lockf64 return value
On x86-64, when GCC 14.2.1 is used to build:

commit f3c82fc1b4
Author: Radko Krkos <krkos@mail.muni.cz>
Date:   Sat Jun 14 11:07:40 2025 +0200

    io: Mark lockf() __wur [BZ #32800]

    In commit 0476597b28 lockf() was marked __wur in posix/unistd.h, but not
    in io/fcntl.h; the declarations must match.

    Reviewed-by: Florian Weimer <fweimer@redhat.com>

I got

programs/locarchive.c: In function ‘open_archive’:
programs/locarchive.c:641:18: error: ignoring return value of ‘lockf64’ declared with attribute ‘warn_unused_result’ [-Werror=unused-result]
  641 |           (void) lockf64 (fd, F_ULOCK, sizeof (struct locarhead));
      |                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
programs/locarchive.c:653:14: error: ignoring return value of ‘lockf64’ declared with attribute ‘warn_unused_result’ [-Werror=unused-result]
  653 |       (void) lockf64 (fd, F_ULOCK, sizeof (struct locarhead));
      |              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
programs/locarchive.c:660:14: error: ignoring return value of ‘lockf64’ declared with attribute ‘warn_unused_result’ [-Werror=unused-result]
  660 |       (void) lockf64 (fd, F_ULOCK, sizeof (struct locarhead));
      |              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
programs/locarchive.c:679:14: error: ignoring return value of ‘lockf64’ declared with attribute ‘warn_unused_result’ [-Werror=unused-result]
  679 |       (void) lockf64 (fd, F_ULOCK, sizeof (struct locarhead));
      |              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Update locarchive.c to always check lockf64 return value.  This fixes
BZ #33089.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-06-16 14:48:45 +08:00
H.J. Lu 81467d4b61 elf: Add optimization barrier for __ehdr_start and _end
rtld.c has

extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
...
  _dl_rtld_map.l_map_start = (ElfW(Addr)) &__ehdr_start;
  _dl_rtld_map.l_map_end = (ElfW(Addr)) _end;

As

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120653

shows, the compiler may generate a run-time relocation against __ehdr_start with

	movq	.LC0(%rip), %xmm0
...
	.section	.data.rel.ro.local,"aw"
	.align 8
.LC0:
	.quad	__ehdr_start

This won't work before run-time relocation is finished in rtld.c.  Add
an optimization barrier to prevent run-time relocations against __ehdr_start
and _end.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
2025-06-16 08:43:40 +08:00
gfleury 27360ab9ea htl: move pthread_key_*, pthread_get/setspecific
Signed-off-by: gfleury <gfleury@disroot.org>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Message-ID: <20250613184440.1660335-1-gfleury@disroot.org>
2025-06-15 21:21:12 +02:00
H.J. Lu 90cf97bb9d elf: Remove the unused _etext declaration
Since

commit 53df2ce688
Author: Florian Weimer <fweimer@redhat.com>
Date:   Fri Sep 8 13:02:06 2023 +0200

    elf: Remove unused l_text_end field from struct link_map

removed the only reference to _etext, also remove the unused _etext
declaration.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-06-15 12:42:24 +08:00
Radko Krkos f3c82fc1b4 io: Mark lockf() __wur [BZ #32800]
In commit 0476597b28 lockf() was marked __wur in posix/unistd.h, but not
in io/fcntl.h; the declarations must match.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-06-14 11:57:46 +02:00
Adhemerval Zanella 1d828b9ddc benchtests: Improve modf benchtest
It adds four ranges, which is how the generic implementation handles
normal numbers:

  1. Random inputs in the range [0.0, 1.0];
  2. Random inputs in the range [1.0, (double)(UINT64_C(1) << 52)];
  3. Random inputs in the range [(double)(UINT64_C(1) << 52), DBL_MAX];
  4. Random integral inputs in the range [0.0, (double)(UINT64_C(1) << 52)].

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-13 11:30:12 -03:00
Adhemerval Zanella 619fd4e37b benchtests: Add modff benchtest
It adds four ranges, which is how the generic implementation handles
normal numbers:

  1. Random inputs in the range [0.0, 1.0];
  2. Random inputs in the range [1.0, (float)(1U << 23)];
  3. Random inputs in the range [(float)(1U << 23), FLT_MAX];
  4. Random integral inputs in the range [0.0, (float)(1U << 23)].

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-13 11:29:39 -03:00
Mark Harris 8af8beb1c4 riscv: Correct __riscv_hwprobe function prototype [BZ #32932]
The third argument to __riscv_hwprobe is the size in bytes of the
cpu bitmask pointed to by the fourth argument, however in the access
attribute (read_only, 4, 3) it is used as an element count (i.e., the
number of unsigned longs that make up the bitmask), resulting in a
false compiler warning:

$ gcc -c hwprobe1.c
hwprobe1.c: In function 'main':
hwprobe1.c:15:11: warning: '__riscv_hwprobe' reading 1024 bytes from a region of size 128 [-Wstringop-overread]
   15 |     ret = __riscv_hwprobe (pairs, 1, sizeof(cpus), cpus, 0);
      |           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hwprobe1.c:9:23: note: source object 'cpus' of size 128
    9 |     unsigned long int cpus[16];
      |                       ^~~~
In file included from hwprobe1.c:1:
/usr/include/riscv64-linux-gnu/sys/hwprobe.h:66:12: note: in a call to function '__riscv_hwprobe' declared with attribute 'access (read_only, 4, 3)'
   66 | extern int __riscv_hwprobe (struct riscv_hwprobe *__pairs, size_t __pair_count,
      |            ^~~~~~~~~~~~~~~
$

The documentation (https://docs.kernel.org/arch/riscv/hwprobe.html)
claims that the cpu bitmask has the type cpu_set_t *, which would be
consistent with other functions that take a cpu bitmask such as
sched_setaffinity and sched_getaffinity.  It also uses the name
cpusetsize for the third argument, which is much more accurate than
cpu_count since it is a size in bytes and not a cpu count.  The
(read_only, 4, 3) access attribute in the glibc prototype claims
that the cpu bitmask is only read, however when flags is
RISCV_HWPROBE_WHICH_CPUS it is both read and written.

Therefore, in the glibc prototype the type of the fourth argument is
changed to cpu_set_t * to match the documentation, the name of the
third argument is changed to cpusetsize as in the documentation, and the
incorrect access attribute that applies to these arguments is removed.
Almost all existing callers pass a null pointer for the fourth
argument, however a transparent union is introduced for compatibility
with callers that cast a pointer to the old argument type, and a
macro is introduced allowing callers the ability to distinguish
between the old and new prototype when needed.

The access attributes are being specified with __fortified_attr_access,
however this macro is for fortified functions; the regular
__attr_access macro is for non-fortified functions such as this one.
Using the incorrect macro results in no access checks at fortify level
3, because it is assumed that the fortified function will be doing the
checking.  It is changed to use the correct macro so that the access
checks will work regardless of fortify level.

Also because __riscv_hwprobe is not a cancellation point, __THROW
is added, consistent with similar functions.  (However, it is omitted
from the typedef because GCC does not accept it there.)

The __wur (warn_unused_result) attribute is helpful for functions that
cannot be used safely without checking the result, however code such
as the following does not require the result to be checked and should
not produce a warning:
    struct riscv_hwprobe pair = { RISCV_HWPROBE_KEY_IMA_EXT_0, 0 };
    __riscv_hwprobe (&pair, 1, 0, NULL, 0);
    if (pair.value & RISCV_HWPROBE_EXT_ZBB) ...
Therefore this attribute is omitted.

The comment claiming that the second argument to the ifunc selector
is a pointer to the vDSO function is corrected.  It is a pointer to
the regular glibc function (which returns errors as positive values),
not the vDSO function (which returns errors as negative values).

Fixes commit 426d0e1aa8 ("riscv: Add
Linux hwprobe syscall support").

Fixes: BZ #32932
Signed-off-by: Mark Harris <mark.hsj@gmail.com>
Reviewed-by: Palmer Dabbelt <palmer@dabbelt.com>
Acked-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-13 11:25:12 -03:00
Sergey Kolosov daab2a6d19 resolv: Add test for getaddrinfo returning FQDN in ai_canonname
Test for BZ #15218.  This test verifies that getaddrinfo returns a
fully-qualified domain name in the ai_canonname field when
AI_CANONNAME is set and search domains apply.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-06-10 15:10:31 +02:00
Yury Khrustalev b15ed85c86 aarch64: fix typo in sysdeps/aarch64/Makefile 2025-06-10 10:48:07 +01:00
Siddhesh Poyarekar f8f73249d9 Advisory text for CVE-2025-5745
The fix is not available yet, so this only records the first vulnerable
commit.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2025-06-09 13:07:26 -04:00
Siddhesh Poyarekar 62cb3ee57d Advisory text for CVE-2025-5702
The fix is not available yet, so this only records the first vulnerable
commit.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2025-06-09 13:07:26 -04:00
Samuel Thibault 5fdc693d95 hurd: Make __getrandom_early_init call __mach_init
25d37948c9 ("malloc: Improve malloc initialization") moved calling malloc
initialization earlier, within _dl_sysdep_start's call to dl_main, before
__mach_init is called by _dl_init_first. But malloc initialization uses
getrandom, which needs to make RPCs.

This adds __getrandom_early_init on hurd to express that getrandom needs
__mach_init too. This also adds a guard to avoid making it create several task
and host ports.

Fixes: 25d37948c9 ("malloc: Improve malloc initialization")
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
2025-06-09 08:34:06 +00:00
H.J. Lu 0a027674a1 x86: Avoid GLRO(dl_x86_cpu_features)
In init_cpu_features, replace GLRO(dl_x86_cpu_features) with
cpu_features to avoid an extra load.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-06-09 13:03:13 +08:00
Maciej W. Rozycki 62fba6d980 manual: Add a comparative example of 'clock_nanosleep' use
Add an illustrative example of how to express 'nanosleep' in terms of
'clock_nanosleep'.
2025-06-06 18:14:34 +01:00
Wilco Dijkstra 09795c5612 AArch64: Fix build error with GCC 12.1/12.2
Early versions of GCC 12 didn't support -mtune=neoverse-v2, so use
-mtune=neoverse-v1 instead.

Reported-by: Yury Khrustalev <yury.khrustalev@arm.com>
2025-06-06 13:22:27 +00:00
Maciej W. Rozycki 7a751ce39c Linux: Drop obsolete kernel support with `if_nameindex' and `if_nametoindex'
Support for the SIOCGIFINDEX ioctl(2) Linux ABI (0x8933 command, called
SIOGIFINDEX in the API originally) was added with kernel version 2.1.14
for AF_INET6 sockets, followed by general support with version 2.1.22.
The Linux API was then updated by adding the current SIOCGIFINDEX name
with kernel version 2.1.68, back in Nov 1997.

All these kernel versions are well below our current default required
minimum of 3.2.0, let alone the higher minimum versions required by some
platforms.

Drop support for the absence of the SIOCGIFINDEX ioctl(2) in the API or
ABI, by removing arrangements for the ENOSYS error condition.  Discard
the indirection from '__if_nameindex' to 'if_nameindex_netlink' and
adjust the implementation of '__if_nametoindex' accordingly for a better
code flow.
2025-06-05 19:04:46 +01:00
Yury Khrustalev fcd6a8b5c5 aarch64: add __ifunc_hwcap function to be used in ifunc resolvers
Add a new helper function __ifunc_hwcap() as a portable way to
access HWCAP elements via the parameter(s) passed to an ifunc
resolver checking the _IFUNC_ARG_HWCAP bit in the first parameter
and size of the buffer in the second parameter.

Note that 0 is returned when the requested element is not available
or does not correspond to a valid AT_HWCAP{,2,...} value.

Also add relevant tests.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-06-05 14:38:51 +01:00
Yury Khrustalev ea14d04e9a aarch64: add support for hwcap3,4
Add basic support for hwcap3 and hwcap4 in dynamic loader and
ifunc resolvers.

Describe the new backward-compatible prototype for GNU indirect
function resolvers that use a pointer to a uint64_t array
instead of a pointer to the __ifunc_arg_t struct.

This patch also adds macro _IFUNC_HWCAP_MAX to specify current
number of hwcap elements.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-06-05 14:38:03 +01:00
Arjun Shankar 25f1d94576 manual: Document futimens and utimensat
Document futimens and utimensat.  Also document the EINVAL error
condition for futimes.  It is inherited by futimens and utimensat as
well.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-06-04 20:17:04 +02:00
Arjun Shankar 75b725717f manual: Document unlinkat
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-06-04 20:17:04 +02:00
Arjun Shankar 60f86c9cd0 manual: Document renameat
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-06-04 20:17:04 +02:00
Arjun Shankar 49766eb1a5 manual: Document mkdirat
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-06-04 20:17:04 +02:00
Arjun Shankar 941157dbcd manual: Document faccessat
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-06-04 20:17:04 +02:00
Arjun Shankar 3b21166c4d manual: Expand Descriptor-Relative Access section
Improve the clarity of the paragraphs describing common flags and add a
list of common error conditions for descriptor-relative functions.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-06-04 20:17:04 +02:00
Florian Weimer 2fca4b624b Makefile: Avoid $(objpfx)/ in makefiles
If paths with both $(objpfx)/ and $(objpfx) (which already includes
a trailing slash) appear during the build, this can trigger unexpected
rebuilds, or incorrect concurrent rebuilds.
2025-06-04 17:44:19 +02:00
Maciej W. Rozycki 140b20e971 manual: Document error codes missing for 'inet_pton'
Add documentation for EAFNOSUPPORT error code returned, and the possible
return values on non-success.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-06-04 16:27:20 +01:00
Maciej W. Rozycki 5a9020eeb2 manual: Document error codes missing for 'if_nametoindex'
Add documentation for ENODEV error code returned and refer to 'socket'
for further possible codes from the underlying function call.

While changing the text clarify the description by mentioning 'ifname'.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-06-04 16:27:20 +01:00
Maciej W. Rozycki 46acdf46cc manual: Document error codes missing for 'if_indextoname'
Add documentation for ENXIO error code returned and refer to 'socket'
for further possible codes from the underlying function call.

While changing the text clarify the description by mentioning 'ifname'
and replace @code tags with @var ones where referring to a function
parameter.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-06-04 16:27:20 +01:00
Cœur e885fd43db posix: fix building regex when _LIBC isn't defined
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-06-04 13:55:23 +02:00
Collin Funk 5b45674869 localedata: Use the name North Macedonia.
The name "the former Yugoslav Republic of Macedonia" is no longer in use
since the signing of the Prespa Agreement [1][2].  This resolved the
country's naming dispute with Greece and changed the name to "North
Macedonia".

The name field of this locale/iso-3166.def is not used, so this does not
affect binaries.

[1] https://en.wikipedia.org/wiki/Prespa_Agreement
[2] https://treaties.un.org/Pages/showDetails.aspx?objid=0800000280544ac1

Signed-off-by: Collin Funk <collin.funk1@gmail.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-06-04 12:01:55 +02:00
Wilco Dijkstra 7e10e30e64 malloc: Count tcache entries downwards
Currently tcache requires 2 global variable accesses to determine
whether a block can be added to the tcache.  Change the counts array
to 'num_slots' to indicate the number of entries that could be added.
If 'num_slots' reaches zero, no more blocks can be added.  If the entries
pointer is not NULL, at least one block is available for allocation.

Now each tcache bin can support a different maximum number of entries,
and they can be individually switched on or off (a zero initialized
num_slots+entry means the tcache bin is not available for free or malloc).

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-06-03 17:16:39 +00:00
Adhemerval Zanella 404526ee2e sparc: Fix argument passing to __libc_start_main (BZ 32981)
sparc start.S does not provide the final argument for
__libc_start_main, which is the highest stack address used to
update __libc_stack_end.

This fixes elf/tst-execstack-prog-static-tunable on sparc64.
On sparcv9 this does not happen because the kernel puts an
auxv value, which turns out to point to a value in the stack itself.

Checked on sparc64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-06-03 09:11:46 -03:00
Collin Funk d475e5bf4f localedata: Refer to Eswatini instead of Swaziland.
The name was changed in 2018 [1].

The name is not used in locale/programs/ld-address.c so this does not
change any binaries or data.

[1] https://www.un.org/en/about-us/member-states/eswatini

Signed-off-by: Collin Funk <collin.funk1@gmail.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-06-03 10:53:12 +02:00
наб 6945ce4a6f sigaction: don't sign-extend sa_flags
Before:
  rt_sigaction(SIGBUS, {sa_handler=0x55abb9960139, sa_mask=[], sa_flags=SA_RESTORER|SA_RESETHAND|SA_SIGINFO|0xffffffff00000000, sa_restorer=0x7fb1b2a82050}, NULL, 8) = 0

After:
  rt_sigaction(SIGBUS, {sa_handler=0x7f6a70dce139, sa_mask=[], sa_flags=SA_RESTORER|SA_RESETHAND|SA_SIGINFO, sa_restorer=0x7f6a70c28f60}, NULL, 8) = 0

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-06-03 10:53:12 +02:00
Collin Funk b2970d5e5b stdio-common: Add nonnull attribute to stdio_ext.h functions.
* stdio-common/stdio_ext.h (__fbufsize, __freading, __fwriting)
(__freadable, __fwritable, __flbf, __fpurge, __fpending, __fsetlocking):
Add __nonnull ((1)) to these functions since they access the FP without
checking if it is NULL.

Signed-off-by: Collin Funk <collin.funk1@gmail.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-06-02 13:32:19 -03:00
Adhemerval Zanella e529bfe8de elf: Fix UB on _dl_map_object_from_fd
On 32-bit architectures ubsan triggers:

UBSAN: Undefined behaviour in dl-load.c:1345:54 pointer index expression with base 0x00612508 overflowed  to 0xf7c3a508

Use explicit uintptr_t operation instead.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-06-02 13:32:19 -03:00
Adhemerval Zanella 1642570563 argp: Fix shift bug
From gnulib commits 06094e390b0 and 88033d3779362a.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-06-02 13:32:19 -03:00
Adhemerval Zanella 7c00a20397 math: Remove i386 ilogb/ilogbf/llogb/llogbf
The new float and double implementations do not require an
extra function call, and error handling uses the math_err functions,
which results in better performance on i386 as well.

With gcc-14 on an AMD Ryzen 9 5900X, master shows:

$ ./benchtests/bench-ilogb
  "ilogb": {
   "subnormal": {
    "duration": 3.68863e+09,
    "iterations": 1.72228e+08,
    "max": 89.2995,
    "min": 21.016,
    "mean": 21.4171
   },
   "normal": {
    "duration": 3.68878e+09,
    "iterations": 1.72948e+08,
    "max": 78.6065,
    "min": 21.127,
    "mean": 21.3288
   }
  }
$ ./benchtests/bench-ilogbf
  "ilogbf": {
   "subnormal": {
    "duration": 3.68835e+09,
    "iterations": 1.66716e+08,
    "max": 46.953,
    "min": 21.793,
    "mean": 22.1236
   },
   "normal": {
    "duration": 3.68784e+09,
    "iterations": 1.66168e+08,
    "max": 46.9715,
    "min": 21.904,
    "mean": 22.1935
   }
  }

While with this patch:

$ ./benchtests/bench-ilogb
  "ilogb": {
   "subnormal": {
    "duration": 3.68134e+09,
    "iterations": 4.17516e+08,
    "max": 32.5045,
    "min": 8.3245,
    "mean": 8.81723
   },
   "normal": {
    "duration": 3.6677e+09,
    "iterations": 6.79468e+08,
    "max": 50.9305,
    "min": 5.3465,
    "mean": 5.3979
   }
}
$ ./benchtests/bench-ilogbf
  "ilogbf": {
   "subnormal": {
    "duration": 3.67553e+09,
    "iterations": 5.11032e+08,
    "max": 35.927,
    "min": 7.0485,
    "mean": 7.19237
   },
   "normal": {
    "duration": 3.66877e+09,
    "iterations": 6.556e+08,
    "max": 26.3625,
    "min": 5.5315,
    "mean": 5.59605
   }
 }

Checked on i686-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-02 13:32:19 -03:00
Adhemerval Zanella 39775f00b1 math: Optimize float ilogb/llogb
It removes the wrapper by moving the error/EDOM handling to an
out-of-line implementation (__math_invalidf_i/__math_invalidf_li).
Also, __glibc_unlikely is used on the error cases since it helps
code generation on recent gcc.

With gcc-14 on aarch64 the code now builds to:

0000000000000000 <__ilogbf>:
   0:   1e260000        fmov    w0, s0
   4:   d3577801        ubfx    x1, x0, #23, #8
   8:   340000e1        cbz     w1, 24 <__ilogbf+0x24>
   c:   5101fc20        sub     w0, w1, #0x7f
  10:   7103fc3f        cmp     w1, #0xff
  14:   54000040        b.eq    1c <__ilogbf+0x1c>  // b.none
  18:   d65f03c0        ret
  1c:   12b00000        mov     w0, #0x7fffffff                 // #2147483647
  20:   14000000        b       0 <__math_invalidf_i>
  24:   53175800        lsl     w0, w0, #9
  28:   340000a0        cbz     w0, 3c <__ilogbf+0x3c>
  2c:   5ac01000        clz     w0, w0
  30:   12800fc1        mov     w1, #0xffffff81                 // #-127
  34:   4b000020        sub     w0, w1, w0
  38:   d65f03c0        ret
  3c:   320107e0        mov     w0, #0x80000001                 // #-2147483647
  40:   14000000        b       0 <__math_invalidf_i>

Some ABIs require additional adjustments:

  * i386 and m68k need to use the template version, since
    both provide __ieee754_ilogb implementations.

  * loongarch uses a custom implementation as well.

  * powerpc64le also has a custom implementation for POWER9, which
    is also used for float and float128 version.  The generic
    e_ilogb.c implementation is moved on powerpc to keep the
    current code as-is.

Checked on aarch64-linux-gnu and x86_64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-02 13:32:19 -03:00
Adhemerval Zanella afe09d44f3 math: Remove UB and optimize float ilogbf
The subnormal exponent calculation invokes UB by left shifting the
signed exponent to find the first leading bit.

The patch reimplements ilogb using the math_config.h macros and
uses the new stdbit.h function to simplify the subnormal handling.

On aarch64 it generates better code:

* master:

0000000000000000 <__ieee754_ilogbf>:
   0:   1e260000        fmov    w0, s0
   4:   12007801        and     w1, w0, #0x7fffffff
   8:   72091c1f        tst     w0, #0x7f800000
   c:   54000141        b.ne    34 <__ieee754_ilogbf+0x34>  // b.any
  10:   34000201        cbz     w1, 50 <__ieee754_ilogbf+0x50>
  14:   53185c21        lsl     w1, w1, #8
  18:   12800fa0        mov     w0, #0xffffff82                 // #-126
  1c:   d503201f        nop
  20:   531f7821        lsl     w1, w1, #1
  24:   51000400        sub     w0, w0, #0x1
  28:   7100003f        cmp     w1, #0x0
  2c:   54ffffac        b.gt    20 <__ieee754_ilogbf+0x20>
  30:   d65f03c0        ret
  34:   13177c20        asr     w0, w1, #23
  38:   12b01002        mov     w2, #0x7f7fffff                 // #2139095039
  3c:   5101fc00        sub     w0, w0, #0x7f
  40:   6b02003f        cmp     w1, w2
  44:   12b00001        mov     w1, #0x7fffffff                 // #2147483647
  48:   1a819000        csel    w0, w0, w1, ls  // ls = plast
  4c:   d65f03c0        ret
  50:   320107e0        mov     w0, #0x80000001                 // #-2147483647
  54:   d65f03c0        ret

* patch:

0000000000000000 <__ieee754_ilogbf>:
   0:   1e260001        fmov    w1, s0
   4:   d3577820        ubfx    x0, x1, #23, #8
   8:   350000e0        cbnz    w0, 24 <__ieee754_ilogbf+0x24>
   c:   53175821        lsl     w1, w1, #9
  10:   34000141        cbz     w1, 38 <__ieee754_ilogbf+0x38>
  14:   5ac01021        clz     w1, w1
  18:   12800fc0        mov     w0, #0xffffff81                 // #-127
  1c:   4b010000        sub     w0, w0, w1
  20:   d65f03c0        ret
  24:   7103fc1f        cmp     w0, #0xff
  28:   5101fc00        sub     w0, w0, #0x7f
  2c:   12b00001        mov     w1, #0x7fffffff                 // #2147483647
  30:   1a811000        csel    w0, w0, w1, ne  // ne = any
  34:   d65f03c0        ret
  38:   320107e0        mov     w0, #0x80000001                 // #-2147483647
  3c:   d65f03c0        ret

Other architectures with support for stdc_leading_zeros and/or
__builtin_clzll should see similar improvements.

Checked on aarch64-linux-gnu and x86_64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-02 13:32:19 -03:00
Adhemerval Zanella c4be334400 math: Optimize double ilogb/llogb
It removes the wrapper by moving the error/EDOM handling to an
out-of-line implementation (__math_invalid_i/__math_invalid_li).
Also, __glibc_unlikely is used on the error cases since it helps
code generation on recent gcc.

With gcc-14 on aarch64 the code now builds to:

0000000000000000 <__ilogb>:
   0:   9e660000        fmov    x0, d0
   4:   d374f801        ubfx    x1, x0, #52, #11
   8:   340000e1        cbz     w1, 24 <__ilogb+0x24>
   c:   510ffc20        sub     w0, w1, #0x3ff
  10:   711ffc3f        cmp     w1, #0x7ff
  14:   54000040        b.eq    1c <__ilogb+0x1c>  // b.none
  18:   d65f03c0        ret
  1c:   12b00000        mov     w0, #0x7fffffff                 // #2147483647
  20:   14000000        b       0 <__math_invalid_i>
  24:   d374cc00        lsl     x0, x0, #12
  28:   b40000a0        cbz     x0, 3c <__ilogb+0x3c>
  2c:   dac01000        clz     x0, x0
  30:   12807fc1        mov     w1, #0xfffffc01                 // #-1023
  34:   4b000020        sub     w0, w1, w0
  38:   d65f03c0        ret
  3c:   320107e0        mov     w0, #0x80000001                 // #-2147483647
  40:   14000000        b       0 <__math_invalid_i>

Some ABIs require additional adjustments:

  * i386 and m68k need to use the template version, since
    both provide __ieee754_ilogb implementations.

  * loongarch uses a custom implementation as well.

  * powerpc64le also has a custom implementation for POWER9, which
    is also used for float and float128 version.  The generic
    e_ilogb.c implementation is moved on powerpc to keep the
    current code as-is.

Checked on aarch64-linux-gnu and x86_64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-02 13:32:19 -03:00
Adhemerval Zanella eb1e9194fa math: Remove UB and optimize double ilogb
The subnormal exponent calculation invokes UB by left shifting the
signed exponent to find the first leading bit.  The implementation
also uses 32-bit operations, which generates suboptimal code on
64-bit architectures.

The patch reimplements ilogb using the math_config.h macros and
uses the new stdbit function to simplify the subnormal handling.

On aarch64 it generates better code:

* master:

0000000000000000 <__ieee754_ilogb>:
   0:   9e660000        fmov    x0, d0
   4:   d360fc02        lsr     x2, x0, #32
   8:   d360f801        ubfx    x1, x0, #32, #31
   c:   f26c285f        tst     x2, #0x7ff00000
  10:   540001a1        b.ne    44 <__ieee754_ilogb+0x44>  // b.any
  14:   2a000022        orr     w2, w1, w0
  18:   34000322        cbz     w2, 7c <__ieee754_ilogb+0x7c>
  1c:   35000221        cbnz    w1, 60 <__ieee754_ilogb+0x60>
  20:   2a0003e1        mov     w1, w0
  24:   7100001f        cmp     w0, #0x0
  28:   12808240        mov     w0, #0xfffffbed                 // #-1043
  2c:   540000ad        b.le    40 <__ieee754_ilogb+0x40>
  30:   531f7821        lsl     w1, w1, #1
  34:   51000400        sub     w0, w0, #0x1
  38:   7100003f        cmp     w1, #0x0
  3c:   54ffffac        b.gt    30 <__ieee754_ilogb+0x30>
  40:   d65f03c0        ret
  44:   13147c20        asr     w0, w1, #20
  48:   12b00202        mov     w2, #0x7fefffff                 // #2146435071
  4c:   510ffc00        sub     w0, w0, #0x3ff
  50:   6b02003f        cmp     w1, w2
  54:   12b00001        mov     w1, #0x7fffffff                 // #2147483647
  58:   1a819000        csel    w0, w0, w1, ls  // ls = plast
  5c:   d65f03c0        ret
  60:   53155021        lsl     w1, w1, #11
  64:   12807fa0        mov     w0, #0xfffffc02                 // #-1022
  68:   531f7821        lsl     w1, w1, #1
  6c:   51000400        sub     w0, w0, #0x1
  70:   7100003f        cmp     w1, #0x0
  74:   54ffffac        b.gt    68 <__ieee754_ilogb+0x68>
  78:   d65f03c0        ret
  7c:   320107e0        mov     w0, #0x80000001                 // #-2147483647
  80:   d65f03c0        ret

* patch:

0000000000000000 <__ieee754_ilogb>:
   0:   9e660001        fmov    x1, d0
   4:   d374f820        ubfx    x0, x1, #52, #11
   8:   350000e0        cbnz    w0, 24 <__ieee754_ilogb+0x24>
   c:   d374cc21        lsl     x1, x1, #12
  10:   b4000141        cbz     x1, 38 <__ieee754_ilogb+0x38>
  14:   dac01021        clz     x1, x1
  18:   12807fc0        mov     w0, #0xfffffc01                 // #-1023
  1c:   4b010000        sub     w0, w0, w1
  20:   d65f03c0        ret
  24:   711ffc1f        cmp     w0, #0x7ff
  28:   510ffc00        sub     w0, w0, #0x3ff
  2c:   12b00001        mov     w1, #0x7fffffff                 // #2147483647
  30:   1a811000        csel    w0, w0, w1, ne  // ne = any
  34:   d65f03c0        ret
  38:   320107e0        mov     w0, #0x80000001                 // #-2147483647
  3c:   d65f03c0        ret

Other architectures with support for stdc_leading_zeros and/or
__builtin_clzll should see similar improvements.

Checked on aarch64-linux-gnu and x86_64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-06-02 13:32:19 -03:00
Arjun Shankar 591283a689 manual: Correct return value description of 'clock_nanosleep'
Commit 1a3d8f2201 incorrectly described
'clock_nanosleep' as having the same return values as 'nanosleep'.  Fix
this, clarifying that 'clock_nanosleep' returns a positive error number
upon failure instead of setting 'errno'.  Also clarify that 'nanosleep'
returns '-1' upon error.

Fixes: 1a3d8f2201
Reported-by: Mark Harris <mark.hsj@gmail.com>
Reviewed-by: Mark Harris <mark.hsj@gmail.com>
2025-06-02 16:06:11 +02:00