glibc/sysdeps
Matheus Castanho 1a594aa986 powerpc: Add optimized rawmemchr for POWER10
Reuse code for optimized strlen to implement a faster version of rawmemchr.
This takes advantage of the same benefits provided by the strlen implementation,
but needs some extra steps. The __strlen_power10 code itself should remain
unchanged by this commit.

rawmemchr returns a pointer to the char found, while strlen returns only the
length, so we have to take that into account when preparing the return value.
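
As a rough illustration of that difference, the following C sketch derives both
return values from one shared scan. It is not the POWER10 assembly; the names
index_of, strlen_style and rawmemchr_style are hypothetical and exist only for
this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared scan: offset of the first byte equal to c.
   Like rawmemchr, it assumes the byte is present somewhere in s. */
static size_t index_of(const char *s, int c)
{
    assert(s != NULL);
    size_t i = 0;
    while ((unsigned char)s[i] != (unsigned char)c)
        i++;
    return i;
}

/* strlen-style result: only the offset (the length) is returned. */
static size_t strlen_style(const char *s)
{
    return index_of(s, 0);
}

/* rawmemchr-style result: the offset is added back to the base pointer. */
static char *rawmemchr_style(const char *s, int c)
{
    return (char *)s + index_of(s, c);
}
```

The only difference between the two wrappers is the final step: one returns the
offset, the other adds it to the start address.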

To quickly check 64B, the loop in __strlen_power10 merges the whole block into
16B using unsigned vector minimum operations (vminub) and checks whether the
resulting vector contains any \0 byte. rawmemchr uses the same code when the
char c is 0. However, this approach does not work when c != 0: we first need to
subtract c from each byte, so that the value we are looking for becomes 0;
taking the minimum and checking for nulls then works again.
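
The subtract-then-minimum idea can be sketched in portable scalar C. This is
only an emulation of the technique, not the actual vector code: the inner
minimum stands in for vminub, the helper names block64_has_char and
rawmemchr_sketch are hypothetical, and the sketch assumes the caller's buffer
is padded so that whole 64-byte blocks can be read safely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if any of the 64 bytes at p equals c.
   Each byte has c subtracted (mod 256), so a match becomes 0. The four
   16-byte chunks are then merged lane by lane with an unsigned minimum,
   mimicking what vminub does on vectors, and the merged 16 bytes are
   checked for a zero. */
static int block64_has_char(const unsigned char *p, unsigned char c)
{
    unsigned char merged[16];

    for (size_t lane = 0; lane < 16; lane++) {
        unsigned char m = 0xff;
        for (size_t chunk = 0; chunk < 4; chunk++) {
            unsigned char d = (unsigned char)(p[16 * chunk + lane] - c);
            if (d < m)
                m = d;
        }
        merged[lane] = m;
    }

    for (size_t lane = 0; lane < 16; lane++)
        if (merged[lane] == 0)
            return 1;
    return 0;
}

/* Scans 64 bytes at a time and returns a pointer to the first byte equal
   to c_in. Like rawmemchr, it does not terminate correctly if the byte is
   absent. Assumption: reading up to the end of the 64-byte block that
   contains the match is safe. */
static const unsigned char *rawmemchr_sketch(const void *s, int c_in)
{
    assert(s != NULL);
    const unsigned char *p = s;
    unsigned char c = (unsigned char)c_in;

    for (;;) {
        if (block64_has_char(p, c)) {
            /* The block contains a match; locate its exact position. */
            for (size_t i = 0; i < 64; i++)
                if (p[i] == c)
                    return p + i;
        }
        p += 64;
    }
}
```

When c is 0 the subtraction is a no-op, which is why the unmodified strlen loop
can be reused for that case.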

The new code branches after it has compared ~256 bytes and chooses which of the
two strategies above will be used in the main loop, based on the char c. This
extra branch adds some overhead (~5%) for length ~256, but is quickly amortized
by the faster loop for larger sizes.

Compared to __rawmemchr_power9, this version is ~20% faster for lengths below
256 bytes. Because of the optimized main loop, the improvement becomes ~35% for
c != 0 and ~50% for c = 0 for inputs longer than 256 bytes.

Reviewed-by: Lucas A. M. Magalhaes <lamm@linux.ibm.com>
Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
2021-05-17 10:30:35 -03:00
aarch64
alpha
arc
arm
csky
generic elf: Move static TLS size and alignment into _rtld_global_ro 2021-05-17 10:17:41 +02:00
gnu Annotate additional APIs with GCC attribute access. 2021-05-06 11:01:05 -06:00
hppa
htl
hurd
i386 Remove architecture specific sched_cpucount optimizations 2021-05-07 13:35:29 -03:00
ia64 Remove architecture specific sched_cpucount optimizations 2021-05-07 13:35:29 -03:00
ieee754 add workload traces for cbrtl 2021-05-10 18:45:34 +02:00
m68k
mach Hurd: Add missing hidden proto definition for __ttyname_r 2021-05-10 10:29:36 +02:00
microblaze
mips
nios2
nptl nptl: Move __nptl_initial_report_events into ld.so/startup code 2021-05-17 10:04:06 +02:00
posix
powerpc powerpc: Add optimized rawmemchr for POWER10 2021-05-17 10:30:35 -03:00
pthread nptl: Move thread join functions into libc 2021-05-11 11:24:39 +02:00
riscv
s390
sh
sparc
unix nptl: Move pthread_sigqueue into libc 2021-05-17 10:25:12 +02:00
wordsize-32
wordsize-64
x86 x86: Set rep_movsb_threshold to 2112 on processors with FSRM 2021-05-03 05:08:22 -07:00
x86_64 elf: Use relaxed atomics for racy accesses [BZ #19329] 2021-05-11 17:16:37 +01:00