glibc/sysdeps/x86_64/multiarch
Noah Goldstein 64b8b6516b x86: Add evex optimized functions for the wchar_t strcpy family
Implemented:
    wcscat-evex  (+ 905 bytes)
    wcscpy-evex  (+ 674 bytes)
    wcpcpy-evex  (+ 709 bytes)
    wcsncpy-evex (+1358 bytes)
    wcpncpy-evex (+1467 bytes)
    wcsncat-evex (+1213 bytes)

Performance Changes:
    Times are from N = 10 runs of the benchmark suite and are reported
    as the geometric mean of all ratios of New Implementation / Best Old
    Implementation, so values below 1.0 mean the new code is faster.
    The Best Old Implementation is the existing implementation
    targeting the highest ISA level.

    wcscat-evex     -> 0.991
    wcscpy-evex     -> 0.587
    wcpcpy-evex     -> 0.695
    wcsncpy-evex    -> 0.719
    wcpncpy-evex    -> 0.694
    wcsncat-evex    -> 0.979

Code Size Changes:
    This change increases the size of libc.so by ~6.3 KB. For
    reference, the patch optimizing the normal strcpy family functions
    decreases libc.so by ~5.7 KB.

Full check passes on x86-64 and build succeeds for all ISA levels w/
and w/o multiarch.
2022-11-08 19:22:33 -08:00
scripts
Makefile x86: Add evex optimized functions for the wchar_t strcpy family 2022-11-08 19:22:33 -08:00
dl-symbol-redir-ifunc.h
ifunc-avx2.h
ifunc-evex.h
ifunc-impl-list.c x86: Add evex optimized functions for the wchar_t strcpy family 2022-11-08 19:22:33 -08:00
ifunc-memcmp.h
ifunc-memcmpeq.h
ifunc-memmove.h
ifunc-memset.h
ifunc-sse4_2.h
ifunc-strcasecmp.h
ifunc-strcpy.h
ifunc-strncpy.h
ifunc-wcs.h x86: Add evex optimized functions for the wchar_t strcpy family 2022-11-08 19:22:33 -08:00
ifunc-wcslen.h
ifunc-wmemset.h
memchr-avx2-rtm.S
memchr-avx2.S
memchr-evex-base.S x86_64: Implement evex512 version of memchr, rawmemchr and wmemchr 2022-10-18 13:26:33 -07:00
memchr-evex-rtm.S
memchr-evex.S x86: Optimize memchr-evex.S and implement with VMM headers 2022-10-19 17:31:03 -07:00
memchr-evex512.S x86_64: Implement evex512 version of memchr, rawmemchr and wmemchr 2022-10-18 13:26:33 -07:00
memchr-sse2.S
memchr.c
memcmp-avx2-movbe-rtm.S
memcmp-avx2-movbe.S
memcmp-evex-movbe.S x86: Use VMM API in memcmp-evex-movbe.S and minor changes 2022-11-08 19:19:35 -08:00
memcmp-sse2.S
memcmp.c
memcmpeq-avx2-rtm.S
memcmpeq-avx2.S
memcmpeq-evex.S x86: Use VMM API in memcmpeq-evex.S and minor changes 2022-11-08 19:22:08 -08:00
memcmpeq-sse2.S
memcmpeq.c
memcpy.c
memcpy_chk-nonshared.S
memcpy_chk.c
memmove-avx-unaligned-erms-rtm.S
memmove-avx-unaligned-erms.S
memmove-avx512-no-vzeroupper.S
memmove-avx512-unaligned-erms.S
memmove-erms.S
memmove-evex-unaligned-erms.S
memmove-shlib-compat.h
memmove-sse2-unaligned-erms.S
memmove-ssse3.S
memmove-vec-unaligned-erms.S x86: Use `testb` for FSRM check in memmove-vec-unaligned-erms 2022-10-20 11:29:05 -07:00
memmove.c
memmove_chk-nonshared.S
memmove_chk.c
mempcpy.c
mempcpy_chk-nonshared.S
mempcpy_chk.c
memrchr-avx2-rtm.S
memrchr-avx2.S
memrchr-evex.S x86: Optimize memrchr-evex.S 2022-10-19 17:31:03 -07:00
memrchr-sse2.S
memrchr.c
memset-avx2-unaligned-erms-rtm.S x86: Update memset to use new VEC macros 2022-10-14 21:21:58 -07:00
memset-avx2-unaligned-erms.S x86: Update memset to use new VEC macros 2022-10-14 21:21:58 -07:00
memset-avx512-no-vzeroupper.S
memset-avx512-unaligned-erms.S x86: Update memset to use new VEC macros 2022-10-14 21:21:58 -07:00
memset-erms.S
memset-evex-unaligned-erms.S x86: Update memset to use new VEC macros 2022-10-14 21:21:58 -07:00
memset-sse2-unaligned-erms.S x86: Update memset to use new VEC macros 2022-10-14 21:21:58 -07:00
memset-vec-unaligned-erms.S x86: Update memset to use new VEC macros 2022-10-14 21:21:58 -07:00
memset.c
memset_chk-nonshared.S
memset_chk.c
rawmemchr-avx2-rtm.S
rawmemchr-avx2.S
rawmemchr-evex-rtm.S x86: Optimize memchr-evex.S and implement with VMM headers 2022-10-19 17:31:03 -07:00
rawmemchr-evex.S x86: Optimize memchr-evex.S and implement with VMM headers 2022-10-19 17:31:03 -07:00
rawmemchr-evex512.S x86_64: Implement evex512 version of memchr, rawmemchr and wmemchr 2022-10-18 13:26:33 -07:00
rawmemchr-sse2.S
rawmemchr.c
reg-macros.h
rtld-memchr.S
rtld-memcmp.S
rtld-memcmpeq.S
rtld-memmove.S
rtld-memset.S
rtld-rawmemchr.S
rtld-stpcpy.S
rtld-strchr.S
rtld-strchrnul.S
rtld-strcmp.S
rtld-strcpy.S
rtld-strcspn.c
rtld-strlen.S
rtld-strncmp.S
rtld-strnlen.S
stpcpy-avx2-rtm.S x86: Optimize and shrink st{r|p}{n}{cat|cpy}-avx2 functions 2022-11-08 19:22:33 -08:00
stpcpy-avx2.S
stpcpy-evex.S
stpcpy-sse2-unaligned.S
stpcpy-sse2.S
stpcpy.c
stpncpy-avx2-rtm.S x86: Optimize and shrink st{r|p}{n}{cat|cpy}-avx2 functions 2022-11-08 19:22:33 -08:00
stpncpy-avx2.S x86: Optimize and shrink st{r|p}{n}{cat|cpy}-avx2 functions 2022-11-08 19:22:33 -08:00
stpncpy-evex.S x86: Optimize and shrink st{r|p}{n}{cat|cpy}-evex functions 2022-11-08 19:22:33 -08:00
stpncpy-sse2-unaligned.S
stpncpy.c
strcasecmp.c
strcasecmp_l-avx2-rtm.S
strcasecmp_l-avx2.S
strcasecmp_l-evex.S
strcasecmp_l-sse2.S
strcasecmp_l-sse4_2.S
strcasecmp_l.c
strcat-avx2-rtm.S x86: Optimize and shrink st{r|p}{n}{cat|cpy}-avx2 functions 2022-11-08 19:22:33 -08:00
strcat-avx2.S x86: Optimize and shrink st{r|p}{n}{cat|cpy}-avx2 functions 2022-11-08 19:22:33 -08:00
strcat-evex.S x86: Optimize and shrink st{r|p}{n}{cat|cpy}-evex functions 2022-11-08 19:22:33 -08:00
strcat-sse2-unaligned.S
strcat-sse2.S
strcat-strlen-avx2.h.S x86: Optimize and shrink st{r|p}{n}{cat|cpy}-avx2 functions 2022-11-08 19:22:33 -08:00
strcat-strlen-evex.h.S x86: Optimize and shrink st{r|p}{n}{cat|cpy}-evex functions 2022-11-08 19:22:33 -08:00
strcat.c
strchr-avx2-rtm.S
strchr-avx2.S
strchr-evex-base.S x86_64: Implement evex512 version of strchrnul, strchr and wcschr 2022-10-25 22:39:35 -07:00
strchr-evex.S x86: Shrink / minorly optimize strchr-evex and implement with VMM headers 2022-10-19 17:31:03 -07:00
strchr-evex512.S x86_64: Implement evex512 version of strchrnul, strchr and wcschr 2022-10-25 22:39:35 -07:00
strchr-sse2-no-bsf.S
strchr-sse2.S
strchr.c
strchrnul-avx2-rtm.S
strchrnul-avx2.S
strchrnul-evex.S
strchrnul-evex512.S x86_64: Implement evex512 version of strchrnul, strchr and wcschr 2022-10-25 22:39:35 -07:00
strchrnul-sse2.S
strchrnul.c
strcmp-avx2-rtm.S
strcmp-avx2.S x86: Use `testb` for case-locale check in str{n}casecmp-avx2 2022-10-20 11:29:05 -07:00
strcmp-evex.S x86: Add support for VEC_SIZE == 64 in strcmp-evex.S impl 2022-10-20 11:29:05 -07:00
strcmp-naming.h
strcmp-sse2-unaligned.S
strcmp-sse2.S x86: Use `testb` for case-locale check in str{n}casecmp-sse2 2022-10-20 11:29:05 -07:00
strcmp-sse4_2.S x86: Use `testb` for case-locale check in str{n}casecmp-sse42 2022-10-20 11:29:05 -07:00
strcmp.c
strcpy-avx2-rtm.S x86: Optimize and shrink st{r|p}{n}{cat|cpy}-avx2 functions 2022-11-08 19:22:33 -08:00
strcpy-avx2.S x86: Optimize and shrink st{r|p}{n}{cat|cpy}-avx2 functions 2022-11-08 19:22:33 -08:00
strcpy-evex.S x86: Optimize and shrink st{r|p}{n}{cat|cpy}-evex functions 2022-11-08 19:22:33 -08:00
strcpy-sse2-unaligned.S
strcpy-sse2.S
strcpy.c
strcspn-generic.c
strcspn-sse4.c
strcspn.c
strlen-avx2-rtm.S
strlen-avx2.S
strlen-evex-base.S x86-64: Improve evex512 version of strlen functions 2022-10-30 13:09:56 -07:00
strlen-evex.S x86: Optimize strnlen-evex.S and implement with VMM headers 2022-10-19 17:31:03 -07:00
strlen-evex512.S x86: Update strlen-evex-base to use new reg/vec macros. 2022-10-14 21:21:58 -07:00
strlen-sse2.S
strlen.c
strncase.c
strncase_l-avx2-rtm.S
strncase_l-avx2.S
strncase_l-evex.S
strncase_l-sse2.S
strncase_l-sse4_2.S
strncase_l.c
strncat-avx2-rtm.S x86: Optimize and shrink st{r|p}{n}{cat|cpy}-avx2 functions 2022-11-08 19:22:33 -08:00
strncat-avx2.S x86: Optimize and shrink st{r|p}{n}{cat|cpy}-avx2 functions 2022-11-08 19:22:33 -08:00
strncat-evex.S x86: Optimize and shrink st{r|p}{n}{cat|cpy}-evex functions 2022-11-08 19:22:33 -08:00
strncat-sse2-unaligned.S
strncat.c
strncmp-avx2-rtm.S
strncmp-avx2.S
strncmp-evex.S
strncmp-sse2.S
strncmp-sse4_2.S
strncmp.c
strncpy-avx2-rtm.S x86: Optimize and shrink st{r|p}{n}{cat|cpy}-avx2 functions 2022-11-08 19:22:33 -08:00
strncpy-avx2.S x86: Optimize and shrink st{r|p}{n}{cat|cpy}-avx2 functions 2022-11-08 19:22:33 -08:00
strncpy-evex.S x86: Optimize and shrink st{r|p}{n}{cat|cpy}-evex functions 2022-11-08 19:22:33 -08:00
strncpy-or-cat-overflow-def.h x86: Optimize and shrink st{r|p}{n}{cat|cpy}-evex functions 2022-11-08 19:22:33 -08:00
strncpy-sse2-unaligned.S
strncpy.c
strnlen-avx2-rtm.S
strnlen-avx2.S
strnlen-evex.S x86: Optimize strnlen-evex.S and implement with VMM headers 2022-10-19 17:31:03 -07:00
strnlen-evex512.S
strnlen-sse2.S
strnlen.c
strpbrk-generic.c
strpbrk-sse4.c
strpbrk.c
strrchr-avx2-rtm.S
strrchr-avx2.S
strrchr-evex-base.S x86_64: Implement evex512 version of strrchr and wcsrchr 2022-11-03 15:51:52 -07:00
strrchr-evex.S x86: Remove AVX512-BVMI2 instruction from strrchr-evex.S 2022-10-20 11:29:05 -07:00
strrchr-evex512.S x86_64: Implement evex512 version of strrchr and wcsrchr 2022-11-03 15:51:52 -07:00
strrchr-sse2.S
strrchr.c
strspn-generic.c
strspn-sse4.c
strspn.c
strstr-avx512.c
strstr-sse2-unaligned.S
strstr.c
varshift.c
varshift.h
wcpcpy-evex.S x86: Add evex optimized functions for the wchar_t strcpy family 2022-11-08 19:22:33 -08:00
wcpcpy-generic.c x86: Add evex optimized functions for the wchar_t strcpy family 2022-11-08 19:22:33 -08:00
wcpcpy.c x86: Add evex optimized functions for the wchar_t strcpy family 2022-11-08 19:22:33 -08:00
wcpncpy-evex.S x86: Add evex optimized functions for the wchar_t strcpy family 2022-11-08 19:22:33 -08:00
wcpncpy-generic.c x86: Add evex optimized functions for the wchar_t strcpy family 2022-11-08 19:22:33 -08:00
wcpncpy.c x86: Add evex optimized functions for the wchar_t strcpy family 2022-11-08 19:22:33 -08:00
wcscat-evex.S x86: Add evex optimized functions for the wchar_t strcpy family 2022-11-08 19:22:33 -08:00
wcscat-generic.c x86: Add evex optimized functions for the wchar_t strcpy family 2022-11-08 19:22:33 -08:00
wcscat.c x86: Add evex optimized functions for the wchar_t strcpy family 2022-11-08 19:22:33 -08:00
wcschr-avx2-rtm.S
wcschr-avx2.S
wcschr-evex.S
wcschr-evex512.S x86_64: Implement evex512 version of strchrnul, strchr and wcschr 2022-10-25 22:39:35 -07:00
wcschr-sse2.S
wcschr.c
wcscmp-avx2-rtm.S
wcscmp-avx2.S
wcscmp-evex.S
wcscmp-sse2.S
wcscmp.c
wcscpy-evex.S x86: Add evex optimized functions for the wchar_t strcpy family 2022-11-08 19:22:33 -08:00
wcscpy-generic.c x86: Add evex optimized functions for the wchar_t strcpy family 2022-11-08 19:22:33 -08:00
wcscpy-ssse3.S
wcscpy.c x86: Add evex optimized functions for the wchar_t strcpy family 2022-11-08 19:22:33 -08:00
wcslen-avx2-rtm.S
wcslen-avx2.S
wcslen-evex.S
wcslen-evex512.S
wcslen-sse2.S
wcslen-sse4_1.S
wcslen.c
wcsncat-evex.S x86: Add evex optimized functions for the wchar_t strcpy family 2022-11-08 19:22:33 -08:00
wcsncat-generic.c x86: Add evex optimized functions for the wchar_t strcpy family 2022-11-08 19:22:33 -08:00
wcsncat.c x86: Add evex optimized functions for the wchar_t strcpy family 2022-11-08 19:22:33 -08:00
wcsncmp-avx2-rtm.S
wcsncmp-avx2.S
wcsncmp-evex.S
wcsncmp-generic.c
wcsncmp.c
wcsncpy-evex.S x86: Add evex optimized functions for the wchar_t strcpy family 2022-11-08 19:22:33 -08:00
wcsncpy-generic.c x86: Add evex optimized functions for the wchar_t strcpy family 2022-11-08 19:22:33 -08:00
wcsncpy.c x86: Add evex optimized functions for the wchar_t strcpy family 2022-11-08 19:22:33 -08:00
wcsnlen-avx2-rtm.S
wcsnlen-avx2.S
wcsnlen-evex.S x86: Optimize strnlen-evex.S and implement with VMM headers 2022-10-19 17:31:03 -07:00
wcsnlen-evex512.S
wcsnlen-generic.c
wcsnlen-sse4_1.S
wcsnlen.c
wcsrchr-avx2-rtm.S
wcsrchr-avx2.S
wcsrchr-evex.S
wcsrchr-evex512.S x86_64: Implement evex512 version of strrchr and wcsrchr 2022-11-03 15:51:52 -07:00
wcsrchr-sse2.S
wcsrchr.c
wmemchr-avx2-rtm.S
wmemchr-avx2.S
wmemchr-evex-rtm.S
wmemchr-evex.S
wmemchr-evex512.S x86_64: Implement evex512 version of memchr, rawmemchr and wmemchr 2022-10-18 13:26:33 -07:00
wmemchr-sse2.S
wmemchr.c
wmemcmp-avx2-movbe-rtm.S
wmemcmp-avx2-movbe.S
wmemcmp-evex-movbe.S
wmemcmp-sse2.S
wmemcmp.c
wmemset.c
wmemset_chk-nonshared.S
wmemset_chk.c
x86-avx-rtm-vecs.h
x86-avx-vecs.h x86: Optimize and shrink st{r|p}{n}{cat|cpy}-avx2 functions 2022-11-08 19:22:33 -08:00
x86-evex-vecs-common.h
x86-evex256-vecs.h
x86-evex512-vecs.h
x86-sse2-vecs.h
x86-vec-macros.h