Commit Graph

576 Commits

Author SHA1 Message Date
Wilco Dijkstra 2071666d03 AArch64: Fix typo in math-vector.h
Fix typo atanpi2->atan2pi in math-vector.h.
2025-05-20 13:44:16 +00:00
Wilco Dijkstra b990b0aee2 AArch64: Cleanup SVE config and defines
Now that we finally support modern GCC and binutils, it's time for a cleanup.
Remove the HAVE_AARCH64_SVE_ASM define and conditional compilation.  Remove the
configure checks for SVE, ACLE and variant-PCS support.

Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>
2025-05-20 10:33:55 +00:00
Wilco Dijkstra 2c421fc430 AArch64: Cleanup PAC and BTI
Now that we finally support modern GCC and binutils, it's time for a cleanup.
Use PAC and BTI instructions unconditionally and use proper assembler syntax.
Remove the PR target/94791 strip_pac workarounds for buggy GCCs.  Remove the
PAC/BTI configure checks - always emit GNU property notes on assembly files.
Change cfi_window_save to the correct cfi_negate_ra_state unwind directive.
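
For illustration only (not glibc's sources, and assuming a reasonably recent
binutils): the return-address signing pattern with matching
cfi_negate_ra_state directives, written as a toplevel asm block in C; the
function name is made up.

  /* Illustration only: paciasp/autiasp with .cfi_negate_ra_state.  */
  __asm__ (".text\n"
           ".global toy_signed_fn\n"
           ".type toy_signed_fn, %function\n"
           "toy_signed_fn:\n"
           ".cfi_startproc\n"
           "paciasp\n"                 /* sign the return address in LR */
           ".cfi_negate_ra_state\n"    /* tell the unwinder LR is now signed */
           "nop\n"                     /* function body would go here */
           "autiasp\n"                 /* authenticate LR before returning */
           ".cfi_negate_ra_state\n"    /* LR is plain again */
           "ret\n"
           ".cfi_endproc\n"
           ".size toy_signed_fn, .-toy_signed_fn\n");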

Reviewed-by: Matthieu Longo <matthieu.longo@arm.com>
2025-05-19 15:35:32 +00:00
Dylan Fleming 96abd59bf2 AArch64: Implement AdvSIMD and SVE atan2pi/f
Implement double and single precision variants of the C23 routine atan2pi
for both AdvSIMD and SVE.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-05-19 15:35:25 +00:00
Dylan Fleming edf6202815 AArch64: Implement AdvSIMD and SVE atanpi/f
Implement double and single precision variants of the C23 routine atanpi
for both AdvSIMD and SVE.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-05-19 15:34:40 +00:00
Dylan Fleming 0ef2cf44e7 AArch64: Implement AdvSIMD and SVE asinpi/f
Implement double and single precision variants of the C23 routine asinpi
for both AdvSIMD and SVE.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-05-19 15:33:50 +00:00
Dylan Fleming 993997ca1b AArch64: Implement AdvSIMD and SVE acospi/f
Implement double and single precision variants of the C23 routine acospi
for both AdvSIMD and SVE.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-05-19 15:31:59 +00:00
Dylan Fleming 1e84509e00 AArch64: Optimize inverse trig functions
Improve performance of the inverse trig functions by altering how coefficients
are loaded.

Performance improvement on Neoverse V1:
SVE     acos   14%
AdvSIMD acos   6%

AdvSIMD asin   6%
SVE     asin   5%
AdvSIMD asinf  2%

AdvSIMD atanf  22%
SVE     atanf  20%
SVE     atan   11%
AdvSIMD atan   5%

SVE     atan2  7%
SVE     atan2f 4%
AdvSIMD atan2f 3%
AdvSIMD atan2  2%

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-05-19 14:54:32 +00:00
Yury Khrustalev 251f932624 aarch64: update tests for SME
Add a test that checks that ZA state is disabled after setjmp and sigsetjmp.
Update the existing SME test that uses setjmp.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-05-15 14:23:35 +01:00
Yury Khrustalev a7f6fd976c aarch64: Disable ZA state of SME in setjmp and sigsetjmp
Due to the nature of the ZA state, setjmp() should clear it in the
same manner as is already done by longjmp().
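
A hedged sketch of the kind of check involved, not the actual glibc test: read
SVCR after setjmp and verify that the ZA bit (bit 1) is clear.  SVCR is read
through its generic system-register name so no SME assembler support is
needed; enabling ZA beforehand is elided, and running this requires an
SME-capable CPU and kernel.

  #include <setjmp.h>
  #include <stdint.h>
  #include <assert.h>

  /* SVCR is S3_3_C4_C2_2; bit 1 holds the ZA state.  */
  static uint64_t read_svcr (void)
  {
    uint64_t svcr;
    __asm__ volatile ("mrs %0, S3_3_C4_C2_2" : "=r" (svcr));
    return svcr;
  }

  int main (void)
  {
    jmp_buf env;
    /* ... hypothetically enable ZA here, e.g. via SME intrinsics ...  */
    if (setjmp (env) == 0)
      assert ((read_svcr () & 2) == 0);   /* ZA must be off after setjmp.  */
    return 0;
  }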

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-05-15 14:23:03 +01:00
Yury Khrustalev 691edbdf77 aarch64: fix unwinding in longjmp
Previously, longjmp() on aarch64 used CFI directives around the
call to __libc_arm_za_disable() after the CFA had been redefined at the
start of longjmp(). This may result in unwinding issues. Move the call
and the surrounding CFI directives to the beginning of longjmp().

Suggested-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
2025-05-13 13:00:57 +01:00
Andrew Pinski ceeffd970c aarch64: Add back non-temporal load/stores from oryon-1's memset
I misunderstood the recommendation from the hardware team about non-temporal
load/stores. It is still recommended to use them in memset for large sizes.
They were only discouraged for device memory, and memset is already not valid
for use with device memory.

This reverts commit e6590f0c86.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-04-15 12:07:07 -07:00
Andrew Pinski 0e1aa5db73 aarch64: Add back non-temporal load/stores from oryon-1's memcpy
I misunderstood the recommendation from the hardware team about non-temporal
load/stores. It is still recommended to use them in memcpy for large sizes.
They were only discouraged for device memory, and memcpy is already not valid
for use with device memory.

This reverts commit eb5eeb4740.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-04-15 12:06:59 -07:00
Adhemerval Zanella 4352e2cc93 aarch64: Fix _dl_tlsdesc_dynamic unwind for pac-ret (BZ 32612)
When libgcc is built with pac-ret, it needs to authenticate the
unwinding frame based on CFI information.  The _dl_tlsdesc_dynamic
function uses a custom calling convention, where it is responsible for
saving and restoring all registers it might use (even volatile ones).

The pac-ret support added by 1be3d6eb82
was added only on the slow path, but the fast path also adds a DWARF
register rule instruction (cfi_adjust_cfa_offset) since it needs to
save/restore some auxiliary registers.  It seems that this is not
fully supported by either libgcc or the AArch64 ABI [1].

Instead, move paciasp/autiasp to function prologue/epilogue to be
used on both fast and slow paths.

I also corrected the _dl_tlsdesc_dynamic comment description; it was
copied from the i386 implementation without any adjustment.

Checked on aarch64-linux-gnu with a toolchain built with
--enable-standard-branch-protection on a system with pac-ret
support.

[1]  https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst#id1

Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>
2025-03-31 10:08:06 -03:00
Pierre Blanchard cf56eb28fa AArch64: Optimize algorithm in users of SVE expf helper
The polynomial order was unnecessarily high; reducing it unlocked multiple
optimizations.
Max error for new SVE expf is 0.88 +0.5ULP.
Max error for new SVE coshf is 2.56 +0.5ULP.
Performance improvement on Neoverse V1: expf (30%), coshf (26%).

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-03-18 17:15:18 +00:00
Adhemerval Zanella 3e8814903c math: Refactor how to use libm-test-ulps
The current approach tracks the maximum supported math errors by explicitly
setting them per function and architecture. On newer implementations or
newer compiler versions, the file is updated with larger values if the
tests show higher results. The idea is to track the maximum known error,
and to update the manual with the obtained values.

The constant updating of libm-test-ulps shows little value: it is usually a
mechanical change done by the maintainer, for past releases it was
usually ignored whether the ulp change resulted from a compiler
regression, and the math tests already have a maximum ulp error that
triggers a regression.

This was shown by a recent update after the new, correctly rounded acosf
implementation [1], where the libm-test-ulps change was indeed caused by a
compiler issue.

This patch removes all arch-specific libm-test-ulps files, adds generic
system libm-test-ulps files where applicable, and changes their semantics.
The generic files now track specific implementation constraints, such as
whether the implementation is expected to be correctly rounded, or whether
the system-specific implementation has different error expectations.

Now multiple libm-test-ulps files can be defined, and a system-specific
file overrides the generic one.  This is for the case where an
arch-specific implementation might show worse precision than the generic
implementation, for instance cbrtf on i686.

Regressions are only reported if the implementation shows errors larger
than 9 ulps (13 for IBM long double) unless overridden by libm-test-ulps,
and the maximum error is no longer printed at the end of the tests.
The regen-ulps rule is also removed since it no longer makes sense to
update libm-test-ulps automatically.

The manual error table is also removed; Paul Zimmermann and others have
been tracking libm precision with a more comprehensive analysis for several
releases, so link to their work instead.

[1] https://sourceware.org/git/?p=glibc.git;a=commit;h=9cc9f8e11e8fb8f54f1e84d9f024917634a78201
2025-03-12 13:40:07 -03:00
Wilco Dijkstra 0f044be1da AArch64: Use prefer_sve_ifuncs for SVE memset
Use prefer_sve_ifuncs for SVE memset just like memcpy.

Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>
2025-02-27 16:51:57 +00:00
Wilco Dijkstra 935563754b AArch64: Remove LP64 and ILP32 ifdefs
Remove LP64 and ILP32 ifdefs.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-02-24 14:20:29 +00:00
Wilco Dijkstra 4c11379106 AArch64: Simplify lrint
Simplify lrint.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-02-24 14:20:03 +00:00
Wilco Dijkstra 0a021727bc AArch64: Remove AARCH64_R macro
Remove the AARCH64_R relocation macro.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-02-24 14:19:19 +00:00
Wilco Dijkstra eb7ac024d9 AArch64: Cleanup pointer mangling
Cleanup pointer mangling.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-02-24 14:17:57 +00:00
Wilco Dijkstra 19860fd42e AArch64: Remove PTR_REG defines
Remove PTR_REG defines.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-02-24 14:16:55 +00:00
Wilco Dijkstra ce2f26a22e AArch64: Remove PTR_ARG/SIZE_ARG defines
This series removes various ILP32 defines that are now
no longer needed.

Remove PTR_ARG/SIZE_ARG.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-02-24 14:15:15 +00:00
Wilco Dijkstra 163b1bbb76 AArch64: Add SVE memset
Add SVE memset based on the generic memset, with a predicated store for sizes
< 16.  Unaligned memsets of 128-1024 bytes are improved by ~20% on average by
using aligned stores for the last 64 bytes.  Performance of the random memset
benchmark improves by ~2% on Neoverse V1.
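
For illustration, the predicated-store idea can be sketched with ACLE
intrinsics as below (not the glibc assembly; assumes n is smaller than one
SVE vector length and compilation with -march=armv8-a+sve):

  #include <arm_sve.h>
  #include <stdint.h>
  #include <stddef.h>

  /* Set the first n bytes without a scalar tail loop: the predicate
     masks off all lanes beyond n.  */
  static void tiny_memset (void *dst, int c, size_t n)
  {
    svbool_t pg = svwhilelt_b8 ((uint64_t) 0, (uint64_t) n);
    svst1_u8 (pg, (uint8_t *) dst, svdup_n_u8 ((uint8_t) c));
  }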

Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>
2025-02-20 15:31:50 +00:00
Yat Long Poon 95e807209b AArch64: Improve codegen for SVE powf
Improve memory access with indexed/unpredicated instructions.
Eliminate register spills.  Speedup on Neoverse V1: 3%.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-02-13 18:16:54 +00:00
Yat Long Poon 0b195651db AArch64: Improve codegen for SVE pow
Move constants to struct.  Improve memory access with indexed/unpredicated
instructions.  Eliminate register spills.  Speedup on Neoverse V1: 24%.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-02-13 18:16:54 +00:00
Yat Long Poon f5ff34cb3c AArch64: Improve codegen for SVE erfcf
Reduce number of MOV/MOVPRFXs and use unpredicated FMUL.
Replace MUL with LSL.  Speedup on Neoverse V1: 6%.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-02-13 18:16:54 +00:00
Luna Lamb c0ff447edf Aarch64: Improve codegen in SVE exp and users, and update expf_inline
Use unpredicated muls and improve memory access.
7%, 3% and 1% improvement in throughput microbenchmark on Neoverse V1,
for exp, exp2 and cosh respectively.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-02-13 18:16:54 +00:00
Luna Lamb 8f0e7fe61e Aarch64: Improve codegen in SVE asinh
Use unpredicated muls, use lane-wise MLAs and improve memory access.
1% regression in throughput microbenchmark on Neoverse V1.
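
A hedged sketch of the lane-wise MLA idea using ACLE intrinsics (illustrative
names, not the actual asinh kernel): coefficients packed into one vector are
selected by lane index instead of being duplicated into separate registers.

  #include <arm_sve.h>

  /* acc + x * coeffs[1], using the indexed (lane-wise) FMLA form, so the
     coefficient does not need its own DUP/broadcast.  */
  static svfloat64_t poly_step (svfloat64_t acc, svfloat64_t x,
                                svfloat64_t coeffs)
  {
    return svmla_lane_f64 (acc, x, coeffs, 1);
  }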

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2025-02-13 18:16:54 +00:00
Adhemerval Zanella 8f170dc819 math: Use tanpif from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic tanpif.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).
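
For reference, tanpif (x) computes tan (pi * x), so a correctly rounded
implementation returns exactly 1 for x = 0.25.  A minimal use, assuming
glibc 2.41+ with C23 declarations enabled (e.g. -std=c23) and linking
with -lm:

  #include <math.h>
  #include <stdio.h>

  int main (void)
  {
    /* tan (pi/4) == 1; tanpif avoids the rounding error of tanf (pi * x).  */
    printf ("%a\n", tanpif (0.25f));   /* prints 0x1p+0 */
    return 0;
  }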

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

latency                      master        patched   improvement
x86_64                      85.1683        47.7990        43.88%
x86_64v2                    76.8219        41.4679        46.02%
x86_64v3                    73.7775        37.7734        48.80%
aarch64 (Neoverse)          35.4514        18.0742        49.02%
power8                      22.7604        10.1054        55.60%
power10                     22.1358         9.9553        55.03%

reciprocal-throughput        master        patched   improvement
x86_64                      41.0174        19.4718        52.53%
x86_64v2                    34.8565        11.3761        67.36%
x86_64v3                    34.0325         9.6989        71.50%
aarch64 (Neoverse)          25.4349         9.2017        63.82%
power8                      13.8626         3.8486        72.24%
power10                     11.7933         3.6420        69.12%

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-02-12 16:31:57 -03:00
Adhemerval Zanella de2fca9fe2 math: Use sinpif from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic sinpif.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

latency                      master        patched   improvement
x86_64                      47.5710        38.4455        19.18%
x86_64v2                    46.8828        40.7563        13.07%
x86_64v3                    44.0034        34.1497        22.39%
aarch64 (Neoverse)          19.2493        14.1968        26.25%
power8                      23.5312        16.3854        30.37%
power10                     22.6485        10.2888        54.57%

reciprocal-throughput        master        patched   improvement
x86_64                      21.8858        11.6717        46.67%
x86_64v2                    22.0620        11.9853        45.67%
x86_64v3                    21.5653        11.3291        47.47%
aarch64 (Neoverse)          13.0615         6.5499        49.85%
power8                      16.2030         6.9580        57.06%
power10                     12.8911         4.2858        66.75%

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-02-12 16:31:57 -03:00
Adhemerval Zanella be85208b9f math: Use cospif from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic cospif.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

latency                    master        patched   improvement
x86_64                    47.4679        38.4157        19.07%
x86_64v2                  46.9686        38.3329        18.39%
x86_64v3                  43.8929        31.8510        27.43%
aarch64 (Neoverse)        18.8867        13.2089        30.06%
power8                    22.9435         7.8023        65.99%
power10                   15.4472        7.77505        49.67%

reciprocal-throughput      master        patched   improvement
x86_64                    20.9518        11.4991        45.12%
x86_64v2                  19.8699        10.5921        46.69%
x86_64v3                  19.3475         9.3998        51.42%
aarch64 (Neoverse)        12.5767         6.2158        50.58%
power8                    15.0566         3.2654        78.31%
power10                    9.2866         3.1147        66.46%

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-02-12 16:31:57 -03:00
Adhemerval Zanella 95a01ea955 math: Use atanpif from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic atanpif.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

latency                     master        patched   improvement
x86_64                     66.3296        52.7558        20.46%
x86_64v2                   66.0429        51.4007        22.17%
x86_64v3                   60.6294        48.7876        19.53%
aarch64 (Neoverse)         24.3163        20.9110        14.00%
power8                     16.5766        13.3620        19.39%
power10                    16.5115        13.4072        18.80%

reciprocal-throughput       master        patched   improvement
x86_64                     30.8599        16.0866        47.87%
x86_64v2                   29.2286        15.4688        47.08%
x86_64v3                   23.0960        12.8510        44.36%
aarch64 (Neoverse)         15.4619        10.6752        30.96%
power8                      7.9200         5.2483        33.73%
power10                     6.8539         4.6262        32.50%

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-02-12 16:31:57 -03:00
Adhemerval Zanella 1cd9ccd8c0 math: Use atan2pif from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic atan2pif.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

latency                 master        patched   improvement
x86_64                 79.4006        70.8726        10.74%
x86_64v2               77.5136        69.1424        10.80%
x86_64v3               71.8050        68.1637         5.07%
aarch64 (Neoverse)     27.8363        24.7700        11.02%
power8                 39.3893        17.2929        56.10%
power10                19.7200        16.8187        14.71%

reciprocal-throughput   master        patched   improvement
x86_64                 38.3457        30.9471        19.29%
x86_64v2               37.4023        30.3112        18.96%
x86_64v3               33.0713        24.4891        25.95%
aarch64 (Neoverse)     19.3683        15.3259        20.87%
power8                 19.5507        8.27165        57.69%
power10                9.05331        7.63775        15.64%

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-02-12 16:31:57 -03:00
Adhemerval Zanella ae679a0aca math: Use asinpif from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic asinpif.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

latency                 master        patched   improvement
x86_64                 46.4996        41.6126        10.51%
x86_64v2               46.7551        38.8235        16.96%
x86_64v3               42.6235        33.7603        20.79%
aarch64 (Neoverse)     17.4161        14.3604        17.55%
power8                 10.7347         9.0193        15.98%
power10                10.6420         9.0362        15.09%

reciprocal-throughput   master        patched   improvement
x86_64                 24.7208        16.5544        33.03%
x86_64v2               24.2177        14.8938        38.50%
x86_64v3               20.5617        10.5452        48.71%
aarch64 (Neoverse)     13.4827        7.17613        46.78%
power8                 6.46134        3.56089        44.89%
power10                5.79007        3.49544        39.63%

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-02-12 16:31:57 -03:00
Adhemerval Zanella edb2a8f0ae math: Use acospif from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance than the generic acospif.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

latency                  master        patched   improvement
x86_64                  54.8281        42.9070        21.74%
x86_64v2                54.1717        42.7497        21.08%
x86_64v3                49.3552        34.1512        30.81%
aarch64 (Neoverse)      17.9395        14.3733        19.88%
power8                  20.3110         8.8609        56.37%
power10                 11.3113        8.84067        21.84%

reciprocal-throughput    master        patched   improvement
x86_64                  21.2301        14.4803        31.79%
x86_64v2                20.6858        13.9506        32.56%
x86_64v3                16.1944        11.3377        29.99%
aarch64 (Neoverse)      11.4474        7.13282        37.69%
power8                  10.6916        3.57547        66.56%
power10                 4.64269        3.54145        23.72%

Reviewed-by: DJ Delorie <dj@redhat.com>
2025-02-12 16:31:57 -03:00
Yury Khrustalev d3f2b71ef1 aarch64: Fix tests not compatible with targets supporting GCS
- Add GCS marking to some of the tests when the target supports GCS
- Fix the tst-ro-dynamic-mod.map linker script to avoid removing
  GNU properties
- Add a header with macros for GNU properties

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-01-20 09:36:19 +00:00
Szabolcs Nagy 3d8da0d91b aarch64: Add GCS user-space allocation logic
Allocate GCS based on the stack size; this can be used for coroutines
(makecontext) and thread creation (if the kernel allows user-allocated
GCS).

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-01-20 09:36:19 +00:00
Szabolcs Nagy 29476485f9 aarch64: Ignore GCS property of ld.so
check_gcs is called for each dependency of a DSO, but the GNU property
of ld.so is not processed, so ldso->l_mach.gcs may not be correct.
Just assume ld.so is GCS-compatible independently of the ELF marking.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-01-20 09:36:19 +00:00
Szabolcs Nagy 4d56a5bbd6 aarch64: Handle GCS marking
- Handle GCS marking
- Use l_searchlist.r_list for GCS (allows using the
  same function for static executables)

Co-authored-by: Yury Khrustalev <yury.khrustalev@arm.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-01-20 09:35:56 +00:00
Szabolcs Nagy 8d516b6f85 aarch64: Use l_searchlist.r_list for bti
Allows using the same function for static executables.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-01-20 09:31:47 +00:00
Szabolcs Nagy 76b79f7241 aarch64: Mark objects with GCS property note
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2025-01-20 09:31:47 +00:00
Szabolcs Nagy 01f52b11de aarch64: Enable GCS in dynamic linked exe
Use the dynamic linker start code to enable GCS in the dynamically linked
case after _dl_start returns and before _dl_start_user, which marks
the point after which user code may run.

As in the statically linked case, this ensures that GCS is enabled on a
top-level stack frame.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-01-20 09:31:47 +00:00
Szabolcs Nagy 9ad3d9267d aarch64: Add glibc.cpu.aarch64_gcs tunable
This tunable controls Guarded Control Stack (GCS) for the process.

0 = disabled: do not enable GCS
1 = enforced: check markings and fail if any binary is not marked
2 = optional: check markings but keep GCS off if a binary is unmarked
3 = override: enable GCS, markings are ignored

By default it is 0, so GCS is disabled; value 1 will enable GCS.
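
For example, a single run with GCS enforced (assuming a GCS-capable kernel
and CPU) can be started with GLIBC_TUNABLES=glibc.cpu.aarch64_gcs=1 set in
the environment.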

The status is stored in GL(dl_aarch64_gcs) early and only applied
later, since enabling GCS is tricky: it must happen on a top-level
stack frame. GL is used instead of GLRO because it may need updates
depending on libraries loaded after read-only protection is applied;
however, library-marking-based GCS setting is not yet implemented.

Describe the new tunable in the manual.

Co-authored-by: Yury Khrustalev <yury.khrustalev@arm.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-01-20 09:31:33 +00:00
Szabolcs Nagy 7d22054db7 aarch64: Mark swapcontext with indirect_return
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2025-01-20 09:22:41 +00:00
Szabolcs Nagy 5ff5e7836e aarch64: Add GCS support to longjmp
This implementation ensures that longjmp across different stacks
works: it scans for the GCS cap token and switches GCS if necessary;
then the target GCSPR is restored with a GCSPOPM loop once the
current GCSPR is on the same GCS.

This makes longjmp linear-time in the number of jumped-over stack
frames when GCS is enabled.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-01-20 09:22:41 +00:00
Szabolcs Nagy 13cbbb0cb2 aarch64: Define jmp_buf offset for GCS
The target-specific internal __longjmp is called with a __jmp_buf
argument whose size is exposed in the ABI. On aarch64 this has
no space left, so GCSPR cannot be restored in longjmp in the usual
way, which is needed for the Guarded Control Stack (GCS) extension.

setjmp is implemented via __sigsetjmp, which has a jmp_buf argument;
however, it is also called with a __pthread_unwind_buf_t argument cast
to jmp_buf (in cancellation cleanup code built with -fno-exceptions).
The two types, jmp_buf and __pthread_unwind_buf_t, have common bits
beyond the __jmp_buf field, and there is unused space there which we
can use for saving GCSPR.

For this to work, some bits of those two generic types have to be
reserved for target-specific use, and the generic code in glibc has
to ensure that __longjmp is always called with a __jmp_buf that is
embedded into one of those two types. Morally __longjmp should be
changed to take jmp_buf as its argument, but that is an intrusive
change across targets.

Note: longjmp is never called with __pthread_unwind_buf_t from user
code; only the internal __libc_longjmp is called with that type, and
thus the two types could have separate longjmp implementations on a
target. We don't rely on this now (but might in the future, given that
cancellation unwind does not need to restore GCSPR).

Given the above, this patch finds an unused slot for GCSPR. This
placement is not exposed in the ABI, so it may change in the future.
It is also very target-ABI-specific, so the generic types cannot
easily be changed to clearly mark the reserved fields.
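
For reference, the generic __pthread_unwind_buf_t (paraphrased from glibc's
<pthread.h>, shown here only for illustration) looks roughly like this; the
unused space mentioned above lies in the area shared with jmp_buf beyond the
__jmp_buf field, and the exact slot chosen for GCSPR is target-internal and
not shown.

  #include <setjmp.h>   /* for __jmp_buf */

  typedef struct
  {
    struct
    {
      __jmp_buf __cancel_jmp_buf;   /* the __jmp_buf exposed in the ABI */
      int __mask_was_saved;
    } __cancel_jmp_buf[1];
    void *__pad[4];                 /* padding in the common area beyond __jmp_buf */
  } __pthread_unwind_buf_t __attribute__ ((__aligned__));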

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-01-20 09:22:41 +00:00
Szabolcs Nagy 58771b8a59 aarch64: Add asm helpers for GCS
The Guarded Control Stack instructions can be present even if the
hardware does not support the extension (it is a runtime-checked feature),
so the asm code should be backward compatible with old assemblers.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2025-01-20 09:22:41 +00:00
Adhemerval Zanella 6c575d835e aarch64: Use 64-bit variable to access the special registers
clang issues:

  error: value size does not match register size specified by the
  constraint and modifier [-Werror,-Wasm-operand-widths]

while trying to use 32-bit variables with 'mrs' to get/set the
fpsr, dczid_el0, and ctr registers.
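
A minimal example of the pattern (illustrative helper name): the variable
tied to the inline-asm output must be 64-bit so that clang renders %0 as an
x register.

  #include <stdint.h>

  static inline uint64_t get_dczid_el0 (void)
  {
    /* With clang, declaring this as uint32_t triggers -Wasm-operand-widths,
       since "mrs" expects a 64-bit (x) register operand.  */
    uint64_t val;
    __asm__ ("mrs %0, dczid_el0" : "=r" (val));
    return val;
  }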
2025-01-13 10:17:38 -03:00
Adhemerval Zanella 9cc9f8e11e math: Fix acosf when building with gcc <= 11
GCC <= 11 wrongly assumes the rounding mode is to-nearest and performs
constant folding where it should defer evaluation to run time, since the
result is not exact [1].

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57245
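
A hedged illustration of the general pitfall, not the exact GCC bug: an
inexact constant expression folded at compile time uses round-to-nearest,
while the same expression evaluated at run time honors the current rounding
mode (link with -lm for fesetround).

  #include <fenv.h>
  #include <stdio.h>

  int main (void)
  {
    fesetround (FE_UPWARD);
    volatile double x = 1.0;
    double at_runtime = x / 3.0;    /* evaluated at run time, rounds up */
    double folded = 1.0 / 3.0;      /* typically folded at compile time
                                       using round-to-nearest */
    printf ("%d\n", at_runtime == folded);   /* prints 0 when folding differs */
    return 0;
  }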
2025-01-09 12:53:58 -03:00