mirror of git://sourceware.org/git/glibc.git
This patch optimizes large size copy using normal store when src > dst and overlap. Make it the same as the logic in memmove-vec-unaligned-erms.S. Current memmove-ssse3 use '__x86_shared_cache_size_half' as the non- temporal threshold, this patch updates that value to '__x86_shared_non_temporal_threshold'. Currently, the __x86_shared_non_temporal_threshold is cpu-specific, and different CPUs will have different values based on the related nt-benchmark results. However, in memmove-ssse3, the nontemporal threshold uses '__x86_shared_cache_size_half', which sounds unreasonable. The performance is not changed drastically although shows overall improvements without any major regressions or gains. Results on Zhaoxin KX-7000: bench-memcpy geometric_mean(N=20) New / Original: 0.999 bench-memcpy-random geometric_mean(N=20) New / Original: 0.999 bench-memcpy-large geometric_mean(N=20) New / Original: 0.978 bench-memmove geometric_mean(N=20) New / Original: 1.000 bench-memmmove-large geometric_mean(N=20) New / Original: 0.962 Results on Intel Core i5-6600K: bench-memcpy geometric_mean(N=20) New / Original: 1.001 bench-memcpy-random geometric_mean(N=20) New / Original: 0.999 bench-memcpy-large geometric_mean(N=20) New / Original: 1.001 bench-memmove geometric_mean(N=20) New / Original: 0.995 bench-memmmove-large geometric_mean(N=20) New / Original: 0.936 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> |
||
|---|---|---|
| .. | ||
| aarch64 | ||
| alpha | ||
| arc | ||
| arm | ||
| csky | ||
| generic | ||
| gnu | ||
| hppa | ||
| htl | ||
| hurd | ||
| i386 | ||
| ieee754 | ||
| loongarch | ||
| m68k | ||
| mach | ||
| microblaze | ||
| mips | ||
| nios2 | ||
| nptl | ||
| or1k | ||
| posix | ||
| powerpc | ||
| pthread | ||
| riscv | ||
| s390 | ||
| sh | ||
| sparc | ||
| unix | ||
| wordsize-32 | ||
| wordsize-64 | ||
| x86 | ||
| x86_64 | ||