linux-kernelorg-stable/kernel
Feng Tang 56f3547bfa mm: adjust vm_committed_as_batch according to vm overcommit policy
When checking a performance change for the will-it-scale scalability mmap
test [1], we found very high lock contention on the spinlock of the percpu
counter 'vm_committed_as':

    94.14%     0.35%  [kernel.kallsyms]         [k] _raw_spin_lock_irqsave
    48.21% _raw_spin_lock_irqsave;percpu_counter_add_batch;__vm_enough_memory;mmap_region;do_mmap;
    45.91% _raw_spin_lock_irqsave;percpu_counter_add_batch;__do_munmap;

This heavy lock contention is not always necessary.  'vm_committed_as'
only needs to be very precise when the strict OVERCOMMIT_NEVER policy is
set, which requires a rather small batch number for the percpu counter.
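
The trade-off described above can be illustrated with a simplified,
single-threaded userspace model of a batched percpu counter.  This is only
a sketch: the names (pcpu_counter_model, model_add, MODEL_NR_CPUS) are
hypothetical and are not the kernel's percpu_counter API, and the
'lock_taken' field merely counts how often the shared total would have to
be updated under the spinlock.

```c
#include <stdint.h>

#define MODEL_NR_CPUS 4   /* hypothetical CPU count for this model */

struct pcpu_counter_model {
    int64_t global;                 /* shared total; lock-protected in the kernel */
    int32_t local[MODEL_NR_CPUS];   /* per-CPU deltas not yet folded in */
    unsigned long lock_taken;       /* times the shared total had to be updated */
};

/* Add 'amount' on behalf of 'cpu'; fold into the shared total only when
 * the local delta reaches +/- batch. */
static void model_add(struct pcpu_counter_model *c, int cpu,
                      int64_t amount, int32_t batch)
{
    int64_t sum = (int64_t)c->local[cpu] + amount;

    if (sum >= batch || sum <= -(int64_t)batch) {
        c->global += sum;       /* slow path: this is where contention arises */
        c->local[cpu] = 0;
        c->lock_taken++;
    } else {
        c->local[cpu] = (int32_t)sum;   /* fast path: no shared state touched */
    }
}

/* Cheap, approximate read: ignores the unfolded per-CPU deltas. */
static int64_t model_read_fast(const struct pcpu_counter_model *c)
{
    return c->global;
}

/* Exact read: walks every per-CPU delta. */
static int64_t model_sum(const struct pcpu_counter_model *c)
{
    int64_t total = c->global;

    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        total += c->local[cpu];
    return total;
}
```

In this model the approximate read can differ from the exact sum by less
than MODEL_NR_CPUS * batch, so a small batch keeps the cheap read accurate
at the cost of more frequent updates to the shared total, and a large
batch does the opposite.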

So keep the 'batch' number unchanged for the strict OVERCOMMIT_NEVER
policy, and raise it by 64X for the OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS
policies.  Also add a sysctl handler to adjust it when the policy is
reconfigured.
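
The policy-dependent batch can be sketched in userspace C as follows.
This is an illustration under stated assumptions, not the patch itself:
the function and enum names are hypothetical, and the divisors 256 and 4
are assumed values chosen so that the two results differ by the 64X
described above.  The patch remains the authority on the exact
computation.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's overcommit policy constants. */
enum model_policy {
    MODEL_OVERCOMMIT_GUESS,
    MODEL_OVERCOMMIT_ALWAYS,
    MODEL_OVERCOMMIT_NEVER,
};

/* Compute a batch size (in pages) from total RAM pages, CPU count and
 * policy.  nr_cpus must be greater than zero. */
static int32_t model_compute_batch(uint64_t ram_pages, int32_t nr_cpus,
                                   enum model_policy policy)
{
    /* Lower bound that does not depend on memory size. */
    int32_t floor_batch = nr_cpus * 2 > 32 ? nr_cpus * 2 : 32;

    /* Assumed divisors: strict policy gets the small batch, the other
     * policies get one 64 times larger (256 / 4 == 64). */
    uint64_t divisor = (policy == MODEL_OVERCOMMIT_NEVER) ? 256 : 4;
    uint64_t memsized = ram_pages / (uint64_t)nr_cpus / divisor;

    if (memsized > INT_MAX)
        memsized = INT_MAX;     /* the batch is a signed 32-bit value */

    return memsized > (uint64_t)floor_batch ? (int32_t)memsized : floor_batch;
}
```

For example, with 4194304 pages (16 GiB of 4 KiB pages) and 16 CPUs, this
sketch yields 1024 pages for the strict policy and 65536 pages otherwise.
A sysctl handler would simply rerun the computation with the new policy.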

A benchmark with the same testcase as in [1] shows a 53% improvement on an
8C/16T desktop, and 2097% (20X) on a 4S/72C/144T server.  We tested on the
test platforms in 0day (server, desktop and laptop), and more than 80% of
the platforms show improvements with that test.  Whether a platform shows
an improvement depends on whether the test's mmap size is bigger than the
computed batch number.

If the lift is only 16X, 1/3 of the platforms show improvements, though a
larger batch should help mmap/unmap usage generally, as Michal Hocko
mentioned:

: I believe that there are non-synthetic workloads which would benefit from
: a larger batch.  E.g.  large in memory databases which do large mmaps
: during startups from multiple threads.

[1] https://lore.kernel.org/lkml/20200305062138.GI5972@shao2-debian/

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Qian Cai <cai@lca.pw>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: kernel test robot <rong.a.chen@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1589611660-89854-4-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1592725000-73486-4-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1594389708-60781-5-git-send-email-feng.tang@intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:26 -07:00
..
bpf Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-08-03 18:27:40 -07:00
cgroup for-5.9/block-20200802 2020-08-03 11:57:03 -07:00
configs
debug Remove uninitialized_var() macro for v5.9-rc1 2020-08-04 13:49:43 -07:00
dma It's been a busy cycle for documentation - hopefully the busiest for a 2020-08-04 22:47:54 -07:00
entry entry: Correct 'noinstr' attributes 2020-07-26 15:42:20 +02:00
events Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2020-08-05 20:13:21 -07:00
gcov treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
irq This tree adds the sched_set_fifo*() encapsulation APIs to remove 2020-08-06 11:55:43 -07:00
kcsan Merge branch 'kcsan' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into locking/core 2020-08-01 09:26:27 +02:00
livepatch
locking s390: implement diag318 2020-08-06 12:59:31 -07:00
power mm: memcg: convert vmstat slab counters to bytes 2020-08-07 11:33:24 -07:00
printk Printk changes for 5.9 2020-08-04 22:22:25 -07:00
rcu This tree adds the sched_set_fifo*() encapsulation APIs to remove 2020-08-06 11:55:43 -07:00
sched This tree adds the sched_set_fifo*() encapsulation APIs to remove 2020-08-06 11:55:43 -07:00
time Time, timers and related driver updates: 2020-08-04 18:17:37 -07:00
trace This tree adds the sched_set_fifo*() encapsulation APIs to remove 2020-08-06 11:55:43 -07:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
Makefile Generic implementation of common syscall, interrupt and exception 2020-08-04 21:00:11 -07:00
acct.c
async.c treewide: Remove uninitialized_var() usage 2020-07-16 12:35:15 -07:00
audit.c audit/stable-5.9 PR 20200803 2020-08-04 14:20:26 -07:00
audit.h revert: 1320a4052e ("audit: trigger accompanying records when no rules present") 2020-07-29 10:00:36 -04:00
audit_fsnotify.c
audit_tree.c audit: Use struct_size() helper in alloc_chunk 2020-06-17 16:43:11 -04:00
audit_watch.c
auditfilter.c
auditsc.c audit/stable-5.9 PR 20200803 2020-08-04 14:20:26 -07:00
backtracetest.c treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD() 2020-07-30 11:15:58 -07:00
bounds.c
capability.c
compat.c
configs.c
context_tracking.c context_tracking: Ensure that the critical path cannot be instrumented 2020-06-11 15:14:36 +02:00
cpu.c
cpu_pm.c
crash_core.c crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo 2020-07-02 17:56:11 +01:00
crash_dump.c
cred.c
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c Merge branch 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-08-04 14:27:25 -07:00
extable.c
fail_function.c
fork.c mm: memcontrol: account kernel stack per node 2020-08-07 11:33:25 -07:00
freezer.c
futex.c Remove uninitialized_var() macro for v5.9-rc1 2020-08-04 13:49:43 -07:00
gen_kheaders.sh
groups.c
hung_task.c
iomem.c
irq_work.c
jump_label.c
kallsyms.c Linux 5.8-rc7 2020-07-28 13:18:01 +02:00
kcmp.c
kcov.c kcov: check kcov_softirq in kcov_remote_stop() 2020-06-10 19:14:17 -07:00
kexec.c
kexec_core.c
kexec_elf.c
kexec_file.c integrity-v5.9 2020-08-06 11:35:57 -07:00
kexec_internal.h
kheaders.c
kmod.c
kprobes.c kprobes: Remove unnecessary module_mutex locking from kprobe_optimizer() 2020-07-28 13:19:05 +02:00
ksysfs.c
kthread.c kthread: remove incorrect comment in kthread_create_on_cpu() 2020-08-07 11:33:21 -07:00
latencytop.c
module-internal.h
module.c dyndbg: rename __verbose section to __dyndbg 2020-07-24 17:00:08 +02:00
module_signature.c
module_signing.c
notifier.c
nsproxy.c nsproxy: support CLONE_NEWTIME with setns() 2020-07-08 11:14:22 +02:00
padata.c padata: remove padata_parallel_queue 2020-07-23 17:34:18 +10:00
panic.c bug: Annotate WARN/BUG/stackfail as noinstr safe 2020-06-11 15:14:36 +02:00
params.c
pid.c cap-checkpoint-restore-v5.9 2020-08-04 15:02:07 -07:00
pid_namespace.c pid_namespace: use checkpoint_restore_ns_capable() for ns_last_pid 2020-07-19 20:14:42 +02:00
profile.c
ptrace.c
range.c
reboot.c arch: remove unicore32 port 2020-07-01 12:09:13 +03:00
relay.c
resource.c
rseq.c
scs.c mm: memcontrol: account kernel stack per node 2020-08-07 11:33:25 -07:00
seccomp.c seccomp: Introduce addfd ioctl to seccomp user notifier 2020-07-14 16:29:42 -07:00
signal.c signal: fix typo in dequeue_synchronous_signal() 2020-07-26 23:57:52 +02:00
smp.c smp: Fix a potential usage of stale nr_cpus 2020-07-22 10:22:04 +02:00
smpboot.c
smpboot.h
softirq.c tasklets API update for v5.9-rc1 2020-08-04 13:40:35 -07:00
stackleak.c gcc-plugins/stackleak: Use asm instrumentation to avoid useless register saving 2020-06-24 07:48:28 -07:00
stacktrace.c
stop_machine.c
sys.c prctl: exe link permission error changed from -EINVAL to -EPERM 2020-07-19 20:14:42 +02:00
sys_ni.c
sysctl-test.c
sysctl.c mm: adjust vm_committed_as_batch according to vm overcommit policy 2020-08-07 11:33:26 -07:00
sysctl_binary.c
task_work.c task_work: teach task_work_add() to do signal_wake_up() 2020-06-30 12:18:08 -06:00
taskstats.c
test_kprobes.c
torture.c torture: Dump ftrace at shutdown only if requested 2020-06-29 12:01:45 -07:00
tracepoint.c
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c exec: Implement kernel_execve 2020-07-21 08:24:52 -05:00
up.c
user-return-notifier.c
user.c
user_namespace.c
usermode_driver.c umd: Stop using split_argv 2020-07-07 11:58:59 -05:00
utsname.c
utsname_sysctl.c
watch_queue.c Notifications over pipes + Keyring notifications 2020-06-13 09:56:21 -07:00
watchdog.c
watchdog_hld.c
workqueue.c maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault 2020-06-17 10:57:41 -07:00
workqueue_internal.h