Ubuntu-focal-kernel/kernel
Dario Faggioli 332ac17ef5 sched/deadline: Add bandwidth management for SCHED_DEADLINE tasks
In order of deadline scheduling to be effective and useful, it is
important that some method of having the allocation of the available
CPU bandwidth to tasks and task groups under control.
This is usually called "admission control" and if it is not performed
at all, no guarantee can be given on the actual scheduling of the
-deadline tasks.

Since when RT-throttling has been introduced each task group have a
bandwidth associated to itself, calculated as a certain amount of
runtime over a period. Moreover, to make it possible to manipulate
such bandwidth, readable/writable controls have been added to both
procfs (for system wide settings) and cgroupfs (for per-group
settings).

Therefore, the same interface is being used for controlling the
bandwidth distrubution to -deadline tasks and task groups, i.e.,
new controls but with similar names, equivalent meaning and with
the same usage paradigm are added.

However, more discussion is needed in order to figure out how
we want to manage SCHED_DEADLINE bandwidth at the task group level.
Therefore, this patch adds a less sophisticated, but actually
very sensible, mechanism to ensure that a certain utilization
cap is not overcome per each root_domain (the single rq for !SMP
configurations).

Another main difference between deadline bandwidth management and
RT-throttling is that -deadline tasks have bandwidth on their own
(while -rt ones doesn't!), and thus we don't need an higher level
throttling mechanism to enforce the desired bandwidth.

This patch, therefore:

 - adds system wide deadline bandwidth management by means of:
    * /proc/sys/kernel/sched_dl_runtime_us,
    * /proc/sys/kernel/sched_dl_period_us,
   that determine (i.e., runtime / period) the total bandwidth
   available on each CPU of each root_domain for -deadline tasks;

 - couples the RT and deadline bandwidth management, i.e., enforces
   that the sum of how much bandwidth is being devoted to -rt
   -deadline tasks to stay below 100%.

This means that, for a root_domain comprising M CPUs, -deadline tasks
can be created until the sum of their bandwidths stay below:

    M * (sched_dl_runtime_us / sched_dl_period_us)

It is also possible to disable this bandwidth management logic, and
be thus free of oversubscribing the system up to any arbitrary level.

Signed-off-by: Dario Faggioli <raistlin@linux.it>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1383831828-15501-12-git-send-email-juri.lelli@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 13:46:42 +01:00
..
cpu
debug
events perf: Disable all pmus on unthrottling and rescheduling 2013-12-17 15:04:00 +01:00
gcov gcov: reuse kbasename helper 2013-11-13 12:09:34 +09:00
irq Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-12-02 10:15:39 -08:00
locking sched/deadline: Add SCHED_DEADLINE inheritance logic 2014-01-13 13:42:56 +01:00
power PM / sleep: Fix memory leak in pm_vt_switch_unregister(). 2013-12-22 00:56:35 +01:00
printk printk.c: comments should refer to /proc/vmcore instead of /proc/vmcoreinfo 2013-11-13 12:09:14 +09:00
rcu NOHZ: Check for nohz active instead of nohz enabled 2013-11-19 14:59:50 +01:00
sched sched/deadline: Add bandwidth management for SCHED_DEADLINE tasks 2014-01-13 13:46:42 +01:00
time nohz: Fix another inconsistency between CONFIG_NO_HZ=n and nohz=off 2013-11-29 12:23:03 +01:00
trace sched/deadline: Add SCHED_DEADLINE inheritance logic 2014-01-13 13:42:56 +01:00
.gitignore Ignore generated file kernel/x509_certificate_list 2013-12-10 18:21:34 +00:00
Kconfig.freezer
Kconfig.hz kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS 2013-11-15 09:32:22 +09:00
Kconfig.locks
Kconfig.preempt
Makefile KEYS: Remove files generated when SYSTEM_TRUSTED_KEYRING=y 2013-12-13 15:59:11 +00:00
acct.c
async.c
audit.c Merge git://git.infradead.org/users/eparis/audit 2013-11-21 19:18:14 -08:00
audit.h
audit_tree.c
audit_watch.c
auditfilter.c
auditsc.c
backtracetest.c
bounds.c mm: do not allocate page->ptl dynamically, if spinlock_t fits to long 2013-12-20 12:25:45 -08:00
capability.c
cgroup.c cgroup: don't recycle cgroup id until all csses' have been destroyed 2013-12-17 08:11:52 -05:00
cgroup_freezer.c
compat.c
configs.c
context_tracking.c
cpu.c Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-14 16:55:11 +09:00
cpu_pm.c
cpuset.c cpuset: Fix memory allocator deadlock 2013-11-27 13:52:47 -05:00
crash_dump.c
cred.c
delayacct.c kernel/delayacct.c: remove redundant checking in __delayacct_add_tsk() 2013-11-13 12:09:12 +09:00
dma.c
elfcore.c
exec_domain.c
exit.c
extable.c kernel/extable: fix address-checks for core_kernel and init areas 2013-11-28 09:49:41 -08:00
fork.c sched/deadline: Add SCHED_DEADLINE inheritance logic 2014-01-13 13:42:56 +01:00
freezer.c libata, freezer: avoid block device removal while system is frozen 2013-12-19 13:50:32 -05:00
futex.c rtmutex: Turn the plist into an rb-tree 2014-01-13 13:41:50 +01:00
futex_compat.c
groups.c
hrtimer.c sched/deadline: Add SCHED_DEADLINE structures & implementation 2014-01-13 13:41:06 +01:00
hung_task.c Here are the 3.13 KVM changes. There was a lot of work on the PPC 2013-11-15 13:51:36 +09:00
irq_work.c
itimer.c
jump_label.c
kallsyms.c
kcmp.c
kexec.c kexec: migrate to reboot cpu 2013-12-18 19:04:50 -08:00
kmod.c
kprobes.c kprobes: use KSYM_NAME_LEN to size identifier buffers 2013-11-13 12:09:26 +09:00
ksysfs.c
kthread.c
latencytop.c
module-internal.h
module.c Mainly boring here, too. rmmod --wait finally removed, though. 2013-11-15 13:27:50 +09:00
module_signing.c
notifier.c
nsproxy.c
padata.c
panic.c kernel/panic.c: reduce 1 byte usage for print tainted buffer 2013-11-13 12:09:35 +09:00
params.c
pid.c
pid_namespace.c
posix-cpu-timers.c
posix-timers.c
profile.c
ptrace.c exec/ptrace: fix get_dumpable() incorrect tests 2013-11-13 12:09:33 +09:00
range.c
reboot.c kexec: migrate to reboot cpu 2013-12-18 19:04:50 -08:00
relay.c
res_counter.c
resource.c
seccomp.c
signal.c
smp.c kernel: fix generic_exec_single indentation 2013-11-15 09:32:22 +09:00
smpboot.c
smpboot.h
softirq.c revert "softirq: Add support for triggering softirq work on softirqs" 2013-11-15 09:32:22 +09:00
stacktrace.c
stop_machine.c
sys.c kernel/sys.c: remove obsolete #include <linux/kexec.h> 2013-11-13 12:09:13 +09:00
sys_ni.c
sysctl.c sched/deadline: Add bandwidth management for SCHED_DEADLINE tasks 2014-01-13 13:46:42 +01:00
sysctl_binary.c kernel/sysctl_binary.c: use scnprintf() instead of snprintf() 2013-11-13 12:09:33 +09:00
system_certificates.S KEYS: correct alignment of system_certificate_list content in assembly file 2013-12-10 18:25:28 +00:00
system_keyring.c KEYS: correct alignment of system_certificate_list content in assembly file 2013-12-10 18:25:28 +00:00
task_work.c
taskstats.c genetlink: only pass array to genl_register_family_with_ops() 2013-11-19 16:39:05 -05:00
test_kprobes.c
time.c
timeconst.bc
timer.c timer: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...) 2013-11-19 14:59:50 +01:00
tracepoint.c
tsacct.c
uid16.c
up.c kernel: provide a __smp_call_function_single stub for !CONFIG_SMP 2013-11-15 09:32:22 +09:00
user-return-notifier.c
user.c KEYS: fix uninitialized persistent_keyring_register_sem 2013-12-13 15:59:11 +00:00
user_namespace.c
utsname.c
utsname_sysctl.c
watchdog.c
workqueue.c PCI updates for v3.13: 2013-12-15 11:45:27 -08:00
workqueue_internal.h