Scheduler updates for v6.17:

Core scheduler changes:
 
  - Better tracking of the maximum lag of tasks in the presence of
    different slice durations, for better handling of lag in the fair
    scheduler. (Vincent Guittot)
 
  - Clean up and standardize #if/#else/#endif markers throughout
    the entire scheduler code base (Ingo Molnar)
 
  - Make SMP unconditional: build the SMP scheduler's
    data structures and logic on UP kernels too, even though
    they are not used, to simplify the scheduler and remove
    around 200 #ifdef/[#else]/#endif blocks from the
    scheduler; a schematic before/after sketch follows this
    list. (Ingo Molnar)
 
  - Reorganize cgroup bandwidth control interface handling
    for better interfacing with sched_ext (Tejun Heo)
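
  To illustrate the 'Make SMP unconditional' item above, here is a
  schematic before/after of the #ifdef removal pattern, based on the
  sched_exec() declaration visible in the diffs further below (the
  exact hunks differ per file):

        /* Before: UP builds stubbed out the SMP-only helper: */
        #ifdef CONFIG_SMP
        extern void sched_exec(void);
        #else
        #define sched_exec() {}
        #endif

        /* After: one unconditional declaration; UP kernels now build
         * the same scheduler code as SMP kernels, even if unused: */
        extern void sched_exec(void);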
 
 Balancing:
 
  - Bump sd->max_newidle_lb_cost when newidle balance fails (Chris Mason)
  - Remove sched_domain_topology_level::flags to simplify the code (Prateek Nayak)
  - Simplify and clean up build_sched_topology() (Li Chen)
  - Optimize build_sched_topology() on large machines (Li Chen)
 
 Real-time scheduling:
 
  - Add an initial version of proxy execution: a mechanism for mutex-owning
    tasks to inherit the scheduling context of higher-priority waiters.
    Currently limited to a single runqueue, gated behind CONFIG_EXPERT, and
    subject to other limitations; a userspace sketch of the priority-inversion
    pattern it addresses follows this list. (John Stultz, Peter Zijlstra,
    Valentin Schneider)
 
  - Deadline scheduler (Juri Lelli):
 
    - Fix dl_servers initialization order (Juri Lelli)
    - Fix DL scheduler's root domain reinitialization logic (Juri Lelli)
    - Fix accounting bugs after global limits change (Juri Lelli)
    - Fix scalability regression by implementing less aggressive dl_server handling
      (Peter Zijlstra)
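
  To make the proxy-execution motivation concrete, below is a minimal
  userspace sketch (plain pthreads, not kernel code; the thread names and
  the absence of real priority settings are illustrative only) of the
  priority-inversion pattern it targets: a high-priority waiter's progress
  depends on how the mutex-owning, lower-priority task gets scheduled.

        #include <pthread.h>
        #include <stdio.h>
        #include <unistd.h>

        static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

        static void *low_prio_owner(void *arg)
        {
                pthread_mutex_lock(&lock);
                usleep(100 * 1000);     /* slow work while holding the lock */
                pthread_mutex_unlock(&lock);
                return NULL;
        }

        static void *high_prio_waiter(void *arg)
        {
                /* Blocks until the owner releases the mutex; with proxy
                 * execution the owner would inherit this waiter's
                 * scheduling context while it holds the lock. */
                pthread_mutex_lock(&lock);
                printf("waiter got the lock\n");
                pthread_mutex_unlock(&lock);
                return NULL;
        }

        int main(void)
        {
                pthread_t owner, waiter;

                pthread_create(&owner, NULL, low_prio_owner, NULL);
                usleep(10 * 1000);      /* let the owner take the lock first */
                pthread_create(&waiter, NULL, high_prio_waiter, NULL);
                pthread_join(owner, NULL);
                pthread_join(waiter, NULL);
                return 0;
        }

  (Build with: cc -pthread demo.c)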
 
 PSI:
 
  - Improve scalability by optimizing psi_group_change() cpu_clock() usage
    (Peter Zijlstra)
 
 Rust changes:
 
  - Make Task, CondVar and PollCondVar methods inline to avoid unnecessary
    function calls (Kunwu Chan, Panagiotis Foliadis)
 
  - Add might_sleep() support for Rust code: Rust's "#[track_caller]"
    mechanism is used so that Rust's might_sleep() doesn't need to be
    defined as a macro; a C sketch of the macro-based caller tracking
    this avoids follows this list (Fujita Tomonori)
 
  - Introduce file_from_location() (Boqun Feng)
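
  For contrast with the "#[track_caller]" item above, here is a small
  userspace C sketch (illustrative only, not the kernel's actual
  implementation) of the traditional macro-based way to report the
  caller's location, which is what #[track_caller] lets the Rust side
  avoid:

        #include <stdio.h>

        static void __might_sleep(const char *file, int line)
        {
                /* The kernel version would warn if called while atomic. */
                printf("might_sleep() called from %s:%d\n", file, line);
        }

        /* Wrapping the helper in a macro makes __FILE__/__LINE__ expand
         * at the call site rather than inside the helper. */
        #define might_sleep()   __might_sleep(__FILE__, __LINE__)

        int main(void)
        {
                might_sleep();          /* prints this file and line */
                return 0;
        }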
 
 Debugging & instrumentation:
 
  - Make clangd usable with scheduler source code files again (Peter Zijlstra)
 
  - tools: Add root_domains_dump.py which dumps root domains info (Juri Lelli)
 
  - tools: Add dl_bw_dump.py for printing bandwidth accounting info (Juri Lelli)
 
 Misc cleanups & fixes:
 
  - Remove play_idle() (Feng Lee)
 
  - Fix check_preemption_disabled() (Sebastian Andrzej Siewior)
 
  - Do not call __put_task_struct() on RT if pi_blocked_on is set
    (Luis Claudio R. Goncalves)
 
  - Correct the comment in place_entity() (wang wei)
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmiHHNIRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g7DhAAg9aMW33PuC24A4hCS1XQay6j3rgmR5qC
 AOqDofj/CY4Q374HQtOl4m5CYZB/G5csRv6TZliWQKhAy9vr6VWddoyOMJYOAlAx
 XRurl1Z3MriOMD6DPgNvtHd5PrR5Un8ygALgT+32d0PRz27KNXORW5TyvEf2Bv4r
 BX4/GazlOlK0PdGUdZl0q/3dtkU4Wr5IifQzT8KbarOSBbNwZwVcg+83hLW5gJMx
 LgMGLaAATmiN7VuvJWNDATDfEOmOvQOu8veoS8TuP1AOVeJPfPT2JVh9Jen5V1/5
 3w1RUOkUI2mQX+cujWDW3koniSxjsA1OegXfHnFkF5BXp4q5e54k6D5sSh1xPFDX
 iDhkU5jsbKkkJS2ulD6Vi4bIAct3apMl4IrbJn/OYOLcUVI8WuunHs4UPPEuESAS
 TuQExKSdj4Ntrzo3pWEy8kX3/Z9VGa+WDzwsPUuBSvllB5Ir/jjKgvkxPA6zGsiY
 rbkmZT8qyI01IZ/GXqfI2AQYCGvgp+SOvFPi755ZlELTQS6sUkGZH2/2M5XnKA9t
 Z1wB2iwttoS1VQInx0HgiiAGrXrFkr7IzSIN2T+CfWIqilnL7+nTxzwlJtC206P4
 DB97bF6azDtJ6yh1LetRZ1ZMX/Gr56Cy0Z6USNoOu+a12PLqlPk9+fPBBpkuGcdy
 BRk8KgysEuk=
 =8T0v
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2025-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:
 "Core scheduler changes:

   - Better tracking of the maximum lag of tasks in the presence of
     different slice durations, for better handling of lag in the fair
     scheduler (Vincent Guittot)

   - Clean up and standardize #if/#else/#endif markers throughout the
     entire scheduler code base (Ingo Molnar)

   - Make SMP unconditional: build the SMP scheduler's data structures
     and logic on UP kernels too, even though they are not used, to
     simplify the scheduler and remove around 200 #ifdef/[#else]/#endif
     blocks from the scheduler (Ingo Molnar)

   - Reorganize cgroup bandwidth control interface handling for better
     interfacing with sched_ext (Tejun Heo)

  Balancing:

   - Bump sd->max_newidle_lb_cost when newidle balance fails (Chris
     Mason)

   - Remove sched_domain_topology_level::flags to simplify the code
     (Prateek Nayak)

   - Simplify and clean up build_sched_topology() (Li Chen)

   - Optimize build_sched_topology() on large machines (Li Chen)

  Real-time scheduling:

   - Add an initial version of proxy execution: a mechanism for
     mutex-owning tasks to inherit the scheduling context of
     higher-priority waiters.

     It is currently limited to a single runqueue, gated behind
     CONFIG_EXPERT, and subject to other limitations (John Stultz,
     Peter Zijlstra, Valentin Schneider)

   - Deadline scheduler (Juri Lelli):
      - Fix dl_servers initialization order (Juri Lelli)
      - Fix DL scheduler's root domain reinitialization logic (Juri
        Lelli)
      - Fix accounting bugs after global limits change (Juri Lelli)
     - Fix scalability regression by implementing less aggressive
        dl_server handling (Peter Zijlstra)

  PSI:

   - Improve scalability by optimizing psi_group_change() cpu_clock()
     usage (Peter Zijlstra)

  Rust changes:

   - Make Task, CondVar and PollCondVar methods inline to avoid
     unnecessary function calls (Kunwu Chan, Panagiotis Foliadis)

   - Add might_sleep() support for Rust code: Rust's "#[track_caller]"
     mechanism is used so that Rust's might_sleep() doesn't need to be
     defined as a macro (Fujita Tomonori)

   - Introduce file_from_location() (Boqun Feng)

  Debugging & instrumentation:

   - Make clangd usable with scheduler source code files again (Peter
     Zijlstra)

   - tools: Add root_domains_dump.py which dumps root domains info (Juri
     Lelli)

   - tools: Add dl_bw_dump.py for printing bandwidth accounting info
     (Juri Lelli)

  Misc cleanups & fixes:

   - Remove play_idle() (Feng Lee)

   - Fix check_preemption_disabled() (Sebastian Andrzej Siewior)

   - Do not call __put_task_struct() on RT if pi_blocked_on is set (Luis
     Claudio R. Goncalves)

   - Correct the comment in place_entity() (wang wei)"

* tag 'sched-core-2025-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (84 commits)
  sched/idle: Remove play_idle()
  sched: Do not call __put_task_struct() on rt if pi_blocked_on is set
  sched: Start blocked_on chain processing in find_proxy_task()
  sched: Fix proxy/current (push,pull)ability
  sched: Add an initial sketch of the find_proxy_task() function
  sched: Fix runtime accounting w/ split exec & sched contexts
  sched: Move update_curr_task logic into update_curr_se
  locking/mutex: Add p->blocked_on wrappers for correctness checks
  locking/mutex: Rework task_struct::blocked_on
  sched: Add CONFIG_SCHED_PROXY_EXEC & boot argument to enable/disable
  sched/topology: Remove sched_domain_topology_level::flags
  x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled
  x86/smpboot: moves x86_topology to static initialize and truncate
  x86/smpboot: remove redundant CONFIG_SCHED_SMT
  smpboot: introduce SDTL_INIT() helper to tidy sched topology setup
  tools/sched: Add dl_bw_dump.py for printing bandwidth accounting info
  tools/sched: Add root_domains_dump.py which dumps root domains info
  sched/deadline: Fix accounting after global limits change
  sched/deadline: Reset extra_bw to max_bw when clearing root domains
  sched/deadline: Initialize dl_servers after SMP
  ...
Merged by Linus Torvalds on 2025-07-29 17:42:52 -07:00, commit bf76f23aa1.
68 changed files with 1473 additions and 1464 deletions.


@ -6410,6 +6410,11 @@
sa1100ir [NET]
See drivers/net/irda/sa1100_ir.c.
sched_proxy_exec= [KNL]
Enables or disables "proxy execution" style
solution to mutex-based priority inversion.
Format: <bool>
sched_verbose [KNL,EARLY] Enables verbose scheduler debug messages.
schedstats= [KNL,X86] Enable or disable scheduled statistics.
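
For illustration only (hypothetical values, not part of the documentation
above): since sched_proxy_exec= takes a <bool>, it can be combined with the
other scheduler parameters on the kernel command line, for example:

        ... sched_proxy_exec=0 sched_verbose schedstats=enable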


@ -22319,6 +22319,7 @@ F: include/linux/wait.h
F: include/uapi/linux/sched.h
F: kernel/fork.c
F: kernel/sched/
F: tools/sched/
SCHEDULER - SCHED_EXT
R: Tejun Heo <tj@kernel.org>


@ -1700,28 +1700,23 @@ static void __init build_sched_topology(void)
#ifdef CONFIG_SCHED_SMT
if (has_big_cores) {
pr_info("Big cores detected but using small core scheduling\n");
powerpc_topology[i++] = (struct sched_domain_topology_level){
smallcore_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT)
};
powerpc_topology[i++] =
SDTL_INIT(smallcore_smt_mask, powerpc_smt_flags, SMT);
} else {
powerpc_topology[i++] = (struct sched_domain_topology_level){
cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT)
};
powerpc_topology[i++] = SDTL_INIT(cpu_smt_mask, powerpc_smt_flags, SMT);
}
#endif
if (shared_caches) {
powerpc_topology[i++] = (struct sched_domain_topology_level){
shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE)
};
powerpc_topology[i++] =
SDTL_INIT(shared_cache_mask, powerpc_shared_cache_flags, CACHE);
}
if (has_coregroup_support()) {
powerpc_topology[i++] = (struct sched_domain_topology_level){
cpu_mc_mask, powerpc_shared_proc_flags, SD_INIT_NAME(MC)
};
powerpc_topology[i++] =
SDTL_INIT(cpu_mc_mask, powerpc_shared_proc_flags, MC);
}
powerpc_topology[i++] = (struct sched_domain_topology_level){
cpu_cpu_mask, powerpc_shared_proc_flags, SD_INIT_NAME(PKG)
};
powerpc_topology[i++] = SDTL_INIT(cpu_cpu_mask, powerpc_shared_proc_flags, PKG);
/* There must be one trailing NULL entry left. */
BUG_ON(i >= ARRAY_SIZE(powerpc_topology) - 1);


@ -531,11 +531,11 @@ static const struct cpumask *cpu_drawer_mask(int cpu)
}
static struct sched_domain_topology_level s390_topology[] = {
{ cpu_thread_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
{ cpu_book_mask, SD_INIT_NAME(BOOK) },
{ cpu_drawer_mask, SD_INIT_NAME(DRAWER) },
{ cpu_cpu_mask, SD_INIT_NAME(PKG) },
SDTL_INIT(cpu_thread_mask, cpu_smt_flags, SMT),
SDTL_INIT(cpu_coregroup_mask, cpu_core_flags, MC),
SDTL_INIT(cpu_book_mask, NULL, BOOK),
SDTL_INIT(cpu_drawer_mask, NULL, DRAWER),
SDTL_INIT(cpu_cpu_mask, NULL, PKG),
{ NULL, },
};


@ -478,44 +478,41 @@ static int x86_cluster_flags(void)
*/
static bool x86_has_numa_in_package;
static struct sched_domain_topology_level x86_topology[6];
static struct sched_domain_topology_level x86_topology[] = {
SDTL_INIT(cpu_smt_mask, cpu_smt_flags, SMT),
#ifdef CONFIG_SCHED_CLUSTER
SDTL_INIT(cpu_clustergroup_mask, x86_cluster_flags, CLS),
#endif
#ifdef CONFIG_SCHED_MC
SDTL_INIT(cpu_coregroup_mask, x86_core_flags, MC),
#endif
SDTL_INIT(cpu_cpu_mask, x86_sched_itmt_flags, PKG),
{ NULL },
};
static void __init build_sched_topology(void)
{
int i = 0;
struct sched_domain_topology_level *topology = x86_topology;
#ifdef CONFIG_SCHED_SMT
x86_topology[i++] = (struct sched_domain_topology_level){
cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT)
};
#endif
#ifdef CONFIG_SCHED_CLUSTER
x86_topology[i++] = (struct sched_domain_topology_level){
cpu_clustergroup_mask, x86_cluster_flags, SD_INIT_NAME(CLS)
};
#endif
#ifdef CONFIG_SCHED_MC
x86_topology[i++] = (struct sched_domain_topology_level){
cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC)
};
#endif
/*
* When there is NUMA topology inside the package skip the PKG domain
* since the NUMA domains will auto-magically create the right spanning
* domains based on the SLIT.
* When there is NUMA topology inside the package invalidate the
* PKG domain since the NUMA domains will auto-magically create the
* right spanning domains based on the SLIT.
*/
if (!x86_has_numa_in_package) {
x86_topology[i++] = (struct sched_domain_topology_level){
cpu_cpu_mask, x86_sched_itmt_flags, SD_INIT_NAME(PKG)
};
if (x86_has_numa_in_package) {
unsigned int pkgdom = ARRAY_SIZE(x86_topology) - 2;
memset(&x86_topology[pkgdom], 0, sizeof(x86_topology[pkgdom]));
}
/*
* There must be one trailing NULL entry left.
* Drop the SMT domains if there is only one thread per-core
* since it'll get degenerated by the scheduler anyways.
*/
BUG_ON(i >= ARRAY_SIZE(x86_topology)-1);
if (cpu_smt_num_threads <= 1)
++topology;
set_sched_topology(x86_topology);
set_sched_topology(topology);
}
void set_cpu_sibling_map(int cpu)


@ -187,11 +187,6 @@ static inline void arch_cpu_finalize_init(void) { }
void play_idle_precise(u64 duration_ns, u64 latency_ns);
static inline void play_idle(unsigned long duration_us)
{
play_idle_precise(duration_us * NSEC_PER_USEC, U64_MAX);
}
#ifdef CONFIG_HOTPLUG_CPU
void cpuhp_report_idle_dead(void);
#else


@ -369,8 +369,6 @@ static inline void preempt_notifier_init(struct preempt_notifier *notifier,
#endif
#ifdef CONFIG_SMP
/*
* Migrate-Disable and why it is undesired.
*
@ -429,13 +427,6 @@ static inline void preempt_notifier_init(struct preempt_notifier *notifier,
extern void migrate_disable(void);
extern void migrate_enable(void);
#else
static inline void migrate_disable(void) { }
static inline void migrate_enable(void) { }
#endif /* CONFIG_SMP */
/**
* preempt_disable_nested - Disable preemption inside a normally preempt disabled section
*


@ -84,11 +84,9 @@ enum psi_aggregators {
struct psi_group_cpu {
/* 1st cacheline updated by the scheduler */
/* Aggregator needs to know of concurrent changes */
seqcount_t seq ____cacheline_aligned_in_smp;
/* States of the tasks belonging to this group */
unsigned int tasks[NR_PSI_TASK_COUNTS];
unsigned int tasks[NR_PSI_TASK_COUNTS]
____cacheline_aligned_in_smp;
/* Aggregate pressure state derived from the tasks */
u32 state_mask;


@ -34,6 +34,7 @@
#include <linux/sched/prio.h>
#include <linux/sched/types.h>
#include <linux/signal_types.h>
#include <linux/spinlock.h>
#include <linux/syscall_user_dispatch_types.h>
#include <linux/mm_types_task.h>
#include <linux/netdevice_xmit.h>
@ -395,15 +396,10 @@ enum uclamp_id {
UCLAMP_CNT
};
#ifdef CONFIG_SMP
extern struct root_domain def_root_domain;
extern struct mutex sched_domains_mutex;
extern void sched_domains_mutex_lock(void);
extern void sched_domains_mutex_unlock(void);
#else
static inline void sched_domains_mutex_lock(void) { }
static inline void sched_domains_mutex_unlock(void) { }
#endif
struct sched_param {
int sched_priority;
@ -584,7 +580,15 @@ struct sched_entity {
u64 sum_exec_runtime;
u64 prev_sum_exec_runtime;
u64 vruntime;
s64 vlag;
union {
/*
* When !@on_rq this field is vlag.
* When cfs_rq->curr == se (which implies @on_rq)
* this field is vprot. See protect_slice().
*/
s64 vlag;
u64 vprot;
};
u64 slice;
u64 nr_migrations;
@ -600,7 +604,6 @@ struct sched_entity {
unsigned long runnable_weight;
#endif
#ifdef CONFIG_SMP
/*
* Per entity load average tracking.
*
@ -608,7 +611,6 @@ struct sched_entity {
* collide with read-mostly values above.
*/
struct sched_avg avg;
#endif
};
struct sched_rt_entity {
@ -701,6 +703,7 @@ struct sched_dl_entity {
unsigned int dl_defer : 1;
unsigned int dl_defer_armed : 1;
unsigned int dl_defer_running : 1;
unsigned int dl_server_idle : 1;
/*
* Bandwidth enforcement timer. Each -deadline task has its
@ -838,7 +841,6 @@ struct task_struct {
struct alloc_tag *alloc_tag;
#endif
#ifdef CONFIG_SMP
int on_cpu;
struct __call_single_node wake_entry;
unsigned int wakee_flips;
@ -854,7 +856,6 @@ struct task_struct {
*/
int recent_used_cpu;
int wake_cpu;
#endif
int on_rq;
int prio;
@ -913,9 +914,7 @@ struct task_struct {
cpumask_t *user_cpus_ptr;
cpumask_t cpus_mask;
void *migration_pending;
#ifdef CONFIG_SMP
unsigned short migration_disabled;
#endif
unsigned short migration_flags;
#ifdef CONFIG_PREEMPT_RCU
@ -947,10 +946,8 @@ struct task_struct {
struct sched_info sched_info;
struct list_head tasks;
#ifdef CONFIG_SMP
struct plist_node pushable_tasks;
struct rb_node pushable_dl_tasks;
#endif
struct mm_struct *mm;
struct mm_struct *active_mm;
@ -1234,10 +1231,7 @@ struct task_struct {
struct rt_mutex_waiter *pi_blocked_on;
#endif
#ifdef CONFIG_DEBUG_MUTEXES
/* Mutex deadlock detection: */
struct mutex_waiter *blocked_on;
#endif
struct mutex *blocked_on; /* lock we're blocked on */
#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
/*
@ -1662,6 +1656,19 @@ struct task_struct {
randomized_struct_fields_end
} __attribute__ ((aligned (64)));
#ifdef CONFIG_SCHED_PROXY_EXEC
DECLARE_STATIC_KEY_TRUE(__sched_proxy_exec);
static inline bool sched_proxy_exec(void)
{
return static_branch_likely(&__sched_proxy_exec);
}
#else
static inline bool sched_proxy_exec(void)
{
return false;
}
#endif
#define TASK_REPORT_IDLE (TASK_REPORT + 1)
#define TASK_REPORT_MAX (TASK_REPORT_IDLE << 1)
@ -1776,12 +1783,8 @@ extern struct pid *cad_pid;
static __always_inline bool is_percpu_thread(void)
{
#ifdef CONFIG_SMP
return (current->flags & PF_NO_SETAFFINITY) &&
(current->nr_cpus_allowed == 1);
#else
return true;
#endif
}
/* Per-process atomic flags. */
@ -1846,7 +1849,6 @@ extern int cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpu
extern int task_can_attach(struct task_struct *p);
extern int dl_bw_alloc(int cpu, u64 dl_bw);
extern void dl_bw_free(int cpu, u64 dl_bw);
#ifdef CONFIG_SMP
/* do_set_cpus_allowed() - consider using set_cpus_allowed_ptr() instead */
extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask);
@ -1864,33 +1866,6 @@ extern void release_user_cpus_ptr(struct task_struct *p);
extern int dl_task_check_affinity(struct task_struct *p, const struct cpumask *mask);
extern void force_compatible_cpus_allowed_ptr(struct task_struct *p);
extern void relax_compatible_cpus_allowed_ptr(struct task_struct *p);
#else
static inline void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
{
}
static inline int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
{
/* Opencoded cpumask_test_cpu(0, new_mask) to avoid dependency on cpumask.h */
if ((*cpumask_bits(new_mask) & 1) == 0)
return -EINVAL;
return 0;
}
static inline int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src, int node)
{
if (src->user_cpus_ptr)
return -EINVAL;
return 0;
}
static inline void release_user_cpus_ptr(struct task_struct *p)
{
WARN_ON(p->user_cpus_ptr);
}
static inline int dl_task_check_affinity(struct task_struct *p, const struct cpumask *mask)
{
return 0;
}
#endif
extern int yield_to(struct task_struct *p, bool preempt);
extern void set_user_nice(struct task_struct *p, long nice);
@ -1979,11 +1954,7 @@ extern int wake_up_state(struct task_struct *tsk, unsigned int state);
extern int wake_up_process(struct task_struct *tsk);
extern void wake_up_new_task(struct task_struct *tsk);
#ifdef CONFIG_SMP
extern void kick_process(struct task_struct *tsk);
#else
static inline void kick_process(struct task_struct *tsk) { }
#endif
extern void __set_task_comm(struct task_struct *tsk, const char *from, bool exec);
#define set_task_comm(tsk, from) ({ \
@ -2010,7 +1981,6 @@ extern void __set_task_comm(struct task_struct *tsk, const char *from, bool exec
buf; \
})
#ifdef CONFIG_SMP
static __always_inline void scheduler_ipi(void)
{
/*
@ -2020,9 +1990,6 @@ static __always_inline void scheduler_ipi(void)
*/
preempt_fold_need_resched();
}
#else
static inline void scheduler_ipi(void) { }
#endif
extern unsigned long wait_task_inactive(struct task_struct *, unsigned int match_state);
@ -2165,6 +2132,67 @@ extern int __cond_resched_rwlock_write(rwlock_t *lock);
__cond_resched_rwlock_write(lock); \
})
#ifndef CONFIG_PREEMPT_RT
static inline struct mutex *__get_task_blocked_on(struct task_struct *p)
{
struct mutex *m = p->blocked_on;
if (m)
lockdep_assert_held_once(&m->wait_lock);
return m;
}
static inline void __set_task_blocked_on(struct task_struct *p, struct mutex *m)
{
WARN_ON_ONCE(!m);
/* The task should only be setting itself as blocked */
WARN_ON_ONCE(p != current);
/* Currently we serialize blocked_on under the mutex::wait_lock */
lockdep_assert_held_once(&m->wait_lock);
/*
* Check ensure we don't overwrite existing mutex value
* with a different mutex. Note, setting it to the same
* lock repeatedly is ok.
*/
WARN_ON_ONCE(p->blocked_on && p->blocked_on != m);
p->blocked_on = m;
}
static inline void set_task_blocked_on(struct task_struct *p, struct mutex *m)
{
guard(raw_spinlock_irqsave)(&m->wait_lock);
__set_task_blocked_on(p, m);
}
static inline void __clear_task_blocked_on(struct task_struct *p, struct mutex *m)
{
WARN_ON_ONCE(!m);
/* Currently we serialize blocked_on under the mutex::wait_lock */
lockdep_assert_held_once(&m->wait_lock);
/*
* There may be cases where we re-clear already cleared
* blocked_on relationships, but make sure we are not
* clearing the relationship with a different lock.
*/
WARN_ON_ONCE(m && p->blocked_on && p->blocked_on != m);
p->blocked_on = NULL;
}
static inline void clear_task_blocked_on(struct task_struct *p, struct mutex *m)
{
guard(raw_spinlock_irqsave)(&m->wait_lock);
__clear_task_blocked_on(p, m);
}
#else
static inline void __clear_task_blocked_on(struct task_struct *p, struct rt_mutex *m)
{
}
static inline void clear_task_blocked_on(struct task_struct *p, struct rt_mutex *m)
{
}
#endif /* !CONFIG_PREEMPT_RT */
static __always_inline bool need_resched(void)
{
return unlikely(tif_need_resched());
@ -2204,8 +2232,6 @@ extern bool sched_task_on_rq(struct task_struct *p);
extern unsigned long get_wchan(struct task_struct *p);
extern struct task_struct *cpu_curr_snapshot(int cpu);
#include <linux/spinlock.h>
/*
* In order to reduce various lock holder preemption latencies provide an
* interface to see if a vCPU is currently running or not.
@ -2228,7 +2254,6 @@ extern long sched_getaffinity(pid_t pid, struct cpumask *mask);
#define TASK_SIZE_OF(tsk) TASK_SIZE
#endif
#ifdef CONFIG_SMP
static inline bool owner_on_cpu(struct task_struct *owner)
{
/*
@ -2240,7 +2265,6 @@ static inline bool owner_on_cpu(struct task_struct *owner)
/* Returns effective CPU energy utilization, as seen by the scheduler */
unsigned long sched_cpu_util(int cpu);
#endif /* CONFIG_SMP */
#ifdef CONFIG_SCHED_CORE
extern void sched_core_free(struct task_struct *tsk);


@ -29,15 +29,11 @@ static inline bool dl_time_before(u64 a, u64 b)
return (s64)(a - b) < 0;
}
#ifdef CONFIG_SMP
struct root_domain;
extern void dl_add_task_root_domain(struct task_struct *p);
extern void dl_clear_root_domain(struct root_domain *rd);
extern void dl_clear_root_domain_cpu(int cpu);
#endif /* CONFIG_SMP */
extern u64 dl_cookie;
extern bool dl_bw_visited(int cpu, u64 cookie);


@ -11,11 +11,7 @@ enum cpu_idle_type {
CPU_MAX_IDLE_TYPES
};
#ifdef CONFIG_SMP
extern void wake_up_if_idle(int cpu);
#else
static inline void wake_up_if_idle(int cpu) { }
#endif
/*
* Idle thread specific functions to determine the need_resched


@ -6,7 +6,7 @@
* This is the interface between the scheduler and nohz/dynticks:
*/
#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
#ifdef CONFIG_NO_HZ_COMMON
extern void nohz_balance_enter_idle(int cpu);
extern int get_nohz_timer_target(void);
#else
@ -23,7 +23,7 @@ static inline void calc_load_nohz_remote(struct rq *rq) { }
static inline void calc_load_nohz_stop(void) { }
#endif /* CONFIG_NO_HZ_COMMON */
#if defined(CONFIG_NO_HZ_COMMON) && defined(CONFIG_SMP)
#ifdef CONFIG_NO_HZ_COMMON
extern void wake_up_nohz_cpu(int cpu);
#else
static inline void wake_up_nohz_cpu(int cpu) { }


@ -153,14 +153,6 @@ SD_FLAG(SD_ASYM_PACKING, SDF_NEEDS_GROUPS)
*/
SD_FLAG(SD_PREFER_SIBLING, SDF_NEEDS_GROUPS)
/*
* sched_groups of this level overlap
*
* SHARED_PARENT: Set for all NUMA levels above NODE.
* NEEDS_GROUPS: Overlaps can only exist with more than one group.
*/
SD_FLAG(SD_OVERLAP, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
/*
* Cross-node balancing
*


@ -109,11 +109,7 @@ int kernel_wait(pid_t pid, int *stat);
extern void free_task(struct task_struct *tsk);
/* sched_exec is called by processes performing an exec */
#ifdef CONFIG_SMP
extern void sched_exec(void);
#else
#define sched_exec() {}
#endif
static inline struct task_struct *get_task_struct(struct task_struct *t)
{
@ -135,24 +131,17 @@ static inline void put_task_struct(struct task_struct *t)
return;
/*
* In !RT, it is always safe to call __put_task_struct().
* Under RT, we can only call it in preemptible context.
*/
if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible()) {
static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
lock_map_acquire_try(&put_task_map);
__put_task_struct(t);
lock_map_release(&put_task_map);
return;
}
/*
* under PREEMPT_RT, we can't call put_task_struct
* Under PREEMPT_RT, we can't call __put_task_struct
* in atomic context because it will indirectly
* acquire sleeping locks.
* acquire sleeping locks. The same is true if the
* current process has a mutex enqueued (blocked on
* a PI chain).
*
* call_rcu() will schedule delayed_put_task_struct_rcu()
* In !RT, it is always safe to call __put_task_struct().
* Though, in order to simplify the code, resort to the
* deferred call too.
*
* call_rcu() will schedule __put_task_struct_rcu_cb()
* to be called in process context.
*
* __put_task_struct() is called when
@ -165,7 +154,7 @@ static inline void put_task_struct(struct task_struct *t)
*
* delayed_free_task() also uses ->rcu, but it is only called
* when it fails to fork a process. Therefore, there is no
* way it can conflict with put_task_struct().
* way it can conflict with __put_task_struct().
*/
call_rcu(&t->rcu, __put_task_struct_rcu_cb);
}


@ -9,7 +9,6 @@
/*
* sched-domains (multiprocessor balancing) declarations:
*/
#ifdef CONFIG_SMP
/* Generate SD flag indexes */
#define SD_FLAG(name, mflags) __##name,
@ -176,8 +175,6 @@ bool cpus_share_resources(int this_cpu, int that_cpu);
typedef const struct cpumask *(*sched_domain_mask_f)(int cpu);
typedef int (*sched_domain_flags_f)(void);
#define SDTL_OVERLAP 0x01
struct sd_data {
struct sched_domain *__percpu *sd;
struct sched_domain_shared *__percpu *sds;
@ -188,7 +185,6 @@ struct sd_data {
struct sched_domain_topology_level {
sched_domain_mask_f mask;
sched_domain_flags_f sd_flags;
int flags;
int numa_level;
struct sd_data data;
char *name;
@ -197,39 +193,8 @@ struct sched_domain_topology_level {
extern void __init set_sched_topology(struct sched_domain_topology_level *tl);
extern void sched_update_asym_prefer_cpu(int cpu, int old_prio, int new_prio);
# define SD_INIT_NAME(type) .name = #type
#else /* CONFIG_SMP */
struct sched_domain_attr;
static inline void
partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
struct sched_domain_attr *dattr_new)
{
}
static inline bool cpus_equal_capacity(int this_cpu, int that_cpu)
{
return true;
}
static inline bool cpus_share_cache(int this_cpu, int that_cpu)
{
return true;
}
static inline bool cpus_share_resources(int this_cpu, int that_cpu)
{
return true;
}
static inline void sched_update_asym_prefer_cpu(int cpu, int old_prio, int new_prio)
{
}
#endif /* !CONFIG_SMP */
#define SDTL_INIT(maskfn, flagsfn, dname) ((struct sched_domain_topology_level) \
{ .mask = maskfn, .sd_flags = flagsfn, .name = #dname })
#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
extern void rebuild_sched_domains_energy(void);
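
For reference, the SDTL_INIT() helper defined above lets a topology table be
written as plain initializers; a schematic example mirroring the s390 and x86
tables shown earlier (illustrative only, real tables are per-architecture):

        static struct sched_domain_topology_level example_topology[] = {
                SDTL_INIT(cpu_smt_mask, cpu_smt_flags, SMT),
                SDTL_INIT(cpu_cpu_mask, NULL, PKG),
                { NULL, },
        };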


@ -142,6 +142,9 @@ config RUSTC_HAS_SPAN_FILE
config RUSTC_HAS_UNNECESSARY_TRANSMUTES
def_bool RUSTC_VERSION >= 108800
config RUSTC_HAS_FILE_WITH_NUL
def_bool RUSTC_VERSION >= 108900
config PAHOLE_VERSION
int
default $(shell,$(srctree)/scripts/pahole-version.sh $(PAHOLE))
@ -875,6 +878,18 @@ config UCLAMP_BUCKETS_COUNT
If in doubt, use the default value.
config SCHED_PROXY_EXEC
bool "Proxy Execution"
# Avoid some build failures w/ PREEMPT_RT until it can be fixed
depends on !PREEMPT_RT
# Need to investigate how to inform sched_ext of split contexts
depends on !SCHED_CLASS_EXT
# Not particularly useful until we get to multi-rq proxying
depends on EXPERT
help
This option enables proxy execution, a mechanism for mutex-owning
tasks to inherit the scheduling context of higher priority waiters.
endmenu
#


@ -2127,9 +2127,8 @@ __latent_entropy struct task_struct *copy_process(
lockdep_init_task(p);
#endif
#ifdef CONFIG_DEBUG_MUTEXES
p->blocked_on = NULL; /* not blocked yet */
#endif
#ifdef CONFIG_BCACHE
p->sequential_io = 0;
p->sequential_io_avg = 0;


@ -53,17 +53,18 @@ void debug_mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter,
{
lockdep_assert_held(&lock->wait_lock);
/* Mark the current thread as blocked on the lock: */
task->blocked_on = waiter;
/* Current thread can't be already blocked (since it's executing!) */
DEBUG_LOCKS_WARN_ON(__get_task_blocked_on(task));
}
void debug_mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter,
struct task_struct *task)
{
struct mutex *blocked_on = __get_task_blocked_on(task);
DEBUG_LOCKS_WARN_ON(list_empty(&waiter->list));
DEBUG_LOCKS_WARN_ON(waiter->task != task);
DEBUG_LOCKS_WARN_ON(task->blocked_on != waiter);
task->blocked_on = NULL;
DEBUG_LOCKS_WARN_ON(blocked_on && blocked_on != lock);
INIT_LIST_HEAD(&waiter->list);
waiter->task = NULL;


@ -644,6 +644,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
goto err_early_kill;
}
__set_task_blocked_on(current, lock);
set_current_state(state);
trace_contention_begin(lock, LCB_F_MUTEX);
for (;;) {
@ -680,6 +681,12 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
first = __mutex_waiter_is_first(lock, &waiter);
/*
* As we likely have been woken up by task
* that has cleared our blocked_on state, re-set
* it to the lock we are trying to acquire.
*/
set_task_blocked_on(current, lock);
set_current_state(state);
/*
* Here we order against unlock; we must either see it change
@ -691,8 +698,15 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
if (first) {
trace_contention_begin(lock, LCB_F_MUTEX | LCB_F_SPIN);
/*
* mutex_optimistic_spin() can call schedule(), so
* clear blocked on so we don't become unselectable
* to run.
*/
clear_task_blocked_on(current, lock);
if (mutex_optimistic_spin(lock, ww_ctx, &waiter))
break;
set_task_blocked_on(current, lock);
trace_contention_begin(lock, LCB_F_MUTEX);
}
@ -700,6 +714,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
}
raw_spin_lock_irqsave(&lock->wait_lock, flags);
acquired:
__clear_task_blocked_on(current, lock);
__set_current_state(TASK_RUNNING);
if (ww_ctx) {
@ -729,9 +744,11 @@ skip_wait:
return 0;
err:
__clear_task_blocked_on(current, lock);
__set_current_state(TASK_RUNNING);
__mutex_remove_waiter(lock, &waiter);
err_early_kill:
WARN_ON(__get_task_blocked_on(current));
trace_contention_end(lock, ret);
raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags, &wake_q);
debug_mutex_free_waiter(&waiter);
@ -942,6 +959,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
next = waiter->task;
debug_mutex_wake_waiter(lock, waiter);
__clear_task_blocked_on(next, lock);
wake_q_add(&wake_q, next);
}


@ -6,7 +6,7 @@
*
* Copyright (C) 2004, 2005, 2006 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
*/
#ifndef CONFIG_PREEMPT_RT
/*
* This is the control structure for tasks blocked on mutex, which resides
* on the blocked task's kernel stack:
@ -70,3 +70,4 @@ extern void debug_mutex_init(struct mutex *lock, const char *name,
# define debug_mutex_unlock(lock) do { } while (0)
# define debug_mutex_init(lock, name, key) do { } while (0)
#endif /* !CONFIG_DEBUG_MUTEXES */
#endif /* CONFIG_PREEMPT_RT */


@ -284,6 +284,12 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
#ifndef WW_RT
debug_mutex_wake_waiter(lock, waiter);
#endif
/*
* When waking up the task to die, be sure to clear the
* blocked_on pointer. Otherwise we can see circular
* blocked_on relationships that can't resolve.
*/
__clear_task_blocked_on(waiter->task, lock);
wake_q_add(wake_q, waiter->task);
}
@ -331,9 +337,15 @@ static bool __ww_mutex_wound(struct MUTEX *lock,
* it's wounded in __ww_mutex_check_kill() or has a
* wakeup pending to re-read the wounded state.
*/
if (owner != current)
if (owner != current) {
/*
* When waking up the task to wound, be sure to clear the
* blocked_on pointer. Otherwise we can see circular
* blocked_on relationships that can't resolve.
*/
__clear_task_blocked_on(owner, lock);
wake_q_add(wake_q, owner);
}
return true;
}


@ -4,6 +4,9 @@
* Auto-group scheduling implementation:
*/
#include "autogroup.h"
#include "sched.h"
unsigned int __read_mostly sysctl_sched_autogroup_enabled = 1;
static struct autogroup autogroup_default;
static atomic_t autogroup_seq_nr;
@ -25,9 +28,9 @@ static void __init sched_autogroup_sysctl_init(void)
{
register_sysctl_init("kernel", sched_autogroup_sysctls);
}
#else
#else /* !CONFIG_SYSCTL: */
#define sched_autogroup_sysctl_init() do { } while (0)
#endif
#endif /* !CONFIG_SYSCTL */
void __init autogroup_init(struct task_struct *init_task)
{
@ -108,7 +111,7 @@ static inline struct autogroup *autogroup_create(void)
free_rt_sched_group(tg);
tg->rt_se = root_task_group.rt_se;
tg->rt_rq = root_task_group.rt_rq;
#endif
#endif /* CONFIG_RT_GROUP_SCHED */
tg->autogroup = ag;
sched_online_group(tg, &root_task_group);


@ -2,6 +2,8 @@
#ifndef _KERNEL_SCHED_AUTOGROUP_H
#define _KERNEL_SCHED_AUTOGROUP_H
#include "sched.h"
#ifdef CONFIG_SCHED_AUTOGROUP
struct autogroup {
@ -41,7 +43,7 @@ autogroup_task_group(struct task_struct *p, struct task_group *tg)
extern int autogroup_path(struct task_group *tg, char *buf, int buflen);
#else /* !CONFIG_SCHED_AUTOGROUP */
#else /* !CONFIG_SCHED_AUTOGROUP: */
static inline void autogroup_init(struct task_struct *init_task) { }
static inline void autogroup_free(struct task_group *tg) { }
@ -61,6 +63,6 @@ static inline int autogroup_path(struct task_group *tg, char *buf, int buflen)
return 0;
}
#endif /* CONFIG_SCHED_AUTOGROUP */
#endif /* !CONFIG_SCHED_AUTOGROUP */
#endif /* _KERNEL_SCHED_AUTOGROUP_H */


@ -50,11 +50,9 @@
#include "idle.c"
#include "rt.c"
#include "cpudeadline.c"
#ifdef CONFIG_SMP
# include "cpudeadline.c"
# include "pelt.c"
#endif
#include "pelt.c"
#include "cputime.c"
#include "deadline.c"


@ -80,11 +80,10 @@
#include "wait_bit.c"
#include "wait.c"
#ifdef CONFIG_SMP
# include "cpupri.c"
# include "stop_task.c"
# include "topology.c"
#endif
#include "cpupri.c"
#include "stop_task.c"
#include "topology.c"
#ifdef CONFIG_SCHED_CORE
# include "core_sched.c"


@ -54,6 +54,9 @@
*
*/
#include <linux/sched/clock.h>
#include "sched.h"
/*
* Scheduler clock - returns current time in nanosec units.
* This is default implementation.
@ -471,7 +474,7 @@ notrace void sched_clock_idle_wakeup_event(void)
}
EXPORT_SYMBOL_GPL(sched_clock_idle_wakeup_event);
#else /* CONFIG_HAVE_UNSTABLE_SCHED_CLOCK */
#else /* !CONFIG_HAVE_UNSTABLE_SCHED_CLOCK: */
void __init sched_clock_init(void)
{
@ -489,7 +492,7 @@ notrace u64 sched_clock_cpu(int cpu)
return sched_clock();
}
#endif /* CONFIG_HAVE_UNSTABLE_SCHED_CLOCK */
#endif /* !CONFIG_HAVE_UNSTABLE_SCHED_CLOCK */
/*
* Running clock - returns the time that has elapsed while a guest has been


@ -13,6 +13,11 @@
* Waiting for completion is a typically sync point, but not an exclusion point.
*/
#include <linux/linkage.h>
#include <linux/sched/debug.h>
#include <linux/completion.h>
#include "sched.h"
static void complete_with_flags(struct completion *x, int wake_flags)
{
unsigned long flags;

[File diff suppressed because it is too large]


@ -4,6 +4,8 @@
* A simple wrapper around refcount. An allocated sched_core_cookie's
* address is used to compute the cookie of the task.
*/
#include "sched.h"
struct sched_core_cookie {
refcount_t refcnt;
};


@ -6,6 +6,8 @@
* Based on the work by Paul Menage (menage@google.com) and Balbir Singh
* (balbir@in.ibm.com).
*/
#include <linux/sched/cputime.h>
#include "sched.h"
/* Time spent by the tasks of the CPU accounting group executing in ... */
enum cpuacct_stat_index {


@ -6,6 +6,7 @@
*
* Author: Juri Lelli <j.lelli@sssup.it>
*/
#include "sched.h"
static inline int parent(int i)
{


@ -1,4 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0 */
#include <linux/types.h>
#include <linux/spinlock.h>
#define IDX_INVALID -1
@ -15,7 +17,6 @@ struct cpudl {
struct cpudl_item *elements;
};
#ifdef CONFIG_SMP
int cpudl_find(struct cpudl *cp, struct task_struct *p, struct cpumask *later_mask);
void cpudl_set(struct cpudl *cp, int cpu, u64 dl);
void cpudl_clear(struct cpudl *cp, int cpu);
@ -23,4 +24,3 @@ int cpudl_init(struct cpudl *cp);
void cpudl_set_freecpu(struct cpudl *cp, int cpu);
void cpudl_clear_freecpu(struct cpudl *cp, int cpu);
void cpudl_cleanup(struct cpudl *cp);
#endif /* CONFIG_SMP */


@ -5,6 +5,7 @@
* Copyright (C) 2016, Intel Corporation
* Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*/
#include "sched.h"
DEFINE_PER_CPU(struct update_util_data __rcu *, cpufreq_update_util_data);


@ -5,6 +5,8 @@
* Copyright (C) 2016, Intel Corporation
* Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
*/
#include <uapi/linux/sched/types.h>
#include "sched.h"
#define IOWAIT_BOOST_MIN (SCHED_CAPACITY_SCALE / 8)
@ -380,9 +382,9 @@ static bool sugov_hold_freq(struct sugov_cpu *sg_cpu)
sg_cpu->saved_idle_calls = idle_calls;
return ret;
}
#else
#else /* !CONFIG_NO_HZ_COMMON: */
static inline bool sugov_hold_freq(struct sugov_cpu *sg_cpu) { return false; }
#endif /* CONFIG_NO_HZ_COMMON */
#endif /* !CONFIG_NO_HZ_COMMON */
/*
* Make sugov_should_update_freq() ignore the rate limit when DL


@ -22,6 +22,7 @@
* worst case complexity of O(min(101, nr_domcpus)), though the scenario that
* yields the worst case search is fairly contrived.
*/
#include "sched.h"
/*
* p->rt_priority p->prio newpri cpupri


@ -1,4 +1,7 @@
/* SPDX-License-Identifier: GPL-2.0 */
#include <linux/atomic.h>
#include <linux/cpumask.h>
#include <linux/sched/rt.h>
#define CPUPRI_NR_PRIORITIES (MAX_RT_PRIO+1)
@ -17,7 +20,6 @@ struct cpupri {
int *cpu_to_pri;
};
#ifdef CONFIG_SMP
int cpupri_find(struct cpupri *cp, struct task_struct *p,
struct cpumask *lowest_mask);
int cpupri_find_fitness(struct cpupri *cp, struct task_struct *p,
@ -26,4 +28,3 @@ int cpupri_find_fitness(struct cpupri *cp, struct task_struct *p,
void cpupri_set(struct cpupri *cp, int cpu, int pri);
int cpupri_init(struct cpupri *cp);
void cpupri_cleanup(struct cpupri *cp);
#endif


@ -2,6 +2,9 @@
/*
* Simple CPU accounting cgroup controller
*/
#include <linux/sched/cputime.h>
#include <linux/tsacct_kern.h>
#include "sched.h"
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
#include <asm/cputime.h>
@ -88,7 +91,7 @@ static u64 irqtime_tick_accounted(u64 maxtime)
return delta;
}
#else /* CONFIG_IRQ_TIME_ACCOUNTING */
#else /* !CONFIG_IRQ_TIME_ACCOUNTING: */
static u64 irqtime_tick_accounted(u64 dummy)
{
@ -241,7 +244,7 @@ void __account_forceidle_time(struct task_struct *p, u64 delta)
task_group_account_field(p, CPUTIME_FORCEIDLE, delta);
}
#endif
#endif /* CONFIG_SCHED_CORE */
/*
* When a guest is interrupted for a longer amount of time, missed clock
@ -262,7 +265,7 @@ static __always_inline u64 steal_account_process_time(u64 maxtime)
return steal;
}
#endif
#endif /* CONFIG_PARAVIRT */
return 0;
}
@ -288,7 +291,7 @@ static inline u64 read_sum_exec_runtime(struct task_struct *t)
{
return t->se.sum_exec_runtime;
}
#else
#else /* !CONFIG_64BIT: */
static u64 read_sum_exec_runtime(struct task_struct *t)
{
u64 ns;
@ -301,7 +304,7 @@ static u64 read_sum_exec_runtime(struct task_struct *t)
return ns;
}
#endif
#endif /* !CONFIG_64BIT */
/*
* Accumulate raw cputime values of dead tasks (sig->[us]time) and live
@ -411,11 +414,11 @@ static void irqtime_account_idle_ticks(int ticks)
{
irqtime_account_process_tick(current, 0, ticks);
}
#else /* CONFIG_IRQ_TIME_ACCOUNTING */
#else /* !CONFIG_IRQ_TIME_ACCOUNTING: */
static inline void irqtime_account_idle_ticks(int ticks) { }
static inline void irqtime_account_process_tick(struct task_struct *p, int user_tick,
int nr_ticks) { }
#endif /* CONFIG_IRQ_TIME_ACCOUNTING */
#endif /* !CONFIG_IRQ_TIME_ACCOUNTING */
/*
* Use precise platform statistics if available:


@ -17,6 +17,10 @@
*/
#include <linux/cpuset.h>
#include <linux/sched/clock.h>
#include <uapi/linux/sched/types.h>
#include "sched.h"
#include "pelt.h"
/*
* Default limits for DL period; on the top end we guard against small util
@ -51,7 +55,7 @@ static int __init sched_dl_sysctl_init(void)
return 0;
}
late_initcall(sched_dl_sysctl_init);
#endif
#endif /* CONFIG_SYSCTL */
static bool dl_server(struct sched_dl_entity *dl_se)
{
@ -99,7 +103,7 @@ static inline bool is_dl_boosted(struct sched_dl_entity *dl_se)
{
return pi_of(dl_se) != dl_se;
}
#else
#else /* !CONFIG_RT_MUTEXES: */
static inline struct sched_dl_entity *pi_of(struct sched_dl_entity *dl_se)
{
return dl_se;
@ -109,9 +113,8 @@ static inline bool is_dl_boosted(struct sched_dl_entity *dl_se)
{
return false;
}
#endif
#endif /* !CONFIG_RT_MUTEXES */
#ifdef CONFIG_SMP
static inline struct dl_bw *dl_bw_of(int i)
{
RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held(),
@ -191,35 +194,6 @@ void __dl_update(struct dl_bw *dl_b, s64 bw)
rq->dl.extra_bw += bw;
}
}
#else
static inline struct dl_bw *dl_bw_of(int i)
{
return &cpu_rq(i)->dl.dl_bw;
}
static inline int dl_bw_cpus(int i)
{
return 1;
}
static inline unsigned long dl_bw_capacity(int i)
{
return SCHED_CAPACITY_SCALE;
}
bool dl_bw_visited(int cpu, u64 cookie)
{
return false;
}
static inline
void __dl_update(struct dl_bw *dl_b, s64 bw)
{
struct dl_rq *dl = container_of(dl_b, struct dl_rq, dl_bw);
dl->extra_bw += bw;
}
#endif
static inline
void __dl_sub(struct dl_bw *dl_b, u64 tsk_bw, int cpus)
@ -552,23 +526,17 @@ void init_dl_rq(struct dl_rq *dl_rq)
{
dl_rq->root = RB_ROOT_CACHED;
#ifdef CONFIG_SMP
/* zero means no -deadline tasks */
dl_rq->earliest_dl.curr = dl_rq->earliest_dl.next = 0;
dl_rq->overloaded = 0;
dl_rq->pushable_dl_tasks_root = RB_ROOT_CACHED;
#else
init_dl_bw(&dl_rq->dl_bw);
#endif
dl_rq->running_bw = 0;
dl_rq->this_bw = 0;
init_dl_rq_bw_ratio(dl_rq);
}
#ifdef CONFIG_SMP
static inline int dl_overloaded(struct rq *rq)
{
return atomic_read(&rq->rd->dlo_count);
@ -753,37 +721,6 @@ static struct rq *dl_task_offline_migration(struct rq *rq, struct task_struct *p
return later_rq;
}
#else
static inline
void enqueue_pushable_dl_task(struct rq *rq, struct task_struct *p)
{
}
static inline
void dequeue_pushable_dl_task(struct rq *rq, struct task_struct *p)
{
}
static inline
void inc_dl_migration(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
{
}
static inline
void dec_dl_migration(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
{
}
static inline void deadline_queue_push_tasks(struct rq *rq)
{
}
static inline void deadline_queue_pull_task(struct rq *rq)
{
}
#endif /* CONFIG_SMP */
static void
enqueue_dl_entity(struct sched_dl_entity *dl_se, int flags);
static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags);
@ -824,6 +761,8 @@ static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se)
struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
struct rq *rq = rq_of_dl_rq(dl_rq);
update_rq_clock(rq);
WARN_ON(is_dl_boosted(dl_se));
WARN_ON(dl_time_before(rq_clock(rq), dl_se->deadline));
@ -1195,7 +1134,6 @@ static int start_dl_timer(struct sched_dl_entity *dl_se)
static void __push_dl_task(struct rq *rq, struct rq_flags *rf)
{
#ifdef CONFIG_SMP
/*
* Queueing this task back might have overloaded rq, check if we need
* to kick someone away.
@ -1209,12 +1147,13 @@ static void __push_dl_task(struct rq *rq, struct rq_flags *rf)
push_dl_task(rq);
rq_repin_lock(rq, rf);
}
#endif
}
/* a defer timer will not be reset if the runtime consumed was < dl_server_min_res */
static const u64 dl_server_min_res = 1 * NSEC_PER_MSEC;
static bool dl_server_stopped(struct sched_dl_entity *dl_se);
static enum hrtimer_restart dl_server_timer(struct hrtimer *timer, struct sched_dl_entity *dl_se)
{
struct rq *rq = rq_of_dl_se(dl_se);
@ -1234,6 +1173,7 @@ static enum hrtimer_restart dl_server_timer(struct hrtimer *timer, struct sched_
if (!dl_se->server_has_tasks(dl_se)) {
replenish_dl_entity(dl_se);
dl_server_stopped(dl_se);
return HRTIMER_NORESTART;
}
@ -1339,7 +1279,6 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
goto unlock;
}
#ifdef CONFIG_SMP
if (unlikely(!rq->online)) {
/*
* If the runqueue is no longer available, migrate the
@ -1356,7 +1295,6 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
* there.
*/
}
#endif
enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
if (dl_task(rq->donor))
@ -1600,7 +1538,7 @@ throttle:
rt_rq->rt_time += delta_exec;
raw_spin_unlock(&rt_rq->rt_runtime_lock);
}
#endif
#endif /* CONFIG_RT_GROUP_SCHED */
}
/*
@ -1639,31 +1577,17 @@ void dl_server_update_idle_time(struct rq *rq, struct task_struct *p)
void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec)
{
/* 0 runtime = fair server disabled */
if (dl_se->dl_runtime)
if (dl_se->dl_runtime) {
dl_se->dl_server_idle = 0;
update_curr_dl_se(dl_se->rq, dl_se, delta_exec);
}
}
void dl_server_start(struct sched_dl_entity *dl_se)
{
struct rq *rq = dl_se->rq;
/*
* XXX: the apply do not work fine at the init phase for the
* fair server because things are not yet set. We need to improve
* this before getting generic.
*/
if (!dl_server(dl_se)) {
u64 runtime = 50 * NSEC_PER_MSEC;
u64 period = 1000 * NSEC_PER_MSEC;
dl_server_apply_params(dl_se, runtime, period, 1);
dl_se->dl_server = 1;
dl_se->dl_defer = 1;
setup_new_dl_entity(dl_se);
}
if (!dl_se->dl_runtime)
if (!dl_server(dl_se) || dl_se->dl_server_active)
return;
dl_se->dl_server_active = 1;
@ -1674,7 +1598,7 @@ void dl_server_start(struct sched_dl_entity *dl_se)
void dl_server_stop(struct sched_dl_entity *dl_se)
{
if (!dl_se->dl_runtime)
if (!dl_server(dl_se) || !dl_server_active(dl_se))
return;
dequeue_dl_entity(dl_se, DEQUEUE_SLEEP);
@ -1684,6 +1608,20 @@ void dl_server_stop(struct sched_dl_entity *dl_se)
dl_se->dl_server_active = 0;
}
static bool dl_server_stopped(struct sched_dl_entity *dl_se)
{
if (!dl_se->dl_server_active)
return false;
if (dl_se->dl_server_idle) {
dl_server_stop(dl_se);
return true;
}
dl_se->dl_server_idle = 1;
return false;
}
void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
dl_server_has_tasks_f has_tasks,
dl_server_pick_f pick_task)
@ -1693,6 +1631,32 @@ void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
dl_se->server_pick_task = pick_task;
}
void sched_init_dl_servers(void)
{
int cpu;
struct rq *rq;
struct sched_dl_entity *dl_se;
for_each_online_cpu(cpu) {
u64 runtime = 50 * NSEC_PER_MSEC;
u64 period = 1000 * NSEC_PER_MSEC;
rq = cpu_rq(cpu);
guard(rq_lock_irq)(rq);
dl_se = &rq->fair_server;
WARN_ON(dl_server(dl_se));
dl_server_apply_params(dl_se, runtime, period, 1);
dl_se->dl_server = 1;
dl_se->dl_defer = 1;
setup_new_dl_entity(dl_se);
}
}
void __dl_server_attach_root(struct sched_dl_entity *dl_se, struct rq *rq)
{
u64 new_bw = dl_se->dl_bw;
@ -1844,8 +1808,6 @@ static void init_dl_inactive_task_timer(struct sched_dl_entity *dl_se)
#define __node_2_dle(node) \
rb_entry((node), struct sched_dl_entity, rb_node)
#ifdef CONFIG_SMP
static void inc_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
{
struct rq *rq = rq_of_dl_rq(dl_rq);
@ -1881,13 +1843,6 @@ static void dec_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
}
}
#else
static inline void inc_dl_deadline(struct dl_rq *dl_rq, u64 deadline) {}
static inline void dec_dl_deadline(struct dl_rq *dl_rq, u64 deadline) {}
#endif /* CONFIG_SMP */
static inline
void inc_dl_tasks(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
{
@ -2166,6 +2121,9 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
if (dl_server(&p->dl))
return;
if (task_is_blocked(p))
return;
if (!task_current(rq, p) && !p->dl.dl_throttled && p->nr_cpus_allowed > 1)
enqueue_pushable_dl_task(rq, p);
}
@ -2214,8 +2172,6 @@ static void yield_task_dl(struct rq *rq)
rq_clock_skip_update(rq);
}
#ifdef CONFIG_SMP
static inline bool dl_task_is_earliest_deadline(struct task_struct *p,
struct rq *rq)
{
@ -2345,7 +2301,6 @@ static int balance_dl(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
return sched_stop_runnable(rq) || sched_dl_runnable(rq);
}
#endif /* CONFIG_SMP */
/*
* Only called when both the current and waking task are -deadline
@ -2359,7 +2314,6 @@ static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p,
return;
}
#ifdef CONFIG_SMP
/*
* In the unlikely case current and p have the same deadline
* let us try to decide what's the best thing to do...
@ -2367,7 +2321,6 @@ static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p,
if ((p->dl.deadline == rq->donor->dl.deadline) &&
!test_tsk_need_resched(rq->curr))
check_preempt_equal_dl(rq, p);
#endif /* CONFIG_SMP */
}
#ifdef CONFIG_SCHED_HRTICK
@ -2375,11 +2328,11 @@ static void start_hrtick_dl(struct rq *rq, struct sched_dl_entity *dl_se)
{
hrtick_start(rq, dl_se->runtime);
}
#else /* !CONFIG_SCHED_HRTICK */
#else /* !CONFIG_SCHED_HRTICK: */
static void start_hrtick_dl(struct rq *rq, struct sched_dl_entity *dl_se)
{
}
#endif
#endif /* !CONFIG_SCHED_HRTICK */
static void set_next_task_dl(struct rq *rq, struct task_struct *p, bool first)
{
@ -2435,7 +2388,7 @@ again:
if (dl_server(dl_se)) {
p = dl_se->server_pick_task(dl_se);
if (!p) {
if (dl_server_active(dl_se)) {
if (!dl_server_stopped(dl_se)) {
dl_se->dl_yielded = 1;
update_curr_dl_se(rq, dl_se, 0);
}
@ -2465,6 +2418,10 @@ static void put_prev_task_dl(struct rq *rq, struct task_struct *p, struct task_s
update_curr_dl(rq);
update_dl_rq_load_avg(rq_clock_pelt(rq), rq, 1);
if (task_is_blocked(p))
return;
if (on_dl_rq(&p->dl) && p->nr_cpus_allowed > 1)
enqueue_pushable_dl_task(rq, p);
}
@ -2500,8 +2457,6 @@ static void task_fork_dl(struct task_struct *p)
*/
}
#ifdef CONFIG_SMP
/* Only try algorithms three times */
#define DL_MAX_TRIES 3
@ -2976,7 +2931,14 @@ void dl_clear_root_domain(struct root_domain *rd)
int i;
guard(raw_spinlock_irqsave)(&rd->dl_bw.lock);
/*
* Reset total_bw to zero and extra_bw to max_bw so that next
* loop will add dl-servers contributions back properly,
*/
rd->dl_bw.total_bw = 0;
for_each_cpu(i, rd->span)
cpu_rq(i)->dl.extra_bw = cpu_rq(i)->dl.max_bw;
/*
* dl_servers are not tasks. Since dl_add_task_root_domain ignores
@ -2995,8 +2957,6 @@ void dl_clear_root_domain_cpu(int cpu)
dl_clear_root_domain(cpu_rq(cpu)->rd);
}
#endif /* CONFIG_SMP */
static void switched_from_dl(struct rq *rq, struct task_struct *p)
{
/*
@ -3069,10 +3029,8 @@ static void switched_to_dl(struct rq *rq, struct task_struct *p)
}
if (rq->donor != p) {
#ifdef CONFIG_SMP
if (p->nr_cpus_allowed > 1 && rq->dl.overloaded)
deadline_queue_push_tasks(rq);
#endif
if (dl_task(rq->donor))
wakeup_preempt_dl(rq, p, 0);
else
@ -3092,7 +3050,6 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
if (!task_on_rq_queued(p))
return;
#ifdef CONFIG_SMP
/*
* This might be too much, but unfortunately
* we don't have the old deadline value, and
@ -3121,13 +3078,6 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
dl_time_before(p->dl.deadline, rq->curr->dl.deadline))
resched_curr(rq);
}
#else
/*
* We don't know if p has a earlier or later deadline, so let's blindly
* set a (maybe not needed) rescheduling point.
*/
resched_curr(rq);
#endif
}
#ifdef CONFIG_SCHED_CORE
@ -3149,7 +3099,6 @@ DEFINE_SCHED_CLASS(dl) = {
.put_prev_task = put_prev_task_dl,
.set_next_task = set_next_task_dl,
#ifdef CONFIG_SMP
.balance = balance_dl,
.select_task_rq = select_task_rq_dl,
.migrate_task_rq = migrate_task_rq_dl,
@ -3158,7 +3107,6 @@ DEFINE_SCHED_CLASS(dl) = {
.rq_offline = rq_offline_dl,
.task_woken = task_woken_dl,
.find_lock_rq = find_lock_later_rq,
#endif
.task_tick = task_tick_dl,
.task_fork = task_fork_dl,
@ -3242,6 +3190,9 @@ void sched_dl_do_global(void)
if (global_rt_runtime() != RUNTIME_INF)
new_bw = to_ratio(global_rt_period(), global_rt_runtime());
for_each_possible_cpu(cpu)
init_dl_rq_bw_ratio(&cpu_rq(cpu)->dl);
for_each_possible_cpu(cpu) {
rcu_read_lock_sched();
@ -3257,7 +3208,6 @@ void sched_dl_do_global(void)
raw_spin_unlock_irqrestore(&dl_b->lock, flags);
rcu_read_unlock_sched();
init_dl_rq_bw_ratio(&cpu_rq(cpu)->dl);
}
}
@ -3458,7 +3408,6 @@ bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr)
return false;
}
#ifdef CONFIG_SMP
int dl_cpuset_cpumask_can_shrink(const struct cpumask *cur,
const struct cpumask *trial)
{
@ -3570,7 +3519,6 @@ void dl_bw_free(int cpu, u64 dl_bw)
{
dl_bw_manage(dl_bw_req_free, cpu, dl_bw);
}
#endif
void print_dl_stats(struct seq_file *m, int cpu)
{


@ -6,6 +6,9 @@
*
* Copyright(C) 2007, Red Hat, Inc., Ingo Molnar
*/
#include <linux/debugfs.h>
#include <linux/nmi.h>
#include "sched.h"
/*
* This allows printing both to /sys/kernel/debug/sched/debug and
@ -90,10 +93,10 @@ static void sched_feat_enable(int i)
{
static_key_enable_cpuslocked(&sched_feat_keys[i]);
}
#else
#else /* !CONFIG_JUMP_LABEL: */
static void sched_feat_disable(int i) { };
static void sched_feat_enable(int i) { };
#endif /* CONFIG_JUMP_LABEL */
#endif /* !CONFIG_JUMP_LABEL */
static int sched_feat_set(char *cmp)
{
@ -166,8 +169,6 @@ static const struct file_operations sched_feat_fops = {
.release = single_release,
};
#ifdef CONFIG_SMP
static ssize_t sched_scaling_write(struct file *filp, const char __user *ubuf,
size_t cnt, loff_t *ppos)
{
@ -214,8 +215,6 @@ static const struct file_operations sched_scaling_fops = {
.release = single_release,
};
#endif /* SMP */
#ifdef CONFIG_PREEMPT_DYNAMIC
static ssize_t sched_dynamic_write(struct file *filp, const char __user *ubuf,
@ -283,7 +282,6 @@ static const struct file_operations sched_dynamic_fops = {
__read_mostly bool sched_debug_verbose;
#ifdef CONFIG_SMP
static struct dentry *sd_dentry;
@ -311,9 +309,6 @@ static ssize_t sched_verbose_write(struct file *filp, const char __user *ubuf,
return result;
}
#else
#define sched_verbose_write debugfs_write_file_bool
#endif
static const struct file_operations sched_verbose_fops = {
.read = debugfs_read_file_bool,
@ -512,7 +507,6 @@ static __init int sched_init_debug(void)
debugfs_create_u32("latency_warn_ms", 0644, debugfs_sched, &sysctl_resched_latency_warn_ms);
debugfs_create_u32("latency_warn_once", 0644, debugfs_sched, &sysctl_resched_latency_warn_once);
#ifdef CONFIG_SMP
debugfs_create_file("tunable_scaling", 0644, debugfs_sched, NULL, &sched_scaling_fops);
debugfs_create_u32("migration_cost_ns", 0644, debugfs_sched, &sysctl_sched_migration_cost);
debugfs_create_u32("nr_migrate", 0644, debugfs_sched, &sysctl_sched_nr_migrate);
@ -520,7 +514,6 @@ static __init int sched_init_debug(void)
sched_domains_mutex_lock();
update_sched_domain_debugfs();
sched_domains_mutex_unlock();
#endif
#ifdef CONFIG_NUMA_BALANCING
numa = debugfs_create_dir("numa_balancing", debugfs_sched);
@ -530,7 +523,7 @@ static __init int sched_init_debug(void)
debugfs_create_u32("scan_period_max_ms", 0644, numa, &sysctl_numa_balancing_scan_period_max);
debugfs_create_u32("scan_size_mb", 0644, numa, &sysctl_numa_balancing_scan_size);
debugfs_create_u32("hot_threshold_ms", 0644, numa, &sysctl_numa_balancing_hot_threshold);
#endif
#endif /* CONFIG_NUMA_BALANCING */
debugfs_create_file("debug", 0444, debugfs_sched, NULL, &sched_debug_fops);
@ -540,8 +533,6 @@ static __init int sched_init_debug(void)
}
late_initcall(sched_init_debug);
#ifdef CONFIG_SMP
static cpumask_var_t sd_sysctl_cpus;
static int sd_flags_show(struct seq_file *m, void *v)
@ -652,8 +643,6 @@ void dirty_sched_domain_sysctl(int cpu)
__cpumask_set_cpu(cpu, sd_sysctl_cpus);
}
#endif /* CONFIG_SMP */
#ifdef CONFIG_FAIR_GROUP_SCHED
static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group *tg)
{
@ -690,18 +679,16 @@ static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group
}
P(se->load.weight);
#ifdef CONFIG_SMP
P(se->avg.load_avg);
P(se->avg.util_avg);
P(se->avg.runnable_avg);
#endif
#undef PN_SCHEDSTAT
#undef PN
#undef P_SCHEDSTAT
#undef P
}
#endif
#endif /* CONFIG_FAIR_GROUP_SCHED */
#ifdef CONFIG_CGROUP_SCHED
static DEFINE_SPINLOCK(sched_debug_lock);
@ -854,7 +841,6 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
SEQ_printf(m, " .%-30s: %d\n", "h_nr_queued", cfs_rq->h_nr_queued);
SEQ_printf(m, " .%-30s: %d\n", "h_nr_idle", cfs_rq->h_nr_idle);
SEQ_printf(m, " .%-30s: %ld\n", "load", cfs_rq->load.weight);
#ifdef CONFIG_SMP
SEQ_printf(m, " .%-30s: %lu\n", "load_avg",
cfs_rq->avg.load_avg);
SEQ_printf(m, " .%-30s: %lu\n", "runnable_avg",
@ -874,8 +860,7 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
cfs_rq->tg_load_avg_contrib);
SEQ_printf(m, " .%-30s: %ld\n", "tg_load_avg",
atomic_long_read(&cfs_rq->tg->load_avg));
#endif
#endif
#endif /* CONFIG_FAIR_GROUP_SCHED */
#ifdef CONFIG_CFS_BANDWIDTH
SEQ_printf(m, " .%-30s: %d\n", "throttled",
cfs_rq->throttled);
@ -929,11 +914,7 @@ void print_dl_rq(struct seq_file *m, int cpu, struct dl_rq *dl_rq)
SEQ_printf(m, " .%-30s: %lu\n", #x, (unsigned long)(dl_rq->x))
PU(dl_nr_running);
#ifdef CONFIG_SMP
dl_bw = &cpu_rq(cpu)->rd->dl_bw;
#else
dl_bw = &dl_rq->dl_bw;
#endif
SEQ_printf(m, " .%-30s: %lld\n", "dl_bw->bw", dl_bw->bw);
SEQ_printf(m, " .%-30s: %lld\n", "dl_bw->total_bw", dl_bw->total_bw);
@ -951,9 +932,9 @@ static void print_cpu(struct seq_file *m, int cpu)
SEQ_printf(m, "cpu#%d, %u.%03u MHz\n",
cpu, freq / 1000, (freq % 1000));
}
#else
#else /* !CONFIG_X86: */
SEQ_printf(m, "cpu#%d\n", cpu);
#endif
#endif /* !CONFIG_X86 */
#define P(x) \
do { \
@ -976,12 +957,10 @@ do { \
#undef P
#undef PN
#ifdef CONFIG_SMP
#define P64(n) SEQ_printf(m, " .%-30s: %Ld\n", #n, rq->n);
P64(avg_idle);
P64(max_idle_balance_cost);
#undef P64
#endif
#define P(n) SEQ_printf(m, " .%-30s: %d\n", #n, schedstat_val(rq->n));
if (schedstat_enabled()) {
@ -1163,7 +1142,7 @@ static void sched_show_numa(struct task_struct *p, struct seq_file *m)
SEQ_printf(m, "current_node=%d, numa_group_id=%d\n",
task_node(p), task_numa_group_id(p));
show_numa_stats(p, m);
#endif
#endif /* CONFIG_NUMA_BALANCING */
}
void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
@ -1247,7 +1226,6 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
__PS("nr_involuntary_switches", p->nivcsw);
P(se.load.weight);
#ifdef CONFIG_SMP
P(se.avg.load_sum);
P(se.avg.runnable_sum);
P(se.avg.util_sum);
@ -1256,13 +1234,12 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
P(se.avg.util_avg);
P(se.avg.last_update_time);
PM(se.avg.util_est, ~UTIL_AVG_UNCHANGED);
#endif
#ifdef CONFIG_UCLAMP_TASK
__PS("uclamp.min", p->uclamp_req[UCLAMP_MIN].value);
__PS("uclamp.max", p->uclamp_req[UCLAMP_MAX].value);
__PS("effective uclamp.min", uclamp_eff_value(p, UCLAMP_MIN));
__PS("effective uclamp.max", uclamp_eff_value(p, UCLAMP_MAX));
#endif
#endif /* CONFIG_UCLAMP_TASK */
P(policy);
P(prio);
if (task_has_dl_policy(p)) {

[File diff suppressed because it is too large]


@ -6,6 +6,11 @@
* (NOTE: these are not related to SCHED_IDLE batch scheduled
* tasks which are handled in sched/fair.c )
*/
#include <linux/cpuidle.h>
#include <linux/suspend.h>
#include <linux/livepatch.h>
#include "sched.h"
#include "smp.h"
/* Linker adds these: start and end of __cpuidle functions */
extern char __cpuidle_text_start[], __cpuidle_text_end[];
@ -47,7 +52,7 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
return 1;
}
__setup("hlt", cpu_idle_nopoll_setup);
#endif
#endif /* CONFIG_GENERIC_IDLE_POLL_SETUP */
static noinline int __cpuidle cpu_idle_poll(void)
{
@ -95,10 +100,10 @@ static inline void cond_tick_broadcast_exit(void)
if (static_branch_unlikely(&arch_needs_tick_broadcast))
tick_broadcast_exit();
}
#else
#else /* !CONFIG_GENERIC_CLOCKEVENTS_BROADCAST_IDLE: */
static inline void cond_tick_broadcast_enter(void) { }
static inline void cond_tick_broadcast_exit(void) { }
#endif
#endif /* !CONFIG_GENERIC_CLOCKEVENTS_BROADCAST_IDLE */
/**
* default_idle_call - Default CPU idle routine.
@ -427,7 +432,6 @@ void cpu_startup_entry(enum cpuhp_state state)
* idle-task scheduling class.
*/
#ifdef CONFIG_SMP
static int
select_task_rq_idle(struct task_struct *p, int cpu, int flags)
{
@ -439,7 +443,6 @@ balance_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
{
return WARN_ON_ONCE(1);
}
#endif
/*
* Idle tasks are unconditionally rescheduled:
@ -526,11 +529,9 @@ DEFINE_SCHED_CLASS(idle) = {
.put_prev_task = put_prev_task_idle,
.set_next_task = set_next_task_idle,
#ifdef CONFIG_SMP
.balance = balance_idle,
.select_task_rq = select_task_rq_idle,
.set_cpus_allowed = set_cpus_allowed_common,
#endif
.task_tick = task_tick_idle,


@ -7,6 +7,8 @@
* Copyright (C) 2017-2018 SUSE, Frederic Weisbecker
*
*/
#include <linux/sched/isolation.h>
#include "sched.h"
enum hk_flags {
HK_FLAG_DOMAIN = BIT(HK_TYPE_DOMAIN),


@ -6,6 +6,8 @@
* figure. It's a silly number but people think it's important. We go through
* great pains to make it work on big machines and tickless kernels.
*/
#include <linux/sched/nohz.h>
#include "sched.h"
/*
* Global load-average calculations
@ -333,12 +335,12 @@ static void calc_global_nohz(void)
smp_wmb();
calc_load_idx++;
}
#else /* !CONFIG_NO_HZ_COMMON */
#else /* !CONFIG_NO_HZ_COMMON: */
static inline long calc_load_nohz_read(void) { return 0; }
static inline void calc_global_nohz(void) { }
#endif /* CONFIG_NO_HZ_COMMON */
#endif /* !CONFIG_NO_HZ_COMMON */
/*
* calc_load - update the avenrun load estimates 10 ticks after the


@ -4,6 +4,8 @@
*
* membarrier system call
*/
#include <uapi/linux/membarrier.h>
#include "sched.h"
/*
* For documentation purposes, here are some membarrier ordering


@ -23,6 +23,7 @@
* Move PELT related code from fair.c into this pelt.c file
* Author: Vincent Guittot <vincent.guittot@linaro.org>
*/
#include "pelt.h"
/*
* Approximate:
@ -413,7 +414,7 @@ int update_hw_load_avg(u64 now, struct rq *rq, u64 capacity)
return 0;
}
#endif
#endif /* CONFIG_SCHED_HW_PRESSURE */
#ifdef CONFIG_HAVE_SCHED_AVG_IRQ
/*
@ -466,7 +467,7 @@ int update_irq_load_avg(struct rq *rq, u64 running)
return ret;
}
#endif
#endif /* CONFIG_HAVE_SCHED_AVG_IRQ */
/*
* Load avg and utilization metrics need to be updated periodically and before


@ -1,4 +1,8 @@
#ifdef CONFIG_SMP
// SPDX-License-Identifier: GPL-2.0
#ifndef _KERNEL_SCHED_PELT_H
#define _KERNEL_SCHED_PELT_H
#include "sched.h"
#include "sched-pelt.h"
int __update_load_avg_blocked_se(u64 now, struct sched_entity *se);
@ -15,7 +19,7 @@ static inline u64 hw_load_avg(struct rq *rq)
{
return READ_ONCE(rq->avg_hw.load_avg);
}
#else
#else /* !CONFIG_SCHED_HW_PRESSURE: */
static inline int
update_hw_load_avg(u64 now, struct rq *rq, u64 capacity)
{
@ -26,7 +30,7 @@ static inline u64 hw_load_avg(struct rq *rq)
{
return 0;
}
#endif
#endif /* !CONFIG_SCHED_HW_PRESSURE */
#ifdef CONFIG_HAVE_SCHED_AVG_IRQ
int update_irq_load_avg(struct rq *rq, u64 running);
@ -174,63 +178,12 @@ static inline u64 cfs_rq_clock_pelt(struct cfs_rq *cfs_rq)
return rq_clock_pelt(rq_of(cfs_rq)) - cfs_rq->throttled_clock_pelt_time;
}
#else
#else /* !CONFIG_CFS_BANDWIDTH: */
static inline void update_idle_cfs_rq_clock_pelt(struct cfs_rq *cfs_rq) { }
static inline u64 cfs_rq_clock_pelt(struct cfs_rq *cfs_rq)
{
return rq_clock_pelt(rq_of(cfs_rq));
}
#endif
#else
static inline int
update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
{
return 0;
}
static inline int
update_rt_rq_load_avg(u64 now, struct rq *rq, int running)
{
return 0;
}
static inline int
update_dl_rq_load_avg(u64 now, struct rq *rq, int running)
{
return 0;
}
static inline int
update_hw_load_avg(u64 now, struct rq *rq, u64 capacity)
{
return 0;
}
static inline u64 hw_load_avg(struct rq *rq)
{
return 0;
}
static inline int
update_irq_load_avg(struct rq *rq, u64 running)
{
return 0;
}
static inline u64 rq_clock_pelt(struct rq *rq)
{
return rq_clock_task(rq);
}
static inline void
update_rq_clock_pelt(struct rq *rq, s64 delta) { }
static inline void
update_idle_rq_clock_pelt(struct rq *rq) { }
static inline void update_idle_cfs_rq_clock_pelt(struct cfs_rq *cfs_rq) { }
#endif
#endif /* !CONFIG_CFS_BANDWIDTH */
#endif /* _KERNEL_SCHED_PELT_H */


@ -136,6 +136,10 @@
* cost-wise, yet way more sensitive and accurate than periodic
* sampling of the aggregate task states would be.
*/
#include <linux/sched/clock.h>
#include <linux/workqueue.h>
#include <linux/psi.h>
#include "sched.h"
static int psi_bug __read_mostly;
@ -172,6 +176,28 @@ struct psi_group psi_system = {
.pcpu = &system_group_pcpu,
};
static DEFINE_PER_CPU(seqcount_t, psi_seq);
static inline void psi_write_begin(int cpu)
{
write_seqcount_begin(per_cpu_ptr(&psi_seq, cpu));
}
static inline void psi_write_end(int cpu)
{
write_seqcount_end(per_cpu_ptr(&psi_seq, cpu));
}
static inline u32 psi_read_begin(int cpu)
{
return read_seqcount_begin(per_cpu_ptr(&psi_seq, cpu));
}
static inline bool psi_read_retry(int cpu, u32 seq)
{
return read_seqcount_retry(per_cpu_ptr(&psi_seq, cpu), seq);
}
static void psi_avgs_work(struct work_struct *work);
static void poll_timer_fn(struct timer_list *t);
@ -182,7 +208,7 @@ static void group_init(struct psi_group *group)
group->enabled = true;
for_each_possible_cpu(cpu)
seqcount_init(&per_cpu_ptr(group->pcpu, cpu)->seq);
seqcount_init(per_cpu_ptr(&psi_seq, cpu));
group->avg_last_update = sched_clock();
group->avg_next_update = group->avg_last_update + psi_period;
mutex_init(&group->avgs_lock);
@ -262,14 +288,14 @@ static void get_recent_times(struct psi_group *group, int cpu,
/* Snapshot a coherent view of the CPU state */
do {
seq = read_seqcount_begin(&groupc->seq);
seq = psi_read_begin(cpu);
now = cpu_clock(cpu);
memcpy(times, groupc->times, sizeof(groupc->times));
state_mask = groupc->state_mask;
state_start = groupc->state_start;
if (cpu == current_cpu)
memcpy(tasks, groupc->tasks, sizeof(groupc->tasks));
} while (read_seqcount_retry(&groupc->seq, seq));
} while (psi_read_retry(cpu, seq));
/* Calculate state time deltas against the previous snapshot */
for (s = 0; s < NR_PSI_STATES; s++) {
@ -768,30 +794,20 @@ static void record_times(struct psi_group_cpu *groupc, u64 now)
groupc->times[PSI_NONIDLE] += delta;
}
#define for_each_group(iter, group) \
for (typeof(group) iter = group; iter; iter = iter->parent)
static void psi_group_change(struct psi_group *group, int cpu,
unsigned int clear, unsigned int set,
bool wake_clock)
u64 now, bool wake_clock)
{
struct psi_group_cpu *groupc;
unsigned int t, m;
u32 state_mask;
u64 now;
lockdep_assert_rq_held(cpu_rq(cpu));
groupc = per_cpu_ptr(group->pcpu, cpu);
/*
* First we update the task counts according to the state
* change requested through the @clear and @set bits.
*
* Then, if cgroup PSI stats accounting is enabled, we
* assess the aggregate resource states this CPU's tasks
* have been in since the last change, and account any
* SOME and FULL time these may have resulted in.
*/
write_seqcount_begin(&groupc->seq);
now = cpu_clock(cpu);
/*
* Start with TSK_ONCPU, which doesn't have a corresponding
* task count - it's just a boolean flag directly encoded in
@ -843,7 +859,6 @@ static void psi_group_change(struct psi_group *group, int cpu,
groupc->state_mask = state_mask;
write_seqcount_end(&groupc->seq);
return;
}
@ -864,8 +879,6 @@ static void psi_group_change(struct psi_group *group, int cpu,
groupc->state_mask = state_mask;
write_seqcount_end(&groupc->seq);
if (state_mask & group->rtpoll_states)
psi_schedule_rtpoll_work(group, 1, false);
@ -900,24 +913,29 @@ static void psi_flags_change(struct task_struct *task, int clear, int set)
void psi_task_change(struct task_struct *task, int clear, int set)
{
int cpu = task_cpu(task);
struct psi_group *group;
u64 now;
if (!task->pid)
return;
psi_flags_change(task, clear, set);
group = task_psi_group(task);
do {
psi_group_change(group, cpu, clear, set, true);
} while ((group = group->parent));
psi_write_begin(cpu);
now = cpu_clock(cpu);
for_each_group(group, task_psi_group(task))
psi_group_change(group, cpu, clear, set, now, true);
psi_write_end(cpu);
}
void psi_task_switch(struct task_struct *prev, struct task_struct *next,
bool sleep)
{
struct psi_group *group, *common = NULL;
struct psi_group *common = NULL;
int cpu = task_cpu(prev);
u64 now;
psi_write_begin(cpu);
now = cpu_clock(cpu);
if (next->pid) {
psi_flags_change(next, 0, TSK_ONCPU);
@ -926,16 +944,15 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
* ancestors with @prev, those will already have @prev's
* TSK_ONCPU bit set, and we can stop the iteration there.
*/
group = task_psi_group(next);
do {
if (per_cpu_ptr(group->pcpu, cpu)->state_mask &
PSI_ONCPU) {
for_each_group(group, task_psi_group(next)) {
struct psi_group_cpu *groupc = per_cpu_ptr(group->pcpu, cpu);
if (groupc->state_mask & PSI_ONCPU) {
common = group;
break;
}
psi_group_change(group, cpu, 0, TSK_ONCPU, true);
} while ((group = group->parent));
psi_group_change(group, cpu, 0, TSK_ONCPU, now, true);
}
}
if (prev->pid) {
@ -968,12 +985,11 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
psi_flags_change(prev, clear, set);
group = task_psi_group(prev);
do {
for_each_group(group, task_psi_group(prev)) {
if (group == common)
break;
psi_group_change(group, cpu, clear, set, wake_clock);
} while ((group = group->parent));
psi_group_change(group, cpu, clear, set, now, wake_clock);
}
/*
* TSK_ONCPU is handled up to the common ancestor. If there are
@ -983,20 +999,21 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
*/
if ((prev->psi_flags ^ next->psi_flags) & ~TSK_ONCPU) {
clear &= ~TSK_ONCPU;
for (; group; group = group->parent)
psi_group_change(group, cpu, clear, set, wake_clock);
for_each_group(group, common)
psi_group_change(group, cpu, clear, set, now, wake_clock);
}
}
psi_write_end(cpu);
}
#ifdef CONFIG_IRQ_TIME_ACCOUNTING
void psi_account_irqtime(struct rq *rq, struct task_struct *curr, struct task_struct *prev)
{
int cpu = task_cpu(curr);
struct psi_group *group;
struct psi_group_cpu *groupc;
s64 delta;
u64 irq;
u64 now;
if (static_branch_likely(&psi_disabled) || !irqtime_enabled())
return;
@ -1005,8 +1022,7 @@ void psi_account_irqtime(struct rq *rq, struct task_struct *curr, struct task_st
return;
lockdep_assert_rq_held(rq);
group = task_psi_group(curr);
if (prev && task_psi_group(prev) == group)
if (prev && task_psi_group(prev) == task_psi_group(curr))
return;
irq = irq_time_read(cpu);
@ -1015,27 +1031,24 @@ void psi_account_irqtime(struct rq *rq, struct task_struct *curr, struct task_st
return;
rq->psi_irq_time = irq;
do {
u64 now;
psi_write_begin(cpu);
now = cpu_clock(cpu);
for_each_group(group, task_psi_group(curr)) {
if (!group->enabled)
continue;
groupc = per_cpu_ptr(group->pcpu, cpu);
write_seqcount_begin(&groupc->seq);
now = cpu_clock(cpu);
record_times(groupc, now);
groupc->times[PSI_IRQ_FULL] += delta;
write_seqcount_end(&groupc->seq);
if (group->rtpoll_states & (1 << PSI_IRQ_FULL))
psi_schedule_rtpoll_work(group, 1, false);
} while ((group = group->parent));
}
psi_write_end(cpu);
}
#endif
#endif /* CONFIG_IRQ_TIME_ACCOUNTING */
/**
* psi_memstall_enter - mark the beginning of a memory stall section
@ -1221,12 +1234,14 @@ void psi_cgroup_restart(struct psi_group *group)
return;
for_each_possible_cpu(cpu) {
struct rq *rq = cpu_rq(cpu);
struct rq_flags rf;
u64 now;
rq_lock_irq(rq, &rf);
psi_group_change(group, cpu, 0, 0, true);
rq_unlock_irq(rq, &rf);
guard(rq_lock_irq)(cpu_rq(cpu));
psi_write_begin(cpu);
now = cpu_clock(cpu);
psi_group_change(group, cpu, 0, 0, now, true);
psi_write_end(cpu);
}
}
#endif /* CONFIG_CGROUPS */
@ -1651,7 +1666,7 @@ static const struct proc_ops psi_irq_proc_ops = {
.proc_poll = psi_fop_poll,
.proc_release = psi_fop_release,
};
#endif
#endif /* CONFIG_IRQ_TIME_ACCOUNTING */
static int __init psi_proc_init(void)
{

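For reference, the update pattern that the psi.c changes above converge on can be
sketched as follows. Every symbol below comes from the patch itself; only the
wrapper name psi_change_sketch() is hypothetical, and the caller is assumed to
hold the CPU's rq lock, just like psi_task_change() does:

	static void psi_change_sketch(struct task_struct *task, int cpu,
				      unsigned int clear, unsigned int set)
	{
		u64 now;

		psi_write_begin(cpu);		/* one per-CPU seqcount write section */
		now = cpu_clock(cpu);		/* clock sampled exactly once per update */

		/* walk the task's psi_group hierarchy up to the root */
		for_each_group(group, task_psi_group(task))
			psi_group_change(group, cpu, clear, set, now, true);

		psi_write_end(cpu);		/* publish the update to lockless readers */
	}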

@ -4,6 +4,9 @@
* policies)
*/
#include "sched.h"
#include "pelt.h"
int sched_rr_timeslice = RR_TIMESLICE;
/* More than 4 hours if BW_SHIFT equals 20. */
static const u64 max_rt_runtime = MAX_BW;
@ -60,7 +63,7 @@ static int __init sched_rt_sysctl_init(void)
return 0;
}
late_initcall(sched_rt_sysctl_init);
#endif
#endif /* CONFIG_SYSCTL */
void init_rt_rq(struct rt_rq *rt_rq)
{
@ -75,12 +78,10 @@ void init_rt_rq(struct rt_rq *rt_rq)
/* delimiter for bitsearch: */
__set_bit(MAX_RT_PRIO, array->bitmap);
#if defined CONFIG_SMP
rt_rq->highest_prio.curr = MAX_RT_PRIO-1;
rt_rq->highest_prio.next = MAX_RT_PRIO-1;
rt_rq->overloaded = 0;
plist_head_init(&rt_rq->pushable_tasks);
#endif /* CONFIG_SMP */
/* We start in dequeued state, because no RT tasks are queued */
rt_rq->rt_queued = 0;
@ -291,7 +292,7 @@ err:
return 0;
}
#else /* CONFIG_RT_GROUP_SCHED */
#else /* !CONFIG_RT_GROUP_SCHED: */
#define rt_entity_is_task(rt_se) (1)
@ -327,9 +328,7 @@ int alloc_rt_sched_group(struct task_group *tg, struct task_group *parent)
{
return 1;
}
#endif /* CONFIG_RT_GROUP_SCHED */
#ifdef CONFIG_SMP
#endif /* !CONFIG_RT_GROUP_SCHED */
static inline bool need_pull_rt_task(struct rq *rq, struct task_struct *prev)
{
@ -430,21 +429,6 @@ static void dequeue_pushable_task(struct rq *rq, struct task_struct *p)
}
}
#else
static inline void enqueue_pushable_task(struct rq *rq, struct task_struct *p)
{
}
static inline void dequeue_pushable_task(struct rq *rq, struct task_struct *p)
{
}
static inline void rt_queue_push_tasks(struct rq *rq)
{
}
#endif /* CONFIG_SMP */
static void enqueue_top_rt_rq(struct rt_rq *rt_rq);
static void dequeue_top_rt_rq(struct rt_rq *rt_rq, unsigned int count);
@ -485,12 +469,12 @@ static inline bool rt_task_fits_capacity(struct task_struct *p, int cpu)
return cpu_cap >= min(min_cap, max_cap);
}
#else
#else /* !CONFIG_UCLAMP_TASK: */
static inline bool rt_task_fits_capacity(struct task_struct *p, int cpu)
{
return true;
}
#endif
#endif /* !CONFIG_UCLAMP_TASK */
#ifdef CONFIG_RT_GROUP_SCHED
@ -594,17 +578,10 @@ static int rt_se_boosted(struct sched_rt_entity *rt_se)
return p->prio != p->normal_prio;
}
#ifdef CONFIG_SMP
static inline const struct cpumask *sched_rt_period_mask(void)
{
return this_rq()->rd->span;
}
#else
static inline const struct cpumask *sched_rt_period_mask(void)
{
return cpu_online_mask;
}
#endif
static inline
struct rt_rq *sched_rt_period_rt_rq(struct rt_bandwidth *rt_b, int cpu)
@ -625,7 +602,6 @@ bool sched_rt_bandwidth_account(struct rt_rq *rt_rq)
rt_rq->rt_time < rt_b->rt_runtime);
}
#ifdef CONFIG_SMP
/*
* We ran out of runtime, see if we can borrow some from our neighbours.
*/
@ -798,9 +774,6 @@ static void balance_runtime(struct rt_rq *rt_rq)
raw_spin_lock(&rt_rq->rt_runtime_lock);
}
}
#else /* !CONFIG_SMP */
static inline void balance_runtime(struct rt_rq *rt_rq) {}
#endif /* CONFIG_SMP */
static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
{
@ -930,7 +903,7 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
return 0;
}
#else /* !CONFIG_RT_GROUP_SCHED */
#else /* !CONFIG_RT_GROUP_SCHED: */
typedef struct rt_rq *rt_rq_iter_t;
@ -977,12 +950,10 @@ struct rt_rq *sched_rt_period_rt_rq(struct rt_bandwidth *rt_b, int cpu)
return &cpu_rq(cpu)->rt;
}
#ifdef CONFIG_SMP
static void __enable_runtime(struct rq *rq) { }
static void __disable_runtime(struct rq *rq) { }
#endif
#endif /* CONFIG_RT_GROUP_SCHED */
#endif /* !CONFIG_RT_GROUP_SCHED */
static inline int rt_se_prio(struct sched_rt_entity *rt_se)
{
@ -1033,7 +1004,7 @@ static void update_curr_rt(struct rq *rq)
do_start_rt_bandwidth(sched_rt_bandwidth(rt_rq));
}
}
#endif
#endif /* CONFIG_RT_GROUP_SCHED */
}
static void
@ -1075,8 +1046,6 @@ enqueue_top_rt_rq(struct rt_rq *rt_rq)
cpufreq_update_util(rq, 0);
}
#if defined CONFIG_SMP
static void
inc_rt_prio_smp(struct rt_rq *rt_rq, int prio, int prev_prio)
{
@ -1107,16 +1076,6 @@ dec_rt_prio_smp(struct rt_rq *rt_rq, int prio, int prev_prio)
cpupri_set(&rq->rd->cpupri, rq->cpu, rt_rq->highest_prio.curr);
}
#else /* CONFIG_SMP */
static inline
void inc_rt_prio_smp(struct rt_rq *rt_rq, int prio, int prev_prio) {}
static inline
void dec_rt_prio_smp(struct rt_rq *rt_rq, int prio, int prev_prio) {}
#endif /* CONFIG_SMP */
#if defined CONFIG_SMP || defined CONFIG_RT_GROUP_SCHED
static void
inc_rt_prio(struct rt_rq *rt_rq, int prio)
{
@ -1155,13 +1114,6 @@ dec_rt_prio(struct rt_rq *rt_rq, int prio)
dec_rt_prio_smp(rt_rq, prio, prev_prio);
}
#else
static inline void inc_rt_prio(struct rt_rq *rt_rq, int prio) {}
static inline void dec_rt_prio(struct rt_rq *rt_rq, int prio) {}
#endif /* CONFIG_SMP || CONFIG_RT_GROUP_SCHED */
#ifdef CONFIG_RT_GROUP_SCHED
static void
@ -1182,7 +1134,7 @@ dec_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
WARN_ON(!rt_rq->rt_nr_running && rt_rq->rt_nr_boosted);
}
#else /* CONFIG_RT_GROUP_SCHED */
#else /* !CONFIG_RT_GROUP_SCHED: */
static void
inc_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
@ -1192,7 +1144,7 @@ inc_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
static inline
void dec_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq) {}
#endif /* CONFIG_RT_GROUP_SCHED */
#endif /* !CONFIG_RT_GROUP_SCHED */
static inline
unsigned int rt_se_nr_running(struct sched_rt_entity *rt_se)
@ -1488,6 +1440,9 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags)
enqueue_rt_entity(rt_se, flags);
if (task_is_blocked(p))
return;
if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
enqueue_pushable_task(rq, p);
}
@ -1538,7 +1493,6 @@ static void yield_task_rt(struct rq *rq)
requeue_task_rt(rq, rq->curr, 0);
}
#ifdef CONFIG_SMP
static int find_lowest_rq(struct task_struct *task);
static int
@ -1653,7 +1607,6 @@ static int balance_rt(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
return sched_stop_runnable(rq) || sched_dl_runnable(rq) || sched_rt_runnable(rq);
}
#endif /* CONFIG_SMP */
/*
* Preempt the current task with a newly woken task if needed:
@ -1667,7 +1620,6 @@ static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
return;
}
#ifdef CONFIG_SMP
/*
* If:
*
@ -1682,7 +1634,6 @@ static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
*/
if (p->prio == donor->prio && !test_tsk_need_resched(rq->curr))
check_preempt_equal_prio(rq, p);
#endif
}
static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, bool first)
@ -1768,6 +1719,8 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p, struct task_s
update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 1);
if (task_is_blocked(p))
return;
/*
* The previous task needs to be made eligible for pushing
* if it is still active
@ -1776,8 +1729,6 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p, struct task_s
enqueue_pushable_task(rq, p);
}
#ifdef CONFIG_SMP
/* Only try algorithms three times */
#define RT_MAX_TRIES 3
@ -2451,7 +2402,6 @@ void __init init_sched_rt_class(void)
GFP_KERNEL, cpu_to_node(i));
}
}
#endif /* CONFIG_SMP */
/*
* When switching a task to RT, we may overload the runqueue
@ -2475,10 +2425,8 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
* then see if we can move to another run queue.
*/
if (task_on_rq_queued(p)) {
#ifdef CONFIG_SMP
if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
rt_queue_push_tasks(rq);
#endif /* CONFIG_SMP */
if (p->prio < rq->donor->prio && cpu_online(cpu_of(rq)))
resched_curr(rq);
}
@ -2495,7 +2443,6 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
return;
if (task_current_donor(rq, p)) {
#ifdef CONFIG_SMP
/*
* If our priority decreases while running, we
* may need to pull tasks to this runqueue.
@ -2509,11 +2456,6 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
*/
if (p->prio > rq->rt.highest_prio.curr)
resched_curr(rq);
#else
/* For UP simply resched on drop of prio */
if (oldprio < p->prio)
resched_curr(rq);
#endif /* CONFIG_SMP */
} else {
/*
* This task is not running, but if it is
@ -2549,9 +2491,9 @@ static void watchdog(struct rq *rq, struct task_struct *p)
}
}
}
#else
#else /* !CONFIG_POSIX_TIMERS: */
static inline void watchdog(struct rq *rq, struct task_struct *p) { }
#endif
#endif /* !CONFIG_POSIX_TIMERS */
/*
* scheduler tick hitting a task of our scheduling class.
@ -2620,7 +2562,7 @@ static int task_is_throttled_rt(struct task_struct *p, int cpu)
return rt_rq_throttled(rt_rq);
}
#endif
#endif /* CONFIG_SCHED_CORE */
DEFINE_SCHED_CLASS(rt) = {
@ -2634,7 +2576,6 @@ DEFINE_SCHED_CLASS(rt) = {
.put_prev_task = put_prev_task_rt,
.set_next_task = set_next_task_rt,
#ifdef CONFIG_SMP
.balance = balance_rt,
.select_task_rq = select_task_rq_rt,
.set_cpus_allowed = set_cpus_allowed_common,
@ -2643,7 +2584,6 @@ DEFINE_SCHED_CLASS(rt) = {
.task_woken = task_woken_rt,
.switched_from = switched_from_rt,
.find_lock_rq = find_lock_lowest_rq,
#endif
.task_tick = task_tick_rt,
@ -2887,7 +2827,7 @@ int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk)
return 1;
}
#else /* !CONFIG_RT_GROUP_SCHED */
#else /* !CONFIG_RT_GROUP_SCHED: */
#ifdef CONFIG_SYSCTL
static int sched_rt_global_constraints(void)
@ -2895,7 +2835,7 @@ static int sched_rt_global_constraints(void)
return 0;
}
#endif /* CONFIG_SYSCTL */
#endif /* CONFIG_RT_GROUP_SCHED */
#endif /* !CONFIG_RT_GROUP_SCHED */
#ifdef CONFIG_SYSCTL
static int sched_rt_global_validate(void)
@ -2951,6 +2891,12 @@ undo:
sched_domains_mutex_unlock();
mutex_unlock(&mutex);
/*
* After changing maximum available bandwidth for DEADLINE, we need to
* recompute per-root-domain and per-CPU variables accordingly.
*/
rebuild_sched_domains();
return ret;
}


@ -1,5 +1,6 @@
/* SPDX-License-Identifier: GPL-2.0 */
/* Generated by Documentation/scheduler/sched-pelt; do not modify. */
#include <linux/types.h>
static const u32 runnable_avg_yN_inv[] __maybe_unused = {
0xffffffff, 0xfa83b2da, 0xf5257d14, 0xefe4b99a, 0xeac0c6e6, 0xe5b906e6,


@ -69,6 +69,7 @@
#include <linux/wait_bit.h>
#include <linux/workqueue_api.h>
#include <linux/delayacct.h>
#include <linux/mmu_context.h>
#include <trace/events/power.h>
#include <trace/events/sched.h>
@ -384,6 +385,7 @@ extern void dl_server_stop(struct sched_dl_entity *dl_se);
extern void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
dl_server_has_tasks_f has_tasks,
dl_server_pick_f pick_task);
extern void sched_init_dl_servers(void);
extern void dl_server_update_idle_time(struct rq *rq,
struct task_struct *p);
@ -401,6 +403,19 @@ static inline bool dl_server_active(struct sched_dl_entity *dl_se)
extern struct list_head task_groups;
#ifdef CONFIG_CFS_BANDWIDTH
extern const u64 max_bw_quota_period_us;
/*
* default period for group bandwidth.
* default: 0.1s, units: microseconds
*/
static inline u64 default_bw_period_us(void)
{
return 100000ULL;
}
#endif /* CONFIG_CFS_BANDWIDTH */
struct cfs_bandwidth {
#ifdef CONFIG_CFS_BANDWIDTH
raw_spinlock_t lock;
@ -424,7 +439,7 @@ struct cfs_bandwidth {
int nr_burst;
u64 throttled_time;
u64 burst_time;
#endif
#endif /* CONFIG_CFS_BANDWIDTH */
};
/* Task group related information */
@ -442,15 +457,13 @@ struct task_group {
/* runqueue "owned" by this group on each CPU */
struct cfs_rq **cfs_rq;
unsigned long shares;
#ifdef CONFIG_SMP
/*
* load_avg can be heavily contended at clock tick time, so put
* it in its own cache-line separated from the fields above which
* will also be accessed at each tick.
*/
atomic_long_t load_avg ____cacheline_aligned;
#endif
#endif
#endif /* CONFIG_FAIR_GROUP_SCHED */
#ifdef CONFIG_RT_GROUP_SCHED
struct sched_rt_entity **rt_se;
@ -531,7 +544,7 @@ extern void free_fair_sched_group(struct task_group *tg);
extern int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent);
extern void online_fair_sched_group(struct task_group *tg);
extern void unregister_fair_sched_group(struct task_group *tg);
#else
#else /* !CONFIG_FAIR_GROUP_SCHED: */
static inline void free_fair_sched_group(struct task_group *tg) { }
static inline int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
{
@ -539,7 +552,7 @@ static inline int alloc_fair_sched_group(struct task_group *tg, struct task_grou
}
static inline void online_fair_sched_group(struct task_group *tg) { }
static inline void unregister_fair_sched_group(struct task_group *tg) { }
#endif
#endif /* !CONFIG_FAIR_GROUP_SCHED */
extern void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq,
struct sched_entity *se, int cpu,
@ -573,25 +586,20 @@ extern int sched_group_set_shares(struct task_group *tg, unsigned long shares);
extern int sched_group_set_idle(struct task_group *tg, long idle);
#ifdef CONFIG_SMP
extern void set_task_rq_fair(struct sched_entity *se,
struct cfs_rq *prev, struct cfs_rq *next);
#else /* !CONFIG_SMP */
static inline void set_task_rq_fair(struct sched_entity *se,
struct cfs_rq *prev, struct cfs_rq *next) { }
#endif /* CONFIG_SMP */
#else /* !CONFIG_FAIR_GROUP_SCHED */
#else /* !CONFIG_FAIR_GROUP_SCHED: */
static inline int sched_group_set_shares(struct task_group *tg, unsigned long shares) { return 0; }
static inline int sched_group_set_idle(struct task_group *tg, long idle) { return 0; }
#endif /* CONFIG_FAIR_GROUP_SCHED */
#endif /* !CONFIG_FAIR_GROUP_SCHED */
#else /* CONFIG_CGROUP_SCHED */
#else /* !CONFIG_CGROUP_SCHED: */
struct cfs_bandwidth { };
static inline bool cfs_task_bw_constrained(struct task_struct *p) { return false; }
#endif /* CONFIG_CGROUP_SCHED */
#endif /* !CONFIG_CGROUP_SCHED */
extern void unregister_rt_sched_group(struct task_group *tg);
extern void free_rt_sched_group(struct task_group *tg);
@ -667,7 +675,6 @@ struct cfs_rq {
struct sched_entity *curr;
struct sched_entity *next;
#ifdef CONFIG_SMP
/*
* CFS load tracking
*/
@ -699,7 +706,6 @@ struct cfs_rq {
u64 last_h_load_update;
struct sched_entity *h_load_next;
#endif /* CONFIG_FAIR_GROUP_SCHED */
#endif /* CONFIG_SMP */
#ifdef CONFIG_FAIR_GROUP_SCHED
struct rq *rq; /* CPU runqueue to which this cfs_rq is attached */
@ -796,19 +802,13 @@ struct rt_rq {
struct rt_prio_array active;
unsigned int rt_nr_running;
unsigned int rr_nr_running;
#if defined CONFIG_SMP || defined CONFIG_RT_GROUP_SCHED
struct {
int curr; /* highest queued rt task prio */
#ifdef CONFIG_SMP
int next; /* next highest */
#endif
} highest_prio;
#endif
#ifdef CONFIG_SMP
bool overloaded;
struct plist_head pushable_tasks;
#endif /* CONFIG_SMP */
int rt_queued;
#ifdef CONFIG_RT_GROUP_SCHED
@ -839,7 +839,6 @@ struct dl_rq {
unsigned int dl_nr_running;
#ifdef CONFIG_SMP
/*
* Deadline values of the currently executing and the
* earliest ready task on this rq. Caching these facilitates
@ -859,9 +858,7 @@ struct dl_rq {
* of the leftmost (earliest deadline) element.
*/
struct rb_root_cached pushable_dl_tasks_root;
#else
struct dl_bw dl_bw;
#endif
/*
* "Active utilization" for this runqueue: increased when a
* task wakes up (becomes TASK_RUNNING) and decreased when a
@ -932,7 +929,6 @@ static inline long se_runnable(struct sched_entity *se)
#endif /* !CONFIG_FAIR_GROUP_SCHED */
#ifdef CONFIG_SMP
/*
* XXX we want to get rid of these helpers and use the full load resolution.
*/
@ -1008,7 +1004,7 @@ struct root_domain {
/* These atomics are updated outside of a lock */
atomic_t rto_loop_next;
atomic_t rto_loop_start;
#endif
#endif /* HAVE_RT_PUSH_IPI */
/*
* The "RT overload" flag: it gets set if a CPU has more than
* one runnable RT task.
@ -1043,7 +1039,6 @@ static inline void set_rd_overloaded(struct root_domain *rd, int status)
#ifdef HAVE_RT_PUSH_IPI
extern void rto_push_irq_work_func(struct irq_work *work);
#endif
#endif /* CONFIG_SMP */
#ifdef CONFIG_UCLAMP_TASK
/*
@ -1107,18 +1102,14 @@ struct rq {
unsigned int numa_migrate_on;
#endif
#ifdef CONFIG_NO_HZ_COMMON
#ifdef CONFIG_SMP
unsigned long last_blocked_load_update_tick;
unsigned int has_blocked_load;
call_single_data_t nohz_csd;
#endif /* CONFIG_SMP */
unsigned int nohz_tick_stopped;
atomic_t nohz_flags;
#endif /* CONFIG_NO_HZ_COMMON */
#ifdef CONFIG_SMP
unsigned int ttwu_pending;
#endif
u64 nr_switches;
#ifdef CONFIG_UCLAMP_TASK
@ -1151,10 +1142,15 @@ struct rq {
*/
unsigned long nr_uninterruptible;
#ifdef CONFIG_SCHED_PROXY_EXEC
struct task_struct __rcu *donor; /* Scheduling context */
struct task_struct __rcu *curr; /* Execution context */
#else
union {
struct task_struct __rcu *donor; /* Scheduler context */
struct task_struct __rcu *curr; /* Execution context */
};
#endif
struct sched_dl_entity *dl_server;
struct task_struct *idle;
struct task_struct *stop;
@ -1183,7 +1179,6 @@ struct rq {
int membarrier_state;
#endif
#ifdef CONFIG_SMP
struct root_domain *rd;
struct sched_domain __rcu *sd;
@ -1224,7 +1219,6 @@ struct rq {
#ifdef CONFIG_HOTPLUG_CPU
struct rcuwait hotplug_wait;
#endif
#endif /* CONFIG_SMP */
#ifdef CONFIG_IRQ_TIME_ACCOUNTING
u64 prev_irq_time;
@ -1242,9 +1236,7 @@ struct rq {
long calc_load_active;
#ifdef CONFIG_SCHED_HRTICK
#ifdef CONFIG_SMP
call_single_data_t hrtick_csd;
#endif
struct hrtimer hrtick_timer;
ktime_t hrtick_time;
#endif
@ -1271,9 +1263,7 @@ struct rq {
struct cpuidle_state *idle_state;
#endif
#ifdef CONFIG_SMP
unsigned int nr_pinned;
#endif
unsigned int push_busy;
struct cpu_stop_work push_work;
@ -1294,12 +1284,12 @@ struct rq {
unsigned int core_forceidle_seq;
unsigned int core_forceidle_occupation;
u64 core_forceidle_start;
#endif
#endif /* CONFIG_SCHED_CORE */
/* Scratch cpumask to be temporarily used under rq_lock */
cpumask_var_t scratch_mask;
#if defined(CONFIG_CFS_BANDWIDTH) && defined(CONFIG_SMP)
#ifdef CONFIG_CFS_BANDWIDTH
call_single_data_t cfsb_csd;
struct list_head cfsb_csd_list;
#endif
@ -1313,32 +1303,24 @@ static inline struct rq *rq_of(struct cfs_rq *cfs_rq)
return cfs_rq->rq;
}
#else
#else /* !CONFIG_FAIR_GROUP_SCHED: */
static inline struct rq *rq_of(struct cfs_rq *cfs_rq)
{
return container_of(cfs_rq, struct rq, cfs);
}
#endif
#endif /* !CONFIG_FAIR_GROUP_SCHED */
static inline int cpu_of(struct rq *rq)
{
#ifdef CONFIG_SMP
return rq->cpu;
#else
return 0;
#endif
}
#define MDF_PUSH 0x01
static inline bool is_migration_disabled(struct task_struct *p)
{
#ifdef CONFIG_SMP
return p->migration_disabled;
#else
return false;
#endif
}
DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
@ -1349,10 +1331,17 @@ DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
#define cpu_curr(cpu) (cpu_rq(cpu)->curr)
#define raw_rq() raw_cpu_ptr(&runqueues)
#ifdef CONFIG_SCHED_PROXY_EXEC
static inline void rq_set_donor(struct rq *rq, struct task_struct *t)
{
rcu_assign_pointer(rq->donor, t);
}
#else
static inline void rq_set_donor(struct rq *rq, struct task_struct *t)
{
/* Do nothing */
}
#endif
#ifdef CONFIG_SCHED_CORE
static inline struct cpumask *sched_group_span(struct sched_group *sg);
@ -1500,6 +1489,7 @@ static inline bool sched_group_cookie_match(struct rq *rq,
}
#endif /* !CONFIG_SCHED_CORE */
#ifdef CONFIG_RT_GROUP_SCHED
# ifdef CONFIG_RT_GROUP_SCHED_DEFAULT_DISABLED
DECLARE_STATIC_KEY_FALSE(rt_group_sched);
@ -1507,16 +1497,16 @@ static inline bool rt_group_sched_enabled(void)
{
return static_branch_unlikely(&rt_group_sched);
}
# else
# else /* !CONFIG_RT_GROUP_SCHED_DEFAULT_DISABLED: */
DECLARE_STATIC_KEY_TRUE(rt_group_sched);
static inline bool rt_group_sched_enabled(void)
{
return static_branch_likely(&rt_group_sched);
}
# endif /* CONFIG_RT_GROUP_SCHED_DEFAULT_DISABLED */
#else
# endif /* !CONFIG_RT_GROUP_SCHED_DEFAULT_DISABLED */
#else /* !CONFIG_RT_GROUP_SCHED: */
# define rt_group_sched_enabled() false
#endif /* CONFIG_RT_GROUP_SCHED */
#endif /* !CONFIG_RT_GROUP_SCHED */
static inline void lockdep_assert_rq_held(struct rq *rq)
{
@ -1574,9 +1564,9 @@ static inline void update_idle_core(struct rq *rq)
__update_idle_core(rq);
}
#else
#else /* !CONFIG_SCHED_SMT: */
static inline void update_idle_core(struct rq *rq) { }
#endif
#endif /* !CONFIG_SCHED_SMT */
#ifdef CONFIG_FAIR_GROUP_SCHED
@ -1757,7 +1747,7 @@ static inline void scx_rq_clock_invalidate(struct rq *rq)
WRITE_ONCE(rq->scx.flags, rq->scx.flags & ~SCX_RQ_CLK_VALID);
}
#else /* !CONFIG_SCHED_CLASS_EXT */
#else /* !CONFIG_SCHED_CLASS_EXT: */
#define scx_enabled() false
#define scx_switched_all() false
@ -1781,9 +1771,7 @@ static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
rq->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
rf->clock_update_flags = 0;
#ifdef CONFIG_SMP
WARN_ON_ONCE(rq->balance_callback && rq->balance_callback != &balance_push_callback);
#endif
}
static inline void rq_unpin_lock(struct rq *rq, struct rq_flags *rf)
@ -1961,8 +1949,6 @@ init_numa_balancing(unsigned long clone_flags, struct task_struct *p)
#endif /* !CONFIG_NUMA_BALANCING */
#ifdef CONFIG_SMP
static inline void
queue_balance_callback(struct rq *rq,
struct balance_callback *head,
@ -2128,8 +2114,6 @@ static inline const struct cpumask *task_user_cpus(struct task_struct *p)
return p->user_cpus_ptr;
}
#endif /* CONFIG_SMP */
#ifdef CONFIG_CGROUP_SCHED
/*
@ -2174,7 +2158,7 @@ static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
tg = &root_task_group;
p->rt.rt_rq = tg->rt_rq[cpu];
p->rt.parent = tg->rt_se[cpu];
#endif
#endif /* CONFIG_RT_GROUP_SCHED */
}
#else /* !CONFIG_CGROUP_SCHED: */
@ -2200,7 +2184,7 @@ static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
smp_wmb();
WRITE_ONCE(task_thread_info(p)->cpu, cpu);
p->wake_cpu = cpu;
#endif
#endif /* CONFIG_SMP */
}
/*
@ -2278,13 +2262,17 @@ static inline int task_current_donor(struct rq *rq, struct task_struct *p)
return rq->donor == p;
}
static inline bool task_is_blocked(struct task_struct *p)
{
if (!sched_proxy_exec())
return false;
return !!p->blocked_on;
}
static inline int task_on_cpu(struct rq *rq, struct task_struct *p)
{
#ifdef CONFIG_SMP
return p->on_cpu;
#else
return task_current(rq, p);
#endif
}
static inline int task_on_rq_queued(struct task_struct *p)
@ -2307,11 +2295,9 @@ static inline int task_on_rq_migrating(struct task_struct *p)
#define WF_CURRENT_CPU 0x40 /* Prefer to move the wakee to the current CPU. */
#define WF_RQ_SELECTED 0x80 /* ->select_task_rq() was called */
#ifdef CONFIG_SMP
static_assert(WF_EXEC == SD_BALANCE_EXEC);
static_assert(WF_FORK == SD_BALANCE_FORK);
static_assert(WF_TTWU == SD_BALANCE_WAKE);
#endif
/*
* To aid in avoiding the subversion of "niceness" due to uneven distribution
@ -2367,11 +2353,7 @@ extern const u32 sched_prio_to_wmult[40];
#define ENQUEUE_HEAD 0x10
#define ENQUEUE_REPLENISH 0x20
#ifdef CONFIG_SMP
#define ENQUEUE_MIGRATED 0x40
#else
#define ENQUEUE_MIGRATED 0x00
#endif
#define ENQUEUE_INITIAL 0x80
#define ENQUEUE_MIGRATING 0x100
#define ENQUEUE_DELAYED 0x200
@ -2416,7 +2398,6 @@ struct sched_class {
void (*put_prev_task)(struct rq *rq, struct task_struct *p, struct task_struct *next);
void (*set_next_task)(struct rq *rq, struct task_struct *p, bool first);
#ifdef CONFIG_SMP
int (*select_task_rq)(struct task_struct *p, int task_cpu, int flags);
void (*migrate_task_rq)(struct task_struct *p, int new_cpu);
@ -2429,7 +2410,6 @@ struct sched_class {
void (*rq_offline)(struct rq *rq);
struct rq *(*find_lock_rq)(struct task_struct *p, struct rq *rq);
#endif
void (*task_tick)(struct rq *rq, struct task_struct *p, int queued);
void (*task_fork)(struct task_struct *p);
@ -2487,7 +2467,7 @@ static inline void put_prev_set_next_task(struct rq *rq,
struct task_struct *prev,
struct task_struct *next)
{
WARN_ON_ONCE(rq->curr != prev);
WARN_ON_ONCE(rq->donor != prev);
__put_prev_set_next_dl_server(rq, prev, next);
@ -2581,8 +2561,6 @@ extern struct task_struct *pick_task_idle(struct rq *rq);
#define SCA_MIGRATE_ENABLE 0x04
#define SCA_USER 0x08
#ifdef CONFIG_SMP
extern void update_group_capacity(struct sched_domain *sd, int cpu);
extern void sched_balance_trigger(struct rq *rq);
@ -2634,26 +2612,6 @@ static inline struct task_struct *get_push_task(struct rq *rq)
extern int push_cpu_stop(void *arg);
#else /* !CONFIG_SMP: */
static inline bool task_allowed_on_cpu(struct task_struct *p, int cpu)
{
return true;
}
static inline int __set_cpus_allowed_ptr(struct task_struct *p,
struct affinity_context *ctx)
{
return set_cpus_allowed_ptr(p, ctx->new_mask);
}
static inline cpumask_t *alloc_user_cpus_ptr(int node)
{
return NULL;
}
#endif /* !CONFIG_SMP */
#ifdef CONFIG_CPU_IDLE
static inline void idle_set_state(struct rq *rq,
@ -2749,10 +2707,8 @@ static inline void add_nr_running(struct rq *rq, unsigned count)
call_trace_sched_update_nr_running(rq, count);
}
#ifdef CONFIG_SMP
if (prev_nr < 2 && rq->nr_running >= 2)
set_rd_overloaded(rq->rd, 1);
#endif
sched_update_tick_dependency(rq);
}
@ -2918,10 +2874,7 @@ unsigned long arch_scale_freq_capacity(int cpu)
static inline void double_rq_clock_clear_update(struct rq *rq1, struct rq *rq2)
{
rq1->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
/* rq1 == rq2 for !CONFIG_SMP, so just clear RQCF_UPDATED once. */
#ifdef CONFIG_SMP
rq2->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
#endif
}
#define DEFINE_LOCK_GUARD_2(name, type, _lock, _unlock, ...) \
@ -2930,8 +2883,6 @@ static inline class_##name##_t class_##name##_constructor(type *lock, type *lock
{ class_##name##_t _t = { .lock = lock, .lock2 = lock2 }, *_T = &_t; \
_lock; return _t; }
#ifdef CONFIG_SMP
static inline bool rq_order_less(struct rq *rq1, struct rq *rq2)
{
#ifdef CONFIG_SCHED_CORE
@ -2954,7 +2905,7 @@ static inline bool rq_order_less(struct rq *rq1, struct rq *rq2)
/*
* __sched_core_flip() relies on SMT having cpu-id lock order.
*/
#endif
#endif /* CONFIG_SCHED_CORE */
return rq1->cpu < rq2->cpu;
}
@ -3091,42 +3042,6 @@ extern void set_rq_offline(struct rq *rq);
extern bool sched_smp_initialized;
#else /* !CONFIG_SMP: */
/*
* double_rq_lock - safely lock two runqueues
*
* Note this does not disable interrupts like task_rq_lock,
* you need to do so manually before calling.
*/
static inline void double_rq_lock(struct rq *rq1, struct rq *rq2)
__acquires(rq1->lock)
__acquires(rq2->lock)
{
WARN_ON_ONCE(!irqs_disabled());
WARN_ON_ONCE(rq1 != rq2);
raw_spin_rq_lock(rq1);
__acquire(rq2->lock); /* Fake it out ;) */
double_rq_clock_clear_update(rq1, rq2);
}
/*
* double_rq_unlock - safely unlock two runqueues
*
* Note this does not restore interrupts like task_rq_unlock,
* you need to do so manually after calling.
*/
static inline void double_rq_unlock(struct rq *rq1, struct rq *rq2)
__releases(rq1->lock)
__releases(rq2->lock)
{
WARN_ON_ONCE(rq1 != rq2);
raw_spin_rq_unlock(rq1);
__release(rq2->lock);
}
#endif /* !CONFIG_SMP */
DEFINE_LOCK_GUARD_2(double_rq_lock, struct rq,
double_rq_lock(_T->lock, _T->lock2),
double_rq_unlock(_T->lock, _T->lock2))
@ -3145,6 +3060,7 @@ extern void print_rt_rq(struct seq_file *m, int cpu, struct rt_rq *rt_rq);
extern void print_dl_rq(struct seq_file *m, int cpu, struct dl_rq *dl_rq);
extern void resched_latency_warn(int cpu, u64 latency);
#ifdef CONFIG_NUMA_BALANCING
extern void show_numa_stats(struct task_struct *p, struct seq_file *m);
extern void
@ -3184,7 +3100,7 @@ extern void nohz_balance_exit_idle(struct rq *rq);
static inline void nohz_balance_exit_idle(struct rq *rq) { }
#endif /* !CONFIG_NO_HZ_COMMON */
#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
#ifdef CONFIG_NO_HZ_COMMON
extern void nohz_run_idle_balance(int cpu);
#else
static inline void nohz_run_idle_balance(int cpu) { }
@ -3254,14 +3170,14 @@ static inline u64 irq_time_read(int cpu)
return total;
}
#else
#else /* !CONFIG_IRQ_TIME_ACCOUNTING: */
static inline int irqtime_enabled(void)
{
return 0;
}
#endif /* CONFIG_IRQ_TIME_ACCOUNTING */
#endif /* !CONFIG_IRQ_TIME_ACCOUNTING */
#ifdef CONFIG_CPU_FREQ
@ -3310,8 +3226,6 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) { }
# define arch_scale_freq_invariant() false
#endif
#ifdef CONFIG_SMP
unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
unsigned long *min,
unsigned long *max);
@ -3355,10 +3269,6 @@ static inline unsigned long cpu_util_rt(struct rq *rq)
return READ_ONCE(rq->avg_rt.util_avg);
}
#else /* !CONFIG_SMP */
static inline bool update_other_load_avgs(struct rq *rq) { return false; }
#endif /* CONFIG_SMP */
#ifdef CONFIG_UCLAMP_TASK
unsigned long uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id);
@ -3535,13 +3445,13 @@ static inline bool sched_energy_enabled(void)
return static_branch_unlikely(&sched_energy_present);
}
#else /* ! (CONFIG_ENERGY_MODEL && CONFIG_CPU_FREQ_GOV_SCHEDUTIL) */
#else /* !(CONFIG_ENERGY_MODEL && CONFIG_CPU_FREQ_GOV_SCHEDUTIL): */
#define perf_domain_span(pd) NULL
static inline bool sched_energy_enabled(void) { return false; }
#endif /* CONFIG_ENERGY_MODEL && CONFIG_CPU_FREQ_GOV_SCHEDUTIL */
#endif /* !(CONFIG_ENERGY_MODEL && CONFIG_CPU_FREQ_GOV_SCHEDUTIL) */
#ifdef CONFIG_MEMBARRIER
@ -3567,7 +3477,7 @@ static inline void membarrier_switch_mm(struct rq *rq,
WRITE_ONCE(rq->membarrier_state, membarrier_state);
}
#else /* !CONFIG_MEMBARRIER :*/
#else /* !CONFIG_MEMBARRIER: */
static inline void membarrier_switch_mm(struct rq *rq,
struct mm_struct *prev_mm,
@ -3577,7 +3487,6 @@ static inline void membarrier_switch_mm(struct rq *rq,
#endif /* !CONFIG_MEMBARRIER */
#ifdef CONFIG_SMP
static inline bool is_per_cpu_kthread(struct task_struct *p)
{
if (!(p->flags & PF_KTHREAD))
@ -3588,7 +3497,6 @@ static inline bool is_per_cpu_kthread(struct task_struct *p)
return true;
}
#endif
extern void swake_up_all_locked(struct swait_queue_head *q);
extern void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
@ -3887,7 +3795,6 @@ static inline void init_sched_mm_cid(struct task_struct *t) { }
extern u64 avg_vruntime(struct cfs_rq *cfs_rq);
extern int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se);
#ifdef CONFIG_SMP
static inline
void move_queued_task_locked(struct rq *src_rq, struct rq *dst_rq, struct task_struct *task)
{
@ -3908,7 +3815,6 @@ bool task_is_pushable(struct rq *rq, struct task_struct *p, int cpu)
return false;
}
#endif
#ifdef CONFIG_RT_MUTEXES
@ -3949,21 +3855,8 @@ extern void check_class_changed(struct rq *rq, struct task_struct *p,
const struct sched_class *prev_class,
int oldprio);
#ifdef CONFIG_SMP
extern struct balance_callback *splice_balance_callbacks(struct rq *rq);
extern void balance_callbacks(struct rq *rq, struct balance_callback *head);
#else
static inline struct balance_callback *splice_balance_callbacks(struct rq *rq)
{
return NULL;
}
static inline void balance_callbacks(struct rq *rq, struct balance_callback *head)
{
}
#endif
#ifdef CONFIG_SCHED_CLASS_EXT
/*

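As context for the proxy-execution hunks in sched.h above: scheduling decisions
are meant to key off rq->donor (the scheduling context) while rq->curr remains
the execution context, and without CONFIG_SCHED_PROXY_EXEC the union above makes
the two fields alias. A minimal sketch of the resulting idiom, with a made-up
helper name and a comparison mirroring switched_to_rt()/prio_changed_rt() in the
rt.c hunks:

	/* Should RT task @p preempt whatever the runqueue's donor is? */
	static inline bool rt_prio_beats_donor(struct rq *rq, struct task_struct *p)
	{
		/* for RT, a lower ->prio value means a higher priority */
		return p->prio < rq->donor->prio;
	}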

@ -1,8 +1,13 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _KERNEL_SCHED_SMP_H
#define _KERNEL_SCHED_SMP_H
/*
* Scheduler internal SMP callback types and methods between the scheduler
* and other internal parts of the core kernel:
*/
#include <linux/types.h>
extern void sched_ttwu_pending(void *arg);
@ -13,3 +18,5 @@ extern void flush_smp_call_function_queue(void);
#else
static inline void flush_smp_call_function_queue(void) { }
#endif
#endif /* _KERNEL_SCHED_SMP_H */


@ -2,6 +2,7 @@
/*
* /proc/schedstat implementation
*/
#include "sched.h"
void __update_stats_wait_start(struct rq *rq, struct task_struct *p,
struct sched_statistics *stats)
@ -114,10 +115,8 @@ static int show_schedstat(struct seq_file *seq, void *v)
seq_printf(seq, "timestamp %lu\n", jiffies);
} else {
struct rq *rq;
#ifdef CONFIG_SMP
struct sched_domain *sd;
int dcount = 0;
#endif
cpu = (unsigned long)(v - 2);
rq = cpu_rq(cpu);
@ -132,7 +131,6 @@ static int show_schedstat(struct seq_file *seq, void *v)
seq_printf(seq, "\n");
#ifdef CONFIG_SMP
/* domain-specific stats */
rcu_read_lock();
for_each_domain(cpu, sd) {
@ -163,7 +161,6 @@ static int show_schedstat(struct seq_file *seq, void *v)
sd->ttwu_move_balance);
}
rcu_read_unlock();
#endif
}
return 0;
}


@ -112,10 +112,10 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
bool sleep);
#ifdef CONFIG_IRQ_TIME_ACCOUNTING
void psi_account_irqtime(struct rq *rq, struct task_struct *curr, struct task_struct *prev);
#else
#else /* !CONFIG_IRQ_TIME_ACCOUNTING: */
static inline void psi_account_irqtime(struct rq *rq, struct task_struct *curr,
struct task_struct *prev) {}
#endif /*CONFIG_IRQ_TIME_ACCOUNTING */
#endif /* !CONFIG_IRQ_TIME_ACCOUNTING */
/*
* PSI tracks state that persists across sleeps, such as iowaits and
* memory stalls. As a result, it has to distinguish between sleeps,
@ -220,7 +220,7 @@ static inline void psi_sched_switch(struct task_struct *prev,
psi_task_switch(prev, next, sleep);
}
#else /* CONFIG_PSI */
#else /* !CONFIG_PSI: */
static inline void psi_enqueue(struct task_struct *p, bool migrate) {}
static inline void psi_dequeue(struct task_struct *p, bool migrate) {}
static inline void psi_ttwu_dequeue(struct task_struct *p) {}
@ -229,7 +229,7 @@ static inline void psi_sched_switch(struct task_struct *prev,
bool sleep) {}
static inline void psi_account_irqtime(struct rq *rq, struct task_struct *curr,
struct task_struct *prev) {}
#endif /* CONFIG_PSI */
#endif /* !CONFIG_PSI */
#ifdef CONFIG_SCHED_INFO
/*
@ -334,6 +334,6 @@ sched_info_switch(struct rq *rq, struct task_struct *prev, struct task_struct *n
# define sched_info_enqueue(rq, t) do { } while (0)
# define sched_info_dequeue(rq, t) do { } while (0)
# define sched_info_switch(rq, t, next) do { } while (0)
#endif /* CONFIG_SCHED_INFO */
#endif /* !CONFIG_SCHED_INFO */
#endif /* _KERNEL_STATS_H */


@ -7,8 +7,8 @@
*
* See kernel/stop_machine.c
*/
#include "sched.h"
#ifdef CONFIG_SMP
static int
select_task_rq_stop(struct task_struct *p, int cpu, int flags)
{
@ -20,7 +20,6 @@ balance_stop(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
{
return sched_stop_runnable(rq);
}
#endif /* CONFIG_SMP */
static void
wakeup_preempt_stop(struct rq *rq, struct task_struct *p, int flags)
@ -106,11 +105,9 @@ DEFINE_SCHED_CLASS(stop) = {
.put_prev_task = put_prev_task_stop,
.set_next_task = set_next_task_stop,
#ifdef CONFIG_SMP
.balance = balance_stop,
.select_task_rq = select_task_rq_stop,
.set_cpus_allowed = set_cpus_allowed_common,
#endif
.task_tick = task_tick_stop,


@ -2,6 +2,7 @@
/*
* <linux/swait.h> (simple wait queues ) implementation:
*/
#include "sched.h"
void __init_swait_queue_head(struct swait_queue_head *q, const char *name,
struct lock_class_key *key)


@ -174,7 +174,7 @@ SYSCALL_DEFINE1(nice, int, increment)
return 0;
}
#endif
#endif /* __ARCH_WANT_SYS_NICE */
/**
* task_prio - return the priority value of a given task.
@ -209,10 +209,8 @@ int idle_cpu(int cpu)
if (rq->nr_running)
return 0;
#ifdef CONFIG_SMP
if (rq->ttwu_pending)
return 0;
#endif
return 1;
}
@ -255,8 +253,7 @@ int sched_core_idle_cpu(int cpu)
return idle_cpu(cpu);
}
#endif
#endif /* CONFIG_SCHED_CORE */
/**
* find_process_by_pid - find a process with a matching PID value.
@ -448,7 +445,7 @@ static inline int uclamp_validate(struct task_struct *p,
}
static void __setscheduler_uclamp(struct task_struct *p,
const struct sched_attr *attr) { }
#endif
#endif /* !CONFIG_UCLAMP_TASK */
/*
* Allow unprivileged RT tasks to decrease priority.
@ -642,7 +639,6 @@ change:
goto unlock;
}
#endif /* CONFIG_RT_GROUP_SCHED */
#ifdef CONFIG_SMP
if (dl_bandwidth_enabled() && dl_policy(policy) &&
!(attr->sched_flags & SCHED_FLAG_SUGOV)) {
cpumask_t *span = rq->rd->span;
@ -658,7 +654,6 @@ change:
goto unlock;
}
}
#endif
}
/* Re-check policy now with rq lock held: */
@ -1120,7 +1115,6 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
return copy_struct_to_user(uattr, usize, &kattr, sizeof(kattr), NULL);
}
#ifdef CONFIG_SMP
int dl_task_check_affinity(struct task_struct *p, const struct cpumask *mask)
{
/*
@ -1149,7 +1143,6 @@ int dl_task_check_affinity(struct task_struct *p, const struct cpumask *mask)
return 0;
}
#endif /* CONFIG_SMP */
int __sched_setaffinity(struct task_struct *p, struct affinity_context *ctx)
{
@ -1242,7 +1235,7 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
user_mask = alloc_user_cpus_ptr(NUMA_NO_NODE);
if (user_mask) {
cpumask_copy(user_mask, in_mask);
} else if (IS_ENABLED(CONFIG_SMP)) {
} else {
return -ENOMEM;
}


@ -3,7 +3,9 @@
* Scheduler topology setup/handling methods
*/
#include <linux/sched/isolation.h>
#include <linux/bsearch.h>
#include "sched.h"
DEFINE_MUTEX(sched_domains_mutex);
void sched_domains_mutex_lock(void)
@ -87,7 +89,7 @@ static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level,
break;
}
if (!(sd->flags & SD_OVERLAP) &&
if (!(sd->flags & SD_NUMA) &&
cpumask_intersects(groupmask, sched_group_span(group))) {
printk(KERN_CONT "\n");
printk(KERN_ERR "ERROR: repeated CPUs\n");
@ -100,7 +102,7 @@ static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level,
group->sgc->id,
cpumask_pr_args(sched_group_span(group)));
if ((sd->flags & SD_OVERLAP) &&
if ((sd->flags & SD_NUMA) &&
!cpumask_equal(group_balance_mask(group), sched_group_span(group))) {
printk(KERN_CONT " mask=%*pbl",
cpumask_pr_args(group_balance_mask(group)));
@ -313,7 +315,7 @@ static int __init sched_energy_aware_sysctl_init(void)
}
late_initcall(sched_energy_aware_sysctl_init);
#endif
#endif /* CONFIG_PROC_SYSCTL */
static void free_pd(struct perf_domain *pd)
{
@ -449,9 +451,9 @@ free:
return false;
}
#else
#else /* !(CONFIG_ENERGY_MODEL && CONFIG_CPU_FREQ_GOV_SCHEDUTIL): */
static void free_pd(struct perf_domain *pd) { }
#endif /* CONFIG_ENERGY_MODEL && CONFIG_CPU_FREQ_GOV_SCHEDUTIL*/
#endif /* !(CONFIG_ENERGY_MODEL && CONFIG_CPU_FREQ_GOV_SCHEDUTIL) */
static void free_rootdomain(struct rcu_head *rcu)
{
@ -1318,8 +1320,6 @@ next:
update_group_capacity(sd, cpu);
}
#ifdef CONFIG_SMP
/* Update the "asym_prefer_cpu" when arch_asym_cpu_priority() changes. */
void sched_update_asym_prefer_cpu(int cpu, int old_prio, int new_prio)
{
@ -1344,7 +1344,7 @@ void sched_update_asym_prefer_cpu(int cpu, int old_prio, int new_prio)
* "sg->asym_prefer_cpu" to "sg->sgc->asym_prefer_cpu"
* which is shared by all the overlapping groups.
*/
WARN_ON_ONCE(sd->flags & SD_OVERLAP);
WARN_ON_ONCE(sd->flags & SD_NUMA);
sg = sd->groups;
if (cpu != sg->asym_prefer_cpu) {
@ -1374,8 +1374,6 @@ void sched_update_asym_prefer_cpu(int cpu, int old_prio, int new_prio)
}
}
#endif /* CONFIG_SMP */
/*
* Set of available CPUs grouped by their corresponding capacities
* Each list entry contains a CPU mask reflecting CPUs that share the same
@ -1598,7 +1596,7 @@ static int sched_domains_curr_level;
int sched_max_numa_distance;
static int *sched_domains_numa_distance;
static struct cpumask ***sched_domains_numa_masks;
#endif
#endif /* CONFIG_NUMA */
/*
* SD_flags allowed in topology descriptions.
@ -1714,7 +1712,7 @@ sd_init(struct sched_domain_topology_level *tl,
SD_WAKE_AFFINE);
}
#endif
#endif /* CONFIG_NUMA */
} else {
sd->cache_nice_tries = 1;
}
@ -1739,17 +1737,17 @@ sd_init(struct sched_domain_topology_level *tl,
*/
static struct sched_domain_topology_level default_topology[] = {
#ifdef CONFIG_SCHED_SMT
{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
SDTL_INIT(cpu_smt_mask, cpu_smt_flags, SMT),
#endif
#ifdef CONFIG_SCHED_CLUSTER
{ cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
SDTL_INIT(cpu_clustergroup_mask, cpu_cluster_flags, CLS),
#endif
#ifdef CONFIG_SCHED_MC
{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
SDTL_INIT(cpu_coregroup_mask, cpu_core_flags, MC),
#endif
{ cpu_cpu_mask, SD_INIT_NAME(PKG) },
SDTL_INIT(cpu_cpu_mask, NULL, PKG),
{ NULL, },
};
@ -2010,23 +2008,14 @@ void sched_init_numa(int offline_node)
/*
* Add the NUMA identity distance, aka single NODE.
*/
tl[i++] = (struct sched_domain_topology_level){
.mask = sd_numa_mask,
.numa_level = 0,
SD_INIT_NAME(NODE)
};
tl[i++] = SDTL_INIT(sd_numa_mask, NULL, NODE);
/*
* .. and append 'j' levels of NUMA goodness.
*/
for (j = 1; j < nr_levels; i++, j++) {
tl[i] = (struct sched_domain_topology_level){
.mask = sd_numa_mask,
.sd_flags = cpu_numa_flags,
.flags = SDTL_OVERLAP,
.numa_level = j,
SD_INIT_NAME(NUMA)
};
tl[i] = SDTL_INIT(sd_numa_mask, cpu_numa_flags, NUMA);
tl[i].numa_level = j;
}
sched_domain_topology_saved = sched_domain_topology;
@ -2337,7 +2326,7 @@ static void __sdt_free(const struct cpumask *cpu_map)
if (sdd->sd) {
sd = *per_cpu_ptr(sdd->sd, j);
if (sd && (sd->flags & SD_OVERLAP))
if (sd && (sd->flags & SD_NUMA))
free_sched_groups(sd->groups, 0);
kfree(*per_cpu_ptr(sdd->sd, j));
}
@ -2403,9 +2392,13 @@ static bool topology_span_sane(const struct cpumask *cpu_map)
id_seen = sched_domains_tmpmask2;
for_each_sd_topology(tl) {
int tl_common_flags = 0;
if (tl->sd_flags)
tl_common_flags = (*tl->sd_flags)();
/* NUMA levels are allowed to overlap */
if (tl->flags & SDTL_OVERLAP)
if (tl_common_flags & SD_NUMA)
continue;
cpumask_clear(covered);
@ -2476,8 +2469,6 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
if (tl == sched_domain_topology)
*per_cpu_ptr(d.sd, i) = sd;
if (tl->flags & SDTL_OVERLAP)
sd->flags |= SD_OVERLAP;
if (cpumask_equal(cpu_map, sched_domain_span(sd)))
break;
}
@ -2490,7 +2481,7 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
for_each_cpu(i, cpu_map) {
for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent) {
sd->span_weight = cpumask_weight(sched_domain_span(sd));
if (sd->flags & SD_OVERLAP) {
if (sd->flags & SD_NUMA) {
if (build_overlap_sched_groups(sd, i))
goto error;
} else {

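The topology.c hunks above replace the open-coded sched_domain_topology_level
initializers with an SDTL_INIT() helper. Judging purely from the initializers it
replaces, the macro plausibly expands to something like the sketch below; the
real definition lives in a header outside this diff and may differ in detail:

	#define SDTL_INIT(maskfn, flagsfn, dname)			\
		((struct sched_domain_topology_level) {			\
			.mask		= maskfn,			\
			.sd_flags	= flagsfn,			\
			SD_INIT_NAME(dname)				\
		})

This would also be why the NUMA levels now assign .numa_level separately, right
after the SDTL_INIT() assignment.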

@ -4,6 +4,7 @@
*
* (C) 2004 Nadia Yvette Chambers, Oracle
*/
#include "sched.h"
void __init_waitqueue_head(struct wait_queue_head *wq_head, const char *name, struct lock_class_key *key)
{


@ -1,5 +1,8 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <linux/sched/debug.h>
#include "sched.h"
/*
* The implementation of the wait_bit*() and related waiting APIs:
*/


@ -18,8 +18,6 @@
#include "smpboot.h"
#ifdef CONFIG_SMP
#ifdef CONFIG_GENERIC_SMP_IDLE_THREAD
/*
* For the hotplug case we keep the task structs around and reuse
@@ -76,8 +74,6 @@ void __init idle_threads_init(void)
}
#endif
#endif /* #ifdef CONFIG_SMP */
static LIST_HEAD(hotplug_threads);
static DEFINE_MUTEX(smpboot_threads_lock);


@@ -22,10 +22,8 @@ unsigned int check_preemption_disabled(const char *what1, const char *what2)
if (is_percpu_thread())
goto out;
#ifdef CONFIG_SMP
if (current->migration_disabled)
goto out;
#endif
/*
* It is valid to assume CPU-locality during early bootup:


@@ -1,7 +1,13 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/kernel.h>
#include <linux/sched/task.h>
void rust_helper_might_resched(void)
{
might_resched();
}
struct task_struct *rust_helper_get_current(void)
{
return current;


@@ -43,6 +43,10 @@
#![cfg_attr(not(CONFIG_RUSTC_HAS_COERCE_POINTEE), feature(coerce_unsized))]
#![cfg_attr(not(CONFIG_RUSTC_HAS_COERCE_POINTEE), feature(dispatch_from_dyn))]
#![cfg_attr(not(CONFIG_RUSTC_HAS_COERCE_POINTEE), feature(unsize))]
//
// `feature(file_with_nul)` is expected to become stable. Before Rust 1.89.0, it did not exist, so
// enable it conditionally.
#![cfg_attr(CONFIG_RUSTC_HAS_FILE_WITH_NUL, feature(file_with_nul))]
// Ensure conditional compilation based on the kernel configuration works;
// otherwise we may silently break things like initcall handling.
@@ -279,3 +283,47 @@ macro_rules! asm {
::core::arch::asm!( $($asm)*, $($rest)* )
};
}
/// Gets the C string file name of a [`Location`].
///
/// If `file_with_nul()` is not available, returns a string that warns about it.
///
/// [`Location`]: core::panic::Location
///
/// # Examples
///
/// ```
/// # use kernel::file_from_location;
///
/// #[track_caller]
/// fn foo() {
/// let caller = core::panic::Location::caller();
///
/// // Output:
/// // - A path like "rust/kernel/example.rs" if file_with_nul() is available.
/// // - "<Location::file_with_nul() not supported>" otherwise.
/// let caller_file = file_from_location(caller);
///
/// // Prints out the message with caller's file name.
/// pr_info!("foo() called in file {caller_file:?}\n");
///
/// # if cfg!(CONFIG_RUSTC_HAS_FILE_WITH_NUL) {
/// # assert_eq!(Ok(caller.file()), caller_file.to_str());
/// # }
/// }
///
/// # foo();
/// ```
#[inline]
pub fn file_from_location<'a>(loc: &'a core::panic::Location<'a>) -> &'a core::ffi::CStr {
#[cfg(CONFIG_RUSTC_HAS_FILE_WITH_NUL)]
{
loc.file_with_nul()
}
#[cfg(not(CONFIG_RUSTC_HAS_FILE_WITH_NUL))]
{
let _ = loc;
c"<Location::file_with_nul() not supported>"
}
}


@@ -216,6 +216,7 @@ impl CondVar {
/// This method behaves like `notify_one`, except that it hints to the scheduler that the
/// current thread is about to go to sleep, so it should schedule the target thread on the same
/// CPU.
#[inline]
pub fn notify_sync(&self) {
// SAFETY: `wait_queue_head` points to valid memory.
unsafe { bindings::__wake_up_sync(self.wait_queue_head.get(), TASK_NORMAL) };
@@ -225,6 +226,7 @@ impl CondVar {
///
/// This is not 'sticky' in the sense that if no thread is waiting, the notification is lost
/// completely (as opposed to automatically waking up the next waiter).
#[inline]
pub fn notify_one(&self) {
self.notify(1);
}
@@ -233,6 +235,7 @@ impl CondVar {
///
/// This is not 'sticky' in the sense that if no thread is waiting, the notification is lost
/// completely (as opposed to automatically waking up the next waiter).
#[inline]
pub fn notify_all(&self) {
self.notify(0);
}


@@ -91,6 +91,7 @@ impl Deref for PollCondVar {
#[pinned_drop]
impl PinnedDrop for PollCondVar {
#[inline]
fn drop(self: Pin<&mut Self>) {
// Clear anything registered using `register_wait`.
//


@@ -173,6 +173,7 @@ impl Task {
/// Callers must ensure that the returned object is only used to access a [`CurrentTask`]
/// within the task context that was active when this function was called. For more details,
/// see the invariants section for [`CurrentTask`].
#[inline]
pub unsafe fn current() -> impl Deref<Target = CurrentTask> {
struct TaskRef {
task: *const CurrentTask,
@@ -222,24 +223,28 @@ impl Task {
}
/// Returns the UID of the given task.
#[inline]
pub fn uid(&self) -> Kuid {
// SAFETY: It's always safe to call `task_uid` on a valid task.
Kuid::from_raw(unsafe { bindings::task_uid(self.as_ptr()) })
}
/// Returns the effective UID of the given task.
#[inline]
pub fn euid(&self) -> Kuid {
// SAFETY: It's always safe to call `task_euid` on a valid task.
Kuid::from_raw(unsafe { bindings::task_euid(self.as_ptr()) })
}
/// Determines whether the given task has pending signals.
#[inline]
pub fn signal_pending(&self) -> bool {
// SAFETY: It's always safe to call `signal_pending` on a valid task.
unsafe { bindings::signal_pending(self.as_ptr()) != 0 }
}
/// Returns task's pid namespace with elevated reference count
#[inline]
pub fn get_pid_ns(&self) -> Option<ARef<PidNamespace>> {
// SAFETY: By the type invariant, we know that `self.0` is valid.
let ptr = unsafe { bindings::task_get_pid_ns(self.as_ptr()) };
@@ -255,6 +260,7 @@ impl Task {
/// Returns the given task's pid in the provided pid namespace.
#[doc(alias = "task_tgid_nr_ns")]
#[inline]
pub fn tgid_nr_ns(&self, pidns: Option<&PidNamespace>) -> Pid {
let pidns = match pidns {
Some(pidns) => pidns.as_ptr(),
@@ -268,6 +274,7 @@ impl Task {
}
/// Wakes up the task.
#[inline]
pub fn wake_up(&self) {
// SAFETY: It's always safe to call `wake_up_process` on a valid task, even if the task
// is running.
@@ -341,11 +348,13 @@ impl CurrentTask {
// SAFETY: The type invariants guarantee that `Task` is always refcounted.
unsafe impl crate::types::AlwaysRefCounted for Task {
#[inline]
fn inc_ref(&self) {
// SAFETY: The existence of a shared reference means that the refcount is nonzero.
unsafe { bindings::get_task_struct(self.as_ptr()) };
}
#[inline]
unsafe fn dec_ref(obj: ptr::NonNull<Self>) {
// SAFETY: The safety requirements guarantee that the refcount is nonzero.
unsafe { bindings::put_task_struct(obj.cast().as_ptr()) }
@@ -391,3 +400,27 @@ impl PartialEq for Kuid {
}
impl Eq for Kuid {}
/// Annotation for functions that can sleep.
///
/// Equivalent to the C side [`might_sleep()`], this function serves as
/// a debugging aid and a potential scheduling point.
///
/// This function can only be used in a nonatomic context.
///
/// [`might_sleep()`]: https://docs.kernel.org/driver-api/basics.html#c.might_sleep
#[track_caller]
#[inline]
pub fn might_sleep() {
#[cfg(CONFIG_DEBUG_ATOMIC_SLEEP)]
{
let loc = core::panic::Location::caller();
let file = kernel::file_from_location(loc);
// SAFETY: `file.as_ptr()` is valid for reading and guaranteed to be nul-terminated.
unsafe { crate::bindings::__might_sleep(file.as_ptr().cast(), loc.line() as i32) }
}
// SAFETY: Always safe to call.
unsafe { crate::bindings::might_resched() }
}
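For orientation, the C symbols this Rust might_sleep() ends up calling are the helper shown above and the usual kernel declaration; assumed prototypes, listed only as a reading aid:

void __might_sleep(const char *file, int line);	/* sleep-in-atomic debugging check, active with CONFIG_DEBUG_ATOMIC_SLEEP */
void rust_helper_might_resched(void);		/* C helper shown above; wraps might_resched() as a voluntary preemption point */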

tools/sched/dl_bw_dump.py (new file)

@@ -0,0 +1,57 @@
#!/usr/bin/env drgn
# SPDX-License-Identifier: GPL-2.0
# Copyright (C) 2025 Juri Lelli <juri.lelli@redhat.com>
# Copyright (C) 2025 Red Hat, Inc.
desc = """
This is a drgn script to show dl_rq bandwidth accounting information. For more
info on drgn, visit https://github.com/osandov/drgn.
Only online CPUs are reported.
"""
import os
import argparse
import drgn
from drgn import FaultError
from drgn.helpers.common import *
from drgn.helpers.linux import *
def print_dl_bws_info():
    print("Retrieving dl_rq bandwidth accounting information:")

    runqueues = prog['runqueues']

    for cpu_id in for_each_possible_cpu(prog):
        try:
            rq = per_cpu(runqueues, cpu_id)
            if rq.online == 0:
                continue
            dl_rq = rq.dl

            print(f"  From CPU: {cpu_id}")

            # Access and print relevant fields from struct dl_rq
            print(f"    running_bw : {dl_rq.running_bw}")
            print(f"    this_bw    : {dl_rq.this_bw}")
            print(f"    extra_bw   : {dl_rq.extra_bw}")
            print(f"    max_bw     : {dl_rq.max_bw}")
            print(f"    bw_ratio   : {dl_rq.bw_ratio}")

        except drgn.FaultError as fe:
            print(f"  (CPU {cpu_id}: Fault accessing kernel memory: {fe})")
        except AttributeError as ae:
            print(f"  (CPU {cpu_id}: Missing attribute for dl_rq (kernel struct change?): {ae})")
        except Exception as e:
            print(f"  (CPU {cpu_id}: An unexpected error occurred: {e})")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description=desc,
                                     formatter_class=argparse.RawTextHelpFormatter)
    args = parser.parse_args()

    print_dl_bws_info()
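The values printed above come straight from the deadline scheduler's per-runqueue bandwidth accounting. As a reading aid, the corresponding struct dl_rq members look roughly like the reduced, hypothetical view below; the field comments summarize their usual meaning and are not taken from this diff:

/* Hypothetical reduced view of struct dl_rq's bandwidth accounting fields. */
typedef unsigned long long u64;

struct dl_rq_bw_view {
	u64 running_bw;	/* bandwidth of currently "active" deadline tasks (active utilization) */
	u64 this_bw;	/* total bandwidth of deadline tasks admitted to this runqueue */
	u64 extra_bw;	/* bandwidth available for reclaiming (GRUB) */
	u64 max_bw;	/* maximum bandwidth usable by deadline tasks on this CPU */
	u64 bw_ratio;	/* ratio applied when reclaiming bandwidth (GRUB) */
};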

tools/sched/root_domains_dump.py (new file)

@@ -0,0 +1,68 @@
#!/usr/bin/env drgn
# SPDX-License-Identifier: GPL-2.0
# Copyright (C) 2025 Juri Lelli <juri.lelli@redhat.com>
# Copyright (C) 2025 Red Hat, Inc.
desc = """
This is a drgn script to show the current root domains configuration. For more
info on drgn, visit https://github.com/osandov/drgn.
Root domains are only printed once, as multiple CPUs might be attached to the
same root domain.
"""
import os
import argparse
import drgn
from drgn import FaultError
from drgn.helpers.common import *
from drgn.helpers.linux import *
def print_root_domains_info():
    # To store unique root domains found
    seen_root_domains = set()

    print("Retrieving (unique) Root Domain Information:")

    runqueues = prog['runqueues']
    def_root_domain = prog['def_root_domain']

    for cpu_id in for_each_possible_cpu(prog):
        try:
            rq = per_cpu(runqueues, cpu_id)
            root_domain = rq.rd

            # Check if we've already processed this root domain to avoid duplicates
            # Use the memory address of the root_domain as a unique identifier
            root_domain_cast = int(root_domain)
            if root_domain_cast in seen_root_domains:
                continue
            seen_root_domains.add(root_domain_cast)

            if root_domain_cast == int(def_root_domain.address_):
                print(f"\n--- Root Domain @ def_root_domain ---")
            else:
                print(f"\n--- Root Domain @ 0x{root_domain_cast:x} ---")
            print(f"  From CPU: {cpu_id}")  # This CPU belongs to this root domain

            # Access and print relevant fields from struct root_domain
            print(f"  Span   : {cpumask_to_cpulist(root_domain.span[0])}")
            print(f"  Online : {cpumask_to_cpulist(root_domain.online[0])}")

        except drgn.FaultError as fe:
            print(f"  (CPU {cpu_id}: Fault accessing kernel memory: {fe})")
        except AttributeError as ae:
            print(f"  (CPU {cpu_id}: Missing attribute for root_domain (kernel struct change?): {ae})")
        except Exception as e:
            print(f"  (CPU {cpu_id}: An unexpected error occurred: {e})")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description=desc,
                                     formatter_class=argparse.RawTextHelpFormatter)
    args = parser.parse_args()

    print_root_domains_info()
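Similarly, the two cpumasks dumped per root domain correspond to struct root_domain's span and online members; a reduced, hypothetical view for orientation, not taken from this diff:

/* Hypothetical reduced view of struct root_domain's cpumasks. */
typedef unsigned long cpumask_word_t;

struct root_domain_view {
	cpumask_word_t *span;	/* all CPUs attached to this root domain */
	cpumask_word_t *online;	/* CPUs in span that are currently online */
};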