Commit Graph

267 Commits

Author SHA1 Message Date
Lucas Zampieri f6029bf351 Merge: workqueue: Backport workqueue commits to v6.9
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3910

JIRA: https://issues.redhat.com/browse/RHEL-25103
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3910
Depends: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3847

The primary purpose of this MR is to backport the upstream workqueue
commits which enable ordered workqueues and rescuers to follow
changes in the workqueue unbound cpumask. This is necessary to make
sure that isolated CPUs won't be disturbed by unbound work items
being handled on those CPUs.

These upstream commits were merged into the v6.9 kernel, which also
contains some major changes in workqueue code. This makes the required
commits dependent on some of the v6.9 workqueue commits. It is less risky
to sync the workqueue code up to v6.9 than to selectively backport
the dependent commits. This MR also includes some miscellaneous
commits in other subsystems made necessary by changes in the underlying
workqueue implementation.

A follow-up proactive workqueue fixes MR will be created later on,
if necessary.

Signed-off-by: Waiman Long <longman@redhat.com>

Approved-by: Tony Camuso <tcamuso@redhat.com>
Approved-by: Steve Best <sbest@redhat.com>
Approved-by: Vladis Dronov <vdronov@redhat.com>
Approved-by: Prarit Bhargava <prarit@redhat.com>
Approved-by: Wander Lairson Costa <wander@redhat.com>
Approved-by: Phil Auld <pauld@redhat.com>
Approved-by: Radu Rendec <rrendec@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Lucas Zampieri <lzampier@redhat.com>
2024-06-13 13:07:43 +00:00
Waiman Long efd17b3bb7 workqueue: Drain BH work items on hot-unplugged CPUs
JIRA: https://issues.redhat.com/browse/RHEL-25103

commit 1acd92d95fa24edca8f0292b21870025da93e24f
Author: Tejun Heo <tj@kernel.org>
Date:   Mon, 26 Feb 2024 15:38:55 -1000

    workqueue: Drain BH work items on hot-unplugged CPUs

    Boqun pointed out that workqueues aren't handling BH work items on offlined
    CPUs. Unlike tasklet, which transfers out the pending tasks from
    CPUHP_SOFTIRQ_DEAD, BH workqueue would just leave them pending, which is
    problematic. Note that this behavior is specific to BH workqueues as the
    non-BH per-CPU workers just become unbound when the CPU goes offline.

    This patch fixes the issue by draining the pending BH work items from an
    offlined CPU from CPUHP_SOFTIRQ_DEAD. Because work items carry more context,
    it's not as easy to transfer the pending work items from one pool to
    another. Instead, the BH work items of the offlined pools are executed on an
    online CPU.

    Note that this assumes that no further BH work items will be queued on the
    offlined CPUs. This assumption is shared with tasklet and should be fine for
    conversions. However, this issue also exists for per-CPU workqueues which
    will just keep executing work items queued after CPU offline on unbound
    workers and workqueue should reject per-CPU and BH work items queued on
    offline CPUs. This will be addressed separately later.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Reported-and-reviewed-by: Boqun Feng <boqun.feng@gmail.com>
    Link: http://lkml.kernel.org/r/Zdvw0HdSXcU3JZ4g@boqun-archlinux

Signed-off-by: Waiman Long <longman@redhat.com>
2024-05-03 13:39:33 -04:00
Waiman Long da3eaa2838 workqueue: Implement BH workqueues to eventually replace tasklets
JIRA: https://issues.redhat.com/browse/RHEL-25103
Conflicts: A minor context diff in kernel/workqueue.c due to missing
	   upstream commit 68279f9c9f59 ("treewide: mark stuff as
	   __ro_after_init").

commit 4cb1ef64609f9b0254184b2947824f4b46ccab22
Author: Tejun Heo <tj@kernel.org>
Date:   Sun, 4 Feb 2024 11:28:06 -1000

    workqueue: Implement BH workqueues to eventually replace tasklets

    The only generic interface to execute asynchronously in the BH context is
    tasklet; however, it's marked deprecated and has some design flaws, such as
    the execution code accessing the tasklet item after the execution is
    complete (which can lead to subtle use-after-free in certain usage
    scenarios) and less-developed flush and cancel mechanisms.

    This patch implements BH workqueues which share the same semantics and
    features of regular workqueues but execute their work items in the softirq
    context. As there is always only one BH execution context per CPU, none of
    the concurrency management mechanisms applies and a BH workqueue can be
    thought of as a convenience wrapper around softirq.

    Except for the inability to sleep while executing and lack of max_active
    adjustments, BH workqueues and work items should behave the same as regular
    workqueues and work items.

    Currently, the execution is hooked to tasklet[_hi]. However, the goal is to
    convert all tasklet users over to BH workqueues. Once the conversion is
    complete, tasklet can be removed and BH workqueues can directly take over
    the tasklet softirqs.

    system_bh[_highpri]_wq are added. As queue-wide flushing doesn't exist in
    tasklet, all existing tasklet users should be able to use the system BH
    workqueues without creating their own workqueues.

    v3: - Add missing interrupt.h include.

    v2: - Instead of using tasklets, hook directly into its softirq action
          functions - tasklet[_hi]_action(). This is slightly cheaper and closer
          to the eventual code structure we want to arrive at. Suggested by Lai.

        - Lai also pointed out several places which need NULL worker->task
          handling or can use clarification. Updated.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
    Link: http://lkml.kernel.org/r/CAHk-=wjDW53w4-YcSmgKC5RruiRLHmJ1sXeYdp_ZgVoBw=5byA@mail.gmail.com
    Tested-by: Allen Pais <allen.lkml@gmail.com>
    Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-05-03 13:39:31 -04:00
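For illustration, queueing onto the new system BH workqueue uses the ordinary workqueue API; a minimal sketch (callback and work-item names are hypothetical):

    #include <linux/workqueue.h>

    static void my_bh_work_fn(struct work_struct *work)
    {
        /* Executes in softirq (BH) context: must not sleep. */
    }

    static DECLARE_WORK(my_bh_work, my_bh_work_fn);

    static void kick_bh_work(void)
    {
        /* Same API as regular workqueues, but runs in BH context. */
        queue_work(system_bh_wq, &my_bh_work);
    }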
Phil Auld 5664854fb4 sched/core: introduce sched_core_idle_cpu()
JIRA: https://issues.redhat.com/browse/RHEL-25535

commit 548796e2e70b44b4661fd7feee6eb239245ff1f8
Author: Cruz Zhao <CruzZhao@linux.alibaba.com>
Date:   Thu Jun 29 12:02:04 2023 +0800

    sched/core: introduce sched_core_idle_cpu()

    As core scheduling was introduced, a new idle state was defined:
    force idle, in which the CPU runs the idle task while nr_running is
    greater than zero.

    If a cpu is in the force idle state, idle_cpu() will return zero. This
    result makes sense in some scenarios, e.g., load balance, showacpu
    when dumping, and judging whether the RCU boost kthread is starving.

    But it causes errors in other scenarios, e.g., tick_irq_exit():
    When in force idle, rq->curr == rq->idle but rq->nr_running > 0, so
    idle_cpu() returns 0. In tick_irq_exit(), if idle_cpu() is 0,
    tick_nohz_irq_exit() will not be called, and ts->idle_active will not
    become 1 after having been set to 0 in tick_nohz_irq_enter().
    With ts->idle_active stuck at 0 when it should be 1,
    update_ts_time_stats() never updates ts->idle_sleeptime. As a result,
    ts->idle_sleeptime is less than the actual value, and finally the
    idle time reported in /proc/stat is less than the actual value.

    To solve this problem, introduce sched_core_idle_cpu(), which
    returns 1 when in force idle. All users of idle_cpu() were audited,
    and idle_cpu() is replaced with sched_core_idle_cpu() in
    tick_irq_exit().

    v2-->v3: Only replace idle_cpu() with sched_core_idle_cpu() in
    function tick_irq_exit(). And modify the corresponding commit log.

    Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Peter Zijlstra <peterz@infradead.org>
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Reviewed-by: Joel Fernandes <joel@joelfernandes.org>
    Link: https://lore.kernel.org/r/1688011324-42406-1-git-send-email-CruzZhao@linux.alibaba.com

Signed-off-by: Phil Auld <pauld@redhat.com>
2024-04-05 09:49:13 -04:00
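For reference, a sketch of the helper's upstream shape: a force-idle-aware front end to idle_cpu():

    int sched_core_idle_cpu(int cpu)
    {
        struct rq *rq = cpu_rq(cpu);

        /* Force idle: the idle task runs but nr_running > 0. */
        if (sched_core_enabled(rq) && rq->curr == rq->idle)
            return 1;

        return idle_cpu(cpu);
    }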
Phil Auld 7d0cddf221 genirq, softirq: Use in_hardirq() instead of in_irq()
JIRA: https://issues.redhat.com/browse/RHEL-25535
Conflicts: Dropped one hunk because it was removed by a later
commit which is already in rhel 792ea6a074ae ("genirq: Remove
WARN_ON_ONCE() in generic_handle_domain_irq()").

commit fe13889c390e14205e064d7e159e61eb5da4b1c3
Author: Changbin Du <changbin.du@intel.com>
Date:   Fri Jan 28 19:07:27 2022 +0800

    genirq, softirq: Use in_hardirq() instead of in_irq()

    Replace the obsolete and ambiguous macro in_irq() with the new macro
    in_hardirq().

    Signed-off-by: Changbin Du <changbin.du@gmail.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lore.kernel.org/r/20220128110727.5110-1-changbin.du@gmail.com

Signed-off-by: Phil Auld <pauld@redhat.com>
2024-04-05 09:49:13 -04:00
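The conversion is mechanical; a hypothetical call site, before and after:

    #include <linux/preempt.h>

    static bool called_from_hardirq(void)   /* illustrative helper */
    {
        return in_hardirq();   /* was: return in_irq(); */
    }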
Oleg Nesterov 7ca1a24ee7 Revert "softirq: Let ksoftirqd do its job"
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2196764
Upstream-status: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git

commit d15121be7485655129101f3960ae6add40204463
Author: Paolo Abeni <pabeni@redhat.com>
Date:   Mon May 8 08:17:44 2023 +0200

    Revert "softirq: Let ksoftirqd do its job"

    This reverts the following commits:

      4cd13c21b2 ("softirq: Let ksoftirqd do its job")
      3c53776e29 ("Mark HI and TASKLET softirq synchronous")
      1342d8080f ("softirq: Don't skip softirq execution when softirq thread is parking")

    in a single change to avoid known bad intermediate states introduced by a
    patch series reverting them individually.

    Due to the mentioned commit, when the ksoftirqd threads take charge of
    softirq processing, the system can experience high latencies.

    In the past a few workarounds have been implemented for specific
    side-effects of the initial ksoftirqd enforcement commit:

    commit 1ff688209e ("watchdog: core: make sure the watchdog_worker is not deferred")
    commit 8d5755b3f7 ("watchdog: softdog: fire watchdog even if softirqs do not get to run")
    commit 217f697436 ("net: busy-poll: allow preemption in sk_busy_loop()")
    commit 3c53776e29 ("Mark HI and TASKLET softirq synchronous")

    But the latency problem still exists in real-life workloads, see the link
    below.

    The reverted commit intended to solve a live-lock scenario that can now be
    addressed with the NAPI threaded mode, introduced with commit 29863d41bb
    ("net: implement threaded-able napi poll loop support"), which is nowadays
    in a pretty stable status.

    While a complete solution to put softirq processing under nice resource
    control would be preferable, that has proven to be a very hard task. In
    the short term, remove the main pain point, and also simplify a bit the
    current softirq implementation.

    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Jason Xing <kerneljasonxing@gmail.com>
    Reviewed-by: Jakub Kicinski <kuba@kernel.org>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: "Paul E. McKenney" <paulmck@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: netdev@vger.kernel.org
    Link: https://lore.kernel.org/netdev/305d7742212cbe98621b16be782b0562f1012cb6.camel@redhat.com
    Link: https://lore.kernel.org/r/57e66b364f1b6f09c9bc0316742c3b14f4ce83bd.1683526542.git.pabeni@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2023-05-24 12:07:54 +02:00
Eder Zulian 921a5c7f51 softirq: Wake ktimers thread also in softirq.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2187979
Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git

commit 40efd84fb5de11104d690043dd7247edd9c5f632
Author: Junxiao Chang <junxiao.chang@intel.com>
Date:   Mon Feb 20 09:12:20 2023 +0100

    softirq: Wake ktimers thread also in softirq.

    If the hrtimer softirq is raised while a softirq is being processed, it does
    not wake the corresponding ktimers thread. This is due to the optimisation
    in the irq-exit path which is also used to wake the ktimers thread. For the
    other softirqs this is okay, because the additional softirq bits will be
    handled by the currently running softirq handler.
    The timer related softirq bits are added to a different variable and rely on
    the ktimers thread.
    As a consequence, the wake up of ktimersd is delayed until the next timer tick.

    Always wake the ktimers thread if a timer related softirq is pending.

    Reported-by: Peh, Hock Zhang <hock.zhang.peh@intel.com>
    Signed-off-by: Junxiao Chang <junxiao.chang@intel.com>
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Signed-off-by: Eder Zulian <ezulian@redhat.com>
2023-05-16 20:09:33 +02:00
Waiman Long 4cabb4dcd7 context_tracking: Take IRQ eqs entrypoints over RCU
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2169516
Conflicts: The drivers/cpuidle/cpuidle-riscv-sbi.c hunk is dropped as
	   RISC-V is not a supported arch and the file is not currently
	   present in Centos-Stream-9.

commit 6f0e6c1598b1a3d19fc30db86b6e26d6f881b43d
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Wed, 8 Jun 2022 16:40:26 +0200

    context_tracking: Take IRQ eqs entrypoints over RCU

    The RCU dynticks counter is going to be merged into the context tracking
    subsystem. Prepare with moving the IRQ extended quiescent states
    entrypoints to context tracking. For now those are dumb redirection to
    existing RCU calls.

    [ paulmck: Apply Stephen Rothwell feedback from -next. ]
    [ paulmck: Apply Nathan Chancellor feedback. ]

    Acked-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
    Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
    Cc: Yu Liao <liaoyu15@huawei.com>
    Cc: Phil Auld <pauld@redhat.com>
    Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
    Cc: Alex Belits <abelits@marvell.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
    Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-03-30 08:36:16 -04:00
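A sketch of the "dumb redirection" stage described above (assuming the ct_irq_*() naming of the upstream series):

    /* kernel/context_tracking.c: plain wrappers at this point */
    noinstr void ct_irq_enter(void)
    {
        rcu_irq_enter();
    }

    noinstr void ct_irq_exit(void)
    {
        rcu_irq_exit();
    }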
Juri Lelli 5a2f14b035 tick: Fix timer storm since introduction of timersd
Bugzilla: https://bugzilla.redhat.com/2171995
Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git

commit 4891aae98ee80dd177d32088bb2cda4a8445a2ba
Author:    Frederic Weisbecker <frederic@kernel.org>
Date:      Tue Apr 5 03:07:52 2022 +0200

    tick: Fix timer storm since introduction of timersd

    If timers are pending while the tick is reprogrammed on nohz_mode, the
    next expiry is not armed to fire now; it is instead delayed one jiffy
    forward so as not to raise an inextinguishable timer storm with the
    following scenario:

    1) IRQ triggers and queue a timer
    2) ksoftirqd() is woken up
    3) IRQ tail: timer is reprogrammed to fire now
    4) IRQ exit
    5) TIMER interrupt
    6) goto 3)

    ...all that until we finally reach ksoftirqd.

    Unfortunately we are checking the wrong softirq vector bitmask since
    timersd kthread has split from ksoftirqd. Timers now have their own
    vector state field that must be checked separately. As a result, the
    old timer storm is back. This shows up early on boot with extremely long
    initcalls:

    	[  333.004807] initcall dquot_init+0x0/0x111 returned 0 after 323822879 usecs

    and the cause is uncovered with the right trace events showing just
    10 microseconds between ticks (~100 000 Hz):

    |swapper/-1 1dn.h111 60818582us : hrtimer_expire_entry: hrtimer=00000000e0ef0f6b function=tick_sched_timer now=60415486608
    |swapper/-1 1dn.h111 60818592us : hrtimer_expire_entry: hrtimer=00000000e0ef0f6b function=tick_sched_timer now=60415496082
    |swapper/-1 1dn.h111 60818601us : hrtimer_expire_entry: hrtimer=00000000e0ef0f6b function=tick_sched_timer now=60415505550

    Fix this by checking the right timer vector state from the nohz code.

    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Link: https://lkml.kernel.org/r/20220405010752.1347437-2-frederic@kernel.org

Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
2023-02-27 13:46:09 +01:00
Juri Lelli 688e3bbf5b rcutorture: Also force sched priority to timersd on boosting test.
Bugzilla: https://bugzilla.redhat.com/2171995
Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git

commit 2eeb264f1e03aa2969b5017861d316f4810dd2f9
Author:    Frederic Weisbecker <frederic@kernel.org>
Date:      Tue Apr 5 03:07:51 2022 +0200

    rcutorture: Also force sched priority to timersd on boosting test.

    ksoftirqd is statically boosted to the priority level right above the
    one of rcu_torture_boost() so that timers, which torture readers rely on,
    get a chance to run while rcu_torture_boost() is polling.

    However timers processing got split from ksoftirqd into their own kthread
    (timersd) that isn't boosted. It has the same SCHED_FIFO low prio as
    rcu_torture_boost() and therefore timers can't preempt it and may
    starve.

    The issue can be triggered in practice on v5.17.1-rt17 using:

    	./kvm.sh --allcpus --configs TREE04 --duration 10m --kconfig "CONFIG_EXPERT=y CONFIG_PREEMPT_RT=y"

    Fix this with statically boosting timersd just like is done with
    ksoftirqd in commit
       ea6d962e80 ("rcutorture: Judge RCU priority boosting on grace periods, not callbacks")

    Suggested-by: Mel Gorman <mgorman@suse.de>
    Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Link: https://lkml.kernel.org/r/20220405010752.1347437-1-frederic@kernel.org
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
2023-02-27 13:46:09 +01:00
Juri Lelli 87ec2dea19 softirq: Use a dedicated thread for timer wakeups.
Bugzilla: https://bugzilla.redhat.com/2171995
Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git

commit 4335d431bec7d67faea9a92afa66f4bbbcfde7f8
Author:    Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date:      Wed Dec 1 17:41:09 2021 +0100

    softirq: Use a dedicated thread for timer wakeups.

    A timer/hrtimer softirq is raised in-IRQ context. With threaded
    interrupts enabled or on PREEMPT_RT this leads to waking the ksoftirqd
    for the processing of the softirq.
    Once the ksoftirqd is marked as pending (or is running) it will collect
    all raised softirqs. This in turn means that a softirq which would have
    been processed at the end of the threaded interrupt, which runs at an
    elevated priority, is now moved to ksoftirqd which runs at SCHED_OTHER
    priority and competes with every regular task for CPU resources.
    This introduces long delays on heavily loaded systems and is not desired,
    especially if the system is not overloaded by the softirqs.

    Split the TIMER_SOFTIRQ and HRTIMER_SOFTIRQ processing into a dedicated
    timers thread and let it run at the lowest SCHED_FIFO priority.
    RT tasks are woken up from hardirq context, so only timer_list timers
    and hrtimers for "regular" tasks are processed here. The higher priority
    ensures that wakeups are performed before scheduling SCHED_OTHER tasks.

    Using a dedicated variable to store the pending softirq bits ensures
    that the timer softirqs are not accidentally picked up by ksoftirqd or
    other threaded interrupts.
    They shouldn't be picked up by ksoftirqd since it runs at lower priority.
    However, if the timer bits are ORed while a threaded interrupt is
    running, then the timer softirq would be performed at higher priority.
    The new timer thread will block on the softirq lock before it starts
    softirq work. This "race window" isn't closed because, while the timer
    thread is performing the softirq work, it can get PI-boosted via the
    softirq lock by a random force-threaded thread.
    The timer thread can pick up pending softirqs from ksoftirqd, but only
    if the softirq load is high. It is not desired that the picked-up
    softirqs are processed at SCHED_FIFO priority under high softirq load,
    but this can already happen via a PI-boost by a force-threaded interrupt.

    Reported-by: kernel test robot <lkp@intel.com> [ static timer_threads ]
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
2023-02-27 13:46:08 +01:00
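A sketch of the per-CPU state this series introduces, per the linux-rt-devel patches (details may differ slightly in the backport):

    /* Dedicated kthread and pending-bits variable for timer softirqs. */
    static DEFINE_PER_CPU(struct task_struct *, timersd);
    static DEFINE_PER_CPU(unsigned long, pending_timer_softirq);

    static void wake_timersd(void)
    {
        struct task_struct *tsk = __this_cpu_read(timersd);

        if (tsk)
            wake_up_process(tsk);
    }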
Phil Auld eed8502760 smp: Make softirq handling RT safe in flush_smp_call_function_queue()
Bugzilla: https://bugzilla.redhat.com/2120671

commit 1a90bfd220201fbe050dfc15deaac20ca5f15638
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date:   Wed Apr 13 15:31:05 2022 +0200

    smp: Make softirq handling RT safe in flush_smp_call_function_queue()

    flush_smp_call_function_queue() invokes do_softirq() which is not available
    on PREEMPT_RT. flush_smp_call_function_queue() is invoked from the idle
    task and the migration task with preemption or interrupts disabled.

    So RT kernels cannot process soft interrupts in that context as that has to
    acquire 'sleeping spinlocks' which is not possible with preemption or
    interrupts disabled and forbidden from the idle task anyway.

    The currently known SMP function call which raises a soft interrupt is in
    the block layer, but this functionality is not enabled on RT kernels due to
    latency and performance reasons.

    RT could wake up ksoftirqd unconditionally, but this wants to be avoided if
    there were soft interrupts pending already when this is invoked in the
    context of the migration task. The migration task might have preempted a
    threaded interrupt handler which raised a soft interrupt, but did not reach
    the local_bh_enable() to process it. The "running" ksoftirqd might prevent
    the handling in the interrupt thread context which is causing latency
    issues.

    Add a new function which handles this case explicitly for RT and falls
    back to do_softirq() on !RT kernels. In the RT case this warns when one of
    the flushed SMP function calls raised a soft interrupt so this can be
    investigated.

    [ tglx: Moved the RT part out of SMP code ]

    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/r/YgKgL6aPj8aBES6G@linutronix.de
    Link: https://lore.kernel.org/r/20220413133024.356509586@linutronix.de

Signed-off-by: Phil Auld <pauld@redhat.com>
2022-09-08 11:25:07 -04:00
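A sketch of the new function's upstream shape (kernel/softirq.c, PREEMPT_RT):

    /*
     * The caller snapshots pending softirqs before flushing the SMP
     * function queue; warn if the flushed calls raised new ones, and
     * hand them to the softirq machinery instead of running inline.
     */
    void do_softirq_post_smp_call_flush(unsigned int was_pending)
    {
        if (WARN_ON_ONCE(was_pending != local_softirq_pending()))
            invoke_softirq();
    }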
Phil Auld 4e19e0b26f timers/nohz: Last resort update jiffies on nohz_full IRQ entry
Bugzilla: https://bugzilla.redhat.com/2078906

commit 53e87e3cdc155f20c3417b689df8d2ac88d79576
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Tue Oct 26 16:10:54 2021 +0200

    timers/nohz: Last resort update jiffies on nohz_full IRQ entry

    When at least one CPU runs in nohz_full mode, a dedicated timekeeper CPU
    is guaranteed to stay online and to never stop its tick.

    Meanwhile on some rare case, the dedicated timekeeper may be running
    with interrupts disabled for a while, such as in stop_machine.

    If jiffies stop being updated, a nohz_full CPU may end up endlessly
    programming the next tick in the past, taking the last jiffies update
    monotonic timestamp as a stale base, resulting in a tick storm.

    Here is a scenario where it matters:

    0) CPU 0 is the timekeeper and CPU 1 a nohz_full CPU.

    1) A stop machine callback is queued to execute somewhere.

    2) CPU 0 reaches MULTI_STOP_DISABLE_IRQ while CPU 1 is still in
       MULTI_STOP_PREPARE. Hence CPU 0 can't do its timekeeping duty. CPU 1
       can still take IRQs.

    3) CPU 1 receives an IRQ which queues a timer callback one jiffy forward.

    4) On IRQ exit, CPU 1 schedules the tick one jiffy forward, taking
       last_jiffies_update as a base. But last_jiffies_update hasn't been
       updated for 2 jiffies since the timekeeper has interrupts disabled.

    5) clockevents_program_event(), which relies on ktime_get(), observes
       that the expiration is in the past and therefore programs the min
       delta event on the clock.

    6) The tick fires immediately, goto 3)

    7) Tick storm: the nohz_full CPU is drowned and takes ages to reach
       MULTI_STOP_DISABLE_IRQ, which is the only way out of this situation.

    Solve this with unconditionally updating jiffies if the value is stale
    on nohz_full IRQ entry. IRQs and other disturbances are expected to be
    rare enough on nohz_full for the unconditional call to ktime_get() to
    actually matter.

    Reported-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Paul E. McKenney <paulmck@kernel.org>
    Link: https://lore.kernel.org/r/20211026141055.57358-2-frederic@kernel.org

Signed-off-by: Phil Auld <pauld@redhat.com>
2022-06-01 13:54:12 -04:00
Prarit Bhargava 422a09ac44 genirq: Change force_irqthreads to a static key
Bugzilla: http://bugzilla.redhat.com/2023084

commit 91cc470e797828d779cd4c1efbe8519bcb358bae
Author: Tanner Love <tannerlove@google.com>
Date:   Wed Jun 2 14:03:38 2021 -0400

    genirq: Change force_irqthreads to a static key

    With CONFIG_IRQ_FORCED_THREADING=y, testing the boolean force_irqthreads
    could incur a cache line miss in invoke_softirq() and other places.

    Replace the test with a static key to avoid the potential cache miss.

    [ tglx: Dropped the IDE part, removed the export and updated blk-mq ]

    Suggested-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Tanner Love <tannerlove@google.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Link: https://lore.kernel.org/r/20210602180338.3324213-1-tannerlove.kernel@gmail.com

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
2022-01-11 18:23:31 -05:00
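A sketch of the pattern: the boolean becomes a static key, so the fast path is a patched branch rather than a memory load:

    #include <linux/jump_label.h>

    DEFINE_STATIC_KEY_FALSE(force_irqthreads_key);

    #define force_irqthreads()  (static_branch_unlikely(&force_irqthreads_key))

    static int __init setup_forced_irqthreads(char *arg)
    {
        static_branch_enable(&force_irqthreads_key);
        return 0;
    }
    early_param("threadirqs", setup_forced_irqthreads);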
Peter Zijlstra b03fbd4ff2 sched: Introduce task_is_running()
Replace a bunch of 'p->state == TASK_RUNNING' with a new helper:
task_is_running(p).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210611082838.222401495@infradead.org
2021-06-18 11:43:07 +02:00
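A sketch of the helper (as defined before the later ->state to ->__state rename), plus a hypothetical call-site conversion:

    #include <linux/sched.h>

    #define task_is_running(task)  (READ_ONCE((task)->state) == TASK_RUNNING)

    static void example_check(struct task_struct *p)   /* hypothetical caller */
    {
        if (task_is_running(p))   /* was: p->state == TASK_RUNNING */
            pr_debug("task %d is running or runnable\n", p->pid);
    }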
Peter Zijlstra 37aadc687a sched: Unbreak wakeups
Remove broken task->state references and let wake_up_process() DTRT.

The anti-pattern in these patches breaks the ordering of ->state vs
COND as described in the comment near set_current_state() and can lead
to missed wakeups:

	(OoO load, observes RUNNING)<-.
	for (;;) {                    |
	  t->state = UNINTERRUPTIBLE; |
	  smp_mb();          ,----->  | (observes !COND)
                             |        /
	  if (COND) ---------'       |	COND = 1;
		break;		     `- if (t->state != RUNNING)
					  wake_up_process(t); // not done
	  schedule(); // forever waiting
	}
	t->state = TASK_RUNNING;

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210611082838.160855222@infradead.org
2021-06-18 11:43:06 +02:00
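For contrast, a generic sketch of the non-broken idiom: the waiter sets its state before checking the condition, and the waker calls wake_up_process() unconditionally:

    #include <linux/sched.h>

    static bool cond;
    static struct task_struct *waiter_task;

    static void waiter(void)
    {
        for (;;) {
            /* store ->state, then full barrier, then load COND */
            set_current_state(TASK_UNINTERRUPTIBLE);
            if (READ_ONCE(cond))
                break;
            schedule();
        }
        __set_current_state(TASK_RUNNING);
    }

    static void waker(void)
    {
        WRITE_ONCE(cond, true);
        wake_up_process(waiter_task);  /* no ->state peeking: DTRT */
    }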
Linus Torvalds 9a45da9270 RCU changes for this cycle were:
  - Bitmap support for "N" as alias for last bit
  - kvfree_rcu updates
  - mm_dump_obj() updates.  (One of these is to mm, but was suggested by Andrew Morton.)
  - RCU callback offloading update
  - Polling RCU grace-period interfaces
  - Realtime-related RCU updates
  - Tasks-RCU updates
  - Torture-test updates
  - Torture-test scripting updates
  - Miscellaneous fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'core-rcu-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:

 - Support for "N" as alias for last bit in bitmap parsing library (eg
   using syntax like "nohz_full=2-N")

 - kvfree_rcu updates

 - mm_dump_obj() updates. (One of these is to mm, but was suggested by
   Andrew Morton.)

 - RCU callback offloading update

 - Polling RCU grace-period interfaces

 - Realtime-related RCU updates

 - Tasks-RCU updates

 - Torture-test updates

 - Torture-test scripting updates

 - Miscellaneous fixes

* tag 'core-rcu-2021-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits)
  rcutorture: Test start_poll_synchronize_rcu() and poll_state_synchronize_rcu()
  rcu: Provide polling interfaces for Tiny RCU grace periods
  torture: Fix kvm.sh --datestamp regex check
  torture: Consolidate qemu-cmd duration editing into kvm-transform.sh
  torture: Print proper vmlinux path for kvm-again.sh runs
  torture: Make TORTURE_TRUST_MAKE available in kvm-again.sh environment
  torture: Make kvm-transform.sh update jitter commands
  torture: Add --duration argument to kvm-again.sh
  torture: Add kvm-again.sh to rerun a previous torture-test
  torture: Create a "batches" file for build reuse
  torture: De-capitalize TORTURE_SUITE
  torture: Make upper-case-only no-dot no-slash scenario names official
  torture: Rename SRCU-t and SRCU-u to avoid lowercase characters
  torture: Remove no-mpstat error message
  torture: Record kvm-test-1-run.sh and kvm-test-1-run-qemu.sh PIDs
  torture: Record jitter start/stop commands
  torture: Extract kvm-test-1-run-qemu.sh from kvm-test-1-run.sh
  torture: Record TORTURE_KCONFIG_GDB_ARG in qemu-cmd
  torture: Abstract jitter.sh start/stop into scripts
  rcu: Provide polling interfaces for Tree RCU grace periods
  ...
2021-04-28 12:00:13 -07:00
Thomas Gleixner 47c218dcae tick/sched: Prevent false positive softirq pending warnings on RT
On RT, a task which has soft interrupts disabled can block on a lock and
schedule out to idle while soft interrupts are pending. This triggers the
warning in the NOHZ idle code which complains about going idle with pending
soft interrupts. But as the task is blocked, soft interrupt processing is
temporarily blocked as well, which means that such a warning is a false
positive.

To prevent that, check the per-CPU state which indicates that a scheduled
out task has soft interrupts disabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210309085727.527563866@linutronix.de
2021-03-17 16:34:11 +01:00
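A sketch of the check: a per-CPU counter records that a scheduled-out task has softirqs disabled, and the NOHZ warning consults it:

    /*
     * kernel/softirq.c (PREEMPT_RT): true if a scheduled-out task still
     * holds the per-CPU softirq disable count. Softirq processing is then
     * temporarily blocked, so the NOHZ idle warning adds
     * !local_bh_blocked() to its condition to avoid the false positive.
     */
    bool local_bh_blocked(void)
    {
        return __this_cpu_read(softirq_ctrl.cnt) != 0;
    }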
Thomas Gleixner 8b1c04acad softirq: Make softirq control and processing RT aware
Provide a local lock based serialization for soft interrupts on RT which
allows the local_bh_disabled() sections and servicing soft interrupts to be
preemptible.

Provide the necessary inline helpers which allow to reuse the bulk of the
softirq processing code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210309085727.426370483@linutronix.de
2021-03-17 16:34:10 +01:00
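A sketch of the per-CPU local lock this serialization is built on:

    struct softirq_ctrl {
        local_lock_t lock;
        int          cnt;   /* softirq disable depth of the owner */
    };

    static DEFINE_PER_CPU(struct softirq_ctrl, softirq_ctrl) = {
        .lock = INIT_LOCAL_LOCK(softirq_ctrl.lock),
    };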
Thomas Gleixner f02fc963e9 softirq: Move various protections into inline helpers
To allow reuse of the bulk of softirq processing code for RT and to avoid
#ifdeffery all over the place, split protections for various code sections
out into inline helpers so the RT variant can just replace them in one go.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210309085727.310118772@linutronix.de
2021-03-17 16:34:10 +01:00
Thomas Gleixner eb2dafbba8 tasklets: Prevent tasklet_unlock_spin_wait() deadlock on RT
tasklet_unlock_spin_wait() spin waits for the TASKLET_STATE_SCHED bit in
the tasklet state to be cleared. This works on !RT nicely because the
corresponding execution can only happen on a different CPU.

On RT softirq processing is preemptible, therefore a task preempting the
softirq processing thread can spin forever.

Prevent this by invoking local_bh_disable()/enable() inside the loop. In
case that the softirq processing thread was preempted by the current task,
current will block on the local lock which yields the CPU to the preempted
softirq processing thread. If the tasklet is processed on a different CPU
then the local_bh_disable()/enable() pair is just a waste of processor
cycles.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210309084241.988908275@linutronix.de
2021-03-17 16:33:57 +01:00
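A sketch of the resulting wait loop:

    void tasklet_unlock_spin_wait(struct tasklet_struct *t)
    {
        while (test_bit(TASKLET_STATE_SCHED, &t->state)) {
            if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
                /*
                 * Blocking on the local lock yields the CPU to a
                 * preempted softirq processing thread; on a different
                 * CPU this pair is merely wasted cycles.
                 */
                local_bh_disable();
                local_bh_enable();
            } else {
                cpu_relax();
            }
        }
    }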
Peter Zijlstra 697d8c63c4 tasklets: Replace spin wait in tasklet_kill()
tasklet_kill() spin waits for TASKLET_STATE_SCHED to be cleared invoking
yield() from inside the loop. yield() is an ill defined mechanism and the
result might still be wasting CPU cycles in a tight loop which is
especially painful in a guest when the CPU running the tasklet is scheduled
out.

tasklet_kill() is used in teardown paths and not performance critical at
all. Replace the spin wait with wait_var_event().

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210309084241.890532921@linutronix.de
2021-03-17 16:33:57 +01:00
Peter Zijlstra da04474740 tasklets: Replace spin wait in tasklet_unlock_wait()
tasklet_unlock_wait() spin waits for TASKLET_STATE_RUN to be cleared. This
is wasting CPU cycles in a tight loop which is especially painful in a
guest when the CPU running the tasklet is scheduled out.

tasklet_unlock_wait() is invoked from tasklet_kill() which is used in
teardown paths and not performance critical at all. Replace the spin wait
with wait_var_event().

There are no users of tasklet_unlock_wait() which are invoked from atomic
contexts. The usage in tasklet_disable() has been replaced temporarily with
the spin waiting variant until the atomic users are fixed up and will be
converted to the sleep wait variant later.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210309084241.783936921@linutronix.de
2021-03-17 16:33:55 +01:00
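A sketch of the sleep-wait pairing that replaces the spin: the unlock side wakes waiters on the state word, and the wait side sleeps on it:

    #include <linux/wait_bit.h>

    void tasklet_unlock(struct tasklet_struct *t)
    {
        smp_mb__before_atomic();
        clear_bit(TASKLET_STATE_RUN, &t->state);
        smp_mb__after_atomic();
        wake_up_var(&t->state);
    }

    void tasklet_unlock_wait(struct tasklet_struct *t)
    {
        wait_var_event(&t->state, !test_bit(TASKLET_STATE_RUN, &t->state));
    }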
Dirk Behme 6b2c339df9 softirq: s/BUG/WARN_ONCE/ on tasklet SCHED state not set
Replace BUG() with WARN_ONCE() on wrong tasklet state, in order to:

 - increase the verbosity / aid in debugging
 - avoid fatal/unrecoverable state

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com>
Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210317102012.32399-1-erosca@de.adit-jv.com
2021-03-17 12:59:35 +01:00
Davidlohr Bueso 3a0ade0c52 tasklet: Remove tasklet_kill_immediate
Ever since RCU was converted to softirq, tasklet_kill_immediate() has had no users.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20210306213658.12862-1-dave@stgolabs.net
2021-03-16 15:06:31 +01:00
Paul E. McKenney 1c0c4bc1ce softirq: Don't try waking ksoftirqd before it has been spawned
If there is heavy softirq activity, the softirq system will attempt
to awaken ksoftirqd and will stop the traditional back-of-interrupt
softirq processing.  This is all well and good, but only if the
ksoftirqd kthreads already exist, which is not the case during early
boot, in which case the system hangs.

One reproducer is as follows:

tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 2 --configs "TREE03" --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y CONFIG_NO_HZ_IDLE=y CONFIG_HZ_PERIODIC=n" --bootargs "threadirqs=1" --trust-make

This commit therefore adds a couple of existence checks for ksoftirqd
and forces back-of-interrupt softirq processing when ksoftirqd does not
yet exist.  With this change, the above test passes.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: Uladzislau Rezki <urezki@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
[ paulmck: Remove unneeded check per Sebastian Siewior feedback. ]
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2021-03-15 13:51:48 -07:00
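A sketch of the existence check in invoke_softirq() (as of this era, before force_irqthreads became a static key):

    static inline void invoke_softirq(void)
    {
        if (ksoftirqd_running(local_softirq_pending()))
            return;

        /*
         * The added check: during early boot the per-CPU ksoftirqd
         * kthread does not exist yet, so fall back to back-of-interrupt
         * processing instead of trying to wake a missing kthread.
         */
        if (!force_irqthreads || !__this_cpu_read(ksoftirqd))
            do_softirq_own_stack();
        else
            wakeup_softirqd();
    }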
Thomas Gleixner db1cc7aede softirq: Move do_softirq_own_stack() to generic asm header
To avoid include recursion hell move the do_softirq_own_stack() related
content into a generic asm header and include it from all places in arch/
which need the prototype.

This allows architectures to provide an inline implementation of
do_softirq_own_stack() without introducing a lot of #ifdeffery all over the
place.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210210002513.289960691@linutronix.de
2021-02-10 23:34:16 +01:00
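A sketch of the generic header's shape (the exact config guard name has varied across kernel versions): a prototype when the arch switches stacks itself, an inline fallback otherwise.

    /* include/asm-generic/softirq_stack.h (sketch) */
    #ifndef __ASM_GENERIC_SOFTIRQ_STACK_H
    #define __ASM_GENERIC_SOFTIRQ_STACK_H

    #ifdef CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK
    void do_softirq_own_stack(void);
    #else
    static inline void do_softirq_own_stack(void)
    {
        __do_softirq();
    }
    #endif

    #endif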
Linus Torvalds 6be5f58215 Misc fixes/updates:
 - Fix static keys usage in module __init sections
 - Add separate MAINTAINERS entry for static branches/calls
 - Fix lockdep splat with CONFIG_PREEMPTIRQ_EVENTS=y tracing
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'locking-urgent-2020-12-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Ingo Molnar:
 "Misc fixes/updates:

   - Fix static keys usage in module __init sections

   - Add separate MAINTAINERS entry for static branches/calls

   - Fix lockdep splat with CONFIG_PREEMPTIRQ_EVENTS=y tracing"

* tag 'locking-urgent-2020-12-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  softirq: Avoid bad tracing / lockdep interaction
  jump_label/static_call: Add MAINTAINERS
  jump_label: Fix usage in module __init
2020-12-27 09:06:10 -08:00
Peter Zijlstra 91ea62d58b softirq: Avoid bad tracing / lockdep interaction
Similar to commit:

  1a63dcd876 ("softirq: Reorder trace_softirqs_on to prevent lockdep splat")

__local_bh_enable_ip() can also call into tracing with inconsistent
state. Unlike that commit we don't need to bother about the tracepoint
because 'cnt-1' never matches preempt_count() (by construction).

Reported-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lkml.kernel.org/r/20201218154519.GW3092@hirez.programming.kicks-ass.net
2020-12-18 16:53:13 +01:00
Frederic Weisbecker d14ce74f1f irq: Call tick_irq_enter() inside HARDIRQ_OFFSET
Now that account_hardirq_enter() is called after HARDIRQ_OFFSET has
been incremented, there is nothing left that prevents us from also
moving tick_irq_enter() after HARDIRQ_OFFSET is incremented.

The desired outcome is to remove the nasty hack that prevents softirqs
from being raised through ksoftirqd instead of the hardirq bottom half.
Also tick_irq_enter() then becomes appropriately covered by lockdep.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201202115732.27827-6-frederic@kernel.org
2020-12-02 20:20:05 +01:00
Frederic Weisbecker d3759e7184 irqtime: Move irqtime entry accounting after irq offset incrementation
IRQ time entry is currently accounted before HARDIRQ_OFFSET or
SOFTIRQ_OFFSET are incremented. This is convenient to decide to which
index the cputime to account is dispatched.

Unfortunately it prevents tick_irq_enter() from being called under
HARDIRQ_OFFSET because tick_irq_enter() has to be called before the IRQ
entry accounting due to the necessary clock catch up. As a result we
don't benefit from appropriate lockdep coverage on tick_irq_enter().

To prepare for fixing this, move the IRQ entry cputime accounting after
the preempt offset is incremented. This requires the cputime dispatch
code to handle the extra offset.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20201202115732.27827-5-frederic@kernel.org
2020-12-02 20:20:05 +01:00
Thomas Gleixner ae9ef58996 softirq: Move related code into one section
To prepare for adding a RT aware variant of softirq serialization and
processing move related code into one section so the necessary #ifdeffery
is reduced to one.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20201113141733.974214480@linutronix.de
2020-11-23 10:31:06 +01:00
Jiafei Pan cdabce2e3d softirq: Add debug check to __raise_softirq_irqoff()
__raise_softirq_irqoff() must be called with interrupts disabled to protect
the per CPU softirq pending state update against an interrupt and soft
interrupt handling on return from interrupt.

Add a lockdep assertion to validate the calling convention.

[ tglx: Massaged changelog ]

Signed-off-by: Jiafei Pan <Jiafei.Pan@nxp.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200814045522.45719-1-Jiafei.Pan@nxp.com
2020-09-16 15:18:56 +02:00
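With the assertion added, the function reads roughly:

    void __raise_softirq_irqoff(unsigned int nr)
    {
        lockdep_assert_irqs_disabled();   /* the added debug check */
        trace_softirq_raise(nr);
        or_softirq_pending(1UL << nr);
    }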
Linus Torvalds 427714f258 tasklets API update for v5.9-rc1
- Prepare for tasklet API modernization (Romain Perier, Allen Pais, Kees Cook)

Merge tag 'tasklets-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull tasklets API update from Kees Cook:
 "These are the infrastructure updates needed to support converting the
  tasklet API to something more modern (and hopefully for removal
  further down the road).

  There is a 300-patch series waiting in the wings to get set out to
  subsystem maintainers, but these changes need to be present in the
  kernel first. Since this has some treewide changes, I carried this
  series for -next instead of paining Thomas with it in -tip, but it's
  got his Ack.

  This is similar to the timer_struct modernization from a while back,
  but not nearly as messy (I hope). :)

   - Prepare for tasklet API modernization (Romain Perier, Allen Pais,
     Kees Cook)"

* tag 'tasklets-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  tasklet: Introduce new initialization API
  treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()
  usb: gadget: udc: Avoid tasklet passing a global
2020-08-04 13:40:35 -07:00
Romain Perier 12cc923f1c tasklet: Introduce new initialization API
Nowadays, modern kernel subsystems that use callbacks pass the data
structure associated with a given callback as argument to the callback.
The tasklet subsystem remains one which passes an arbitrary unsigned
long to the callback function. This has several problems:

- This keeps an extra field for storing the argument in each tasklet
  data structure, it bloats the tasklet_struct structure with a redundant
  .data field

- No type checking can be performed on this argument. Instead of
  using container_of() like other callback subsystems, it forces callbacks
  to do explicit type cast of the unsigned long argument into the required
  object type.

- Buffer overflows can overwrite the .func and the .data field, so
  an attacker can easily overwrite the function and its first argument
  to whatever it wants.

Add a new tasklet initialization API, via DECLARE_TASKLET() and
tasklet_setup(), which will replace the existing ones.

This work is greatly inspired by the timer_struct conversion series,
see commit e99e88a9d2 ("treewide: setup_timer() -> timer_setup()")

To avoid problems with both -Wcast-function-type (which is enabled in
the kernel via -Wextra in several subsystems), and with mismatched
function prototypes when built with Control Flow Integrity enabled,
this adds the "use_callback" member to let the tasklet caller choose
which union member to call through. Once all old API uses are removed,
this and the .data member will be removed as well. (On 64-bit this does
not grow the struct size as the new member fills the hole after atomic_t,
which is also "int" sized.)

Signed-off-by: Romain Perier <romain.perier@gmail.com>
Co-developed-by: Allen Pais <allen.lkml@gmail.com>
Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-30 11:16:01 -07:00
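A sketch of new-style usage: the callback receives the tasklet pointer and recovers its container via from_tasklet(), a container_of() wrapper from the same API series; all names below are hypothetical:

    #include <linux/interrupt.h>

    struct foo {
        int pending;
        struct tasklet_struct tl;
    };

    static void foo_tasklet_fn(struct tasklet_struct *t)
    {
        struct foo *f = from_tasklet(f, t, tl);   /* container_of() */

        f->pending = 0;
    }

    static void foo_init(struct foo *f)
    {
        tasklet_setup(&f->tl, foo_tasklet_fn);    /* no unsigned long data */
        tasklet_schedule(&f->tl);
    }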
Peter Zijlstra f9ad4a5f3f lockdep: Remove lockdep_hardirq{s_enabled,_context}() argument
Now that the macros use per-cpu data, we no longer need the argument.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200623083721.571835311@infradead.org
2020-07-10 12:00:02 +02:00
Peter Zijlstra a21ee6055c lockdep: Change hardirq{s_enabled,_context} to per-cpu variables
Currently all IRQ-tracking state is in task_struct; this means that
task_struct needs to be defined before we use it.

Especially for lockdep_assert_irq*() this can lead to header-hell.

Move the hardirq state into per-cpu variables to avoid the task_struct
dependency.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200623083721.512673481@infradead.org
2020-07-10 12:00:02 +02:00
Peter Zijlstra 59bc300b71 x86/entry: Clarify irq_{enter,exit}_rcu()
Because:

  irq_enter_rcu() includes lockdep_hardirq_enter()
  irq_exit_rcu() does *NOT* include lockdep_hardirq_exit()

Which resulted in two 'stray' lockdep_hardirq_exit() calls in
idtentry.h, and me spending a long time trying to find the matching
enter calls.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200529213321.359433429@infradead.org
2020-06-11 15:15:24 +02:00
Thomas Gleixner 8a6bc4787f genirq: Provide irq_enter/exit_rcu()
irq_enter()/exit() currently include RCU handling. To properly separate the RCU
handling code, provide variants which contain only the non-RCU related
functionality.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202117.567023613@linutronix.de
2020-06-11 15:15:06 +02:00
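A sketch of the split: irq_enter() keeps the RCU call and delegates the rest to the new _rcu() variant:

    void irq_enter_rcu(void)
    {
        /* Everything irq_enter() did, minus the RCU handling. */
        if (is_idle_task(current) && !in_interrupt()) {
            local_bh_disable();
            tick_irq_enter();
            _local_bh_enable();
        }
        __irq_enter();
    }

    void irq_enter(void)
    {
        rcu_irq_enter();
        irq_enter_rcu();
    }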
Peter Zijlstra ef996916e7 lockdep: Rename trace_{hard,soft}{irq_context,irqs_enabled}()
Continue what commit:

  d820ac4c2f ("locking: rename trace_softirq_[enter|exit] => lockdep_softirq_[enter|exit]")

started, rename these to avoid confusing them with tracepoints.

git grep -l "trace_\(soft\|hard\)\(irq_context\|irqs_enabled\)" | while read file;
do
	sed -ie 's/trace_\(soft\|hard\)\(irq_context\|irqs_enabled\)/lockdep_\1\2/g' $file;
done

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20200320115859.178626842@infradead.org
2020-03-21 16:03:54 +01:00
Peter Zijlstra 0d38453c85 lockdep: Rename trace_softirqs_{on,off}()
Continue what commit:

  d820ac4c2f ("locking: rename trace_softirq_[enter|exit] => lockdep_softirq_[enter|exit]")

started, rename these to avoid confusing them with tracepoints.

git grep -l "trace_softirqs_\(on\|off\)" | while read file;
do
	sed -ie 's/trace_softirqs_\(on\|off\)/lockdep_softirqs_\1/g' $file;
done

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20200320115859.119434738@infradead.org
2020-03-21 16:03:54 +01:00
Thomas Gleixner 2502ec37a7 lockdep: Rename trace_hardirq_{enter,exit}()
Continue what commit:

  d820ac4c2f ("locking: rename trace_softirq_[enter|exit] => lockdep_softirq_[enter|exit]")

started, rename these to avoid confusing them with tracepoints.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20200320115859.060481361@infradead.org
2020-03-21 16:03:53 +01:00
Linus Torvalds 2a1ccd3142 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
 "The irq departement provides the usual mixed bag:

  Core:

   - Further improvements to the irq timings code which aims to predict
     the next interrupt for power state selection to achieve better
     latency/power balance

   - Add interrupt statistics to the core NMI handlers

   - The usual small fixes and cleanups

  Drivers:

   - Support for Renesas RZ/A1, Annapurna Labs FIC, Meson-G12A SoC and
     Amazon Gravition AMR/GIC interrupt controllers.

   - Rework of the Renesas INTC controller driver

   - ACPI support for Socionext SoCs

   - Enhancements to the CSKY interrupt controller

   - The usual small fixes and cleanups"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
  irq/irqdomain: Fix comment typo
  genirq: Update irq stats from NMI handlers
  irqchip/gic-pm: Remove PM_CLK dependency
  irqchip/al-fic: Introduce Amazon's Annapurna Labs Fabric Interrupt Controller Driver
  dt-bindings: interrupt-controller: Add Amazon's Annapurna Labs FIC
  softirq: Use __this_cpu_write() in takeover_tasklets()
  irqchip/mbigen: Stop printing kernel addresses
  irqchip/gic: Add dependency for ARM_GIC_MAX_NR
  genirq/affinity: Remove unused argument from [__]irq_build_affinity_masks()
  genirq/timings: Add selftest for next event computation
  genirq/timings: Add selftest for irqs circular buffer
  genirq/timings: Add selftest for circular array
  genirq/timings: Encapsulate storing function
  genirq/timings: Encapsulate timings push
  genirq/timings: Optimize the period detection speed
  genirq/timings: Fix timings buffer inspection
  genirq/timings: Fix next event index function
  irqchip/qcom: Use struct_size() in devm_kzalloc()
  irqchip/irq-csky-mpintc: Remove unnecessary loop in interrupt handler
  dt-bindings: interrupt-controller: Update csky mpintc
  ...
2019-07-08 11:01:13 -07:00
Muchun Song 8afecaa68d softirq: Use __this_cpu_write() in takeover_tasklets()
The code is executed with interrupts disabled, so it's safe to use
__this_cpu_write().

[ tglx: Massaged changelog ]

Signed-off-by: Muchun Song <smuchun@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: joel@joelfernandes.org
Cc: rostedt@goodmis.org
Cc: frederic@kernel.org
Cc: paulmck@linux.vnet.ibm.com
Cc: alexander.levin@verizon.com
Cc: peterz@infradead.org
Link: https://lkml.kernel.org/r/20190618143305.2038-1-smuchun@gmail.com
2019-06-23 18:14:27 +02:00
Thomas Gleixner 767a67b0b3 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 430
Based on 1 normalized pattern(s):

  distribute under gplv2

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 8 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531190114.475576622@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05 17:37:16 +02:00
Thomas Gleixner d7dcf26ff0 softirq: Remove tasklet_hrtimer
There are no more users of this interface.
Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Link: https://lkml.kernel.org/r/20190301224821.29843-4-bigeasy@linutronix.de
2019-03-22 14:36:02 +01:00
Matthias Kaehlcke 1342d8080f softirq: Don't skip softirq execution when softirq thread is parking
When a CPU is unplugged the kernel threads of this CPU are parked (see
smpboot_park_threads()). kthread_park() is used to mark each thread as
parked and wake it up, so it can complete the process of parking itself
(see smpboot_thread_fn()).

If local softirqs are pending on interrupt exit, invoke_softirq() is called
to process the softirqs; however, it skips processing when the softirq
kernel thread of the local CPU is scheduled to run. The softirq kthread is
one of the threads that is parked when a CPU is unplugged. Parking the
kthread wakes it up, however only to complete the parking process, not to
process the pending softirqs. Hence processing of softirqs at the end of an
interrupt is skipped, but not done elsewhere, which can result in warnings
about pending softirqs when a CPU is unplugged:

/sys/devices/system/cpu # echo 0 > cpu4/online
[ ... ] NOHZ: local_softirq_pending 02
[ ... ] NOHZ: local_softirq_pending 202
[ ... ] CPU4: shutdown
[ ... ] psci: CPU4 killed.

Don't skip processing of softirqs at the end of an interrupt when the
softirq thread of the CPU is parking.

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Stephen Boyd <swboyd@chromium.org>
Link: https://lkml.kernel.org/r/20190128234625.78241-3-mka@chromium.org
2019-02-10 21:51:39 +01:00
Linus Torvalds 5947a64a7e Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
 "The interrupt brigade came up with the following updates:

   - Driver for the Marvell System Error Interrupt machinery

   - Overhaul of the GIC-V3 ITS driver

   - Small updates and fixes all over the place"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
  genirq: Fix race on spurious interrupt detection
  softirq: Fix typo in __do_softirq() comments
  genirq: Fix grammar s/an /a /
  irqchip/gic: Unify GIC priority definitions
  irqchip/gic-v3: Remove acknowledge loop
  dt-bindings/interrupt-controller: Add documentation for Marvell SEI controller
  dt-bindings/interrupt-controller: Update Marvell ICU bindings
  irqchip/irq-mvebu-icu: Add support for System Error Interrupts (SEI)
  arm64: marvell: Enable SEI driver
  irqchip/irq-mvebu-sei: Add new driver for Marvell SEI
  irqchip/irq-mvebu-icu: Support ICU subnodes
  irqchip/irq-mvebu-icu: Disociate ICU and NSR
  irqchip/irq-mvebu-icu: Clarify the reset operation of configured interrupts
  irqchip/irq-mvebu-icu: Fix wrong private data retrieval
  dt-bindings/interrupt-controller: Fix Marvell ICU length in the example
  genirq/msi: Allow creation of a tree-based irqdomain for platform-msi
  dt-bindings: irqchip: renesas-irqc: Document r8a7744 support
  dt-bindings: irqchip: renesas-irqc: Document R-Car E3 support
  irqchip/pdc: Setup all edge interrupts as rising edge at GIC
  irqchip/gic-v3-its: Allow use of LPI tables in reserved memory
  ...
2018-10-25 11:43:47 -07:00
Yangtao Li e45506ac0a softirq: Fix typo in __do_softirq() comments
s/s/as

[ mingo: Also add a missing 'the', add proper punctuation and clarify what 'swap' means here. ]

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: alexander.levin@verizon.com
Cc: frederic@kernel.org
Cc: joel@joelfernandes.org
Cc: paulmck@linux.vnet.ibm.com
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/20181018142133.12341-1-tiny.windzz@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-18 18:10:23 +02:00
Paul E. McKenney 65cfe3583b rcu: Define RCU-bh update API in terms of RCU
Now that the main RCU API knows about softirq disabling and softirq's
quiescent states, the RCU-bh update code can be dispensed with.
This commit therefore removes the RCU-bh update-side implementation and
defines RCU-bh's update-side API in terms of that of either RCU-preempt or
RCU-sched, depending on the setting of the CONFIG_PREEMPT Kconfig option.

In kernels built with CONFIG_RCU_NOCB_CPU=y this has the knock-on effect
of reducing by one the number of rcuo kthreads per CPU.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-08-30 16:02:40 -07:00
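Schematically, the update-side API collapses to wrappers; a rough sketch of the idea only, since the exact mapping depends on CONFIG_PREEMPT as the message notes:

    /* RCU-bh update-side calls become thin wrappers over RCU proper. */
    static inline void call_rcu_bh(struct rcu_head *head, rcu_callback_t func)
    {
        call_rcu(head, func);
    }

    static inline void synchronize_rcu_bh(void)
    {
        synchronize_rcu();
    }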