linux.git
12 years ago  kconfig-preempt-rt-full.patch v3.2.13-rt22
Thomas Gleixner [Wed, 29 Jun 2011 12:58:57 +0000 (14:58 +0200)]
kconfig-preempt-rt-full.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  kconfig-disable-a-few-options-rt.patch
Thomas Gleixner [Sun, 24 Jul 2011 10:11:43 +0000 (12:11 +0200)]
kconfig-disable-a-few-options-rt.patch

Disable stuff which is known to have issues on RT

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  net: Use cpu_chill() instead of cpu_relax()
Thomas Gleixner [Wed, 7 Mar 2012 20:10:04 +0000 (21:10 +0100)]
net: Use cpu_chill() instead of cpu_relax()

Retry loops on RT might loop forever when the modifying side was
preempted. Use cpu_chill() instead of cpu_relax() to let the system
make progress.
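
For illustration, the pattern being converted looks roughly like the
sketch below (placeholder lock name, not the actual hunk):

  while (!spin_trylock(&some_lock)) {	/* some_lock is a placeholder */
  	/* On RT, cpu_relax() here can spin forever if the lock holder
  	 * was preempted; cpu_chill() sleeps for a tick instead. */
  	cpu_chill();
  }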

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
12 years ago  fs: dcache: Use cpu_chill() in trylock loops
Thomas Gleixner [Wed, 7 Mar 2012 20:00:34 +0000 (21:00 +0100)]
fs: dcache: Use cpu_chill() in trylock loops

Retry loops on RT might loop forever when the modifying side was
preempted. Use cpu_chill() instead of cpu_relax() to let the system
make progress.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
12 years ago  rt: Introduce cpu_chill()
Thomas Gleixner [Wed, 7 Mar 2012 19:51:03 +0000 (20:51 +0100)]
rt: Introduce cpu_chill()

Retry loops on RT might loop forever when the modifying side was
preempted. Add cpu_chill() to replace cpu_relax(). cpu_chill()
defaults to cpu_relax() for non RT. On RT it puts the looping task to
sleep for a tick so the preempted task can make progress.
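
A minimal sketch of what this boils down to, assuming the RT variant is
implemented via msleep(1) as described above:

  /* include/linux/delay.h -- sketch, not the verbatim hunk */
  #ifdef CONFIG_PREEMPT_RT_FULL
  # define cpu_chill()	msleep(1)	/* sleep a tick, let the preempted task run */
  #else
  # define cpu_chill()	cpu_relax()
  #endif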

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
12 years ago  softirq: Check preemption after reenabling interrupts
Thomas Gleixner [Sun, 13 Nov 2011 16:17:09 +0000 (17:17 +0100)]
softirq: Check preemption after reenabling interrupts

raise_softirq_irqoff() disables interrupts and wakes the softirq
daemon, but after reenabling interrupts there is no preemption check,
so the execution of the softirq thread might be delayed arbitrarily.

In principle we could add that check to local_irq_enable/restore, but
that's overkill as the raise_softirq_irqoff() sections are the only
ones which show this behaviour.
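
Schematically, an affected call site then looks like this (sketch; the
helper name preempt_check_resched_rt() is how the -rt queue spells the
added check and should be treated as an assumption here):

  raise_softirq_irqoff(NET_RX_SOFTIRQ);
  local_irq_restore(flags);
  /* now that interrupts are on again, let the woken softirq thread
   * preempt us if it should */
  preempt_check_resched_rt();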

Reported-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
12 years ago  cpu: Make hotplug.lock a "sleeping" spinlock on RT
Steven Rostedt [Fri, 2 Mar 2012 15:36:57 +0000 (10:36 -0500)]
cpu: Make hotplug.lock a "sleeping" spinlock on RT

Tasks can block on hotplug.lock in pin_current_cpu(), but their state
might be != RUNNING. So the mutex wakeup will set the state
unconditionally to RUNNING. That might cause spurious unexpected
wakeups. We could provide a state preserving mutex_lock() function,
but this is semantically backwards. So instead we convert the
hotplug.lock() to a spinlock for RT, which has the state preserving
semantics already.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Carsten Emde <C.Emde@osadl.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clark Williams <clark.williams@gmail.com>
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/1330702617.25686.265.camel@gandalf.stny.rr.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  lglock/rt: Use non-rt for_each_cpu() in -rt code
Steven Rostedt [Thu, 1 Mar 2012 18:55:30 +0000 (13:55 -0500)]
lglock/rt: Use non-rt for_each_cpu() in -rt code

Currently the RT version of the lglocks() does a for_each_online_cpu()
in the name##_global_lock_online() functions. Non-rt uses its own
mask for this, and for good reason.

A task may grab a *_global_lock_online(), and in the mean time, one
of the CPUs goes offline. Now when that task does a *_global_unlock_online()
it releases all the locks *except* the one that went offline.

Now if that CPU were to come back on line, its lock is now owned by a
task that never released it when it should have.

This causes all sorts of fun errors. Like owners of a lock no longer
existing, or sleeping on IO, waiting to be woken up by a task that
happens to be blocked on the lock it never released.

Convert the RT versions to use the lglock specific cpumasks: once a
CPU comes online, its bit in the mask is set and never cleared, even
when the CPU goes offline. The locks for that CPU will still be taken
and released.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Carsten Emde <C.Emde@osadl.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clark Williams <clark.williams@gmail.com>
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/20120301190345.374756214@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  sched/rt: Fix wait_task_inactive() to test rt_spin_lock state
Steven Rostedt [Thu, 1 Mar 2012 18:55:33 +0000 (13:55 -0500)]
sched/rt: Fix wait_task_inactive() to test rt_spin_lock state

wait_task_inactive() will have a task sleep waiting for another
task to reach a certain state. But it ignores the rt_spin_lock state
and can return with an incorrect result if the task it is waiting
for is blocked on an rt_spin_lock() and is waking up.

The rt_spin_locks save the task's state in the saved_state field
and wait_task_inactive() must also test that state.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Carsten Emde <C.Emde@osadl.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clark Williams <clark.williams@gmail.com>
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/20120301190345.979435764@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  ring-buffer/rt: Check for irqs disabled before grabbing reader lock
Steven Rostedt [Thu, 1 Mar 2012 18:55:32 +0000 (13:55 -0500)]
ring-buffer/rt: Check for irqs disabled before grabbing reader lock

In RT the reader lock is a mutex and we can not grab it when preemption is
disabled. The in_atomic() check that is there does not check if irqs are
disabled. Add that check as well.
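
The change amounts to extending the existing guard, roughly (sketch):

  /* bail out before taking the reader lock, which is a sleeping lock
   * on RT, if we are in a context that may not sleep */
  if (in_atomic() || irqs_disabled())
  	return;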

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Carsten Emde <C.Emde@osadl.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clark Williams <clark.williams@gmail.com>
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/20120301190345.786365803@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  futex/rt: Fix possible lockup when taking pi_lock in proxy handler
Steven Rostedt [Thu, 1 Mar 2012 18:55:29 +0000 (13:55 -0500)]
futex/rt: Fix possible lockup when taking pi_lock in proxy handler

When taking the pi_lock, we must disable interrupts because the
pi_lock can also be taken in an interrupt handler.

Use raw_spin_lock_irq() instead of raw_spin_lock().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Carsten Emde <C.Emde@osadl.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clark Williams <clark.williams@gmail.com>
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/20120301190345.165160680@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  timer: Fix hotplug for -rt
Steven Rostedt [Thu, 1 Mar 2012 18:55:28 +0000 (13:55 -0500)]
timer: Fix hotplug for -rt

Revert the RT patch:
    Author: Ingo Molnar <mingo@elte.hu>
    Date:   Fri Jul 3 08:30:32 2009 -0500
    timers: fix timer hotplug on -rt

    Here we are in the CPU_DEAD notifier, and we must not sleep nor
    enable interrupts.

There's no problem with sleeping in this notifier.

But the get_cpu_var() had to be converted to a get_local_var().

Replace the previous fix with the get_local_var() conversion.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Carsten Emde <C.Emde@osadl.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clark Williams <clark.williams@gmail.com>
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/20120301190344.948157137@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  seqlock: Prevent rt starvation
Thomas Gleixner [Wed, 22 Feb 2012 11:03:30 +0000 (12:03 +0100)]
seqlock: Prevent rt starvation

If a low prio writer gets preempted while holding the seqlock write
locked, a high prio reader spins forever on RT.

To prevent this let the reader grab the spinlock, so it blocks and
eventually boosts the writer. This way the writer can proceed and
endless spinning is prevented.

For seqcount writers we disable preemption over the update code
path. Thanks to Al Viro for disentangling some VFS code to make that
possible.
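
A sketch of the RT read side described above (simplified; the exact
field layout of seqlock_t is assumed here):

  static inline unsigned read_seqbegin(seqlock_t *sl)
  {
  	unsigned ret;

  repeat:
  	ret = ACCESS_ONCE(sl->seqcount.sequence);
  	if (unlikely(ret & 1)) {
  		/*
  		 * Take the spinlock so a preempted writer gets
  		 * PI-boosted and can finish, instead of the reader
  		 * spinning forever.
  		 */
  		spin_lock(&sl->lock);
  		spin_unlock(&sl->lock);
  		goto repeat;
  	}
  	return ret;
  }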

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
12 years ago  cpumask: Disable CONFIG_CPUMASK_OFFSTACK for RT
Thomas Gleixner [Wed, 14 Dec 2011 00:03:49 +0000 (01:03 +0100)]
cpumask: Disable CONFIG_CPUMASK_OFFSTACK for RT

We can't deal with the cpumask allocations which happen in atomic
context (see arch/x86/kernel/apic/io_apic.c) on RT right now.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  dm: Make rt aware
Thomas Gleixner [Mon, 14 Nov 2011 22:06:09 +0000 (23:06 +0100)]
dm: Make rt aware

Use the BUG_ON_NORT variant for the irqs_disabled() checks. RT has
interrupts legitimately enabled here as we can't deadlock against the
irq thread due to the "sleeping spinlocks" conversion.

Reported-by: Luis Claudio R. Goncalves <lclaudio@uudg.org>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  x86: crypto: Reduce preempt disabled regions
Peter Zijlstra [Mon, 14 Nov 2011 17:19:27 +0000 (18:19 +0100)]
x86: crypto: Reduce preempt disabled regions

Restrict the preempt disabled regions to the actual floating point
operations and enable preemption for the administrative actions.

This is necessary on RT to avoid that kfree and other operations are
called with preemption disabled.
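
The shape of the change, using the AES-NI glue code as an example
(sketch, not the full diff; ctx, desc, walk and nbytes come from the
surrounding function):

  /* before: one kernel_fpu_begin()/end() around the whole walk, i.e.
   * preemption disabled across blkcipher_walk_done() and its possible
   * kfree(); after: a per-chunk FPU section */
  while ((nbytes = walk.nbytes)) {
  	kernel_fpu_begin();
  	aesni_ecb_enc(ctx, walk.dst.virt.addr, walk.src.virt.addr,
  		      nbytes & AES_BLOCK_MASK);
  	kernel_fpu_end();
  	err = blkcipher_walk_done(desc, &walk, nbytes & (AES_BLOCK_SIZE - 1));
  }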

Reported-and-tested-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  scsi-fcoe-rt-aware.patch
Thomas Gleixner [Sat, 12 Nov 2011 13:00:48 +0000 (14:00 +0100)]
scsi-fcoe-rt-aware.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  x86-kvm-require-const-tsc-for-rt.patch
Thomas Gleixner [Sun, 6 Nov 2011 11:26:18 +0000 (12:26 +0100)]
x86-kvm-require-const-tsc-for-rt.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  sysrq: Allow immediate Magic SysRq output for PREEMPT_RT_FULL
Frank Rowand [Fri, 23 Sep 2011 20:43:12 +0000 (13:43 -0700)]
sysrq: Allow immediate Magic SysRq output for PREEMPT_RT_FULL

Add a CONFIG option to allow the output from Magic SysRq to be output
immediately, even if this causes large latencies.

If PREEMPT_RT_FULL, printk() will not try to acquire the console lock
when interrupts or preemption are disabled.  If the console lock is
not acquired the printk() output will be buffered, but will not be
output immediately. Some drivers call into the Magic SysRq code
with interrupts or preemption disabled, so the output of Magic SysRq
will be buffered instead of printing immediately if this option is
not selected.

Even with this option selected, Magic SysRq output will be delayed
if the attempt to acquire the console lock fails.
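
A sketch of such a Kconfig entry (the symbol name used here is an
assumption, not necessarily the one in the patch):

  config MAGIC_SYSRQ_FORCE_PRINTK
  	bool "Force immediate Magic SysRq output"
  	depends on MAGIC_SYSRQ && PREEMPT_RT_FULL
  	help
  	  Print Magic SysRq output immediately, even if this may cause
  	  large latencies, instead of leaving it buffered when the
  	  console lock cannot be taken with interrupts or preemption
  	  disabled.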

Signed-off-by: Frank Rowand <frank.rowand@am.sony.com>
Link: http://lkml.kernel.org/r/4E7CEF60.5020508@am.sony.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  ipc/sem: Rework semaphore wakeups
Peter Zijlstra [Tue, 13 Sep 2011 13:09:40 +0000 (15:09 +0200)]
ipc/sem: Rework semaphore wakeups

Current sysv sems have a weird ass wakeup scheme that involves keeping
preemption disabled over a potential O(n^2) loop and busy waiting on
that on other CPUs.

Kill this and simply wake the task directly from under the sem_lock.

This was discovered by a migrate_disable() debug feature that
disallows:

  spin_lock();
  preempt_disable();
  spin_unlock()
  preempt_enable();

Cc: Manfred Spraul <manfred@colorfullife.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Manfred Spraul <manfred@colorfullife.com>
Link: http://lkml.kernel.org/r/1315994224.5040.1.camel@twins
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  mm, rt: kmap_atomic scheduling
Peter Zijlstra [Thu, 28 Jul 2011 08:43:51 +0000 (10:43 +0200)]
mm, rt: kmap_atomic scheduling

In fact, with migrate_disable() existing one could play games with
kmap_atomic. You could save/restore the kmap_atomic slots on context
switch (if there are any in use of course), this should be esp easy now
that we have a kmap_atomic stack.

Something like the below.. it wants replacing all the preempt_disable()
stuff with pagefault_disable() && migrate_disable() of course, but then
you can flip kmaps around like below.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[dvhart@linux.intel.com: build fix]
Link: http://lkml.kernel.org/r/1311842631.5890.208.camel@twins
12 years ago  add /sys/kernel/realtime entry
Clark Williams [Sun, 31 Jul 2011 02:55:53 +0000 (21:55 -0500)]
add /sys/kernel/realtime entry

Add a /sys/kernel entry to indicate that the kernel is a
realtime kernel.

Clark says that he needs this for udev rules: udev needs to evaluate
whether it's a PREEMPT_RT kernel a few thousand times, and parsing uname
output is too slow or so.

Are there better solutions? Should it exist and return 0 on !-rt?
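
The entry amounts to a one-value read-only sysfs attribute; roughly
(sketch):

  static ssize_t realtime_show(struct kobject *kobj,
  			     struct kobj_attribute *attr, char *buf)
  {
  	return sprintf(buf, "%d\n", 1);
  }

  static struct kobj_attribute realtime_attr = __ATTR_RO(realtime);

  static __init int rt_sysfs_init(void)
  {
  	/* kernel_kobj is the existing /sys/kernel kobject */
  	return sysfs_create_file(kernel_kobj, &realtime_attr.attr);
  }
  late_initcall(rt_sysfs_init);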

Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
12 years ago  kgdb/serial: Short term workaround
Jason Wessel [Thu, 28 Jul 2011 17:42:23 +0000 (12:42 -0500)]
kgdb/serial: Short term workaround

On 07/27/2011 04:37 PM, Thomas Gleixner wrote:
>  - KGDB (not yet disabled) is reportedly unusable on -rt right now due
>    to missing hacks in the console locking which I dropped on purpose.
>

To work around this in the short term you can use this patch, in
addition to the clocksource watchdog patch that Thomas brewed up.

Comments are welcome of course.  Ultimately the right solution is to
change separation between the console and the HW to have a polled mode
+ work queue so as not to introduce any kind of latency.

Thanks,
Jason.

12 years ago  ping-sysrq.patch
Carsten Emde [Tue, 19 Jul 2011 12:51:17 +0000 (13:51 +0100)]
ping-sysrq.patch

There are (probably rare) situations when a system has crashed and the system
console becomes unresponsive but the network icmp layer is still alive.
Wouldn't it be wonderful if we could then submit a sysrq command via ping?

This patch provides this facility. Please consult the updated documentation
Documentation/sysrq.txt for details.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
12 years ago  net: Avoid livelock in net_tx_action() on RT
Steven Rostedt [Thu, 6 Oct 2011 14:48:39 +0000 (10:48 -0400)]
net: Avoid livelock in net_tx_action() on RT

qdisc_lock is taken w/o disabling interrupts or bottom halves. So code
holding a qdisc_lock() can be interrupted and softirqs can run on the
return from interrupt in !RT.

The spin_trylock() in net_tx_action() makes sure, that the softirq
does not deadlock. When the lock can't be acquired q is requeued and
the NET_TX softirq is raised. That causes the softirq to run over and
over.

That works in mainline as do_softirq() has a retry loop limit and
leaves the softirq processing in the interrupt return path and
schedules ksoftirqd. The task which holds qdisc_lock cannot be
preempted, so the lock is released and either ksoftirqd or the next
softirq in the return from interrupt path can proceed. Though it's a
bit strange to actually run MAX_SOFTIRQ_RESTART (10) loops before it
decides to bail out even if it's clear in the first iteration :)

On RT all softirq processing is done in a FIFO thread and we don't
have a loop limit, so ksoftirqd preempts the lock holder forever and
unqueues and requeues until the reset button is hit.

Due to the forced threading of ksoftirqd on RT we actually cannot
deadlock on qdisc_lock because it's a "sleeping lock". So it's safe to
replace the spin_trylock() with a spin_lock(). When contended,
ksoftirqd is scheduled out and the lock holder can proceed.
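
On RT the relevant hunk therefore becomes, roughly (sketch, omitting
the surrounding requeue bookkeeping):

  root_lock = qdisc_lock(q);
  /* qdisc_lock is a sleeping lock on RT: blocking here lets the
   * (boosted) lock holder finish instead of requeueing and
   * re-raising NET_TX forever */
  spin_lock(root_lock);
  qdisc_run(q);
  spin_unlock(root_lock);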

[ tglx: Massaged changelog and code comments ]

Solved-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Carsten Emde <cbe@osadl.org>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Claudio R. Goncalves <lclaudio@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  mips-disable-highmem-on-rt.patch
Thomas Gleixner [Mon, 18 Jul 2011 15:10:12 +0000 (17:10 +0200)]
mips-disable-highmem-on-rt.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  ARM: at91: tclib: Default to tclib timer for RT
Thomas Gleixner [Sat, 1 May 2010 16:29:35 +0000 (18:29 +0200)]
ARM: at91: tclib: Default to tclib timer for RT

RT is not too happy about the shared timer interrupt in AT91
devices. Default to tclib timer for RT.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  arm-disable-highmem-on-rt.patch
Thomas Gleixner [Mon, 18 Jul 2011 15:09:28 +0000 (17:09 +0200)]
arm-disable-highmem-on-rt.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  power-disable-highmem-on-rt.patch
Thomas Gleixner [Mon, 18 Jul 2011 15:08:34 +0000 (17:08 +0200)]
power-disable-highmem-on-rt.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  power-use-generic-rwsem-on-rt
Thomas Gleixner [Sat, 24 Mar 2012 14:29:26 +0000 (09:29 -0500)]
power-use-generic-rwsem-on-rt

12 years ago  printk: Disable migration instead of preemption
Richard Weinberger [Mon, 12 Dec 2011 13:35:56 +0000 (14:35 +0100)]
printk: Disable migration instead of preemption

There is no need to disable preemption in vprintk(), migrate_disable()
is sufficient. This fixes the following bug in -rt:

[   14.759233] BUG: sleeping function called from invalid context
at /home/rw/linux-rt/kernel/rtmutex.c:645
[   14.759235] in_atomic(): 1, irqs_disabled(): 0, pid: 547, name: bash
[   14.759244] Pid: 547, comm: bash Not tainted 3.0.12-rt29+ #3
[   14.759246] Call Trace:
[   14.759301]  [<ffffffff8106fade>] __might_sleep+0xeb/0xf0
[   14.759318]  [<ffffffff810ad784>] rt_spin_lock_fastlock.constprop.9+0x21/0x43
[   14.759336]  [<ffffffff8161fef0>] rt_spin_lock+0xe/0x10
[   14.759354]  [<ffffffff81347ad1>] serial8250_console_write+0x81/0x121
[   14.759366]  [<ffffffff8107ecd3>] __call_console_drivers+0x7c/0x93
[   14.759369]  [<ffffffff8107ef31>] _call_console_drivers+0x5c/0x60
[   14.759372]  [<ffffffff8107f7e5>] console_unlock+0x147/0x1a2
[   14.759374]  [<ffffffff8107fd33>] vprintk+0x3ea/0x462
[   14.759383]  [<ffffffff816160e0>] printk+0x51/0x53
[   14.759399]  [<ffffffff811974e4>] ? proc_reg_poll+0x9a/0x9a
[   14.759403]  [<ffffffff81335b42>] __handle_sysrq+0x50/0x14d
[   14.759406]  [<ffffffff81335c8a>] write_sysrq_trigger+0x4b/0x53
[   14.759408]  [<ffffffff81335c3f>] ? __handle_sysrq+0x14d/0x14d
[   14.759410]  [<ffffffff81197583>] proc_reg_write+0x9f/0xbe
[   14.759426]  [<ffffffff811497ec>] vfs_write+0xac/0xf3
[   14.759429]  [<ffffffff8114a9b3>] ? fget_light+0x3a/0x9b
[   14.759431]  [<ffffffff811499db>] sys_write+0x4a/0x6e
[   14.759438]  [<ffffffff81625d52>] system_call_fastpath+0x16/0x1b
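
The change in vprintk() described above is essentially (sketch of the
before/after, not the full hunk):

  /* was: preempt_disable();  -- forbids taking the sleeping console
   * locks on -rt */
  migrate_disable();
  ...
  /* was: preempt_enable(); */
  migrate_enable();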

Signed-off-by: Richard Weinberger <rw@linutronix.de>
Link: http://lkml.kernel.org/r/1323696956-11445-1-git-send-email-rw@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  console-make-rt-friendly.patch
Thomas Gleixner [Sun, 17 Jul 2011 20:43:07 +0000 (22:43 +0200)]
console-make-rt-friendly.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  x86-no-perf-irq-work-rt.patch
Thomas Gleixner [Wed, 13 Jul 2011 12:05:05 +0000 (14:05 +0200)]
x86-no-perf-irq-work-rt.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  skbufhead-raw-lock.patch
Thomas Gleixner [Tue, 12 Jul 2011 13:38:34 +0000 (15:38 +0200)]
skbufhead-raw-lock.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  jump-label-rt.patch
Thomas Gleixner [Wed, 13 Jul 2011 09:03:16 +0000 (11:03 +0200)]
jump-label-rt.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  debugobjects-rt.patch
Thomas Gleixner [Sun, 17 Jul 2011 19:41:35 +0000 (21:41 +0200)]
debugobjects-rt.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  hotplug-stuff.patch
Thomas Gleixner [Fri, 4 Nov 2011 17:58:24 +0000 (18:58 +0100)]
hotplug-stuff.patch

Do not take lock for non handled cases (might be atomic context)

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  workqueue: Use get_cpu_light() in flush_gcwq()
Yong Zhang [Sun, 16 Oct 2011 10:56:46 +0000 (18:56 +0800)]
workqueue: Use get_cpu_light() in flush_gcwq()

BUG: sleeping function called from invalid context at kernel/rtmutex.c:645
in_atomic(): 1, irqs_disabled(): 0, pid: 1739, name: bash
Pid: 1739, comm: bash Not tainted 3.0.6-rt17-00284-gb76d419 #3
Call Trace:
 [<c06e3b5d>] ? printk+0x1d/0x20
 [<c01390b6>] __might_sleep+0xe6/0x110
 [<c06e633c>] rt_spin_lock+0x1c/0x30
 [<c01655a6>] flush_gcwq+0x236/0x320
 [<c021c651>] ? kfree+0xe1/0x1a0
 [<c05b7178>] ? __cpufreq_remove_dev+0xf8/0x260
 [<c0183fad>] ? rt_down_write+0xd/0x10
 [<c06cd91e>] workqueue_cpu_down_callback+0x26/0x2d
 [<c06e9d65>] notifier_call_chain+0x45/0x60
 [<c0171cfe>] __raw_notifier_call_chain+0x1e/0x30
 [<c014c9b4>] __cpu_notify+0x24/0x40
 [<c06cbc6f>] _cpu_down+0xdf/0x330
 [<c06cbef0>] cpu_down+0x30/0x50
 [<c06cd6b0>] store_online+0x50/0xa7
 [<c06cd660>] ? acpi_os_map_memory+0xec/0xec
 [<c04f2faa>] sysdev_store+0x2a/0x40
 [<c02887a4>] sysfs_write_file+0xa4/0x100
 [<c0229ab2>] vfs_write+0xa2/0x170
 [<c0288700>] ? sysfs_poll+0x90/0x90
 [<c0229d92>] sys_write+0x42/0x70
 [<c06ecedf>] sysenter_do_call+0x12/0x2d
CPU 1 is now offline
SMP alternatives: switching to UP code
SMP alternatives: switching to SMP code
Booting Node 0 Processor 1 APIC 0x1
smpboot cpu 1: start_ip = 9b000
Initializing CPU#1
BUG: sleeping function called from invalid context at kernel/rtmutex.c:645
in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: kworker/0:0
Pid: 0, comm: kworker/0:0 Not tainted 3.0.6-rt17-00284-gb76d419 #3
Call Trace:
 [<c06e3b5d>] ? printk+0x1d/0x20
 [<c01390b6>] __might_sleep+0xe6/0x110
 [<c06e633c>] rt_spin_lock+0x1c/0x30
 [<c06cd85b>] workqueue_cpu_up_callback+0x56/0xf3
 [<c06e9d65>] notifier_call_chain+0x45/0x60
 [<c0171cfe>] __raw_notifier_call_chain+0x1e/0x30
 [<c014c9b4>] __cpu_notify+0x24/0x40
 [<c014c9ec>] cpu_notify+0x1c/0x20
 [<c06e1d43>] notify_cpu_starting+0x1e/0x20
 [<c06e0aad>] smp_callin+0xfb/0x10e
 [<c06e0ad9>] start_secondary+0x19/0xd7
NMI watchdog enabled, takes one hw-pmu counter.
Switched to NOHz mode on CPU #1

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Link: http://lkml.kernel.org/r/1318762607-2261-5-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  workqueue: Fix PF_THREAD_BOUND abuse
Peter Zijlstra [Mon, 3 Oct 2011 10:43:25 +0000 (12:43 +0200)]
workqueue: Fix PF_THREAD_BOUND abuse

PF_THREAD_BOUND is set by kthread_bind() and means the thread is bound
to a particular cpu for correctness. The workqueue code abuses this
flag and blindly sets it for all created threads, including those that
are free to migrate.

Restore the original semantics now that the worst abuses in the
cpu-hotplug path are gone. The only icky bit is the rescue thread for
per-cpu workqueues; it cannot use kthread_bind() but will use
set_cpus_allowed_ptr() to migrate itself to the desired cpu.

Set and clear PF_THREAD_BOUND manually here.

XXX: I think worker_maybe_bind_and_lock()/worker_unbind_and_unlock()
should also do a get_online_cpus(), this would likely allow us to
remove the while loop.

XXX: should probably repurpose GCWQ_DISASSOCIATED to warn on adding
works after CPU_DOWN_PREPARE -- its dual use to mark unbound gcwqs is
a tad annoying though.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  workqueue: Fix cpuhotplug trainwreck
Peter Zijlstra [Fri, 30 Sep 2011 09:57:58 +0000 (11:57 +0200)]
workqueue: Fix cpuhotplug trainwreck

The current workqueue code does crazy stuff on cpu unplug, it relies on
forced affine breakage, thereby violating per-cpu expectations. Worse,
it tries to re-attach to a cpu if the thing comes up again before all
previously queued works are finished. This breaks (admittedly bonkers)
cpu-hotplug use that relies on a down-up cycle to push all usage away.

Introduce a new WQ_NON_AFFINE flag that indicates a per-cpu workqueue
will not respect cpu affinity and use this to migrate all its pending
works to whatever cpu is doing cpu-down.

This also adds a warning for queue_on_cpu() users which triggers when it's
used on WQ_NON_AFFINE workqueues, for the API implies you care about
what cpu things run on, while such workqueues cannot guarantee this.

For the rest, simply flush all per-cpu works and don't mess about.
This also means that currently all workqueues that are manually
flushing things on cpu-down in order to provide the per-cpu guarantee
no longer need to do so.

In short, we tell the WQ what we want it to do, provide validation for
this and lose ~250 lines of code.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  mm-vmalloc.patch
Thomas Gleixner [Tue, 12 Jul 2011 09:39:36 +0000 (11:39 +0200)]
mm-vmalloc.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  epoll.patch
Thomas Gleixner [Fri, 8 Jul 2011 14:35:35 +0000 (16:35 +0200)]
epoll.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  workqueue-use-get-cpu-light.patch
Thomas Gleixner [Sun, 17 Jul 2011 19:42:26 +0000 (21:42 +0200)]
workqueue-use-get-cpu-light.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  x86: Disable IST stacks for debug/int 3/stack fault for PREEMPT_RT
Andi Kleen [Fri, 3 Jul 2009 13:44:10 +0000 (08:44 -0500)]
x86: Disable IST stacks for debug/int 3/stack fault for PREEMPT_RT

Normally the x86-64 trap handlers for debug/int 3/stack fault run
on a special interrupt stack to make them more robust
when dealing with kernel code.

The PREEMPT_RT kernel can sleep in locks even while allocating
GFP_ATOMIC memory. When one of these trap handlers needs to send
real time signals for ptrace it allocates memory and could then
try to schedule.  But it is not allowed to schedule on an
IST stack. This can cause warnings and hangs.

This patch disables the IST stacks for these handlers for PREEMPT_RT
kernel. Instead let them run on the normal process stack.

The kernel only really needs the ISTs here to make kernel debuggers more
robust in case someone sets a break point somewhere where the stack is
invalid. But there are no kernel debuggers in the standard kernel
that do this.

It also means kprobes cannot be set in situations with invalid stack;
but that sounds like a reasonable restriction.

The stack fault change could minimally impact oops quality, but not very
much because stack faults are fairly rare.

A better solution would be to use similar logic as the NMI "paranoid"
path: check if signal is for user space, if yes go back to entry.S, switch stack,
call sync_regs, then do the signal sending etc.

But this patch is much simpler and should work too with minimal impact.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  x86: Use generic rwsem_spinlocks on -rt
Thomas Gleixner [Sun, 26 Jul 2009 00:21:32 +0000 (02:21 +0200)]
x86: Use generic rwsem_spinlocks on -rt

Simplifies the separation of anon_rw_semaphores and rw_semaphores for
-rt.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  x86: stackprotector: Avoid random pool on rt
Thomas Gleixner [Thu, 16 Dec 2010 13:25:18 +0000 (14:25 +0100)]
x86: stackprotector: Avoid random pool on rt

CPU bringup calls into the random pool to initialize the stack
canary. During boot that works nicely even on RT as the might sleep
checks are disabled. During CPU hotplug the might sleep checks
trigger. Making the locks in random raw is a major PITA, so avoiding the
call on RT is the only sensible solution. This is basically the same
randomness which we get during boot where the random pool has no
entropy and we rely on the TSC randomness.
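
A sketch of the resulting boot_init_stack_canary() (simplified, canary
store omitted):

  static __always_inline void boot_init_stack_canary(void)
  {
  	u64 canary;
  	u64 tsc;

  #ifndef CONFIG_PREEMPT_RT_FULL
  	get_random_bytes(&canary, sizeof(canary));	/* may sleep on RT */
  #endif
  	tsc = __native_read_tsc();
  	canary += tsc + (tsc << 32UL);
  	/* store the canary as before ... */
  }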

Reported-by: Carsten Emde <carsten.emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  x86: Convert mce timer to hrtimer
Thomas Gleixner [Mon, 13 Dec 2010 15:33:39 +0000 (16:33 +0100)]
x86: Convert mce timer to hrtimer

mce_timer is started in atomic contexts of cpu bringup. This results
in might_sleep() warnings on RT. Convert mce_timer to a hrtimer to
avoid this.
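
A sketch of the conversion (interval handling simplified):

  static struct hrtimer mce_timer;

  static enum hrtimer_restart mce_timer_fn(struct hrtimer *timer)
  {
  	machine_check_poll(MCP_TIMESTAMP, &__get_cpu_var(mce_poll_banks));
  	hrtimer_forward_now(timer,
  			    ns_to_ktime((u64)check_interval * NSEC_PER_SEC));
  	return HRTIMER_RESTART;
  }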

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  fs: ntfs: disable interrupt only on !RT
Mike Galbraith [Fri, 3 Jul 2009 13:44:12 +0000 (08:44 -0500)]
fs: ntfs: disable interrupt only on !RT

On Sat, 2007-10-27 at 11:44 +0200, Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
> > > [10138.175796]  [<c0105de3>] show_trace+0x12/0x14
> > > [10138.180291]  [<c0105dfb>] dump_stack+0x16/0x18
> > > [10138.184769]  [<c011609f>] native_smp_call_function_mask+0x138/0x13d
> > > [10138.191117]  [<c0117606>] smp_call_function+0x1e/0x24
> > > [10138.196210]  [<c012f85c>] on_each_cpu+0x25/0x50
> > > [10138.200807]  [<c0115c74>] flush_tlb_all+0x1e/0x20
> > > [10138.205553]  [<c016caaf>] kmap_high+0x1b6/0x417
> > > [10138.210118]  [<c011ec88>] kmap+0x4d/0x4f
> > > [10138.214102]  [<c026a9d8>] ntfs_end_buffer_async_read+0x228/0x2f9
> > > [10138.220163]  [<c01a0e9e>] end_bio_bh_io_sync+0x26/0x3f
> > > [10138.225352]  [<c01a2b09>] bio_endio+0x42/0x6d
> > > [10138.229769]  [<c02c2a08>] __end_that_request_first+0x115/0x4ac
> > > [10138.235682]  [<c02c2da7>] end_that_request_chunk+0x8/0xa
> > > [10138.241052]  [<c0365943>] ide_end_request+0x55/0x10a
> > > [10138.246058]  [<c036dae3>] ide_dma_intr+0x6f/0xac
> > > [10138.250727]  [<c0366d83>] ide_intr+0x93/0x1e0
> > > [10138.255125]  [<c015afb4>] handle_IRQ_event+0x5c/0xc9
> >
> > Looks like ntfs is kmap()ing from interrupt context. Should be using
> > kmap_atomic instead, I think.
>
> it's not atomic interrupt context but irq thread context - and -rt
> remaps kmap_atomic() to kmap() internally.

Hm.  Looking at the change to mm/bounce.c, perhaps I should do this
instead?

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  fs-block-rt-support.patch
Thomas Gleixner [Tue, 14 Jun 2011 15:05:09 +0000 (17:05 +0200)]
fs-block-rt-support.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  mm-protect-activate-switch-mm.patch
Thomas Gleixner [Mon, 4 Jul 2011 07:48:40 +0000 (09:48 +0200)]
mm-protect-activate-switch-mm.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  fs: namespace preemption fix
Thomas Gleixner [Sun, 19 Jul 2009 13:44:27 +0000 (08:44 -0500)]
fs: namespace preemption fix

On RT we cannot loop with preemption disabled here as
mnt_make_readonly() might have been preempted. We can safely enable
preemption while waiting for MNT_WRITE_HOLD to be cleared. Safe on !RT
as well.
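
The loop in question then looks roughly like this (sketch):

  /* mnt_want_write(): wait for MNT_WRITE_HOLD to be released.
   * Enabling preemption here lets the holder of MNT_WRITE_HOLD
   * (mnt_make_readonly()) run if it was preempted. */
  while (mnt->mnt_flags & MNT_WRITE_HOLD) {
  	preempt_enable();
  	cpu_relax();
  	preempt_disable();
  }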

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  rt: Improve the serial console PASS_LIMIT
Ingo Molnar [Wed, 14 Dec 2011 12:05:54 +0000 (13:05 +0100)]
rt: Improve the serial console PASS_LIMIT

Beyond the warning:

 drivers/tty/serial/8250.c:1613:6: warning: unused variable ‘pass_counter’ [-Wunused-variable]

the solution of just looping infinitely was ugly - up it to 1 million to
give it a chance to continue in some really ugly situation.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  drivers-tty-fix-omap-lock-crap.patch
Thomas Gleixner [Thu, 28 Jul 2011 11:32:57 +0000 (13:32 +0200)]
drivers-tty-fix-omap-lock-crap.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  serial: 8250: Call flush_to_ldisc when the irq is threaded
Ingo Molnar [Fri, 3 Jul 2009 13:30:01 +0000 (08:30 -0500)]
serial: 8250: Call flush_to_ldisc when the irq is threaded

Signed-off-by: Ingo Molnar <mingo@elte.hu>
12 years ago  serial: 8250: Clean up the locking for -rt
Ingo Molnar [Fri, 3 Jul 2009 13:30:01 +0000 (08:30 -0500)]
serial: 8250: Clean up the locking for -rt

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  lglocks-rt.patch
Thomas Gleixner [Wed, 15 Jun 2011 09:02:21 +0000 (11:02 +0200)]
lglocks-rt.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  rt/rcutree: Move misplaced prototype
Ingo Molnar [Wed, 14 Dec 2011 11:51:28 +0000 (12:51 +0100)]
rt/rcutree: Move misplaced prototype

Fix this warning on x86 defconfig:

  kernel/rcutree.h:433:13: warning: ‘rcu_preempt_qs’ declared ‘static’ but never defined [-Wunused-function]

The #ifdefs and prototypes here are a maze, move it closer to the
usage site that needs it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  rcu: Make ksoftirqd do RCU quiescent states
Paul E. McKenney [Wed, 5 Oct 2011 18:45:18 +0000 (11:45 -0700)]
rcu: Make ksoftirqd do RCU quiescent states

Implementing RCU-bh in terms of RCU-preempt makes the system vulnerable
to network-based denial-of-service attacks.  This patch therefore
makes __do_softirq() invoke rcu_bh_qs(), but only when __do_softirq()
is running in ksoftirqd context.  A wrapper layer is interposed so that
other calls to __do_softirq() avoid invoking rcu_bh_qs().  The underlying
function __do_softirq_common() does the actual work.

The reason that rcu_bh_qs() is bad in these non-ksoftirqd contexts is
that there might be a local_bh_enable() inside an RCU-preempt read-side
critical section.  This local_bh_enable() can invoke __do_softirq()
directly, so if __do_softirq() were to invoke rcu_bh_qs() (which just
calls rcu_preempt_qs() in the PREEMPT_RT_FULL case), there would be
an illegal RCU-preempt quiescent state in the middle of an RCU-preempt
read-side critical section.  Therefore, quiescent states can only happen
in cases where __do_softirq() is invoked directly from ksoftirqd.
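
Structurally, the split looks about like this (sketch; names follow the
description above):

  static void __do_softirq_common(int need_rcu_bh_qs)
  {
  	...
  	if (need_rcu_bh_qs)
  		rcu_bh_qs(cpu);	/* only report the QS from ksoftirqd */
  	...
  }

  void __do_softirq(void)
  {
  	/* non-ksoftirqd callers (e.g. via local_bh_enable()) must never
  	 * report an RCU-bh quiescent state */
  	__do_softirq_common(0);
  }

  /* the ksoftirqd loop calls the variant that passes 1 */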

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20111005184518.GA21601@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  rcu-more-fallout.patch
Thomas Gleixner [Mon, 14 Nov 2011 09:57:54 +0000 (10:57 +0100)]
rcu-more-fallout.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  rcu: Fix macro substitution for synchronize_rcu_bh() on RT
John Kacur [Mon, 14 Nov 2011 01:44:42 +0000 (02:44 +0100)]
rcu: Fix macro substitution for synchronize_rcu_bh() on RT

kernel/rcutorture.c:492: error: ‘synchronize_rcu_bh’ undeclared here (not in a function)

synchronize_rcu_bh() is not just called as a normal function, but can
also be referenced as a function pointer. When CONFIG_PREEMPT_RT_FULL
is enabled, synchronize_rcu_bh() is defined as synchronize_rcu(), but
needs to be defined without the parenthesis because the compiler will
complain when synchronize_rcu_bh is referenced as a function pointer
and not a function.
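
I.e. the define has to be (sketch):

  /* works both as a call and as a function pointer: no parentheses */
  # define synchronize_rcu_bh	synchronize_rcu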

Signed-off-by: John Kacur <jkacur@redhat.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/1321235083-21756-1-git-send-email-jkacur@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  rcu: Merge RCU-bh into RCU-preempt
Thomas Gleixner [Wed, 5 Oct 2011 18:59:38 +0000 (11:59 -0700)]
rcu: Merge RCU-bh into RCU-preempt

The Linux kernel has long RCU-bh read-side critical sections that
intolerably increase scheduling latency under mainline's RCU-bh rules,
which include RCU-bh read-side critical sections being non-preemptible.
This patch therefore arranges for RCU-bh to be implemented in terms of
RCU-preempt for CONFIG_PREEMPT_RT_FULL=y.

This has the downside of defeating the purpose of RCU-bh, namely,
handling the case where the system is subjected to a network-based
denial-of-service attack that keeps at least one CPU doing full-time
softirq processing.  This issue will be fixed by a later commit.

The current commit will need some work to make it appropriate for
mainline use, for example, it needs to be extended to cover Tiny RCU.

[ paulmck: Added a useful changelog ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20111005185938.GA20403@linux.vnet.ibm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  rcu: Frob softirq test
Peter Zijlstra [Fri, 12 Aug 2011 22:23:17 +0000 (00:23 +0200)]
rcu: Frob softirq test

With RT_FULL we get the below wreckage:

[  126.060484] =======================================================
[  126.060486] [ INFO: possible circular locking dependency detected ]
[  126.060489] 3.0.1-rt10+ #30
[  126.060490] -------------------------------------------------------
[  126.060492] irq/24-eth0/1235 is trying to acquire lock:
[  126.060495]  (&(lock)->wait_lock#2){+.+...}, at: [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55
[  126.060503]
[  126.060504] but task is already holding lock:
[  126.060506]  (&p->pi_lock){-...-.}, at: [<ffffffff81074fdc>] try_to_wake_up+0x35/0x429
[  126.060511]
[  126.060511] which lock already depends on the new lock.
[  126.060513]
[  126.060514]
[  126.060514] the existing dependency chain (in reverse order) is:
[  126.060516]
[  126.060516] -> #1 (&p->pi_lock){-...-.}:
[  126.060519]        [<ffffffff810afe9e>] lock_acquire+0x145/0x18a
[  126.060524]        [<ffffffff8150291e>] _raw_spin_lock_irqsave+0x4b/0x85
[  126.060527]        [<ffffffff810b5aa4>] task_blocks_on_rt_mutex+0x36/0x20f
[  126.060531]        [<ffffffff815019bb>] rt_mutex_slowlock+0xd1/0x15a
[  126.060534]        [<ffffffff81501ae3>] rt_mutex_lock+0x2d/0x2f
[  126.060537]        [<ffffffff810d9020>] rcu_boost+0xad/0xde
[  126.060541]        [<ffffffff810d90ce>] rcu_boost_kthread+0x7d/0x9b
[  126.060544]        [<ffffffff8109a760>] kthread+0x99/0xa1
[  126.060547]        [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10
[  126.060551]
[  126.060552] -> #0 (&(lock)->wait_lock#2){+.+...}:
[  126.060555]        [<ffffffff810af1b8>] __lock_acquire+0x1157/0x1816
[  126.060558]        [<ffffffff810afe9e>] lock_acquire+0x145/0x18a
[  126.060561]        [<ffffffff8150279e>] _raw_spin_lock+0x40/0x73
[  126.060564]        [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55
[  126.060566]        [<ffffffff81501ce7>] rt_mutex_unlock+0x27/0x29
[  126.060569]        [<ffffffff810d9f86>] rcu_read_unlock_special+0x17e/0x1c4
[  126.060573]        [<ffffffff810da014>] __rcu_read_unlock+0x48/0x89
[  126.060576]        [<ffffffff8106847a>] select_task_rq_rt+0xc7/0xd5
[  126.060580]        [<ffffffff8107511c>] try_to_wake_up+0x175/0x429
[  126.060583]        [<ffffffff81075425>] wake_up_process+0x15/0x17
[  126.060585]        [<ffffffff81080a51>] wakeup_softirqd+0x24/0x26
[  126.060590]        [<ffffffff81081df9>] irq_exit+0x49/0x55
[  126.060593]        [<ffffffff8150a3bd>] smp_apic_timer_interrupt+0x8a/0x98
[  126.060597]        [<ffffffff81509793>] apic_timer_interrupt+0x13/0x20
[  126.060600]        [<ffffffff810d5952>] irq_forced_thread_fn+0x1b/0x44
[  126.060603]        [<ffffffff810d582c>] irq_thread+0xde/0x1af
[  126.060606]        [<ffffffff8109a760>] kthread+0x99/0xa1
[  126.060608]        [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10
[  126.060611]
[  126.060612] other info that might help us debug this:
[  126.060614]
[  126.060615]  Possible unsafe locking scenario:
[  126.060616]
[  126.060617]        CPU0                    CPU1
[  126.060619]        ----                    ----
[  126.060620]   lock(&p->pi_lock);
[  126.060623]                                lock(&(lock)->wait_lock);
[  126.060625]                                lock(&p->pi_lock);
[  126.060627]   lock(&(lock)->wait_lock);
[  126.060629]
[  126.060629]  *** DEADLOCK ***
[  126.060630]
[  126.060632] 1 lock held by irq/24-eth0/1235:
[  126.060633]  #0:  (&p->pi_lock){-...-.}, at: [<ffffffff81074fdc>] try_to_wake_up+0x35/0x429
[  126.060638]
[  126.060638] stack backtrace:
[  126.060641] Pid: 1235, comm: irq/24-eth0 Not tainted 3.0.1-rt10+ #30
[  126.060643] Call Trace:
[  126.060644]  <IRQ>  [<ffffffff810acbde>] print_circular_bug+0x289/0x29a
[  126.060651]  [<ffffffff810af1b8>] __lock_acquire+0x1157/0x1816
[  126.060655]  [<ffffffff810ab3aa>] ? trace_hardirqs_off_caller+0x1f/0x99
[  126.060658]  [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55
[  126.060661]  [<ffffffff810afe9e>] lock_acquire+0x145/0x18a
[  126.060664]  [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55
[  126.060668]  [<ffffffff8150279e>] _raw_spin_lock+0x40/0x73
[  126.060671]  [<ffffffff81501c81>] ? rt_mutex_slowunlock+0x16/0x55
[  126.060674]  [<ffffffff810d9655>] ? rcu_report_qs_rsp+0x87/0x8c
[  126.060677]  [<ffffffff81501c81>] rt_mutex_slowunlock+0x16/0x55
[  126.060680]  [<ffffffff810d9ea3>] ? rcu_read_unlock_special+0x9b/0x1c4
[  126.060683]  [<ffffffff81501ce7>] rt_mutex_unlock+0x27/0x29
[  126.060687]  [<ffffffff810d9f86>] rcu_read_unlock_special+0x17e/0x1c4
[  126.060690]  [<ffffffff810da014>] __rcu_read_unlock+0x48/0x89
[  126.060693]  [<ffffffff8106847a>] select_task_rq_rt+0xc7/0xd5
[  126.060696]  [<ffffffff810683da>] ? select_task_rq_rt+0x27/0xd5
[  126.060701]  [<ffffffff810a852a>] ? clockevents_program_event+0x8e/0x90
[  126.060704]  [<ffffffff8107511c>] try_to_wake_up+0x175/0x429
[  126.060708]  [<ffffffff810a95dc>] ? tick_program_event+0x1f/0x21
[  126.060711]  [<ffffffff81075425>] wake_up_process+0x15/0x17
[  126.060715]  [<ffffffff81080a51>] wakeup_softirqd+0x24/0x26
[  126.060718]  [<ffffffff81081df9>] irq_exit+0x49/0x55
[  126.060721]  [<ffffffff8150a3bd>] smp_apic_timer_interrupt+0x8a/0x98
[  126.060724]  [<ffffffff81509793>] apic_timer_interrupt+0x13/0x20
[  126.060726]  <EOI>  [<ffffffff81072855>] ? migrate_disable+0x75/0x12d
[  126.060733]  [<ffffffff81080a61>] ? local_bh_disable+0xe/0x1f
[  126.060736]  [<ffffffff81080a70>] ? local_bh_disable+0x1d/0x1f
[  126.060739]  [<ffffffff810d5952>] irq_forced_thread_fn+0x1b/0x44
[  126.060742]  [<ffffffff81502ac0>] ? _raw_spin_unlock_irq+0x3b/0x59
[  126.060745]  [<ffffffff810d582c>] irq_thread+0xde/0x1af
[  126.060748]  [<ffffffff810d5937>] ? irq_thread_fn+0x3a/0x3a
[  126.060751]  [<ffffffff810d574e>] ? irq_finalize_oneshot+0xd1/0xd1
[  126.060754]  [<ffffffff810d574e>] ? irq_finalize_oneshot+0xd1/0xd1
[  126.060757]  [<ffffffff8109a760>] kthread+0x99/0xa1
[  126.060761]  [<ffffffff81509b14>] kernel_thread_helper+0x4/0x10
[  126.060764]  [<ffffffff81069ed7>] ? finish_task_switch+0x87/0x10a
[  126.060768]  [<ffffffff81502ec4>] ? retint_restore_args+0xe/0xe
[  126.060771]  [<ffffffff8109a6c7>] ? __init_kthread_worker+0x8c/0x8c
[  126.060774]  [<ffffffff81509b10>] ? gs_change+0xb/0xb

Because irq_exit() does:

void irq_exit(void)
{
	account_system_vtime(current);
	trace_hardirq_exit();
	sub_preempt_count(IRQ_EXIT_OFFSET);
	if (!in_interrupt() && local_softirq_pending())
		invoke_softirq();

	...
}

Which triggers a wakeup, which uses RCU, now if the interrupted task has
t->rcu_read_unlock_special set, the rcu usage from the wakeup will end
up in rcu_read_unlock_special(). rcu_read_unlock_special() will test
for in_irq(), which will fail as we just decremented preempt_count
with IRQ_EXIT_OFFSET, and in_serving_softirq(), which for
PREEMPT_RT_FULL reads:

int in_serving_softirq(void)
{
	int res;

	preempt_disable();
	res = __get_cpu_var(local_softirq_runner) == current;
	preempt_enable();
	return res;
}

Which will thus also fail, resulting in the above wreckage.

The 'somewhat' ugly solution is to open-code the preempt_count() test
in rcu_read_unlock_special().

Also, we're not at all sure how ->rcu_read_unlock_special gets set
here... so this is very likely a bandaid and more thought is required.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
12 years ago  RCU: Force PREEMPT_RCU for PREEMPT-RT
Ingo Molnar [Fri, 3 Jul 2009 13:30:30 +0000 (08:30 -0500)]
RCU: Force PREEMPT_RCU for PREEMPT-RT

PREEMPT_RT relies on PREEMPT_RCU - only allow RCU to be configured
interactively in the !PREEMPT_RT case.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-j1y0phicu6s6pu8guku2vca0@git.kernel.org
12 years ago  timer-handle-idle-trylock-in-get-next-timer-irq.patch
Thomas Gleixner [Sun, 17 Jul 2011 20:08:38 +0000 (22:08 +0200)]
timer-handle-idle-trylock-in-get-next-timer-irq.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  rwlocks: Fix section mismatch
John Kacur [Mon, 19 Sep 2011 09:09:27 +0000 (11:09 +0200)]
rwlocks: Fix section mismatch

This fixes the following build error for the preempt-rt kernel.

make kernel/fork.o
  CC      kernel/fork.o
kernel/fork.c:90: error: section of ‘tasklist_lock’ conflicts with previous declaration
make[2]: *** [kernel/fork.o] Error 1
make[1]: *** [kernel/fork.o] Error 2

The rt kernel cache aligns the RWLOCK in DEFINE_RWLOCK by default.
The non-rt kernels explicitly cache align only the tasklist_lock in
kernel/fork.c
That can create a build conflict. This fixes the build problem by making the
non-rt kernels cache align RWLOCKs by default. The side effect is that
the other RWLOCKs are also cache aligned for non-rt.
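
The non-rt DEFINE_RWLOCK then becomes, roughly (sketch):

  #define DEFINE_RWLOCK(x) \
  	rwlock_t x __cacheline_aligned_in_smp = __RW_LOCK_UNLOCKED(x)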

This is a short term solution for rt only.
The longer term solution would be to push the cache aligned DEFINE_RWLOCK
to mainline. If there are objections, then we could create a
DEFINE_RWLOCK_CACHE_ALIGNED or something of that nature.

Comments? Objections?

Signed-off-by: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/alpine.LFD.2.00.1109191104010.23118@localhost6.localdomain6
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  rt: Add the preempt-rt lock replacement APIs
Thomas Gleixner [Sun, 26 Jul 2009 17:39:56 +0000 (19:39 +0200)]
rt: Add the preempt-rt lock replacement APIs

Map spinlocks, rwlocks, rw_semaphores and semaphores to the rt_mutex
based locking functions for preempt-rt.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  rwsem-add-rt-variant.patch
Thomas Gleixner [Wed, 29 Jun 2011 19:02:53 +0000 (21:02 +0200)]
rwsem-add-rt-variant.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  rt-add-rt-to-mutex-headers.patch
Thomas Gleixner [Wed, 29 Jun 2011 18:56:22 +0000 (20:56 +0200)]
rt-add-rt-to-mutex-headers.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  rt-add-rt-spinlocks.patch
Thomas Gleixner [Wed, 29 Jun 2011 17:43:35 +0000 (19:43 +0200)]
rt-add-rt-spinlocks.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  rtmutex-avoid-include-hell.patch
Thomas Gleixner [Wed, 29 Jun 2011 18:06:39 +0000 (20:06 +0200)]
rtmutex-avoid-include-hell.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  spinlock-types-separate-raw.patch
Thomas Gleixner [Wed, 29 Jun 2011 17:34:01 +0000 (19:34 +0200)]
spinlock-types-separate-raw.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  rt-mutex-add-sleeping-spinlocks-support.patch
Thomas Gleixner [Fri, 10 Jun 2011 09:21:25 +0000 (11:21 +0200)]
rt-mutex-add-sleeping-spinlocks-support.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  futex: Fix bug on when a requeued RT task times out
Steven Rostedt [Sat, 24 Mar 2012 14:29:20 +0000 (09:29 -0500)]
futex: Fix bug on when a requeued RT task times out

Requeue with timeout causes a bug with PREEMPT_RT_FULL.

The bug comes from a timed out condition.

TASK 1				TASK 2
------				------
futex_wait_requeue_pi()
    futex_wait_queue_me()
    <timed out>

				double_lock_hb();

raw_spin_lock(pi_lock);
if (current->pi_blocked_on) {
} else {
    current->pi_blocked_on = PI_WAKE_INPROGRESS;
    raw_spin_unlock(pi_lock);
    spin_lock(hb->lock); <-- blocked!

				plist_for_each_entry_safe(this) {
				    rt_mutex_start_proxy_lock();
				        task_blocks_on_rt_mutex();
				        BUG_ON(task->pi_blocked_on)!!!!

The BUG_ON() actually has a check for PI_WAKE_INPROGRESS, but the
problem is that, after TASK 1 sets PI_WAKE_INPROGRESS, it then tries to
grab the hb->lock, which it fails to do. As the hb->lock is a mutex,
it will block and set the "pi_blocked_on" to the hb->lock.

When TASK 2 goes to requeue it, the check for PI_WAKE_INPROGRESS fails
because TASK 1's pi_blocked_on is no longer set to that, but instead,
set to the hb->lock.

The fix:

When calling rt_mutex_start_proxy_lock() a check is made to see
if the proxy tasks pi_blocked_on is set. If so, exit out early.
Otherwise set it to a new flag PI_REQUEUE_INPROGRESS, which notifies
the proxy task that it is being requeued, and will handle things
appropriately.
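
The early-exit in rt_mutex_start_proxy_lock() then looks roughly like
this (sketch):

  raw_spin_lock(&task->pi_lock);
  if (task->pi_blocked_on) {
  	/* the waiter is already blocked (e.g. on hb->lock after a
  	 * timeout): tell the caller to retry the requeue later */
  	raw_spin_unlock(&task->pi_lock);
  	raw_spin_unlock(&lock->wait_lock);
  	return -EAGAIN;
  }
  task->pi_blocked_on = PI_REQUEUE_INPROGRESS;
  raw_spin_unlock(&task->pi_lock);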

Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  rtmutex-futex-prepare-rt.patch
Thomas Gleixner [Fri, 10 Jun 2011 09:04:15 +0000 (11:04 +0200)]
rtmutex-futex-prepare-rt.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  rtmutex-lock-killable.patch
Thomas Gleixner [Thu, 9 Jun 2011 09:43:52 +0000 (11:43 +0200)]
rtmutex-lock-killable.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  md: raid5: Make raid5_percpu handling RT aware
Thomas Gleixner [Tue, 6 Apr 2010 14:51:31 +0000 (16:51 +0200)]
md: raid5: Make raid5_percpu handling RT aware

__raid_run_ops() disables preemption with get_cpu() around the access
to the raid5_percpu variables. That causes scheduling while atomic
spews on RT.

Serialize the access to the percpu data with a lock and keep the code
preemptible.

Reported-by: Udo van den Heuvel <udovdh@xs4all.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Udo van den Heuvel <udovdh@xs4all.nl>
12 years ago  local-vars-migrate-disable.patch
Thomas Gleixner [Tue, 28 Jun 2011 18:42:16 +0000 (20:42 +0200)]
local-vars-migrate-disable.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  genirq: Allow disabling of softirq processing in irq thread context
Thomas Gleixner [Tue, 31 Jan 2012 12:01:27 +0000 (13:01 +0100)]
genirq: Allow disabling of softirq processing in irq thread context

The processing of softirqs in irq thread context is a performance gain
for the non-rt workloads of a system, but it's counterproductive for
interrupts which are explicitly related to the realtime
workload. Allow such interrupts to prevent softirq processing in their
thread context.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
12 years ago  tasklet: Prevent tasklets from going into infinite spin in RT
Ingo Molnar [Wed, 30 Nov 2011 01:18:22 +0000 (20:18 -0500)]
tasklet: Prevent tasklets from going into infinite spin in RT

When CONFIG_PREEMPT_RT_FULL is enabled, tasklets run as threads,
and spinlocks turn into mutexes. But this can cause issues with
tasks disabling tasklets. A tasklet runs under ksoftirqd, and
if a tasklet is disabled with tasklet_disable(), the tasklet
count is increased. When a tasklet runs, it checks this counter
and if it is set, it adds itself back on the softirq queue and
returns.

The problem arises in RT because ksoftirq will see that a softirq
is ready to run (the tasklet softirq just re-armed itself), and will
not sleep, but instead run the softirqs again. The tasklet softirq
will still see that the count is non-zero and will not execute
the tasklet and requeue itself on the softirq again, which will
cause ksoftirqd to run it again and again and again.

It gets worse because ksoftirqd runs as a real-time thread.
If it preempted the task that disabled tasklets, and that task
has migration disabled, or can't run for other reasons, the tasklet
softirq will never run because the count will never be zero, and
ksoftirqd will go into an infinite loop. As an RT task, this
becomes a big problem.

This is a hack solution to have tasklet_disable stop tasklets, and
when a disabled tasklet runs, instead of requeueing the tasklet on the
softirq, it delays it. When tasklet_enable() is called, and tasklets are
waiting, then the tasklet_enable() will kick the tasklets to continue.
This prevents the lock up from ksoftirq going into an infinite loop.

[ rostedt@goodmis.org: ported to 3.0-rt ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  softirq-make-fifo.patch
Thomas Gleixner [Thu, 21 Jul 2011 19:06:43 +0000 (21:06 +0200)]
softirq-make-fifo.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  softirq-disable-softirq-stacks-for-rt.patch
Thomas Gleixner [Mon, 18 Jul 2011 11:59:17 +0000 (13:59 +0200)]
softirq-disable-softirq-stacks-for-rt.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  softirq: Fix unplug deadlock
Peter Zijlstra [Fri, 30 Sep 2011 13:52:14 +0000 (15:52 +0200)]
softirq: Fix unplug deadlock

If ksoftirqd gets woken during hot-unplug, __thread_do_softirq() will
call pin_current_cpu() which will block on the held cpu_hotplug.lock.
Moving the offline check in __thread_do_softirq() before the
pin_current_cpu() call doesn't work, since the wakeup can happen
before we mark the cpu offline.

So here we have the ksoftirq thread stuck until hotplug finishes, but
then the ksoftirq CPU_DOWN notifier issues kthread_stop() which will
wait for the ksoftirq thread to go away -- while holding the hotplug
lock.

Sort this by delaying the kthread_stop() until CPU_POST_DEAD, which is
outside of the cpu_hotplug.lock, but still serialized by the
cpu_add_remove_lock.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: rostedt <rostedt@goodmis.org>
Cc: Clark Williams <williams@redhat.com>
Link: http://lkml.kernel.org/r/1317391156.12973.3.camel@twins
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  hardirq.h: Define softirq_count() as 0UL to kill build warning
Yong Zhang [Thu, 13 Oct 2011 09:19:09 +0000 (17:19 +0800)]
hardirq.h: Define softirq_count() as 0UL to kill build warning

kernel/lockdep.c: In function ‘print_bad_irq_dependency’:
kernel/lockdep.c:1476:3: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 7 has type ‘unsigned int’
kernel/lockdep.c: In function ‘print_usage_bug’:
kernel/lockdep.c:2193:3: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 7 has type ‘unsigned int’

kernel/lockdep.i show this:
 printk("%s/%d [HC%u[%lu]:SC%u[%lu]:HE%u:SE%u] is trying to acquire:\n",
  curr->comm, task_pid_nr(curr),
  curr->hardirq_context, ((current_thread_info()->preempt_count) & (((1UL << (10))-1) << ((0 + 8) + 8))) >> ((0 + 8) + 8),
  curr->softirq_context, (0U) >> (0 + 8),
  curr->hardirqs_enabled,
  curr->softirqs_enabled);
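
The fix is simply to make the constant unsigned long (sketch):

  # define softirq_count()	(0UL)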

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Link: http://lkml.kernel.org/r/20111013091909.GA32739@zhy
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  softirq: Export in_serving_softirq()
John Kacur [Mon, 14 Nov 2011 01:44:43 +0000 (02:44 +0100)]
softirq: Export in_serving_softirq()

ERROR: "in_serving_softirq" [net/sched/cls_cgroup.ko] undefined!

The above can be fixed by exporting in_serving_softirq

Signed-off-by: John Kacur <jkacur@redhat.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/1321235083-21756-2-git-send-email-jkacur@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  softirq-local-lock.patch
Thomas Gleixner [Tue, 28 Jun 2011 13:57:18 +0000 (15:57 +0200)]
softirq-local-lock.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  mutex-no-spin-on-rt.patch
Thomas Gleixner [Sun, 17 Jul 2011 19:51:45 +0000 (21:51 +0200)]
mutex-no-spin-on-rt.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  lockdep-rt.patch
Thomas Gleixner [Sun, 17 Jul 2011 16:51:23 +0000 (18:51 +0200)]
lockdep-rt.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  softirq: Sanitize softirq pending for NOHZ/RT
Thomas Gleixner [Fri, 3 Jul 2009 18:16:38 +0000 (13:16 -0500)]
softirq: Sanitize softirq pending for NOHZ/RT

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years ago  net-netif_rx_ni-migrate-disable.patch
Thomas Gleixner [Sun, 17 Jul 2011 14:29:27 +0000 (16:29 +0200)]
net-netif_rx_ni-migrate-disable.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years agoring-buffer: Convert reader_lock from raw_spin_lock into spin_lock
Steven Rostedt [Tue, 27 Sep 2011 17:56:50 +0000 (13:56 -0400)]
ring-buffer: Convert reader_lock from raw_spin_lock into spin_lock

The reader_lock is mostly taken in normal context with interrupts enabled.
But because ftrace_dump() can happen anywhere, it is used as a spin lock
and in some cases a check of in_nmi() is performed to determine if the
ftrace_dump() was initiated from an NMI and, if so, the lock is not taken.

But having the lock as a raw_spin_lock() causes issues with the real-time
kernel, as the lock is held during allocation and freeing of the buffer.
As the memory locks convert into mutexes on RT, keeping the reader_lock as
a raw_spin_lock causes problems.

Converting the reader_lock is not straightforward, as we must still deal
with ftrace_dump() happening not only from an NMI but also from
true interrupt context in PREEMPT_RT.

Two wrapper functions are created to take and release the reader lock:

  int read_buffer_lock(cpu_buffer, unsigned long *flags)
  void read_buffer_unlock(cpu_buffer, unsigned long flags, int locked)

read_buffer_lock() disables interrupts, updates the flags and returns 1 if
it actually took the lock. The only time it returns 0 is when ftrace_dump()
happens in an unsafe context.

read_buffer_unlock() checks the locked argument and only unlocks the spin
lock if it was successfully taken.

Instead of handling this just in the specific places an NMI might call
into, all users of the reader_lock are converted to the wrapper
functions to make the code a bit simpler to read and less error prone.
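
A sketch of how the two wrappers could look (assumed shape following the
description above, not the exact kernel/trace/ring_buffer.c hunk; the
hard-irq-on-RT case is elided):

  static int read_buffer_lock(struct ring_buffer_per_cpu *cpu_buffer,
                              unsigned long *flags)
  {
          /* ftrace_dump() can arrive from NMI (or hard irq on RT); taking
           * the now-sleeping reader_lock there is not allowed, so report
           * "not locked" and only disable interrupts. */
          if (in_nmi()) {
                  local_irq_save(*flags);
                  return 0;
          }

          spin_lock_irqsave(&cpu_buffer->reader_lock, *flags);
          return 1;
  }

  static void read_buffer_unlock(struct ring_buffer_per_cpu *cpu_buffer,
                                 unsigned long flags, int locked)
  {
          if (locked)
                  spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
          else
                  local_irq_restore(flags);
  }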

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clark Williams <clark@redhat.com>
Link: http://lkml.kernel.org/r/1317146210.26514.33.camel@gandalf.stny.rr.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years agoftrace-crap.patch
Thomas Gleixner [Fri, 9 Sep 2011 14:55:53 +0000 (16:55 +0200)]
ftrace-crap.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years agosched-clear-pf-thread-bound-on-fallback-rq.patch
Thomas Gleixner [Fri, 4 Nov 2011 19:48:36 +0000 (20:48 +0100)]
sched-clear-pf-thread-bound-on-fallback-rq.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years agosched: Have migrate_disable ignore bounded threads
Peter Zijlstra [Tue, 27 Sep 2011 12:40:25 +0000 (08:40 -0400)]
sched: Have migrate_disable ignore bounded threads

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clark Williams <williams@redhat.com>
Link: http://lkml.kernel.org/r/20110927124423.567944215@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years agosched: Do not compare cpu masks in scheduler
Peter Zijlstra [Tue, 27 Sep 2011 12:40:24 +0000 (08:40 -0400)]
sched: Do not compare cpu masks in scheduler

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clark Williams <williams@redhat.com>
Link: http://lkml.kernel.org/r/20110927124423.128129033@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years agosched: Postpone actual migration disable to schedule
Steven Rostedt [Tue, 27 Sep 2011 12:40:23 +0000 (08:40 -0400)]
sched: Postpone actual migration disable to schedule

migrate_disable() adds a bit of overhead to the RT kernel, as changing
the affinity is expensive to do at every lock encountered. Since a running
task cannot migrate anyway, the actual disabling of migration does not
need to occur until the task is about to schedule out.

In most cases, a task that disables migration will enable it again before
it schedules, so this change improves performance tremendously.

[ Frank Rowand: UP compile fix ]
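
A sketch of the lazy scheme (field and helper names are assumptions; the
real fast paths carry a few more checks):

  void migrate_disable(void)
  {
          current->migrate_disable++;     /* cheap fast path, no rq lock */
  }

  void migrate_enable(void)
  {
          WARN_ON_ONCE(!current->migrate_disable);
          current->migrate_disable--;
          /* restoring a previously narrowed mask is elided in this sketch */
  }

  /* Called from schedule() with rq->lock held: only a task that is about
   * to leave the CPU while still migrate-disabled gets pinned. */
  static void update_migrate_disable(struct task_struct *p)
  {
          if (p->migrate_disable)
                  cpumask_copy(&p->cpus_allowed, cpumask_of(task_cpu(p)));
  }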

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clark Williams <williams@redhat.com>
Link: http://lkml.kernel.org/r/20110927124422.779693167@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years agosched: teach migrate_disable about atomic contexts
Peter Zijlstra [Fri, 2 Sep 2011 12:29:27 +0000 (14:29 +0200)]
sched: teach migrate_disable about atomic contexts

 <NMI>  [<ffffffff812dafd8>] spin_bug+0x94/0xa8
 [<ffffffff812db07f>] do_raw_spin_lock+0x43/0xea
 [<ffffffff814fa9be>] _raw_spin_lock_irqsave+0x6b/0x85
 [<ffffffff8106ff9e>] ? migrate_disable+0x75/0x12d
 [<ffffffff81078aaf>] ? pin_current_cpu+0x36/0xb0
 [<ffffffff8106ff9e>] migrate_disable+0x75/0x12d
 [<ffffffff81115b9d>] pagefault_disable+0xe/0x1f
 [<ffffffff81047027>] copy_from_user_nmi+0x74/0xe6
 [<ffffffff810489d7>] perf_callchain_user+0xf3/0x135

Now clearly we can't go around taking locks from NMI context; cure
this by short-circuiting migrate_disable() when we're already in an
atomic context.

Add some extra debugging to avoid things like:

  preempt_disable()
  migrate_disable();

  preempt_enable();
  migrate_enable();
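
A sketch of the short-circuit and the debug counter (the
migrate_disable_atomic field name and the exact atomicity check are
assumptions):

  void migrate_disable(void)
  {
          struct task_struct *p = current;

          if (in_atomic()) {
  #ifdef CONFIG_SCHED_DEBUG
                  p->migrate_disable_atomic++;
  #endif
                  return;         /* no locks from NMI/atomic context */
          }
          /* ... normal migrate_disable() path ... */
  }

  void migrate_enable(void)
  {
          struct task_struct *p = current;

          if (in_atomic()) {
  #ifdef CONFIG_SCHED_DEBUG
                  p->migrate_disable_atomic--;
  #endif
                  return;
          }
  #ifdef CONFIG_SCHED_DEBUG
          /* Catch preempt/migrate nesting undone in the wrong order, as in
           * the example above. */
          WARN_ON_ONCE(p->migrate_disable_atomic);
  #endif
          /* ... normal migrate_enable() path ... */
  }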

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1314967297.1301.14.camel@twins
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/n/tip-wbot4vsmwhi8vmbf83hsclk6@git.kernel.org
12 years agosched, rt: Fix migrate_enable() thinko
Mike Galbraith [Tue, 23 Aug 2011 14:12:43 +0000 (16:12 +0200)]
sched, rt: Fix migrate_enable() thinko

Assigning mask = tsk_cpus_allowed(p) after p->migrate_disable = 0 ensures
that we won't see a mask change; no push/pull, we stack tasks on one CPU.

Also add a couple fields to sched_debug for the next guy.

[ Build fix from Stratos Psomadakis <psomas@gentoo.org> ]

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1314108763.6689.4.camel@marge.simson.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
12 years agosched: Generic migrate_disable
Peter Zijlstra [Thu, 11 Aug 2011 13:14:58 +0000 (15:14 +0200)]
sched: Generic migrate_disable

Make migrate_disable() a preempt_disable() for !rt kernels. This
allows generic code to use it but still enforces that these code
sections stay relatively small.

A preemptible migrate_disable() accessible for general use would allow
people to grow arbitrary per-cpu crap instead of cleaning these things
up.
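
A sketch of the !rt mapping (header location and exact #ifdef structure are
assumptions):

  #ifdef CONFIG_PREEMPT_RT_FULL
  extern void migrate_disable(void);
  extern void migrate_enable(void);
  #else
  /* On !rt kernels migrate_disable() degenerates to preempt_disable(),
   * which also keeps such sections small by construction. */
  # define migrate_disable()      preempt_disable()
  # define migrate_enable()       preempt_enable()
  #endif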

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-275i87sl8e1jcamtchmehonm@git.kernel.org
12 years agosched: Optimize migrate_disable
Peter Zijlstra [Thu, 11 Aug 2011 13:03:35 +0000 (15:03 +0200)]
sched: Optimize migrate_disable

Change from task_rq_lock() to raw_spin_lock(&rq->lock) to avoid a few
atomic ops. See comment on why it should be safe.
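
Roughly, the locking change looks like this (sketch only; the helper name
is hypothetical and the safety argument is paraphrased from the commit):

  static void migrate_disable_update_cpus_allowed(struct task_struct *p)
  {
          struct rq *rq;
          unsigned long flags;

          /* p == current here and its CPU cannot change under us, so
           * taking rq->lock directly avoids the extra atomics of
           * task_rq_lock(). */
          rq = task_rq(p);
          raw_spin_lock_irqsave(&rq->lock, flags);
          /* ... narrow or restore p->cpus_allowed ... */
          raw_spin_unlock_irqrestore(&rq->lock, flags);
  }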

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-cbz6hkl5r5mvwtx5s3tor2y6@git.kernel.org
12 years agomigrate-disable-rt-variant.patch
Thomas Gleixner [Sun, 17 Jul 2011 17:48:20 +0000 (19:48 +0200)]
migrate-disable-rt-variant.patch

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>