Discussion:
Interrupts disabled for too long in printk
(too old to reply)
Mathieu Desnoyers
2006-06-03 11:30:15 UTC
Permalink
Hi,

I ran some experiments with my kernel tracer (LTTng : http://ltt.polymtl.ca)
that showed missing interrupts. I wrote a small paper to show how to use my
tracer to solve this kind of problem which I presented at the CE Linux Form
last April.

http://tree.celinuxforum.org/CelfPubWiki/ELC2006Presentations?action=AttachFile&do=get&target=celf2006-desnoyers.pdf

It shows that, when the serial console is activated, the following code disables
interrupts for up to 15ms. On a system configured with a 250HZ timer (each 4ms),
it means that 3 scheduler ticks are lost.

In the current git :

kernel/printk.c: release_console_sem()

for ( ; ; ) {
-----> spin_lock_irqsave(&logbuf_lock, flags);
wake_klogd |= log_start - log_end;
if (con_start == log_end)
break; /* Nothing to print */
_con_start = con_start;
_log_end = log_end;
con_start = log_end; /* Flush */
spin_unlock(&logbuf_lock);
call_console_drivers(_con_start, _log_end);
-----> local_irq_restore(flags);
}

I guess interrupts are disabled for a good reason (to protect this spinlock for
being taken by a nested interrupt handler. One way I am thinking to fix this
problem would be to do a spin try lock and fail if it is already taken.

Mathieu


OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Steven Rostedt
2006-06-03 21:50:06 UTC
Permalink
Post by Mathieu Desnoyers
Hi,
I ran some experiments with my kernel tracer (LTTng : http://ltt.polymtl.ca)
that showed missing interrupts. I wrote a small paper to show how to use my
tracer to solve this kind of problem which I presented at the CE Linux Form
last April.
http://tree.celinuxforum.org/CelfPubWiki/ELC2006Presentations?action=AttachFile&do=get&target=celf2006-desnoyers.pdf
It shows that, when the serial console is activated, the following code disables
interrupts for up to 15ms. On a system configured with a 250HZ timer (each 4ms),
it means that 3 scheduler ticks are lost.
kernel/printk.c: release_console_sem()
for ( ; ; ) {
-----> spin_lock_irqsave(&logbuf_lock, flags);
wake_klogd |= log_start - log_end;
if (con_start == log_end)
break; /* Nothing to print */
_con_start = con_start;
_log_end = log_end;
con_start = log_end; /* Flush */
spin_unlock(&logbuf_lock);
call_console_drivers(_con_start, _log_end);
-----> local_irq_restore(flags);
}
I guess interrupts are disabled for a good reason (to protect this spinlock for
being taken by a nested interrupt handler. One way I am thinking to fix this
problem would be to do a spin try lock and fail if it is already taken.
So what's the problem?

printk is more for debugging. If you don't like the latency then disable
printks. But turning the spin_lock_irqsave into a spin_lock means you
need to do a trylock every time in printk. Since printk can be called
from interrupt handlers. So what do you do when you fail? just return?
So you just lost your printk that you needed, which could be of
importance.

Actually, the spin_lock is not your problem, since it is not held when
the console drivers are being called. But...

There may be console drivers that grab spin_locks without turning off
interrupts, which mean that you can again deadlock if an interrupt that
calls printk happens in one of those drivers.

If latency is your worry, then try out Ingo Molnar's -rt patch
http://people.redhat.com/mingo/realtime-preempt/
It isn't affected by this problem.

-- Steve

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Mathieu Desnoyers
2006-06-08 01:10:08 UTC
Permalink
Hi Steven,
Post by Steven Rostedt
So what's the problem?
printk is more for debugging. If you don't like the latency then disable
printks.
I have not seen many systems where printk is totally disabled : it is useful to
have a trace of the kernel messages somewhere.
Post by Steven Rostedt
But turning the spin_lock_irqsave into a spin_lock means you
need to do a trylock every time in printk. Since printk can be called
from interrupt handlers. So what do you do when you fail? just return?
Yes the idea is to return, leaving the data in the buffers and not transferring
it to the console driver. As there is already another execution thread in the
relase_console_sem loop, it will take care of the data in its next pass in the
loop.
Post by Steven Rostedt
So you just lost your printk that you needed, which could be of
importance.
No, it is not lost.
Post by Steven Rostedt
Actually, the spin_lock is not your problem, since it is not held when
the console drivers are being called. But...
irq disabling is the problem. And if this function stays with a standard
spin_lock (not a try lock), disabling interrupts is required.
Post by Steven Rostedt
There may be console drivers that grab spin_locks without turning off
interrupts, which mean that you can again deadlock if an interrupt that
calls printk happens in one of those drivers.
Wait.. the release_console_sem calls the console drivers with interrupts
already disabled. I do not understand your last statement.
Post by Steven Rostedt
If latency is your worry, then try out Ingo Molnar's -rt patch
http://people.redhat.com/mingo/realtime-preempt/
It isn't affected by this problem.
My main concern is more than just latency : missing 3 timer interrupts in a row
has an impact on the kernel time keeping. On systems where NTP is not available,
it can be important. Linux, if I remember well, deals with cases where one timer
interrupt is missed, but not 2. So, if we get one printk message on the console
each minute on a system at 100HZ, we will suffer of a 14.40s drift every day.


Mathieu
Post by Steven Rostedt
-- Steve
OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Jon Smirl
2006-06-08 01:40:08 UTC
Permalink
Post by Mathieu Desnoyers
Hi,
I ran some experiments with my kernel tracer (LTTng : http://ltt.polymtl.ca)
that showed missing interrupts. I wrote a small paper to show how to use my
tracer to solve this kind of problem which I presented at the CE Linux Form
last April.
http://tree.celinuxforum.org/CelfPubWiki/ELC2006Presentations?action=AttachFile&do=get&target=celf2006-desnoyers.pdf
It shows that, when the serial console is activated, the following code disables
interrupts for up to 15ms. On a system configured with a 250HZ timer (each 4ms),
it means that 3 scheduler ticks are lost.
kernel/printk.c: release_console_sem()
for ( ; ; ) {
-----> spin_lock_irqsave(&logbuf_lock, flags);
wake_klogd |= log_start - log_end;
if (con_start == log_end)
break; /* Nothing to print */
_con_start = con_start;
_log_end = log_end;
con_start = log_end; /* Flush */
spin_unlock(&logbuf_lock);
call_console_drivers(_con_start, _log_end);
-----> local_irq_restore(flags);
}
You can look at this problem from the other direction too. Why is it
taking 15ms to get between the two points? If IRQs are off how is the
serial driver getting interrupts to be able to display the message? It
is probably worthwhile to take a look and see what the serial console
driver is doing.
Post by Mathieu Desnoyers
I guess interrupts are disabled for a good reason (to protect this spinlock for
being taken by a nested interrupt handler. One way I am thinking to fix this
problem would be to do a spin try lock and fail if it is already taken.
--
Jon Smirl
***@gmail.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Jon Smirl
2006-06-08 02:40:06 UTC
Permalink
Post by Jon Smirl
You can look at this problem from the other direction too. Why is it
taking 15ms to get between the two points? If IRQs are off how is the
serial driver getting interrupts to be able to display the message? It
is probably worthwhile to take a look and see what the serial console
driver is doing.
Hi John,
The serial port is configured at 38000 bauds. It can therefore transmit 4800
bytes per seconds, for 72 characters in 15 ms. So the console driver would be
simply busy sending characters to the serial port during that interrupt
disabling period.
Why can't the serial console driver record the message in a buffer at
the point where it is being called with interrupts off, and then let
the serial port slowly print it via interupts? It sounds to me like
the serial port is being driven in polling mode.
Mathieu
OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
Jon Smirl
***@gmail.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
linux-os (Dick Johnson)
2006-06-08 10:30:16 UTC
Permalink
Post by Jon Smirl
Post by Jon Smirl
You can look at this problem from the other direction too. Why is it
taking 15ms to get between the two points? If IRQs are off how is the
serial driver getting interrupts to be able to display the message? It
is probably worthwhile to take a look and see what the serial console
driver is doing.
Hi John,
The serial port is configured at 38000 bauds. It can therefore transmit 4800
bytes per seconds, for 72 characters in 15 ms. So the console driver would be
simply busy sending characters to the serial port during that interrupt
disabling period.
Why can't the serial console driver record the message in a buffer at
the point where it is being called with interrupts off, and then let
the serial port slowly print it via interupts? It sounds to me like
the serial port is being driven in polling mode.
Mathieu
OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
Probably because there is no buffer that is big enough! If the character
source (a call to printk()) can generate N characters per second, but
the UART can only send out N-1, then the buffer will fill up at which
time software will wait until the UART is ready for the next character,
effectively becoming poll-mode with a buffer full of characters as
backlog.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.88 BogoMips).
New book: http://www.AbominableFirebug.com/
_


****************************************************************
The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to ***@analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Jon Smirl
2006-06-08 14:40:11 UTC
Permalink
Post by linux-os (Dick Johnson)
Post by Jon Smirl
Post by Jon Smirl
You can look at this problem from the other direction too. Why is it
taking 15ms to get between the two points? If IRQs are off how is the
serial driver getting interrupts to be able to display the message? It
is probably worthwhile to take a look and see what the serial console
driver is doing.
Hi John,
The serial port is configured at 38000 bauds. It can therefore transmit 4800
bytes per seconds, for 72 characters in 15 ms. So the console driver would be
simply busy sending characters to the serial port during that interrupt
disabling period.
Why can't the serial console driver record the message in a buffer at
the point where it is being called with interrupts off, and then let
the serial port slowly print it via interupts? It sounds to me like
the serial port is being driven in polling mode.
Mathieu
OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
Probably because there is no buffer that is big enough! If the character
source (a call to printk()) can generate N characters per second, but
the UART can only send out N-1, then the buffer will fill up at which
time software will wait until the UART is ready for the next character,
effectively becoming poll-mode with a buffer full of characters as
backlog.
That sounds like a reasonable explanation if that is what is actually
happening. Does the serial console driver revert to a polling
behavior when interrupts are off?

What should the console do in this situation? If called to printk with
interrupts off, and the backlog buffer is full, should it suspend the
system like it is doing, or should it toss the printk and return an
error?
--
Jon Smirl
***@gmail.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
linux-os (Dick Johnson)
2006-06-08 15:20:08 UTC
Permalink
Post by Jon Smirl
Post by linux-os (Dick Johnson)
Post by Jon Smirl
Post by Jon Smirl
You can look at this problem from the other direction too. Why is it
taking 15ms to get between the two points? If IRQs are off how is the
serial driver getting interrupts to be able to display the message? It
is probably worthwhile to take a look and see what the serial console
driver is doing.
Hi John,
The serial port is configured at 38000 bauds. It can therefore transmit 4800
bytes per seconds, for 72 characters in 15 ms. So the console driver would be
simply busy sending characters to the serial port during that interrupt
disabling period.
Why can't the serial console driver record the message in a buffer at
the point where it is being called with interrupts off, and then let
the serial port slowly print it via interupts? It sounds to me like
the serial port is being driven in polling mode.
Mathieu
OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
Probably because there is no buffer that is big enough! If the character
source (a call to printk()) can generate N characters per second, but
the UART can only send out N-1, then the buffer will fill up at which
time software will wait until the UART is ready for the next character,
effectively becoming poll-mode with a buffer full of characters as
backlog.
That sounds like a reasonable explanation if that is what is actually
happening. Does the serial console driver revert to a polling
behavior when interrupts are off?
The last serial I/O that I looked at while attempting to fix a problem
with a ppp link, will normally sleep until there is room in the output
buffer. This makes everything work smoothly when doing normal I/O
using RS-232C. However, the serial console can't sleep when being
fed from interrupt context, so there are likely some compromises.

Right now, I have to take a work-break so I can't look at it, but
I would suggest that it is illegal, immoral, and fattening to do
printk() from interrupt context anyway, so you will not find anybody
who will "fix" the problem. Printk() is already buffered, but there
is only so much one can do when you can generate characters faster
than you can possibly dispose of them, especially from within an
interrupt or otherwise when the interrupts are off.
Post by Jon Smirl
What should the console do in this situation? If called to printk with
interrupts off, and the backlog buffer is full, should it suspend the
system like it is doing, or should it toss the printk and return an
error?
--
Jon Smirl
Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.88 BogoMips).
New book: http://www.AbominableFirebug.com/
_


****************************************************************
The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to ***@analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Jon Smirl
2006-06-08 15:50:12 UTC
Permalink
Post by linux-os (Dick Johnson)
Post by Jon Smirl
Post by linux-os (Dick Johnson)
Post by Jon Smirl
Post by Jon Smirl
You can look at this problem from the other direction too. Why is it
taking 15ms to get between the two points? If IRQs are off how is the
serial driver getting interrupts to be able to display the message? It
is probably worthwhile to take a look and see what the serial console
driver is doing.
Hi John,
The serial port is configured at 38000 bauds. It can therefore transmit 4800
bytes per seconds, for 72 characters in 15 ms. So the console driver would be
simply busy sending characters to the serial port during that interrupt
disabling period.
Why can't the serial console driver record the message in a buffer at
the point where it is being called with interrupts off, and then let
the serial port slowly print it via interupts? It sounds to me like
the serial port is being driven in polling mode.
Mathieu
OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
Probably because there is no buffer that is big enough! If the character
source (a call to printk()) can generate N characters per second, but
the UART can only send out N-1, then the buffer will fill up at which
time software will wait until the UART is ready for the next character,
effectively becoming poll-mode with a buffer full of characters as
backlog.
That sounds like a reasonable explanation if that is what is actually
happening. Does the serial console driver revert to a polling
behavior when interrupts are off?
The last serial I/O that I looked at while attempting to fix a problem
with a ppp link, will normally sleep until there is room in the output
buffer. This makes everything work smoothly when doing normal I/O
using RS-232C. However, the serial console can't sleep when being
fed from interrupt context, so there are likely some compromises.
Right now, I have to take a work-break so I can't look at it, but
I would suggest that it is illegal, immoral, and fattening to do
printk() from interrupt context anyway, so you will not find anybody
who will "fix" the problem. Printk() is already buffered, but there
is only so much one can do when you can generate characters faster
than you can possibly dispose of them, especially from within an
interrupt or otherwise when the interrupts are off.
Printk from interrupt context works fine on VGAcon/fbdev because they
can quickly print (us not ms) and don't need interrupts to work. You
can't remove the feature because everyone will then complain that they
can't debug their interrupt handlers. Removing this capability was
brought up at OLS last year and it got a big no vote.

This just seems to be an issue with the serial console implementation
which is much slower. So the answer looks to be that if the serial
console buffer is full, and it is being called with interrupts off, it
should just toss the printk. If someone really needs all the output
they would recompile the serial console with a bigger buffer. If you
get fancy the serial console could remember it tossed things and when
the buffer empties it could print out a message saying things were
lost. Alternativately you could just leave things like they are and
assume that printk's from interrupts are for debugging only and it
doesn't matter if they cause interrupts to be lost.
Post by linux-os (Dick Johnson)
Post by Jon Smirl
What should the console do in this situation? If called to printk with
interrupts off, and the backlog buffer is full, should it suspend the
system like it is doing, or should it toss the printk and return an
error?
--
Jon Smirl
Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.88 BogoMips).
New book: http://www.AbominableFirebug.com/
_

****************************************************************
Thank you.
--
Jon Smirl
***@gmail.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Miquel van Smoorenburg
2006-06-08 18:40:15 UTC
Permalink
Post by Jon Smirl
This just seems to be an issue with the serial console implementation
which is much slower. So the answer looks to be that if the serial
console buffer is full, and it is being called with interrupts off, it
should just toss the printk.
Read the serial console code. It is a standalone implementation
completely seperate from the standard drivers, which deliberately
turns off the interrupts and reverts to polling. This because
there is no guarantee the whole irq handling stuff still works
at the moment you're printk'ing a panic.

Also the current standard serial drivers only work if the
tty has been opened by a userspace process.

If you want to change this, first fix the latter problem, then
change the serial console output driver so that it uses the
standard serial driver for lower priority messages and only
uses the polling code for panics.

Mike.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Mathieu Desnoyers
2006-06-08 02:40:07 UTC
Permalink
Post by Jon Smirl
You can look at this problem from the other direction too. Why is it
taking 15ms to get between the two points? If IRQs are off how is the
serial driver getting interrupts to be able to display the message? It
is probably worthwhile to take a look and see what the serial console
driver is doing.
Hi John,

The serial port is configured at 38000 bauds. It can therefore transmit 4800
bytes per seconds, for 72 characters in 15 ms. So the console driver would be
simply busy sending characters to the serial port during that interrupt
disabling period.

Mathieu

OpenPGP public key: http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Loading...