Discussion:
[PATCH] bcm63xx_enet: fix poll callback.
(too old to reply)
Nicolas Schichan
2015-03-03 11:50:02 UTC
Permalink
In case there was some tx buffer reclaimed and not enough rx packets
to consume the whole budget, napi_complete would not be called and
interrupts would be kept disabled, effectively resulting in the
network core never to call the poll callback again and no rx/tx
interrupts to be fired either.

Fix that by only accounting the rx work done in the poll callback.

Signed-off-by: Nicolas Schichan <***@freebox.fr>
---
drivers/net/ethernet/broadcom/bcm63xx_enet.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcm63xx_enet.c b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
index 21206d3..a7f2cc3 100644
--- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c
+++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
@@ -486,7 +486,7 @@ static int bcm_enet_poll(struct napi_struct *napi, int budget)
{
struct bcm_enet_priv *priv;
struct net_device *dev;
- int tx_work_done, rx_work_done;
+ int rx_work_done;

priv = container_of(napi, struct bcm_enet_priv, napi);
dev = priv->net_dev;
@@ -498,14 +498,14 @@ static int bcm_enet_poll(struct napi_struct *napi, int budget)
ENETDMAC_IR, priv->tx_chan);

/* reclaim sent skb */
- tx_work_done = bcm_enet_tx_reclaim(dev, 0);
+ bcm_enet_tx_reclaim(dev, 0);

spin_lock(&priv->rx_lock);
rx_work_done = bcm_enet_receive_queue(dev, budget);
spin_unlock(&priv->rx_lock);

- if (rx_work_done >= budget || tx_work_done > 0) {
- /* rx/tx queue is not yet empty/clean */
+ if (rx_work_done >= budget) {
+ /* rx queue is not yet empty/clean */
return rx_work_done;
}
--
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Eric Dumazet
2015-03-03 13:40:02 UTC
Permalink
Post by Nicolas Schichan
In case there was some tx buffer reclaimed and not enough rx packets
to consume the whole budget, napi_complete would not be called and
interrupts would be kept disabled, effectively resulting in the
network core never to call the poll callback again and no rx/tx
interrupts to be fired either.
Fix that by only accounting the rx work done in the poll callback.
---
This looks better, thanks.

Acked-by: Eric Dumazet <***@google.com>

Note that the way bcm_enet_tx_reclaim() is written, it can livelock on
SMP hosts :

CPU 1,2,3,... keep queuing packets via ndo_start_xmit()

CPU 0 is looping forever in bcm_enet_tx_reclaim() draining queue and
freeing skbs.

To avoid that, I would take priv->tx_lock only once, or add a limit on
the number of skbs that can be drained per round.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Eric Dumazet
2015-03-03 13:50:01 UTC
Permalink
Post by Eric Dumazet
Post by Nicolas Schichan
In case there was some tx buffer reclaimed and not enough rx packets
to consume the whole budget, napi_complete would not be called and
interrupts would be kept disabled, effectively resulting in the
network core never to call the poll callback again and no rx/tx
interrupts to be fired either.
Fix that by only accounting the rx work done in the poll callback.
---
This looks better, thanks.
Note that the way bcm_enet_tx_reclaim() is written, it can livelock on
CPU 1,2,3,... keep queuing packets via ndo_start_xmit()
CPU 0 is looping forever in bcm_enet_tx_reclaim() draining queue and
freeing skbs.
To avoid that, I would take priv->tx_lock only once, or add a limit on
the number of skbs that can be drained per round.
Something like this (untested) patch

diff --git a/drivers/net/ethernet/broadcom/bcm63xx_enet.c b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
index 21206d33b638..9e8e83865e52 100644
--- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c
+++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
@@ -429,29 +429,23 @@ static int bcm_enet_receive_queue(struct net_device *dev, int budget)
*/
static int bcm_enet_tx_reclaim(struct net_device *dev, int force)
{
- struct bcm_enet_priv *priv;
- int released;
+ struct bcm_enet_priv *priv = netdev_priv(dev);
+ int released = 0;

- priv = netdev_priv(dev);
- released = 0;
+ spin_lock(&priv->tx_lock);

while (priv->tx_desc_count < priv->tx_ring_size) {
struct bcm_enet_desc *desc;
struct sk_buff *skb;

- /* We run in a bh and fight against start_xmit, which
- * is called with bh disabled */
- spin_lock(&priv->tx_lock);
-
desc = &priv->tx_desc_cpu[priv->tx_dirty_desc];

- if (!force && (desc->len_stat & DMADESC_OWNER_MASK)) {
- spin_unlock(&priv->tx_lock);
+ if (!force && (desc->len_stat & DMADESC_OWNER_MASK))
break;
- }

/* ensure other field of the descriptor were not read
- * before we checked ownership */
+ * before we checked ownership
+ */
rmb();

skb = priv->tx_skb[priv->tx_dirty_desc];
@@ -464,8 +458,6 @@ static int bcm_enet_tx_reclaim(struct net_device *dev, int force)
priv->tx_dirty_desc = 0;
priv->tx_desc_count++;

- spin_unlock(&priv->tx_lock);
-
if (desc->len_stat & DMADESC_UNDER_MASK)
dev->stats.tx_errors++;

@@ -476,6 +468,8 @@ static int bcm_enet_tx_reclaim(struct net_device *dev, int force)
if (netif_queue_stopped(dev) && released)
netif_wake_queue(dev);

+ spin_unlock(&priv->tx_lock);
+
return released;
}



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Eric Dumazet
2015-03-03 14:20:02 UTC
Permalink
Post by Eric Dumazet
Post by Eric Dumazet
To avoid that, I would take priv->tx_lock only once, or add a limit on
the number of skbs that can be drained per round.
Something like this (untested) patch
I'm not against testing this patch, but we do not have any SMP capable bcm63xx
board here so I don't think it will be of any use.
bcm6358 and bcm6368 do indeed have two MIPS threads, but SMP is not possible
(due to a data cache or TLB shared across all MIPS threads , unbearably
complicating things, IIRC).
bcm63xx ARM SoCs look like they can support SMP though.
Regards,
I am reasonably confident the patch is OK, but I cannot easily compile
it on my laptop (x86_64) :

CC drivers/net/ethernet/broadcom/bcm63xx_enet.o
drivers/net/ethernet/broadcom/bcm63xx_enet.c:34:30: fatal error:
bcm63xx_dev_enet.h: No such file or directory
#include <bcm63xx_dev_enet.h>
^
compilation terminated.
make[1]: *** [drivers/net/ethernet/broadcom/bcm63xx_enet.o] Error 1
make: *** [drivers/net/ethernet/broadcom/bcm63xx_enet.o] Error 2



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Nicolas Schichan
2015-03-03 14:50:01 UTC
Permalink
Post by Eric Dumazet
Post by Eric Dumazet
Post by Eric Dumazet
To avoid that, I would take priv->tx_lock only once, or add a limit on
the number of skbs that can be drained per round.
Something like this (untested) patch
I'm not against testing this patch, but we do not have any SMP capable bcm63xx
board here so I don't think it will be of any use.
bcm6358 and bcm6368 do indeed have two MIPS threads, but SMP is not possible
(due to a data cache or TLB shared across all MIPS threads , unbearably
complicating things, IIRC).
bcm63xx ARM SoCs look like they can support SMP though.
Regards,
I am reasonably confident the patch is OK, but I cannot easily compile
Okay, I gave your patch a go and it works correctly on our non-SMP capable
boards (and with DEBUG_SPINLOCK=y).

Feel free to add:

Tested-by: Nicolas Schichan <***@freebox.fr>

Regards,
--
Nicolas Schichan
Freebox SAS
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Alexander Duyck
2015-03-03 16:10:05 UTC
Permalink
Post by Eric Dumazet
Post by Eric Dumazet
Post by Nicolas Schichan
In case there was some tx buffer reclaimed and not enough rx packets
to consume the whole budget, napi_complete would not be called and
interrupts would be kept disabled, effectively resulting in the
network core never to call the poll callback again and no rx/tx
interrupts to be fired either.
Fix that by only accounting the rx work done in the poll callback.
---
This looks better, thanks.
Note that the way bcm_enet_tx_reclaim() is written, it can livelock on
CPU 1,2,3,... keep queuing packets via ndo_start_xmit()
CPU 0 is looping forever in bcm_enet_tx_reclaim() draining queue and
freeing skbs.
To avoid that, I would take priv->tx_lock only once, or add a limit on
the number of skbs that can be drained per round.
Something like this (untested) patch
diff --git a/drivers/net/ethernet/broadcom/bcm63xx_enet.c b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
index 21206d33b638..9e8e83865e52 100644
--- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c
+++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
@@ -429,29 +429,23 @@ static int bcm_enet_receive_queue(struct net_device *dev, int budget)
*/
static int bcm_enet_tx_reclaim(struct net_device *dev, int force)
{
- struct bcm_enet_priv *priv;
- int released;
+ struct bcm_enet_priv *priv = netdev_priv(dev);
+ int released = 0;
- priv = netdev_priv(dev);
- released = 0;
+ spin_lock(&priv->tx_lock);
while (priv->tx_desc_count < priv->tx_ring_size) {
struct bcm_enet_desc *desc;
struct sk_buff *skb;
- /* We run in a bh and fight against start_xmit, which
- * is called with bh disabled */
- spin_lock(&priv->tx_lock);
-
desc = &priv->tx_desc_cpu[priv->tx_dirty_desc];
- if (!force && (desc->len_stat & DMADESC_OWNER_MASK)) {
- spin_unlock(&priv->tx_lock);
+ if (!force && (desc->len_stat & DMADESC_OWNER_MASK))
break;
- }
/* ensure other field of the descriptor were not read
- * before we checked ownership */
+ * before we checked ownership
+ */
rmb();
This rmb() can probably be replaced with a dma_rmb() since it is just a
coherent/coherent ordering you are concerned with based on the comment.

- Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Eric Dumazet
2015-03-03 16:40:02 UTC
Permalink
Post by Alexander Duyck
This rmb() can probably be replaced with a dma_rmb() since it is just a
coherent/coherent ordering you are concerned with based on the comment.
Right, but this patch would be a stable candidate, where dma_rmb() does
not exist yet.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Jonas Gorski
2015-03-03 23:10:01 UTC
Permalink
Post by Eric Dumazet
Post by Eric Dumazet
To avoid that, I would take priv->tx_lock only once, or add a limit on
the number of skbs that can be drained per round.
Something like this (untested) patch
I'm not against testing this patch, but we do not have any SMP capable bcm63xx
board here so I don't think it will be of any use.
bcm6358 and bcm6368 do indeed have two MIPS threads, but SMP is not possible
(due to a data cache or TLB shared across all MIPS threads , unbearably
complicating things, IIRC).
6358 does have the shared TLB (early BMIPS4350), but 6368 (later
BMIPS4350) runs just fine in a regular SMP configuration, that's the
default for OpenWrt actually. Maybe Jonas has something readily
available he could test on?
While I do have several SMP capable boards here, I'm not sure what I'm
supposed to test and especially how.

I mean I can test whether it breaks things, but for testing whether it
fixes things I would need some help on how to break it first ;-)


Jonas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
David Miller
2015-03-04 20:50:02 UTC
Permalink
From: Nicolas Schichan <***@freebox.fr>
Date: Tue, 3 Mar 2015 12:45:12 +0100
Post by Nicolas Schichan
In case there was some tx buffer reclaimed and not enough rx packets
to consume the whole budget, napi_complete would not be called and
interrupts would be kept disabled, effectively resulting in the
network core never to call the poll callback again and no rx/tx
interrupts to be fired either.
Fix that by only accounting the rx work done in the poll callback.
Applied.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Loading...