Discussion:
[PATCH 0/3] dwc2: Speed up the interrupt handler quite a bit
(too old to reply)
Douglas Anderson
2015-11-04 23:00:01 UTC
Permalink
The dwc2 interrupt handler is quite slow. On rk3288 with a few things
plugged into the ports and with cpufreq locked at 696MHz (to simulate
real world idle system), I can easily observe dwc2_handle_hcd_intr()
taking > 120 us, sometimes > 150 us. Note that SOF interrupts come
every 125 us with high speed USB, so taking > 120 us in the interrupt
handler is a big deal.

The patches here will speed up the interrupt controller significantly.
After this series, I have a hard time seeing the interrupt controller
taking > 20 us and I don't ever see it taking > 30 us ever in my tests
unless I bring the cpufreq back down. With the cpufreq at 126 MHz I can
still see the interrupt handler take > 50 us, so I'm sure we could
improve this further. ...but hey, it's a start.

In addition to the speedup, this series also has the advantage of
simplifying dwc2 and making it more like everyone else (introducing the
possibility of future simplifications). Picking this series up will
help your diffstat and likely win you friends. ;)

===

Steps for gathering data with ftrace:

cd /sys/devices/system/cpu/cpu0/cpufreq/
echo userspace > scaling_governor
echo 696000 > scaling_setspeed

cd /sys/kernel/debug/tracing
echo 0 > tracing_on
echo "" > trace
echo nop > current_tracer
echo function_graph > current_tracer
echo dwc2_handle_hcd_intr > set_graph_function
echo dwc2_handle_common_intr >> set_graph_function
echo dwc2_handle_hcd_intr > set_ftrace_filter
echo dwc2_handle_common_intr >> set_ftrace_filter
echo funcgraph-abstime > trace_options
echo 70 > tracing_thresh
echo 1 > /sys/kernel/debug/tracing/tracing_on

sleep 2
cat trace

===

NOTE: This series doesn't replace any other patches I've submitted
recently, it merely adds another set of changes that upstream could
benefit from.


Douglas Anderson (3):
usb: dwc2: rockchip: Make the max_transfer_size automatic
usb: dwc2: host: Giveback URB in tasklet context
usb: dwc2: host: Get aligned DMA in a more supported way

drivers/usb/dwc2/core.c | 21 +-----
drivers/usb/dwc2/hcd.c | 170 ++++++++++++++++++++-----------------------
drivers/usb/dwc2/hcd.h | 10 ---
drivers/usb/dwc2/hcd_intr.c | 65 -----------------
drivers/usb/dwc2/hcd_queue.c | 7 +-
drivers/usb/dwc2/platform.c | 2 +-
6 files changed, 85 insertions(+), 190 deletions(-)
--
2.6.0.rc2.230.g3dd15c0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Douglas Anderson
2015-11-04 23:00:01 UTC
Permalink
Previously we needed to set the max_transfer_size to explicitly be 65535
because the old driver would detect that our hardware could support much
bigger transfers and then would try to do them. This wouldn't work
since the DMA alignment code couldn't support it.

Later in commit e8f8c14d9da7 ("usb: dwc2: clip max_transfer_size to
65535") upstream added support for clipping this automatically. Since
that commit it has been OK to just use "-1" (default), but nobody
bothered to change it.

Let's change it to default now for two reasons:
- It's nice to use autodetected params.
- If we can remove the 65535 limit, we can transfer more!

Signed-off-by: Douglas Anderson <***@chromium.org>
---
drivers/usb/dwc2/platform.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/dwc2/platform.c b/drivers/usb/dwc2/platform.c
index 5859b0fa19ee..f26e0c31c07e 100644
--- a/drivers/usb/dwc2/platform.c
+++ b/drivers/usb/dwc2/platform.c
@@ -95,7 +95,7 @@ static const struct dwc2_core_params params_rk3066 = {
.host_rx_fifo_size = 520, /* 520 DWORDs */
.host_nperio_tx_fifo_size = 128, /* 128 DWORDs */
.host_perio_tx_fifo_size = 256, /* 256 DWORDs */
- .max_transfer_size = 65535,
+ .max_transfer_size = -1,
.max_packet_count = -1,
.host_channels = -1,
.phy_type = -1,
--
2.6.0.rc2.230.g3dd15c0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Douglas Anderson
2015-11-04 23:00:02 UTC
Permalink
In commit 94dfd7edfd5c ("USB: HCD: support giveback of URB in tasklet
context") support was added to give back the URB in tasklet context.
Let's take advantage of this in dwc2.

This speeds up the dwc2 interrupt handler considerably.

Signed-off-by: Douglas Anderson <***@chromium.org>
---
drivers/usb/dwc2/hcd.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index e79baf73c234..9e7988950c7a 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -2273,9 +2273,7 @@ void dwc2_host_complete(struct dwc2_hsotg *hsotg, struct dwc2_qtd *qtd,
kfree(qtd->urb);
qtd->urb = NULL;

- spin_unlock(&hsotg->lock);
usb_hcd_giveback_urb(dwc2_hsotg_to_hcd(hsotg), urb, status);
- spin_lock(&hsotg->lock);
}

/*
@@ -2888,7 +2886,7 @@ static struct hc_driver dwc2_hc_driver = {
.hcd_priv_size = sizeof(struct wrapper_priv_data),

.irq = _dwc2_hcd_irq,
- .flags = HCD_MEMORY | HCD_USB2,
+ .flags = HCD_MEMORY | HCD_USB2 | HCD_BH,

.start = _dwc2_hcd_start,
.stop = _dwc2_hcd_stop,
--
2.6.0.rc2.230.g3dd15c0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Doug Anderson
2015-11-05 00:40:01 UTC
Permalink
Hi,
Post by Douglas Anderson
In commit 94dfd7edfd5c ("USB: HCD: support giveback of URB in tasklet
context") support was added to give back the URB in tasklet context.
Let's take advantage of this in dwc2.
This speeds up the dwc2 interrupt handler considerably.
---
drivers/usb/dwc2/hcd.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index e79baf73c234..9e7988950c7a 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -2273,9 +2273,7 @@ void dwc2_host_complete(struct dwc2_hsotg *hsotg, struct dwc2_qtd *qtd,
kfree(qtd->urb);
qtd->urb = NULL;
- spin_unlock(&hsotg->lock);
usb_hcd_giveback_urb(dwc2_hsotg_to_hcd(hsotg), urb, status);
- spin_lock(&hsotg->lock);
}
/*
@@ -2888,7 +2886,7 @@ static struct hc_driver dwc2_hc_driver = {
.hcd_priv_size = sizeof(struct wrapper_priv_data),
.irq = _dwc2_hcd_irq,
- .flags = HCD_MEMORY | HCD_USB2,
+ .flags = HCD_MEMORY | HCD_USB2 | HCD_BH,
.start = _dwc2_hcd_start,
.stop = _dwc2_hcd_stop,
In the ChromeOS gerrit
<https://chromium-review.googlesource.com/#/c/310583/> Julius Werner
points out that for EHCI it was good to take the optimization from
commit 9118f9eb4f1e ("USB: EHCI: improve interrupt qh unlink") before
this one. I'm still trying to learn USB / dwc2 so it's unclear to me
whether we also need a similar change before landing.

I'll see if I can do some investigation about this and also some
benchmarking before and after. Certainly profiling the interrupt
handler itself showed a huge improvement, but I'd hate to see a
regression elsewhere.

If anyone else knows better than I, please speak up! :)

-Doug
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Alan Stern
2015-11-05 15:20:02 UTC
Permalink
Post by Doug Anderson
In the ChromeOS gerrit
<https://chromium-review.googlesource.com/#/c/310583/> Julius Werner
points out that for EHCI it was good to take the optimization from
commit 9118f9eb4f1e ("USB: EHCI: improve interrupt qh unlink") before
this one. I'm still trying to learn USB / dwc2 so it's unclear to me
whether we also need a similar change before landing.
I'll see if I can do some investigation about this and also some
benchmarking before and after. Certainly profiling the interrupt
handler itself showed a huge improvement, but I'd hate to see a
regression elsewhere.
If anyone else knows better than I, please speak up! :)
This is a matter of both efficiency and correctness. Giving back URBs
in a tasklet is not a simple change.

Have you read the kerneldoc for usb_submit_urb() in
drivers/usb/core/urb.c? The portion about "Reserved Bandwidth
Transfers" is highly relevant. I don't know how dwc2 goes about
reserving bandwidth for periodic transfers, but if it relies on the
endpoint queue being non-empty to maintain a reservation then it will
be affected by this change.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Doug Anderson
2015-11-06 00:30:02 UTC
Permalink
Alan,
Post by Alan Stern
Post by Doug Anderson
In the ChromeOS gerrit
<https://chromium-review.googlesource.com/#/c/310583/> Julius Werner
points out that for EHCI it was good to take the optimization from
commit 9118f9eb4f1e ("USB: EHCI: improve interrupt qh unlink") before
this one. I'm still trying to learn USB / dwc2 so it's unclear to me
whether we also need a similar change before landing.
I'll see if I can do some investigation about this and also some
benchmarking before and after. Certainly profiling the interrupt
handler itself showed a huge improvement, but I'd hate to see a
regression elsewhere.
If anyone else knows better than I, please speak up! :)
This is a matter of both efficiency and correctness. Giving back URBs
in a tasklet is not a simple change.
Have you read the kerneldoc for usb_submit_urb() in
drivers/usb/core/urb.c? The portion about "Reserved Bandwidth
Transfers" is highly relevant. I don't know how dwc2 goes about
reserving bandwidth for periodic transfers, but if it relies on the
endpoint queue being non-empty to maintain a reservation then it will
be affected by this change.
It does look as if you are right and the reservation will end up being
released. It looks to me like dwc2_deschedule_periodic() is in charge
of releasing the reservation. I'll work on trying to actually confirm
this. I guess I need to find a USB test setup where there are enough
devices that I exceed the available time so I can see the brokenness
of my old solution...

I hadn't realized that this was a correctness problem and not just an
optimization problem, so thank you very much for the info! :) I ran
with a bunch of USB devices and it worked fine (and performance
improved!) so I figured I was good to go... Now I've read the
kerneldoc you pointed at and it was very helpful. As I understand it,
it's considered OK if I copy what EHCI did and release the reservation
if nothing has been scheduled for 5 ms.

Quoting a friend of mine: I'm now all done adding the delayed
reservation release code. Now I just need to compile it and test it.
:-P My plan is add some printouts to my current implementation to see
cases where the deferred "unreserve" actually saved us and (to me)
that will help indicate that it's working properly. Presumably I
won't see this case hit (or not much) without HCD_BH and I will see
this case with HCD_BH.

Please consider this patch "on hold" until my next spin.

-Doug
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Alan Stern
2015-11-06 15:50:01 UTC
Permalink
Post by Doug Anderson
Alan,
Post by Alan Stern
Post by Doug Anderson
In the ChromeOS gerrit
<https://chromium-review.googlesource.com/#/c/310583/> Julius Werner
points out that for EHCI it was good to take the optimization from
commit 9118f9eb4f1e ("USB: EHCI: improve interrupt qh unlink") before
this one. I'm still trying to learn USB / dwc2 so it's unclear to me
whether we also need a similar change before landing.
I'll see if I can do some investigation about this and also some
benchmarking before and after. Certainly profiling the interrupt
handler itself showed a huge improvement, but I'd hate to see a
regression elsewhere.
If anyone else knows better than I, please speak up! :)
This is a matter of both efficiency and correctness. Giving back URBs
in a tasklet is not a simple change.
Have you read the kerneldoc for usb_submit_urb() in
drivers/usb/core/urb.c? The portion about "Reserved Bandwidth
Transfers" is highly relevant. I don't know how dwc2 goes about
reserving bandwidth for periodic transfers, but if it relies on the
endpoint queue being non-empty to maintain a reservation then it will
be affected by this change.
It does look as if you are right and the reservation will end up being
released. It looks to me like dwc2_deschedule_periodic() is in charge
of releasing the reservation. I'll work on trying to actually confirm
this. I guess I need to find a USB test setup where there are enough
devices that I exceed the available time so I can see the brokenness
of my old solution...
You may not need that. Try a single USB keyboard and see what happens
when the interrupt URB is given back.
Post by Doug Anderson
I hadn't realized that this was a correctness problem and not just an
optimization problem, so thank you very much for the info! :) I ran
with a bunch of USB devices and it worked fine (and performance
improved!) so I figured I was good to go... Now I've read the
kerneldoc you pointed at and it was very helpful. As I understand it,
it's considered OK if I copy what EHCI did and release the reservation
if nothing has been scheduled for 5 ms.
You might also look into the issues surrounding isochronous URBs. In
particular, when an URB is submitted, which frames or microframes
should it be scheduled in? The problem is that when the submission
occurs, some of the slots following the previous URB may already have
expired. The explanations for EXDEV and EFBIG in
Documentation/usb/error-codes.txt are relevant here, although terse.

The host controller drivers that I maintain work like this:

If the endpoint's queue is empty and none of the endpoint's
URBs are still being given back by the tasklet, pretend that
the URB_ISO_ASAP flag is set. Note that this involves
testing hcd_periodic_completion_in_progress() -- that's
where switching over to tasklets makes a difference.

If the URB_ISO_ASAP flag is set, the URB is scheduled for
the first unallocated slot that hasn't already expired.

If the flag isn't set, try to schedule the URB for the first
unallocated slot. If that means all the isoc packets in the
URB would be assigned to expired slots, return -EXDEV. If
some but not all of the packets would be assigned to expired
slots, skip them -- only schedule the packets whose slots
haven't expired yet.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Doug Anderson
2015-11-07 01:30:02 UTC
Permalink
Alan,
Post by Alan Stern
Post by Doug Anderson
Alan,
Post by Alan Stern
Post by Doug Anderson
In the ChromeOS gerrit
<https://chromium-review.googlesource.com/#/c/310583/> Julius Werner
points out that for EHCI it was good to take the optimization from
commit 9118f9eb4f1e ("USB: EHCI: improve interrupt qh unlink") before
this one. I'm still trying to learn USB / dwc2 so it's unclear to me
whether we also need a similar change before landing.
I'll see if I can do some investigation about this and also some
benchmarking before and after. Certainly profiling the interrupt
handler itself showed a huge improvement, but I'd hate to see a
regression elsewhere.
If anyone else knows better than I, please speak up! :)
This is a matter of both efficiency and correctness. Giving back URBs
in a tasklet is not a simple change.
Have you read the kerneldoc for usb_submit_urb() in
drivers/usb/core/urb.c? The portion about "Reserved Bandwidth
Transfers" is highly relevant. I don't know how dwc2 goes about
reserving bandwidth for periodic transfers, but if it relies on the
endpoint queue being non-empty to maintain a reservation then it will
be affected by this change.
It does look as if you are right and the reservation will end up being
released. It looks to me like dwc2_deschedule_periodic() is in charge
of releasing the reservation. I'll work on trying to actually confirm
this. I guess I need to find a USB test setup where there are enough
devices that I exceed the available time so I can see the brokenness
of my old solution...
You may not need that. Try a single USB keyboard and see what happens
when the interrupt URB is given back.
I haven't been able to reproduce any new errors with my patch, but I
have with trace_printk I have proven that it does cause reservations
to be lost and then re-gained, which isn't good.

Note that dwc2 is currently having scheduling problems anyway (even
before my patch). ***@rock-chips.com posted an RFC patch to fix
problems he was seeing, but I have a setup that has problems too and
his patch doesn't fix it. :( ...so possibly if everything was
working then I could see my patch break things, but as it is now it's
hard to say.

In any case, I've finished testing the patch that adds a 5us delay
before releasing the reservation. I'll post that so we can be on the
same page.
Post by Alan Stern
Post by Doug Anderson
I hadn't realized that this was a correctness problem and not just an
optimization problem, so thank you very much for the info! :) I ran
with a bunch of USB devices and it worked fine (and performance
improved!) so I figured I was good to go... Now I've read the
kerneldoc you pointed at and it was very helpful. As I understand it,
it's considered OK if I copy what EHCI did and release the reservation
if nothing has been scheduled for 5 ms.
You might also look into the issues surrounding isochronous URBs. In
particular, when an URB is submitted, which frames or microframes
should it be scheduled in? The problem is that when the submission
occurs, some of the slots following the previous URB may already have
expired. The explanations for EXDEV and EFBIG in
Documentation/usb/error-codes.txt are relevant here, although terse.
If the endpoint's queue is empty and none of the endpoint's
URBs are still being given back by the tasklet, pretend that
the URB_ISO_ASAP flag is set. Note that this involves
testing hcd_periodic_completion_in_progress() -- that's
where switching over to tasklets makes a difference.
If the URB_ISO_ASAP flag is set, the URB is scheduled for
the first unallocated slot that hasn't already expired.
If the flag isn't set, try to schedule the URB for the first
unallocated slot. If that means all the isoc packets in the
URB would be assigned to expired slots, return -EXDEV. If
some but not all of the packets would be assigned to expired
slots, skip them -- only schedule the packets whose slots
haven't expired yet.
You're talking to someone who has only been looking at the details of
USB for about 2 days now. :-P ...I'm trying to grok all of that, but
I'm not sure I got it all...

I will say that "URB_ISO_ASAP" is not referenced anywhere in dwc2.
Presumably that's a bug (or missing feature). Would it be terrible if
I just ignored that for now and said that if you try to add something
to the queue and we're currently in the process of waiting for our
timer to expire then you'll just get back -ENOSPC? dwc2 won't deal
perfectly well if you've very close to exhausting the available
bandwidth, but that could be a task for another day.

-Doug
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Alan Stern
2015-11-07 15:10:01 UTC
Permalink
Post by Doug Anderson
You're talking to someone who has only been looking at the details of
USB for about 2 days now. :-P ...I'm trying to grok all of that, but
I'm not sure I got it all...
I will say that "URB_ISO_ASAP" is not referenced anywhere in dwc2.
Presumably that's a bug (or missing feature). Would it be terrible if
I just ignored that for now and said that if you try to add something
to the queue and we're currently in the process of waiting for our
timer to expire then you'll just get back -ENOSPC? dwc2 won't deal
perfectly well if you've very close to exhausting the available
bandwidth, but that could be a task for another day.
Sure. It's not necessary to support everything perfectly, right from
the beginning.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Douglas Anderson
2015-11-04 23:00:02 UTC
Permalink
All other host controllers who want aligned buffers for DMA do it a
certain way. Let's do that too instead of working behind the USB core's
back. This makes our interrupt handler not take forever and also rips
out a lot of code, simplifying things a bunch.

This also has the side effect of removing the 65535 max transfer size
limit.

NOTE: The actual code to allocate the aligned buffers is ripped almost
completely from the tegra EHCI driver. At some point in the future we
may want to add this functionality to the USB core to share more code
everywhere.

Signed-off-by: Douglas Anderson <***@chromium.org>
---
drivers/usb/dwc2/core.c | 21 +-----
drivers/usb/dwc2/hcd.c | 166 ++++++++++++++++++++-----------------------
drivers/usb/dwc2/hcd.h | 10 ---
drivers/usb/dwc2/hcd_intr.c | 65 -----------------
drivers/usb/dwc2/hcd_queue.c | 7 +-
5 files changed, 83 insertions(+), 186 deletions(-)

diff --git a/drivers/usb/dwc2/core.c b/drivers/usb/dwc2/core.c
index ef73e498e98f..7e28cfafcfd8 100644
--- a/drivers/usb/dwc2/core.c
+++ b/drivers/usb/dwc2/core.c
@@ -1830,19 +1830,11 @@ void dwc2_hc_start_transfer(struct dwc2_hsotg *hsotg,
}

if (hsotg->core_params->dma_enable > 0) {
- dma_addr_t dma_addr;
-
- if (chan->align_buf) {
- if (dbg_hc(chan))
- dev_vdbg(hsotg->dev, "align_buf\n");
- dma_addr = chan->align_buf;
- } else {
- dma_addr = chan->xfer_dma;
- }
- dwc2_writel((u32)dma_addr, hsotg->regs + HCDMA(chan->hc_num));
+ dwc2_writel((u32)chan->xfer_dma,
+ hsotg->regs + HCDMA(chan->hc_num));
if (dbg_hc(chan))
dev_vdbg(hsotg->dev, "Wrote %08lx to HCDMA(%d)\n",
- (unsigned long)dma_addr, chan->hc_num);
+ (unsigned long)chan->xfer_dma, chan->hc_num);
}

/* Start the split */
@@ -3137,13 +3129,6 @@ int dwc2_get_hwparams(struct dwc2_hsotg *hsotg)
width = (hwcfg3 & GHWCFG3_XFER_SIZE_CNTR_WIDTH_MASK) >>
GHWCFG3_XFER_SIZE_CNTR_WIDTH_SHIFT;
hw->max_transfer_size = (1 << (width + 11)) - 1;
- /*
- * Clip max_transfer_size to 65535. dwc2_hc_setup_align_buf() allocates
- * coherent buffers with this size, and if it's too large we can
- * exhaust the coherent DMA pool.
- */
- if (hw->max_transfer_size > 65535)
- hw->max_transfer_size = 65535;
width = (hwcfg3 & GHWCFG3_PACKET_SIZE_CNTR_WIDTH_MASK) >>
GHWCFG3_PACKET_SIZE_CNTR_WIDTH_SHIFT;
hw->max_packet_count = (1 << (width + 4)) - 1;
diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index 9e7988950c7a..4487f1b262b2 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -598,9 +598,9 @@ static void dwc2_hc_init_split(struct dwc2_hsotg *hsotg,
chan->hub_port = (u8)hub_port;
}

-static void *dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,
- struct dwc2_host_chan *chan,
- struct dwc2_qtd *qtd, void *bufptr)
+static void dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,
+ struct dwc2_host_chan *chan,
+ struct dwc2_qtd *qtd)
{
struct dwc2_hcd_urb *urb = qtd->urb;
struct dwc2_hcd_iso_packet_desc *frame_desc;
@@ -620,7 +620,6 @@ static void *dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,
else
chan->xfer_buf = urb->setup_packet;
chan->xfer_len = 8;
- bufptr = NULL;
break;

case DWC2_CONTROL_DATA:
@@ -647,7 +646,6 @@ static void *dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,
chan->xfer_dma = hsotg->status_buf_dma;
else
chan->xfer_buf = hsotg->status_buf;
- bufptr = NULL;
break;
}
break;
@@ -680,14 +678,6 @@ static void *dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,

chan->xfer_len = frame_desc->length - qtd->isoc_split_offset;

- /* For non-dword aligned buffers */
- if (hsotg->core_params->dma_enable > 0 &&
- (chan->xfer_dma & 0x3))
- bufptr = (u8 *)urb->buf + frame_desc->offset +
- qtd->isoc_split_offset;
- else
- bufptr = NULL;
-
if (chan->xact_pos == DWC2_HCSPLT_XACTPOS_ALL) {
if (chan->xfer_len <= 188)
chan->xact_pos = DWC2_HCSPLT_XACTPOS_ALL;
@@ -696,63 +686,89 @@ static void *dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,
}
break;
}
+}
+
+#define DWC2_USB_DMA_ALIGN 4
+
+struct dma_aligned_buffer {
+ void *kmalloc_ptr;
+ void *old_xfer_buffer;
+ u8 data[0];
+};
+
+static void dwc2_free_dma_aligned_buffer(struct urb *urb)
+{
+ struct dma_aligned_buffer *temp;
+
+ if (!(urb->transfer_flags & URB_ALIGNED_TEMP_BUFFER))
+ return;

- return bufptr;
+ temp = container_of(urb->transfer_buffer,
+ struct dma_aligned_buffer, data);
+
+ if (usb_urb_dir_in(urb))
+ memcpy(temp->old_xfer_buffer, temp->data,
+ urb->transfer_buffer_length);
+ urb->transfer_buffer = temp->old_xfer_buffer;
+ kfree(temp->kmalloc_ptr);
+
+ urb->transfer_flags &= ~URB_ALIGNED_TEMP_BUFFER;
}

-static int dwc2_hc_setup_align_buf(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh,
- struct dwc2_host_chan *chan,
- struct dwc2_hcd_urb *urb, void *bufptr)
+static int dwc2_alloc_dma_aligned_buffer(struct urb *urb, gfp_t mem_flags)
{
- u32 buf_size;
- struct urb *usb_urb;
- struct usb_hcd *hcd;
+ struct dma_aligned_buffer *temp, *kmalloc_ptr;
+ size_t kmalloc_size;

- if (!qh->dw_align_buf) {
- if (chan->ep_type != USB_ENDPOINT_XFER_ISOC)
- buf_size = hsotg->core_params->max_transfer_size;
- else
- /* 3072 = 3 max-size Isoc packets */
- buf_size = 3072;
+ if (urb->num_sgs || urb->sg ||
+ urb->transfer_buffer_length == 0 ||
+ !((uintptr_t)urb->transfer_buffer & (DWC2_USB_DMA_ALIGN - 1)))
+ return 0;

- qh->dw_align_buf = kmalloc(buf_size, GFP_ATOMIC | GFP_DMA);
- if (!qh->dw_align_buf)
- return -ENOMEM;
- qh->dw_align_buf_size = buf_size;
- }
+ /* Allocate a buffer with enough padding for alignment */
+ kmalloc_size = urb->transfer_buffer_length +
+ sizeof(struct dma_aligned_buffer) + DWC2_USB_DMA_ALIGN - 1;

- if (chan->xfer_len) {
- dev_vdbg(hsotg->dev, "%s(): non-aligned buffer\n", __func__);
- usb_urb = urb->priv;
+ kmalloc_ptr = kmalloc(kmalloc_size, mem_flags);
+ if (!kmalloc_ptr)
+ return -ENOMEM;

- if (usb_urb) {
- if (usb_urb->transfer_flags &
- (URB_SETUP_MAP_SINGLE | URB_DMA_MAP_SG |
- URB_DMA_MAP_PAGE | URB_DMA_MAP_SINGLE)) {
- hcd = dwc2_hsotg_to_hcd(hsotg);
- usb_hcd_unmap_urb_for_dma(hcd, usb_urb);
- }
- if (!chan->ep_is_in)
- memcpy(qh->dw_align_buf, bufptr,
- chan->xfer_len);
- } else {
- dev_warn(hsotg->dev, "no URB in dwc2_urb\n");
- }
- }
+ /* Position our struct dma_aligned_buffer such that data is aligned */
+ temp = PTR_ALIGN(kmalloc_ptr + 1, DWC2_USB_DMA_ALIGN) - 1;
+ temp->kmalloc_ptr = kmalloc_ptr;
+ temp->old_xfer_buffer = urb->transfer_buffer;
+ if (usb_urb_dir_out(urb))
+ memcpy(temp->data, urb->transfer_buffer,
+ urb->transfer_buffer_length);
+ urb->transfer_buffer = temp->data;

- qh->dw_align_buf_dma = dma_map_single(hsotg->dev,
- qh->dw_align_buf, qh->dw_align_buf_size,
- chan->ep_is_in ? DMA_FROM_DEVICE : DMA_TO_DEVICE);
- if (dma_mapping_error(hsotg->dev, qh->dw_align_buf_dma)) {
- dev_err(hsotg->dev, "can't map align_buf\n");
- chan->align_buf = 0;
- return -EINVAL;
- }
+ urb->transfer_flags |= URB_ALIGNED_TEMP_BUFFER;

- chan->align_buf = qh->dw_align_buf_dma;
return 0;
}

+static int dwc2_map_urb_for_dma(struct usb_hcd *hcd, struct urb *urb,
+ gfp_t mem_flags)
+{
+ int ret;
+
+ ret = dwc2_alloc_dma_aligned_buffer(urb, mem_flags);
+ if (ret)
+ return ret;
+
+ ret = usb_hcd_map_urb_for_dma(hcd, urb, mem_flags);
+ if (ret)
+ dwc2_free_dma_aligned_buffer(urb);
+
+ return ret;
+}
+
+static void dwc2_unmap_urb_for_dma(struct usb_hcd *hcd, struct urb *urb)
+{
+ usb_hcd_unmap_urb_for_dma(hcd, urb);
+ dwc2_free_dma_aligned_buffer(urb);
+}
+
/**
* dwc2_assign_and_init_hc() - Assigns transactions from a QTD to a free host
* channel and initializes the host channel to perform the transactions. The
@@ -767,7 +783,6 @@ static int dwc2_assign_and_init_hc(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
struct dwc2_host_chan *chan;
struct dwc2_hcd_urb *urb;
struct dwc2_qtd *qtd;
- void *bufptr = NULL;

if (dbg_qh(qh))
dev_vdbg(hsotg->dev, "%s(%p,%p)\n", __func__, hsotg, qh);
@@ -829,16 +844,10 @@ static int dwc2_assign_and_init_hc(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
!dwc2_hcd_is_pipe_in(&urb->pipe_info))
urb->actual_length = urb->length;

- if (hsotg->core_params->dma_enable > 0) {
+ if (hsotg->core_params->dma_enable > 0)
chan->xfer_dma = urb->dma + urb->actual_length;
-
- /* For non-dword aligned case */
- if (hsotg->core_params->dma_desc_enable <= 0 &&
- (chan->xfer_dma & 0x3))
- bufptr = (u8 *)urb->buf + urb->actual_length;
- } else {
+ else
chan->xfer_buf = (u8 *)urb->buf + urb->actual_length;
- }

chan->xfer_len = urb->length - urb->actual_length;
chan->xfer_count = 0;
@@ -850,27 +859,7 @@ static int dwc2_assign_and_init_hc(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
chan->do_split = 0;

/* Set the transfer attributes */
- bufptr = dwc2_hc_init_xfer(hsotg, chan, qtd, bufptr);
-
- /* Non DWORD-aligned buffer case */
- if (bufptr) {
- dev_vdbg(hsotg->dev, "Non-aligned buffer\n");
- if (dwc2_hc_setup_align_buf(hsotg, qh, chan, urb, bufptr)) {
- dev_err(hsotg->dev,
- "%s: Failed to allocate memory to handle non-dword aligned buffer\n",
- __func__);
- /* Add channel back to free list */
- chan->align_buf = 0;
- chan->multi_count = 0;
- list_add_tail(&chan->hc_list_entry,
- &hsotg->free_hc_list);
- qtd->in_process = 0;
- qh->channel = NULL;
- return -ENOMEM;
- }
- } else {
- chan->align_buf = 0;
- }
+ dwc2_hc_init_xfer(hsotg, chan, qtd);

if (chan->ep_type == USB_ENDPOINT_XFER_INT ||
chan->ep_type == USB_ENDPOINT_XFER_ISOC)
@@ -2902,6 +2891,9 @@ static struct hc_driver dwc2_hc_driver = {

.bus_suspend = _dwc2_hcd_suspend,
.bus_resume = _dwc2_hcd_resume,
+
+ .map_urb_for_dma = dwc2_map_urb_for_dma,
+ .unmap_urb_for_dma = dwc2_unmap_urb_for_dma,
};

/*
diff --git a/drivers/usb/dwc2/hcd.h b/drivers/usb/dwc2/hcd.h
index f105bada2fd1..8a4486e1a542 100644
--- a/drivers/usb/dwc2/hcd.h
+++ b/drivers/usb/dwc2/hcd.h
@@ -75,8 +75,6 @@ struct dwc2_qh;
* (micro)frame
* @xfer_buf: Pointer to current transfer buffer position
* @xfer_dma: DMA address of xfer_buf
- * @align_buf: In Buffer DMA mode this will be used if xfer_buf is not
- * DWORD aligned
* @xfer_len: Total number of bytes to transfer
* @xfer_count: Number of bytes transferred so far
* @start_pkt_count: Packet count at start of transfer
@@ -132,7 +130,6 @@ struct dwc2_host_chan {

u8 *xfer_buf;
dma_addr_t xfer_dma;
- dma_addr_t align_buf;
u32 xfer_len;
u32 xfer_count;
u16 start_pkt_count;
@@ -241,10 +238,6 @@ enum dwc2_transaction_type {
* @frame_usecs: Internal variable used by the microframe scheduler
* @start_split_frame: (Micro)frame at which last start split was initialized
* @ntd: Actual number of transfer descriptors in a list
- * @dw_align_buf: Used instead of original buffer if its physical address
- * is not dword-aligned
- * @dw_align_buf_size: Size of dw_align_buf
- * @dw_align_buf_dma: DMA address for dw_align_buf
* @qtd_list: List of QTDs for this QH
* @channel: Host channel currently processing transfers for this QH
* @qh_list_entry: Entry for QH in either the periodic or non-periodic
@@ -276,9 +269,6 @@ struct dwc2_qh {
u16 frame_usecs[8];
u16 start_split_frame;
u16 ntd;
- u8 *dw_align_buf;
- int dw_align_buf_size;
- dma_addr_t dw_align_buf_dma;
struct list_head qtd_list;
struct dwc2_host_chan *channel;
struct list_head qh_list_entry;
diff --git a/drivers/usb/dwc2/hcd_intr.c b/drivers/usb/dwc2/hcd_intr.c
index bda0b21b850f..84190243b8be 100644
--- a/drivers/usb/dwc2/hcd_intr.c
+++ b/drivers/usb/dwc2/hcd_intr.c
@@ -465,18 +465,6 @@ static int dwc2_update_urb_state(struct dwc2_hsotg *hsotg,
xfer_length = urb->length - urb->actual_length;
}

- /* Non DWORD-aligned buffer case handling */
- if (chan->align_buf && xfer_length) {
- dev_vdbg(hsotg->dev, "%s(): non-aligned buffer\n", __func__);
- dma_unmap_single(hsotg->dev, chan->qh->dw_align_buf_dma,
- chan->qh->dw_align_buf_size,
- chan->ep_is_in ?
- DMA_FROM_DEVICE : DMA_TO_DEVICE);
- if (chan->ep_is_in)
- memcpy(urb->buf + urb->actual_length,
- chan->qh->dw_align_buf, xfer_length);
- }
-
dev_vdbg(hsotg->dev, "urb->actual_length=%d xfer_length=%d\n",
urb->actual_length, xfer_length);
urb->actual_length += xfer_length;
@@ -558,21 +546,6 @@ static enum dwc2_halt_status dwc2_update_isoc_urb_state(
frame_desc->status = 0;
frame_desc->actual_length = dwc2_get_actual_xfer_length(hsotg,
chan, chnum, qtd, halt_status, NULL);
-
- /* Non DWORD-aligned buffer case handling */
- if (chan->align_buf && frame_desc->actual_length) {
- dev_vdbg(hsotg->dev, "%s(): non-aligned buffer\n",
- __func__);
- dma_unmap_single(hsotg->dev, chan->qh->dw_align_buf_dma,
- chan->qh->dw_align_buf_size,
- chan->ep_is_in ?
- DMA_FROM_DEVICE : DMA_TO_DEVICE);
- if (chan->ep_is_in)
- memcpy(urb->buf + frame_desc->offset +
- qtd->isoc_split_offset,
- chan->qh->dw_align_buf,
- frame_desc->actual_length);
- }
break;
case DWC2_HC_XFER_FRAME_OVERRUN:
urb->error_count++;
@@ -593,21 +566,6 @@ static enum dwc2_halt_status dwc2_update_isoc_urb_state(
frame_desc->actual_length = dwc2_get_actual_xfer_length(hsotg,
chan, chnum, qtd, halt_status, NULL);

- /* Non DWORD-aligned buffer case handling */
- if (chan->align_buf && frame_desc->actual_length) {
- dev_vdbg(hsotg->dev, "%s(): non-aligned buffer\n",
- __func__);
- dma_unmap_single(hsotg->dev, chan->qh->dw_align_buf_dma,
- chan->qh->dw_align_buf_size,
- chan->ep_is_in ?
- DMA_FROM_DEVICE : DMA_TO_DEVICE);
- if (chan->ep_is_in)
- memcpy(urb->buf + frame_desc->offset +
- qtd->isoc_split_offset,
- chan->qh->dw_align_buf,
- frame_desc->actual_length);
- }
-
/* Skip whole frame */
if (chan->qh->do_split &&
chan->ep_type == USB_ENDPOINT_XFER_ISOC && chan->ep_is_in &&
@@ -673,8 +631,6 @@ static void dwc2_deactivate_qh(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh,
}

no_qtd:
- if (qh->channel)
- qh->channel->align_buf = 0;
qh->channel = NULL;
dwc2_hcd_qh_deactivate(hsotg, qh, continue_split);
}
@@ -939,14 +895,6 @@ static int dwc2_xfercomp_isoc_split_in(struct dwc2_hsotg *hsotg,

frame_desc->actual_length += len;

- if (chan->align_buf) {
- dev_vdbg(hsotg->dev, "%s(): non-aligned buffer\n", __func__);
- dma_unmap_single(hsotg->dev, chan->qh->dw_align_buf_dma,
- chan->qh->dw_align_buf_size, DMA_FROM_DEVICE);
- memcpy(qtd->urb->buf + frame_desc->offset +
- qtd->isoc_split_offset, chan->qh->dw_align_buf, len);
- }
-
qtd->isoc_split_offset += len;

if (frame_desc->actual_length >= frame_desc->length) {
@@ -1169,19 +1117,6 @@ static void dwc2_update_urb_state_abn(struct dwc2_hsotg *hsotg,
xfer_length = urb->length - urb->actual_length;
}

- /* Non DWORD-aligned buffer case handling */
- if (chan->align_buf && xfer_length && chan->ep_is_in) {
- dev_vdbg(hsotg->dev, "%s(): non-aligned buffer\n", __func__);
- dma_unmap_single(hsotg->dev, chan->qh->dw_align_buf_dma,
- chan->qh->dw_align_buf_size,
- chan->ep_is_in ?
- DMA_FROM_DEVICE : DMA_TO_DEVICE);
- if (chan->ep_is_in)
- memcpy(urb->buf + urb->actual_length,
- chan->qh->dw_align_buf,
- xfer_length);
- }
-
urb->actual_length += xfer_length;

hctsiz = dwc2_readl(hsotg->regs + HCTSIZ(chnum));
diff --git a/drivers/usb/dwc2/hcd_queue.c b/drivers/usb/dwc2/hcd_queue.c
index 7d8d06cfe3c1..3e1edafc593c 100644
--- a/drivers/usb/dwc2/hcd_queue.c
+++ b/drivers/usb/dwc2/hcd_queue.c
@@ -232,13 +232,8 @@ struct dwc2_qh *dwc2_hcd_qh_create(struct dwc2_hsotg *hsotg,
*/
void dwc2_hcd_qh_free(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
{
- if (hsotg->core_params->dma_desc_enable > 0) {
+ if (hsotg->core_params->dma_desc_enable > 0)
dwc2_hcd_qh_free_ddma(hsotg, qh);
- } else {
- /* kfree(NULL) is safe */
- kfree(qh->dw_align_buf);
- qh->dw_align_buf_dma = (dma_addr_t)0;
- }
kfree(qh);
}
--
2.6.0.rc2.230.g3dd15c0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Heiko Stuebner
2015-11-05 19:20:01 UTC
Permalink
Post by Douglas Anderson
The dwc2 interrupt handler is quite slow. On rk3288 with a few things
plugged into the ports and with cpufreq locked at 696MHz (to simulate
real world idle system), I can easily observe dwc2_handle_hcd_intr()
taking > 120 us, sometimes > 150 us. Note that SOF interrupts come
every 125 us with high speed USB, so taking > 120 us in the interrupt
handler is a big deal.
The patches here will speed up the interrupt controller significantly.
After this series, I have a hard time seeing the interrupt controller
taking > 20 us and I don't ever see it taking > 30 us ever in my tests
unless I bring the cpufreq back down. With the cpufreq at 126 MHz I can
still see the interrupt handler take > 50 us, so I'm sure we could
improve this further. ...but hey, it's a start.
In addition to the speedup, this series also has the advantage of
simplifying dwc2 and making it more like everyone else (introducing the
possibility of future simplifications). Picking this series up will
help your diffstat and likely win you friends. ;)
I gave this a simple spin on my veyron-pinky with both a device attached
directly to the port as well as with an usb-hub in between. Everything was
still working smoothly, so

Tested-by: Heiko Stuebner <***@sntech.de>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Continue reading on narkive:
Loading...