SEGGER Ozone J-Trace Code Profile identified iterations over daint value
as hot path. The iterations show at the very top of code profile because
full iteration happens whenever there is any activity on endpoint.
Optimize daint handling loops so only set bits are iterated over. While
this optimization depends on find_lsb_set() efficiency, it seems to be
worth it solely on the basis that quite often only few bits are set.
After a bit deeper analysis, I was suprised that on ARM Cortex-M33 the
find_lsb_set() approach is faster than naive iteration even if all bits
are set (which is extreme case because USB applications are unlikely to
use all 16 IN and 16 OUT endpoints simultaneously). This is due to fact
that there is only one conditional jump CBNZ and find_lsb_set() - 1
translates to RBIT + CLZ and then clearing the bit uses LSL.W + BIC.W.
Whereas the naive itation uses ADDS + CMP + BNE for the loop handling
and also has LSR.W + LSLS + BPL (+ ADD.W instruction on each iteration
to add 16 for OUT endpoints) for the continue check. Therefore the
optimized code on ARM Cortex-M33 is never worse than naive iteration.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
DWC2 otg versions earlier than 5.00a are subject to randomly occurring
glitch on Hibernation Exit by Host Initiated Resume, Hibernation Exit by
Device Inititated Resume and Hibernation Exit by Host Initiated Reset.
When the glitch happens the device address is not correctly restored.
If the address is not correctly restored then the tokens addressed to
the device will timeout leading to host resetting the bus.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Programming Guide states that bit 17 on PCGCCTL writes should be set if
the controller was enumerated for High Speed operation. Add the missing
bit set to adhere to the Programming Guide.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
DMA transfers do not execute when the USBD peripheral is in Low Power
mode. Make sure that there is no DMA active transfer when entering Low
Power mode and that new DMA transfers are not started when in Low Power
mode because the transfer won't ever finish.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Add Power and clock gating control register to register map and
appropriate bit macros. Add missing GHWCFG4, GLPMCFG and GPWRDN bits.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Add all register bit defines necessary for isochronous transfers. Clean
up the endpoint transfer size register defines clearly separating IN and
OUT registers because they do use different bit fields. Add alternate
bit names for bits that do have different meaning based on configured
endpoint transfer type.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Make driver aware of High-Bandwidth endpoints both in Completer and
Buffer DMA mode. In Completer mode TxFIFO must be able to hold all
packets for microframe, while in Buffer DMA mode space enough for two
packets is sufficient.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
The processing order is relevant if the IRQ handler call is delayed and
multiple interrupts are pending. Handle USB SoF before other interrupts
to ensure that it would be reported before other USB events (e.g. before
completed USB data transfers).
Signed-off-by: Marek Pieta <Marek.Pieta@nordicsemi.no>
This header is based on drivers/usb/device/usb_dw_registers.h and
describes registers of the DWC2 controllers IP and is intended for use
in both device controller drivers and a host controller driver. The
difference to usb_dw_registers.h is that this header does not confuse
offsets with bit positions, contains all the definitions required for
device mode, has register and bit field names identical to the databook
and no annoying underscores.
Signed-off-by: Johann Fischer <johann.fischer@nordicsemi.no>
Code uses local RAM buffer to properly handle the case where provided
USB transfer TX data is not in RAM.
Signed-off-by: Marek Pieta <Marek.Pieta@nordicsemi.no>
Use MDK directly instead of via USBD HAL to not mix direct register
accesses with USBD HAL. In my opinion this change is not really
necessary but reviewers are strict about consistency.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
There is no point in setting the feeder or consumer at runtime. The time
saved due to not checking conditions on each transaction is smaller than
the additional cost of calling function via pointer.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Do not enable nor disable endpoint done interrupts at runtime because it
is not necessary. Simply keep all relevant interrupts enabled when USBD
is active and disable all interrupts when USBD is disabled.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Ensure strict order within interrupt processing to eliminate workaround
in IN transfer acknowledged handler. EPDATASTATUS and DMA done events
are processed in a way that eliminates the possibility for race condions
between interrupt handler and host IN tokens.
Store last started DMA endpoint and use it in common handler for all DMA
finished events. Unify use of ep dma waiting variable because atomics
only make sense if all accesses are through atomic functions - here the
accesses are guarded with critical section.
Do not disable SETUP interrupt when DMA transfers data on endpoint 0 but
simply postpone handling if currently active DMA is on endpoint 0.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
There is no point in calling an empty function on every single DMA
transfer start. While the empty function is just BX LR the fact that is
is called on every DMA transfer makes its impact visible. This change
improves CDC ACM echo throughput by approximately 2%.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Early DMA process handling actually decreases performance in real world
scenarios because real world scenarios tend to be CPU bound. Moreover
the actual optimization is questionable because DMA semaphore can only
be released after the respective endpoint end event is handled. Remove
early DMA processing altogether and only check pending DMA at the end of
interrupt handler.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Rely on semaphore to serialize access to DMA instead of busy looping
after triggering DMA. With this change Ozone Code Profile generated with
J-Trace Pro on nrf52840dk_nrf52840 board running headphones microphone
sample shows following Load changes (trace data was reset once playback
and recording started and percentages were taken when memcpy reached
200 000 Run Count):
* usbd_dmareq_process() from 17.16% to 2.24%
* memcpy() from 9.37% to 8.36%
* nrf_usbd_common_irq_handler() from 8.89% to 10.88%
Mark nrf_usbd_common_stop() as static because the caller must acquire
DMA semaphore before calling this function and the only place where it
is used is already acquiring the semaphore.
Disable EP0 SETUP interrupt when there is active DMA on EP0 to eliminate
the need for aborting DMA on EP0. This code path should not really
happen in real life though because hosts must not issue new SETUP before
a relatively long timeout (at least 50 ms).
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Rename local usbd copy from nrfx_usbd to nrf_usbd_common and use it in
both USB stacks. Renaming header to nrf_usbd_common.h allows breaking
changes in exposed interface. Mark all doxygen comments as internal
because local usbd copy should not be treated as public interface
because we are under refactoring process that aims to arrive at native
driver and therefore drop nrf_usbd_common in the future.
Use Zephyr constructs directly instead of nrfx glue macros.
No functional changes.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>