When the endpoint is re-enabled, check if the current FIFO settings can
be reused. Further work is needed to improve FIFO memory handling for
more advanced interface configurations.
Signed-off-by: Johann Fischer <johann.fischer@nordicsemi.no>
Do not synchronously wait for endpoint interrupt bits when disarming
endpoints. Introduce ep_disabled k_event that makes it possible for
application to wait for the action to take effect which is necessary
when handling SetFeature(ENDPOINT_HALT) or SetInterface() requests.
This change improves incomplete iso IN and OUT handling performance,
especially when there are multiple isochronous endpoints that need
servicing.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
At High-Speed there is at most 25 us handling window for incomplete iso
IN/OUT and therefore determining which endpoints are isochronous is too
wasteful. Add lookup variable holding which isochronous endpoints are
enabled and limit the search to only enabled endpoints. For applications
with just one OUT and one IN isochronous endpoint this is optimal.
The lookup variable is updated only when mutex is held. Interrupt
handler accesses the variable read-only and in general there is no
problem is incomplete iso handling interrupt hits when the lookup
variable is updated, because:
* when endpoint is just activated, it cannot be source of incomplete
iso interrupt because the endpoint is not armed yet
* when endpoint is just deactivated, it was first disabled
If there is more than one isochronous endpoint same direction then just
relying on endpoint enabled is not necessarily optimal. However, in
order to be able to limit the search to only armed endpoints, the lookup
variable would have to be updated on every transfer preparation and
completion which would require more time-expensive synchronization.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Replace irq_lock() with spin lock which is proper synchronization
primitive that should be used. Because non-SMP implementations are
allowed to optimize spin lock to just locking interrupts the resulting
code is the same on all currently supported DWC2 targets.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Constify vendor quirks structure to not keep it in RAM. Use constified
vendor quirks structure directly if there is only one snps,dwc2 instance
to allow compiler inlining quirk implementation.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Signed-off-by: Johann Fischer <johann.fischer@nordicsemi.no>
New control transfer is started prematurely from device perspective when
host timeout occurs. Any data transfer from previous control transfer
have to be cancelled prior to handling SETUP data. Unconditionally
disable control IN endpoint to prevent race for enqueued buffer between
udc_buf_get_all() called in dwc2_handle_evt_setup() and udc_buf_peek()
called in dwc2_handle_in_xfercompl().
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Device can be considered enabled only after the Soft Disconnect bit is
cleared. Move the post enable quirk past the SftDiscon bit clear.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
SEGGER Ozone J-Trace Code Profile identified iterations over daint value
as hot path. The iterations show at the very top of code profile because
full iteration happens whenever there is any activity on endpoint.
Optimize daint handling loops so only set bits are iterated over. While
this optimization depends on find_lsb_set() efficiency, it seems to be
worth it solely on the basis that quite often only few bits are set.
After a bit deeper analysis, I was suprised that on ARM Cortex-M33 the
find_lsb_set() approach is faster than naive iteration even if all bits
are set (which is extreme case because USB applications are unlikely to
use all 16 IN and 16 OUT endpoints simultaneously). This is due to fact
that there is only one conditional jump CBNZ and find_lsb_set() - 1
translates to RBIT + CLZ and then clearing the bit uses LSL.W + BIC.W.
Whereas the naive itation uses ADDS + CMP + BNE for the loop handling
and also has LSR.W + LSLS + BPL (+ ADD.W instruction on each iteration
to add 16 for OUT endpoints) for the continue check. Therefore the
optimized code on ARM Cortex-M33 is never worse than naive iteration.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
DWC2 core sets DIEPCTL0 SNAK when SETUP packet is received. The CNAK bit
results in device sending NAK in response to IN token sent to EP0, but
it does not modify the TxFIFO in any way. The stale data in TxFIFO can
then lead to "FIFO space is too low" error. Solve the issue by disabling
and flushing IN endpoint 0 if previous control transfer did not finish.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Limit maximum operating speed in DCFG register if USB stack is
configured to support only Full-Speed.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
There is no need to wait on xfer_new and xfer_finished and therefore
atomic services can be used instead of events.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
When endpoint is disabled while hibernated, the UDC endpoint state has
to be reset. Set the busy to false to keep UDC endpoint state in sync
with peripheral register state.
Fixes: 2be960ad2b ("drivers: udc_dwc2: Disable endpoint while
hibernated")
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
UDC API passes struct udc_ep_config to all functions. Some UDC functions
were using endpoint config structure while some were using device and
endpoint number. This API inconsistency led to completely unnecessary
endpoint structure lookups. Remove unnecessary lookups by using endpoint
config structure pointer where it makes sense.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
SETUP data is unconditionally ACKed by the controller. Other DATA
packets sent to OUT control endpoint 0 (i.e. OUT Data Stage packets
and OUT Status Stage packet) are ACKed by the device only if the
endpoint was enabled with CNAK bit set.
In Buffer DMA mode controller will lock up in following scenario:
* OUT EP0 is not enabled, e.g. OUT Status Stage has finished
* Host starts Control Write transfer, i.e. sends SETUP DATA0 and
device ACKs (regardless if endpoint is enabled or not)
* host sends OUT Data Stage (OUT DATA1)
- software enables endpoint to be able to receive next SETUP data
while host is transmitting the OUT token. If CNAK bit is set
alongside the EPENA bit, the device will ACK the OUT Data Stage.
If CNAK bit is not set, the device will NAK the OUT Data Stage.
When the lockup occurs, from host perspective the OUT Data Stage packet
was successfully transmitted. This can result in host starting IN Status
Stage if there was only one OUT Data Stage packet. This in turn results
in device never getting the DOEPTINT0 SetUp interrupt. Besides just not
getting the SetUp interrupt, any subsequent control transfer won't be
noticed by device at all.
The lockup was first observed while stress testing. The host was issuing
endless sequence of Control Write, Control Read, Control Write, Control
Read, ... commands. When the controller did lock up in Buffer DMA mode,
from host perspective the device was timing out all control transfers.
Avoid the Buffer DMA lockup by setting CNAK bit only when necessary.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
There is no point in calling k_event_test() to determine what events are
posted and then passing that value to k_event_clear(). Simply pass
UINT32_MAX to k_event_clear() and use the return value to slightly
reduce overhead.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
OUT endpoint 0 cannot be disabled and therefore the only way to forcibly
reclaim the buffer is to reset the core. The reset does not finish if
PHY clock is not running, but just triggering the reset seems to be
enough to be able to reclaim the buffer.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
When the endpoint is disabled while the core is hibernated, there are
timeouts waiting for interrupts. It is not clear how the stack should
behave when class and/or application wants to disable the endpoint while
device is suspended. The problem was originally observed when the
endpoints were disabled as a result of usbd_disable() call.
Avoid the timeouts by modifying the backup values instead of the real
registers (which are not accessible when hibernated).
Split the 32-bit txf_set variable into two 16-bit variables (txf_set and
pending_tx_flush) because maximum number of TxFIFO instances is 16.
The txf_set variable is used as-is, while the pending_tx_flush is used
to keep track of TxFIFOs that have to be flushed on hibernation exit.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
DWC2 thread must acquire UDC mutex before accessing shared resources
(peripheral registers and software data structures). Whenever software
enqueues a buffer, the caller first obtains mutex, adds the buffer to
the list, posts event to wake up DWC2 thread and releases mutex. If DWC2
thread has higher priority than the task currently holding a mutex,
there will be two completely unnecessary task switches: DWC2 will switch
in, try to obtain mutex, and then the control will be returned to the
mutex holder.
Avoid the unnecessary task switches by locking scheduler prior to
obtaining the mutex and unlocking scheduler after releasing the mutex.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
USB stack does not check api->lock() and api->unlock() return value and
all UDC drivers block without timeout in its lock() and unlock() api
implementations. There is no realistic way to handle lock() and unlock()
errors without making USB device stack API unnecessarily complex.
Remove the return type from lock() and unlock() to make it clear that
the functions must not fail.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
In Completer mode SETUP data can always be received and therefore
endpoint 0 should only be enabled for OUT Data Stage and OUT Status
Stage.
In Buffer DMA mode, SETUP can only be received when endpoint is enabled
and therefore the software has to make sure that there is a buffer
available to receive SETUP data.
Rework the EP0 buffer feeding to adhere to DWC2 Programming Guide.
Synchronize the accesses with driver mutex to avoid interrupt related
race conditions.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
The transfer is finished after ZLP is transmitted. Do not re-enable the
endpoint waiting for more data.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Control OUT buffers must be multiple of bMaxPacketSize0 in Buffer DMA
mode. While the transfer can be configured to smaller values, DMA will
write data past the buffer (and transfer size counter will underflow) if
the packet on the bus is larger or if there are multiple back-to-back
SETUP packets.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Use helper functions to check whether device is operating in Buffer DMA
or Completer mode. This allows compile time optimizations to remove DMA
handling code when DMA is disabled via KConfig symbol UDC_DWC2_DMA.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
DWC2 otg versions earlier than 5.00a are subject to randomly occurring
glitch on Hibernation Exit by Host Initiated Resume, Hibernation Exit by
Device Inititated Resume and Hibernation Exit by Host Initiated Reset.
When the glitch happens the device address is not correctly restored.
If the address is not correctly restored then the tokens addressed to
the device will timeout leading to host resetting the bus.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Programming Guide states that bit 17 on PCGCCTL writes should be set if
the controller was enumerated for High Speed operation. Add the missing
bit set to adhere to the Programming Guide.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
DMA transfers are supposed to write to buffer tail. Use the proper
pointer to make multipart DMA transfers actually write the data to the
intended location.
The issue was observed on control write transfers where the OUT Data
Stage was at least 128 bytes (because endpoint 0 transfer width is
limited to 7 bits).
The issue is unlikely to happen on non-control transfers because the
transfer size width is at least 11 bits (at most 19 bits) and packet
size counter is at least 4 bits (at most: 10 bits) which means that
at least 2048 byte transfer spanning at least 15 packets (or at least
524288 byte spanning 1023 packets for 19 bits transfer size counter
and 10 bits packet counter) is required to necessitate multipart DMA.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Prepare buffer to receive SETUP data on OUT endpoint 0 after endpoint
halt. This solves the issue where the device would no longer process any
control transfers after the first failed transfer with too large OUT
Data Stage (when processing failed due to data stage buffer allocation
failure).
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
The DOEPTSIZ0 XferSize field is 7 bits long, meaning that maximum single
transfer can be 127 bytes long. Configure the control write (OUT)
transfers considering the XferSize field size to support transfers with
data stage larger than 127 bytes.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Endpoint disable function is racing against bus traffic. If the bus
traffic leads to transfer completion immediately before the endpoint
disable is executed, then the transfer complete interrupt will remain
set when the endpoint is disabled. For OUT endpoints this leads to "No
buffer for ep" errors, while for IN endpoint this can lead to double
buffer pull which causes assertion failure.
The proper solution would be to change endpoint disable to not actually
wait for the individual events (and accept that the endpoint may not
need to be disabled because the transfer can just finish). For the time
being workaround the issue by clearing XferCompl bit on endpoint
disable.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
When handling incomplete iso IN interrupt mark current transfer as
complete and program the endpoint with any subsequently queued packet.
Program the endpoint directly in interrupt handler because the data
must be programmed before SOF (by the time incomplete iso IN interrupt
is raised there is less than 20% * 125 us = 25 us before SOF).
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
The NAKSts bit may be set on isochronous OUT endpoints when incomplete
ISO OUT interrupt is raised. The code would then assume that endpoint is
already disabled and would not perform the endpoint disable procedure.
This in turn was essentially halting any transmission on the isochronous
endpoint, abruptly terminating the data stream.
Fix the issue by always following full endpoint disable procedure on
isochronous endpoints.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
DWC2 peripherals can have TxFIFO sizes configured to any value between
16 and 32768. The value configured during synthesis is the maximum value
the software can program. Designs that give full flexibility configure
the TxFIFO sizes to value equal to total SPRAM size.
Currently DWC2 driver does not have prior knowledge about the endpoints
used within available configurations and has to come up with TxFIFO0
value up front. The original approach was to use MAX(16, max allowed).
locations. Because DWC2 peripheral cannot have TxFIFO0 with size lower
than 16 locations, always the max allowed was used. This logic prevented
any IN endpoint other than EP0 on designs that have TxFIFO0 size set to
total SPRAM size.
Change the logic to MIN(2 * 16, max allowed) to have sufficient memory
available on flexible designs and allow simultaneous operation if
possible (i.e. when maximum TxFIFO0 size is at least 32).
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
DWC2 otg OUT transfers are being used for SETUP DATA0, OUT Data Stage
packets and OUT Status Stage ZLP. On High-Speed it is possible for IN
Data Stage, OUT Status Stage ZLP and subsequent SETUP DATA0 to happen
in very quick succession, making all the three events appear at the same
time to the handler thread.
The handler thread is picking up next endpoint to handle based on the
least significant bit set. When OUT endpoints were on bits 0-15 and IN
endpoints were on bits 16-31, the least significant bit policy favored
OUT endpoints over IN endpoints. This caused problems in Completer mode
(but suprisingly not in Buffer DMA mode) that lead to incorrect control
transfer handling.
The choice between least significant bit first or most significant bit
first is arbitrary. Switching from least to most significant bit first
would have resolved the issue. It would also favor higher numbered
endpoints over lower numbered endpoints.
Swap the order of endpoints in bitmaps to have IN on bits 0-15 and OUT
on bits 16-31 to keep handling lower numbered endpoints first and
resolve the control transfer handling in Completer mode.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Peripherals are required to support the suspend state whenever VBUS
is powered, even if bus reset has not occurred (Mandate: Required,
Effective Date: February, 2010).
Remove the stale suspend check that essentially prevented the device
from hibernating when connected to already suspended bus.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Make it possible to have vendor quirks after hibernation entry sequence
and before hibernation exit sequence.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
According to USB 2.0 Specification the remote wakeup device must hold
the resume signaling for at least 1 ms but for no more than 15 ms. The
DWC2 otg controller requires the software to drive the remote wakeup
signalling for appropriate duration (when LPM is disabled, which is
currently always the case in udc_dwc2). Arbitrarily choose to drive the
resume signalling for 2 ms to have sufficient margin in both directions.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
The PMU is not reset by core reset and therefore it is necessary to exit
hibernation on DWC2 disable to prevent endless PMU interrupt loop when
the driver is enabled again.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Enter hibernation in thread context with the lock held to make sure to
not queue any transfers when the core is hibernated.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
UDC API handlers and DWC2 driver thread share not only software
constructs, but also the underlying hardware. Ensure that any UDC API
call is not preempted by DWC2 driver thread (and vice versa) by
acquiring the lock in thread handler.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
When the queue is full, all messages posted inside interrupt handlers
are simply dropped. This problem can be remedied by having the message
queue large enough, but determining the maximum number of messages that
can ever be posted in the system is really complex task.
Hopefully in DWC2 driver there is finite number of events that have to
be processed inside thread handler. Therefore it is unnecessary to
determine the maximum queue size for the events if the events are posted
to k_event object instead of send to k_msgq object.
Use combination of three k_event structures to handle all possible event
sources. This not only guarantees by design that no event will be lost,
but also slightly reduces the memory usage.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Do not allocate TxFIFO in a way that could corrupt Endpoint Information
Controller data (stored at DFIFO Depth address) or access locations
outside the SPRAM.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Do not schedule isochronous data on current frame. While doing so can
work at Full-Speed, it is pretty much impossible to do it quickly enough
at High-Speed.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Change ensures that `dwc2_handle_xfer_next` would notify upper layer if
`dwc2_tx_fifo_write` fails. This is necessary to ensure that upper layer
is aware of the failed TX for the submitted transfer. It also ensures
that the submitted transfer is removed from the TX queue.
Signed-off-by: Marek Pieta <Marek.Pieta@nordicsemi.no>
Signed-off-by: Kamil Piszczek <Kamil.Piszczek@nordicsemi.no>
Save power when the bus is suspended by entering hibernation if
hibernation support is enabled and controller supports hibernation.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Remove the early max packet size check on endpoint activation, because
there are valid use cases where the max packet size is not multiple of 4
and the stack won't perform multi-transaction transfers. Example use
case is UAC2 explicit feedback endpoint that has Max Packet Size equal
3 when device is operating at Full-Speed.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Fail TxFIFO write if the endpoint is not activated, because there is
essentially no fifo assigned. Writing to not activated endpoint is
possible, but it does tend to corrupt fifo number 0 which is used for
control transfers. USB stack should not really be attempting to write to
endpoints that have not been activated (ep_enable called).
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
Store SOF frame number and use the information to set the Even/Odd
microframe bit when enabling isochronous endpoints. Reject received
isochronous data if number of packets received does not align with
received PID.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>