This patch fixes memory corruption that can happen when running in
multi-thread and multi-core environment with heavy scheduling stress.
In SMP configuration, we must ensure that all thread's context is
stored before writing the switch_handle flag. Otherwise some of the
thread context writes could be delayed after another CPU begins to
schedule this thread which could lead to memory corruptions.
Signed-off-by: Sylvain Chouleur <schouleur@snapchat.com>
Initializing the C++ stack unwinding data structures takes quite a bit
of stack space. Increase the TEST_EXTRA_STACK_SIZE when using these.
Signed-off-by: Keith Packard <keithp@keithp.com>
Zephyr switched to using CMSIS_6 module in f726cb51
which breaks certain boards like `nucleo_h745zi_q/stm32h745xx/m7` when
CONFIG_CORTEX_M_DWT, CONFIG_TIMING_FUNCTIONS are enabled and cmsis from
`module/hal/cmsis` is not available (deleted explicitly after west
update).
This commit adds a provision to be able to use CMSIS_6 macros when the
module cmsis is not available.
Signed-off-by: Sudan Landge <sudan.landge@arm.com>
Stack walking on x86_64 can run with compiler optimization and
does not need CONFIG_NO_OPTIMIZATIONS as long as frame pointers
are not omitted. So remove the "depends on" for x86_64
from CONFIG_ARCH_HAS_STACKWALK.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Add missing curly braces in if/while/for statements.
This is a style guideline we have that was not enforced in CI. All
issues fixed here were detected by sonarqube SCA.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Building gdbstub for xtensa is failing currently with multiple
failures like
arch/xtensa/core/gdbstub.c:432:24: error: invalid operands to \
binary - (have 'int *' and 'const struct arch_esf *')
432 | if ((int *)bsa - stack > 4) {
Fix them by using appropriate pointer types.
Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
The original 'arch_pm_s2ram_resume' implementation saved lr on the stack
using 'push {lr}' and restored it using 'pop {lr}'. However, the Thumb-1
'pop' does not support lr as a target register, so this code would not
compile for ARMv6-M or ARMv8-M Baseline. r0 was added to these push/pop
later in 2590c48d40.
In 474d4c3249, arch_pm_s2ram* functions were
modified to no longer use the stack, which incidentally "fixed" this issue.
b4fb5d38eb reverted this commit and brought
back 'pop {r0, lr}' as-is, without taking compatibility into account.
Modify the sequence to use "pop {r0, pc}" which is supported on all
ARM M-profile implementations (v6/v7/v8 Baseline/v8 Mainline), and
add comments to (hopefully) prevent this issue from re-appearing.
Signed-off-by: Mathieu Choplain <mathieu.choplain@st.com>
On the Xilinx MPSoC (Cortex-R5) platform, erratic operation was often
seen when an operation which disabled the dcache, such as sys_reboot,
was performed. Usually this manifested as an undefined instruction trap
due to the CPU jumping to an invalid memory address.
It appears the problem was due to dirty cache lines being present at the
time the cache is disabled. Once the cache is disabled, the CPU will
ignore the cache contents and read the possibly out-of-date data in main
memory. Likewise, since the cache was being cleaned after it was already
disabled, if the CPU had already written through changes to some memory
locations, cleaning the cache at that point would potentially overwrite
those changes with older data.
The fact that the arch_dcache_flush_and_invd_all function was being
called to do the cleaning and invalidation also contributed to this
problem, because it is a non-inline function which means the compiler
will generate memory writes to the stack when the function is called and
returns. Corruption of the stack can result in the CPU ending up jumping
to garbage addresses when trying to return from functions.
To avoid this problem, the cache is now cleaned and invalidated prior to
the dcache being disabled. This is done by directly calling the
L1C_CleanInvalidateDCacheAll function, which, as it is declared as force
inline, should help ensure there are no memory accesses, which would
populate new cache lines, between the cache cleaning and disabling the
cache.
Ideally, for maximum safety, the cache cleaning and cache disabling
should be done in assembler code, to guarantee that there are no memory
accesses generated by the compiler during these operations. However, the
present change does appear to solve this issue.
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
This file previously defined an MPU access permission mode of 0x7 which
corresponded to privileged read-only, unprivileged read-only, similar to
mode 0x6. However, it appears that at least Cortex-R5 does not support
this mode, defining 0x7 as UNP (Unpredictable) or a value which should
not be used.
This value was in turn referenced by the REGION_FLASH_ATTR macro, which
caused the offending value to be used when a memory region was declared
as DT_MEM_ARM(ATTR_MPU_FLASH) in the device tree, causing such regions
to not work properly on Cortex-R5.
Since 0x6 is supported by both Cortex-M and Cortex-R and does the same
thing, there is no reason to use 0x7. Remove the RO_Msk definition which
referenced it, and change REGION_FLASH_ATTR to use P_RO_U_RO_Msk instead.
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Add Kconfig option to dump only a portion of stack from the
current stack pointer to the stack end. This is enough to
let gdb reconstruct the stack trace and can significantly
reduce the dump size. This is crucial if the core dump needs
to be sent over radio.
Additionally, add another option to set the limit for the
dumped stack portion.
Signed-off-by: Damian Krolik <damian.krolik@nordicsemi.no>
This ASSERT fails because the comparison is made between the physical
APIC ID and the virtual CPU's LAPIC ID. If CPU-1 is assigned to ACRN,
it considers that CPU as its CPU-0.
Signed-off-by: Anisetti Avinash Krishna <anisetti.avinash.krishna@intel.com>
Xtensa arch layer has some custom compilation commands to
generate the interrupt dispatchers and the core-isa* files.
However, the include path to find core-isa.h does not work
for Espressif ESP32. So update the mechanism to use correct
path pointing to Espressif HAL when targeting ESP32 family
SoCs.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
The QingKe V2C has an integer multiplier but no divide. Add support
for the corresponding Zmmul extension and, as the extension was added
in GCC 13.0, add a test for the compiler version.
Signed-off-by: Michael Hope <michaelh@juju.nz>
Add support for .mot file flash using west flash command
The RX build output .mot as binary file to flash into
board
Signed-off-by: Phi Tran <phi.tran.jg@bp.renesas.com>
Signed-off-by: Duy Nguyen <duy.nguyen.xa@renesas.com>
GCC and Clang support the undefined behavior sanitizer in any
configuration, the only restriction is that if you want to get nice
messages printed, then you need the ubsan library routines which are only
present for posix architecture or when using picolibc.
This patch adds three new compiler properties:
* sanitizer_undefined. Enables the undefined behavior sanitizer.
* sanitizer_undefined_library. Calls ubsan library routines on fault.
* sanitizer_undefined_trap. Invokes __builtin_trap() on fault.
Overhead for using the trapping sanitizer is fairly low and should be
considered for use in CI once all of the undefined behavior faults in
Zephyr are fixed.
Signed-off-by: Keith Packard <keithp@keithp.com>
There is no need for kernel/internal/smp.h as SOF does not call
z_sched_ipi(). Actually... git log over there has no mention of
z_sched_ipi() anywhere, just arch_sched_ipi().
And include <ksched.h> for source using z_sched_ipi() since
they are using scheduling functions, and would be the correct
file to include.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Move and rephrase the comment to better explain reason why obtained
IRQ number is decremented.
Signed-off-by: Dominik Ermel <dominik.ermel@nordicsemi.no>
Allow to place the vector table section in SRAM with
CONFIG_SRAM_VECTOR_TABLE option for all cortex-m architecture that
have VTOR register.
Signed-off-by: Martin Hoff <martin.hoff@silabs.com>
Move SRAM_VECTOR_TABLE symbol from general Kconfig to Arch Kconfig
because it depends on the architecture possibility to relocate the
vector table.
Signed-off-by: Martin Hoff <martin.hoff@silabs.com>
The declaration for the text region '__text_region_start' and
'__text_region_end' should be consistent with the declaration in the
include file 'zephyr/linker/linker-defs.h'.
The local declarations are:
extern uintptr_t __text_region_start, __text_region_end;
whereas 'linker_defs.h' declares them as:
extern char __text_region_start[];
extern char __text_region_end[];
This may result in conflicting types when 'linker_defs.h' is indirectly
included. Hence, remove the local declarations.
Signed-off-by: Christoph Winklhofer <cj.winklhofer@gmail.com>
The `nocache` is not loadable, thus data stored therein cannot be
initialized by the startup code. This might be needed in special
cases. E.g. One might have a buffer which one wants to DMA into,
and which is a member of a struct. Other members of the struct one
may want to have initialized by the startup code.
The buffer thus should be placed in the `nocache` region, but for
the other members of the buffer to be initialized by the startup
code, the `nocache` region needs to be loadable.
Fix it by making the `nocache` region loadable. Adding a KConfig
symbol to do this optionally was considered, but deemed unnecessary
during the PR.
Signed-off-by: Julian Achatzi <mail@achatzi.pro>
When looking for jump address in the syscall table, we need to
multiply the syscall ID by 4 before adding the address offset
of the beginning of the table. This is due to the jump address
being 32-bit (4 bytes). Instead of using two instructions to
shift the ID by 4 first and then the addition, we can use one
ADDX4 instruction to achieve the same result.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Since the necessary register values are now pre-computed and
stored in the memory domain struct, we can use them directly
in various assembly locations, thus replacing the function
call to xtensa_swap_update_page_tables().
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
When context switching and dealing with non-nested interrupts,
the context to be restored are saved in the thread stack.
When userspace is enabled, this means saving context into
the user stacks for user threads. This allows PS values to be
manipulated externally by setting PS.RING in the saved PS
value to 0, resulting in granting kernel access privilege when
the thread is restored. To prevent this, we store the PS value
into the thread struct instead, where user threads cannot
manipulate that. Note that nested interrupts and syscalls are
not using the user stack but the interrupt stack and thread
privileged stack respectively, where they are not accessible
under user mode.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
When syscall assembly is executed, the EPC points to the syscall
instruction, and we have to manually advance it so we will
return to the instruction after syscall to continue execution.
However, with zero-overhead loops and the syscall instruction is
the last instruction, this simple addition does not work as it
would point past the loop and would have skipped the loop.
Because of this, syscall entrance would need to look at the loop
registers and set the PC back to the beginning of loop if we are
still looping. Assuming most of the syscalls are not inside
loops, the extra handling code consumes quite a few cycles.
To workaround this, simply adds a nop after syscall so we no
longer have to deal with loops at syscall entrance, and that
a nop is faster than all the code to manipulate loop registers.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This extends arch_cohere_stacks() to handle privileged stacks of
user threads when userspace is enabled.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Towards the end of interrupt handling, and before restoring
context, we would spill all register windows. This requires
A0 and A1 to be restored from the saved context so spilling
would work correct. However, when coherence is enabled,
window spilling has already been done earlier so there is
no need to spill the register windows again. So there is
no need to restore A0 and A1. They will be restored again
before returning from interrupt anyway.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Instead of computing all the needed register values when
swapping page tables, we can pre-compute those values when
the memory domain is first initialized. Should save some
time during context switching.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
There is no need for ZSR_FLUSH when threads are pin only
(CONFIG_SCHED_CPU_MASK_PIN_ONLY=y), so there is no need to
reserve it.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
In xtensa_userspace_enter, we were hard-coding to use level 2
interrupt return mechanism to pivot to user mode and start
running the user thread. However, EPC2 and EPS2 may be used
for other purposes, and they could be used for interrupt
return if there are only two level interrupts. So change
the userspace enter to use ZSR_RFI_LEVEL, ZSR_EPC and ZSR_EPS
instead to be more explicit.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
When returning from syscall, we cannot use RFE by using EPC1.
As there is no EPS1, we would need to write to PS before
returning. However, this creates a situation where interrupts
are being enabled (since PS is set), and any interrupts will
overwrite EPC1 before we return (which ensures chaos as we
would be returning to the wrong address). So utilize the same
mechanism as interrupt returning by use ZSR_EPS, ZSR_EPC and
ZSR_RFI_LEVEL.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
There is no need to do a jump to _syscall_returned as it is
the next to be run anyway. Keep the label there so we can
set breakpoint if needed.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
We stuff the 7th argument into stack by moving the stack pointer
before calling syscall handler. The Xtensa ABI says stack must
be 16-byte aligned. So instead of moving stack pointer 4 bytes,
we move 16 bytes (assuming stack has been aligned so far).
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Xtensa ISA mentions that stack always needs to be aligned on
16 bytes. So we need to pad the stack frame to be also 16 bytes
aligned when dealing with interrupts. Or else the stack would
not be 16 bytes aligned when we add the stack frame to stack
during interrupt handling.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
When crossing stack during interrupt handling, we do two call4
to pivot to the interrupt stack, with arguments to these two
call4 stashed in A6, A10 and A11. However, A4-A11 may be marked
as invalid in the register file, and accessing them would
result in window overflowing. At that point, A0 and A1 are not
setup to handle window overflows, and will result in registers
being stashed in incorrect location, resulting in incorrect
value being restored during window underflowing. So move around
the code a bit to restore A0 and A1 properly before accessing
A4-A11.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Do not force CONFIG_XTENSA_SMALL_VECTOR_TABLE_ENTRY when MMU
is enabled. It is up to the SoC to decide whether they need
this.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
The logic to swap page tables or MPU entries is moved to the end
of cross stack call, since it is still running in the interrupt
stack instead of the thread stack. The old logic was calling
the swap functions in the outgoing thread stack, which is not
desirable.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
When switching page table and MPU enties, we need to call
corresponding functions via call4 which would move the register
window. So we need to do that earlier before we start saving
context into stack by manipulating stack pointer manually which
definitely would interfere with window spilling.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
During arch_cohere_stacks(), the used portion of the outgoing
thread is cache flushed, and then the unused portion of cache
invalidated. However, this results in the cache line at
the stack pointer being flushed and then invalidated due to
how sys_cache_data_*() operates. If we are swapping back to
the same thread (e.g. after handling interrupt), this cache
line will need to be retrieved again from main memory since
it has already been invalidated. This creates unnecessary
data move between cache and main memory. So create our own
version of cache flushing and invalidation routines just for
arch_cohere_stacks(). Bouns is that these work directly with
bounding addresses and skips the size calculation which should
save a little bit amount of execution time.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This removes _xstack_call1_* trampoline as we can simply use
callx4 to jump to the interrupt handler.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Before cross stack call is setup correctly, we cannot allow
interrupts to be triggered or it may interfere with register
window spilling since we are clobbering registers needed for
that to work. However, there was a brief period where higher
level interrupts could fire due to code writing to PS with
lowered interrupt mask before raising it again. So rework
that part to avoid writing PS with intermediate value, and
now we mask interrupt until everything is setup correctly
before interrupt is enabled again.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
When initializing the stack at thread creation, we should not
set the pointer to privileged stack pointer yet as the thread
can be a kernel thread. Only when a thread is transitioning to
user mode, then we need to set the pointer to point to
the privileged stack. This is a purely semantic change and
should not affect any functionality.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>