Move the syscall_handler.h header, which is used internally only, to a
dedicated internal folder that should not be used outside of Zephyr.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Integrates the object core statistics framework into the following
kernel objects:
sys_mem_blocks, k_mem_slab
threads, _cpu, z_kernel
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
The original idea of z_current_get() was to be the counterpart
of k_current_get() for when the thread-local variable for the current
thread has not yet been initialized (when TLS is enabled); otherwise
they are the same function. Now that z_current_get() is being used
outside of the core kernel, rename it under the kernel namespace so
other subsystems can conceptually use it too.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Previously we limited the maximum number of CPU cores to 5; now we
bump this restriction so we can use 12 cores.
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Evgeniy Paltsev <PaltsevEvgeniy@gmail.com>
This header does not expose any public APIs, so move it under
kernel/include and update the files that include it.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
Combining Meta IRQs with cooperative threads requires extra care to
return to pre-empted cooperative threads when returning from a Meta IRQ.
This is only needed when there are cooperative threads that are not also
Meta IRQs. This PR saves some space & time when the number of Meta IRQs
is equal to the number of available cooperative threads.
Signed-off-by: Florian Grandel <fgrandel@code-for-humans.de>
When `CONFIG_FPU_SHARING` is enabled each `k_thread` struct has a saved
floating point context (`saved_fp_context`). During a context switch, the
current FPU owner's (`_current_cpu->arch.fpu_owner`) registers are saved
to its `saved_fp_context`, and the destination thread's FPU registers are
loaded from its `saved_fp_context`.
When a thread ends, it does not release ownership of the FPU
(`_current_cpu->arch.fpu_owner`). This is problematic if the `k_thread`
struct was allocated on the stack. The next context switch will save the
FPU registers into `k_thread->saved_fp_context` which may now be out of
scope. This will likely (but not always) result in a crash.
Adding `arch_float_disable(thread);` when a thread ends disables
preservation of floating point context information, fixing this issue.
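Conceptually, the fix amounts to something like this minimal sketch
(the helper name is illustrative, not the exact in-tree diff):

    #include <zephyr/kernel.h>

    /* Release FPU ownership when a thread terminates, so a later context
     * switch cannot save FPU registers into a saved_fp_context that may
     * be out of scope (e.g. a stack-allocated k_thread).
     */
    static void release_fpu_on_thread_end(struct k_thread *thread)
    {
    #if defined(CONFIG_FPU_SHARING)
        (void)arch_float_disable(thread);
    #else
        ARG_UNUSED(thread);
    #endif
    }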
Signed-off-by: Grant Ramsay <gramsay@enphaseenergy.com>
In commit d537267f, the check on thread abortion was moved from next_up
to z_get_next_switch_handle. However, next_up is also called from
z_swap_next_thread, so the check on thread abortion is now missing there.
This sometimes caused a thread to get stuck in the ABORTING + PENDING
state during test_smp_switch_torture in test/kernel/smp.
To avoid such cases in the future, it is worth leaving the check in next_up.
Signed-off-by: Vadim Shakirov <vadim.shakirov@syntacore.com>
As discovered by Carlo Caione, the k_thread_join code had a case where
it detected it had been called on a thread already marked _THREAD_DEAD
and exited early. That's not sufficient. The thread state is mutated
from the thread itself on its exit path. It may still be running!
Just like the code in z_swap(), we need to spin waiting on the other
CPU to write the switch handle before knowing it's safe to return,
otherwise the calling context might (and did) do something like
immediately k_thread_create() a new thread in the "dead" thread's
struct while it was still running on the other core.
There was also a similar case in k_thread_abort() which had the same
issue: it needs to spin waiting on the other CPU to kill the thread
via the same mechanism.
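Conceptually the spin looks like this sketch (illustrative, not the
exact in-tree code):

    #include <zephyr/kernel.h>

    /* Only meaningful on SMP with CONFIG_USE_SWITCH: the dying thread
     * writes its switch handle from its own exit path on the other CPU;
     * only after that write is it safe to reuse its k_thread struct.
     */
    static inline void wait_for_halted_thread(struct k_thread *thread)
    {
    #if defined(CONFIG_SMP) && defined(CONFIG_USE_SWITCH)
        volatile void **shp = (volatile void **)&thread->switch_handle;

        while (*shp == NULL) {
            /* spin until the other CPU publishes the handle */
        }
    #else
        ARG_UNUSED(thread);
    #endif
    }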
Fixes #58116
Originally-by: Carlo Caione <ccaione@baylibre.com>
Signed-off-by: Andy Ross <andyross@google.com>
This trick turns out also to be needed by the abort/join code.
Promote it to a more formal-looking internal API and clean up the
documentation to (hopefully) clarify the exact behavior and better
explain the need.
This is one of the more... enchanted bits of the scheduler, and while
the trick is IMHO pretty clean, it remains a big SMP footgun.
Signed-off-by: Andy Ross <andyross@google.com>
When a running thread gets aborted asynchronously (this only happens
in SMP contexts, obviously) it gets flagged "aborting", but the actual
abort needs to happen in the thread's own context. For convenience,
this was done in the next_up() routine that selects the next thread to
run at interrupt exit time.
But this check was being done AFTER the next candidate thread was
selected from the run queue. Thread abort can wake up threads blocked
in k_thread_join(), and therefore these weren't seen as runnable
threads, even if they should have been.
Executive summary: if you killed a thread running on another CPU, and
there was another thread joined to the killed thread that should have
run on that CPU, it wouldn't (until it received an interrupt or
otherwise reached a schedule point).
Move the abort check above the run queue inspection and into the
end-of-interrupt processing in z_get_next_switch_handle() (so it's
actually a mild performance boost as it's no longer part of the
cooperative context switch path). Simple fix, subtle bug.
Fixes #58040
Signed-off-by: Andy Ross <andyross@google.com>
Many areas of Zephyr divide and round up without using the DIV_ROUND_UP
macro. Use it, so that we rely on a tested system macro and at the
same time make the code more readable.
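For example (illustrative variable names), an open-coded round-up
division such as:

    num_blocks = (buf_len + BLOCK_SIZE - 1U) / BLOCK_SIZE;

becomes:

    num_blocks = DIV_ROUND_UP(buf_len, BLOCK_SIZE);

with DIV_ROUND_UP provided by <zephyr/sys/util.h>.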
Signed-off-by: Gerard Marull-Paretas <gerard.marull@nordicsemi.no>
All we really want here is to set default parameters. However
k_sched_time_slice_set() also calls z_reset_time_slice(_current)
which expects `_current` to be fully initialized.
Simply initialize `slice_ticks` and `slice_max_prio` with default values
directly. Unfortunately the compiler isn't smart enough to expand
k_ms_to_ticks_ceil32(CONFIG_TIMESLICE_SIZE) to a constant expression
at build time so we must do the conversion by hand (and it shouldn't
overflow due to the nature of the value).
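A rough sketch of the resulting initialization under CONFIG_TIMESLICING
(illustrative; the in-tree expression may differ slightly):

    static int slice_ticks = DIV_ROUND_UP(CONFIG_TIMESLICE_SIZE *
                                          CONFIG_SYS_CLOCK_TICKS_PER_SEC,
                                          MSEC_PER_SEC);
    static int slice_max_prio = CONFIG_TIMESLICE_PRIORITY;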
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Slice expirations are now based on the same timeout mechanism as
regular timers which have been recently fixed and proven to work with
single-tick periods.
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
The reason for arch_num_cpus() is to be able to dynamically adapt to
the actual number of available CPUs at run time.
In the z_sched_init() case, it is not the number of active CPUs that
we need but rather the total number of potential CPUs, and that is
represented by CONFIG_MP_MAX_NUM_CPUS, not arch_num_cpus().
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Make sliceable() the actual condition for a sliceable thread. Avoid
creating a slice timeout for non-sliceable threads. Always reset
slice_expired even if the next thread is not sliceable. Fold
slice_expired_locked() into z_time_slice() to avoid the hidden
unlock/lock. Rename `curr` to `thread` as it is not necessarily
the current thread (yet) being set. Make variables static.
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Updates events to prevent a timeout from corrupting the list of
threads that need to be woken up.
Signed-off-by: Aastha Grover <aastha.grover@intel.com>
Moving timeslice events to timeouts isn't quite enough on SMP, as it's
still possible for systems that don't broadcast their timer interrupts
to end up handling an expiration for a foreign CPU. There, we need an
IPI, and a symmetric call to z_time_slice() (which is idempotent and
fast) in the IPI ISR.
Signed-off-by: Andy Ross <andyross@google.com>
Rework the fragile and ad-hoc computation of timeslice expirations
into per-CPU struct _timeout objects with regular callbacks. The
expiration callbacks themselves simply set a per-cpu flag (they might
run on any CPU), which gets checked at the end of the timer ISR on
every CPU.
This simplifies logic and removes a bunch of code. It also fixes at
least three bugs:
1. As @npitre discovered: On SMP, the number of ticks announced on any
given CPU is going to be a subset of all expired ticks. This broke
the accounting of timeslice ticks, and effectively meant that
timeslicing only worked on SMP on systems where one CPU could hog all
the announcements, and only on that CPU.
2. The bootstrap path to arm the timer driver after setting the first
timeout in an empty list couldn't take into account
sys_clock_elapsed() ticks, as it didn't know whether it was being
called underneath an existing announce loop. Now this code is no
longer responsible for knowing anything about time slicing at all.
3. Also on SMP, there was a case where two CPUs timeslicing
simultaneously could stomp on each other's timeouts in
z_set_timeout_expiry(), as neither had a way of knowing what the
other's state was. CPUs could miss their own expiration and have to
wait for the slice expiration on the other CPU. Now, timeouts are
global objects with simple expiration times, and there's no need for
that function at all.
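The per-CPU arrangement described above, sketched with illustrative
names (not the exact in-tree code):

    static struct _timeout slice_timeouts[CONFIG_MP_MAX_NUM_CPUS];
    static bool slice_expired[CONFIG_MP_MAX_NUM_CPUS];

    /* May run on any CPU; it only records the expiration.  The flag is
     * consumed at the end of the timer ISR on the CPU it belongs to.
     */
    static void slice_timeout_fn(struct _timeout *timeout)
    {
        int cpu = (int)(timeout - slice_timeouts);

        slice_expired[cpu] = true;
    }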
Signed-off-by: Andy Ross <andyross@google.com>
By the time the working list of readers/writers is processed, it is
possible that a waiting reader/writer being processed has timed out
and is no longer on the wait queue. As such, we cannot blindly
wake the next thread, as that next thread might not be the thread we
had just been processing.
To address this, the calls to z_sched_wake() have been replaced
with z_unpend_thread() and z_ready_thread() so that a specific
thread can be safely targeted for waking.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Adds a routine to safely walk a specified wait queue and invoke a
custom callback function on each waiting thread.
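A plausible shape for such a walker (hypothetical signature and names;
the in-tree routine would also hold the scheduler lock while iterating
and may stop early based on the callback's return value):

    int wait_queue_walk(_wait_q_t *wait_q,
                        int (*func)(struct k_thread *thread, void *data),
                        void *data)
    {
        struct k_thread *thread;
        int status = 0;

        _WAIT_Q_FOR_EACH(wait_q, thread) {
            status = func(thread, data);
            if (status != 0) {
                break;
            }
        }

        return status;
    }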
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
- Logging supports printing 64-bit values now. Cast to unsigned long and
use %lu in all cases.
Signed-off-by: Gerard Marull-Paretas <gerard.marull@nordicsemi.no>
Move runtime code to use arch_num_cpus() instead of CONFIG_MP_NUM_CPUS
and use CONFIG_MP_MAX_NUM_CPUS for ifdef and BUILD_ASSERT macros.
Signed-off-by: Kumar Gala <kumar.gala@intel.com>
Change for loops of the form:

    for (i = 0; i < CONFIG_MP_NUM_CPUS; i++)
        ...

to:

    unsigned int num_cpus = arch_num_cpus();

    for (i = 0; i < num_cpus; i++)
        ...
We do the call outside of the for loop so that it only happens once,
rather than on every iteration.
Signed-off-by: Kumar Gala <kumar.gala@intel.com>
For historical reasons[1] suspending threads would release the
scheduler lock between pend() (which places the current thread onto a
wait queue) and z_swap() (which effects the context switch). This
process happens with the caller's lock held, so local interrupts are
masked. But on SMP this opens a tiny race where another CPU could
grab the pended thread and switch to it while we were still executing
on its stack!
Fix this by elevating the "lock swap" code that already exists in the
(portable/switch-based) z_swap() code one level so that it happens in
z_pend_curr() also. Now we hold the scheduler lock between pend and
the final context switch.
Note that this technique can't work for the older z_swap_irqlock()
implementation, which exists to vestigially support a few bits of arch
code (mostly direct interrupts) that don't work on SMP anyway.
Address with an assert to prevent future misuse.
[1] z_swap() is a historical API implemented in per-arch assembly for
older architectures (like ARM32!). It was designed to be called
with what at the time was a global IRQ lock, so it doesn't
understand the idea of a separate scheduler lock. When we finally
get all architectures on arch_switch() this design can be cleaned up
quite a bit.
Signed-off-by: Andy Ross <andyross@google.com>
When building with CONFIG_SCHED_CPU_MASK_PIN_ONLY=y, the CPU mask
is fixed and cannot be changed while a thread is running.
The current code asserts if the thread state is anything but PREPARED.
We do however have interfaces like k_work_queue_start() where a thread is
started as part of the queue start. To allow the user to set the pinned CPU
for the work queue thread, it needs to be possible to suspend the
thread, set the mask, and then call k_thread_resume(). This seems to be
a valid sequence, so relax the assert check to reflect this.
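For example (illustrative queue, stack and priority names; assumes
CONFIG_SCHED_CPU_MASK_PIN_ONLY=y):

    k_work_queue_start(&my_workq, my_workq_stack,
                       K_THREAD_STACK_SIZEOF(my_workq_stack),
                       MY_WORKQ_PRIO, NULL);

    k_thread_suspend(&my_workq.thread);
    k_thread_cpu_mask_clear(&my_workq.thread);
    k_thread_cpu_mask_enable(&my_workq.thread, 1);
    k_thread_resume(&my_workq.thread);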
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
MISRA C:2012 Rule 14.4 (The controlling expression of an if statement
and the controlling expression of an iteration-statement shall have
essentially Boolean type.)
Use `bool' instead of `int' to represent Boolean values.
Use `do { ... } while (false)' instead of `do { ... } while (0)'.
Use comparisons with zero instead of implicitly testing integers.
This commit is a subset of the original commit:
5d02614e34
Signed-off-by: Simon Hein <SHein@baumer.com>
Documentation specifies that aborting/terminating/exiting essential
threads is a system panic condition, but we didn't actually implement
that, and allowed it just as for any other thread. At least one app wants to
exploit this documented behavior as a "watchdog" kind of condition,
and that seems reasonable. Do what we say we're supposed to do.
This also includes a small fix to a test, which seemed like it was
written to exercise exactly this condition. Except that it failed to
detect whether or not a system fatal error was actually signaled and
was (incorrectly) indicating "success". Check that we actually enter
the handler.
Fixes #45545
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
In order to bring consistency in-tree, migrate all kernel code to the
new prefix <zephyr/...>. Note that the conversion has been scripted,
refer to zephyrproject-rtos#45388 for more details.
Signed-off-by: Gerard Marull-Paretas <gerard.marull@nordicsemi.no>
Implements a function that application and driver code can use to check
whether it is valid to yield (or block) in the current context. This
check is required for functions that can feasibly be run from multiple
contexts. The primary intended use case is power management transition
functions, which can be run by application code explicitly or
automatically in the idle thread by system PM.
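A hedged usage sketch (the timing constant is illustrative; the check
itself is the function described above, shown here as k_can_yield()):

    if (k_can_yield()) {
        /* Thread context where blocking is allowed: sleep it off. */
        k_msleep(PM_RAMP_UP_MS);
    } else {
        /* Idle thread, pre-kernel or ISR: busy-wait instead. */
        k_busy_wait(PM_RAMP_UP_MS * USEC_PER_MSEC);
    }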
Signed-off-by: Jordan Yates <jordan.yates@data61.csiro.au>
Do not allow changing the CPU to which a thread is pinned when it is
already being executed. This allows further optimizations on some
platforms with incoherent memory, since we can safely assume that the
thread will run on the same CPU and avoid invalidating / flushing the
cache during context switches.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
The original design intent with arch_sched_ipi() was that
interprocessor interrupts were fast and easily sent, so to reduce
latency the scheduler should notify other CPUs synchronously when
scheduler state changes.
This tends to result in "storms" of IPIs in some use cases, though.
For example, SOF will enumerate over all cores doing a k_sem_give() to
notify a worker thread pinned to each, each call causing a separate
IPI. Add to that the fact that unlike x86's IO-APIC, the intel_adsp
architecture has targeted/non-broadcast IPIs that need to be repeated
for each core, and suddenly we have an O(N^2) scaling problem in the
number of CPUs.
Instead, batch the "pending" IPIs and send them only at known
scheduling points (end-of-interrupt and swap). This semantically
matches the locations where application code will "expect" to see
other threads run, so arguably is a better choice anyway.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Add an API that clears the CPU mask from a thread and sets it to a
specific CPU.
This is the equivalent of:

    k_thread_cpu_mask_clear(&thread);
    k_thread_cpu_mask_enable(&thread, cpu_idx);
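With the new API this becomes a single call (assuming it is exposed as
k_thread_cpu_pin()):

    k_thread_cpu_pin(&thread, cpu_idx);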
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
This is an attempt at formally distinguishing and supporting the case
described in 40795 where an architecture doesn't preserve/restore the
complete thread state upon entering/exiting interrupt exception state.
This is mainly about promoting the current behavior from the accepted
workaround to a formal API specification. This workaround is currently
used on ARM64 but RISC-V requires it too.
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
Zephyr's timeslice implementation has always been somewhat primitive.
You get a global timeslice that applies broadly to the whole bottom of
the priority space, with no ability (beyond that one priority
threshold) to tune it to work on certain threads, etc...
This adds an (optionally configurable) API that allows timeslicing to
be controlled on a per-thread basis: any thread at any priority can be
set to timeslice, for a configurable per-thread slice time, and at the
end of its slice a callback can be provided that can take action.
This allows the application to implement things like responsiveness
heuristics, "fair" scheduling algorithms, etc... without requiring
that facility in the core kernel.
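A sketch of how an application might use this (assumes the per-thread
option is enabled, e.g. CONFIG_TIMESLICE_PER_THREAD=y, and a setter with
this k_thread_time_slice_set()-like shape):

    static void slice_expired_cb(struct k_thread *thread, void *data)
    {
        /* Application-defined response, e.g. demote the thread's
         * priority or record scheduling statistics.
         */
    }

    void enable_slicing(struct k_thread *thread)
    {
        k_thread_time_slice_set(thread, k_ms_to_ticks_ceil32(2),
                                slice_expired_cb, NULL);
    }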
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Moves the CONFIG_SCHED_THREAD_USAGE block of code out of sched.c
into its own file. Not only do these routines employ their own private
spin lock, but it is expected that additional usage routines will be
added in the future.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Move z_priq_mq_add and z_priq_mq_remove into #ifdef CONFIG_SCHED_MULTIQ
block, because they are only used with that config.
Signed-off-by: Jeremy Bettis <jbettis@google.com>
Applies the 'static' keyword to the following inlined routines:
z_priq_dumb_add()
z_priq_mq_add()
z_priq_mq_remove()
As those routines are only used in one place, they no longer have
externally visible declarations.
Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
Removed unused functions, or moved them inside #ifdefs.
This allows using -Werror=unused-function with the clang compiler. Tested
by building the ChromeOS EC on all supported platforms with
-Werror=unused-function.
Signed-off-by: Jeremy Bettis <jbettis@google.com>
It turns out that we have a sample (though not a test) that really
does want to use "k_thread_runtime_stats_all_get()" to measure system
uptime.
Instead of breaking this needlessly, separate the accounting for idle
and non-idle threads. The legacy API can report their sum, and the
more useful value is available via the kernel struct for future
analysis.
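For reference, the legacy call keeps working for uptime-style
measurements (sketch; assumes the execution_cycles field of
k_thread_runtime_stats_t):

    k_thread_runtime_stats_t stats;

    k_thread_runtime_stats_all_get(&stats);
    /* With this change, execution_cycles again covers idle + non-idle. */
    printk("cycles: %llu\n",
           (unsigned long long)stats.execution_cycles);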
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>