
Another batch of syscall perf enhancements #1964

Open · wants to merge 5 commits into master from cbiffle/more-syscall-perf
Conversation

@cbiffle (Collaborator) commented Dec 26, 2024

This is unspooling more commits from my performance fix branch in my fork. As with the previous performance PR, this is a collection of semi-related commits that may be easier to review commit-by-commit than as a big diff. Boy I miss Gerrit.

This is only a ~10% overall improvement because most of the truly low-hanging fruit has been eaten. But this significantly improves the performance of kernel memory copies and SEND with an empty message (unusual but not unheard of). This also knocks a few hundred bytes off a typical kernel by removing some panic sites.

This was generating code for the `..copy_len` slicing _and_ the equality
check implied by `copy_from_slice`. Now it just does the latter.

This only takes about 3% off `safe_copy`, but produces smaller binaries by
removing an additional panic site.
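A minimal sketch of the pattern, using made-up helper names rather than the actual kernel code: trim both slices to their common length where they are produced, so the copy site needs no further slicing and only `copy_from_slice`'s single length-equality check remains.

```rust
/// Illustrative helper (not the actual kernel code): shorten `a` and `b` to
/// their common length so callers can copy without slicing again.
fn common_prefix<'s>(a: &'s mut [u8], b: &'s [u8]) -> (&'s mut [u8], &'s [u8]) {
    let n = a.len().min(b.len());
    (&mut a[..n], &b[..n])
}

fn copy_between(dst: &mut [u8], src: &[u8]) -> usize {
    // Before: `dst[..copy_len].copy_from_slice(&src[..copy_len])` emitted
    // checks for the slicing *and* copy_from_slice's own length-equality
    // check.
    let (dst, src) = common_prefix(dst, src);
    // After: the slices already have equal length, so only the single check
    // inside copy_from_slice remains.
    dst.copy_from_slice(src);
    dst.len()
}
```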
I had gone to some lengths in the original implementation to only use
safe code, and in general it worked pretty well. However, it was
generating more panic sites (and checks) than I'd like.

This new version is shorter, clearer (IMO), and generates somewhat
faster code.

By shortening the various routines involved, this reduces the mean cost
of `safe_copy` during IPC syscalls by about 3%.
Empty copies are somewhat unusual, but appear in IPC interfaces that
follow certain patterns, or in lease accesses that hit the end of the lease.
Previously they were expensive nops. This makes them cheap nops.

This reduces the cost of a zero-byte send by 24% (to about 928 cycles on
a Cortex-M4, if you're curious).
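As a rough sketch (the function name and shape below are hypothetical, not the kernel's actual `safe_copy` signature), the fast path amounts to bailing out before any per-region access checks when the transfer length is zero:

```rust
/// Hypothetical shape of the fast path; the real kernel routine works on
/// task memory regions rather than plain slices.
fn copy_len_between(from: &[u8], to: &mut [u8]) -> usize {
    let n = from.len().min(to.len());
    if n == 0 {
        // Zero-byte transfers show up in real IPC patterns; return before
        // doing any of the comparatively expensive access checks or copying.
        return 0;
    }
    // ... access checks and the actual copy happen here ...
    to[..n].copy_from_slice(&from[..n]);
    n
}
```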
@cbiffle requested a review from hawkw on December 26, 2024 04:39
`TaskState` is a nontrivial enum. It turns out we were comparing it often
enough that the compiler thought it'd be good to out-line the comparison
into a callable function.

That function is big and slow because it covers every case. Being
out-lined, it has to be generic and can't use any context to optimize.

So, I've gone through and replaced all `TaskState` comparisons with
pattern matching. This knocks about 256 bytes off a typical kernel
and speeds up syscall-heavy code by 3% on average.
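Here is a minimal illustration of the idea, using a toy enum rather than the kernel's actual `TaskState` definition: the derived `PartialEq` has to cover every variant pair and tends to become a shared out-of-line routine, while matching on just the variant of interest can be specialized in place.

```rust
#[derive(PartialEq, Eq)]
enum Sched {
    Stopped,
    InSend(u16),
}

#[derive(PartialEq, Eq)]
enum State {
    Healthy(Sched),
    Faulted { code: u32 },
}

impl State {
    /// Comparison via `==` routes through the derived PartialEq, which must
    /// handle every combination of variants.
    fn is_sending_to_eq(&self, target: u16) -> bool {
        self == &State::Healthy(Sched::InSend(target))
    }

    /// Matching only the variant we care about lets the compiler generate a
    /// small, specialized check at each call site.
    fn is_sending_to_match(&self, target: u16) -> bool {
        matches!(self, State::Healthy(Sched::InSend(t)) if *t == target)
    }
}
```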
The missing `#[must_use]` was just an oversight -- access-check functions
that return flags, instead of faulting or returning `Result`, should
always be `must_use`.

`#[inline(always)]` turns out to significantly improve the performance of
`Task::can_access` by specializing the generated code for the predicate it
uses.
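A sketch of the two attributes together, on stand-in types rather than the real `Task::can_access` signature:

```rust
/// Stand-in for the kernel's task type; purely illustrative.
struct Task {
    region_attrs: u32,
}

impl Task {
    /// Access-check style function: it reports flags rather than faulting or
    /// returning a Result, so silently dropping the result would skip a
    /// check -- hence `must_use`.
    #[must_use]
    #[inline(always)]
    fn can_access(&self, wanted: u32) -> u32 {
        // With inline(always), each call site gets a copy of this body
        // specialized for its particular `wanted` predicate, instead of a
        // generic out-of-line check.
        self.region_attrs & wanted
    }
}
```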
@cbiffle force-pushed the cbiffle/more-syscall-perf branch from aa584cb to af2f519 on December 26, 2024 04:40
@hawkw (Member) left a comment:
very nice! nothing too complicated in this one.

Comment on lines +177 to +179
/// Shortening `a` and `b` ensures that the slices returned from `try_read`
/// / `try_write` have the same length, without the need for further
/// slicing.
Neat! Love to see panic sites disappear!

@@ -252,7 +252,10 @@ impl TaskState {
     /// Checks if a task in this state is trying to deliver a message to
     /// `target`.
     pub fn is_sending_to(&self, target: TaskId) -> bool {
-        self == &TaskState::Healthy(SchedState::InSend(target))
+        match self {
+            TaskState::Healthy(SchedState::InSend(t)) => *t == target,
Using `match` instead of generating whole `PartialEq`/`Eq` implementations is a smart move when optimizing for binary size; I'll have to remember that.
