Another batch of syscall perf enhancements #1964
base: master
Conversation
This was generating code for the ..copy_len slicing _and_ the equality check implied by copy_from_slice. Now it just does the latter. This only takes about 3% off safe_copy, but produces smaller binaries by removing an additional panic site.
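A minimal before/after sketch of the idea (illustrative only -- the function names, `copy_len`, and signatures here are assumptions, not the actual kernel code): slicing both sides with `..copy_len` emits bounds checks in addition to `copy_from_slice`'s own length-equality check, whereas handing over slices that are already the same length leaves only the latter.

```rust
/// Hypothetical "before" shape: each `..copy_len` index is a bounds check
/// (and a potential panic site), *and* copy_from_slice re-checks that the
/// two slices have equal length.
fn copy_then_slice(dst: &mut [u8], src: &[u8], copy_len: usize) {
    dst[..copy_len].copy_from_slice(&src[..copy_len]);
}

/// Hypothetical "after" shape: the caller arranges for both slices to
/// already have the same length, so the only remaining check is
/// copy_from_slice's internal length comparison.
fn copy_pre_shortened(dst: &mut [u8], src: &[u8]) {
    debug_assert_eq!(dst.len(), src.len());
    dst.copy_from_slice(src);
}

fn main() {
    let src = [1u8, 2, 3, 4, 5];

    let mut a = [0u8; 4];
    copy_then_slice(&mut a, &src, 4);

    let mut b = [0u8; 4];
    copy_pre_shortened(&mut b, &src[..4]);

    assert_eq!(a, b);
}
```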
I had gone to some lengths in the original implementation to only use safe code, and in general it worked pretty well. However, it was generating more panic sites (and checks) than I'd like. This new version is shorter, clearer (IMO), and generates somewhat faster code. By shortening the various routines involved, this reduces the mean cost of safe_copy during IPC syscalls by about 3%.
Empty copies are somewhat unusual, but appear in IPC interfaces that follow certain patterns, or in lease accesses that hit the end. Previously they were expensive nops. This makes them cheap nops. This reduces the cost of a zero-byte send by 24% (to about 928 cycles on a Cortex-M4, if you're curious).
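One way to make the empty case a cheap nop is an early return before any of the expensive work happens. A hedged sketch follows; the real `safe_copy` takes task and lease arguments and does address translation, so the guard's exact placement and signature differ:

```rust
/// Hypothetical sketch of the fast-path shape, not the actual safe_copy.
fn safe_copy_sketch(dst: &mut [u8], src: &[u8]) -> usize {
    let copy_len = dst.len().min(src.len());
    if copy_len == 0 {
        // Zero-byte copies show up in some IPC patterns and in lease
        // accesses that hit the end; bail out before doing any
        // translation or access-checking work.
        return 0;
    }
    dst[..copy_len].copy_from_slice(&src[..copy_len]);
    copy_len
}

fn main() {
    let mut dst: [u8; 0] = [];
    assert_eq!(safe_copy_sketch(&mut dst, &[1, 2, 3]), 0);
}
```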
TaskState is a nontrivial enum. It turns out, we were comparing it often enough that the compiler thought it'd be good to out-line the comparison into a callable function. That function is big and slow because it covers every case. Being out-lined, it has to be generic and can't use any context to optimize. So, I've gone through and replaced all TaskState comparisons with pattern deconstruction. This knocks like 256 bytes off a typical kernel and speeds up syscall-heavy code by 3% on average.
The missing must_use was just an oversight -- access check functions that return flags, instead of faulting or returning Result, should always be must_use. inline(always) turns out to significantly improve performance of Task::can_access by specializing the generated code for the predicate it uses.
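A sketch of the attribute combination being described; the types and the `can_access` signature here are simplified assumptions, not the real `Task` API:

```rust
#[derive(Clone, Copy)]
struct RegionAttributes {
    read: bool,
    write: bool,
}

struct Task {
    region: RegionAttributes,
}

impl Task {
    /// An access check that returns flags instead of faulting or
    /// returning Result: ignoring the result is almost certainly a bug,
    /// hence #[must_use]. #[inline(always)] lets the compiler specialize
    /// the body for the constant arguments each call site passes in.
    #[must_use]
    #[inline(always)]
    fn can_access(&self, wants_write: bool) -> bool {
        self.region.read && (!wants_write || self.region.write)
    }
}

fn main() {
    let t = Task {
        region: RegionAttributes { read: true, write: false },
    };
    assert!(t.can_access(false));
    assert!(!t.can_access(true));
}
```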
very nice! nothing too complicated in this one.
```rust
/// Shortening `a` and `b` ensures that the slices returned from `try_read`
/// / `try_write` have the same length, without the need for further
/// slicing.
```
Neat! Love to see panic sites disappear!
```diff
@@ -252,7 +252,10 @@ impl TaskState {
     /// Checks if a task in this state is trying to deliver a message to
     /// `target`.
     pub fn is_sending_to(&self, target: TaskId) -> bool {
-        self == &TaskState::Healthy(SchedState::InSend(target))
+        match self {
+            TaskState::Healthy(SchedState::InSend(t)) => *t == target,
```
Using matches instead of generating whole `PartialEq`/`Eq` implementations is a smart move when optimizing for binary size; I'll have to remember that.
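For reference, a minimal sketch of that trade-off (simplified enums, not the actual Hubris definitions): the derived `PartialEq` has to cover every variant and field, while a `match`/`matches!` only examines the one variant of interest and can be inlined and specialized.

```rust
// Hypothetical simplified types for illustration only.
#[allow(dead_code)]
#[derive(PartialEq, Eq)]
enum SchedState {
    Runnable,
    InSend(u32),
    InReply(u32),
}

#[allow(dead_code)]
#[derive(PartialEq, Eq)]
enum TaskState {
    Healthy(SchedState),
    Faulted { code: u32 },
}

impl TaskState {
    // Equality version: pulls in the out-of-line derived PartialEq that
    // compares every variant and field.
    fn is_sending_to_eq(&self, target: u32) -> bool {
        self == &TaskState::Healthy(SchedState::InSend(target))
    }

    // Pattern version: only the interesting variant is inspected, so the
    // compiler can inline and fold away the rest.
    fn is_sending_to_match(&self, target: u32) -> bool {
        matches!(self, TaskState::Healthy(SchedState::InSend(t)) if *t == target)
    }
}

fn main() {
    let t = TaskState::Healthy(SchedState::InSend(7));
    assert!(t.is_sending_to_eq(7));
    assert!(t.is_sending_to_match(7));
    assert!(!t.is_sending_to_match(8));
}
```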
This is unspooling more commits from my performance fix branch in my fork. As with the previous performance PR, this is a collection of semi-related commits that may be easier to review commit-by-commit than as a big diff. Boy I miss Gerrit.
This is only a ~10% overall improvement because most of the truly low-hanging fruit has been eaten. But, this significantly improves the performance of kernel memory copies and of SEND with an empty message (unusual but not unheard of). This also knocks a few hundred bytes off a typical kernel by removing some panic sites.