You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I noticed there is a PR #210 that removed most #[inline] annotations. However, in developing one of my crates, I found that Rust's stdlib cbrtf was a significant bottleneck. I attempted to use this crate, but it provided minimal benefit due to the functions not being inlined. The bottleneck was due to the fact that cbrtf was being called in a loop, and could not be autovectorized because it had to be accessed via a call instruction.
If I add the #[inline] attribute, my function is able to inline and autovectorize the code, which resulted in a 25% performance improvement. Therefore, I would move to consider readding the #[inline] attribute where possible. (#[inline(always)] is not needed, only #[inline] to allow the compiler to consider it.)
Edit: I tried doing the same with powf for other places it's used in the crate, and in that case it had a very negative effect... so this turns out to not be a universal thing at all 🤔
The text was updated successfully, but these errors were encountered:
If you've found a specific case where the lack of #[inline] causes a performance problem for you then we are happy to accept a PR to add the #[inline] back.
I benchmarked every function with #[inline] vs. as-is, and there are a few considerable improvements. On my machine (x86_64 Arch Linux, Ryzen 7 5800H) I got these significant results - though I'd appreciate someone on a different system/architecture running a bench as well:
atan: 3 -> 2 ns/iter
atan2: 6 -> 5 ns/iter
cosf: 4 -> 2 ns/iter
coshf: 4 -> 3 ns/iter
expm1f: 3 -> 2 ns/iter
hypot: 4 -> 2 ns/iter
hypotf: 2 -> 1 ns/iter
sinf: 5 -> 2 ns/iter
tan: 4 -> 3 ns/iter
tanf: 7 -> 2 ns/iter
tanh: 8 -> 6 ns/iter
I only included results where #[inline] helped by >10% but I will do more runs and average out results before doing any proper PR, for now just want to put this out there
I noticed there is a PR #210 that removed most
#[inline]
annotations. However, in developing one of my crates, I found that Rust's stdlibcbrtf
was a significant bottleneck. I attempted to use this crate, but it provided minimal benefit due to the functions not being inlined. The bottleneck was due to the fact thatcbrtf
was being called in a loop, and could not be autovectorized because it had to be accessed via acall
instruction.If I add the
#[inline]
attribute, my function is able to inline and autovectorize the code, which resulted in a 25% performance improvement. Therefore, I would move to consider readding the#[inline]
attribute where possible. (#[inline(always)]
is not needed, only#[inline]
to allow the compiler to consider it.)Edit: I tried doing the same with
powf
for other places it's used in the crate, and in that case it had a very negative effect... so this turns out to not be a universal thing at all 🤔The text was updated successfully, but these errors were encountered: