-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stackmargin
results are misleading
#1872
Comments
In an ideal hardware world, there'd be a register that records the numerically-lowest (i.e. deepest stack) value of the SP as a high watermark, and the kernel could read that out and replace it on context switch. None of our target processors have such a feature. Systems like FreeRTOS attempt to indicate a stack high water mark by recording SP values any time a task is preempted. As noted above on the bullet about kernel entry, we effectively already get this for free --- FreeRTOS's behavior mostly makes sense because it's a non-memory-protected system vulnerable to stack clash, so you could overflow a stack and still take an interrupt/syscall. |
What about doing this manually? Create a similar kind of system as ringbuf, only it's sized to 4 bytes and tracing into it reads the current stack pointer, and does a compare and write if greater. Then, pepper this trace to any and all functions even remotely suspected of having a chance of exhausting the stack (both in userland and kernel land?). Turn on the feature in lab builds. Done. The downsides are obviously that placing these traces is manual and thus won't catch everything, and running the traces may be cheap but they're not free, and you may not enjoy the increase in testing time. |
Yeah, these feel like pretty significant drawbacks to me -- a protection mechanism that only works when people remember to copy-paste some code won't be reliable for long. |
Yeah, that's true. If it turned out that doing the "tracing" was fast enough to not really affect performance then this might be somewhat mitigated by making this an attribute macro that is applied to all functions: It would be harder to forget a But it would still definitely be possible. |
Sampling the SP any time a task is preempted sounds interesting. The kernel could note the SP high water mark for the task. That accounting does not sound too costly. The humility As you said the SP high water mark does not indicate that data was actually written, so there is a gap in knowledge there. However, the |
Perhaps the SP high water mark accounting could be a tracepoint that is a NO-OP by default, so it does not impact production. Humility could poke some bits and activate the accounting only when profiling. |
So I ended up looking into and thinking about this a bit. As I see it, there are two main options:
Interrupt based sampling has been kind of well talked above, I feel: It's fairly easy to add on top of what Hubris already does, but has the downside of not detecting everything. So, stack probing! Best as I can tell, LLVM probably doesn't natively support stack-probing for 32-bit ARM currently, or at least I was unable to find code in llvm-project that would point to that being a thing. Additionally, this patch is unreviewed and unmerged [1]. See also the closed rust-embedded issue by the same author, referencing this LLVM patch [2]. But on the other hand there exists this merged patch about changing the default max page size on ARM32 so I'm a bit confused: [3]. Still, since Hubris has no paging any max page size stuff is meaningless. Best as I can tell, you'd need to insert the stack probe call (or inline it) to the beginning of every call (unless you can statically prove it cannot overflow the stack). ... Would you also have different stack base address for each task? Or an otherwise statically knowable SP max value? That would make a stack probe easier to implement. Though that's a side point. I have a feeling that getting stack probes to work "natively" will be an uphill battle. Overriding Rust's P.S. This may also be interesting [4]. |
That does seem to be the only way to get the information in the absence of hardware support (a hardware maintained SP high water mark register mentioned above). It would cost software, in both code footprint and execution cycles. warning crazy talk ahead For sake of argument, let's say the problem of plunking down some code in every function prologue is solved -- it's a tool in the toolbox. As embedded folks know, having debug/profiling code conditionally included in images is problematic -- when you want it, it is not in the image you are running. You enable it, recompile and now the image is too big. Blah. It would be nice to have the debug code compiled in all the time (paying a code foot print penalty of course), but have it remain dormant or inactive (no runtime penalty) until activated. Would it be possible to have "dtrace like" probes (nops and trampolines) in hubris that humility could activate for profiling? Thinking about the stack probe problem and calling something in every function prologue (or perhaps only functions suitably attributed). The probe might be 4 bytes? 8 bytes? And the trampoline "real code" would be 100 bytes (wag), which would be common for the task, only one copy per task. Clearly you wouldn't want too many latent probes since code footprint is often at a premium. Trade offs abound. Just a thought. On the general "investigating stack usage" topic I have used cargo-call-stack to investigate. While you cannot just throw a hubris task at it, I did take some troublesome sections of code from a project to make a toy example and feed that to |
There are a bunch of cases where it'd be nice to have something like this, @cbrune, but there's a problem: unlike on Unix, our code is in ROM. Certain ARM microcontrollers implement a "Flash Patch Unit" that can be used to interpose on flash instruction reads and substitute different values --- this was used for breakpoints and hooks. This facility is gradually going away on newer parts, however, because manufacturers have realized that people (hi) can use it to defeat their DRM and un-protect their ROMs. I always saw that as a feature, but, to each their own I suppose. Without the ability to flash-patch, we can't really replace instructions in a task on Hubris. We could potentially use the breakpoint unit --- set a breakpoint at the location you want to intercept, and have Humility notice that and emulate the tracepoint. There are two implementation difficulties there:
If anyone sees other ways of implementing this sort of thing, I'm definitely here for it. I may have missed opportunities. |
That accounting would be suuuuper cheap, actually. It'd be lost in the noise relative to the current syscall entry/exit sequences. Roughly three machine cycles, I think? I'd need to implement it. But I think something like this would be worth doing, personally. |
(This could arguably be in the Humility repo, but I suspect that any changes we make to fix the problem will be in Hubris, so I'm filing it here; YOLO.)
tl;dr: The
humility stackmargin
results can lull you into a false sense of security. We should figure out how to fix this.What stackmargin does
When a task is (re)initialized, we fill the stack memory with a recognizable pattern of bits.
humility stackmargin
scans the task's stack area from low addresses to high, and reports the first point where this recognizable pattern of bits is altered. This is roughly the highest watermark of used stack memory in this incarnation of the task.What programs do
The issue is that programs don't completely write their stack frames. There are usually locals allocated in the stack that are only used on conditional paths. This is not theoretical -- we've been bitten by this in a couple of situations:
stackmargin
.This means that an "out of stack" condition can hide, unnoticed by either the kernel or
stackmargin
, for an arbitrary period of time until it happens to be noticed by an interrupt. (At that point, it remains invisible tostackmargin
, sincestackmargin
doesn't consider crashed tasks.)And so
So
stackmargin
right now is always an over-estimate. In some cases, it's an extreme over-estimate. This is bad, because we usestackmargin
to check whether e.g. a compiler upgrade will cause problems by changing inlining decisions. Currently, we could observe a higherstackmargin
but actually have a new stack overflow problem (if the leaf function in the deepest stack has a very large partially used frame).We should figure out how to fix this.
I'd like to open the floor to brainstorming on this. Here are my initial thoughts.
The text was updated successfully, but these errors were encountered: