
I'm trying to calculate maximum stack usage of an embedded program using static analysis.

I've used the compiler flag -fstack-usage to get the maximum stack usage for each function and the flag -fdump-rtl-expand to generate a graph of all function calls.

The last missing ingredient is the stack usage of built-in functions (at the moment, only memset).

I guess I could measure it some other way and put a constant into my script. However, I don't want a situation where the implementation of the built-in function changes in a new version of GCC and the value in my script stays the same.

Maybe there is some way to compile built-in functions with the flag -fstack-usage? Or some other way to measure their stack usage via static analysis?


Edit:

This question is not a duplicate of Stack Size Estimation. The other question is about estimating the stack usage of an entire program, while I am asking how to estimate it for a single built-in library function. The other question doesn't mention built-in library functions at all, and neither do any of its answers.

NO_NAME
  • The compiler is free to implement built-in functions as it sees fit. In particular, there is no reason that different calls to (say) `memset` would always use the same amount of stack. – TonyK Apr 19 '19 at 19:31
  • @TonyK I think it would be safe to assume that there is only one version of this function. I don't see what benefit several versions would have, and even if there were some, it would be outweighed by the increase in program size. This is a program for a microcontroller which has only a few kilobytes of program memory, and the compiler knows that, so it would be a really bad optimization. – NO_NAME Apr 19 '19 at 19:41
  • @TonyK Another point is that I check the mangled names of functions, so it should be visible if there are several versions. – NO_NAME Apr 19 '19 at 19:43
  • If you say so... – TonyK Apr 19 '19 at 19:57
  • One of the things the compiler can do with a call to `memset` is replace it with equivalent machine code. This is highly likely if the second and third arguments are known at compile-time. – rici Apr 20 '19 at 02:22
  • @rici Actually, I don't have explicit calls to built-in functions in my code. The only call to `memset` is generated by the compiler. It seems to perform all optimizations before creating the RTL file (the one created by the flag `-fdump-rtl-expand`). I know for sure it does inlining before that. – NO_NAME Apr 20 '19 at 09:54
  • @NO_NAME: OK, I think I understand your context better. I don't have an answer to your question as asked, but it might be possible to ask gcc to not produce the call to `memset`, if it is being generated as a replacement for a loop in your code (as opposed to the zero-initialization of a structure on the stack). Replacement of loops is controlled by the option `-ftree-loop-distribute-patterns`, which is enabled by default at `-O3`. I don't know how to turn off use of memset/memcpy for initialisation, though. Also, you might want memset to be called. – rici Apr 20 '19 at 19:04
  • @rici I think `-ffreestanding` would be more appropriate. – yugr Apr 24 '19 at 08:53
  • Possible duplicate of [Stack Size Estimation](https://stackoverflow.com/questions/1756285/stack-size-estimation) – yugr Apr 24 '19 at 09:06
  • Thanks for your answers but I won't be messing with optimization of the program only for minor improvement in static analysis. – NO_NAME Apr 24 '19 at 09:15
  • @NO_NAME Note that the question title is misleading. You are really looking for a stack usage estimate of compiled code from a library (the word "builtin" has a very specific meaning in GCC land). – yugr Apr 24 '19 at 10:45
  • @yugr Built-in functions are exactly what I mean. Here: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html – NO_NAME Apr 24 '19 at 10:51
  • @NO_NAME My understanding was that you were trying to find the stack usage of a function (`memset`) in an external library (`libc`). GCC builtin functions are a rather special compiler construct (many of them don't even compile to function calls, and some compile to sequences of function calls). – yugr Apr 24 '19 at 11:27
  • I have to remove my +1, the question is ill-posed as it stands. – yugr Apr 24 '19 at 12:03
  • @yugr No, it is not an external library. I wrote that this question is about an embedded program. There are no "external libraries" on the device. There isn't even a file system. – NO_NAME Apr 24 '19 at 14:08
  • @NO_NAME I don't think I ever mentioned that library has to reside on device. Static libc for embedded target that your app links for `memset` implementation is external code as well. – yugr Apr 24 '19 at 14:47
  • @yugr I won't claim to really know what I'm saying here, but I did a little Google research and it seems that GCC prefers built-in functions over libc unless `-fno-builtin` is specified. I guess this is even more true for calls generated by the compiler and not explicitly stated in the source code. – NO_NAME Apr 24 '19 at 19:36
  • @yugr I don't know, you may be right that GCC uses the function from `libc`. I don't know how to check that and I certainly cannot rule it out. Sorry for being stubborn. I guess I'm just waiting for some proof that would show me the truth, so I'm giving your arguments a few punches to see which ones don't move. – NO_NAME Apr 24 '19 at 19:55
  • You are right that gcc will only generate a call to `memset` when `-fbuiltin` is on (because this option allows it to assume that all libc library functions have standard semantics). But your end goal (collecting information about the precompiled `memset` implementation in your toolchain's libc) does not depend on whether you called it yourself or the call was automatically generated by the compiler. That's why I don't think builtins are important in this context. – yugr Apr 25 '19 at 06:28

2 Answers

1

Approach 1 (dynamic analysis)

You could determine stack size at runtime by filling the stack with a predefined pattern, executing memset and then checking how many bytes have been modified. This is slower and more involved, as you need to compile a sample program, upload it to the target (unless you have a simulator) and collect the results. You'll also need to be careful about the test data that you supply to the function, as the execution path may change depending on size, data alignment, etc.

For a real-world example of this approach check Abseil's code.
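The scan step of this approach can be sketched in a few lines. This is only a host-side simulation in Python, not target code: the `bytearray` stands in for the stack region, `fake_callee` is a hypothetical stand-in for the function being measured, and the fill byte `0xAA` and the downward growth direction are assumptions.

```python
PATTERN = 0xAA  # assumed fill byte; a callee that writes this exact value goes unnoticed


def paint(stack: bytearray) -> None:
    """Fill the whole stack region with the known pattern."""
    stack[:] = bytes([PATTERN]) * len(stack)


def high_water_mark(stack: bytearray) -> int:
    """Return how many bytes were used, assuming the stack grows downward
    from the end of the region toward index 0."""
    for index, value in enumerate(stack):
        if value != PATTERN:
            return len(stack) - index
    return 0


def fake_callee(stack: bytearray, depth: int) -> None:
    """Hypothetical stand-in for the measured function: dirties `depth` bytes
    at the top of the stack with values that never equal PATTERN."""
    for i in range(1, depth + 1):
        stack[-i] = i & 0x7F


stack = bytearray(64)
paint(stack)
fake_callee(stack, 10)
print(high_water_mark(stack))  # prints 10
```

On a real target the same scan would run over the actual stack memory after the call returns, and the result is only a lower bound for the inputs that were tried.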

Approach 2 (static analysis)

In general static analysis of binary code is tricky (even disassembling it isn't trivial) and you'd need sophisticated symbolic execution machinery to deal with it (e.g. miasm). But in most cases you can safely rely on detecting patterns which your compiler uses to allocate frames. E.g. for x86_64 GCC you could do something like:

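# Extract the disassembly of one function (the symbol name depends on your libc build)
objdump -d /lib64/libc.so.6 | sed -ne '/<__memset_x86_64>:/,/^$/p' > memset.d
# Each push takes 8 bytes; objdump prints register pushes as "push", not "pushq"
NUM_PUSHES=$(grep -cE 'pushq?[[:space:]]' memset.d)
# Sum the immediates of all "sub $imm,%rsp" instructions (objdump prints them in hex)
LOCALS=0
for imm in $(sed -ne 's/.*sub \+\$\([^,]\+\),%rsp.*/\1/p' memset.d); do
  LOCALS=$(( LOCALS + imm ))
done
echo $(( LOCALS + 8 * NUM_PUSHES ))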

Note that this simple approach produces a conservative estimate (getting a more precise result is doable but would require a path-sensitive analysis, which in turn requires proper parsing, building a control-flow graph, etc.) and does not handle nested function calls (this can be easily added but should probably be done in a language more expressive than shell).
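For comparison, here is a rough Python sketch of the same pattern matching. It assumes AT&T-syntax `objdump -d` output, 8-byte pushes, and that frames are only ever allocated with `push` and `sub $imm,%rsp`. The function name `estimate_x86_64_stack` and the sample listing are made up for illustration.

```python
import re

# Match "push" or "pushq" as a mnemonic, and "sub $imm,%rsp" with a hex or decimal immediate.
PUSH_RE = re.compile(r"\bpushq?\s")
SUB_RSP_RE = re.compile(r"\bsubq?\s+\$(0x[0-9a-fA-F]+|\d+),\s*%rsp")


def estimate_x86_64_stack(disassembly: str) -> int:
    """Conservative estimate: sum every push and every sub on %rsp,
    regardless of which execution path they belong to."""
    total = 0
    for line in disassembly.splitlines():
        if PUSH_RE.search(line):
            total += 8
        match = SUB_RSP_RE.search(line)
        if match:
            total += int(match.group(1), 0)  # base 0 accepts the 0x prefix
    return total


# Hypothetical listing, not taken from a real libc.
SAMPLE = """
  401000:  55              push   %rbp
  401001:  53              push   %rbx
  401002:  48 83 ec 18     sub    $0x18,%rsp
  401006:  48 83 c4 18     add    $0x18,%rsp
  40100a:  5b              pop    %rbx
  40100b:  5d              pop    %rbp
  40100c:  c3              ret
"""

print(estimate_x86_64_stack(SAMPLE))  # prints 40
```

Like the shell version, this ignores the return address pushed by the caller's `call` and does not follow nested calls.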

AVR assembly is in general more complicated because you can't easily detect allocation of space for local variables (modification of stack pointer is split across several in, out and adiw instructions so would require non-trivial parsing in e.g. Python). Simple functions like memset or memcpy don't use local variables so you can still get away with simple greps:

NUM_PUSHES=$(grep -c 'push ' memset.d)
NUM_RCALLS=$(grep -c 'rcall \+\.+0' memset.d)
# A safety check for functions which we can't handle
if grep -qi 'out \+0x3[de]' memset.d; then
  echo >&2 'Unable to parse stack modification'
  exit 1
fi
echo $((NUM_PUSHES + 2 * NUM_RCALLS))
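The same check could be written in Python, which would be an easier starting point for the more involved `in`/`adiw`/`out` parsing. This is a minimal sketch under the same assumptions as the greps above: 1 byte per `push`, 2 bytes per `rcall .+0`, and any write to I/O address 0x3d or 0x3e treated as unsupported. The function name `estimate_avr_stack` and the sample listing are hypothetical.

```python
import re

PUSH_RE = re.compile(r"\bpush\s+r\d+")
RCALL_PLUS0_RE = re.compile(r"\brcall\s+\.\+0\b")   # rcall .+0 is used to reserve 2 bytes
SP_WRITE_RE = re.compile(r"\bout\s+0x3[de]\b", re.IGNORECASE)  # direct stack pointer write


def estimate_avr_stack(disassembly: str) -> int:
    """Count pushes and 'rcall .+0' in one function's disassembly.
    Raises ValueError if the stack pointer is modified directly."""
    if SP_WRITE_RE.search(disassembly):
        raise ValueError("unable to parse stack modification")
    pushes = len(PUSH_RE.findall(disassembly))
    rcalls = len(RCALL_PLUS0_RE.findall(disassembly))
    return pushes + 2 * rcalls


# Hypothetical listing in avr-objdump style.
SAMPLE = """
   0:  cf 93   push  r28
   2:  df 93   push  r29
   4:  00 d0   rcall .+0   ; 0x6
   6:  0f 90   pop   r0
   8:  0f 90   pop   r0
   a:  df 91   pop   r29
   c:  cf 91   pop   r28
   e:  08 95   ret
"""

print(estimate_avr_stack(SAMPLE))  # prints 4
```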
yugr
  • I don't think a few simple regular expressions can be called "a lot of parsing". Do binary files even store frame sizes? I was under the impression that this information is lost after the assembler is done with it. Isn't it just a bunch of pushes, pops and other stack instructions at this point? – NO_NAME Apr 22 '19 at 20:14
  • I know that assembly generated from source file has this information but it doesn't seem like something that disassembler could figure out. – NO_NAME Apr 22 '19 at 20:15
  • @NO_NAME Sorry, I didn't get notifications for this. Yes, this information is lost, but no one stops you from computing it yourself. I've added an example to my answer. – yugr Apr 24 '19 at 08:08
  • This is not so simple. Push is not the only instruction that can affect the size of the stack. For example, a program can just manually move the stack pointer to allocate some initialized memory. Also, there is no rule that different execution paths of a function may not use different amounts of stack. This method would sum all of that into a single result, which is still kind of useful information, but I'm not sure if it would be accurate enough. – NO_NAME Apr 24 '19 at 09:18
  • As for dynamic analysis, I know the method and I use it to verify if my static analysis works but the end goal is to use only static analysis and that's the scope of the question. – NO_NAME Apr 24 '19 at 09:21
  • @NO_NAME "Push is not the only instruction that can affect size of stack" - sure, it also depends on the platform but normally `push` and `sub` are the only stack opcodes that compiler generates. "This method would sum all that into a single result" - yes, my method produces a conservative estimate. – yugr Apr 24 '19 at 10:43
  • @NO_NAME You could use an open-source asm parser or symbolic executor to estimate stack size precisely but I'd say this would be an overkill for such simple task... – yugr Apr 24 '19 at 10:47
  • I assume that would work for `libc` on `x86_64`. My code is written for AVR, though. The problem is that this is much harder to parse. Here's the assembly snippet: https://pastebin.com/y4zxJSyx I don't think this is worth it. – NO_NAME Apr 24 '19 at 19:59
  • @NO_NAME Yeah, I just provided x64 to illustrate the idea. Your snippet looks like a standard two-address asm so same approach should work. I can try cooking something if you provide disasm (better disasm for full libc, `memset` often does not use stack at all so is a bad example). – yugr Apr 25 '19 at 04:58
  • Here's the result of `avr-objdump -d libc.a`: https://pastebin.com/RwKTqYSJ I've had to remove 3 last functions to fit it into one pastebin page but there was nothing special at the end. I cannot tell if there are any funny operations on stack pointer. I guess even crude estimation is better than total guesswork, though. – NO_NAME Apr 25 '19 at 17:55
  • @NO_NAME I've updated my answer. It's not a complete solution (it _will_ work for simple functions like `memset` or `memcpy` but handling more complicated code would require involved parsing of asm to detect matching sequences of `in`s, `adiw`s and `out`s which should better be done in Perl or Python). – yugr Apr 26 '19 at 07:30
0

This is not a great answer but it still may be useful.

Many built-in functions are very simple. For example, memset can be implemented as a simple loop. From my observation, the compiler avoids using the stack if it can just use registers (which makes perfect sense). Only very long functions need more stack. All that the shorter ones need is the return address for the ret instruction.

It is relatively safe to assume that simple built-in functions don't use the stack at all aside from the call and ret instructions, so the amount of memory is equal to the size of a pointer to a function (2 bytes in my case).

Keep in mind that embedded systems don't always have a von Neumann architecture and often store instructions and data in separate memories. The sizes of pointers to functions and to data may be different.
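To show how such a constant would fit into the analysis script, here is a minimal sketch of a worst-case calculation over a call graph. The function names, byte counts and the `worst_case_stack` helper are all hypothetical. In practice the per-function numbers would come from the `-fstack-usage` output, the graph from the RTL dump, and only the `memset` entry would be the assumed return-address size.

```python
RETURN_ADDRESS_SIZE = 2  # assumed size of a function pointer on the target, in bytes


def worst_case_stack(name, usage, calls, _path=()):
    """Return the deepest stack usage reachable from `name`:
    its own usage plus the worst case among its callees."""
    if name in _path:
        raise ValueError(f"recursion through {name!r}; no static bound")
    deepest_callee = max(
        (worst_case_stack(callee, usage, calls, _path + (name,))
         for callee in calls.get(name, ())),
        default=0,
    )
    return usage[name] + deepest_callee


# Hypothetical per-function usage in bytes; memset is the assumed constant.
usage = {
    "main": 10,
    "init_buffers": 24,
    "log_error": 20,
    "memset": RETURN_ADDRESS_SIZE,
}
# Hypothetical call graph: caller -> callees.
calls = {
    "main": ["init_buffers", "log_error"],
    "init_buffers": ["memset"],
}

print(worst_case_stack("main", usage, calls))  # prints 36 (10 + 24 + 2)
```

Keeping the constant in one named place at least makes it easy to revisit when the toolchain is upgraded.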

NO_NAME