The first step is to profile the code and make sure you understand exactly where the bottleneck is. People are often bad at guessing bottlenecks in code, and you might be surprised by the findings. You can simply instrument several parts of this routine yourself and dump min/avg/max durations at regular intervals. You want to see the worst case (max), and whether the average duration increases over time.
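A minimal sketch of such instrumentation, assuming a Linux target where `clock_gettime(CLOCK_MONOTONIC, ...)` is available. The names `timing_stats`, `timing_record`, `timing_dump` and `now_ns` are made up for this example and do not come from any library:

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime under strict C modes */
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Running min/avg/max of measured durations, in nanoseconds.
 * Hypothetical helper type, not part of any library. */
typedef struct {
    uint64_t min_ns;
    uint64_t max_ns;
    uint64_t sum_ns;
    uint64_t count;
} timing_stats;

static void timing_reset(timing_stats *s)
{
    assert(s != NULL);
    s->min_ns = UINT64_MAX;
    s->max_ns = 0;
    s->sum_ns = 0;
    s->count = 0;
}

/* Record one measured duration. */
static void timing_record(timing_stats *s, uint64_t ns)
{
    assert(s != NULL);
    if (ns < s->min_ns) s->min_ns = ns;
    if (ns > s->max_ns) s->max_ns = ns;
    s->sum_ns += ns;
    s->count++;
}

/* Average of the recorded durations, or 0 if nothing was recorded. */
static uint64_t timing_avg(const timing_stats *s)
{
    assert(s != NULL);
    return s->count ? s->sum_ns / s->count : 0;
}

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Print the current interval's statistics, then start a new interval. */
static void timing_dump(const char *label, timing_stats *s)
{
    printf("%s: n=%" PRIu64 " min=%" PRIu64 "ns avg=%" PRIu64 "ns max=%" PRIu64 "ns\n",
           label, s->count, s->count ? s->min_ns : 0, timing_avg(s), s->max_ns);
    timing_reset(s);
}
```

To use it, take `t0 = now_ns()` before the suspect section and call `timing_record(&stats, now_ns() - t0)` after it, once per section you want to compare. Call `timing_dump` periodically from outside the time-critical path, since `printf` itself is slow enough to distort the measurement.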
I doubt that `malloc` will take any significant portion of these 2 ms on a reasonable microcontroller capable of running Linux; I'd say it's more likely that you would run out of memory due to fragmentation than run into performance issues. If you have any syscalls in your function, they will easily take an order of magnitude more than `malloc`.
But if `malloc` is really the problem, depending on how short-lived your objects are, how much memory you can afford to waste, and how much your requirements are known in advance, there are several approaches you can take:
General purpose allocation (`malloc` from your standard library, or any third-party implementation): best approach if you have "more than enough" RAM, many short-lived objects, and no strict latency requirements
- PROS: works for any object size out of the box, familiar interface, memory is shared dynamically, no need to "plan ahead" if memory is not an issue
- CONS: slight performance penalty during allocation and/or deallocation, memory fragmentation issues when doing lots of allocations/deallocations of objects of different sizes, whether a run-time allocation will fail is less deterministic and cannot be easily mitigated at runtime
Memory pool: best approach in most cases where memory is limited, low latency is required, and the object needs to live longer than a single block scope
- PROS: allocation/deallocation time is guaranteed to be `O(1)` in any reasonable implementation, does not suffer from fragmentation, easier to plan its size in advance, failure to allocate at run-time is (likely) easier to mitigate
- CONS: works for a single specific object size, so its memory is not shared with other parts of the program; requires planning the right pool size in advance, or you risk wasting memory
Stack based (automatic-duration) objects: best for smaller, short-lived objects (single block scope)
- PROS: allocation and deallocation are done automatically, RAM is occupied only for the object's lifetime, there are tools which can sometimes do a static analysis of your code and estimate the stack size
- CONS: objects limited to a single block scope - cannot share objects between interrupt invocations
Individual statically allocated objects: best approach for long lived objects
- PROS: no allocation whatsoever - all needed objects exist throughout the application life-cycle, no problems with allocation/deallocation
- CONS: wastes memory if the objects should be short-lived
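To illustrate the memory pool option from the list above, here is a minimal sketch of a fixed-size pool built on an intrusive free list. The block size, block count and all names (`pool`, `pool_init`, `pool_alloc`, `pool_free`) are assumptions chosen for the example, and it relies on C11 for `max_align_t`. It is not safe to call concurrently from an interrupt handler and the main code without extra protection (for example, disabling interrupts around the list update):

```c
#include <assert.h>
#include <stddef.h>

#define POOL_BLOCK_SIZE  64  /* assumed object size in bytes */
#define POOL_BLOCK_COUNT 16  /* assumed number of objects in the pool */

/* A block either holds user data or, while free, a link to the next free block. */
typedef union pool_block {
    union pool_block *next;                 /* valid only while the block is free */
    max_align_t align;                      /* forces suitable alignment for any object */
    unsigned char payload[POOL_BLOCK_SIZE];
} pool_block;

typedef struct {
    pool_block blocks[POOL_BLOCK_COUNT];    /* backing storage, sized at compile time */
    pool_block *free_list;                  /* head of the singly linked free list */
} pool;

/* Put every block on the free list. */
static void pool_init(pool *p)
{
    assert(p != NULL);
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        p->blocks[i].next = p->free_list;
        p->free_list = &p->blocks[i];
    }
}

/* O(1): pop the head of the free list; returns NULL when the pool is exhausted. */
static void *pool_alloc(pool *p)
{
    assert(p != NULL);
    pool_block *b = p->free_list;
    if (b == NULL)
        return NULL;
    p->free_list = b->next;
    return b;
}

/* O(1): push the block back onto the free list. ptr must come from pool_alloc. */
static void pool_free(pool *p, void *ptr)
{
    assert(p != NULL);
    if (ptr == NULL)
        return;
    pool_block *b = ptr;
    b->next = p->free_list;
    p->free_list = b;
}
```

Because every block has the same size, both operations are a couple of pointer assignments and there is nothing to fragment. Exhaustion shows up as a `NULL` return at a known place, which is where you would drop the work item, reuse the oldest object, or count the failure.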
Even if you decide to go for memory pools all over the program, make sure you add profiling/instrumentation to your code. And then leave it there forever to see how the performance changes over time.