
We had an issue with one of our real-time applications. The idea was to run one of the threads every 2 ms (500 Hz). After the application ran for half an hour or so, we noticed that the thread was falling behind.

After a few discussions, people suspected that the malloc allocations in the real-time thread were the root cause.

I am wondering: is it always a good idea to avoid all dynamic memory allocation in real-time threads?

The internet has very few resources on this. If you can point to some discussions, that would be great too.

Thanks

BhanuKiran
  • The less you do in your thread, the quicker it will execute. And allocating memory needs to take detours to the kernel and back, which is always going to add some extra time to real-time processing. The fewer kernel system calls you make, the better. – Some programmer dude Jan 29 '20 at 10:55
  • Use a profiler to get an accurate picture where your application spends time. Instrument your critical function to get the time it takes to execute. – Maxim Egorushkin Jan 29 '20 at 11:10
    Check the need for dynamic memory allocation. Can you replace dynamic allocation with just VLAs? Or statically allocate a memory page for the real-time task? – TruthSeeker Jan 29 '20 at 11:21
  • @TruthSeeker We cross-compiled lttng a couple of days ago and are now in the process of applying it. But I am more curious about why people avoid mallocs in real-time threads. – BhanuKiran Jan 29 '20 at 11:25
    That is the classic question of the ease-of-coding vs. performance balance. 2 ms is rather short, so you should profile your code to see where you spend time. When you are doing low-level optimization you start to chase every user/kernel mode context switch. A malloc **is** expensive... – Serge Ballesta Jan 29 '20 at 12:24
  • I faced this problem a long time ago. This is the famous memory leak (https://en.wikipedia.org/wiki/Memory_leak). Try to check if all the **free** calls are in the right place. I am sure you will solve it!! – Leos313 Jan 29 '20 at 12:28
    Can you post your code? – Leos313 Jan 29 '20 at 12:30
    I write embedded Linux software and what we do is allocate memory pools at the initialisation step of our software; from that point on, memory is allocated from those pools, which is a fast operation and doesn't involve a system call. It also avoids the risk of a failed memory allocation in the middle of processing (malloc can fail, and we would rather this happen on init instead of later). – JMercer Jan 29 '20 at 13:36

2 Answers


The first step is to profile the code and make sure you understand exactly where the bottleneck is. People are often bad at guessing bottlenecks in code, and you might be surprised by the findings. You can simply instrument several parts of this routine yourself and dump min/avg/max durations at regular intervals. You want to see the worst case (max), and whether the average duration increases as time goes by.

I doubt that malloc will take any significant portion of these 2ms on a reasonable microcontroller capable of running Linux; I'd say it's more likely you would run out of memory due to fragmentation, than having performance issues. If you have any other syscalls in your function, they will easily take an order of magnitude more than malloc.

But if malloc is really the problem, depending on how short-lived your objects are, how much memory you can afford to waste, and how much your requirements are known in advance, there are several approaches you can take:

  1. General purpose allocation (malloc from your standard library, or any third party implementation): best approach if you have "more than enough" RAM, many short-lived objects, and no strict latency requirements

    • PROS: works for any object size out of the box, familiar interface, memory is shared dynamically, no need to "plan ahead" if memory is not an issue
    • CONS: slight performance penalty during allocation and/or deallocation, memory fragmentation issues when doing lots of allocations/deallocations of objects of different sizes, whether a run-time allocation will fail is less deterministic and cannot be easily mitigated at runtime
  2. Memory pool: best approach in most cases where memory is limited, low latency is required, and the object needs to live longer than a single block scope

    • PROS: allocation/deallocation time is guaranteed to be O(1) in any reasonable implementation, does not suffer from fragmentation, easier to plan its size in advance, failure to allocate at run-time is (likely) easier to mitigate
    • CONS: works only for a single specific object size, memory is not shared with other parts of the program, requires planning for the right size of the pool or risks wasting memory
  3. Stack based (automatic-duration) objects: best for smaller, short-lived objects (single block scope)

    • PROS: allocation and deallocation is done automatically, allows having optimum usage of RAM for the object's lifetime, there are tools which can sometimes do a static analysis of your code and estimate the stack size
    • CONS: objects limited to a single block scope - cannot share objects between interrupt invocations
  4. Individual statically allocated objects: best approach for long lived objects

    • PROS: no allocation whatsoever - all needed objects exist throughout the application life-cycle, no problems with allocation/deallocation
    • CONS: wastes memory if the objects should be short-lived

Even if you decide to go for memory pools all over the program, make sure you add profiling/instrumentation to your code. And then leave it there forever to see how the performance changes over time.

Groo
    `malloc()` and memory pool use can also suffer from inter-thread contention if multiple threads use those resources. `malloc()` implementations in particular can be effectively single-threaded due to internal locking. Stack-based allocation can be used for objects of any size if both the total size is known and the size of the stack can be controlled. If full POSIX threading is available, `pthread_attr_setstack()` and/or `pthread_attr_setstacksize()` can be used to control the thread stack. – Andrew Henle Jan 29 '20 at 14:20
  • @AndrewHenle: Thanks for mentioning `pthread_attr_setstack`! `malloc` is usually the worst option in most aspects, but borrowing an item from a memory pool is so little work that I don't believe it should ever cause contention, especially since the appropriate use case is to keep the object alive longer than just a single function scope, so the amortized cost of an allocation should be minuscule (otherwise you would just place the temporary object on the stack). If static duration is too long and stack-based allocation is too short, there is nothing better than a memory pool that I know of. – Groo Jan 29 '20 at 14:39
    "People are often bad", better said "are notoriously off-track, most of the time". An early Unix tale had the maintainer of the FORTRAN compiler estimating a certain routine would be called each 10 compilations, spent a week optimizing it and released the result. Some two years (and some hundred thousand compilations) later it crashed: checking the report, it would have crashed *each time*, the "optimized" version was wrong. – vonbrand Jan 29 '20 at 14:52

Being a realtime software engineer in the aerospace industry, I see this question a lot. Even among our own engineers, we see software engineers attempt to use non-realtime programming techniques they learned elsewhere, or to use open-source code, in their programs. Never allocate from the heap during realtime execution. One of our engineers created a tool that intercepts malloc and records the overhead. You can see in the numbers that you cannot predict when an allocation attempt will take a long time. Even on very high-end computers (72-core, 256 GB RAM servers) running a realtime hybrid of Linux, we record mallocs taking hundreds of milliseconds. It is a system call which is cross-ring, so high overhead, and you don't know when you will get hit by garbage collection, or when it decides it must request another large chunk of memory for the task from the OS.