55

What is the proper/preferred way to allocate memory in a C API?

I can see, at first, two options:

1) Let the caller do all the (outer) memory handling:

myStruct *s = malloc(sizeof(*s));
myStruct_init(s);

myStruct_foo(s);

myStruct_destroy(s);
free(s);

The _init and _destroy functions are necessary since some more memory may be allocated inside, and it must be handled somewhere.

This has the disadvantage of being longer, but the malloc can be eliminated in some cases (e.g., the functions can be passed a stack-allocated struct):

int bar(void) {
    myStruct s;
    myStruct_init(&s);

    myStruct_foo(&s);

    myStruct_destroy(&s);
    return 0;
}

Also, it's necessary for the caller to know the size of the struct.

2) Hide mallocs in _init and frees in _destroy.

Advantages: shorter code, since the functions are going to be called anyway. Completely opaque structures.

Disadvantages: Can't be passed a struct allocated in a different way.

myStruct *s = myStruct_init();

myStruct_foo(s);

myStruct_destroy(s);

I'm currently leaning toward the first option; then again, I don't know much about C API design.

Tordek
  • 2
    btw i think this would be a great interview question, to compare and contrast the two designs. – frankc Jul 21 '10 at 05:01
  • 3
    Here's an article by Armin Ronacher on how to make the structures opaque but still allow customizing allocation: http://lucumr.pocoo.org/2013/8/18/beautiful-native-libraries/ – Sam Hartsfield Apr 08 '14 at 18:00

11 Answers

19

Another disadvantage of #2 is that the caller doesn't have control over how things are allocated. This can be worked around by providing an API for the client to register his own allocation/deallocation functions (like SDL does), but even that may not be sufficiently fine-grained.

The disadvantage of #1 is that it doesn't work well when output buffers are not fixed-size (e.g. strings). At best, you will then need to provide another function to obtain the length of the buffer first so that the caller can allocate it. At worst, it is simply impossible to do so efficiently (i.e. computing length on a separate path is overly expensive over computing-and-copying in one go).
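The "obtain the length first" workaround can be sketched as a two-call pattern. This is only an illustration: `greeting_format` and `greeting_alloc` are hypothetical names, and the sketch leans on the C99 guarantee that `snprintf` called with a NULL buffer and size 0 returns the length the full output would need.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical library function: writes a greeting into a caller-supplied
 * buffer. Like snprintf, it returns the number of characters the complete
 * result needs (excluding the terminating NUL), so calling it with
 * (NULL, 0) only queries the required length. */
int greeting_format(char *buf, size_t bufsize, const char *name)
{
    return snprintf(buf, bufsize, "Hello, %s!", name);
}

/* Caller side: first call asks for the length, second call fills the
 * buffer. The formatting work is done twice, which is the cost the
 * paragraph above refers to. */
char *greeting_alloc(const char *name)
{
    int len = greeting_format(NULL, 0, name);
    if (len < 0)
        return NULL;                      /* encoding error in snprintf */

    char *buf = malloc((size_t)len + 1);  /* +1 for the terminating NUL */
    if (buf != NULL)
        greeting_format(buf, (size_t)len + 1, name);
    return buf;                           /* caller frees with free() */
}
```

Here the caller keeps full control over the allocation, at the price of running the formatting logic twice.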

The advantage of #2 is that it allows you to expose your datatype strictly as an opaque pointer (i.e. declare the struct but don't define it, and use pointers consistently). Then you can change the definition of the struct as you see fit in future versions of your library, while clients remain compatible on binary level. With #1, you have to do it by requiring the client to specify the version inside the struct in some way (e.g. all those cbSize fields in Win32 API), and then manually write code that can handle both older and newer versions of the struct to remain binary-compatible as your library evolves.
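A minimal sketch of the opaque-pointer arrangement, with the header part and the private part shown in one block for brevity. The `counter` type and its functions are made-up names for illustration only.

```c
#include <stdlib.h>

/* --- what would go in the public header (hypothetical names) --- */
typedef struct counter counter;   /* incomplete type: callers cannot see
                                     its size or its fields */
counter *counter_create(void);
void     counter_increment(counter *c);
int      counter_value(const counter *c);
void     counter_destroy(counter *c);

/* --- what would stay private in the library's .c file --- */
struct counter {
    int value;   /* fields can be added or reordered in a later version
                    without recompiling callers, who only hold pointers */
};

counter *counter_create(void)
{
    counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->value = 0;
    return c;                     /* NULL on allocation failure */
}

void counter_increment(counter *c) { c->value++; }

int counter_value(const counter *c) { return c->value; }

void counter_destroy(counter *c) { free(c); }
```

Because callers never see `struct counter`, they cannot declare one on the stack or call `sizeof` on it, which is exactly the trade-off against #1.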

In general, if your structs are transparent data which will not change with future minor revision of the library, I'd go with #1. If it is a more or less complicated data object and you want full encapsulation to fool-proof it for future development, go with #2.

Pavel Minaev
  • 1
    +1 for the point about abstraction and opaque pointers - this is a big advantage as it completely decouples your implementation from the calling code – Paul R Jul 21 '10 at 05:12
  • Nice answer for having an actual discerning recommendation about when to use each method. – mtraceur Feb 08 '21 at 05:31
18

Method number 2 every time.

Why? Because with method number 1 you have to leak implementation details to the caller. The caller has to know at least how big the struct is. You can't change the internal implementation of the object without recompiling any code that uses it.

JeremyP
  • 3
    Which means #2 can be implemented as a binary-compatible interface, with minor-version API additions, enhancements, etc. not breaking client code when shipped in a .so or .dll. This answer needs more upvotes – kert Apr 02 '13 at 02:02
  • 3
    The caller does have to know the size of the object (and perhaps the alignment?), but that doesn't mean that it has to know it _statically_: you could have `myStruct_size(void)` and `myStruct_alignment(void)`. See [this question](http://stackoverflow.com/questions/26471718/dynamically-allocate-properly-aligned-memory-is-the-new-expression-on-char-arra). – Kalrish Oct 23 '14 at 09:53
  • @Kalrish Why does the caller have to know the size? I agree that *if* the caller, at any point needs to know the size, you can add the methods you suggest, but a properly designed API does not require the caller to know anything about the internals of an object - including size and alignment. – JeremyP Oct 28 '14 at 14:37
  • @JeremyP Such design makes it impossible to use, e.g., static memory, or to reuse the same memory - and memory allocation is one of the problems of statically hiding the implementation. I agree, nevertheless, that it wouldn't be pleasant to use. Perhaps, an intermediate solution would be to also implement `*_alloc(...)` methods as part of the API. That way, "lazy" users could go with dynamic allocation, and wrappers (e.g. C++) could do their own memory management. – Kalrish Oct 28 '14 at 18:38
  • 1
    @Kalrish Yes, but so what? You cannot do proper encapsulation if you insist on being able to allocate memory from the stack (for example). Objects should **always** be implemented as references and every sane OO language implements them in this way. C++ is not a sane OO language and fortunately the question is not a C++ question, so we can ignore it. – JeremyP Oct 30 '14 at 10:19
  • @JeremyP After some time, I agree with your conclusion: the object's properties must be hidden. I still have some doubts concerning the performance implications: references/pointers have to be dereferenced, dynamic allocation is slow and can fail, etc. There are intermediate solutions, like private and static memory pools, but they're difficult to maintain. I guess this is why such abstracted OO programming is rejected in performance-sensitive areas. If you have any more thoughts on this or know any further reading, please share it. – Kalrish Dec 02 '14 at 22:43
  • @Kalrish: Many implementations allow you to pass an allocator callback to the library so you don't have to use malloc, although modern mallocs tend to be pretty fast and I have never had an issue with speed. YMMV of course, especially in embedded programming. – JeremyP Dec 05 '14 at 15:49
  • A question related to the third comment, @JeremyP, six years later: supposing the library does plain array allocation, it must return or provide the array size to the user, right (for loops etc.)? – Erdem Tuna Dec 18 '20 at 17:43
  • @ErdemTuna The API should only ever expose an incomplete type to callers. i.e. the header file will just have a declaration like so `struct myStruct;` The caller only ever has a pointer and should make API calls to do everything to do with the type. The caller can have arrays of pointers, but if the library wants to give back an array of objects, it must be done though an opaque pointer and indexing and iteration must be done through the library API. – JeremyP Dec 20 '20 at 19:19
  • @JeremyP, I didn't know about opaque pointers and iterator functions over them. I did a quick search and learnt a lot from a single comment. Thanks for getting back to my question. – Erdem Tuna Dec 20 '20 at 22:13
12

Why not provide both, to get the best of both worlds?

Use _init and _terminate functions to use method #1 (or whatever naming you see fit).

Use additional _create and _destroy functions for the dynamic allocation. Since _init and _terminate already exist, it effectively boils down to:

myStruct *myStruct_create ()
{
    myStruct *s = malloc(sizeof(*s));
    if (s) 
    {
        myStruct_init(s);
    }
    return (s);
}

void myStruct_destroy (myStruct *s)
{
    myStruct_terminate(s);
    free(s);
}

If you want it to be opaque, then make _init and _terminate static and do not expose them in the API, only provide _create and _destroy. If you need other allocations, e.g. with a given callback, provide another set of functions for this, e.g. _createcalled, _destroycalled.

The important thing is to keep track of the allocations, but you have to do this anyway. You must always use the counterpart of the used allocator for deallocation.

Secure
10

My favourite example of a well-designed C API is GTK+, which uses method #2 that you describe.

Another advantage of your method #1, though, is not just that you could allocate the object on the stack, but also that you could reuse the same instance multiple times. If that's not going to be a common use case, then the simplicity of #2 is probably an advantage.

Of course, that's just my opinion :)

Dean Harding
  • Now, this is an interesting comment. I've heard many people say exactly the opposite, that GTK+ is a terrible API. I've unfortunately only used it a little; I'm usually up in the clouds of C++, using Gtkmm. I remember ref-counted pointers and _new and _free functions, however, which seems to match the 3rd option more. I'd be curious about the reasons for your opinion. – Thanatos Jul 21 '10 at 04:52
  • 2
    The general design philosophy of GLib/Gtk seems to be "we won't use C++ on principle, so we'll hand-code all the same stuff". This approach has some advantages in a sense that it's still a pure C API, which makes it easier to use with various C-only FFIs... but from a pure C/C++ perspective, it seems to be rather impractical. – Pavel Minaev Jul 21 '10 at 06:24
4

Both are functionally equivalent. But, in my opinion, method #2 is easier to use. A few reasons for preferring 2 over 1 are:

  1. It is more intuitive. Why should I have to call free on the object after I have (apparently) destroyed it using myStruct_destroy?

  2. It hides the details of myStruct from the user, who does not have to worry about its size, etc.

  3. In method #2, myStruct_init does not have to worry about the initial state of the object.

  4. You don't have to worry about memory leaks from user forgetting to call free.

If your API implementation is being shipped as a separate shared library however, method #2 is a must. To isolate your module from any mismatch in implementations of malloc/new and free/delete across compiler versions you should keep memory allocation and de-allocation to yourself. Note, this is more true of C++ than of C.

341008
4

The problem I have with the first method is not so much that it is longer for the caller; it's that the API is now handcuffed: it cannot expand the amount of memory it is using, precisely because it doesn't know how the memory it received was allocated. The caller doesn't always know ahead of time how much memory it will need (imagine if you were trying to implement a vector).

Another option you didn't mention, which is going to be overkill most of the time, is to pass in a function pointer that the API uses as an allocator. This doesn't allow you to use the stack, but does allow you to do something like replace the use of malloc with a memory pool, while still keeping the API in control of when it wants to allocate.
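A rough sketch of the allocator-callback idea follows. Everything here is hypothetical (`my_allocator`, `int_buffer`, and the counting allocator are invented for the example); a real pool allocator would replace the malloc-backed callbacks.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator interface supplied by the caller. */
typedef struct {
    void *(*alloc)(size_t size, void *ctx);
    void  (*release)(void *ptr, void *ctx);
    void  *ctx;                    /* caller state, e.g. a memory pool */
} my_allocator;

/* Hypothetical library object that remembers which allocator made it. */
typedef struct {
    my_allocator a;
    int   *data;
    size_t len;
} int_buffer;

/* The library decides when to allocate; the caller decides how. */
int_buffer *int_buffer_create(const my_allocator *a, size_t len)
{
    int_buffer *b = a->alloc(sizeof *b, a->ctx);
    if (b == NULL)
        return NULL;
    b->a = *a;
    b->len = len;
    b->data = a->alloc(len * sizeof *b->data, a->ctx);
    if (b->data == NULL) {         /* undo the first allocation */
        a->release(b, a->ctx);
        return NULL;
    }
    return b;
}

void int_buffer_destroy(int_buffer *b)
{
    if (b == NULL)
        return;
    my_allocator a = b->a;         /* copy before releasing b itself */
    a.release(b->data, a.ctx);
    a.release(b, a.ctx);
}

/* Example caller-side allocator: malloc/free plus a live-block counter
 * kept in ctx (a stand-in for a real memory pool). */
void *counting_alloc(size_t size, void *ctx)
{
    ++*(int *)ctx;
    return malloc(size);
}

void counting_release(void *ptr, void *ctx)
{
    --*(int *)ctx;
    free(ptr);
}
```

Storing the allocator inside the object is one way to make sure the matching release function is used on destruction.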

As for which method is proper API design, it's done both ways in the C standard library. strdup() and stdio use the second method, while sprintf and strcat use the first. Personally, I prefer the second method (or third) unless 1) I know I will never need to realloc and 2) I expect the lifetime of my objects to be short, so that using the stack is very convenient.

edit: There is actually one other option, and it is a bad one with a prominent precedent. You could do it the way strtok() does it, with statics. Not good, just mentioned for completeness' sake.

frankc
2

Both ways are OK. I tend to do it the first way, as a lot of the C I write is for embedded systems, where all the memory is either tiny variables on the stack or statically allocated. This way there can be no running out of memory: either you have enough at the beginning or you're screwed from the start. Good to know when you have 2K of RAM :-) So all my libraries are like #1, where the memory is assumed to be allocated.

But this is an edge case of C development.

Having said that, I'd probably go with #1 still. Perhaps using init and finalize/dispose (rather than destroy) for names.

Keith Nicholas
2

Some points to consider:

Case #1 mimics the memory allocation scheme of C++, with more or less the same benefits:

  • easy allocation of temporaries on the stack (or in static arrays or the like, to write your own struct allocator replacing malloc).
  • easy freeing of memory if anything goes wrong in init

Case #2 hides more information about the structure and can also be used for opaque structures, typically when the structure as seen by the user is not exactly the same as the one used internally by the library (say, there could be some more fields hidden at the end of the structure).

A mixed API between case #1 and case #2 is also common: a parameter is used to pass in a pointer to an existing structure, and if it is null the structure is allocated (the pointer is always returned). With such an API, freeing is usually the caller's responsibility even if init performed the allocation.
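A small sketch of such a mixed API, assuming a hypothetical `point` type whose init function takes the storage as a parameter:

```c
#include <stdlib.h>

typedef struct {
    int x;
    int y;
} point;

/* Hypothetical mixed-style init: uses the caller's storage when one is
 * passed in, allocates with malloc when p is NULL. The pointer to the
 * initialized object is always returned (NULL if allocation failed). */
point *point_init(point *p)
{
    if (p == NULL) {
        p = malloc(sizeof *p);
        if (p == NULL)
            return NULL;
    }
    p->x = 0;
    p->y = 0;
    return p;
}

/* Usage: both call styles go through the same function. */
int point_demo(void)
{
    point on_stack;
    point *a = point_init(&on_stack);  /* no allocation happens */
    point *b = point_init(NULL);       /* init allocates for us */
    if (b == NULL)
        return -1;

    int sum = a->x + b->y;
    free(b);   /* the caller frees what init allocated */
    return sum;
}
```

The caller has to remember which objects came from the NULL path, since only those may be passed to free.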

In most cases I would probably go for case #1.

kriss
1

Both are acceptable; there are trade-offs between them, as you've noted.

There are large real-world examples of both: as Dean Harding says, GTK+ uses the second method; OpenSSL is an example that uses the first.

caf
1

I would go for (1) with one simple extension, that is to have your _init function always return the pointer to the object. Your pointer initialization then may just read:

myStruct *s = myStruct_init(malloc(sizeof(myStruct)));

As you can see the right hand side then only has a reference to the type and not to the variable anymore. A simple macro then gives you (2) at least partially

#define NEW(T) (T ## _init(malloc(sizeof(T))))

and your pointer initialization reads

myStruct *s = NEW(myStruct);
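
One way to keep this pattern safe when malloc fails is to let the _init function accept a null pointer and hand it straight back, so the result of NEW can be checked like the result of malloc. The sketch below assumes that convention; the `count` field is invented for the example.

```c
#include <stdlib.h>

typedef struct {
    int count;   /* made-up field, only here so init has work to do */
} myStruct;

/* Returns its argument so it can wrap the malloc call directly.
 * A NULL argument (failed malloc) is passed through untouched. */
myStruct *myStruct_init(myStruct *s)
{
    if (s != NULL)
        s->count = 0;
    return s;
}

/* Token pasting builds the name of the matching _init function. */
#define NEW(T) (T ## _init(malloc(sizeof(T))))
```

The caller then writes `myStruct *s = NEW(myStruct);` followed by the usual `if (s == NULL)` check, and releases the object with free.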
Jens Gustedt
  • How do you handle a malloc failure? – Secure Jul 21 '10 at 06:37
  • @Secure: Good point. I think `_init` functions should be made robust to passing in a `NULL` pointer and just pass this through on return. The check for that is then left to the user of the pointer, as usual. – Jens Gustedt Jul 21 '10 at 06:43
  • The other design philosophy in this regard is that most functions should expect valid pointers (with the obvious exception of deallocators) and assert() them to not being NULL. Which would make your approach to effectively use assert for the program logic, which is a big no-go. It depends on the overall design of your program, for sure, but personally I prefer to be explicit with error handling. I.e. malloc is used separately and tested for validity before anything else is done with the pointer. – Secure Jul 21 '10 at 07:00
  • @Secure: I would tend to just extend the convention to check pointers returned by the macro `NEW`. This is only a slight extension of such a convention since you'd have to check several functions for that already, not only `malloc` but also `realloc` and `calloc` (and maybe others that I forget). – Jens Gustedt Jul 21 '10 at 07:23
0

Your method #2 reads:

myStruct *s = myStruct_init();

myStruct_foo(s);

myStruct_destroy(s);

If myStruct_init() needs to return an error code for any reason, it can instead be written this way:

myStruct *s;
int ret = myStruct_init(&s);  // int myStruct_init(myStruct **s);
if (ret != 0) {
    // handle the error (assuming 0 means success); s must not be used
}

myStruct_foo(s);

myStruct_destroy(s);
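
A possible shape for such an init function is sketched here. The error codes and the `value` field are assumptions made for the example; the struct would normally stay opaque to callers.

```c
#include <stdlib.h>

/* Defined here only so the sketch compiles on its own. */
typedef struct myStruct {
    int value;
} myStruct;

/* Hypothetical error codes; 0 means success. */
enum {
    MYSTRUCT_OK     = 0,
    MYSTRUCT_ENOMEM = 1,
    MYSTRUCT_EINVAL = 2
};

/* The object comes back through the out-parameter, which frees the
 * return value to carry a status code. */
int myStruct_init(myStruct **out)
{
    if (out == NULL)
        return MYSTRUCT_EINVAL;

    *out = NULL;                   /* defined value on every error path */
    myStruct *s = malloc(sizeof *s);
    if (s == NULL)
        return MYSTRUCT_ENOMEM;

    s->value = 0;
    *out = s;
    return MYSTRUCT_OK;
}

void myStruct_destroy(myStruct *s)
{
    free(s);                       /* free(NULL) is a no-op */
}
```

Setting `*out` to NULL up front means a caller who forgets to check the status code still ends up with a null pointer rather than an indeterminate one.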
Jeegar Patel