
I'm not sure if this has been asked before, so I'll give it a try.

I have code that loads a large list of clients (200k clients). Every client is stored in a (currently) fixed-size struct that contains their name, address and phone number, as follows:

struct client {
    char name[80];
    char address[80];
    char phonenumber[80];
};

As you can see, the size of this struct is 240 bytes, so 200k clients take 48 MB of memory. The obvious advantages of such a structure are the ease of management and the ability to keep a "free-list" for recycling clients. However, if tomorrow I needed to load 5M clients, this would grow to 1.2 GB of RAM.

Now, obviously in most cases the client's name, address and phone number take much less than 80 bytes each, so instead of the above structure I thought of using a structure like the following:

struct client {
    char *name;
    char *address;
    char *phonenumber;
};

I would then have name, address and phonenumber point to dynamically allocated buffers of exactly the size needed to store each piece of information.

I suspect, however, that as more clients are loaded this way, the number of new[] and delete[] calls would increase greatly. My question is whether this can hurt performance at some point, for example if I suddenly want to delete 500k of the 1M clients and replace them with 350k different clients.

In other words: after I have allocated 1M small "variable length" buffers, then deleted many of them, and then make new allocations that should recycle the freed space, won't the allocator incur some overhead finding suitable free blocks?

The answer is that there is some overhead (both in per-allocation CPU cycles and in per-allocation book-keeping memory) to making many small dynamic allocations and deallocations. How much overhead depends a lot on how your runtime's memory heap is implemented; however, most modern/popular runtimes have heap implementations that have been optimized to be quite efficient. There are articles describing how various OSes' heaps are implemented that you can read to get an idea of how they work.

In a modern heap implementation, your program probably won't "hit the wall" and grind to a halt when there are "too many" heap allocations (unless your computer actually runs out of physical RAM, of course), but it will use up proportionally more RAM and CPU cycles than a comparable program that doesn't require so many.

Given that, using a zillion tiny memory allocations is probably not the best way to go. In addition to being less than optimally efficient (since every one of those tiny allocations will require a separate block of book-keeping bytes to keep track of), lots of tiny allocations can lead to memory fragmentation problems (which are less of an issue on modern 64-bit systems with virtual memory, but still something to consider), as well as being difficult to manage correctly (it's easy to end up with memory leaks or double-frees if you are doing your allocations manually).

As others have suggested in the comments, calling new and delete explicitly is discouraged in C++; it's almost always better to use higher-level data structures (e.g. std::string, std::map, std::vector, etc., or even a proper database layer instead). Doing it that way means a lot of the difficult design work has already been done for you, saving you the pain of re-discovering and re-solving problems that others have already dealt with. For example, most std::string implementations already use the short-string optimization, which allows strings shorter than a certain number of bytes to be stored without a separate heap allocation. That is similar to the tradeoff you are trying to make in your own design, except you get the optimization "for free", when appropriate, simply by using std::string to store your string data.
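To see whether the short-string optimization applies on a given toolchain, one rough check is to test whether a string's character buffer lies inside the std::string object itself. This is only a sketch: the helper name stored_inline is made up for illustration, and both the inline capacity and whether the optimization exists at all are implementation details that the C++ standard does not guarantee.

```cpp
#include <functional>
#include <string>

// Hypothetical helper (not part of the standard library).
// Returns true when the character buffer of `s` lies inside the
// std::string object itself, meaning no separate heap block holds it.
bool stored_inline(const std::string& s) {
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end = obj_begin + sizeof(std::string);
    const char* data = s.data();
    // std::less gives a well-defined total order over unrelated pointers,
    // which the built-in < operator does not guarantee.
    std::less<const char*> before;
    return !before(data, obj_begin) && before(data, obj_end);
}
```

Calling stored_inline(std::string("Ann")) will likely return true on common standard libraries, while a 200-character string cannot fit inside the object and has to live on the heap.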

is there a limit to how many new[] & delete[] allocations are allowed before program becomes inefficient?

Even a single allocation will make the program less time efficient compared to a program that doesn't do that allocation, assuming that allocation isn't needed. The inefficiency scales (at least) linearly with the number of allocations (depending on the implementation of the allocation function).

There is no objective limit for when a program is efficient and when it is inefficient. If you're writing a program with a hard real-time requirement, then you have a limit for when your program is too inefficient, but for other programs (which is most programs) there is no objective limit for when a program is too inefficient either. Generally, if your program takes too long to execute, the user can perceive it as inefficient. "Too long" is subjective to whoever is using the program.

A better solution than what you suggest is to use std::string members. The size of a std::string may be some multiple of the pointer size (~4, depending on the implementation), but (assuming a decent implementation) it does its magic and avoids dynamic allocation when the string fits within that space. This saves a ton of time compared to a separate allocation for each string, and a ton of space compared to the in-place arrays. Even more importantly, it doesn't require error-prone manual memory management.
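A minimal sketch of that layout, assuming the clients live in a std::vector. The names Client and remove_if_name_starts_with are invented for this example, and the prefix filter merely stands in for whatever rule decides which clients get deleted.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Same three fields as in the question, but each one manages its own storage.
struct Client {
    std::string name;
    std::string address;
    std::string phonenumber;
};

// Removes every client whose name starts with `prefix` and returns how many
// were removed. No delete[] is needed: each std::string frees its own buffer
// (if it has one) when its Client is destroyed.
std::size_t remove_if_name_starts_with(std::vector<Client>& clients,
                                       const std::string& prefix) {
    const std::size_t before = clients.size();
    clients.erase(
        std::remove_if(clients.begin(), clients.end(),
                       [&prefix](const Client& c) {
                           return c.name.compare(0, prefix.size(), prefix) == 0;
                       }),
        clients.end());
    return before - clients.size();
}
```

Replacement clients can then be appended with push_back or emplace_back; the vector reuses its existing capacity before it asks the allocator for more.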

The most memory-efficient way to store your list of clients is a single massive array of char in which all the strings are stored consecutively. You can use a pointer to the first string of a client to mark where that client begins. If you don't want to do a linear search for a specific member, you can use a struct of pointers like the one in your question, but have the pointers point into this single array instead of into separate allocations.
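One way to sketch the single-array idea is shown below. It deliberately stores 32-bit offsets rather than raw pointers, which is a substitution on my part: a growing std::vector may reallocate and would invalidate pointers into it, and offsets also take less space than 64-bit pointers. The names ClientPool and ClientRef are hypothetical, and the sketch assumes strings without embedded null characters and a pool smaller than 4 GiB.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Offsets of one client's three null-terminated strings inside the pool.
struct ClientRef {
    std::uint32_t name;
    std::uint32_t address;
    std::uint32_t phonenumber;
};

class ClientPool {
public:
    // Appends the three strings back to back and returns the client's index.
    std::size_t add(const std::string& name, const std::string& address,
                    const std::string& phonenumber) {
        // Braced initializers are evaluated left to right.
        ClientRef ref{append(name), append(address), append(phonenumber)};
        refs_.push_back(ref);
        return refs_.size() - 1;
    }

    // The returned pointers stay valid only until the next call to add().
    const char* name(std::size_t i) const { return &chars_[refs_.at(i).name]; }
    const char* address(std::size_t i) const { return &chars_[refs_.at(i).address]; }
    const char* phonenumber(std::size_t i) const { return &chars_[refs_.at(i).phonenumber]; }

    std::size_t size() const { return refs_.size(); }
    std::size_t chars_used() const { return chars_.size(); }

private:
    // Copies `s` plus a terminating '\0' to the end of the pool and
    // returns the offset at which the copy starts.
    std::uint32_t append(const std::string& s) {
        const std::size_t offset = chars_.size();
        assert(offset + s.size() + 1 <= std::numeric_limits<std::uint32_t>::max());
        chars_.insert(chars_.end(), s.begin(), s.end());
        chars_.push_back('\0');
        return static_cast<std::uint32_t>(offset);
    }

    std::vector<char> chars_;     // every string, stored consecutively
    std::vector<ClientRef> refs_; // one fixed-size record per client
};
```

The tradeoff is that deleting or editing a client in place is awkward: the freed bytes in the middle of the array cannot be reused without compacting it, so this layout suits data that is mostly loaded once and then read.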

