5

A couple of days ago I asked myself which data structure I should use in a function in C. I usually write C++, where the choice would have been std::vector.

There are some possible choices:

  • a static (big enough) array
  • a dynamic array which grows when needed (e.g. by doubling its size)
  • a hand-written list implementation, as a struct with a next pointer

The last option seems unusual. Are there any bigger projects that use their own structures such as lists? Is there a general rule for deciding between an array and a custom data structure?

If I needed a tree structure, I wouldn't think twice and would just write a tree. Are there any widely used libraries with such structures (like Boost for C++)? Or is this considered bad style because you would have to store a void* instead of the actual type?

Thanks a lot for sharing your experience!

tgmath

7 Answers

6

Different data structures offer different computational complexity for insertions, lookups and other operations.

To take your specific example, there are a number of differences between an array and a linked list:

  • lookup by index is O(1) in an array and O(n) in a list;
  • insertions into a list are O(1) or O(n) (depending on whether they require traversal), and into an array are amortized O(1) if done well;
  • deletions from an array are O(n) whereas certain types of list deletions can be done in O(1) time;
  • arrays offer better locality of reference and consequently better cache performance.
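
The lookup difference in the first bullet can be sketched as follows. This is a minimal illustration, not code from any particular library: `struct node`, `array_get` and `list_get` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type for a singly linked list of ints. */
struct node {
    int value;
    struct node *next;
};

/* Array lookup: a single address computation, O(1). */
int array_get(const int *items, size_t count, size_t index)
{
    assert(index < count);
    return items[index];
}

/* List lookup: follow `index` links starting at the head, O(n). */
int list_get(const struct node *head, size_t index)
{
    while (index-- > 0) {
        assert(head != NULL);   /* index must be inside the list */
        head = head->next;
    }
    assert(head != NULL);
    return head->value;
}
```

The array version touches one memory location, while the list version touches every node before the requested one, and those nodes may be scattered across the heap.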

You might find the following page useful: http://essays.hexapodia.net/datastructures/

As a general rule, when choosing a data structure I first consider whether I have strong reasons to believe that performance of the code in question is going to be important:

  • if I don't, I choose the simplest data structure that would do the job, or the one that lends itself to the clearest code;
  • if I do, I think carefully about what operations I am going to perform on the structure and choose accordingly, possibly followed by profiling.

As for recommendations for good C data structure libraries, take a look at Are there any open source C libraries with common data structures?

NPE
  • Thanks a lot. The answer that worked best for me: GLib. Normally it's easy for me to pick the right structure, but I wasn't aware of GLib. In pure C without any library I would often choose an array, because it is easy to get working correctly and the code in question isn't a bottleneck. – tgmath Apr 08 '11 at 15:09
  • Can you explain how insertion into an array can be O(1)? I can only imagine that for an unrolled linked list or something like it, not for a "pure" array. – S.J. Apr 08 '11 at 15:34
  • @S.J.: The key word there is *amortized*. What it means is that `n` insertions into an array can be done in `O(n)` total time. This doesn't imply that each insertion is `O(1)`. See, for example, http://stackoverflow.com/questions/200384/constant-amortized-time – NPE Apr 08 '11 at 15:45
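
The amortized bound discussed in the comments above can be made concrete with a small sketch of a doubling array. The names `struct int_vec`, `int_vec_push` and `int_vec_free` are hypothetical, and the `copies` field exists only to count how many elements growth may have moved; a real implementation would not need it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of ints. */
struct int_vec {
    int *items;
    size_t count;     /* elements in use */
    size_t capacity;  /* elements allocated */
    size_t copies;    /* elements possibly moved by growth (illustration only) */
};

/* Append one value; returns 0 on success, -1 if allocation fails. */
int int_vec_push(struct int_vec *v, int value)
{
    assert(v != NULL);
    if (v->count == v->capacity) {
        /* Full: double the capacity (or start at 1). */
        size_t new_cap = v->capacity ? v->capacity * 2 : 1;
        int *grown = realloc(v->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;              /* old buffer is still valid */
        v->copies += v->count;      /* worst case: realloc moved every element */
        v->items = grown;
        v->capacity = new_cap;
    }
    v->items[v->count++] = value;
    return 0;
}

/* Release the buffer and reset the vector to the empty state. */
void int_vec_free(struct int_vec *v)
{
    free(v->items);
    v->items = NULL;
    v->count = v->capacity = v->copies = 0;
}
```

Starting from a zero-initialized `struct int_vec`, pushing 1000 values triggers 11 growth steps and at most 1 + 2 + 4 + ... + 512 = 1023 element copies. The total work stays below 2n even though an individual push that triggers growth costs O(n).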
1

It completely depends on which operations you are going to perform on the data structure. If you will be retrieving data by index (e.g., data[ 3 ]), then a list is a horrible idea, since each read requires walking the list. If you will be inserting at the first position a lot (i.e., putting x in front of data[ 0 ]), then an array will be terrible, because every insertion has to move all the existing data.
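
A rough sketch of the front-insertion case, assuming a plain int array with spare room and a singly linked list; `array_push_front`, `list_push_front` and `list_free` are hypothetical helper names:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type for a singly linked list of ints. */
struct node {
    int value;
    struct node *next;
};

/* Insert at the front of an array that has room for count + 1 ints.
 * Every existing element is shifted one slot to the right: O(n). */
void array_push_front(int *items, size_t count, int value)
{
    assert(items != NULL);
    memmove(items + 1, items, count * sizeof *items);
    items[0] = value;
}

/* Insert at the front of a list: O(1), nothing is moved.
 * Returns the new head, or NULL if allocation fails (list unchanged). */
struct node *list_push_front(struct node *head, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->value = value;
    n->next = head;
    return n;
}

/* Free every node of the list. */
void list_free(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}
```

The array version pays for a `memmove` over all existing elements on each call, whereas the list version pays for one allocation regardless of the list's length.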

If you were going to use std::vector, then a dynamic array is probably the best replacement. But perhaps std::vector would not have been the correct choice.

William Pursell
0

Dynamic linked lists are widely used in C when the number of items to be stored is not known in advance.

Rumple Stiltskin
0

A vector is a dynamic array and is implemented internally as one. I would suggest using a vector (that is, a dynamic array), since machines are optimized for accessing contiguous memory locations, whereas the nodes of a linked list are not stored contiguously.

Having said that, if your use case doesn't require fast retrieval of elements by index, you can go for a linked list as well. A linked list also has the benefit of not over-allocating space the way a dynamic array does (by doubling when it is about to get full), and insertion at the start or between elements is cheaper than in an array.

deovrat singh
0

Each has its own advantages and disadvantages.

  • Static arrays: perfect for lookup tables and data that doesn't change its size throughout the program execution. They get allocated on program startup, so you don't have to manage this memory in any way. A disadvantage is that you cannot resize or free a static array - it stays there until the program terminates.

  • Dynamically growing arrays: an easy-to-manage data structure that can almost be a substitute for std::vector in C++. A disadvantage is the overhead of allocating memory at runtime, but that can be alleviated by allocating bigger chunks at once.

  • Linked lists: take care if you are going to have a lot of elements because allocating each one separately without using a memory pool can lead to memory waste and fragmentation.

Blagovest Buyukliev
0

In pure C I mostly use static arrays. That comes from programming embedded devices, where allocating and freeing memory causes problems such as heap fragmentation.

But if I need a list, I implement one myself. Sometimes its implementation is based on static arrays (again).
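
One way such a static-array-backed list might look is sketched below. This is an assumption about the general technique, not the answerer's actual code: the nodes live in fixed arrays, links are indices instead of pointers, and unused slots are chained into a free list. `POOL_SIZE`, `NIL` and the `pool_list_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 8   /* hypothetical capacity, fixed at compile time */
#define NIL (-1)      /* marks "no node" */

struct pool_list {
    int value[POOL_SIZE];
    int next[POOL_SIZE];  /* index of the following node, or NIL */
    int head;             /* first node of the list, or NIL */
    int free_head;        /* first unused slot, or NIL */
};

/* Chain all slots into the free list; the list itself starts empty. */
void pool_list_init(struct pool_list *l)
{
    assert(l != NULL);
    for (int i = 0; i < POOL_SIZE; i++)
        l->next[i] = (i + 1 < POOL_SIZE) ? i + 1 : NIL;
    l->head = NIL;
    l->free_head = 0;
}

/* Prepend a value; returns 0 on success, -1 when the pool is exhausted. */
int pool_list_push_front(struct pool_list *l, int value)
{
    int slot = l->free_head;
    if (slot == NIL)
        return -1;
    l->free_head = l->next[slot];  /* take the slot off the free list */
    l->value[slot] = value;
    l->next[slot] = l->head;
    l->head = slot;
    return 0;
}

/* Remove the first node and store its value in *out;
 * returns 0 on success, -1 if the list is empty. */
int pool_list_pop_front(struct pool_list *l, int *out)
{
    int slot = l->head;
    if (slot == NIL)
        return -1;
    *out = l->value[slot];
    l->head = l->next[slot];
    l->next[slot] = l->free_head;  /* give the slot back to the free list */
    l->free_head = slot;
    return 0;
}
```

Because no call ever touches the heap, the list cannot fragment it; the price is a hard upper bound on the number of elements.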

I think there are libraries for C that offer decent implementations of more complex data structures. AFAIK GLib is widely used and offers at least linked lists.

smbear
0

I remember I read a document from Apple some time ago saying that such a thing could be naively implemented with a structure similar to:

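#include <stdlib.h>

typedef struct {
    void **data; // slots holding object pointers; a type other than void is easier
    int length;  // number of allocated slots
} MyArray;

MyArray *MyArrayCreate(void){
    // allocate an empty array; returns NULL if malloc fails
    MyArray *array = malloc(sizeof *array);
    if (array){ array->data = NULL; array->length = 0; }
    return array;
}
void MyArrayRelease(MyArray *array){
    if (array) free(array->data);
    free(array);
}

Then implement a function that checks the length of the array and, if it is not long enough, allocates another, big enough array, copies the former data into it and adds the new data.

int MyArrayInsertAt(MyArray *array, int index, void *object){
    if (index < 0) return -1;
    if (array->length <= index){
        // reallocate (realloc copies the old data) and update length
        int newLength = (array->length * 2 > index) ? array->length * 2 : index + 1;
        void **grown = realloc(array->data, newLength * sizeof *grown);
        if (!grown) return -1;
        for (int i = array->length; i < newLength; i++) grown[i] = NULL;
        array->data = grown;
        array->length = newLength;
    }
    array->data[index] = object;
    return 0;
}

So it can be used like so:

MyArray *array = MyArrayCreate();
MyArrayInsertAt(array, 5, something);
MyArrayRelease(array);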

The MyArrayInsertAt() function won't win a prize for its performance, but it could be a solution for less demanding programs/applications.

I just can't find the link... maybe someone read it too?

ADDED: I have found that the GNU NSMutableArray (Objective-C) methods are implemented in C and do the same as above. They double the size of the array each time an object has to be added and won't fit in the array. See line 131 through 145

nacho4d