1

I am trying to find out if I can allocate additional memory for additional members for a dynamic array in C++. The code below is stripped down to just essential stuff for simplicity's sake.

What I am basically trying to do is read elements into arr in someClass from a file, but I want the file not to have to specify how many elements it contains (this is essential for the full project I am working on). I did of course consider allocating a new array of size+1, copying all the current elements into it, and then reading a new element, but that seems like a brutal waste of processing time and memory if it is possible to simply allocate the next memory address and use it for my array. Is there a way to do that in C++?

To summarize, I want to read an element, allocate a new memory address for my array, and then read another element into that newly allocated memory address. Repeat until the file ends.

I won't bore you with the reasons, but using a simple std::vector is not an option.

Here is the code:

class someClass {
public:
    int *arr;
    void Read(ifstream&,int&);
};

void someClass::Read(ifstream &inFile, int &size) {
    arr = new int[0];
    inFile.open("input.txt");

    int index = 0;
    int element;
    while (!inFile.eof()) {
        inFile >> element;
        *(arr + index) = element;
        index ++;
    }

    size = index;
    inFile.close();
}

int main() {

    int size;
    someClass a;
    ifstream inFile;

    a.Read(inFile,size);
    //obviously unnecessary, just for testing
    for(int i = 0; i < size; i ++) {
        cout << a.arr[i] << " ";
    }
    cout << endl;
}
Ahmad Khan
  • 6
    Use a `std::vector`. – tkausl Oct 04 '16 at 08:19
  • If you want to use `new` then you have to do this before reading into the space... – M.M Oct 04 '16 at 08:20
  • Thanks but I thought of that, of course. For reasons that are not important for this discussion, I can't use that. – Nikolay Stoynov Oct 04 '16 at 08:21
  • 2
    You will probably need to resort to C-style memory management with [`realloc` *et al*](https://linux.die.net/man/3/realloc). – Paul R Oct 04 '16 at 08:25
  • 1
    You could implement a linked list. With linked list, you'll have to allocate just the next element, but you'll have to sacrifice with the random access. – Ahmad Khan Oct 04 '16 at 08:32
  • 3
    "I won't bore you with the reasons, but using a simple std::vector is not an option." - I don't think you have *valid* reasons not to use std::vector. The suggestions you are getting boil down to *re-implement a subset of std::vector* – Caleth Oct 04 '16 at 09:06
  • Well, I do, my friend. I strive to obtain a very good understanding of dynamic memory allocation. I do not think using a predefined method is the best way of going about doing that. – Nikolay Stoynov Oct 04 '16 at 09:24
  • 1
    So look up malloc() and realloc() and learn how to do it in C with no complexity of automatic construction, destruction or templates. Then when you understand that, you'll be able to see how C++ extends C functionality to create a std::vector. Writing your own std::vector lookalike is a good exercise. – Malcolm McLean Oct 04 '16 at 09:53
  • 1
    "I won't bore you with the reasons"... to reject valid suggestions. Then I won't bore you with a valid solution. Your "reasons" may have much easier work-arounds than reinventing the `std::vector` wheel. – Matt Oct 04 '16 at 13:55

5 Answers

1

I liked the question and did some experiments myself, using the MSVC14 compiler (optimizations disabled).
C++11/14 has the following sequence containers (intentionally excluding dynarray, which was proposed for C++14):

  1. No dynamic resizing (up to the programmer to allocate and deallocate)
    • Raw array (e.g. int arr[])
    • array (e.g. array<int, size>{...})
  2. With dynamic resizing
    • vector (contiguous memory allocation)
    • list (doubly-linked list)
    • forward_list (singly-linked list)
    • deque (double-ended queue)

Let me start with your questions,

the solution of allocating a new array with a size+1, and then copying all the current elements into it and reading a new element, but I find that this would be a brutal waste of processing time and memory

You are right, but to mitigate the overhead: when the memory you allocated turns out to be too small, you allocate a new, larger block, copy the previous data over, and then free the previously allocated block.
But wait! How much should you allocate (size+1 is bad)? Each time you are forced to allocate a bigger chunk of memory, you should allocate at least twice the size you already have, so that you reduce the number of future reallocations; reallocation is considered an extremely expensive operation.

if it is possible to simply allocate the next memory address and use it for my array. Is there a way to do that in C++?

It's not totally in your control, as the C++ runtime implements the memory management functions. Where your newly allocated memory will be is not up to you; sometimes the newly allocated block happens to have the same base address as the previous one, depending on the runtime and on memory fragmentation.

I got some benchmarks using malloc and realloc functions borrowed from C. Here is the code:

    auto start = chrono::steady_clock::now();

    auto partialSize = 100;
    auto arr = (int *) malloc(sizeof(int) * partialSize);


    for (auto i = 0; i < SIZE; i++) {
        arr[i] = i;
        if (i == partialSize - 1) {
            partialSize = partialSize << 1; // for 2X
            arr = (int *) realloc(arr, sizeof(int) * partialSize);
        }
    }

    auto duration = chrono::steady_clock::now() - start;

    free(arr);

    cout << "Duration: " << chrono::duration_cast<chrono::milliseconds>(duration).count() << "ms" << endl;

Results (for insertion of 100,000,000 integers; time is avg. of 3 runs):

  • Start Size = 100, Increment Steps = 1.5X, Time(s) = 1.35s
  • Start Size = 100, Increment Steps = 2X, Time(s) = 0.65s
  • Start Size = 100, Increment Steps = 4X, Time(s) = 0.42s
  • Start Size = 10,000, Increment Steps = 1.5X, Time(s) = 0.96s
  • Start Size = 10,000, Increment Steps = 2X, Time(s) = 0.79s
  • Start Size = 10,000, Increment Steps = 4X, Time(s) = 0.51s

    Another case is using C++'s new keyword and checking for relocation:

    auto start = chrono::steady_clock::now();
    auto partialSize = 100;
    auto arr = new int[partialSize];
    
    
    for (auto i = 0; i < SIZE; i++) {
        arr[i] = i;
        if (i == partialSize - 1) {
            auto newArr = new int[partialSize << 2]; // for 4X
            std::copy_n(arr, partialSize, newArr);   // copy the old elements
            delete[] arr;                            // free the old block
            partialSize = partialSize << 2;
            arr = newArr;
        }
    }
    
    auto duration = chrono::steady_clock::now() - start;
    
    delete[] arr;
    
    cout << "Duration: " << chrono::duration_cast<chrono::milliseconds>(duration).count() << "ms" << endl;  
    

Results (for insertion of 100,000,000 integers; time is avg. of 3 runs):

  • Start Size = 100, Increment Steps = 1.5X, Time(s) = 0.63s
  • Start Size = 100, Increment Steps = 2X, Time(s) = 0.44s
  • Start Size = 100, Increment Steps = 4X, Time(s) = 0.36s
  • Start Size = 10,000, Increment Steps = 1.5X, Time(s) = 0.65s
  • Start Size = 10,000, Increment Steps = 2X, Time(s) = 0.52s
  • Start Size = 10,000, Increment Steps = 4X, Time(s) = 0.42s

For the rest (dynamic resizable containers):

auto start = chrono::steady_clock::now();

//auto arr = vector<int>{};
//auto arr = list<int>{};
//auto arr = new std::array<int, SIZE>{};
//auto arr = new int[SIZE];
//auto arr = deque<int>{};
auto arr = forward_list<int>{};

for (auto i = 0; i < SIZE; i++) {
    arr.push_front(i);
    // arr.push_back(i)
}

auto duration = chrono::steady_clock::now() - start;

cout << "Duration: " << chrono::duration_cast<chrono::milliseconds>(duration).count() << "ms" << endl;

Results (for insertion of 100,000,000 integers; time is avg. of 3 runs):

  1. vector: Time(s) = 2.17s
  2. list: Time(s) = 10.31s
  3. array (no reallocation): Time(s) = N/A; Error: compiler is out of heap.
  4. raw int array (no reallocation): Time(s) = 0.22s
  5. deque: Time(s) = 3.47s
  6. forward_list: Time(s) = 8.78s

Hope it helps.

Ahmad Siavashi
  • 1
    And what's the time for `vector` using `push_back` instead of `push_front`? Have you tried _enabling_ optimizations? – Bob__ Oct 04 '16 at 12:56
  • @Bob__ The reported time belongs to `push_back` for `vector`. `vector` does not have a `push_front` method. – Ahmad Siavashi Oct 04 '16 at 13:16
  • Sorry to bother you again, it will probably never happen, but if `newArr == arr` in the second snippet you still `delete[] arr;`. – Bob__ Oct 04 '16 at 13:30
  • @Bob__ oh, you are right. Let me see what has happened! – Ahmad Siavashi Oct 04 '16 at 13:47
  • @Bob__ As you said it would never happen. I probably have thought it is the `realloc` function. Thank you for the comment. The results would be roughly accurate yet as it's probably just 2 or 3 assembly instructions and a jump that has not happened. – Ahmad Siavashi Oct 04 '16 at 13:54
0
arr = new int[1];
int capacity = 1, size = 0;
inFile.open("input.txt");

int element;
while (inFile >> element) {
    if (size == capacity){
        capacity *= 2;
        int * newbuf = new int[capacity];
        std::copy_n(arr, size, newbuf);
        delete[] arr;
        arr = newbuf;
    }
    arr[size] = element;
    size++;
}

inFile.close();

You can simulate what a std::vector does. Double the capacity every time it gets full.

nishantsingh
0

You will need to allocate new space for the bigger array and move the old values over:

void resize() {
    size_t newSize = size * 2;
    int* newArr = new int[newSize];
    memcpy( newArr, arr, size * sizeof(int) );
    size = newSize;
    delete [] arr;
    arr = newArr;
}
Ahmad Khan
  • Thank you. Is the complexity of this method closer to that of just allocating the next memory address (which I am not sure is possible), or copying the elements from one array to another with a loop? In other words, how fast is memcpy? – Nikolay Stoynov Oct 04 '16 at 08:29
  • It's faster, but more prone to coding mistakes, so be careful. It is not allocating the next locations; you can verify the new location by printing the `arr` pointer value before and after the resize. – Ricardo Ortega Magaña Oct 04 '16 at 08:33
  • @NikolayStoynov, memcpy is O(N). In practice, it's one of the most heavily optimized functions in the standard library so you shouldn't worry about its performance. –  Oct 04 '16 at 08:38
  • 1
    `realloc` may have some optimizations behind the scenes that mitigate the cost of copying. For example if the memory after the allocated block happens to be free. Or the OS makes it part of a paging operation. – Spencer Oct 04 '16 at 12:55
0

I will propose a different solution, in which you do not need to copy the previously read content of the file every time. The idea is to use a linked list in which every element has a size equal to double that of its predecessor (another possibility is to make it grow like a Fibonacci series). In this way, allocations become rarer and rarer as the size of the file grows and, if you need to, you can free memory from the beginning of the file by freeing the first elements in the list. Of course, you pay more while reading, since access is not sequential. Here is an example illustrating the idea:

struct buffer_list
{
    void append_next_chunk(size_t size, char * buff)
    {
        if(buffer == nullptr) {
            buffer = buff;
            local_size = size;
            return;
        }
        if(next == nullptr) next = new buffer_list();
        next->append_next_chunk(size, buff);
    }


    char read(int offset)
    {
        if(offset >= local_size) return next->read(offset-local_size);
        return buffer[offset];
    }
    buffer_list * next = nullptr;
    char *buffer = nullptr;
    size_t local_size = 0;
    ~buffer_list()
    {
        delete[] buffer;
        delete next;
    }
};


struct custom_vector
{
    custom_vector(const size_t size) {
        write_ptr = new char[size];
        inner_list.append_next_chunk(size, write_ptr);
        total_size = size;
        last_created_size = size;
    }


    void push_back(char c){
        if(written_size == total_size)
        {
            last_created_size *= 2;
            write_ptr = new char[last_created_size];
            write_offset = total_size;
            inner_list.append_next_chunk(last_created_size, write_ptr);
            total_size += last_created_size;
        }
        write_ptr[written_size - write_offset] = c;
        written_size++;
    }

    char read(int offset)
    {
        return inner_list.read(offset);
    }

    size_t size() { return written_size; }

    char * write_ptr = nullptr;
    buffer_list inner_list;
    size_t written_size = 0;
    size_t total_size = 0;
    size_t write_offset = 0;
    size_t last_created_size = 0;
}; 

On my machine the custom_vector performs way better than std::vector on write operations, while a big penalty is paid while reading. However, I think that some optimizations for sequential reads could easily be implemented to solve the issue.

  • You can pack the linked list into one big vector (or your own analogue) when you're done loading. Whether that saves time or not depends on how much random access reading you're doing. – Spencer Oct 04 '16 at 12:49
0

You can read the number of elements in the file by counting the delimiters and then size the array to that. For example, let's assume that your file is delimited by lines as:

1
2
3
4
5

You can count the number of lines in the file with the appropriate line separator. On Linux it can be done by:

int elemCount = std::count(std::istreambuf_iterator<char>(inFile),
    std::istreambuf_iterator<char>(), '\n');
inFile.clear();
inFile.seekg(0, std::ios::beg);

Then you can allocate the array as:

arr = new int[elemCount];

If your file is space or tab delimited, then change '\n' to ' ' or '\t' as appropriate. You can then read in your information as before. You may need to add or subtract 1 depending on your delimiter and how the file is built. This is also a little risky, as empty rows, doubled delimiters, etc. could throw off the count. If I did this I would fill the array with some default value and then remove the defaults after reading, to be sure all my values were good. This would require one resize after all the reading is done.

Matt