43

There are a lot of questions which suggest that one should always use a vector, but it seems to me that a list would be better for the scenario, where we need to store "the last n items"

For example, say we need to store the last 5 items seen: Iteration 0:

3,24,51,62,37,

Then at each iteration, the item at index 0 is removed, and the new item is added at the end:

Iteration 1:

24,51,62,37,8

Iteration 2:

51,62,37,8,12

It seems that for this use case, for a vector the complexity will be O(n), since we would have to copy n items, but in a list, it should be O(1), since we are always just chopping off the head, and adding to the tail each iteration.

Is my understanding correct? Is this the actual behaviour of an std::list ?

manlio
  • 16,658
  • 13
  • 67
  • 107
Rahul Iyer
  • 17,034
  • 15
  • 76
  • 159
  • 4
    Check this out... it explains when to choose which one: http://stackoverflow.com/questions/2209224/vector-vs-list-in-stl – Pushpendra May 18 '17 at 05:18
  • 2
    This talk by Bjarne Stroustrup is useful: https://www.youtube.com/watch?v=YQs6IC-vgmo – Galik May 18 '17 at 05:29

10 Answers10

95

Neither. Your collection has a fixed size and std::array is sufficient.

The data structure you implement is called a ring buffer. To implement it you create an array and keep track of the offset of the current first element.

When you add an element that would push an item out of the buffer - i.e. when you remove the first element - you increment the offset.

To fetch elements in the buffer you add the index and the offset and take the modulo of this and the length of the buffer.

chwarr
  • 5,726
  • 1
  • 24
  • 52
Taemyr
  • 3,201
  • 13
  • 25
  • 20
    +1 A circular buffer is a completely ideal implementation for a queue that has fixed maximum size (unfortunately there is no standard library container for this). – manlio May 18 '17 at 08:52
  • 3
    Although I can implement one myself, I think this answer would be more useful to others if there was a simple implementation included. – Rahul Iyer May 18 '17 at 09:13
  • 1
    @KaizerSozay you can always suggest an edit with your implementation if you want – NieDzejkob May 18 '17 at 15:23
  • @NieDzejkob yeah I haven't done it yet, as I am testing using a deque, so it will be a while before I get back to this, but I Imagine this is the correct answer. Although you have to implement it yourself, I imagine performance wise you can't go wrong here. – Rahul Iyer May 18 '17 at 15:31
  • 8
    @KaizerSozay: Use `boost::circular_buffer`. – GManNickG May 18 '17 at 17:00
  • Btw, in typical implementations, does the index ever get reset to 0? And if not, doesn't that risk overflow in long running programs? – Dmiters May 18 '17 at 18:29
  • 1
    The index is reset in the last line, but it could maybe use more emphasis as it's important: "add the index and the offset and take the *modulo* of this and the length of the buffer". – user3067860 May 18 '17 at 18:51
  • 1
    @DmitryNarkevich As user3067860 already mentioned, the index will be reset. That is in fact the whole point of a circular buffer. The memory is static, so you move your view of the memory to make it look like it was an actual circle. – sbecker May 19 '17 at 06:45
  • @DmitryNarkevich The index is passed into the class when you do a lookup - so there is no context in which it makes sense to reset it. The offset should be reset - but because you do a modulo whenever you make a lookup failure to reset shoul not cause a buffer overflow - if you don't reset you could hit undefined behavior when the datatype overflows. – Taemyr May 19 '17 at 15:56
  • @user3067860 Oh I see, I understood that part as the modulo only being taken while doing lookups and not being stored in the index again. Thanks – Dmiters May 19 '17 at 16:56
33

std::deque is a far better option. Or if you had benchmarked std::deque and found its performance to be inadequate for your specific use, you could implement a circular buffer in a fixed size array, storing the index of the start of the buffer. When replacing an element in the buffer, you would overwrite the element at the start index, and then set the start index to its previous value plus one modulo the size of the buffer.

List traversal is very slow, as list elements can be scattered throughout memory, and vector shifting is actually surprisingly fast, as memory moves on a single block of memory are quite fast even if it is a large block.

The talk Taming The Performance Beast from the Meeting C++ 2015 conference might be of interest to you.

David Scarlett
  • 2,831
  • 2
  • 8
  • 26
  • Why is list traversal slow ? If each node has a pointer to the next node, shouldn't it be fast regardless of where in memory it is ? – Rahul Iyer May 18 '17 at 05:17
  • 3
    [Locality of reference](https://en.wikipedia.org/wiki/Locality_of_reference). Arrays (and therefore vectors) have it; linked lists don't. When you access an element in an array, adjacent elements get pulled into the CPU cache along with it, so they can be accessed more quickly. Linked lists are more likely to cause cache misses. – Wyzard May 18 '17 at 05:19
  • @Wyzard then do deques have this ? – Rahul Iyer May 18 '17 at 05:27
  • @KaizerSozay : No, `std::deque` does not store it's contents in a contiguous block like `std::vector`. Frankly, I can't remember the last time I used anything other than `std::vector` for simple collections. – nakiya May 18 '17 at 05:30
  • 3
    Yes, `std::deque` uses arrays — though more than one. Quoting [cppreference](http://en.cppreference.com/w/cpp/container/deque): "As opposed to `std::vector`, the elements of a deque are not stored contiguously: typical implementations use a sequence of individually allocated fixed-size arrays". That's still a lot better than having every individual item allocated separately in a different place. – Wyzard May 18 '17 at 05:30
  • I am confused then. For keeping track of the last n items (and iterating through them), would I be better off with a vector, list or deque. It seems one has to weigh the costs of vector shifting, vs list traversal vs deque traversal which seems to be implementation specific, which further complicates this because I am developing cross platform... – Rahul Iyer May 18 '17 at 05:33
  • 2
    @KaizerSozay You would do well to make some performance tests with the different structures. Personally I think the *vector* would win for something like this. – Galik May 18 '17 at 05:40
  • @Galik Yes, I should performance test with different structures, but it seems like a pain, because I would have to run each of the tests on different devices too (cross platform). I would have thought that having a standard would help avoid this problem.. – Rahul Iyer May 18 '17 at 05:43
  • `vector` spends time shifting elements up as you remove them. `list` is cache unfriendly, resulting in more accesses form slower memory. `deque` uses multiple non-contiguous blocks of contiguous memory, so most of the time it has good cache usage and does no shifting, but it will miss more often and it can really hurt when it allocates a new block. Over all, `vector` should win on a modern processor unless N is large resulting in a grievous amount of shifting. – user4581301 May 18 '17 at 05:44
  • 1
    @KaizerSozay Inmy experience its hard to beat a *vector*. A `deque` can give the best of both worlds between a *vector* and a *list* but they cal also give the worst of both worlds. So it's highly dependent on your use pattern. But like I said, my money in on the pure vector. – Galik May 18 '17 at 05:45
  • @user4581301 In this case we are inserting elements into the middle of the *deque* so not only is it going to be shifting but it is going to be shifting over block boundaries. So I suspect, for this use, the *deque* may prove to be less efficient than a *vector*. – Galik May 18 '17 at 05:47
  • Holy carp. I missed the middle insertion. Yeah. `deque` fails and so does circular buffer. Vector it is. – user4581301 May 18 '17 at 05:49
  • 1
    Wait, no. @Galik There's no middle insert in OP's question. The way I'm reading it it's always adding to the end. – user4581301 May 18 '17 at 05:51
  • @user4581301 well there is an insertion at position `4`. I assume that's non literal and the actual position could be much larger. But if it is literal then a vector is still better because the number of data is tiny – Galik May 18 '17 at 05:52
  • 1
    @user4581301 Oh I see, position 4 is the other end!! – Galik May 18 '17 at 05:54
  • @user4581301 Then its even stevens which would be more efficient depending on the actual size of the structure. As you get larger the *deque* would seem more attractive – Galik May 18 '17 at 05:55
  • 1
    five elements, vector wins hands down. 5000... Not so sure. A circular buffer should always win. – user4581301 May 18 '17 at 05:56
  • I think there is some confusion with my wording - there is no middle insertion. What I meant is, that I am always looking at only n elements. So I always delete the first element, when i add my n+1th element to make sure I am always looking at only the last n elements. – Rahul Iyer May 18 '17 at 06:18
  • It's very difficult to make any solid claims, without benchmarking on the systems you will be running on using the volume of data you would actually be using. That said, a circular buffer will almost certainly be fastest, but requires you to implement it yourself. List will certainly be slowest. If you don't want to implement a circular buffer and don't want to benchmark the different options, then I would take a guess and use a vector if your buffer size is less than a few kilobytes, or a deque if larger than that. – David Scarlett May 18 '17 at 06:32
  • 1
    @DavidScarlett: No need to implement a circular buffer yourself, Boost has it. – MSalters May 18 '17 at 07:13
  • True, but not everyone uses Boost. – David Scarlett May 18 '17 at 07:34
  • @MSalters Yeah I prefer just sticking to the STL. – Rahul Iyer May 18 '17 at 09:11
  • A `std::deque` is implemented (at least when using the clang or gcc toolchains) as a circular buffer backed by a vector - so using a deque is the best of all worlds in this case - cache friendly and O(1). – Benjamin Gruenbaum May 23 '17 at 21:27
25

If you can use Boost, try boost::circular_buffer:

Boost Circular Buffer

It's a kind of sequence similar to std::list or std::deque. It supports random access iterators, constant time insert and erase operations at the beginning or the end of the buffer and interoperability with std algorithms.

It provides fixed capacity storage: when the buffer is filled, new data is written starting at the beginning of the buffer and overwriting the old

// Create a circular buffer with a capacity for 5 integers.
boost::circular_buffer<int> cb(5);

// Insert elements into the buffer.
cb.push_back(3);
cb.push_back(24);
cb.push_back(51);
cb.push_back(62);
cb.push_back(37);

int a = cb[0];  // a == 3
int b = cb[1];  // b == 24
int c = cb[2];  // c == 51

// The buffer is full now, so pushing subsequent
// elements will overwrite the front-most elements.
cb.push_back(8);   // overwrite 3 with 8
cb.push_back(12);  // overwrite 24 with 12

// The buffer now contains 51, 62, 37, 8, 12.

// Elements can be popped from either the front or the back.
cb.pop_back();  // 12 is removed
cb.pop_front(); // 51 is removed

The circular_buffer stores its elements in a contiguous region of memory, which then enables fast constant-time insertion, removal and random access of elements.


PS ... or implement the circular buffer directly as suggested by Taemyr.

Overload Journal #50 - Aug 2002 has a nice introduction (by Pete Goodliffe) to writing robust STL-like circular buffer.

Community
  • 1
  • 1
manlio
  • 16,658
  • 13
  • 67
  • 107
  • 2
    `which then enables fast constant-time insertion, removal of elements` Only at either end, otherwise it's linear. – sp2danny May 19 '17 at 09:19
5

The problem is that O(n) only talks about the asymptotic behaviour as n tends to infinity. If n is small then the constant factors involved become significant. The result is that for "last 5 integer items" I would be stunned if vector didn't beat list. I would even expect std::vector to beat std::deque.

For "last 500 integer items" I would still expect std::vector to be faster than std::list - but std::deque would now probably win. For "last 5 million slow-to-copy items", std:vector would be slowest of all.

A ring buffer based on std::array or std::vector would probably be faster still though.

As (almost) always with performance issues:

  • encapsulate with a fixed interface
  • write the simplest code that can implement that interface
  • if profiling shows you have a problem, optimize (which will make the code more complicated).

In practise, just using a std::deque, or a pre-built ring-buffer if you have one, will be good enough. (But it's not worth going to the trouble of writing a ring buffer unless profiling says you need to.)

3

If you need to store last N-elements then logically you are doing some kind of queue or a circular buffer, std::stack and std::deque are implementations of LIFO and FIFO queues.

You can use boost::circular_buffer or implement simple circular buffer manually:

template<int Capcity>
class cbuffer
{
public:
    cbuffer() : sz(0), p(0){}
    void push_back(int n)
    {
        buf[p++] = n;
        if (sz < Capcity)
            sz++;
        if (p >= Capcity)
            p = 0;
    }
    int size() const
    {
        return sz;
    }
    int operator[](int n) const
    {
        assert(n < sz);
        n = p - sz + n;
        if (n < 0)
            n += Capcity;
        return buf[n];
    }
    int buf[Capcity];
    int sz, p;
};

Sample use for circular buffer of 5 int elements:

int main()
{
    cbuffer<5> buf;

    // insert random 100 numbers
    for (int i = 0; i < 100; ++i)
        buf.push_back(rand());

    // output to cout contents of the circular buffer
    for (int i = 0; i < buf.size(); ++i)
        cout << buf[i] << ' ';
}

As a note, keep in mind that when you have only 5 elements the best solution is the one that's fast to implement and works correctly.

Pavel P
  • 13,962
  • 11
  • 68
  • 109
3

Here is a minimal circular buffer. I'm primarily posting that here to get a metric ton of comments and ideas of improvement.

Minimal Implementation

#include <iterator>

template<typename Container>
class CircularBuffer
{
public:
    using iterator   = typename Container::iterator;
    using value_type = typename Container::value_type;
private:
    Container _container;
    iterator  _pos;
public:
    CircularBuffer() : _pos(std::begin(_container)) {}
public:
    value_type& operator*() const { return *_pos; }
    CircularBuffer& operator++() { ++_pos ; if (_pos == std::end(_container)) _pos = std::begin(_container); return *this; }
    CircularBuffer& operator--() { if (_pos == std::begin(_container)) _pos = std::end(_container); --_pos; return *this; }
};

Usage

#include <iostream>
#include <array>

int main()
{
    CircularBuffer<std::array<int,5>> buf;

    *buf = 1; ++buf;
    *buf = 2; ++buf;
    *buf = 3; ++buf;
    *buf = 4; ++buf;
    *buf = 5; ++buf;
    std::cout << *buf << " "; ++buf;
    std::cout << *buf << " "; ++buf;
    std::cout << *buf << " "; ++buf;
    std::cout << *buf << " "; ++buf;
    std::cout << *buf << " "; ++buf;
    std::cout << *buf << " "; ++buf;
    std::cout << *buf << " "; ++buf;
    std::cout << *buf << " "; --buf;
    std::cout << *buf << " "; --buf;
    std::cout << *buf << " "; --buf;
    std::cout << *buf << " "; --buf;
    std::cout << *buf << " "; --buf;
    std::cout << *buf << " "; --buf;

    std::cout << std::endl;
}

Compile with

g++ -std=c++17 -O2 -Wall -Wextra -pedantic -Werror

Demo

On Coliru: try it online

YSC
  • 34,418
  • 7
  • 80
  • 129
  • 2
    If you're really wanting those comments and ideas, you might want to ask for critique over at [codereview.se]. Be sure to read [A guide to Code Review for Stack Overflow users](//codereview.meta.stackexchange.com/a/5778) first, as some things are done differently over there! – Toby Speight May 18 '17 at 16:37
2

Yes. Time complexity of the std::vector for removing elements from the end is linear. std::deque might be a good choice for what you are doing as it offers constant time insertion and removal at the beginning as well as at the end of the list and also better performance than std::list

Source:

http://www.sgi.com/tech/stl/Vector.html

http://www.sgi.com/tech/stl/Deque.html

codelyzer
  • 407
  • 1
  • 4
  • 16
  • Since the vector size is fixed at 5, _any_ operation is O(5) = constant time. – MSalters May 18 '17 at 07:15
  • 1
    Yes. But the OP just used n = 5 as an example, that need not mean he/she will always use 5 elements. – codelyzer May 18 '17 at 08:22
  • 1
    Doesn't matter, O(6) is also constant time. The point is that the number of items is _bounded_. – MSalters May 18 '17 at 09:25
  • @MSalters: that's not how big-O notation works. Your line of thought reduces to "Since no integer N is infinitely large, therefore O(N) = O(1)." Codelyzer's point is that removing from the front of a vector (and moving the other N-1 elements forward) is O(N) in the length of the vector — which is absolutely 100% true. – Quuxplusone May 19 '17 at 15:27
  • You might counter: "Ah, but *my* vector has only bounded length — five!" Sure. But someone else might have a vector of length six; or length 100; or length 10000; and the amount of time it takes to `pop_front` on those vectors goes up with the length of the vector — as the vector gets longer, the number of operations involved in `pop_front` goes up *linearly*. *This* is what we mean by "O(N)". – Quuxplusone May 19 '17 at 15:30
  • @Quuxplusone: I know how big-O notation works (and Big-Omega etc). An O(N) algorithm has a running time less than c*N, as N goes to infinity. Mathematically, this is known as a "limit". The fundamental point remains that _bounded_ numbers do not go to _infinity_. – MSalters May 19 '17 at 16:04
2

Here are the beginnings of a ring buffer based dequeue template class that I wrote a while ago, mostly to experiment with using std::allocator (so it does not require T to be default constructible). Note it currently doesn't have iterators, or insert/remove, copy/move constructors, etc.

#ifndef RING_DEQUEUE_H
#define RING_DEQUEUE_H

#include <memory>
#include <type_traits>
#include <limits>

template <typename T, size_t N>
class ring_dequeue {
private:
    static_assert(N <= std::numeric_limits<size_t>::max() / 2 &&
                  N <= std::numeric_limits<size_t>::max() / sizeof(T),
                  "size of ring_dequeue is too large");

    using alloc_traits = std::allocator_traits<std::allocator<T>>;

public:
    using value_type = T;
    using reference = T&;
    using const_reference = const T&;
    using difference_type = ssize_t;
    using size_type = size_t;

    ring_dequeue() = default;

    // Disable copy and move constructors for now - if iterators are
    // implemented later, then those could be delegated to the InputIterator
    // constructor below (using the std::move_iterator adaptor for the move
    // constructor case).
    ring_dequeue(const ring_dequeue&) = delete;
    ring_dequeue(ring_dequeue&&) = delete;
    ring_dequeue& operator=(const ring_dequeue&) = delete;
    ring_dequeue& operator=(ring_dequeue&&) = delete;

    template <typename InputIterator>
    ring_dequeue(InputIterator begin, InputIterator end) {
        while (m_tailIndex < N && begin != end) {
            alloc_traits::construct(m_alloc, reinterpret_cast<T*>(m_buf) + m_tailIndex,
                                    *begin);
            ++m_tailIndex;
            ++begin;
        }
        if (begin != end)
            throw std::logic_error("Input range too long");
    }

    ring_dequeue(std::initializer_list<T> il) :
        ring_dequeue(il.begin(), il.end()) { }

    ~ring_dequeue() noexcept(std::is_nothrow_destructible<T>::value) {
        while (m_headIndex < m_tailIndex) {
            alloc_traits::destroy(m_alloc, elemPtr(m_headIndex));
            m_headIndex++;
        }
    }

    size_t size() const {
        return m_tailIndex - m_headIndex;
    }
    size_t max_size() const {
        return N;
    }

    bool empty() const {
        return m_headIndex == m_tailIndex;
    }
    bool full() const {
        return m_headIndex + N == m_tailIndex;
    }

    template <typename... Args>
    void emplace_front(Args&&... args) {
        if (full())
            throw std::logic_error("ring_dequeue full");
        bool wasAtZero = (m_headIndex == 0);
        auto newHeadIndex = wasAtZero ? (N - 1) : (m_headIndex - 1);
        alloc_traits::construct(m_alloc, elemPtr(newHeadIndex),
                                std::forward<Args>(args)...);
        m_headIndex = newHeadIndex;
        if (wasAtZero)
            m_tailIndex += N;
    }
    void push_front(const T& x) {
        emplace_front(x);
    }
    void push_front(T&& x) {
        emplace_front(std::move(x));
    }

    template <typename... Args>
    void emplace_back(Args&&... args) {
        if (full())
            throw std::logic_error("ring_dequeue full");
        alloc_traits::construct(m_alloc, elemPtr(m_tailIndex),
                                std::forward<Args>(args)...);
        ++m_tailIndex;
    }
    void push_back(const T& x) {
        emplace_back(x);
    }
    void push_back(T&& x) {
        emplace_back(std::move(x));
    }

    T& front() {
        if (empty())
            throw std::logic_error("ring_dequeue empty");
        return *elemPtr(m_headIndex);
    }
    const T& front() const {
        if (empty())
            throw std::logic_error("ring_dequeue empty");
        return *elemPtr(m_headIndex);
    }
    void remove_front() {
        if (empty())
            throw std::logic_error("ring_dequeue empty");
        alloc_traits::destroy(m_alloc, elemPtr(m_headIndex));
        ++m_headIndex;
        if (m_headIndex == N) {
            m_headIndex = 0;
            m_tailIndex -= N;
        }
    }
    T pop_front() {
        T result = std::move(front());
        remove_front();
        return result;
    }

    T& back() {
        if (empty())
            throw std::logic_error("ring_dequeue empty");
        return *elemPtr(m_tailIndex - 1);
    }
    const T& back() const {
        if (empty())
            throw std::logic_error("ring_dequeue empty");
        return *elemPtr(m_tailIndex - 1);
    }
    void remove_back() {
        if (empty())
            throw std::logic_error("ring_dequeue empty");
        alloc_traits::destroy(m_alloc, elemPtr(m_tailIndex - 1));
        --m_tailIndex;
    }
    T pop_back() {
        T result = std::move(back());
        remove_back();
        return result;
    }

private:
    alignas(T) char m_buf[N * sizeof(T)];
    size_t m_headIndex = 0;
    size_t m_tailIndex = 0;
    std::allocator<T> m_alloc;

    const T* elemPtr(size_t index) const {
        if (index >= N)
            index -= N;
        return reinterpret_cast<const T*>(m_buf) + index;
    }
    T* elemPtr(size_t index) {
        if (index >= N)
            index -= N;
        return reinterpret_cast<T*>(m_buf) + index;
    }
};

#endif
Daniel Schepler
  • 2,673
  • 11
  • 19
  • Anybody, feel free to incorporate this into your answers - this was mostly posted as a response to Taemyr's answer and @KaizerSozay's comment on it suggesting to add a sample implementation. – Daniel Schepler May 18 '17 at 16:51
1

Briefly say the std::vector is better for a non-change size of memory.In your case,if you move all data forward or append new data in a vector,that must be a waste.As @David said the std::deque is a good option,since you would pop_head and push_back eg. two way list.

from the cplus cplus reference about the list

Compared to other base standard sequence containers (array, vector and deque), lists perform generally better in inserting, extracting and moving elements in any position within the container for which an iterator has already been obtained, and therefore also in algorithms that make intensive use of these, like sorting algorithms.

The main drawback of lists and forward_lists compared to these other sequence containers is that they lack direct access to the elements by their position; For example, to access the sixth element in a list, one has to iterate from a known position (like the beginning or the end) to that position, which takes linear time in the distance between these. They also consume some extra memory to keep the linking information associated to each element (which may be an important factor for large lists of small-sized elements).

about deque

For operations that involve frequent insertion or removals of elements at positions other than the beginning or the end, deques perform worse and have less consistent iterators and references than lists and forward lists.

vetor

Therefore, compared to arrays, vectors consume more memory in exchange for the ability to manage storage and grow dynamically in an efficient way.

Compared to the other dynamic sequence containers (deques, lists and forward_lists), vectors are very efficient accessing its elements (just like arrays) and relatively efficient adding or removing elements from its end. For operations that involve inserting or removing elements at positions other than the end, they perform worse than the others, and have less consistent iterators and references than lists and forward_lists.

Shihe Zhang
  • 1,977
  • 4
  • 30
  • 49
-1

I think even use std::deque it also have overhead of copy items in certain condition because std::deque is a map of arrays essentially, so std::list is a good idea to eliminate the copy overhead.

To increase the performance of traverse for std::list, you can implement a memory pool so that the std::list will allocate memory from a trunk and it's spatial locality for caching.

cartman
  • 580
  • 2
  • 5
  • 19