Implementation of a lock free vector

Question

After several searches, I cannot find a lock-free vector implementation. There is a document that speaks about it but nothing concrete (in any case I have not found it). http://pirkelbauer.com/papers/opodis06.pdf

There are currently 2 threads dealing with arrays, there may be more in a while.

One thread that updates different vectors and another thread that accesses the vector to do calculations, etc. Each thread accesses the different array a large number of times per second.

I implemented a lock with mutex on the different vectors but when the reading or writing thread takes too long to unlock, all further updates are delayed. I then thought of copying the array all the time to go faster, but copying thousands of times per second an array of thousands of elements doesn't seem great to me.

So I thought to use 1 mutex per value in each table to lock only the value I am working on.

A lock-free could be better but I can not find a solution and I wonder if the performances would be really better.

EDIT:

I have a thread that receives data and ranges in vectors. When I instantiate the structure, I use a fixed size.

I have to do 2 different things for the updates:

-Update vector elements. (1d vector which simulates a 2d vector)

-Add a line at the end of the vector and remove the first line. The array always remains sorted. Adding elements is much much rarer than updating

The thread that is read-only walks the array and performs calculations. To limit the time spent on the array and do as little calculation as possible, I use arrays that store the result of my calculations. Despite this, I often have to scan the table enough to do new calculations or just update them. (the application is in real-time so the calculations to be made vary according to the requests)

When a new element is added to the vector, the reading thread must directly use it to update the calculations.

When I say calculation, it is not necessarily only arithmetic, it is more a treatment to be done.

Do you have a producer/consumer pattern? You might want to use a circular buffer for that. If not, what exactly is your access pattern ("One thread updates different vectors" is a bit vague. How is it updated? An index is changed? An element is added via `push_back`? Are these different to the places accessed by the other threads?) — Artyer, Jan 07 '21 at 18:05
How many rows in your vector? Since you're adding and removing rows to your vector ("1d vector which simulates a 2d vector") ***don't*** simulate a 2d vector. Instead, operate on sets of rows. That way instead of locking the entire 2D array while you muck with the contents of the entire vector, you can create the new row without locking the 2D array and lock it just long enough to remove the first row and add the new row. That should be ***fast*** because they're just pointers to rows. — Andrew Henle, Jan 09 '21 at 15:20
Possibly even easier, if your calculation thread does one set of calculations for each row your producer thread creates, is to use a queue to pass new rows to the calculation thread, which maintains the set of rows it needs with no need for locking your array at all. You can then use another queue to pass the row objects back to the creation thread for filling with new data. — Andrew Henle, Jan 09 '21 at 15:26
There may not be a lock-free approach that would be suitable for your problem. This is because it sounds like you need to make atomic updates that are broader in scope than lock-free atomics can accommodate -- removing a row and adding another, for example. There is also the question of whether performing updates while the reader is processing a vector -- even atomically -- may cause the reader to misinterpret the data. For instance, you may get computed results that do not correspond to any aggregate state that the overall vector ever had. — John Bollinger, Jan 09 '21 at 15:48
2020 CppCon video that goes into the challenges: https://youtu.be/Pi243hGxDyA. The book Concurrency in Action goes into the challenges as well, but I didn't see a lock-free vector example. — John Duffy, Jan 14 '21 at 21:34

score 2 · Answer 1 · answered Jan 14 '21 at 18:36

There is no perfect implementation to run concurrency, each task has it's own good enogh. My goto method to find a decent implementation is to only alow what is needed and then check if i would need somthing more in the future. You described a quite simple scenario, one thread one accion to a shared vector, then the vector needs to tell if the acction is alowed soo std::atomic_flag is good enogh.

This example shuld give you an idea on how it works and what to expent. Mainly i just attached a flag to eatch array and checkt it before to see if is safe to do somthing and some people like to add a guard to the flag, just in case.

#include <iostream>
#include <thread>
#include <atomic>
#include <chrono>

const int vector_size = 1024;

struct Element {
    void some_yield(){
        std::this_thread::yield();
    };
    void some_wait(){
        std::this_thread::sleep_for(
            std::chrono::microseconds(1)
        );
    };
};

Element ** data;
std::atomic_flag * vector_safe;


bool alive = true;
uint32_t c_down_time = 0;
uint32_t p_down_time = 0;
uint32_t c_intinerations = 0;
uint32_t p_intinerations = 0;
std::chrono::high_resolution_clock::time_point c_time_point;
std::chrono::high_resolution_clock::time_point p_time_point;

int simple_consumer_work(){
    Element a_read;
    uint16_t i, e;
    while (alive){
        // Loops thru the vectors
        for (i=0; i < vector_size; i++){
            // locks the thread untin the vector 
            // at index i is free to read
            while (!vector_safe[i].test_and_set()){} 
                // Do the watherver
                for (e=0; e < vector_size; e++){
                    a_read = data[i][e];
                } 
            // And signal that this vector is done
            vector_safe[i].clear();
        }
    }
    return 0;
};
int simple_producer_work(){
    uint16_t i;
    while (alive){
        for (i=0; i < vector_size; i++){
            while (!vector_safe[i].test_and_set()){} 
            data[i][i].some_wait();
            vector_safe[i].clear();
        }
        p_intinerations++;
    }
    return 0;
};

int consumer_work(){
    Element a_read;
    uint16_t i, e;
    bool waiting;
    while (alive){

        for (i=0; i < vector_size; i++){
            waiting = false;
            c_time_point = std::chrono::high_resolution_clock::now();
            while (!vector_safe[i].test_and_set(std::memory_order_acquire)){
                waiting = true;
            } 
            if (waiting){
                c_down_time += (uint32_t)std::chrono::duration_cast<std::chrono::nanoseconds> 
                (std::chrono::high_resolution_clock::now() - c_time_point).count();
            }  
            for (e=0; e < vector_size; e++){
                a_read = data[i][e];
            } 
            vector_safe[i].clear(std::memory_order_release);
        }
        c_intinerations++;
    }
    return 0;
};
int producer_work(){
    bool waiting;
    uint16_t i;
    while (alive){
        for (i=0; i < vector_size; i++){
            waiting = false;
            p_time_point = std::chrono::high_resolution_clock::now();
            while (!vector_safe[i].test_and_set(std::memory_order_acquire)){
                waiting = true;
            } 
            if (waiting){
                p_down_time += (uint32_t)std::chrono::duration_cast<std::chrono::nanoseconds> 
                (std::chrono::high_resolution_clock::now() - p_time_point).count();
            } 
            data[i][i].some_wait();
            vector_safe[i].clear(std::memory_order_release);
        }
        p_intinerations++;
    }
    return 0;
};

void print_time(uint32_t down_time){
    if ( down_time <= 1000) {
        std::cout << down_time << " [nanosecods] \n";

    } else if (down_time <= 1000000) {
        std::cout << down_time / 1000 << " [microseconds] \n";
    
    } else if (down_time <= 1000000000) {
        std::cout << down_time / 1000000 << " [miliseconds] \n";
    
    } else {
        std::cout << down_time / 1000000000 << " [seconds] \n";
    }
};

int main(){

    std::uint16_t i;
    std::thread consumer;
    std::thread producer;

    vector_safe = new std::atomic_flag [vector_size] {ATOMIC_FLAG_INIT};
    data = new Element * [vector_size];
    for(i=0; i < vector_size; i++){
        data[i] = new Element;
    }

    consumer = std::thread(consumer_work);
    producer = std::thread(producer_work);

    std::this_thread::sleep_for(
        std::chrono::seconds(10)
    );

    alive = false;
    producer.join();
    consumer.join();

    std::cout << " Consumer loops > " << c_intinerations << std::endl;
    std::cout << " Consumer time lost > "; print_time(c_down_time);
    std::cout << " Producer loops > " << p_intinerations << std::endl;
    std::cout << " Producer time lost > "; print_time(p_down_time);

    for(i=0; i < vector_size; i++){
        delete data[i];
    }
    delete [] vector_safe;
    delete [] data;

    return 0;
}

And dont forget that the compiler can and will change portions of the code, spagueti code is realy realy buggy in multithreading.

Implementation of a lock free vector

1 Answers1

Linked