Select on a bit vector in C++: complexity and implementation

Question

I'm a math student implementing a matrix reduction algorithm. I've searched and read around the internet but didn't find exactly what I was looking for (at the end I list what I've found and the papers I've read).

Quick overview of the problem:

The bit vector b has fixed length N.
b changes at every step. Most of the time only a couple of indices change; in about 10% of the cases considerably more do (between 1/10 and 1/3 of the indices).

I already have a sparse implementation; now I'd like to code it using some smart implementation of the bit vector.

//initialize to 0 
b=bitvector(0, n=N)

for i in 1 to N
    {some operations on the bitvector b}
    get I= { j | b[j] == 1 }
    {save I}

What I need is:

quickly set b[i] = 1 or b[i] = 0 (ideally in O(1))
quickly get the set of indices I at each step (definitely not more than O(log N), ideally O(1))
a C++ library that allows it
papers/documentation

What would be nice to have:

a fast way to get the "lowest one" (the last index set to 1, namely select(rank(b)), which is fast if both operations are O(1))
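
A minimal sketch of the "last index set to 1" lookup, assuming the bit vector is stored as 64-bit words with bit j kept in words[j / 64]. The function name highest_set_bit is made up for illustration, and __builtin_clzll is a GCC/Clang builtin (C++20 offers std::countl_zero in <bit> as a standard alternative). It scans from the top word down, so the worst case is O(N/64) rather than O(1), but it needs no auxiliary rank/select structure that would have to be rebuilt after b changes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the largest index j whose bit is set, or -1 if no bit is set.
// Assumed layout (not from the question): bit j lives in words[j / 64],
// at position j % 64 counted from the least significant bit.
long long highest_set_bit(const std::vector<std::uint64_t>& words) {
    for (std::size_t w = words.size(); w-- > 0;) {
        if (words[w] != 0) {
            // __builtin_clzll counts leading zero bits of a non-zero word.
            int top = 63 - __builtin_clzll(words[w]);
            return static_cast<long long>(w) * 64 + top;
        }
    }
    return -1;
}
```

If the highest set bit usually moves only a little between steps, caching the index of the last non-zero word would shorten the scan further; that is an optimisation idea, not something measured here.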

What I do not need is:

save space
compress the data

I have been using the SDSL 2.0 library by Simon Gog et al. (https://github.com/simongog/sdsl-lite), but the select structure

bit_vector::select_1_type

costs O(n) to initialize and O(1) per query, and it does not "follow" the changes in b (is that right? I haven't found anything specific about it), meaning that it has to be re-initialized at every step after the modifications.

Papers that I've read are "Fast, Small, Simple Rank/Select on Bitmaps" (G. Navarro and E. Providel) and "Practical Entropy-Compressed Rank/Select Dictionary" (D. Okanohara and K. Sadakane). I would appreciate any link to solid implementations in C++ (if the structure fulfills my requirements).

Things that I've found here on stackexchange about similar topics that didn't help:

Sorry for the lengthy question; I hope I explained what I need and my determination to find it. I'm still very confused about various things related to bit vectors, it's definitely not my field of expertise, so any clarification is appreciated.

Thanks in advance.

I don't know a library that does this, but you can easily implement it. You can set b[i] to 0 or 1 in O(1) with `b = b & ~(1 << i)` or `b = b | (1 << i)`, respectively. You can also use `b = b ^ (1 << i)` to flip the bit (0 becomes 1 and 1 becomes 0), and `(b & (1 << i)) > 0` to test it: the expression is true if bit i is 1 and false otherwise. – Jean Catanho, Jun 27 '16 at 11:21
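
The expressions in the comment above treat b as a single integer, so they only work while i is smaller than the width of that integer. A minimal sketch of the same four operations for a bit vector of arbitrary fixed length, stored as 64-bit words, might look like this; the BitVec type and its member names are hypothetical and not taken from any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: a fixed-length bit vector stored as 64-bit words.
struct BitVec {
    std::vector<std::uint64_t> words;  // bit i lives in words[i / 64]
    std::size_t n;                     // number of bits

    explicit BitVec(std::size_t n_bits)
        : words((n_bits + 63) / 64, 0), n(n_bits) {}

    // Index of the word holding bit i, and the mask selecting it in that word.
    static std::size_t word_of(std::size_t i) { return i / 64; }
    static std::uint64_t mask_of(std::size_t i) {
        return std::uint64_t{1} << (i % 64);
    }

    // Each operation touches exactly one word, so it runs in O(1).
    void set(std::size_t i)    { words[word_of(i)] |= mask_of(i); }
    void clear(std::size_t i)  { words[word_of(i)] &= ~mask_of(i); }
    void toggle(std::size_t i) { words[word_of(i)] ^= mask_of(i); }
    bool test(std::size_t i) const {
        return (words[word_of(i)] & mask_of(i)) != 0;
    }
};
```

When N is known at compile time, std::bitset<N> offers the same operations under the names set, reset, flip and test.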
"quickly get the set of indexes I at each step (definitely not more than O(logN), ideally O(1))" - what do you then intend to do with it? From what you say it'll contain some number of indices proportional to N at least 10% of the time, so it'll take you time proportional to N to walk. — moonshadow, Jun 27 '16 at 12:22
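
To make the cost of the walk concrete, here is a hedged sketch of collecting the set I from a word-packed bit vector. The layout (bit i in words[i / 64]) and the name set_indices are assumptions for illustration, and __builtin_ctzll is a GCC/Clang builtin (std::countr_zero in C++20's <bit> is the standard equivalent). The loop visits every word once and then does one step per set bit, so it costs roughly N/64 word reads plus the size of I; zero words are skipped with a single comparison, but the result still cannot be produced faster than its own length.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Collect the indices of all set bits, in increasing order.
// Assumed layout: bit i lives in words[i / 64], at position i % 64.
std::vector<std::size_t> set_indices(const std::vector<std::uint64_t>& words) {
    std::vector<std::size_t> out;
    for (std::size_t w = 0; w < words.size(); ++w) {
        std::uint64_t x = words[w];
        while (x != 0) {
            // __builtin_ctzll gives the position of the lowest set bit.
            out.push_back(w * 64 + static_cast<std::size_t>(__builtin_ctzll(x)));
            x &= x - 1;  // clear that lowest set bit and continue
        }
    }
    return out;
}
```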
I thought about some data structure that "keeps track" of the changes and allows one to retrieve the indices faster than walking through all N positions, maybe some tree implementation or any other over-structure that allows retrieving such information! But that would mean a higher complexity for insertion/deletion, am I right? I could accept a cost of O(log N) for something inside the for loop, but definitely not O(N)... but I'm not sure if it's possible! The more I read, the more confused I am... btw, thanks for reading and answering ;) – joerg91, Jun 27 '16 at 12:45
If the cost of retrieving the indices in order is more important to you than the cost of setting/clearing, you're looking at some kind of priority queue / heap. These can theoretically average constant time for most operations (typically deletion becomes log(N), everything else is constant), but the implementation is usually complex and the constant overhead tends to be pretty high. Rather more production implementations of Brodal, Fibonacci etc. heaps exist in the wild than of the sparse set I link to below though :) — moonshadow, Jun 27 '16 at 12:57

moonshadow · Answer 1 · 2016-06-27T12:47:44.827

The structure described here (a sparse set) is the closest thing I am aware of to the properties you want.

Specifically:

initialisation is constant time
setting/clearing entries is constant time
testing for membership is constant time
retrieving the set of entries is linear in the number of entries currently in the set (assuming you don't need them sorted: you actually end up walking them in order of insertion; of course, you're not going to do better than O(N) overall if you need to walk all of them for whatever happens next)
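
The properties listed above can be sketched as a sparse set built from two arrays. This is an illustrative reconstruction under stated assumptions, not the code from the linked description: the class and member names are hypothetical, and because it uses std::vector, construction zero-fills the arrays and costs O(n) once, whereas the constant-time initialisation mentioned above relies on leaving the memory uninitialised. Emptying the set afterwards is O(1).

```cpp
#include <cstddef>
#include <vector>

// Sketch of a sparse set over the universe {0, ..., n-1}.
// Names are hypothetical; this is not the linked article's code.
class SparseSet {
public:
    explicit SparseSet(std::size_t n) : sparse_(n, 0) { dense_.reserve(n); }

    // i is a member iff sparse_[i] points at a slot of dense_ that holds i.
    bool contains(std::size_t i) const {
        std::size_t p = sparse_[i];
        return p < dense_.size() && dense_[p] == i;
    }

    void insert(std::size_t i) {
        if (contains(i)) return;
        sparse_[i] = dense_.size();
        dense_.push_back(i);
    }

    void erase(std::size_t i) {
        if (!contains(i)) return;
        std::size_t p = sparse_[i];
        std::size_t last = dense_.back();
        dense_[p] = last;  // move the last member into the hole
        sparse_[last] = p;
        dense_.pop_back();
    }

    // O(1): stale entries left in sparse_ fail the check in contains().
    void clear() { dense_.clear(); }

    // The current members, unsorted; walking them is linear in their count.
    const std::vector<std::size_t>& members() const { return dense_; }

private:
    std::vector<std::size_t> dense_;   // members, packed contiguously
    std::vector<std::size_t> sparse_;  // sparse_[i] = position of i in dense_
};
```

Because erase fills the hole with the last member, members() is in insertion order only as long as nothing has been removed. For the loop in the question, "save I" would then amount to copying members() at each step.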

edited Jun 27 '16 at 12:47

answered Jun 27 '16 at 12:32

This is extremely nice, probably I can get something useful out of it :) thanks! – joerg91 Jun 27 '16 at 13:05
