8

I have a reference to std::vector<char> that I want to use as a parameter to a function which accepts std::vector<unsigned char>. Can I do this without copying?

I have the following function and it works; however, I am not sure whether a copy actually takes place. Could someone help me understand this? Is it possible to use std::move to avoid the copy, or is the data already not being copied?

static void showDataBlock(bool usefold, bool usecolor,
            std::vector<char> &chunkdata)  
{
  char* buf = chunkdata.data();                      
  unsigned char* membuf = reinterpret_cast<unsigned char*>(buf); 
  std::vector<unsigned char> vec(membuf, membuf + chunkdata.size()); 
  showDataBlock(usefold, usecolor, vec);   
} 

I was thinking that I could write:

std::vector<unsigned char> vec(std::move(membuf),
                               std::move(membuf) + chunkdata.size());  

Is this overkill? What actually happens?

Toby Speight
serup
  • `std::vector<unsigned char> vec(membuf, membuf + chunkdata.size());` makes a copy of the data in `chunkdata` – M.M Jan 04 '17 at 11:30
  • 2
    based on the name of `showDataBlock` perhaps it could be redesigned to take a generic iterator pair – M.M Jan 04 '17 at 11:35
  • @WhiZTiM: No, there's a second overload (not shown) that takes a `std::vector<unsigned char>` as its third parameter. I do wonder why the overload shown takes its third argument by non-const reference, though. If the other overload _also_ takes its argument by non-const reference, then it presumably modifies it, and the code shown fails to copy back the modifications from `vec` to `chunkdata`. – MSalters Jan 04 '17 at 12:16
  • @MSalters, please is the casting in [this](http://stackoverflow.com/a/41463034/1621391) likely to inhibit certain optimizations as per the OP's concern in the comments of that answer? – WhiZTiM Jan 05 '17 at 15:21
  • @Toby Speight, I think that the change of the title actually makes it difficult to understand what I am asking for - perhaps it could be altered in a different way – serup Jan 17 '17 at 11:59
  • 1
    @serup - I've edited the title again; if you still think it's not helpful, you're always able to [edit] your own post. – Toby Speight Jan 17 '17 at 13:46

6 Answers

5

...is it possible to use std::move to avoid copy or is it already not being copied

You cannot move between two unrelated containers. A std::vector<char> is not a std::vector<unsigned char>, and hence there is no legal way to "move ~ convert" the contents of one to the other in O(1) time.

You can either copy:

void showData( std::vector<char>& data){
    std::vector<unsigned char> udata(data.begin(), data.end());
    for(auto& x : udata)
        modify( x );
    ....
}

or cast it in realtime for each access...

inline unsigned char& as_uchar(char& ch){
    return reinterpret_cast<unsigned char&>(ch);
}

void showDataBlock(std::vector<char>& data){
    for(auto& x : data){
        modify( as_uchar(x) );
    }
}
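
A related zero-copy variant, sketched here under the assumption that the callee can be changed to accept a pointer and a length instead of a std::vector<unsigned char>. The name sumBytes is hypothetical and only stands in for whatever the real function does with the bytes. The cast is applied once to the whole buffer, which is allowed because any object may be accessed through unsigned char*.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical worker: reads the bytes as unsigned values through a
// pointer/length pair, so it never needs a std::vector<unsigned char>.
unsigned sumBytes(const unsigned char* bytes, std::size_t count) {
    unsigned total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += bytes[i];
    return total;
}

// Adapter for std::vector<char>: views the existing buffer in place.
// No elements are copied; only the pointer type is reinterpreted.
unsigned sumBytes(const std::vector<char>& data) {
    return sumBytes(reinterpret_cast<const unsigned char*>(data.data()),
                    data.size());
}
```

Calling sumBytes on a std::vector<char> picks the adapter overload, which forwards the same storage to the worker. A std::vector<unsigned char> caller can pass data() and size() directly.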
WhiZTiM
  • this solution seems correct; however, it might not perform well, so I decided to use another solution – serup Jan 05 '17 at 12:50
  • *First rule of performance tuning is "measure"*. Though, for any good optimizing compiler(obviously modern versions of Clang, GCC, MSVC, Intel are), there should be no code generated for the cast... However, I cannot comment on the optimization implications... MSalters [answered this question](http://stackoverflow.com/questions/3575234/reinterpret-cast-cast-cost) years ago. He is definitely in a better position to comment on whether this particular case may inhibit certain optimizations. – WhiZTiM Jan 05 '17 at 15:10
  • 1
    As it happens, `unsigned char&` is a special case. Basically, for `memcpy` to work, `unsigned char&` must be able to alias anything. So the existing ` modify(unsigned char&)` function would already block optimizations that this `as_uchar()` method would also block. – MSalters Jan 05 '17 at 15:26
2

If you have a v1 of type std::vector<T1> and need a v2 of type std::vector<T2> there is no way around copying the data, even if T1 and T2 are "similar" like char and unsigned char.

Use standard library:

std::vector<unsigned char> v2;
std::copy(v1.begin(), v1.end(), std::back_inserter(v2));

The only possible way around it is to somehow work with only one type: either obtain std::vector<T2> from the start if possible, or work with std::vector<T1> from now on (maybe add an overload that deals with it). Or create generic code (templates) that can deal with any [contiguous] container.


I think reinterpret_cast and std::move should make it possible to avoid copy
no, it can't
please elaborate - why not?

A vector can steal resources (move data) only from another vector of the same type. That's how its interface was designed.

To do what you want you would need a release() method that would release the vector's ownership of the underlying data and return it as a (unique) pointer, and a move constructor/assignment that would acquire the underlying data from a (unique) pointer. (And even then you would still require a reinterpret_cast which is... danger zone)

std::vector has none of those. Maybe it should have. It just doesn't.
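
A minimal sketch of the difference, assuming nothing beyond the standard library (both function names are made up for this illustration). Moving between vectors of the same type hands over the heap buffer, while constructing a std::vector<unsigned char> from a std::vector<char> has to allocate and copy element by element.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Returns true if moving into a vector of the same type reuses the
// original heap buffer instead of copying it.
bool moveReusesBuffer() {
    std::vector<char> source{'a', 'b', 'c'};
    const char* before = source.data();
    std::vector<char> target(std::move(source));  // same type: buffer is transferred
    return target.data() == before;
}

// Returns true if converting to a different element type leaves the
// source intact and gives the result its own, separate buffer.
bool conversionAllocatesNewBuffer() {
    std::vector<char> source{'a', 'b', 'c'};
    std::vector<unsigned char> target(source.begin(), source.end());  // element-wise copy
    const void* oldBuffer = source.data();
    const void* newBuffer = target.data();
    return oldBuffer != newBuffer && source.size() == 3 && target.size() == 3;
}
```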

bolov
  • I think reinterpret_cast and std::move should make it possible to avoid copy – serup Jan 05 '17 at 12:49
  • @serup no, it can't – bolov Jan 05 '17 at 14:05
  • please elaborate - why not? – serup Jan 17 '17 at 09:18
  • @serup A vector can steal resources (move data) only from another vector of **the same type**. That's how its interface was designed. To do what you want you would need a `release()` method that would release the vector's ownership of the underlying data and return it as a (unique) pointer, and a move constructor of vector that would acquire the underlying data from a (unique) pointer. `std::vector` has none of those. – bolov Jan 17 '17 at 09:22
  • as I understand it, std::move() is a cast that produces an rvalue reference to an object, to enable moving from it, and this is a fairly new C++ way to avoid copies. For example, using a move constructor, a std::vector could just copy its internal pointer to the data into the new object, leaving the moved-from object in an incorrect state and avoiding a copy of all the data. reference from : http://stackoverflow.com/questions/3413470/what-is-stdmove-and-when-should-it-be-used/3413547#3413547 – serup Jan 17 '17 at 09:29
  • 1
    @serup it could. But it doesn't. As I've said, `std::vector` doesn't have an interface that allows manual ownership acquire/release of its internal buffer. The only way to move to a vector is from a `std::vector` of **the same type**. That's how `std::vector` is designed. – bolov Jan 17 '17 at 09:33
  • I know you are not supposed to write thanks in these comments, so I will refrain from doing so ;-). So std::move is really only useful when dealing with vectors of the same type, but is there no exception or compiler warning? Or perhaps I just did not see any warnings – serup Jan 17 '17 at 11:29
  • @serup as you've said, `std::move` is nothing more than a cast. So the result of `std::move` is seen as a temporary. There is no reason to issue a warning when a temporary is passed as a parameter. If moving is supported, then the object is moved. If moving is not supported, then the object is copied – bolov Jan 17 '17 at 12:06
  • no, don't use copy, it kills performance. use memcpy – Arsen Zahray Jan 12 '21 at 12:59
  • @ArsenZahray if it's safe (i.e. trivially copyable types) the compiler (with optimizations enabled ofc) will generate code calling `memcpy`, resulting in exactly the same performance. – bolov Jan 12 '21 at 13:12
1

I guess you coded another overloaded function :-

showDataBlock(usefold, usecolor, std::vector<unsigned char> & vec);  

You are trying to convert a std::vector<T1> into a different std::vector<T2>.

There is no way to avoid the copying.

Each std::vector owns its own storage; roughly speaking, it holds a raw pointer to it.
The main point is that you can't share such a raw pointer among multiple std::vector objects.
I think this is by design.
I think it is a good thing; otherwise it would waste CPU time on keeping track of the sharing.

The code ...

std::move(membuf)

... moves the raw pointer, which actually does nothing (it is the same as passing membuf).
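
A small sketch of what that means in practice, reusing the names chunkdata, membuf and vec from the question (the wrapping function name is invented for this example). Wrapping a raw pointer in std::move only changes its value category, so the range constructor still copies every element and the source vector is left untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Builds a vector from a pointer range, passing the pointer through std::move.
// Returns true if the source is unchanged and the result owns a separate copy.
bool movedPointerStillCopies() {
    std::vector<char> chunkdata{'x', 'y', 'z'};
    unsigned char* membuf = reinterpret_cast<unsigned char*>(chunkdata.data());

    // std::move(membuf) just yields an rvalue of pointer type; nothing is
    // transferred, and the constructor copies the three elements.
    std::vector<unsigned char> vec(std::move(membuf),
                                   std::move(membuf) + chunkdata.size());

    return chunkdata.size() == 3   // source still holds its elements
        && chunkdata[0] == 'x'
        && vec.size() == 3
        && vec[2] == 'z'
        && static_cast<const void*>(vec.data())
               != static_cast<const void*>(chunkdata.data());
}
```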

To optimize, you should first examine why you want to convert from std::vector<char> to std::vector<unsigned char> in the first place.

Would it be a better idea to create a new class C that can represent the data as both char and unsigned char? (e.g. C::getChar() and C::getUnsignedChar(); it might store only char but provide the conversion as a non-static member function)

If that doesn't help, I suggest creating a new custom data structure.
I often do that when it is needed.

However, in this case, I don't think it needs any optimization.
It looks OK to me, unless it is performance-critical code.

javaLover
  • 3
    It is by design. There's `std::shared_ptr` when you want to share storage, and you can combine the two: `std::shared_ptr<std::vector<T>>` – MSalters Jan 04 '17 at 12:19
  • 1
    @MSalters Good point sir! `std::shared_ptr>` is a cool notation. ...... "It is by design." >`) in some real cases? Do you encapsulate it? I am curious. :) – javaLover Jan 04 '17 at 12:43
  • 2
    I can't recall a specific case, but it's not exotic code at all. E.g. when you have a sender and a receiver, and they share the buffer in between, then you would expect the receiver to have a `std::shared_ptr<std::vector<T>>` and the sender to have a `std::weak_ptr<std::vector<T>>`. (The sender doesn't need to keep the buffer alive after the receiver quits) – MSalters Jan 04 '17 at 12:51
1

As others already pointed out, there is no way around the copy without changing showDataBlock.

I think you have two options:

  1. Extend showDataBlock to work on both signed char and unsigned char (i.e. make it a template) or
  2. Don't take the container as argument but an iterator range instead. You could then (in case of value_type being char) use special iterators that convert from signed char to unsigned char element by element.
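
Both options can be combined in one function template that takes an iterator range. The sketch below is only an illustration: toHex is a hypothetical stand-in for showDataBlock, and the hex formatting is an assumption about what "showing" the data might involve. Each element is converted to unsigned char as it is read, so no second vector is ever built.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for showDataBlock: formats any range of byte-like
// elements as space-separated hex, converting each element on the fly.
template <typename Iterator>
std::string toHex(Iterator first, Iterator last) {
    std::string out;
    for (; first != last; ++first) {
        // Convert through unsigned char so a negative char never sign-extends.
        const unsigned value = static_cast<unsigned char>(*first);
        char buf[4];
        std::snprintf(buf, sizeof buf, "%02x", value);
        if (!out.empty())
            out += ' ';
        out += buf;
    }
    return out;
}
```

The same template accepts iterators from std::vector<char> and std::vector<unsigned char>, so the caller never has to convert one container into the other.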
m8mble
  • thank you for your advice about extending my function to handle both - I ended up doing just that – serup Jan 05 '17 at 12:51
-1

While unsigned char and char are unrelated types, I think they're similar enough in this case (same-size PODs) to get away with a reinterpret_cast of the entire templated class.

static void showDataBlock(bool usefold, bool usecolor,
            std::vector<char> &chunkdata)  
{
  showDataBlock(usefold, usecolor, reinterpret_cast< std::vector<unsigned char>&>(chunkdata));   
}

However, I tend to find these problems are due to not designing the best architecture. Look at the bigger picture of what this software is supposed to be doing to identify why you need to work with both signed and unsigned char blocks of data.

ichidan
-3

I ended up doing something like this :

static void showDataBlock(bool usefold,bool usecolor, std::vector<char> chunkdata)
{                                                                                                                           
    std::vector<unsigned char>&cache = reinterpret_cast<std::vector<unsigned char>&>(chunkdata);                                              
    showDataBlock(usefold, usecolor, cache);    
}                                                                             

static bool showDataBlock(bool usefold,bool usecolor, std::vector<unsigned char> &chunkdata)   
{
    // showing the data
}

This solution lets me pass the vector by reference or by value, and it seems to be working. I do not know whether it is the best solution, but you all came up with some really good suggestions. Thank you all.

I agree that I cannot avoid the copy, so I let the copy be made by ordinary pass-by-value parameter passing.

If you find this solution wrong, please provide a better one in a comment rather than just downvoting.

serup
  • this solution is based on others suggestions combined with trial and errors, so it is a serious solution and working – serup Jan 05 '17 at 12:55
  • if you vote down then write why - otherwise you are not really serious – serup Jan 05 '17 at 13:00
  • 1
    `std::vector<char>` and `std::vector<unsigned char>` are 2 classes **completely unrelated** (even though generated from the same template). `reinterpret_cast` between them is Undefined Behavior. – bolov Jan 17 '17 at 09:37
  • @bolov, would the undefined behaviour relate to how the reference is passed on, or to something else? – serup Jan 17 '17 at 11:41