Static Deserialization Factory in C++ Interface

Question

Is it bad practice to have the static deserialization method in the interface class as below?

class IBase
{
  ~IBase();
  virtual void Foo()=0;

  virtual const char* Serialize()=0;
  static std::unique_ptr<IBase> Deserialize(const char* data); // <-- This
};

IBase will travel across DLL boundaries, so an effective method of serialization is for each object to serialize itself, hence the IBase::Serialize function.

Then when we have raw data and we want to deserialize back to an object, it seems logical to call IBase::Deserialize, so that the Serialize and Deserialize functions are closely coupled.

boost::serialization does some clever automatic stuff so that the Deserialize factory function just looks like data >> pIBase;, so there's no manually updating the factory function for new derived types.

The alternative as I see it is to create an IBaseDeserializer class to hold the Deserialize function:

class IBase
{
  ~IBase();
  virtual void Foo()=0;

  virtual const char* Serialize()=0;
};

class IBaseDeserializer
{
  static std::unique_ptr<IBase> Deserialize(const char* data);
};

Most of the similar questions on SO claim that the first method should be avoided because it violates Single Responsibility Principle, having the base interface also be a factory. However, I feel as though this is a special case in the sense that it's a corresponding function to the base interface's Serialize function, which just happens to be a factory. However, it doesn't make a lot of sense for a derived object to have a Deserialize function, or for an interface to have an implementation, albeit static.

After writing this I'm leaning more towards the second method, but I still see value in the first. What's best practice here? What other pro's and con's are there, or am I way off base (no pun intended) altogether and should be doing something else?

The return type `const char*` is ungood. Who is responsible for deallocation, and how? — Cheers and hth. - Alf, Aug 26 '16 at 00:19
@Cheersandhth.-Alf: Unfortunately, it can't be a std::string return type as it's crossing a DLL boundary. In reality, the value that's returned is a std::string member variable's c_str(), so there's no deallocation responsibility and the c string is valid for the lifetime of the object. Open to other suggestions though... (+1 for "ungood") — parrowdice, Aug 26 '16 at 00:24
Why return a shared pointer? At the very least return a unique pointer and let the user share it if that's what she wants. — Kerrek SB, Aug 26 '16 at 00:29
@KerrekSB: I thought the compiler I was using didn't support it (VS2010), but it turns out it does. I'll update the code, thanks. — parrowdice, Aug 26 '16 at 00:35
Q: which method is better? Short answer: it doesn't matter, the "factory" method will have it's own non-polymorphic identity (no overrides of static methods), so you might as well declare it as a function in the same namespace as your `IBase`. See my answer below for a longer form. — Adrian Colomitchi, Aug 26 '16 at 02:55

score 1 · Answer 1 · edited May 23 '17 at 12:22

1

You're gonna run into a helluva lot more problems passing those std::shared_ptr's across DLL boundaries unless you've used the exact same compiler and standard library for all your DLL's and their respective clients or you get really lucky. (And same goes for std::unique_ptrs. It's the same issue you've already identified with std::string).

Your use of a static deserializer isn't good or bad practice, it's just an arbitrary design choice. Just stay consistent, that's the important part.

But you really ought to find a way to return IBase *'s and keep any shared pointer stuff isolated to the client-side, or consider rolling your own COM-style reference counted approach, or just straight up use COM (dealing with shared objects across DLL boundaries is one of the problems COM was designed to solve).

And it's not just return types. Even e.g. std::shared_ptr (or std::string for that matter) member variables in the DLL headers will be problematic, as their size / member layouts may be different across compilers, you'd have to hide anything like that behind the so-called pimpl pattern. Same deal with throwing std::exceptions across DLL boundaries.

Addressing your comment with a contrived example (I'm keeping this really simple, I hope it's representative even though it's not using any template stuff, the templatedness isn't really the issue):

Consider some stuff exported from a DLL:

// In a header you have no control over, e.g. a standard library header:
struct Example {
    int x;
    int y;
};

// Exported from your DLL:
class IBase {
public:
    static Example constructExample (int x, int y);
    static void incrementX (Example *ex);
};

// And its implementation:
Example IBase::constructExample (int x, int y) {
    Example ex;
    ex.x = x;
    ex.y = y;
    return ex;
}

void IBase::incrementX (Example *ex) {
    ex.x ++;
}

Now, you commented "... won't cross DLL boundaries as it's a static function - i.e. compiled into whatever DLL that's calling them". But it will. For example, in a client using that DLL:

// In a SLIGHTLY DIFFERENT VERSION of that header you have no control over, 
// e.g. a standard library header:
struct Example {
    int y; // note x and y are swapped from the version of this header
    int x; // that the DLL was compiled with.
};

// In your client code somewhere:
Example ex = IBase::constructExample(1, 2);
assert(ex.x == 1); // <- will fail, our broken ex.x is actually 2.
assert(ex.y == 2); // <- will fail, our broken ex.y is actually 1.

// And also:
Example ex;
ex.x = 0;
ex.y = 0;
IBase::incrementX(&ex);    
assert(ex.x == 1); // <- will fail, because our broken ex.y was incremented
assert(ex.y == 0); // <- will fail, because our broken ex.y was incremented

As you can see, the fact that it is static does not make a difference here.

edited May 23 '17 at 12:22

Community

1
1

answered Aug 26 '16 at 00:35

Jason C

34,234
12
103
151

The `std::shared_ptrs` won't cross DLL boundaries as it's a static function - i.e. compiled into whatever DLL that's calling them. – parrowdice Aug 26 '16 at 00:38
@parrowdice Yes it will, when a client calls it, uses the returned value, and tries to interpret whatever the DLL used as a `shared_ptr` using the client's possibly `shared_ptr` template. If your client code does `shared_ptr ptr = IBase::Deserialize(data)`, you're SOL if the DLL's implementation of `shared_ptr` doesn't match the clients. – Jason C Aug 26 '16 at 00:39
When a client compiles the static function, it will place the resulting code for the static function into its own binary. It has to as it's not associated with a particular object instance. When a client calls it, it will create a shared_ptr using its own code and shared_ptr implementation. The instruction pointer never left the DLL. If however it was a virtual function, then the vtable would direct it into the implementation in whichever DLL created the object, in which case we would be in trouble due to all of the reasons you mentioned (at least that's how I believe it works). – parrowdice Aug 26 '16 at 00:49
@parrowdice I added a rather trivial example that might illustrate the issue. – Jason C Aug 26 '16 at 00:49
@parrowdice No, it will not. What you say is *only* the case if *the static function is defined inline in the header*. And then the other requirement *on top of that*, for it to not break anything, is that *said inline static function is not also used inside the DLL* (because otherwise the DLL will contain code compiled using incompatible versions of the `shared_ptr`). You are mistaken about where the code exists; static or not, if you've placed the function definition in one of the DLL's source files, that compiled code is in the DLL. And if you use an inline function in the DLL, same deal. – Jason C Aug 26 '16 at 00:51
I'm trying to understand your point. Thanks for taking the time to explain. It's a good example you provided, I'll try it out tomorrow. I think I can see how in that situation it would break. You say "static or not, if you've placed the function definition in one of the DLL's source files, that compiled code is in the DLL". In my scenario, the function definition is not exported, but in a hpp file, which is included in every DLL along with IBase. Therefore when I call IBase::Deserialize, it's the compiled code in the current DLL being called, right? I wasn't appreciating the other scenarios. – parrowdice Aug 26 '16 at 01:06
@parrowdice So, it's tough to make a general statement, because there's countless scenarios, and of course I don't know exactly how your project is set up. Basically, *if you pass objects declared in headers that you can't guarantee will be the same across compilers, across DLL boundarie,* you have a problem. If you avoid this, you don't. If your `IBase` header is only used internally to your DLL (or all your DLLs that were compiled *in the exact same environment*), and you're passing that `shared_ptr` across *boundaries you have control over* then, yes, you're fine. But as soon as ... – Jason C Aug 26 '16 at 01:17
@parrowdice ... you, say, allow an arbitrary client, like some EXE or DLL that you or somebody else is compiling elsewhere, to gain access to one of those objects, you run into potential issues. Same deal if, say, you recompile one of your DLLs at a later time with a slightly different version of the stdlib and suddenly it's no longer compatible with the rest of your binaries. So more specifically we can say that "passing objects across DLL boundaries is a problem, unless you have *very* precise control over those boundaries and the environments". That's usually walking a thin line, though. – Jason C Aug 26 '16 at 01:20
One of the things you notice frequently in the Windows API is a lot of the structs you pass to DLLs have a member field for their version #, or their size. This actually solves a related problem. This lets e.g. Windows DLLs maintain backwards compatibility with applications that were compiled against previous versions of the headers -- the DLLs look at the size/version in the struct and if it's older they use the older struct. And if a DLL sees a version # / size that it's not familiar with, it can gracefully fail with e.g. "Application is not compatible" rather than randomly crashing. – Jason C Aug 26 '16 at 01:23
1

I think that in the way it's being used at the minute, there are no objects crossing DLL boundaries and we're all good. However, as you point out, it does allow the potential for objects to subtly start crossing boundaries in the future if/when the way it's used changes without full appreciation of ABIs. I think the safest thing to do is to revert to an IBase* and defer ownership responsibility to the client. It's a shame but it's the lesser of two evils. Thanks for you help Jason. I'd +1 more if I could. – parrowdice Aug 26 '16 at 01:32
@parrowdice Great. Then the short answer is, independently of all this DLL-boundary stuff, your use of a static deserializer in that manner is perfectly reasonable, and as long as you stay consistent to that design choice in your other classes too, you'll keep users of your DLL totally happy. So keep on doing what you're doing. It is perfectly good form. – Jason C Aug 26 '16 at 01:36

Adrian Colomitchi · Answer 2 · 2016-08-26T02:56:00.287

1

The common practice is:

class IBase
{
  ~IBase();
  virtual void Foo()=0;

  // this means "Dump your content into output"
  virtual obj_ostream& Serialize(obj_ostream& )=0;

  // This means "fill you content anew from input (destroy your old one if necesary)"
  virtual obj_istream& Deserialize(obj_istream& )=0;

  // or you may prefer the >>, << operators
};

Note the use of obj_stream classes - you'll need to derive your own even if you inherit from or wrap-around^** basic iostreams. This way you don't trip over l/r-shift operators others may want to write for the std::iostream.

Also, the use of streams answers the question "who is going to free the buffer transporting your data"

Now, addressing the "which of the two solution to choose" question. It's absolutely not important, as both of the solutions are absolutely equivalent and equivalent with this one:

class IBase
{
  ~IBase();
  virtual void Foo()=0;

  virtual const char* Serialize()=0;
};

std::unique_ptr<IBase> Deserialize(const char* data); // <-- This

The only difference between them is the context in which the "factory" function is declared: in your first, is a static method of IBase, your second... yeap... still a single (static) function declared by the IBaseDeserializer class, the above yet another declaration place: the same namespace your IBase class is.

Look... it's a static method, you won't be able to override it in any derived classes.
It will have a single non-polymorphic identity and functionality no matter where you declare it (quibbling over the differences is nitpicking in my opinion).
At the best, placing it outside any class saves you the pain of typing the typing the class scope when you call it.

^** I find it easier to use (composition) a wrapped stream rather than derive from iostreams (actually basic_ostream/basic_istream)

edited Aug 26 '16 at 02:56

answered Aug 26 '16 at 00:40

Adrian Colomitchi

3,884
1
12
23

1

This is common practice in general but the case is different across DLL boundaries. You'll be in for a world of hurt if `obj_istream` inherits basic iostreams, or if `obj_istream` has any iostream members and is exported from the DLL. – Jason C Aug 26 '16 at 00:42
@JasonC - you have 2 probls. One is getting the serialized form, the other is transporting the data across DLL boundaries. Treat them separately, because they _are_ separate problems.E.g. as a mechanism of transport, use memory-mapped files in the streambuf to be used by the obj_streams. – Adrian Colomitchi Aug 26 '16 at 00:51
Well, it's not the transporting of the serialized data that's the problem. It's that if `obj_istream` is or uses any iostreams stuff, any member functions / operators called in the `Deserialize` implementation in the DLL run the risk of breaking if the `obj_istream` passed across the boundary by the client isn't precisely the same implementation and layout as the `obj_istream` that was compiled into the DLL. DLL programming is *really* tricky, error-prone, and cumbersome, and also can be counter-intuitive, that's why all these crazy-at-first-glance things like COM and such have to exist. – Jason C Aug 26 '16 at 00:55
@JasonC " run the risk of breaking if the obj_istream passed across the boundary by the client isn't precisely the same implementation and layout as the obj_istream that was compiled into the DLL." Umm... what? Passing data between parties involves one producer (an ostream) and a consumer (an istream). As long as the layout of the serialized data transported between the two is coherently understood by the two parties (it's a protocol, isn't it?), who cares the producer/consumer are compiled with different name mangling conventions or whatever. – Adrian Colomitchi Aug 26 '16 at 01:15
1

I agree with Jason C. Google C++ ABI. Your two DLLs may disagree on what an ostream is if they are compiled differently. – parrowdice Aug 26 '16 at 01:17
@JasonC "that's why all these crazy-at-first-glance things like COM and such have to exist." COM does more than just data serialization and you don't need to go full (D)COM if you only need data transmission between two parties. Here, look at Boost:ASIO, they're doing over the net, between potentially different OS-es. They even have [named pipes](http://www.boost.org/doc/libs/1_61_0/doc/html/boost_asio/overview/windows/stream_handle.html) support. – Adrian Colomitchi Aug 26 '16 at 01:23
@AdrianColomitchi I didn't mean that COM solves serialization problems. I meant that COM was designed the way it was to solve DLL object boundary-crossing problems. – Jason C Aug 26 '16 at 01:27
@AdrianColomitchi And I'm not talking about name mangling. The DLL will call istream members, e.g. `operator >>`. That code will be compiled. The code for `operator >>` that is compiled will depend on the `istream` definition the DLL was compiled with. If a client was compiled with a different `istream` implementation, *then the problem isn't with the serialization, it's that the `istream` constructed by the client and passed to the DLL is not binary-compatible with the `istream` the DLL was compiled with.* So that `operator >>` code that worked great when you compiled the DLL now breaks. – Jason C Aug 26 '16 at 01:31
(And usually it breaks in a very evil, not-fun-to-debug way. It's always fun when things appear to work and then you trace a crash back to say, some random `std::exception` copy constructor somewhere. Until you run it in the debugger, and the problem goes away. *shudder*) – Jason C Aug 26 '16 at 01:33
I saw the other week a group of engineers stood around some crashing code, scratching their heads as to why sizeof(x) != sizeof(x). It can make you smile too :) – parrowdice Aug 26 '16 at 01:35
@JasonC What do you mean by "istream constructed by the client and passed to the DLL is not **binary-compatible** with the istream the DLL was compiled with"? In what sense "binary compatible"? If you mean "given the same input, produces data with the same content and layout", then of course there need to be a match - serialization and deserialization ends are bound by the protocol. But... maybe you mean something else by binary-compatibility? – Adrian Colomitchi Aug 26 '16 at 01:40
1

@AdrianColomitchi By binary compatibility I mean [binary compatibility](https://events.linuxfoundation.org/sites/events/files/slides/Binary_Compatibility_for_library_devs.pdf). C++ makes no guarantees on the std library ABI, and more often than not there are incompatibilities, it's not just one of those rare *might*-not-work things. E.g. if you compile a DLL against an `istream` header, then try to use an application compiled with a different version of that `istream` header that say, adds another member variable to the `istream` in front of the rest, and pass that `istream` across, *boom*. – Jason C Aug 26 '16 at 01:48
1

@AdrianColomitchi: [How do I safely pass objects, especially STL objects, to and from a DLL?](http://stackoverflow.com/a/22797419/1836044) – parrowdice Aug 26 '16 at 01:49
@JasonC: Oh no! It's like a right of passage to becoming a software engineer. – parrowdice Aug 26 '16 at 01:51
@JasonC & parrowdice What does ABI has to do with serialization? Suppose the two producer/receiver ends are written to format/parse data as ... shudders... JSON. How does it matter if the two implementations (in two different DLL-es, or one in main one in DLL) are ABI-compatible or not? – Adrian Colomitchi Aug 26 '16 at 01:59
@JasonC to make the things clear: I'm saying the i/ostream versions in the two DLL-es _never_ crosses the boundary of their binaries. Only the serialized data does (this is what I meant by "transport problem"). – Adrian Colomitchi Aug 26 '16 at 02:05
@AdrianColomitchi I think I may see where our miscommunication is now, and it may be my mistake. If `IBase` is the *only* thing that travels across, you are right, given your pure `virtual` interface. If the DLL exports any `IBase`-derivatives, which is where my mind was, it fails. Is that why we're having a conflict here? – Jason C Aug 26 '16 at 02:07
@JasonC "Stop talking about serialization already!" Your beef is not with serialization, it is with the use of shared_ptr? – Adrian Colomitchi Aug 26 '16 at 02:10
@AdrianColomitchi Yes, that is where my beef is (and also with `istream` being passed across boundaries). But now I realize you were assuming `IBase`-derivatives weren't being exported, and I was, so it may be moot -- so ignore that comment for now, because it appears you and I weren't on the same page to begin with, I think. – Jason C Aug 26 '16 at 02:11
1

Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/121880/discussion-between-adrian-colomitchi-and-jason-c). – Adrian Colomitchi Aug 26 '16 at 02:12

Static Deserialization Factory in C++ Interface

2 Answers2