
I'm writing a binary serialization library with a schema-less format, and I'm providing two APIs to the user (pseudo-code):

bytes Serialize(string)
string Deserialize(bytes)

I've seen a lot of answers on this topic which suggest using System.Text.Encoding implementations to perform string serialization/deserialization, e.g. Encoding.ASCII.GetBytes(str) and Encoding.ASCII.GetString(bytes).

Moreover, other implementations like protobuf-net use encodings to serialize strings.

I don't get why people suggest this, since we can directly copy the underlying array, as in this answer, where the author notes that it will fail in some cases.
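To illustrate the kind of failure I mean (my own sketch, not from the linked answer): a System.String may contain an unpaired surrogate, which is a legal char sequence but not valid UTF-16 text, so an encoding-based round trip silently replaces it. Assuming the default (replacement) fallback of Encoding.UTF8:

```csharp
using System;
using System.Text;

class EncodingLoss
{
    static void Main()
    {
        // A lone high surrogate is a perfectly legal System.String,
        // but it is not well-formed UTF-16, so UTF-8 cannot represent it.
        string original = "\ud800";

        byte[] bytes = Encoding.UTF8.GetBytes(original);
        string roundTripped = Encoding.UTF8.GetString(bytes);

        // The default encoder fallback substitutes U+FFFD, so data is lost.
        Console.WriteLine(original == roundTripped); // False
        Console.WriteLine(roundTripped == "\ufffd"); // True
    }
}
```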

So here's my implementation for .NET Core 3.1:

using System.Runtime.InteropServices; // for MemoryMarshal

// Reinterpret the string's UTF-16 code units as raw bytes and copy them verbatim.
static void Serialize(ReadOnlySpan<char> sourceString, Span<byte> destBytes) =>
    MemoryMarshal.AsBytes(sourceString).CopyTo(destBytes);

// Copy the raw bytes back over a char span; destString must hold sourceBytes.Length / 2 chars.
static void Deserialize(ReadOnlySpan<byte> sourceBytes, Span<char> destString) =>
    sourceBytes.CopyTo(MemoryMarshal.AsBytes(destString));
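A minimal round-trip sketch of how I'd call these two methods (the wrapping Demo class and buffer sizing are mine, not part of the API above); note that even a string containing an unpaired surrogate survives unchanged, since the bytes are never interpreted as text:

```csharp
using System;
using System.Runtime.InteropServices;

class Demo
{
    static void Serialize(ReadOnlySpan<char> sourceString, Span<byte> destBytes) =>
        MemoryMarshal.AsBytes(sourceString).CopyTo(destBytes);

    static void Deserialize(ReadOnlySpan<byte> sourceBytes, Span<char> destString) =>
        sourceBytes.CopyTo(MemoryMarshal.AsBytes(destString));

    static void Main()
    {
        // Contains a lone high surrogate: legal System.String, invalid UTF-16 text.
        string original = "abc\ud800def";

        // Each char is one UTF-16 code unit: exactly 2 bytes.
        var bytes = new byte[original.Length * sizeof(char)];
        Serialize(original, bytes);

        var chars = new char[bytes.Length / sizeof(char)];
        Deserialize(bytes, chars);

        // The raw copy is lossless, code unit for code unit.
        Console.WriteLine(original == new string(chars)); // True
    }
}
```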

For earlier frameworks, Eran's code should do the same.

Question is: am I right that we don't need to know the encoding to reversibly represent a string as bytes, or is there any System.String that would be corrupted after any number of the proposed Serialize/Deserialize operations?

astef
  • Guys, I've found the answer to my question here: https://stackoverflow.com/a/10380166/1943849 – astef Sep 26 '20 at 13:09

0 Answers