I'm writing a binary serialization library with a schema-less format, and I'm providing two APIs to the user (pseudo-code):
bytes Serialize(string)
string Deserialize(bytes)
I've seen a lot of answers on this topic suggesting the use of System.Text.Encoding
implementations to perform string serialization/deserialization, e.g.
Encoding.ASCII.GetBytes(str) and Encoding.ASCII.GetString(bytes).
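As an aside, a minimal sketch (my own illustration, not from any linked answer) of why ASCII in particular is not reversible for arbitrary strings — any character outside 0x00–0x7F is replaced by the encoder's fallback character:

```csharp
using System;
using System.Text;

class AsciiDemo
{
    static void Main()
    {
        string s = "héllo";                        // 'é' is outside the ASCII range
        byte[] bytes = Encoding.ASCII.GetBytes(s); // 'é' is replaced by '?' (0x3F)
        string back = Encoding.ASCII.GetString(bytes);
        Console.WriteLine(back);                   // "h?llo" — the round trip is lossy
    }
}
```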
Moreover, other implementations like protobuf-net use encodings to serialize strings.
I don't get why people suggest this, since we can directly copy the underlying arrays, like in this answer, where the author said it will fail in some cases.
So here's my implementation for .NET Core 3.1:
// Copies the raw UTF-16 code units of the string into the destination bytes.
static void Serialize(ReadOnlySpan<char> sourceString, Span<byte> destBytes) =>
    MemoryMarshal.AsBytes(sourceString).CopyTo(destBytes);

// Copies the bytes back over the char span; destString must be
// sourceBytes.Length / 2 chars long, since each char is two bytes.
static void Deserialize(ReadOnlySpan<byte> sourceBytes, Span<char> destString) =>
    sourceBytes.CopyTo(MemoryMarshal.AsBytes(destString));
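A self-contained round-trip sketch of the pair above, using a deliberately ill-formed string (a lone high surrogate) — the kind of input where an Encoding-based round trip would substitute U+FFFD, but a raw code-unit copy should not:

```csharp
using System;
using System.Runtime.InteropServices;

class RoundTrip
{
    static void Serialize(ReadOnlySpan<char> sourceString, Span<byte> destBytes) =>
        MemoryMarshal.AsBytes(sourceString).CopyTo(destBytes);

    static void Deserialize(ReadOnlySpan<byte> sourceBytes, Span<char> destString) =>
        sourceBytes.CopyTo(MemoryMarshal.AsBytes(destString));

    static void Main()
    {
        // "\uD800" is a lone high surrogate: not valid UTF-16 text,
        // but still a perfectly storable System.String.
        string original = "abc\uD800def";

        byte[] bytes = new byte[original.Length * sizeof(char)];
        Serialize(original.AsSpan(), bytes);

        char[] chars = new char[bytes.Length / sizeof(char)];
        Deserialize(bytes, chars);

        // Ordinal comparison of the raw code units.
        Console.WriteLine(new string(chars) == original); // True
    }
}
```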
For earlier frameworks, Eran's code should do the same.
The question is: am I right that we don't need to know the encoding to reversibly represent a string as bytes, or is there any System.String
that would be corrupted after any number of the proposed Serialize/Deserialize
operations?