I have C++ code that I want to rewrite in C#. This part:
    case ID_TYPE_UNICODE_STRING:
        if (items[i].GetUString().length() > 0xFFFF)
            throw dppError("error");
        // GetUString returns a std::wstring object
        DataSize = (WORD) (sizeof(WCHAR) * (items[i].GetUString().length()));
        blob.AppendData((const BYTE *) &DataSize, sizeof(WORD)); // blob is a byte array
        // GetUString returns a std::wstring object
        blob.AppendData((const BYTE *) items[i].GetUString().c_str(), DataSize);
        break;
basically serializes the length in bytes of a Unicode string, followed by the string itself, into a byte array.
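To make the layout concrete, here is a minimal sketch of what I understand that case to produce: a 16-bit byte count in native byte order, followed by the raw code units. The names SerializeUString and DeserializeUString are hypothetical, a std::vector stands in for blob, and I use char16_t instead of WCHAR so the sketch behaves the same on any platform (assuming WCHAR is 16 bits wide, as it is on Windows). Unlike the original, the sketch checks the byte count rather than the character count against 0xFFFF.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the blob: [16-bit byte count][raw 16-bit code units].
std::vector<std::uint8_t> SerializeUString(const std::u16string& s)
{
    const std::size_t byteCount = s.size() * sizeof(char16_t);
    if (byteCount > 0xFFFF)  // the count must fit in a 16-bit WORD
        throw std::length_error("string too long for a 16-bit byte count");

    const std::uint16_t dataSize = static_cast<std::uint16_t>(byteCount);
    std::vector<std::uint8_t> blob(sizeof(dataSize) + byteCount);
    // Length prefix, copied in native byte order like the original cast.
    std::memcpy(blob.data(), &dataSize, sizeof(dataSize));
    if (byteCount != 0)
        std::memcpy(blob.data() + sizeof(dataSize), s.data(), byteCount);
    return blob;
}

// Reverse of the above: read the byte count, then copy the code units back.
std::u16string DeserializeUString(const std::vector<std::uint8_t>& blob)
{
    std::uint16_t dataSize = 0;
    if (blob.size() < sizeof(dataSize))
        throw std::runtime_error("blob too short for the length prefix");
    std::memcpy(&dataSize, blob.data(), sizeof(dataSize));
    if (dataSize % sizeof(char16_t) != 0 ||
        blob.size() < sizeof(dataSize) + dataSize)
        throw std::runtime_error("blob does not match its length prefix");

    std::u16string s(dataSize / sizeof(char16_t), u'\0');
    if (dataSize != 0)
        std::memcpy(&s[0], blob.data() + sizeof(dataSize), dataSize);
    return s;
}
```

For example, SerializeUString(u"Hi") would give a 6-byte blob: 2 bytes holding the value 4, then the 4 bytes of the two code units. Nothing here converts between encodings; the bytes are whatever the string already holds in memory.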
Here is my problem (the code then sends this data to a server): I don't know which encoding the lines above use (UTF-16, UTF-8, etc.), so I don't know the best way to reimplement them in C#. How can I work out which encoding this C++ project uses?
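One check I could run inside the C++ project, as a rough sketch: since the code copies the std::wstring contents byte for byte, the encoding on the wire should simply be whatever wchar_t holds. The helper name GuessWideEncoding is hypothetical, and the mapping from width to encoding is an assumption based on common platforms (2 bytes is typical on Windows, 4 bytes on Linux and macOS), not something the language guarantees.

```cpp
#include <string>

// Guess what std::wstring holds from the width of wchar_t.
// Assumption: 2 bytes means UTF-16 code units, 4 bytes means UTF-32.
std::string GuessWideEncoding()
{
    switch (sizeof(wchar_t)) {
        case 2:  return "UTF-16";
        case 4:  return "UTF-32";
        default: return "unknown";
    }
}
```

Because the code uses the Windows types WORD, WCHAR and BYTE, I would expect this to report UTF-16 there, with the bytes in the machine's native order.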
And if I can't find out which encoding the C++ project uses, then, assuming the endianness matches as discussed in the accepted answer to this question, do you think the two methods from that answer (GetBytes and GetString) will work for me, both for serializing the Unicode string the way the C++ project does and for reading it back? These two:
    static byte[] GetBytes(string str)
    {
        byte[] bytes = new byte[str.Length * sizeof(char)];
        System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
        return bytes;
    }

    static string GetString(byte[] bytes)
    {
        char[] chars = new char[bytes.Length / sizeof(char)];
        System.Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
        return new string(chars);
    }
Or am I better off finding out which encoding the C++ project uses?
I will also need to reconstruct the string from the byte array in the same way. And if I am better off finding out which encoding the C++ code uses, how do I get the length of the string in bytes in C#? Something like System.Text.Encoding.WhateverEncodingWasUsedInC++.GetByteCount(string)?
PS. Do you think the C++ code works in an encoding-agnostic way? If so, how can I do the same in C#?
UPDATE: I am guessing the encoding is UTF-16, because I saw it mentioned in several variable names. I will assume UTF-16 and, if something doesn't work during testing, look for alternatives. In that case, what is the best way to get the number of bytes in the UTF-16 string? Is the following OK: System.Text.Encoding.Unicode.GetByteCount(string)?
Feedback and comments are welcome. Am I wrong anywhere in my reasoning? Thanks.