The documentation for RtlUnicodeStringToAnsiString is rather vague about its possible failures; by "vague" I mean it says nothing about them at all.
I'm not sure how, or whether, it deals with different encodings, or whether my understanding is so flawed that encoding doesn't even come into the equation, but let's assume the input is UTF-16 for argument's sake.
If all the characters are within the ASCII range, there is no problem: they can simply be truncated, losing the high-order byte. The first 128 Unicode code points are the ASCII characters, and UTF-16 encodes U+0000 to U+D7FF as code units numerically equal to the code points.[1][2]
Note: UNICODE_STRING has a WCHAR* Buffer, and ANSI_STRING a CHAR* Buffer, as may be expected.
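A minimal sketch of the ASCII-only case, assuming UTF-16 input held in a std::u16string (char16_t stands in for WCHAR, which is 16 bits on Windows). NarrowAscii is a hypothetical helper, not an Rtl* function; it truncates only when every code unit is below 0x80 and refuses otherwise:

```cpp
#include <optional>
#include <string>

// Hypothetical helper (not a Windows API): narrow UTF-16 to single bytes by
// dropping the high byte, but only when every code unit is ASCII
// (U+0000..U+007F). For those values the code unit, the code point and the
// ASCII byte are the same number, so the discarded high byte is always zero.
std::optional<std::string> NarrowAscii(const std::u16string& in)
{
    std::string out;
    out.reserve(in.size());
    for (char16_t unit : in) {
        if (unit >= 0x80) {
            return std::nullopt; // not ASCII: truncation would change the value
        }
        out.push_back(static_cast<char>(unit));
    }
    return out;
}
```

For example, u"Hello" narrows to "Hello", while u"\u03A3" (Σ) is rejected because its code unit 0x03A3 has a non-zero high byte.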
[Skipping over 128-255 and locales/code pages]
What happens with characters above 255? There is a separate RtlUnicodeToUTF8N function, so it seems safe to assume that RtlUnicodeStringToAnsiString doesn't convert to UTF-8.
How about code points outside the BMP (surrogate pairs and so on)?
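For reference, a code point above U+FFFF is stored in UTF-16 as a surrogate pair: subtract 0x10000, put the top 10 bits of the result into a high surrogate (0xD800 + bits) and the bottom 10 bits into a low surrogate (0xDC00 + bits). The sketch below uses a hypothetical helper name and assumes the input is already in the range U+10000..U+10FFFF:

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper: split a supplementary-plane code point
// (assumed to be in U+10000..U+10FFFF) into its UTF-16 surrogates.
std::pair<char16_t, char16_t> ToSurrogatePair(std::uint32_t codePoint)
{
    std::uint32_t v = codePoint - 0x10000;                        // 20-bit value
    char16_t high = static_cast<char16_t>(0xD800 + (v >> 10));    // top 10 bits
    char16_t low  = static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // bottom 10 bits
    return {high, low};
}
```

For instance, U+1F600 becomes 0xD83D 0xDE00. Neither unit is a character on its own, so a conversion that works one code unit at a time only ever sees these two meaningless values.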
I saw a function that does something like the code below:
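char *pTarget = reinterpret_cast<char*>(char_str);
const WCHAR *pSource = reinterpret_cast<const WCHAR*>(wchar_str);
// Copies one byte per UTF-16 code unit. If no NUL is found within
// targetMaxSizeInBytes units, the target is left unterminated.
for ( long i = 0; i < targetMaxSizeInBytes; i++ )
{
    // static_cast<char> keeps only the low byte of the code unit
    *pTarget = static_cast<char>(*pSource);
    if (L'\0' == *pSource)
        break; // the terminating NUL has already been copied
    pTarget++;
    pSource++;
}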
This would cause problems with any non-ASCII characters, correct?
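To check what that loop actually produces, here is a self-contained sketch of it. It assumes char16_t in place of WCHAR and returns a std::string instead of writing into a caller's buffer; NaiveNarrow is a hypothetical name. Like the original, it keeps only the low byte of each code unit:

```cpp
#include <cstddef>
#include <string>

// Hypothetical re-creation of the loop above: copy the low byte of each
// UTF-16 code unit until a NUL code unit is seen or maxBytes is reached.
std::string NaiveNarrow(const char16_t* source, std::size_t maxBytes)
{
    std::string target;
    for (std::size_t i = 0; i < maxBytes; ++i) {
        if (source[i] == u'\0') {
            break;
        }
        // static_cast<char> keeps only the low 8 bits; the high byte is lost.
        target.push_back(static_cast<char>(source[i]));
    }
    return target;
}
```

With this sketch, "aΣb" comes out as 'a', 0xA3, 'b' (0xA3 is '£' in code page 1252, not Σ), and U+1F600 comes out as '=' (0x3D) followed by an embedded 0x00 taken from the low surrogate 0xDE00, so anything reading the result as a C string stops after "=".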
Update:
From RbMm's answer:
RtlUnicodeStringToAnsiString is shell over RtlUnicodeToMultiByteN routine
which led me to a little more information:
Like RtlUnicodeToMultiByteSize, RtlUnicodeToMultiByteN supports only precomposed Unicode characters that are mapped to the current system ANSI code page installed at system boot.
WideCharToMultiByte has an option to be notified if a default character is used in the conversion for a character that cannot be represented in the specified code page:
lpUsedDefaultChar [out, optional]
Pointer to a flag that indicates if the function has used a default character in the conversion. The flag is set to TRUE if one or more characters in the source string cannot be represented in the specified code page. Otherwise, the flag is set to FALSE. This parameter can be set to NULL.
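For comparison, the behavior described for lpUsedDefaultChar can be sketched portably. This is not the Windows API: ToLatin1 is a hypothetical converter to ISO-8859-1, chosen because its 256 byte values correspond one-to-one to U+0000..U+00FF. It substitutes a default character and sets a flag whenever a code unit has no mapping:

```cpp
#include <string>

// Hypothetical converter to ISO-8859-1 (Latin-1), modelled loosely on the
// default-character behavior of WideCharToMultiByte. Not a Windows API.
// Each UTF-16 code unit is handled on its own, so in this simplified sketch
// a surrogate pair produces two default characters.
std::string ToLatin1(const std::u16string& in, bool* usedDefaultChar,
                     char defaultChar = '?')
{
    std::string out;
    bool used = false;
    for (char16_t unit : in) {
        if (unit <= 0xFF) {
            out.push_back(static_cast<char>(unit)); // Latin-1 byte == code point
        } else {
            out.push_back(defaultChar);             // no Latin-1 equivalent
            used = true;
        }
    }
    if (usedDefaultChar != nullptr) {
        *usedDefaultChar = used; // set if one or more characters were replaced
    }
    return out;
}
```

So "café" converts cleanly with the flag cleared, while "Σx" becomes "?x" with the flag set; the caller can at least tell that something was lost.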
However, RtlUnicodeToMultiByteN, and therefore RtlUnicodeStringToAnsiString, has no such flag; does it simply not support characters outside the current code page?
I tried a few characters and got seemingly random conversions (see below); more importantly, the call returned STATUS_SUCCESS.
U+03A3 Σ -> 0n83 'S'
U+03A4 Τ -> 0n63 '?'
U+03A5 Υ -> 0n63 '?'
U+03A6 Φ -> 0n70 'F'
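One thing these results do rule out is plain truncation: dropping the high byte of U+03A3..U+03A6 would give 0xA3..0xA6 (163..166), and none of the observed bytes match. The sketch below just does that arithmetic on the four values above (the numbers are copied from the results; the names are hypothetical):

```cpp
#include <array>
#include <cstdint>

// Observed results from the experiment above: {code point, byte returned}.
struct Observation {
    char16_t codePoint;
    std::uint8_t observed;
};

constexpr std::array<Observation, 4> kObserved{{
    {0x03A3, 83}, // Sigma   -> 'S'
    {0x03A4, 63}, // Tau     -> '?'
    {0x03A5, 63}, // Upsilon -> '?'
    {0x03A6, 70}, // Phi     -> 'F'
}};

// What the naive loop would produce: the low byte of the code unit.
constexpr std::uint8_t Truncate(char16_t unit)
{
    return static_cast<std::uint8_t>(unit & 0xFF);
}

// True if any observed byte equals plain high-byte truncation.
constexpr bool AnyMatchesTruncation()
{
    for (const auto& o : kObserved) {
        if (Truncate(o.codePoint) == o.observed) {
            return true;
        }
    }
    return false;
}
```

So whatever mapping is applied, it is not the high-byte truncation performed by the loop earlier in the question.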