We all know UTF-8 is hard. I exported my messages from Facebook, and the resulting JSON file escaped all non-ASCII characters as Unicode code points.
I am looking for an easy way to unescape these code points back to regular old UTF-8. I would also love to do this in PowerShell.
I tried
$str = "\u00f0\u009f\u0091\u008d"
[Regex]::Replace($str, "\\[Uu]([0-9A-Fa-f]{4})", `
{[char]::ToString([Convert]::ToInt32($args[0].Groups[1].Value, 16))} )
but that only gives me ð as a result, not 👍.
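The attempt above turns each \u00XX escape into its own character. U+00F0 is ð, and U+009F, U+0091 and U+008D are invisible C1 control characters, so only ð shows up. The four values look like the four bytes of a single UTF-8 sequence, so they have to be collected into one byte array and decoded together. Below is a minimal sketch under that assumption (every \u00XX escape is one UTF-8 byte, which is what the export appears to contain); `ConvertFrom-EscapedUtf8Bytes` is a hypothetical name, not an existing cmdlet.

```powershell
# Hypothetical helper name. Assumes every \u00XX escape is one byte of UTF-8,
# which is what the Facebook export appears to produce.
function ConvertFrom-EscapedUtf8Bytes {
    param([Parameter(Mandatory)][string]$Text)

    # Match whole runs of consecutive \u00XX escapes so that the 2-4 bytes of
    # one multi-byte character are decoded together, not one at a time.
    [regex]::Replace($Text, '(?:\\u00[0-9A-Fa-f]{2})+', {
        param($m)
        # Pull the two hex digits out of each escape in the run and turn them into bytes.
        [byte[]]$bytes = [regex]::Matches($m.Value, '\\u00([0-9A-Fa-f]{2})') |
            ForEach-Object { [Convert]::ToByte($_.Groups[1].Value, 16) }
        [System.Text.Encoding]::UTF8.GetString($bytes)
    })
}

$str = '\u00f0\u009f\u0091\u008d'
ConvertFrom-EscapedUtf8Bytes $str   # outputs 👍 (U+1F44D)
```

Because only runs of escapes are replaced, surrounding plain text is left alone, e.g. `'ok \u00c3\u00a9!'` becomes `ok é!`. This works on the raw escape text; for a whole JSON file, letting a JSON parser do the unescaping is likely safer.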
I also tried Notepad++ and found this SO post: How to convert escaped Unicode (e.g. \u0432\u0441\u0435) to UTF-8 chars (все) in Notepad++. Its accepted answer gives exactly the same result as my attempt above: ð.
I did find a decoder that works: the UTF8.js library decodes the text perfectly, and you can try it out here (with \u00f0\u009f\u0091\u008d as input).
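As far as I can tell, UTF8.js decodes a "byte string", i.e. a string in which each character (U+0000 to U+00FF) stands for one byte. Assuming that is what it does here, the same reinterpretation can be sketched in PowerShell with .NET encodings alone: ISO-8859-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF, so encoding with it recovers the original bytes, which can then be decoded as UTF-8. `Repair-Utf8Mojibake` is a hypothetical name.

```powershell
# Hypothetical helper name. Assumes every char in $Text is <= U+00FF, i.e. each
# char really stands for one UTF-8 byte (chars above U+00FF would become '?').
function Repair-Utf8Mojibake {
    param([Parameter(Mandatory)][AllowEmptyString()][string]$Text)
    # ISO-8859-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
    # so GetBytes hands back the original UTF-8 byte sequence.
    $latin1 = [System.Text.Encoding]::GetEncoding('ISO-8859-1')
    [System.Text.Encoding]::UTF8.GetString($latin1.GetBytes($Text))
}

# The four chars you get once \u00f0\u009f\u0091\u008d has been unescaped (e.g. by a JSON parser).
$garbled = -join ([char]0xF0, [char]0x9F, [char]0x91, [char]0x8D)
Repair-Utf8Mojibake $garbled   # outputs 👍
```

Unlike a regex over the escapes, this version expects already-unescaped text, which is what `ConvertFrom-Json` hands back for each string value.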
Is there a way in PowerShell to decode \u00f0\u009f\u0091\u008d into 👍? I'd love to have real UTF-8 in my exported Facebook messages so I can actually read them.
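For a whole exported file, one option is to let `ConvertFrom-Json` do the unescaping and then repair every string value in the resulting object graph. This is a sketch under the assumption that every string in the export has the same byte-per-character problem; `Repair-JsonNode` and `Repair-Utf8Mojibake` are hypothetical names, and property names are left untouched.

```powershell
# Hypothetical helper names; a sketch, not a tested tool for every export layout.
function Repair-Utf8Mojibake {
    param([AllowEmptyString()][string]$Text)
    # Latin-1 turns each char U+0000..U+00FF back into the byte it stood for.
    $latin1 = [System.Text.Encoding]::GetEncoding('ISO-8859-1')
    [System.Text.Encoding]::UTF8.GetString($latin1.GetBytes($Text))
}

# Walk the object graph from ConvertFrom-Json and repair every string value in place.
function Repair-JsonNode {
    param($Node)
    if ($Node -is [string]) {
        return Repair-Utf8Mojibake $Node
    }
    if ($Node -is [System.Collections.IList]) {
        $items = @(foreach ($item in $Node) { Repair-JsonNode $item })
        # The leading comma stops PowerShell from unrolling the array on return.
        return ,$items
    }
    if ($Node -is [System.Management.Automation.PSCustomObject]) {
        foreach ($prop in $Node.PSObject.Properties) {
            $prop.Value = Repair-JsonNode $prop.Value
        }
        return $Node
    }
    # Numbers, booleans and $null pass through unchanged.
    return $Node
}

$json = '{"sender_name":"Ann","content":"\u00f0\u009f\u0091\u008d","reactions":[]}'
$fixed = Repair-JsonNode ($json | ConvertFrom-Json)
$fixed.content   # outputs 👍
```

For a real file, the input would come from something like `Get-Content -Raw -Encoding UTF8 <path> | ConvertFrom-Json`, and the repaired object can be written back with `ConvertTo-Json -Depth 100` (the default depth of 2 would truncate nested message data). It is worth checking the written file in your PowerShell version before relying on it.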
Bonus points for helping me understand what \u00f0\u009f\u0091\u008d actually represents (besides being some kind of UTF-8 hex representation). Why is it the same as U+1F44D, or \uD83D\uDC4D in C++?
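One way to see the relationship: F0 9F 91 8D are the four bytes of the UTF-8 encoding of the single code point U+1F44D (THUMBS UP SIGN), and the export apparently wrote each byte as if it were a code point of its own. \uD83D\uDC4D is the UTF-16 surrogate pair for the same code point, the notation used in JSON, JavaScript, Java and C# strings. (In C++ itself a universal character name may not name a surrogate, so there the literal would be written \U0001F44D.) The sketch below just does the arithmetic from the UTF-8 and UTF-16 bit layouts by hand and cross-checks it with .NET.

```powershell
# The four values from \u00f0\u009f\u0091\u008d, treated as bytes.
$b = 0xF0, 0x9F, 0x91, 0x8D

# A 4-byte UTF-8 sequence has the bit layout 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx;
# concatenating the x bits gives the 21-bit code point.
$cp = ($b[0] -band 0x07) -shl 18
$cp = $cp -bor (($b[1] -band 0x3F) -shl 12)
$cp = $cp -bor (($b[2] -band 0x3F) -shl 6)
$cp = $cp -bor ($b[3] -band 0x3F)
'U+{0:X4}' -f $cp                   # U+1F44D

# UTF-16 represents code points above U+FFFF as a high/low surrogate pair.
$v    = $cp - 0x10000
$high = 0xD800 + ($v -shr 10)
$low  = 0xDC00 + ($v -band 0x3FF)
'\u{0:X4}\u{1:X4}' -f $high, $low   # \uD83D\uDC4D

# Cross-check with .NET: encoding the character back to UTF-8 gives the original bytes.
$thumbsUp = [char]::ConvertFromUtf32($cp)
([System.Text.Encoding]::UTF8.GetBytes($thumbsUp) | ForEach-Object { '{0:X2}' -f $_ }) -join ' '   # F0 9F 91 8D
```

So all three notations name the same character; they differ only in whether you write the code point, its UTF-8 bytes, or its UTF-16 code units.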