
What is the best way to check whether a string contains a specified Unicode character? My problem is that I cannot parse the string/characters into the escaped format \uXXXX (four hex digits). I have followed many tutorials and threads here on StackOverflow, but when I have a method like this one:

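using System;
using System.Linq;
using System.Text;

private bool ContainsInvalidCharacters(string name)
{
    if (name.Any(c => c > 255))
    {
        // Copies name.Length bytes of the raw UTF-16 data (2 bytes per char),
        // i.e. only the first half of the string's characters...
        byte[] bytes = new byte[name.Length];
        Buffer.BlockCopy(name.ToCharArray(), 0, bytes, 0, bytes.Length);
        // ...and then decodes those UTF-16 bytes as if they were UTF-8
        string decoded = Encoding.UTF8.GetString(bytes, 0, name.Length);
        if (decoded.Contains("\u0001"))
        {
            //do something
        }
    }
    return false;
}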

I get output like: "c\0o\0n\0t\0i\0n\0g\0u\0t\0".
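The interleaved \0 characters are expected: a .NET string is stored as UTF-16, so each ASCII letter occupies two bytes, the second of which is 0x00, and decoding those bytes as UTF-8 turns every 0x00 into a '\0' char. A minimal sketch reproducing this, assuming the input was "contingut" (inferred from the output above) and using Encoding.Unicode (UTF-16 little-endian) to obtain the same bytes that Buffer.BlockCopy copies on a little-endian machine:

```csharp
using System;
using System.Text;

// Example input, assumed from the reported output.
string name = "contingut";

// UTF-16LE bytes: every ASCII letter is followed by a 0x00 byte.
byte[] utf16 = Encoding.Unicode.GetBytes(name);
Console.WriteLine(BitConverter.ToString(utf16, 0, 4));   // 63-00-6F-00

// Misreading those bytes as UTF-8 turns each 0x00 byte into a '\0' char.
string misdecoded = Encoding.UTF8.GetString(utf16);
Console.WriteLine(misdecoded.Replace("\0", "\\0"));      // c\0o\0n\0t\0i\0n\0g\0u\0t\0

// The string never needed converting: search its chars directly.
bool hasControl1 = name.IndexOf('\u0001') >= 0;
Console.WriteLine(hasControl1);                          // False
```

In other words, no byte-level round trip is required to look for a character; the string already holds the decoded characters.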

This really is not my cup of tea. I will be grateful for any help.

Qerts
  • It isn't exactly clear what you are trying to do here... Try writing step-by-step what you want to do... – xanatos Feb 18 '16 at 11:44
  • You have a string... If it has some > 255 characters, you consider it to be badly decoded, so you copy half of it to a `byte[]` (half of it because a char is 2 bytes). Then you decode it as UTF8... Then? – xanatos Feb 18 '16 at 11:46
  • @xanatos Well, what I am trying to do is detect whether a given string contains a specific Unicode character by using its escaped form. In the first step I followed http://stackoverflow.com/questions/4459571/how-to-recognize-if-a-string-contains-unicode-chars , in the next step I followed http://stackoverflow.com/questions/472906/converting-a-string-to-byte-array-without-using-an-encoding-byte-by-byte . But now I see it was not a fortunate approach. – Qerts Feb 18 '16 at 11:59
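A sketch of the two situations the comments point at, assuming the escaped form is either written in C# source (where the compiler resolves "\u0001" to the single character U+0001) or arrives at runtime as six literal characters such as \u0001. The helpers ContainsChar, ParseEscape and ToEscaped are hypothetical names for illustration, not library APIs:

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper: a plain ordinal search for one char.
bool ContainsChar(string text, char c) => text.IndexOf(c) >= 0;

// Hypothetical helper: turn the six characters \uXXXX (read at runtime,
// e.g. from a config file) into the char they describe.
char ParseEscape(string escaped)
{
    if (escaped.Length != 6 || !escaped.StartsWith(@"\u", StringComparison.Ordinal))
        throw new FormatException($"Expected \\uXXXX, got '{escaped}'");
    int code = int.Parse(escaped.Substring(2), NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture);
    return (char)code;
}

// Hypothetical helper: show each UTF-16 code unit in \uXXXX form, handy for debugging.
string ToEscaped(string text)
{
    var sb = new StringBuilder();
    foreach (char ch in text)
        sb.Append(@"\u").Append(((int)ch).ToString("X4"));
    return sb.ToString();
}

Console.WriteLine(ContainsChar("Hello\u0001", '\u0001'));         // True
Console.WriteLine(ContainsChar("Hello", ParseEscape(@"\u0001")));  // False
Console.WriteLine(ToEscaped("Hi"));                                // \u0048\u0069
```

The escape only matters when the text is written or read; once it is a char, the search is an ordinary IndexOf.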

3 Answers


If I were to picture a rage of Unicode characters, this would be my bet:

ლ(~•̀︿•́~)つ︻̷┻̿═━一

So, to answer your question, i.e. to check a string for such a rage, you could simply do:

private bool ContainsInvalidCharacters(string name)
{
    return name.IndexOf("ლ(~•̀︿•́~)つ︻̷┻̿═━一") != -1;
}

;)

Kuba Wyrostek

Is this what you want?

public static bool ContainsInvalidCharacters(string name)
{
    return name.IndexOfAny(new[] 
    {
        '\u0001', '\u0002', '\u0003', 
    }) != -1;
}

and

bool res = ContainsInvalidCharacters("Hello\u0001");

Note the use of '\uXXXX': the single quotes denote a char literal instead of a string.
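A short sketch of the char/string distinction. The ordinal comparison on the string overload is a deliberate choice here: culture-sensitive IndexOf can treat control characters as ignorable, which may report a match where there is none. The \U0001F600 code point is just an arbitrary example above U+FFFF:

```csharp
using System;

char c = '\u0001';     // single quotes: a char literal (one UTF-16 code unit)
string s = "\u0001";   // double quotes: a string literal of length 1

// Both denote the same character, so either overload finds it.
Console.WriteLine("Hello\u0001".IndexOf(c) != -1);                              // True
Console.WriteLine("Hello\u0001".IndexOf(s, StringComparison.Ordinal) != -1);    // True

// \uXXXX only reaches U+FFFF. Above that, use the eight-digit \UXXXXXXXX escape
// in a string: the result is a surrogate pair (two chars), so it cannot be one char.
string emoji = "\U0001F600";
Console.WriteLine(emoji.Length);                                                // 2
Console.WriteLine(char.IsSurrogatePair(emoji[0], emoji[1]));                    // True
```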

xanatos

Also check this:

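using System.Text.RegularExpressions;

    /// <summary>
    /// Check for characters not allowed by the pattern (the XML 1.0 Char ranges:
    /// tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF)
    /// </summary>
    /// <param name="text">The string</param>
    /// <returns>true if the string contains an invalid character</returns>
    public static bool IsInvalidCharacters(this string text) // extension method: declare it in a static class
    {
        // .NET's \x escape takes exactly two hex digits and regexes match UTF-16 code units,
        // so BMP ranges use \u, and U+10000-U+10FFFF is accepted as a surrogate pair
        // while lone surrogates are reported as invalid.
        string pattern = @"[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]|[\u0000-\u0008\u000B\u000C\u000E-\u001F\uFFFE\uFFFF]";
        return Regex.IsMatch(text, pattern);
    }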
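If a regex feels heavy for this, the same ranges can be checked with a plain loop over the UTF-16 code units. This is a sketch under the assumption that the XML 1.0 Char ranges listed in the pattern above are the intended definition of "valid"; ContainsInvalidXmlChars is a hypothetical name:

```csharp
using System;

// Returns true if text contains a char outside the XML 1.0 Char ranges.
// Surrogate pairs (U+10000..U+10FFFF) are accepted; lone surrogates are not.
bool ContainsInvalidXmlChars(string text)
{
    for (int i = 0; i < text.Length; i++)
    {
        char ch = text[i];
        if (char.IsHighSurrogate(ch))
        {
            // A valid pair consumes two chars; a high surrogate without a low one is invalid.
            if (i + 1 < text.Length && char.IsLowSurrogate(text[i + 1])) { i++; continue; }
            return true;
        }
        if (char.IsLowSurrogate(ch)) return true;   // low surrogate with no preceding high one
        bool allowed = ch == '\u0009' || ch == '\u000A' || ch == '\u000D'
                       || (ch >= '\u0020' && ch <= '\uD7FF')
                       || (ch >= '\uE000' && ch <= '\uFFFD');
        if (!allowed) return true;
    }
    return false;
}

Console.WriteLine(ContainsInvalidXmlChars("Hello\u0001"));   // True
Console.WriteLine(ContainsInvalidXmlChars("\U0001F600"));    // False
```

The loop makes the surrogate handling explicit, which the regex can only express through lookarounds.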
Eldho