1290

I made a comment yesterday on an answer where someone had used [0123456789] in a regex rather than [0-9] or \d. I said it was probably more efficient to use a range or digit specifier than a character set.

I decided to test that out today and found, to my surprise, that (in the C# regex engine at least) \d appears to be less efficient than either of the other two, which don't seem to differ much from each other. Here is my test output over 10000 random strings of 1000 random characters, 5077 of which actually contain a digit:

Regex \d           took 00:00:00.2141226 result: 5077/10000
Regex [0-9]        took 00:00:00.1357972 result: 5077/10000  63.42 % of first
Regex [0123456789] took 00:00:00.1388997 result: 5077/10000  64.87 % of first

It's a surprise to me for two reasons, and I would be interested if anyone can shed some light on them:

  1. I would have thought the range would be implemented much more efficiently than the set.
  2. I can't understand why \d is worse than [0-9]. Is there more to \d than simply shorthand for [0-9]?

Here is the test code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
using System.Text.RegularExpressions;

namespace SO_RegexPerformance
{
    class Program
    {
        static void Main(string[] args)
        {
            var rand = new Random(1234);
            var strings = new List<string>();
            //10K random strings
            for (var i = 0; i < 10000; i++)
            {
                //generate random string
                var sb = new StringBuilder();
                for (var c = 0; c < 1000; c++)
                {
                    //add a-z randomly
                    sb.Append((char)('a' + rand.Next(26)));
                }
                //in roughly 50% of them, put a digit
                if (rand.Next(2) == 0)
                {
                    //replace 1 char with a digit 0-9
                    sb[rand.Next(sb.Length)] = (char)('0' + rand.Next(10));
                }
                strings.Add(sb.ToString());
            }

            var baseTime = TestPerformance(strings, @"\d");
            Console.WriteLine();
            var testTime = TestPerformance(strings, "[0-9]");
            Console.WriteLine("  {0:P2} of first", testTime.TotalMilliseconds / baseTime.TotalMilliseconds);
            testTime = TestPerformance(strings, "[0123456789]");
            Console.WriteLine("  {0:P2} of first", testTime.TotalMilliseconds / baseTime.TotalMilliseconds);
        }

        private static TimeSpan TestPerformance(List<string> strings, string regex)
        {
            var sw = new Stopwatch();

            int successes = 0;

            var rex = new Regex(regex);

            sw.Start();
            foreach (var str in strings)
            {
                if (rex.Match(str).Success)
                {
                    successes++;
                }
            }
            sw.Stop();

            Console.Write("Regex {0,-12} took {1} result: {2}/{3}", regex, sw.Elapsed, successes, strings.Count);

            return sw.Elapsed;
        }
    }
}
weston
  • 186
    Maybe `\d` deals with locales. E.g. Hebrew uses letters for digits. – Barmar May 18 '13 at 07:20
  • 1
    Basically, when you have to deal with Unicode, then it is going to be much slower (since it has to do more checks). – nhahtdh May 18 '13 at 08:14
  • 6
    related: http://stackoverflow.com/a/6479605/674039 – wim May 18 '13 at 15:04
  • 39
    This is an interesting question precisely because `\d` does not mean the same thing in different languages. In Java, for example [`\d` does indeed match 0-9 only](http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html) – Ray Toal May 18 '13 at 17:59
  • 18
    @Barmar Hebrew does not use letters for digits normally, rather the same Latin numeral digits [0-9]. Letters can be substituted for digits, but this is a rare use and reserved for special terms. I would not expect a regex parser to match [כ"ג יורדי סירה](http://he.wikipedia.org/wiki/%D7%9B%22%D7%92_%D7%99%D7%95%D7%A8%D7%93%D7%99_%D7%94%D7%A1%D7%99%D7%A8%D7%94) (with כ"ג being a substitute for 23). Also, as can be seen in Sina Iravanian's answer, Hebrew letters do not appear as valid matches for \d. – Yuval Adam May 20 '13 at 09:20
  • 1
    It's **not** in JavaScript, FYI: http://jsperf.com/d-and-09-in-regex – Afshin Mehrabani May 20 '13 at 11:10
  • 1
    In case anyone was wondering, this strangely seems to apply to Java as well, though to a lesser degree. [0123456789] being ~4% faster than \d, on a 6mb file with a bunch of random garbage, patterns precompiled, thousands of iterations. Mean duration for 0123456789: 466.46ms (stDev: 19.78). And \d: mean: 484.35ms (stDev: 25.98). – Nim May 20 '13 at 15:52
  • 7
    Porting weston's code to Java yields: -- Regex \d took 00:00:00.043922 result: 4912/10000 -- Regex [0-9] took 00:00:00.073658 result: 4912/10000 167% of first -- Regex [0123456789] took 00:00:00.085799 result: 4912/10000 195% of first – Lunchbox May 22 '13 at 16:35
  • @Lunchbox thanks, that's the order I was expecting it to be in. – weston May 22 '13 at 17:56
  • 1
    Premature optimization is the root of all evil. – Isaac Rabinovitch May 23 '13 at 18:45
  • @IsaacRabinovitch Just so you know your edit was rejected because tags should not appear in the questions and the question is already tagged as c# – weston May 23 '13 at 19:52
  • 1
    @IsaacRabinovitch and in reply to your quote, that's just true 97% of the time! – weston May 23 '13 at 19:56
  • 1
    \d required less button presses, thus it is better :D – David says Reinstate Monica Jun 05 '13 at 20:05
  • @lunchbox the warm up period is not completed - the numbers are not reliable yet. – Thorbjørn Ravn Andersen Jun 03 '14 at 16:20

5 Answers

1601

\d checks all Unicode digits, while [0-9] is limited to these 10 characters. For example, the Persian digits ۱۲۳۴۵۶۷۸۹ are Unicode digits that are matched by \d, but not by [0-9].
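To make the difference concrete, here is a minimal sketch, assuming the default .NET regex options. The class and method names (`DigitClassDemo`, `Matches`) are made up for illustration, and U+06F3 is just one arbitrarily chosen Persian digit:

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitClassDemo
{
    // Hypothetical helper: true if the pattern matches anywhere in the input.
    public static bool Matches(string pattern, string input) =>
        Regex.IsMatch(input, pattern);

    public static void Main()
    {
        // U+06F3 EXTENDED ARABIC-INDIC DIGIT THREE (a Persian digit).
        const string persianThree = "\u06F3";

        Console.WriteLine(Matches(@"\d", "3"));               // True
        Console.WriteLine(Matches(@"\d", persianThree));      // True
        Console.WriteLine(Matches("[0-9]", persianThree));    // False
        Console.WriteLine(Matches(@"\p{Nd}", persianThree));  // True
    }
}
```

The last line uses the Unicode category `\p{Nd}` (decimal digit number), which is what \d is documented to be equivalent to when no ECMAScript option is set.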

You can generate a list of all such characters using the following code:

// Scans only the Basic Multilingual Plane: a .NET char is a single UTF-16 code unit.
var sb = new StringBuilder();
for(UInt16 i = 0; i < UInt16.MaxValue; i++)
{
    string str = Convert.ToChar(i).ToString();
    if (Regex.IsMatch(str, @"\d"))
        sb.Append(str);
}
Console.WriteLine(sb.ToString());

Which generates:

0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯୦୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖໗໘໙༠༡༢༣༤༥༦༧༨༩၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᥆᥇᥈᥉᥊᥋᥌᥍᥎᥏᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹᱀᱁᱂᱃᱄᱅᱆᱇᱈᱉᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩꣐꣑꣒꣓꣔꣕꣖꣗꣘꣙꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉꩐꩑꩒꩓꩔꩕꩖꩗꩘꩙0123456789

dakab
Sina Iravanian
  • 126
    Here is a more complete list of digits that aren't 0-9: http://www.fileformat.info/info/unicode/category/Nd/list.htm – Robert McKee May 18 '13 at 07:29
  • 2
    Cool, though should be `UInt16`? Also that link from Robert shows characters above `\uFFFF` which I am surprised about, I thought it was just 16 bit. So your code won't find these, e.g. \u104A0. – weston May 18 '13 at 07:47
  • 8
    @weston Unicode has 17 planes with 16 bits each. Most important characters are in the basic plane, but some special characters, mostly Chinese, are in the supplemental planes. Dealing with those in C# is a bit annoying. – CodesInChaos May 18 '13 at 07:55
  • @CodesInChaos: To be precise, plane 2 is for ideograph (rare characters), plane 1 contains quite a number of symbols and ancient scripts. – nhahtdh May 18 '13 at 08:18
  • 1
    Correct. The full unicode character set is actually 32-bit (UTF32), but there are many ways of encoding it so that it can be represented with 16-bit (UTF16) or 8-bit (UTF8), by reserving one or more entries to shift parts of the set in and out. UTF16 and UTF8 will sometimes take multiple characters to represent a single unicode character, which can make processing much more difficult. – Robert McKee May 18 '13 at 08:45
  • 10
    @RobertMcKee: Nitpick: The full unicode character set is actually 21 bit (17 planes of 16 bit each). But of course a 21-bit-datatype is impractical, so if you use a power-of-2 datatype, it's true that you need 32 bit. – sleske May 18 '13 at 21:32
  • @sleske There will inevitably come a day when 10FFFF characters is not enough. UTF-8 and UTF-32 will survive that day (it's just a matter of turning off the rejection of wider characters), UTF-16 won't. – zwol May 18 '13 at 21:38
  • @sleske You are correct, for now. They keep expanding the unicode glyphs, so I can definitely see a point in time where 21 bits aren't enough (although there is a lot of unused/undefined space in there). I find it easier to just think of unicode requiring, or will require some day all 32-bits. – Robert McKee May 19 '13 at 01:14
  • What font can display all of these characters? Even Arial Unicode MS is missing a lot like Myanmar, Sundanese, Tai Tham and other characters. – Samuel Neff May 20 '13 at 00:50
  • 3
    According to [this Wikipedia article](http://en.wikipedia.org/wiki/Plane_(Unicode)), the Unicode Consortium has stated that the limit of 1,114,112 code points (0 to 0x010FFFF) will never be changed. It links to unicode.org, but I didn't find the statement there (I probably just missed it). – Keith Thompson May 20 '13 at 02:50
  • 17
    It'll never be changed -- until they need to change it. – Robert McKee Jul 08 '13 at 21:00
  • I've added an answer to address the code Point issue: http://stackoverflow.com/a/18781614/281306 – Sebastian Jan 10 '14 at 22:08
  • 2
    This answer has been added to the [Stack Overflow Regular Expression FAQ](http://stackoverflow.com/a/22944075/2736496), under "Character Classes". – aliteralmind Apr 10 '14 at 00:19
  • checking up to `UInt16.MaxValue` is not enough since Unicode codepoints can go up to 2^21-1 – phuclv Jun 22 '14 at 10:17
  • You can't cast a 32-bit value (such as 0x010FFFF) to char, because .NET char type stores UTF-16 characters, not Unicode characters. (Characters outside the first Unicode plane are stored as two UTF-16 characters, using surrogate pairs.) – Mike Rosoft Jan 31 '19 at 12:16
276

Credit to ByteBlast for noticing this in the docs. Just changing the regex constructor:

var rex = new Regex(regex, RegexOptions.ECMAScript);

Gives new timings:

Regex \d           took 00:00:00.1355787 result: 5077/10000
Regex [0-9]        took 00:00:00.1360403 result: 5077/10000  100.34 % of first
Regex [0123456789] took 00:00:00.1362112 result: 5077/10000  100.47 % of first
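The timings line up with a change in what \d matches under this option. A small sketch to check that, with the names `EcmaScriptDigitDemo` and `IsDigitMatch` invented for the example and U+06F3 picked as an arbitrary non-ASCII digit:

```csharp
using System;
using System.Text.RegularExpressions;

public static class EcmaScriptDigitDemo
{
    // Hypothetical helper: does \d match anywhere in the input under the given options?
    public static bool IsDigitMatch(string input, RegexOptions options) =>
        new Regex(@"\d", options).IsMatch(input);

    public static void Main()
    {
        // U+06F3 EXTENDED ARABIC-INDIC DIGIT THREE (a Persian digit).
        const string persianThree = "\u06F3";

        Console.WriteLine(IsDigitMatch(persianThree, RegexOptions.None));        // True
        Console.WriteLine(IsDigitMatch(persianThree, RegexOptions.ECMAScript));  // False
        Console.WriteLine(IsDigitMatch("3", RegexOptions.ECMAScript));           // True
    }
}
```

So the option is not a free speed-up: it trades Unicode digit matching for the ASCII-only behaviour of [0-9].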
weston
  • 11
    What does the `RegexOptions.ECMAScript` do? – laurent May 20 '13 at 01:36
  • 7
    From [Regular Expression Options](http://msdn.microsoft.com/en-us/library/yd1hzczs.aspx): "Enable ECMAScript-compliant behavior for the expression." – chrisaycock May 20 '13 at 01:58
  • 28
    @0xFE: Not quite. Unicode escapes are still valid in `ECMAScript` (`\u1234`). It's "just" the shorthand character classes that change meaning (like `\d`) and the Unicode property/script shorthands that go away (like `\p{N}`). – Tim Pietzcker May 20 '13 at 09:51
  • 12
    This is not an answer to the "why" part. It is a "fix the symptoms" answer. Still valuable information. – usr May 29 '13 at 16:52
  • 3
    Generally, Regex supports Unicode matching, but ECMAScript does not. Hence, when using RegexOptions.ECMAScript, \d only matches the ASCII digits, i.e., 0-9. – lzlstyle Oct 16 '13 at 15:33
122

From Does “\d” in regex mean a digit?:

[0-9] isn't equivalent to \d. [0-9] matches only 0123456789 characters, while \d matches [0-9] and other digit characters, for example Eastern Arabic numerals ٠١٢٣٤٥٦٧٨٩

Community
İsmet Alkan
  • 50
    According to: http://msdn.microsoft.com/en-us/library/20bw873z.aspx `If ECMAScript-compliant behavior is specified, \d is equivalent to [0-9]. ` – User 12345678 May 18 '13 at 07:30
  • 2
    huh, am i wrong or this sentence from the link is telling the opposite. "\d matches any decimal digit. It is equivalent to the \p{Nd} regular expression pattern, which includes the standard decimal digits 0-9 as well as the decimal digits of a number of other character sets." – İsmet Alkan May 18 '13 at 07:51
  • 4
    @ByteBlast thanks, using the constructor: `var rex = new Regex(regex, RegexOptions.ECMAScript);` makes them all pretty much indistinguishable in performance terms. – weston May 18 '13 at 07:53
  • 2
    oh anyway, thanks everyone. this question turned out to be a great learning for me. – İsmet Alkan May 18 '13 at 07:54
  • @weston: I think it would be nice if you post the new timing in form of an answer (or edit to your question, but I think it can be an answer). – nhahtdh May 18 '13 at 08:20
  • 3
    Please don't "just copy" answers from other questions. If the question is a duplicate, flag it as such. – BoltClock May 18 '13 at 12:00
  • 1
    Additionally, you may want to quote the original text instead of just copying it, as is usually done when using content you haven't written yourself. – slhck May 18 '13 at 12:02
  • 1
    question isn't duplicate but the answer is fitting. stated that i'm copying, but now i see that's not the proper structure to do it. thanks for edit and help. – İsmet Alkan May 18 '13 at 12:20
20

As an addition to the top answer from Sina Iravanian, here is a .NET 4.5 version (since only that version supports UTF16 output, cf. the first three lines) of his code, using the full range of Unicode code points. Because support for the higher Unicode planes is often poor, many people are not aware that they should check for and include the upper Unicode planes. Nevertheless, these planes sometimes do contain important characters.

Update

Since \d does not support non-BMP characters in regex (thanks xanatos), here is a version that uses the Unicode character database.
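As a side illustration of that non-BMP limitation (separate from the full listing in this answer), here is a small sketch. The names `NonBmpDigitDemo`, `BoldZero`, `RegexSeesDigit` and `CategoryOfFirstCodePoint` are invented for the example, and U+1D7CE is simply one supplementary-plane decimal digit chosen as a test case:

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class NonBmpDigitDemo
{
    // U+1D7CE MATHEMATICAL BOLD DIGIT ZERO; stored as a surrogate pair in UTF-16.
    public static readonly string BoldZero = char.ConvertFromUtf32(0x1D7CE);

    // The regex engine works on UTF-16 code units, so it sees two surrogates here.
    public static bool RegexSeesDigit(string s) => Regex.IsMatch(s, @"\d");

    // This overload resolves a surrogate pair to the category of the whole code point.
    public static UnicodeCategory CategoryOfFirstCodePoint(string s) =>
        CharUnicodeInfo.GetUnicodeCategory(s, 0);

    public static void Main()
    {
        Console.WriteLine(BoldZero.Length);                     // 2
        Console.WriteLine(RegexSeesDigit(BoldZero));            // False
        Console.WriteLine(CategoryOfFirstCodePoint(BoldZero));  // DecimalDigitNumber
    }
}
```

This is why enumerating digits by running \d over single chars cannot reach the upper planes, and a lookup by code point is needed instead.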

Update 2

Thanks to damilola-adegunwa, I have added the missing reference to the UCD (via NuGet package UnicodeInformation). Also updated to the latest .NET Core version and UTF-8 output.

// reference https://www.nuget.org/packages/UnicodeInformation/
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Globalization;
using System.Unicode;
                    
public class Program
{
    public static void Main()
    {
        var unicodeEncoding = new UTF8Encoding(false);
        Console.OutputEncoding = unicodeEncoding;

        var numberCategories = new HashSet<UnicodeCategory>(new []{
            UnicodeCategory.DecimalDigitNumber,
            UnicodeCategory.LetterNumber,
            UnicodeCategory.OtherNumber
        });
        var numberLikeChars =
            from codePoint in Enumerable.Range(0, 0x110000) // Range(start, count): covers U+0000 through U+10FFFF
            where codePoint > UInt16.MaxValue 
                || (!char.IsLowSurrogate((char) codePoint) && !char.IsHighSurrogate((char) codePoint))
            let charInfo = UnicodeInfo.GetCharInfo(codePoint)
            where numberCategories.Contains(charInfo.Category)
            let codePointString = char.ConvertFromUtf32(codePoint)
            select (codePoint, charInfo, codePointString);

        foreach (var (codePoint, charInfo, codePointString) in numberLikeChars)
        {
            Console.Write("U+{0} ", codePoint.ToString("X6"));
            Console.Write(" {0,-4}", codePointString);
            Console.Write(" {0,-40}", charInfo.Name ?? charInfo.OldName);
            Console.Write(" {0,-6}", CharUnicodeInfo.GetNumericValue(codePointString, 0));
            Console.Write(" {0,-6}", CharUnicodeInfo.GetDigitValue(codePointString, 0));
            Console.Write(" {0,-6}", CharUnicodeInfo.GetDecimalDigitValue(codePointString, 0));
            Console.WriteLine(" {0}", charInfo.Category);
        }
    }
}

Yielding the following output:

U+000030  0    DIGIT ZERO                               0      0      0      DecimalDigitNumber
U+000031  1    DIGIT ONE                                1      1      1      DecimalDigitNumber
U+000032  2    DIGIT TWO                                2      2      2      DecimalDigitNumber
U+000033  3    DIGIT THREE                              3      3      3      DecimalDigitNumber
U+000034  4    DIGIT FOUR                               4      4      4      DecimalDigitNumber
U+000035  5    DIGIT FIVE                               5      5      5      DecimalDigitNumber
U+000036  6    DIGIT SIX                                6      6      6      DecimalDigitNumber
U+000037  7    DIGIT SEVEN                              7      7      7      DecimalDigitNumber
U+000038  8    DIGIT EIGHT                              8      8      8      DecimalDigitNumber
U+000039  9    DIGIT NINE                               9      9      9      DecimalDigitNumber
U+0000B2  ²    SUPERSCRIPT TWO                          2      2      -1     OtherNumber
U+0000B3  ³    SUPERSCRIPT THREE                        3      3      -1     OtherNumber
U+0000B9  ¹    SUPERSCRIPT ONE                          1      1      -1     OtherNumber
U+0000BC  ¼    VULGAR FRACTION ONE QUARTER              0.25   -1     -1     OtherNumber
U+0000BD  ½    VULGAR FRACTION ONE HALF                 0.5    -1     -1     OtherNumber
U+0000BE  ¾    VULGAR FRACTION THREE QUARTERS           0.75   -1     -1     OtherNumber
U+000660  ٠    ARABIC-INDIC DIGIT ZERO                  0      0      0      DecimalDigitNumber
U+000661  ١    ARABIC-INDIC DIGIT ONE                   1      1      1      DecimalDigitNumber
U+000662  ٢    ARABIC-INDIC DIGIT TWO                   2      2      2      DecimalDigitNumber
U+000663  ٣    ARABIC-INDIC DIGIT THREE                 3      3      3      DecimalDigitNumber
U+000664  ٤    ARABIC-INDIC DIGIT FOUR                  4      4      4      DecimalDigitNumber
U+000665  ٥    ARABIC-INDIC DIGIT FIVE                  5      5      5      DecimalDigitNumber
U+000666  ٦    ARABIC-INDIC DIGIT SIX                   6      6      6      DecimalDigitNumber
U+000667  ٧    ARABIC-INDIC DIGIT SEVEN                 7      7      7      DecimalDigitNumber
U+000668  ٨    ARABIC-INDIC DIGIT EIGHT                 8      8      8      DecimalDigitNumber
U+000669  ٩    ARABIC-INDIC DIGIT NINE                  9      9      9      DecimalDigitNumber
U+0006F0  ۰    EXTENDED ARABIC-INDIC DIGIT ZERO         0      0      0      DecimalDigitNumber
U+0006F1  ۱    EXTENDED ARABIC-INDIC DIGIT ONE          1      1      1      DecimalDigitNumber
U+0006F2  ۲    EXTENDED ARABIC-INDIC DIGIT TWO          2      2      2      DecimalDigitNumber
U+0006F3  ۳    EXTENDED ARABIC-INDIC DIGIT THREE        3      3      3      DecimalDigitNumber
U+0006F4  ۴    EXTENDED ARABIC-INDIC DIGIT FOUR         4      4      4      DecimalDigitNumber
U+0006F5  ۵    EXTENDED ARABIC-INDIC DIGIT FIVE         5      5      5      DecimalDigitNumber
U+0006F6  ۶    EXTENDED ARABIC-INDIC DIGIT SIX          6      6      6      DecimalDigitNumber
U+0006F7  ۷    EXTENDED ARABIC-INDIC DIGIT SEVEN        7      7      7      DecimalDigitNumber
U+0006F8  ۸    EXTENDED ARABIC-INDIC DIGIT EIGHT        8      8      8      DecimalDigitNumber
U+0006F9  ۹    EXTENDED ARABIC-INDIC DIGIT NINE         9      9      9      DecimalDigitNumber
U+0007C0  ߀    NKO DIGIT ZERO                           0      0      0      DecimalDigitNumber
U+0007C1  ߁    NKO DIGIT ONE                            1      1      1      DecimalDigitNumber
U+0007C2  ߂    NKO DIGIT TWO                            2      2      2      DecimalDigitNumber
U+0007C3  ߃    NKO DIGIT THREE                          3      3      3      DecimalDigitNumber
U+0007C4  ߄    NKO DIGIT FOUR                           4      4      4      DecimalDigitNumber
U+0007C5  ߅    NKO DIGIT FIVE                           5      5      5      DecimalDigitNumber
U+0007C6  ߆    NKO DIGIT SIX                            6      6      6      DecimalDigitNumber
U+0007C7  ߇    NKO DIGIT SEVEN                          7      7      7      DecimalDigitNumber
U+0007C8  ߈    NKO DIGIT EIGHT                          8      8      8      DecimalDigitNumber
U+0007C9  ߉    NKO DIGIT NINE                           9      9      9      DecimalDigitNumber
U+000966  ०    DEVANAGARI DIGIT ZERO                    0      0      0      DecimalDigitNumber
U+000967  १    DEVANAGARI DIGIT ONE                     1      1      1      DecimalDigitNumber
U+000968  २    DEVANAGARI DIGIT TWO                     2      2      2      DecimalDigitNumber
U+000969  ३    DEVANAGARI DIGIT THREE                   3      3      3      DecimalDigitNumber
U+00096A  ४    DEVANAGARI DIGIT FOUR                    4      4      4      DecimalDigitNumber
U+00096B  ५    DEVANAGARI DIGIT FIVE                    5      5      5      DecimalDigitNumber
U+00096C  ६    DEVANAGARI DIGIT SIX                     6      6      6      DecimalDigitNumber
U+00096D  ७    DEVANAGARI DIGIT SEVEN                   7      7      7      DecimalDigitNumber
U+00096E  ८    DEVANAGARI DIGIT EIGHT                   8      8      8      DecimalDigitNumber
U+00096F  ९    DEVANAGARI DIGIT NINE                    9      9      9      DecimalDigitNumber
U+0009E6  ০    BENGALI DIGIT ZERO                       0      0      0      DecimalDigitNumber
U+0009E7  ১    BENGALI DIGIT ONE                        1      1      1      DecimalDigitNumber
U+0009E8  ২    BENGALI DIGIT TWO                        2      2      2      DecimalDigitNumber
U+0009E9  ৩    BENGALI DIGIT THREE                      3      3      3      DecimalDigitNumber
U+0009EA  ৪    BENGALI DIGIT FOUR                       4      4      4      DecimalDigitNumber
U+0009EB  ৫    BENGALI DIGIT FIVE                       5      5      5      DecimalDigitNumber
U+0009EC  ৬    BENGALI DIGIT SIX                        6      6      6      DecimalDigitNumber
U+0009ED  ৭    BENGALI DIGIT SEVEN                      7      7      7      DecimalDigitNumber
U+0009EE  ৮    BENGALI DIGIT EIGHT                      8      8      8      DecimalDigitNumber
U+0009EF  ৯    BENGALI DIGIT NINE                       9      9      9      DecimalDigitNumber
U+0009F4  ৴    BENGALI CURRENCY NUMERATOR ONE           0.0625 -1     -1     OtherNumber
U+0009F5  ৵    BENGALI CURRENCY NUMERATOR TWO           0.125  -1     -1     OtherNumber
U+0009F6  ৶    BENGALI CURRENCY NUMERATOR THREE         0.1875 -1     -1     OtherNumber
U+0009F7  ৷    BENGALI CURRENCY NUMERATOR FOUR          0.25   -1     -1     OtherNumber
U+0009F8  ৸    BENGALI CURRENCY NUMERATOR ONE LESS THAN THE DENOMINATOR 0.75   -1     -1     OtherNumber
U+0009F9  ৹    BENGALI CURRENCY DENOMINATOR SIXTEEN     16     -1     -1     OtherNumber
U+000A66  ੦    GURMUKHI DIGIT ZERO                      0      0      0      DecimalDigitNumber
U+000A67  ੧    GURMUKHI DIGIT ONE                       1      1      1      DecimalDigitNumber
U+000A68  ੨    GURMUKHI DIGIT TWO                       2      2      2      DecimalDigitNumber
U+000A69  ੩    GURMUKHI DIGIT THREE                     3      3      3      DecimalDigitNumber
U+000A6A  ੪    GURMUKHI DIGIT FOUR                      4      4      4      DecimalDigitNumber
U+000A6B  ੫    GURMUKHI DIGIT FIVE                      5      5      5      DecimalDigitNumber
U+000A6C  ੬    GURMUKHI DIGIT SIX                       6      6      6      DecimalDigitNumber
U+000A6D  ੭    GURMUKHI DIGIT SEVEN                     7      7      7      DecimalDigitNumber
U+000A6E  ੮    GURMUKHI DIGIT EIGHT                     8      8      8      DecimalDigitNumber
U+000A6F  ੯    GURMUKHI DIGIT NINE                      9      9      9      DecimalDigitNumber
U+000AE6  ૦    GUJARATI DIGIT ZERO                      0      0      0      DecimalDigitNumber
U+000AE7  ૧    GUJARATI DIGIT ONE                       1      1      1      DecimalDigitNumber
U+000AE8  ૨    GUJARATI DIGIT TWO                       2      2      2      DecimalDigitNumber
U+000AE9  ૩    GUJARATI DIGIT THREE                     3      3      3      DecimalDigitNumber
U+000AEA  ૪    GUJARATI DIGIT FOUR                      4      4      4      DecimalDigitNumber
U+000AEB  ૫    GUJARATI DIGIT FIVE                      5      5      5      DecimalDigitNumber
U+000AEC  ૬    GUJARATI DIGIT SIX                       6      6      6      DecimalDigitNumber
U+000AED  ૭    GUJARATI DIGIT SEVEN                     7      7      7      DecimalDigitNumber
U+000AEE  ૮    GUJARATI DIGIT EIGHT                     8      8      8      DecimalDigitNumber
U+000AEF  ૯    GUJARATI DIGIT NINE                      9      9      9      DecimalDigitNumber
U+000B66  ୦    ORIYA DIGIT ZERO                         0      0      0      DecimalDigitNumber
U+000B67  ୧    ORIYA DIGIT ONE                          1      1      1      DecimalDigitNumber
U+000B68  ୨    ORIYA DIGIT TWO                          2      2      2      DecimalDigitNumber
U+000B69  ୩    ORIYA DIGIT THREE                        3      3      3      DecimalDigitNumber
U+000B6A  ୪    ORIYA DIGIT FOUR                         4      4      4      DecimalDigitNumber
U+000B6B  ୫    ORIYA DIGIT FIVE                         5      5      5      DecimalDigitNumber
U+000B6C  ୬    ORIYA DIGIT SIX                          6      6      6      DecimalDigitNumber
U+000B6D  ୭    ORIYA DIGIT SEVEN                        7      7      7      DecimalDigitNumber
U+000B6E  ୮    ORIYA DIGIT EIGHT                        8      8      8      DecimalDigitNumber
U+000B6F  ୯    ORIYA DIGIT NINE                         9      9      9      DecimalDigitNumber
U+000B72  ୲    ORIYA FRACTION ONE QUARTER               0.25   -1     -1     OtherNumber
U+000B73  ୳    ORIYA FRACTION ONE HALF                  0.5    -1     -1     OtherNumber
U+000B74  ୴    ORIYA FRACTION THREE QUARTERS            0.75   -1     -1     OtherNumber
U+000B75  ୵    ORIYA FRACTION ONE SIXTEENTH             0.0625 -1     -1     OtherNumber
U+000B76  ୶    ORIYA FRACTION ONE EIGHTH                0.125  -1     -1     OtherNumber
U+000B77  ୷    ORIYA FRACTION THREE SIXTEENTHS          0.1875 -1     -1     OtherNumber
U+000BE6  ௦    TAMIL DIGIT ZERO                         0      0      0      DecimalDigitNumber
U+000BE7  ௧    TAMIL DIGIT ONE                          1      1      1      DecimalDigitNumber
U+000BE8  ௨    TAMIL DIGIT TWO                          2      2      2      DecimalDigitNumber
U+000BE9  ௩    TAMIL DIGIT THREE                        3      3      3      DecimalDigitNumber
U+000BEA  ௪    TAMIL DIGIT FOUR                         4      4      4      DecimalDigitNumber
U+000BEB  ௫    TAMIL DIGIT FIVE                         5      5      5      DecimalDigitNumber
U+000BEC  ௬    TAMIL DIGIT SIX                          6      6      6      DecimalDigitNumber
U+000BED  ௭    TAMIL DIGIT SEVEN                        7      7      7      DecimalDigitNumber
U+000BEE  ௮    TAMIL DIGIT EIGHT                        8      8      8      DecimalDigitNumber
U+000BEF  ௯    TAMIL DIGIT NINE                         9      9      9      DecimalDigitNumber
U+000BF0  ௰    TAMIL NUMBER TEN                         10     -1     -1     OtherNumber
U+000BF1  ௱    TAMIL NUMBER ONE HUNDRED                 100    -1     -1     OtherNumber
U+000BF2  ௲    TAMIL NUMBER ONE THOUSAND                1000   -1     -1     OtherNumber
U+000C66  ౦    TELUGU DIGIT ZERO                        0      0      0      DecimalDigitNumber
U+000C67  ౧    TELUGU DIGIT ONE                         1      1      1      DecimalDigitNumber
U+000C68  ౨    TELUGU DIGIT TWO                         2      2      2      DecimalDigitNumber
U+000C69  ౩    TELUGU DIGIT THREE                       3      3      3      DecimalDigitNumber
U+000C6A  ౪    TELUGU DIGIT FOUR                        4      4      4      DecimalDigitNumber
U+000C6B  ౫    TELUGU DIGIT FIVE                        5      5      5      DecimalDigitNumber
U+000C6C  ౬    TELUGU DIGIT SIX                         6      6      6      DecimalDigitNumber
U+000C6D  ౭    TELUGU DIGIT SEVEN                       7      7      7      DecimalDigitNumber
U+000C6E  ౮    TELUGU DIGIT EIGHT                       8      8      8      DecimalDigitNumber
U+000C6F  ౯    TELUGU DIGIT NINE                        9      9      9      DecimalDigitNumber
U+000C78  ౸    TELUGU FRACTION DIGIT ZERO FOR ODD POWERS OF FOUR 0      -1     -1     OtherNumber
U+000C79  ౹    TELUGU FRACTION DIGIT ONE FOR ODD POWERS OF FOUR 1      -1     -1     OtherNumber
U+000C7A  ౺    TELUGU FRACTION DIGIT TWO FOR ODD POWERS OF FOUR 2      -1     -1     OtherNumber
U+000C7B  ౻    TELUGU FRACTION DIGIT THREE FOR ODD POWERS OF FOUR 3      -1     -1     OtherNumber
U+000C7C  ౼    TELUGU FRACTION DIGIT ONE FOR EVEN POWERS OF FOUR 1      -1     -1     OtherNumber
U+000C7D  ౽    TELUGU FRACTION DIGIT TWO FOR EVEN POWERS OF FOUR 2      -1     -1     OtherNumber
U+000C7E  ౾    TELUGU FRACTION DIGIT THREE FOR EVEN POWERS OF FOUR 3      -1     -1     OtherNumber
U+000CE6  ೦    KANNADA DIGIT ZERO                       0      0      0      DecimalDigitNumber
U+000CE7  ೧    KANNADA DIGIT ONE                        1      1      1      DecimalDigitNumber
U+000CE8  ೨    KANNADA DIGIT TWO                        2      2      2      DecimalDigitNumber
U+000CE9  ೩    KANNADA DIGIT THREE                      3      3      3      DecimalDigitNumber
U+000CEA  ೪    KANNADA DIGIT FOUR                       4      4      4      DecimalDigitNumber
U+000CEB  ೫    KANNADA DIGIT FIVE                       5      5      5      DecimalDigitNumber
U+000CEC  ೬    KANNADA DIGIT SIX                        6      6      6      DecimalDigitNumber
U+000CED  ೭    KANNADA DIGIT SEVEN                      7      7      7      DecimalDigitNumber
U+000CEE  ೮    KANNADA DIGIT EIGHT                      8      8      8      DecimalDigitNumber
U+000CEF  ೯    KANNADA DIGIT NINE                       9      9      9      DecimalDigitNumber
U+000D58  ൘    MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH 0.00625 -1     -1     OtherNumber
U+000D59  ൙    MALAYALAM FRACTION ONE FORTIETH          0.025  -1     -1     OtherNumber
U+000D5A  ൚    MALAYALAM FRACTION THREE EIGHTIETHS      0.0375 -1     -1     OtherNumber
U+000D5B  ൛    MALAYALAM FRACTION ONE TWENTIETH         0.05   -1     -1     OtherNumber
U+000D5C  ൜    MALAYALAM FRACTION ONE TENTH             0.1    -1     -1     OtherNumber
U+000D5D  ൝    MALAYALAM FRACTION THREE TWENTIETHS      0.15   -1     -1     OtherNumber
U+000D5E  ൞    MALAYALAM FRACTION ONE FIFTH             0.2    -1     -1     OtherNumber
U+000D66  ൦    MALAYALAM DIGIT ZERO                     0      0      0      DecimalDigitNumber
U+000D67  ൧    MALAYALAM DIGIT ONE                      1      1      1      DecimalDigitNumber
U+000D68  ൨    MALAYALAM DIGIT TWO                      2      2      2      DecimalDigitNumber
U+000D69  ൩    MALAYALAM DIGIT THREE                    3      3      3      DecimalDigitNumber
U+000D6A  ൪    MALAYALAM DIGIT FOUR                     4      4      4      DecimalDigitNumber
U+000D6B  ൫    MALAYALAM DIGIT FIVE                     5      5      5      DecimalDigitNumber
U+000D6C  ൬    MALAYALAM DIGIT SIX                      6      6      6      DecimalDigitNumber
U+000D6D  ൭    MALAYALAM DIGIT SEVEN                    7      7      7      DecimalDigitNumber
U+000D6E  ൮    MALAYALAM DIGIT EIGHT                    8      8      8      DecimalDigitNumber
U+000D6F  ൯    MALAYALAM DIGIT NINE                     9      9      9      DecimalDigitNumber
U+000D70  ൰    MALAYALAM NUMBER TEN                     10     -1     -1     OtherNumber
U+000D71  ൱    MALAYALAM NUMBER ONE HUNDRED             100    -1     -1     OtherNumber
U+000D72  ൲    MALAYALAM NUMBER ONE THOUSAND            1000   -1     -1     OtherNumber
U+000D73  ൳    MALAYALAM FRACTION ONE QUARTER           0.25   -1     -1     OtherNumber
U+000D74  ൴    MALAYALAM FRACTION ONE HALF              0.5    -1     -1     OtherNumber
U+000D75  ൵    MALAYALAM FRACTION THREE QUARTERS        0.75   -1     -1     OtherNumber
U+000D76  ൶    MALAYALAM FRACTION ONE SIXTEENTH         0.0625 -1     -1     OtherNumber
U+000D77  ൷    MALAYALAM FRACTION ONE EIGHTH            0.125  -1     -1     OtherNumber
U+000D78  ൸    MALAYALAM FRACTION THREE SIXTEENTHS      0.1875 -1     -1     OtherNumber
U+000DE6  ෦    SINHALA LITH DIGIT ZERO                  0      0      0      DecimalDigitNumber
U+000DE7  ෧    SINHALA LITH DIGIT ONE                   1      1      1      DecimalDigitNumber
U+000DE8  ෨    SINHALA LITH DIGIT TWO                   2      2      2      DecimalDigitNumber
U+000DE9  ෩    SINHALA LITH DIGIT THREE                 3      3      3      DecimalDigitNumber
U+000DEA  ෪    SINHALA LITH DIGIT FOUR                  4      4      4      DecimalDigitNumber
U+000DEB  ෫    SINHALA LITH DIGIT FIVE                  5      5      5      DecimalDigitNumber
U+000DEC  ෬    SINHALA LITH DIGIT SIX                   6      6      6      DecimalDigitNumber
U+000DED  ෭    SINHALA LITH DIGIT SEVEN                 7      7      7      DecimalDigitNumber
U+000DEE  ෮    SINHALA LITH DIGIT EIGHT                 8      8      8      DecimalDigitNumber
U+000DEF  ෯    SINHALA LITH DIGIT NINE                  9      9      9      DecimalDigitNumber
U+000E50  ๐    THAI DIGIT ZERO                          0      0      0      DecimalDigitNumber
U+000E51  ๑    THAI DIGIT ONE                           1      1      1      DecimalDigitNumber
U+000E52  ๒    THAI DIGIT TWO                           2      2      2      DecimalDigitNumber
U+000E53  ๓    THAI DIGIT THREE                         3      3      3      DecimalDigitNumber
U+000E54  ๔    THAI DIGIT FOUR                          4      4      4      DecimalDigitNumber
U+000E55  ๕    THAI DIGIT FIVE                          5      5      5      DecimalDigitNumber
U+000E56  ๖    THAI DIGIT SIX                           6      6      6      DecimalDigitNumber
U+000E57  ๗    THAI DIGIT SEVEN                         7      7      7      DecimalDigitNumber
U+000E58  ๘    THAI DIGIT EIGHT                         8      8      8      DecimalDigitNumber
U+000E59  ๙    THAI DIGIT NINE                          9      9      9      DecimalDigitNumber
U+000ED0  ໐    LAO DIGIT ZERO                           0      0      0      DecimalDigitNumber
U+000ED1  ໑    LAO DIGIT ONE                            1      1      1      DecimalDigitNumber
U+000ED2  ໒    LAO DIGIT TWO                            2      2      2      DecimalDigitNumber
U+000ED3  ໓    LAO DIGIT THREE                          3      3      3      DecimalDigitNumber
U+000ED4  ໔    LAO DIGIT FOUR                           4      4      4      DecimalDigitNumber
U+000ED5  ໕    LAO DIGIT FIVE                           5      5      5      DecimalDigitNumber
U+000ED6  ໖    LAO DIGIT SIX                            6      6      6      DecimalDigitNumber
U+000ED7  ໗    LAO DIGIT SEVEN                          7      7      7      DecimalDigitNumber
U+000ED8  ໘    LAO DIGIT EIGHT                          8      8      8      DecimalDigitNumber
U+000ED9  ໙    LAO DIGIT NINE                           9      9      9      DecimalDigitNumber
...
U+01F10B     DINGBAT CIRCLED SANS-SERIF DIGIT ZERO    0      -1     -1     OtherNumber
U+01F10C     DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO 0      -1     -1     OtherNumber
U+01FBF0     SEGMENTED DIGIT ZERO                     -1     -1     -1     DecimalDigitNumber
U+01FBF1     SEGMENTED DIGIT ONE                      -1     -1     -1     DecimalDigitNumber
U+01FBF2     SEGMENTED DIGIT TWO                      -1     -1     -1     DecimalDigitNumber
U+01FBF3     SEGMENTED DIGIT THREE                    -1     -1     -1     DecimalDigitNumber
U+01FBF4     SEGMENTED DIGIT FOUR                     -1     -1     -1     DecimalDigitNumber
U+01FBF5     SEGMENTED DIGIT FIVE                     -1     -1     -1     DecimalDigitNumber
U+01FBF6     SEGMENTED DIGIT SIX                      -1     -1     -1     DecimalDigitNumber
U+01FBF7     SEGMENTED DIGIT SEVEN                    -1     -1     -1     DecimalDigitNumber
U+01FBF8     SEGMENTED DIGIT EIGHT                    -1     -1     -1     DecimalDigitNumber
U+01FBF9     SEGMENTED DIGIT NINE                     -1     -1     -1     DecimalDigitNumber
Sebastian
0

\d matches every Unicode decimal digit, while [0-9] is limited to those 10 ASCII characters. If you only need to match those 10 digits, you should use [0-9]. Otherwise I recommend \d, because it is less to write.
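To make the difference concrete, here is a minimal sketch, assuming the default .NET regex behaviour where \d covers the Unicode decimal digit category. The class name DigitClassDemo and the sample character (THAI DIGIT THREE, U+0E53) are only illustrative choices, not something from the question.

```csharp
using System;
using System.Text.RegularExpressions;

class DigitClassDemo
{
    static void Main()
    {
        // U+0E53 THAI DIGIT THREE: a decimal digit that is not ASCII (illustrative sample).
        string thaiThree = "\u0E53";

        // By default, \d matches any Unicode decimal digit.
        Console.WriteLine(Regex.IsMatch(thaiThree, @"\d"));      // True

        // The explicit range only covers ASCII '0' through '9'.
        Console.WriteLine(Regex.IsMatch(thaiThree, @"[0-9]"));   // False

        // RegexOptions.ECMAScript restricts \d to the ASCII digits.
        Console.WriteLine(Regex.IsMatch(thaiThree, @"\d", RegexOptions.ECMAScript)); // False

        // An ASCII digit matches either way.
        Console.WriteLine(Regex.IsMatch("7", @"\d"));            // True
        Console.WriteLine(Regex.IsMatch("7", @"[0-9]"));         // True
    }
}
```

So if the input may contain non-ASCII digits that you do not want to accept, [0-9] (or \d with RegexOptions.ECMAScript) is probably the safer choice.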

dengkai