1

Given a list of numbers that are regex patterns, sort them by the last 4 numbers in that numeric value.

I have a list of regex (phone number) patterns and I am trying to sort them by the last 4 characters. Here's a sample list of phone numbers:

8062
\+13066598273
4083100
408320[0-3]
408320[4-6]
752[234569]
\+13066598305
8059

I would like to re-order these numbers by the last 4 numbers so that I would end up with a list like this:

4083100
408320[0-3]
408320[4-6]
752[234569]
8059
8062
\+13066598273
\+13066598305

Now if my patterns were nothing but numbers, I could sort them easily in either SQL or my MVC C# project. In SQL, I could use ORDER BY RIGHT(pattern, 4), or in C# MVC, I could sort my IQueryable list with patterns = patterns.OrderByDescending(s => s.Substring(...etc...)).

Patterns are a bit more difficult. The brackets count as characters, so sorting by the last 4 characters doesn't work as well.

Are there any built-in utilities in C#, MVC, or SQL that allow me to convert regex patterns to the largest possible match?

  • Given a regex pattern, return the largest possible matching regex that matches my condition. For example, if I had the pattern 4[12]00[1-3], I'd have 6 possible results: 41001, 41002, 41003, 42001, 42002, 42003. Then I can get the biggest number possible, and use that for ordering in my original regex list.
    • The regex pattern doesn't contain anything like * or +, special characters that may cause infinite combinations.
  • There may be a C# library that does exactly what I ask for, sorting regex pattern strings.

EDIT:

I've accepted Diego's answer, but it took me a bit of time to wrap my head around it. For other readers who want to know what it's doing, this is what I think Diego is doing:

  1. Make a range of ints, starting at 9999, all the way back to 0. [9999], [9998], [9997]...[0].
  2. Replace the regex-ish part of the strings with a single character. Example, "500[1-5]" would become "500X", or "20[1-9]00[89]" would become "20X00X", so on, so forth.
  3. Get the length of the "last" 4 characters + regex characters.

    var len = lastNChars + pattern.Length - Regex.Replace(pattern, @"\[[^\]]+\]", "X").Length;
    

    So for the pattern 20[1-9]00[89], the above formula translates to "len = 4 + 13 - 6", or 11.

  4. Using the len variable from above, get a substring that represents the "last" 4 numbers of the phone number, even with regex characters. The original string = "20[1-9]00[89]", while the new substring = "[1-9]00[89]" (the 20 is gone now)

  5. Enumerate through and compare array values to the substring regex pattern. [9999] doesn't match regex pattern, [9998] doesn't match...[9997] doesn't match...aha! 9009 matches! The first match I get is going to the biggest possible regex match, which is what I want.
  6. So each regex pattern has been converted to its largest possible matching pattern. Now we can use C#/LINQ/other in-built methods that can sort by those sub-regex matches for us!

Thank god I'm dealing with only numbers. If I were trying to do sort regex patterns that were actually words/had alpha characters, that would be way harder, and that array would be hella bigger (I think).

JTsoi
  • 13
  • 4

2 Answers2

0

It is hard to find sample strings that match a regular expression without enumerating them all and testing them. I also don't think you will be able to find a C# library that sorts regexes. However, you can solve this problem, however, for the special case of patterns that do not contain quantifiers (such as [0-9]+ or [3-6]{4}) as follows:

const int lastNChars = 4;
var patterns = new string[]{@"8062", @"\+13066598273", @"4083100", 
        @"408320[0-3]", @"408320[4-6]", @"752[234569]", 
        @"\+13066598305", @"8059"};
var range = Enumerable.Range(0, (int) Math.Pow(10, lastNChars))
            .Reverse().ToArray();

var sortedPatterns = patterns.OrderBy(pattern=> {
    var len = lastNChars + pattern.Length 
            - Regex.Replace(pattern, @"\[[^\]]+\]", "X").Length;

    // Get the biggest number in range that matches this regex:
    var biggestNumberMatched = range.FirstOrDefault(x => 
        Regex.IsMatch(x.ToString(new String('0', lastNChars)), 
                    pattern.Substring(pattern.Length - len, len))
    );
    return biggestNumberMatched;
}).ToArray();

After which, sortedPatterns contains the desired output.

Diego
  • 16,895
  • 5
  • 56
  • 64
  • This is what I was looking for, thanks a lot! Right now I'm in the process of adapting this for my own MVC project. I also dun-goofed and found out that my list of numbers includes numbers shorter than 4 digits. I'm probably gonna deal with that just forcing them into 4 digits temporarily, like turning '40' into '0040'. – JTsoi Aug 25 '16 at 16:12
0

Here is one solution, credits to Matt Hamilton from this question:

var pList = new List<string>() 
{ 
    "01233[0-3]", "12356[1-3]", "55555[7-9]"
};
var paired = 
    pList.Select(x => 
        new KeyValuePair<int, string>
        (Int32.Parse(new String((new String(x.Where(Char.IsDigit).Reverse().ToArray()))
        .Substring(0, 4).Reverse().ToArray())), x));

var pairedOrdered = paired.OrderByDescending(x => x.Key);
foreach(var kvp in pairedOrdered)
{
    Console.WriteLine("Key: {0} Value: {1}", kvp.Key, kvp.Value);
}

Output:

Key: 5613 Value: 12356[1-3]

Key: 5579 Value: 55555[7-9]

Key: 3303 Value: 01233[0-3]

Community
  • 1
  • 1
Jakotheshadows
  • 1,383
  • 1
  • 12
  • 23