I am trying to find a regex such that, for instance, the following strings all match the same expression:

  • "1116.67711..44."
  • "2224.43322..88."
  • "9993.35599..22."
  • "7779.91177..55."
  • I.e. formally "x1x1x1x2.x2x3x3x1x1..x4x4." where xi ≠ xj if i ≠ j, and where xi is some number from 1 to 9 inclusive.

As another example, the following strings should all match one expression, but a different expression from the one before:

  • "94..44.773399.4"
  • "25..55.886622.5"
  • "73..33.992277.3"
  • I.e. formally "x1x2..x2x2.x3x3x4x4x1x1.x2" where xi ≠ xj if i ≠ j, and where xi is some number from 1 to 9 inclusive.

That is, two strings should be considered equal if they have the same form, with the digits permuted among themselves so that distinct digits stay pairwise distinct.

The dots stand for a gap in the sequence; a gap can be any value that is not a single digit, and two "equal" strings must have their gaps in the same places. If it helps, the real strings all have length 81 (the examples above are shortened to 15 characters to keep them readable).
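
To make the intended notion of "equal" concrete, here is a minimal sketch that relabels the digits of a string in order of first appearance, so that two strings are equivalent exactly when their relabeled forms are identical. The name `canonical_form` is hypothetical, and the sketch assumes that every character other than 1 to 9 counts as a gap.

```python
def canonical_form(s):
    """Relabel digits 1-9 in order of first appearance; any other character becomes '.'."""
    mapping = {}  # original digit -> canonical label ('1', '2', ...)
    out = []
    for ch in s:
        if ch in '123456789':
            if ch not in mapping:
                mapping[ch] = str(len(mapping) + 1)
            out.append(mapping[ch])
        else:
            # Assumption: anything that is not a digit 1-9 is a gap
            out.append('.')
    return ''.join(out)


print(canonical_form("1116.67711..44."))
print(canonical_form("2224.43322..88."))
print(canonical_form("94..44.773399.4"))
```

The first two strings relabel to the same text, while the third relabels to a different one, which is the behaviour described above.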

That is, given a string such as "3566.235.225..45", I want a regular expression that I can run against a database to find out whether an equivalent string already exists.

Is it possible to do this?

Norse
  • 1
    Looks like you are looking to create a regex, but do not know where to get started. Please check [Reference - What does this regex mean](https://stackoverflow.com/questions/22937618) resource, it has plenty of hints. Also, refer to [Learning Regular Expressions](https://stackoverflow.com/questions/4736) post for some basic regex info. Once you get some expression ready and still have issues with the solution, please edit the question with the latest details and we'll be glad to help you fix the problem. – Wiktor Stribiżew Jun 07 '20 at 19:34
  • 1
    If I understand the question correctly, "I am trying to find a regex query..[to determine if]...the following strings are considered equal" is misleading. I believe you wish to confirm that all strings shown match the specified pattern. Correct? You may wish to also give examples of strings that don't match the pattern. – Cary Swoveland Jun 07 '20 at 19:43
  • @CarySwoveland I have tried to clarify what i want in the second to last paragraph – Norse Jun 07 '20 at 19:46
  • 1
    The problem is "considered equal". They are in no sense "equal". Each string may or may not match the pattern, that's all. Note typo in your next-to-last paragraph. – Cary Swoveland Jun 07 '20 at 19:48
  • `where xi ≠ xj if i ≠ j,` this seems to be a positional spec, yes ? can explain ? –  Jun 07 '20 at 19:50
  • That just means that the x's should be pairwise different from each other, i.e. x_1 should be different from x_2 and so on – Norse Jun 07 '20 at 19:51
  • so positional _pairs_ XY where X != Y, yes ? `((\d)(?!\2)\d)` what else for ya ? –  Jun 07 '20 at 19:52
  • Yes, if I understand what you are saying correctly – Norse Jun 07 '20 at 19:54
  • what i said, how should pairs be compared to other pairs ? example, pair in my regex cant exist where, downstream, yes, no? –  Jun 07 '20 at 19:55
  • 1
    To whoever upvoted my now-deleted comment containing a regex: thanks, but I'm afraid that regex was wrong as I overlooked one condition. I will repair it, however. – Cary Swoveland Jun 07 '20 at 20:01
  • 2
    You could use the regex `\b(\d)\1{2}(?!\1)(\d)\.\2(?!\1|\2)(\d)\3\1{2}\.\.(?!\1|\2|\3)(\d)\4\b` to match the first set of strings. [Demo](https://regex101.com/r/u3iSYI/2/). A similar approach is used to match the second set of strings. If you want a single regex to match both use an alternation: something like `|`. – Cary Swoveland Jun 07 '20 at 20:04
  • @CarySwoveland Thanks a lot! I think that's exactly what I am looking for. I had no idea where to start – Norse Jun 07 '20 at 20:09
  • Careful: permutations quickly get out of hand with regexes, but you could always look at the "What does this regex mean" duplicate for some miracle. –  Jun 07 '20 at 20:11

1 Answer

The answer is fairly straightforward:

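import re

# Group 1 captures one digit; \1{3} requires exactly three more copies of it
pattern = re.compile(r'^(\d)\1{3}$')

print(pattern.match('1234'))   # None: the digits differ
print(pattern.match('333'))    # None: only three digits
print(pattern.match('3333'))   # a match object: four identical digits
print(pattern.match('33333'))  # None: five digits, one too many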

You capture what you need once, then tell the regex engine how often it has to repeat. You can refer back to a captured group as often as you like; for example, for a pattern that would match `11.222.1` you'd use `^(\d)\1{1}\.(\d)\2{2}\.(\1){1}$`.
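
As a quick check of that pattern, the following sketch runs it against a few inputs (the test strings are made up for illustration). The last case shows that backreferences alone do not force different groups to hold different digits.

```python
import re

# (\d) captures one digit; \1 and \2 demand the same text again
pattern = re.compile(r'^(\d)\1{1}\.(\d)\2{2}\.(\1){1}$')

print(bool(pattern.match('11.222.1')))  # True
print(bool(pattern.match('33.777.3')))  # True
print(bool(pattern.match('11.222.2')))  # False: the last digit must repeat group 1
print(bool(pattern.match('11.111.1')))  # True: both groups may capture the same digit
```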

Note that the {1} in there is superfluous, but it shows that the pattern can be very regular. So much so, that it's actually easy to write a function that solves the problem for you:

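def make_pattern(grouping, separators='.'):
    """Build a compiled regex from an example string such as '111.222.11'."""
    groups = {}      # example symbol -> number of its capture group
    i = 0            # scanning position
    j = 0            # start of the current run of equal symbols
    last_group = 0   # highest capture-group number handed out so far
    result = '^'
    while i < len(grouping):
        if grouping[i] in separators:
            # Separators are matched literally; re.escape protects special characters
            result += re.escape(grouping[i])
            i += 1
        else:
            # Advance i to the end of the run that starts at j
            while i < len(grouping) and grouping[i] == grouping[j]:
                i += 1
            if grouping[j] in groups:
                # Symbol seen before: the whole run is a backreference
                group = groups[grouping[j]]
            else:
                # New symbol: capture its first character, backreference the rest
                last_group += 1
                groups[grouping[j]] = last_group
                group = last_group
                result += '(.)'
                j += 1
            result += f'\\{group}{{{i-j}}}'
        j = i
    return re.compile(result+'$')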


print(make_pattern('111.222.11').match('aaa.bbb.aa'))

So, you can give make_pattern a good example of the pattern and it will return the compiled regex for you. If you'd like other separators than '.', you can just pass those in as well:

my_pattern = make_pattern('11,222,11', separators=',')
print(my_pattern.match('aa,bbb,aa'))
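
Note that `make_pattern` matches separators literally and does not require different symbols to stand for different characters, whereas the question asks for pairwise distinct digits and for gaps that may be any non-digit. One possible way to add both conditions is a negative lookahead in front of each new capture group. The following is a minimal sketch under those assumptions, not part of the function above; `make_distinct_pattern` is a hypothetical name, digits 1 to 9 in the example act as placeholders, and every other character is treated as a gap.

```python
import re


def make_distinct_pattern(example):
    """Sketch: digits 1-9 in `example` are placeholders; anything else is a gap."""
    groups = {}   # placeholder digit -> capture-group number
    parts = ['^']
    for ch in example:
        if ch not in '123456789':
            # Assumption: a gap is any character that is not a digit 1-9
            parts.append('[^1-9]')
        elif ch in groups:
            # Placeholder seen before: must repeat the captured digit
            parts.append(f'\\{groups[ch]}')
        else:
            n = len(groups) + 1
            # Negative lookaheads reject digits already bound to earlier groups
            parts.extend(f'(?!\\{k})' for k in range(1, n))
            parts.append('([1-9])')
            groups[ch] = n
    parts.append('$')
    return re.compile(''.join(parts))


pattern = make_distinct_pattern('1116.67711..44.')
print(bool(pattern.match('2224.43322..88.')))  # True
print(bool(pattern.match('9993.35599..22.')))  # True
print(bool(pattern.match('94..44.773399.4')))  # False: different form
print(bool(pattern.match('1111.17711..44.')))  # False: second placeholder equals the first
```

With at most nine placeholders the backreferences stay single-digit (`\1` to `\9`), so they cannot be confused with a following literal digit.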
Grismar