4

What I'm trying to do is easier to show than to explain. Let's say I have a string like this:

The ^APPLE is a ^FRUIT

Using a regular expression with re.sub(), I want to get this:

The ^V1 is a ^V2

Notice how the replacements are numbered incrementally. Now comes the harder case:

The ^X is ^Y but ^X is not ^Z

This should be translated to:

The ^V1 is ^V2 but ^V1 is not ^V3

That is, if a term repeats, it keeps the substitution it was first given, e.g. every ^X becomes ^V1.

I hear the replacement can be a function, but I can't get it right.

https://www.hackerrank.com/challenges/re-sub-regex-substitution/problem

sten

4 Answers

3

IIUC, you don't need re. String operations will do the job:

def sequential(str_):
    d = {}  # maps each ^TERM to its ^Vn replacement
    tokens = str_.split()
    for i in tokens:
        # Number a term only the first time it appears
        if i.startswith('^') and i not in d:
            d[i] = '^V%d' % (len(d) + 1)
    # Tokens that are not terms are absent from d and pass through unchanged
    return ' '.join(d.get(i, i) for i in tokens)

Output:

sequential('The ^APPLE is a ^FRUIT')
# 'The ^V1 is a ^V2'

sequential('The ^X is ^Y but ^X is not ^Z')
# 'The ^V1 is ^V2 but ^V1 is not ^V3'
Chris
1

After some searching, it turns out there is a solution for multiple replacements using the re module and dict.setdefault. Since \w already matches digits and underscores, the pattern '\^\w+' also covers terms that contain numbers (it is equivalent to '\^\w[\w\d]*'):

import re

string = 'The ^X is ^Y but ^X is not ^Z'
terms = {}
print(re.sub(r'\^\w+', lambda match: terms.setdefault(match.group(0), '^V{}'.format(len(terms) + 1)), string))

Output:

  The ^V1 is ^V2 but ^V1 is not ^V3

re.sub checks the type of the replacement argument. If it is a string, it is used as the replacement text (after backslash escapes such as group references are processed). If it is a function, re.sub calls it once per match with the match object as its argument and replaces the match with the string it returns.
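
To make the callback behaviour concrete, here is a minimal sketch that wraps the same idea in a reusable function. The names renumber and replace are made up for illustration; only re.sub and the dictionary lookup come from the approach above.

```python
import re

def renumber(text, pattern=r'\^\w+'):
    """Replace each distinct ^TERM with ^V1, ^V2, ... in order of first appearance.

    `renumber` and `replace` are illustrative names, not part of the re module.
    """
    terms = {}  # maps each original term to its replacement

    def replace(match):
        # re.sub calls this once per match, passing the match object;
        # the returned string is substituted for the matched text.
        term = match.group(0)
        if term not in terms:
            terms[term] = '^V{}'.format(len(terms) + 1)
        return terms[term]

    return re.sub(pattern, replace, text)

print(renumber('The ^X is ^Y but ^X is not ^Z'))
# The ^V1 is ^V2 but ^V1 is not ^V3
```

Keeping the dictionary local to the function means every call starts numbering again from ^V1.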

Charif DZ
1

You can create a simple object to handle the incrementation:

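import re

class inc:
    def __init__(self):
        # a maps each term to its number; c is the highest number handed out so far
        self.a, self.c = {}, 0
    def __getitem__(self, _v):
        # Assign the next number the first time a term is looked up
        if _v not in self.a:
            self.c += 1
            self.a[_v] = self.c
        return self.a[_v]

n = inc()
r = re.sub(r'(?<=\^)\w+', lambda x: f'V{n[x.group()]}', 'The ^APPLE is a ^FRUIT')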

Output:

'The ^V1 is a ^V2'

n = inc()
r = re.sub(r'(?<=\^)\w+', lambda x: f'V{n[x.group()]}', 'The ^X is ^Y but ^X is not ^Z')

Output:

'The ^V1 is ^V2 but ^V1 is not ^V3'
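
As a possible shorthand for the same idea, the counting object could be replaced by a defaultdict whose factory draws from itertools.count. This is only a sketch; renumber_terms is a hypothetical helper name, and the pattern is the one used above.

```python
import re
from collections import defaultdict
from itertools import count

def renumber_terms(text):
    # Hypothetical helper: each key that is missing from ids is assigned
    # the next integer produced by count(1), i.e. 1, 2, 3, ...
    ids = defaultdict(count(1).__next__)
    # The lookbehind keeps the ^ in place and replaces only the name after it
    return re.sub(r'(?<=\^)\w+', lambda m: f'V{ids[m.group()]}', text)

print(renumber_terms('The ^X is ^Y but ^X is not ^Z'))
# The ^V1 is ^V2 but ^V1 is not ^V3
```

Because ids is created inside the function, there is no need to build a fresh counter object by hand before each call.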
Ajax1234
0

We can try iterating the input string word by word, and then doing an re.sub global replacement on each occurrence of ^TERM, using a counter to keep track of how many distinct terms we have seen:

import re

inp = "The ^X is ^Y but ^X is not ^Z"
seen = set()
counter = 0
for term in inp.split():
    m = re.match(r'\^([^^]+)', term)
    # Only act the first time a given ^TERM is encountered
    if m and term not in seen:
        seen.add(term)
        counter += 1
        label = "V" + str(counter)
        # Replace every occurrence of this term; (?!\S) avoids touching
        # longer terms that merely start with the same characters
        inp = re.sub(r'\^' + re.escape(m.group(1)) + r'(?!\S)', '^' + label, inp)

print(inp)

This prints:

The ^V1 is ^V2 but ^V1 is not ^V3
Tim Biegeleisen