1

I have a string like this:

Hello [@foo] how are you [@bar] more text

Ultimately I need to modify each instance of a substring matching /\[@.+?\]/, but I also need to modify each substring before/after the [@foo] and [@bar].

The following regex matches the substring before a [@.+], the [@.+] itself, then a substring after the [@.+] until the next character is followed by another [@.+].

(.*?)(\[(@.+?)\])((.(?!(\[@.+?\])))*)

So the first match is "Hello [@foo] how are you" and the second match is " [@bar] more text".

Note the space at the beginning of the second match. That's the problem. Is there a way to get the first match to include all characters right up to the next [@.+]?

My regex includes characters after the [@.+] that are not followed by an instance of [@.+], and I cannot see any way of getting it to include all characters until we are actually in another instance of [@.+].

I'm really interested in whether I'm missing something - it certainly feels like there should be a simpler way to capture the characters around a given match, or a simpler way to capture characters not part of a match...

Ollie H-M
  • Change the position of dot in tempered dot match `(.*?)(\[(@.+?)\])(((?!(\[@.+?\])).)*)` – revo Apr 20 '19 at 16:59

4 Answers

3

You have this regex:

(.*?)(\[(@.+?)\])((.(?!(\[@.+?\])))*)
                   ^

Look at that dot. It comes before the negative lookahead, so the dot consumes a character first, and only then does the lookahead check what follows. A character is therefore accepted only if it is not immediately followed by a \[@.+?\]. The space right before [@bar] fails that check, so it is left out of the first match.

To include it, just swap the order. Put the dot after the negative lookahead, so the lookahead is checked before each character is consumed:

(.*?)(\[(@.+?)\])(((?!(\[@.+?\])).)*)
                                 ^
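
To make the difference concrete, here is a small Python sketch that runs both patterns against the string from the question. The variable and function names (`dot_first`, `lookahead_first`, `whole_matches`) are only illustrative and not part of the original answer.

```python
import re

text = "Hello [@foo] how are you [@bar] more text"

# Original pattern: the dot consumes a character, then the lookahead
# checks what comes after that character.
dot_first = r"(.*?)(\[(@.+?)\])((.(?!(\[@.+?\])))*)"

# Reordered pattern: the lookahead is checked at the current position,
# then the dot consumes the character.
lookahead_first = r"(.*?)(\[(@.+?)\])(((?!(\[@.+?\])).)*)"


def whole_matches(pattern, s):
    """Return the full text of every non-overlapping match."""
    return [m.group(0) for m in re.finditer(pattern, s)]


print(whole_matches(dot_first, text))
# ['Hello [@foo] how are you', ' [@bar] more text']
print(whole_matches(lookahead_first, text))
# ['Hello [@foo] how are you ', '[@bar] more text']
```

With the reordered pattern, the space before [@bar] ends up at the end of the first match instead of at the start of the second.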


revo
1

If I understand correctly, you want to separate your text into groups, each one having one instance of [@.+], and all of the text must be matched into a group.

Try (?:^.*?)?\[@.+?\].*?(?=\[|$).

Nicolas
  • Nice, thank you. I also need the different parts (before the `[@.+]`, the thing itself, and after it) grouped separately within each match. So I think this does what I need: `(.*?)(\[@.+?\])(.*?(?=\[@.+?\]|$))` – Ollie H-M Apr 20 '19 at 20:34
  • You can use non-capturing groups `(?:)` instead of `()`, but the result is the same. Do you need any more help, or is your problem solved? – Nicolas Apr 20 '19 at 21:27
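
As a rough illustration of the grouped pattern from the comment above, the following Python sketch collects the three parts of each match as a tuple. It assumes Python's `re` module; the names `pattern` and `parts` are made up for the example.

```python
import re

text = "Hello [@foo] how are you [@bar] more text"

# Group 1: text before the tag, group 2: the tag itself,
# group 3: text after the tag, up to the next tag or the end of the string.
pattern = r"(.*?)(\[@.+?\])(.*?(?=\[@.+?\]|$))"

# With several capture groups, findall returns one tuple per match.
parts = re.findall(pattern, text)

for before, tag, after in parts:
    print(repr(before), repr(tag), repr(after))
# 'Hello ' '[@foo]' ' how are you '
# '' '[@bar]' ' more text'
```

Because the third group stops right at the next tag, the text between two tags is attached to the earlier match, and the `before` part of every later match is empty.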
0

This regex might help you match the [@...] tokens themselves:

(?:\[@[A-Za-z0-9]+\])

You can also add other characters to the class [A-Za-z0-9], such as ., + or @:

`[A-Za-z0-9\.\+\@]` 

and change it as you wish:

(?:\[@[A-Za-z0-9\.\+\@]+\])
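
A pattern that matches only the tags can still give you the surrounding text. One possible approach, which is a different technique from the one in this answer, is to wrap the pattern in a capturing group and pass it to Python's `re.split`, which then keeps the separators in the result. The names `tag_pattern`, `pieces`, `tags` and `surrounding` below are assumptions made for the sketch.

```python
import re

text = "Hello [@foo] how are you [@bar] more text"

# Same character class as above, but in a capturing group so that
# re.split keeps the matched tags in the output list.
tag_pattern = r"(\[@[A-Za-z0-9]+\])"

pieces = re.split(tag_pattern, text)
print(pieces)
# ['Hello ', '[@foo]', ' how are you ', '[@bar]', ' more text']

# Odd indices hold the tags, even indices hold the text around them.
tags = pieces[1::2]
surrounding = pieces[0::2]
```

Each piece can then be modified on its own and the list joined back together with `"".join(...)`.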


Emma
0
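import re

x = 'Hello [@foo] how are you [@bar] more text'
# Raw string so the backslashes reach the regex engine unchanged.
# Note: this pattern expects exactly two bracketed tags in the string.
out = re.search(r'((.*)(\[.*\])(.*))((\[.*\])(.*))', x)

After getting the match object above, you can use its group method to access the individual groups: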

out.group(1)

'Hello [@foo] how are you '


out.group(2)

'Hello '


out.group(3)

'[@foo]'


out.group(4)

' how are you '


out.group(5)

'[@bar] more text'


out.group(6)

'[@bar]'


out.group(7)

' more text'

r_hudson