import re

data = "[json][17:50 timestamp] hello [mike18][18:06 timestamp] hi"
print(re.split(r"\[(.*?)\]\[(.*?)\][^a-zA-Z0-9_]", data))

The result I expected is:

["[json][17:50 timestamp] hello", "[mike18][18:06 timestamp] hi"]

But the real result is:

['', 'json', '17:50 timestamp', 'hello ', 'mike18', '18:06 timestamp', 'hi']

What regular expression should I use?

l'L'l
HHK Mmk

2 Answers


You can use re.findall instead, with a pattern that matches any number of square-bracket-enclosed sequences followed by a run of non-bracket characters. A positive lookahead ensures that each match is followed by either another opening square bracket or the end of the string:

re.findall(r'\s*((?:\[.*?\])*\s*[^[]+?)(?=\s*\[|$)', data)

This returns:

['[json][17:50 timestamp] hello', '[mike18][18:06 timestamp] hi']

Note that the positive lookahead lets you avoid matching the trailing space, which the solution in @WiktorStribiżew's comment would include but your expected output does not.
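As background on why the original re.split call returned that flat list: when the pattern contains capturing groups, re.split inserts the text of every group into the result, and the delimiter itself is consumed. The sketch below reuses the question's data and pattern; the variable names with_groups and without_groups are only illustrative. It suggests that switching to non-capturing groups does not help either, because the bracketed prefixes are then dropped altogether.

```python
import re

data = "[json][17:50 timestamp] hello [mike18][18:06 timestamp] hi"

# Capturing groups: the text matched by each group is added to the result,
# which is where 'json', '17:50 timestamp', etc. come from.
with_groups = re.split(r"\[(.*?)\]\[(.*?)\][^a-zA-Z0-9_]", data)

# Non-capturing groups: only the text between the delimiters remains,
# so the bracketed prefixes are lost entirely.
without_groups = re.split(r"\[(?:.*?)\]\[(?:.*?)\][^a-zA-Z0-9_]", data)

print(with_groups)
# ['', 'json', '17:50 timestamp', 'hello ', 'mike18', '18:06 timestamp', 'hi']
print(without_groups)
# ['', 'hello ', 'hi']
```

Since the bracketed prefix is part of what you want to keep, matching whole records with re.findall (or splitting on a position between records) fits better than splitting on the prefix.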

blhsing

Yet another option is to split on a zero-width position using lookarounds:

(?<=\s)(?=\[)

See a demo on regex101.com.
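A minimal sketch of how this lookaround could be used with re.split, assuming Python 3.7 or later (earlier versions do not split on empty matches). Because the pattern is zero-width, the whitespace before each split point stays attached to the preceding piece, so the sketch strips it afterwards. The last line is a variation that is not part of the answer above: it consumes the whitespace instead of looking behind it. The names parts, stripped and consumed are illustrative only.

```python
import re

data = "[json][17:50 timestamp] hello [mike18][18:06 timestamp] hi"

# Split at every position preceded by whitespace and followed by "[".
# Requires Python 3.7+, where re.split accepts patterns that match empty strings.
parts = re.split(r"(?<=\s)(?=\[)", data)
print(parts)
# ['[json][17:50 timestamp] hello ', '[mike18][18:06 timestamp] hi']

# The whitespace is not consumed by the split, so remove it per piece.
stripped = [p.strip() for p in parts]
print(stripped)
# ['[json][17:50 timestamp] hello', '[mike18][18:06 timestamp] hi']

# Variation (an assumption, not from the answer): consume the whitespace itself.
consumed = re.split(r"\s+(?=\[)", data)
print(consumed)
# ['[json][17:50 timestamp] hello', '[mike18][18:06 timestamp] hi']
```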

Jan