
This is very similar to the question "Python: split string by a multi-character delimiter unless inside quotes", which I took as my starting point.

Consider this test string:

{{Institution Name 1} and {Institution name 2}} and {Institution name 3} and {Institution and institution name 4}

Basically, I want to split this so that I get the following (I don't mind whether the enclosing braces are included or not):

  1. {Institution Name 1} and {Institution name 2}
  2. Institution name 3
  3. Institution and institution name 4

or (with enclosing braces):

  1. {{Institution Name 1} and {Institution name 2}}
  2. {Institution name 3}
  3. {Institution and institution name 4}

Basically, each set of braces delimits an item, and items are separated by "and".

However, an item can itself be composed of multiple items, which I don't want to split in the first pass. Also, "and" can appear as part of an institution name, in which case I don't want to use it as a split delimiter either.

Modifying the regex from the linked post, I came up with `and (?=(?:[^{]*{[^{]*})*[^}]*$)`. On https://pythex.org/ (link to regex), it gives this result:

[Screenshot of the pythex.org result (pythex-scr.png)]

So the regex successfully avoids using "and" as a separator in the third item, where it is part of the institution name, but it is still used as a separator in the first item, where it should be ignored because it is inside a grouping set of braces.
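The behaviour in the screenshot can be reproduced with `re.split`. This is a minimal sketch that assumes the delimiter is meant to include the spaces around `and`; the braces are escaped for readability, which does not change how Python's `re` reads them here.

```python
import re

s = ("{{Institution Name 1} and {Institution name 2}} and "
     "{Institution name 3} and {Institution and institution name 4}")

# Regex from the question (assumption: spaces around "and" are part of it).
# The lookahead accepts " and " only if the rest of the string reads as a
# sequence of "...{...}" chunks followed by text containing no "}".
# It counts braces but not nesting depth: inside the first item, the "}"
# that closes the outer group is swallowed by [^{]*, so that " and " matches.
question_pattern = r" and (?=(?:[^{]*\{[^{]*\})*[^}]*$)"

parts = re.split(question_pattern, s)
for part in parts:
    print(part)
```

This prints four pieces instead of three, because the first item is split in two while the "and" inside the last item is correctly left alone.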

Is there a Python regex that I can use instead, to split in the given way?

sdaau
    For the level of nesting you have `r' and (?![^{}]*(?:\{[^{}]*})?})'` will work: https://regex101.com/r/l5re8V/1 – Nick Feb 09 '20 at 01:41
  • Thanks @Nick - your suggested regex does indeed work for this level of nesting! – sdaau Feb 09 '20 at 07:11
  • No worries - I just adapted it for your types of braces (and `and` instead of `,`) from the proposed dupe. – Nick Feb 09 '20 at 21:46
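Nick's pattern can be dropped straight into `re.split` from the standard library. A minimal sketch using the test string from the question (it handles only the single level of nesting shown there):

```python
import re

s = ("{{Institution Name 1} and {Institution name 2}} and "
     "{Institution name 3} and {Institution and institution name 4}")

# Pattern from Nick's comment: split on " and " unless a closing "}" can be
# reached by skipping brace-free text and at most one complete inner {...}
# group, i.e. unless the " and " sits inside a brace group.
nick_pattern = r" and (?![^{}]*(?:\{[^{}]*})?})"

items = re.split(nick_pattern, s)
for item in items:
    print(item)
```

The negative lookahead is what distinguishes "between groups" from "inside a group": after a top-level " and ", the next brace is always an opening `{`, so no `}` is reachable.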

1 Answer


You can achieve this by using a recursive regular expression like so.

{(?>[^{}]|(?R))*}

This will result in matches including the enclosing braces.

Here you can see a live example.


According to this question, the third-party `regex` module is needed instead of the built-in `re`; recursion should then be supported.
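The recursive pattern matches balanced `{...}` groups at any depth. If installing the third-party `regex` module is not an option, the same top-level groups can be extracted with the standard library by tracking brace depth. This is a swapped-in technique, not the answer's regex, and the function name `top_level_brace_groups` is hypothetical. It also assumes the text between groups is only the " and " separators, which it does not check.

```python
def top_level_brace_groups(text):
    """Return each top-level {...} group in text, braces included.

    Tracks nesting depth instead of using a recursive regex (hypothetical
    helper); raises ValueError on unbalanced braces. Text outside the
    groups (the " and " separators) is ignored.
    """
    groups = []
    depth = 0
    start = 0
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i  # opening brace of a new top-level group
            depth += 1
        elif ch == "}":
            if depth == 0:
                raise ValueError(f"unmatched '}}' at index {i}")
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])  # group just closed
    if depth != 0:
        raise ValueError("unmatched '{'")
    return groups


s = ("{{Institution Name 1} and {Institution name 2}} and "
     "{Institution name 3} and {Institution and institution name 4}")
for group in top_level_brace_groups(s):
    print(group)
```

Unlike the one-level lookahead in the comments, this works for any nesting depth, and like the recursive regex it returns the items with their enclosing braces.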

Bee
    I just figured out this doesn't work with Python's regex flavor. It seems the Python flavor supports recursion, so I guess a conversion should be possible; I haven't managed it yet, though. – Bee Feb 09 '20 at 02:29
    Appears to work. You can try it at repl.it, they support installing packages: https://repl.it/repls/LightcyanWearyMonitor – Kelly Bundy Feb 09 '20 at 04:00
    Many thanks, accepted - especially because I was unaware of recursive regexes, and Python `regex` vs `re`; although I might try [@Nick](https://stackoverflow.com/questions/60132729/python-split-on-substring-delimiter-but-not-when-within-in-braces#comment106354851_60132729)'s suggestion first, since I don't expect deeper levels of nesting – sdaau Feb 09 '20 at 07:10