This is very similar to "Python: split string by a multi-character delimiter unless inside quotes", which I took as my starting point.
Consider this test string:
{{Institution Name 1} and {Institution name 2}} and {Institution name 3} and {Institution and institution name 4}
I want to split this so that I get the following (it makes no difference to me whether the enclosing braces are included or not):
{Institution Name 1} and {Institution name 2}
Institution name 3
Institution and institution name 4
or (with enclosing braces):
{{Institution Name 1} and {Institution name 2}}
{Institution name 3}
{Institution and institution name 4}
Basically, each set of braces delimits an item, and items are separated by "and".
However, an item can be composed of multiple items, which I don't want to split in the first pass; and "and" can also appear as part of an institution name, in which case I do not want to use it as a split delimiter either.
Modifying the regex from the linked post, I came up with:
and (?=(?:[^{]*{[^{]*})*[^}]*$)
I tested it against the test string on https://pythex.org/.
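The same result can be reproduced locally with re.split. This is only a minimal sketch: it assumes the pattern starts with a space before "and" (so the pieces carry no stray whitespace), and the variable names are purely illustrative.

```python
import re

# The pattern from the question; the leading space before "and" is an assumption.
pattern = re.compile(r" and (?=(?:[^{]*{[^{]*})*[^}]*$)")

test = ("{{Institution Name 1} and {Institution name 2}} and "
        "{Institution name 3} and {Institution and institution name 4}")

pieces = pattern.split(test)
for piece in pieces:
    print(piece)

# Prints four pieces instead of the three wanted:
# {{Institution Name 1}
# {Institution name 2}}
# {Institution name 3}
# {Institution and institution name 4}
```

The lookahead only checks that the rest of the string contains no unmatched closing brace before the next opening one, so it cannot tell that the first "and" sits one level deep inside an outer pair of braces.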
So, the regex successfully avoids using "and" as a separator in the third item, where it is part of the institution name, but it still uses it as a separator in the first item, where it should be ignored because it is within a grouping set of braces.
Is there a Python regex that I can use instead, to split in the given way?
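To make the intended result concrete, here is a minimal non-regex sketch that tracks brace depth and splits only at depth zero. It is an illustration of the target behaviour, not a regex answer: the function name split_top_level and the assumption that the delimiter is " and " with surrounding spaces are hypothetical, and the input is assumed to have balanced braces.

```python
def split_top_level(text, delimiter=" and "):
    """Split text on delimiter only where the brace depth is zero.

    Hypothetical helper; assumes braces in text are balanced.
    """
    parts = []
    depth = 0   # current nesting level of braces
    start = 0   # index where the current item begins
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif depth == 0 and text.startswith(delimiter, i):
            # Delimiter found outside all braces: close the current item.
            parts.append(text[start:i])
            i += len(delimiter)
            start = i
            continue
        i += 1
    parts.append(text[start:])
    return parts


test = ("{{Institution Name 1} and {Institution name 2}} and "
        "{Institution name 3} and {Institution and institution name 4}")

for part in split_top_level(test):
    print(part)

# Prints:
# {{Institution Name 1} and {Institution name 2}}
# {Institution name 3}
# {Institution and institution name 4}
```

This matches the second form of the wanted output (with enclosing braces); a second pass over an item, after stripping its outer braces, would split nested items in the same way.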