I am trying to split a string on a word that appears on a line by itself.

For example, I want to split the string wherever the word "SPLIT" occurs, but only when "SPLIT" is the sole content of its line.

I tried str.split("\nSPLIT"), but I am having trouble handling the text that comes after the word. Here is the behaviour I expect:

Hello there,
SPLIT
how are you?

should return ["Hello there,", "how are you?"]

Hello there, SPLIT how are you?

should return ["Hello there, SPLIT how are you?"]

Hello there,
SPLIT

should return ["Hello there,", ""]

Hello there,
SPLIT how are you?

should return ["Hello there,\nSPLIT how are you?"]

Appreciate the help.

Shrav
  • Where are you stuck? Please share the code you tried to see where the problem is. – Wiktor Stribiżew Apr 13 '21 at 21:32
  • Yeah, your question isn't totally clear. I'd like to help but I'm not sure what you're asking – Mainly Karma Apr 13 '21 at 21:37
  • not sure if I understood exactly, but something like this might help: `re.match( '^(Hello there,\n)(SPLIT\n)?(how are you?)', mystring, re.MULTILINE)` – EWJ00 Apr 13 '21 at 21:38
  • @WiktorStribiżew I started off with str.split("\nSPLIT") and thought it's kind of easier to achieve this with `split` + regex rather than iterating over and building the substrings again. – Shrav Apr 13 '21 at 21:51
  • Ok, all you need is `re.split(r'\n?^SPLIT$\n?', text, flags=re.M)` – Wiktor Stribiżew Apr 13 '21 at 21:54

1 Answer

You can use either of the following:

re.split(r'\n?^SPLIT$\n?', text, flags=re.M)
re.split(r'(?:^|\n)SPLIT(?:$|\n)', text)

See the Python demo.

The \n?^SPLIT$\n? regex, used with the re.M flag, matches an optional newline character, asserts that the current position is at the start of a line (^), matches and consumes SPLIT, asserts that the end of a line ($) immediately follows, and then matches an optional newline character.
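As a quick check, the sketch below runs that pattern against the four inputs from the question. The helper name split_on_marker is made up for illustration and is not part of the re module.

```python
import re

# Pattern from the answer: optional newline, SPLIT alone on its line, optional newline.
MARKER = re.compile(r'\n?^SPLIT$\n?', flags=re.M)

def split_on_marker(text):
    """Hypothetical helper: split text wherever SPLIT is the only thing on a line."""
    return MARKER.split(text)

print(split_on_marker("Hello there,\nSPLIT\nhow are you?"))
# ['Hello there,', 'how are you?']
print(split_on_marker("Hello there, SPLIT how are you?"))
# ['Hello there, SPLIT how are you?']
print(split_on_marker("Hello there,\nSPLIT"))
# ['Hello there,', '']
print(split_on_marker("Hello there,\nSPLIT how are you?"))
# ['Hello there,\nSPLIT how are you?']
```

Precompiling the pattern is only a convenience here; calling re.split with flags=re.M directly should behave the same way.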

The (?:^|\n)SPLIT(?:$|\n) regex, used without re.M, matches either the start of the string or a newline, then SPLIT, then either the end of the string or a newline character. Note the use of non-capturing groups: with capturing groups, re.split would also include the matched newlines or empty strings in the resulting list.
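To illustrate the point about non-capturing groups, here is a minimal sketch that compares the pattern with a capturing variant. The capturing variant is written only for contrast and is not something the answer recommends.

```python
import re

text = "Hello there,\nSPLIT\nhow are you?"

# Non-capturing groups: the delimiter text is dropped from the result.
non_capturing = re.split(r'(?:^|\n)SPLIT(?:$|\n)', text)

# Capturing groups: re.split also returns whatever each group matched.
capturing = re.split(r'(^|\n)SPLIT($|\n)', text)

print(non_capturing)  # ['Hello there,', 'how are you?']
print(capturing)      # ['Hello there,', '\n', '\n', 'how are you?']
```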

See the regex demo #1 and regex demo #2.

Wiktor Stribiżew