Here's my problem. I am creating a parser for a movie-type script (not a computer script, but a screenplay), and I need to select all of the lines underneath a certain scene heading. Here's an example script from Shakespeare's Hamlet.

#Scene 1#
Bernardo: Who's there?
Francisco: Nay, answer me: stand, and unfold yourself.

#Scene 2#
Horatio: Tis now struck twelve; get thee to bed, Francisco.
Marcellus: Peace, break thee off; look, where it comes again!

I need a way to select everything between "#Scene 1#" and "#Scene 2#". The Bernardo and Francisco lines should match, but the Horatio and Marcellus lines should not.

I've tried using lookahead and lookbehind, but apparently they don't work across multiple lines.

/(?<=#Scene 1#)(.*)(?=#Scene 2#)/gim

If it's important, I'm using Python 2.7.

James Hall

2 Answers

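How the regex works: with re.DOTALL the dot also matches newlines, so the lazy group (.*?) captures everything between the two headings, and the \s* on either side keeps the surrounding blank lines out of the capture.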

import re

data = """
#Scene 1#
Bernardo: Who's there?
Francisco: Nay, answer me: stand, and unfold yourself.

#Scene 2#
Horatio: Tis now struck twelve; get thee to bed, Francisco.
Marcellus: Peace, break thee off; look, where it comes again!
"""

print(re.findall(r'(?:#Scene 1#)\s*(.*?)\s*(?:#Scene 2#)', data, flags=re.DOTALL)[0])

Prints:

Bernardo: Who's there?
Francisco: Nay, answer me: stand, and unfold yourself.
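
If the headings are not known in advance, the same idea can be generalized. The following is only a sketch, not part of the original answer: the helper name scene_body is hypothetical, and it assumes every heading sits on its own line in the form #...#. It captures lazily until the next heading line or the end of the text.

```python
import re

SCRIPT = """
#Scene 1#
Bernardo: Who's there?
Francisco: Nay, answer me: stand, and unfold yourself.

#Scene 2#
Horatio: Tis now struck twelve; get thee to bed, Francisco.
Marcellus: Peace, break thee off; look, where it comes again!
"""


def scene_body(script, heading):
    """Return the text under `heading` (hypothetical helper).

    Assumes headings look like '#Scene 1#' and occupy a whole line.
    The capture stops at the next heading line or at the end of the text.
    """
    pattern = (
        re.escape(heading)               # match the heading literally
        + r'\s*(.*?)\s*'                 # lazy capture, surrounding whitespace trimmed
        + r'(?=^#[^#\n]+#\s*$|\Z)'       # stop before the next heading line or at the end
    )
    # DOTALL lets '.' cross newlines; MULTILINE makes ^ and $ work per line.
    match = re.search(pattern, script, flags=re.DOTALL | re.MULTILINE)
    return match.group(1) if match else None


print(scene_body(SCRIPT, "#Scene 1#"))
```

Because the stop condition is a lookahead rather than a hard-coded "#Scene 2#", the last scene in the script is handled as well, and an unknown heading returns None.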
Andrej Kesely
0

Your regex works fine once the dot is allowed to match newlines. Just remember to pass the re.DOTALL flag:

>>> re.search(r'(?<=#Scene 1#)(.*)#Scene 2#', text, flags=re.DOTALL).group(1)
"\nBernardo: Who's there?\nFrancisco: Nay, answer me: stand, and unfold yourself.\n\n"
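
To make the effect of the flag concrete, here is a small sketch that runs the lookaround pattern from the question with and without re.DOTALL. The variable names are illustrative, and text is assumed to hold the sample script from the question. Note that the m flag (re.MULTILINE) only changes how ^ and $ behave; it does not let the dot cross line breaks.

```python
import re

text = """
#Scene 1#
Bernardo: Who's there?
Francisco: Nay, answer me: stand, and unfold yourself.

#Scene 2#
Horatio: Tis now struck twelve; get thee to bed, Francisco.
Marcellus: Peace, break thee off; look, where it comes again!
"""

# The lookbehind/lookahead pattern from the question.
pattern = r'(?<=#Scene 1#)(.*)(?=#Scene 2#)'

# Without DOTALL, '.' stops at the first newline, so the match can never
# reach from one heading to the next.
no_flag = re.search(pattern, text)

# With DOTALL, '.' also matches '\n' and the pattern spans several lines.
with_flag = re.search(pattern, text, flags=re.DOTALL)

print(no_flag)                      # None
print(with_flag.group(1).strip())   # the two Scene 1 lines
```

One caveat with the greedy (.*): if the closing heading text occurred more than once, the match would run to its last occurrence, so a lazy (.*?) is the safer choice in longer scripts.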
Sunitha