Python 3 Regex extract part of string

Question

I have tried all the StackOverflow solutions for similar questions and couldn't find anything that works.

I have this snippet of code and I need to extract the text between the HTML tags, that is, everything between > and <.

import re

word = """div class="name">
                        Text_I_Want_To_Extract 
                    </div>"""

m = re.search('>(.+)<', word)
print(m)  # prints None

I have tried various regular expressions, but they all failed: I always get an empty result. I am guessing it is because I am extracting everything between the > and < symbols.

Has anyone had this kind of problem with Python 3?

Why you shouldn't parse HTML with regular expressions: https://stackoverflow.com/a/1732454/247696 — Flimm, Aug 02 '18 at 12:52
@Rakesh, very close to solution, I just have a lot of /t added to the string — Adrian Ivasku, Aug 02 '18 at 12:54

score 0 · Accepted Answer · answered Aug 02 '18 at 12:52

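Try using the re.DOTALL flag. By default, . matches any character except a newline, and your text spans several lines, so '>(.+)<' finds no match.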

Ex:

import re

word = """div class="name">
                        Text_I_Want_To_Extract 
                    </div>"""

m = re.search('>(.+)<', word, flags=re.DOTALL)
print(m.group(1).strip())

Output:

Text_I_Want_To_Extract
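
For comparison, here is a minimal sketch of why the flag matters, plus one alternative that needs no flag. The sample string is an assumption modeled on the question, with tab characters added to mimic the stray whitespace mentioned in the comments. The negated character class [^<] is a swapped-in technique, not part of the original answer.

```python
import re

# Sample input modeled on the question (an assumption, not the asker's real data).
word = 'div class="name">\n\t\t\tText_I_Want_To_Extract \n\t\t</div>'

# By default "." does not match "\n", so this search finds nothing.
default_match = re.search(r'>(.+)<', word)

# With re.DOTALL, "." also matches newlines.
dotall_match = re.search(r'>(.+)<', word, flags=re.DOTALL)

# Alternative without a flag: [^<] matches any character except "<",
# newlines included, and stops at the first "<" it meets.
negated_match = re.search(r'>([^<]+)<', word)

# strip() removes the surrounding newlines, tabs and spaces.
dotall_text = dotall_match.group(1).strip()
negated_text = negated_match.group(1).strip()

print(default_match)
print(dotall_text)
print(negated_text)
```

Note that '>(.+)<' with re.DOTALL is greedy, so on input with several tags it runs from the first > to the last <. The negated class stops at the first <, which is usually closer to what you want for a single element.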

Rakesh

Yes, this works. Thank you ! – Adrian Ivasku Aug 02 '18 at 12:56
You are welcome :) – Rakesh Aug 02 '18 at 12:57
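
Following up on the comment about not parsing HTML with regular expressions, here is a rough sketch of the same extraction with the standard library's html.parser. It assumes the input is a well-formed fragment that starts with <div class="name"> (the string in the question lacks the opening <). The class name NameDivText is hypothetical, and nested div elements are not handled.

```python
from html.parser import HTMLParser


class NameDivText(HTMLParser):
    """Hypothetical helper: collects text inside <div class="name">."""

    def __init__(self):
        super().__init__()
        self.inside = False   # True while we are inside the target div
        self.chunks = []      # text pieces seen inside the target div

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples.
        if tag == "div" and ("class", "name") in attrs:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)


# Assumed well-formed version of the fragment from the question.
html = '<div class="name">\n\t\tText_I_Want_To_Extract \n\t</div>'

parser = NameDivText()
parser.feed(html)
parser.close()

extracted = "".join(parser.chunks).strip()
print(extracted)
```

For a one-off string the regex is shorter, but a parser of this kind tends to hold up better once attributes, nesting or extra tags appear in the input.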
