How do I create a regex pattern for the search method in Python 3?

Question

I am writing a function that retrieves a string from HTML code with a regular expression.

Example: <p class = 3DFormText> [Telephone] <o: p> </ o: p> <w:sdtPr></w:sdtPr> </p>. From this I want to get [Telephone], so the string I want has the form [anything]. I do not know what pattern to pass to the regular expression search method. Could anyone help me write it, or offer suggestions?

Don't try to parse markup like HTML with regular expressions. It's not sufficient and ends up helping very little. What you want for parsing HTML is https://www.crummy.com/software/BeautifulSoup/bs4/doc/ — Daniel Farrell, Nov 03 '19 at 16:51
Just use BeautifulSoup4 (that's the name of the package). — Ahmed I. Elsayed, Nov 04 '19 at 01:56
As @DanielFarrell said, it's better to use an HTML/XML parser rather than a regex. You could use [Parsel](https://github.com/scrapy/parsel), [BeautifulSoup](https://pypi.org/project/beautifulsoup4/), etc. — reisdev, Nov 04 '19 at 01:57
Is that actually HTML, or is it XML (e.g. XHTML embedded in an XML-based word processor document of some kind)? — Ry-, Nov 04 '19 at 02:04

Ahmed I. Elsayed · Answer 1 · 2019-11-04T02:15:18.727

It is better to use BeautifulSoup4.

You can install it with pip install beautifulsoup4 (pip treats package names case-insensitively, so BeautifulSoup4 works as well).
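To make the parse-first idea concrete, here is a minimal sketch. It uses the standard library's html.parser rather than BeautifulSoup so that it runs without installing anything. The TextCollector class, the first_bracketed function, and the tidied sample string are hypothetical, made up for illustration. It gathers the text between tags and then applies re.search with the pattern \[[^\]]*\] to pick out the first [anything] placeholder.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect only the text nodes and ignore every tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # HTMLParser calls this for each run of text found between tags.
        self.chunks.append(data)


def first_bracketed(html):
    """Return the first "[...]" placeholder in the document text, or None."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    text = "".join(collector.chunks)
    # \[ and \] are literal brackets; [^\]]* is any run of characters
    # that does not contain a closing bracket.
    match = re.search(r"\[[^\]]*\]", text)
    return match.group(0) if match else None


# A tidied version of the snippet from the question (an assumption).
sample = '<p class="FormText"> [Telephone] <o:p></o:p> <w:sdtPr></w:sdtPr> </p>'
print(first_bracketed(sample))  # [Telephone]
```

Because only text nodes are searched, brackets that appear inside attribute values are ignored. With BeautifulSoup installed, BeautifulSoup(html, "html.parser").get_text() should be able to stand in for the collector class.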

But if you insist on a regex, try improving this pattern. I just put it together, so it is not perfect, and it matches only an opening tag:

<[A-Za-z]+\s*(\s*[a-zA-Z0-9]\s*=*"*[A-Za-z0-9\(\)]*"*)*>

It matches <tag ANY="ANY" checked>, and the attributes are optional.

It matched my dummy tag:

<tag required name1="ahmed" name="mohamed" person="idk" whatever="whatever" checked >
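A quick way to check the pattern above in Python 3 is to compile it and try it on a few strings. This is only a sketch: the names OPENING_TAG and is_opening_tag are made up for illustration, and the last two inputs are extra test strings that are not part of the original answer.

```python
import re

# The pattern from this answer, unchanged. It targets a single opening tag.
OPENING_TAG = re.compile(
    r'<[A-Za-z]+\s*(\s*[a-zA-Z0-9]\s*=*"*[A-Za-z0-9\(\)]*"*)*>'
)


def is_opening_tag(text):
    """Return True if the whole string matches the pattern."""
    return OPENING_TAG.fullmatch(text) is not None


dummy = ('<tag required name1="ahmed" name="mohamed" '
         'person="idk" whatever="whatever" checked >')

print(is_opening_tag(dummy))                      # True
print(is_opening_tag('<tag ANY="ANY" checked>'))  # True
print(is_opening_tag('</p>'))                     # False: closing tags are not covered
print(is_opening_tag('<a data-id="x">'))          # False: "-" is in none of the character classes
```

The last case shows one of the gaps worth improving: attribute names containing a hyphen, such as data-id, are rejected.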

Note that I made it accept attribute names that start with a capital letter only because HTML accepts them; feel free to remove that if you want.
