Understanding Regex Expressions in Python

Question

I am a beginner with regular expressions in Python, and I was hoping to understand the following line of code:

 HTML_TAG_REGEX = re.compile(r'<[^>]*>', re.IGNORECASE)

I know that re.compile creates a regular expression object, and that the 'r' tells Python we're dealing with a regular expression; however, I was hoping someone could explain what's going on with the rest of the code, specifically the usage of the less-than/greater-than signs. Thank you!

The `r` is not for regular expressions but for a raw string. — Klaus D., Aug 05 '18 at 02:47
regex and html should not mix... https://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-reg — Stephen Rauch, Aug 05 '18 at 03:08

score 2 · Accepted Answer · answered Aug 05 '18 at 03:05

Your expression:

matches a "<" character
Then matches 0 or more characters that are not ">"
Then matches a ">" at the end of the pattern
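
To see those three steps in action, here is a minimal sketch that runs the pattern from the question against a made-up string (the `sample` value is only an illustration, not something from the original code). Note that `re.IGNORECASE` has no visible effect here, since the pattern contains no letters.

```python
import re

# Pattern from the question: "<", then any run of non-">" characters, then ">"
HTML_TAG_REGEX = re.compile(r'<[^>]*>', re.IGNORECASE)

# Hypothetical input, chosen only to illustrate the matching
sample = '<p>Hello <b>world</b></p>'

# Every substring that starts at "<" and ends at the next ">"
tags = HTML_TAG_REGEX.findall(sample)
print(tags)      # ['<p>', '<b>', '</b>', '</p>']

# Replacing each match with an empty string strips the tags
stripped = HTML_TAG_REGEX.sub('', sample)
print(stripped)  # Hello world
```

Because `[^>]*` stops at the first ">", each tag is matched on its own rather than as one long match from the first "<" to the last ">".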

As pointed out above, the r before the string means raw string, not regular expression.
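
A short sketch of what the raw-string prefix changes, using `\b` as an example (the variable names and test sentence are made up for illustration). For the pattern in the question the prefix makes no difference, because it contains no backslashes.

```python
import re

# In a normal string literal, '\b' is a single backspace character.
# In a raw string, it stays as a backslash followed by 'b',
# which the re module then interprets as a word boundary.
normal = '\bcat\b'
raw = r'\bcat\b'

print(len(normal))  # 5
print(len(raw))     # 7

# Only the raw version behaves as a word-boundary pattern
raw_match = re.search(raw, 'a cat sat')
normal_match = re.search(normal, 'a cat sat')
print(raw_match is not None)  # True
print(normal_match is None)   # True

# No backslashes in the question's pattern, so both spellings are the same string
same = (r'<[^>]*>' == '<[^>]*>')
print(same)  # True
```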

You can use a regex translator to get these details.
