I am a beginner with regular expressions in Python, and I was hoping to understand the following line of code:

 HTML_TAG_REGEX = re.compile(r'<[^>]*>', re.IGNORECASE)

I know that re.compile creates a regular expression object, and that the 'r' tells Python we're dealing with a regular expression; however, I was hoping someone could explain what's going on with the rest of the code, specifically the use of the less-than/greater-than signs. Thank you!

Eric101
  • 2
    The `r` is not for regular expressions but for a raw string. – Klaus D. Aug 05 '18 at 02:47
  • 1
    regex and html should not mix... https://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-reg – Stephen Rauch Aug 05 '18 at 03:08

1 Answer

Your expression:

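  1. Matches a literal "<" character.
  2. Then matches zero or more characters that are not ">" (the [^>] is a negated character class, and * means "zero or more").
  3. Finally matches a literal ">" character, which ends the pattern.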
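
To see the three steps above in action, here is a minimal sketch that applies the pattern to a made-up string. The `sample` value is an assumption chosen for illustration; it does not come from the question.

```python
import re

# The pattern from the question, unchanged.
HTML_TAG_REGEX = re.compile(r'<[^>]*>', re.IGNORECASE)

# Hypothetical input, chosen only to illustrate the matching behaviour.
sample = '<p>Hello, <b>world</b>!</p>'

# Each match starts at "<", runs over any non-">" characters, and ends at ">".
tags = HTML_TAG_REGEX.findall(sample)
print(tags)      # ['<p>', '<b>', '</b>', '</p>']

# Replacing every match with an empty string leaves only the text between tags.
stripped = HTML_TAG_REGEX.sub('', sample)
print(stripped)  # Hello, world!
```

Note that re.IGNORECASE has no visible effect on this particular pattern, since the pattern contains no letters. Also, because * allows zero repetitions, an empty "<>" would match as well.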

As pointed out in the comments above, the r before the string marks it as a raw string literal, not as a regular expression.
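
A small sketch of what the raw-string prefix changes in practice. The `word` pattern and its test string are hypothetical examples, not part of the question.

```python
import re

# In a normal string literal, '\b' is a single backspace character.
# In a raw string literal, r'\b' is two characters: a backslash and 'b'.
normal = '\b'
raw = r'\b'
print(len(normal), len(raw))  # 1 2

# The pattern in the question contains no backslashes,
# so the r prefix makes no difference there.
print(r'<[^>]*>' == '<[^>]*>')  # True

# Where it matters: the regex engine must see the backslash to read
# \b as a word boundary, so such patterns are written as raw strings.
word = re.compile(r'\bcat\b')
found = word.findall('cat concat cat.')
print(found)  # ['cat', 'cat']
```

Raw strings are a feature of Python string literals in general; they are simply a convenient habit when writing regex patterns, which often contain backslashes.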

You can paste a pattern into a regex translator (a tool that explains a pattern piece by piece) to get this kind of breakdown.
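
If you would rather stay inside Python, one possible alternative (not mentioned above, so treat it as a suggestion) is the re.DEBUG flag of the standard re module, which prints the parsed structure of a pattern when it is compiled. The exact output format varies between Python versions, so it is not reproduced here.

```python
import re

# Compiling with re.DEBUG prints a breakdown of the parsed pattern
# to standard output: a literal "<", a repeated negated class, a literal ">".
debug_pattern = re.compile(r'<[^>]*>', re.DEBUG)

# The compiled object still behaves like a normal pattern.
match = debug_pattern.fullmatch('<div>')
print(match is not None)  # True
```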

PabTorre