
I am reading the Shinken source code in shinken/misc/perfdata.py and I found a regex that I cannot understand:

metric_pattern = re.compile('^([^=]+)=([\d\.\-\+eE]+)([\w\/%]*);?([\d\.\-\+eE:~@]+)?;?([\d\.\-\+eE:~@]+)?;?([\d\.\-\+eE]+)?;?([\d\.\-\+eE]+)?;?\s*')

What confuses me is this: what does \/ mean in ([\w\/%]*)?

andy
  • This is a literal forward slash. It's being escaped by a backslash. It will result in matching a literal `/` – Lix Oct 20 '14 at 13:57
  • which is useless, since a forward slash doesn't need escaping in a character class (in Python). – njzk2 Oct 20 '14 at 13:58
  • Use my explain tool [explainer](http://regexdoc.com/re/explain.pl?re=%28%5B%5Cw%5C%2F%25%5D*%29&.submit=Explain%21&mode=SO&.cgifields=mode) – hwnd Oct 20 '14 at 13:59
  • @Lix, I think there is no need to escape a `/` in order to match it, because it is not a special character. – andy Oct 20 '14 at 14:04

1 Answer


You're rightfully confused, because that regex must have been written by someone who doesn't know Python regexes well.

In some languages (e.g. JavaScript), regexes are delimited by slashes. That means that if you need an actual slash in your regex, you have to escape it. Since Python doesn't use slashes, there's no need to escape the slash (but it doesn't cause an error, either).
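To make that concrete, here is a minimal sketch that compares the character class from the question with and without the escaped slash. The variable names and the sample string `KB/s` are made up for illustration; only the two character classes come from the regex above.

```python
import re

# The character class as written in the Shinken regex (slash escaped)...
escaped = re.compile(r'[\w\/%]*')
# ...and the same class with a plain, unescaped slash.
plain = re.compile(r'[\w/%]*')

sample = 'KB/s'  # hypothetical unit string containing a literal slash

# Both patterns consume the whole sample, slash included.
escaped_result = escaped.match(sample).group()
plain_result = plain.match(sample).group()
print(escaped_result == plain_result)
```

In Python's `re` module, a backslash in front of a non-alphanumeric ASCII character such as `/` simply matches that character, so the two classes behave identically.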

Much more worrisome is that the author failed to use a raw string. In many cases, that won't matter (because Python will treat "\d" as "\\d", which then correctly translates to the regex \d), but in other cases, it will cause problems. One example is "\b", which means "a backspace character" and not "a word boundary anchor" like the regex \b would.
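A short sketch of the "\b" pitfall, using a made-up pattern and sample sentence (neither appears in the Shinken code):

```python
import re

# Without the r prefix, Python turns "\b" into a backspace character (0x08)
# before the regex engine ever sees the pattern.
non_raw = '\bcat\b'
# With the r prefix, the backslash survives and the regex engine sees \b,
# the word boundary anchor.
raw = r'\bcat\b'

text = 'a cat sat'  # hypothetical sample input

print(len(non_raw), len(raw))       # the raw string keeps both backslashes
print(re.search(non_raw, text))     # looks for backspace + "cat" + backspace
print(re.search(raw, text).group()) # finds the whole word "cat"
```

As a side note, recent Python versions do warn about unrecognized escapes such as "\d" in non-raw string literals (a DeprecationWarning since 3.6, a SyntaxWarning since 3.12), but "\b" is a valid string escape and therefore never triggers a warning.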

Also, the author has escaped a lot of characters that didn't need escaping at all. The entire regex could be rewritten as

metric_pattern = re.compile(r'^([^=]+)=([\d.+eE-]+)([\w/%]*);?([\d.+eE:~@-]+)?;?([\d.+eE:~@-]+)?;?([\d.+eE-]+)?;?([\d.+eE-]+)?;?\s*')
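As a quick sanity check that the cleanup does not change behavior, the following sketch runs both versions against a few perfdata-style strings. The sample strings and variable names are invented for illustration and are not taken from Shinken. The original pattern is written as a raw string here, which yields the same pattern text because none of its escapes are recognized Python string escapes.

```python
import re

# Original pattern from the question (raw-string form, same pattern text).
original = re.compile(
    r'^([^=]+)=([\d\.\-\+eE]+)([\w\/%]*);?([\d\.\-\+eE:~@]+)?;?'
    r'([\d\.\-\+eE:~@]+)?;?([\d\.\-\+eE]+)?;?([\d\.\-\+eE]+)?;?\s*'
)
# Cleaned-up pattern without the unnecessary escapes.
cleaned = re.compile(
    r'^([^=]+)=([\d.+eE-]+)([\w/%]*);?([\d.+eE:~@-]+)?;?'
    r'([\d.+eE:~@-]+)?;?([\d.+eE-]+)?;?([\d.+eE-]+)?;?\s*'
)

# Hypothetical perfdata metrics: label=value[unit];warn;crit;min;max
samples = [
    'time=0.25s;1.0;2.0;0;10',
    'rate=12.5KB/s;~:100;@10:20',
    'load=0.5',
]

results = []
for sample in samples:
    old_groups = original.match(sample).groups()
    new_groups = cleaned.match(sample).groups()
    results.append((old_groups, new_groups))
    print(sample, old_groups == new_groups, new_groups)
```

This only shows agreement on these three inputs; it is not a proof that the two patterns are equivalent for every possible string.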

and even then, I'm surprised that it works at all. It looks very chaotic to me and is definitely not foolproof. For example, there appears to be a big potential for catastrophic backtracking, meaning that users could freeze your server with malicious input.

Tim Pietzcker
  • What confuses me more is that Python gives no warning about this wrong usage (wrong in my opinion, at least). – andy Oct 21 '14 at 07:08
  • Well, it's not wrong (the syntax is legal), and Python can't know if you meant to write `"\b"` or `"\\b"`. If the string contains an actual Syntax Error, Python will complain, but the current regex is just awkward and possibly inefficient, not illegal. – Tim Pietzcker Oct 21 '14 at 07:44
  • @Tim Pietzcker, I mean that the use of `\/` may be illegal. – andy Oct 22 '14 at 01:28
  • @andy: Not really. Most regex engines ignore unnecessarily escaped characters like `\/`, but there *are* some exceptions. IIRC, .NET is a bit more strict here; also, some regex engines have special escapes like `\ – Tim Pietzcker Oct 22 '14 at 04:54
  • @Tim Pietzcker, thanks a lot for your explanation. – andy Oct 23 '14 at 03:28