8

How to find minimum and maximum length given a regular expression ?

For example

[1-9]?[0-9]

This regular expression can generate a minimum 1 (0 or 1 0r 2.... or 9) and a maximum of string length 2 (10 or 11 or 12 or ......19 or 20 or 21...........or 99)

Similarly, can anyone provide a function which can calculate minimum and maximum length given a regular expression? Which can take below regex as input?

^[a-zA-Z0-9][a-zA-Z0-9.-]{0,64}[a-zA-Z0-9]$
^[a-zA-Z0-9._-]{1,255}$
^[a-zA-Z0-9 !#$'()*+,./:;=?@\\^_`~-]{1,30}$
^[]a-zA-Z0-9 !#$'()*+,./:;=?@[^_`{|}~-]{0,50}$
^((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])$
Aaron Hall
  • 291,450
  • 75
  • 369
  • 312
  • What do you mean by: "I have tried Regular Expression Minimum Length"??? – Aaron Hall Feb 27 '14 at 23:53
  • Regular expressions do not generate anything. They describe (or match) sth. – Alfe Feb 27 '14 at 23:54
  • I am sorry not generate.Just the minimum n maximum lengths i meant. – user3362766 Feb 28 '14 at 00:00
  • Keep in mind, as soon as you see * or + the maximum length goes to (essentially) infinity. – Hugh Bothwell Feb 28 '14 at 00:02
  • 1
    @Alfe: Regular expressions in the CS theory sense generate regular languages. The language generated by an expression is simply the set of all strings that the expression will match. – nneonneo Feb 28 '14 at 02:52
  • @HughBothwell: Assuming there's no skullduggery involved with backtracking (which takes the expression beyond the theoretical regular languages). – nneonneo Feb 28 '14 at 02:53
  • @nneonneo, I've taught CS beginners for several years at the university in Berlin. Your comment reminds me of that time ;-) And, yes, as you said it: In that sense, a regular expression generates a regular _language_ (and _not strings_ which are elements of that language, as OP put it). But here we are at Stackoverflow which is much more practical oriented. If the question really was about a theoretical CS, it would even be offtopic here, and I would vote for migrating it to [computer science](http://cs.stackexchange.com/). Maybe that's a good choice anyway? – Alfe Feb 28 '14 at 08:56
  • @Alfe: Possibly. Though the question also asks about Python regexes in particular, which makes it slightly more concrete. – nneonneo Feb 28 '14 at 15:53
  • Have you looked at the output of `re.compile(expression, re.DEBUG)`? It doesn't fully answer your question but it goes quite a long way... – Floris Feb 28 '14 at 16:20
  • Unfortunately, that debug output is printed directly to stdout; so to even get it into data structures for further processing, you'd have to capture that output and then you'd have to parse the debug output. All in all the complexity will be the same as with parsing the original regexp, I fear. – Alfe Mar 01 '14 at 00:15

3 Answers3

5

Regular expressions consist of just a very small set of elements.

  1. atoms (e. g. a or [a-k] or .),
  2. choices (e. g. r1|r2),
  3. repetitions (e. g. r{3,10}, r+, r*, r?).
  4. groups (e. g. (r)) which can be subject to repetitions or choices.
  5. Specials (e. g. ^, $).

That's more or less it unless we want to add non-consuming look-aheads and similar, but they are not part of your example input, so I will not consider these.

How long (minimum/maximum) can these be?

  1. 1 / 1 (atoms are constant in size)
  2. min(minlen(r) for r in choices) / max(maxlen(r) for r in choices)
  3. minlen(r) * minrepretition / maxlen(r) * maxrepetition
  4. minlen(r) / maxlen(r)
  5. 0 (positional parameters match the empty string).

So, what you will need is a regexp parser (as Hugh Bothwell suggested it in his answer) which returns you sth like an abstract syntax tree (absy) of a given regexp; this absy then can be analyzed using the rules I sketched above to find the minimal or maximal length for a string the given regexp can match.

Alfe
  • 47,603
  • 17
  • 84
  • 139
4

There is some starting code at http://pyparsing.wikispaces.com/file/view/invRegex.py for a regex parser in pyparsing; it shouldn't be hard to modify to do what you want.

Some tutorials can be found at http://pyparsing.wikispaces.com/Examples

Hugh Bothwell
  • 50,702
  • 6
  • 75
  • 95
2

Looks like you need to build a regex parser to parse these regular expressions and calculate that for you. Something that would look at brackets as a single character, braces as the variable len, and the |'s for more variability. Looks like you've got a lot of homework in front of you. Good luck!

Edit, some additional help.

Ok, here's a bit to perhaps get you started:

This regular expression, for example:

^[a-zA-Z0-9 !#$'()*+,./:;=?@\\^_`~-]{1,30}$
^^--------one of these characters--^^----^^-end of string
^---start of string                   ^one to thirty times

So this regular expression would be 1 to 30 characters long.

Does that help? But seriously, I'm not going to do more than this, you need to read the re docs: http://docs.python.org/library/re.html

Aaron Hall
  • 291,450
  • 75
  • 369
  • 312