Deciphering this regex in ruby

Question

octet = /\d{,2}|1\d{2}|2[0-4]\d|25[0-5]/
ip_regex = /^#{octet}\.#{octet}\.#{octet}\.#{octet}/

The regex above is used to match an IP address. I understand that \d is used to match a digit, and I also understand the ip_regex part, but after looking at some tutorials I'm still not able to completely understand the octet part. Could someone enlighten me? What does {,2}|1 mean, for example?

One could write `ip_regex = /^(?:#{octet}\.){3}#{octet}/`. Note `^` is the start-of-line anchor. If the start-of-string anchor is wanted use `\A` instead. — Cary Swoveland, Apr 13 '18 at 05:31
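A quick check of the comment's two points, keeping the question's octet pattern unchanged. The repeated-group form accepts what the written-out form accepts, and `^` can match at the start of any line while `\A` only matches at the start of the string. The sample text is made up for illustration.

```ruby
# Octet pattern copied unchanged from the question.
octet = /\d{,2}|1\d{2}|2[0-4]\d|25[0-5]/

# Written-out form from the question.
long_form = /^#{octet}\.#{octet}\.#{octet}\.#{octet}/
# Repeated-group form from the comment: "octet." three times, then one more octet.
short_form = /^(?:#{octet}\.){3}#{octet}/
# Same as short_form, but anchored at the start of the whole string.
string_start = /\A(?:#{octet}\.){3}#{octet}/

# Hypothetical input: the address sits on the second line.
text = "not an address\n10.0.0.1"

p long_form.match?(text)    # => true  (^ matches right after the newline)
p short_form.match?(text)   # => true
p string_start.match?(text) # => false (\A only matches at index 0)
```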
Easier would be `arr = str.split('.'); arr.size == 4 && arr.all? { |s| s =~ /\A\d+\z/ && s.to_i <= 255 }`. Better yet is `require 'ipaddr'; IPAddr.new(str).ipv4?`. If `str` is not a valid string representation of an IP address, `IPAddr.new(str)` raises an exception (`IPAddr::InvalidAddressError`, a subclass of `ArgumentError`, in current Rubies), which has to be handled. See [IPAddr](http://ruby-doc.org/stdlib-1.9.3/libdoc/ipaddr/rdoc/IPAddr.html). — Cary Swoveland, Apr 13 '18 at 06:13
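A minimal sketch of both suggestions from the comment, wrapped in two hypothetical helper methods (`ipv4_by_split?` and `ipv4_by_ipaddr?` are illustration names, not library methods). One deliberate change from the comment: `split('.', -1)` is used, because plain `split('.')` drops trailing empty fields and would let `"1.2.3.4."` through.

```ruby
require 'ipaddr'

# Split-based check from the comment. Passing -1 as split's limit keeps
# trailing empty fields, so "1.2.3.4." yields five parts and is rejected
# (with plain split('.') the trailing dot would be silently dropped).
def ipv4_by_split?(str)
  parts = str.split('.', -1)
  parts.size == 4 && parts.all? { |s| s.match?(/\A\d+\z/) && s.to_i <= 255 }
end

# IPAddr-based check. Invalid input raises IPAddr::InvalidAddressError
# (a subclass of ArgumentError), which is turned into false here.
# A valid IPv6 string parses fine, so ipv4? is what rejects it.
def ipv4_by_ipaddr?(str)
  IPAddr.new(str).ipv4?
rescue IPAddr::InvalidAddressError
  false
end

%w[192.168.0.1 256.1.1.1 1.2.3 1.2.3.4. ::1].each do |s|
  puts "#{s}: split=#{ipv4_by_split?(s)} ipaddr=#{ipv4_by_ipaddr?(s)}"
end
```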
The actual octet bit should be `octet = '(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])'` and ip regex `ip_regex = /\A#{octet}(?:\.#{octet}){3}\z/` — Wiktor Stribiżew, Apr 13 '18 at 09:15
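A small check of the comment's version. Because `octet` is a String here, it is pasted into the regex literal verbatim, so the `(?:...)` group is what keeps the `|` alternatives contained. The `ungrouped` string and the sample inputs are added only to show what goes wrong without the group.

```ruby
# Octet as a String (as in the comment), wrapped in a non-capturing group.
octet = '(?:\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])'
ip_regex = /\A#{octet}(?:\.#{octet}){3}\z/

# Without the group, \A binds only to the first alternative and \z only to the last.
ungrouped = '\d{1,2}|1\d{2}|2[0-4]\d|25[0-5]'
loose_octet = /\A#{ungrouped}\z/

p ip_regex.match?("192.168.0.1") # => true
p ip_regex.match?("256.168.0.1") # => false
p loose_octet.match?("12abc")    # => true, because \A\d{1,2} alone is enough
```

Note that `\d{1,2}` still lets a leading zero through, so `"01.2.3.4"` matches this pattern.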

Ry- · Accepted Answer · 2018-04-13T11:13:37.293

What does {,2}|1 mean for example?

You should be looking at the parts separated by | – \d{,2} is a pattern, 1\d{2} is a pattern, etc. Here’s what they mean:

\d{,2} – up to 2 digit characters, i.e. numbers from 0 to 99
1\d{2} – the digit 1 followed by 2 digits, i.e. numbers from 100 to 199
2[0-4]\d – 2, then a digit from 0 to 4, then a digit, i.e. numbers from 200 to 249
25[0-5] – 2, 5, and a digit from 0 to 5, i.e. numbers from 250 to 255

When you join them together with |, it’s the pattern matching any of those patterns, i.e. numbers from 0 to 255.
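To see the ranges listed above concretely, one can anchor each alternative on its own and test it against the decimal strings for 0 to 999. This is only a brute-force check; the choice of 0..999 as the test set is an assumption that covers every one-, two- and three-digit number without leading zeros.

```ruby
# Each alternative of the octet pattern, anchored so it must match the whole string.
branches = {
  '\d{,2}'   => /\A\d{,2}\z/,
  '1\d{2}'   => /\A1\d{2}\z/,
  '2[0-4]\d' => /\A2[0-4]\d\z/,
  '25[0-5]'  => /\A25[0-5]\z/,
}
# The full alternation, also anchored.
whole_octet = /\A(?:\d{,2}|1\d{2}|2[0-4]\d|25[0-5])\z/

# Decimal strings for 0..999, without leading zeros.
numbers = (0..999).map(&:to_s)

branches.each do |source, re|
  hits = numbers.select { |s| re.match?(s) }.map(&:to_i)
  puts "#{source}: #{hits.min}..#{hits.max}"
end
# \d{,2}: 0..99
# 1\d{2}: 100..199
# 2[0-4]\d: 200..249
# 25[0-5]: 250..255

all_hits = numbers.select { |s| whole_octet.match?(s) }.map(&:to_i)
puts "#{all_hits.min}..#{all_hits.max}, #{all_hits.size} values" # 0..255, 256 values
```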

The \d{,2} pattern is a bit wrong because it also matches the empty string and allows a leading zero, which is inconsistent with the other patterns.
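A short demonstration of both problems with the question's octet. The `\A` and `\z` anchors are added here (they are not in the question) so that the whole string has to match; the inputs are illustrative.

```ruby
octet = /\d{,2}|1\d{2}|2[0-4]\d|25[0-5]/

# Interpolating a Regexp wraps it in a (?-mix:...) group, so the anchors
# below apply to the whole alternation.
whole_octet = /\A#{octet}\z/
ip_regex = /\A#{octet}\.#{octet}\.#{octet}\.#{octet}\z/

p whole_octet.match?("")    # => true: \d{,2} may match zero digits
p whole_octet.match?("07")  # => true: a leading zero passes \d{,2}
p whole_octet.match?("007") # => false: no branch accepts three digits starting with 0
p ip_regex.match?("...")    # => true: four empty "octets"
```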

If you wanted to check whether an entire string matched the pattern, a correct version would probably be this:

octet = /\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5]/
ip_regex = /\A#{octet}\.#{octet}\.#{octet}\.#{octet}\z/
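A side-by-side run of the corrected pattern and the question's original, on a handful of made-up inputs. Note that the original has no end anchor, so it also accepts strings that merely start with something address-like, such as `"1.2.3.256"` (it matches the prefix `"1.2.3.25"`).

```ruby
# Corrected pattern from the answer.
octet = /\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5]/
ip_regex = /\A#{octet}\.#{octet}\.#{octet}\.#{octet}\z/

# The question's pattern, for comparison.
old_octet = /\d{,2}|1\d{2}|2[0-4]\d|25[0-5]/
old_ip_regex = /^#{old_octet}\.#{old_octet}\.#{old_octet}\.#{old_octet}/

samples = ["0.0.0.0", "192.168.1.1", "255.255.255.255",
           "...", "01.2.3.4", "1.2.3.256", "1.2.3.4.5"]

samples.each do |s|
  puts format("%-16s new=%-5s old=%s", s, ip_regex.match?(s), old_ip_regex.match?(s))
end
```

With the anchors in place the order of the alternatives no longer matters for correctness: if `\d` only consumes the first digit, the engine backtracks into the longer branches.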

@Ry Your answer is better than mine, as it mentions empty strings and leading zero. — sawa, Apr 13 '18 at 03:32

score 0 · Answer 2 · answered Apr 13 '18 at 03:29


One octet in an IP address (in dotted-octet notation) may not exceed 255.

So given /\d{,2}|1\d{2}|2[0-4]\d|25[0-5]/, break it apart like this: / \d{,2} | 1\d{2} | 2[0-4]\d | 25[0-5] /x

The first snip, \d{,2}, matches zero, one or two digits, so it covers 0 to 99 but also the empty string. The second snip, 1\d{2}, matches any number between 100 and 199. The third snip, 2[0-4]\d, matches any number between 200 and 249. The last snip, 25[0-5], matches any number between 250 and 255. Put them all together, and an octet may be any number between 0 and 255.
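The broken-apart form above can actually be written in Ruby with the `x` (free-spacing) flag, where whitespace is ignored and `#` starts a comment. In this sketch, `\A(?:...)\z` is an addition not present in the answer, so that a whole string must match; the test set is illustrative.

```ruby
# The octet alternation in free-spacing (x) mode, with each branch labelled.
octet_x = /
  \A (?:
      \d{,2}     # zero to two digits: 0-99, or nothing at all
    | 1\d{2}     # 100-199
    | 2[0-4]\d   # 200-249
    | 25[0-5]    # 250-255
  ) \z
/x

# The same pattern written on one line, for comparison.
octet = /\A(?:\d{,2}|1\d{2}|2[0-4]\d|25[0-5])\z/

candidates = (0..999).map(&:to_s) + ["", "07", "abc"]
same = candidates.all? { |s| octet_x.match?(s) == octet.match?(s) }
in_range = (0..300).map(&:to_s).count { |s| octet_x.match?(s) }

p same     # => true
p in_range # => 256 (every number from 0 to 255)
```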


Phlip


Last bit: 0 and 255 – Ry- Apr 13 '18 at 03:30
Yup. Now imagine how complex the regular expression would be if it followed the actual bitwise rules for valid IP address numbers... – Phlip Apr 13 '18 at 03:32

score 0 · Answer 3 · answered Apr 13 '18 at 10:00

Regexper (https://regexper.com) is a really cool tool to help you understand regular expressions. It draws a railroad diagram of the expression, which is more visual and easier to understand than the regular expression itself.

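For example, for octet you get this diagram: [Regexper diagram of octet, image not preserved]

However, the {,2} part is still not very clear. a{,2} means at most 2 repetitions, so it is equivalent to a{0,2} (between 0 and 2). Changing this in the regular expression makes Regexper's diagram a bit better: [Regexper diagram with {0,2}, image not preserved]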

And now I think it is easy to read.
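A tiny check, in Ruby itself, that `{,2}` and `{0,2}` behave the same. The letter `a` and the sample strings are arbitrary. Some other regex flavours do not accept the `{,n}` shorthand, so `{0,2}` is the more portable spelling.

```ruby
# {,2} is shorthand for {0,2} in Ruby's regex engine: both allow 0, 1 or 2 repetitions.
shorthand = /\Aa{,2}\z/
explicit  = /\Aa{0,2}\z/

samples = ["", "a", "aa", "aaa"]
p samples.map { |s| shorthand.match?(s) } # => [true, true, true, false]
p samples.map { |s| explicit.match?(s) }  # => [true, true, true, false]
```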

Another good tool for trying out your regular expressions is Rubular.
