I am trying to match IP addresses found in the output of traceroute by means of a regex. I'm not trying to validate them because it's safe enough to assume traceroute's output is valid (i.e. it is not outputting something like 999.999.999.999). I'm trying the following regex:

([0-9]{1,3}.?){4}

I'm testing it in regex101 and it does match an IP address. However, when I try

echo '192.168.1.1 foobar' | grep '([0-9]{1,3}.?){4}' 

I get nothing. What am I missing?

Wiktor Stribiżew
mrbolichi
  • And if you use `grep -E '([0-9]{1,3}\.?){4}'`? You used a POSIX ERE pattern, but did not pass `-E` option. Thus, POSIX BRE was used by `grep`. – Wiktor Stribiżew Jun 20 '17 at 13:44
  • @WiktorStribiżew this worked. Care to elaborate an answer so I can accept it? I also would like to see how would I solve my problem using BREs – mrbolichi Jun 20 '17 at 13:51
  • I did not notice the comment, but I actually was working on this update, to show how BRE syntax can be used here. – Wiktor Stribiżew Jun 20 '17 at 13:56
  • just my opinion but since you know you are looking for an IP, something simple like \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b would do the trick. Making the \.? optional means that 1234 would be a valid IP match. Or even w/o the \b would probably be fine. – sniperd Jun 20 '17 at 14:24
  • Note the [How do you extract IP addresses from files using a regex in a linux shell?](https://stackoverflow.com/questions/427979/how-do-you-extract-ip-addresses-from-files-using-a-regex-in-a-linux-shell) post has no valid POSIX BRE/ERE solution for IP address as whole word extraction using `grep`. – Wiktor Stribiżew May 24 '21 at 10:53

2 Answers

You wrote a POSIX ERE pattern but did not pass the -E option, so grep fell back to its default flavor, POSIX BRE. In BRE, the {n,m} quantifier and the (...) group must be escaped as \{n,m\} and \(...\) to be parsed as regex operators; unescaped, they match literal characters.

Also note that the . must be escaped as \. so that it matches only a literal dot rather than any character.

To make your pattern work with grep the way you wanted you could use:

grep -E '([0-9]{1,3}\.?){4}'      # POSIX ERE
grep '\([0-9]\{1,3\}\.\?\)\{4\}'  # BRE version of the same regex (\? is a GNU extension)
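
To see the flavor difference in isolation, here is a small check. It is only a sketch and assumes GNU grep, since the `\?` operator in the BRE form is a GNU extension; the variable names are illustrative.

```shell
#!/bin/bash
sample='192.168.1.1 foobar'

# grep exits non-zero when nothing matches, hence the "|| true".

# BRE (default): unescaped ( ) { } ? are literal characters here,
# so the pattern looks for literal parentheses and braces.
bre_plain=$(printf '%s\n' "$sample" | grep -c '([0-9]{1,3}.?){4}' || true)

# ERE: the same kind of pattern is parsed as a group with interval quantifiers.
ere=$(printf '%s\n' "$sample" | grep -cE '([0-9]{1,3}\.?){4}' || true)

# BRE with the operators escaped (\? assumes GNU grep).
bre_escaped=$(printf '%s\n' "$sample" | grep -c '\([0-9]\{1,3\}\.\?\)\{4\}' || true)

# Each value is the number of matching lines.
printf 'BRE unescaped: %s\nERE: %s\nBRE escaped: %s\n' \
  "$bre_plain" "$ere" "$bre_escaped"
```

The unescaped BRE count should be zero for this input, while the other two report one matching line.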

However, because the . is optional, this regex will also match a plain run of four or more digits, such as 1234.

You may solve it by unrolling the pattern as

grep -E '[0-9]{1,3}(\.[0-9]{1,3}){3}'      # POSIX ERE
grep '[0-9]\{1,3\}\(\.[0-9]\{1,3\}\)\{3\}' # POSIX BRE

Basically, it matches:

  • [0-9]{1,3} - 1 to 3 occurrences of any ASCII digit
  • (\.[0-9]{1,3}){3} - 3 occurrences of:
    • \. - a literal .
    • [0-9]{1,3} - 1 to 3 occurrences of any ASCII digit
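
The difference between the two patterns can be checked with a short loop. This is a sketch only; `loose`, `unrolled`, and the sample inputs are names and values chosen for illustration.

```shell
#!/bin/bash
loose='([0-9]{1,3}\.?){4}'              # dot is optional after every group
unrolled='[0-9]{1,3}(\.[0-9]{1,3}){3}'  # three dots are mandatory

for input in '1234' '192.168.1.1'; do
  for pattern in "$loose" "$unrolled"; do
    # -q suppresses output; only the exit status is used.
    if printf '%s\n' "$input" | grep -qE "$pattern"; then
      printf '%-12s matches    %s\n' "$input" "$pattern"
    else
      printf '%-12s no match   %s\n' "$input" "$pattern"
    fi
  done
done
```

Only the loose pattern accepts the bare digit string; both accept the dotted address.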

To make sure you only match valid IPs, you might want to use a more precise IP matching regex:

grep -E '\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}\b' # POSIX ERE

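The \b word boundaries keep a match from starting or ending in the middle of a longer run of digits. With GNU grep you can also use \< and \> for the same purpose; all three are GNU extensions rather than POSIX syntax.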

To extract only the IPs rather than print whole matching lines, add the -o option: grep -oE 'ERE_pattern' file or grep -o 'BRE_pattern' file.
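
As a rough illustration of -o, the snippet below runs the unrolled ERE over a made-up sample that only resembles traceroute output. The host names, addresses, and timings are invented for the example.

```shell
#!/bin/bash
# Made-up text resembling traceroute output (not real measurements).
sample_output=' 1  router.local (192.168.1.1)  1.123 ms  0.987 ms  1.045 ms
 2  gw.example.net (10.0.0.1)  8.412 ms  8.377 ms  8.901 ms
 3  * * *'

# -o prints each match on its own line instead of the whole input line.
# The timings (e.g. 1.123) contain only one dot, so they are not matched.
ips=$(printf '%s\n' "$sample_output" | grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}')

printf '%s\n' "$ips"
```

With real data, replace the `printf` feeding grep with the traceroute invocation itself.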

Wiktor Stribiżew
  • That regex [wont work](https://regex101.com/r/qFE95M/1). Use [this](https://regex101.com/r/qFE95M/2) instead. – Olian04 Jun 20 '17 at 13:50
  • @Olian04: I understand what you mean, I just translated OP regex into POSIX BRE/ERE. Edited. – Wiktor Stribiżew Jun 20 '17 at 13:51
  • @Olian04 Or a little shorter: `([0-9]{1,3}\.){3}[0-9]{1,3}`. Still, this doesn't catch broken IP addresses (e.g. 256.256.256.256). – Victor Zamanian Dec 14 '18 at 10:01
  • @VictorZamanian *You might want to use a [more precise IP matching regex](http://www.regular-expressions.info/ip.html).* The answer just points out how to fix the current OP approach. There are more ways to match IPs and all that depends on what the use case is. – Wiktor Stribiżew Dec 14 '18 at 10:02

For stricter validation, a function works better than a single regex match:

#!/bin/bash
is_valid_ip() {
  local arr element
  IFS=. read -r -a arr <<< "$1"                  # convert ip string to array
  [[ ${#arr[@]} != 4 ]] && return 1              # doesn't have four parts
  for element in "${arr[@]}"; do
    [[ $element =~ ^[0-9]+$ ]]       || return 1 # non-numeric characters found
    [[ $element =~ ^0[0-9]+$ ]]      && return 1 # a leading 0 followed by other digits is rejected, so the value is not interpreted as an octal number
    ((element < 0 || element > 255)) && return 1 # number out of range
  done
  return 0
}

You can invoke this as:

while read -r ip; do
  is_valid_ip "$ip" && printf '%s\n' "$ip" 
done < <(your command that extracts ip address like strings)
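
A self-contained sketch of the whole pipeline is shown below. It uses a compact variant of `is_valid_ip` that rejects octets with leading zeros, and a hypothetical `candidates` list stands in for the real command output; both are assumptions made for illustration.

```shell
#!/bin/bash
# Compact variant of the validator; leading zeros are treated as invalid.
is_valid_ip() {
  local arr element
  IFS=. read -r -a arr <<< "$1"                  # split on dots
  [[ ${#arr[@]} -eq 4 ]] || return 1             # must have four parts
  for element in "${arr[@]}"; do
    [[ $element =~ ^[0-9]{1,3}$ ]] || return 1   # one to three digits only
    [[ $element =~ ^0[0-9]+$ ]]    && return 1   # no leading zeros (octal ambiguity)
    ((element > 255))              && return 1   # out of range
  done
  return 0
}

# Hypothetical input; in practice this would be the traceroute output.
candidates='192.168.1.1
999.999.999.999
10.0.0.256
010.0.0.1
8.8.8.8'

valid=()
while read -r ip; do
  is_valid_ip "$ip" && valid+=("$ip")
done < <(printf '%s\n' "$candidates" | grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}')

printf '%s\n' "${valid[@]}"
```

Here grep does the loose extraction and the function does the range checking, which keeps the regex short and readable.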

codeforester