
I need to use Ruby to find a MAC address in some text. The MAC address appears in the text in the format XXXX.XXXX.XXXX, where each X is either an upper- or lower-case letter from A to F or a digit from 0 to 9.

I then need to convert that MAC address into this format: XX:XX:XX:XX:XX:XX.

The code below works when the MAC address in `some_text` is a081.572c.9a04, but doesn't work when it is 9c21.0adf.9a41.

# Matches
some_text = "some text a081.572c.9a04 some other text ..."
mac_addr = some_text.scan(/.\d+[A-Fa-f0-9].\d+[A-Fa-f0-9].\d+[A-Fa-f0-9].\d+/)[0]
puts mac_addr.to_s.gsub('.','').scan(/\w{2}/).join(':')

=> a0:81:57:2c:9a:04

# Does not match
some_text = "some text 9c21.0adf.9a41 some other text ..."
mac_addr = some_text.scan(/.\d+[A-Fa-f0-9].\d+[A-Fa-f0-9].\d+[A-Fa-f0-9].\d+/)[0]
puts mac_addr.to_s.gsub('.','').scan(/\w{2}/).join(':')

=> 

Why does this code match a081.572c.9a04, but not 9c21.0adf.9a41, when they both have the same character groups?

grizzthedj
  • Try with `/\b[A-Fa-f0-9]{4}(?:\.[A-Fa-f0-9]{4}){2}\b/` – Wiktor Stribiżew Aug 07 '20 at 13:01
  • Try this: `([A-Fa-f0-9]{2})([A-Fa-f0-9]{2}).([A-Fa-f0-9]{2})([A-Fa-f0-9]{2}).([A-Fa-f0-9]{2})([A-Fa-f0-9]{2})`. Please check [demo](https://regex101.com/r/AtTFMB/1) – marianc Aug 07 '20 at 13:07
  • `.` in your regex doesn't match a literal `.`; it matches any character. If you want to match a literal `.`, you must escape it: `\.`. – TeWu Aug 07 '20 at 13:08
  • @WiktorStribiżew: I have a working regex now, and your regex also works (they are almost the same), but my question is more about why the above regex matches one but not the other – grizzthedj Aug 07 '20 at 13:09
  • Then the explanation is at https://regex101.com/r/E2z4R3/1 – Wiktor Stribiżew Aug 07 '20 at 13:12
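To make TeWu's point concrete, here is a minimal Ruby sketch comparing an unescaped `.` with an escaped `\.`. The sample strings and the variable names `wildcard` and `literal` are made up for illustration.

```ruby
# Unescaped `.` is a wildcard: it matches any single character (except newline).
wildcard = /\h{4}.\h{4}/
# Escaped `\.` matches only a literal period.
literal  = /\h{4}\.\h{4}/

p "a081.572c" =~ wildcard # => 0   (match at index 0)
p "a081x572c" =~ wildcard # => 0   (`.` happily accepted the "x")
p "a081x572c" =~ literal  # => nil (`\.` requires a real period)
```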

1 Answer


You are looking for:

  • any character
  • followed by a sequence of decimal digits (at least one)
  • followed by a single hexadecimal digit
  • followed by any character
  • followed by a sequence of decimal digits (at least one)
  • followed by a single hexadecimal digit
  • followed by any character
  • followed by a sequence of decimal digits (at least one)
  • followed by a single hexadecimal digit
  • followed by any character
  • followed by a sequence of decimal digits (at least one)
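The same breakdown can be written directly into the regex using Ruby's `/x` (extended) flag, which ignores whitespace and allows `#` comments. This is a sketch of the question's pattern in that form; the pattern itself is unchanged, only laid out differently, so it should behave exactly like the one-line version.

```ruby
# The question's regex, one piece per line. In /x mode, whitespace and
# comments starting with # are ignored by the regex engine.
original = /
  .              # any character
  \d+            # one or more decimal digits
  [A-Fa-f0-9]    # exactly one hexadecimal digit
  .              # any character
  \d+            # one or more decimal digits
  [A-Fa-f0-9]    # exactly one hexadecimal digit
  .              # any character
  \d+            # one or more decimal digits
  [A-Fa-f0-9]    # exactly one hexadecimal digit
  .              # any character
  \d+            # one or more decimal digits
/x

p "some text a081.572c.9a04 some other text ..."[original] # => "a081.572c.9a04"
p "some text 9c21.0adf.9a41 some other text ..."[original] # => nil
```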

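Your second example already fails at step #2 when the match is attempted at the 9 (the regex engine also retries at every other position in the string, but none of them gets through all the steps either):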

  • any character: check, we have 9
  • followed by a sequence of decimal digits (at least one): FAIL!!! After the 9, we have c, which is not a decimal digit, but you specified that there has to be at least one decimal digit after the first character
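The walk-through above follows a single starting position, but the regex engine tries every start index in the string before giving up. The sketch below (the helper `matching_starts` is hypothetical, written only for this illustration) forces the question's pattern to start at each index in turn and reports which ones succeed.

```ruby
# Same pattern as in the question.
original = /.\d+[A-Fa-f0-9].\d+[A-Fa-f0-9].\d+[A-Fa-f0-9].\d+/

# Hypothetical helper: returns every index i at which the pattern matches
# when it is forced to begin exactly at i. \A anchors the match to the
# start of the slice text[i..-1].
def matching_starts(text, pattern)
  anchored = /\A(?:#{pattern.source})/
  (0...text.length).select { |i| text[i..-1] =~ anchored }
end

p matching_starts("some text a081.572c.9a04 some other text ...", original) # => [10, 11]
p matching_starts("some text 9c21.0adf.9a41 some other text ...", original) # => []
```

For the first string, two start positions work (the a and the 0 right after it), and the leftmost one wins; for the second, no position gets through every step, so `scan` returns an empty array.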

So, the reason why the second example is not considered to be a MAC address is simply that you have specified that a MAC address starts with any arbitrary character (?, !, a space, it doesn't matter) followed by a sequence of decimal digits (0-9).

A more sensible way to characterize a MAC address written in the format you describe would be something like this:

/\b\h+(?:\.\h+){2}\b/

That is: a sequence of hexadecimal digits at the beginning, followed by a period and another sequence of hexadecimal digits, with that period-and-digits part repeated twice. The whole thing is enclosed in word boundaries.
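Here is a minimal sketch of using that pattern together with the question's own conversion step (remove the periods, then join pairs of characters with colons). The helper name `colon_mac` is hypothetical.

```ruby
# The pattern suggested above: runs of hex digits separated by literal
# periods, bounded by word boundaries. \h is Ruby's shorthand for [0-9a-fA-F].
MAC_DOTTED = /\b\h+(?:\.\h+){2}\b/

# Hypothetical helper: find the first dotted MAC in text and reformat it as
# colon-separated pairs. Returns nil when nothing matches.
def colon_mac(text)
  mac = text[MAC_DOTTED]
  return nil if mac.nil?

  mac.delete('.').scan(/\h{2}/).join(':')
end

p colon_mac("some text a081.572c.9a04 some other text ...") # => "a0:81:57:2c:9a:04"
p colon_mac("some text 9c21.0adf.9a41 some other text ...") # => "9c:21:0a:df:9a:41"
```

Because `\h+` accepts groups of any length, the pair-splitting step only gives a sensible result when each group has exactly four digits; a compressed form like 0.0.0 would come out as just `00`.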

You said that the MAC address is always in groups of 4, but are you sure about that? What about leading zeroes? What about all zeroes? Would they be written 0.0.0 or 0000.0000.0000?
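If leading zeroes can indeed be dropped, one option is to accept one to four digits per group and left-pad each group back to four digits before splitting into pairs. This is a sketch under that assumption; the constant names `STRICT` and `LENIENT` and the helper `normalize` are hypothetical, and `STRICT` is just the four-digit pattern from the comments written with `\h`.

```ruby
# Strict variant: exactly four hex digits per group (XXXX.XXXX.XXXX).
STRICT  = /\b\h{4}(?:\.\h{4}){2}\b/
# Lenient variant: one to four hex digits per group, to tolerate dropped
# leading zeroes such as "0.0.0" (an assumption about the input).
LENIENT = /\b\h{1,4}(?:\.\h{1,4}){2}\b/

# Hypothetical helper: left-pad each group to four digits, then pair them up.
def normalize(dotted)
  dotted.split('.').map { |g| g.rjust(4, '0') }.join.scan(/\h{2}/).join(':')
end

p "mac 0.0.0 here"[STRICT]             # => nil
p normalize("mac 0.0.0 here"[LENIENT]) # => "00:00:00:00:00:00"
p normalize("a081.572c.9a04")          # => "a0:81:57:2c:9a:04"
```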

The "really correct" way would be to parse your MAC address as a three-digit number in base 65536 and then pretty-print it as a 6-digit number in base 256.
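A sketch of that idea in Ruby, assuming exactly three dot-separated groups: each group is read as one base-65536 digit, the three digits are combined into a single 48-bit integer, and that integer is then printed as six base-256 digits in two-character lowercase hex. The helper name `mac_via_bases` and the lowercase output are assumptions.

```ruby
# Hypothetical helper: parse "XXXX.XXXX.XXXX" as a three-digit base-65536
# number and print it as a six-digit base-256 number (XX:XX:XX:XX:XX:XX).
def mac_via_bases(dotted)
  # Each group is one base-65536 digit, written in hex.
  groups = dotted.split('.').map { |g| Integer(g, 16) }
  raise ArgumentError, "expected 3 groups" unless groups.size == 3
  raise ArgumentError, "group out of range" unless groups.all? { |g| g.between?(0, 65_535) }

  # Combine the three base-65536 digits into one 48-bit integer.
  n = groups.reduce(0) { |acc, g| acc * 65_536 + g }

  # Extract six base-256 digits, most significant first, and print each
  # as two lowercase hex characters.
  bytes = (0...6).map { |i| (n >> (8 * (5 - i))) & 0xff }
  bytes.map { |b| format('%02x', b) }.join(':')
end

p mac_via_bases("9c21.0adf.9a41") # => "9c:21:0a:df:9a:41"
p mac_via_bases("0.0.0")          # => "00:00:00:00:00:00"
```

Unlike splitting the digit string into pairs, this handles 0.0.0 and mixed-case input without special cases, and it rejects groups that do not fit in 16 bits.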

Jörg W Mittag