
I'm trying to use a regex pattern (in Java) to find a run of exactly 3 digits in a row. A run of 4 digits should not match, and neither should a run of 2.

The obvious pattern to me was:

"\b(\d{3})\b"

That matches against many source string cases, such as:

">123<"
" 123-"
"123"

But it won't match against a source string of "abc123def", because the c/1 and 3/d transitions are not word boundaries: letters and digits are both word characters, so \b has nothing to match between them.
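To make the failure concrete, here is a minimal Java sketch of the behavior described above. The class name `WordBoundaryDemo` and the helper `firstMatch` are illustrative names only, not part of any existing code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // The pattern from the question; backslashes are doubled for the Java string literal.
    static final Pattern BOUNDED = Pattern.compile("\\b(\\d{3})\\b");

    // Returns the first captured 3-digit run, or null when the pattern finds nothing.
    static String firstMatch(String input) {
        Matcher m = BOUNDED.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The first three inputs yield "123"; the last one yields null,
        // because there is no word boundary between a letter and a digit.
        for (String s : new String[] {">123<", " 123-", "123", "abc123def"}) {
            System.out.println("\"" + s + "\" -> " + firstMatch(s));
        }
    }
}
```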

I would have expected the solution to be adding a character class that includes both non-Digit (\D) and the word boundary (\b). But that appears to be illegal syntax.

"[\b\D](\d{3})[\b\D]"

Does anybody know what I could use as an expression that would extract "123" for a source string situation like:

"abc123def"

I'd appreciate any help. And yes, I realize that in Java one must double-escape codes like \b to \\b, but that's not my issue and I didn't want to limit this to Java folks.

Michael Oryl
    For more information, check out [`\b`:word boundaries](http://stackoverflow.com/a/6664167) (listed under "Anchors") and the whole section on "Lookarounds" in the [Stack Overflow Regular Expressions FAQ](http://stackoverflow.com/a/22944075/2736496). – aliteralmind Apr 10 '14 at 16:50

2 Answers


You should use lookarounds for those cases:

(?<!\d)(\d{3})(?!\d)

This means: match 3 digits that are neither preceded nor followed by another digit.
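As a rough illustration in Java, the lookaround pattern could be used as sketched below. The class name `LookaroundDemo` and the helper `findAll` are made up for this example; only `Pattern` and `Matcher` come from the standard library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // (?<!\d) asserts no digit immediately before; (?!\d) asserts no digit immediately after.
    // Neither assertion consumes any characters.
    static final Pattern THREE_DIGITS = Pattern.compile("(?<!\\d)(\\d{3})(?!\\d)");

    // Collects every run of exactly three digits found in the input.
    static List<String> findAll(String input) {
        List<String> runs = new ArrayList<>();
        Matcher m = THREE_DIGITS.matcher(input);
        while (m.find()) {
            runs.add(m.group(1));
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(findAll("abc123def")); // [123]
        System.out.println(findAll("1234"));      // []
        System.out.println(findAll("12"));        // []
        System.out.println(findAll("123a456"));   // [123, 456]
    }
}
```

Because the lookarounds only inspect the neighboring characters without consuming them, runs separated by a single non-digit (as in "123a456") are both found.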

Working Demo

anubhava

Lookarounds can solve this problem, but I personally try to avoid them because not all regex engines fully support them. Additionally, I wouldn't say this issue is complicated enough to merit the use of lookarounds in the first place.

You could match this: (?:\b|\D)(\d{3})(?:\b|\D)

Then return: \1

Or if you're performing a replacement and need to match the entire string: (?:\b|\D)+(\d{3})(?:\b|\D)+

Then replace with: \1
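Since the question mentions Java, here is a small sketch of both patterns with `java.util.regex`. The class name `AlternationDemo` and the helpers `extract` and `reduce` are hypothetical names for illustration. Note that Java's replacement strings refer to the first group as `$1` rather than `\1`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // Extraction pattern: a word boundary or a non-digit on each side of the 3 digits.
    static final Pattern EXTRACT = Pattern.compile("(?:\\b|\\D)(\\d{3})(?:\\b|\\D)");

    // Replacement pattern: the '+' lets the surrounding non-digits be swallowed too.
    static final Pattern WHOLE = Pattern.compile("(?:\\b|\\D)+(\\d{3})(?:\\b|\\D)+");

    // Returns the captured 3-digit run, or null if there is none.
    static String extract(String input) {
        Matcher m = EXTRACT.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // Replaces each match (digits plus surrounding non-digits) with just the digits.
    static String reduce(String input) {
        return WHOLE.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(extract("abc123def"));  // 123
        System.out.println(extract("abc1234def")); // null
        System.out.println(reduce("abc123def"));   // 123
    }
}
```

One trade-off of this approach: the `\D` alternative consumes the neighboring character, so in an input like "123a456" only the first run is found when searching repeatedly, because the "a" has already been used up by the first match.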

As a side note, the reason \b wasn't working as part of a character class is that within brackets it has a completely different meaning: in most regex flavors, [\b] refers to a backspace character, not a word boundary.

Here's a Working Demo.

CAustin
  • That's a good answer. The odd thing is, I tried that and came up empty handed in my unit tests. I must have munged it up somehow. I'll give it another shot. Thanks! – Michael Oryl Apr 10 '14 at 17:11
  • Sorry, had to make a slight edit. The second pattern needs to use `+` instead of `*`, or else it will match the first three digits of a string of four or more digits. – CAustin Apr 10 '14 at 17:21
  • Yeah - I came across that problem since I was actually doing a replacement in my app. Thanks for the update. – Michael Oryl Apr 10 '14 at 17:23