I'm currently trying to create a regex that matches 3 digits under certain conditions. I've made various attempts, but none of them works as a single expression - each one either produces false positives or matches the wrong numbers...

In words: I want to match ANY 3 digits that are

  • Appearing at the start of a string
  • Appearing somewhere inside the string
  • (End of the string is NOT possible)

IF:

  • There is no other 3-digit group matching these conditions (otherwise the result is ambiguous)
  • The group is not followed by "p" or "i"
  • The group is not preceded by "x"

In examples (the number in () is what I want to match):

  • This is (321) an example.
  • (321) also
  • including (321) //basically not possible, but can't hurt.
  • this (321) has another group with a p: 122p
  • this (321) has another group with a I: 123i
  • this x235 should be ignored cause (123) is what i want to match.
  • (123) is what i want, not x111 or 125p or 999i
  • in this 111 case there is no solution 555

(I need the capture split as (1 digit)(2 digits) - but that is only a small modification of a 3-digit match.)
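For comparison, the rules and examples above can be written down as a small baseline checker. This is only a sketch under two assumptions that are not part of the question: the engine supports lookbehind and lookahead (Python's `re` does), and uniqueness is checked by counting matches rather than inside the pattern. The names `CANDIDATE`, `find_unique_group` and `EXAMPLES` are made up for illustration.

```python
import re

# Assumption: lookarounds are available (true for Python's re module).
# (?<!x)      the group is not preceded by "x"
# (\d)(\d{2}) one digit, then two digits, captured separately
# (?![pi])    the group is not followed by "p" or "i"
CANDIDATE = re.compile(r"(?<!x)(\d)(\d{2})(?![pi])")


def find_unique_group(text):
    """Return (first_digit, last_two_digits) if exactly one group qualifies, else None."""
    hits = CANDIDATE.findall(text)
    return hits[0] if len(hits) == 1 else None


# The examples from the question, with the expected capture for each.
EXAMPLES = {
    "This is 321 an example.": ("3", "21"),
    "321 also": ("3", "21"),
    "including 321": ("3", "21"),
    "this 321 has another group with a p: 122p": ("3", "21"),
    "this 321 has another group with a I: 123i": ("3", "21"),
    "this x235 should be ignored cause 123 is what i want to match.": ("1", "23"),
    "123 is what i want, not x111 or 125p or 999i": ("1", "23"),
    "in this 111 case there is no solution 555": None,
}

for text, expected in EXAMPLES.items():
    assert find_unique_group(text) == expected, text
```

Because the lookarounds consume nothing, neighbouring groups such as "101 202" are both counted and the string is rejected. The sketch does not address the portability requirement; it is only a reference to test other candidate patterns against.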

My last attempt looked like this:

(?:[^x]|^)(\d{1})(\d{2})[^pi]

However, it fails on the last case. I tried to cover this by checking preg_match_all(...) === 1, to make sure that exactly one result is matched.

However, a test string like "101 202" now passes: the first match is 101 (including the trailing whitespace), and 202 is then not matched, so the check assumes that 101 is the only valid solution - which is wrong.
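The false positive can be reproduced with a few lines of Python. This is a sketch that uses `re.findall` as a stand-in for `preg_match_all`, on the assumption that both scan for non-overlapping matches from left to right; `ATTEMPT` and `count_matches` are illustrative names.

```python
import re

# The pattern from the attempt above, unchanged.
ATTEMPT = re.compile(r"(?:[^x]|^)(\d{1})(\d{2})[^pi]")


def count_matches(text):
    """Count non-overlapping matches, similar in spirit to preg_match_all."""
    return len(ATTEMPT.findall(text))


# The trailing [^pi] consumes the space after 101, and 202 has no
# character after it, so only one match is found.
print(repr(ATTEMPT.search("101 202").group(0)))  # prints '101 '
print(count_matches("101 202"))                  # prints 1

# Same effect for the last example: 555 sits at the end of the string,
# so [^pi] has nothing to match and the count is 1 as well.
print(count_matches("in this 111 case there is no solution 555"))  # prints 1
```

In both strings the count is 1 even though two groups are present, because the pattern requires and consumes a character after the digits.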

Any idea?

Note: It should work across different regex engines, no matter whether PHP, JavaScript, Java, .NET or Ook! :)

dognose
  • I think you're complicating things and you just want to use some lookarounds. Try [**this pattern**](http://regex101.com/r/aB1vD2) `~(? – HamZa Apr 25 '14 at 22:52
  • @HamZa thought about look arounds, also. My Approach was `(? – dognose Apr 25 '14 at 22:58
  • You get an error because JavaScript doesn't support lookbehinds. Every modern regex engine (java, .net, python, pcre and a lot more) supports lookaround. It's a powerful tool. So why don't you use it if you're using php? – HamZa Apr 25 '14 at 23:00
  • @HamZa The problem is that I want to provide "any" client with a pattern for validation. I don't know whether the client uses JavaScript, PHP, .NET or whatever. Stating "JavaScript is unsupported" would be the last resort :) – dognose Apr 25 '14 at 23:04
  • Well there's your problem. There are some hacky ways to "emulate" lookbehinds but it all depends on the language. In JavaScript you might use a callback. So stating "I want a universal regex" in this case is insane, since the syntax of regexes isn't universal, let alone their capabilities. Just look at [this for example](http://www.cowburn.info/2010/04/30/glob-patterns/) – HamZa Apr 25 '14 at 23:11
  • @HamZa that's why I'm searching for a "basic pattern" every regex engine can understand... I do not *expect* that it's possible - but one never knows everything - that's why I thought: "let's give SO a chance :P" – dognose Apr 25 '14 at 23:28
  • @HamZa My tagging for `php` was misleading (I'm just using PHP as a test system) - sorry for that. – dognose Apr 25 '14 at 23:34

2 Answers

I'm not sure whether this is what you want, but give it a try:

JAVASCRIPT

var myregexp = /(?:\b[\s]?|[^x])([\d]{1}[\d]{2})(?:[^pi]|[\s]?\b)/m;

http://regex101.com/r/jY6mG9

PHP

preg_match_all('/(?:\b[\s]?|[^x])([\d]{1}[\d]{2})(?:[^pi]|[\s]?\b)/m', $code, $result, PREG_PATTERN_ORDER);

http://regex101.com/r/oW1tJ7

JAVA

Pattern regex = Pattern.compile("(?:\\b[\\s]?|[^x])([\\d]{1}[\\d]{2})(?:[^pi]|[\\s]?\\b)", Pattern.MULTILINE);

RUBY

regexp = /(?:\b[\s]?|[^x])([\d]{1}[\d]{2})(?:[^pi]|[\s]?\b)/

http://rubular.com/r/OHgMLS2gGs

PYTHON

reobj = re.compile(r"(?:\b[\s]?|[^x])([\d]{1}[\d]{2})(?:[^pi]|[\s]?\b)", re.MULTILINE)

https://pythex.org

C (PCRE)

myregexp = pcre_compile("(?:\\b[\\s]?|[^x])([\\d]{1}[\\d]{2})(?:[^pi]|[\\s]?\\b)", PCRE_MULTILINE, &error, &erroroffset, NULL);
Pedro Lobito
  • What, no `PCRE`? I kid... Nice answer, I never thought to provide variants for almost every possible language. I am going to start doing this for `regex` questions. – MattSizzle Apr 26 '14 at 03:10
  • @MattGreen: Yeah, [Tuga tends to do that](http://stackoverflow.com/a/23224898/2736496) :) – aliteralmind Apr 26 '14 at 03:21
  • @MattGreen PCRE also :) – Pedro Lobito Apr 26 '14 at 03:35
  • Although I'm not sure whether this is what the OP wants. – Pedro Lobito Apr 26 '14 at 03:47
  • @Tuga: +1 for the work you did there :) But it's not exactly what I need: I'm creating an API where a user can post data programmatically. To keep it easy, I want to provide the user with a regex along with each field definition, so the data can be verified before posting. When the client retrieves this information, I do not know whether it's a PHP or JavaScript client. Providing ALL regexes ALL the time does not seem suitable. – dognose Apr 26 '14 at 09:30
  • @Tuga: I just saw that it's indeed the same pattern every time (it just looked different because of the different escaping). Only the false positive in the "double number case" is not resolved with this :) – dognose Apr 26 '14 at 09:43
  • @dognose Happy to help you :) – Pedro Lobito Apr 26 '14 at 12:09

We can write the numbers you are looking for like this:

re_n = (?:[^x]|^)\d\d\d(?:[^ip]|$)

Then the whole expression is:

^(?!.*re_n.*re_n.*$).*(re_n)

The negative lookahead right after the line-start anchor rejects any line that contains two valid numbers; the rest of the expression then matches the single valid number.

The interpolated expression looks ugly:

/^(?!.*(?:(?:[^x]|^)\d\d\d(?:[^ip]|$)).*(?:(?:[^x]|^)\d\d\d(?:[^ip]|$)).*$).*((?:(?:[^x]|^)\d\d\d(?:[^ip]|$)))/

This Perl code:

my $re_n = qr/(?:[^x]|^)\d\d\d(?:[^ip]|$)/;
while (<DATA>) { chomp;
    if (/^(?!.*$re_n.*$re_n.*$).*($re_n)/) {
        print "$_: $1\n";
    } else {
        print "$_: NONE\n";
    }   
}

__DATA__
This is 321 an example.
321 also
including 321 //basically not possible, but can't hurt.
this 321 has another group with a p: 122p
this 321 has another group with a I: 123i
this x235 should be ignored cause 123 is what i want to match.
123 is what i want, not x111 or 125p or 999i
in this 111 case there is no solution 555

Produces:

This is 321 an example.:  321 
321 also: 321 
including 321 //basically not possible, but can't hurt.:  321 
this 321 has another group with a p: 122p:  321 
this 321 has another group with a I: 123i:  321 
this x235 should be ignored cause 123 is what i want to match.:  123 
123 is what i want, not x111 or 125p or 999i: 123 
in this 111 case there is no solution 555: NONE
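
The same composition can be sketched in Python as a rough cross-check in a second engine. The names `RE_N`, `FULL` and `match_group` are illustrative, and the f-string interpolation stands in for Perl's `qr//` interpolation; this is a sketch under those assumptions, not a drop-in replacement.

```python
import re

# Building block: a 3-digit group not preceded by "x" and not followed by "i" or "p".
RE_N = r"(?:[^x]|^)\d\d\d(?:[^ip]|$)"

# Reject lines that contain two such groups, then capture the remaining one.
FULL = re.compile(rf"^(?!.*{RE_N}.*{RE_N}.*$).*({RE_N})")


def match_group(text):
    """Return the captured group (including its neighbouring characters), or None."""
    m = FULL.search(text)
    return m.group(1) if m else None


print(repr(match_group("This is 321 an example.")))  # prints ' 321 '
print(repr(match_group("321 also")))                  # prints '321 '
print(repr(match_group("in this 111 case there is no solution 555")))  # prints None
print(repr(match_group("101 202")))                   # prints ' 202'
```

As in the Perl output, the capture includes the characters next to the digits, so it may need trimming. In this sketch "101 202" still matches: the first `re_n` consumes the separating space, so the lookahead cannot find a second, non-overlapping `re_n`. That suggests groups separated by a single character may need separate handling.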
perreal
  • It's ugly, you are right - but it seems to work well across different implementations of regex. And nice idea about the whole expression. – dognose Apr 26 '14 at 09:37