I came across the following condition written in Java code:

    String pattern = "(?i:U[A-Z0-9]C.*)";
    if (foo.matches(pattern)) {
    ...

I don't understand what the ?i: means. I've seen (?i) used to indicate case-insensitivity, but I'm not sure about the form used here.

Thanks for any help!

The Gilbert Arenas Dagger

2 Answers

In the javadoc of Pattern, it is defined as:

(?idmsuxU-idmsuxU) - Nothing, but turns match flags i d m s u x U on - off

(?idmsux-idmsux:X) - X, as a non-capturing group with the given flags i d m s u x on - off

Whereas (?i) turns the flag CASE_INSENSITIVE on for the remainder of the regex pattern, (?i:X) only turns the flag on for X.

E.g. these are the same¹:

Foo(?i)Bar(?-i)Baz
Foo(?i:Bar)Baz

Also note the following comment in the javadoc:

In Perl, embedded flags at the top level of an expression affect the whole expression. In this class, embedded flags always take effect at the point at which they appear, whether they are at the top level or within a group; in the latter case, flags are restored at the end of the group just as in Perl.
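
A minimal sketch of that behaviour in Java; the class name, method names, and sample strings are made up for illustration. In the first pattern the flag only affects what follows it, and in the second the flag set inside the group stops applying once the group closes.

```java
class EmbeddedFlagScope {
    // (?i) at the top level affects only what comes after it,
    // so "Foo" stays case-sensitive while "bar" does not.
    static boolean afterFlag(String s) {
        return s.matches("Foo(?i)bar");
    }

    // (?i) inside a group is restored when the group ends,
    // so the trailing "c" stays case-sensitive.
    static boolean insideGroup(String s) {
        return s.matches("(a(?i)b)c");
    }

    public static void main(String[] args) {
        System.out.println(afterFlag("FooBAR"));   // true
        System.out.println(afterFlag("fooBAR"));   // false
        System.out.println(insideGroup("aBc"));    // true
        System.out.println(insideGroup("aBC"));    // false
    }
}
```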

1) This doesn't mean that (?i)X(?-i) and (?i:X) are always the same; see comments.
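
One case where the two forms diverge, as a hedged sketch (this example is an illustration of my own, not taken from the comments referred to above): when X is b|c, the group in (?i:X) confines the alternation, while inline flags do not group anything, so the | splits the whole expression.

```java
class FlagFormsDiffer {
    // Top-level alternation: the two branches are "a(?i)b" and "c(?-i)d".
    static boolean inlineFlags(String s) {
        return s.matches("a(?i)b|c(?-i)d");
    }

    // The group confines the alternation: "a", then b or c in any case, then "d".
    static boolean groupedFlag(String s) {
        return s.matches("a(?i:b|c)d");
    }

    public static void main(String[] args) {
        System.out.println(inlineFlags("aBd"));  // false
        System.out.println(groupedFlag("aBd"));  // true
        System.out.println(inlineFlags("aB"));   // true
        System.out.println(groupedFlag("aB"));   // false
    }
}
```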


UPDATE - Proof:

System.out.println("Foo(?i)Bar(?-i)Baz  Foo(?i:Bar)Baz");
for (String s : new String[] {"FooBarBaz","FoobarBaz","FooBARBaz","FoobARBaz","FOOBarBaz","FooBarBAZ"})
    System.out.printf("      %-18s%-12s%s%n", s.matches("Foo(?i)Bar(?-i)Baz"), s.matches("Foo(?i:Bar)Baz"), s);

Output

Foo(?i)Bar(?-i)Baz  Foo(?i:Bar)Baz
      true              true        FooBarBaz
      true              true        FoobarBaz
      true              true        FooBARBaz
      true              true        FoobARBaz
      false             false       FOOBarBaz
      false             false       FooBarBAZ
Andreas

With (?i:U[A-Z0-9]C.*), a string matches only if all of the following hold:

  • Matching is case-insensitive, because of ?i
  • U or u must be the first character
  • The second character must be A-Z, a-z, or 0-9
  • C or c must be the third character
  • Any characters (or none) may follow, because of .*

Testing shows that the following strings all pass:

  • UaC
  • uac
  • UAC
  • uAc

And the following strings fail:

  • baC
  • uAB
  • Uaac
  • UAaC
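
These results can be reproduced with a short Java check. This is only a sketch: the class and method names are hypothetical, and the last sample string is an extra one added here to show that the trailing .* lets further characters follow the C.

```java
class PatternCheck {
    static final String PATTERN = "(?i:U[A-Z0-9]C.*)";

    // String.matches requires the whole string to match the pattern.
    static boolean passes(String s) {
        return s.matches(PATTERN);
    }

    public static void main(String[] args) {
        String[] samples = {"UaC", "uac", "UAC", "uAc", "baC", "uAB", "Uaac", "UAaC", "uacXYZ"};
        for (String s : samples) {
            // Prints each sample followed by true (pass) or false (fail).
            System.out.printf("%-8s%s%n", s, passes(s));
        }
    }
}
```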

Here is a helpful site to break down the rules of your regex pattern and here is a helpful site to check whether a string should pass or fail

Manaar