I have taken the following bit from the Oracle tutorial on Java regex:

Intersections

To create a single character class matching only the characters common to all of its nested classes, use &&, as in [0-9&&[345]]. This particular intersection creates a single character class matching only the numbers common to both character classes: 3, 4, and 5.

Enter your regex: [0-9&&[345]]
Enter input string to search: 3
I found the text "3" starting at index 0 and ending at index 1.

Why would this be useful? If one wants to match only 3, 4, and 5, why not simply use [345] instead of the intersection?

Thanks in advance.

Rollerball
  • If you have two groups of numbers and you want to see if a given number is within both ranges, it would be useful. Why wouldn't it be useful? – Darwind Apr 10 '13 at 15:33
  • In this trivial case it's not useful. They're just giving a simple example of how intersection works. If you were _dynamically generating_ the regex then this might be useful. Otherwise, I find examples of the form `[0-9&&[^45]]` a more typical use case. – DaoWen Apr 10 '13 at 15:36

1 Answer

Let us consider a simple problem: match English consonants in a string. Listing out all consonants (or a list of ranges) would be one way:

[B-DF-HJ-NP-TV-Zb-df-hj-np-tv-z]

Another way is to use look-around:

(?=[A-Za-z])[^AEIOUaeiou]
(?![AEIOUaeiou])[A-Za-z]

Not sure if there is any other way to do this without the use of character class intersection.

Character class intersection solution (Java):

[A-Za-z&&[^AEIOUaeiou]]
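
As a minimal sketch of how the Java-compatible patterns above compare, the following runs the explicit list, one of the look-ahead forms, and the intersection against the same string. The class name `ConsonantDemo`, the helper `findAll`, and the sample input are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConsonantDemo {
    // Hypothetical helper: collects every match of the regex in the input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "Hello, World 42!"; // sample input, chosen arbitrarily

        // Explicit list of consonant ranges.
        System.out.println(findAll("[B-DF-HJ-NP-TV-Zb-df-hj-np-tv-z]", input));
        // Negative look-ahead followed by a letter class.
        System.out.println(findAll("(?![AEIOUaeiou])[A-Za-z]", input));
        // Character class intersection.
        System.out.println(findAll("[A-Za-z&&[^AEIOUaeiou]]", input));
        // Each line prints [H, l, l, W, r, l, d]
    }
}
```

The three patterns select the same characters here; the intersection form states the intent (letters, minus vowels) in a single class.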

For .NET, there is no intersection, but there is character class subtraction:

[A-Za-z-[AEIOUaeiou]]

I don't know the implementation details, but I wouldn't be surprised if character class intersection/subtraction is faster than look-around, which is the cleanest alternative when character class operations are not available.

Another possible use is when you have a pre-built character class and want to remove some characters from it. One case I have come across where class intersection might apply is matching all whitespace characters except newline.
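
A rough sketch of the whitespace case in Java, assuming `\s` has its default meaning (space, tab, newline, vertical tab, form feed, carriage return). The class `WhitespaceDemo` and the method `maskNonNewlineWhitespace` are hypothetical names used only for this example.

```java
import java.util.regex.Pattern;

public class WhitespaceDemo {
    // \s intersected with "anything but \n": whitespace except newline.
    private static final Pattern NON_NEWLINE_WS = Pattern.compile("[\\s&&[^\\n]]");

    // Hypothetical helper: replaces each matched character with '_'
    // so the effect of the class is visible.
    static String maskNonNewlineWhitespace(String input) {
        return NON_NEWLINE_WS.matcher(input).replaceAll("_");
    }

    public static void main(String[] args) {
        // The space and the tab are replaced; the newline is left alone.
        System.out.println(maskNonNewlineWhitespace("a b\tc\nd"));
        // prints "a_b_c" on one line and "d" on the next
    }
}
```

Note that a carriage return is still matched by this class; it would have to be excluded as well (`[^\n\r]`) if that is not wanted.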

Another possible use case as @beerbajay has commented:

I think the built-in character classes are the main use case, e.g. [\p{InGreek}&&\p{Ll}] for lowercase Greek letters.
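
The pattern from the comment can be tried out with a sketch like the one below. The class `GreekLowercaseDemo`, the method `findLowercaseGreek`, and the sample string are assumptions for illustration; the Greek characters are written as Unicode escapes so the example does not depend on the source file encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreekLowercaseDemo {
    // Characters in the Greek Unicode block that are also lowercase letters.
    private static final Pattern LOWER_GREEK =
            Pattern.compile("[\\p{InGreek}&&\\p{Ll}]");

    // Hypothetical helper: returns every lowercase Greek letter found.
    static List<String> findLowercaseGreek(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = LOWER_GREEK.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // alpha, beta, capital gamma, delta, then Latin "abc"
        String input = "\u03B1\u03B2\u0393\u03B4 abc";
        // Capital gamma fails \p{Ll}; the Latin letters fail \p{InGreek}.
        System.out.println(findLowercaseGreek(input).size()); // prints 3
    }
}
```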

nhahtdh
  • I think the built-in character classes are the main use case, e.g. `[\p{InGreek}&&\p{Ll}]` for lowercase greek letters. – beerbajay Apr 11 '13 at 01:48
  • @beerbajay: You are probably right. I have yet to run into that use case myself (well, whether we run into a given use case depends on what we are doing). – nhahtdh Apr 11 '13 at 01:55
  • This answer has been added to the [Stack Overflow Regular Expression FAQ](http://stackoverflow.com/a/22944075/2736496), under "Character Classes". – aliteralmind Apr 10 '14 at 00:17
  • I can't get `[A-Za-z&&[^AEIOUaeiou]]` working in RegexBuddy (Java or Perl flavors), or in Debuggex with any flavor. But in Java `Arrays.toString(Pattern.compile("[A-Za-z&&[^AEIOUaeiou]]").split("hello"))` is returning `[, e, , , o]` as expected. – aliteralmind Apr 11 '14 at 18:07