10

The following regular expression matches "Saturday" or "Sunday": (?:(Sat)ur|(Sun))day

However, for "Saturday" backreference 1 is filled and backreference 2 is empty, while for "Sunday" it is the other way round.

PHP (PCRE) provides a nice branch-reset operator, "?|", that circumvents this problem. The previous regex becomes (?|(Sat)ur|(Sun))day, so both alternatives capture into group 1 and there are no empty backreferences.

Is there an equivalent in C#, or some workaround?

Stephan

3 Answers

15

.NET doesn't support the branch-reset operator, but it does support named groups, and it lets you reuse group names without restriction (something no other flavor does, AFAIK). So you could use this:

(?:(?<abbr>Sat)ur|(?<abbr>Sun))day

...and the abbreviated name will be stored in Match.Groups["abbr"].
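
As a minimal sketch of how this could look in C# (the `Program` class and the sample inputs are assumptions for illustration, not part of the original answer), both alternatives capture into the same named group, so one lookup covers either branch:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Both alternatives capture into the same named group "abbr".
        var regex = new Regex(@"(?:(?<abbr>Sat)ur|(?<abbr>Sun))day");

        // Sample inputs chosen for illustration.
        foreach (var input in new[] { "Saturday", "Sunday" })
        {
            Match match = regex.Match(input);
            if (match.Success)
            {
                // Whichever branch matched, the value is under "abbr".
                Console.WriteLine($"{input} -> {match.Groups["abbr"].Value}");
            }
        }
    }
}
```

This should print `Saturday -> Sat` and `Sunday -> Sun`, with no need to check which branch matched.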

Alan Moore
  • Yes, that's the way to go (in case of a regex). Neat. – Bart Kiers Mar 21 '11 at 13:16
  • Perl lets you reuse named groups in the same pattern. There’s no restriction. That also allows you to get back a list of named groups that matched. – tchrist Apr 28 '11 at 01:17
  • With PCRE, you can use the `(?J)` modifier (it must be placed at the first position in the pattern), which allows named groups to share the same name. (This is a once-and-for-all option; you can't unset it afterwards.) – Casimir et Hippolyte Apr 08 '14 at 13:18
  • This answer has been added to the [Stack Overflow Regular Expression FAQ](http://stackoverflow.com/a/22944075/2736496), under "Groups". – aliteralmind Apr 10 '14 at 00:27
  • Python's [regex](https://pypi.org/project/regex/#notes-on-named-capture-groups) module allows reusing group names, but of course it also supports the [branch reset](https://pypi.org/project/regex/#branch-reset) operator. – AXO May 25 '20 at 09:07
3

It should be possible to concatenate backref1 and backref2.
Exactly one of them is always empty, and concatenating a string with an empty string leaves it unchanged.

With your regex (?:(Sat)ur|(Sun))day and the replacement $1$2,
you get Sat for Saturday and Sun for Sunday.

 regex (?:(Sat)ur|(Sun))day
 input    | backref1 _$1_ | backref2 _$2_ | 'concat' _$1$2_
 ---------|---------------|---------------|----------------
 Saturday | 'Sat'         | ''            | 'Sat'+'' = Sat
 Sunday   | ''            | 'Sun'         | ''+'Sun' = Sun

Instead of reading either backref1 or backref2, just read both results and concatenate them.
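
A minimal C# sketch of this idea follows; the `Program` class and the sample inputs are assumptions for illustration. It relies on .NET returning an empty string from `Group.Value` for a group that did not take part in the match, so the concatenation yields whichever group was filled:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // The original regex, unchanged: two separate numbered groups.
        var regex = new Regex(@"(?:(Sat)ur|(Sun))day");

        // Sample inputs chosen for illustration.
        foreach (var input in new[] { "Saturday", "Sunday" })
        {
            Match match = regex.Match(input);
            if (!match.Success)
            {
                continue;
            }

            // The group that did not participate has Value == "",
            // so the concatenation is just the filled group.
            string abbr = match.Groups[1].Value + match.Groups[2].Value;
            Console.WriteLine($"{input} -> {abbr}");

            // The same result via a replacement string.
            Console.WriteLine(regex.Replace(input, "$1$2"));
        }
    }
}
```

The first form reads the groups from the match result; the second uses the `$1$2` replacement from the table above. Both leave the regex itself untouched.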

bw_üezi
  • I'd prefer to get the result directly, without any manipulation of the input string (as with the branch-reset operator). – Stephan Mar 21 '11 at 13:24
  • @Stephan I don't understand your comment. I don't think I manipulated the input string; I just pointed out that you could concatenate group 1 and group 2 from the regex result, with no changes to the regex. – bw_üezi Mar 21 '11 at 13:32
  • @bw_üezi Can you edit your answer to include detailed sample code? I still don't understand your solution. – Stephan Mar 21 '11 at 14:33
  • +1 this seems so much better than making the regular expression more complicated just to get the desired value out in one back reference. – juharr Mar 21 '11 at 14:45
  • @juharr thanks for understanding my [KISS](http://en.wikipedia.org/wiki/KISS_principle) solution ;-) – bw_üezi Mar 21 '11 at 14:54
  • @bw_üezi I'm a big KISS fan, the band and the principle. – juharr Mar 21 '11 at 15:41
  • @bw_üezi Well, your solution is simple, but it requires two steps. – Stephan Mar 24 '11 at 10:17
-1

You can use the branch-reset operator:

(?|foo(bar)|still(life)|(like)so)

That will only set group one no matter which branch matches.

tchrist