I have a weird problem with Regex.Replace.

I think my immediate window says it all:

pattern
"([^_]*)(.*)"

fileNameToReplicate
"{Productnr}_LEI1.JPG"

Regex.Replace(fileNameToReplicate, pattern, $"$1")
"{Productnr}"

Regex.Replace(fileNameToReplicate, pattern, $"$2")
"_LEI1.JPG"

Regex.Replace(fileNameToReplicate, pattern, $"sometext$2")
"sometext_LEI1.JPGsometext"

Thus, my pattern looks for the first underscore and captures everything until that underscore in group1.

Then it captures the rest of the text (starting with that underscore until the end of the string) and captures that as group 2.

The regex captures correctly, look here to review it.

Why is the prefixed text output twice, once before the group and once after it? I expected this output:

"sometext_LEI1.JPG"

Uwe Keim
bas
  • Maybe the `.*` is [too greedy](https://stackoverflow.com/q/11898998/107625)? – Uwe Keim Nov 05 '18 at 20:31
  • @UweKeim I bet you are right, but I don't understand why. When I call $1 and $2 they look fine, right? – bas Nov 05 '18 at 20:33
  • It is always the same thing: an unanchored regex that can match an empty string matches the whole string first and then the empty string at the end of the string (since `Regex.Replace` replaces all non-overlapping matches). If you replace only once, using a `.Replace` method of a non-static `Regex` instance, you would get your expected results. – Wiktor Stribiżew Nov 05 '18 at 20:38
  • Ahhh, that starts to make a bit of sense. – bas Nov 05 '18 at 20:41
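
The comments above can be checked with a small sketch. It lists every match the question's pattern finds in the question's input, then replaces only the first match through an instance `Regex`, as suggested. The class and method names (`EmptyMatchDemo`, `DescribeMatches`, `ReplaceOnce`) are made up for illustration and are not part of the original question.

```csharp
using System;
using System.Text.RegularExpressions;

static class EmptyMatchDemo
{
    // Pattern and input copied from the question.
    public const string Pattern = "([^_]*)(.*)";
    public const string Input = "{Productnr}_LEI1.JPG";

    // Describes each match as "index:length:'value'".
    public static string[] DescribeMatches()
    {
        MatchCollection matches = Regex.Matches(Input, Pattern);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            Match m = matches[i];
            result[i] = $"{m.Index}:{m.Length}:'{m.Value}'";
        }
        return result;
    }

    // Replaces only the first match (count = 1) using the instance overload.
    public static string ReplaceOnce() =>
        new Regex(Pattern).Replace(Input, "sometext$2", 1);

    static void Main()
    {
        foreach (string line in DescribeMatches())
            Console.WriteLine(line);
        // 0:20:'{Productnr}_LEI1.JPG'   <- the whole string
        // 20:0:''                       <- empty match at the end of the string

        Console.WriteLine(ReplaceOnce()); // sometext_LEI1.JPG
    }
}
```

The second, empty match is the one that produces the trailing `sometext`: its group 2 is empty, so the replacement `sometext$2` expands to just `sometext`.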

1 Answer

It does not matter how many X-stars occur in sequence:

(.*)(.*)(.*)(...

since there is a position called end of subject string, and all of them can match there as an empty match. To see your expected result, change your pattern to:

^([^_]*)(.*)

The above adds a caret, which defines a boundary and prevents the engine from starting a second match right at the end of the input string.
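
As a quick check of the anchored pattern, here is a minimal sketch that runs the question's replacement with both patterns. The helper name `AnchoredDemo.Replace` is an assumption for illustration; the input and replacement string are taken from the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class AnchoredDemo
{
    // Runs the question's replacement against the question's input
    // with whichever pattern is passed in.
    public static string Replace(string pattern) =>
        Regex.Replace("{Productnr}_LEI1.JPG", pattern, "sometext$2");

    static void Main()
    {
        // Unanchored: the extra empty match at the end is replaced too.
        Console.WriteLine(Replace("([^_]*)(.*)"));   // sometext_LEI1.JPGsometext

        // Anchored: ^ only matches at the start, so there is a single match.
        Console.WriteLine(Replace("^([^_]*)(.*)"));  // sometext_LEI1.JPG
    }
}
```

This relies on the default options, where `^` matches only at the very start of the input; with `RegexOptions.Multiline` it would also match after each newline.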

revo
  • Thanks. That indeed does work. In the meantime I figured that `"([^_]+)(.*)"` works too. I don't understand why either of them makes a difference though (sorry..). – bas Nov 05 '18 at 20:39
  • @bas Your regex can match an empty string. `"([^_]+)(.*)"` cannot since there is `[^_]+`. – Wiktor Stribiżew Nov 05 '18 at 20:40
  • @bas *End of subject string* is a valid *position*. It's an empty string that could be matched by `WHATEVER*`. If you change your quantifiers from `*` to `+` or `{1,}` then it wouldn't match this position. Defining a boundary does a similar thing. – revo Nov 05 '18 at 20:41
  • Roger that. Learned something new. Thanks both! – bas Nov 05 '18 at 20:44
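
The `+` variant from the comments can be compared with the anchored pattern in a short sketch. The second input, `_LEI1.JPG`, is a hypothetical file name that starts with an underscore; it is not from the question and is only there to show that the two fixes are not interchangeable for every input. The helper name `QuantifierDemo.Replace` is likewise made up.

```csharp
using System;
using System.Text.RegularExpressions;

static class QuantifierDemo
{
    // Applies the question's replacement string with the given pattern.
    public static string Replace(string input, string pattern) =>
        Regex.Replace(input, pattern, "sometext$2");

    static void Main()
    {
        const string plus = "([^_]+)(.*)";
        const string anchored = "^([^_]*)(.*)";

        // For the question's input both fixes give the expected result,
        // because [^_]+ cannot match the empty string at the end.
        Console.WriteLine(Replace("{Productnr}_LEI1.JPG", plus));     // sometext_LEI1.JPG
        Console.WriteLine(Replace("{Productnr}_LEI1.JPG", anchored)); // sometext_LEI1.JPG

        // Hypothetical input with a leading underscore: [^_]+ has to skip
        // the underscore, so the match starts one character later.
        Console.WriteLine(Replace("_LEI1.JPG", plus));     // _sometext
        Console.WriteLine(Replace("_LEI1.JPG", anchored)); // sometext_LEI1.JPG
    }
}
```

If a name could begin with an underscore, the anchored pattern is probably the safer of the two.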