
I have only recently started using regex (.NET Framework) and have also been trying to improve my C# knowledge. I'm currently trying to unpick another developer's C# script, as flaws have been identified in its output. Please consider the snippet of C# below, where 'CodeValue' is something like 'C300'.

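  // Requires: using System.Text.RegularExpressions;
  string CodeValue = "C300";
  Regex CodePattern = new Regex(@"(\d{1,3})/?([A-Z0-9]{1,2})?");
  char padChar = '0';

  Match m = CodePattern.Match(CodeValue);
  // Group 1 padded to three characters, then group 2 (or "0" if it captured nothing), then "/".
  string ReturnString = m.Groups[1].Value.PadLeft(3, padChar)
    + (m.Groups[2].Success ? m.Groups[2].ToString() : "0")
    + "/";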

Now, would I be correct in saying that this script will strip off the 'C' at the start of 'CodeValue' and then add an extra '0' at the end, resulting in the variable 'ReturnString' being equal to 3000? I am almost certain I know what is going on here (thanks to Google), but would like some clarification on what exactly is happening, e.g. what exactly is meant by the separate 'Groups[]'.

Any help would be appreciated, thanks.

Nico Butler
  • What is the value of `CodeValue`? Giving variables names starting with a capital letter is confusing to read, btw. –  Feb 07 '20 at 14:23
  • Hi. I mentioned CodeValue in my question, but it would be equal to 'C300'. Sorry about the capital letter, but I am just using the existing script. Cheers. – Nico Butler Feb 07 '20 at 14:26
  • We shouldn't need to get its value from statements in the question text. I'll edit the question so it is clearer. When asking questions, providing a [mre] is extremely helpful. If we can copy your code into a fresh project and test it ourselves under a debugger, it makes it far easier to get your question answered. –  Feb 07 '20 at 14:28
  • I'll remember that for next time, thanks – Nico Butler Feb 07 '20 at 14:30
  • For "C300" the result will be "3000/", but it would also match "D300" and give the same result. Basically it finds 1 to 3 digits followed by an optional / and an optional 1 or 2 alphanumerics, and gives you the digits left-padded with 0 to 3 digits, then the alphanumerics if found or 0 if not, and then a /. – juharr Feb 07 '20 at 14:30
  • See [this question](https://stackoverflow.com/questions/4736/learning-regular-expressions), and search it for the word "capture". Does this help? –  Feb 07 '20 at 14:32
  • @NicoButler Also, try going to https://regex101.com, and enter your regex as `(\d{1,3})\/?([A-Z0-9]{1,2})?` (I made a minor correction inserting a slash before the `/`; the language on the left doesn't include .Net so the syntax is *slightly* different). It will break down its meaning on the right side for you. Regex101 is a *fantastic* tool. Other such tools are available. –  Feb 07 '20 at 14:35
  • @juharr thank you for clarifying that, much appreciated. – Nico Butler Feb 07 '20 at 14:37
  • @Amy I have a fairly good understanding of the regex itself and have used regexer.com to pick the syntax apart (but thanks for the link to the other question, I will give it a read). I was more aiming at understanding how C# uses the regex if that makes sense? i.e. how C# uses the various groupings which I'm assuming are Groups[1] and Groups[2], I think. – Nico Butler Feb 07 '20 at 14:40
  • @NicoButler I see, so you aren't asking about the *concept* of capturing groups... Then, I'm not sure I follow what you're trying to ask. Do you mean to ask "how do I make use of capturing groups in C#"? I'd like to help but I'm not sure we're on the same page? –  Feb 07 '20 at 14:42
  • @Amy Yeah more of a walkthrough of the capturing groups in C#, so basically a dissection of how 'ReturnString' is created. Does that clarify things? – Nico Butler Feb 07 '20 at 14:46

1 Answer

Let's break down this regex a bit before launching into an explanation. Each set of arrows is a start/end point for a group:

Capture group #1
↓       ↓
(\d{1,3})/?([A-Z0-9]{1,2})?
           ↑             ↑
           Capture group #2

The match exposes three groups. One I have denoted with arrows above the regex; another is below. Where is the third? The third group is the entire matched expression. The pattern itself defines only two capturing groups in parentheses, but every regex that matches some part of the input string also carries the matched text as a group of its own, so there is always at least one group: the part of the string that matched.

In your code, m.Groups[n] refers to the nth group of the match. m.Groups[0] contains the entire matched expression. m.Groups[1] is the first capturing group, marked on top. m.Groups[2] is the second capture group, marked underneath.
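To make the numbering concrete, here is a minimal sketch that lists every group for the input "C300". The class and method names (`GroupWalkthrough`, `GroupValues`) are made up for illustration; only the pattern comes from your question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupWalkthrough
{
    // Same pattern as in the question.
    private static readonly Regex CodePattern =
        new Regex(@"(\d{1,3})/?([A-Z0-9]{1,2})?");

    // Hypothetical helper: returns the text of each group of the first match,
    // in the same order as m.Groups.
    public static string[] GroupValues(string input)
    {
        Match m = CodePattern.Match(input);
        string[] values = new string[m.Groups.Count];
        for (int i = 0; i < m.Groups.Count; i++)
        {
            values[i] = m.Groups[i].Value;
        }
        return values;
    }

    public static void Main()
    {
        string[] values = GroupValues("C300");
        for (int i = 0; i < values.Length; i++)
        {
            Console.WriteLine($"Groups[{i}] = \"{values[i]}\"");
        }
        // Groups[0] = "300"   (the whole match; the leading 'C' is never matched)
        // Groups[1] = "300"   (the digits)
        // Groups[2] = ""      (the optional group captured nothing)
    }
}
```

Note that the 'C' is not actively stripped: the regex simply starts matching at the first digit, so the letter never becomes part of any group.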

In code, each group has a few properties you can inspect, such as the starting position (Index), one or more captured values (Captures; see Remarks in the link), and whether that group captured anything at all (Success).
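As a rough illustration of those properties, the sketch below formats a few of them for a chosen group. `GroupInspector` and `Describe` are hypothetical names; `Success`, `Index`, `Length`, `Value` and `Captures` are members of the .NET `Group` class.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupInspector
{
    // Same pattern as in the question.
    private static readonly Regex CodePattern =
        new Regex(@"(\d{1,3})/?([A-Z0-9]{1,2})?");

    // Hypothetical helper: summarises one group of the first match in the input.
    public static string Describe(string input, int groupNumber)
    {
        Group g = CodePattern.Match(input).Groups[groupNumber];
        return $"Success={g.Success}, Index={g.Index}, Length={g.Length}, " +
               $"Value=\"{g.Value}\", Captures={g.Captures.Count}";
    }

    public static void Main()
    {
        // Group 1 starts at index 1 because the leading 'C' is skipped.
        Console.WriteLine(Describe("C300", 1));
        // Success=True, Index=1, Length=3, Value="300", Captures=1

        // Group 2 exists in the collection but did not capture anything.
        Console.WriteLine(Describe("C300", 2));
    }
}
```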

In your regex, capture group #2 might not capture anything because the entire group is optional (due to the ? that follows it). Consequently, the code checks m.Groups[2].Success before using its value, and substitutes "0" when nothing was captured.
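Putting it together, the following sketch wraps the logic from your snippet in a method so it can be tried on a few inputs. `CodeFormatter` and `BuildCode` are names I have invented, and the inputs other than "C300" are assumptions chosen only to show the second group capturing something.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CodeFormatter
{
    // Same pattern as in the question.
    private static readonly Regex CodePattern =
        new Regex(@"(\d{1,3})/?([A-Z0-9]{1,2})?");

    // Hypothetical wrapper around the question's logic.
    public static string BuildCode(string codeValue)
    {
        Match m = CodePattern.Match(codeValue);

        // Group 1: the digits, left-padded with '0' to three characters.
        string digits = m.Groups[1].Value.PadLeft(3, '0');

        // Group 2: the optional suffix, or "0" when the group captured nothing.
        string suffix = m.Groups[2].Success ? m.Groups[2].Value : "0";

        return digits + suffix + "/";
    }

    public static void Main()
    {
        Console.WriteLine(BuildCode("C300")); // 3000/
        Console.WriteLine(BuildCode("7/AB")); // 007AB/
        Console.WriteLine(BuildCode("12X"));  // 012X/
    }
}
```

So for "C300" the extra "0" comes from the fallback branch of the conditional, and the trailing "/" is always appended.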

  • So essentially because conditional operator '?' is returning false (as codeValue doesn't have a 2nd capturing group) an extra '0' is being added to the end of the string. Thank you for taking the time to answer my query, much appreciated. – Nico Butler Feb 07 '20 at 15:29
  • @NicoButler Minor correction: `codeValue` *has* a second capturing group, it just didn't capture anything. It's a minor distinction, but an important one. If the capture group exists in the regex, it will exist in the match's `Groups` collection, even if it didn't match anything. So the regex `amy(blah)?Nico` has two capture groups, regardless of the string being matched against. The string `amyNico` would match that regex, but the second capturing group would be empty. –  Feb 07 '20 at 18:20