1

I'd like to parse the following sample string

foo :6

into two groups: Text and Number. The number group should be populated only if the character ":" precedes the number itself.

so:

foo 6 -> Text = "foo 6"
foo :6 -> Text = "foo", Number = "6"

The best I could come up with so far is

(?<Text>.+)(?=:(?<Number>\d+)h?)?

but that doesn't work because the first group greedily expands to the whole string.

Any suggestions?

Chris

5 Answers

5

If you really want to use a regex you can write quite a simple one, without lookarounds:

(?<Text>[^:]+):?(?<Number>\d*)

In my opinion, regexes should be as simple as possible; if you do not want spaces around the Text group, I suggest calling match.Groups["Text"].Value.Trim().

Note that if you are parsing a multiline string this pattern will not work because, as @OscarHermosilla mentioned below, [^:]+ will also match newlines. The fix is simple, though: replace it with [^:\n]+.
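
As a rough sketch of how the pattern above might be used from C#, assuming each input is a single string: the class name SimplePatternDemo and the helper Parse are made up for illustration, and the newline-safe variant [^:\n]+ is used for the Text group.

```csharp
using System;
using System.Text.RegularExpressions;

static class SimplePatternDemo
{
    // [^:\n]+ keeps the Text group from running across line breaks.
    static readonly Regex Pattern = new Regex(@"(?<Text>[^:\n]+):?(?<Number>\d*)");

    // Hypothetical helper: returns the trimmed text and the number.
    // Number is "" when no digits follow, because \d* can match nothing.
    public static (string Text, string Number) Parse(string input)
    {
        Match m = Pattern.Match(input);
        return (m.Groups["Text"].Value.Trim(), m.Groups["Number"].Value);
    }

    static void Main()
    {
        foreach (string s in new[] { "foo 6", "foo :6" })
        {
            var (text, number) = Parse(s);
            Console.WriteLine($"Text = \"{text}\", Number = \"{number}\"");
        }
    }
}
```

Since \d* always takes part in the match, check for an empty Number value rather than Groups["Number"].Success to tell whether a number was found.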

BlackBear
  • @Unihedron I guess it depends on OP's intentions – BlackBear Sep 12 '14 at 10:46
  • See test case: `foo :6 -> Text = "foo", Number = "6"` – Unihedron Sep 12 '14 at 10:46
  • I agree with you that a regex should be simple, and this will do the job. But it has to be applied individually to every string to be parsed; it won't work on a text with several lines because [^:]+ will also match a newline. – Oscar Hermosilla Sep 12 '14 at 12:53
  • @OscarHermosilla you are correct, indeed, I posted this because the question seems to suggest that individual strings will be used. I will add this to my answer – BlackBear Sep 12 '14 at 12:57
2

You don't need a separate function to strip the trailing whitespace.

The regex below captures every character into the named group Text, stopping before :\d+ (i.e., a colon followed by one or more digits). If the string ends with whitespace, a colon, and digits, it captures those digits into the named group Number.

^(?<Text>(?:(?!:\d+).)+(?=$|\s+:(?<Number>\d+)$))

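using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Text stops before ":<digits>"; the lookahead fills Number only
        // when whitespace, a colon, and digits end the string.
        var rgx = new Regex(@"^(?<Text>(?:(?!:\d+).)+(?=$|\s+:(?<Number>\d+)$))");

        foreach (string input in new[] { "foo 6", "foo :6" })
        {
            Match m = rgx.Match(input);
            Console.WriteLine(m.Groups["Text"].Value);
            // Print Number only if the group actually took part in the match.
            if (m.Groups["Number"].Success)
                Console.WriteLine(m.Groups["Number"].Value);
        }
    }
}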

Output:

foo 6
foo
6

Avinash Raj
1

You can repeat the group name Text across an alternation, which .NET allows. This way:

(?<Text>.+)\s+:(?<Number>\d+)|(?<Text>.+)

Based on the idea behind this post: Regex Pattern to Match, Excluding when... / Except between
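
A minimal sketch of the alternation approach in C#, for illustration only: the names AlternationDemo and Parse are hypothetical, and the Number group is written with \d+ here so that multi-digit numbers are captured whole.

```csharp
using System;
using System.Text.RegularExpressions;

static class AlternationDemo
{
    // .NET lets both alternatives reuse the group name Text.
    // The first alternative needs whitespace, a colon, and digits;
    // the second is the fallback that takes the whole input as Text.
    static readonly Regex Pattern =
        new Regex(@"(?<Text>.+)\s+:(?<Number>\d+)|(?<Text>.+)");

    // Hypothetical helper: Number comes back as "" when the fallback
    // alternative matched, since the Number group did not take part.
    public static (string Text, string Number) Parse(string input)
    {
        Match m = Pattern.Match(input);
        return (m.Groups["Text"].Value, m.Groups["Number"].Value);
    }

    static void Main()
    {
        foreach (string s in new[] { "foo 6", "foo :6", "foo :42" })
        {
            var (text, number) = Parse(s);
            Console.WriteLine($"Text = \"{text}\", Number = \"{number}\"");
        }
    }
}
```

Because the first alternative consumes the whitespace before the colon outside the Text group, no trimming is needed afterwards.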

Oscar Hermosilla
0

You can simply use split instead of regex:

"foo :6".Split(':');
Rahul Tripathi
0

You can try something like:

(\D+)(?:\:(\d+))

or do a Regex.Split using this pattern:

(\s*\:\s*)
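
A small sketch of the Regex.Split variant, with the hypothetical names RegexSplitDemo and SplitOnColon. In .NET, Regex.Split also returns any text captured by parentheses in the pattern, so this sketch drops the capturing group to keep the separators out of the result.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexSplitDemo
{
    // No capturing parentheses, so the " :" separator is not returned.
    // The surrounding \s* also removes whitespace around the colon.
    public static string[] SplitOnColon(string input) =>
        Regex.Split(input, @"\s*:\s*");

    static void Main()
    {
        Console.WriteLine(string.Join("|", SplitOnColon("foo :6"))); // foo|6
        Console.WriteLine(string.Join("|", SplitOnColon("foo 6")));  // foo 6
    }
}
```

A result with a single element means the input contained no colon, so the whole string is the text.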
NeverHopeless