
I'm trying to highlight markdown code, but am running into this weird behavior of the .NET regex multiline option.

The following expression: ^(#+).+$ works fine on any online regex testing tool:

enter image description here

But it refuses to work with .net:

enter image description here

It doesn't seem to take the $ anchor into account, and just highlights everything until the end of the string, no matter what. This is my C#:

RegExpression = new Regex(@"^(#+).+$", RegexOptions.Multiline);

What am I missing?

user2950509

2 Answers


Your text evidently contains a line break other than a bare LF, such as CRLF. In .NET regex, a dot matches any char except LF (the newline char, \n), so it also matches CR (\r).

See the Multiline Mode section of the MSDN regex reference:

By default, $ matches only the end of the input string. If you specify the RegexOptions.Multiline option, it matches either the newline character (\n) or the end of the input string. It does not, however, match the carriage return/line feed character combination. To successfully match them, use the subexpression \r?$ instead of just $.

So, use

@"^(#+).+?\r?$"

The .+?\r?$ lazily matches one or more chars other than LF, up to an optional CR right before a newline or the end of the string. Note that the CR, when present, is consumed as part of the match.

Or just use a negated character class:

@"^(#+)[^\r\n]+"

The [^\r\n]+ will match one or more chars other than CR/LF.
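To see what each pattern actually captures, here is a minimal sketch that runs all three against a made-up CRLF sample. The class name, the helper `VisibleMatches`, and the input string are my own assumptions for illustration, not taken from the question. CR and LF are printed as visible escapes so a trailing line-break character is easy to spot.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HeaderPatternDemo
{
    // Returns every match of `pattern` in `input` under RegexOptions.Multiline,
    // with CR and LF replaced by visible escapes.
    public static List<string> VisibleMatches(string input, string pattern)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern, RegexOptions.Multiline))
        {
            results.Add(m.Value.Replace("\r", "\\r").Replace("\n", "\\n"));
        }
        return results;
    }

    public static void Main()
    {
        // Hypothetical sample text with Windows-style CRLF line breaks.
        string input = "intro\r\n#header\r\nnot a header";

        string[] patterns = { @"^(#+).+$", @"^(#+).+?\r?$", @"^(#+)[^\r\n]+" };
        foreach (string pattern in patterns)
        {
            Console.WriteLine(pattern + " => " + string.Join(" | ", VisibleMatches(input, pattern)));
        }
    }
}
```

On this sample, the dot-based patterns keep the trailing CR in the matched text, while the negated character class stops before it. If you use the match length to decide what to highlight, the negated class is probably the safer of the two.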

Wiktor Stribiżew

What you have is good. The only thing you're missing is that . doesn't match newline characters, even with the multiline option. You can get around this in two different ways.

The easiest is to use the RegexOptions.Singleline flag, which causes newlines to be treated as ordinary characters. That way, ^ still matches the start of the string, $ matches the end of the string, and . matches everything, including newlines.

The other way to fix this (although I wouldn't recommend it for your use case) is to modify your regex to explicitly allow newlines. To do this, you can replace any . with (?:.|\n), which means either any character or a newline. For your example, you would end up with ^(#+)(?:.|\n)+$. If you want to ensure that there's a non-linebreak character first, add an extra dot: ^(#+).(?:.|\n)+$
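For reference, a small sketch of what the Singleline flag changes for the pattern from the question. The class name, the helper `FirstMatch`, and the sample input are assumptions made up for this illustration. Note that with Singleline the match spans lines, so this only helps if matching across newlines is what you want.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Returns the first match of the header pattern under the given options,
    // or an empty string when nothing matches.
    public static string FirstMatch(string input, RegexOptions options)
    {
        Match m = Regex.Match(input, @"^(#+).+$", options);
        return m.Success ? m.Value : string.Empty;
    }

    public static void Main()
    {
        // Hypothetical sample: a header line followed by ordinary text (LF only).
        string input = "#header\nnot a header";

        // Multiline: '.' stops at \n and '$' matches before it,
        // so only the first line is matched.
        Console.WriteLine(FirstMatch(input, RegexOptions.Multiline) == "#header");

        // Singleline: '.' also matches \n, so the match runs to the end of the input.
        Console.WriteLine(FirstMatch(input, RegexOptions.Singleline) == input);
    }
}
```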

3ocene
  • I think you've misunderstood my question. I don't want to match new lines. The second image is what I've got, and the first image is what I SHOULD get. The input string "this is a \n #header \n but this isn't" should only match "#header". Currently, it's matching "#header but this isn't" – user2950509 Oct 15 '16 at 12:43
  • Please never suggest the `(?:.|\n)+` pattern. It is very inefficient and can cause a system freeze because of the number of backtracking (or expansion, in the case of a lazy quantifier) steps it has to perform. Always use `.` with the `(?s)` inline modifier or `RegexOptions.Singleline` in .NET. You do not ever need to use `[\s\S]`-like workarounds, as you may use modifier groups in .NET regex, e.g.: `^.*\r?\n(?s:.*)`. – Wiktor Stribiżew Jul 02 '18 at 10:32