
In .NET's System.Text.RegularExpressions.Regex, adding ^ and $ to a pattern to require an exact match is not enough: IsMatch still returns true if a single trailing \n is appended to the string being checked.

For example, the following code:

using System;
using System.Text.RegularExpressions;

Regex regexExact = new Regex(@"^abc$");
Console.WriteLine(regexExact.IsMatch("abc"));
Console.WriteLine(regexExact.IsMatch("abcdefg"));
Console.WriteLine(regexExact.IsMatch("abc\n"));
Console.WriteLine(regexExact.IsMatch("abc\n\n"));

prints:

True
False
True
False

What regex makes IsMatch return false for all of the above inputs except the first?

Wiktor Stribiżew
Hutch


Solution for the current .NET regex

Use \z, the very-end-of-string anchor in .NET regex:

Regex regexExact = new Regex(@"^abc\z");
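To confirm the fix, the question's four inputs can be rerun against the \z pattern. This is a minimal sketch assuming C# 9 or later (top-level statements); the names inputs and results are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// Same inputs as in the question; only the end anchor changes from $ to \z.
var regexExact = new Regex(@"^abc\z");
string[] inputs = { "abc", "abcdefg", "abc\n", "abc\n\n" };

// Apply IsMatch to every input string.
bool[] results = Array.ConvertAll(inputs, s => regexExact.IsMatch(s));

for (int i = 0; i < inputs.Length; i++)
{
    // Make the newlines visible in the console output.
    string shown = inputs[i].Replace("\n", "\\n");
    Console.WriteLine($"{shown}: {results[i]}");
}
```

Only "abc" should report True, because \z cannot match before a trailing line break.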

See Anchors in Regular Expressions:

$    The match must occur at the end of the string or line, or before \n at the end of the string or line. For more information, see End of String or Line.
\Z    The match must occur at the end of the string, or before \n at the end of the string. For more information, see End of String or Before Ending Newline.
\z    The match must occur at the end of the string only. For more information, see End of String Only.
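Running each anchor from the table against the same inputs makes the difference concrete. A minimal sketch, assuming C# 9 or later and no RegexOptions (so $ is not in multiline mode); the variable names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// The three end-of-input anchors from the table, applied to the same literal.
string[] patterns = { @"^abc$", @"^abc\Z", @"^abc\z" };
string[] inputs = { "abc", "abc\n", "abc\n\n" };

// results[p, i] holds whether patterns[p] matches inputs[i].
var results = new bool[patterns.Length, inputs.Length];

for (int p = 0; p < patterns.Length; p++)
{
    for (int i = 0; i < inputs.Length; i++)
    {
        results[p, i] = Regex.IsMatch(inputs[i], patterns[p]);
        string shown = inputs[i].Replace("\n", "\\n");
        Console.WriteLine($"{patterns[p],-8} {shown,-10} {results[p, i]}");
    }
}
```

Without RegexOptions.Multiline, $ and \Z behave the same here: both accept one final \n but not two, while \z accepts neither.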

The same anchor can be used in other flavors such as Java, PCRE, PHP, Perl and Ruby. In Python, use \Z. In JavaScript RegExp (ECMAScript) compatible patterns, the $ anchor matches only at the very end of the string (if no /m modifier is defined).

Background

See Strings Ending with a Line Break at regular-expressions.info:

Because Perl returns a string with a newline at the end when reading a line from a file, Perl's regex engine matches $ at the position before the line break at the end of the string even when multi-line mode is turned off. Perl also matches $ at the very end of the string, regardless of whether that character is a line break. So ^\d+$ matches 123 whether the subject string is 123 or 123\n.

Most modern regex flavors have copied this behavior. That includes .NET, Java, PCRE, Delphi, PHP, and Python. This behavior is independent of any settings such as "multi-line mode".

In all these flavors except Python, \Z also matches before the final line break. If you only want a match at the absolute very end of the string, use \z (lower case z instead of upper case Z). \A\d+\z does not match 123\n. \z matches after the line break, which is not matched by the shorthand character class.

In Python, \Z matches only at the very end of the string. Python does not support \z.
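The Perl-style behavior quoted above can be checked in .NET with a short sketch. It assumes C# 9 or later (top-level statements), and all variable names are illustrative; it shows that $ accepts "123\n" with or without RegexOptions.Multiline, that \Z does too, and that \A\d+\z does not.

```csharp
using System;
using System.Text.RegularExpressions;

const string withNewline = "123\n";

// Perl-style $: matches at the very end and also before a final "\n",
// whether or not RegexOptions.Multiline is set.
bool dollar = Regex.IsMatch(withNewline, @"^\d+$");
bool dollarMultiline = Regex.IsMatch(withNewline, @"^\d+$", RegexOptions.Multiline);

// The "\n" is not part of the match: \d+ stops before it and $ matches there.
string matchedText = Regex.Match(withNewline, @"^\d+$").Value;

// \Z also accepts the final "\n"; \z matches only at the absolute end, and
// \d cannot consume the line break, so \A\d+\z fails on "123\n".
bool upperZ = Regex.IsMatch(withNewline, @"\A\d+\Z");
bool lowerZ = Regex.IsMatch(withNewline, @"\A\d+\z");
bool lowerZMultiline = Regex.IsMatch(withNewline, @"\A\d+\z", RegexOptions.Multiline);

Console.WriteLine($"$: {dollar}, $ + Multiline: {dollarMultiline}, matched: \"{matchedText}\"");
Console.WriteLine($@"\Z: {upperZ}, \z: {lowerZ}, \z + Multiline: {lowerZMultiline}");
```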

Wiktor Stribiżew