
I'm trying to validate email addresses using a regex pattern with a negative lookbehind. More specifically, I want to allow only addresses that don't end with the specific sequence @mydomain.de.

This works fine on most of my test strings. However, adding a line break at the end of the string (\r\n) seems to break it: it no longer matches.

I'm aware that this could normally be solved more easily using .endsWith(). I just intend to use the regex in a javax @Pattern annotation.

Pattern p = Pattern.compile("^.*(?<!@mydomain\\.de)$");

p.matcher("test@gmail.com").matches();      // => true
p.matcher("test@gmail.com\r\n").matches();  // => false

I would expect both strings to match, as neither ends with the forbidden sequence @mydomain.de.

Bakkenrak
  • Does `"^.*(? – ggorlen Aug 14 '19 at 16:09
  • @ggorlen Yes, that would work. Thank you. I like Andreas's answer below too, though :) – Bakkenrak Aug 15 '19 at 06:17
  • Fair enough, but keep in mind that the above is more precise than dotall. The problem with dotall is that if the email portion has line breaks in it, those should be rejected as invalid, but dotall happily accepts them. – ggorlen Aug 15 '19 at 06:41

3 Answers


By default, the dot (.) wildcard does not match line terminator characters.

This means that ^.*$ cannot consume the trailing \r\n, so matches(), which requires the whole input to match, returns false for the second string.
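A minimal sketch to check this in isolation. It shows that a single . rejects \r and \n unless DOTALL is set, and that ^.*$ used with matches() therefore fails on the string with the trailing \r\n. The class and method names are only illustrative.

```java
import java.util.regex.Pattern;

final class DotTerminatorDemo {
    // Does a single "." match the whole (one-character) input, with or without DOTALL?
    static boolean dotMatches(String input, boolean dotall) {
        int flags = dotall ? Pattern.DOTALL : 0;
        return Pattern.compile(".", flags).matcher(input).matches();
    }

    // ^.*$ applied with matches(), i.e. the pattern must consume the entire input.
    static boolean wholeInputMatches(String input) {
        return Pattern.compile("^.*$").matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(dotMatches("a", false));   // true
        System.out.println(dotMatches("\r", false));  // false: \r is a line terminator
        System.out.println(dotMatches("\n", false));  // false: \n is a line terminator
        System.out.println(dotMatches("\n", true));   // true once DOTALL is set
        System.out.println(wholeInputMatches("test@gmail.com"));      // true
        System.out.println(wholeInputMatches("test@gmail.com\r\n"));  // false: .* stops at \r
    }
}
```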

You can make it match all characters by specifying the DOTALL mode:

Pattern p = Pattern.compile("^.*(?<!@mydomain\\.de)$", Pattern.DOTALL);

Or:

Pattern p = Pattern.compile("(?s)^.*(?<!@mydomain\\.de)$");

public static final int DOTALL

Enables dotall mode.

In dotall mode, the expression . matches any character, including a line terminator. By default this expression does not match line terminators.

Dotall mode can also be enabled via the embedded flag expression (?s). (The s is a mnemonic for "single-line" mode, which is what this is called in Perl.)
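A small sketch that runs both variants (the Pattern.DOTALL flag and the embedded (?s)) against the question's strings plus the two @mydomain.de cases. Both variants should agree. Note that "test@mydomain.de\r\n" is accepted, because the lookbehind now sees the trailing \r\n instead of @mydomain.de, which is what the asker says they expect in the comments. The class name is illustrative.

```java
import java.util.regex.Pattern;

final class DotallCheck {
    // DOTALL passed as a compile flag.
    static final Pattern WITH_FLAG =
            Pattern.compile("^.*(?<!@mydomain\\.de)$", Pattern.DOTALL);
    // The same pattern with the embedded (?s) flag, which keeps the setting
    // inside the pattern string itself.
    static final Pattern EMBEDDED =
            Pattern.compile("(?s)^.*(?<!@mydomain\\.de)$");

    static boolean isAllowed(Pattern p, String email) {
        return p.matcher(email).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "test@gmail.com",       // allowed
            "test@gmail.com\r\n",   // allowed
            "test@mydomain.de",     // rejected by the lookbehind
            "test@mydomain.de\r\n"  // allowed: input ends with \r\n, not @mydomain.de
        };
        for (String s : inputs) {
            // Escape \r and \n so the output stays on one line per input.
            String shown = s.replace("\r", "\\r").replace("\n", "\\n");
            System.out.printf("%-22s flag=%b embedded=%b%n",
                    shown, isAllowed(WITH_FLAG, s), isAllowed(EMBEDDED, s));
        }
    }
}
```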

Andreas
  • I don't think that a fixed width lookbehind is the best way to write the regex pattern. – Tim Biegeleisen Aug 14 '19 at 16:16
  • @TimBiegeleisen Not sure I can agree with that, but that isn't the topic of the question anyway. The issue in the question is the lack of knowledge that `.` doesn't match line terminators, so that's what I decided to address in the answer. OP hasn't even specified whether `"test@mydomain.de\r\n"` should match or not, so how would you write the regex differently while changing nothing but the issue described in the question? – Andreas Aug 14 '19 at 16:27
  • Fair enough +1. You answered within the bounds of the original question. – Tim Biegeleisen Aug 14 '19 at 16:31
  • Thank you, that actually nailed my problem! And yes, I am expecting `"test@mydomain.de\r\n"` to match. I could have maybe phrased that more clearly in the last sentence of my question. – Bakkenrak Aug 15 '19 at 06:14

Your current pattern has some problems, and I would use this (simplified) pattern:

^.*(?!@mydomain\.de$)@[^.]+(?:\.[^.]+)+$

The negative lookahead belongs immediately before the @ sign, to assert that what follows is not exactly @mydomain.de followed by the end of the string.


With these changes, your code now behaves as expected:

Pattern p = Pattern.compile("^.*(?!@mydomain\\.de$)@[^.]+(?:\\.[^.]+)+$");
System.out.println(p.matcher("test@gmail.com").matches());      // => true
System.out.println(p.matcher("test@gmail.com\r\n").matches());  // => true
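A sketch that exercises the lookahead pattern above on a few more inputs. The second input passes because the negated class [^.]+ also matches \r and \n. The hypothetical address user@localhost is included to show the restriction mentioned in the comment on this answer: with no dot in the domain, (?:\.[^.]+)+ cannot match. The class name is illustrative.

```java
import java.util.regex.Pattern;

final class LookaheadCheck {
    // Pattern from the answer: the negative lookahead sits right before the '@'.
    static final Pattern P =
            Pattern.compile("^.*(?!@mydomain\\.de$)@[^.]+(?:\\.[^.]+)+$");

    static boolean isAllowed(String email) {
        return P.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("test@gmail.com"));      // true
        System.out.println(isAllowed("test@gmail.com\r\n"));  // true: [^.]+ also consumes \r\n
        System.out.println(isAllowed("test@mydomain.de"));    // false: rejected by the lookahead
        System.out.println(isAllowed("user@localhost"));      // false: no dot in the domain
    }
}
```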
Tim Biegeleisen
  • *FYI:* Your new regex will fail to match when no dot exists in the domain. OP did not ask for that restriction to be added. --- Also, and maybe irrelevant, but your regex is a lot slower than the original regex, since it has to do a lot of backtracking. – Andreas Aug 14 '19 at 16:33

By default, ^ and $ match the start and end of a string. If you want them to match the start and end of a line, you need to set the MULTILINE flag.

Pattern p = Pattern.compile("^.*(?<!@mydomain\\.de)$", Pattern.MULTILINE);

public static final int MULTILINE

Enables multiline mode.

In multiline mode the expressions ^ and $ match just after or just before, respectively, a line terminator or the end of the input sequence. By default these expressions only match at the beginning and the end of the entire input sequence.

Multiline mode can also be enabled via the embedded flag expression (?m).
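A small sketch of what the flag changes. It uses ^ because its effect is easiest to see in the middle of the input: without MULTILINE, ^b cannot be found in "a\nb", while with the flag (or the embedded (?m)) ^ also matches right after the \n. The helper name finds is only illustrative.

```java
import java.util.regex.Pattern;

final class MultilineDemo {
    // Does the regex occur anywhere in the input? find() scans; it does not
    // require the whole input to be consumed.
    static boolean finds(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "a\nb";
        System.out.println(finds("^b", 0, input));                  // false: ^ only at the start
        System.out.println(finds("^b", Pattern.MULTILINE, input));  // true: ^ also after \n
        System.out.println(finds("(?m)^b", 0, input));              // true: embedded flag
    }
}
```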

You can also use \A and \z, which always match only at the start and end of the string. Using \Z instead of \z also allows a match just before the final line terminator (\r\n counts as a single terminator). See the boundary matchers section of the Pattern documentation for more details.

Pattern p = Pattern.compile("\\A.*(?<!@mydomain\\.de)\\Z");
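A sketch contrasting the two end anchors with find(). One thing worth checking before relying on this in the question's setting: these anchors are zero-width. They let find() succeed on the \r\n input, but matches(), which the question's code calls, still needs the trailing \r\n to be consumed by something in the pattern, so the last two calls below return false. The class and helper names are illustrative.

```java
import java.util.regex.Pattern;

final class EndAnchorDemo {
    static boolean find(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String withCrLf = "test@gmail.com\r\n";
        // \z: only at the very end of the input.
        System.out.println(find("com\\z", withCrLf));  // false
        // \Z: also just before a final line terminator (\r\n counts as one).
        System.out.println(find("com\\Z", withCrLf));  // true

        String anchored = "\\A.*(?<!@mydomain\\.de)\\Z";
        // find() only needs the pattern to match somewhere...
        System.out.println(find(anchored, withCrLf));     // true
        // ...while matches() must consume the whole input, including the \r\n.
        System.out.println(matches(anchored, withCrLf));  // false
        // The same holds for the MULTILINE variant (embedded here as (?m)).
        System.out.println(matches("(?m)^.*(?<!@mydomain\\.de)$", withCrLf));  // false
    }
}
```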
3limin4t0r