
I'm trying to filter for strings that don't contain the word "spam". I'm using the regex from here.

But I don't understand why I need the `^` symbol at the start of the expression. I know that it marks the start of the string, but why doesn't it work without `^` in my case?

UPD: All the answers below are very useful. It's completely clear now. Thank you!

vitm
  • Because without it, the final character of the string alone will match the pattern, e.g. `m` in `spam` (or `am` in `spam`, or `pam` in `spam`). You need to make sure that `spam` doesn't occur *anywhere* in the string, rather than letting the start of the tested substring forward-track. – CertainPerformance Feb 11 '19 at 07:47
  • It's easy to see that without `^` the regexp matches any single character in the string. – Zefick Feb 11 '19 at 07:47
  • Possible duplicate of [Reference - What does this regex mean?](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean) – tripleee Feb 11 '19 at 07:58
  • Please consider accepting an answer if it helped you (green tick on the left). – Jan Feb 11 '19 at 08:06
  • It is not required in many situations, as a lot of validation tasks are performed with methods that only test the input string against the regex *once*. However, it is best practice, as the regex might be used with methods that search for all occurrences. – Wiktor Stribiżew Feb 11 '19 at 08:17
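To make the comments above concrete, here is a small sketch using Python's `re` module (the question doesn't say which regex flavor is in use, so Python is an assumption). It also assumes a hypothetical consuming variant of the filter, the lookahead followed by `.*`, to show the tail-matching effect CertainPerformance describes; the question's actual pattern may differ.

```python
import re

# Hypothetical consuming version of the filter: the lookahead followed by ".*".
FILTER = r"(?!.*?spam).*"

unanchored = re.compile(FILTER)
anchored = re.compile("^" + FILTER)

# re.search scans forward: index 0 fails (".*?spam" finds "spam"), but at
# index 1 only "pam" follows, so the lookahead passes and ".*" grabs "pam".
print(unanchored.search("spam").group())  # prints pam
print(anchored.search("spam"))            # prints None

# re.match only attempts index 0, so here the missing ^ makes no difference.
print(re.match(FILTER, "spam"))           # prints None
print(re.match(FILTER, "eggs").group())   # prints eggs
```

`re.match` is roughly the kind of "test once" method Wiktor Stribiżew mentions: it only tries the start of the string, so it behaves as if `^` were present, while `re.search` tries every position and needs the anchor.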

2 Answers


The regex `(?!.*?spam)` matches a position in a string that is not followed by something matching `.*?spam`.

Every single string has such a position, because if nothing else, the very end of the string is certainly not followed by anything matching `.*?spam`.

So every single string contains a match for the regex `(?!.*?spam)`.

The anchor `^` in `^(?!.*?spam)` restricts the regex so that it only matches strings where the very beginning of the string isn't followed by anything matching `.*?spam`, i.e., strings that don't contain `spam` at all (or at least not anywhere in the first line, depending on whether `.` matches newlines).
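A minimal sketch of this argument in Python's `re` module (the choice of Python is an assumption; the thread doesn't name a language). `re.search` tries every position, so the bare lookahead always finds somewhere to succeed, while the anchored version can only succeed at index 0. The last check illustrates the newline caveat: without `re.DOTALL`, `.` stops at `\n`, so `spam` on a later line slips past the anchored check.

```python
import re

# The bare lookahead from the answer, and the same lookahead anchored with ^.
bare = re.compile(r"(?!.*?spam)")
anchored = re.compile(r"^(?!.*?spam)")
# Same anchored pattern, but with DOTALL so that "." also matches newlines.
anchored_dotall = re.compile(r"^(?!.*?spam)", re.DOTALL)

def has_match(pattern, text):
    """True if pattern matches somewhere in text (re.search tries every position)."""
    return pattern.search(text) is not None

for s in ["eggs and ham", "spam", "eggs and spam"]:
    print(f"{s!r}: bare={has_match(bare, s)}, anchored={has_match(anchored, s)}")

# The bare lookahead succeeds in "spam" at index 1, where only "pam" follows.
print(bare.search("spam").start())  # prints 1

# Newline caveat: "." stops at "\n" unless DOTALL is set.
print(has_match(anchored, "ham\nspam"), has_match(anchored_dotall, "ham\nspam"))  # prints True False
```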

ruakh

The lookahead is a zero-width assertion (that is, it checks a position in your string without consuming any characters). In your case it is a negative lookahead, making sure that the position is not followed by "zero or more characters, followed by the word spam". This is true for several positions in your string; see a demo on regex101.com without the anchor.

With the anchor, the lookahead is only tried at the very beginning of the string, so the whole string is checked for `spam`; see the altered demo on regex101.com as well.
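The regex101 demos aren't reproduced here, but a rough Python equivalent (Python is an assumption) lists every index where the lookahead succeeds. The helper `matching_positions` is hypothetical and only for illustration; it relies on `Pattern.match(text, pos)`, which attempts a match at exactly `pos`. Without `re.MULTILINE`, `^` matches only at the real start of the string, not at `pos`, so the anchored pattern can succeed at index 0 at most.

```python
import re

def matching_positions(pattern, text):
    """Return every index at which pattern succeeds when tried exactly there.

    Pattern.match(text, pos) attempts only position pos, so looping over all
    positions mirrors what re.search does as it scans the string.
    """
    return [pos for pos in range(len(text) + 1) if pattern.match(text, pos)]

bare = re.compile(r"(?!.*?spam)")
anchored = re.compile(r"^(?!.*?spam)")

text = "eggs and spam"
print(matching_positions(bare, text))                # prints [10, 11, 12, 13]
print(matching_positions(anchored, text))            # prints []
print(matching_positions(anchored, "eggs and ham"))  # prints [0]
```

The bare lookahead fails from index 0 through 9 (the rest of the string still contains `spam`) and succeeds from index 10 on, where only `pam`, `am`, `m`, or nothing remains.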

Jan