36

What is meant by [\s\S]* in regex in PHP? Does [\s\S]* actually match every string the same as .*?

waldyrious
yoyo

3 Answers

55

By default . doesn't match newlines - [\s\S] is a hack around that problem.
This is common in JavaScript, but in PHP you can use the /s flag to make the dot match all characters.
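
To make the difference concrete, here is a minimal sketch that captures what each pattern actually consumes. The sample string and the variable names are made up for illustration; only preg_match itself is standard PHP.

```php
<?php
// Hypothetical sample input: two lines separated by a single newline.
$text = "hello\nworld";

// Default mode: the dot stops at the first newline.
preg_match('/.*/', $text, $m);
$dotDefault = $m[0];   // only the first line

// [\s\S] is "whitespace or non-whitespace", so it crosses the newline.
preg_match('/[\s\S]*/', $text, $m);
$anyChar = $m[0];      // the whole string

// With the s flag the dot also matches the newline.
preg_match('/.*/s', $text, $m);
$dotAll = $m[0];       // the whole string

var_dump($dotDefault, $anyChar === $text, $dotAll === $text);
```

Under these assumptions the first match is just "hello", while the other two span the entire input.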

Kobi
  • When does `.` match new line? – yoyo Dec 28 '10 at 08:20
  • @BoltClock - Thanks, I was just getting to it, just had to confirm that's how it's done in PHP. – Kobi Dec 28 '10 at 08:20
  • @yoyo - I've changed "Sometimes" to "by default". It doesn't match unless you tell it to match. Just to be clear - there's probably no good reason to use it, unless you want mixed behavior in the same regex, which can also be achieved in better ways (which I see codaddict has already edited into his answer). – Kobi Dec 28 '10 at 08:22
  • If I am not mistaken, in JavaScript, you can use the `m` modifier for the same effect as `/s` in PHP. – PhiLho Jun 09 '14 at 09:00
  • 2
    @PhiLho - `/m` changes the meaning of `^` and `$`, so they also match at line breaks; it has no effect on `.`. Calling `/s` `singleline` is a historical error which causes confusion with `multiline`; it should be `Dot-Matches-All`. – Kobi Jun 09 '14 at 09:12
  • 1
    @Kobi You are right, so I was indeed mistaken... :-) https://developer.mozilla.org/en/docs/Web/JavaScript/Guide/Regular_Expressions for example is clear on the `.` and `m` semantics. – PhiLho Jul 11 '14 at 11:25
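
The `/m` versus `/s` distinction discussed in the comments above can be checked in PHP with a small sketch. The input string and variable names are assumptions chosen for illustration.

```php
<?php
// Hypothetical two-line input.
$text = "hello\nworld";

// m (multiline): ^ and $ also match at line boundaries,
// but the dot still refuses to match "\n", so each line matches separately.
$countM = preg_match_all('/^.*$/m', $text, $perLine);

// s (dot-all): the dot matches "\n", while ^ and $ keep their default
// meaning, so the whole string is a single match.
$countS = preg_match_all('/^.*$/s', $text, $whole);

var_dump($countM, $perLine[0], $countS, $whole[0]);
```

With `/m` the pattern should find two matches, one per line; with `/s` it should find one match covering both lines.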
22

The . metacharacter matches any character except a newline, so the pattern .*, which is meant to match anything, will not work if you have to match newlines as well.

preg_match('/^.*$/',"hello\nworld"); // returns 0

[\s\S] is a character class of whitespace and non-whitespace characters, so it matches any character, including a newline; so do [\d\D] and [\w\W]. Your pattern [\s\S]* therefore matches anything.

preg_match('/^[\s\S]*$/',"hello\nworld"); // returns 1
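
As a quick check that the three complementary classes mentioned above behave the same way, the following sketch runs each one against the same input without any modifier. The `$results` array and the loop are illustrative scaffolding, not part of the original answer.

```php
<?php
// Hypothetical two-line input.
$text = "hello\nworld";

// Each class pairs a shorthand with its negation, so together they
// cover every character, newline included.
$results = [];
foreach (['[\s\S]', '[\d\D]', '[\w\W]'] as $class) {
    // No s modifier here: the class alone is enough to cross the newline.
    $results[$class] = preg_match('/^' . $class . '*$/', $text);
}

var_dump($results);
```

Each entry should come back as 1, which suggests any of the three can stand in for a dot-all `.`.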

An alternative that makes . match anything (including a newline) is the s modifier.

preg_match('/^.*$/s',"hello\nworld"); // returns 1 

An alternative way of using the s modifier is to inline it:

preg_match('/^(?s).*(?-s)$/',"hello\nworld"); // returns 1

(?s) turns on the s mode and (?-s) turns it off. Once it is turned off, any following . will not match a newline.
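
To see the on/off scoping in isolation, here is a small sketch with one dot inside the (?s) region and one after (?-s). The pattern and the two test strings are invented for illustration.

```php
<?php
// The first dot is in dot-all mode; the second dot is back in default mode.
$pattern = '/^a(?s).(?-s)b.c$/';

// Newline sits under the first dot (s mode on): expected to match.
$newlineUnderS = preg_match($pattern, "a\nb-c");

// Newline sits under the second dot (s mode off): expected not to match.
$newlineAfterOff = preg_match($pattern, "a-b\nc");

// PCRE also accepts a scoped form, (?s:...), which limits the flag to one group.
$scoped = preg_match('/^a(?s:.)b.c$/', "a\nb-c");

var_dump($newlineUnderS, $newlineAfterOff, $scoped);
```

This is the "mixed behavior in the same regex" case: only the dots inside the s-mode region are allowed to consume a newline.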

codaddict
  • Does the last bit only modify the behavior of the adjacent `.`? I've not seen that syntax before. – BoltClock Dec 28 '10 at 08:24
  • 1
    @BoltClock - you can set or clear flags inline - it works for all dots until you clear it: `(?smi).*abc(?-smi).*aBc` – Kobi Dec 28 '10 at 08:28
3

[\s\S] A character set that matches any character including line breaks.

. Matches any character except line breaks.

Arley