I have this nice preg_match regex:

if(preg_match("%^[A-Za-z0-9ążśźęćńółĄŻŚŹĘĆŃÓŁ\.\,\-\?\!\(\)\"\ \/\t\/\n]{2,50}$%", stripslashes(trim($_POST['x'])))){...}

It should allow all characters that could be used in the eventual text content of a post. The problem is that, despite the \n, the function still doesn't work for new lines in my post, so input such as

foo

bar

would not work. Does anybody know why the function would not work properly?

Any help would be greatly appreciated.

aln447
  • Did you try with the m flag? – Quijote Shin Mar 22 '16 at 15:04
  • the m flag worked. Klaar was first tho – aln447 Mar 22 '16 at 15:05
  • @aln447: no it doesn't work, if you use the m modifier, preg_match will succeed if only one line of your string matches the pattern (try yourself with a first line with allowed characters and a second line with forbidden characters). Your problem is probably that your string uses a windows newline sequence `\r\n` (CRLF) and since `\r` isn't in your character class, it doesn't work. – Casimir et Hippolyte Mar 22 '16 at 15:55

1 Answer

By default a preg_match() with a pattern using ^ and $ will consider the whole string, even if it contains newlines.

This behaviour can be altered using Pattern Modifiers, of which I will list the ones that fit this topic:

  • s (PCRE_DOTALL): by default, the dot (.) will not match newlines, but by using the modifier s it will. However, character classes (e.g. [a-z] and [^a-z]) never treat the newline as a special character anyway, thus this modifier will not affect their behaviour like it will for the dot (.).

  • m (PCRE_MULTILINE): by default, the start (^) and end ($) anchors match the start and end of the whole string that is subjected to pattern matching, even if that string contains newlines. When this modifier is used, they also match at the start and end of every line within the string. So matching "foo\nbar\nbar" against /^[a-z]+$/m with preg_match_all() gives three matches (1: foo, 2: bar, 3: bar), whereas without the modifier, /^[a-z]+$/ gives no match at all, because the newline is not part of the character class. Note that the modifier does not let a single match span several lines.

  • D (PCRE_DOLLAR_ENDONLY): by default, the end ($) anchor will not only match the very end of a string, but also right before a trailing newline (trailing meaning: at the very end of the string). To undo this behaviour and make it strictly match only the very end of the string, use this pattern modifier.
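
The effect of the m and D modifiers can be checked with a short script. This is only a minimal sketch: the sample strings and variable names are made up for illustration, and the pattern is /^[a-z]+$/ so that a whole line of letters can match.

```php
<?php
$text = "foo\nbar\nbar";

// Without m: ^ and $ refer to the whole string, and "\n" is not in [a-z],
// so nothing matches.
$plain = preg_match_all('/^[a-z]+$/', $text, $a);

// With m: ^ and $ also match at line boundaries, so each line matches on its own.
$multi = preg_match_all('/^[a-z]+$/m', $text, $b);

// Pitfall for validation: with m, one acceptable line is enough for
// preg_match() to succeed, even if another line has forbidden characters.
$loose = preg_match('/^[a-z]+$/m', "foo\n<b>");

// D: without it, $ also matches right before a trailing newline.
$withoutD = preg_match('/^[a-z]+$/', "foo\n");
$withD    = preg_match('/^[a-z]+$/D', "foo\n");

var_dump($plain, $multi, $b[0], $loose, $withoutD, $withD);
```

The $loose case is why m is a poor fit for validating a whole multi-line input: it only proves that at least one line is acceptable.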

YOUR PROBLEM:

if(preg_match("%^[A-Za-z0-9ążśźęćńółĄŻŚŹĘĆŃÓŁ\.\,\-\?\!\(\)\"\ \/\t\/\n]{2,50}$%m", stripslashes(trim($_POST['x'])))){...}

I don't see much wrong with your pattern, except that inside a character class you are only required to escape \, -, ^ (only at the start of the character class) and ] (only when not at the start of the character class). The PHP documentation says that escaping other characters is not an error, though.

It might be, though, that your text snippet contains newlines in the form of \r\n and since \r is not included in the character class of your pattern, it will not be matched.
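
If carriage returns are indeed the cause, it could be reproduced and worked around roughly as follows. This is a sketch under assumptions: the character class is a simplified stand-in for the one in the question, $input is a made-up value using Windows-style line endings, and the D modifier is added only to keep $ strict.

```php
<?php
// Simplified stand-in for the character class in the question (assumption).
$pattern = '/^[A-Za-z0-9 \t\n]{2,50}$/D';

// Hypothetical input using CRLF line endings.
$input = "foo\r\n\r\nbar";

// Fails: "\r" is not in the character class.
$before = preg_match($pattern, $input);

// Option 1: normalize CRLF (and lone CR) to "\n" before validating.
$normalized = str_replace(["\r\n", "\r"], "\n", $input);
$after = preg_match($pattern, $normalized);

// Option 2: allow "\r" in the character class instead.
$withCr = preg_match('/^[A-Za-z0-9 \t\r\n]{2,50}$/D', $input);

var_dump($before, $after, $withCr);
```

Either option keeps the anchors tied to the whole string, so every character of the input is still checked.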

Since my original post mentioned the use of the pattern modifier m, and you replied that it worked, I wonder what the issue really was.

klaar
  • Sorry but your answer is totally false. The modifier m doesn't allow a match to span multiple lines (the word "multiline" is a bit misleading in this context). All the m modifier does is change the meaning of the `^` and `$` anchors from start and end of the string to start and end of the line. – Casimir et Hippolyte Mar 22 '16 at 15:42
  • @CasimiretHippolyte: You are right of course! I have rewritten my answer to better reflect the truth about Pattern Modifiers and how they may influence the behaviour of preg-functions. I still wonder, though, how my wrong answer might have helped OP. – klaar Mar 22 '16 at 16:21
  • OP believes that your wrong answer was good because he obtained a match (the first line that matches the pattern, i.e. `foo`) with a string that should pass (and he didn't ask questions or test with other strings mixing lines with allowed characters and lines with forbidden characters, which should not pass). But the pattern doesn't check the whole string. – Casimir et Hippolyte Mar 22 '16 at 17:04
  • so what do you people propose as the actual correct way to deal with the issue? The answer provided did do the trick for me, so I accepted it. Not a pro in php – aln447 Mar 22 '16 at 17:25
  • @aln447 Can you check the string you wanted to match your pattern to and see if there is indeed a carriage return (`\r`) in it that will not be matched because it isn't included in the character class provided by your pattern? That's my only hunch right now. You can use any semi-advanced editor like Notepad++ and turn on the feature that displays all characters, including whitespace (meaning space, tab and newlines). – klaar Mar 23 '16 at 08:09
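
The same check can also be done from PHP itself instead of an editor. A minimal sketch, where $input is a hypothetical value standing in for the submitted $_POST['x']:

```php
<?php
// Hypothetical value standing in for the submitted text.
$input = "foo\r\nbar";

// Does the string contain a carriage return at all?
$hasCr = strpos($input, "\r") !== false;

// json_encode() renders control characters as visible escape sequences.
$visible = json_encode($input);

var_dump($hasCr);
echo $visible, "\n";
```

If the printed string shows `\r\n` between the lines, the carriage return is what the character class rejects.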