5

I would like to validate a textarea and I just don't get regex (It took me the day and a bunch of tutorials to figure it out).

Basically, I would like to allow everything (line breaks and carriage returns included) except the characters that could be malicious (those that could lead to a security breach). Since very few characters are disallowed, I assume it would make more sense to create a blacklist than a whitelist.

My question is: what is the standard "everything but" in Regex?

I'm using JavaScript and jQuery.

I tried this but it doesn't work (it's awful, I know..):

var messageReg = /^[a-zA-Z0-9éèêëùüàâöïç\"\/\%\(\).'?!,@$#§-_ \n\r]+$/;

Thank you.

Baylock
  • 1
    What do you mean by it doesn't work? What's the code that is using that regex? – Juan Mendes Aug 23 '12 at 18:11
  • 4
    This will not do anything for security. A form can be submitted without javascript and even without using a browser (since it's just a certain type of HTTP request). – Esailija Aug 23 '12 at 18:13
  • 1
    RegEx isn't really the right way to be safe from security breaches. This [thread](http://stackoverflow.com/questions/24723/best-regex-to-catch-xss-cross-site-scripting-attack-in-java) might be of interest. Defense against XSS isn't done with regex. – sQVe Aug 23 '12 at 18:14
  • 1
    JavaScript validation is there to save users the time of waiting for a server trip to come back with a validation error. It does absolutely nothing for security. – jbabey Aug 23 '12 at 18:16
  • 1
    What do you want to accept and what do you want to exclude? – Toto Aug 23 '12 at 18:16
  • Juan Mendes: I tried to type an angle bracket and it was accepted. @Esailija: The values are filtered in PHP after submission, but I need to know whether an unwanted character has been typed on the front end in order to show an inline message to the user (it's a form in a lightbox). – Baylock Aug 23 '12 at 18:18
  • 1
    @Baylock "I tried to type an angle bracket" doesn't tell me the code you used to strip it out, you're probably not using it correctly, see my answer – Juan Mendes Aug 23 '12 at 18:21
  • Thank you everyone for your help. Obviously, and as mentioned before, I have some catching up to do with regex. I misspoke when I talked about security. That applies at the PHP level of my contact form, but here I need to set up my form fields' behavior so that an error is shown to the user before the PHP file is reached. – Baylock Aug 23 '12 at 18:33

3 Answers

19

If you want to exclude a set of characters (some punctuation characters, for example) you would use the ^ operator at the beginning of a character set, in a regex like

/[^.?!]/

This matches any character that is not ., ?, or !.
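
A minimal sketch of how that negated set behaves in JavaScript. The helper name `hasOtherCharacter` is hypothetical and only illustrates the matching rule; it is not taken from the question's code:

```javascript
// Negated character set: matches a single character that is NOT ., ?, or !
const notPunctuation = /[^.?!]/;

// Hypothetical helper: true if the text contains at least one
// character outside the excluded set.
function hasOtherCharacter(text) {
  return notPunctuation.test(text);
}

console.log(hasOtherCharacter("Hi!")); // true: "H" is not ., ?, or !
console.log(hasOtherCharacter("?!.")); // false: every character is in the set
```

Without anchors, the pattern only needs one matching character somewhere in the string, so on its own it does not check the whole input.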

murgatroid99
  • 2
    On Stack Overflow we usually prefer that you use votes instead of comments to say that a question is helpful because comments like that are generally considered noise and votes are more useful to others. – murgatroid99 Aug 23 '12 at 18:34
  • 1
    Yeah, the reason I do that is that I don't know how to vote here. All I have is the ability to check a green flag. But there is only one flag allowed and too many good answers. I'm trying to figure out this thing but still no clue. – Baylock Aug 23 '12 at 18:37
  • 2
    You should be able to click the up or down arrow to vote up or down – murgatroid99 Aug 23 '12 at 18:38
  • 1
    Thanks. Not so long ago, I wasn't able to do that. Anyway, I voted! – Baylock Aug 23 '12 at 18:39
  • 2
    @Baylock You can only vote up/down after a certain rep, that's why you couldn't before – Juan Mendes Aug 23 '12 at 18:49
8

You can use the ^ as the first character inside brackets [] to negate what's in it:

/^[^abc]*$/

This means: "from start to finish, no a, b, or c."
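
Applied to the textarea case, this could look roughly like the sketch below. It assumes angle brackets are the characters to reject (they come up in the question's comments, but the real blacklist is not given), and `isMessageAllowed` is a hypothetical name:

```javascript
// Anchored negated set: from start to finish, no < or > anywhere.
// A negated set also matches \n and \r, so multi-line text is accepted.
const allowedMessage = /^[^<>]*$/;

// Hypothetical helper for front-end validation of a textarea value.
function isMessageAllowed(text) {
  return allowedMessage.test(text);
}

console.log(isMessageAllowed("line one\r\nline two")); // true
console.log(isMessageAllowed("hello <b>world</b>"));   // false
```

Because of the `*` quantifier an empty string also passes; swapping it for `+` would require at least one character.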

Westy92
Krycke
1

As Esailija mentioned, this won't do anything for real security.

The code you posted is almost a negated set. As murgatroid99 mentioned, the ^ goes inside the brackets, and the regular expression will then match anything that is not in that list. But it looks like you really want to strip out those characters, so your regexp doesn't need to be negated.

Your code should look like:

str.replace(/[a-zA-Z0-9éèêëùüàâöïç\"\/\%\(\).'?!,@$#-_ \n\r]/g, "");

That says: remove all the characters listed in the regular expression.

However, that also says you don't want to keep a-zA-Z0-9. Are you sure you want to strip those out?

Also, Chrome doesn't like § in regular expressions; you have to use \x along with the hex code for the character.
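
As a rough sketch of the strip-out approach with a \x escape, assuming a small hypothetical blacklist (angle brackets plus §, whose hex code is A7) rather than the full list above; `stripUnwanted` is an illustrative name:

```javascript
// Hypothetical blacklist: < and >, plus the section sign written as \xA7.
const unwanted = /[<>\xA7]/g;

// Hypothetical helper: removes every blacklisted character.
function stripUnwanted(text) {
  // The g flag makes replace remove all matches, not only the first one.
  return text.replace(unwanted, "");
}

console.log(stripUnwanted("a<b>c\u00A7d")); // "abcd"
```

One thing to keep in mind with the longer pattern: inside brackets, `#-_` is read as a range from # to _, which includes < and >. A literal hyphen is usually escaped as `\-` or placed last in the set.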

Juan Mendes
  • Basically, I wanted to allow those characters (it's a large set, but it was meant to be trimmed down later). Thank you for the explanation. – Baylock Aug 23 '12 at 18:29