
I am using clish and regular expressions for parameter entry. http://clish.sourceforge.net/clish-0.7.3/group__clish__ptype.html

I am whitelisting characters like so:

 pattern="[a-zA-Z0-9\!\[\£\$\%\/\^\_\+\=\#\@\;\,\|\*\{\}\(\)\~\.\&gt;\&lt;\&amp;\-]+"

This works fine: I can enter any of the specified characters. However, if I add \] or \\] to escape a right square bracket, it does not work. The ] is matched with the opening [ and closes the character class, so ] cannot be entered, nor can anything listed after it. Any ideas how to escape it so that ] is accepted as a valid character? [ works fine.

hjpotter92
Paul
  • Try to debug using this tool: http://regex101.com/ – MightyPork Jul 11 '14 at 11:39
  • It's not working, but what is it doing then? Is it throwing an error, or matching the wrong thing, or...? – Kendall Frey Jul 11 '14 at 11:41
  • `\&gt;\&lt;\&amp;` ? Why would you need to escape those as html entities? – MightyPork Jul 11 '14 at 11:42
  • @MightyPork It matches fine on that site if I add \] – Paul Jul 11 '14 at 11:43
  • One severe unrelated issue with your regex is that `[]` only matches a single character at a time, not `&amp;`, which is 5 characters. – Kendall Frey Jul 11 '14 at 11:43
  • @KendallFrey Basically, if the pattern above works, those characters are allowed as input. If they are not allowed, you get a message along the lines of "invalid character". This is what happens for ]. It is not being escaped and is instead matched with [, so every ] is an invalid character – Paul Jul 11 '14 at 11:44
  • @MightyPork This is in an xml file, as I am using clish. – Paul Jul 11 '14 at 11:45
  • @Paul Could you explain how to reproduce this? `\]` is certainly valid. – Kendall Frey Jul 11 '14 at 11:45
  • @KendallFrey To reproduce this you would need to be using the same thing I am, clish. http://clish.sourceforge.net/ That doesn't seem practical for you, so I was wondering if there was any possible reason anybody could think of for why this is happening. As you say, \] should be valid, but it is not escaping it. – Paul Jul 11 '14 at 11:47
  • @Paul How are you escaping it? It should be escaped by \. If the language you're using treats \ as a string escape character, you'll have to escape it twice. – Kendall Frey Jul 11 '14 at 11:48
  • @KendallFrey I am using \ and have tried \\ too; neither worked. I use \ for everything else and it works. Maybe it is a problem with klish, I don't know. It just always sees ] as a matching bracket, no problem escaping [ :/ – Paul Jul 11 '14 at 13:08
  • It might be worth changing the order of items in your pattern and seeing if that makes a difference. I would try this: `"[-\]\.\^\\![$%/£+=#@;,|*{}()~<>&\w]+"`, encoding only the minimum you need to get it into valid XML. This pattern is also a little simpler than the one you have, since it removes some unneeded escaping and uses \w instead of a-zA-Z0-9. – mjk Jul 11 '14 at 13:08
  • @mjk I simply moved the ] to the first position after [ and it works. Good tip about all my escaping – Paul Jul 11 '14 at 13:22
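
The point about XML entities in the comments above can be checked with a small sketch. It assumes Python's standard `xml.etree.ElementTree` parser and a made-up `PTYPE` element with a shortened pattern. clish uses its own XML loader, so this only illustrates the general XML rule that entities in an attribute are decoded before the regex engine ever sees the pattern.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document; the pattern is shortened for illustration.
xml_source = '<PTYPE name="DEMO" pattern="[]a-z&gt;&lt;&amp;-]+"/>'

# The parser decodes &gt; &lt; &amp; into single characters in the attribute value.
pattern = ET.fromstring(xml_source).get("pattern")

print(pattern)       # []a-z><&-]+
print(len(pattern))  # 11
```

So inside the character class each entity stands for exactly one character, not for a multi-character substring.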

2 Answers


Try this pattern

pattern="[][a-zA-Z0-9!£$%/^_+=#@;,|*{}()~.&-]+"

The literal closing square bracket must be at the first position in the character class to avoid ambiguity with the closing square bracket that closes the character class (since an empty character class is not allowed). You can put the opening square bracket anywhere else you want (but not at the first position, which is taken by the ], and not after the final -, which must stay last to remain a literal hyphen).
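
As a rough illustration of the placement rule (not a reproduction of clish itself), the sketch below uses Python's `re` module. That choice is an assumption made for convenience: Python's engine is not POSIX and would also accept `\]`, but it likewise treats a `]` in the first position of a class as a literal. The helper name `is_valid` is hypothetical.

```python
import re

# ']' comes first so it is a literal member of the class;
# '-' comes last so it is a literal hyphen rather than a range operator.
PATTERN = re.compile(r"[][a-zA-Z0-9!£$%/^_+=#@;,|*{}()~.&-]+")

def is_valid(text):
    """Return True when every character of text is in the whitelist."""
    return PATTERN.fullmatch(text) is not None

print(is_valid("a]b[c"))  # True: both brackets are accepted
print(is_valid("a b"))    # False: the space is not whitelisted
```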

Casimir et Hippolyte
  • But that's only because the whole regex is in a string. If I understand it correctly, this "clish" is some sort of XML format, so [ and ] are not special, neither is \. – MightyPork Jul 11 '14 at 11:50
  • What's the point of `{1,}` instead of `+`? Is there something really cool I'm missing? – Kendall Frey Jul 11 '14 at 12:59
  • @KendallFrey: `+` doesn't exist in the basic POSIX syntax. – Casimir et Hippolyte Jul 11 '14 at 13:00
  • @MightyPork: No, this is not related to the fact that the pattern is in a string. It is related to the POSIX syntax, which doesn't recognize escaped characters in a character class. The way (tested with sed) is probably to put the square brackets at the beginning of the character class in this order, to avoid ambiguity with the closing square bracket. – Casimir et Hippolyte Jul 11 '14 at 13:03
  • @KendallFrey: However, I think you are right, because the examples in the clish documentation seem to use the POSIX extended syntax (which allows the use of `+`) – Casimir et Hippolyte Jul 11 '14 at 13:22
  • @CasimiretHippolyte This did not work off the bat, but I took your advice: I can simply move the `]` to the start and it works `pattern="[]a-zA-Z0-9\!\[\£\$\%\/\^\_\+\=\#\@\;\,\|\*\{\}\(\)\~\.\&gt;\&lt;\&amp;\-]+"` I can't use things like `\s` but I can use `+?*` and so on. – Paul Jul 11 '14 at 13:23
  • @CasimiretHippolyte I don't understand why it doesn't care about escaping in some cases and it does in others, any good documentation on that? I am confined to having `[]` to avoid ambiguity? why can I not have `[` after `-`? – Paul Jul 11 '14 at 13:26
  • Having to use [] as an empty character class to avoid ambiguity seems more like a hack than the proper way to do it, which I don't mind at all, but I am wondering why this is so. Does the POSIX syntax just not care that I have escaped it? – Paul Jul 11 '14 at 13:32
  • @Paul: in most regex flavors, all special regex characters lose their special meaning when they are in a character class and don't need to be escaped. But there are exceptions like `]` `-` `^`. In other flavors you can escape these special characters, but with POSIX this seems not to be the case. – Casimir et Hippolyte Jul 11 '14 at 13:32
  • @Paul: it is not an empty character class. It is a character class where the first character in it is the `]`. If you want more information about this syntax, you can take a look here: http://stackoverflow.com/questions/17845014/what-does-the-regex-mean/17845034#17845034 – Casimir et Hippolyte Jul 11 '14 at 13:33
  • @CasimiretHippolyte Thanks, this is the final one, I can enter `] - ^` ok. I have learned a lot from this actually, thanks again. `pattern="[]a-zA-Z0-9![£$%/^_+=#@;,|*{}()~.&gt;&lt;&amp;-]+"` I will need to document that `]` needs to be the first character in a character class! – Paul Jul 11 '14 at 13:38
  • @Paul: As other comments said, a character class is for characters, not for substrings: `[&gt;]` is the same as `[;&tg]`, so you must put all the html entities outside the character class. – Casimir et Hippolyte Jul 11 '14 at 13:41
  • @CasimiretHippolyte Why is this? It compiles and works fine, translating them to the correct characters. `&gt;` is parsed and seen as `>` correctly. It represents a character, not a substring? – Paul Jul 11 '14 at 13:48
  • @Paul: because all needed characters are present in the class. But you can write this instead: `pattern="[]a-zA-Z0-9![£$%/^_+=#@;,|*{}()~.&-]+"` you will obtain the same result. – Casimir et Hippolyte Jul 11 '14 at 13:51
  • @CasimiretHippolyte I think you are confused about why I need to enter them like that. This is written in xml, where you can't simply use & as it is an illegal character. `&amp;` is what you have to use; writing the above would give an invalid file. `&amp;` is the same as writing `&` in this case, it is the character itself. – Paul Jul 11 '14 at 13:59
  • @Paul: Ok, in this case, if it doesn't matter to include the `?` and the `'` in the character class you can avoid to write these literal characters `>` ` – Casimir et Hippolyte Jul 11 '14 at 14:09
  • @CasimiretHippolyte That works but I don't understand why! Pretty cool. Unfortunately I can't have `'` in the character class but I want to learn why this works anyway! – Paul Jul 11 '14 at 14:18
  • @Paul: It works because the needed characters are included in the ranges `#-'` and `;-@`. Take a look at the ASCII table and you will understand. – Casimir et Hippolyte Jul 11 '14 at 14:29
  • @CasimiretHippolyte Ah makes perfect sense, thanks very much. – Paul Jul 11 '14 at 14:31
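
The two ranges mentioned in the last comments, `#-'` and `;-@`, can be expanded with a short sketch. The helper name `expand_range` is made up for illustration; it simply walks the ASCII code points between the two endpoints, which is how a character-class range is interpreted in the default C/ASCII locale.

```python
def expand_range(first, last):
    """Return every character from first to last inclusive, by code point."""
    return "".join(chr(code) for code in range(ord(first), ord(last) + 1))

print(expand_range("#", "'"))  # #$%&'
print(expand_range(";", "@"))  # ;<=>?@
```

The first range covers `&` and the second covers `<` and `>`, so none of the three needs to be written as an XML entity. The cost is that `'` and `?` are whitelisted as well, which is why the trick did not suit the asker.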

Try this :

.*[~!@#$%^&*()_+-={}|\\\]\[:";'<>?,./].*

Verify Regex Here

Sheraz Ahmed