I have seen the regular expression @"^[A-Z]+[a-zA-Z''-'\s]*$" used to check whether a string is a person's name, and I have the same question as the one asked in Regular Expression format meaning of [RegularExpression(@“^[A-Z]+[a-zA-Z''-'\s]*$”)] [duplicate]. Namely, what is the intention of [''-']?

Since [''-'], ['] and ['-'] all match a single apostrophe, ['] should be enough and ['-'] looks redundant. However, I have seen this expression in several MSDN articles and in books, so it seems to be a common pattern. I think there should be a legitimate reason to include ['-']. The comments on the question above don't go into this point.

Added

Just for reference, I have added the Microsoft articles that use this expression. Note that the '' before the hyphen is two consecutive single quotes, not a double quote.

emoacht
  • It does seem unnecessary, and perhaps the creator of the expression enclosed the hyphen in primes to match the hyphen rather than have it interpreted as an indicator of a range. That said, it should have been escaped with a backslash if that were the intention. – Brian Warshaw Oct 16 '15 at 14:45
  • It could also be that the regex originally contained the stylized apostrophes/single quotes that Microsoft Word converts a normal single quote to, but the regex somehow lost those characters. Pasting this into notepad would cause this to happen. This is where dev comments come in handy! – ps2goat Oct 16 '15 at 14:51
  • @ps2goat Pasting from Word to notepad does not convert stylized characters to straight quotes. It's possible that some online form did so to avoid having smart quotes show up as unicode replacement characters, but I don't think it's likely. It also wouldn't explain why there are two extra instances of the prime. – Brian Warshaw Oct 16 '15 at 15:34
  • Given the context I think Brian's suggestion makes a lot of sense: "checking whether a string is person's name" includes last names, which are well-known to have dashes in them. I can easily imagine the creator of this regex just didn't test it properly, and the result got pasted around. (This situation calls for a link to [Falsehoods Programmers Believe about Names](http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/).) Maybe you should contact one of the authors? – 31eee384 Oct 16 '15 at 16:31
  • It sounds quite possible that the original intention was to add a hyphen for matching. I should ask the authors for clarification. – emoacht Oct 16 '15 at 16:43

1 Answer

The hyphen in a regex character class has two meanings: it either defines a range between two characters, as in the classic [a-z], or it is simply a literal hyphen.

For the specific regex you posted, [''-'], I think you may mean ["-'], which is explained below. If the pattern really is [''-'], it is a very ugly way to match just the literal character ' (or it is simply a wrong regex).

Most people only use the hyphen in the common ranges [A-Z], [a-z] or [0-9], but the behavior is more general: a range covers every character whose code lies between the two endpoints, following the order of the ASCII table.


Range from " to '

So, if you have a range with ["-'] you will accept the characters: "#$%&'
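To make the range concrete, here is a minimal sketch that lists every printable ASCII character accepted by ["-']. It uses Python's `re` module as a stand-in for the .NET engine from the question, on the assumption that both resolve this simple ASCII range the same way.

```python
import re

# Character class with a range from '"' (0x22) to "'" (0x27).
pattern = re.compile(r"""["-']""")

# Collect every printable ASCII character that the class accepts.
matched = "".join(chr(c) for c in range(32, 127) if pattern.fullmatch(chr(c)))

print(matched)  # "#$%&'
```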


Range from ' to ' (literal ')

Likewise, if you use ['-'], you are using a range from ' to ', which is the same as using the literal character ':
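A quick way to check this equivalence is to compare the sets of characters each class accepts. The sketch below again uses Python's `re` as an assumed stand-in for the .NET engine; the helper name `accepted` is only illustrative.

```python
import re

range_class = re.compile(r"['-']")  # range whose start and end are both '
plain_class = re.compile(r"[']")    # just the literal '

def accepted(pattern):
    """Return the set of ASCII characters the one-character class matches."""
    return {chr(c) for c in range(128) if pattern.fullmatch(chr(c))}

same = accepted(range_class) == accepted(plain_class)
print(same, accepted(range_class))
```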


Hyphen as literal in character class

On the other hand, if you put the hyphen at the end of the character class, it won't act as a range operator and will be treated as a literal hyphen (the same happens if you escape it with a backslash, \-):
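The difference between the three placements can be sketched as follows, using Python's `re` as an assumed stand-in for the .NET engine. The example classes are made up for illustration and do not come from the question.

```python
import re

trailing = re.compile(r"[a-z-]")   # hyphen at the end: literal hyphen plus a-z
escaped = re.compile(r"[a\-z]")    # escaped hyphen: only 'a', '-' and 'z'
as_range = re.compile(r"[a-z]")    # unescaped, between two characters: a range

trailing_hyphen = trailing.fullmatch("-") is not None   # True
escaped_hyphen = escaped.fullmatch("-") is not None     # True
escaped_m = escaped.fullmatch("m") is not None          # False, no range here
range_hyphen = as_range.fullmatch("-") is not None      # False, '-' not in a-z

print(trailing_hyphen, escaped_hyphen, escaped_m, range_hyphen)
```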


The redundant pattern for a single quote: [''-']

As an additional comment on the specific pattern you posted: [''-'] contains a literal ' followed by the range from ' to ', so it is just a redundant way to match the single literal character '.
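As a rough check, the sketch below compares the full pattern from the question with a simplified version that uses a single apostrophe, and with a hypothetical variant that escapes the hyphen. That third pattern is only the commenters' guess at what the authors may have intended, not something the source confirms. Python's `re` is used as an assumed stand-in for the .NET engine.

```python
import re

original = re.compile(r"^[A-Z]+[a-zA-Z''-'\s]*$")    # pattern from the question
simplified = re.compile(r"^[A-Z]+[a-zA-Z'\s]*$")     # same class, one apostrophe
with_hyphen = re.compile(r"^[A-Z]+[a-zA-Z'\-\s]*$")  # hypothetical: literal hyphen

names = ["O'Neil", "Anne Marie", "Smith-Jones", "smith"]

for name in names:
    results = [bool(p.match(name)) for p in (original, simplified, with_hyphen)]
    print(f"{name!r}: {results}")
```

On these samples the original and simplified patterns agree, and only the hypothetical variant accepts the hyphenated name.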

Federico Piazza
  • @Downvoter, any comment to improve the answer since you didn't find it useful? – Federico Piazza Oct 16 '15 at 16:04
  • Thanks for your informative explanation. If the first character were a double quote, your first explanation would make sense. However, they are two consecutive single quotes in the articles which I added. I don't know whether the authors would make this kind of mistake. – emoacht Oct 16 '15 at 16:29
  • @emoacht I've covered all the scenarios; for your two consecutive single quotes, see the last explanation. – Federico Piazza Oct 16 '15 at 16:48
  • Yes, the last explanation is exactly why I thought it is redundant. – emoacht Oct 16 '15 at 17:11
  • @emoacht It is redundant; however, I think the person who wrote that regex simply got it wrong... or the first reader of that regex mistook `''` for `"`. Anyway, did my answer solve your question? – Federico Piazza Oct 16 '15 at 17:22