1

I have just started with regular expressions and was solving this question, in which the task is to check whether a username is valid. A valid username has the following properties:

  1. The username can contain alphanumeric characters and/or underscores(_).
  2. The username must start with an alphabetic character.

  3. 8<=(Username Length)<=30.

I am using this as my reference that says

\w Matches the word characters.

and I came up with a solution like this: String pattern = "^\\w(\\d|\\w|_){7,29}$"; which is not the correct solution. After searching for a while, I found that the correct solution is

String pattern = "^[a-zA-Z][a-zA-Z0-9_]{7,29}$"; which is pretty clear to understand.

What I want to confirm is whether (\\w|\\d|_) is equivalent to [a-zA-Z0-9_] or not.

I think they are, because String pattern = "^[a-zA-Z](\\w|\\d|_){7,29}$"; is accepted for all test cases.

Also, this Stack Overflow post has two different expressions given as equivalent to \\w in its answers, with one upvote each. I want to know which one is correct: [A-Za-z\s] or [A-Za-z0-9_]?

Aman Tugnawat
  • I'm not sure this needs to be a question on Stack Overflow. It's fairly common knowledge with people who know regex that `\w` is equivalent to `[A-Za-z0-9_]` – 4castle Sep 21 '16 at 02:35
  • I guess so, but there are still questions relating to this here, and they have confusing answers like [this](http://stackoverflow.com/questions/28226004/regex-difference-between-a-za-z-s-and-w-d). I believe this could help some beginner like me. – Aman Tugnawat Sep 21 '16 at 02:46
  • I don't know, but if you feel this is totally inappropriate then I could remove it. As for me, it took me a while to discover that \w is [A-Za-z0-9_] and not just [A-Za-z], as it's usually described in a way that's not so specific. For example, [here](http://www.tutorialspoint.com/java/java_regular_expressions.htm) it just says "\w - Matches the word characters". – Aman Tugnawat Sep 21 '16 at 02:55
  • Look at the docs: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html, it lists the predefined character classes. – John Sep 21 '16 at 02:57
  • Yeah, I see that. I believe I should have taken that as a reference, but will this post create any confusion? – Aman Tugnawat Sep 21 '16 at 03:01
  • 1
    I agree that it is confusing for beginners. That tutorial was really vague in what it does, but your question is targeted to that specific combination of characters, and perhaps highlighting what `\w` means would be a better Q&A for beginners to see. It could be boiled down a bit so that others can quickly see if this question is what they're looking for. – 4castle Sep 21 '16 at 03:02
  • Thanks for your advice. I have made some changes to the title; is there anything else I should do? – Aman Tugnawat Sep 21 '16 at 03:14

4 Answers

5

Yes, according to the Java summary of regular expression constructs found here: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html,

\d  A digit: [0-9]
\w  A word character: [a-zA-Z_0-9]

So (\w|\d|_) is equivalent to ([a-zA-Z_0-9]|[0-9]|_), where the extra underscore and \d are redundant since both are already included in \w.

(\w|\d|_) is equivalent to (\w)
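
One way to check this claim rather than take it on trust is to compare the three forms character by character. The following is a minimal sketch, assuming Java's default (non-Unicode) matching mode; the class name `WordClassEquivalence` and the method `agreeOnAscii` are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class WordClassEquivalence {
    static final Pattern ALTERNATION = Pattern.compile("(\\w|\\d|_)");
    static final Pattern SHORTHAND = Pattern.compile("\\w");
    static final Pattern EXPLICIT = Pattern.compile("[a-zA-Z0-9_]");

    // Returns true if all three patterns agree on every
    // single-character string in the ASCII range 0..127.
    static boolean agreeOnAscii() {
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            boolean a = ALTERNATION.matcher(s).matches();
            boolean b = SHORTHAND.matcher(s).matches();
            boolean e = EXPLICIT.matcher(s).matches();
            if (a != b || b != e) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("All three forms agree on ASCII: " + agreeOnAscii());
    }
}
```

The check is limited to ASCII on purpose, since that is the range the documented definition `[a-zA-Z_0-9]` describes.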

John
2

Okay, so after thinking this over for a while and trying some different solutions to the question:

\w is, in fact, equivalent to [A-Za-z0-9_] which is also given in the official documentation. https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html

not [a-zA-Z\s] as stated in this answer.
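
The difference between the two candidate classes shows up on just three inputs: a digit, an underscore, and a space. Here is a small sketch, assuming Java's default matching mode; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class WordVsWhitespace {
    // True if the whole string is a single \w character.
    static boolean matchesWord(String s) {
        return Pattern.matches("\\w", s);
    }

    // True if the whole string is a single letter or whitespace character.
    static boolean matchesLettersOrSpace(String s) {
        return Pattern.matches("[a-zA-Z\\s]", s);
    }

    public static void main(String[] args) {
        String[] samples = {"7", "_", " "};
        for (String s : samples) {
            System.out.println("'" + s + "' \\w=" + matchesWord(s)
                    + " [a-zA-Z\\s]=" + matchesLettersOrSpace(s));
        }
    }
}
```

The two classes disagree on all three samples, so `[a-zA-Z\s]` cannot be a substitute for `\w`.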

As for the question, String pattern = "^[a-zA-Z]\\w{7,29}$"; is accepted for all the test cases and seems to me the shortest answer possible.

Therefore, although (\\w|\\d|_) is equivalent to [a-zA-Z0-9_], using \\w alone is sufficient.
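
For anyone who wants to try the shortened pattern outside the judge, here is a minimal sketch of a validator built on it. The class name `UsernameValidator`, the method `isValid`, and the sample usernames are assumptions for illustration, not part of the original exercise.

```java
import java.util.regex.Pattern;

// Hypothetical helper; names and sample inputs are illustrative only.
class UsernameValidator {
    // One leading letter, then 7 to 29 word characters: 8 to 30 in total.
    private static final Pattern USERNAME =
            Pattern.compile("^[a-zA-Z]\\w{7,29}$");

    static boolean isValid(String username) {
        // matches() requires the entire input to match the pattern.
        return USERNAME.matcher(username).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"Samantha_21", "1Samantha", "Sam_21", "Samantha?10"};
        for (String s : samples) {
            System.out.println(s + " -> " + (isValid(s) ? "Valid" : "Invalid"));
        }
    }
}
```

Because `matches()` already demands a full match, the `^` and `$` anchors are redundant in this sketch. They are kept so the same pattern string also works with `find()`.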

P.S. Always stick to official documentation when in doubt during the learning phase and not anybody's answer or tutorial anywhere. Hope this helps someone with the same doubt.

Edit: Thank you @4castle @trey for your suggestions.

Aman Tugnawat
2

In regex, \w is equivalent to [a-zA-Z0-9_], so it will match a single character such as a, B, 3, or _. To match whole words you would have to use \w+, where the plus means one or more times. https://regex101.com is a great website for testing regexes and finding out what they do.
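
The difference between \w and \w+ is easiest to see by listing every match in a sample string. This is a rough sketch in Java; the helper `findAll` and the sample input are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and input are illustrative only.
class WordTokens {
    // Collects every non-overlapping match of regex in input, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "a_1 B2";
        System.out.println(findAll("\\w", input));   // one entry per character
        System.out.println(findAll("\\w+", input));  // one entry per run of word characters
    }
}
```

With \w each match is a single character, and the space is skipped. With \w+ each run of word characters comes back as one match.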

0

\w stands for “word character”. Exactly which characters it matches differs between regex engines.

  1. In all engines, it will include [A-Za-z].
  2. In most, the underscore and digits are also included.
  3. In some engines, word characters from other languages may also match.

The best way to find out is to run a couple of tests with the regex engine you are using: write a test string and search it with the regex \w to see what it matches.
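
As an example of such a test in Java, the sketch below probes a few characters with and without the `Pattern.UNICODE_CHARACTER_CLASS` flag, which switches the predefined classes to their Unicode definitions. The class name `WordCharProbe`, the helper `isWordChar`, and the sample characters are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical probe; names and sample characters are illustrative only.
class WordCharProbe {
    // True if s is a single character that \w matches under the given flags.
    static boolean isWordChar(String s, int flags) {
        return Pattern.compile("\\w", flags).matcher(s).matches();
    }

    public static void main(String[] args) {
        // "\u00e9" is the letter e with an acute accent.
        String[] samples = {"a", "7", "_", " ", "\u00e9"};
        for (String s : samples) {
            boolean ascii = isWordChar(s, 0);
            boolean unicode = isWordChar(s, Pattern.UNICODE_CHARACTER_CLASS);
            System.out.println("'" + s + "' default=" + ascii + " unicode=" + unicode);
        }
    }
}
```

In Java's default mode the accented letter is not a word character, while with the flag set it is. This is the kind of engine-specific and mode-specific difference described above.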

Rizwan M.Tuman
  • 1
    `In most, the underscore and digits are also included` - can you please give me an example of when this isn't the case? – Shadow Feb 08 '17 at 22:47