
The pattern that I have so far using regex

Pattern regex = Pattern.compile("^.*?//([^:/\\s]+)(.*(?=\\?|#))", Pattern.DOTALL);

On the string https://url.spec.whatwg.org/#url-syntax, it successfully grabs just the /, since I am trying to stop before ? and #. The problem arises when I try https://url.spec.whatwg.org/ instead.

The whitespace at the end is preventing it from finding / in group 2. I have tried including \p{Blank} in the lookahead, however it did nothing.

"https://www.google.com/search?q=Regular+Expressions&num=1000"

Same for the string above: it grabs the /search before the ?, but as soon as I try "https://www.google.com/search" it breaks down.

How can I fix this?

Thank you for your time!

nelac123
  • `The whitespace at the end is preventing it from finding` just [`trim()`](https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#trim()) it. Also, [read this SO Answer](http://stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url) – Bagus Tesa Dec 02 '16 at 02:48
  • I've tried trimming all whitespace, but it still can't find the token when nothing follows the / – nelac123 Dec 02 '16 at 02:57

1 Answer


The answer below assumes that the input is a URL and that we only want the part before the query string or fragment. Try this:

(http)s?:\/\/[^#?]+

You could replace (http)s? with (.+) if you want your old catch-all approach, or list the protocols explicitly, like (http|ftp|...)s?.
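For reference, here is a minimal Java sketch of how the pattern above might be used. The class name UrlPrefix and the helper stripQueryAndFragment are made up for illustration, and the call to trim() is an assumption based on the trailing-whitespace issue mentioned in the question. Since the pattern is written as a Java string literal here, the slashes need no escaping.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
class UrlPrefix {
    // Scheme, "://", then everything up to (but not including) the first '?' or '#'.
    private static final Pattern URL_PREFIX = Pattern.compile("(http)s?://[^#?]+");

    // Returns the URL without its query string and fragment, or null if nothing matches.
    static String stripQueryAndFragment(String url) {
        // trim() is an assumption: it drops leading/trailing whitespace before matching.
        Matcher m = URL_PREFIX.matcher(url.trim());
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "https://url.spec.whatwg.org/#url-syntax",
            "https://url.spec.whatwg.org/ ",
            "https://www.google.com/search?q=Regular+Expressions&num=1000",
            "https://www.google.com/search"
        };
        for (String s : samples) {
            System.out.println(stripQueryAndFragment(s));
        }
    }
}
```

Because the character class [^#?]+ simply stops at the first ? or #, or at the end of the input, no lookahead is needed, so URLs without a query string or fragment should match as well.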


Bagus Tesa