
I need a URL pattern that can recognize all the URLs in plain text. I currently have one that works fine in Java (using Pattern): ("(@)?(http(s)?://)?[a-zA-Z_0-9\-]+(\.\w[a-zA-Z_0-9\-]+)+(/[#&\n\-=?\+\%/\.,\w]+)?")

It recognizes most URLs, such as:

http://www.aaa.com

https://www.aaa.com

www.aaa.com

aaa.com

www.aaa.com/abcd/asdf?a=12

but it does NOT recognize URLs with a port number, such as www.aaa.com:8000 or www.aaa.com:8000/asdf

Can any regular expression experts help me extend the above pattern so that it also recognizes URLs with a port number?

  • This link might help you http://stackoverflow.com/questions/285619/how-to-detect-the-presence-of-url-in-a-string – user3842632 Jul 15 '14 at 21:13
  • You have forgotten about [IDN](http://en.wikipedia.org/wiki/Internationalized_domain_name), ftp, sftp and much more. Basically, it's a really hard job. – HamZa Jul 15 '14 at 21:13
  • What do you want to do with the URLs once recognized? Strip them from the input, linkify them as HTML, fetch them from the intarwebz? – Philipp Reichart Jul 15 '14 at 22:01
  • @PhilippReichart, I need to identify all the URLs in the message (which is plain text), replace them with a hard-coded string, and count how many links the message contains. – user3842642 Jul 15 '14 at 22:20
  • [Using a regular expression to match a url](http://stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url/190405#190405), as listed in the [Stack Overflow Regular Expressions FAQ](http://stackoverflow.com/a/22944075/2736496), under "Common Tasks > Validation > Internet". – aliteralmind Jul 16 '14 at 00:49

1 Answer


I guess you could squeeze in an optional port group, (:\d+)?, between the host part and the path part:

 # (@)?(http(s)?://)?[a-zA-Z_0-9\-]+(\.\w[a-zA-Z_0-9\-]+)+(:\d+)?(/[#&\n\-=?\+\%/\.,\w]+)?
 # "(@)?(http(s)?://)?[a-zA-Z_0-9\\-]+(\\.\\w[a-zA-Z_0-9\\-]+)+(:\\d+)?(/[#&\\n\\-=?\\+\\%/\\.,\\w]+)?"

 ( @ )?
 (
      http
      ( s )?
      ://
 )?
 [a-zA-Z_0-9\-]+ 
 ( \. \w [a-zA-Z_0-9\-]+ )+
 ( : \d+ )?
 ( / [#&\n\-=?\+\%/\.,\w]+ )?
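
To try the extended pattern against the use case mentioned in the comments (count the links and replace each with a fixed string), something like the following might work. This is only a minimal sketch: the class name UrlPortDemo, the helper methods countUrls and replaceUrls, the sample message, and the "[link]" placeholder are all made up for illustration and are not from the original post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class UrlPortDemo {
    // The question's pattern with the optional port group (:\d+)?
    // inserted between the host part and the path part.
    static final Pattern URL = Pattern.compile(
        "(@)?(http(s)?://)?[a-zA-Z_0-9\\-]+(\\.\\w[a-zA-Z_0-9\\-]+)+"
        + "(:\\d+)?(/[#&\\n\\-=?\\+\\%/\\.,\\w]+)?");

    // Count non-overlapping matches of the pattern in the text.
    static int countUrls(String text) {
        Matcher m = URL.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Replace every match with a literal placeholder string.
    // quoteReplacement keeps '$' and '\' in the placeholder from being
    // interpreted as group references.
    static String replaceUrls(String text, String placeholder) {
        return URL.matcher(text).replaceAll(Matcher.quoteReplacement(placeholder));
    }

    public static void main(String[] args) {
        // Assumed sample input, not taken from the question.
        String message = "See www.aaa.com:8000/asdf and https://www.aaa.com now";
        System.out.println(countUrls(message));
        System.out.println(replaceUrls(message, "[link]"));
    }
}
```

Note that the pattern is still as permissive as the original: it will also match things like file names ("report.pdf") or dotted identifiers, so the count is only as reliable as the input is URL-like.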