
I have a regex that finds URLs in text, but there is one thing I can't solve: if a URL ends with a dot, that dot is matched as part of the URL.

This is my pattern:

/(^|[\?\s])(www\.[^\? ]+\/[^\/ ]*\?[^\? ]+|www\.[^\? ]+)/g

For example, the text is 'The url is www.domain.com. Second is wiki.org.'

The trailing dot is not part of the URL, but the regex matches and replaces it too.

JSFiddle

Vinod Louis
Dmitry

1 Answer


The simplest fix is to require a non-punctuation character as the last character:

/(^|[?\s])(www\.[^? ]+\/[^/ ]*\?[^? ]*[^?.,! ]|www\.[^? ]*[^?.,! ])/g

Note that I removed some of your backslashes, because they were not necessary.
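To check the behaviour, here is a minimal sketch that runs the pattern above against the sample text from the question. The helper name `findUrls` is hypothetical and only for illustration; it assumes an environment with `String.prototype.matchAll` (modern browsers, Node 12+).

```javascript
// Pattern from this answer: the last character of a match
// may not be '?', '.', ',', '!' or a space.
const urlPattern = /(^|[?\s])(www\.[^? ]+\/[^/ ]*\?[^? ]*[^?.,! ]|www\.[^? ]*[^?.,! ])/g;

// Hypothetical helper: returns capture group 2 (the URL itself,
// without the leading delimiter) for every match in the text.
function findUrls(text) {
  return Array.from(text.matchAll(urlPattern), (m) => m[2]);
}

console.log(findUrls('The url is www.domain.com. Second is wiki.org.'));
// → [ 'www.domain.com' ]
```

The trailing dot is no longer captured. `wiki.org` is not matched at all, because the pattern only recognises URLs that start with `www.`.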

JSFiddle.

However, this is still far from a robust URL pattern. So why reinvent the wheel instead of just using an established URL pattern?

Martin Ender
  • Well, this fixes dots, but not things like "google.com, yahoo.com!". – georg Aug 23 '13 at 08:49
  • @thg435 that's true actually... I fixed that bit, but the main point of my answer is actually that this solution isn't robust either, and the OP should resort to existing solutions. – Martin Ender Aug 23 '13 at 08:52
  • Yes, I already added that in another version; it's easy. – Dmitry Aug 23 '13 at 08:53
  • Why reinvent... good question :) Some regexes are inconvenient because they detect A-Z but not Cyrillic; some use a list of TLDs, which is inconvenient too. I like using a self-made one, but regex is new to me. – Dmitry Aug 23 '13 at 08:55
  • One thing is really not ready yet: the case where the URL contains a port, like www.domain.net:8080 – Dmitry Aug 23 '13 at 08:57