regex to catch any http address

Question

I'm trying hard to write a regex that catches any http address. (Background: I'd like to use it in a tkinter window, a simple editor, to turn an http address into a clickable link.) Given how complicated such addresses can be, what is the best regex for this?

alessandro

Answer is in a [question](http://stackoverflow.com/questions/591859/a-regex-that-validates-a-web-address-and-matches-an-empty-string) — DrTyrsa, Jun 30 '11 at 08:36
Try the one John Gruber came up with for Markdown: http://daringfireball.net/2010/07/improved_regex_for_matching_urls — kindall, Jun 30 '11 at 16:09

score 1 · Answer 1 · answered Jun 30 '11 at 08:36


Considering the possibilities that came with Punycode, I'd say this is almost impossible to do with a RegEx.

Of course you could restrict your view to ASCII URLs.

You should take a look at the Regular Expression Library.
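To illustrate the ASCII restriction, here is a minimal sketch, assuming Python 3 and a made-up host name (`bücher.example` is only an illustration). It shows that `\w` matches non-ASCII letters unless `re.ASCII` is passed, and that the built-in `idna` codec (which implements the older IDNA 2003 rules) produces the Punycode form that an ASCII-only pattern can match.

```python
import re

# Host part of a URL pattern: labels made of word characters or hyphens,
# separated by dots. This pattern is an assumption for illustration.
HOST_RE = r"[\w-]+(\.[\w-]+)+"

# Hypothetical internationalized host name ("\u00fc" is the letter u-umlaut).
host = "b\u00fccher.example"

# In Python 3, \w matches Unicode letters by default, so this matches.
unicode_match = re.fullmatch(HOST_RE, host)

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so the same host is rejected.
ascii_match = re.fullmatch(HOST_RE, host, re.ASCII)

# The built-in "idna" codec converts the host to its ASCII (Punycode) form.
ascii_host = host.encode("idna").decode("ascii")
ascii_form_match = re.fullmatch(HOST_RE, ascii_host, re.ASCII)

print(bool(unicode_match), bool(ascii_match), ascii_host, bool(ascii_form_match))
```

So an ASCII-only regex still recognizes internationalized hosts once they are written in Punycode, but it will miss them in the form a user would normally type.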

answered Jun 30 '11 at 08:36

primfaktor


He will also have to take into account GET parameters in the URL... – SJuan76 Jun 30 '11 at 08:38

score 1 · Answer 2 · edited May 23 '17 at 12:23


Using the question "A regex that validates a web address and matches an empty string?" as a basis for an answer.

Assuming that an HTTP (or HTTPS) address:

starts with "http://" or "https://"
contains at least one "." between the domain name and the TLD
has a domain name composed of letters, digits, _ and -
ends at the first whitespace character and may contain any other character before that

then the regular expression could be '(http|https)://[\w-]+(\.[\w-]+)+\S*'

>>> import re
>>> re.sub(r"(http|https)://[\w\-]+(\.[\w\-]+)+\S*", "### URL ###", "There is an URL in this string : https://stackoverflow.com/questions/6532089/regex-to-catch-any-http-address and it is followed by text")
'There is an URL in this string : ### URL ### and it is followed by text'

However, it does not exclude punctuation that directly follows the URL: a trailing "." or "," is matched as part of the URL.
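One possible way to handle the trailing punctuation is to post-process each match instead of making the regex more complex. The sketch below is not part of the original answer: `find_urls` and the `TRAILING` character set are hypothetical names chosen for illustration, and `https?` is used as an equivalent shorthand for `(http|https)`.

```python
import re

# Same pattern as in the answer, written as a raw string.
URL_RE = re.compile(r"https?://[\w-]+(?:\.[\w-]+)+\S*")

# Characters assumed to be sentence punctuation when they end a match.
TRAILING = ".,;:!?)\"'"


def find_urls(text):
    """Return (start, end, url) for each URL-like span found in text."""
    results = []
    for match in URL_RE.finditer(text):
        # Drop punctuation that most likely belongs to the sentence.
        url = match.group(0).rstrip(TRAILING)
        results.append((match.start(), match.start() + len(url), url))
    return results


sample = "See http://example.com/page, then (https://example.org/a?b=1)."
for start, end, url in find_urls(sample):
    print(start, end, url)
```

The returned offsets could then be used to mark the link in an editor widget. Note the trade-off: stripping ")" also truncates URLs that legitimately end with a closing parenthesis.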

edited May 23 '17 at 12:23

Community


answered Jun 30 '11 at 08:49

Teg


Your answer will match, but completely different than you expect. `/*.*` means match any amount of `/` and then any amount of any character. `.` is a special character in regex and means ANY character. Your regex will match e.g. `http:/` – stema Jun 30 '11 at 08:55
Your expression matches input like `http:/`, `http:/not-a-url` and `http:///////////////`. It would also catch all whitespace, so as soon as a URL is typed in the OP's editor window, it would never end! – anton.burger Jun 30 '11 at 08:57
For the intended effect (from Teg's prose I take it that he was trying to specify a glob instead of a regex), try "http://[^/.]+\.[^/.]+" - not that I recommend this as a way of recognizing links – Sasha Jun 30 '11 at 09:08
Welcome to StackOverflow. Answers here are reviewed very quickly, but you can edit and improve your answer. If you change it into something that is not wrong, I can take back my downvote; if it's good, I will give you an upvote. Don't be discouraged: SO is a great place to learn, and you already got hints in the comments to your answer. – stema Jun 30 '11 at 09:25
I believe my answer addresses the question now. The regular expression doesn't validate a URL; it detects what should be interpreted as one in a string. It can still be improved by detecting punctuation. – Teg Jun 30 '11 at 12:03

score 1 · Answer 3 · answered Jun 30 '11 at 09:27


The tornado.escape module has a nice function, "linkify", for that. You can view the source here: escape.py. PS: I wanted to post this as a comment, but I don't have enough privileges. Anyway, I hope you find it useful.

answered Jun 30 '11 at 09:27

timgluz


To me this seems like it could be a better answer than a regex, so +1 – stema Jun 30 '11 at 09:33
