0

I am looking for a regex that would find all of the following URLs:

hello.com hello1.com 1hello.com hello-1.com hello-hi1.com 1hello-hi.com h3ll0.com

I have tried a few different regexes, but none of them is quite right.

import re

regex = re.compile(r'\w+\.(com|org|net)')
data = regex.search(string)  # string holds the text to search
url = data.group(0)

I want it to return all of the above URLs.

Canna
    Possible duplicate of [Learning Regular Expressions](https://stackoverflow.com/questions/4736/learning-regular-expressions) – jonrsharpe Feb 01 '19 at 11:43

3 Answers

1

You can add (-\w+)* to your regex, written here as the non-capturing group (?:-\w+)*, to allow optional hyphens in the domain-name part of the URL. You can use this regex:

\w+(?:-\w+)*\.(?:com|org|net)
   ^^^^^^^^^ this allows the URL to have optional hyphens


You should make groups non-capturing unless you really need the captured text, as this can slightly improve performance.
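As a rough sketch of how this could be used, the snippet below runs the pattern over the sample URLs from the question. The names DOMAIN_RE and text are only illustrative. One side effect of non-capturing groups in Python is that re.findall returns the whole match rather than just the group contents.

```python
import re

# Pattern from this answer; every group is non-capturing.
DOMAIN_RE = re.compile(r'\w+(?:-\w+)*\.(?:com|org|net)')

# Sample input taken from the question.
text = ('hello.com hello1.com 1hello.com hello-1.com '
        'hello-hi1.com 1hello-hi.com h3ll0.com')

# With no capturing groups, findall returns each full match as a string.
urls = DOMAIN_RE.findall(text)
print(urls)

# For contrast: with a capturing group, findall returns only the group text.
tlds_only = re.findall(r'\w+\.(com|org|net)', text)
print(tlds_only)  # a list of 'com' strings, not the URLs
```

Using findall (or finditer) instead of search also matters here, because search stops at the first match.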

Pushpesh Kumar Rajwanshi
0

You could try splitting the string on the '.' delimiter and then checking whether the last part is in a whitelist of, say, ['com', 'org', 'net', 'io', ...].

For example:

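whitelist = {'com', 'org', 'net', 'io'}

def is_possible_url(possible_url):
    # True if the text after the last '.' is a whitelisted TLD
    return possible_url.split('.')[-1] in whitelist

print(is_possible_url('hello.com'))  # True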
Arran Duff
0

Using a simple regex can cause you to accidentally match ordinary words. For example, simply using [\w-]+\.(com|org|net) would meet your requirements, but it would miss all other top-level domains, miss sub-domains, and match inside normal words.

This regex might be better suited: \b\w[-.\w]+\.(com|org|net)\b
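To illustrate the difference, here is a small comparison of the two patterns. The sample string is made up for this example, and the groups are written as non-capturing so that group(0) is the only thing of interest; treat it as a sketch rather than a complete URL matcher.

```python
import re

# The simple pattern and the word-boundary pattern from this answer.
simple = re.compile(r'[\w-]+\.(?:com|org|net)')
bounded = re.compile(r'\b\w[-.\w]+\.(?:com|org|net)\b')

# Hypothetical input: one sub-domain and one ordinary dotted word.
sample = 'see mail.hello-1.com or welcome.community'

simple_hits = [m.group(0) for m in simple.finditer(sample)]
bounded_hits = [m.group(0) for m in bounded.finditer(sample)]

print(simple_hits)   # ['hello-1.com', 'welcome.com']
print(bounded_hits)  # ['mail.hello-1.com']
```

The simple pattern drops the "mail." sub-domain and matches the start of "welcome.community", while the bounded pattern keeps the sub-domain and rejects the ordinary word because no word boundary follows "com" there.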

Damo