What is a regular expression which will match a valid domain name without a subdomain?

Question

I need to validate a domain name:

google.com

stackoverflow.com

So a domain in its rawest form - not even a subdomain like www.

Characters should only be a-z | A-Z | 0-9 and period(.) and dash(-)
The domain name part should not start or end with dash (-) (e.g. -google-.com)
The domain name part should be between 1 and 63 characters long
~~The extension (TLD) can be anything under #1 rules for now, I may validate them against a list later, it should be 1 or more characters though~~

~~Edit: TLD is apparently 2-6 chars as it stands~~

no. 4 revised: TLD should actually be labelled "subdomain" as it should include things like .co.uk -- I would imagine the only validation possible (apart from checking against a list) would be 'after the first dot there should be one or more characters under rules #1

Thanks very much, believe me I did try!

It would be wise to consider URI validation rather than domain name validation. Look at IETF RFC3986. — ingyhere, Apr 24 '12 at 22:10
May be not helpful at all. When it comes to google.co.uk, and some Japanese domains, I'm sure you will have to think twice before using regex for that. My personal thought is that regex is not enough to validate a domain to a real-life domain. FYI, here is an almost complete list of tlds and country code second level domains list: http://static.ayesh.me/misc/SO/tlds.txt — AKS, May 15 '12 at 14:42
See my answer to the related question about [hostname validation](http://stackoverflow.com/questions/106179/regular-expression-to-match-hostname-or-ip-address/3824105#3824105). — SAM, Sep 07 '13 at 14:41
Often forgotten: For full qualified domain names you should match a period after the tld. — schmijos, Nov 13 '13 at 13:45
Please note that a lot of new TLDs are coming up. And some are quite long, for example xn--vermgensberatung-pwb is a valid TLD and is 24 characters long :p — Romuald Brunet, Jan 06 '15 at 17:40
Some of these answers are pretty good, but there's also another [good answer on this other question](http://stackoverflow.com/questions/3026957/how-to-validate-a-domain-name-using-regex-php/16491074#16491074) that's worth a look. — craftworkgames, Jan 25 '16 at 01:45
.co.uk is not a TLD and I would bitchslap anyone who claims otherwise. Stop confusing people by repeating the same mistake. The TLD is .uk and co.uk is a SLD which would make it a subdomain of .uk — Bojidar Stanchev, Jun 19 '20 at 11:21

Tim Groeneveld · Answer 1 · 2019-06-11T07:20:58.310

94

I know that this is a bit of an old post, but all of the regular expressions here are missing one very important component: the support for IDN domain names.

IDN labels start with xn-- once they are encoded. They allow Unicode characters in domain names. For example, did you know "♡.com" is a valid domain name? Yeah, "love heart dot com"! To validate the domain name, you need to let http://xn--c6h.com/ pass the validation.

Note, to use this regex, you will need to convert the domain to lower case, and also use an IDN library to ensure you encode domain names to ACE (also known as "ASCII Compatible Encoding"). One good library is GNU-Libidn.

idn(1) is the command line interface to the internationalized domain name library. The following example converts the host name in UTF-8 into ACE encoding. The resulting URL https://nic.xn--flw351e/ can then be used as ACE-encoded equivalent of https://nic.谷歌/.

  $ idn --quiet -a nic.谷歌
  nic.xn--flw351e
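
For readers without the idn tool, roughly the same conversion can be sketched in Python. This is only an illustration and not part of the original answer: it assumes Python 3 and relies on the standard library's built-in "idna" codec, which implements the older IDNA 2003 rules. The helper name to_ace is made up for this example.

```python
def to_ace(domain: str) -> str:
    """Lower-case a domain and convert it to its ASCII-compatible (ACE) form.

    Uses Python's built-in "idna" codec (IDNA 2003). The function name is
    hypothetical and only serves this example.
    """
    return domain.lower().encode("idna").decode("ascii")


if __name__ == "__main__":
    # Same host name as in the idn(1) example above.
    print(to_ace("nic.谷歌"))
```

The result of to_ace() is what should be handed to the regular expression, since the pattern only knows about lower-case ASCII.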

This magic regular expression should cover most domains (although, I am sure there are many valid edge cases that I have missed):

^((?!-))(xn--)?[a-z0-9][a-z0-9-_]{0,61}[a-z0-9]{0,1}\.(xn--)?([a-z0-9\-]{1,61}|[a-z0-9-]{1,30}\.[a-z]{2,})$

When choosing a domain validation regex, you should see if the domain matches the following:

xn--stackoverflow.com
stackoverflow.xn--com
stackoverflow.co.uk

If these three domains do not pass, your regular expression may be rejecting legitimate domains!
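
A quick way to run that check is a small script. The sketch below assumes Python 3 and its standard re module, and copies the first regex from this answer unchanged; the names DOMAIN_RE, SAMPLES and looks_valid are hypothetical.

```python
import re

# First regex from the answer above, copied unchanged.
DOMAIN_RE = re.compile(
    r"^((?!-))(xn--)?[a-z0-9][a-z0-9-_]{0,61}[a-z0-9]{0,1}\."
    r"(xn--)?([a-z0-9\-]{1,61}|[a-z0-9-]{1,30}\.[a-z]{2,})$"
)

# The three test domains listed above.
SAMPLES = ["xn--stackoverflow.com", "stackoverflow.xn--com", "stackoverflow.co.uk"]


def looks_valid(domain: str) -> bool:
    # The answer asks for lower-case input, so normalise first.
    return DOMAIN_RE.match(domain.lower()) is not None


if __name__ == "__main__":
    for name in SAMPLES:
        print(name, looks_valid(name))
```

Swapping a different pattern into DOMAIN_RE makes it easy to compare the candidate regexes from the other answers against the same three names.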

Check out The Internationalized Domain Names Support page from Oracle's International Language Environment Guide for more information.

Feel free to try out the regex here: http://www.regexr.com/3abjr

ICANN keeps a list of tlds that have been delegated which can be used to see some examples of IDN domains.

Edit:

 ^(((?!-))(xn--|_{1,1})?[a-z0-9-]{0,61}[a-z0-9]{1,1}\.)*(xn--)?([a-z0-9][a-z0-9\-]{0,60}|[a-z0-9-]{1,30}\.[a-z]{2,})$

This regular expression is meant to stop hostnames that end in '-' from being marked as valid. Additionally, it allows unlimited subdomains.

edited Jun 11 '19 at 07:20

answered Nov 18 '14 at 06:08

Tim Groeneveld

Note that this will only support max one subdomain, anything more than that will result in false. It's not something that you're liable to run into unless using it for internal sites, etc... A quick attempt to allow it to support more subdomains: `/^((?!-))(xn--)?[a-z0-9][a-z0-9-_]{0,61}[a-z0-9]{0,}\.?((xn--)?([a-z0-9\-.]{1,61}|[a-z0-9-]{1,30})\.?[a-z]{2,})$/i` – stakolee Aug 25 '16 at 19:01
But lonely tld's are not working :( For example `to.` ( http://to./ ) is valid url with content. – iiic Sep 16 '16 at 08:41
@iiic, yes, but `to.` is not a fully qualified domain name. If you want to allow top level domains, then you should use something like `^(((?!-))(xn--)?[a-z0-9][a-z0-9-_]{0,61}[a-z0-9]{0,1}\.)?(xn--)?([a-z0-9\-]{1,61}|[a-z0-9-]{1,30}\.[a-z]{2,})\.?$`, but be warned, you will let through people putting in domains like `test` or `na`, too! – Tim Groeneveld Sep 20 '16 at 01:23
It accepts `invali.d` as a valid domain name while `invali.d.co.uk` is invalid. – Pawel Krakowiak Apr 20 '17 at 08:32
@PawelKrakowiak `^(((?!-))(xn--)?[a-z0-9-_]{0,61}[a-z0-9]{1,1}\.)*(xn--)?([a-z0-9\-]{1,61}|[a-z0-9-]{1,30}\.[a-z]{2,})$` (Note that 'invali.d.co.uk' is not really a domain name in the traditional sense. A domain name usually only has a first or second-level hierarchy from the TLD/2LD) – Tim Groeneveld Jun 06 '17 at 06:52
This regular expression is not correct. For example, it matches domain names starting with `-`. – Eugene Morozov May 20 '18 at 01:41
@EugeneMorozov the regular expression starts with `^(((?!-))`, which explicitly states that it should not start with an `-`. Do you have an example of a domain where it does match a domain starting with `-`? My quick tests show that the regular expression fails with domains starting with `-`. – Tim Groeneveld May 21 '18 at 04:28
@timgws In python: `In [2]: d = re.compile('^(((?!-))(xn--)?[a-z0-9-_]{0,61}[a-z0-9]{1,1}\.)*(xn--)?([a-z0-9\-]{1,61}|[a-z0-9-]{1,30}\.[a-z]{2,})$')` `In [3]: d.match('-google')` `Out[3]: <_sre.SRE_Match at …>` It works if I move `(?!-)` before the `^`: `(?!-)^((xn--)?[-a-z0-9_]{0,61}[a-z0-9]\.)*(xn--)?([a-z0-9-]{1,61}|[a-z0-9-]{1,30}\.[a-z]{2,})$` (also cleaned useless fluff like `{1,1}`) – Eugene Morozov May 22 '18 at 03:41
@EugeneMorozov `{1,1}` is not useless fluff. It means that it must match (and only match) one character between `a-z` or `0-9`. This is to stop having `-` at the end of domain names (which is invalid). – Tim Groeneveld May 22 '18 at 04:40
@timgws But any character or character class or group matches exactly one time even without `{1,1}`. It's like writing `h{1,1}i{1,1}` instead of `hi` - just makes reading harder. – Eugene Morozov May 23 '18 at 06:35
this regex accepts `1-800-321-43` as a domain name. Is it valid domain name (i'm seriously asking...)? – Filip Bartuzi Jun 17 '18 at 13:21
@FilipBartuzi it shouldn't match raw TLD's. I just checked and it is. Example: *.com* is a TLD, with `com.` being the correct root for the domain. 1-800-321-43.com is a valid domain, and I guess there is no technical reason why `.1-800-321-43` could not be a valid TLD – Tim Groeneveld Jun 18 '18 at 02:29
This regex does not match domains with latin extension - eg. `rębąśęd.pl` – Filip Bartuzi Jun 18 '18 at 09:28
@FilipBartuzi this is mentioned: `use an IDN library to ensure you encode domain names to ACE`. They are not actually valid domain names (DNS only supports a subset of letters and numbers, ACE/IDN is what makes these work). – Tim Groeneveld Jun 19 '18 at 03:22
@TimGroeneveld, your regex considers a normal string valid: iE 'ihave-no-dot'. `^(((?!\-))(xn\-\-)?[a-z0-9\-_]{0,61}[a-z0-9]{1,1}\.)*(xn\-\-)?([a-z0-9\-]{1,61}|[a-z0-9\-]{1,30})\.[a-z]{2,}$` fixes this - it closes the last `)` bracket before the dot `\.` and the TLD `[a-z]{2,}` – davegson Oct 08 '18 at 14:22
"-" it is the valid domain according to your regex – Roman Yakoviv Jun 11 '19 at 07:11
It should be noted that `xn--stackoverflow.com` is not a valid name as 'stackoverflow' can not be converted from Punycode. That however is beyond what a regex can do. As a general remark, `xn--[a-z0-9]+` labels would be IDN-only whereas `xn--[a-z0-9]+\-[a-z0-9]+` indicate a mix of ASCII- and non-ASCII characters – Marcus Oct 09 '19 at 10:30
@Marcus noted, it's more just that if xn--stackoverflow.com does not pass the validation, you can be almost guaranteed that the regex will not support IDNs – Tim Groeneveld Oct 11 '19 at 01:33

Cameron · Accepted Answer · 2012-04-24T22:20:13.313

58

Well, it's ~~pretty straightforward~~ a little sneakier than it looks (see comments), given your specific requirements:

/^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,}$/

But note this will reject a lot of valid domains.
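
To make "reject a lot of valid domains" concrete, here is a minimal sketch, assuming Python 3 and its standard re module. It compares the regex above with the one-character variant that appears in the comments below. The names STRICT_RE, RELAXED_RE, SAMPLES and classify are hypothetical.

```python
import re

# Regex from the answer: the name part must be 3-63 characters long.
STRICT_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,}$")

# Variant from the comments that also accepts 1- and 2-character names.
RELAXED_RE = re.compile(
    r"^[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.[a-zA-Z]{2,}$"
)

SAMPLES = ["google.com", "g.co", "x.com", "bbc.co.uk", "-google-.com", "localhost"]


def classify(pattern, names):
    """Map each name to True/False depending on whether the pattern matches."""
    return {name: pattern.match(name) is not None for name in names}


if __name__ == "__main__":
    print("strict :", classify(STRICT_RE, SAMPLES))
    print("relaxed:", classify(RELAXED_RE, SAMPLES))
```

Both patterns reject bbc.co.uk and localhost, which is the kind of valid name the note above warns about; only the relaxed variant accepts g.co and x.com.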

edited Apr 24 '12 at 22:20

answered Apr 24 '12 at 22:07

Cameron

Nice thanks this one seems to be working. What kind of domains won't pass validation do you know? – Dominic Apr 24 '12 at 22:13
@infensus: Well, anything with leading URL components attached (e.g. `http://example.com` or `user:pass@example.com`), though to be fair that's not actually part of the domain. Longer domains wouldn't be matched. But most importantly, domains containing sub-domains won't be matched. – Cameron Apr 24 '12 at 22:18
.museum is longer than 4 characters. And OP says he only has one rule for tlds. – sch Apr 24 '12 at 22:19
Thats alright I already removed the http:// type protocol and all subdomains from the string, though I haven't considered user:pass@example.com - will remove anything before an @. This is for http domain name lookup so I don't need any of that stuff! – Dominic Apr 24 '12 at 22:22
@infensus - While this regex is correct given your specs, your specs are wrong. `g.co` is a valid domain name but `g` is only one character. – sch Apr 24 '12 at 22:23
@sch thanks I read that wrong somewhere - Good to know! Now I am reading mixed things like it can be as long as you want, that most domain providers limit to 255, and still more that say the toplevel part can be a max of 63 – Dominic Apr 24 '12 at 22:28
This should match all cases I think: ^([a-z0-9])(([a-z0-9-]{1,61})?[a-z0-9]{1})?(\.[a-z0-9](([a-z0-9-]{1,61})?[a-z0-9]{1})?)?(\.[a-zA-Z]{2,4})+$ – transilvlad May 16 '13 at 16:38
@infensus: this regexp will reject bbc.co.uk: the 'subdomain' concept is totally wrong as you use it, since TLD means Top Domain Name - 'com' is a domain, 'google.com' a 'subdomain' – Iacopo Aug 27 '13 at 07:03
x.com would not pass here – Neil McGuigan Nov 06 '13 at 23:22
@Neil: You're right. The original question asked for 3-63 characters (see edit 3). It can be changed to support one-character domains fairly easily: `/^[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.[a-zA-Z]{2,}$/`. But this still rejects tons of valid stuff... – Cameron Nov 07 '13 at 01:02
@Cameron what valid domains does – Doktor J Nov 07 '14 at 04:51
Use ^[a-zA-Z0-9]+([a-zA-Z0-9-]+\.[a-zA-Z0-9-]+){1,61}[a-zA-Z0-9]+\.[a-zA-Z]{2,}$ for allowing sub domains and domain name starts www. I modified this to work with knockout validation. – Ninthu Dec 02 '14 at 06:58
@Nish That does not "allow" sub domains, it requires them. – Kehlan Krumme Jun 22 '15 at 21:13
please elaborate more on what you mean by `But note this will reject a lot of valid domains.` – Filip Bartuzi Jun 16 '18 at 17:45
This doesn't have an exclusion for double-dashes in the domain name, which I believe aren't allowed in the RFC, outside of punycode. – Rohaq Aug 07 '18 at 13:04
this doesn't even support `.co.uk` tld either. – stardust4891 Mar 18 '19 at 02:25
localhost doesn't match but it's a valid domain name. – mastazi Feb 04 '20 at 01:45
@mastazi: Yes, as I wrote in the answer "But note this will reject a lot of valid domains." This was an answer to a very specific question, that unfortunately gets undue search traffic thanks to the generic title, but is a terrible way to validate domains. – Cameron Feb 04 '20 at 17:08

score 51 · Answer 3 · edited Jan 04 '17 at 21:34

My regex is as follows:

^[a-zA-Z0-9][a-zA-Z0-9-_]{0,61}[a-zA-Z0-9]{0,1}\.([a-zA-Z]{1,6}|[a-zA-Z0-9-]{1,30}\.[a-zA-Z]{2,3})$

It works for i.oh1.me and for wow.british-library.uk

Update: here is the updated rule:

^(([a-zA-Z]{1})|([a-zA-Z]{1}[a-zA-Z]{1})|([a-zA-Z]{1}[0-9]{1})|([0-9]{1}[a-zA-Z]{1})|([a-zA-Z0-9][a-zA-Z0-9-_]{1,61}[a-zA-Z0-9]))\.([a-zA-Z]{2,6}|[a-zA-Z0-9-]{2,30}\.[a-zA-Z]{2,3})$

Regular expression visualization: https://www.debuggex.com/r/y4Xe_hDVO11bv1DV

Now it checks for - or _ at the start or end of a domain label.

edited Jan 04 '17 at 21:34

LukeP

answered Nov 18 '13 at 11:45

paka

Looks pretty good, but the `{2,6}` criteria will need to be updated for the new TLD. Probably `{2,}`. – jwatts1980 Mar 12 '14 at 14:42
@jwatts1980 is there an examples of such zones? or you mean for possible future zones? – paka Mar 13 '14 at 13:42
Here is an article discussing the upcoming changes with examples and links to related resources: http://www.zdnet.com/ready-or-not-here-come-the-new-internet-top-level-domain-names-7000025744/ – jwatts1980 Mar 13 '14 at 15:26
@behz4d can you please provide an example? – paka Apr 23 '14 at 09:01
@behz4d it does't accepts for me nor in that regexp service (debuggex) nor in my html/js input fields – paka Apr 23 '14 at 09:19
Why ([a-zA-Z]{1}[a-zA-Z]{1}) and not ([a-zA-Z]{2})? – Anton Dec 17 '14 at 21:47
fails if you provide an ip – Marek R Dec 18 '14 at 18:17
@Anton yeah, you are right, for code clean uping, it's better to use [a-zA-Z]{2} =) – paka Jan 02 '15 at 15:32
@MarekR, yep, as ip's is out of scope here – paka Jan 02 '15 at 15:33
Unfortunately, this misses a lot of official TLD names, like .audio etc. Here is the list of official TLD's. http://data.iana.org/TLD/tlds-alpha-by-domain.txt – mrbinky3000 Apr 15 '15 at 15:34
The last part with the two alternatives is also wrong: there exist ccTLDs (two letters) that accept IDNA sublabels. There are also TLD labels that already use IDNA. You should not special-case the last label, which is no different from the others (and now has many extensions of variable lengths, just like all other labels in subdomains). Note that IDNA labels may also appear Punycoded, in which case there will be a "--" segment in the label, the only case where "--" is allowed in labels. Finally, the underscore is invalid everywhere in all labels. – verdy_p Dec 06 '15 at 02:45
Couldn't the 'non-TLD' portion of this updated regex be simplified to `([a-zA-Z]{1})|([a-zA-Z0-9][a-zA-Z0-9-_]{0,61}[a-zA-Z0-9])`; Thus the whole regex could be simplified to `^(([a-zA-Z]{1})|([a-zA-Z0-9][a-zA-Z0-9-_]{0,61}[a-zA-Z0-9]))\.([a-zA-Z]{2,6}|[a-zA-Z0-9-]{2,30}\.[a-zA-Z]{2,3})$` – FCTW Oct 31 '16 at 23:02
@DanielDai it should not, as it designed to work with top level domain names and not sub-domains – paka Nov 02 '16 at 16:08
`www.japantimes.co.jp` doesn't match but `www.google.com` does? – RoyalTS Jan 23 '17 at 00:28
@RoyalTS validate domains, but not subdomains. You have to strip out `www` before validating – paka Jan 23 '17 at 10:33
@verdy_p Are you sure about the underscore or are you possibly mixing it up with possible hostname characters? – phk Mar 27 '17 at 14:13

Yaroslav Stavnichiy · Answer 4 · 2015-08-12T17:05:15.550

27

My bet:

^(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9]$

Explained:

Domain name is built from segments. Here is one segment (except final):

[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?

It can have 1-63 characters, does not start or end with '-'.

Now append '.' to it and repeat at least one time:

(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+

Then attach final segment, which is 2-63 characters long:

[a-z0-9][a-z0-9-]{0,61}[a-z0-9]

Test it here: http://regexr.com/3au3g
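
The construction above can also be written out in code. This is a minimal sketch, assuming Python 3 and its standard re module. It adds a 253-character overall length check, which the regex itself does not enforce (the figure comes from a comment below). The names LABEL, FINAL, DOMAIN_RE and is_domain are hypothetical.

```python
import re

# One segment (except the final one): 1-63 characters, no leading or trailing "-".
LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"

# Final segment: 2-63 characters.
FINAL = r"[a-z0-9][a-z0-9-]{0,61}[a-z0-9]"

# "label." repeated at least once, followed by the final segment.
# Plain concatenation is used because the {0,61} braces would clash with f-strings.
DOMAIN_RE = re.compile(r"^(?:" + LABEL + r"\.)+" + FINAL + r"$")

MAX_DOMAIN_LENGTH = 253  # overall limit, checked separately from the regex


def is_domain(name: str) -> bool:
    name = name.lower()
    return len(name) <= MAX_DOMAIN_LENGTH and DOMAIN_RE.match(name) is not None


if __name__ == "__main__":
    for sample in ["google.com", "xn--90afc0aazy.xn--80adxhks", "a.b", "-google.com"]:
        print(sample, is_domain(sample))
```

Internationalized names still need to be converted to Punycode before they are passed to is_domain().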

edited Aug 12 '15 at 17:05

answered May 02 '15 at 21:50

Yaroslav Stavnichiy

@GaneshBabu What do you mean by exact matches? – Yaroslav Stavnichiy Dec 15 '16 at 11:16
All other answers didn't worked for me but this one did. – Danny Coulombe Jan 16 '18 at 15:47
I had a similar requirement where I want to avoid semicolon and comma at the end I tried a lot but no success below is the Regex I am using const regexDomain = /^(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z0-9][A-Za-z0-9-]{0,61}[A-Za-z0-9]/g; Well it validates if I use , and ; in between but fails at the end to vliadate. – Harry Sep 10 '18 at 13:43
I found several domains that should be valid but are invalid with your regex. For example редбулл.москва is a valid domain or also редбулл.рф and 红色的公牛.中国 – pubkey Jun 15 '20 at 12:01
@pubkey, you need to convert those domain names to [punycode](https://en.wikipedia.org/wiki/Punycode). Actual name for редбулл.москва is xn--90afc0aazy.xn--80adxhks And my regex does match it. – Yaroslav Stavnichiy Jun 16 '20 at 17:19
This really is the best regex that doesn't go off the rails. It handles single-character labels, it can handle IDN domains (converted to Punycode), and it has no absurd length requirement on the TLD. I think you would be hard-pressed to find a domain which it does not match. The only thing it doesn't enforce is the max length of a domain (253 characters). However, a simple length check could easily be used alongside the regex. – Nicholi Aug 24 '20 at 18:09

score 15 · Answer 5 · edited May 23 '17 at 12:26

This answer is for domain names (including service RRs), not host names (like an email hostname).

^(?=.{1,253}\.?$)(?:(?!-|[^.]+_)[A-Za-z0-9-_]{1,63}(?<!-)(?:\.|$)){2,}$

It is basically mkyong's answer and additionally:

Max length of 255 octets including length prefixes and null root.
Allow trailing '.' for explicit dns root.
Allow leading '_' for service domain RRs, (bugs: doesn't enforce 15 char max for _ labels, nor does it require at least one domain above service RRs)
Matches all possible TLDs.
Doesn't capture subdomain labels.

By Parts

Lookahead, limit max length between ^$ to 253 characters with optional trailing literal '.'

(?=.{1,253}\.?$)

Lookahead, next character is not a '-' and no '_' follows any characters before the next '.'. That is to say, enforce that the first character of a label isn't a '-' and only the first character may be a '_'.

(?!-|[^.]+_)

Between 1 and 63 of the allowed characters per label.

[A-Za-z0-9-_]{1,63}

Lookbehind, previous character not '-'. That is to say, enforce that the last character of a label isn't a '-'.

(?<!-)

Force a '.' at the end of every label except the last, where it is optional.

(?:\.|$)

Mostly combined from above, this requires at least two domain levels, which is not quite correct, but usually a reasonable assumption. Change from {2,} to + if you want to allow TLDs or unqualified relative subdomains through (eg, localhost, myrouter, to.)

(?:(?!-|[^.]+_)[A-Za-z0-9-_]{1,63}(?<!-)(?:\.|$)){2,}
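
To see how the parts fit together, the expression can be assembled piece by piece. The following is a sketch only, assuming Python 3, whose standard re module supports the lookahead and the fixed-width lookbehind used here. All names (LENGTH_CHECK, build_pattern and so on) are hypothetical.

```python
import re

LENGTH_CHECK = r"(?=.{1,253}\.?$)"   # whole name: 1-253 chars, optional trailing dot
LABEL_START = r"(?!-|[^.]+_)"        # no leading "-", and "_" only as first character
LABEL_BODY = r"[A-Za-z0-9-_]{1,63}"  # 1-63 allowed characters per label
LABEL_END = r"(?<!-)"                # no trailing "-"
SEPARATOR = r"(?:\.|$)"              # a dot, or the end of the string


def build_pattern(repeat="{2,}"):
    """Assemble the full expression; `repeat` controls how many labels are required."""
    label = LABEL_START + LABEL_BODY + LABEL_END + SEPARATOR
    return re.compile("^" + LENGTH_CHECK + "(?:" + label + ")" + repeat + "$")


DOMAIN_RE = build_pattern()        # at least two labels, as in the answer
RELAXED_RE = build_pattern("+")    # also lets single labels such as localhost through

if __name__ == "__main__":
    for name in ["example.com", "example.com.", "_sip._tcp.example.com", "localhost"]:
        print(name, bool(DOMAIN_RE.match(name)), bool(RELAXED_RE.match(name)))
```

Passing "+" as the repeat argument corresponds to the change from {2,} to + described above.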

Unit tests for this expression.

Thanks! This is the best regex here. Your thorough explanation and unit test are a bonus. — naudster, Mar 27 '17 at 02:59
Resource Record. Usually a text or informational field that tells you how to interact with a service. — Andrew Domaszek, Dec 04 '17 at 16:55
This regex is not correct. For example the domain redbull.移动 is valid but the regex will not match. — pubkey, Jun 15 '20 at 12:03
Convert to punycode first, then match. Length limits on the pre-punycode version are really hard to implement. — Andrew Domaszek, Jun 15 '20 at 12:06

score 14 · Answer 6 · answered Sep 08 '14 at 04:33

The accepted answer did not work for me; try this:

^((?!-)[A-Za-z0-9-]{1,63}(?<!-)\.)+[A-Za-z]{2,6}$

See the unit test cases for validation.

answered Sep 08 '14 at 04:33

mkyong

no support for new longer TLD names like .audio, .photography, and most of these... http://data.iana.org/TLD/tlds-alpha-by-domain.txt – mrbinky3000 Apr 15 '15 at 15:35
@mrbinky3000 Just change the last `{2,6}` to something else and it'll work. Mine: `^((?!-)[a-zA-Z0-9-]{1,63}(? – Mygod Jan 06 '17 at 02:24
@Mygod your regex contains some zero width garbage past the last question mark, so anyone copying it will be unpleasantly surprised – MightyPork May 14 '17 at 17:58
@MightyPork You're right! Sorry here's a (hopefully) clean version: `^((?!-)[a-zA-Z0-9-]{1,63}(? – Mygod May 15 '17 at 03:00
Very nice. Alas, lookbehind expressions are not valid in JavaScript. :/ – PhiLho Dec 06 '18 at 10:07
So I changed it to `/^(?:(?!-)[a-z0-9-]{0,62}[a-z0-9]\.)+[a-z]{2,}$/i`.Slightly less elegant, but does the job. – PhiLho Dec 06 '18 at 10:28
I like this one: it's fairly simple and short, well-explained (even if one has to go to another page for the explanation), allows potential future options (if they come up with some xn-- like thing again)... the only thing is the TLD restriction, that should in 2019 definitely be more than 6 in length, and maybe we'll have IDN TLDs some day, so I'd just add a dot to the user's input and remove the TLD matching part (but leave the `$`). – Luc Mar 07 '19 at 10:10
I changed the last {2,6} to allow a longer TLD (ex. Photography) and this works great for all my use-cases. – technonaut Dec 29 '20 at 19:54

score 13 · Answer 7 · answered Jun 04 '13 at 15:45

Just a minor correction - the last part should be up to 6. Hence,

^[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,6}$

The longest TLD is museum (6 chars) - http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains

answered Jun 04 '13 at 15:45

ahadinyoto

Note: This will not pass the valid (yet rare) domain name www.my---domain.com – Chris Bier Sep 17 '13 at 21:35
Doesn't cut it with new TLD e.g. `.photography` – Sam Figueroa Mar 12 '14 at 10:57
@SamFigueroa You'll just have to modify the length of it – Steel Brain Jul 05 '15 at 11:11
`.museum` is no longer the longest TLD. – Quinn Comendant Aug 07 '15 at 04:53
what about existing .consulting (10 chars)?? – Heitor Nov 17 '15 at 05:33
there shouldn't be a check for the TLD it's not different from the subdomains. And basing the regex on currently `available` tlds isn't future proof. – Loïc Faure-Lacroix Apr 04 '16 at 14:49
Suggest last bit be `{2,63}`: see https://stackoverflow.com/questions/9238640/how-long-can-a-tld-possibly-be – Eric Dobbs Jan 25 '19 at 16:02

Chris · Answer 8 · 2017-11-20T22:17:53.607

9

^[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,7}$

[domain - lower case letters and 0-9 only] [can have a hyphen] + [TLD - lower case only, must be between 2 and 7 letters long]
http://rubular.com/ is brilliant for testing regular expressions!
Edit: Updated TLD maximum to 7 characters for '.rentals' as Dan Caddigan pointed out.

edited Nov 20 '17 at 22:17

answered May 23 '13 at 13:27

Chris

Why limit TLDs? Now `.photography` would be invalid. Just make it unlimited chars or something like that. – adriaan Aug 13 '18 at 17:52

PeterM · Answer 9 · 2017-04-20T14:25:30.960

8

Thanks to the other answers for pointing in the right direction. Domain names can be validated in various ways.

If you need to validate an IDN domain in its human-readable form, the regex class \p{L} will help. It matches any letter in any language.

Note that the last part might contain hyphens too, because Punycode-encoded Chinese names might have Unicode characters in the TLD.

I came up with a solution which will match, for example:

google.com
masełkowski.pl
maselkowski.pl
m.maselkowski.pl
www.masełkowski.pl.com
xn--masekowski-d0b.pl
中国互联网络信息中心.中国
xn--fiqa61au8b7zsevnm8ak20mc4a87e.xn--fiqs8s

Regex is:

^[0-9\p{L}][0-9\p{L}-\.]{1,61}[0-9\p{L}]\.[0-9\p{L}][\p{L}-]*[0-9\p{L}]+$

Check and tune here

NOTE: This regexp is quite permissive, as is the character set currently allowed in domain names.

UPDATE: Simplified even further, as a-zA-Z\p{L} is the same as just \p{L}

NOTE2: The only problem is that it will match domains with double dots, like masełk..owski.pl. If anyone knows how to fix this, please improve it.
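
One possible way around the double-dot problem is to validate label by label, so that every dot must sit between two non-empty labels. The sketch below is not the regex above. Python's standard re module has no \p{L}, so it substitutes [^\W_] (any Unicode letter or digit), which is an assumption of this example, and the names LABEL, READABLE_DOMAIN_RE and is_readable_domain are hypothetical. It does not enforce any length limits, which would have to be checked after Punycode conversion.

```python
import re

# One label: runs of letters/digits separated by hyphens, so a label can
# neither start nor end with "-" and can never be empty.
LABEL = r"[^\W_]+(?:-+[^\W_]+)*"

# Two or more labels joined by single dots; ".." cannot match because every
# dot must be followed by a non-empty label.
READABLE_DOMAIN_RE = re.compile(LABEL + r"(?:\." + LABEL + r")+")


def is_readable_domain(name: str) -> bool:
    # fullmatch anchors the pattern at both ends of the string.
    return READABLE_DOMAIN_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for sample in ["masełkowski.pl", "中国互联网络信息中心.中国", "masełk..owski.pl"]:
        print(sample, is_readable_domain(sample))
```

Unlike the original expression, this version also refuses labels that begin or end with a hyphen.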

edited Apr 20 '17 at 14:25

answered Jul 20 '16 at 09:46

PeterM

We can just use `[:alpha:]` and `[:digit:]` instead of `\p{L}`. It works fine. – puchu Apr 26 '18 at 21:38
You can't validate an IDN this way without first converting it to punycode. For example with your expr, `中国互联网络信息中心中国互联网络信息中心中国互联网络信.中国` checks as valid, but after IDN conversion, it's too many bytes per label. \p{L} matches symbols, not punycode bytes (which vary from symbol to symbol), so repeat count is unhelpful when trying to limit its post-conversion size. – Andrew Domaszek Nov 02 '18 at 18:36
Good point, each part is limited to 63 bytes. However, we can't check that with a regexp, so further validation steps are required using a punycode decoder, which will fail with your example hostname. Chinese users must be annoyed by this limitation. – PeterM Nov 02 '18 at 18:52

zaTricky · Answer 10 · 2014-08-13T09:30:48.553

Not enough rep yet to comment. In response to paka's solution, I found I needed to adjust three items:

The dash and underscore were moved due to the dash being interpreted as a range (as in "0-9")
Added a full stop for domain names with many subdomains
Extended the potential length for the TLDs to 13

Before:

^(([a-zA-Z]{1})|([a-zA-Z]{1}[a-zA-Z]{1})|([a-zA-Z]{1}[0-9]{1})|([0-9]{1}[a-zA-Z]{1})|([a-zA-Z0-9][a-zA-Z0-9-_]{1,61}[a-zA-Z0-9]))\.([a-zA-Z]{2,6}|[a-zA-Z0-9-]{2,30}\.[a-zA-Z]{2,3})$

After:

^(([a-zA-Z]{1})|([a-zA-Z]{1}[a-zA-Z]{1})|([a-zA-Z]{1}[0-9]{1})|([0-9]{1}[a-zA-Z]{1})|([a-zA-Z0-9][-_\.a-zA-Z0-9]{1,61}[a-zA-Z0-9]))\.([a-zA-Z]{2,13}|[a-zA-Z0-9-]{2,30}\.[a-zA-Z]{2,3})$

thisismydesign · Answer 11 · 2020-01-09T12:26:30.340

As already pointed out, it is not obvious how to tell subdomains apart in the practical sense (e.g. .co.uk domains). We use this regex to validate domains which occur in the wild. It covers all practical use cases I know of. New ones are welcome. According to our guidelines it avoids capturing groups and greedy matching.

^(?!.*?_.*?)(?!(?:[\d\w]+?\.)?\-[\w\d\.\-]*?)(?![\w\d]+?\-\.(?:[\d\w\.\-]+?))(?=[\w\d])(?=[\w\d\.\-]*?\.+[\w\d\.\-]*?)(?![\w\d\.\-]{254})(?!(?:\.?[\w\d\-\.]*?[\w\d\-]{64,}\.)+?)[\w\d\.\-]+?(?<![\w\d\-\.]*?\.[\d]+?)(?<=[\w\d\-]{2,})(?<![\w\d\-]{25})$

Proof, explanation and examples: https://regex101.com/r/FLA9Bv/9 (Note: currently only works in Chrome because the regex uses lookbehinds, which were only added in ECMAScript 2018)

There are two approaches to choose from when validating domains.

By-the-books FQDN matching (theoretical definition, rarely encountered in practice):

max 253 characters long (as per RFC-1035/3.1, RFC-2181/11)
max 63 characters long per label (as per RFC-1035/3.1, RFC-2181/11)
any characters are allowed (as per RFC-2181/11)
TLDs cannot be all-numeric (as per RFC-3696/2)
FQDNs can be written in a complete form, which includes the root zone (the trailing dot)

Practical / conservative FQDN matching (practical definition, expected and supported in practice):

by-the-books matching with the following exceptions/additions
valid characters: [a-zA-Z0-9.-]
labels cannot start or end with hyphens (as per RFC-952 and RFC-1123/2.1)
TLD min length is 2 characters, max length is 24 characters as per currently existing records
don't match trailing dot
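
The practical rules listed above can also be checked step by step without a regex, which some readers may find easier to audit. This is a sketch under stated assumptions, not the regex from this answer: it assumes Python 3, the function name is_practical_fqdn is hypothetical, and it additionally assumes that at least two labels are required.

```python
import string

# Valid characters inside a label: [a-zA-Z0-9-] (dots only separate labels).
ALLOWED = set(string.ascii_letters + string.digits + "-")


def is_practical_fqdn(name: str) -> bool:
    # Whole name: at most 253 characters, and no trailing dot.
    if not name or len(name) > 253 or name.endswith("."):
        return False
    labels = name.split(".")
    # Assumption of this sketch: a bare TLD or "localhost" is rejected.
    if len(labels) < 2:
        return False
    for label in labels:
        # Each label: 1-63 characters from the allowed set.
        if not 1 <= len(label) <= 63 or not set(label) <= ALLOWED:
            return False
        # Labels cannot start or end with a hyphen.
        if label.startswith("-") or label.endswith("-"):
            return False
    tld = labels[-1]
    # TLD: 2-24 characters and not all-numeric.
    return 2 <= len(tld) <= 24 and not tld.isdigit()


if __name__ == "__main__":
    for sample in ["example.com", "example.123", "exa_mple.com", "example.com."]:
        print(sample, is_practical_fqdn(sample))
```

Each early return corresponds to one rule in the list, so a rule can be relaxed (for example the trailing dot) by editing a single line.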

You're right, thanks for pointing that out. Can't promise to look into it now but happy to take suggestions. — thisismydesign, Apr 15 '21 at 20:11

score 3 · Answer 12 · edited Oct 20 '17 at 09:51

For new gTLDs

/^((?!-)[\p{L}\p{N}-]+(?<!-)\.)+[\p{L}\p{N}]{2,}$/iu

edited Oct 20 '17 at 09:51

Paulo Freitas

answered Mar 11 '16 at 09:14

Ben Keil

Please give us some more details what you answer make better than the others? What do you match more?Please edit your post directly to add the information. – Sven R. Mar 11 '16 at 09:33
Like i wrote: new gTLDs. Domains with unicode chars and also unicode TLDs. – Ben Keil Jul 19 '16 at 07:58
@BenKeil: What is this part about: (? – jor Jan 13 '17 at 10:32
@jor that is negative look behind. Check this out https://www.shortcutfoo.com/app/dojos/regex/cheatsheet – Muhammad Faizan Mar 29 '18 at 08:20

user unknown · Answer 13 · 2014-11-18T06:17:50.977

2

^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]+(\.[a-zA-Z]+)$

edited Nov 18 '14 at 06:17

answered Apr 24 '12 at 22:10

user unknown

-1 for the addition of {2,4}. It's possible to have single character TLDs (however, there are not currently any in the root). What about .mobile? .associates? Both are valid TLDs, and would be rejected by this regex. http://data.iana.org/TLD/tlds-alpha-by-domain.txt – Tim Groeneveld Nov 18 '14 at 05:52

score 2 · Answer 14 · edited Oct 20 '17 at 09:53

^((localhost)|((?!-)[A-Za-z0-9-]{1,63}(?<!-)\.)+[A-Za-z]{2,253})$

Thank you @mkyong for the basis for my answer. I've modified it to support longer acceptable labels.

Also, "localhost" is technically a valid domain name. I will modify this answer to accommodate internationalized domain names.

edited Oct 20 '17 at 09:53

Paulo Freitas

answered Aug 05 '15 at 02:54

Nate Watson


score 2 · Answer 15 · answered Jun 27 '17 at 12:05

Here is complete code with example:

<?php
function is_domain($url)
{
    $parse = parse_url($url);
    if (isset($parse['host'])) {
        $domain = $parse['host'];
    } else {
        $domain = $url;
    }

    return preg_match('/^(?!\-)(?:[a-zA-Z\d\-]{0,62}[a-zA-Z\d]\.){1,126}(?!\d+)[a-zA-Z\d]{1,63}$/', $domain);
}

echo is_domain('example.com'); //true
echo is_domain('https://example.com'); //true
echo is_domain('https://.example.com'); //false
echo is_domain('https://localhost'); //false

score 1 · Answer 16 · answered Jun 12 '15 at 13:43

^[a-zA-Z0-9][-a-zA-Z0-9]+[a-zA-Z0-9].[a-z]{2,3}(.[a-z]{2,3})?(.[a-z]{2,3})?$

Examples that work:

stack.com
sta-ck.com
sta---ck.com
9sta--ck.com
sta--ck9.com
stack99.com
99stack.com
sta99ck.com

It will also work for extensions

.com.uk
.co.in
.uk.edu.in

Examples that will not work:

-stack.com

it will work even with the longest domain extension ".versicherung"

score 1 · Answer 17 · answered Nov 03 '20 at 02:27

Quite simple, quite permissive. It will have false positives like -notvalid.at-all, but it won't have false negatives.

/^([0-9a-z-]+\.?)+$/i

It checks for a sequence of letters, numbers and dashes that may end with a dot, followed by any number of such sequences.

The things I like about this regexp: it's short (maybe the shortest here), easily understandable, and good enough for catching user input errors on the client side.

score 0 · Answer 18 · edited Apr 02 '15 at 10:55

/^((([a-zA-Z]{1,2})|([0-9]{1,2})|([a-zA-Z0-9]{1,2})|([a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]))\.)+[a-zA-Z]{2,6}$/

([a-zA-Z]{1,2}) -> accepts one or two letters.
([0-9]{1,2}) -> accepts one or two digits.

Anything longer than two characters is handled by ([a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]).

The + is used so that the label group matches at least once.

score 0 · Answer 19 · answered Dec 10 '19 at 13:06

^((?!-))(xn--)?[a-z0-9][a-z0-9-_]{0,61}[a-z0-9]{0,}\.?((xn--)?([a-z0-9\-.]{1,61}|[a-z0-9-]{0,30})\.[a-z-1-9]{2,})$

will validate such domains as яндекс.рф after encoding.

https://regex101.com/r/Hf8wFM/1 - sandbox

answered Dec 10 '19 at 13:06

Danila Kulakov


landen99 · Answer 20 · 2019-12-30T20:31:59.297

The following regex extracts the sub, root and tld of a given domain:

^(?<domain>(?<domain_sub>(?:[^\/\"\]:\.\s\|\-][^\/\"\]:\.\s\|]*?\.)*?)(?<domain_root>[^\/\"\]:\s\.\|\n]+\.(?<domain_tld>(?:xn--)?[\w-]{2,7}(?:\.[a-zA-Z-]{2,3})*)))$

Tested for the following domains:

* stack.com
* sta-ck.com
* sta---ck.com
* 9sta--ck.com
* sta--ck9.com
* stack99.com
* 99stack.com
* sta99ck.com
* google.com.uk
* google.co.in

* google.com
* masełkowski.pl
* maselkowski.pl
* m.maselkowski.pl
* www.masełkowski.pl.com
* xn--masekowski-d0b.pl
* xn--fiqa61au8b7zsevnm8ak20mc4a87e.xn--fiqs8s

* xn--stackoverflow.com
* stackoverflow.xn--com
* stackoverflow.co.uk

score 0 · Answer 21 · answered Jul 25 '20 at 07:56

I did the below to simply fetch the domain along with the protocol. Example: https://www.facebook.com/profile/user/ ftp://182.282.34.337/movies/M

use the below Regex pattern : [a-zA-Z0-9]+://.*?/

will get you the output : https://www.facebook.com/ ftp://182.282.34.337/
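
As a rough illustration, the same pattern can be applied from Python 3 with the standard re module. For comparison, the sketch also shows urllib.parse.urlsplit from the standard library, which is a swapped-in alternative and not part of this answer. The function names origin_by_regex and host_by_parser are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Pattern from the answer: scheme, "://", then everything up to the next "/".
ORIGIN_RE = re.compile(r"[a-zA-Z0-9]+://.*?/")


def origin_by_regex(url: str):
    """Return 'scheme://host/' or None when there is no slash after the host."""
    match = ORIGIN_RE.search(url)
    return match.group(0) if match else None


def host_by_parser(url: str):
    """Alternative: let the standard URL parser extract the host name."""
    return urlsplit(url).hostname


if __name__ == "__main__":
    url = "https://www.facebook.com/profile/user/"
    print(origin_by_regex(url))
    print(host_by_parser(url))
```

Note that the regex only matches when a "/" follows the host, so a bare https://example.com yields no match, whereas the parser-based helper still returns the host.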

score 0 · Answer 22 · answered Mar 03 '21 at 11:49

For JavaScript you can have a look at the validator library: https://www.npmjs.com/package/validator

Method: isFQDN(str [, options])

answered Mar 03 '21 at 11:49

Sebastian Thees
